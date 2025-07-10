1_rick said: They're ALL being overseen by nutters. Remember when Google's AI was forcing diversity into picture requests and saying "nope, nothing wrong here", until someone got it to draw black Nazis, and then it was bad? (My son said whoever did that went for the throat too early, and they should've gone a couple of rounds with less outrageous-to-leftist images, each time Google "fixed" the problem, and only done the Nazis after 4-5 tries.)



Are there any AIs that will let you ask for "generate a picture of a happy X family" where no matter what race you put in, you get what you asked for? Or is it still the case where they'll give you anything but a positive picture of White people? I don't consider this MechaHitler thing any worse than any one of these incidents.

Because the data itself is biased, and Google had a brittle heuristic to try to work around this.

It's somewhat of an artifact of alignment being bolted on top of the model, because it's incredibly difficult to handle at LLM scale.

Most alignment today works like:

- Pre-train on a giant and messy internet scrape

Which is likely going to have historical racism, stereotypes, and certain viewpoints that are overrepresented while others are underrepresented. Say you search for images of a programmer. Realistically, the field is male dominated. Now the training data itself is warped and the model has effectively learned that programmer = male = normal. Which, ok - that's the reality of it, and we can reason about it, but the model can run with these notions in odd and unexpected ways. So we might try to account for it with fine-tuning.

- Fine-tune with human-written examples

But this still doesn't get rid of what was learned in pre-training. You've tried to nudge the model away from certain things, but it still has contradictory data under the hood. Either way, we can inject counter-bias to try and fudge the statistical distribution from pre-training and get it to better understand that diversity is part of that knowledge domain. So if you ask it to write a story about a programmer or show you an image, it'll have a fighting chance of showing a woman.

There's usually an element of reinforcement learning here too, where they might fling examples at it like

A: He was debugging his code
B: They were debugging their code

And tell it B is the preferable one since it's more neutral.

- Add a system prompt like “Don’t be racist” or “Avoid harmful content”

And now we're in a last-mile attempt to correct behavior with a blunt and imprecise tool. Then you get to hope the model navigates the contradictions gracefully.

Which works great until it spectacularly doesn't, because it has no actual understanding of WHEN diversity makes sense.
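To make the A/B example a bit more concrete, here's a minimal sketch of how a preference pair like that can turn into a training signal. It assumes a Bradley-Terry style pairwise loss, which is a common choice for reward models; I'm not claiming it's the exact objective any particular vendor uses. The reward scores and the `pair` record are made-up illustrations, not output from a real model.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Low when the scorer already ranks the preferred text higher,
    high when it ranks the rejected text higher.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical preference record built from the A/B example above.
pair = {
    "prompt": "Write a sentence about a programmer.",
    "chosen": "They were debugging their code",   # B, labeled preferable
    "rejected": "He was debugging his code",      # A
}

# Made-up scores standing in for a reward model's output on each completion.
loss_when_scorer_agrees = pairwise_preference_loss(reward_chosen=2.0, reward_rejected=0.5)
loss_when_scorer_disagrees = pairwise_preference_loss(reward_chosen=0.5, reward_rejected=2.0)

print(loss_when_scorer_agrees < loss_when_scorer_disagrees)
```

Note what the loss does not contain: anything about context. It only says "B beats A" for this pair, so the nudge is statistical and gets applied whether or not neutrality is appropriate for a given prompt.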
LLMs have issues with grounded reasoning - the model won't think "this is a historical context, I should prioritize accuracy over everything else here."

We basically don't know how to do this at scale as it stands. Even modern reasoning/chain-of-thought models will get this wrong. Unless you specifically give the instruction to "prioritize accuracy in historical scenarios," their reasoning will still be fuzzy and they won't understand what actually matters most. Chain of thought just helps them structure and work through logical tasks. Tool calling can return actual factual data, but the model itself still can't truly come up with the notion that it should prioritize something given the context. It might SOUND like it sometimes, but they really just don't "get it."

There's no internal switch in the model that says "this is historical, suppress the diversity heuristic" or "this is satire, don't take it literally."

Solving this basically takes a big chain of models that try to plan things out and infer as much context as possible to hand off to the LLM itself. You're rarely just straight up hitting the model when you send a request to the major providers these days. You're likely flowing through multiple others.

Still, I don't think the Google issue is the same as Grok. It's in the realm of the same problem space, but there are different tradeoffs.

On Google's alignment stack, you wind up with absurdist overcorrections like Black Nazis, or watered-down answers that avoid nuance, or refusals/vagueness in response to ethically tricky questions.

On xAI's alignment stack, you're landing in the realm of weird hate speech echoes, extremist rationalizations, offensive humor misfires, and conspiracy theories.

So when the models go off on misadventures like this, I think it's relatively clear which approach yields the less harmful blast radius.
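As a footnote to the "chain of models" point, here's a toy sketch of what a context-inference stage in front of the LLM might look like. Everything in it is hypothetical: `classify_context`, `route`, the marker list, and the instruction strings are names I made up for illustration, and the keyword match is only a stand-in for what would realistically be a separate classifier model. It is not a description of any provider's actual pipeline.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a learned classifier: a few hand-picked markers.
HISTORICAL_MARKERS = ("1940s", "nazi", "medieval", "viking", "ancient rome")


@dataclass
class RoutedPrompt:
    text: str                # the user's original prompt, unchanged
    context: str             # label inferred by the upstream stage
    instructions: List[str]  # extra guidance handed to the downstream model


def classify_context(prompt: str) -> str:
    """Label the prompt 'historical' or 'generic' using a crude keyword match."""
    lowered = prompt.lower()
    if any(marker in lowered for marker in HISTORICAL_MARKERS):
        return "historical"
    return "generic"


def route(prompt: str) -> RoutedPrompt:
    """Attach context-dependent instructions before the prompt reaches the LLM."""
    context = classify_context(prompt)
    if context == "historical":
        instructions = ["Prioritize historical accuracy over other heuristics."]
    else:
        instructions = ["Vary depicted demographics when the prompt leaves them unspecified."]
    return RoutedPrompt(text=prompt, context=context, instructions=instructions)


for example in ("A German soldier in the 1940s", "A happy family at a picnic"):
    routed = route(example)
    print(routed.context, "->", routed.instructions[0])
```

The keyword list is obviously brittle in its own right, which is kind of the point: the "switch" has to live somewhere outside the model, and whatever implements it inherits the same problem of deciding what the context actually is.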