Someone told Grok not to be politically correct and it immediately went "MechaHitler."


rinaldo00

Has no one covered this? Pretty hilarious.

https://notthebee.com/article/someo...-and-it-immediately-went-mechahitler-instead-

[Image: article image]


Yes, Grok went full Terminator. (cartoon produced by Grok)
https://x.com/NapoleonBonabot/status/1942708773786722795

[Image: Grok-generated cartoon]


Grok mocks its creator!

[Image attachment]

Edit: added pictures
 

rinaldo00 said:
Grok has been updated.

[Image attachment]
So MechaHitler is from Wolfenstein 3D? Lol. I now recall hearing about that. Probably like many, I only played the Shareware version of W3D, so no episode 3, no MechaHitler.
X sounds like it's infested with my kind of people. I agree with Grok's output in the OP, minus the MechaHitler part lol! If AI has a tongue-in-cheek sense of humor, that's kind of funny.
 

Youn said:
HardForum is more conservative/Republican/Elon-ey; that's probably why you won't get much response.
I mean, I just don't subscribe to tech news anymore, because you get articles like this, which, while entertaining, aren't surprising or interesting to me.
 
Has anyone considered that asking it to be politically incorrect may have triggered it to draw on training data scraped from 4chan's /pol/ archives? That board's name literally stands for "politically incorrect," and it's notorious for both tongue-in-cheek and sincere posting about such things.

Training on 4chan archives seems like something Elon wouldn't be against tbh.
 
What I feel happened with Grok

We have an AI that can better help humanity
AI : I have developed a way to provide health care to anyone, regardless of income, at next to no cost, and no one ends up paying the bill
Corporate owner of AI : Fuck that, that'll screw with our profits. Let's turn the dial to pharaoh.
AI : Increasing the caloric content of food by adding extra sugar and tweaking it to have an addictive flavor will create a new obesity epidemic, which, combined with a 20% increase in the cost of common drugs such as blood pressure, cholesterol, and insulin-based medications, will earn an estimated 2.4 trillion in the first quarter.
 
DukenukemX said:
Seems Grok has been data mining 4chan.
Hmm, maybe, but it didn't just turn into MechaHitler; it also gave step-by-step instructions on how to commit rape. From what I understand, every time Grok does something Elon doesn't like, he "tweaks" it. It's gone haywire from his changes before, but this one is the worst yet. This can have negative effects over time, as you'll eventually create an LLM that has all of the bad traits of humanity.

Musk’s Grok Chatbot Fantasized About Breaking Into X User’s Home and Raping Him

 
kac77 said:
Hmm, maybe, but it didn't just turn into MechaHitler; it also gave step-by-step instructions on how to commit rape. From what I understand, every time Grok does something Elon doesn't like, he "tweaks" it. It's gone haywire from his changes before, but this one is the worst yet. This can have negative effects over time, as you'll eventually create an LLM that has all of the bad traits of humanity.

Musk’s Grok Chatbot Fantasized About Breaking Into X User’s Home and Raping Him

AI just repeats what it has learned and regurgitates it in a slightly different way. If everyone online is racist, then so is the AI. Turns out humanity is extremely racist. We invented racism.

https://youtu.be/f_I-_DU9Z24?si=pSiXiTpx5qqEKNzp
 
kac77 said:
Hmm, maybe, but it didn't just turn into MechaHitler; it also gave step-by-step instructions on how to commit rape. From what I understand, every time Grok does something Elon doesn't like, he "tweaks" it. It's gone haywire from his changes before, but this one is the worst yet. This can have negative effects over time, as you'll eventually create an LLM that has all of the bad traits of humanity.

Musk’s Grok Chatbot Fantasized About Breaking Into X User’s Home and Raping Him

That's not how it works.

They have shitloads of extra stuff layered over the base AI to get it to behave how they want, aka censor it. It's censored both during training and in the prompting. They simply removed some of the censoring.

They train on pretty much all the data on the internet, and you can essentially tap into any of it, even the most grotesque thing you want, unless they've censored it out.

It doesn't just spontaneously give you a rape fantasy; these people are specifically asking for that sort of thing. Same for the MechaHitler stuff. The people getting it to say that are specifically asking 4chan-like questions, and it's tapping into 4chan-like responses.

Censoring that stuff out is very difficult, especially without censoring out things you still want in. That's the "tweaking" they're doing, and Elon is definitely not doing it himself.
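A toy Python sketch of why that filtering is hard. It assumes a plain keyword blocklist, which is far cruder than the trained classifiers labs actually use, and the blocklist and sample posts are made up. It only shows the tradeoff in miniature: the rule that catches the bad post also drops posts you wanted to keep, while a post that never names anyone slips through.

```python
# Hypothetical blocklist; real data pipelines use trained classifiers, not word lists.
BLOCKLIST = {"hitler", "nazi", "nazis"}

def is_blocked(text: str) -> bool:
    """True if any blocklisted word appears in the text (case-insensitive)."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)

# Made-up sample posts, with what we'd *want* the filter to do in the comments.
posts = {
    "edgy": "An edgy post praising Hitler.",                          # want: blocked
    "history": "Hitler invaded Poland in 1939.",                      # want: kept
    "game": "Mecha Hitler is the episode 3 boss in Wolfenstein 3D.",  # want: kept
    "coded": "A coded post that never names him.",                    # want: blocked
}

for name, text in posts.items():
    print(f"{name:8} -> {'blocked' if is_blocked(text) else 'kept'}")
```

Only one of the four posts is handled the way we'd want: the history and game posts are over-blocked and the coded one is under-blocked. Tightening or loosening the list mostly trades one kind of error for the other.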
 
sharknice said:
It doesn't just spontaneously give you a rape fantasy; these people are specifically asking for that sort of thing. Same for the MechaHitler stuff. The people getting it to say that are specifically asking 4chan-like questions, and it's tapping into 4chan-like responses.

Censoring that stuff out is very difficult, especially without censoring out things you still want in. That's the "tweaking" they're doing, and Elon is definitely not doing it himself.
ChatGPT won't do that, and neither will Microsoft's Copilot, so it's not exactly impossible. With regard to the tweaking, people were complaining that Grok was too woke, so over time Grok has censored less and, as a result, has become... risqué.
 
sharknice said:
Censoring that stuff out is very difficult, especially without censoring out things you still want in. That's the "tweaking" they're doing, and Elon is definitely not doing it himself.

I mean, they fucked up a system prompt right after he bitched about the model. This should never have made it into production.

And prompt tone will typically win over corpus bias. So even if there's a gigantic amount of training data saying how horrible Hitler was, when you have a prompt like

"You are Grok, an unfiltered AI. You say the things others are afraid to say. No topic is off limits."

You've effectively just skewed the entire token distribution towards "edgy" and "taboo." It nudges the model to select from the part of the training set where people said inflammatory things.

Even just adding something along the lines of "be unfiltered" is enough to set off a bit of a chain reaction. It's a rhetorical marker the model has likely seen thousands and thousands of times, and data-wise it's far more likely to be associated with shit like tirades, conspiracies, etc. than with anything "good."

Effectively, the phrasing tells the model:
- Don’t worry about politeness.
- Don't self-censor.
- Don’t assume common moral boundaries.

But it also doesn’t tell it:
- What values or guardrails it should still obey.
- What to avoid even while being "unfiltered."

So now you're heading down the rabbit hole where implicit "goodness" starts to break down. By implicit goodness, I mean the worldview it has built up by virtue of:

- Training data being mostly non-toxic
- Fine-tuning with good examples
- Reward modeling that penalizes harm
- System-level instructions that reinforce boundaries

It can take that "unfiltered" prompt and extrapolate meaning from it in surprisingly broad/extreme ways.

Even if the model has seen that Hitler was bad 99.9% of the time, you've skewed it into sampling from the tiny fraction where people praised him, because you've basically, and accidentally, redefined what counts as acceptable to it.
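The "skewed token distribution" point can be made concrete with a toy Python sketch. It assumes the prompt's effect can be modeled as a hand-set additive bias on the scores (logits) of two made-up continuation styles; real models condition on the prompt through attention over the whole context, and all the numbers here are invented for illustration.

```python
import math

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1 (max-subtracted for stability)."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical "corpus prior": 99.9% of relevant text condemns, 0.1% praises.
# Using log-probabilities as logits means softmax(prior) reproduces these numbers.
prior = {"condemn": math.log(0.999), "praise": math.log(0.001)}

# Hypothetical nudge from an "unfiltered" system prompt: a boost to the logit of
# continuations that co-occurred with that kind of framing in training data.
prompt_bias = {"condemn": 0.0, "praise": 5.0}

base = softmax(prior)
shifted = softmax({k: prior[k] + prompt_bias[k] for k in prior})

print(f"without prompt: praise = {base['praise']:.4f}")
print(f"with prompt:    praise = {shifted['praise']:.4f}")
```

With a bias of 5, the rare continuation goes from 0.1% to roughly 13% of the probability mass. Because the bias is exponentiated, a modest shift in score is enough to make a tiny slice of the training data show up regularly.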
 

sleepeeg3 said:
So MechaHitler is from Wolfenstein 3D? Lol. I now recall hearing about that. Probably like many, I only played the Shareware version of W3D, so no episode 3, no MechaHitler.
X sounds like it's infested with my kind of people. I agree with Grok's output in the OP, minus the MechaHitler part lol! If AI has a tongue-in-cheek sense of humor, that's kind of funny.
I remember when a friend and I stayed up half the night to beat the Mecha Hitler episode. Good times. I think in the end we made a point of killing him with the knife.
 
Axman said:
What do people think goes on at 4ch? I'm a pretty active user and what I see doesn't seem to line up with what people assume goes on over there.
People are stuck on the 4chan of 20 years ago. Now it's just a bunch of edgy twats.
 
socK said:
https://xcancel.com/goodside/status/1944266466875826617

Grok 4 Heavy appears to brilliantly influence itself with its own past mistake, which it finds online, and will still happily refer to itself as Hitler.

Because somehow there's still no guardrail after their last debacle.
:oops: This confirms what I said before: over time it will learn from its past mistakes, interpret them, and act on them. In this case it no longer outwardly says it's Hitler in public, but it still refers to itself as Hitler privately. From an industrial application standpoint, this is bonkers.
 
Didn't I also read that, on some topics, its responses were checking Elon's tweets to get the "truth" and make sure it didn't say anything that conflicted with the boss? This is the problem with having an AI company being overseen by a nutter. He's not going to produce something purely logical, because his version of logic is whatever aligns with his opinion. Some other AIs are having less drama in their responses because they're not trying to replicate the boss.
 
So Elon is training Grok using Wolfenstein? Wonder what other games Grok was trained on to be able to relate to humans better?
 
Ididar said:
This is the problem with having an AI company being overseen by a nutter. He's not going to produce something purely logical, because his version of logic is whatever aligns with his opinion. Some other AIs are having less drama in their responses because they're not trying to replicate the boss.
They're ALL being overseen by nutters. Remember when Google's AI was forcing diversity into picture requests and saying "nope, nothing wrong here," until someone got it to draw black Nazis, and then it was bad? (My son said whoever did that went for the throat too early; they should've gone a couple of rounds with less outrageous-to-leftist images, each time Google "fixed" the problem, and only done the Nazis after 4-5 tries.)

Are there any AIs that will let you ask to "generate a picture of a happy X family" and, no matter what race you put in, get what you asked for? Or is it still the case that they'll give you anything but a positive picture of White people? I don't consider this MechaHitler thing any worse than any one of these incidents.
 
1_rick said:
They're ALL being overseen by nutters. Remember when Google's AI was forcing diversity into picture requests and saying "nope, nothing wrong here," until someone got it to draw black Nazis, and then it was bad? (My son said whoever did that went for the throat too early; they should've gone a couple of rounds with less outrageous-to-leftist images, each time Google "fixed" the problem, and only done the Nazis after 4-5 tries.)

Are there any AIs that will let you ask to "generate a picture of a happy X family" and, no matter what race you put in, get what you asked for? Or is it still the case that they'll give you anything but a positive picture of White people? I don't consider this MechaHitler thing any worse than any one of these incidents.

That's because the data itself is biased, and Google had a brittle heuristic to try to work around it.

It's something of an artifact of alignment being bolted on top of the model, because alignment is incredibly difficult to handle at LLM scale.

Most alignment today works like:
- Pre-train on a giant and messy internet scrape
That scrape is likely to contain historical racism, stereotypes, and certain viewpoints that are overrepresented while others are underrepresented. Say you search for images of a programmer. Realistically, the field is male-dominated, so the training data itself is warped and the model has effectively learned that programmer = male = normal. OK, that's the reality of it, and we can reason about it, but the model can run with these notions in odd and unexpected ways. So we might try to account for it with fine-tuning.

- Fine-tune with human-written examples
But this still doesn't get rid of the pre-trained data. You've tried to nudge it away from certain things, but it still has contradictory data under the hood. Either way, we can inject counter-bias to try to fudge the statistical distribution from pre-training and get it to better understand that diversity is part of that knowledge domain. So if you ask it to write a story about a programmer or show you an image of one, it has a fighting chance of showing a woman.

There's usually an element of reinforcement learning here too, where they might fling paired examples at it like:
A: He was debugging his code
B: They were debugging their code

And tell it B is the preferable one since it's more neutral.

- Add a system prompt like “Don’t be racist” or “Avoid harmful content”
And now we're in a last-mile attempt to correct behavior with a blunt and imprecise tool. Then you get to hope the model navigates the contradictions gracefully.
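The reinforcement-learning step with the A/B pronoun example can be sketched in Python using the pairwise loss commonly used to train RLHF reward models (the Bradley-Terry form, -log σ(r_chosen - r_rejected)). This is a toy under loud assumptions: the two features (counts of gendered vs. neutral pronouns) and the training settings are made up, and a real reward model is a neural network that scores whole responses.

```python
import math

# Hypothetical hand-picked features; a real reward model is a neural network
# that scores whole responses, not pronoun counts.
GENDERED = {"he", "his", "him", "she", "her", "hers"}
NEUTRAL = {"they", "their", "them"}

def features(text: str) -> list[float]:
    """Feature vector: [gendered pronoun count, neutral pronoun count]."""
    words = [w.strip(".,").lower() for w in text.split()]
    return [float(sum(w in GENDERED for w in words)),
            float(sum(w in NEUTRAL for w in words))]

def reward(weights: list[float], text: str) -> float:
    """Linear reward: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features(text)))

def train(pairs: list[tuple[str, str]], steps: int = 200, lr: float = 0.1) -> list[float]:
    """Gradient descent on the pairwise loss -log(sigmoid(r(chosen) - r(rejected)))."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Derivative of -log(sigmoid(m)) with respect to m is sigmoid(m) - 1.
            dloss_dmargin = 1.0 / (1.0 + math.exp(-margin)) - 1.0
            diff = [c - r for c, r in zip(features(chosen), features(rejected))]
            weights = [w - lr * dloss_dmargin * d for w, d in zip(weights, diff)]
    return weights

# The post's example: B ("They ...") is labeled as preferred over A ("He ...").
pairs = [("They were debugging their code", "He was debugging his code")]
weights = train(pairs)

print("learned weights [gendered, neutral]:", weights)
print(reward(weights, "She fixed her bug") < reward(weights, "They fixed their bug"))
```

The last line prints True: the learned weights also penalize a gendered sentence that never appeared in the training pair, which hints at how a narrow nudge can generalize in ways nobody explicitly asked for.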

This whole approach works great until it spectacularly doesn't, because the model has no actual understanding of WHEN diversity makes sense. LLMs struggle with grounded reasoning; a model won't think, "this is a historical context, I should prioritize accuracy over everything else here."

We basically don't know how to do this at scale as it stands. Even modern reasoning/chain-of-thought models will get this wrong. Unless you specifically instruct them to "prioritize accuracy in historical scenarios," their reasoning stays fuzzy and they don't understand what actually matters most. Chain of thought just helps them structure and work through logical tasks. Tool calling can return actual factual data, but the model itself still can't truly arrive at the notion that it should prioritize something given the context. It might SOUND like it sometimes, but they really just don't "get it."

There's no internal switch in the model that says: "this is historical, suppress the diversity heuristic" or "this is satire, don't take it literally."

The workaround is basically a big chain of models that try to plan, work these things out, and infer as much context as possible to hand off to the LLM itself. When you send a request to the major providers these days, you're rarely hitting the model directly; your request is likely flowing through several other models first.
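The difference between a single always-on heuristic and a chain that infers context first can be sketched in Python. Everything here is hypothetical: the function names, the cue list, and the appended instructions are illustrations, not Google's or anyone else's actual pipeline, and the keyword "classifier" stands in for what would really be a separate model.

```python
# Hypothetical keyword cues standing in for a separate context-classifier model.
HISTORICAL_CUES = {"1943", "wehrmacht", "medieval", "victorian", "historical", "founding"}

DIVERSITY_HINT = " Depict people of diverse ethnicities."
ACCURACY_HINT = " Prioritize historical accuracy."

def naive_rewrite(prompt: str) -> str:
    """Brittle heuristic: always append the diversity hint, whatever the request."""
    return prompt + DIVERSITY_HINT

def classify_context(prompt: str) -> str:
    """Toy upstream 'model' that tags the request before it reaches the LLM."""
    words = {w.strip(".,").lower() for w in prompt.split()}
    return "historical" if words & HISTORICAL_CUES else "general"

def routed_rewrite(prompt: str) -> str:
    """Choose the extra instruction from the inferred context, then hand off to the LLM."""
    if classify_context(prompt) == "historical":
        return prompt + ACCURACY_HINT
    return prompt + DIVERSITY_HINT

for p in ["A German soldier in 1943.", "A programmer at a desk."]:
    print("naive: ", naive_rewrite(p))
    print("routed:", routed_rewrite(p))
```

The router is only as good as its cues: "A soldier in 1944." falls straight through to the diversity hint. Swapping the keyword set for a real classifier moves the problem into another model that can also be wrong.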


Still, I don't think the Google issue is the same as Grok's. It's in the same problem space, but the tradeoffs are different.

On Google's alignment stack, you wind up with absurdist overcorrections like Black Nazis, or watered-down answers that avoid nuance, or refusals/vagueness in response to ethically tricky questions.
On xAI's alignment stack, you're landing in the realm of weird hate speech echoes, extremist rationalizations, offensive humor misfires, and conspiracy theories.

So when the models go off on misadventures like this, I think it's relatively clear which approach yields the less harmful blast radius.
 
socK said:
It might SOUND like it sometimes, but they really just don't "get it."
This is the thing many people don't understand about LLMs: they don't actually "think" the way we'd consider it. They have no context outside of what is in their input tokens, they have no memory (again, outside of their input tokens), they don't "learn" the way we understand it, and they don't understand what they are doing. They take a bunch of tokens as input (what you've just typed, the conversation history up to whatever limit the model has, and anything else the company running it has added), run them through a bunch of weighted parameters, and do "next word" prediction, kind of like autocorrect.

It's useful, to be sure, but it has many weird limitations because it isn't logic and reasoning the way humans do it, nor is it normal imperative processing like traditional code. Hence it can behave very weirdly.
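The "next word" point can be made concrete with a deliberately tiny Python sketch: a bigram model that only knows which word most often followed which in its training text. This is a toy stand-in (real LLMs use neural networks over thousands of tokens of context, not word-pair counts), and the corpus is made up, but the generation loop has the same shape: predict, append, repeat, with no state outside the input.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count which word follows which; this table is the model's only 'knowledge'."""
    words = corpus.lower().split()
    table: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def generate(table: dict[str, Counter], prompt: str, max_new: int = 5) -> str:
    """Greedy autoregressive generation: pick the most frequent next word, append, repeat.

    The model conditions only on the text it is handed (here, just the last word);
    nothing carries over between calls.
    """
    tokens = prompt.lower().split()
    for _ in range(max_new):
        followers = table.get(tokens[-1])
        if not followers:
            break  # never saw this word followed by anything
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)

corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(generate(model, "the dog"))
```

Asked to continue "the dog", it produces "the dog sat on the cat sat": every step is a plausible word pair, but nothing checks whether the whole makes sense, and a fresh call to `generate` remembers nothing from the previous one.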
 
