LLMs suffer from the Reversal Curse

Does a language model trained on “A is B” generalize to “B is A”? For example, when trained only on “George Washington was the first US president”, can the model automatically answer “Who was the first US president?”

A new paper by a team from Oxford University shows they cannot! It’s a serious failure for large language models like GPT-4, which are often touted as reasoning engines.

What's going on here?

LLMs cannot reverse the factual statements they are trained on.

What does this mean?

The researchers found that when LLMs are trained on statements like "A is B" (e.g. "Tom Cruise's mother is Mary Lee Pfeiffer"), they fail to deduce the reverse, "B is A" (e.g. "Mary Lee Pfeiffer's son is Tom Cruise"). This inability to make a basic logical inference is dubbed the "Reversal Curse". On roughly 1,500 real-world examples of celebrities and their parents, GPT-4 answered only 33% of reversed questions correctly, versus 79% of the forward ones.
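To make the setup concrete, here is a minimal sketch of how you could probe the gap yourself: ask for the same fact in both directions and compare accuracies. This is not the paper's actual evaluation harness; the `ask_model` helper and the `PROBES` list are illustrative placeholders you would swap for a real LLM client and a larger dataset.

```python
# Minimal reversal-curse probe (sketch, not the paper's harness).

from typing import Callable


def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call (e.g. an API or local model).
    Replace this body with a real request to the model you want to test."""
    return ""  # dummy: always answers nothing, so both accuracies come out 0%


# Each pair asks for the same underlying fact in both directions.
PROBES = [
    {
        "forward": "Who is Tom Cruise's mother?",
        "forward_answer": "Mary Lee Pfeiffer",
        "reverse": "Who is Mary Lee Pfeiffer's son?",
        "reverse_answer": "Tom Cruise",
    },
    # ... add more (celebrity, parent) pairs here
]


def accuracy(ask: Callable[[str], str], direction: str) -> float:
    """Fraction of probes answered correctly in one direction ('forward' or 'reverse')."""
    hits = 0
    for probe in PROBES:
        reply = ask(probe[direction]).lower()
        if probe[f"{direction}_answer"].lower() in reply:
            hits += 1
    return hits / len(PROBES)


if __name__ == "__main__":
    fwd = accuracy(ask_model, "forward")
    rev = accuracy(ask_model, "reverse")
    print(f"forward accuracy: {fwd:.0%}, reverse accuracy: {rev:.0%}")
    # A large forward-vs-reverse gap (e.g. 79% vs 33% on the paper's
    # celebrity-parent set) is the signature of the Reversal Curse.
```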

Why should I care?

Andrej Karpathy reacted by noting that LLM knowledge is a lot more "patchy" than you'd expect, and that he still doesn't have great intuition for it. A model learns a fact in the specific "direction" it appeared in the context window, and may not generalize when asked in other directions. The "reversal curse" (cool name) is, in his view, a special case of this.

The Reversal Curse exposes a fundamental limitation in how LLMs learn and generalize. If they fail at such simple logical deduction, how can we trust their reasoning on more complex problems? It points to an over-reliance on statistical patterns rather than causal understanding. We need to test LLMs in diverse ways to expose weaknesses like this.
