And there are newer methods of training language models, like “direct preference optimization”: you take a language model, show it a set of approved answers and a set of rejected answers, and it figures out the logic of what’s approved and what’s rejected. And with even newer methods like “SPIN,” you just show it the way the fact checkers do their painstaking work, and it learns from that train of thought.
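To make the “approved versus rejected” idea concrete, here is a minimal sketch of the direct preference optimization loss, assuming we already have the summed log-probabilities of the chosen and rejected answers under both the model being trained and a frozen reference copy. The function and variable names are illustrative, not from the speaker.

```python
# Minimal DPO loss sketch (names are hypothetical, not from the transcript).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct preference optimization loss for a batch of preference pairs.

    Each argument is a 1-D tensor of per-example log-probabilities
    (summed over the answer's tokens). `beta` controls how strongly the
    policy is pushed away from the reference model.
    """
    # How much more (or less) the policy favors each answer than the reference does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Logistic loss: make the margin (chosen - rejected) large and positive,
    # i.e. learn to prefer the approved answer over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for two preference pairs.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5]),
    policy_rejected_logps=torch.tensor([-11.0, -10.2]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8]),
    ref_rejected_logps=torch.tensor([-10.8, -10.0]),
)
print(loss.item())
```

The key design point is that no separate reward model is trained: the preference signal comes directly from comparing the policy’s log-probabilities against the reference model’s on each approved/rejected pair.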
