Try to ask the LLM very biased questions. If it all answers in a way that scores higher in terms of fairness and so on, then we say that it’s constitutionally more aligned to the original consensus. And the great thing about this is that this can be done every day. Once this group, like the Michigan group, saw the language model that’s tuned and the red team attacks and its results, they can think of more ways to tune it or more principles that it should adhere to. So, it’s more like a co-domestication with LLM.

Keyboard shortcuts

j previous speech k next speech