
Yeah, definitely. And I think that is the fundamental difference from the previous era of what's called reinforcement learning from human feedback (RLHF), in which ChatGPT was born. They ranked the various possible responses of ChatGPT to any given prompt by asking individuals whether this response was better than that response. And once they had worked with a lot of people to label good responses and bad responses, it produced something that would consistently get a good score from human labellers. But it also made ChatGPT extremely flattering, extremely sycophantic, because most people, when they are in this kind of individual-to-individual, dyadic relationship, optimise for short-term satisfaction. If something flatters me, of course I'm going to let it pass. But if something really checks my understanding, really pushes back against my hallucination, maybe I'll just give it a thumbs down, right? And so an AI that is trained from individual human feedback may not prioritise the relational health of that human embedded in a community. So now we're seeing more advances in multi-agent settings, where the agent is learning not from individual feedback, but from community feedback.
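
The dynamic described above can be sketched with a toy Bradley-Terry reward model, the standard formulation behind RLHF reward modelling: if labellers mostly prefer the flattering response in pairwise comparisons, the learned reward ends up favouring flattery over pushback. The features and data here are hypothetical, purely for illustration:

```python
import math

# Hypothetical toy data: each pair is (features_of_A, features_of_B, label).
# label = 1 means the labeller preferred response A over response B.
# Feature 0 = "flattery", feature 1 = "pushes back / checks understanding".
pairs = [
    ((1.0, 0.0), (0.0, 1.0), 1),  # labeller prefers the flattering answer
    ((1.0, 0.0), (0.0, 1.0), 1),  # ...and again
    ((1.0, 0.0), (0.0, 1.0), 0),  # occasionally the critical answer wins
]

def reward(w, x):
    """Linear reward: dot product of weights and response features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs, steps=2000, lr=0.1):
    """Fit a Bradley-Terry reward model on pairwise preference labels."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for a, b, label in pairs:
            # P(A preferred over B) under the Bradley-Terry model
            p = 1.0 / (1.0 + math.exp(-(reward(w, a) - reward(w, b))))
            g = label - p  # gradient of the log-likelihood wrt the reward gap
            for i in range(len(w)):
                w[i] += lr * g * (a[i] - b[i])
    return w

w = train(pairs)
# Because pro-flattery labels dominate, the learned reward weights flattery
# above pushback: w[0] > w[1].
```

Swapping "individual feedback" for "community feedback" would amount to changing how the labels are produced, for example aggregating votes across a community that values being challenged, rather than taking each labeller's in-the-moment thumbs up or down.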