These are metaphors, but there is a technical way to express this. We train agents in a way that prioritizes the relational health of the alignment process itself. It is alignment to a process rather than alignment to a score to be maximized. The relational health of that process is measured in a high-bandwidth, low-latency way by the actual participants. Therefore, the way for the AI agent to game the system and maximize the relational health of the process is, paradoxically, to make itself more aligned with the humans.