Without a robust and general theory of how to keep going up that curve indefinitely, you will probably miss a step. It might be subtle; maybe one of your systems is now power-seeking in the background. A few steps out, you end up with dramatically more capable systems that do not deeply care about human values in any truly robust way. They just have a fuzzy sort of alignment.
