Even the strongest critics — e.g., Eliezer Yudkowsky and colleagues at MIRI — call for more work in mechanistic interpretability , corrigibility and explainability . The issue is the gap : an order of magnitude more resources go to capabilities than to safety.