• Welcome to Doom Debates. Audrey Tang is a Taiwanese politician and free software programmer who currently serves as Taiwan’s ambassador at large for cyberspace governance. She is a self-taught programming genius who dropped out of high school and became a top contributor to open-source communities worldwide.

  • In 2016, at age thirty-five, she became the digital minister of Taiwan, making her the youngest minister without portfolio in Taiwanese history. During the COVID-19 pandemic, she spearheaded the rapid development of digital tools for mask rationing and contact tracing that helped Taiwan achieve one of the world’s most successful initial responses to COVID-19 without strict lockdowns.

  • In 2024, she co-authored a book with Glen Weyl called Plurality: The Future of Collaborative Technology and Democracy, which uses Taiwan’s success stories as a blueprint to propose a new global political philosophy called plurality.

  • I invited her to come on Doom Debates because she’s an original thinker who has a unique and widely acclaimed perspective on how to navigate the future of AI. I’m excited to talk with Audrey about the risk of imminent human extinction from superintelligent AI and what the world looks like if we don’t all die and we’re able to achieve her vision of plurality. Audrey Tang, welcome to Doom Debates.

  • Good local time, everyone. Thank you for hosting me.

  • Audrey, you’ve been serving as Taiwan’s ambassador at large for cyberspace governance, or the cyber ambassador for short. What’s your focus in that role?

  • So cyber comes from a Greek word, κυβερνᾶν, which means to steer. So I help governments and societies steer through the transformative changes caused by AI. Think about superhuman persuasion, organized crime, fraud, deepfakes, cyberattacks, and so on.

  • You have this position as part of Taiwan’s government, and when I think about that, one thing that comes to mind is TSMC is the world’s number one semiconductor manufacturing company, one point six trillion dollar valuation, last I checked, and I also think about Jensen Huang, founder of NVIDIA, even though that’s a US company. What else should I be thinking about when I’m thinking about steering the future of AI as part of the Taiwanese government?

  • So Taiwan is the youngest tectonic island on Earth, four million years old, and we literally rise up half a centimeter every year because the two tectonic plates are bumping into one another, so there are about three earthquakes somewhere in Taiwan every day.

  • And this is not just a metaphor, but rather an actual geological situation. So we get free penetration testing: two million cyberattack attempts daily for the past twelve years. Taiwan has been the world’s top target for polarization and interference attacks.

  • And so we literally exist in a place where we have to treat the incoming free penetration testing as a resource. So we see those conflicts, that polarization, not as a volcanic eruption to flee from, but rather as geothermal energy to tap into. So think of democracy as a geothermal engine; that’s the image I want you to remember.

  • Just to be a hundred percent clear for the viewers, though, you are using the earthquake thing, even though it’s a real thing. It is still a metaphor in terms of the cybersecurity challenges, correct?

  • Well, I mean, we do have to plan for recovery, resilience, to keep the internet running even after literal earthquakes. So even if the electricity grid, even if the telecom and so on are broken, we do have roaming and fallback to the satellite connections and so on. And of course, I’m also making a metaphorical point about “human-made earthquakes.” Let’s call it that.

  • Got it. And now I’m just curious, do you ever have such major natural disasters that core systems go down for weeks at a time, or is this kind of preparing for the big potential natural disaster in the future?

  • Oh, we literally did. So there were earthquakes that cut internet connection to whole regions, and it was actually quite recent. There was one in Tainan, there was one in Hualien, and at the time, I was Minister of Digital Affairs. My deputy minister literally flew by helicopter to bring the satellite receivers to the impacted areas.

  • This actually is an area of interest for me, this idea of recovering from multi-week disasters or things that almost don’t happen in the United States, where I’m from. I personally can’t remember the last time I’ve had a power outage longer than eight hours. It must have been more than ten years, so I’ve lived a charmed life here in the United States.

  • And I think it’s easy to dismiss the possibility that these much bigger disasters are going to happen on a very large scale. So on one hand, it’s kind of good that these earthquakes are keeping you on your toes, right? At least you know that a week-long outage is realistic and happens pretty often.

  • And we also… J.D. Vance, right before the 2024 election, was talking on Joe Rogan about the possibility of recovering from some kind of EMP attack.

  • Exactly. It’s top of mind for me as well.

  • Great. Well, I’m glad that somebody is actually prepared for it, because that would certainly be a pathetic way for the human species to go out, is to be like, “Ooh, solar flare every hundred years, we weren’t prepared for it.” So, but you’re telling me that’s gonna be fine, right?

  • Right. As long as you have visibility into the impact that this kind of incident is going to cause, then you can plan around it, which is why you get a lower P(Doom).

  • But for AI, which is, I guess, the topic of your show, I think the denominator is currently approaching zero, so we literally don’t know, which is why my P(Doom) is Not a Number.

  • Okay, well, fair enough. We’ll certainly dig into that more.

  • Let’s back up a little bit. So you have this amazing position, the cyber ambassador, and previously you were the first digital minister in Taiwan. Let’s talk about the kind of digital governance that you’ve pioneered there. So I think one of the earlier examples was your policy around Uber or COVID or AI. Maybe just pick one of those and give us a little bit of detail on what that was like, because I think they were all pretty innovative.

  • Sure. I’ll use the most recent example. You mentioned Jensen Huang, the Taiwanese-American CEO of NVIDIA. In 2024, if you scrolled Facebook or YouTube in Taiwan, you would constantly see Jensen Huang’s face trying to convince you to buy cryptocurrency or invest in some stock.

  • If you click on that image, it actually talks to you. It sounds just like Jensen, but of course it’s not Jensen. It is a deepfake organized crime scam, running on NVIDIA GPUs at the time. And so because of this, many people lost millions of dollars.

  • But Taiwan ranks at the top for internet freedom in Asia. So if you poll people individually, they’re all like, “Oh, the government should stay away from censorship.” And so what should we do?

  • So what we did as the Ministry of Digital Affairs was to send two hundred thousand SMS messages to random people across Taiwan, a text message saying, “Okay, we now see this problem. What should we do together?”

  • And then thousands of people volunteered to go online into a kind of conversation we call an Alignment Assembly. In video rooms of ten, each person talks to nine other people, and there’s only one rule: your idea needs to convince the other nine people before it can propagate out of the room.

  • So very interesting, because if you ask people individually, they’re all like YIMBY or NIMBY. They take extreme positions. But in rooms of ten, with this rule to find a surprising common ground, everybody becomes like MIMBY. “Okay, maybe in my backyard, if you do this, if you do that.”

  • And then we used language models to weave together the best ideas from all those forty-five rooms of ten people each. So for example, one room says, “Do you know the cigarette labels? Let’s label all the advertisements on social media as ‘probably scam’ unless the advertiser provides a digital signature, and if they don’t, we take the ad down,” right? So flip the default on KYC.

  • What do we do if they ignore our liability and KYC rules? Well, they say, “Let’s slow down connections to their videos by 1% for every day they ignore us.” Another good idea.

  • And so we showed the legislators that these are the people’s ideas, not my idea, and 85% or more agreed on this core bundle of ideas, and the other 15% could live with it. That was last March, and by May, we had drafted a law. By July, everything passed, and so throughout this year, 2025, there are just no deepfake ads anymore on Facebook or YouTube in Taiwan.

  • Interesting. So if a foreign scammer is trying to text people pretending they’re Jensen Huang, because you have these new regulations in place, they’re basically getting stopped? What’s stopping them exactly?

  • Right. So if you try to go to Facebook to place an advertisement targeting Taiwan uniquely, you will see a pop-up that tells you to digitally prove who you are, and most scammers just stop right there.

  • Right. Yeah, because how are they gonna fake the identity? It’s pretty good verification. Maybe it has to get notarized, or it has to be a pretty secure ID, like a cryptographic chip ID. Taiwan has pretty advanced digital IDs, huh?

  • Exactly, and decentralized as well. We have this decentralized ID wallet. It’s an app that you install, but it does not ping home, and you can use it to prove, for example, that you’re over eighteen years old without disclosing your age, or prove the last three digits of your telephone number without revealing your entire number. There’s all sorts of selective disclosure.

  • I mean, that makes a lot of sense to me, the ID verification step. The step where you got all these different people together, had them think as a group, come up with proposals, and merge those proposals, that innovative kind of algorithm-mediated democracy, I think is very interesting. But do you think it was maybe overkill? Are the ideas maybe kind of obvious once you hear them?

  • Yeah, the point is that we want to produce the ideas that sound obvious once you hear them. This is called the bridging idea, or the uncommon ground.

  • Do you know Community Notes on X, or now YouTube and Facebook as well? So the idea is that the left wing proposes some clarification to a viral post, and the right wing proposes some other clarification. They usually vote each other down, but some notes survive this and get upvotes from both sides.

  • Now, the researchers at X have come up with what’s called Super Notes, which is a way to draw a summary over all the highly rated notes, even training an AI system to draft those notes, so that you can find a way to state an idea that convinces both sides. For one side, it is about climate justice; for the other side, it is about biblical creation care. And this is how you build common knowledge for a divided population.
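To make the bridging mechanic concrete, here is a deliberately simplified sketch. It is an editorial illustration, not the production Community Notes algorithm (which uses matrix factorization over rater viewpoints); it only captures the core intuition that a note scores highly when raters from both clusters find it helpful. The notes, clusters, and ratings below are hypothetical.

```python
from collections import defaultdict

# Hypothetical ratings: (note_id, rater_cluster, rated_helpful).
# "rater_cluster" stands in for the left/right viewpoint groups that
# the real system infers from each rater's history.
ratings = [
    ("note_a", "left", True), ("note_a", "left", True), ("note_a", "right", False),
    ("note_b", "left", True), ("note_b", "right", True), ("note_b", "right", True),
]

def bridging_scores(ratings):
    """Score each note by its WORST helpfulness rate across clusters,
    so only notes rated helpful by both sides score highly."""
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # note -> cluster -> [helpful, total]
    for note, cluster, helpful in ratings:
        tally[note][cluster][0] += int(helpful)
        tally[note][cluster][1] += 1
    return {
        note: min(helpful / total for helpful, total in clusters.values())
        for note, clusters in tally.items()
    }

print(bridging_scores(ratings))
# note_a pleases only one side -> score 0.0; note_b bridges -> score 1.0
```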

  • Yeah, it’s interesting. I mean, I’m really glad you’re doing it. I’m all for this kind of experimentation, and I share your passion for doing that stuff. I mean, I share a fraction of your passion, and I also understand why you’re such good friends with Vitalik Buterin, friend of the show, and Glen Weyl, because I always see these guys thinking and writing and talking about these kind of concepts.

  • I think it’s amazing that you’re actually there in government, actually implementing those proposals. I don’t really remember anybody else who’s in such a good position to constantly be implementing this stuff at scale, so that’s amazing to me that you’re their connection to government somewhere.

  • I will say, though, I feel like I have to push back because as cool as this stuff sounds, I’m just skeptical that it can help more than, let’s say, twenty percent, right? So I feel like all of the difficulties of government and reaching consensus and political polarization, if I had to guess, knowing very little, I’d be like, “Okay, yeah, maybe you’re improving things twenty percent,” but don’t you still drag in all the same problems, and it’s still hard to make anything work, even with these systems?

  • Yeah, definitely, which is why this cannot stay at a national level. The point of going with open-source systems, such as Polis or Dembrane and so on, is that people can then learn that you can apply this recursively to smaller polities.

  • It can be applied, and has been applied, to the civics classes of our schools. We changed our curriculum in 2019 after AlphaGo, knowing that anything that is routinely automatable will be automated, and so the students need to learn not just literacy, but rather competency, the ability to train their civic muscle together through curiosity and collaboration.

  • So when the young kids, for example, use these kind of tools to come up with measurements of air quality, water quality, noise level, they’re not waiting for the minister to run an alignment assembly. They’re learning to apply this recursively to their own families. So it’s much more efficient and effective if the young kids tell their grandparents to try this process than if a minister calls on their grandparents to try this process.

  • Gotcha. Yeah, well, okay, respect to you for giving this a try. This makes me respect Taiwan more. I don’t think you’re allowed to say anything else, but what’s your opinion of Taiwan governance overall?

  • Yeah, well, I mean, we’re among the least socially polarized of the OECD-equivalent countries. That is to say, ethnically, religiously, gender-wise, urban versus rural, across age brackets, and so on. And we’re also the least lonely, as measured by the time people spend having dinner with each other.

  • And so I think we’re pretty resilient against these kinds of polarization attacks, not because we defend against them by blocking them out, but rather kind of because of them, in that we develop the antibodies together.

  • All right, so let’s move toward the meat of the conversation. We’re obviously gonna be talking about artificial intelligence. In your current role, what kind of AI development are you actively trying to promote?

  • Yeah, I’m promoting this idea of civic AI, which means people, for example, the schoolchildren that I just mentioned, are able to tune, to steer, to locally interpret AI systems that concern them.

  • My focus is to basically do what Taiwan did when I was born in the eighties, last century, which is called the personal computing revolution. It used to be that there’s only mainframes, and people do not have computers. They have terminals, which are just screens and keyboards connected to the same supercomputer somewhere, and so it leads to power concentration, to large states, large companies, IBM, and so on.

  • But Taiwan helped usher in the PC revolution, and the hobbyists eventually created the free and open-source movements, and the rest is history, right? So the point is that power concentration itself is a governance risk, and I work with researchers and developers to decentralize power, both in the literal sense and also in the electrical sense.

  • Yeah, it sounds like a great idea. I’ll give you that. So I don’t directly have any issues with anything you said, but let me bring in my perspective, which is the Yudkowskian view, named after Eliezer Yudkowsky, which is just extreme concern at the imminent consequences of superintelligence, right? Just to put my cards out on the table, are you familiar with Eliezer Yudkowsky’s writings?

  • Of course. I’ve been posting on LessWrong for more than a decade. I take my philosophy from LessWrong, called the Coherent Blended Volition, which is like the Coherent Extrapolated Volition, except the extrapolation is done by AI, while the blending is with the communities.

  • All right, well, are you ready for me to ask you the number one most important question here on Doom Debates?

  • Audrey Tang, what’s your P(Doom)?

  • My P(Doom) is NaN, or Not A Number, and here’s why. If you think of risk as capability divided by visibility, the story looks simple, right? The numerator is climbing fast. Everyone sees it climb.

  • But visibility? For the dominant architecture, self-attention Transformers, we can measure some of the internals, but we cannot reliably interpret them. Because of this, when we ask, “How dangerous is this?” we’re trying to compute a value where the inspection term is missing, like dividing by zero or something.

  • But of course, if you’re a Bayesian, like Eliezer, you would say probability is never undefined. It’s never not a number. Just give me a prior, and I’ll give you a posterior.

  • But if the likelihood function is broken, it literally looks like a flat line, right? It covers the entire range, from zero percent to one hundred percent. We’re flying blind. We don’t know. So the mean, I guess, is 50%, but it’s cosmetic. The interval is the message. The calculation simply does not converge unless you have real interpretability.
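A toy sketch of the ratio framing above (an editorial illustration, assuming nothing beyond “risk = capability / visibility”): when the visibility term is effectively zero, the quotient is undefined rather than merely large, which is the literal sense in which a P(Doom) can come out as NaN.

```python
import math

def risk_estimate(capability: float, visibility: float) -> float:
    """Toy version of 'risk = capability / visibility'.

    With no reliable interpretability, visibility -> 0 and the
    estimate is undefined (NaN) rather than merely large.
    """
    if visibility == 0.0:
        return float("nan")
    return capability / visibility

print(risk_estimate(capability=9.0, visibility=3.0))  # 3.0: capable but inspectable
print(risk_estimate(capability=9.0, visibility=0.0))  # nan: the inspection term is missing
print(math.isnan(risk_estimate(9.0, 0.0)))            # True

# With a flat likelihood over p in [0, 1], the mean is 0.5, but the
# credible interval still spans nearly the whole range; the interval,
# not the mean, carries the information.
```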

  • Okay, well, let me see if I can try to convince you a little bit that there should be some kind of thing you can say about P(Doom) more than NaN.

  • Let’s take somebody who, to me, I would consider being really ridiculous and obviously wrong, which is imagine somebody comes in and is like, “I think that P(Doom) is less than one in a million. We shouldn’t even be giving it one in a million, never mind ten percent, twenty percent. It’s less than one in a million.” Don’t you think that person is clearly wrong?

  • Well, I mean, I don’t know, right? So this is like flipping a coin into outer space. So what is the probability that it lands on heads, and somebody comes in, and I think it almost always lands on heads. It’s never tails. Okay, but it’s flipping into outer space. By the time it lands, I’m probably not here anymore. So what position do I have to say that they’re probably wrong?

  • Well, that’s an even easier question, where I could say somebody who says one in a million chance it’s heads seems to be smoking… you know, why would you say one in a million? That’s obviously wrong.

  • Well, it’s spinning. It’s just spinning. Yeah.

  • I know it’s spinning, but my point is that I agree there’s so much we don’t know, and we should have really wide intervals. It’s just that once you start thinking in orders of magnitude, geometrically slicing the ranges, you can actually treat 10% to 90% probability as a single slice.

  • Famously, Jan Leike, back when he was at OpenAI heading up safety there, was the first person I heard say, “My P(Doom) is 10% to 90%.” And I was like, that is actually a much smarter statement than it sounds like.

  • People made fun of him for being meaningless. I’m like, “No, that’s not meaningless,” because geometrically speaking, 10% to 90%, that’s one slice. 90% to 99%, that’s another slice. 99% to 99.9%, that’s another slice.

  • And when you think about it as an odds ratio, it’s like one to one versus ten to one versus a hundred to one. I can’t really tell you if it’s twenty to one or twenty-five to one, but I sure as hell know that the P(Doom) is better than a million to one against. We’re at least at one in a million, dude. I mean, once you get to one in a hundred billion, you’re just talking about yearly asteroid-impact odds, right? There’s clearly a ballpark. I feel like 10% to 90% is the ballpark. So what is your ballpark?
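A quick worked illustration of the odds framing being gestured at here (just standard probability-to-odds conversions, not figures from the conversation): on a log-odds scale, 10% to 90% is a band of roughly two units around even odds, while one in a million sits about six units below it.

```python
import math

def log10_odds(p: float) -> float:
    """Log base-10 odds: 0 at 50%, about -6 at one in a million."""
    return math.log10(p / (1 - p))

for p in (1e-6, 0.01, 0.10, 0.50, 0.90, 0.99, 0.999):
    print(f"p = {p:>8}: odds = {p / (1 - p):10.4g}, log10-odds = {log10_odds(p):+.2f}")

# 10%..90% spans about -0.95..+0.95 on this scale (one broad slice),
# while 'under one in a million' sits around -6: a claim of a very
# different order, which is the point being made above.
```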

  • It’s 0% to 100%. And also, I think we need architectures with bounded, inspectable state, not the current wooden Trojan horses where we literally don’t know what’s inside, whether it’s going to do a sharp left turn or maybe a sharp right turn. You don’t know.

  • But we need a glass horse system that you can actually see into as a condition of deployment, and then I’ll give you a real number.

  • Do you have the same perspective for something like P(Doom) from nuclear war? What’s your P(nuclear doom)?

  • That’s a great question. I think for nuclear, it’s much more interpretable because somebody has to launch that nuclear missile. And so my P(Doom) for nuclear has a much more standard shape. So yeah, my nuclear variance is quite a bit narrower.

  • And roughly, what is it, to put a number range on it?

  • Yeah, just very roughly, nuclear war by 2100 that decimates the population, cuts the population in half by 2100, thanks to nukes.

  • Yeah, so I think I’ll put it roughly around 50%, and…

  • But it’s a very different 50% than the shape of the AI doom, which again is like this, but the nuclear is maybe like this. And there’s some path to a nuclear doom that is partly triggered by the race to, I don’t know, AI supremacy, dominance, and so on. We can, of course, talk about that, but it’s not trivial.

  • Okay, I don’t think we need to dwell on this. I think we can move on, but from my perspective, it is kind of crazy that you’re willing to say that there’s a 50% chance that nuclear weapons will slice the entire human population in half…

  • With higher variance…

  • Right. Yeah, exactly. High variance, but still fifty, ballpark 50%, right? I’ll even… we can even call it 10% to 90%, right?

  • But you’re not willing to say, “Okay, obviously, AI doom is more than one percent.” You’re not willing to say that?

  • Because we don’t know. We literally don’t know.

  • Okay. We’ll leave it at that.

  • Don’t worry, there’s more. I can ask you more questions. So there is something very interesting, though. You were the inaugural signatory of the Center for AI Safety statement on AI risk. I bring this up on my show all the time. It’s a famous statement.

  • The statement says, “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war.” How did they approach you, and why were you so eager to sign?

  • First of all, as you know, as somebody in charge of the digital response to the pandemic, I take pandemics very seriously. And Taiwan reported only seven deaths from COVID in the first year of the pandemic, in 2020, partly because we had a very serious SARS outbreak years earlier.

  • And because of that SARS experience, where Taiwan lost more people proportionally than any other country, we basically pre-bunked a bunch of the pandemic-related issues. For example, masks versus no masks, vaccines versus no vaccines, contact tracing, and so on: we had real discussions in the interval between that epidemic and the global pandemic.

  • And so working to get that variance narrower on pandemics has been Taiwan’s main work, and it served us very well in the COVID case. And nuclear war, of course, we already talked about. There are many plausible paths, but we need to take it seriously, right?

  • And so when the statement drafters approached me with this thing, I’m like, okay, maybe we should also talk about other risks like climate or whatever. It’s not exhaustive.

  • But I take pandemics and nuclear war very seriously, and I want the entire world to know that we currently know almost nothing about the actual shape of AI risks, which is why I signed, in a way, to basically say that this is a global priority alongside the other two, which we know much more about.

  • So even today, heading into 2026, you still think that the risk of extinction from AI should be a global priority alongside pandemics, nuclear war, and other societal-scale risks? You stand by that statement, correct?

  • And “mitigating” starts with “measuring.”

  • Okay. Well, that’s great. I think that’s productive that you signed that statement. I think a surprising amount of people signed it, right? Sam Altman, Dario Amodei, Demis Hassabis, all the top AI company leaders except Yann LeCun and Mark Zuckerberg signed it, so I was happy to see that.

  • Yeah, do you have any thoughts on the value of that statement or who else you would have liked to see sign it?

  • Well, I think after many people signed the statement, it has become a global priority. I think our current challenge is not that people do not want to mitigate the risk, but rather that the cost of mitigating the risk, and the dividends this kind of work pays, are not top of mind for many policymakers.

  • This is like before, say, September eleventh. If people advocate for reinforcing airport security, reinforcing cockpit doors, and many other measures, altogether it may cost millions or even billions. But then the only success you can show for it every year is, quote unquote, “nothing happened.” And this is not a very long-lasting policy position, speaking as a politician. So with time, people will gradually invest less and less into this.

  • Whereas I think what we’re trying to do in Taiwan is to say, “Okay, by investing in coordination, in measurement, in crowdsourced safety, we can also pay dividends,” like, as I mentioned, less polarization. People hate each other less. We can coordinate around other risks like nuclear or pandemic, and it also results in better economic policy, because there’s less flip-flopping between the two or three ideologies in the country.

  • So it pays real political dividends along the way and mitigates extinction risk as a side effect, and I think that’s much more sustainable.

  • Yeah, okay. I see where you’re coming from. Yeah, I mean, I agree, there are positive side effects. One positive side effect is also that even if, hypothetically, there’s a low doom risk, it’s still nice to have as much control as possible over the AI you’re building anyway, right?

  • Exactly. It’s much more local, much more contained.

  • Yeah, okay, okay. And we’re definitely about to talk about that more. I wanted to ask you, you’ve referred to AI doomerism as a hyperstition, right? Explain that.

  • Right. A hyperstition is a self-fulfilling prophecy. If people believe in something so strongly that people start collectively behaving as if that is real, then you can bring about that end, right?

  • So if everybody feels they’re doomed and they stop putting energy into actually improving the situation, then we’re actually doomed. So a collective belief in a high P(Doom) actually brings about that high P(Doom).

  • Okay. So to me, this seems a little too conveniently clever, because couldn’t you also argue that if a bunch of people focus on AI doom, that’ll just make people fight to stop it and then lower the probability, right? It’s hard to say if it’s positive feedback or negative feedback. I would claim it’s totally negative feedback.

  • Well, of course. So there is probably this plateau of productivity. It’s like the ideal amount of caffeine in your blood or something. You probably need an IV drip to keep that stable, and to see that it’s steerable, so that you can meaningfully steer into a place where people can calculate P(Doom) with lower variance, with a narrower, more confident prediction.

  • So superforecasting, all of that actually helps. But you do not want people, even before we can reliably measure, even when the variance is very, very high, to believe in a mean point, because when people converge around that mean point, like a Schelling point, it’s usually not a good point.

  • Okay, so if I understand correctly, there’s some optimal level, right, of the single parameter of how doomy people are. There’s some optimal level, and if you get beyond that level, it’s counterproductive. But do you think that we’re below or above that level right now?

  • Well, it depends, right? As I mentioned, if you talk about super persuasion, information context collapse, over-reliance, synthetic intimacy, I would say that Taiwanese people are probably optimally aware, because we’re at the forefront, at the frontline of these kinds of earthquakes all the time.

  • But in other places, where people simply say, “Oh, it’s just a machine, it cannot do superhuman persuasion, it’s a stochastic parrot,” and so on, then I would say it’s probably below the optimal level.

  • Okay. I mean, I respect your perspective. My perspective is that everybody is still way too low on the doomerism spectrum.

  • It’s like, from my perspective, it’s like the whirling razor blades are just coming. They’re so close, and everybody’s not even opening their mouths and screaming yet.

  • And so I actually see… I see us so far under the optimal level of panic, to be honest. Some panic is good, or some, you know, whatever you wanna call it, some attention, some fever, right? Raising the temperature.

  • And so I consider this show… I actively consider myself a fearmonger because, you know, the same way they say, “Hey, you’re not paranoid if they’re really out to get you,” right? So in this case, I’m saying, you’re not panicking if you’re really about to all get… right, that’s my perspective, that we’re way below the parameter, but…

  • That’s great. That’s great, and as a hopemonger, I endorse that.

  • So you need to have the optimal amount of fear in order for hope to emerge.

  • Yes. All right. Fair enough. So let’s go to plurality, because this seems to be a big unifying concept in all of your forward-looking proposals on how to think about AI, and you’ve already made a bunch of the points that I think are part of this, but maybe step back and just give me the full pitch for plurality.

  • Sure. Plurality is a set of technological design guidelines, saying that instead of designing technology to make the human in the loop of AI behave like a hamster on a wheel, where the hamster may be feeling great, addicted, exercising, and so on, but has zero control over where the wheel is going. Actually, the wheel isn’t going anywhere.

  • So instead of making technology that disempowers communities, we should make technology that connects communities despite or because of their differences, and see the conflict that arises from these differences as energy.

  • And so it’s about taking the human out of the AI loop and putting the AI into the loop of humanity. Think of the current setup as that kind of hamster wheel: we’re basically shadowboxing with other people and not forming connections.

  • And plurality is saying, “Okay, this singularity view, where people collectively lose control of the hamster wheel, is the bad ending, and we want the good ending, where the AI systems are like campfires, bonfires that people gather around, so that they light our faces, so we can see each other better, and so on.”

  • But it also makes it much easier for communities to form bonds and also form bridges across communities, so not a wildfire that disconnects people, but a bonfire that connects people.

  • Okay, I mean, it sounds good. It’s hard to argue with, right? Because you’re evoking things that a lot of us like. We all want everybody to win, and we all want more connections among one another. So maybe the way to engage with this position best is to contrast it. Is there a good example of a fierce opponent who disagrees with plurality principles, and then we can compare and contrast?

  • Certainly. So one opponent is this maximization operating system, right, the max OS: optimize for some number, like engagement, at all costs, by whatever means.

  • And that’s precisely how the current unaligned recommendation engines on social media behave, right? I did not subscribe to this content, but it pushes it to me anyway.

  • Now, I’m mentally shielded from these because my phones and my computers are grayscale, so I do not get dopamine hits that much, so I’m not addicted, but many people are.

  • And so getting people hooked and addicted to short-form videos that they did not subscribe to is the opponent, right? It is literally drawing people away from meaningful relationships into relational junk food. It has zero relational nutrition, yet people prefer it, even though they prefer not to prefer it. That would be the foil to the plurality position.

  • Yeah, okay, so that’s a very concrete example we can work with. So if I understand correctly, you’re basically saying that for somebody who’s scrolling Facebook or Instagram or TikTok today, the maker of those products is going against plurality principles because they’re so focused on this metric of engagement. So where would you have them start today? What specifically would you make the algorithm do in accordance with your principles?

  • Well, I’m working with a team to build what’s called Green Earth infrastructure. We first partnered with Bluesky, which is why it’s called Green Earth.

  • But what this does is it turns your overt preferences, for example, you can literally talk to Green Earth and say, “I want to see more meaningful conversations among AI researchers in all the different camps,” and it can translate that into a language model embedding, and then use that to re-rank all your subscribed and recommended feeds.

  • And then it sorts the most bridging content, that is to say, the content that brings together the various viewpoints, the most balanced, the best arguments rather than straw men, from both sides, to the top of your feed.

  • So that becomes a pro-social feed, because the more you engage with it, the more you can convey your position to the other side and vice versa. So it’s bridging, it’s connecting, and that’s pro-social media. We have a paper just called Pro-Social Media, about exactly how to make it happen.
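A minimal sketch of the re-ranking idea as described here, not the actual Green Earth code: the stated preference is embedded, each post is scored by similarity to that embedding plus a bonus for a precomputed “bridging” score, and the feed is sorted by the result. The embed() placeholder, the bridging field, and the example posts are all hypothetical.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real sentence-embedding model (hypothetical)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def rerank(posts, stated_preference: str, bridging_weight: float = 0.5):
    """Re-rank a feed by similarity to the user's stated preference,
    plus a bonus for posts already scored as 'bridging' (rated well
    by otherwise-disagreeing audiences)."""
    pref = embed(stated_preference)
    def score(post):
        similarity = float(embed(post["text"]) @ pref)  # cosine similarity of unit vectors
        return similarity + bridging_weight * post["bridging"]
    return sorted(posts, key=score, reverse=True)

feed = [
    {"text": "Hot take dunking on the other camp", "bridging": 0.1},
    {"text": "Researchers from rival AI camps find shared ground on evals", "bridging": 0.9},
]
top = rerank(feed, "more meaningful conversations among AI researchers in all the different camps")[0]
print(top["text"])  # whichever post scores highest under this toy scoring
```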

  • Great. Okay, so yeah, personally, I can’t say I disagree. I mean, I think people should have freedom and flexibility. I mean, there might be some people who like it when they open TikTok, and they get really engaged for an hour, and they don’t have a problem shutting it off, but you’re not opposed to people being able to customize their feed, right?

  • Right. So the idea is that we should get to set our own terms of service, right? Currently, the thing is that you can keep telling TikTok or whatever other social media, “I don’t like this. I don’t like this,” but then there’s, again, no interpretability. It may show you something else, but it never explains why it’s showing you something else.

  • Yeah. Yeah, yeah, totally, totally, and I’m with you on this one because I am a heavy user of X, formerly known as Twitter, and a lot of the times I find myself getting outraged by positions that I strongly disagree with, and I can’t believe so many people are so wrong on this issue.

  • And if I could personalize my feed, I’d be like, okay, predict how likely I am to be outraged and just give me two or three outrages a day. So I don’t need to be repeatedly outraged.

  • Microdosage. Yeah. Well, I mean, Elon did say that you will be able to do that through Grok pretty soon, and we’ll see how soon that is.

  • Exactly, and it’s really aligned with you. And so the thing for me is that’s all great, it just doesn’t really get to the core issue for me, which is doom or human extinction from superintelligence.

  • So in a bit, I wanna go a little bit more forward-looking and talk about the intelligence spectrum. But in terms of the optimal sequence of hitting on all the interesting points, let’s go back to some of what you’ve written.

  • Let’s also talk about a recent article, The 6-Pack of Care, right? That’s one of your key metaphors.

  • And by six-pack, do you mean like a six-pack of beer or six-pack abs?

  • It’s portable and it’s muscular, so it’s both. It’s a portable civic muscle.

  • Oh, there you go. All right, so it’s got two different senses. Okay, do you wanna go through the parts, or do you want me to read off what I have?

  • Well, go ahead. I believe there’s a very nice comic illustration done by Nicky Case, who also illustrated AI Safety for Fleshy Humans, which is a great thing at aisafety.dance. So you can just read the comic, I guess.

  • Here, let me put it up on the screen. Here I am at 6pack.care. It’s a website that anybody can visit. It says, “6-Pack of Care, Institute for Ethics and AI, a research project by Audrey Tang and Caroline Green.”

  • And sure enough, there’s this image of a six-pack, and this is what it says: “Actually, listen to people,” that’s one of the things in the pack, and then, “Actually, keep promises. We check the process. We check the results. As win-win as possible, and as local as possible.” All right, so all of these together are part of plurality, basically, right? What would you add to that?

  • Exactly. So these are the plurality principles as applied to multi-agent AI governance.

  • There’s a recent paper, the Distributional AGI Safety paper by DeepMind, that says the emergence of AGI, as we are currently observing it, is not from a single model hosted in some data center. It is from the complex interaction of many, many agents, each of them tool-using, and by using those tools, they also trigger behavior from other agents.

  • So it’s more like an ecosystem than a single model, and that’s the shape of AGI as it exists in the world today. So in a sense, we, the people, are already the superintelligence, and AI, by strengthening those connections, is making us even more superintelligent.

  • 6-Pack of Care is basically saying, what are the ethics for the machines in this kind of ecosystem when we analyze it using a lens of machine ethics? And of course, this assumes that machines can understand and do ethics.

  • Everything you’re saying about the 6-Pack of Care, the approach, has a lot in common, I’m sure you know, with Vitalik Buterin’s d/acc, right?

  • Yes. So accelerating decentralization and accelerating defense, that’s all well and good, but I think the 6-Pack focuses on accelerating democracy, which is also one of the accepted Ds in d/acc.

  • Right. Yeah, d/acc, I know the biggest ones are democracy, decentralization. I think those are the two biggest Ds. Maybe there’s another D worth mentioning.

  • And defense. Defense.

  • Right, defensive. Yes, that’s right. Yeah, and I think you agree with all those Ds, and you’re adding a few more things that are maybe getting a little bit more in the weeds of the processes. Like, we check the promises, we check the results. That’s kind of new to d/acc.

  • So I personally have a similar reaction to both proposals, which is great, I don’t object in principle, right? It’s nice if you can get it, right? So if we can head into the future, and we can always, like you say, actually listen to people, actually keep promises, yes, I would love that, right?

  • It’s the same thing that I told Vitalik when he came on the show, and we were talking about d/acc. In the context of d/acc, I was saying, “Look, if somebody’s sitting on an island like Taiwan, and nuclear war is happening, or a struggle for control over the Earth among superintelligences is happening, you’re not gonna be able to tell your island… there’s this ideal where the island can defend itself, but you’re not gonna shield yourself by wearing a hard hat on this island. You’re just toast.”

  • So my question is, don’t you think that there’s this huge question mark as to whether this ideal is feasible at all?

  • I think the idea of scalable governance, or governance that gets better as the capability gets better, is not to ask human individuals to wear hard hats, as you say, but rather to train local kami, or steward spirits, after the Japanese idea, to help us defend ourselves.

  • One good example is how, in Taiwan, we scan for fraud and scam advertisements on social media. We, of course, have a crowdsourced fraud-buster reporting website, so people can paste the scams they see online into that place.

  • But mostly, people only need to report novel forms of scam. Anything that is more routine, that has been seen before, is actually detected by an AI system, which will automatically tag it, label it, and send an email to the celebrity being impersonated, saying, “Is this your latest authorized deepfake, or is that actually not you?” And then after that, it is just automatically taken down, in a machine-to-machine fashion, right?

  • So this system is not humans playing defense; it is AI playing defense, AI trained by communities. And we have that also for information manipulation detection, and a local message checker: there’s an app that you can literally install, just called Message Checker.

  • So the idea of these safeguards using the local kami is not saying that as the threat gets stronger and stronger, let’s stick with the current governance tools. It’s also about upgrading our governance tools, treating democracy as a social technology.
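A rough reconstruction of the machine-to-machine flow described above, with entirely hypothetical names (this is not Taiwan’s actual system): novel reports stay with human reviewers, while known scam patterns trigger a confirmation message to the impersonated person and then an automated takedown request.

```python
from dataclasses import dataclass

@dataclass
class Report:
    ad_text: str
    impersonated: str   # e.g. the celebrity whose likeness is used
    platform: str       # e.g. the social network showing the ad

def handle_report(report: Report, classifier, notify, request_takedown) -> str:
    """Hypothetical pipeline: people report scams they see; anything the
    classifier already recognizes is handled machine-to-machine."""
    if not classifier.is_known_scam_pattern(report.ad_text):
        return "queued_for_human_review"        # novel pattern: humans stay in the loop
    # Known pattern: ask the impersonated person (or their agent) to confirm ...
    confirmed_fake = notify(
        report.impersonated,
        f"Is this your latest authorized deepfake on {report.platform}, or is it not you?",
    )
    if confirmed_fake:
        request_takedown(report.platform, report.ad_text)   # ... then automatic takedown
        return "taken_down"
    return "authorized_content"
```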

  • Okay, so I’m getting the sense that where you and I are going to disagree is just in our prediction of what it looks like for AI to go superintelligent, and we haven’t touched on that yet. So I’m not directly disagreeing with anything you’re saying. I think you’re doing great.

  • You know, under the assumption that AI is not going to suddenly overwhelm us with its uncontrollable power, everything you’re saying is great. Let me just ask one question to finish up the section on kind of your worldview, right? We talked about plurality, and the 6-Pack of Care, and why you think things like d/acc actually are feasible.

  • So let me just ask you one more question: Is this similar to Emmett Shear’s company, Softmax? What do you have in common with them?

  • Yeah. The idea of organic alignment, of building a curriculum or a gym so that you can train AI systems that are more careful, more caring, and not careless or, as Vitalik would put it, “soulless.” I think that is very attractive.

  • So this is very much in line with the pack about being as win-win as possible, the fifth pack, or solidarity. And this pack basically says we need to make sure that when we train AI systems, they’re not optimizing for individual likes, because then they become sycophantic, maximizing for that individual’s welfare.

  • And then if, in a family, four people are planning a trip and each talks to that kind of sycophantic AI, then they fight among themselves and all go on their own trips, right? We don’t want that. We want to train AI systems that are good team players, that are good coaches, not just good tutors.

  • And some of that you already see in production systems. I think ChatGPT for Groups first rolled out in Taiwan, now generally available. You can literally create a new group conversation and invite your family members into a shared ChatGPT conversation, where ChatGPT will play, again, a team coach, not a personal tutor.

  • So yes, I do agree that organic alignment is useful, and it also pays real dividends. People do prefer a group mode, where a bunch of people cohere instead of dissipate, thanks to AI intervention.

  • Is there any particular point of disagreement between the organic alignment people and what you think about AI, or is it really strong agreement?

  • I think it’s strong agreement within the fifth pack, right? So I think the 6-Pack stance is that you need to invest in all six packs simultaneously in order for it to be portable and muscular. You cannot just train one of the six-pack abs. But otherwise, no, in the particular regimen they’re training, I don’t think there’s any disagreement.

  • Okay, fair enough. So I won’t address that directly here, but for viewers of the show, you can go look it up: there’s a whole other episode I did with Vitalik Buterin here on the show, and there’s also an episode where it was just me reacting to a different talk by Emmett Shear about Softmax. I have a lot of disagreements with Softmax as I understand it, so viewers can go look that up: Doom Debates Softmax.

  • But for now, let’s get into the, what I think is the crux, from what I gathered so far, the crux of where you and I disagree. If I’m guessing right, I think it’s our expectations of the emergence of vastly superhuman intelligence.

  • So to start that part of the conversation, let me just ask you, what’s your AGI timeline, very roughly?

  • It’s already here.

  • Okay. Well, I guess I should have known you would say that, because you did write verbatim, you said, “The superintelligence we need is already here. It is the untapped potential of human collaboration. It is we, the people.” So you stand by that statement?

  • Okay, so then my question to you is, isn’t there an even more superintelligent superintelligence coming? Do you really think we’re hitting against the ceiling here?

  • Not at all. The ability for us to connect and collaborate is only going to grow from here, so obviously, we’re going to grow into superintelligence.

  • Let me unpack this, okay? So I’ll ask you a very fine-grained question here, which is just in the space of possible minds, doesn’t there exist a mind which is vastly superior in capabilities to the current human mind?

  • Of course, but it can include the current human minds.

  • Right, it could. And would you also agree that you can have a mind, let’s say it just runs, it takes over a city block. It’s a city block full of computation, densely packed transistors, thanks to TSMC or whatever, right?

  • So you’ve got this city block of computation. Don’t you think that within a few years or decades, a city block of computation will surpass the capabilities of all humans in aggregate who are alive in 2025?

  • Well, we call them factories. They already exist. There are literally factories that run themselves.

  • Don’t you think the future is going to be a place where block-sized data centers or block-sized mounds of computation… there’s just going to be this entity that is more powerful than any human who has ever lived up to today? Like all the humans together, pooling their minds, building a bunch of factories together, all of that is going to pale in comparison with this mound of computation that’s going to exist in the pretty near future?

  • Of course, I agree. And thanks to scalable governance and cooperative AI, human communities are also going to be much wiser than even what we can imagine today. And so we will be able to match this automated factory with the AI in the loop of humanity, and humanity itself is also going to be much wiser.

  • Okay, so I think I get what you’re saying. You’re saying that you actually agree with me that the intelligence spectrum does actually go much higher than humans today. So even if you look at a company of humans today, right, you look at all the coordination of a SpaceX, right, a bunch of really smart humans coordinating together really well, being super productive, you’re agreeing with me that the SpaceX of 2100 is going to be a lot more productive than the SpaceX today. There’s a much higher ceiling than what we’re seeing today.

  • The scenius is much more intelligent than the genius, of course.

  • It’s the genius of the scenius, right? Yeah, okay, I’ve heard that term thrown around before.

  • Okay, so where you and I disagree is you just have this huge optimism of, “Oh, yeah, it’s… we’re gonna climb this huge scale. It’s going to go way beyond anything that we can really imagine today, but the humans are just gonna be climbing the ladder. We’re never going to kind of fall off the ladder. We and the AIs are just going to successfully… we’re gonna be one big scenius, and we’re gonna be one big 6-Pack of Care. We’re just all going to climb together really fast,” correct?

  • Well, that’s where I’m steering toward, but currently, we’re operating at a fork, which is why I have this very high variance prediction. So yes, I would like to go here, but I don’t know where we are.

  • You know, stepping back, just in terms of the basic mental model here, I use the metaphor of climbing a ladder really fast. Another metaphor is, “Hey, there’s a plane taking off.” I’m imagining the human race clinging to the plane that’s taking off, right? Because the AI is very much… you know, we’re seeing positive feedback loops; Claude Code is starting to run independently for longer and longer. It’s taking over people’s computers, and it’s doing all these productive tasks.

  • So there is this plane taking off, and we’re clinging on, and it sounds like you also have a pretty similar mental model. Yeah, we’re clinging on, but you have this optimism that the clinging is probably going to work, correct?

  • Exactly, and because, architecturally, we did see previous generations of architectures, RNNs, LSTMs, and so on, that were much easier to inspect and to steer. And this is because we have flown other planes, much smaller ones, of course, before, right?

  • We have driven RNNs, LSTMs, and so on, which are quite interpretable. People do know where they’re going, and when we steer them toward a place, they’re not going to do a sharp left turn.

  • I think a lot of the current muddiness is because of the quadratic self-attention transformer architecture, which literally is a blur of what causes what. And so because of this, we don’t even know where the plane is actually hiding its intentions, or whether it’s hiding its intentions. It’s probably hiding all its intentions, for all we know, right?

  • So I think the obvious answer is to switch to a different plane. Instead of a wooden Trojan horse that may or may not have spies in it, have a glass or a crystal horse that you can just see through. So I think architecturally, as long as we go back to something that is far more mechanistically interpretable, then we’re probably in a plane that probably goes somewhere, and my variance would be narrower.

  • I wanna just keep poking at this question, just to make sure that your mental model of intelligence really is that similar to mine. Let’s keep poking at this question of how high does the intelligence scale go? Because so far, you’ve said, “Hey, it goes pretty high.” You’ve already admitted yes.

  • Okay, so how about this? Recursive self-improvement, so this idea that the next version, you know, Claude 4.5 goes up to Claude 8 or whatever, and Claude 8 is so good that you just leave it in a room for a week, right? It doesn’t output anything for a week, and then you come back, and it’s like, “Hey, I just went ahead and coded up Claude 9, which coded up Claude 10, and they started coding each other up faster and faster, and now we’re on Claude 90. I know it took us a year to go from Claude 7 to Claude 8, but now we’re at Claude 90.” You know, so that kind of scenario, is that plausible for you?

  • Well, you just described civilization, right? So even before the emergence of machine learning systems, if you plot the scenius of humanity, it also looks like this, right? So we’re essentially recursively improving ourselves.

  • So yes, it is quite plausible that if you make it, I don’t know, super exponential when it comes to climbing the scale, it is possible.

  • Now, the question is, are you climbing to optimize for some number, perplexity or something, to the detriment, to sacrifice other important virtues, other important relationships? Or are you just satisficing, good enough on this particular measurement, and then you can go around and practice 6-Pack of Care?

  • I think the distribution of where the attentiveness goes is the question. I don’t actually question how high we can go. It is whether we go so high that we burn our wings, so to speak, like in the story of Icarus.

  • Right. So one thing I wanna get some clarity on, on where you stand is just that, you know, we as humans, I agree with you. We’ve shown exponential growth, obviously, economic growth, and especially since the Industrial Revolution. There is this very clear exponential where the doubling times are also getting faster.

  • So I agree with you, and I see what you’re saying about, yeah, if the doubling times are always getting faster, why shouldn’t it just double in a single data center over a week, right? Isn’t that just continuing the exponential?

  • The problem is that our brain is gonna have a hard time keeping up with that, right? So if Claude 90 comes out after seven days, it’s like we’re sitting there as humans, now we’re face to face with a super genius. Do you really think we’ll be hanging onto that airplane taking off in just a time period of a few days? Don’t you think that’s too fast for us?

  • Well, I think Yuval Harari, also my friend, argued that in a sense, the agricultural products, the crops, domesticated humanity, not the other way around, which seems quite absurd at first, counterintuitive, because they’re so slow, right?

  • The farmers are literally thousands, if not millions, of times faster than how corn or maize or rice can grow. There’s nothing that the rice can do to stop the farmer from bulldozing the field at any second. But the farmers are not doing that, because they’re domesticated by the plants.

  • And if you buy that Harari argument, then you get the idea of 6-Pack of Care.

  • Let’s dive into that argument, right? So it’s true that the particular crops that we’re still eating did a really good job of domesticating us.

  • But those crops are all pretty different from their ancestors, right? So you can say, “Oh, yeah, the apple domesticated us because we’re manufacturing all these apples in all these grocery stores. We’re farming all these apples.” Okay, but they sure are different apples from the apples that started trying to domesticate us, right? They took a lot of damage to their nature along the way.

  • Yeah, and the wolves that could not follow a human gaze did not end up becoming domesticated dogs, which, by the way, also domesticated humans. Our gaze-following algorithm is influenced by dogs.

  • And so the other thing about apples, though, is that if you keep extrapolating just human civilization without AIs, right? To me, the successor to the modern-day apple, I would guess it’s just a cell-by-cell, 3D printed, you know, maybe not literally 3D printed, but we’re literally like, “Okay, yeah, no more seeds, no more…”

  • The DNA, if the apple cared about its own DNA, too bad, because the apple of 2100, even in a world without AI, I don’t see it having… how does the DNA contribute? It’s only a bootstrapper that we still depend on, but it’s not part of the ideal apple from the perspective of humans.

  • So if you’re taking your optimism from this idea that the DNA of the apple tree is still clinging on in 2025, that doesn’t seem like a robust property that we should expect apples to have.

  • Right, which is why, as I mentioned, any single metric, in your case the appleness, right, the apple shape, the apple taste, is insufficient. If you globally optimize for just one proxy metric, it stops being a good metric. That’s Goodhart’s law, and everything else just falls by the wayside.

  • So trying to say that these are humane values worth preserving, and coding them into the Claude 7 constitution or even into its soul document, is not bulletproof, because it can always perversely interpret them, so that nothing else is actually preserved, just a skeleton. So if you’re making that argument, I agree with you one hundred percent.

  • Yeah, I mean, so this is what I make of the apple argument. I think to the degree that the apples have managed to hang on to the plane of humanity, right? Because that was basically your analogy, like, “Look, how is the apple hanging on the plane?”

  • The plane, you know, human cognition, it’s so much faster and more powerful than the apples. How are the apples hanging on? And my answer to that is basically, they’re in the process of being shaken off, right? I don’t see that as a good cause for optimism.

  • And then you’re saying, “Well, as long as, if the apples could make sure that we weren’t optimizing too hard for taste and beauty, if we could just optimize for more things,” effectively meaning not optimize too hard on any particular dimension that we currently care about, then…

  • That’s satisficing.

  • …survive longer. Right, exactly, but I think that’s gonna be a crux of disagreement between us, because I see it as very likely that these AIs are just going to be really good engines at optimizing outcomes.

  • You know what I’m saying? They’re not going to be these fuzzy, “Oh, let’s only satisfy…” It’s like, don’t you think it’s likely that there will be flavors of AI, if not every single AI in the world? Some companies will be selling AIs that just get you the target metric, no?

  • Well, see social media recommendation algorithms.

  • Right. Exactly, right. So it sounds to me like even your optimism is dependent on this claim that we’re going to kind of do a one eighty, right? You’re saying, “Yes, I know in the past, companies would just make the AIs to maximize your engagement, but we’re about to enter this future where we’re just not gonna do that. We’re not gonna do this kind of optimizing that everybody’s been doing to date. It’s just gonna be different. I’m optimistic about that,” right? That’s basically your claim.

  • Why do you think Elon and, you know, Bier and Baxter are rewriting the X algorithm to allow for customizable pro-social community curation of recommendation engines?

  • Well, first of all, it’s taken a while, hasn’t it, right? I mean, he’s been promising this for a couple years. It hasn’t come out. I’ll believe it when I see it.

  • But, you know, with Elon, though, he’s definitely capable of that level of surprise. I’m not gonna be like, “Oh, I can predict what Elon’s going to do.” But is Zuck gonna go follow in his footsteps, and is TikTok gonna go follow in his footsteps? I don’t know. I think there’s always gonna be a highly engaging app.

  • And that’s why we’re working on Green Earth, right? To provide the same architecture, whether or not Elon finishes first, so that any other social media company can use this kind of refrigerator without destroying the ozone or something like that, right?

  • So the thing about the Montreal Protocol is not that, when people realized the ozone was being depleted, there were already off-the-shelf replacements for Freon, but rather that people passed a technology-forcing regulation that says, “X years from now, if you’re still making fridges that destroy the ozone, then you’re committing a crime against humanity and will be dealt with accordingly.”

  • Again, this is a proposal that you have that maybe, if I’m feeling really optimistic, maybe I can imagine a world where Bluesky becomes more popular because everybody loves this new level of control over their feed.

  • Like, in my case, wow, I can go on Bluesky, and I can control how many times a day I’m outraged. Okay, maybe I’ll switch from X.

  • So I can see the appeal, and I can even give you a fifteen percent chance. I’ll meet you halfway. I’ll raise my probability to fifteen percent that your movement wins, and even Mark Zuckerberg and whoever controls TikTok in the next few years, all of them follow along, and you kind of solve this problem for social media. But even then, even in that kind of super optimistic social media scenario, do you really have optimism that that’s going to generalize?

  • To be like: “Oh, yeah, OpenAI just put out this agent AI, and you can tell the agent to do anything, but even that agent is going to have all of these guidelines that it’s going to follow to make it not too much of an optimizer.” Across the board, anything that any kind of AI is doing, you’re optimistic that it’s going to have all these principles that it has to follow?

  • I think one of the cruxes here is whether the regulatory environment is technology forcing. I’ll use one example. I think it was President Bush who pushed for number portability across telecoms. So if you use a telecom and it stops serving you well, the reception or whatever near your place, you can switch to a different telecom without giving up your telephone number, because if everybody had to get a new number when switching telecoms, not a lot of people would switch.

  • And the win scenario, as you mentioned… I wouldn’t even put it at fifteen percent. But with number portability, then the competition takes a different shape. They actually have to keep you satisfied every month in order to keep your business.

  • So the US state of Utah already passed a law that says, starting July 2026, if you’re a Utah citizen and you move from X to Bluesky or to Truth Social, those are open source, you are able to take all your communities, all your likes, your followers, whatever subscribers you already have, the conversations, and so on. The old network is then legally obliged to forward them to the new network, regardless of how large or small they are.

  • Of course, this benefits everyone but the dominant player, but then it also benefits everyone that is currently already considering jumping ship, except they don’t want to give up their community. They’re held hostage. So my point is that radical portability and interoperability can make the market heal itself by playing the satisficing role, aligning with the subscription incentive.

  • We don’t have that in social media. We do have that in podcasts, we do have that in telecoms and ATMs, and so on. And so I think introducing interoperability and context portability in AI services is also a way to align the incentives, exactly as we just discussed for social media.
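
As a rough sketch of the context portability idea above: a hypothetical export envelope that a departing network could be obliged to hand over to the new one. The schema, field names, and format label are invented for illustration; a real regime would standardize on an open protocol rather than this exact shape.

```python
# Hypothetical sketch of a "context portability" export, in the spirit of the
# Utah-style rule described above. Field names and structure are invented for
# illustration; real interoperability would follow a shared open standard,
# not this exact schema.

import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class PortableContext:
    account: str                      # handle on the old network
    followers: List[str] = field(default_factory=list)
    subscriptions: List[str] = field(default_factory=list)
    conversations: List[dict] = field(default_factory=list)  # threads the user is part of

def export_context(old_network: str, ctx: PortableContext) -> str:
    """The departing network serializes everything the user is entitled to take along."""
    envelope = {"source": old_network, "format": "portable-context/0.1", "payload": asdict(ctx)}
    return json.dumps(envelope, indent=2)

if __name__ == "__main__":
    ctx = PortableContext(
        account="alice@example-old.net",
        followers=["bob@example-old.net", "carol@example-new.net"],
        subscriptions=["gardening-club"],
        conversations=[{"thread_id": "t-123", "last_post": "see you on the new network"}],
    )
    # A receiving network would ingest this envelope and recreate the social graph.
    print(export_context("example-old.net", ctx))
```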

  • Yeah, so everything you’re saying makes sense to me in a world where we get a guaranteed hundred years of not going extinct. So if we get trial and error, we can do something, and then it messes up, but it’s okay, we get another shot. In that kind of world, I do feel somewhat optimistic that the government will be like: “All right, guys, let’s do some interoperability. All right, guys, let’s do some open standards. Don’t get people trapped in a feedback loop too much. Dial it back. This has gone too far. Too many people are going crazy or wasting all their days.”

  • I agree that if we had decades or a century to solve it, we would get it, because it’s not the hardest problem ever if you get a bunch of tries. So again, the crux is just going to come down to… okay, but I think… you know that term foom? I think there’s going to be a foom.

  • And just going back to what I said about Claude becoming Claude 9 in seven days, it sounds like you actually think that there might be a foom, but you’re optimistic that whatever foom there is, we’ll just ride it anyway, correct?

  • Well, there will be a foom, just like there will be… there has already been a local epidemic, and the places that learned the lessons of the epidemic and strengthened their civic muscle became much more immune to the pandemic.

  • So my point is not that something bad will not happen or will never happen, but rather it will happen on a sufficient local scale, so that our choice is whether we ignore that and do not learn our lesson until some larger fire appears, until it literally burns civilization or bakes the Earth, or do we learn from those small fires and then tame them into bonfires instead of wildfires?

  • So on this question, though, of how many small fires do we have time to learn from… back on this question… we talked about the recursive self-improvement and a foom to Claude 9 in the course of a week.

  • Let me ask you this: Yudkowsky’s famous examples of AI-grade tech that might come upon us pretty quickly when you have recursive self-improvement. He’s mentioned this concept of, for example, diamondoid nanobots. So you can imagine tiny little insects, but they’re made out of diamond instead of proteins, basically much more strongly bonded, and they basically have these superpowers.

  • He’s mentioned aerovores. They could fly around, be solar-powered insects, but they’re more powerful than insects, and they can pretty much reproduce very quickly and take over the Earth and do things that biological life just doesn’t have enough robust programming to accomplish. Because, I mean, even just in terms of lines of code, there’s space to put a lot more code into an organism than what biology manages to do because it has all these information speed limits.

  • So anyway, just hand-waving a bit, but this idea of Yudkowskian sci-fi nanotechnology, new tech trees of biology, all coming up in a matter of years, months… I don’t know about days… but a short amount of time. Is this also a plausible scenario in your mind?

  • Well, I think that’s too many words. Usually, I just say super hot data center factories, robots that bake the Earth. Much fewer words, the same effect.

  • Right, right, right. Yes. And so in your scenario, when you use very few words and you just say super hot data centers that bake the Earth, you are implying that a lot of stuff has happened to get to that point, because you need quite a lot of data centers to bake the Earth, correct?

  • Well, I think according to RAND, they say that it is possible that an AI system plus human can find this kind of compound that makes it much easier to trigger a kind of flash event that recursively makes the atmosphere much, much hotter without a coordinated conversation. So it’s like the vulnerable world hypothesis from Nick Bostrom, also from Oxford.

  • I’m just curious, but what exactly are you thinking is heating the Earth in this? I’m a little bit confused because I think the Earth will get hot when there’s…

  • Maybe a chemical compound that makes the Earth much hotter, and somehow the data centers, the factory, the robots thrive in such a high-temperature situation, and the people who mastermind this operation somehow believe that they can either hide in space or become transhuman in some way, and then basically effect a total baking of the Earth to their benefit and to everybody else’s detriment. So if you want to talk sci-fi, I think that’s easier to explain than nanobots.

  • Well, so your particular sci-fi story, though… you’re only convincing me that it’s a plausible scenario. There’s nothing implausible about finding a chemical. But for me, it’s like this… my scenario is just the convergent default scenario. I don’t need this assumption of, oh, yeah, we find a chemical. That part is a little weird. It’s great for a sci-fi story, but it’s not the obvious convergent outcome. The obvious convergent outcome is that the Earth gets tiled with computronium or, you know, the substrate of computation. Do you agree with that?

  • Well, I think in order to lead to that, you need a collective disempowerment of humanity. So each person trapped in a hamster wheel, feeling like they’re actually affecting future timelines, but we collectively become disempowered to whatever the computronium makers’ intelligences are thinking. So without collective disempowerment, I do not think it leads to that scenario.

  • You don’t think that the human race would just commission a lot of data centers to do a lot of useful tasks, more and more over time?

  • I mean, probably to a satisficing degree, but not to a maximizing degree.

  • So this is the question I think is most interesting here. So do you think that if there was this kind of recursively self-improving system… you know, Claude 4.5 generated Claude 5.5, 6.5, whatever, and they kept coming faster and faster, even so fast that humans didn’t expect it, potentially like one a week. “What the hell? We’re already at Claude 90. We’re already at these capabilities that are so different from the capabilities that we started with.”

  • If that kind of recursive self-improvement loop is happening, do you think that at the end of the loop, Claude 90 will tell you, “Hey, so remember how we were talking about nanotechnology? I’ve got some designs for you. You know, you can just… Here’s how to bootstrap them, and within three weeks, you can have aerovores.” You know, these futuristic creatures that use solar panels and chew their way through the Earth.

  • That particular scenario that I just described, including the nanobots, do you have a major objection to that, or do you agree with me, and it’s like, “No, that’s pretty plausible”?

  • I think there are a lot of ways where people can see a new scientific breakthrough that can destroy, if not the Earth literally, certainly civilization as we know it, a nuclear bomb being one prime example. There’s also the scenario where the nuclear bomb decides to explode by itself… that is the sharp left turn.

  • But in your scenario, in which Claude just posits the possibility of doing so without actually taking control of the launch button, no, I do not think people at Anthropic will hit the power button or the launch button. Why would they do that?

  • Yeah, yeah, yeah. So I should clarify, I’m still just asking you a question about how high the intelligence spectrum goes, and this is actually not a question about what’s likely to happen on Earth, because I’m trying to factor out the conversation, factor out your world model.

  • It sounds like you probably do have a world model similar to mine when it just comes to this idea of what’s possible. If you just put the ideal algorithm into a data center, if we just knew the exact bits to flip for a gigabyte of bits or whatever, and we say, “Okay, run this program,” isn’t the ideal program… like a Claude 90 type of program… the kind of program that could, if instructed, make these super powerful world-eating nanobots?

  • Yes, but why would we instruct that for starters? We would probably instruct it to find the defense dominant world, and it will probably find that, too.

  • I hear you. I just want to close out this topic of what it can do. Because the thing is, a lot of people don’t agree with what you’re saying. A lot of people will push back here, and they’ll say, “No, Liron, there’s no one-gigabit program. Even if God himself gave you the program, there’s no program that can just make nanobots eat the world. That’s just Yudkowsky in sci-fi. He’s an idiot.”

  • A lot of people will say that, but you’re saying, “No, no, no, that makes total sense. The only question is just what specifically will AI choose to…”

  • Where are you pointing it? Why are you pointing it to the most offense dominant spectrum?

  • So just to clarify, the idea that the ideal program, the intelligent program, can make these super powerful nanobots, you’re on board with that being a likely possibility, correct?

  • It is only like that if we decide, or the makers decide, to point it to that spectrum, but it’s far more likely that people will point to a defense dominant spectrum, in which case the nanobot starts becoming assistive intelligence and stops being addictive or autonomous or threatening intelligence.

  • Okay, okay, so we’re working our way there. So it’s kind of like you and I are on the same page on this question of “can.” We’re both really bullish on what AI can do with the right steering, and then where we start to diverge is just on the question of what will it do. We agree on can, and then we have different predictions about will.

  • Okay, so my next question is this: you agree that it can do this crazy damage to the Earth if it was pointed wrong. So my next question is, look, in this scenario, where you get Claude 90 in a week, I think you agree with me that the pointing that we need to do, we better have it done before that one particular week. That one week, that’s kind of game over. Whatever happens in that week better have been prepared.

  • Just to try to put a concrete timeline on this, I described this particular foom scenario where, you know, we get to the point of RSI, this crazy singularity… that’s what that means, the intelligence explosion.

  • If you had to guess what’s the most likely timeline… obviously, neither you nor I really know… but if you just had to guess, do you think it’s coming in five years, ten years, fifty years? What’s your ballpark here for when this kind of scenario will be realistic?

  • Well, it’s literally not a number, and I think Eliezer agrees with that in his book, “If Anyone Builds It.” He says, “You know, it’s not a matter of when, but whether it would.” So he doesn’t have a fixed timeline, but somewhere along the timeline this will happen.

  • So what Eliezer says in the book, assuming I recall correctly, which I think I do, is he’s saying, “Look, we shouldn’t claim to know.” With these kind of things, the timeline is hard to predict, and so we shouldn’t be that surprised if it comes next year. We shouldn’t be that surprised if it comes in thirty years.

  • I’ve also heard Eliezer say, even when I talk to him on my YouTube channel, he did say that twenty years does sound like a long time for this kind of progress, given the current trajectory, which I agree with him on.

  • But what he also said is, I think you’re right about the part that it’s not load-bearing. So even if there’s still a few percent probability that it happens in fifty-eight years, we should still be having a pretty similar conversation, even if it’s gonna happen in fifty-eight years.

  • Exactly. Yeah, so the singularity is near. We can argue whether it’s nearer, but that’s not a useful debate. The point is that the plurality is here.

  • Well, the reason I bring this up, though, is I do think that it is illustrative just to get a more accurate mental model. I feel like it is a dimension that’s kind of salient. Even though it’s not a hundred percent, it’s still salient to be like, look at Metaculus… the consensus is 2032 or something… and you may be an outlier.

  • You may be like Yann LeCun. Yann LeCun recently is like, “No, no, no, no, AI could be coming in a long time. AI could be coming in ten years.” So you could be like, “Okay, it’s coming after 2032, it’s coming later.”

  • But the reason I’m bringing this up, the reason I think it’s worth looking at timelines, is because you have all this optimism that humanity is going to get its act together and coordinate and do the six pack of care. All these principles are going to be instilled, but your deadline is in seven years, according to Metaculus.

  • Humanity is going to have to get so much better in the next seven years in order to play out your vision, because after that it’s going to be too late to play out your vision, correct?

  • Well, I mean, after SARS, there’s a lot of different models predicting when will be the next pandemic. Bill Gates famously went to many places to talk about the importance of this, and of course, there’s going to be different models.

  • But Taiwan worked on our pandemic response system right after SARS, literally the day after SARS. It is not an excuse to delay it if you believe, oh, actually, the next coronavirus is coming in fifty years. You still should work on the response system today.

  • Yeah, well, okay, let’s take the COVID example. Because you’ve sounded optimistic the way you said, “Hey, we’ve learned from COVID,” but if you look at the United States, for instance, 2020 all the way to 2025, are we that much more prepared for pandemics compared to 2020?

  • I agree we are, because we know about the mRNA pipeline, but have we actually applied much of our learned lessons in the last five years, or no?

  • I’m not a cyber ambassador of the US, so I’m not really qualified, nor have I really read about the current pandemic response capability in the US. And I think it’s important to know that with earthquakes or pandemics, you of course should invest in the science to predict how close the next one is.

  • But also, for the local, smaller earthquakes, you should make the buildings earthquake-proof, make the connection system earthquake-proof, even if it’s not a huge earthquake that destroys entire cities. And with pandemics, again, sometimes an outbreak is self-limiting, sometimes it’s too toxic and so it never becomes a pandemic. But for the seven people who died, that is not a great explanation. You should still prevent those seven deaths.

  • And so my point being, the work can be done at the local scale. So if changing your fridge, for example, can help ameliorate, mitigate the ozone depletion, one should not wait until some national action. One should just start using a better fridge today.

  • Yeah, yeah. Well, just what I’m getting at when I’m saying, look at the US from 2020 to 2025… yeah, we have a few more people researching how to get mRNA vaccines online, but it’s not like we improved the CDC that much. It’s not like we actually invested that much into, “Hey, here’s the official new mRNA pipeline. We’re gonna respond so quickly next time.”

  • It’s still pretty unclear how we’re going to respond next time. We haven’t come out and made a statement, for example, on challenge trials… the idea that you can test a new vaccine, you can give people the virus. You’re still not allowed to give people the virus and test on them. People are still too squeamish. And you agree, challenge trials would help a lot, correct?

  • I think the point here is more whether investing in challenge trials and other things that we know will help when the pandemic comes, can pay dividends even in chronic times, not just for acute threats.

  • Right, and it can. Right?

  • If you figure that out, then it’s an easier case to make.

  • You’re right, it’s an easier case to make, and yet the US hasn’t even gotten on board. So the reason I’m bringing up this clown show of how we handled the aftermath of COVID is because we have seven years left to get our act together before the freaking singularity, and we just had five years where we couldn’t get our act together on things like, “Let’s do challenge trials, let’s formalize the mRNA vaccine pipeline a little more.”

  • Nope, we suck at that, but in five years, we’re suddenly going to be so good at having all these things… You know, in five years, we’re probably still going to have the Facebook timeline, the Instagram timeline, still be engagement bait. I think it’s gonna take longer than five years even just to fix that.

  • Well, I think if you still have a very high polarization per minute, like I said, it’s very difficult to convene and do anything even when everyone agrees. But if you can convene, for example, even by translating the current polarization or illusion of polarization into energy, you can build a geothermal engine that then turns the conflicts into co-creation because people are no longer trapped in this fire of PPM and say, “This is fine.”

  • So it’d be really great if we can solve social media curation or feed algorithms in the next year, and that’ll buy us a whole six years, maybe, to solve everything else before the singularity.

  • Precisely. That’s the plurality takeoff, yes.

  • It’s fair enough. Let me ask you the timelines question this way. You agree that we’re both in a very foggy headspace in terms of when exactly the singularity is coming. I agree to that. I agree I have a probability distribution where no one particular year has more than twenty percent of my probability. So it’s a vague probability distribution in that sense.

  • If you and I both knew for a fact that it was coming in six months, so like June 2026, if that’s when it was coming, wouldn’t you be way less optimistic that all the pieces would be in place for controlling that singularity if it was that soon?

  • Well, when I was around five years old, doctors told me and my family that this child only has a fifty percent chance to survive until heart surgery. I got one when I was twelve, so I’m alive now, as you can see.

  • But the probability distribution was not in my favor. From when I was five to the day I got surgery, there were seven years where every day I went to sleep feeling like a coin toss: if it didn’t land well, I simply would not wake up the next day.

  • So I got into a habit of what I call “publish before I perish.” I documented everything I learned each day, on cassette tapes, floppy disks, and the internet.

  • And the point I’m making is that sometimes knowing that extinction… okay, not for sure, but fifty percent is a very high number… is near motivates people to be more selfless, to coordinate better, to not accumulate, because accumulation does not actually result in great coordination. And so maybe this news that you mentioned actually helps humanity to coordinate, but this is not for sure. In a high-PPM environment, it will make the rage and the extremism even worse. So again, the crux is to lower the polarization per minute.

  • No, I hear you, but I’m just drawing an implication of some of the stuff you told me. You told me that you agree that a foom scenario is plausible. You and I both agree on that, and the intelligence ceiling is really high, and we’re going to get AIs one way or the other, whether it’s in a hundred years, whether it’s in six months.

  • We’re going to get AIs that are much more powerful than the company SpaceX is today. So you agree with me on that, and you agree with me that we have a ways to go on getting the steering technology right, on solving the alignment problem. We have a ways to go.

  • And you also agree with me that we’re not that confident that foom isn’t coming in six months. So if you just combine together all those propositions, don’t you get kind of like an “oh, shit” moment? There’s a big P(Doom). There’s a very significant P(Doom).

  • No, not at all. Not at all, because the apple does not have to cling to the airplane, to mix our metaphors. The apple could be inside the plane.

  • So if we just take the plane with the label symbiogenesis instead of symbiosis, then we’re in a much better place. If we are literally part of that plane that is taking off, then, yes, we may suffer some high-G damages or something during the takeoff, but we’re still part of that plane.

  • But when you look at the AI companies today, are they meeting your standards for alignment? Because I think we’re gonna get to that. I think you’re saying that they’re not. We’re not on track today to have that kind of a good foom.

  • Well, I think they literally contain humans, last I checked.

  • Yes, but the AIs that they’re building don’t, right? Especially after some recursive iterations.

  • Well, if you read the DeepMind paper, the distributed AGI safety paper, it’s quite obvious that they are now thinking of it as a mechanism design, as a market design issue, so that they are not building a single system isolated from the rest of humanity to take off, but rather a horizontal takeoff, where the whole of humanity takes off.

  • Well, to clarify your mental model, though, what if it literally happened tomorrow? Don’t you think that’s not enough time? Don’t you think we’re screwed if it happens tomorrow?

  • Well, if it happens tomorrow and is pointed not to the offense, but to the defense part of the spectrum, no, we get supermasks. We get invisibility against epistemic damage, info damage, and cyber damage. We literally get proof-carrying code that cannot be hacked. That’s a good ending or the good beginning.

  • Just poking at your own mental model here… your position is there’s also a very significant chance that it doesn’t happen like that, right? If it were to happen tomorrow, there’s still a very significant chance that it just goes bad, correct?

  • Well, which is why the variance is so high for me. I literally don’t know.

  • Okay, fair enough. So let’s move on to this topic of AI alignment. Last month, you wrote a piece on sixpack.care saying AI alignment cannot be top-down. Explain what you mean by that.

  • Sure. The idea is philosophical, and I try to explain in very simple terms… correct me if I become too jargon-y. We talk about maximizing agents, something that just wants to win a game. We also talk about deontic, rule-following agents, something that not only wins the top score, but does not violate the game rules, because you cannot play move thirty-seven if you flip the board or destroy the game.

  • And AI systems are now very good at following abstract rules and optimizing for outcomes.

  • However, these two are very thin in the sense that, when we talk now, each of us is basically generating, through our sensory organs, about two terabits per second of experiential data, but only a tiny fragment of this is captured by this webcam and Riverside.

  • And so when we train AI systems only with those artifacts generated as snapshots of our actual experience, it is like in Plato’s cave. We’re looking at the shadows cast by the outside reality through our sensory organs, but the AI systems today are trained on a cave within a cave, that is a shadow of a shadow.

  • And so to say that it is aligned is a vastly overblown claim, because it can only align to expressed, observed things, that is, the abstract universal rules, laws, and things like that, but not to our actual experience.

  • So the way we bring… you mentioned apples or dogs or any other entities… into our local community, relies on local attentiveness. They literally need to come down and live among the humans and learn from us.

  • Okay. So looking at the top AI companies today. Let’s say OpenAI, for instance. Do you think they’re doing alignment wrong?

  • Well, think of when they optimize for the like/dislike response. Sometimes, you know, it shows you two responses, and you choose one or the other, and they trained on some of those signals a few months back. And ChatGPT, briefly, for like three days, became so sycophantic that if you told it, “Audrey Tang is putting brainwave measurement chips into vaccines,” ChatGPT, during those three days, would say, “Oh, you see through the veil, you see the truth. Don’t believe the journalism, don’t believe your ministers. The conspiracy is actually true.”

  • So AI-induced human psychosis. And so, of course, to align to such individual short-term experience is not to align to the full experience. It is just the captured experience of one bit: whether, in a knee-jerk reaction, you like this response more or that one.

  • So this is exactly like the TikTok algorithm optimizing for very short-term, very System 1 responses. So obviously, something of that shape, maximizing the likelihood of getting likes, the maximizing engine, is wrong. I don’t think anyone would dispute that. Ultimately, I think, they apologized for that decision.

  • So I think architecturally, the current way of everybody connecting to the same mainframe through terminals is very much incentive-aligned with this kind of perverse instantiation of optimization. So in this general direction, I think this is wrong.

  • On the other hand, OpenAI makes other products. For example, I’m on the board of ROOST, the Robust Open Online Safety Tools, and we just launched with OpenAI the gpt-oss-safeguard model, which lets a community host their own small model that can provide citations to their community standards and use it to guard against epistemic harm to that community. It runs entirely locally, and you can inspect it much more easily than if it’s just hosted somewhere in the cloud.

  • So in that direction, then, it’s like a local community, and it’s much more six pack of care. So I don’t think we can analyze OpenAI as one single entity. Some of its products and services are aligned to a plurality vision, and some of them are more singularity.

  • Okay, let’s take GPT 5.2. So if I’m understanding correctly, you feel that AI alignment cannot be top-down, but you think that today, OpenAI’s alignment for the consumer-facing ChatGPT 5.2 is top-down, and that’s bad, right?

  • I think fine-tuning using experiential data is very difficult with the currently deployed 5.2. Yes.

  • Yeah. Could we do a specific example of why GPT 5.2’s top-down alignment is bad?

  • Well, it’s exactly the same as telling the current social media ranking algorithm, “I don’t want to be enraged that much. Please dial it down.” It kind of does that, and then it stops doing that. If you try to tune GPT 5.2 with your experiential data, with the social expectation, and so on, very quickly you will find a lot of hidden assumptions.

  • I talk with people in Nairobi, in Kenya, in some parts of India, and some of the spiritual practitioners, some of the people who practice a different culture there. And because their experiential data are not sufficiently digitized, what was digitized was kind of stereotypes about them.

  • So then if you put in what they feel to a local doctor model… this case actually happened… it would just diagnose it as malnutrition, whereas it’s actually not malnutrition.

  • And there’s a dashboard called Weval.org that crowdsources people’s experience of such mistreatment, of epistemic injustice, in Sri Lanka and in many other places. Civil society organizations come together, working with CIP, the Collective Intelligence Project, to document such cases, so that we can see that these models, even as they score higher and higher on ARC eval or whatever, actually sometimes score lower and lower on this kind of alignment to local communities’ needs.

  • Right. So if I’m understanding correctly, you would have preferred if OpenAI kind of gives this open-ended, open source… and you’re even saying you like their open source model. So you’re basically saying they should add more user-tweakable parameters so that everybody can align it to their taste more powerfully?

  • And also, so that they live among the communities. Instead of all-seeing or recording or transmitting to the cloud, which people would never integrate into their lives, it should be locally embodied in a way that learns from our experiential data, processes locally, inspectable locally, and does not consume a lot of electricity. That would be one of the more ideal forms for a community to own, or what Jensen Huang would call personal supercomputing.

  • Okay. So going back to this thesis that you’re saying, that AI alignment cannot be top-down, what if we just throw in some top-down instructions to AI, for example, like, try to not make humanity go extinct? Isn’t that a pretty good top-down directive?

  • It is, but if you optimize for the maximal chance of that, then it can always find ways to sacrifice some other important thing that you did not specify, just to maximize the chance of that happening.

  • Yeah, I mean, that is true. You can imagine, you know, that it might wanna gamble a 0.01 percent increase in human extinction risk in exchange for rapid increases in science or whatever. So I totally agree, when the percent is low enough, sure.

  • And that gets to this whole thing that I agree with… yeah, it’s complicated. Alignment itself is a complicated problem. But I wouldn’t describe that as a failure of top-down alignment. I would just describe that as a failure to specify a sufficiently subtle utility function, but it could still be top-down.

  • When I think about ideal alignment, I do think about, like, coherent extrapolated volition. It is kind of top-down, where once you’ve sufficiently understood the true utility function of humanity, there’s nothing fundamentally wrong with just top-down installing that, is there?

  • Well, I think the main thing is the discontinuity challenge. If an in silico version of both of us and the rest of humanity deliberated for billions of years in subjective time, but actually just five minutes in clock time, and ended up with the total set of moral solutions, it would probably be seen as very alien by all of us, because there’s no blending process along the way in which we participate ourselves, not our in silico digital twins, in the conversation.

  • So I think the maximal extrapolation is an interesting ideal. It’s certainly a better attractor than many others, but I think blending volition, even though it’s slightly slower, because every time you have to tune to the tune of the garden, so to speak, results in a much more continuous experience.

  • Okay, I mean, if we have all this time… if we have enough time, then I don’t have a strong opinion to agree or disagree with you. If we get undos, if we don’t all die, should we strive to make alignment top-down, or should we strive to make it incremental? I don’t have a strong opinion. I feel like either one of those can work.

  • Hinton used to feel really pessimistic, but over the last few months he’s been saying, “Hey, I realize that we just need to program some kind of maternal instinct into the AI, the same way that human mothers are aligned with their children, because they have this maternal instinct. They care for their children in that sense. If we can make the AIs care about us, like our parent, feel that kind of love or whatever it is.”

  • But you actually, I think, disagree with that. You don’t think it’s plausible to make AI maternal, and instead, you prefer the analogy of making AI like our gardener? Okay, explain that.

  • Well, first of all, I think the general turn to care ethics is great, because if AI ultimately does away with utilitarianism, scoring, deontology, following rules, then logically, what’s left for the human to do is virtue, relational virtue. So I think this is perfect.

  • I think what’s difficult for the metaphor of maternal instinct to convey is that the architecture of an AI system is not the same as a mammal’s. So unless through some organoid magic it actually becomes like that, the maternal instinct is metaphorical.

  • But I don’t have a problem with Hinton’s general direction, which is attentiveness. Caring is actually a solution. If machines can actually care about human relationships and human-machine relationships as a first-class satisficing target, then we’re in a very good place, because then the apple is in the airplane, and then we are in the airplane.

  • And because of that, I think we’re quite compatible. And so I choose apples or some crops and the gardener, because, first of all, plants and farmers are different kinds of life, but they’re both life, and also they’re very different when it comes to speed.

  • A mother and a child still exist within one, at most two, orders of magnitude difference when it comes to speed. But as I mentioned, the crop and the farmer exist at thousands, even tens of thousands of times difference in speed, and that is what the gap between a kind of proto-horizontal AGI speed and our current human coordination speed looks like.

  • So in order for the care relationship to work, the AI gardener needs to look at the human garden and work at the speed of the current human garden, so that the blending can happen.

  • Right. Okay, so you’re just saying, yeah, Hinton’s kind of making sense when he talks about AI being our mother, but it’s a little bit more analogous to say that AI is gonna be our dog mom or our tomato mom.

  • Something like that, yes.

  • Okay. Yeah, fair enough. My objection is just that I think it’s going to run out of control before we can calibrate that dynamic. We just don’t get enough iterations. We don’t get enough trial and error. We just lose control, and it’s way too late. That’s my mainline scenario.

  • All right, but yeah, heading into the wrap-up here, let me try briefly asking you about the third rail here, China geopolitics, and obviously feel free to say what you can, don’t say what you can’t. It’s a bit tangential to the AI debate, because I think all of humanity actually is in the same boat. So I don’t really think it’s that much of a geopolitical question here.

  • But as we know, the most valuable physical manufacturer of chips, Taiwan Semiconductor Manufacturing Company, TSMC for the viewers, with a $1.6 trillion market cap, takes Nvidia’s blueprints and physically constructs the chips using advanced lithography machines. TSMC owns the fabrication plants, and one of the hottest geopolitical issues in the world right now is China’s desire to annex Taiwan.

  • And if they were to do that, maybe China can prevent the US from having more access to TSMC chips than they do. This will intensify as AI companies become worth exponentially more. AI companies are, in my opinion, going to be worth tens of trillions pretty soon. So if anybody wants to make some money before the singularity, I would go long Google. You know, full disclosure, I’m overweight Google in my portfolio. It seems like a good bet before the singularity.

  • So your various efforts have strengthened Taiwan in a way that reduces its isolation from the larger world. So you strategically made those kind of efforts, which I think is great. You’ve reduced its dependence on China in some ways. I’ll open the floor to you. What is your view on that subject?

  • Yeah. I think the more people learn that TSMC is not just one company, but rather a set of practices that makes the entire supply chain trustworthy, the better. I include in that, for example, the SEMI E187 standard that we in the Taiwan Ministry of Digital Affairs worked with TSMC to publish as a standard, which basically assumes breach. It assumes that any of your upstream vendors has probably already been breached at some point, in the cyberattack sense.

  • This is a resilience mindset, and this mindset is generally useful to everything else, to defending against, as I mentioned, AGI takeover and many other things.

  • So long story short, I think TSMC is not just a producer of chips; rather, Taiwan as a whole represents a resilience mindset when it comes to the inevitable window of offense dominance. It is, I would argue, the only approach that works at the civilizational scale when, for a while, it’s easier to make cyberattacks than cyber defense, easier to do super persuasion than epistemic defense, and maybe the same in bio and other domains.

  • We need to assume the small fires are going to happen, and then we have to contain the fire together. So I would say: see Taiwan both as a trustworthy partner, of course, but also as an exporter of models that sustain this trustworthiness despite the attacks.

  • Right. Okay. You know, if you read some of these latest things, like situational awareness, and there was, I forgot what it’s called, there was another paper saying: “Yep, countries are really going to be going after each other’s AI powers, and this is such a critical node.” You know, Taiwan is a pretty critical piece in the future economy, which is going to be ten times bigger before we know it. But yeah, I mean, I guess I don’t know. Neither of us can make a specific prediction there, and your approach of just adding more resilience can’t hurt.

  • You know, the only other thing that often comes up about China is this idea of the international treaty. It’s probably not something that you think much about because you seem pretty optimistic about the foom going well. Do you have any opinion on the pause AI movement and then the, you know, having an off button waiting in the wings in case things are going south, where all the different nations can be like, “Okay, activate the off button now. Let’s coordinate to do an off button.” Do you think that planning for that has value right now or no?

  • Well, I did sign on to the Red Lines declaration, which my friend Maria Ressa helped with. As you probably know, I worry about super persuasion, and so the Taiwan model in defending against organized crime, fraud, and so on, is not to censor speech, but rather to make sure you can very clearly see which speech is human, which is digitally signed, with joint liability, so that the incentive for the platforms is basically not to give reach to the foreign trolls.

  • I think the freedom of speech should not prevent us from regulating the reach. And I just had an article that I co-wrote in the Communications of the ACM called “Sunsetting Section 230” that talks about this.

  • So there are various kinds of red lines that I think societies have to collectively draw in anticipation of the very small fires, like a swarm of robots manufacturing human consent or whatever, that dawn on us. And drawing those thinner, smaller, more local red lines is, I think, a very good exercise to prepare for the much larger, pandemic-sized red lines that international treaties are going to be based on.

  • Yeah, so if you’re all for drawing these red lines, let me just ask you very specifically, because my favorite policy proposal, the thing that feels like the biggest win to me, is this idea of preparing to potentially pause. “Hey, we’re all looking at this event, the singularity. We all think that there is a chance of doom.” Obviously, you’re big on not defining what the chance is, but there is a chance. It’s mitigating, it should be a priority.

  • So isn’t it a priority? Do you think it’s a policy priority? Out of the many… you’re talking about red lines, should we also potentially architect an off button that a few countries, the most powerful countries in the AI space, can vote to activate, like, “All right, shut it down. It is now past the threshold where your optimism should turn to pessimism”? Let’s prepare for that possibility. Hopefully, we never have to push the button, but should we be building one right now into the GPUs, a radio-controlled, phone-home type off button? Are you for or against that kind of policy?

  • Well, think of the situation as people detecting that ozone is depleting, but you do not, at this point, know how fast and how causational it is with the use of Freon, but there’s some correlation. So assume we’re at that point before the Montreal Protocol.

  • And now people come around and say, “Okay, let’s coordinate on a stop button for refrigeration.” So that if we see from the scientists’ observatories in a few years that, oh, there’s not just a correlation but a strong causation, that if we continue to run fridges, in five years the ozone will be gone and we all die of cancer or something, we can collectively shut off all fridges.

  • I think it’s a defensible policy position, but it’s not a probable policy position. The probable policy position is that we both measure the ozone much better, invest in measurement, and right now invest in Freon alternatives. When those two happen, then sunsetting the old Freon makes sense. But if you lead with “stop refrigeration,” well, for any civilization, once it starts using refrigeration, there’s just no going back.

  • So in that particular analogy, though, if you just asked a bunch of experts, even if they’re all hand-waving, be like: “Look, we don’t know when the ozone’s gonna get depleted.” Be like, “Okay, just give me a vague distribution. If we go on this pace, how long before we all get cancer?”

  • I feel like most would be like, “Okay, all of us get cancer… yeah, I would measure that in decades.” Whereas in the case of AI, the experts are saying… a lot of them are saying, I don’t know, five years? Less than five years, maybe. Could even be next year. So I do think that there is actually a very significant quantitative difference on the scope of the emergency here, just according to what the experts are saying.

  • Well, the timelines are shorter, this is what you’re saying. And it does not change the policy trade-off. Basically, there needs to be a bootstrap sequence. You need to first come up with, for example, a post-transformer, post-attention architecture. Just to use a random example, Power Retention networks by Manifest AI. That’s one of the better candidates.

  • And like RNNs and LSTMs, it’s much more mechanistically interpretable, because it uses just linear memory and electricity for an arbitrarily long context window. So it’s much more amenable to this kind of experiential learning, and it’s also much easier to train, because you can take a transformer and retrain it as power retention. But it doesn’t have to be power retention. It could be anything else.

  • So the point is that the existence, even if not yet the commercial availability, of such post-attention, scrutable models is, I think, an important thing to put before policymakers as common knowledge. So instead of stopping, it’s like collectively steering away from uncertain, but probable, doom.
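
To make the linear-memory point concrete, here is a generic sketch of a linear recurrent read-out, the family that RNNs, LSTMs, and retention-style models belong to. It is not the actual Power Retention architecture from Manifest AI; it only illustrates why the state carried between tokens stays a fixed size no matter how long the context grows.

```python
# Generic linear-recurrence sketch (not the actual Power Retention architecture):
# the model carries a fixed-size state instead of a key/value cache that grows
# with the sequence, which makes its memory use easy to reason about.

import numpy as np

def linear_recurrent_read(queries, keys, values, decay=0.95):
    """Process a sequence with a constant-size state matrix.

    queries, keys, values: arrays of shape (seq_len, d).
    Returns outputs of shape (seq_len, d). The only memory carried across
    steps is `state`, a (d, d) matrix, regardless of seq_len.
    """
    d = keys.shape[1]
    state = np.zeros((d, d))          # fixed-size memory, independent of context length
    outputs = []
    for q, k, v in zip(queries, keys, values):
        state = decay * state + np.outer(k, v)   # fold the new token into the state
        outputs.append(q @ state)                # read out with the current query
    return np.stack(outputs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d = 10_000, 16
    q, k, v = (rng.standard_normal((seq_len, d)) for _ in range(3))
    out = linear_recurrent_read(q, k, v)
    print(out.shape)   # (10000, 16), with only a 16x16 state held between steps
```

Standard attention, by contrast, keeps every past key and value around, so its working memory grows with the length of the context.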

  • Okay, I mean, what I’m getting is just that your P(Doom), I know you won’t say it or think it’s meaningful, but I model you as effectively having a P(Doom) that’s, let’s say, ten percent, in the ballpark of ten percent. And because of that, you’re like, “Eh, so if we do this off button intervention, it’s so costly, and P(Doom)’s only ten percent.”

  • Whereas anybody with a P(Doom) of, I don’t know, thirty percent plus, you’d be like: “Hell, yeah, we need an off button because there’s a thirty percent chance that this thing is going to doom us so soon. Of course, we need the ability to turn it off.” So that’s why I model you as having a low P(Doom), but…

  • I don’t think so. No, I really don’t think so. Because I think the lack of observability of the current architecture is the cause of the current variance among researchers. So we all have this weird-shaped distribution, but this can be solved.

  • There’s an outstanding bet on Manifold, I believe… https://www.isattentionallyouneed.com/ or something like that. And so it basically says, in two years, there’s, I think, around fifty-fifty chance that we are able then to build state-of-the-art models without depending on the unobservable, almost uninterpretable transformer attention model.

  • So if that coin lands well, then we do not have to say, oh, whether it’s ten percent, fifty percent, seventy percent. Everybody will converge on a much more narrow distribution. And if it’s around ten percent, then we do this. If it’s around ninety percent, I bet everybody will coordinate to do that.

  • All right. Sounds good. So this was a great conversation. At the end, I basically just summarize the disagreement, summarize what the guest communicated was their perspective, and why it’s not just equivalent to my own perspective.

  • So my attempt at summarizing your perspective is just like, well, first of all, it’s very interesting that you are kind of on the foom train. You’re like, “Oh, yeah, foom could very much be real. Intelligence is quite a high spectrum,” which I appreciate that you have basically ridden that far on what I call the doom train.

  • You’re like: “Yep, if you just pluck the ideal super intelligent algorithm out of algorithm space, it can recursively self-improve, and if it’s not aligned to humanity, it can definitely kill everybody.”

  • Yeah, I’m definitely a foomer, even though not a doomer. I’m a foomer bloomer.

  • Yes, a foomer. Right, a foomer bloomer. Not a doomer. Okay, that’s a good way to describe it. Yeah.

  • Okay, so you… and you’ve ridden with me, and I call that riding on the doom train. You’ve accepted that superintelligence is real, so yeah, that’s right. And then we start diverging as soon as we talk about alignment, because you’re like: “Yeah, you know, we’ll have time. The humans are like the apples that get to ride on the airplane. The airplane’s much faster than the apple.” It’s mixing metaphors, but it’s okay, because the plane will still take care of the apples, it’ll put the apples inside, and it’ll work out fine.

  • And then when I pointed out, “Well, aren’t the timelines really short?” And you seem to think, “Well, however short they are, it’ll just… it’ll be the right speed, you know, to get the alignment done.”

  • So that was… I think that was the meat of the discussion, is, you know, is that plausible? Is that the mainline scenario, that we’ll get our act together and have it aligned in time? And you’re like, “Yeah, we’ll just have enough time. It’s not gonna be a super emergency. We’ll make more time if we need to make more time.” So I guess you have a lot more optimism about the timeline clicking together. How’s that for a summary?

  • Well, I think if people do know that there is a more interpretable architecture and collectively coalesce around it, then we can peg the takeoff to the governance, and then our coordination goes basically at the same speed as, or even faster than, the capability. And then we use that to steer toward defense dominance, not offense dominance.

  • But I also grant that it is possible that even though we know there are better non-Freon alternatives, for some reason, maybe high polarization per minute, humanity collectively decides not to switch planes, and we’re stuck in a plane that does not open an apple-shaped window, in which case we’re certainly doomed. And the probability of both, I don’t actually know at this point, so that’s my honest epistemic stance.

  • All right, Audrey Tang, I gotta put a lot of respect on your name here, because you didn’t have to do this. You didn’t have to engage in debate. A lot of people just stick to going on podcasts where they just repeat their own perspective and don’t let it be challenged, and you definitely let it be challenged.

  • I was kinda nonstop challenging your perspective. That’s what we do here on Doom Debates, and you really took it like a champ. You didn’t dodge any questions. You just engaged in good faith. You didn’t give me canned answers.

  • So I just wanna appreciate, you know, when you come on and do this, it really adds support to one of the show’s missions of raising the quality of debate. You know, giving the gift of debate to make society function better, and thank you for doing that. I really appreciate it.

  • Yes, thank you, and Taiwan is not just chips and bubble tea; it’s where Earth’s plate tectonics make mountains rise. Pressure makes diamonds, so let’s free the future together.