Why I stopped working on AI safety

6 minute read

Published: May 02, 2024

Here’s a description of a future which I understand Rationalists and Effective Altruists in general would endorse as an (if not the) ideal outcome of the labors of humanity: no suffering, minimal pain/displeasure, maximal ‘happiness’ (preferably for an astronomical number of intelligent, sentient minds/beings). (Because we obviously want the best future experiences possible, for ourselves and future beings.)

Here’s a thought experiment. If you (anyone - everyone, really) could definitely stop suffering now (if not this second then reasonably soon, say within ~5-10 years) by some means, is there any valid reason for not doing so and continuing to suffer? Is there any reason for continuing to do anything else other than stop suffering (besides providing for food and shelter to that end)?

Now, what if you were to learn there really is a way to accomplish this, with method(s) developed over the course of thousands of years and human lifetimes, the fruits of which have been verified in the experiences of thousands of humans, each of whom attained a total and permanent cessation of their own suffering?

Knowing this, what possible reason could you give to justify continuing to suffer, for yourself, for your communities, for humanity?

Why/how this preempts the priority of AI work on the present EA agenda

I can only imagine one kind of possible world in which it makes more sense to work on AI safety now and then stop suffering thereafter. The sooner TAI is likely to arrive and the more likely it is that its arrival will be catastrophic without further intervention and (crucially) the more likely it is that the safety problem actually will be solved with further effort, the more reasonable it becomes to make AI safe first and then stop suffering.

To see this, consider a world in which TAI will arrive in 10 years, it will certainly result in human extinction if and only if we fail to do X, and it is certainly possible (even easy) to accomplish X in the next 10 years. Presuming that living without suffering is clearly preferable to not suffering by not living, it is not prima facie irrational to spend the next 10 years ensuring humanity’s continued survival and then stop suffering.

On the other hand, the more likely it is that either 1) we cannot or will not solve the safety problem in time or 2) the safety problem will be solved without further effort/intervention (possibly by never having been much of a problem to begin with), the more it makes sense to prioritize not suffering now, regardless of the outcome. Now, it’s not that I think 2) is particularly likely, so it more or less comes down to how tractable you believe the problem is and how likely your (individual or collective) efforts are to move the needle further in the right direction on safe AI.
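The trade-off described above can be made concrete with a toy expected-value comparison. This is only a sketch under assumptions of my own choosing: the function names, the parameters, and the idea of measuring both sides in "suffering-free years" are illustrative and not part of the argument itself. What it shows is that cases 1) and 2) both shrink the same quantity, namely the change in the chance of a safe outcome that further effort actually buys.

```python
# Toy model; every name and number here is a hypothetical illustration.

def marginal_risk_reduction(p_safe_with_effort: float,
                            p_safe_without_effort: float) -> float:
    """Change in the probability of a safe outcome attributable to further effort."""
    return p_safe_with_effort - p_safe_without_effort


def safety_first_is_favored(p_safe_with_effort: float,
                            p_safe_without_effort: float,
                            value_of_safe_future: float,
                            years_deferred: float,
                            suffering_per_year: float = 1.0) -> bool:
    """Return True if the expected gain from safety work exceeds the
    expected cost of continuing to suffer while doing it.

    value_of_safe_future and suffering_per_year are assumed to share one
    unit (here, suffering-free years), which is a simplification.
    """
    expected_gain = (marginal_risk_reduction(p_safe_with_effort,
                                             p_safe_without_effort)
                     * value_of_safe_future)
    expected_cost = years_deferred * suffering_per_year
    return expected_gain > expected_cost


# The world sketched earlier: X is certainly achievable and certainly needed.
certain_world = safety_first_is_favored(1.0, 0.0, 50.0, 10.0)

# Case 1): the problem is close to intractable, so effort changes little.
intractable_world = safety_first_is_favored(0.01, 0.0, 50.0, 10.0)

# Case 2): the problem is solved almost regardless of further effort.
solved_anyway_world = safety_first_is_favored(0.99, 0.98, 50.0, 10.0)

print(certain_world, intractable_world, solved_anyway_world)
```

With these made-up numbers, only the first world favors doing safety work first. The inputs are where the real disagreement lives, so the sketch settles nothing by itself; it only makes explicit which beliefs the conclusion depends on.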

These considerations have led me to believe the following:

CLAIM. It is possible, if not likely, that the way to eliminate the most future suffering in expectation is to stop suffering and then help others do the same, directly, now—not by trying to move the needle on beneficial/safe AI.

In summary, given your preference, ceteris paribus, to not suffer, the only valid reason I can imagine for not immediately working directly towards the end of your own suffering and instead focusing on AI safety is a belief that you will gain more (in terms of not suffering) after the arrival of TAI upon which you intervened than you will lose in the meantime by suffering until its arrival, in expectation. This is even presuming a strict either/or choice for the purpose of illustration; why couldn’t you work on not suffering while continuing to work towards safe AI as your “day job”? Personally, the years I spent working on AI safety were the same years in which I was working towards the end of my suffering, and I don’t see why others couldn’t do the same.

So, why are you still suffering? Is it because you believe that your efforts to solve AI safety will meaningfully result in the avoidance of catastrophe in expectation, and that you could not make said efforts without temporarily deprioritizing the end of your own suffering?

The Community’s Blindspot/Why I Quit AI Safety Work

I can only imagine three possible explanations for why this argument hasn’t already been considered and discussed by the EA/Rationalist community:

1) Nobody in the community believes that not suffering in this lifetime is possible. Well, it is. I know so in my own experience. If you do not know so yet, perhaps historical examples such as Gotama Buddha might convince you that human experience without suffering is entirely possible.
2) Individuals in the community, knowing the end of suffering is a genuine possibility, have considered this possibility for themselves and concluded that their efforts are still better directed at making safety progress, presumably for the reason outlined above: that putting off working towards the end of their suffering by working on AI safety is meaningfully changing the likelihood of resulting outcomes from catastrophic to reliably safe. (This seems unlikely to me, as I’ve never seen such a discussion amongst members of the community. It’s not that I think no individuals hold such beliefs; I just think it likely that this has not been explicitly considered by many individuals or discussed amongst the community.)
3) Otherwise, people in this community believe that not suffering in this lifetime is possible and that the adoption of AI is likely to go either poorly or well, regardless of their efforts, and yet continue to suffer. They are therefore acting incoherently according to their own preference(s). Preferring to not suffer, that being a genuine and attainable outcome, and continuing to suffer is incoherent.

As noted, I think 1) and 3) are the most likely explanations. In the case of 1), individuals and therefore the community remain ignorant of a legitimately possible realization of their own preferences, while in the case of 3), Rationalists are acting inconsistently with their own preferences. Either way, there is a problem.

To be clear, my intention in writing this post is not to say, “if you’re thinking about and working on AI safety, you should stop,” and I’m certainly not saying that AI safety isn’t a serious concern. As mentioned above, I think you may be able to work towards the end of suffering and make efforts on the safety front. I wanted to write this post in order to illuminate some of my own reasons for stepping away from AI safety and in doing so hopefully raise awareness about the end of suffering and spark discussions amongst members of this community. The end of human suffering should be a cause area we care about and devote resources towards, even if we’re in a world in which the single most important cause remains ensuring beneficial outcomes with respect to the proliferation of AI, for now. At the very least, if ending suffering in this lifetime is indeed possible, as I claim it to be, there needs to be a strong justification for deprioritizing that in favor of longer-term concerns like AI safety, and I do not presently observe the community prioritizing directly ending human suffering to begin with.

May all beings be free of suffering.


Suffering Is Not Pain

9 minute read

Published: June 18, 2024

“Pain is inevitable; suffering is optional.”

Mapping the Conceptual Territory in AI Existential Safety and Alignment

39 minute read

Published: December 17, 2020

Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and how its various questions and considerations interrelate, so that when I read a new paper, a post on the Alignment Forum, or similar material, I have some idea of how it might contribute to the overall goal of making our deployment of AI technology go as well as possible for humanity. I’m writing this post to communicate what I’ve learned through this process, in order to help others trying to build their own mental maps and provide them with links to relevant resources for further, more detailed information. This post was largely inspired by (and would not be possible without) two talks by Paul Christiano and Rohin Shah, respectively, that give very similar overviews of the field,¹ as well as a few posts on the Alignment Forum that will be discussed below. This post is not intended to replace these talks but is instead an attempt to coherently integrate their ideas with ideas from other sources attempting to clarify various aspects of the field. You should nonetheless watch these presentations and read some of the resources provided below if you’re trying to build your mental map as completely as possible.

¹ Rohin also did a two-part podcast with the Future of Life Institute discussing the contents of his presentation in more depth; both parts are worth listening to.

Spinning Up in Deep RL: Getting Started

4 minute read

Published: November 29, 2020

I’ll be spending the next month getting some hands-on experience with deep reinforcement learning via OpenAI’s Spinning Up in Deep RL, which includes both an overview of key concepts in deep reinforcement learning and a well-documented repository of implementations of key algorithms that are designed to be 1) “as simple as possible while still being reasonably good,” and 2) “highly-consistent with each other to expose fundamental similarities between algorithms.” I’ll be posting here about this endeavor in order to document the process and share the lessons I learn along the way for those who are also looking to “spin up” in deep RL.

The Need for Better Terminology in Discussing Existential Risks from AI

24 minute read

Published: October 08, 2020

Recently, I listened to a podcast¹ from the Future of Life Institute in which Andrew Critch (from the Center for Human Compatible AI at Berkeley) discussed his and David Krueger’s recent paper, “AI Research Considerations for Human Existential Safety (ARCHES)”². Throughout the episode, I found myself impressed by the clarity and the strength of many of the points Critch made. In particular, I’m thinking about how Critch distinguishes “existential safety” from “safety” more generally, “delegation” from “alignment,” and “prepotent AI” from “generally intelligent AI” or “superintelligent AI” as concepts that can help give us more traction in analyzing the potential existential risks posed by artificial intelligences. So, I decided it would be worthwhile to write this post on one of my key takeaways from the episode: the community working on AI-related existential risks needs to adopt better, more precise terminology.

¹ Future of Life Institute, Andrew Critch on AI Research Considerations for Human Existential Safety
² Andrew Critch and David Krueger, AI Research Considerations for Human Existential Safety (ARCHES)

Jack Koch