Blog posts


Why I stopped working on AI safety

6 minute read


Here’s a description of a future that, as I understand it, Rationalists and Effective Altruists would generally endorse as an ideal (if not the ideal) outcome of humanity’s labors: no suffering, minimal pain or displeasure, and maximal ‘happiness’, preferably for an astronomical number of intelligent, sentient beings. (After all, we obviously want the best future experiences possible, for ourselves and for future beings.)


Mapping the Conceptual Territory in AI Existential Safety and Alignment

39 minute read


Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and of how its various questions and considerations interrelate. That way, when I read a new paper, a post on the Alignment Forum, or similar material, I have some idea of how it might contribute to the overall goal of making our deployment of AI technology go as well as possible for humanity. I’m writing this post to communicate what I’ve learned through this process, to help others build their own mental maps, and to point them to relevant resources for further, more detailed information. This post was largely inspired by (and would not have been possible without) two talks, one by Paul Christiano and one by Rohin Shah, that give very similar overviews of the field,[1] as well as by a few posts on the Alignment Forum that are discussed below. This post is not intended to replace these talks. Instead, it attempts to coherently integrate their ideas with ideas from other sources that clarify various aspects of the field. You should nonetheless watch these presentations and read some of the resources provided below if you’re trying to build your mental map as completely as possible.

  1. Rohin also did a two-part podcast with the Future of Life Institute discussing the contents of his presentation in more depth; both parts are worth listening to. 

Spinning Up in Deep RL: Getting Started

4 minute read


I’ll be spending the next month getting some hands-on experience with deep reinforcement learning via OpenAI’s Spinning Up in Deep RL. It includes both an overview of key concepts in deep reinforcement learning and a well-documented repository of implementations of key algorithms that are designed to be 1) “as simple as possible while still being reasonably good,” and 2) “highly-consistent with each other to expose fundamental similarities between algorithms.” I’ll be posting about this endeavor here to document the process and to share the lessons I learn along the way with others who are also looking to “spin up” in deep RL.

The Need for Better Terminology in Discussing Existential Risks from AI

24 minute read


Recently, I listened to a podcast[1] from the Future of Life Institute in which Andrew Critch (of the Center for Human-Compatible AI at Berkeley) discussed his and David Krueger’s recent paper, “AI Research Considerations for Human Existential Safety (ARCHES).”[2] Throughout the episode, I was impressed by the clarity and strength of many of the points Critch made. In particular, I’m thinking of how Critch distinguishes “existential safety” from “safety” more generally, “delegation” from “alignment,” and “prepotent AI” from “generally intelligent AI” or “superintelligent AI,” treating these as concepts that can give us more traction in analyzing the potential existential risks posed by artificial intelligence. So I decided it would be worthwhile to write this post about one of my key takeaways from the episode: the community working on AI-related existential risks needs to adopt better, more precise terminology.


Comparing Pre-trained Language Models with Semantic Parsing

17 minute read


In my last post, I showed how adding ELMo features to a seq2seq model improved performance on semantic parsing tasks. Recently, I have been experimenting with adding OpenAI GPT and BERT to the model in order to compare their performance against ELMo’s. All the data, configuration files, and scripts needed to reproduce my experiments have been pushed to the GitHub repository. I’m excited to share my results!


Applying Unsupervised Pretraining to Language Generation: Semantic Parsing + ELMo

6 minute read


For those who haven’t heard yet, NLP’s ImageNet moment has arrived: over the last year, approaches such as ULMFiT, ELMo, OpenAI GPT, and BERT have gained significant traction in the community by using unsupervised pretraining of language models to achieve substantial improvements over prior state-of-the-art results on a diverse set of language understanding tasks (including classification, commonsense reasoning, and coreference resolution, among others) and datasets. (For more on unsupervised pretraining and the motivations behind it, read the blog post about NLP’s ImageNet moment linked above.)

Rick and Morty & Metamodernism: Always “both-neither,” Never “either-or”

19 minute read


In examining any piece of science fiction, considering the context of the work, whether historical, cultural, philosophical, or otherwise, is of the utmost importance. “Literature & the Future” is missing a text that accurately reflects the context of today; that is, the course should include a text that represents the way our society and culture presently think about futurity. The TV show Rick and Morty, specifically the episode “Rixty Minutes,” is the best candidate for a text of this nature. Humanity is now living in “the future” that the thinkers discussed in class speculated about, so it is worth considering what the concept of futurity means in an age when humans are simultaneously more connected and more isolated than ever before. In essence, “Rixty Minutes” should be added to the class syllabus as a “missing text” because it self-reflexively offers a metamodern, integrative worldview as a solution to the present crisis of human existence in the age of technology.

Reproducing a SOTA Commonsense Reasoning Result with OpenAI’s Pretrained Transformer Language Model

5 minute read


I wanted to write this blog post to share some interesting code I’ve been working on recently. Earlier this year, OpenAI achieved SOTA results on a diverse set of NLP tasks and datasets using unsupervised pretraining, nearly the same approach that ULMFiT used to achieve SOTA on several text classification datasets. However, OpenAI used the new Transformer architecture instead of ULMFiT’s AWD-LSTM, and trained on a billion-token corpus instead of ULMFiT’s WikiText-103.

The Need for ML Safety Researchers

3 minute read


I recently came across this article in the New York Times, entitled “Mark Zuckerberg, Elon Musk and the Feud Over Killer Robots.” I found it quite thought-provoking, even though mainstream media accounts of these topics and debates always leave much to be desired (note: if you mention The Terminator, The Matrix, and 2001: A Space Odyssey in a discussion about AGI and superintelligence, you’ve already lost me).


less than 1 minute read


Welcome to my blog! I’ll be writing here about my various academic interests, including machine learning, deep learning, natural language processing, and AI alignment. I hope you enjoy it!