I recently came across this article in the New York Times, entitled “Mark Zuckerberg, Elon Musk and the Feud Over Killer Robots.” I found it quite thought-provoking, even though the mainstream media’s accounts of these topics and debates always leave much to be desired (note: if you mention The Terminator, The Matrix, and 2001: A Space Odyssey in a discussion about AGI and superintelligence, you’ve already lost me).
Here’s a description of a future which, as I understand it, Rationalists and Effective Altruists would broadly endorse as an (if not the) ideal outcome of the labors of humanity: no suffering, minimal pain/displeasure, and maximal ‘happiness’ (preferably for an astronomical number of intelligent, sentient minds/beings). (Because we obviously want the best future experiences possible, for ourselves and for future beings.)
Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and how its various questions and considerations interrelate, so that when I read a new paper, a post on the Alignment Forum, or similar material, I have some idea of how it might contribute to the overall goal of making our deployment of AI technology go as well as possible for humanity. I’m writing this post to communicate what I’ve learned through this process, in order to help others build their own mental maps and to provide them with links to relevant resources for further, more detailed information. This post was largely inspired by (and would not be possible without) two talks by Paul Christiano and Rohin Shah, respectively, that give very similar overviews of the field,1 as well as a few posts on the Alignment Forum that will be discussed below. This post is not intended to replace those talks but is instead an attempt to coherently integrate their ideas with ideas from other sources attempting to clarify various aspects of the field. You should nonetheless watch both presentations and read some of the resources provided below if you’re trying to build your mental map as completely as possible.
Rohin also did a two-part podcast with the Future of Life Institute discussing the contents of his presentation in more depth, both parts of which are worth listening to. ↩
Recently, I listened to a podcast1 from the Future of Life Institute in which Andrew Critch (from the Center for Human Compatible AI at Berkeley) discussed his and David Krueger’s recent paper, “AI Research Considerations for Human Existential Safety (ARCHES)”2. Throughout the episode, I found myself impressed by the clarity and the strength of many of the points Critch made. In particular, I’m thinking about how Critch distinguishes “existential safety” from “safety” more generally, “delegation” from “alignment,” and “prepotent AI” from “generally intelligent AI” or “superintelligent AI” as concepts that can help give us more traction in analyzing the potential existential risks posed by artificial intelligences. So, I decided it would be worthwhile to write this post on one of my key takeaways from the episode: the community working on AI-related existential risks needs to adopt better, more precise terminology.
Welcome to my blog! I’ll be writing about my various academic interests here, including machine learning, deep learning, natural language processing, and AI alignment. I hope you enjoy!
In my last post, I showed how adding ELMo features to a seq2seq model improved performance on semantic parsing tasks. Recently, I have been experimenting with adding OpenAI GPT and BERT to the model in order to compare their performance against ELMo’s. All the data, configuration files, and scripts needed to reproduce my experiments have been pushed to the GitHub repository. I’m excited to share my results!
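To make the feature-based setup concrete, here is a minimal sketch of how contextual features like ELMo’s can be concatenated with ordinary word embeddings before they reach a seq2seq encoder. It assumes an AllenNLP-style Elmo module and uses placeholder file paths rather than the real released weights; the actual experiment code lives in the repository mentioned above.

```python
# A minimal sketch, not the actual experiment code: concatenating ELMo features
# with ordinary word embeddings before the seq2seq encoder. The file paths are
# placeholders for the publicly released ELMo options/weights.
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "elmo_options.json"  # placeholder path
weight_file = "elmo_weights.hdf5"   # placeholder path

# One learned mixture of ELMo's layers, to be concatenated with word embeddings.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.5)

sentences = [["show", "me", "flights", "from", "denver", "to", "boston"]]
character_ids = batch_to_ids(sentences)                         # (batch, seq_len, 50)
elmo_features = elmo(character_ids)["elmo_representations"][0]  # (batch, seq_len, elmo_dim)

word_embeddings = torch.randn(1, 7, 300)  # stand-in for the model's learned embeddings
encoder_inputs = torch.cat([word_embeddings, elmo_features], dim=-1)
# encoder_inputs now feeds the seq2seq encoder in place of the plain embeddings.
```

Swapping in GPT or BERT features follows the same pattern: extract the contextualizer’s outputs and concatenate (or substitute) them at the encoder’s input.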
For those who haven’t heard, NLP’s ImageNet moment has arrived: approaches such as ULMFiT, ELMo, OpenAI GPT, and BERT have gained significant traction in the community over the last year by using unsupervised pretraining of language models to achieve significant improvements over prior state-of-the-art results on a diverse set of language understanding tasks (including classification, commonsense reasoning, and coreference resolution, among others) and datasets. (For more on unsupervised pretraining and the motivations behind it, read the blog post on NLP’s ImageNet moment linked above.)
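To make the pretrain-then-fine-tune recipe concrete, here is a minimal sketch using the Hugging Face transformers library purely as an illustration (it is not what these papers originally shipped with): a BERT encoder with pretrained weights gets a fresh classification head, and both are then trained on the downstream labels.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Weights learned during unsupervised pretraining, plus a randomly initialized
# classification head for the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A single labeled example; in practice this loop runs over the task's dataset.
batch = tokenizer("The movie was surprisingly good.", return_tensors="pt")
labels = torch.tensor([1])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # fine-tunes the pretrained encoder and the new head together
optimizer.step()
```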
I wanted to write this blog post to share a bit of interesting code I’ve been working on recently. Earlier this year, OpenAI achieved SOTA results on a diverse set of NLP tasks and datasets using unsupervised pretraining, an approach nearly identical to the one ULMFiT used to achieve SOTA on several text classification datasets. However, OpenAI used the new Transformer architecture instead of the AWD-LSTM used by ULMFiT, and trained on a billion-token corpus instead of ULMFiT’s WikiText-103.
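As a rough illustration of what that pretrained Transformer provides out of the box, here is a sketch (using the Hugging Face port of the original OpenAI GPT rather than OpenAI’s own code) that loads the released weights and computes the language-modeling loss on a sentence:

```python
import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

# Hugging Face's port of the original OpenAI GPT language model.
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
model.eval()

text = "unsupervised pretraining has changed natural language processing"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the inputs as labels makes the model return its LM loss,
    # i.e. how well the pretrained network predicts each next token.
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```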
I’ll be spending the next month getting some hands-on experience with deep reinforcement learning via OpenAI’s Spinning Up in Deep RL, which includes both an overview of key concepts in deep reinforcement learning and a well-documented repository of implementations of key algorithms that are designed to be 1) “as simple as possible while still being reasonably good,” and 2) “highly-consistent with each other to expose fundamental similarities between algorithms.” I’ll be posting here about this endeavor in order to document the process and share the lessons I learn along the way for those who are also looking to “spin up” in deep RL.
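As a taste of how little code Spinning Up asks for, here is a launch script adapted from the pattern in its documentation (assuming the PyTorch implementations are installed; function names and defaults may differ slightly between versions and between the TensorFlow and PyTorch variants):

```python
import gym
import torch
from spinup import ppo_pytorch as ppo

# Train PPO on a classic Gym control task, following the usage pattern
# shown in the Spinning Up docs.
env_fn = lambda: gym.make("LunarLander-v2")
ac_kwargs = dict(hidden_sizes=[64, 64], activation=torch.nn.ReLU)
logger_kwargs = dict(output_dir="out/ppo_lunarlander", exp_name="ppo_lunarlander")

ppo(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=4000, epochs=50,
    logger_kwargs=logger_kwargs)
```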
In examining any piece of science fiction, considering the context of the work, whether historical, cultural, or philosophical, is of the utmost importance. “Literature & the Future” is missing a text that accurately reflects the context of today; that is, the syllabus should include a text representative of the way our society and culture presently thinks about futurity. The TV show Rick and Morty, specifically the episode “Rixty Minutes,” is the best candidate for a text of this nature. Humanity is now living in “the future” that the thinkers discussed in class once speculated about, so it is worth considering what futurity means in an age where humans are simultaneously more connected and more isolated than ever before. In essence, “Rixty Minutes” should be included as a “missing text” on the class syllabus because it self-reflexively offers a metamodern, integrative worldview as a solution to the crisis of human existence as it presently exists in the age of technology.