Spinning Up in Deep RL: Getting Started

4 minute read

Published: November 29, 2020

I’ll be spending the next month getting hands-on experience with deep reinforcement learning via OpenAI’s Spinning Up in Deep RL. It includes both an overview of key concepts in deep reinforcement learning and a well-documented repository of implementations of key algorithms, which are designed to be 1) “as simple as possible while still being reasonably good,” and 2) “highly-consistent with each other to expose fundamental similarities between algorithms.” I’ll be posting here to document the process and share the lessons I learn along the way for those who are also looking to “spin up” in deep RL.

Motivation

There are several reasons I am pursuing this project. Most importantly, I am pursuing careers where I hope to contribute to reducing potential existential risks from AI as well as ensuring that humanity deploys systems that are robustly beneficial for everyone. Reinforcement learning is a topic that comes up very frequently in discussions within this context, especially in the research paradigms of places like OpenAI, DeepMind, and CHAI, since it seems possible that deep RL could scale to yield prosaic AGI and thus studying alignment in the context of current RL systems is one way to make tractable progress on these problems now. So it seems desirable to have a solid grounding in RL in order to not only better understand ongoing work in the field but also be able to begin empirical research into potential alignment schemes within current ML paradigms. While I have a background in machine learning and have touched upon the concept of reinforcement learning at various points in my education, I have not worked with modern RL algorithms in a practical context, and I believe that doing so will help me master the content better than merely studying the concepts and reading papers, for example.

Additionally, this project will be my first extended ML coding project in some time, and I hope it will help me get back into the swing of things, as it were, with the ML development process. My past projects in NLP affirmed the stereotypical difficulty of debugging machine learning programs, so it will be good to exercise that mental “frustration tolerance” muscle again in anticipation of doing similar work at a job or for a PhD.

(Rough) Plan

I spent good chunks of the last two days or so getting set up for this project (more details below) and going through Spinning Up’s “Introduction to RL” material, which was a helpful review of the important underlying concepts. With that done, I intend to spend some time studying how these concepts translate into code, probably starting with the Spinning Up implementation of the Vanilla Policy Gradient algorithm. Then, I want to re-implement one or two other algorithms from scratch, since being able to do so would be a good check on my understanding of the concepts. Depending on how that goes, I might try to re-implement an inverse reinforcement learning algorithm, but, as always with machine learning, I anticipate that everything will take substantially longer than I first expect. I also expect to get a better sense of what, more specifically, I want to accomplish with this project as I delve into it.

Getting Set Up

One of the things I definitely did not miss about ML development was the difficulty and frustration that often accompanies simply setting up environments and installing packages for a project. There were several challenges I encountered in getting set up with Spinning Up, which I document here in case others encounter similar stumbling blocks.

General Set Up

I set up a VM on GCP (through their Notebooks platform, in case I wanted to easily be able to use JupyterLab at some point). Here are the details for my instance:

GCP Instance Details

Installing Spinning Up

I was mostly able to follow the documented installation guide to create a conda environment and install OpenMPI and the Spinning Up repo. I was able to run PPO in the LunarLander-v2 environment successfully, but ran into problems watching a video of the trained policy and plotting the results of the run.
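
For readers who want to script the same install check, here is a minimal sketch that builds the three `python -m spinup.run` invocations involved (train, watch the policy, plot). The experiment name `installtest` and the `data/<exp_name>/<exp_name>_s0` output path follow the example in the Spinning Up documentation and are assumptions here; the exact flags I used for my own run are not recorded in this post. The helper `spinup_command` is a hypothetical name for illustration, not part of Spinning Up.

```python
import shlex


def spinup_command(utility, *args, **flags):
    """Build an argv list for `python -m spinup.run <utility> ...`.

    Hypothetical helper for illustration; it only assembles the command
    and does not execute anything.
    """
    cmd = ["python", "-m", "spinup.run", utility, *args]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Assumed experiment name and default output directory (seed 0).
exp_name = "installtest"
run_dir = f"data/{exp_name}/{exp_name}_s0"

train = spinup_command("ppo", env="LunarLander-v2", exp_name=exp_name)
watch = spinup_command("test_policy", run_dir)  # needs a working display
plot = spinup_command("plot", run_dir)          # needs a working display

for cmd in (train, watch, plot):
    print(shlex.join(cmd))
```

To actually run a step, pass the list to `subprocess.run(cmd, check=True)` inside the conda environment where Spinning Up is installed. On a headless VM, expect only the training step to succeed until graphics are sorted out.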

Graphics

I was able to get the plot of results via X11 forwarding by adding --ssh-flag="-Y" to my gcloud compute ssh command, but was still running into errors trying to view the video of the agent in the environment. I explored various solutions to remotely rendering video output from a virtual instance and played around with Xvfb and vncviewer but could not get them to work. Eventually, I stumbled across instructions for setting up Chrome Remote Desktop for Linux on Compute Engine VM instances, which did the trick. I’ll continue to use ssh for my normal development workflow for this project, but whenever I need to view video samples from trained policies, I can simply run the relevant command through a remote desktop interface:

Chrome Remote Desktop
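
The X11 forwarding step described above can be sketched as follows. The instance name `my-rl-vm` and the zone `us-west1-b` are hypothetical placeholders, and both helper functions are illustrative names rather than anything provided by gcloud or Spinning Up. The second helper is a rough check that a display is available before trying to plot; it only looks at the `DISPLAY` variable and does not guarantee that rendering will work.

```python
import os
import shlex


def gcloud_ssh_command(instance, zone=None, x11=True):
    """Build a `gcloud compute ssh` argv list, optionally with X11 forwarding."""
    cmd = ["gcloud", "compute", "ssh", instance]
    if zone:
        cmd += ["--zone", zone]
    if x11:
        # Same effect as typing --ssh-flag="-Y" in a shell.
        cmd.append("--ssh-flag=-Y")
    return cmd


def has_display(env):
    """Return True if the given environment mapping has a non-empty DISPLAY."""
    return bool(env.get("DISPLAY"))


# Hypothetical instance name and zone; substitute your own.
print(shlex.join(gcloud_ssh_command("my-rl-vm", zone="us-west1-b")))

# On the VM, a quick check before running the plot utility:
if not has_display(os.environ):
    print("No DISPLAY set; plotting over ssh will likely fail.")
```

In my case this was enough for the plots but not for the video, which is why I fell back to a remote desktop session for rendering.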

MuJoCo

I decided to go ahead and get a free 30-day license for MuJoCo to have full access to the set of possible environments, since my timeline for this project is about a month, anyway. However, it turns out that Spinning Up does not support the latest version (2.0), so I had to install version 1.5 (which can be found on their website). After sorting out a few more installation errors regarding dependencies I didn’t already have installed, I was able to successfully run the test command for PPO in the Walker2d-v2 environment.
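
Before chasing dependency errors, it can help to confirm that the MuJoCo files are where the Python bindings will look for them. The sketch below assumes the default layout used by the mujoco-py 1.50 bindings, namely the binaries unpacked to `~/.mujoco/mjpro150` and the license key at `~/.mujoco/mjkey.txt`. That layout is an assumption about your install, and `missing_mujoco_files` is a hypothetical helper name.

```python
from pathlib import Path


def missing_mujoco_files(home):
    """Return the expected MuJoCo 1.50 paths that do not exist under `home`.

    Assumes the default mujoco-py 1.50 layout:
      <home>/.mujoco/mjpro150   (unpacked binaries)
      <home>/.mujoco/mjkey.txt  (license key)
    """
    root = Path(home) / ".mujoco"
    expected = [root / "mjpro150", root / "mjkey.txt"]
    return [path for path in expected if not path.exists()]


missing = missing_mujoco_files(Path.home())
if missing:
    print("Missing MuJoCo files:")
    for path in missing:
        print(f"  {path}")
else:
    print("MuJoCo 1.50 binaries and license key found.")
```

If both paths are present and the install still fails, the remaining errors are more likely missing system libraries than a misplaced license or binary.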

Conclusion

I hope this post might be helpful to those running into any of the same issues getting set up with the Spinning Up in Deep RL package. I will continue to document my progress on this project over the next several weeks here on my blog.


Mapping the Conceptual Territory in AI Existential Safety and Alignment

40 minute read

Published: December 17, 2020

Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and how its various questions and considerations interrelate, so that when I read a new paper, a post on the Alignment Forum, or similar material, I have some idea of how it might contribute to the overall goal of making our deployment of AI technology go as well as possible for humanity. I’m writing this post to communicate what I’ve learned through this process, in order to help others trying to build their own mental maps and provide them with links to relevant resources for further, more detailed information. This post was largely inspired by (and would not be possible without) two talks by Paul Christiano and Rohin Shah, respectively, that give very similar overviews of the field,¹ as well as a few posts on the Alignment Forum that will be discussed below. This post is not intended to replace these talks but is instead an attempt to coherently integrate their ideas with ideas from other sources attempting to clarify various aspects of the field. You should nonetheless watch these presentations and read some of the resources provided below if you’re trying to build your mental map as completely as possible.

¹ Rohin also did a two-part podcast with the Future of Life Institute discussing the contents of his presentation in more depth; both parts are worth listening to.

The Need for Better Terminology in Discussing Existential Risks from AI

24 minute read

Published: October 08, 2020

Recently, I listened to a podcast¹ from the Future of Life Institute in which Andrew Critch (from the Center for Human Compatible AI at Berkeley) discussed his and David Krueger’s recent paper, “AI Research Considerations for Human Existential Safety (ARCHES)”². Throughout the episode, I found myself impressed by the clarity and the strength of many of the points Critch made. In particular, I’m thinking about how Critch distinguishes “existential safety” from “safety” more generally, “delegation” from “alignment,” and “prepotent AI” from “generally intelligent AI” or “superintelligent AI” as concepts that can help give us more traction in analyzing the potential existential risks posed by artificial intelligences. So, I decided it would be worthwhile to write this post on one of my key takeaways from the episode: the community working on AI-related existential risks needs to adopt better, more precise terminology.

¹ Future of Life Institute, Andrew Critch on AI Research Considerations for Human Existential Safety
² Andrew Critch and David Krueger, AI Research Considerations for Human Existential Safety (ARCHES)

Comparing Pre-trained Language Models with Semantic Parsing

17 minute read

Published: January 09, 2019

In my last post, I showed how adding ELMo features to a seq2seq model improved performance on semantic parsing tasks. Recently, I have been experimenting with adding OpenAI GPT and BERT to the model in order to compare their performance against ELMo’s. All the data, configuration files, and scripts needed to reproduce my experiments have been pushed to the GitHub repository. I’m excited to share my results!

Applying Unsupervised Pretraining to Language Generation: Semantic Parsing + ELMo

6 minute read

Published: December 24, 2018

For those who haven’t heard it yet, NLP’s ImageNet moment has arrived; approaches such as ULMFiT, ELMo, OpenAI GPT, and BERT have gained significant traction in the community in the last year by using the unsupervised pretraining of language models to achieve significant improvements over prior state-of-the-art results on a diverse set of language understanding tasks (including classification, commonsense reasoning, and coreference resolution, among others) and datasets. (For more on unsupervised pretraining and the motivations behind it, read the blog post about NLP’s ImageNet moment I have linked above.)

Jack Koch