Faulty Reward Functions in the Wild

Reinforcement learning algorithms can break in surprising, counterintuitive ways. In this post we'll explore one failure mode: a misspecified reward function. At OpenAI, we've recently started using Universe, our software for measuring and training AI agents, … | Continue reading


@blog.openai.com | 5 years ago
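The failure mode above can be illustrated with a toy sketch (this is an illustrative example, not OpenAI's actual environment or code): the intended goal is finishing a race, but the reward is a proxy for it — hitting respawning score targets — so a policy that loops forever out-scores a policy that actually finishes.

```python
# Toy reward misspecification: the proxy reward (targets hit) diverges from
# the intended goal (finishing). All numbers here are made up for illustration.

def proxy_reward(policy, steps=100):
    """Reward = +10 per target hit; targets respawn, so looping pays forever."""
    total = 0
    for t in range(steps):
        if policy == "loop":       # circle a cluster of respawning targets
            total += 10 if t % 3 == 0 else 0
        elif policy == "finish":   # head straight for the finish line
            total += 50 if t == 20 else 0  # one-time bonus for finishing
    return total

# An optimizer maximizing this proxy prefers the degenerate looping policy.
assert proxy_reward("loop") > proxy_reward("finish")
```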

How AI Training Scales

We've discovered that the gradient noise scale, a simple statistical metric, predicts the parallelizability of neural network training. | Continue reading


@blog.openai.com | 5 years ago
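A rough sketch of the metric the teaser describes: the "simple" gradient noise scale compares the per-example gradient variance to the squared mean gradient, B ≈ tr(Σ) / |g|². The toy model below (linear regression) and all variable names are illustrative assumptions, not taken from the post.

```python
import numpy as np

# Estimate a simple gradient noise scale on a toy linear-regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=512)
w = np.zeros(8)

# Per-example gradients of squared error: grad_i = 2 * (x_i . w - y_i) * x_i
residual = X @ w - y
per_example_grads = 2 * residual[:, None] * X      # shape (512, 8)

g = per_example_grads.mean(axis=0)                 # mean (full-batch) gradient
sigma_trace = per_example_grads.var(axis=0).sum()  # tr(Sigma): per-example variance
noise_scale = sigma_trace / (g @ g)                # larger => bigger useful batches

print(f"estimated noise scale: {noise_scale:.2f}")
```

Intuitively, when per-example gradients mostly agree (small tr(Σ) relative to |g|²), small batches already give an accurate gradient and extra parallelism helps little; when they are noisy, larger batches pay off.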

Quantifying Generalization in Reinforcement Learning

We’re releasing a new training environment, CoinRun, that precisely quantifies an agent’s ability to transfer its experience to novel test environments. | Continue reading


@blog.openai.com | 5 years ago

Spinning Up in Deep RL

We’re releasing Spinning Up in Deep RL, an educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning. Spinning Up consists of crystal-clear examples of RL code, educational exercises, documentation, and tutorials. Take your … | Continue reading


@blog.openai.com | 5 years ago

Learning Concepts with Energy Functions

We’ve built an energy-based model that can quickly recognize, generate, and transfer simple concepts (such as near, above, between, closest, and furthest) represented by sequences or sets of 2D points. | Continue reading


@blog.openai.com | 5 years ago

Learning Complex Goals with Iterated Amplification

We’re proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler sub-tasks, rather than by providing labeled data or a reward function. Alth … | Continue reading


@blog.openai.com | 5 years ago

OpenAI Scholars Winter 2019 – Application Open

We are now accepting applications for our second cohort of OpenAI Scholars, a program where we provide 6-10 stipends and mentorship to individuals from underrepresented groups to study deep learning full-time for 3 months and open-source a project. The first cohort of Scholars re … | Continue reading


@blog.openai.com | 5 years ago

OpenAI @ The International 2018: Results

OpenAI Five lost two games against teams of top Dota 2 professionals at The International in Vancouver this week, maintaining a good chance of winning for the first 35 minutes of the first game and the first 20 minutes of the second. In contrast to our … | Continue reading


@blog.openai.com | 5 years ago

OpenAI Five Benchmark: Results

Yesterday, OpenAI Five won a best-of-three against a team of 99.95th percentile Dota players: Blitz, Cap, Fogged, Merlini, and MoonMeander — four of whom have played Dota professionally — in front of a live audience and 100,000 concurrent livestream viewers. The human team won ga … | Continue reading


@blog.openai.com | 5 years ago

Learning Dexterity

We've trained a human-like robot hand to manipulate physical objects with unprecedented dexterity. | Continue reading


@blog.openai.com | 5 years ago

OpenAI Five Benchmark

We’ve removed the most significant restrictions on OpenAI Five’s gameplay (namely wards, Roshan, and the mirror match of fixed heroes) and will soon benchmark our progress by playing top 99.95th-percentile Dota players. | Continue reading


@blog.openai.com | 5 years ago

Synthesizing realistic high-resolution images with Glow

We introduce Glow, a reversible generative model which uses invertible 1x1 convolutions. It extends previous work on reversible generative models and simplifies the architecture. Our model can generate realistic high resolution images, supports efficient sampling, and discovers f … | Continue reading


@blog.openai.com | 5 years ago
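The core trick named in the teaser — an invertible 1x1 convolution — can be sketched in a few lines (this is an illustrative NumPy sketch of the idea, not the Glow implementation): every pixel's channel vector is multiplied by one shared invertible matrix W, the inverse pass multiplies by W⁻¹, and the flow's log-likelihood picks up a term H·W·log|det W|.

```python
import numpy as np

# Invertible 1x1 convolution sketch: channel mixing by a shared matrix W.
rng = np.random.default_rng(0)
C, H, W_sp = 4, 8, 8                       # channels, spatial height/width
W = rng.normal(size=(C, C))                # random square matrix: invertible a.s.

x = rng.normal(size=(H, W_sp, C))          # a toy "image" activation
z = x @ W.T                                # forward: mix channels at every pixel
x_rec = z @ np.linalg.inv(W).T             # inverse pass recovers x exactly

assert np.allclose(x, x_rec)
logdet = H * W_sp * np.log(abs(np.linalg.det(W)))  # log-det term per image
```

The appeal is that both directions are a single matrix multiply per pixel, and the Jacobian log-determinant is cheap to track, which is what makes exact likelihood training and efficient sampling possible.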

Infrastructure for Deep Learning

Deep learning is an empirical science, and the quality of a group's infrastructure is a multiplier on progress. Fortunately, today's open-source ecosystem makes it possible for anyone to build great deep learning infrastructure. In this post, we'll share how deep learning researc … | Continue reading


@blog.openai.com | 5 years ago

OpenAI Five

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2. | Continue reading


@blog.openai.com | 5 years ago

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

We've discovered that evolution strategies (ES), an optimization technique that's been known for decades, rivals the performance of standard reinforcement learning (RL) techniques on modern RL benchmarks (e.g. Atari/MuJoCo), while overcoming many of RL's inconveniences. In partic … | Continue reading


@blog.openai.com | 5 years ago
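A minimal sketch of the ES idea described above (illustrative only, not OpenAI's distributed implementation): estimate a gradient of expected reward from rewards of randomly perturbed parameters, with no backpropagation through the objective. The toy objective and hyperparameters below are assumptions chosen for the demo.

```python
import numpy as np

def reward(theta):
    """Toy black-box objective, maximized at theta = 3 in every dimension."""
    return -np.sum((theta - 3.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(5)
sigma, alpha, npop = 0.1, 0.02, 50         # noise scale, step size, population

for _ in range(300):
    eps = rng.normal(size=(npop, 5))       # one perturbation per population member
    rewards = np.array([reward(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # fitness shaping
    theta += alpha / (npop * sigma) * eps.T @ rewards  # ES gradient estimate

print("final theta:", np.round(theta, 2))  # close to 3 in every dimension
```

Because each population member only needs to report a scalar reward, this loop parallelizes almost trivially across workers, which is the "overcoming many of RL's inconveniences" part of the claim.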

Retro Contest: Results

The first run of our Retro Contest — exploring the development of algorithms that can generalize from previous experience — is now complete. Though many approaches were tried, top results all came from tuning or extending existing algorithms such as PPO and Rainbow. There's a lon … | Continue reading


@blog.openai.com | 5 years ago

Improving Language Understanding with Unsupervised Learning

We've obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we're also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing exam … | Continue reading


@blog.openai.com | 5 years ago

OpenAI Fellows–Fall 2018

We’re now accepting applications for the next cohort of OpenAI Fellows, a program offering a compensated 6-month apprenticeship in AI research at OpenAI. We designed this program for people who want to become AI researchers but do not have a formal background in the field. App … | Continue reading


@blog.openai.com | 5 years ago

Gym Retro

We're releasing the full version of Gym Retro, a platform for reinforcement learning research on games. This brings our publicly-released game count from around 70 Atari games and 30 Sega games to over 1,000 games across a variety of backing emulators. We're also releasing the to … | Continue reading


@blog.openai.com | 5 years ago

AI and Compute

Since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.5 month doubling time (by comparison, Moore's Law had an 18 month doubling period). | Continue reading


@blog.openai.com | 6 years ago
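The two quoted doubling times imply strikingly different yearly growth rates; a quick back-of-the-envelope check (the arithmetic is mine, only the doubling times come from the post):

```python
# Convert doubling times (months) into multiplicative growth per year: 2**(12/T).
ai_doubling_months = 3.5
moore_doubling_months = 18

ai_growth_per_year = 2 ** (12 / ai_doubling_months)        # roughly 10.8x per year
moore_growth_per_year = 2 ** (12 / moore_doubling_months)  # roughly 1.6x per year

print(f"AI training compute: ~{ai_growth_per_year:.1f}x per year")
print(f"Moore's law:         ~{moore_growth_per_year:.1f}x per year")
```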

AI Safety via Debate

We're proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins. We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capab … | Continue reading


@blog.openai.com | 6 years ago