A company called Canco has announced plans to build a huge cannery near Noyo. The murderous, sex-hungry mutations are apparently the result of Canco's experiments with a growth hormone they had earlier administered to salmon. The salmon escaped from Canco's laboratory into the ocean during a storm and were eaten by large fish that then mutated into the brutal, depraved humanoids that have begun to terrorize the village.
Second unit director James Sbardellati, who later directed Deathstalker, was hired to enliven the film; he filmed explicit scenes in which the humanoids rape women. These changes were not communicated to most of the people who had made the film with the working title Beneath the Darkness; several of them expressed shock and anger at the released film, its changed title, and the nudity and sexual exploitation. After Peeters and Turkel saw the additional sequences, they asked for their names to be removed from the film, but were refused. Turkel appeared on television talk shows and castigated Corman for his actions.
We organized the Learn to Move competition series to facilitate the development of control models with advanced deep RL techniques in neuromechanical simulation. It was an official competition at the NeurIPS conference from 2017 to 2019. We provided the neuromechanical simulation environment, OpenSim-RL, and participants developed locomotion controllers for a human musculoskeletal model. In the most recent competition, NeurIPS 2019: Learn to Move - Walk Around, the top teams adapted state-of-the-art deep RL techniques and successfully controlled a 3D human musculoskeletal model to follow target velocities. They did so by changing walking speed and direction and by transitioning between walking and standing. Some of these locomotion behaviors were demonstrated in neuromechanical simulations for the first time without using reference motion data. The solutions were not explicitly designed to model human learning or control. Even so, they provide a means of developing control models that are capable of producing complex motions.
Modeling human motor control is crucial for a predictive neuromechanical simulation. However, most of our current understanding of human locomotion control is extrapolated from experimental studies of simpler animals [60, 61], because it is extremely difficult to measure and interpret the biological neural circuits. Therefore, human locomotion control models have been proposed based on a few structural and functional control hypotheses that are shared across many animals (Fig. 3).

First, locomotion control in many animals can be interpreted as a hierarchical structure with two layers. The lower layer generates basic motor patterns, and the higher layer sends commands to the lower layer to modulate those patterns. It has been shown in some vertebrates, including cats and lampreys, that the neural circuitry of the spinal cord, disconnected from the brain, can produce stereotypical locomotion behaviors. This circuitry can also be modulated by electrical stimulation to change speed, direction and gait [62, 63].

Second, the lower layer seems to consist of two control mechanisms: reflexes [64, 65] and central pattern generators (CPGs) [66, 67]. In engineering terms, reflexes and CPGs roughly correspond to feedback and feedforward control, respectively. Muscle synergies, in which a single pathway co-activates multiple muscles, have also been proposed as a lower layer control mechanism that reduces the degrees of freedom for complex control tasks [68, 69].

Lastly, there is a consensus that humans use minimum effort to conduct well-practiced motor tasks, such as walking [70, 71]. This consensus provides a basis for using energy or fatigue optimization [26, 27, 28] as a principled means of finding control parameter values.
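Two of the ideas above are compact enough to sketch in code. The first is how a muscle synergy lets a few commands drive many muscles. The second is how an effort term can score a candidate controller.

The sketch below is illustrative only. The weight matrix `W`, the function names, and the choice of "sum of cubed activations" as the effort measure are assumptions made for this example. Cubed activation is one common fatigue-related criterion, not the specific objective of the cited studies.

```python
import numpy as np

def synergy_excitations(W, c):
    """Map k synergy commands c (shape (k,)) to m muscle excitations via W (shape (m, k)).

    Each column of W is one synergy: a single command co-activates every muscle
    with a nonzero weight in that column, so k << m commands can drive m muscles.
    Hypothetical helper for illustration.
    """
    return np.clip(W @ c, 0.0, 1.0)  # excitations are bounded to [0, 1]

def effort_cost(activations, dt):
    """Assumed effort measure: time integral of the summed cubed muscle activations.

    activations has shape (n_timesteps, n_muscles); dt is the time step in seconds.
    """
    return float(np.sum(activations ** 3) * dt)

# Hypothetical example: 4 muscles driven by 2 synergies.
W = np.array([[1.0, 0.0],
              [0.5, 0.0],
              [0.0, 1.0],
              [0.0, 0.8]])
c = np.array([0.6, 0.2])
print(synergy_excitations(W, c))  # 4 excitations from only 2 commands
```

In an optimization-based controller, a term like `effort_cost` would typically be added to a task objective, so that the search favors parameter sets that achieve the task with less muscle activity.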
Most neuromechanical control models focus on lower layer control using spinal control mechanisms, such as CPGs and reflexes. CPG-based locomotion controllers consist of both CPGs and simple reflexes. In these controllers, the CPGs, often modeled as mutually inhibiting neurons, generate the basic muscle excitation patterns. These CPG-based models [8, 73, 74, 75, 76, 77] demonstrated that stable locomotion can emerge from the entrainment between the CPGs and the musculoskeletal system, which are linked by sensory feedback and joint actuation. A CPG-based model with 125 control parameters produced walking and running with a 3D musculoskeletal model with 60 muscles. CPG-based models have also been integrated with other control mechanisms, such as muscle synergies [8, 76, 77] and various sensory feedback circuits [74, 76].

On the other hand, reflex-based control models consist of simple feedback circuits without intrinsic temporal dynamics, and they demonstrate that CPGs are not necessary for producing stable locomotion. Reflex-based models [6, 20, 78, 79, 80] mostly use simple feedback laws based on sensory data accessible at the spinal cord. These data include proprioceptive data (e.g., muscle length, speed and force) and cutaneous data (e.g., foot contact and pressure) [61, 65]. One reflex-based control model had 80 control parameters and was combined with a simple higher layer controller that regulates foot placement to maintain balance. With a 3D musculoskeletal model with 22 muscles, it produced diverse locomotion behaviors, including walking, running, and climbing stairs and slopes. It also reacted to a range of unexpected perturbations similarly to humans (Fig. 4). Reflex-based controllers have also been combined with CPGs, and with a deep neural network that operates as a higher layer controller, to provide additional control functions such as speed and terrain adaptation.
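To make "mutually inhibiting neurons" concrete, here is a minimal sketch of a two-neuron half-center oscillator. It uses the Matsuoka neuron model, which is one common CPG formulation. The text above does not say which model the cited controllers use, so treat the model choice and all parameter values as assumptions.

Each neuron has a membrane state `x` and a self-inhibition (adaptation) state `v`. Its firing rate is `y = max(0, x)`, and it is inhibited by the other neuron's output. With a constant tonic input `u` and no rhythmic input at all, the pair settles into alternating bursts.

```python
import numpy as np

def matsuoka_half_center(t_end=30.0, dt=1e-3, u=1.0, beta=2.5, w=2.5,
                         tau_r=0.25, tau_a=0.5):
    """Simulate two mutually inhibiting Matsuoka neurons with forward Euler.

    u: constant tonic drive; beta: self-inhibition gain; w: mutual inhibition gain;
    tau_r, tau_a: rise and adaptation time constants (s). Parameter values are
    illustrative assumptions chosen inside the oscillatory regime 1 + tau_r/tau_a < w < 1 + beta.
    Returns firing rates y_i = max(0, x_i), shape (n_steps, 2).
    """
    n_steps = int(round(t_end / dt))
    x = np.array([0.1, 0.0])  # asymmetric start breaks the symmetric equilibrium
    v = np.zeros(2)           # adaptation (fatigue) states
    y_hist = np.empty((n_steps, 2))
    for k in range(n_steps):
        y = np.maximum(x, 0.0)
        # Each neuron is inhibited by its own adaptation and by the other neuron's output.
        dx = (-x - beta * v - w * y[::-1] + u) / tau_r
        dv = (-v + y) / tau_a
        x = x + dt * dx
        v = v + dt * dv
        y_hist[k] = np.maximum(x, 0.0)
    return y_hist

def count_switches(y):
    """Count how often the more active neuron changes, ignoring instants when both are silent."""
    active = y.max(axis=1) > 0.0
    winner = np.argmax(y[active], axis=1)
    return int(np.count_nonzero(np.diff(winner)))

y = matsuoka_half_center()
print(count_switches(y[len(y) // 2:]))  # number of flexor/extensor-style alternations in the last 15 s
```

In a CPG-based locomotion controller, outputs like `y[:, 0]` and `y[:, 1]` would typically drive antagonistic muscle groups. Sensory feedback added to the `u` term is what allows the oscillator to entrain to the body's motion.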
This section highlights the concepts from deep reinforcement learning relevant to developing models for motor control. We provide a brief overview of the terminology and problem formulations of RL and then cover selected state-of-the-art deep RL algorithms that are relevant to successful solutions in the Learn to Move competition. We also review studies that used deep RL to control human locomotion in physics-based simulation.
Reinforcement learning algorithms for continuous action spaces. The diagram is adapted from prior work and presents a partial taxonomy of RL algorithms for continuous control (i.e., continuous action spaces). It focuses on a few modern deep RL algorithms and some traditional RL algorithms that are relevant to the algorithms used by the top teams in our competition. TRPO: trust region policy optimization; PPO: proximal policy optimization; DDPG: deep deterministic policy gradient; TD3: twin delayed deep deterministic policy gradient; SAC: soft actor-critic
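As a concrete example of one algorithm named in the taxonomy, the sketch below computes PPO's clipped surrogate objective from per-sample log-probabilities and advantage estimates. It shows only this objective. Policy networks, advantage estimation, and optimizer steps are omitted, and the function name is hypothetical.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Clipping the ratio to [1 - eps, 1 + eps] and taking the elementwise minimum
    removes the incentive to move the policy far from the one that collected the data.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))

# Two samples: one whose probability rose (positive advantage), one whose probability fell (negative advantage).
logp_old = np.array([0.0, 0.0])
logp_new = np.log(np.array([1.5, 0.5]))
advantages = np.array([1.0, -1.0])
print(ppo_clip_objective(logp_new, logp_old, advantages))  # mean of min(1.5, 1.2) and min(-0.5, -0.8)
```

In practice the negative of this value is used as a loss and minimized by gradient descent on the policy parameters.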
Human motion simulation studies have used various forms of RL (Fig. 6). A number of works in neuromechanical simulation [6, 75] and in computer graphics [95, 96], reviewed in the Background on neuromechanical simulations of human locomotion section, used policy search methods with derivative-free optimization techniques, such as evolutionary algorithms, to tune their controllers. The control parameters are optimized by repeating three steps:

1. Run a simulation trial with a set of control parameters.
2. Evaluate the objective function from the simulation result.
3. Update the control parameters using an evolutionary algorithm.

This optimization approach makes minimal assumptions about the underlying system and can be effective for tuning controllers to perform a diverse array of skills [6, 122]. However, these algorithms often struggle with high-dimensional parameter spaces (i.e., more than a couple of hundred parameters). Therefore, researchers developed controllers with a relatively low-dimensional set of parameters that could produce the desired motions, which requires a great deal of expertise and human insight. Also, the selected set of parameters tends to be specific to particular skills, limiting the behaviors that can be reproduced by the character.
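The simulate-evaluate-update loop described above can be sketched with a deliberately simplified (μ/μ, λ) evolution strategy. This is not a full CMA-ES implementation: the step size follows a fixed decay schedule instead of being adapted, and there is no covariance learning.

The objective `toy_trial` is a hypothetical stand-in for running a neuromechanical simulation and scoring the result. A real objective would combine task terms (e.g., distance walked, falls) with effort.

```python
import numpy as np

def evolve(objective, theta0, sigma=0.5, popsize=20, n_parents=5,
           decay=0.97, generations=200, seed=0):
    """Minimize objective(theta) with a simple (mu/mu, lambda) evolution strategy.

    sigma: initial sampling step size; popsize: candidates per generation (lambda);
    n_parents: best candidates averaged into the new mean (mu). All defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(theta0, dtype=float)
    for _ in range(generations):
        # 1) Sample candidate control-parameter sets around the current mean.
        candidates = mean + sigma * rng.standard_normal((popsize, mean.size))
        # 2) "Run a trial" for each candidate and score it (lower is better).
        scores = np.array([objective(c) for c in candidates])
        # 3) Recombine the best candidates into the next mean.
        best = candidates[np.argsort(scores)[:n_parents]]
        mean = best.mean(axis=0)
        sigma *= decay  # fixed schedule; CMA-ES would adapt sigma and a covariance matrix
    return mean

def toy_trial(theta):
    """Hypothetical stand-in for a simulation trial: the best parameters are all 1.0."""
    return float(np.sum((theta - 1.0) ** 2))

theta = evolve(toy_trial, np.zeros(5))
print(toy_trial(theta))  # close to zero after optimization
```

Because each generation needs `popsize` full trials, the cost grows quickly with the number of parameters. This is one reason such methods are usually paired with low-dimensional, hand-designed controllers.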
The potential synergy of neuromechanical simulations and deep RL methods in modeling human control motivated us to develop the OpenSim-RL simulation platform and to organize the Learn to Move competition series. OpenSim-RL leverages OpenSim to simulate musculoskeletal models and OpenAI Gym, a widely used RL toolkit, to standardize the interface with state-of-the-art RL algorithms. OpenSim-RL is open-source and is provided as a Conda package, which was downloaded about 42,000 times from 2017 to 2019.

Training a controller for a human musculoskeletal model is a difficult RL problem for several reasons:

- the observation and action spaces are high-dimensional;
- rewards are delayed and sparse, because of the highly non-linear and discontinuous dynamics;
- simulating muscle dynamics is slow.

Therefore, we organized the Learn to Move competition series to crowd-source machine learning expertise in developing control models of human locomotion. The mission of the competition series is to bridge neuroscience, biomechanics, robotics, and machine learning to model human motor control.
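The value of a Gym-style interface is that any RL agent interacts with the simulator through the same reset/step loop. The sketch below illustrates that loop with a hypothetical toy environment, `ToyVelocityEnv`, which is NOT OpenSim-RL. It is a 1D point mass whose reward penalizes deviation from a target velocity.

The sketch follows the classic Gym convention, in which `step` returns `(obs, reward, done, info)`. Newer Gym/Gymnasium releases return a five-tuple instead. The environment, its reward, and the policies are illustrative assumptions.

```python
import numpy as np

class ToyVelocityEnv:
    """Hypothetical Gym-style environment: a unit point mass tracking a target velocity."""

    def __init__(self, target_velocity=1.0, dt=0.01, horizon=500):
        self.target_velocity = target_velocity
        self.dt = dt
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.velocity = 0.0
        return np.array([self.velocity, self.target_velocity])

    def step(self, action):
        force = float(np.clip(action, -1.0, 1.0))  # bounded action, loosely like a muscle excitation
        self.velocity += force * self.dt           # unit mass, forward Euler integration
        self.t += 1
        reward = -(self.velocity - self.target_velocity) ** 2  # velocity-tracking penalty
        done = self.t >= self.horizon
        obs = np.array([self.velocity, self.target_velocity])
        return obs, reward, done, {}

def run_episode(env, policy):
    """Generic agent-environment loop: the same code works for any env with this interface."""
    obs, done, total, steps = env.reset(), False, 0.0, 0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps

env = ToyVelocityEnv()
print(run_episode(env, lambda obs: 0.0))                     # idle policy
print(run_episode(env, lambda obs: 5.0 * (obs[1] - obs[0]))) # proportional controller
```

A learned policy (e.g., a neural network trained with PPO or SAC) would simply replace the lambda functions. The environment code does not change.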
NeurIPS 2019: Learn to Move - Walk Around was held online from June 6 to November 29, 2019. The task was to develop a locomotion controller, which was scored on its ability to meet target velocity vectors when applied in the provided OpenSim-RL simulation environment. The environment repository was shared on GitHub, submission and grading were managed on the AIcrowd platform, and the project homepage provided documentation on the environment and the competition. Participants were free to develop any type of controller that worked in the environment. We encouraged approaches other than brute-force deep RL by providing two resources. The first was human gait data sets of walking and running [138, 139, 140]. The second was a 2D walking controller adapted from a reflex-based control model. Both could be used for imitation learning or for developing a hierarchical control structure. There were two round