Project Overview

The goal of this project was to build a quadruped platform for reinforcement learning tasks for under $600. First, I built a PyBullet environment using the open-source Spot Micro CAD models (and later my own). Then, I deployed a 12-point Bezier gait scheme and used it as a baseline for reinforcement learning tasks on various terrain environments. After building this original version, I collaborated with a friend to mechanically redesign Spot for higher fidelity and better weight distribution. Check out the package on GitHub!

Sim2Real: Gait Modulation with Bezier Curves

My experience with locomotive RL is that it is mostly limited to demonstrative tasks (walk forward, stand up, etc.) with little real-world use. Using this platform, I propose a novel reinforcement learning method that seeks to deliver a robust and universally controllable gait. It builds on an existing gait scheme using 12-point Bezier curves, which allows for any combination of forward, lateral, and yaw commands at user-defined step heights, lengths, and speeds. The method wraps a learning agent around this scheme to modulate gait parameters such as step height and body height, and to add significant residuals to the resultant foot coordinates. The only sensor used here is an IMU. After training in simulation for 1 hour on a CPU (~500 epochs), I achieved the following real-world results, where the agent is shown on the right.

The best part is that even though the agent was only trained to walk forward, it responds to previously unseen commands such as yaw and lateral motion! This means that you can finally use RL on a real robot! Keep in mind that this is all done on a $600 platform!

Inverse Kinematics

After deriving the inverse kinematics for each leg, the next step was to describe the IK for the body itself. The approach used here considers a world frame $w$, which is the robot centroid’s base position, and a body frame $b$, describing the robot’s pose relative to the world frame. In addition, we have $T_{ws}$, the transform from the world frame to the robot’s shoulder, which describes the base transform between the robot centroid and the shoulder. Finally, we have our inputs: $T_{wb}$, which describes the desired transform from world to body (RPY and translation), and $T_{bf}$, the desired foot position relative to the transformed body, which is useful for gait generation. The output of our process is $T_{sf}$, the transform between each shoulder and its respective foot required to achieve this motion, which is fed into the leg IK solver to retrieve joint angles. The gallery below shows our inputs and outputs. Note that this diagram is facing the robot, so the example shown is for body roll.
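To make the frame chain concrete, here is a minimal sketch of one consistent reading of the definitions above: following the subscripts, $T_{sf} = T_{ws}^{-1} T_{wb} T_{bf}$. As a simplifying assumption of mine (not something stated in the project), $T_{ws}$ and $T_{bf}$ are treated as pure translations, so the result reduces to $p_{sf} = R_{wb}\,p_{bf} + p_{wb} - p_{ws}$. The function names and the geometry values are hypothetical placeholders.

```python
import math


def rpy_to_matrix(roll, pitch, yaw):
    """Return the 3x3 rotation R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def shoulder_to_foot(p_ws, rpy, p_wb, p_bf):
    """Translation part of T_sf = inv(T_ws) * T_wb * T_bf.

    Assumption: T_ws and T_bf are pure translations (no rotation).
    """
    R = rpy_to_matrix(*rpy)
    # Foot expressed in the world frame: p_wf = R * p_bf + p_wb
    p_wf = [sum(R[i][k] * p_bf[k] for k in range(3)) + p_wb[i] for i in range(3)]
    # Foot expressed relative to the shoulder: p_sf = p_wf - p_ws
    return [p_wf[i] - p_ws[i] for i in range(3)]


# Placeholder front-left geometry in metres (illustrative, not the robot's values).
p_ws = [0.10, 0.05, 0.0]
p_bf = [0.10, 0.10, -0.15]
p_sf = shoulder_to_foot(p_ws, (math.radians(10.0), 0.0, 0.0), [0.0, 0.0, 0.0], p_bf)
print(p_sf)
```

The resulting shoulder-to-foot vector is what a per-leg IK solver would consume to produce joint angles.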

Here’s a gif of the body IK in action:

Bezier Gait

The Bezier Gait deployed in this project uses a closed-loop trajectory generator, which resets whenever the reference foot (front left) hits the ground after a swing. The Bezier curve for the swing period is made up of 12 points, some of which overlap to induce zero vertical velocity or changes in foot direction. The table and image below summarize the gait:

For the stance portion of the gait, we simply deploy a sinusoidal curve whose z-amplitude is the desired penetration depth and whose x-amplitude is half the stride length. Note that for y-coordinate foot motion, we apply the same Bezier and sinusoidal curves as for x-coordinate motion.
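As a rough illustration of the swing and stance curves described above, the sketch below evaluates a 12-point Bezier curve with Bernstein polynomials and a sinusoidal stance profile. The control points are placeholders I made up to show the idea of repeated points; they are not the values from the project's table. The stance formula is one common form (x sweeping linearly, z following a cosine) and is an assumption about the exact parameterization.

```python
import math


def bezier_point(control_points, t):
    """Evaluate a Bezier curve of any degree at t in [0, 1] using Bernstein polynomials."""
    n = len(control_points) - 1
    point = [0.0] * len(control_points[0])
    for i, cp in enumerate(control_points):
        weight = math.comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))
        for axis, value in enumerate(cp):
            point[axis] += weight * value
    return point


def swing_control_points(half_stride, clearance):
    """Twelve placeholder (x, z) control points; NOT the values from the project's table."""
    L, h = half_stride, clearance
    xs = [-L, -1.4 * L, -1.5 * L, -1.5 * L, -1.5 * L, 0.0,
          0.0, 0.0, 1.5 * L, 1.5 * L, 1.4 * L, L]
    zs = [0.0, 0.0, 0.9 * h, 0.9 * h, 0.9 * h, 0.9 * h,
          0.9 * h, 1.1 * h, 1.1 * h, 1.1 * h, 0.0, 0.0]
    return list(zip(xs, zs))


def stance_point(half_stride, penetration_depth, phase):
    """Sinusoidal stance: x sweeps from +L to -L while z dips to -penetration_depth."""
    x = half_stride * (1.0 - 2.0 * phase)
    if half_stride == 0.0:
        return [x, 0.0]
    z = -penetration_depth * math.cos(math.pi * x / (2.0 * half_stride))
    return [x, z]


# Sample one full cycle: swing carries the foot forward, stance drags it back.
points = swing_control_points(half_stride=0.05, clearance=0.03)
swing = [bezier_point(points, k / 10.0) for k in range(11)]
stance = [stance_point(0.05, 0.002, k / 10.0) for k in range(11)]
```

Because the first two and last two placeholder points share z = 0, the curve's vertical velocity is zero at lift-off and touchdown, which is the effect the overlapping points are meant to produce.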

Here’s what this looks like on Spot:

I implemented yaw control based on this paper, which treats the quadruped as a four-wheel steering car. The intuition here is that to turn clockwise, both front feet should move towards the rear-left, and both back feet should move towards the rear-right of the robot during the stance phase. To adequately trace a circular path while doing this, the directional vector of each foot is modulated at each iteration by $\theta_{mod}$ to remain tangent to said circle. This is calculated using $\arctan\left(\frac{M_{mod}}{M_{base}}\right)$, where $M_{mod}$ is the change in magnitude of the commanded foot coordinates relative to the stance coordinates, and $M_{base}$ is the magnitude of the $x$ and $y$ elements of the shoulder-to-foot vector in stance mode.
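The modulation angle can be sketched as follows. This is my reading of the definitions, not the project's code: I take $M_{mod}$ to be the planar distance between the commanded foot position and the stance position, and the function name is hypothetical. `math.atan2` is used in place of a plain arctangent so that a zero $M_{base}$ does not cause a division by zero.

```python
import math


def yaw_modulation_angle(stance_xy, commanded_xy):
    """Return theta_mod = arctan(M_mod / M_base) for one foot, in radians."""
    # M_base: planar magnitude of the shoulder-to-foot vector in stance.
    m_base = math.hypot(stance_xy[0], stance_xy[1])
    # M_mod (assumed): planar displacement of the commanded foot from stance.
    m_mod = math.hypot(commanded_xy[0] - stance_xy[0],
                       commanded_xy[1] - stance_xy[1])
    return math.atan2(m_mod, m_base)


# A foot that has not moved from stance needs no correction.
print(yaw_modulation_angle((0.1, 0.1), (0.1, 0.1)))
```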

Here’s what this looks like on Spot:

Gym Environment and Terrain

The environment provided here is largely derived from PyBullet’s minitaur example. In fact, it is nearly identical aside from accounting for the differences in the robots themselves. Another difference is the terrain used in the environment, which is an optional, programmatically generated heightfield enabled from the command line. You should experiment with the meshscale argument as well, as this will change the characteristics of your terrain. This environment is great for locomotive reinforcement learning tasks! Notice that if we increase the mesh size, and hence the terrain’s roughness, the robot loses the ability to traverse it:
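For readers who want to try a programmatically generated heightfield themselves, here is a minimal sketch. The helper names, grid size, and height range are my own placeholders rather than the project's settings. The loader uses PyBullet's `createCollisionShape` with `GEOM_HEIGHTFIELD` and assumes a physics client is already connected; it imports PyBullet lazily so the generator can be run on its own.

```python
import random


def make_heightfield(rows, cols, height_range, seed=None):
    """Return rows * cols random heights in [0, height_range] as a flat list."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, height_range) for _ in range(rows * cols)]


def load_terrain(heights, rows, cols, mesh_scale):
    """Create a static heightfield body in an already-connected PyBullet session."""
    import pybullet as p  # lazy import: only needed when actually loading terrain

    shape = p.createCollisionShape(
        shapeType=p.GEOM_HEIGHTFIELD,
        meshScale=mesh_scale,  # [x, y, z] scaling of the grid
        heightfieldData=heights,
        numHeightfieldRows=rows,
        numHeightfieldColumns=cols,
    )
    # A mass of zero makes the terrain a fixed, static body.
    return p.createMultiBody(baseMass=0, baseCollisionShapeIndex=shape)


heights = make_heightfield(rows=64, cols=64, height_range=0.05, seed=0)
```

Scaling the z entry of `mesh_scale` stretches the same random grid vertically, which is one way to explore how roughness affects traversal.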

Reinforcement Learning Task

To allow for stable terrain traversal, I trained an Augmented Random Search agent with a 12-dimensional observation space [IMU Inputs (8), Leg Phases (4)] and a 14-dimensional action space [Clearance Height (1), Body Height (1), and Foot XYZ Residual modulations (12)]. The actions are processed through an exponential filter with alpha = 0.7. The agent was able to traverse the light terrain in as little as 150 epochs, and terrain of twice the height (shown above) in 500 epochs.
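The action handling described above might look like the sketch below. Two things here are assumptions of mine: that the filter weights the previous action by alpha (the text does not say which term alpha multiplies), and that the 14 action values are laid out as clearance height, body height, then three residuals per leg. The function names are hypothetical.

```python
def exponential_filter(previous, new, alpha=0.7):
    """Blend the new action with the previous one: alpha * previous + (1 - alpha) * new."""
    return [alpha * p + (1.0 - alpha) * n for p, n in zip(previous, new)]


def split_action(action):
    """Split a 14-value action into clearance height, body height, and per-leg residuals.

    Assumed layout: [clearance, body_height, then (x, y, z) residuals for each of 4 legs].
    """
    if len(action) != 14:
        raise ValueError("expected a 14-dimensional action")
    clearance_height = action[0]
    body_height = action[1]
    residuals = [action[2 + 3 * leg: 5 + 3 * leg] for leg in range(4)]
    return clearance_height, body_height, residuals


# Smooth a raw policy output against the previously applied action.
previous_action = [0.0] * 14
raw_action = [1.0] * 14
smoothed = exponential_filter(previous_action, raw_action)
clearance, body_height, foot_residuals = split_action(smoothed)
```

With this weighting, a larger alpha makes the applied action change more slowly, which damps abrupt jumps in the foot residuals between control steps.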

Here is a system diagram describing the Gait Modulation with Bezier Curves method:

Real World Validation

Here are some additional takes where the agent is on the right!

Mechanical Redesign

Together with Adham Elarabawy, I have completed a total mechanical redesign of SpotMicro, the robot that inspired this project. We call it Open Quadruped!

Main improvements:
  • Shortened the body by 40mm while making more room for our electronics with adapter plates.
  • Moved all the servos to the hip to save 60g on the lower legs, which are now belt-drive actuated with tunable belt tightness.
  • Added support bridge on hip joint for added longevity.
  • Added flush slots for hall effect sensors on the feet.

I also created a new URDF with proper inertial values on each link, making the simulation much more reliable.

Power Distribution Board

We also designed this Power Distribution Board with a 1.5mm track width to support up to 6A at a 10°C temperature rise (conservative estimate). There are copper grounding planes on both sides of the board to help with heat dissipation, and parallel tracks for the power lines are provided for the same reason. The PDB also includes shunt electrolytic capacitors for each servo motor to smooth out the power input. The board interfaces with a sensor array (optionally used for foot sensors) and contains two I2C terminals and a regulated 5V power rail. At the center of the board is a Teensy 4.0, which communicates with a Raspberry Pi over ROSSerial to control the 12 servo motors and read analogue sensors.