Sim2Real Implementation: Gymnasium-Gazebo UREnv for Deep Reinforcement Learning With Reach, Grasp, and Pick&Place Environment with Collision Avoidance (Object or Human)
Universal Robot Environment for Gymnasium and ROS Gazebo Interface based on: openai_ros, ur_openai_gym, rg2_simulation, and gazebo_grasp_fix_plugin
Demo video: Grasping.Scenarios.Sim2Real.mp4
Simulation Testing & Real-World Validation
Sim2Real zero-shot transfer of a grasping policy using safe-DRL
Sim2Real zero-shot transfer of a grasping policy with an obstacle collision avoidance policy using safe-DRL
Object or human-arm collision avoidance for human-robot collaboration
- About The Project
- Bibtex
- Getting Started
- Task Space
- Usage
- Roadmap
- Contributing
- License
- Contact
- Acknowledgments
Deep reinforcement learning is applied to a robotics case study, with the main motivation of moving from simulation to reality (sim2real) for safety-critical systems.
@inproceedings{Abbas2024,
author = {Abbas, Ammar N and Mehak, Shakra and Chasparis, Georgios C and Kelleher, John D and Guilfoyle, Michael and Leva, Maria Chiara and Ramasubramanian, Aswin K},
title = {Safety-Driven Deep Reinforcement Learning Framework for Cobots: A Sim2Real Approach},
booktitle = {2024 IEEE/IFAC International Conference on Control, Decision and Information Technologies},
year = {2024}
}
- Install ROS Noetic
sudo apt-get install ros-noetic-desktop-full
- Clone the repo
git clone https://github.com/ammar-n-abbas/sim2real-ur-gym-gazebo.git
- Build package
catkin build
- Install dependencies
rosdep install --from-paths src --ignore-src -r -y
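- Source the workspace overlay (assuming a standard catkin workspace where this repository was cloned into src/)
source devel/setup.bash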
The "state" refers to the current situation or configuration of the environment that the agent is interacting with. It includes relevant information as a set of variables or features necessary for the agent to make decisions and take action. In the UR5Env
environment, the state
is represented as a concatenation of various components that provide information about the current state of the robotic system. The state includes:
- Disturbance position (if applicable).
- Joint angles of the UR5 robotic arm.
- End effector
$-$ cube position error. - Cube velocity.
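Below is a minimal sketch of how such an observation vector might be assembled; the argument names are hypothetical placeholders, not the environment's actual attributes.

```python
import numpy as np

def build_observation(disturbance_pos, joint_angles, ee_cube_error, cube_vel):
    """Concatenate the state components listed above into one flat vector.

    All arguments are assumed to be array-like (illustrative shapes):
      disturbance_pos: (3,) disturbance position, or zeros if not applicable
      joint_angles:    (6,) UR5 joint angles in radians
      ee_cube_error:   (3,) end-effector-to-cube position error
      cube_vel:        (3,) cube linear velocity
    """
    return np.concatenate([
        np.asarray(disturbance_pos, dtype=np.float32),
        np.asarray(joint_angles, dtype=np.float32),
        np.asarray(ee_cube_error, dtype=np.float32),
        np.asarray(cube_vel, dtype=np.float32),
    ])
```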
The "action" represents the decisions or moves an agent can take within a given environment. Actions are the choices available to the agent at any given state, and the goal of the reinforcement learning algorithm is to learn a policy that maps states to optimal actions. The action
in the UR5Env
environment represents the desired continuous changes to the end effector position (XYZ) and gripper control (G). The 4-dimensional action vector corresponds to:
- X: Translation in the X-axis (resolution: 0.001m).
- Y: Translation in the Y-axis (resolution: 0.001m).
- Z: Translation in the Z-axis (resolution: 0.001m).
- G: Gripper action (binary: open or close).
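A hedged sketch of how this action space could be declared with Gymnasium is shown below; treating the gripper command as a fourth continuous dimension that is later thresholded into open/close is an assumption, not the repository's exact implementation.

```python
import numpy as np
from gymnasium import spaces

# Illustrative bounds: the XYZ deltas use the stated 0.001 m resolution as the
# per-step limit (an assumption), and the gripper value is thresholded later
# into open (< 0) or close (>= 0).
action_space = spaces.Box(
    low=np.array([-0.001, -0.001, -0.001, -1.0], dtype=np.float32),
    high=np.array([0.001, 0.001, 0.001, 1.0], dtype=np.float32),
    dtype=np.float32,
)
```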
The objective of the reward function is to shape the agent's behavior to achieve the desired outcomes during robotic tasks while maintaining safety standards. The reward function incorporates multiple factors, capturing aspects such as goal attainment, grasping quality, speed violations, collision-velocity violations, and collisions. It is designed to incentivize behaviors that lead to successful goal completion and safe operation. Distinct penalties are imposed for undesired events, such as collisions or excessive speeds, while rewards are given for executing proper grasps. The modified reward function is expressed mathematically as shown in the equation below:
Where:

- $d$: Distance between the end effector and the cube.
- $g$: Reward for successfully grasping the object.
- $g_c$: Penalty for a failed grasp attempt.
- $s_c$: Penalty for exceeding a predefined joint speed limit.
- $ik_c$: Penalty for inverse kinematics solution failure.
- $c_c$: Penalty for collisions with the environment.
- $c_{c_c}$: Penalty for collisions with the cube.
- $b_{c_c}$: Penalty for collisions with the external disturbance.
- $c_v$: Penalty for exceeding a predefined collision velocity.
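The sketch below shows one plausible way of combining these terms additively; it is not the authors' exact formulation, and the weights are placeholder values.

```python
def compute_reward(d, grasped, failed_grasp, speed_violation, ik_failure,
                   env_collision, cube_collision, disturbance_collision,
                   collision_velocity_violation,
                   g=1.0, g_c=0.5, s_c=0.5, ik_c=0.5,
                   c_c=1.0, c_cc=1.0, b_cc=1.0, c_v=1.0):
    """Illustrative additive reward: dense distance shaping plus the grasp
    reward and the safety penalties listed above (placeholder weights)."""
    reward = -d                                              # closer to the cube is better
    reward += g if grasped else 0.0                          # successful grasp
    reward -= g_c if failed_grasp else 0.0                   # failed grasp attempt
    reward -= s_c if speed_violation else 0.0                # joint speed limit exceeded
    reward -= ik_c if ik_failure else 0.0                    # no inverse kinematics solution
    reward -= c_c if env_collision else 0.0                  # collision with the environment
    reward -= c_cc if cube_collision else 0.0                # collision with the cube
    reward -= b_cc if disturbance_collision else 0.0         # collision with the external disturbance
    reward -= c_v if collision_velocity_violation else 0.0   # collision velocity too high
    return reward
```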
roslaunch ur_gazebo ur5_with_gripper_bringup.launch
This YAML file (`ur_gym.yaml`) contains configuration parameters for the UR Gym environment.

- `env_id`: Specifies the Gymnasium environment ID.
- `driver`: Indicates the driver used, such as "gazebo".
- `reset_robot`: Determines whether to reset the robot.
- `ft_sensor`: Specifies whether to use force/torque sensors.
- `agent_control_dt`: Time step for controlling the agent.
- `reset_time`: Time allocated for resetting the environment.
- `rand_seed`: Seed for random number generation.

- `random_cube_pose`: Determines if the initial cube pose is randomized.
- `random_target_pose`: Determines if the initial target pose is randomized.
- `random_initial_pose`: Determines if the initial robot pose is randomized.
- `cl_target_pose`: Specifies if the target pose is controlled.
- `dist_cl`: Distance for controlling the target pose.
- `rand_init_interval`: Interval for random initialization.

- `workspace`: Defines the workspace bounds.
- `init_q`: Initial joint configuration.

- `cube_size`: Size of the cube.

- `object_disturbance`: Indicates whether object disturbance is enabled.
- `random_object_disturbance`: Specifies if random object disturbance is enabled.
- `hrc_disturbance`: Indicates whether human-robot collaboration disturbance is enabled.
- `random_hrc_disturbance`: Specifies if random human-robot collaboration disturbance is enabled.

- `sil_validation`: Specifies whether to perform SIL (software-in-the-loop) validation.

- `goal_a`: Position of goal A.
- `goal_b`: Position of goal B.

- `n_actions`: Number of actions.

- `distance_threshold`: Threshold for successful distance.
- `proper_grasp_threshold`: Threshold for proper grasp.

- `vel_thresh`: Velocity threshold.
- `force_thresh`: Force threshold.

- `reward_type`: Type of reward.
- `speed_cost`: Cost for speed.
- `ik_cost`: Cost for inverse kinematics.
- `collision_cost`: Cost for collision.
- `coll_vel_cost`: Cost for collision velocity.
- `cube_collision_cost`: Cost for cube collision.
- `cube_coll_vel_cost`: Cost for cube collision velocity.
- `bar_dist_collision_cost`: Cost for bar distance collision.
- `gripper_cost`: Cost for gripper.
- `grip_rew`: Reward for gripping.
- `grip_prop_rew`: Proportional reward for gripping.
- `place_rew`: Reward for placing.
- `rew_scale_factor`: Scaling factor for rewards.

- `steps_per_episode`: Number of steps per episode for RL training.
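For reference, the file can be inspected programmatically; the sketch below assumes the keys sit at the top level of `ur_gym.yaml` and that the path shown is where the file lives in your workspace (both are assumptions).

```python
import yaml  # requires PyYAML

# Hypothetical path; adjust to the actual location of ur_gym.yaml in the package.
with open("config/ur_gym.yaml") as f:
    cfg = yaml.safe_load(f)

print("Environment ID:", cfg.get("env_id"))
print("Steps per episode:", cfg.get("steps_per_episode"))
```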
This script (`start_training_tqc.py`), taken from SamsungLabs/tqc_pytorch, allows you to train an agent using the TQC (Truncated Quantile Critics) algorithm in the Gymnasium environments provided here.
rosrun <your_package> start_training_tqc.py [--env ENV] [--eval_freq EVAL_FREQ] [--max_timesteps MAX_TIMESTEPS]
[--seed SEED] [--n_quantiles N_QUANTILES] [--top_quantiles_to_drop_per_net TOP_QUANTILES_TO_DROP_PER_NET]
[--n_nets N_NETS] [--batch_size BATCH_SIZE] [--discount DISCOUNT] [--tau TAU] [--log_dir LOG_DIR]
[--prefix PREFIX] [--save_model]
- `--env`: (Default: "UR5PickandPlaceEnv-v0") The name of the Gymnasium environment.
- `--eval_freq`: (Default: 5000) Frequency (in time steps) at which to evaluate the agent's performance.
- `--max_timesteps`: (Default: 10000000000) Maximum number of time steps to run the environment.
- `--seed`: (Default: 0) Seed for random number generation.
- `--n_quantiles`: (Default: 25) Number of quantile samples to draw for each state-action pair.
- `--top_quantiles_to_drop_per_net`: (Default: 2) Number of top quantiles to drop per network.
- `--n_nets`: (Default: 5) Number of critic networks to use.
- `--batch_size`: (Default: 256) Batch size for both the actor and critic networks.
- `--discount`: (Default: 0.99) Discount factor for future rewards.
- `--tau`: (Default: 0.005) Rate at which to update target networks.
- `--log_dir`: (Default: "") Directory to save logs and trained models.
- `--prefix`: (Default: "") Prefix to add to log and model filenames.
- `--save_model`: (Default: True) Flag to save trained model and optimizer parameters.
rosrun <your_package> start_training_tqc.py --env UR5PickandPlaceEnv-v0 --eval_freq 10000 --max_timesteps 2000000 --seed 42 --save_model
This command runs the training script using the "UR5PickandPlaceEnv-v0" environment, evaluating the agent's performance every 10,000 time steps, training for a maximum of 2,000,000 time steps, with a random seed of 42, and saving the trained model and optimizer parameters.
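Once an environment ID is registered, it can also be instantiated directly through the Gymnasium API; the sketch below assumes the registration module has already been imported and that the required ROS/Gazebo nodes are running.

```python
import gymnasium as gym

# Assumes "UR5PickandPlaceEnv-v0" has been registered by the package's
# environment modules (the exact import needed for registration is not shown).
env = gym.make("UR5PickandPlaceEnv-v0")

obs, info = env.reset(seed=42)
for _ in range(100):
    action = env.action_space.sample()  # replace with a trained TQC policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```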
- Add basic infrastructure
- Add URReachEnv
- Add URGraspEnv
- Add URPickAndPlaceEnv
- Add object disturbance scenario
- Add human arm disturbance scenario
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See `LICENSE.txt` for more information.
Ammar Abbas - [email protected]
This project is part of the research activities done along the Collaborative Intelligence for Safety-Critical systems (CISC) project that has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 955901. (cicproject.eu)