CACTO: Continuous Actor-Critic algorithm with Trajectory Optimization

Files:

main implements CACTO with state = [x,t]. Inputs: test-n, system-id, seed, recover-training-flag, nb-cpus, and w-S.
TO implements the TO problem of the selected system whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. The TO problem is modelled in CasADi and solved with ipopt.
RL implements the acotr-critic RL problem of the selected system whose end effector has to reach a target state while avoiding an obstacle and ensuring low control. It creates the state trajectory and controls to initialize TO.
NeuralNetwork contains the functions to create the NN-models and to compute the quantities needed to update them.
environment contains the functions of the selected system (reset, step, and get-end-effector-position functions).
environment_TO contains the functions of the selected system implemented with CasADi (step, and get-end-effector-position functions).
replay_buffer implements a reply buffer where to store and sample transitions. It implements also a prioritized version of the replay buffer using a segment tree structure implemented in segment_tree to efficiently calculate the cumulative probability needed to sample.
robot_utils implements the dynamics of the selected system with Pinocchio.
plot contains the plot functions
system_conf configures the training for the selected system.
urdf contains system URDF file (double integrator and manipulator).

Systems: single integrator (system-id: single_integrator), double integrator (system-id: double_integrator), car (system-id: 'car'), car_park (system-id: 'car_park'), and 3 DOF planar manipulator (system-id: manipulator)

Inputs:

Argument Name	Type	Default	Choices	Help
`--test-n`	int	0		Test number
`--seed`	int	0		Random and tf.random seed
`--system-id`	str	'single_integrator'	single_integrator, double_integrator, car, car_park, manipulator, ur5	System-id (single_integrator, double_integrator, car, car_park, manipulator, ur5)
`--nb-cpus`	int	2		Number of TO problems solved in parallel
`--w-S`	float	0		Sobolev training - weight of the value related error
`--GPU-flag`	bool	N/A	include (True), omit (False)	Flag to use GPU for training
`--recover-training-flag`	bool	N/A	include (True), omit (False)	Flag to recover training

Example of usage:

python3 main.py --system-id='single_integrator' --seed=0 --nb-cpus=15 --w-S=1e-2 --test-n=0

The "single_integrator" system is selected;
All the seeds are set to 0;
15 TO problems are solved in parallel (if enough resources are available);
The weight of the value-error is set to 1e-2 (the value-gradient-error is set to 1). Note that w-S=0 corresponds to the standard CACTO algorithm (without Sobolev-Learning);
The information about the test and the results are stored in the folder N_try_0.
The actor critic model is trained on the CPU without training recovery

To run the same example on GPU and recover training, use the following command: python3 main.py --system-id='single_integrator' --seed=0 --nb-cpus=15 --w-S=1e-2 --test-n=0 --GPU-flag --recover-training-flag

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
Results Double Integrator/Results set test		Results Double Integrator/Results set test
Results Single Integrator/Results set test		Results Single Integrator/Results set test
__pycache__		__pycache__
urdf		urdf
CACTO DEPENDENCY DEBUGGING READ ME		CACTO DEPENDENCY DEBUGGING READ ME
NeuralNetwork.py		NeuralNetwork.py
README.md		README.md
RL.py		RL.py
TO.py		TO.py
conf_car.py		conf_car.py
conf_car_park.py		conf_car_park.py
conf_double_integrator.py		conf_double_integrator.py
conf_manipulator.py		conf_manipulator.py
conf_single_integrator.py		conf_single_integrator.py
conf_ur5.py		conf_ur5.py
environment.py		environment.py
environment_TO.py		environment_TO.py
generate_tests_set_script.py		generate_tests_set_script.py
installation instructions.txt		installation instructions.txt
main.py		main.py
misc debug notes do not open.txt		misc debug notes do not open.txt
plot_utils.py		plot_utils.py
replay_buffer.py		replay_buffer.py
requirements.txt		requirements.txt
robot_utils.py		robot_utils.py
segment_tree.py		segment_tree.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CACTO: Continuous Actor-Critic algorithm with Trajectory Optimization

About

Releases

Packages

Languages

nadimkanazi/cacto

Folders and files

Latest commit

History

Repository files navigation

CACTO: Continuous Actor-Critic algorithm with Trajectory Optimization

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages