forked from gianluigigrandesso/cacto
Merge pull request gianluigigrandesso#7 from gianluigigrandesso/devel
CACTO-SL 12_12_2023
Showing 22 changed files with 4,356 additions and 1,380 deletions.
@@ -1,15 +1,36 @@
# CACTO: Continuous Actor-Critic algorithm with Trajectory Optimization

- ***main*** implements CACTO with state = *[q,v,t]* (joint angles, velocities and time). Inputs: test-n (default: 0), system-id (default:'-'), TO-method (default: 'pyomo'), and seed (default: None)
- ***TO*** implements the TO problem of the selected *system* whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. The TO problem is modelled in *Pyomo* and solved with *ipopt*.
**Files**:
- ***main*** implements CACTO with state = *[x,t]*. Inputs: test-n, system-id, seed, recover-training-flag, nb-cpus, and w-S.
- ***TO*** implements the TO problem of the selected *system* whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. The TO problem is modelled in *CasADi* and solved with *ipopt*.
- ***RL*** implements the actor-critic RL problem of the selected *system* whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. It creates the state trajectory and controls used to initialize TO.
- ***NeuralNetwork*** contains the functions to create the NN-models and to compute the quantities needed to update them.
- ***environment*** contains the training functions of the selected *system* (reset, step (both array and tensor version), and get-end-effector-position functions).
- ***environment*** contains the functions of the selected *system* (reset, step, and get-end-effector-position functions).
- ***environment_TO*** contains the functions of the selected *system* implemented with *CasADi* (step and get-end-effector-position functions).
- ***replay_buffer*** implements a replay buffer for storing and sampling transitions. It also implements a prioritized version of the replay buffer, using the segment tree structure implemented in ***segment_tree*** to efficiently compute the cumulative probabilities needed for sampling.
- ***robot_utils*** implements the dynamics of the selected *system* with Pinocchio.
- ***plot*** contains the plotting functions.
- ***system_conf*** configures the training for the selected *system*.
- ***inits*** contains the functions for the selected *system* to warm-start TO (ICS, CACTO's rollout, 0s).
- ***urdf*** contains the *system* URDF file.
- ***urdf*** contains the *system* URDF files (double integrator and manipulator).
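
The prioritized sampling that ***replay_buffer*** delegates to ***segment_tree*** can be illustrated with a sum segment tree. The sketch below is not the repository's code: the class `SumTree` and the function `sample_indices` are hypothetical names chosen for illustration, and it only shows the general technique of sampling an index with probability proportional to its priority in O(log n).

```python
import random


class SumTree:
    """Minimal sum segment tree (hypothetical helper, not the repo's segment_tree)."""

    def __init__(self, capacity):
        # Round the capacity up to a power of two so the tree is perfectly balanced.
        self.capacity = 1
        while self.capacity < capacity:
            self.capacity *= 2
        # Node i has children 2i and 2i+1; leaves live at [capacity, 2*capacity).
        self.nodes = [0.0] * (2 * self.capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh the partial sums above it."""
        i = index + self.capacity
        self.nodes[i] = priority
        i //= 2
        while i >= 1:
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]
            i //= 2

    def total(self):
        """Sum of all priorities (stored at the root)."""
        return self.nodes[1]

    def find_prefix_index(self, mass):
        """Return the leaf whose cumulative-priority interval contains `mass`."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if mass < self.nodes[left]:
                i = left
            else:
                mass -= self.nodes[left]
                i = left + 1
        return i - self.capacity


def sample_indices(tree, batch_size, rng=random):
    """Draw indices with probability proportional to their stored priority."""
    return [tree.find_prefix_index(rng.random() * tree.total())
            for _ in range(batch_size)]


if __name__ == "__main__":
    tree = SumTree(4)
    for idx, prio in enumerate([1.0, 0.0, 3.0, 0.0]):
        tree.update(idx, prio)
    print(sample_indices(tree, 5))
```

Each draw walks from the root to a leaf, so sampling and priority updates both cost a logarithmic number of steps in the buffer size, which is the reason a tree is preferred over recomputing the cumulative sum on every sample.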

**Systems**:
single integrator (system-id: single_integrator), double integrator (system-id: double_integrator), car (system-id: 'car'), car_park (system-id: 'car_park'), and 3 DOF planar manipulator (system-id: manipulator)

**Inputs**:
| Argument Name | Type | Default | Choices | Help |
|---------------|------|---------|---------|------|
| `--test-n` | int | 0 | | Test number |
| `--seed` | int | 0 | | Random and tf.random seed |
| `--system-id` | str | 'single_integrator' | single_integrator, double_integrator, car, car_park, manipulator, ur5 | System-id (single_integrator, double_integrator, car, car_park, manipulator, ur5) |
| `--recover-training-flag` | bool | False | True, False | Flag to recover training |
| `--nb-cpus` | int | 2 | | Number of TO problems solved in parallel |
| `--w-S` | float | 0 | | Sobolev training - weight of the value related error |

Example of usage:

*Systems*: double integrator (system-id: double_integrator) and 3 DOF manipulator (system-id: manipulator)
```python3 main.py --system-id='single_integrator' --seed=0 --nb-cpus=15 --w-S=1e-2 --test-n=0```
- The "single_integrator" system is selected;
- All the seeds are set to 0;
- 15 TO problems are solved in parallel (if enough resources are available);
- The weight of the value error is set to 1e-2 (the weight of the value-gradient error is set to 1). Note that w-S=0 corresponds to the standard CACTO algorithm (without Sobolev-Learning);
- Information about the test and its results is stored in the folder N_try_0.
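
For readers who want to see how the inputs listed in the table map onto a command line like the one above, here is a minimal `argparse` sketch. It is an assumption about how such a parser could look, not the contents of `main.py`: the helper names `str2bool` and `build_parser` are hypothetical, and the explicit boolean converter is a design choice made here because `argparse` with `type=bool` treats any non-empty string (including `"False"`) as `True`.

```python
import argparse

# System ids copied from the "Choices" column of the inputs table.
SYSTEM_IDS = ["single_integrator", "double_integrator", "car",
              "car_park", "manipulator", "ur5"]


def str2bool(value):
    """Convert 'True'/'False' style strings to bool (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="CACTO inputs (illustrative sketch)")
    parser.add_argument("--test-n", type=int, default=0, help="Test number")
    parser.add_argument("--seed", type=int, default=0,
                        help="Random and tf.random seed")
    parser.add_argument("--system-id", type=str, default="single_integrator",
                        choices=SYSTEM_IDS, help="System-id")
    parser.add_argument("--recover-training-flag", type=str2bool, default=False,
                        help="Flag to recover training")
    parser.add_argument("--nb-cpus", type=int, default=2,
                        help="Number of TO problems solved in parallel")
    parser.add_argument("--w-S", type=float, default=0.0,
                        help="Sobolev training - weight of the value related error")
    return parser


if __name__ == "__main__":
    # Same flags as the usage example; the shell would strip the quotes
    # around 'single_integrator' before the program sees them.
    example = ["--system-id=single_integrator", "--seed=0",
               "--nb-cpus=15", "--w-S=1e-2", "--test-n=0"]
    print(build_parser().parse_args(example))
```

`argparse` replaces the dashes in each flag with underscores, so the parsed values are available as `args.test_n`, `args.system_id`, `args.recover_training_flag`, `args.nb_cpus`, and `args.w_S`.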