Merge pull request gianluigigrandesso#7 from gianluigigrandesso/devel
CACTO-SL 12_12_2023
elisaalboni authored Dec 12, 2023
2 parents 9b395b8 + 4fbe2a9 commit 1a770c7
Showing 22 changed files with 4,356 additions and 1,380 deletions.
210 changes: 157 additions & 53 deletions NeuralNetwork.py

Large diffs are not rendered by default.

35 changes: 28 additions & 7 deletions README.md
@@ -1,15 +1,36 @@
# CACTO: Continuous Actor-Critic algorithm with Trajectory Optimization

- ***main*** implements CACTO with state = *[q,v,t]* (joint angles, velocities and time). Inputs: test-n (default: 0), system-id (default:'-'), TO-method (default: 'pyomo'), and seed (default: None)
- ***TO*** implements the TO problem of the selected *system* whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. The TO problem is modelled in *Pyomo* and solved with *ipopt*.
**Files**:
- ***main*** implements CACTO with state = *[x,t]*. Inputs: test-n, system-id, seed, recover-training-flag, nb-cpus, and w-S.
- ***TO*** implements the TO problem of the selected *system* whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. The TO problem is modelled in *CasADi* and solved with *ipopt*.
- ***RL*** implements the actor-critic RL problem of the selected *system* whose end effector has to reach a target state while avoiding an obstacle and ensuring low control effort. It creates the state trajectory and controls to initialize TO.
- ***NeuralNetwork*** contains the functions to create the NN-models and to compute the quantities needed to update them.
- ***environment*** contains the training functions of the selected *system* (reset, step (both array and tensor version), and get-end-effector-position functions).
- ***environment*** contains the functions of the selected *system* (reset, step, and get-end-effector-position functions).
- ***environment_TO*** contains the functions of the selected *system* implemented with *CasADi* (step, and get-end-effector-position functions).
- ***replay_buffer*** implements a replay buffer to store and sample transitions. It also implements a prioritized version of the replay buffer, using a segment tree structure implemented in ***segment_tree*** to efficiently compute the cumulative probabilities needed for sampling.
- ***robot_utils*** implements the dynamics of the selected *system* with Pinocchio.
- ***plot*** contains the plot functions.
- ***system_conf*** configures the training for the selected *system*.
- ***inits*** contains the functions for the selected *system* to warm-start TO (ICS, CACTO's rollout, 0s).
- ***urdf*** contains *system* URDF file.
- ***urdf*** contains the *system* URDF files (double integrator and manipulator).
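
The ***replay_buffer*** entry above mentions a segment tree used to compute cumulative probabilities for prioritized sampling. The idea can be sketched as follows. This is a generic sum-tree illustration, not the code in ***segment_tree***: the class name `SumTree` and its methods are hypothetical, and the capacity is assumed to be a power of two to keep the layout simple.

```python
import random


class SumTree:
    """Hypothetical minimal sum tree (illustration only, not the repo's segment_tree)."""

    def __init__(self, capacity: int):
        # Assumption: capacity is a power of two, so leaves fill one tree level.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # tree[1] is the root; leaves live at tree[capacity : 2 * capacity].
        self.tree = [0.0] * (2 * capacity)

    def update(self, idx: int, priority: float) -> None:
        """Set the priority of leaf `idx` and refresh the sums above it."""
        pos = idx + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self) -> float:
        """Sum of all priorities."""
        return self.tree[1]

    def find_prefix_sum_idx(self, mass: float) -> int:
        """Return the leaf whose cumulative-priority interval contains `mass`."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if mass < self.tree[left]:
                pos = left
            else:
                mass -= self.tree[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self) -> int:
        """Draw a leaf index with probability proportional to its priority."""
        return self.find_prefix_sum_idx(random.uniform(0.0, self.total()))


tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
```

Both `update` and `find_prefix_sum_idx` walk a single root-to-leaf path, so each costs O(log n) instead of the O(n) needed to rebuild a cumulative sum over all stored transitions.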

**Systems**:
single integrator (system-id: single_integrator), double integrator (system-id: double_integrator), car (system-id: car), car_park (system-id: car_park), and 3 DOF planar manipulator (system-id: manipulator)

**Inputs**:
| Argument Name | Type | Default | Choices | Help |
|-------------------------|--------|---------|------------------------------------------------------------------------------------------------------|-------------------------------------|
| `--test-n` | int | 0 | | Test number |
| `--seed` | int | 0 | | Random and tf.random seed |
| `--system-id` | str | 'single_integrator' | single_integrator, double_integrator, car, car_park, manipulator, ur5 | System-id (single_integrator, double_integrator, car, car_park, manipulator, ur5) |
| `--recover-training-flag` | bool | False | True, False | Flag to recover training |
| `--nb-cpus` | int | 2 | | Number of TO problems solved in parallel |
| `--w-S` | float | 0 | | Sobolev training - weight of the value related error |


Example of usage:

*Systems*: double integrator (system-id: double_integrator) and 3 DOF manipulator (system-id: manipulator)
```python3 main.py --system-id='single_integrator' --seed=0 --nb-cpus=15 --w-S=1e-2 --test-n=0```
- The "single_integrator" system is selected;
- All the seeds are set to 0;
- 15 TO problems are solved in parallel (if enough resources are available);
- The weight of the value error is set to 1e-2 (the weight of the value-gradient error is set to 1). Note that w-S=0 corresponds to the standard CACTO algorithm (without Sobolev learning);
- Information about the test and its results is stored in the folder N_try_0.
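
To make the role of `w-S` concrete, the following is a generic sketch of a Sobolev-style critic loss that combines a value error with a value-gradient error. It is not the loss implemented in ***NeuralNetwork***: the function name `sobolev_loss`, its arguments, the use of mean squared errors, and the handling of `w_S = 0` as a plain value-error loss are all assumptions made for illustration.

```python
from typing import Sequence


def sobolev_loss(values: Sequence[float],
                 value_targets: Sequence[float],
                 grads: Sequence[Sequence[float]],
                 grad_targets: Sequence[Sequence[float]],
                 w_S: float) -> float:
    """Hypothetical Sobolev-style loss: w_S * value error + 1 * gradient error."""
    n = len(values)
    # Mean squared error between predicted and target values.
    value_err = sum((v - t) ** 2 for v, t in zip(values, value_targets)) / n
    # Mean squared Euclidean norm of the value-gradient mismatch.
    grad_err = sum(
        sum((g - gt) ** 2 for g, gt in zip(grad, grad_target))
        for grad, grad_target in zip(grads, grad_targets)
    ) / n
    if w_S == 0:
        # Assumption: w_S = 0 disables Sobolev learning, leaving the value error only.
        return value_err
    return w_S * value_err + grad_err


loss = sobolev_loss(
    values=[1.0, 2.0],
    value_targets=[0.0, 2.0],
    grads=[[1.0, 0.0], [0.0, 0.0]],
    grad_targets=[[0.0, 0.0], [0.0, 0.0]],
    w_S=1e-2,
)
```

With a gradient weight fixed at 1, a small `w_S` such as 1e-2 makes the gradient mismatch dominate the loss while still anchoring the predicted values to their targets.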