Demonstration-generation and training scripts for fly-craft.
Sample goals from $V \times M \times X = [v_{min}:v_{max}:v_{interval}] \times [\mu_{min}:\mu_{max}:\mu_{interval}] \times [\chi_{min}:\chi_{max}:\chi_{interval}]$ with a PID controller and save the sampled trajectories in demonstrations/data/{step-frequence}hz_{$v_{interval}$}_{$\mu_{interval}$}_{$\chi_{interval}$}_{data-dir-suffix}
# sample trajectories (single process)
python demonstrations/rollout_trajs/rollout_by_pid.py --data-dir-suffix v5 --step-frequence 10 --v-min 100 --v-max 110 --v-interval 10 --mu-min -5 --mu-max 5 --mu-interval 5 --chi-min -5 --chi-max 5 --chi-interval 5
# sample trajectories in parallel with Ray
python demonstrations/rollout_trajs/rollout_by_pid_parallel.py --data-dir-suffix v4 --step-frequence 10 --v-min 100 --v-max 110 --v-interval 10 --mu-min -5 --mu-max 5 --mu-interval 5 --chi-min -5 --chi-max 5 --chi-interval 5
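For intuition, the goal grid defined by the flags above can be enumerated in a few lines of Python. This is an illustrative sketch only (the inclusive-endpoint handling is an assumption), not the rollout script's actual logic.

import itertools
import numpy as np

# goal grid from the flags above; np.arange excludes the stop value,
# so extend it by one interval to keep v_max/mu_max/chi_max in the grid (assumption)
goals = itertools.product(
    np.arange(100, 110 + 10, 10),  # v:   100, 110
    np.arange(-5, 5 + 5, 5),       # mu:  -5, 0, 5
    np.arange(-5, 5 + 5, 5),       # chi: -5, 0, 5
)
for v, mu, chi in goals:
    print(f"rolling out the PID controller toward goal (v={v}, mu={mu}, chi={chi})")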
Update the demonstrations in {demos-dir} with the policy in {policy-ckpt-dir}
python demonstrations/rollout_trajs/rollout_by_policy_and_update_demostrations.py --policy-ckpt-dir checkpoints/sac_her/best_model --env-config-dir configs/env/env_config_for_sac.json --demos-dir demonstrations/data/10hz_10_5_5_v2
Augment the sampled trajectories in {demos-dir}
python demonstrations/utils/augment_trajs.py --demos-dir demonstrations/data/10hz_10_5_5_v2
Label the demonstrations in {demos-dir} with rewards (--traj-prefix is the prefix of the trajectory CSV filenames in the demonstrations directory)
python demonstrations/utils/label_transitions_with_rewards.py --demos-dir demonstrations/data/10hz_10_5_5_test --traj-prefix my_f16trace
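Conceptually, the labeling step appends a per-transition reward column to each trajectory CSV. The sketch below is a hypothetical illustration only: the column name, the placeholder reward, and the CSV layout are all assumptions, not the script's actual logic.

import glob
import pandas as pd

# hypothetical illustration: walk every trajectory CSV and add a "reward" column
for path in glob.glob("demonstrations/data/10hz_10_5_5_test/my_f16trace*.csv"):
    df = pd.read_csv(path)
    df["reward"] = 0.0  # placeholder; the real script derives rewards from each transition
    df.to_csv(path, index=False)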
Process the demonstrations (normalize observations and actions, and concatenate all CSV files) and cache the processed np.ndarray objects
python demonstrations/utils/load_dataset.py --demo-dir demonstrations/data/10hz_10_5_5_iter_1_aug --demo-cache-dir demonstrations/cache/10hz_10_5_5_iter_1_aug
Note: the cache directory must be consistent with the "data_cache_dir" entry in the training configurations.
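Consuming the cache afterwards amounts to loading np.ndarray objects; a minimal sketch, assuming the cache stores observation and action arrays as .npy files (the file names here are hypothetical):

import os
import numpy as np

cache_dir = "demonstrations/cache/10hz_10_5_5_iter_1_aug"
# hypothetical file names; check the cache directory produced by load_dataset.py
observations = np.load(os.path.join(cache_dir, "observations.npy"))
actions = np.load(os.path.join(cache_dir, "actions.npy"))
assert len(observations) == len(actions)

Train policies with IRPO (judging by the script names: BC pre-training, RL fine-tuning, and RL fine-tuning with BC regularization):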
python train_scripts/IRPO/train_with_bc_ppo.py --config-file-name configs/train/IRPO/ppo/easy/ppo_bc_config_10hz_128_128_easy_1.json
python train_scripts/IRPO/train_with_rl_ppo.py --config-file-name configs/train/IRPO/ppo/easy/ppo_bc_config_10hz_128_128_easy_1.json
python train_scripts/IRPO/train_with_rl_bc_ppo.py --config-file-name configs/train/IRPO/ppo/easy/ppo_bc_config_10hz_128_128_easy_1.json
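# train SAC without HER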
python train_scripts/train_with_rl_sac_her.py --config-file-name configs/train/sac/sac_without_her/sac_config_10hz_128_128_1.json
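# train SAC with HER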
python train_scripts/train_with_rl_sac_her.py --config-file-name configs/train/sac/sac_her/sac_config_10hz_128_128_1.json
# test SAC on a non-Markov reward (NMR) over the last 10 observations (1 s at 10 Hz)
python train_scripts/train_with_rl_sac_her.py --config-file-name configs/train/sac/easy_her_sparse_negative_non_markov_reward_persist_1_sec/sac_config_10hz_128_128_1.json
# test SAC on NMR over the last 20 observations (2 s)
python train_scripts/train_with_rl_sac_her.py --config-file-name configs/train/sac/easy_her_sparse_negative_non_markov_reward_persist_2_sec/sac_config_10hz_128_128_1.json
# test SAC on NMR over the last 30 observations (3 s)
python train_scripts/train_with_rl_sac_her.py --config-file-name configs/train/sac/easy_her_sparse_negative_non_markov_reward_persist_3_sec/sac_config_10hz_128_128_1.json
# try to solve the NMR setting with frame stacking
python train_scripts/train_with_rl_sac_her.py --config-file-name configs/train/sac/hard_her_framestack_sparse_negative_non_markov_reward_persist_1_sec/sac_config_10hz_128_128_1.json
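Frame stacking addresses an NMR by folding the last k observations into the agent's input, so a reward over recent history becomes (approximately) Markov in the stacked observation. A minimal sketch with gymnasium's stock wrapper, assuming a flat Box observation space (the repo's configs may wire this differently, and fly-craft's dict-style goal observations would need extra handling):

import numpy as np
import gymnasium as gym
from gymnasium.wrappers import FrameStack  # renamed FrameStackObservation in gymnasium >= 1.0

# stand-in environment; at 10 Hz, stacking 10 frames covers a 1-second reward window
env = FrameStack(gym.make("CartPole-v1"), num_stack=10)
obs, info = env.reset(seed=0)
print(np.asarray(obs).shape)  # (10, 4): 10 stacked copies of the 4-dim observation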
The script train_scripts/IRPO/evaluate/rollout_one_trajectory.py can be used to generate .acmi files, which can be loaded into Tacview to visualize the flight trajectory.
python train_scripts/IRPO/evaluate/rollout_one_trajectory.py --config-file-name configs/train/IRPO/ppo/easy/ppo_bc_config_10hz_128_128_easy_1.json --algo rl_bc --save-acmi --use-fixed-target --target-v 210 --target-mu 5 --target-chi 10 --save-dir train_scripts/IRPO/evaluate/rolled_out_trajs/
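The .acmi output is Tacview's plain-text format; the sketch below writes a minimal hand-made file so the structure is visible (object id, coordinates, and timestamps are made-up sample values, not what the script emits):

# illustrative, hand-written ACMI file; lines follow Tacview's text format:
# a header, then "#<seconds>" time markers, then "<id>,T=lon|lat|alt,..." object records
with open("demo.acmi", "w") as f:
    f.write("FileType=text/acmi/tacview\n")
    f.write("FileVersion=2.1\n")
    f.write("0,ReferenceTime=2024-07-01T00:00:00Z\n")
    f.write("#0.0\n")
    f.write("A0100,T=116.0|40.0|1000,Name=F16\n")  # longitude|latitude|altitude(m)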
The script train_scripts/IRPO/evaluate/evaluate_policy_by_success_rate.py evaluates trained policies statistically, reporting the policy's success rate, cumulative reward, and trajectory length.
python train_scripts/IRPO/evaluate/evaluate_policy_by_success_rate.py --config-file-name configs/train/IRPO/ppo/easy/ppo_bc_config_10hz_128_128_easy_1.json --algo rl_bc --seed 11 --n-envs 8 --n-eval-episode 100
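Similar statistics can also be gathered directly with stable-baselines3's evaluate_policy; a minimal sketch (the checkpoint path and environment id are placeholders):

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

model = PPO.load("checkpoints/ppo/best_model")  # placeholder checkpoint path
env = gym.make("FlyCraft-v0")  # placeholder env id; may require importing the flycraft package first
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100, deterministic=True)
print(f"mean episode reward over 100 episodes: {mean_reward:.2f} +/- {std_reward:.2f}")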
Cite as
@misc{gong2024flycraftexamples,
  title  = {fly-craft-examples},
  author = {Gong, Xudong},
  year   = {2024},
  note   = {\url{https://github.com/GongXudong/fly-craft-examples} [Accessed: 2024-07-01]},
}