Reward Learning with Intractable Normalizing Functions

This is a repository for our paper, "Reward Learning with Intractable Normalizing Functions". We include:

  • An implementation of a basic environment showcasing all normalizers from the paper, used to test and compare normalizer performance under different sampling conditions.
  • An implementation of a second basic 2D exploration environment showcasing all normalizers from the paper with an alternate state-space Q-function implementation.
  • An implementation of a Panda robot environment showcasing all normalizers from the paper, used to test and compare normalizer performance under different sampling conditions in a setting comparable to the user study.
  • A showcase of the wider implementation used in the real-world user study with the Franka Panda robot.

Requirements

Requirements are listed in requirements.txt:

  • python3
  • numpy >= 1.24.2
  • pybullet >= 3.5.2
  • scipy >= 1.8.0

Requirements can be installed using pip:

pip install -r requirements.txt

Instructions - Working Example

To run a demonstration of the normalizers, run python main.py. The default settings run a 1000-run test that sums the total error of each normalizer's approximation of theta.

You can also provide arguments to adjust the run parameters of the code:

--runs: sets the number of runs over which the errors are summed. Default is 1000.

--outer: sets the number of outer sampling loops used to sample different beliefs for the ideal human action. Default is 1000.

--inner: sets the number of inner sampling loops used to estimate the normalizers for each approach. Default is 50.
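
For example, a shorter sweep with fewer samples (the values below are arbitrary):

python main.py --runs 100 --outer 200 --inner 25

Conceptually, --outer and --inner control a nested sampling scheme: the outer loop samples candidate reward parameters theta (the belief over what the human is optimizing), and the inner loop draws samples to approximate the intractable normalizer for each candidate. The sketch below illustrates only that nested structure; it assumes a linear reward and a Boltzmann-style human model, and the function names and sampling distributions are hypothetical rather than taken from this repository.

import numpy as np

def reward(action, theta):
    # Hypothetical linear reward: R(a, theta) = theta . a
    return float(np.dot(theta, action))

def estimate_normalizer(theta, n_inner, rng, beta=1.0):
    # Inner loop: Monte Carlo estimate of Z(theta) = E_a[exp(beta * R(a, theta))]
    # over uniformly sampled candidate human actions.
    actions = rng.uniform(-1.0, 1.0, size=(n_inner, theta.shape[0]))
    return np.mean(np.exp(beta * actions @ theta))

def belief_weights(observed_action, thetas, n_inner, rng, beta=1.0):
    # Outer loop: weight each sampled reward hypothesis theta by the
    # (approximately normalized) likelihood of the observed human action.
    weights = np.empty(len(thetas))
    for i, theta in enumerate(thetas):
        Z = estimate_normalizer(theta, n_inner, rng, beta)
        weights[i] = np.exp(beta * reward(observed_action, theta)) / Z
    return weights / weights.sum()

rng = np.random.default_rng(0)
thetas = rng.uniform(-1.0, 1.0, size=(1000, 2))  # outer samples (--outer)
posterior = belief_weights(np.array([0.5, -0.2]), thetas, n_inner=50, rng=rng)  # --inner
print(thetas[np.argmax(posterior)])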

Instructions - Working Example 2

To run a demonstration of the normalizers, run python main.py (for the regular version) or python main_alt.py (for the Q-function version). The default settings run a 1000-run test that sums the total error of each normalizer's approximation of theta.

You can also provide arguments to adjust the run parameters of the code:

--runs: sets the number of runs over which the errors are summed. Default is 1000.

--outer: sets the number of outer sampling loops used to sample different beliefs for the ideal human action. Default is 1000.

--inner: sets the number of inner sampling loops used to estimate the normalizers for each approach. Default is 50.
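
For example, a shorter run of the Q-function variant (the values below are arbitrary):

python main_alt.py --runs 100 --outer 200 --inner 25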

Instructions - Panda_Sims

To run a demonstration of the normalizers, run python main.py. The default settings run a 100-run test that sums the total error of each normalizer's approximation of theta. For a graphical representation of the error and regret metrics afterward, run python plotter.py.

You can also provide arguments to adjust the run parameters of the code:

--runs: sets the number of runs over which the errors are summed. Default is 100.

--outer: sets the number of outer sampling loops used to sample different beliefs for the ideal human action. Default is 50.

--inner: sets the number of inner sampling loops used to estimate the normalizers for each approach. Default is 10.
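
For example, a quick pass with reduced sampling followed by plotting (the values below are arbitrary):

python main.py --runs 20 --outer 25 --inner 5
python plotter.py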
