MineRLTreechopVectorObf-v0
MineRLNavigateVectorObf-v0
MineRLNavigateExtremeVectorObf-v0
MineRLNavigateDenseVectorObf-v0
MineRLNavigateExtremeDenseVectorObf-v0
MineRLObtainDiamondVectorObf-v0 (competition evaluation task)
MineRLObtainDiamondDenseVectorObf-v0
We will ignore this because it is a task subset of Obtain Diamond; however, the human data from this category may still be valuable.
MineRLObtainIronPickaxeVectorObf-v0
MineRLObtainIronPickaxeDenseVectorObf-v0
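For reference, a minimal sketch of instantiating one of these environments and stepping it with a random policy, assuming a standard minerl + gym installation and that the Minecraft instance can launch locally:

```python
import gym
import minerl  # importing minerl registers the MineRL environments with gym

# Competition evaluation task. Observations are a dict with "pov" (64, 64, 3)
# and "vector" (64,); actions are a dict with a single "vector" (64,) entry.
env = gym.make("MineRLObtainDiamondVectorObf-v0")

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # placeholder random policy
    obs, reward, done, _ = env.step(action)
    total_reward += reward
env.close()
print("episode reward:", total_reward)
```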
- Final episode reward
- Final episode reward normalized by human performance (see the sketch after this list)
- Sample efficiency (performance after 0, 100k, 1M, and 8M environment samples)
- Episode reward curves
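As a rough illustration of the human-normalized metric (the reference scores below are assumptions for illustration, not values fixed by the competition), one option is to rescale the episode reward between a random-policy baseline and the average human score:

```python
def human_normalized_score(agent_score, random_score, human_score):
    """Rescale episode reward so 0.0 matches a random policy and 1.0 matches humans.

    random_score and human_score are assumed to come from a random-agent
    evaluation and the MineRL human demonstrations, respectively.
    """
    return (agent_score - random_score) / (human_score - random_score)

# e.g. agent 32.0, random baseline 0.5, human average 60.0 -> ~0.53
print(human_normalized_score(32.0, 0.5, 60.0))
```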
- Continuous
- Naturally supported by the MineRL vector action space (shape (64,))
- Discrete
- Run k-means clustering on actions from the human data and use the cluster centers as a discretized action space (see the sketch after this list)
- Helps lessen the exploration problem at the cost of restricting the action space
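A minimal sketch of the k-means discretization, assuming the minerl data pipeline (minerl.data.make / batch_iter yielding (state, action, reward, next_state, done) tuples), the human dataset downloaded to ./data, and a hypothetical cluster count of 32:

```python
import numpy as np
import minerl
from sklearn.cluster import KMeans

N_CLUSTERS = 32  # hypothetical; tune against validation performance

# Assumes the MineRL human dataset has been downloaded to ./data.
data = minerl.data.make("MineRLObtainDiamondVectorObf-v0", data_dir="data")

action_vectors = []
for _, action, _, _, _ in data.batch_iter(batch_size=16, seq_len=32, num_epochs=1):
    action_vectors.append(action["vector"].reshape(-1, 64))
action_vectors = np.concatenate(action_vectors)

kmeans = KMeans(n_clusters=N_CLUSTERS, random_state=0).fit(action_vectors)

# The discrete policy picks a cluster index; the corresponding center is
# passed to env.step as the continuous (64,) obfuscated action.
def index_to_action(cluster_index):
    return {"vector": kmeans.cluster_centers_[cluster_index]}
```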
- Tuple Observations (Image (64,64,3), vector (64,))
- Convolutional neural network for the image observations
- Concatenate the CNN's hidden output with the vector observations
- Feed the concatenation into a feed-forward network (and possibly an RNN) to produce a latent state representation
- Use the latent state representation as input to the policy, value, and Q networks
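A sketch of this encoder in PyTorch; the layer sizes are assumptions and the optional RNN is omitted for brevity:

```python
import torch
import torch.nn as nn

class LatentStateNet(nn.Module):
    """Encodes the (64, 64, 3) image and the (64,) vector obs into one latent state."""

    def __init__(self, latent_dim=256):
        super().__init__()
        # Small Atari-style CNN for the 64x64 RGB image (layer sizes are assumptions).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, 3, 64, 64)).shape[1]
        # Feed-forward head over the concatenated CNN features and vector obs;
        # its output is the latent state shared by the policy / value / Q networks.
        self.mlp = nn.Sequential(
            nn.Linear(cnn_out + 64, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
        )

    def forward(self, image, vector):
        # image: (B, 64, 64, 3) uint8 tensor, vector: (B, 64) float tensor
        x = image.permute(0, 3, 1, 2).float() / 255.0
        features = torch.cat([self.cnn(x), vector], dim=1)
        return self.mlp(features)
```

Separate policy, value, and Q heads would then take this latent output as their input.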
- Online exploration
- Use the algorithm's default exploration
- Learn the policy from environment-sampled data
- Both on-policy and off-policy RL algorithms
- No exploration
- Learn policy from human data
- Only off-policy RL algorithms
- Online exploration in the environment
- Learn from both environment-sampled data and human data (see the sketch below)
- Only off-policy RL algorithms
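For the setup that mixes human data with online samples, one possible design (an assumption, not a fixed plan) is a replay buffer that draws a fixed fraction of each batch from the demonstrations:

```python
import numpy as np

class MixedReplayBuffer:
    """Samples part of each batch from human demonstrations and the rest from
    online environment transitions (demo_fraction is a hypothetical knob)."""

    def __init__(self, demo_transitions, capacity=100_000, demo_fraction=0.25):
        self.demos = demo_transitions  # list of (obs, act, rew, next_obs, done)
        self.online = []
        self.capacity = capacity
        self.demo_fraction = demo_fraction

    def add(self, transition):
        # Store an online transition, discarding the oldest once full.
        self.online.append(transition)
        if len(self.online) > self.capacity:
            self.online.pop(0)

    def sample(self, batch_size):
        n_demo = int(batch_size * self.demo_fraction)
        n_online = batch_size - n_demo
        demo_idx = np.random.randint(len(self.demos), size=n_demo)
        online_idx = np.random.randint(len(self.online), size=n_online)
        return [self.demos[i] for i in demo_idx] + [self.online[i] for i in online_idx]
```

The same buffer degenerates to the pure-offline setup with demo_fraction=1.0 and no calls to add.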