Adam Novak edited this page Apr 15, 2020 · 31 revisions

Maintenance:

  • Release on the 1st Wednesday of each month.
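As a small illustration of the release cadence above, the date of the first Wednesday of a month can be computed with the standard library. This is only a sketch; the helper name `first_wednesday` is hypothetical and not part of Toil.

```python
import datetime


def first_wednesday(year: int, month: int) -> datetime.date:
    """Return the date of the first Wednesday in the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Wednesday is 2.
    # The modulo gives the number of days to step forward (0 to 6).
    offset = (2 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)
```

For example, `first_wednesday(2020, 5)` gives 6 May 2020.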

Prioritized list of Toil projects

Implementation time estimates are in engineering days and are rough, back-of-the-envelope figures.

  1. 🎉DONE More efficient Kubernetes support
    • One of:
      • Host-path caching
      • Toil-integrated within-pod scheduler
    • Time estimate: not scoped yet
  2. 🎉DONE More robust Kubernetes support
    • Handle Kubernetes communication timeouts without restarting the leader
    • Time estimate: not scoped yet
  3. 🎉DONE Rework SingleMachineBatchSystem to eliminate thread limit exhaustion issues; reduce thread usage in singleMachine mode
    • Time estimate: not scoped yet
  4. CWL 1.1+ Support
    • 13 failing CWL 1.1 conformance tests currently
    • Conditional support (CWL 1.2)
    • Time estimate: not scoped yet
  5. Move away from Mesos before or as Ubuntu 16.04 goes out of support
    • Probably in favor of auto-deployed Kubernetes in some form
    • Time estimate: not scoped yet
  6. Running in-house VG WDL workflows
    • Time estimate: not scoped yet
  7. WDL compliance test suite
    • Time estimate: not scoped yet
  8. Increase test coverage
    • Time estimate: not scoped yet
  9. Automatic idle worker termination and fixes to ignored nodes.
    • Time estimate: not scoped yet
  10. Updates on caching (should we enable it by default?).
    • Time estimate: not scoped yet
  11. Incorporate a Cactus integration test to better support Cactus
    • Time estimate: not scoped yet
  12. Improved ease of debugging
    • This is more of an ongoing task than a clearly defined project.
    • Adjust log levels based on experience.
    • Time estimate: not scoped yet
  13. More/better HPC Queueing System Support
    • Time estimate: not scoped yet

Contributions welcome

  1. Google Job Store Support
    • Time estimate: not scoped yet
  2. Update boto libraries to boto3
    • Time estimate: not scoped yet
  3. Move from SimpleDB to a better supported service
    • Time estimate: not scoped yet
  4. More scalable Kubernetes support
    • moving to watches
    • handling more pods in queue than we can loop over before our continue tokens expire
    • Time estimate: not scoped yet
  5. Restart/recovery improvements
    • Changing CWL parameters
    • Managing a failed task that cannot be recovered in a large pipeline
    • Checkpointing
    • Time estimate: not scoped yet
  6. Better support for heterogeneous tasks (e.g. customizing disk size per instance type, (maybe) FPGA support for DRAGEN).
    • Time estimate: not scoped yet
  7. AWS custom/multi security group support
    • Time estimate: not scoped yet
  8. AWS multi-zone support
    • Time estimate: not scoped yet
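
For contributors looking at the "More scalable Kubernetes support" item, the continue-token problem can be sketched as follows. Kubernetes list calls are paginated with a `limit` and an opaque continue token, and the API server rejects a token that has expired (HTTP 410 Gone), after which the listing has to start over. The sketch below is not Toil's implementation: `ContinueTokenExpired`, `PageFetcher`, and `list_all` are hypothetical names, and the page fetcher is an injected stand-in for a real client call so that the example stays self-contained.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ContinueTokenExpired(Exception):
    """Hypothetical stand-in for the API server's 410 Gone response."""


# Hypothetical fetcher type: takes (limit, continue_token) and
# returns (items, next_token). next_token is None on the last page.
PageFetcher = Callable[[int, Optional[str]], Tuple[List[Dict], Optional[str]]]


def list_all(fetch_page: PageFetcher, limit: int = 100,
             max_restarts: int = 3) -> List[Dict]:
    """Collect every item across pages, restarting if the token expires."""
    seen: Dict[str, Dict] = {}
    token: Optional[str] = None
    restarts = 0
    while True:
        try:
            items, token = fetch_page(limit, token)
        except ContinueTokenExpired:
            if restarts >= max_restarts:
                raise
            # The partial listing may be stale, so discard it and
            # start again from the first page.
            restarts += 1
            seen.clear()
            token = None
            continue
        for item in items:
            seen[item["name"]] = item
        if not token:
            return list(seen.values())
```

Restarting from scratch only helps when a full pass can finish within the token lifetime. If the queue is too large for that, which is the case the roadmap item describes, every pass expires again, and that is why moving to watches is listed alongside it.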