Changelog

  • master
  • v0.2.28 (2015-07-03)
    • implement RDD.sortBy() and RDD.sortByKey()
    • additional unit tests
  • v0.2.24 (2015-06-16)
    • replace dill with cloudpickle in docs and tests
    • add tests with pypy and pypy3
  • v0.2.23 (2015-06-15)
    • added RDD.randomSplit()
    • saveAsTextFile() saves a single file if there is only one partition (instead of breaking it out into partition files)
  • v0.2.22 (2015-06-12)
    • added Context.wholeTextFiles()
    • improved RDD.first() and RDD.take(n)
    • added fileio.TextFile
  • v0.2.21 (2015-06-07)
    • added docstrings and created Sphinx documentation
    • implemented allowLocal in Context.runJob()
  • v0.2.19 (2015-06-04)
  • v0.2.16 (2015-05-31)
    • add values(), union(), zip(), zipWithUniqueId(), toLocalIterator()
    • improve aggregate() and fold()
    • add stats(), sampleStdev(), sampleVariance(), stdev(), variance()
    • make cache() and persist() do something useful
    • better partitioning in parallelize()
    • added a logo
    • fix foreach()
  • v0.2.10 (2015-05-27)
    • fix fileio.codec import
    • support http:// file sources
  • v0.2.8 (2015-05-26)
    • parallelized text file reading (and made it lazy)
    • parallelized take() and takeSample() so they only compute the required data partitions
    • add example: accessing the Human Microbiome Project
  • v0.2.6 (2015-05-21)
    • factor out fileio.fs and fileio.codec modules
    • merge WholeFile into File
    • improved handling of compressed files (backwards incompatible)
    • fileio interface changed to dump() and load() methods. Added make_public() for S3.
    • factor file related operations into fileio submodule
  • v0.2.2 (2015-05-18)
    • compression support: .gz, .bz2
  • v0.2.0 (2015-05-17)
    • proper handling of partitions
    • custom serializers, deserializers (for functions and data separately)
    • more tests for parallelization options
    • distributed jobs execute chains of map() operations on workers without sending intermediate results back to the master
    • a few more methods for RDDs implemented
  • v0.1.1 (2015-05-12)
    • implemented a few more RDD methods
    • changed handling of context in RDD
  • v0.1.0 (2015-05-09)