
Changes between v1.6.1 and v1.7.0 rc1

sebhtml edited this page Sep 26, 2011 · 11 revisions

Most significant changes

  • Output files are written to the directory specified by -o (previously it was a file prefix)
  • Round-robin reception of messages
  • Bloom filter
  • Illumina mate-pairs support
  • Job checkpointing
  • New scaffolding algorithm
  • New assembly engine for the extension of seeds with mate-pairs (NovaEngine)
  • Parallel file partitioning
  • Network latency testing
  • Compiles cleanly on 32-bit systems
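The idea behind the round-robin reception of messages can be pictured with a small sketch. It assumes one inbox per source rank and a receiver that resumes scanning at the rank after the one it served last, so a busy rank cannot starve the others. `Message`, `RoundRobinReceiver`, `deliver()` and `receive()` are hypothetical names. Ray's real implementation is tied to its MPI communication layer (see commits 714875d and ec32cf4) and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Hypothetical message type; not Ray's actual message class.
struct Message {
    int source;
    std::string payload;
};

// Sketch of round-robin reception: one inbox per source rank.
// receive() starts scanning at the rank after the one served last,
// so every rank with pending messages gets a turn.
class RoundRobinReceiver {
public:
    explicit RoundRobinReceiver(std::size_t ranks) : inboxes_(ranks), next_(0) {
        assert(ranks > 0);  // the modulo below needs at least one rank
    }

    // Queue a message in the inbox of its source rank.
    void deliver(const Message& m) { inboxes_.at(m.source).push_back(m); }

    // Return the next message in round-robin order, or nothing if all inboxes are empty.
    std::optional<Message> receive() {
        const std::size_t n = inboxes_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t rank = (next_ + i) % n;
            if (!inboxes_[rank].empty()) {
                Message m = inboxes_[rank].front();
                inboxes_[rank].pop_front();
                next_ = (rank + 1) % n;  // resume after the rank just served
                return m;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<std::deque<Message>> inboxes_;
    std::size_t next_;
};
```

With three ranks and messages queued from ranks 0, 0, 0 and then 2, the receiver yields them in the source order 0, 2, 0, 0 instead of draining rank 0 first.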

List of commits

v1.7.0-rc1 Mon Sep 26 16:59:17 2011 -0400 58 commits

Sébastien Boisvert [email protected] 58 commits

   f75bfa6 Removed inline code because compilers optimize the code anyway.
   6b345f9 Changed mpirun to mpiexec as mpiexec is in the standard.
   ec32cf4 Merged the persistent communication layer with the round-robin reception.
   714875d Implemented a round-robin algorithm for the reception of messages.
   63a3131 Write raw data for network tests if -test-network-only is provided.
   dc196b2 Added option -write-network-test-raw-data.
   fd3c033 Added a time period during which other messages are more important than urgent messages.
   7c2d6f6 Fixed a bug in the code that loads the checkpoint GenomeGraph.
   58733de Enabled the communication optimizer for the network test too.
   a08888b The option -show-communication-events now shows all messages with overlays too.
   c97ea8e Added a communication optimizer with urgent messages.
   b63035e Added overlays for option -show-communication-events.
   9779ced Added option -show-communication-events.
   7619e84 Added option -show-read-placement.
   d6b1f5d Added more details in the output of -run-profiler.
   6aa7b59 Removed the dependency on clock_gettime.
   078148a Added option -debug-scaffolder.
   9181dd8 Added assertions and fixed a bug in GridTable.
   0df6381 Fixed some divisions by 0 in the scaffolder.
   0554af1 Regression bug on phix system test fixed.
   76b0b0d Fixed a bug in JoinerWorker in which two overlapping paths would not be joined together.
   bc7b314 Added some debugging information for -debug-fusions.
   783d522 Fixed a communication problem in MessageProcessor.
   4e2b599 The number of enabled MPI ranks can be changed during the network test by changing a variable in the source code.
   95d825d Merge branch 'master' of [email protected]:sebhtml/ray
   bdf49a2 Fixed an integer overflow in the computation of standard deviations.
   53eb012 Fixed scripts to accommodate the new prefix directory option.
   2828d85 Added more debugging information in the scaffolding test.
   2afa3cf Updated for ScaffoldLinks.txt format to v2.0.
   5cd9848 Added some documentation for Infiniband.
   1dab4e3 Restored the default number of words in the network test to 500.
   21e312c Modified some scaffolding code to obtain the correct side of a contig when it allows both.
   a908567 Implemented a new greedy scaffolding algorithm as discussed with François Laviolette.
   81be6d9 Added the standard deviation in ScaffoldLinks.txt
   32646f2 Added non-persistent MPI communication just to compare.
   06c7695 Added information in ScaffoldLinks.txt
   f0fcdd3 Changed the default message size for network testing.
   eb9f8c7 Limiting scaffolding links to vertices that have one parent, one child and a coverage value near the peak.
   b5a98d2 RayVersion and RayCommand are now written (a bug was introduced).
   8dc6766 Fixed the code that counts the number of extended seeds.
   862b5fc Added option -write-contig-paths to write contig paths with coverage values. This is enabled by default.
   1db2865 The checkpoint ContigPaths is now fully operational (read and write).
   edbfcd7 The checkpoint ContigPaths is now written on demand.
   9150bc1 Removed the minimum number of raw scaffolding links.
   77e4a01 Modified the scaffolder routines to check the vertex coverage values in paths.
   18edfd0 Fixed the content of a displayed text in the fusion task creator.
   eaadc7c Modified the Sun Grid Engine job template to erase the directory before running the whole thing.
   672b5f3 Changed --oneline to --pretty=oneline for compatibility with older versions of git.
   fdad4e9 Cleaned some code in the task creator routines for edge purging.
   b9fe00b Cleaned some code in the task creator routines for edge purging.
   e6a4fd8 Fixed a bug in the virtual processor wherein it was not force-flushing messages when needed.
   32c1384 Corrected the number of flowed vertices in the seed extension.
   6538859 Improved heuristics for selection.
   221e7b4 Implemented the reverse strand case in JoinerWorker.
   2754a54 Added 3 unit tests for NovaEngine and improved the heuristics.
   f704b0f Corrected positions in JoinerWorker when on the other strand.
   ef06930 The default is now ASSERT=y in the Makefile.
   e77388e All output files are written in a directory provided with option -o.
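Commit bdf49a2 fixes an integer overflow in the computation of standard deviations but does not say where the overflow occurred. The sketch below illustrates one common way to avoid this class of overflow. It computes the population standard deviation of hypothetical distance observations with Welford's online algorithm in `double`, instead of accumulating a sum of squares in a 32-bit integer. `populationStandardDeviation` is a hypothetical helper, not Ray code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Population standard deviation via Welford's online algorithm.
// Squaring a value such as 100000 already exceeds a 32-bit int;
// accumulating deviations in double avoids that overflow.
double populationStandardDeviation(const std::vector<std::int64_t>& values) {
    assert(!values.empty());  // undefined for an empty sample
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t count = 0;
    for (std::int64_t v : values) {
        ++count;
        const double x = static_cast<double>(v);
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);  // second factor uses the updated mean
    }
    return std::sqrt(m2 / static_cast<double>(count));
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the result is 2. For 100000 and 100002 it is 1, with no intermediate value near the 32-bit limit.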

v1.7.0-beta1 Tue Sep 6 12:02:43 2011 -0400 50 commits

Sébastien Boisvert [email protected] 17 commits

   db6b8d3 reset() must be called in the constructor.
   d946b29 Fixed compilation warning.
   8200a28 Merge branch 'master' of github.com:sebhtml/ray
   78af599 Adding new files in Documentation/.
   c6fb5d0 Added INSTALL.txt.
   151028b Migrated some code only utilised by the scaffolder.
   345d267 Added \author tag to all classes.
   952e1a2 Updated Documentation files.
   068ce0e Added VirtualProcessor initialization.
   8a2be4a Removed MyForest and its iterator minion.
   4eebe61 Introducing the VirtualProcessor class.
   4897465 Fixed compilation warnings for 32-bit systems.
   91799e1 Fixed an argument name.
   1039128 Added documentation for the network latency.
   f11cb16 Added documentation for the virtual processor.
   8152dbb Added documentation for the virtual communicator.
   7e04a0a Fixed compilation warnings.

Sébastien Boisvert [email protected] 33 commits

   e5f0ac7 Joiner software stack now joins otherwise un-joined paths in the distributed graph.
   e53c0cf Now printing hit information in JoinerWorker.
   3d49b71 Added selected hit in standard output for JoinerWorker.
   1ef78fb Updated a threshold in FusionWorker.
   9595b0d Added debugging information in JoinerWorker.
   441d133 Added Joiner code.
   b3b8dde Disabled the reverse-complement copies of extensions.
   e42c2d1 Workers push virtual messages, not real messages.
   80d8fc8 Fixed a state-machine bug in TaskCreator/FusionTaskCreator.
   ae4af42 Fixed a state-machine bug in FusionWorker.
   15fc7d1 Added an AUTHORS file.
   6a5954b Changed the default algorithm in VirtualProcessor -- now using a minimum work unit.
   3cbc5fd Added some debugging information for FusionTaskCreator.
   45280ff Removed OperatingSystem dependency in unit tests.
   f778c5b Implemented a new better and simpler merger module -- FusionTaskCreator/FusionWorker.
   57753e7 Fixed some unit tests by moving scaffolder methods.
   84656ae Using the VirtualProcessor for edge purge.
   c801302 Added some debugging information for fusions.
   53085bb Restored worker codes.
   d05a902 Added interface Worker for worker classes.
   3547950 Added method hasWorkToDo to VirtualProcessor.
   75f29f1 Added debugging messages in FusionData.
   b826e68 Added TaskCreator and Merger classes.
   173a2f3 Removed hard-coded parameter -debug-fusions.
   778e0d2 Changed the maximum number of cycles to 16 in merging code.
   10d3346 Merge branch 'master' of [email protected]:sebhtml/ray
   1048b06 Added scaffolder cases in Documentation/
   45ecca1 The ChangeLog file will not be maintained anymore; use ./scripts/dump-ChangeLog.sh instead.
   b01fe11 Added option -version to Ray.
   58e1a6d Modified the behavior of Ray when fusions are generated.
   26ee554 Updated the path to Ray in system tests.
   c920e1e Fixed a compilation warning.
   7dea079 Added a function to create directories.

v1.6.2-rc2 Wed Aug 24 20:50:07 2011 -0400 71 commits

Sébastien Boisvert [email protected] 5 commits

   70c8e92 Changed where the Ray binary is written.
   b14a5b7 Removed the manual target from the Makefile.
   a5ff5c4 Added a Documentation directory.
   6b6fd7c Removed logo from source.
   fa809fa Testing symbolic links.

Sébastien Boisvert [email protected] 64 commits

   6a290ce Added additional debugging information.
   b6aa616 Restored original state.
   7f8708f Added an explicit flush.
   1a11ee5 Added checkpoint Sequences.
   ffef055 Updated the output of the -help option.
   89ed32b Added option -debug-fusions.
   5e09388 Updated the MANUAL.
   797baef Added -read-write-checkpoints in the changes.
   d15a0ca Added gmane link in the README
   98dbbd4 Removed unused scripts.
   0034730 Skip a seed if within it during flow 1 and a vertex is already processed.
   37aa1ff Limiting seeds to probably unique vertices.
   03cb71f Don't write a checkpoint if it was just read.
   dcf3eb6 Added a file describing checkpoints.
   ee971f5 Read checkpoint before writing it.
   aec738a Changed 1 hash function because it was a copy.
   6c4fac9 Fixed a hanging problem.
   0094fba Added checkpoint Extensions.
   5dd5470 Added checkpoint Partition.
   ddf2125 Fixed a bug when no sequence files are provided.
   5a7b56e Improved checkpointing message.
   a6596ea Added option -test-network-only to only test the network and return.
   a72d0ec Improved checkpointing messages.
   6d29d43 Fixed a messaging bug occurring very rarely.
   e2a2bb6 Added a MANUAL file.
   218546f Checkpoint files are now written in a binary format.
   f63590a Checkpoints are now operational.
   adf7222 -read-checkpoints works with checkpoints <CoverageDistribution>, <GenomeGraph> and <Seeds>.
   890ba11 Option -write-checkpoints writes all checkpoints.
   4822527 Added options -read-checkpoints and -write-checkpoints, this is still in development.
   23f7218 Preparing code for a change.
   60d1ebe Reduced the number of messages with tag RAY_MPI_TAG_REQUEST_VERTEX_COVERAGE in SeedExtender.cpp
   c510d96 Added tag counts for option -run-profiler.
   63beff4 Fixed a display problem.
   1e3490f Modified the order of the steps performed when merging identical paths.
   5fc8c13 Changed the prototype of VirtualCommunicator::getMessageResponseElements.
   237339a Only send a RAY_MPI_TAG_ASK_IS_ASSEMBLED message if starting on a seed on flow 1.
   6fa20a9 Don't fetch read markers when not needed; use less memory to know if a vertex was assembled.
   7c2a084 Added skipping events.
   d4e5a25 Fixed N50 when there is only 1 scaffold.
   c17c1fd RayCommands file is written correctly now.
   3ece690 Ray merger will merge more things now.
   b6e342c Fixed a segmentation fault that occurs in rare cases.
   6208303 Fixed a bug in the scaffolder, now more vertices should be investigated.
   eb13301 Changed the precision of things that go together.
   0f78f4e Fixed which arguments are picked up by opcodes -p and -i.
   dd5c796 Input opcodes are now shuffled before being utilised.
   44d2bc8 Extension of seeds is done endlessly until growth stops.
   ac01c27 Modified NovaEngine to pass a new unit test (as well as the old unit tests).
   f23b52d Changed the default number of persistent requests.
   fe05cee Added list of working C++ compilers.
   6a1854a The paired read simulator is now a separate project, see https://github.com/sebhtml/paired-read-simulator
   6ccac7a Flag invalid choices before doing the actual selection.
   82f6107 Fixed a compilation Warning.
   5a99139 Fixed an integer overflow.
   c63ba4a Fixed a compilation warning with gcc.
   0d1aab0 Fixed a compilation warning with Intel compiler.
   f84d308 ExtensionElement objects now contain reads in 2-bit format.
   ec0f6d1 Fixed a memory problem in the computation of optimal read markers.
   5a7c941 Implemented a Bloom filter to reduce the memory usage. This is ridiculously good and the false positive rate has no effect whatsoever on Ray thanks to the KmerAcademy.
   27f24d5 Added a few things in the README.
   a1233c0 Fixed a recently introduced regression in heuristics (should not choose an invalid choice).
   ed67f25 Updates in the README.
   83f0e0a Fixed the maximum length of input reads to 65535.
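Commit 5a7c941 says a Bloom filter reduces memory usage and that its false positive rate does not affect Ray because of the KmerAcademy. It does not describe the filter itself. The sketch below is a generic Bloom filter over 64-bit integers such as 2-bit-encoded k-mers. Several details are assumptions for illustration, not Ray's choices: the class name, the bit and hash counts, the splitmix64-based mixer, and deriving the i-th hash as h1 + i * h2 (the Kirsch and Mitzenmacher technique).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generic Bloom filter over 64-bit keys (e.g. 2-bit-encoded k-mers).
// Sketch only: sizes, hash count and hash function are assumptions.
class KmerBloomFilter {
public:
    KmerBloomFilter(std::uint64_t numberOfBits, int numberOfHashes)
        : bits_((numberOfBits + 63) / 64, 0),
          numberOfBits_(numberOfBits),
          numberOfHashes_(numberOfHashes) {
        assert(numberOfBits > 0 && numberOfHashes > 0);
    }

    // Set the k bits selected for this k-mer.
    void insert(std::uint64_t kmer) {
        for (int i = 0; i < numberOfHashes_; ++i) {
            const std::uint64_t bit = position(kmer, i);
            bits_[bit / 64] |= std::uint64_t{1} << (bit % 64);
        }
    }

    // Can return true for a k-mer never inserted (false positive),
    // but never returns false for one that was inserted.
    bool mayContain(std::uint64_t kmer) const {
        for (int i = 0; i < numberOfHashes_; ++i) {
            const std::uint64_t bit = position(kmer, i);
            if ((bits_[bit / 64] & (std::uint64_t{1} << (bit % 64))) == 0) return false;
        }
        return true;
    }

private:
    // splitmix64 finalizer, used here as a general-purpose 64-bit mixer.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // i-th bit position derived from two base hashes: h1 + i * h2.
    std::uint64_t position(std::uint64_t kmer, int i) const {
        const std::uint64_t h1 = mix(kmer);
        const std::uint64_t h2 = mix(h1) | 1;  // odd step
        return (h1 + static_cast<std::uint64_t>(i) * h2) % numberOfBits_;
    }

    std::vector<std::uint64_t> bits_;
    std::uint64_t numberOfBits_;
    int numberOfHashes_;
};
```

The one-sided error is what makes such a filter usable as a cheap first check. A "no" answer is always right, so only a "maybe" needs to be confirmed against an exact structure.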

Sébastien Boisvert [email protected] 2 commits

   dbba521 Added some documentation files.
   40d1d8a Fixed a bug.

v1.6.2-rc1 Wed Aug 3 13:39:46 2011 -0400 87 commits

David Eccles (gringer) [email protected] 1 commit

   c9aa372 fixing a segfault when no contigs are found

Sébastien Boisvert [email protected] 72 commits

   e6c66af Added N50, median, average and largest contig and scaffold lengths in PREFIX.OutputNumbers.txt
   1aac7a6 Removed the coverage threshold from the algorithm that finds seeds in the distributed graph. Suggested by David Eccles (gringer) from Max-Planck-Gesellschaft, München.
   a6244b0 Modified the selection engine so that the new unit tests also pass.
   0811707 Removed email of contributor.
   91f3251 Added David Eccles in the README
   798f262 Added debugging option -show-distance-summary.
   34d1af2 New development option: -show-distance-summary.
   454d5ee Added a test for open addressing.
   bffce62 Merging of similar paths has been modified.
   3b83d6c Added read placement freezing.
   dac7de5 Modified the extension algorithm to avoid collapsing of repeated k-mers that are near each other in the genome.
   c667133 Added a unit test and fixed NovaEngine to handle it.
   7a3443c Improved the manual.
   d842360 Added a section on how to launch Ray in the manual.
   9c449be Fixed a compilation warning.
   182fe2c Added some unit tests for the Ray NovaEngine (for mate-pair reads). Seems to work quite well so far.
   a5e631b New option: -write-seeds which is useful for debugging the code.
   1902636 Only use NovaEngine when paired information is available.
   91359b0 Added options -use-NovaEngine and -show-NovaEngine for debugging purposes.
   8bc58aa Fixed compilation errors with HAVE_LIBZ=y and HAVE_LIBBZ2=y
   af7557e New output files: PREFIX.SequencePartition.txt and PREFIX.NumberOfSequences. Improved the content shown with -help.
   9598fda Now using the NovaEngine.
   ddeca62 Fixed a bug in the NovaEngine.
   13e5e76 Removed by default the reporting of libraries in stdout.
   5e5edae Option use:NovaEngine enables experimental NovaEngine.
   73c61bd Fixed a bug in the recently introduced peer-to-peer parallel Partitioner.
   0bbd8d3 Removed options in 2 system tests.
   1c53c6e Added comments at random places.
   3696be0 Added 2 scripts for code editing.
   1c135af Disabled the experimental NovaEngine by default. Results so far are promising!
   e1efa6e Fixed a compilation warning.
   3fe9fab Removed roughly half of the messages with the MPI tag RAY_MPI_TAG_KMER_ACADEMY_DATA.
   4b3639f Added a unit test and modified CoverageDistribution.cpp to handle low-coverage datasets.
   b76e9dd Added information on unknown nucleotides in the instruction manual.
   9847b9e Added a TODO item.
   20ad18f Added an abstraction layer for the operating system.
   9706603 Improved the README for system tests.
   4372845 Only show nova choices if -show-extension-choice is provided.
   67a87ed Improved the document about patching.
   231dbc9 Added a file describing how to submit a patch.
   88b13a0 Unit tests are now files named test_<test_name>.sh
   43200b5 Fixed a typo.
   efda9d9 Moved Kmer routines in the class Kmer.
   207a193 Added an entry in the changelog.
   19b3905 Added a symbolic link.
   f7addb4 Added 35 unit tests.
   e704aa1 Simplified the algorithm that finds peak and added 35 unit tests to test it (for various datasets).
   1a8b0e0 Improved the NovaEngine according to unit tests.
   412052b Improved the unit tests for the NovaEngine.
   862e6dd Removed some messages.
   1bd4d5a Removed assertion.
   4471c37 Improved the NovaEngine, but not using it yet. It needs more testing.
   668ad2f Added a unit test for the NovaEngine.
   9c8b02d New experimental heuristics: The Ray NovaEngine.
   1ef868d Created a heuristics module and moved related bits into it.
   cd8c1af Improved the peak finder when there is no deviation.
   15ea481 Fixed a compilation error.
   a2470f2 Moved the configuration of the virtual communicator in Machine.cpp
   810187d Corrected a code comment.
   982ff04 Updated the coding style.
   f7d9087 Added a coding style file.
   3f15314 Don't compute or update peaks for libraries with manually-provided information.
   d386173 When the extension is finished, show library peak usage.
   952dfb8 Don't print the tree.
   5a49add Changed to 64 slots.
   8c15d79 Corrected the peak finder.
   fdc1bde Merge branch 'master' of [email protected]:sebhtml/ray
   2b5f716 Changed the behavior for repeats.
   0d6565e Fixed a system test.
   857c207 Fixed some compilation warnings with gcc 4.1.2.
   4bb20ba Added an entry in the change log for 1.6.2.
   5fab076 Now working on v1.6.2
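Several commits in this release rework the peak finder applied to paired-library distance distributions (e704aa1, cd8c1af, 8c15d79), but the messages do not spell out the algorithm. As a hedged illustration only, the sketch below runs a local-maximum search over a histogram indexed by distance bin. The function name, the strict-left/non-strict-right comparison and the `minimumFraction` threshold are assumptions, not Ray's rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the bins that are local maxima of the histogram and whose count is
// at least minimumFraction times the highest count. Hypothetical helper.
std::vector<std::size_t> findPeaks(const std::vector<int>& counts, double minimumFraction) {
    assert(minimumFraction >= 0.0 && minimumFraction <= 1.0);
    std::vector<std::size_t> peaks;
    if (counts.size() < 3) return peaks;  // need a left and right neighbour

    int highest = 0;
    for (int c : counts) {
        if (c > highest) highest = c;
    }
    const double threshold = minimumFraction * highest;

    for (std::size_t i = 1; i + 1 < counts.size(); ++i) {
        // Strict on the left, non-strict on the right: a flat top is
        // reported once, at its leftmost bin.
        const bool isLocalMaximum = counts[i] > counts[i - 1] && counts[i] >= counts[i + 1];
        if (isLocalMaximum && counts[i] >= threshold) peaks.push_back(i);
    }
    return peaks;
}
```

For the histogram 0, 1, 5, 1, 0, 0, 2, 8, 2, 0 a low threshold keeps both bumps (bins 2 and 7). Raising `minimumFraction` to 0.7 keeps only the dominant one (bin 7).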

Sébastien Boisvert [email protected] 14 commits

   8354455 Fixed compilation errors due to the algorithm library.
   8ffee2b Merge branch 'master' of github.com:sebhtml/ray
   29ab020 Fixed a bug in the incremental resizing algorithm of MyHashTable.
   bacc0c7 Fixed comments and assertions for new correct code.
   d48fa33 Fixed an implementation bug for double hashing.
   c9e8a5e Added a TODO item.
   8549758 File partitioning is now performed in parallel.
   9825ecf Implemented parallel file partitioning.
   4a4b739 Don't set NSLOTS if already defined.
   d652a1c Now Ray uses the maximum peak of a paired library to compute the expiry position of a read.
   065df3a Choosing the good peak for a paired library if a mate is already available.
   bd324ea Now selecting the correct peak to choose the next vertex.
   7a72aac Ray can find more than one peak in any paired library.
   7d92f06 Ported the prototype for finding peaks from Python to C++.
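Commits d48fa33 and 29ab020 fix the double-hashing implementation and the incremental resizing of MyHashTable without describing either. The sketch below shows plain open addressing with double hashing. It keeps a power-of-two capacity and an odd probe step; because the step is then coprime with the capacity, each probe sequence visits every slot before repeating. The class and its hash functions are hypothetical, and resizing and deletion are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Open-addressing hash table with double hashing, keyed by 64-bit integers.
// Sketch: fixed power-of-two capacity, no deletion, no resizing.
class DoubleHashingTable {
public:
    explicit DoubleHashingTable(std::uint64_t capacityPowerOfTwo)
        : keys_(capacityPowerOfTwo),
          values_(capacityPowerOfTwo),
          used_(capacityPowerOfTwo, false),
          mask_(capacityPowerOfTwo - 1) {
        assert(capacityPowerOfTwo > 0 && (capacityPowerOfTwo & mask_) == 0);
    }

    // Inserts or updates a key; returns false only if the table is full.
    bool insert(std::uint64_t key, int value) {
        std::uint64_t slot = h1(key) & mask_;
        const std::uint64_t step = h2(key);  // odd, so coprime with capacity
        for (std::uint64_t probe = 0; probe <= mask_; ++probe) {
            if (!used_[slot] || keys_[slot] == key) {
                used_[slot] = true;
                keys_[slot] = key;
                values_[slot] = value;
                return true;
            }
            slot = (slot + step) & mask_;
        }
        return false;
    }

    std::optional<int> find(std::uint64_t key) const {
        std::uint64_t slot = h1(key) & mask_;
        const std::uint64_t step = h2(key);
        for (std::uint64_t probe = 0; probe <= mask_; ++probe) {
            if (!used_[slot]) return std::nullopt;  // an empty slot ends the probe chain
            if (keys_[slot] == key) return values_[slot];
            slot = (slot + step) & mask_;
        }
        return std::nullopt;
    }

private:
    // splitmix64 finalizer, used as a general-purpose 64-bit mixer.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    static std::uint64_t h1(std::uint64_t key) { return mix(key); }
    static std::uint64_t h2(std::uint64_t key) { return mix(key ^ 0x5bd1e9955bd1e995ULL) | 1; }

    std::vector<std::uint64_t> keys_;
    std::vector<int> values_;
    std::vector<bool> used_;
    std::uint64_t mask_;
};
```

An even step would break the guarantee. With capacity 8 and step 2, a probe sequence only reaches half of the slots, so an insert could report "full" while empty slots remain.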