partition speed improvement #16

Open
wants to merge 1 commit into
base: master

flowers9

Partitioning the *.can file requires multiple passes: one per 10 output files, plus one to count the reads involved. Because the *.can file can be many gigabytes, this slows partitioning considerably. This fix raises the number of files written per pass from a fixed 10 to something close to the system limit, which will likely reduce the work to a single pass (plus the read-counting pass).

My initial approach, using std::vector<PODArray >, failed because other variables got overwritten. I didn't want to dig into PODArray<> to find the cause, so I used new[]/delete[] instead.
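To make the pass arithmetic concrete, here is a minimal sketch, not the code in this commit. It assumes the "system limit" is the POSIX soft limit on open file descriptors, read with getrlimit(RLIMIT_NOFILE). The helper names, the `reserved` margin, and the cap of 1024 for an unlimited setting are all hypothetical choices for illustration.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many partition files could be kept open at once.
// `reserved` leaves room for stdin/stdout/stderr, the input file, and so on.
std::size_t max_open_partition_files(std::size_t reserved = 16) {
    const std::size_t fallback = 10;  // the old fixed number of files per pass
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return fallback;
    if (lim.rlim_cur == RLIM_INFINITY) return 1024;  // arbitrary cap (assumption)
    const std::size_t soft = static_cast<std::size_t>(lim.rlim_cur);
    // Use the soft limit minus the margin only when that beats the fallback.
    return soft > reserved + fallback ? soft - reserved : fallback;
}

// Passes over the input needed to write `num_parts` files when
// `files_per_pass` files are written per pass (ceiling division).
std::size_t passes_needed(std::size_t num_parts, std::size_t files_per_pass) {
    assert(files_per_pass > 0);
    return (num_parts + files_per_pass - 1) / files_per_pass;
}
```

With 35 partition files, passes_needed(35, 10) gives 4 partitioning passes, while passes_needed(35, max_open_partition_files()) typically gives 1 on a system whose soft limit is in the hundreds or more. The read-counting pass is extra in both cases.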


AGI-chandler commented Oct 26, 2017

I could not compile with these changes; the errors are below. I think the cause is that the flowers9:index_t branch renamed index_t to idx_t, but this branch still uses index_t. Sorry, I am not very experienced with git, so I am not sure whether there is an elegant way to merge both branches. I manually edited the files in this branch, changing index_t to idx_t, and now it compiles.

g++ -o ../Linux-amd64/obj/mecat2cns/mecat2cns/overlaps_partition.o -c -MD -D_GLIBCXX_PARALLEL -pthread -O3 -Wall -Imecat2cns -Imecat2cns/libboost mecat2cns/overlaps_partition.cpp
mecat2cns/overlaps_partition.cpp: In function ‘bool check_m4record_mapping_range(const M4Record&, double)’:
mecat2cns/overlaps_partition.cpp:18: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:19: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:20: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:21: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:22: error: ‘qm’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:22: error: ‘qs’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:22: error: ‘sm’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:22: error: ‘ss’ was not declared in this scope
mecat2cns/overlaps_partition.cpp: In function ‘bool query_is_contained(const M4Record&, double)’:
mecat2cns/overlaps_partition.cpp:28: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:29: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:30: error: ‘qm’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:30: error: ‘qs’ was not declared in this scope
mecat2cns/overlaps_partition.cpp: In function ‘bool subject_is_contained(const M4Record&, double)’:
mecat2cns/overlaps_partition.cpp:36: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:37: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:38: error: ‘sm’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:38: error: ‘ss’ was not declared in this scope
mecat2cns/overlaps_partition.cpp: At global scope:
mecat2cns/overlaps_partition.cpp:42: error: ‘index_t’ has not been declared
mecat2cns/overlaps_partition.cpp:42: error: ‘index_t’ has not been declared
mecat2cns/overlaps_partition.cpp: In function ‘void get_qualified_m4record_counts(const char*, double, int&, int&)’:
mecat2cns/overlaps_partition.cpp:48: error: ‘index_t’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:48: error: expected ‘;’ before ‘num_records’
mecat2cns/overlaps_partition.cpp:57: error: ‘num_records’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:59: error: no matching function for call to ‘max(int&, idx_t&)’
mecat2cns/overlaps_partition.cpp:60: error: no matching function for call to ‘max(int&, idx_t&)’
mecat2cns/overlaps_partition.cpp:64: error: ‘num_records’ was not declared in this scope
mecat2cns/overlaps_partition.cpp: At global scope:
mecat2cns/overlaps_partition.cpp:69: error: ISO C++ forbids declaration of ‘index_t’ with no type
mecat2cns/overlaps_partition.cpp:69: error: expected ‘,’ or ‘...’ before ‘num_reads’
mecat2cns/overlaps_partition.cpp: In function ‘void get_repeat_reads(const char*, double, int)’:
mecat2cns/overlaps_partition.cpp:74: warning: array subscript has type ‘char’
mecat2cns/overlaps_partition.cpp:75: error: ‘num_reads’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:84: error: expected ‘;’ before ‘qid’
mecat2cns/overlaps_partition.cpp:84: warning: statement has no effect
mecat2cns/overlaps_partition.cpp:85: error: ‘qid’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:89: error: expected ‘;’ before ‘sid’
mecat2cns/overlaps_partition.cpp:89: warning: statement has no effect
mecat2cns/overlaps_partition.cpp:90: error: ‘sid’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:95: error: expected ‘;’ before ‘i’
mecat2cns/overlaps_partition.cpp:95: warning: statement has no effect
mecat2cns/overlaps_partition.cpp:95: error: name lookup of ‘i’ changed for ISO ‘for’ scoping
mecat2cns/overlaps_partition.cpp:95: note: (if you use ‘-fpermissive’ G++ will accept your code)
mecat2cns/overlaps_partition.cpp:99: error: ‘repeat_reads’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:102: error: ‘repeat_reads’ was not declared in this scope
mecat2cns/overlaps_partition.cpp: At global scope:
mecat2cns/overlaps_partition.cpp:113: error: ISO C++ forbids declaration of ‘index_t’ with no type
mecat2cns/overlaps_partition.cpp:113: error: expected ‘,’ or ‘...’ before ‘part’
mecat2cns/overlaps_partition.cpp: In function ‘void generate_partition_file_name(const char*, int)’:
mecat2cns/overlaps_partition.cpp:115: error: ‘ret’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:118: error: ‘part’ was not declared in this scope
mecat2cns/overlaps_partition.cpp: In function ‘void partition_candidates(const char*, idx_t, int)’:
mecat2cns/overlaps_partition.cpp:171: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:179: error: ‘index_t’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:179: error: expected ‘;’ before ‘i’
mecat2cns/overlaps_partition.cpp:179: error: ‘i’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:179: error: ‘num_batches’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:181: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:182: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:183: error: ‘efid’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:183: error: ‘sfid’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:184: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:185: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:193: error: ‘L’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:193: error: ‘R’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:198: error: ‘L’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:198: error: ‘R’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:207: error: ‘index_t’ cannot appear in a constant-expression
mecat2cns/overlaps_partition.cpp:207: error: template argument 1 is invalid
mecat2cns/overlaps_partition.cpp:170: warning: unused variable ‘num_reads’
mecat2cns/overlaps_partition.cpp: At global scope:
mecat2cns/overlaps_partition.cpp:217: error: ISO C++ forbids declaration of ‘index_t’ with no type
mecat2cns/overlaps_partition.cpp:217: error: expected ‘,’ or ‘...’ before ‘batch_size’
mecat2cns/overlaps_partition.cpp: In function ‘void partition_m4records(const char*, double, int)’:
mecat2cns/overlaps_partition.cpp:221: error: expected ‘;’ before ‘num_reads’
mecat2cns/overlaps_partition.cpp:221: warning: statement has no effect
mecat2cns/overlaps_partition.cpp:222: error: ‘num_qualified_records’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:222: error: ‘num_reads’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:223: error: ‘index_t’ cannot appear in a constant-expression
mecat2cns/overlaps_partition.cpp:223: error: template argument 1 is invalid
mecat2cns/overlaps_partition.cpp:223: error: template argument 2 is invalid
mecat2cns/overlaps_partition.cpp:223: error: template argument 3 is invalid
mecat2cns/overlaps_partition.cpp:223: error: invalid type in declaration before ‘;’ token
mecat2cns/overlaps_partition.cpp:225: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:234: error: expected ‘;’ before ‘i’
mecat2cns/overlaps_partition.cpp:234: warning: statement has no effect
mecat2cns/overlaps_partition.cpp:234: error: ‘i’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:234: error: ‘num_batches’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:236: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:237: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:238: error: ‘efid’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:238: error: ‘sfid’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:239: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:240: error: ‘index_t’ does not name a type
mecat2cns/overlaps_partition.cpp:247: error: ‘min_read_size’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:249: error: request for member ‘find’ in ‘repeat_reads’, which is of non-class type ‘int’
mecat2cns/overlaps_partition.cpp:249: error: request for member ‘end’ in ‘repeat_reads’, which is of non-class type ‘int’
mecat2cns/overlaps_partition.cpp:251: error: request for member ‘find’ in ‘repeat_reads’, which is of non-class type ‘int’
mecat2cns/overlaps_partition.cpp:251: error: request for member ‘end’ in ‘repeat_reads’, which is of non-class type ‘int’
mecat2cns/overlaps_partition.cpp:253: error: ‘L’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:253: error: ‘R’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:257: error: ‘batch_size’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:259: error: ‘L’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:259: error: ‘R’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:263: error: ‘batch_size’ was not declared in this scope
mecat2cns/overlaps_partition.cpp:269: error: ‘index_t’ cannot appear in a constant-expression
mecat2cns/overlaps_partition.cpp:269: error: template argument 1 is invalid
make[1]: *** [../Linux-amd64/obj/mecat2cns/mecat2cns/overlaps_partition.o] Error 1
make[1]: Leaving directory `/opt/MECAT/src'
make: *** [mecat] Error 2
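Nearly every error in the log follows from the first one, "‘index_t’ does not name a type": once the type name is unknown, each declaration that uses it fails, and so does every later use of the variables it would have declared. As a hedged sketch of an alternative to editing each occurrence by hand, a compatibility alias could map the old name onto the new one. The definition of idx_t below is an assumption standing in for the project's own header, and span_length is a hypothetical function used only to show old-style code compiling.

```cpp
#include <stdint.h>
#include <cassert>

// Assumption: stand-in for the project's own idx_t definition after the
// rename; the real typedef lives in the project's headers.
typedef int64_t idx_t;

// Compatibility alias: code still written against the old name compiles
// unchanged, and both names refer to exactly the same type.
typedef idx_t index_t;

// Hypothetical function written against the old name.
index_t span_length(index_t begin, index_t end) {
    assert(end >= begin);
    return end - begin;
}
```

An alias like this is a stopgap. Merging or rebasing the two branches so that only one name remains is the cleaner long-term fix.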

Owner

xiaochuanle commented Dec 2, 2017

Dear all, thanks for your interest in MECAT. We have updated MECAT to version 1.3 and fixed these issues by adding a new option, '-k', to specify the number of partition files. Please compile the new version and use '-k -1' to let mecat2cns write as many partition files as possible in one pass. Thanks.
