I have a large PromethION P2 dataset (1.4 TB, PoreC library) to be basecalled in sup mode.
I have two identical servers, each with ~360 GB RAM, fast local storage, and an Nvidia A100 (40 GB).
No other users, no other processes on these machines.
merge the run's POD5 data into a single POD5 file on the local storage of the corresponding server
run dorado on that POD5 file and write the output to another (fast) storage, not the one where the POD5 file is stored
So the setup is essentially identical on both servers.
The duplex dorado run is extremely inefficient; it looks like an I/O issue (which I find odd, because the simplex sup run works just fine). E.g.:
duplex (sup)
simplex (sup)
I cannot tell how far dorado has progressed, as progress is not reported when stdout is redirected ... :-(
Any idea about the actual cause? dorado reads from a single POD5 file in both cases, so read I/O seems unlikely to be the issue.
Are there differences in how dorado writes data in duplex versus simplex mode?
Any hints/comments/solutions are appreciated ;-)
Hi @sklages - thanks for reporting this issue. I actually noticed the same thing on a recent run I was doing. I suspect the issue is actually I/O, but not exactly storage related.
The read pattern for duplex is different from simplex. In simplex, reads are fetched in the order they appear in the file, which is very efficient and benefits from prefetching, etc. However, in order to keep duplex host memory under control, reads are loaded by channel. This requires reads to be fetched in random order from the file, which is likely what is causing this massive slowdown. The effect is exacerbated when the input file is very large.
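To make the access-pattern difference concrete, here is a small toy simulation in Python. It is only a sketch and does not touch dorado or POD5 at all: it assumes a hypothetical file in which record `i` sits at position `i` and each record belongs to a randomly chosen channel, then counts how often the next record fetched is not the one stored directly after the previous one. The names `count_seeks` and `simulate`, and the read and channel counts, are made up for illustration.

```python
import random


def count_seeks(order):
    """Count accesses that do not hit the record stored right after the previous one."""
    return sum(1 for prev, cur in zip(order, order[1:]) if cur != prev + 1)


def simulate(n_reads=10_000, n_channels=100, seed=0):
    """Return (seeks in file order, seeks when fetching channel by channel)."""
    rng = random.Random(seed)
    # Assumption: reads from many channels are interleaved in the file,
    # so the channel of each stored record is effectively random.
    channels = [rng.randrange(n_channels) for _ in range(n_reads)]

    # Simplex-like pattern: walk the file front to back.
    sequential = list(range(n_reads))

    # Duplex-like pattern: fetch all reads of channel 0, then channel 1, ...
    # The sort is stable, so file order is kept within each channel.
    by_channel = sorted(sequential, key=lambda i: channels[i])

    return count_seeks(sequential), count_seeks(by_channel)


if __name__ == "__main__":
    seq_seeks, chan_seeks = simulate()
    print(f"file order:    {seq_seeks} non-contiguous accesses")
    print(f"by channel:    {chan_seeks} non-contiguous accesses")
```

In this toy model the file-order walk never jumps, while the channel-by-channel walk jumps on nearly every access, which is the kind of pattern that defeats read-ahead on a very large file.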
We're looking into this now, but in the meantime subsetting the data by channel, especially for large datasets, should alleviate some of these issues.
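As a rough sketch of what subsetting by channel could look like, the Python helper below only assembles two command lines for the `pod5` command-line tool: one to export a read-to-channel table and one to split the input by that table. The subcommands and flags reflect my understanding of the pod5 tooling and should be checked against `pod5 view --help` and `pod5 subset --help` before use; the file names `merged.pod5`, `summary.tsv` and `split_by_channel` are placeholders, and `channel_split_commands` is a hypothetical helper name.

```python
import shlex


def channel_split_commands(pod5_input, summary="summary.tsv", out_dir="split_by_channel"):
    """Build (not run) the two pod5 commands for a per-channel split.

    Assumption: `pod5 view` can write a table with read_id and channel,
    and `pod5 subset` can split the input on a column of that table.
    """
    view = [
        "pod5", "view", pod5_input,
        "--include", "read_id, channel",
        "--output", summary,
    ]
    subset = [
        "pod5", "subset", pod5_input,
        "--summary", summary,
        "--columns", "channel",
        "--output", out_dir,
    ]
    return view, subset


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in channel_split_commands("merged.pod5"):
        print(shlex.join(cmd))
```

The commands are printed rather than executed so they can be inspected first. If the split works as assumed, the duplex run would then be pointed at the output directory instead of the single merged file.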