Cluster usage: Data Management
Exacloud has 3 main places you can store files.
- RDS, accessible at /home/groups/ZuckermanLab: General storage for data. We have a fairly large storage allocation here, so any data you're holding on to long-term should go here.
- gscratch, accessible at /home/exacloud/gscratch/ZuckermanLab: High-performance storage. If you're actively reading or writing files, you'll get the best performance by putting them here. In some cases the difference can be striking (I've benchmarked a 30% speedup in some simulations just by moving them from the RDS to gscratch). NOTE: This filesystem has been acting up a little lately. At time of writing, this will likely be sorted out by Nov. 21 2021 or so, so I maintain you should use it for data you're actively working with whenever possible.
- home, at /home/users/<your username>: Generally, you shouldn't use this to store data. It has quite a small amount of space available, and it's not high performance. However, it can be convenient for temporarily holding files while you're copying them on or off the cluster, or for keeping things like shell scripts.
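As a rough illustration of the workflow these locations suggest (long-term data on the RDS, active work on gscratch), here is a minimal Python sketch that copies a directory into a working area. The two root paths are the ones quoted above. The helper name `stage_for_work` and the project layout are hypothetical, not part of any lab tooling.

```python
import shutil
from pathlib import Path

# Storage roots quoted on this page.
RDS_ROOT = Path("/home/groups/ZuckermanLab")
GSCRATCH_ROOT = Path("/home/exacloud/gscratch/ZuckermanLab")


def stage_for_work(src, dest_root=GSCRATCH_ROOT):
    """Copy the directory `src` into `dest_root` and return the new location.

    Hypothetical helper: the source is left untouched, so the long-term
    copy stays where it was.
    """
    src = Path(src)
    dest = Path(dest_root) / src.name
    # dirs_exist_ok (Python 3.8+) lets a repeated call refresh an existing copy
    # instead of raising FileExistsError.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, `stage_for_work(RDS_ROOT / "my_project")` would copy a (hypothetical) `my_project` directory from the RDS into gscratch. Results you want to keep would still need to be copied back to the RDS afterwards.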
(Maybe move this to its own section)
WESTPA simulations by default produce a huge number of very small files from running the dynamics. These get put in <WESTPA simulation directory>/traj_segs/.
These files incur a substantial amount of overhead on the cluster filesystems, for a slew of reasons. There's typically not much to do about this other than tar up your traj_segs if you don't need to directly access trajectory files. This can be a big help: in addition to any compression applied when creating the archive, bundling the files together eliminates the per-file overhead (which can be almost double the original file size in some cases).
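One way to tar up traj_segs is sketched below using Python's standard `tarfile` module. The function name `archive_traj_segs` and the archive name `traj_segs.tar.gz` are assumptions made for illustration. The sketch only removes the original directory if explicitly asked, and only after the archive has been re-read successfully.

```python
import shutil
import tarfile
from pathlib import Path


def archive_traj_segs(sim_dir, remove_original=False):
    """Bundle <sim_dir>/traj_segs into <sim_dir>/traj_segs.tar.gz.

    Hypothetical helper; returns the path of the archive it wrote.
    """
    sim_dir = Path(sim_dir)
    traj_segs = sim_dir / "traj_segs"
    archive = sim_dir / "traj_segs.tar.gz"

    # "w:gz" writes a gzip-compressed tar; plain "w" would bundle the files
    # without compressing them.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(traj_segs, arcname="traj_segs")

    # Re-read the member list as a basic check that the archive is readable.
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
    if not members:
        raise RuntimeError(f"{archive} is empty; leaving {traj_segs} in place")

    # The per-file overhead only goes away once the small files are deleted.
    if remove_original:
        shutil.rmtree(traj_segs)
    return archive
```

Calling `archive_traj_segs("/path/to/simulation")` leaves the original files in place. Passing `remove_original=True` deletes them once the archive has been written and read back.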
Additionally, the large amount of file I/O involved in producing these files and shuffling them around means filesystem performance may have a substantial impact on WE performance. For this reason, I strongly recommend running WE on gscratch, and NOT directly from the RDS.
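If you want a guard against accidentally launching from the RDS, a small check along these lines could run before a simulation starts. This is only a sketch: the helper name `is_under` is hypothetical, and the gscratch root is the path quoted earlier on this page.

```python
from pathlib import Path

# gscratch root quoted earlier on this page.
GSCRATCH_ROOT = Path("/home/exacloud/gscratch/ZuckermanLab")


def is_under(path, root=GSCRATCH_ROOT):
    """Return True if `path` is `root` itself or lives somewhere below it."""
    # resolve() makes both paths absolute and follows symlinks, so a symlink
    # that points into the RDS is not mistaken for a gscratch directory.
    path = Path(path).resolve()
    root = Path(root).resolve()
    try:
        path.relative_to(root)
    except ValueError:
        return False
    return True
```

A launch script could call `is_under(Path.cwd())` and print a warning (or exit) when it returns False.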