Directory Structure for MDF files #45

Open
tknopp opened this issue Sep 11, 2016 · 22 comments

Comments

@tknopp
Member

tknopp commented Sep 11, 2016

As already discussed in #42:

It might make sense to develop a standardized folder structure for MDF files. We have not yet developed anything in that direction, but we have started thinking about it.

Here is my current thinking

mainfolder/
-- measurements/
---- study1/
------ 000001.mdf
-- systemFunctions/
---- 000001.mdf
-- reconstructions/
---- study1/
------ 000001/       # this is the experiment number
-------- 000001.mdf  # this is the number of the reconstruction

The leading zeros might not be necessary. The study name could also encode the date, as Bruker does.

Ping @MandyA @profix898 @hofmannmartin

@tknopp
Member Author

tknopp commented Sep 12, 2016

We should really do this. I am thinking of exchanging studies between groups: it would be very handy to just drop a study folder into the dataset directory without having to think about how others structure their data.

@hofmannmartin
Member

I like this idea. +1

@tknopp Do you plan to add a paragraph on this to the specifications?

@MandyA
Collaborator

MandyA commented Sep 12, 2016

Personally, I like the idea. I will ask Anselm whether this would be a good solution for storing MPS study data.

One thing that might be inconvenient: system matrices are very often reused across studies. With this format we would have to store the matrix several times. With big 3D matrices this might be an issue.

@tknopp
Member Author

tknopp commented Sep 12, 2016

No, system functions go into a dedicated global folder that is not linked to any study. This is, by the way, also one major difference from the way Bruker handles its data.

@MandyA
Collaborator

MandyA commented Sep 12, 2016

Ah, ok - I didn't see this at first glance ;)
So in relation to #46 it would make sense to have an optional parameter that records which matrix has been used for reconstruction.

@hofmannmartin
Member

So in relation to #46 it would make sense to have an optional parameter that records which matrix has been used for reconstruction.

Yes, that would be necessary. Instead of the location, one could also reference other MDF files via their UUID, which in turn has the advantage that files can be renamed without losing the reference.
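
A rough sketch of what resolving such a reference could look like: scan the data directory once and map every UUID to the file's current path. `build_uuid_index` is a made-up name, and `read_uuid` is a hypothetical callback that returns the UUID stored in a file; how the UUID is actually read from an MDF file is left out here on purpose.

```python
from pathlib import Path
from typing import Callable, Dict


def build_uuid_index(root: Path, read_uuid: Callable[[Path], str]) -> Dict[str, Path]:
    """Map each file's UUID to its current path by scanning root recursively."""
    index: Dict[str, Path] = {}
    for path in sorted(root.rglob("*.mdf")):
        uuid = read_uuid(path)  # hypothetical reader supplied by the caller
        if uuid in index:
            raise ValueError(f"duplicate UUID {uuid}: {index[uuid]} and {path}")
        index[uuid] = path
    return index
```

A reconstruction would then only store the UUID of the system matrix it used. After a file is renamed or moved, rebuilding the index is enough to find it again.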

@tknopp
Member Author

tknopp commented Sep 12, 2016

Yes indeed, throughout this entire issue we should keep the UUID in mind.

@tknopp
Member Author

tknopp commented Nov 12, 2016

At the IBI group in HH we have implemented this. Works quite well. We will probably come up with something for the specification.

@hofmannmartin
Member

Should we move this issue to MPIFiles.jl? It is more related to the actual data handling than the specifications.

@Neumann-A
Contributor

Neumann-A commented Aug 25, 2017

Question: why isn't it:

mainfolder/
-- study1/
---- measurements/
------ 000001.mdf
---- reconstructions/
------ 000001/ # this is the experiment (measurement) number
-------- 000245.mdf # this is the number of the reconstruction (changed the number)
-- systemFunctions/
---- 000126.mdf # (changed the number to make clear it has nothing to do with the above)

My reasoning:
A study is a collection of measurements and reconstructions. Having an extra reconstructions folder in the main folder seems strange, since a reconstruction is nothing without the context of a study.

@hofmannmartin
Member

In our workflow we occasionally reconstruct measurements from different studies, which would be a hassle if reconstructions are assigned to studies.

@Neumann-A
Contributor

Let me guess: you start your reconstruction comparison script from the /reconstructions/ folder?
If you started it from the main folder, it would just be a reordering of the path string.

if reconstructions are assigned to studies

They are already assigned to different studies in your current layout, due to the extra /study folder in reconstructions.
(So you need the study path anyway.)

Instead of having /reconstructions/study/ you will have /study/reconstructions,
which in my opinion is more logical, because a study is a collection of measurements and reconstructions.
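
To illustrate the "reordering of the path string": a small sketch that converts a relative path between the two layouts. The function names are made up for this example, and it assumes that only `measurements` and `reconstructions` are study-scoped, as in the trees above.

```python
from pathlib import PurePosixPath

STUDY_SCOPED = ("measurements", "reconstructions")


def to_study_first(relative: str) -> str:
    """Turn '<kind>/<study>/<rest>' into '<study>/<kind>/<rest>'."""
    parts = PurePosixPath(relative).parts
    if len(parts) < 3 or parts[0] not in STUDY_SCOPED:
        raise ValueError(f"not a study-scoped path: {relative}")
    kind, study, *rest = parts
    return str(PurePosixPath(study, kind, *rest))


def to_kind_first(relative: str) -> str:
    """Inverse mapping: '<study>/<kind>/<rest>' into '<kind>/<study>/<rest>'."""
    parts = PurePosixPath(relative).parts
    if len(parts) < 3 or parts[1] not in STUDY_SCOPED:
        raise ValueError(f"not a study-scoped path: {relative}")
    study, kind, *rest = parts
    return str(PurePosixPath(kind, study, *rest))
```

Both directions carry exactly the same information, so a script only needs one of these helpers to work against the other layout.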

@hofmannmartin
Member

Currently we have our measurement data and reconstruction data separated on different NAS systems. The NAS at the MPI scanner stores our measurement data, whereas the NAS at our workstation stores the reconstruction data. Your proposal requires one large file system, where everything is stored, and does not allow splitting off the reconstructed data.

@Neumann-A
Contributor

Neumann-A commented Aug 25, 2017

Your proposal requires one large file system.

No. That's not necessary. Storage is an implementation detail; you can still store it anywhere you want. The only thing you have to do is present the data in the proposed directory structure. (It's a virtual structure.)
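
One possible way to get such a virtual structure is with symbolic links. The sketch below assumes both stores are mounted as local paths and are laid out as `<store>/<study>/...`; the function name `present_study` and the store names are placeholders, and creating symlinks may need extra privileges on Windows.

```python
import os
from pathlib import Path


def present_study(
    view_root: Path, study: str, measurement_store: Path, reconstruction_store: Path
) -> Path:
    """Create <view_root>/<study>/{measurements,reconstructions} as symlinks.

    The data stay where they are; only the links live under view_root.
    """
    study_dir = view_root / study
    study_dir.mkdir(parents=True, exist_ok=True)
    targets = {
        "measurements": measurement_store / study,
        "reconstructions": reconstruction_store / study,
    }
    for name, target in targets.items():
        link = study_dir / name
        if not link.is_symlink():  # leave links from an earlier run untouched
            os.symlink(target, link, target_is_directory=True)
    return study_dir
```

Tools that follow links then see `mainfolder/study1/measurements/` and `mainfolder/study1/reconstructions/` side by side, although the files remain on two different stores.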

@hofmannmartin
Member

No. That's not necessary. Storage is an implementation detail; you can still store it anywhere you want. The only thing you have to do is present the data in the proposed directory structure. (It's a virtual structure.)

That is true, but the proposal of @tknopp requires no mapping at all and can be written directly to a file system, which should be feasible regardless of whether someone is a programming expert or not.

In this case I would vote for simplicity.

@Neumann-A
Contributor

You just need to mount your filesystem correctly beforehand, which is just a configuration step. (It has nothing to do with being a programming expert.)

Then you can just as easily write to the filesystem.

@tknopp requires no mapping

It also requires a mapping if you have too many studies to store on one filesystem... You will sooner or later run into that issue. Possible solutions: archive old studies somewhere else by moving a lot of data, or add more HDDs or a new NAS. For the latter solution you will most likely need a mapping anyway. Currently you are just delaying the issue ;)

@tknopp
Member Author

tknopp commented Aug 25, 2017

The reason is much simpler: we use two stores. One is the Bruker store, the other is the MDF store. The first is a pure measurement store (read-only!!!), the second is the reconstruction store.

I can see that both systems are isomorphic but have different advantages. Bruker does it in a similar fashion to what @NeumannIMT proposes. They even put the reconstruction into a subfolder of the experiment (which also makes sense).

@AvGladiss
Contributor

I do not have a smart idea about storing the data, but I have a question about it: A system function is a simple measurement in the first place. Will one file in /systemFunctions be a post-processed version of one /measurements/study/file ?

Furthermore, a system function may reconstruct another system function by handling the different spatial positions as frames (for test purposes). This should be kept in mind when designing a directory structure (then, /reconstruction would need a subfolder /systemFunction ?).

@hofmannmartin
Member

I do not have a smart idea about storing the data, but I have a question about it: A system function is a simple measurement in the first place. Will one file in /systemFunctions be a post-processed version of one /measurements/study/file ?

That should depend on what you want to do with it. If you perform a calibration measurement then it should always be stored in //systemFunctions/

Furthermore, a system function may reconstruct another system function by handling the different spatial positions as frames (for test purposes). This should be kept in mind when designing a directory structure (then, /reconstruction would need a subfolder /systemFunction ?).

I don't see a problem here. The directory structure merely provides standard locations for your data. Your personal reconstruction framework working on that structure may then do whatever it wants. The process of reconstruction is not part of MDF, but up to the user.

@Neumann-A
Contributor

Nice catch @AvGladiss. You seem to have the more general view on this.

Translating this into a file structure:
mainfolder/
-- study1/
---- measurements/
------ 000001.mdf
---- processed/ (renamed from reconstruction)
------ 000001/ # this is the experiment (measurement) number
-------- 000245.mdf # this is the number of the processing (system matrix)
-------- 000123.mdf # this is the number of the processing (reconstructed image)

(-- systemFunctions/)
(---- link to 000245.mdf)

@tknopp
Member Author

tknopp commented Aug 25, 2017

(Side note: our actual directory structure uses the name "calibration" instead of "systemFunction".)

@tknopp
Member Author

tknopp commented Aug 25, 2017

We have had very bad experiences mixing calibration measurements and regular measurements. In our opinion it does not make sense for a calibration to belong to a study. Therefore we went with a "flat" structure for calibration scans. It has the advantage that all calibration scans are directly available without the need to search through a deep directory structure.

I can understand Anselm's use case. In that case "reconstruction/calibration" could be a good storage location. One could also move "calibration" into the "measurement" folder, in which case "reconstruction/calibration" would no longer be a workaround.
