Commit
Merge pull request #73 from fact-project/fact_path_utils
Fact path utils
Showing 6 changed files with 553 additions and 13 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,3 +56,6 @@ docs/_build/ | |
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,373 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## fact.path Examples\n", | ||
"\n", | ||
"# Path deconstruction\n", | ||
"\n", | ||
"Sometimes one wants to iterate over a bunch of file paths and get the (night, run) integer tuple from each path, often in order to retrieve information for each file from the RunInfo DB.\n", | ||
"\n", | ||
"Often we get the paths from something like:\n", | ||
"\n", | ||
" paths = glob('/fact/raw/*/*/*/*')\n", | ||
" \n", | ||
"Below I have defined a couple of example paths, which I want to deconstruct.\n", | ||
"Note that not all of the `paths_for_parsing` contain the typical \"yyyy/mm/dd\" part.\n", | ||
"Still, the `night` and `run` are found just fine. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Help on function parse in module fact.path:\n", | ||
"\n", | ||
"parse(path)\n", | ||
" Return a dict with {prefix, suffix, night, run} parsed from path.\n", | ||
" \n", | ||
" path: string\n", | ||
" any (absolute) path should be fine.\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from fact.path import parse\n", | ||
"help(parse)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"paths_for_parsing = [\n", | ||
" '/fact/raw/2016/01/01/20160101_011.fits.fz',\n", | ||
" '/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits',\n", | ||
" '/fact/aux/2016/01/01/20160101.log',\n", | ||
" '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root'\n", | ||
"]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"/fact/raw/2016/01/01/20160101_011.fits.fz\n", | ||
"{'prefix': '/fact/raw', 'night': 20160101, 'run': 11, 'suffix': '.fits.fz'}\n", | ||
"\n", | ||
"/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits\n", | ||
"{'prefix': '/fact/aux', 'night': 20160101, 'run': None, 'suffix': '.FSC_CONTROL_TEMPERATURE.fits'}\n", | ||
"\n", | ||
"/fact/aux/2016/01/01/20160101.log\n", | ||
"{'prefix': '/fact/aux', 'night': 20160101, 'run': None, 'suffix': '.log'}\n", | ||
"\n", | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root\n", | ||
"{'prefix': '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b', 'night': 20140115, 'run': 79, 'suffix': '_079.root'}\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"for path in paths_for_parsing:\n", | ||
" print(path)\n", | ||
" print(parse(path))\n", | ||
" print()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"The slowest run took 4.25 times longer than the fastest. This could mean that an intermediate result is being cached.\n", | ||
"100000 loops, best of 3: 3.08 µs per loop\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%timeit parse(paths_for_parsing[0])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Parsing a single path takes less than 10µs. At the moment we have on the order of 250k runs, so parsing all paths in the raw folder takes at most about 2.5 seconds (250k × 10µs).\n", | ||
"\n", | ||
"However, `glob` usually takes much longer to collect all the paths in the first place, so speed should not be an issue.\n", | ||
"\n", | ||
"----\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Path construction\n", | ||
"\n", | ||
"Equally often, people already have runs from the RunInfo DB and want to find the corresponding files, be it raw files, aux files or other files that happen to sit in a similar tree-like directory structure, for example the photon-stream files.\n", | ||
"\n", | ||
"The typical task starts with the (night, run) tuple and wants to create a path like\n", | ||
"\"/gpfs0/fact/processing/photon-stream/yyyy/mm/dd/night_run.phs.jsonl.gz\"\n", | ||
"\n", | ||
"or similar." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Help on function tree_path in module fact.path:\n", | ||
"\n", | ||
"tree_path(night, run, prefix, suffix)\n", | ||
" Make a tree_path from a (night, run) for given prefix, suffix\n", | ||
" \n", | ||
" night: int or string\n", | ||
" eg. 20160101 or '20160101'\n", | ||
" run: int or string\n", | ||
" eg. 11 or '011' or None (int, string or None accepted)\n", | ||
" prefix: string\n", | ||
" eg. '/fact/raw' or '/fact/aux'\n", | ||
" suffix: string\n", | ||
" eg. '.fits.fz' or '.log' or '.AUX_FOO.fits'\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from fact.path import tree_path\n", | ||
"help(tree_path)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from functools import partial\n", | ||
"\n", | ||
"night_run_tuples = [\n", | ||
" (20160101, 1),\n", | ||
" (20160101, 2),\n", | ||
" (20130506, 3),\n", | ||
"]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"/gpfs0/fact/processing/photon-stream/2016/01/01/20160101_001.phs.jsonl.gz\n", | ||
"/gpfs0/fact/processing/photon-stream/2016/01/01/20160101_002.phs.jsonl.gz\n", | ||
"/gpfs0/fact/processing/photon-stream/2013/05/06/20130506_003.phs.jsonl.gz\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"photon_stream_path = partial(tree_path,\n", | ||
" prefix='/gpfs0/fact/processing/photon-stream',\n", | ||
" suffix='.phs.jsonl.gz'\n", | ||
")\n", | ||
"for night, run in night_run_tuples:\n", | ||
" print(photon_stream_path(night, run))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits\n", | ||
"/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits\n", | ||
"/fact/aux/2013/05/06/20130506.FSC_CONTROL_TEMPERATURE.fits\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"aux_path = partial(\n", | ||
" tree_path,\n", | ||
" prefix='/fact/aux',\n", | ||
" suffix='.FSC_CONTROL_TEMPERATURE.fits',\n", | ||
" run=None\n", | ||
")\n", | ||
"for night, run in night_run_tuples:\n", | ||
" print(aux_path(night))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"But what about more special cases? I sometimes copy files from ISDC or La Palma to my machine in order to work with them locally and try something out. In the past I often did not bother to recreate the yyyy/mm/dd directory structure, since I copied the files e.g. like this:\n", | ||
"\n", | ||
" scp isdc:/fact/aux/*/*/*/*.FSC_CONTROL_TEMPERATURE.fits ~/fact/aux_toy/.\n", | ||
" \n", | ||
"In this case I cannot make use of `tree_path`, so do I have to roll my own solution again?\n", | ||
"\n", | ||
"Nope! We have you covered. Assume you have a quite specialized path format like e.g. this:\n", | ||
"\n", | ||
" '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root'\n", | ||
"\n", | ||
" * the yyyy/mm/dd tree structure is missing, and \n", | ||
" * the file name contains **two** run ids, not one.\n", | ||
" \n", | ||
"Just define a template for this filename, e.g. like this:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Help on function template_to_path in module fact.path:\n", | ||
"\n", | ||
"template_to_path(night, run, template, **kwargs)\n", | ||
" Make path from template and (night, run) using kwargs existing.\n", | ||
" \n", | ||
" night: int or string\n", | ||
" e.g. night = 20160102 (int)\n", | ||
" is used to create Y,M,D,N template values as:\n", | ||
" Y = \"2016\"\n", | ||
" M = \"01\"\n", | ||
" D = \"02\"\n", | ||
" N = \"20160102\"\n", | ||
" run: int or string\n", | ||
" e.g. run = 1 or run = \"000000001\"\n", | ||
" is used to create template value R = \"001\"\n", | ||
" template: string\n", | ||
" e.g. \"/foo/bar/{Y}/baz/{R}_{M}_{D}.gz.{N}\"\n", | ||
" kwargs:\n", | ||
" if template contains other place holders than Y,M,D,N,R\n", | ||
" kwargs are used to format these.\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from fact.path import template_to_path\n", | ||
"help(template_to_path)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_001_001.root\n", | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_002_002.root\n", | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20130506_003_003.root\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"single_pe_path = partial(\n", | ||
" template_to_path,\n", | ||
" template='/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/{N}_{R}_{R}.root'\n", | ||
")\n", | ||
"\n", | ||
"for night, run in night_run_tuples:\n", | ||
" print(single_pe_path(night, run))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Okay but what if the 2nd run id is not always the same as the first?\n", | ||
"\n", | ||
"In that case you'll have to type a bit more:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 11, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_001_003.root\n", | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_002_004.root\n", | ||
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20130506_003_005.root\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"single_pe_path_2runs = partial(\n", | ||
" template_to_path,\n", | ||
" template='/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/{N}_{R}_{run2:03d}.root'\n", | ||
")\n", | ||
"\n", | ||
"for night, run in night_run_tuples:\n", | ||
" print(single_pe_path_2runs(night, run, run2=run+2))" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.1" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
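The notebook above only shows what `parse` and `template_to_path` return, not how they work. As a rough illustration of the idea, the following sketch reproduces the `night`, `run` and `suffix` values from the notebook output with a regular expression, and fills the `Y`, `M`, `D`, `N`, `R` placeholders with `str.format`. The names `sketch_parse`, `sketch_template_to_path` and the pattern `_NAME` are hypothetical and not taken from `fact.path`. The sketch also assumes the night is the first eight digits of the file name, and it leaves out the `prefix` handling.

```python
import re

# Hypothetical pattern (an assumption, not the one used in fact.path):
# an 8-digit night, optionally "_" plus a 3-digit run, then the rest of the name.
_NAME = re.compile(r"^(?P<night>\d{8})(?:_(?P<run>\d{3}))?(?P<suffix>.*)$")


def sketch_parse(path):
    """Return night, run and suffix, looking at the file name only."""
    name = path.rsplit("/", 1)[-1]
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"no night found in {path!r}")
    run = match.group("run")
    return {
        "night": int(match.group("night")),
        "run": None if run is None else int(run),
        "suffix": match.group("suffix"),
    }


def sketch_template_to_path(night, run, template, **kwargs):
    """Fill the Y, M, D, N, R placeholders; extra placeholders come from kwargs."""
    n = f"{int(night):08d}"
    values = {"N": n, "Y": n[:4], "M": n[4:6], "D": n[6:8]}
    if run is not None:
        # The run is zero-padded to three digits, as in the notebook output.
        values["R"] = f"{int(run):03d}"
    values.update(kwargs)
    return template.format(**values)
```

Because the second function simply forwards extra keyword arguments to `str.format`, a placeholder such as `{run2:03d}` can carry its own format spec, which is what the last notebook cell relies on.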
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
0.12.0 | ||
0.12.1 |