Skip to content

Commit

Permalink
Merge pull request #73 from fact-project/fact_path_utils
Browse files Browse the repository at this point in the history
Fact path utils
  • Loading branch information
Dominik Neise authored Aug 30, 2017
2 parents 10a281b + 73738e7 commit 54d9d72
Show file tree
Hide file tree
Showing 6 changed files with 553 additions and 13 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -56,3 +56,6 @@ docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints
373 changes: 373 additions & 0 deletions examples/path_utils.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,373 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## fact.path Examples\n",
"\n",
"# path deconstruction\n",
"\n",
"Sometimes one wants to iterate over a bunch of file paths and get the (night, run) integer tuple from the path. Often in order to retrieve information for each file from the RunInfo DB. \n",
"\n",
"Often the paths we get from something like:\n",
"\n",
" paths = glob('/fact/raw/*/*/*/*')\n",
" \n",
"Below I have defined a couple of example paths, which I want to deconstruct.\n",
"Note that not all of the `paths_for_parsing` contain the typical \"yyyy/mm/dd\" part.\n",
"Still the `night` and `run` are found just fine. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function parse in module fact.path:\n",
"\n",
"parse(path)\n",
" Return a dict with {prefix, suffix, night, run} parsed from path.\n",
" \n",
" path: string\n",
" any (absolute) path should be fine.\n",
"\n"
]
}
],
"source": [
"from fact.path import parse\n",
"help(parse)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"paths_for_parsing = [\n",
" '/fact/raw/2016/01/01/20160101_011.fits.fz',\n",
" '/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits',\n",
" '/fact/aux/2016/01/01/20160101.log',\n",
" '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root'\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/fact/raw/2016/01/01/20160101_011.fits.fz\n",
"{'prefix': '/fact/raw', 'night': 20160101, 'run': 11, 'suffix': '.fits.fz'}\n",
"\n",
"/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits\n",
"{'prefix': '/fact/aux', 'night': 20160101, 'run': None, 'suffix': '.FSC_CONTROL_TEMPERATURE.fits'}\n",
"\n",
"/fact/aux/2016/01/01/20160101.log\n",
"{'prefix': '/fact/aux', 'night': 20160101, 'run': None, 'suffix': '.log'}\n",
"\n",
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root\n",
"{'prefix': '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b', 'night': 20140115, 'run': 79, 'suffix': '_079.root'}\n",
"\n"
]
}
],
"source": [
"for path in paths_for_parsing:\n",
" print(path)\n",
" print(parse(path))\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The slowest run took 4.25 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"100000 loops, best of 3: 3.08 µs per loop\n"
]
}
],
"source": [
"%timeit parse(paths_for_parsing[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parsing is quicker than 10µs, but at the moment we have in the order of 250k runs, so parsing all paths in the raw folder might take as long as 2.5 seconds.\n",
"\n",
"However, usually `glob` is taking much longer to actually get all the paths in the first place, so speed should not be an issue.\n",
"\n",
"----\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Path construction\n",
"\n",
"Equally often, people already have runs from the RunInfo DB, and want to find the according files. Be it raw files or aux-files or other files, that happen to sit in a similar tree-like directory structure, like for example the photon-stream files.\n",
"\n",
"the typical task starts with the (night, run) tuple and wants to create a path like\n",
"\"/gpfs0/fact/processing/photon-stream/yyyy/mm/dd/night_run.phs.jsonl.gz\"\n",
"\n",
"Or similar."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function tree_path in module fact.path:\n",
"\n",
"tree_path(night, run, prefix, suffix)\n",
" Make a tree_path from a (night, run) for given prefix, suffix\n",
" \n",
" night: int or string\n",
" eg. 20160101 or '20160101'\n",
" run: int or string\n",
" eg. 11 or '011' or None (int, string or None accepted)\n",
" prefix: string\n",
" eg. '/fact/raw' or '/fact/aux'\n",
" suffix: string\n",
" eg. '.fits.fz' or '.log' or '.AUX_FOO.fits'\n",
"\n"
]
}
],
"source": [
"from fact.path import tree_path\n",
"help(tree_path)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from functools import partial\n",
"\n",
"night_run_tuples = [\n",
" (20160101, 1),\n",
" (20160101, 2),\n",
" (20130506, 3),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/gpfs0/fact/processing/photon-stream/2016/01/01/20160101_001.phs.jsonl.gz\n",
"/gpfs0/fact/processing/photon-stream/2016/01/01/20160101_002.phs.jsonl.gz\n",
"/gpfs0/fact/processing/photon-stream/2013/05/06/20130506_003.phs.jsonl.gz\n"
]
}
],
"source": [
"photon_stream_path = partial(tree_path,\n",
" prefix='/gpfs0/fact/processing/photon-stream',\n",
" suffix='.phs.jsonl.gz'\n",
")\n",
"for night, run in night_run_tuples:\n",
" print(photon_stream_path(night, run))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits\n",
"/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits\n",
"/fact/aux/2013/05/06/20130506.FSC_CONTROL_TEMPERATURE.fits\n"
]
}
],
"source": [
"aux_path = partial(\n",
" tree_path,\n",
" prefix='/fact/aux',\n",
" suffix='.FSC_CONTROL_TEMPERATURE.fits',\n",
" run=None\n",
")\n",
"for night, run in night_run_tuples:\n",
" print(aux_path(night))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"But what about more special cases? I sometime copy files from ISDC or La Palma to my machine in order to work with them locally and try something out. In the past I often did not bother to recreate the yyyy/mm/dd file structure, since I copied the files e.g. like this:\n",
"\n",
" scp isdc:/fact/aux/*/*/*/*.FSC_CONTROL_TEMPERATURE.fits ~/fact/aux_toy/.\n",
" \n",
"In this case I cannot make use of the `TreePath` thing, so I have to roll my own solution again?\n",
"\n",
"Nope! We have you covered. Assume you have a quite sepcialized path format like e.g. this:\n",
"\n",
" '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root'\n",
"\n",
" * yyyy/mm/dd tree structure missing, and \n",
" * file name contains **two** not one run id.\n",
" \n",
"Just define a template for this filename, e.g. like this:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function template_to_path in module fact.path:\n",
"\n",
"template_to_path(night, run, template, **kwargs)\n",
" Make path from template and (night, run) using kwargs existing.\n",
" \n",
" night: int or string\n",
" e.g. night = 20160102 (int)\n",
" is used to create Y,M,D,N template values as:\n",
" Y = \"2016\"\n",
" M = \"01\"\n",
" D = \"02\"\n",
" N = \"20160101\"\n",
" run: int or string\n",
" e.g. run = 1 or run = \"000000001\"\n",
" is used to create template value R = \"001\"\n",
" template: string\n",
" e.g. \"/foo/bar/{Y}/baz/{R}_{M}_{D}.gz.{N}\"\n",
" kwargs:\n",
" if template contains other place holders than Y,M,D,N,R\n",
" kwargs are used to format these.\n",
"\n"
]
}
],
"source": [
"from fact.path import template_to_path\n",
"help(template_to_path)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_001_001.root\n",
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_002_002.root\n",
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20130506_003_003.root\n"
]
}
],
"source": [
"single_pe_path = partial(\n",
" template_to_path,\n",
" template='/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/{N}_{R}_{R}.root'\n",
")\n",
"\n",
"for night, run in night_run_tuples:\n",
" print(single_pe_path(night, run))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay but what if the 2nd run id is not always the same as the first?\n",
"\n",
"In that case you'll have to type a bit more:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_001_003.root\n",
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_002_004.root\n",
"/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20130506_003_005.root\n"
]
}
],
"source": [
"single_pe_path_2runs = partial(\n",
" template_to_path,\n",
" template='/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/{N}_{R}_{run2:03d}.root'\n",
")\n",
"\n",
"for night, run in night_run_tuples:\n",
" print(single_pe_path_2runs(night, run, run2=run+2))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
2 changes: 1 addition & 1 deletion fact/VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.12.0
0.12.1
Loading

0 comments on commit 54d9d72

Please sign in to comment.