-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single source for resource and column names #15
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good to me, this PR seems a big step forward towards a couple of issues (role of config file in the library and standard names for files).
After @sbesson review, for me this is good to go.
README.rst
Outdated
this will create the *.ini configuration file* | ||
|
||
This file should look something like this: | ||
+ **step 1** - create a `cell_track_dpkg.ini` configuration file and place it in the same directory as your tracking file. The file must be structured as follows: | ||
|
||
.. code-block:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for having deleted the writeConfigFile.py
.
"schema": { | ||
"foreignKeys": [{ | ||
"fields": "SPOT_ID", | ||
"reference": { | ||
"resource": "objectsTable", | ||
"resource": "objects_table", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going towards spec compliance, nice!
Z_COORD_NAME = "z_coord_cmso" | ||
FRAME_NAME = "frame_cmso" | ||
OBJECT_NAME = "object_id_cmso" | ||
LINK_NAME = "link_id_cmso" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simleo - the fields in this names
file are standard, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file (does not necessarily have to be called names.py
, we can change that here or in the future) is meant to be the single source where our resource / column names are defined. Everywhere else in the code we should use these vars (e.g., names.OBJECT_NAME
) rather than hardwire the values. I have kept the current values (e.g., "object_id_cmso"
), but they can change depending on what we agree upon for the spec (keeping in mind datapackage spec compliance, of course). Indeed, I have proposed some changes myself in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
dpkg/pushtopandas.py
Outdated
@@ -14,8 +14,9 @@ def push_to_pandas(directory, object_id_cmso): | |||
storage = dp.push_datapackage(descriptor=descr, backend='pandas') | |||
print(storage.buckets) | |||
|
|||
objects = storage['objects___objectstable'] | |||
links = storage['links___linkstable'] | |||
# FIXME: the following is hardwired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simleo - can we get this from the names
file as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked more deeply into this issue. The push_datapackage
function generates those keys based on the resource's path and name via a convert_path
function:
table = os.path.splitext(path)[0]
table = table.replace(os.path.sep, '__')
if name is not None:
table = '___'.join([table, name])
table = re.sub('[^0-9a-zA-Z_]+', '_', table)
table = table.lower()
return table
Until datapackage version 0.x this was implemented in the public API (from datapackage.mappers import convert_path
), but they have moved it to an internal function in datapackage.pushpull
for the upcoming 1.0.0 version (they are at v1.0.0-alpha4 now). This is not good. I will open an issue on their repo. We can implement a workaround in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like they have this already scheduled for a change, see frictionlessdata/datapackage-py#110 which they "merged" into frictionlessdata/datapackage-py#132.
Before this is merged can someone please double check that at least one example runs correctly (e.g., |
@simleo I have tested this for the One last thing before merging. The step 1: install the library (I did Do you mind committing these edits before we proceed with the merging? Thanks. |
Thanks! For me this is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed earlier today with @simleo @pcmasuzzo @gsergeant. Two minor comments about using relative imports within the package but these can be sorted out in upcoming PRs.
Overall, the strategy of defining and sourcing constants in a dedicated module makes sense and I don't have an obvious better suggestion than names
at the moment. Merging so that we can move forwards with the package renaming (see #13)
@@ -7,6 +7,8 @@ | |||
import xlrd | |||
from xlrd import XLRDError | |||
|
|||
import dpkg.names as names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use relative imports within the package?
from jsontableschema import infer | ||
import dpkg.names as names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use relative imports within the package?
A first step towards fixing #8 and #3, but it does not address everything because #14 needs to be discussed first.
Note that:
The
dpkg.py
has been moved to an external script (scripts/create_dpkg.py
). It's not part of the library and it had the same name as the package, which would have messed up absolute imports. This means that the package now has to be installed.writeConfigFile.py
is gone. The user can create the.ini
file directly, that should be easier and less error-prone than modifying Python code (which would have been installed anyway)."objects.csv"
and"links.csv"
are still hardwired inreadfile.py
andcreatedp.py
. We should define these in the library as well at some point. There is more hardwired stuff, see Role of config file in data package creation #14.I would follow @JaimePrilusky's suggestion to prepend rather than append
cmso
, e.g.,cmso_x_coord
.For consistency, we might want to use
cmso_frame_id
instead ofcmso_frame
.