Parse_wikidump: programmatic access #28

Open
wants to merge 3 commits into base: master

nucflash

Refactored parse_wikidump to allow programmatic access to its functionality, e.g., initializing the db and initiating downloads from code.

Manos Tsagkias added 2 commits February 25, 2015 15:13
…ality, e.g., initialize db, and initiate downloads via code.
                    help='Download snapshot if it does not exist as snapshot.xml.bz2. The corpus file name should match that of snapshot.')
parser.add_argument('-N', '--ngram', dest='ngram', default=7, type=int,
                    help='Maximum order of ngrams, set to None to disable [default: 7].')
args = parser.parse_args()
Contributor

I think @IsaacHaze will be very unhappy when he sees his darling docopt replaced by ArgumentParser...

Author

Oh, I didn't know. I was inspired by how things are done in xtas.

Contributor

That was written before I knew about docopt :)

Contributor

:'(

Author

OK, so docopt is the current default for parsing args?

Contributor

Yep.

Author

Alright. I'll revert the arg parsing logic to use docopt, then.

@larsmans
Contributor

I see how this is a useful feature, but I think it should be implemented differently, viz. by moving functionality from __main__.py (which is meant to be only a script entry point) to the module. E.g., move the functionality you might want to import from your code to parse_wikidump/__init__.py and then import it in __main__.py.
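
A minimal single-file sketch of that layout, for illustration only. The function names (`snapshot_filename`, `init_db`, `needs_download`), the table schema, and the use of sqlite3 are assumptions and are not taken from the actual parse_wikidump code. argparse is used here only because it is in the standard library; the thread settles on docopt for the real entry point.

```python
"""Sketch: importable functions in the package, a thin __main__ on top.

All names and the schema below are hypothetical.
"""
import argparse
import os
import sqlite3

# --- would live in parse_wikidump/__init__.py (importable from user code) ---


def snapshot_filename(snapshot):
    """Return the local file name for a snapshot, i.e. <snapshot>.xml.bz2."""
    return snapshot + ".xml.bz2"


def init_db(path):
    """Create an illustrative table and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS ngrams (ngram TEXT, freq INTEGER)")
    conn.commit()
    return conn


def needs_download(snapshot, directory="."):
    """True if the snapshot file is not yet present in `directory`."""
    return not os.path.exists(os.path.join(directory, snapshot_filename(snapshot)))


# --- would live in parse_wikidump/__main__.py (script entry point only) ---
# In the real package this file would start with something like:
#     from . import init_db, needs_download


def main(argv=None):
    """Parse arguments, then delegate everything to the importable functions."""
    parser = argparse.ArgumentParser(prog="parse_wikidump")
    parser.add_argument("snapshot", help="snapshot name, without .xml.bz2")
    parser.add_argument("db", help="path of the database to initialize")
    args = parser.parse_args(argv)

    conn = init_db(args.db)
    conn.close()
    return needs_download(args.snapshot)
```

With this split, other code can call `init_db(...)` or `needs_download(...)` directly after importing the package, while `__main__.py` does nothing beyond argument parsing and delegation.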

@nucflash
Author

Good points. Let me work on it a bit more; you will hear from me soon.

…unction in __init__ that takes care of downloading.
@IsaacHaze
Contributor

But apart from the sadness, I'm all for this change (I did something similar in my joblib branch).

3 participants