Added files via upload #46

Open · wants to merge 2 commits into master
Conversation

LivKelley

Submitting a text mining project and PDF


Oops... I forgot the doctest line at the bottom. I added it in, so hopefully everything works now.
@rdiverdi

Review status: 0 of 3 files reviewed at latest revision, 10 unresolved discussions.


AnalysisofCloningonWikipedia.py, line 1 [r1] (raw file):
Nice job; everything here works well.
There are a couple of improvements you could make so the code is more efficient and more robust.
Also, did you look into sentiment analysis from the pattern library?

This was scoped as a pretty small project. It might have been good to do something like search through other pages and find the average number of positive and negative words, so you could classify pages by how close they are to neutral. You could also have normalized the output so that it wouldn't be affected by the length of the article (see the sketch below).
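A minimal sketch of that normalization idea; the function name and word sets here are hypothetical stand-ins for the lists your script already has:

def neutrality_score(text, positive_words, negative_words):
    """Distance of an article's sentiment from neutral, normalized by
    article length so longer articles aren't favored. 0.0 is neutral."""
    words = text.lower().split()
    if not words:
        return 0.0
    positive_count = sum(1 for w in words if w in positive_words)
    negative_count = sum(1 for w in words if w in negative_words)
    # Divide by the word count: the normalization step.
    return abs(positive_count - negative_count) / len(words)

print(neutrality_score("cloning is good good bad", {"good"}, {"bad"}))  # 0.2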


AnalysisofCloningonWikipedia.py, line 12 [r1] (raw file):
Where did these lists come from?


AnalysisofCloningonWikipedia.py, line 26 [r1] (raw file):
This test fails.


AnalysisofCloningonWikipedia.py, line 42 [r1] (raw file):
Why do you split the file up by lines instead of looking through the whole file at once?


AnalysisofCloningonWikipedia.py, line 46 [r1] (raw file):
Nested for loops are very slow.
In this case, it would have been better to loop through the words in b and test whether each word is in the list of negative words:

for word in b:
    if word in negative_search_words:
        ...

Also, a comment here would have been helpful to explain what the for loops are doing.


AnalysisofCloningonWikipedia.py, line 49 [r1] (raw file):
If you only ever care about the total number of negative words, it doesn't make sense to keep a dictionary that stores the count of each individual word. Instead, you could just use a variable for the total number of negative words, which starts at 0 and increments in your if statement.
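A minimal sketch of that counter approach; the values of b and negative_search_words below are stand-ins for the ones your script builds:

b = "cloning is bad and dangerous".split()     # stand-in for your word list
negative_search_words = {'bad', 'dangerous'}   # stand-in for your negative-word list

negative_word_count = 0
for word in b:
    if word in negative_search_words:
        negative_word_count += 1               # one running total, no dictionary

print(negative_word_count)   # 2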


AnalysisofCloningonWikipedia.py, line 57 [r1] (raw file):
This test will miss any words with punctuation immediately before or after them:
for example, "hi" == "hi." is False.
There is a constant called punctuation in the built-in string module, and strings have a strip method, which together let you pull the punctuation off of words:

import string
word = '.banana.'
word.strip(string.punctuation)

outputs 'banana'


AnalysisofCloningonWikipedia.py, line 65 [r1] (raw file):
You need to import doctest to use it.
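A minimal sketch of the usual pattern, assuming your tests live in this module's docstrings:

import doctest

if __name__ == '__main__':
    doctest.testmod()   # runs every doctest found in this module's docstrings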


DownloadWikiArticlesGenetics.py, line 1 [r1] (raw file):
You never use pickle here.
Pickle is essentially just a serialization scheme used to represent Python objects such as dictionaries or lists as a string of bytes. That is helpful because the bytes can be saved in a file and loaded back later.

Here, you are just storing strings, so you don't need pickle (and you don't actually use it anywhere).
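For reference, a minimal sketch of what pickle is for; the filename and dictionary below are hypothetical:

import pickle

data = {'article': 'Genetics', 'count': 3}   # hypothetical dictionary to save
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)                     # serialize the dictionary into the file
with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)                # read it back as a dictionary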


DownloadWikiArticlesGenetics.py, line 13 [r1] (raw file):
I'm guessing you found this somewhere and aren't exactly sure what it does (which isn't really a problem).
It does the same thing as:

f = open( '...' , 'w')
f.write( ... )

where open opens the specified file and stores the file object in the variable f,
and f.write then writes a string into that file.
In addition, it is always good practice to close files you use; it makes sure that everything is saved correctly:

f.close()
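A common idiom that handles the close for you; the filename and contents variable here are hypothetical:

contents = 'text to save'              # hypothetical article text
with open('article.txt', 'w') as f:    # 'article.txt' is a hypothetical filename
    f.write(contents)                  # the with block closes the file automatically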


