ValueError: Unterminated string starting at: line 1 column 1175 (char 1174) #1

Open
loretoparisi opened this issue Oct 24, 2018 · 1 comment

@loretoparisi

Hello, I get this error when running preprocess_releases_json_to_hdf_pandas.py:

Loading json dump into a pandas DataFrame
Processed 500000 releases
Processed 1000000 releases
Processed 1500000 releases
Processed 2000000 releases
Processed 2500000 releases
Processed 3000000 releases
Processed 3500000 releases
Processed 4000000 releases
Processed 4500000 releases
Processed 5000000 releases
Processed 5500000 releases
Processed 6000000 releases
Processed 6500000 releases
Processed 7000000 releases
Processed 7500000 releases
Processed 8000000 releases
Processed 8500000 releases
Processed 9000000 releases
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-4df24afa67c8> in <module>()
----> 1 from preprocess_releases_json_to_hdf_pandas.py import *

/Users/loretoparisi/Documents/Projects/AI/ismir2017-discogs/code/preprocess_releases_json_to_hdf_pandas.py in <module>()
    134 else:
    135     print("Loading json dump into a pandas DataFrame")
--> 136     data = load_releases(ignore_genres=IGNORE_GENRES, part=100)
    137     print("Saving DataFrame to %s" % dump_pandas)
    138     data.to_hdf(dump_pandas, 'w')

/Users/loretoparisi/Documents/Projects/AI/ismir2017-discogs/code/preprocess_releases_json_to_hdf_pandas.py in load_releases(size, part, ignore_genres)
     69             if not i % (100/part):
     70 
---> 71                 release = json.loads(jsonline)
     72 
     73                 # remove some columns that we won't use to save memory

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    336             parse_int is None and parse_float is None and
    337             parse_constant is None and object_pairs_hook is None and not kw):
--> 338         return _default_decoder.decode(s)
    339     if cls is None:
    340         cls = JSONDecoder

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    364 
    365         """
--> 366         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    367         end = _w(s, end).end()
    368         if end != len(s):

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    380         """
    381         try:
--> 382             obj, end = self.scan_once(s, idx)
    383         except StopIteration:
    384             raise ValueError("No JSON object could be decoded")

ValueError: Unterminated string starting at: line 1 column 1175 (char 1174)

I updated config.py to use the 2018 releases: https://github.com/loretoparisi/ismir2017-discogs/blob/master/code/config.py
The earlier steps worked properly, so my data/ folder now contains:

ip-192-168-22-127:discogs loretoparisi$ tree -L 1 -h
.
├── [239M]  discogs_20180101_artists.xml.gz
├── [ 39M]  discogs_20180101_labels.xml.gz
├── [152M]  discogs_20180101_masters.xml.gz
├── [9.0G]  discogs_20180101_releases.json.dump
└── [5.1G]  discogs_20180101_releases.xml.gz

0 directories, 5 files
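
An "Unterminated string" error from json.loads usually means the line being parsed ends partway through a string value. One possible cause, not confirmed here, is a truncated or split line in the dump. A minimal sketch for locating such lines follows. It assumes the dump holds one JSON object per line, as the json.loads(jsonline) call in the traceback suggests. The helper name find_bad_json_lines is hypothetical and is not part of the repository.

```python
import io
import json


def find_bad_json_lines(path, max_report=10):
    """Return (line_number, error_message) pairs for lines that fail to parse.

    Hypothetical helper: assumes one JSON object per line in the file.
    Stops after max_report failures so a badly damaged file is not fully scanned.
    """
    bad = []
    with io.open(path, "r", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                # Blank lines are skipped rather than reported.
                continue
            try:
                json.loads(line)
            except ValueError as exc:
                # json raises ValueError (or a subclass of it) on malformed input.
                bad.append((lineno, str(exc)))
                if len(bad) >= max_report:
                    break
    return bad
```

Running it as find_bad_json_lines("discogs_20180101_releases.json.dump") would list the first few offending line numbers, which can then be inspected by hand or skipped in load_releases.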
@dbogdanov
Owner

Hi @loretoparisi, I'll have a look and try this new dump next week.