Some albumUrl's invalid? #59

Open
drone1 opened this issue Oct 17, 2021 · 5 comments

Comments

@drone1

drone1 commented Oct 17, 2021

Hello and thanks for this great package.

I don't see what's different about certain URLs or label/artist profiles that would affect this, but a call like:

bandcamp.getAlbumInfo('https://yantmusicuk.bandcamp.com/?label=3961057738&tab=artists/album/contravention-ep-sk11x006', ...)

loads data just fine, but another URL (also returned from getArtistInfo) does not return data:

bandcamp.getAlbumInfo('https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists/album/zener-diode-volt001a', ...)

You'll notice that if you point your browser to the first URL, the album page loads, whereas the second URL redirects to the artist's album grid.
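
Parsing one of these URLs with Node's built-in WHATWG `URL` class shows why it looks off: the `/album/...` segment is not part of the path at all, it has ended up inside the `tab` query parameter. A minimal sketch (the `describe` helper is hypothetical and only there for illustration):

```javascript
// Hypothetical helper: pull out the URL parts that matter for this issue.
function describe(url) {
  const u = new URL(url);
  return { pathname: u.pathname, tab: u.searchParams.get('tab') };
}

const broken =
  'https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists/album/zener-diode-volt001a';
const parts = describe(broken);

console.log(parts.pathname); // '/'
console.log(parts.tab);      // 'artists/album/zener-diode-volt001a'
```

So as far as the URL is concerned, both requests target the site root; whether the album page loads depends on how the server treats the `tab` value.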

When I click on a link to the album in question, the URL looks different from that returned from getArtistInfo, so I'm wondering if perhaps something's changed and needs to be updated?

Thanks again.

@drone1
Author

drone1 commented Oct 17, 2021

@89z as I said, these URLs are being extracted from the result of a call to getArtistInfo (the album.url properties).

@drone1
Author

drone1 commented Oct 17, 2021

I guess a better title for this thread might be "Is getArtistInfo returning bad album URLs?", because they don't work.

@drone1
Author

drone1 commented Oct 17, 2021

Oh hm. Look, so this is just one of the URLs that comes back from getArtistUrls:

https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists

So it's already got the query string on there, and this is presumably affecting album.url in the result of getArtistInfo.
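
To illustrate how that could happen, here is a rough sketch. It assumes (this is not confirmed from the package source) that the album URL is built by appending the album path to the artist URL as a plain string, and that the album path is `/album/zener-diode-volt001a`. Resolving the path against the base with `new URL(path, base)` would drop the query string instead:

```javascript
const artistUrl = 'https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists';
const albumPath = '/album/zener-diode-volt001a'; // assumed album path

// Plain string concatenation (assumed behaviour): the path lands after the query string.
const concatenated = artistUrl + albumPath;

// WHATWG URL resolution: an absolute path replaces the base's path and query.
const resolved = new URL(albumPath, artistUrl).href;

console.log(concatenated);
console.log(resolved);
```

The concatenated value is exactly the URL that fails above, while the resolved one has the album in the path and no query string.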

The root call is like this:

bandcamp.getArtistUrls(labelUrl, function (error, artistsUrls) { ... })

@drone1
Author

drone1 commented Oct 18, 2021

My code is quite simple and is based on your examples. I'm scraping a given label's releases by using getArtistUrls -> getArtistInfo -> getAlbumInfo.

I get the same pattern of result when I use a random label on BC's home page, e.g.:

Calling getArtistUrls('https://multiculti.bandcamp.com/', ...) to get artist URLs results in:

[
  'https://nicolacruz.bandcamp.com/?label=846803195&tab=artists',
  'https://vonparty.bandcamp.com/?label=846803195&tab=artists',
  'https://dreemsdreems.bandcamp.com/?label=846803195&tab=artists',
  ...
]

This seems like your bug here. Why do these artist URLs contain this query string? Because now when one uses these URLs to do the following:

artistsUrls.forEach(url => getArtistInfo(url, (err, artistInfo) => ...))

artistInfo.album.url also includes this unneeded query string, e.g. ?label=3961057738&tab=artists, which results in the nonsense URLs.
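
As a possible stopgap until this is sorted out, the query string can be stripped from each artist URL before it is passed on. This is only a sketch: `stripQuery` is a hypothetical helper, not part of the package, and it assumes getArtistInfo works with the bare artist URL.

```javascript
// Hypothetical helper: drop the query string and fragment from a URL.
function stripQuery(url) {
  const u = new URL(url);
  u.search = '';
  u.hash = '';
  return u.href;
}

// Artist URLs as returned by getArtistUrls in the example above.
const artistsUrls = [
  'https://nicolacruz.bandcamp.com/?label=846803195&tab=artists',
  'https://vonparty.bandcamp.com/?label=846803195&tab=artists',
  'https://dreemsdreems.bandcamp.com/?label=846803195&tab=artists',
];

const cleanedUrls = artistsUrls.map(stripQuery);
console.log(cleanedUrls);
// cleanedUrls can then be used in place of artistsUrls in the forEach above.
```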

Am I using the package in an unintended way? It seems quite fundamentally broken. Perhaps some tests would be useful.

@masterT
Owner

masterT commented Oct 24, 2021

What is going on?

2 participants