Add Pixiv metadata plugin #950

Merged
merged 25 commits into from
Mar 27, 2024

psilabs-dev (Contributor) commented Feb 19, 2024

Adds Pixiv cookie login and metadata extraction for illustrations from pixiv.net.

usage

Supported one-shot parameter formats (the plugin extracts $illust_id from either one for metadata lookup):

  • simple ID extraction: $illust_id
  • URL-based extraction: pixiv.net/en/artworks/$illust_id
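
The two one-shot formats above could be handled roughly as follows. This is an illustrative Python sketch, not the plugin's actual code. The function name `illust_id_from_param` and both regular expressions are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical patterns; the plugin's real matching rules may differ.
_URL_RE = re.compile(r"pixiv\.net/(?:[a-z]{2}/)?artworks/(\d+)")
_ID_RE = re.compile(r"\d+")


def illust_id_from_param(param: str) -> Optional[str]:
    """Return the illustration ID from a one-shot parameter, or None."""
    param = param.strip()
    # Case 1: the parameter is just the numeric ID.
    if _ID_RE.fullmatch(param):
        return param
    # Case 2: the parameter is (or contains) an artwork URL.
    match = _URL_RE.search(param)
    return match.group(1) if match else None
```

Trying the bare ID first keeps the common case cheap. Anything that matches neither form returns `None`, so the caller can fall back to the archive title.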

Supported archive file formats:

  • {$illust_id} title_of_work
  • pixiv_{$illust_id} title_of_work
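
A minimal Python sketch of matching these archive formats, assuming the curly braces are literal characters in the title and the ID is purely numeric. Neither assumption is confirmed by the description above, and `illust_id_from_title` is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: braces are literal, with an optional "pixiv_" prefix.
_TITLE_RE = re.compile(r"^(?:pixiv_)?\{(\d+)\}")


def illust_id_from_title(title: str) -> Optional[str]:
    """Return the illustration ID embedded at the start of an archive title."""
    match = _TITLE_RE.match(title)
    return match.group(1) if match else None
```

Under these assumptions, `illust_id_from_title("pixiv_{456} title_of_work")` yields `"456"`, and a title without a leading ID yields `None`.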

parameters

  • tag_languages: a comma-separated list of language codes whose tags should be extracted (e.g. "en", "jp"). By default, an empty string extracts the untranslated tags, i.e. the "jp" tags.
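
One way to read this parameter, sketched in Python. The helper name `parse_tag_languages`, the lowercasing, and the de-duplication are assumptions for illustration. Only the comma-separated input and the empty-string fallback to "jp" come from the description above.

```python
def parse_tag_languages(raw: str) -> list[str]:
    """Split a comma-separated language list; an empty value means ["jp"]."""
    langs: list[str] = []
    for part in raw.split(","):
        code = part.strip().lower()
        # Skip empty entries and duplicates, preserving first-seen order.
        if code and code not in langs:
            langs.append(code)
    # An empty parameter corresponds to the untranslated ("jp") tags.
    return langs or ["jp"]
```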

metadata

A generic Pixiv illustration includes the following extractable metadata:

  • tags: list[str]: list of comma-delimited strings for tags.
  • source: str: URL of the Pixiv artwork
  • pixiv_user_id: int: artist ID on Pixiv (can be used to find the user's page: https://pixiv.net/en/users/$user_id)
  • artist: str: name of artist/username
  • date_created: int: epoch time of artwork creation in seconds
  • date_uploaded: int: epoch time of upload in seconds

If the illustration is a manga, this plugin may include additional manga metadata:

  • pixiv_series_id: int: ID of the series which this manga belongs to.
  • pixiv_series_title: str: name of the manga series
  • pixiv_series_order: int: position of this illustration within the manga series
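
The fields listed above can be pictured as a single record that is flattened into tag strings. The Python sketch below is hypothetical. The `IllustMetadata` class, the `to_tag_strings` helper, and the `namespace:value` output format are assumptions for illustration, not the plugin's code. Series fields that are absent or null are skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IllustMetadata:
    tags: list[str]
    source: str
    pixiv_user_id: int
    artist: str
    date_created: int   # epoch seconds
    date_uploaded: int  # epoch seconds
    # Manga-only fields; None when the illustration is not part of a series.
    pixiv_series_id: Optional[int] = None
    pixiv_series_title: Optional[str] = None
    pixiv_series_order: Optional[int] = None


_NAMESPACED = (
    "source", "pixiv_user_id", "artist", "date_created", "date_uploaded",
    "pixiv_series_id", "pixiv_series_title", "pixiv_series_order",
)


def to_tag_strings(meta: IllustMetadata) -> list[str]:
    """Flatten a record into plain tags plus 'namespace:value' strings."""
    out = list(meta.tags)
    for name in _NAMESPACED:
        value = getattr(meta, name)
        if value is not None:  # skip series fields that are null
            out.append(f"{name}:{value}")
    return out
```

For a non-manga illustration, the output carries no `pixiv_series_*` entries at all, rather than entries with empty values.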

Taggable strings (e.g. tags, artist, series title) are sanitized of special characters by the following logic:

  • underscores are replaced by a space (e.g. "lorem_ipsum" -> "lorem ipsum")
  • dashes preceded by space are removed (e.g. "lorem -ipsum" -> "lorem ipsum"; "lorem-ipsum" -> "lorem-ipsum")
  • other special characters (["?*%$:]) are removed (e.g. "lorem: ipsum" -> "lorem ipsum")
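
The three rules above can be sketched in Python as below. The order in which the rules are applied and the name `sanitize` are assumptions. The character set is taken from the list above.

```python
import re


def sanitize(text: str) -> str:
    """Apply the three sanitization rules to a taggable string."""
    # Rule 1: underscores become spaces.
    text = text.replace("_", " ")
    # Rule 2: a dash preceded by a space is dropped (the space stays).
    text = text.replace(" -", " ")
    # Rule 3: remove the remaining special characters " ? * % $ :
    return re.sub(r'["?*%$:]', "", text)
```

Applied to the examples above, `sanitize("lorem -ipsum")` gives `"lorem ipsum"`, while `sanitize("lorem-ipsum")` is returned unchanged.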

* add metadata by pixiv illustration id

* add metadata from archive title

* add en tags, illust id capture

* support translated tags

* add default user agent

* clean up

* streamline illust metadata logic

* add manga specific metadata

* readme
fix bug where series metadata exists but is null
sanitize tags correct url embed

* add pixiv tests

* add pixiv to module tests

* pass github actions (#7)

fix tests so they pass

* add refactor and tests

* fix pixiv tests
@psilabs-dev changed the title from "WIP: Add Pixiv metadata plugin" to "Add Pixiv metadata plugin" Feb 29, 2024
@psilabs-dev psilabs-dev marked this pull request as ready for review February 29, 2024 06:39
psilabs-dev (Contributor, Author) commented

Some of the fields (e.g. user_id, series_id) are exclusively Pixiv properties, but other sources might have their own user_id field, which could cause metadata conflicts. I can think of a couple of ways to resolve this; which one is better?

  • first way: make the category specific to pixiv: pixiv_user_id: 123456.
  • second way: make the id point to pixiv: user_id: https://www.pixiv.net/users/123456.

Difegue (Owner) commented Mar 2, 2024

A pixiv-specific namespace sounds like the way to go to me.

Difegue (Owner) commented Mar 27, 2024

👋 Apologies for the long time reviewing this; Stuff's been busy!

On top of it, well, I don't have much to say at all... this looks perfect. 😤
Who the hell goes this hard for a first-time contribution? Thanks a lot, I am humbled. 🙇

@Difegue Difegue merged commit 4913beb into Difegue:dev Mar 27, 2024
1 check passed

holopin-bot bot commented Mar 28, 2024

Congratulations @psilabs-dev, you just earned a holobyte! Here it is: https://holopin.io/holobyte/cluagx9w2448240fjqqanb8t72

This badge can only be claimed by you, so make sure that your GitHub account is linked to your Holopin account. You can manage those preferences here: https://holopin.io/account.
Or if you're new to Holopin, you can simply sign up with GitHub, which will do the trick!

psilabs-dev (Contributor, Author) commented

Thanks👌 Development was easy thanks to seeing/copying the previous plugins haha, I just cleaned it up a bit. Love the project!


2 participants