[scholar] minor changes #31

Open
5 of 6 tasks
boogheta opened this issue Nov 14, 2024 · 5 comments
@boogheta
Member

boogheta commented Nov 14, 2024

  • Remove the extra [CITATION] [PDF] [HTML] info from the title and store it in a separate "type" column
  • maybe we can try and extract the last piece of the green section as a journal field
  • Add date to scraper: at best, the green meta bar of each result often contains a year, which should be easy to catch with a regexp matching 4 digits starting with 19 or 20
  • cleanup line breaks in descriptions
  • The author detection should not stop directly after a - but at least a - or even a -
  • Sometimes the detected authors are not really authors, but it's quite hard to find a heuristic to tell them apart from actual authors, so...
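
A minimal sketch of how the title, date, and description items above might look. This is not the scraper's actual code: the helper names (`split_title_tags`, `extract_year`, `clean_description`) are hypothetical, and it assumes the [CITATION] [PDF] [HTML] markers only appear at the start of the title and that, when several years show up in the green meta bar, the last one is the publication year.

```python
import re

# Hypothetical helpers for illustration; names and behavior are assumptions,
# not the scraper's existing API.
TITLE_TAGS_RE = re.compile(r"^\s*((?:\[(?:CITATION|PDF|HTML)\]\s*)+)")
TAG_RE = re.compile(r"\[(CITATION|PDF|HTML)\]")
# 4 digits starting with 19 or 20, not glued to other word characters.
YEAR_RE = re.compile(r"\b((?:19|20)\d{2})\b")


def split_title_tags(title):
    """Return (clean_title, tags), stripping leading [CITATION]/[PDF]/[HTML] markers."""
    match = TITLE_TAGS_RE.match(title)
    if match is None:
        return title.strip(), []
    tags = TAG_RE.findall(match.group(1))
    return title[match.end():].strip(), tags


def extract_year(meta):
    """Return the last year (19xx or 20xx) found in the meta bar text, or None."""
    years = YEAR_RE.findall(meta)
    return int(years[-1]) if years else None


def clean_description(text):
    """Collapse line breaks and runs of whitespace into single spaces."""
    return " ".join(text.split())
```

With this pattern, a meta bar such as "J Doe - Annals, 1850 - x.org" yields no year at all, so any widening of the accepted range would have to be made in `YEAR_RE`.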
@Yomguithereal
Member

Won't we leave out search results from the XIXth century then? Asking for a friend.

@boogheta boogheta changed the title [scholar] Add date to scraper [scholar] minor changes Nov 14, 2024
@boogheta
Member Author

@Yomguithereal: we can also include 18 just for your friend

@Yomguithereal
Member

Okay but what about the year 1000 and negative dates then?

@boogheta
Member Author

I didn't know Plato was peer reviewed, but I'd be interested in reading his 2nd reviewer's comments

@Yomguithereal
Member

Reviewer 2: « the whole article felt too cavernous for my taste »

jpontoire added a commit that referenced this issue Nov 22, 2024
3 participants