Implement extract_from_text to get neutral citations for pasuperct #1251

Closed
grossir opened this issue Nov 21, 2024 · 3 comments

@grossir
Contributor

grossir commented Nov 21, 2024

Neutral citations appear in the text of the documents, but we are not collecting them. Once this is implemented and freelawproject/courtlistener#4520 is merged, we can collect those citations.

Example
Image

Related to #858 (comment)
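A minimal sketch of what such an extraction could look like, assuming the neutral citation in the document text has the form `<year> PA Super <number>` (for example `2024 PA Super 123`). The regex, the standalone function, and the shape of the returned dictionary are illustrative assumptions, not the scraper's actual implementation or the schema CourtListener expects:

```python
import re
from typing import Any, Dict

# Assumed citation form: "<4-digit year> PA Super <sequence number>".
# The optional period after "Super" is a guess at a possible variation.
NEUTRAL_CITATION = re.compile(
    r"\b(?P<volume>\d{4})\s+PA\s+Super\.?\s+(?P<page>\d+)\b"
)


def extract_from_text(scraped_text: str) -> Dict[str, Any]:
    """Return the first neutral citation found in the opinion text.

    The returned structure (a "Citation" key holding volume, reporter
    and page) is a hypothetical shape chosen for this example.
    """
    match = NEUTRAL_CITATION.search(scraped_text)
    if match is None:
        # Nothing recognizable in the text: return no metadata.
        return {}
    return {
        "Citation": {
            "volume": match.group("volume"),
            "reporter": "PA Super",
            "page": match.group("page"),
        }
    }
```

Returning an empty dictionary when nothing matches keeps the "no metadata extracted" case distinguishable from a successful extraction.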

@grossir
Contributor Author

grossir commented Nov 21, 2024

@flooie is this a neutral citation? I was checking reporters-db, and it's listed as a variation of a state citation
https://github.com/freelawproject/reporters-db/blob/3eab4222612154224fb5e513741b5bf7b8cfa252/reporters_db/data/reporters.json#L21493

The Indigo Book lists it as a "public domain" citation:
Image

@grossir
Contributor Author

grossir commented Jan 13, 2025

Parsing works, but a validation bug in CourtListener prevents us from ingesting the citations. We have PRs addressing the problem. After that, it's a matter of re-running update_from_text.

(Not so sure about the start date)

./manage.py update_from_text --courts juriscraper.opinions.united_states.state.pasuperct --cluster-status Published --date-filed-gte 2019-08-01 --date-filed-lte 2025-01-01 --verbosity 3

Output:
INFO Modified objects counts: {'Docket': 0, 'OpinionCluster': 0, 'Opinion': 0, 'Citation': 1567, 'No text to extract from': 68, 'No metadata extracted': 0, 'Error': 0}

@grossir grossir self-assigned this Jan 13, 2025
@flooie flooie moved this to General Backlog in Case Law Sprint Jan 14, 2025
@grossir
Contributor Author

grossir commented Jan 16, 2025

We gained around 1638 citations from this run. There may be more that were missed due to a bug in the first backscrape (missing pagination), which I will handle in another issue.

@grossir grossir closed this as completed Jan 16, 2025
@github-project-automation github-project-automation bot moved this from General Backlog to Done in Case Law Sprint Jan 16, 2025