This repo contains scripts that are meant to implement one-off large-scale changes to INSPIRE or gather complex information.
To add a new script, create a new directory under scripts
containing a single script.py
file implementing the script logic, then create a PR. Auxiliary files will be generated automatically when merging to master.
Often, a task can be fit into the following pattern:
- Search for a set of records that need to be handled
- Check for each record whether it needs to be modified, or print some metadata from the record
- Do some modifications on the record passing the check
In those cases, one can subclass the SearchCheckDo
class which provides logic to perform these operations. The script will have the following structure:
from inspirehep.curation.search_check_do import SearchCheckDo
class MyCustomAction(SearchCheckDo):
"""Explain what this does."""
# Literature is default, ``search_class`` needs to be set for other
# collections
# search_class = LiteratureSearch
query = ... # a custom query, like "t electron"
@staticmethod
def check(record, logger, state):
# Use ``record`` to check if it needs to be modified, return True if
# so. Optionally use ``logger`` to log additional info and ``state`` to
# transmit some data to the ``do`` step.
...
@staticmethod
def do(record, logger, state):
# Mutate ``record`` to make modifications.
# Optionally use ``logger`` to log additional info and ``state`` to
# retrieve some data to the ``do`` step.
...
MyCustomAction()
Concrete examples can be found here or under scripts.
By default, when running the script, the output will contain which record was checked and whether the check was positive and hence the record modified. If you want to output additional information from the record in the check
or do
phase, you can use the logger
that's passed to the method. It is an instance of a structlog logger. You can use it similarly to the standard library logger (with methods like warning
or info
to be used depending on the importance of the message): you can pass it a string for the message and additional arbitrary arguments with extra data you want to output. For example, you could do
logger.info(
"More details about the record.",
title=record["title"],
first_author=record.set_value("authors[0].full_name"),
)
The recid is included automatically, you don't need to add it yourself.