This project takes a Yale Lux Search URL and identifies potentially overlapping records. The main script, separate.py, downloads the entries returned by the query, cleans and standardizes the data, and then builds a hierarchical tree structure to visualize the relationships between the entries.
- Downloads entries from a given Yale Lux Search query using LuxY.
- Cleans and standardizes the data, including handling parentheticals, abbreviations, and name parts.
- Creates a hierarchical tree structure to visualize the relationships between the entries.
- Outputs the tree structure to a specified file.
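The cleaning and grouping steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the code in separate.py, which relies on nameparser and anytree. The abbreviation table, the function names (`normalize_name`, `build_tree`, `render`), and the choice to group by surname first are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; the project's real rules may differ.
ABBREVIATIONS = {"wm": "william", "chas": "charles", "jno": "john"}


def normalize_name(raw):
    """Return a simplified, lowercase form of a name for comparison."""
    # Drop parentheticals such as "(John Ronald Reuel)".
    text = re.sub(r"\([^)]*\)", " ", raw)
    # Drop life dates such as "1892-1973" or an open-ended "1950-".
    text = re.sub(r"\d{4}\s*-\s*(\d{4})?", " ", text)
    # Reorder "Last, First" into "First Last".
    if "," in text:
        last, _, rest = text.partition(",")
        text = f"{rest} {last}"
    # Keep runs of letters only, lowercased, and expand known abbreviations.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def build_tree(records):
    """Group raw records as surname -> normalized name -> original strings."""
    tree = defaultdict(lambda: defaultdict(list))
    for raw in records:
        key = normalize_name(raw)
        surname = key.split()[-1] if key else ""
        tree[surname][key].append(raw)
    return tree


def render(tree):
    """Render the nested grouping as indented text, one node per line."""
    lines = []
    for surname in sorted(tree):
        lines.append(surname)
        for key in sorted(tree[surname]):
            lines.append(f"  {key}")
            for raw in tree[surname][key]:
                lines.append(f"    {raw}")
    return "\n".join(lines)


records = [
    "Tolkien, J. R. R. (John Ronald Reuel), 1892-1973",
    "J.R.R. Tolkien",
    "Tolkien, Christopher",
]
print(render(build_tree(records)))
```

In this sketch, records that end up under the same normalized name are the candidates for overlap; the first two sample strings fall under one node, while the third gets its own.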
- Python 3.6+
- tqdm
- anytree
- nameparser
- luxy
- Clone the repository:
  git clone https://github.com/project-lux/lux-overlaps
  cd lux-overlaps
- Install the required Python packages:
  pip install -r requirements.txt
To use the script, run the following command:
python separate.py <query> [output]
- <query>: The Yale Lux Search query to process.
- [output]: (Optional) The output file to save the tree structure. Defaults to output.txt.
For example:
python separate.py "tolkien" output.txt
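A command-line interface of this shape, with one required argument and one optional argument that has a default, could be parsed as in the sketch below. Whether separate.py actually uses argparse is an assumption; the function name `parse_args` and the help strings are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a required query and an optional output path (assumed interface)."""
    parser = argparse.ArgumentParser(
        prog="separate.py",
        description="Find potentially overlapping records for a Lux query.",
    )
    # Required positional argument: the search query to process.
    parser.add_argument("query", help="Yale Lux Search query to process")
    # Optional positional argument: falls back to output.txt when omitted.
    parser.add_argument(
        "output",
        nargs="?",
        default="output.txt",
        help="file to write the tree structure to (default: output.txt)",
    )
    return parser.parse_args(argv)


# Passing an explicit list stands in for the real command line here.
args = parse_args(["tolkien"])
print(args.query, args.output)
```

Using `nargs="?"` with a `default` is what lets the second positional argument be left off, matching the `[output]` notation in the usage line.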
This project is licensed under the MIT License. See the LICENSE file for details.