A Python script to search and scrape code repositories from grep.app using customizable search queries and filters.
- Search code repositories based on keywords, language, and page range.
- Filter repositories based on a specified language (supports multiple languages).
- Optionally exclude results based on keywords.
- Save repository URLs to a file.
- Optionally view code snippets with the --show-snippet argument.
- Support for regular expressions in search queries.
- Clone this repository:
  git clone https://github.com/your-username/grepapp-scraper.git
  cd grepapp-scraper
- Install dependencies:
  pip install -r requirements.txt
You can use this script from the command line as follows:
python grepapp_scraper.py --search-term "<search-term>" --lang "<language>" --max <number-of-pages> [--show-snippet] [--exclude "<comma-separated-exclude-terms>"]
- --search-term : Search term to match in the code repositories (required).
- --lang : Comma-separated list of languages to filter results by (e.g., Python,JavaScript) (required).
- --max : Maximum number of pages to scrape. Default is 4.
- --show-snippet : Display the code snippet for each search result (optional).
- --exclude : Comma-separated list of terms; results containing any of them are excluded (optional).
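The options above map naturally onto Python's standard argparse module. The following is a minimal sketch of how such a parser could be declared, not the script's actual source: the function names build_parser and split_csv are hypothetical, and only the flags and the default of 4 pages come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Search grep.app from the command line.")
    parser.add_argument("--search-term", required=True, help="Term to search for.")
    parser.add_argument("--lang", required=True,
                        help="Comma-separated languages, e.g. Python,JavaScript.")
    parser.add_argument("--max", type=int, default=4,
                        help="Maximum number of pages to scrape.")
    parser.add_argument("--show-snippet", action="store_true",
                        help="Display a code snippet for each result.")
    parser.add_argument("--exclude", default="",
                        help="Comma-separated terms to exclude.")
    return parser


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Parse an explicit argument list here instead of sys.argv, for demonstration.
example_args = build_parser().parse_args(
    ["--search-term", "encryption", "--lang", "Python,JavaScript", "--max", "3"]
)
languages = split_csv(example_args.lang)
```

Splitting --lang and --exclude after parsing keeps the command-line syntax identical to the usage line shown above, while the rest of the program works with plain lists.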
Search for repositories containing the term encryption in both Python and JavaScript code:
python grepapp_scraper.py --search-term "encryption" --lang "Python,JavaScript" --max 3
Search with the term encryption, exclude results containing test, and show code snippets:
python grepapp_scraper.py --search-term "encryption" --lang "Python,JavaScript" --max 3 --show-snippet --exclude "test"
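To illustrate what the --exclude option in the command above does conceptually, here is a small sketch of exclusion filtering. It rests on assumptions: each result is represented as a dict with hypothetical "repo" and "snippet" keys, and a term is matched as a case-insensitive substring of the snippet. The script's real data structures and matching rules may differ.

```python
def filter_results(results, exclude_terms):
    """Keep only results whose snippet contains none of the excluded terms.

    Matching is a case-insensitive substring test (an assumption, not
    necessarily what the script does).
    """
    lowered = [term.lower() for term in exclude_terms if term]
    kept = []
    for result in results:
        text = result["snippet"].lower()
        if not any(term in text for term in lowered):
            kept.append(result)
    return kept


# Hypothetical sample data, for illustration only.
sample = [
    {"repo": "example/alpha", "snippet": "def encrypt(data): ..."},
    {"repo": "example/beta", "snippet": "def test_encrypt(): ..."},
]
remaining = filter_results(sample, ["test"])
```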
To display the results and also save them to a CSV file:
python3 run.py --search-term "encryption" --lang "Python,JavaScript" --max 3 --csv
By default, the script will print the repository URLs found in the search. If --show-snippet is enabled, it will also display a code snippet from the results.
Repositories can also be saved in CSV format with the --csv flag.
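As a rough idea of what CSV output could look like, the sketch below writes repository URLs with Python's standard csv module. The function name write_repos_csv, the single repository_url column, and the de-duplication step are all assumptions made for illustration; the file layout the script actually produces is not documented here.

```python
import csv
import io


def write_repos_csv(repo_urls, handle):
    """Write unique repository URLs to an open text handle, one per row."""
    writer = csv.writer(handle)
    writer.writerow(["repository_url"])  # hypothetical header name
    seen = set()
    for url in repo_urls:
        if url not in seen:  # skip repeats while preserving first-seen order
            seen.add(url)
            writer.writerow([url])


# Demonstration with an in-memory buffer and placeholder URLs.
buffer = io.StringIO()
write_repos_csv(
    [
        "https://github.com/example/alpha",
        "https://github.com/example/beta",
        "https://github.com/example/alpha",
    ],
    buffer,
)
rows = buffer.getvalue().splitlines()
```

When writing to a real file instead of a buffer, open it with newline="" (for example, open("results.csv", "w", newline="")), as the csv module documentation recommends, so that row endings are not doubled on some platforms.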