Refactor Github Scraper Python to TypeScript (GSoC 2024 Mid-Term Evaluation) #458
… type.ts and gh_events.ts
@dgparmar14 is attempting to deploy a commit to the Open Healthcare Network Team on Vercel. A member of the Team first needs to authorize it.
…4/leaderboard into refactor-scrapper-gsoc
LGTM
I believe there are no type errors left in the scraper. I checked multiple times; if any remain, let me know.
0141997 into ohcnetwork:gsoc/gh-discussions
Description
GSoC Mid-Term Evaluation: Refactoring Scraper from Python to TypeScript
This task involves refactoring the existing Python-based scraper into TypeScript. The transition aims to enhance the codebase by introducing type safety and leveraging the capabilities of Octokit for more efficient GitHub interactions.
Fixes: #212
Week 1:
- `Github.py` to `Github.ts`: Convert all scraper functionalities from the Python file to TypeScript.
Week 2:
- Modularize Scraper: Break down the `github.ts` file into different modules for improved readability.
File Structure and Features:
- `index.ts`: Entry point of the scraper, containing the `main()` and `scrapGithub()` functions.
- `fetchEvents.ts`: Fetches all GitHub events and filters out blacklisted users (configurable via the `.env` file).
- `parseEvents.ts`: Parses the events fetched by `fetchEvents.ts` based on required GitHub event types.
- `fetchUserData.ts`: Fetches user-related data using `fetchOpenPulls()` and `fetchMergeEvents()`.
- `config.ts`: Handles `Octokit` authentication using `GITHUB_TOKEN`.
- `saveData.ts`: Contains the `mergedData()` function to merge scraped data with previous contributor data.
- `types.ts`: Contains all required types.
- `utils.ts`: Contains common functions like `calculateTurnaroundTime()`, `resolveAutonomyResponsibility()`, `loadUserData()`, and `saveUserData()`.
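For reviewers who want a concrete picture of two of the behaviours listed above, here is a minimal sketch of blacklist filtering and a turnaround-time calculation. It is not the code from this PR: the `ScrapedEvent` shape, the comma-separated blacklist format, and the names `parseBlacklist`, `filterBlacklisted`, and `turnaroundSeconds` are all assumptions made for illustration, and the functions take plain values so they can run without Octokit or a `.env` file.

```typescript
// Hypothetical event shape; the real definitions live in the PR's types.ts.
interface ScrapedEvent {
  actor: string; // GitHub login of the user who triggered the event
  type: string; // e.g. "PullRequestEvent"
  createdAt: string; // ISO 8601 timestamp
}

// Parse a comma-separated blacklist string (assumed format) into a set of
// lower-cased logins. Empty entries and surrounding whitespace are ignored.
function parseBlacklist(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((login) => login.trim().toLowerCase())
      .filter((login) => login.length > 0)
  );
}

// Drop every event whose actor appears in the blacklist (case-insensitive).
function filterBlacklisted(
  events: ScrapedEvent[],
  blacklist: Set<string>
): ScrapedEvent[] {
  return events.filter((event) => !blacklist.has(event.actor.toLowerCase()));
}

// Whole seconds elapsed between two ISO 8601 timestamps, clamped at zero so
// out-of-order inputs never produce a negative duration.
function turnaroundSeconds(openedAt: string, closedAt: string): number {
  const elapsedMs =
    new Date(closedAt).getTime() - new Date(openedAt).getTime();
  return Math.max(0, Math.floor(elapsedMs / 1000));
}
```

In a real run the raw blacklist string would presumably be read from the environment and passed to `parseBlacklist`; keeping that lookup outside the helpers makes them easy to unit test.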
Week 3:
- `discussion.ts`: Fetches discussions and stores them in the `data/github/discussion` directory as `discussion.json`.
- `scraper-dry-run.yaml`: Modify the dry-run file to work with a Node.js and npm environment.
- `github-discussion-schema.test()` for discussions.
How Has This Been Tested?
The refactored scraper can be tested using the following commands:
pnpm build
pnpm start org_name data_dir date(format:YYYY-MM-DD) num_days
pnpm dev org_name data_dir date(format:YYYY-MM-DD) num_days
(Default values are used if `date` (current date) and `num_days` (1) are not provided.)
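The argument handling described above could look roughly like the following. This is a sketch under stated assumptions rather than the PR's actual `main()`: the `ScraperArgs` type and the `parseScraperArgs` function are hypothetical names, the current date is taken in UTC, and the validation rules (strict `YYYY-MM-DD`, positive integer day count) are my own choices.

```typescript
// Hypothetical container for the four positional arguments.
interface ScraperArgs {
  orgName: string;
  dataDir: string;
  date: string; // YYYY-MM-DD
  numDays: number;
}

// Parse `org_name data_dir [date] [num_days]`, falling back to the current
// date (UTC) and 1 day when the optional arguments are missing.
// `now` is injectable so the default date can be tested deterministically.
function parseScraperArgs(argv: string[], now: Date = new Date()): ScraperArgs {
  const [orgName, dataDir, date, numDays] = argv;
  if (!orgName || !dataDir) {
    throw new Error("Usage: <org_name> <data_dir> [date YYYY-MM-DD] [num_days]");
  }

  // toISOString() is always UTC, so the default is the current UTC date.
  const resolvedDate = date ?? now.toISOString().slice(0, 10);
  if (!/^\d{4}-\d{2}-\d{2}$/.test(resolvedDate)) {
    throw new Error(`Invalid date "${resolvedDate}", expected YYYY-MM-DD`);
  }

  const resolvedDays = numDays === undefined ? 1 : Number.parseInt(numDays, 10);
  if (!Number.isInteger(resolvedDays) || resolvedDays < 1) {
    throw new Error(`Invalid num_days "${numDays}", expected a positive integer`);
  }

  return { orgName, dataDir, date: resolvedDate, numDays: resolvedDays };
}
```

From an entry point this would typically be called as `parseScraperArgs(process.argv.slice(2))`, so that `pnpm start ohcnetwork data` resolves to today's date and a one-day window.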