For each file in the workspace directory:
- if the file is larger than 10MB, skip it
- check whether it already exists, based on its MD5 hash
- if not, use .github/catalog and call the AI to classify it, then copy it to the corresponding directory
- if it is still not classified, ask a human curator to classify it
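
The size and MD5 checks above could be sketched roughly as follows. This is only an illustration under stated assumptions: `triage`, `md5_of`, `known_md5s` and the bucket names are hypothetical, "10MB" is read as 10 MiB, subdirectories are walked recursively, and the set of already-known hashes is assumed to be supplied by the caller. The AI classification and human-curator steps are not modeled; files needing them simply land in `to_classify`.

```python
import hashlib
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 MiB.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage(workspace: Path, known_md5s: set) -> dict:
    """Sort workspace files into skipped / duplicate / to_classify buckets.

    known_md5s is a hypothetical set of hex digests for files that
    already exist; how it is obtained is outside this sketch.
    """
    result = {"skipped": [], "duplicate": [], "to_classify": []}
    for path in sorted(workspace.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_size > MAX_SIZE_BYTES:
            result["skipped"].append(path)      # larger than the limit
        elif md5_of(path) in known_md5s:
            result["duplicate"].append(path)    # already exists by MD5
        else:
            result["to_classify"].append(path)  # goes on to classification
    return result
```

Checking the size before hashing avoids reading large files that would be skipped anyway.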
Afterwards, rename the workspace dir to old_workspace,
then run the script to update the database.
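
The rename step might look like the sketch below. The helper name `retire_workspace` is hypothetical, and refusing to overwrite an existing old_workspace is an added safety assumption, not something the notes specify. The database update script is not shown because its name and interface are not given here.

```python
from pathlib import Path


def retire_workspace(workspace: Path) -> Path:
    """Rename the workspace directory to a sibling named old_workspace."""
    target = workspace.with_name("old_workspace")
    # Assumption: an existing old_workspace should never be clobbered.
    if target.exists():
        raise FileExistsError(f"{target} already exists; move it aside first")
    workspace.rename(target)
    return target
```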