Code from the Innosuisse AIRating project.
The AIRating project comprises components for crawling, annotating, and processing web resources to produce Impaakt ratings, SASB topic assignments, and detected company mentions.
The crawler directory contains code for crawling monthly updates of web resources (i.e., webpages) from news and organizational websites using Common Crawl.
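The crawler's own code is not reproduced here. As a rough sketch of how a single Common Crawl snapshot can be queried for one site, the snippet below uses the public CDX index server (index.commoncrawl.org) and the data server (data.commoncrawl.org) to locate individual WARC records. The crawl ID `CC-MAIN-2024-10`, the URL pattern, the HTML-only filter, and all helper names are illustrative assumptions, not the project's actual interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INDEX_SERVER = "https://index.commoncrawl.org"
DATA_SERVER = "https://data.commoncrawl.org"


def build_index_url(crawl_id: str, url_pattern: str) -> str:
    """Build a CDX index query for one crawl (hypothetical helper)."""
    query = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{query}"


def parse_index_lines(text: str, allowed_mimes=("text/html",)) -> list:
    """Parse newline-delimited JSON index records.

    Keeps only successful captures whose MIME type is allowed; the
    HTML-only default is an assumption made for this sketch.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        # The index returns status, offset, and length as strings.
        if rec.get("status") == "200" and rec.get("mime") in allowed_mimes:
            records.append(rec)
    return records


def warc_range(rec: dict) -> tuple:
    """Return the WARC file URL and the HTTP Range header for one record."""
    start = int(rec["offset"])
    end = start + int(rec["length"]) - 1  # Range end is inclusive
    return f"{DATA_SERVER}/{rec['filename']}", f"bytes={start}-{end}"


def fetch_warc_record(rec: dict) -> bytes:
    """Download one gzip-compressed WARC record (requires network access)."""
    url, byte_range = warc_range(rec)
    with urlopen(Request(url, headers={"Range": byte_range})) as resp:
        return resp.read()
```

Each record in a Common Crawl WARC file is compressed as its own gzip member, so the bytes returned by `fetch_warc_record` can be passed to `gzip.decompress` to obtain the WARC headers followed by the archived HTTP response. Fetching only byte ranges avoids downloading whole multi-gigabyte WARC files.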
The annotator directory contains the Annotator API, a tool that scrapes web sources (including HTML pages and PDFs), extracts their text, and annotates the content with Impaakt ratings, SASB topics, and company mentions.
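The Annotator API's actual extraction and annotation models are not shown here. The sketch below only illustrates the first two steps in simplified form, using the standard library: pulling visible text out of an HTML page (skipping `script` and `style` contents) and counting company mentions with a plain name list. The function names and the gazetteer-style matching are hypothetical stand-ins, not the project's method; PDF extraction would need a separate library and is omitted.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def find_company_mentions(text: str, companies: list) -> dict:
    """Count case-insensitive whole-word matches of each company name.

    Lookarounds are used instead of \\b so names ending in punctuation
    (e.g. "AT&T") still match. This naive matcher is an assumption.
    """
    counts = {}
    for name in companies:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[name] = n
    return counts
```

A naive name list misses aliases and abbreviations and may produce false positives for common words, which is why a dedicated annotation step is typically needed for company mentions in practice.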
The Evaluations directory includes evaluation datasets and scripts for the Impaakt annotator tasks.
This project is licensed under the terms specified in the LICENSE file.