KrdWrd

The KrdWrd Project ran from 2008 to 2011. Its mission statement was:

Provide tools and infrastructure for acquisition, visual annotation, merging and storage of web pages as parts of bigger corpora.

Develop a classification engine that learns to automatically annotate pages, and provide visual tools for inspection of results.

In short, it was an infrastructure for research into web page cleaning. The paper gives a good overview and the master's thesis an extensive description (for both, see further down).

Remnants

The annotation guidelines and the Firefox add-on manual are still available online and as PDF files.
The CANOLA Corpus

System Components

The system consisted of:

Firefox Add-on for interactive visual annotation and retrieval of tagging results
XULRunner application for batch processing of web pages
Web Proxy and additional server-side infrastructure for providing access to corpora and storing annotation results
Server-side Machine Learning infrastructure for experiments with cleaning models

This repository (krdwrd/src_utils) is part of the general infrastructure.

Repository contents:

apache2/site-addons
db
jamf
mail
vizcoord
wwwoffle
README.md
