xiangyan99/HtmlParserSharp

HtmlParserSharp

This is a manual C# port of the Validator.nu HTML Parser, an HTML5 parser originally written in Java that, after automated translation to C++, is used by Mozilla's Gecko rendering engine. The port uses the DOM implemented in System.Xml.

Status

PLEASE SEE https://github.com/jamietre/HtmlParserSharp FOR AN ACTIVELY MAINTAINED VERSION OF THIS PROJECT.

Currently the port is based on Validator.nu 1.3.1 and works as far as I have tested it. However, since there are no unit tests, I can't be sure that every detail works correctly. In my informal tests it was quite fast: about 3 to 6 times slower than parsing XML with .NET's XDocument API. Since XML is much simpler to parse than HTML5, I think that's acceptable, and it's still fast.

What's missing

If you want to contribute, maybe you can start here:

  • Support for character encodings other than UTF-8
  • More C#-ish coding style
  • Unit tests
  • Look for TODOs in the code

About

C# port of the Validator.nu HTML Parser (http://about.validator.nu/htmlparser/)

Languages

  • C# 85.6%
  • HTML 14.4%