The document is parsed in chunks with the following code:
    myhtml_tree_t* Parse(myhtml_t* myhtml, const std::string& body, size_t chunk_sz) {
        myhtml_tree_t* tree = myhtml_tree_create();
        myhtml_tree_init(tree, myhtml);

        size_t body_chunk_pos = 0;
        while (body_chunk_pos < body.size()) {
            size_t current_chunk_sz = std::min(chunk_sz, body.size() - body_chunk_pos);
            mystatus_t parse_status = myhtml_parse_chunk_single(
                tree, body.c_str() + body_chunk_pos, current_chunk_sz);
            if (parse_status != MyHTML_STATUS_OK) {
                myhtml_tree_destroy(tree);
                return nullptr;
            }
            body_chunk_pos += current_chunk_sz;
        }
        return tree;
    }
It is called with the following arguments:
    myhtml_t* myhtml = myhtml_create();
    myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);

    std::string body = "<html><head><style>a</style></head><body>f</body></html>";
    size_t chunk_sz = 13;
    myhtml_tree_t* tree = Parse(myhtml, body, chunk_sz);
The result varies depending on build options. In some cases the serialized tree looks like this:
<html><head><style>a</style></head><body>f</body></html></style></head><body></body></html>
In other cases it looks like this:
<html><head><style></style></head></html>
The expected result is:
<html><head><style>a</style></head><body>f</body></html>
After some investigation, I found that the issue is inside myhtml_tokenizer_state_rawtext_end_tag_name and involves token_node->raw_begin.
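To make the chunk boundaries easier to see, here is a minimal sketch that splits the body with the same arithmetic as the loop in Parse(). It does not call MyHTML at all, and SplitIntoChunks is a hypothetical helper name used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of MyHTML): splits `body` using the same
// position/size arithmetic as the loop in Parse() and returns the pieces
// in the order they would be fed to the parser.
std::vector<std::string> SplitIntoChunks(const std::string& body, std::size_t chunk_sz) {
    assert(chunk_sz > 0);  // a zero chunk size would never advance the position
    std::vector<std::string> chunks;
    std::size_t pos = 0;
    while (pos < body.size()) {
        // The last chunk may be shorter than chunk_sz.
        const std::size_t len = std::min(chunk_sz, body.size() - pos);
        chunks.push_back(body.substr(pos, len));
        pos += len;
    }
    return chunks;
}
```

With the body and chunk_sz = 13 from the report, this gives five chunks: `<html><head><`, `style>a</styl`, `e></head><bod`, `y>f</body></h` and `tml>`. The boundary between the second and third chunk falls inside the name of the `</style>` end tag, which would be consistent with the problem showing up in the RAWTEXT end tag name state, although the sketch itself does not verify that.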
It looks like the Lexbor project does not have a similar issue, but it would still be nice to have it fixed here, since this is a standalone HTML5 parser.