TooManyHeadings error on otherwise legit-looking page #16

Open
yuvipanda opened this issue Jan 11, 2016 · 12 comments

@yuvipanda
Contributor

On https://en.wikipedia.org/wiki/Wikipedia:Teahouse/Questions/Archive_235#How_to_link_any_file_like_video_or_picture.2C_on_wikipedia.27s_article.3F I get a TooManyHeadings error. Removing the section titled == How to link any file like video or picture, on wikipedia's article? ==, or either of the two sections following it, fixes the issue.

It looks like the wikicode section being received is not properly split?

@yuvipanda
Contributor Author

A simple mwparserfromhell script seems to work fine:

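from mwparserfromhell import parse

# Pass the open file object itself (not its contents) to parse().
with open('text.wm') as f:
    doc = parse(f, skip_style_tags=True)

# Print the headings found in each flat section, including the lead.
for sec in doc.get_sections(include_lead=True, flat=True):
    print(sec.filter_headings())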

@kjschiroo
Collaborator

This is really strange! This script works for me:

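from mwparserfromhell import parse

# Parse by passing the file object directly.
with open('/path/to/file') as f:
    doc = parse(f, skip_style_tags=True)

for sec in doc.get_sections(include_lead=True, flat=True):
    # A flat section should contain at most one heading.
    if len(sec.filter_headings()) > 1:
        print("Bad!")
        # never prints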

However, this does not work:

from mwparserfromhell import parse

# Read the whole file and parse its contents as one string.
with open('/path/to/file') as f:
    text = f.read()

doc = parse(text, skip_style_tags=True)

for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
        # prints once

So passing the file object rather than the text makes a difference?

@yuvipanda
Contributor Author

Yeah, I can repro that. When the text is passed as a string, one of the sections has multiple headings, ["== How to link any file like video or picture, on wikipedia's article? ==", '==going "live"=='], but when it is passed as a file, it doesn't.

@kjschiroo
Collaborator

I think this is an issue to raise with mwparserfromhell; I would guess this is not their intended behavior.

@kjschiroo
Collaborator

It looks like the mwparserfromhell issue causing this is slated to be fixed in 0.5. In the meantime, we can simply cut sections at every heading, which should take care of the problem.
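A minimal sketch of the "cut sections at every heading" workaround, written as a plain regex pass over the wikitext rather than with mwparserfromhell (a swapped-in technique, not this project's actual fix). It assumes a heading is a whole line that opens and closes with the same run of 1 to 6 `=` signs; the names `HEADING_RE` and `cut_on_all_headings` are hypothetical.

```python
import re

# Hypothetical pattern: a heading is assumed to be a whole line that opens and
# closes with the same run of 1 to 6 "=" signs, e.g. "== Title ==".
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t\r]*$", re.MULTILINE)


def cut_on_all_headings(text):
    """Split wikitext into (heading, body) pairs, starting a new section at
    every heading line regardless of its level.

    Non-blank text before the first heading is returned as a lead section
    whose heading is None. Every section holds at most one heading.
    """
    matches = list(HEADING_RE.finditer(text))
    sections = []

    # Lead section: everything before the first heading (or the whole text).
    lead_end = matches[0].start() if matches else len(text)
    if text[:lead_end].strip():
        sections.append((None, text[:lead_end]))

    # Each heading's body runs until the next heading or the end of the text.
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(0).strip(), text[match.end():end]))

    return sections


if __name__ == "__main__":
    sample = "Intro\n== First ==\nbody one\n==going \"live\"==\nbody two\n"
    for heading, body in cut_on_all_headings(sample):
        print(repr(heading), repr(body))
```

Because a new section starts at every heading line, no section can hold more than one heading, which appears to be the condition behind TooManyHeadings here. The trade-off is that a regex does not understand wikitext context: a heading-like line inside `<pre>`, `<nowiki>`, or an HTML comment would also start a new section.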

@yuvipanda
Contributor Author

I've hand-fixed that archive page for now.

@Ironholds
Contributor

Python ignoramus here; is this also the reason I'm getting a TooManyHeadings problem on https://en.wikipedia.org/wiki/Talk:Rhaetian_Railway ?

@kjschiroo
Collaborator

Yeah, there is a pretty good chance that this is the issue. I will look more into this specific case tomorrow. It is just the current revision that causes the issue, correct?

@Ironholds
Contributor

I mean, I haven't tried past revisions, so...

@kjschiroo
Collaborator

@Ironholds, I am not able to reproduce your issue with the current version of that talk page. Could you post a file containing the input that throws the error?

@kjschiroo kjschiroo added the bug label Jan 22, 2016
@Ironholds
Contributor
