Encoding detection errors #10

Open
jpcima opened this issue Oct 29, 2019 · 3 comments

Comments

@jpcima
Owner

jpcima commented Oct 29, 2019

I think the encoding detection issues are largely resolved, but I'll post some samples here which are still failing.

z2ow.mid.gz CP932

@jpcima
Owner Author

jpcima commented Nov 28, 2019

A random set of files from vgmusic which are misdetected:

beachcave.mid.gz
ff1flcst.mid.gz
Mi%27Ihen_Highway.mid.gz
realemotion1.mid.gz
so2_hurry.mid.gz

@jpcima
Owner Author

jpcima commented Jan 13, 2020

goemon.mid.gz

@jpcima
Owner Author

jpcima commented Jan 18, 2020

Idea for an algorithm for a new heuristic:

Let S be an input string of length N
Score ← 0
Counter ← 0
Script ← None
For each codepoint C of S:
    PreviousScript ← Script
    Script ← uscript_getScript(C)
    If Script ≠ PreviousScript:
        Counter ← 0
    Counter ← Counter + 1
    Score ← Score + Counter
Score ← Score / N
