Wrong extraction for some valid domains #174

Closed
ddelange opened this issue Sep 16, 2019 · 1 comment

ddelange commented Sep 16, 2019

The following domains all work, yet www is recognized as the domain instead of the subdomain, and the actual domain is wrongly prepended to the suffix. Although www is clearly a special subdomain rather than a domain, this doesn't yet negatively affect the registered_domain and fqdn properties.

Python 3.6.8 (default, Jan 19 2019, 21:26:02)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract

>>> tldextract.__version__
'2.2.1'

>>> tldextract.extract('www.experts-comptables.fr')
ExtractResult(subdomain='', domain='www', suffix='experts-comptables.fr')

>>> tldextract.extract('www.gob.mx')
ExtractResult(subdomain='', domain='www', suffix='gob.mx')

>>> tldextract.extract('www.ma.gov.br')
ExtractResult(subdomain='', domain='www', suffix='ma.gov.br')

>>> tldextract.extract('www.wroclaw.pl')
ExtractResult(subdomain='', domain='www', suffix='wroclaw.pl')

But when the same domains are passed without www, the whole input ends up in the suffix, and both the registered_domain and fqdn properties wrongly return an empty string.

I've read up on #138 and some more related issues in this repo, but I'm pretty sure this can't be intended behaviour?

@ddelange

Closing this issue: these are cases of (at first glance unexpected) top-level domains, so www is indeed the domain here, not a (special) subdomain as in 99% of URLs.
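For anyone landing here later, the behaviour can be reproduced with a small stand-alone sketch of longest-suffix matching. This is not tldextract's actual implementation: the SUFFIXES set is a hand-picked, hypothetical subset of suffix rules (the four suffixes come from the outputs above, the rest are assumed), and Parts and split_host are illustrative names. Wildcard and exception rules are ignored.

```python
from typing import NamedTuple

# Hypothetical, hand-picked subset of suffix rules. The real list is far
# larger and also contains wildcard and exception rules, ignored here.
SUFFIXES = {
    "com",
    "fr", "experts-comptables.fr",
    "mx", "gob.mx",
    "br", "gov.br", "ma.gov.br",
    "pl", "wroclaw.pl",
}


class Parts(NamedTuple):
    subdomain: str
    domain: str
    suffix: str

    @property
    def registered_domain(self) -> str:
        # Empty unless both a domain and a suffix were found.
        if self.domain and self.suffix:
            return f"{self.domain}.{self.suffix}"
        return ""


def split_host(host: str) -> Parts:
    """Split a hostname using the longest matching suffix in SUFFIXES."""
    labels = host.lower().strip(".").split(".")
    # Candidates run from the full host down to the last label,
    # so the first hit is the longest matching suffix.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return Parts(subdomain, domain, candidate)
    # No known suffix: treat the last label as the domain.
    return Parts(".".join(labels[:-1]), labels[-1], "")


print(split_host("www.gob.mx"))       # www is the label left of the suffix
print(split_host("gob.mx"))           # whole input is the suffix
print(split_host("www.example.com"))  # the common case
```

Because gob.mx is itself a suffix, the only label left of it in www.gob.mx is www, which therefore becomes the domain. With the bare gob.mx there is no label left at all, so the domain and registered_domain come out empty.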
