This repository has been archived by the owner on Aug 21, 2024. It is now read-only.

Update email regex to be more liberal to check TLDs #10983

Merged
merged 1 commit into from
Aug 16, 2024


CITIZENDOT
Collaborator

Summary

Important

This PR/commit will be reverted after testing is finished.

Previously, EMAIL_REGEX contained a hard-coded list of the top-level domains (TLDs) most commonly used by people. That regex was taken from here: https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php

But there are actually a lot of other top-level domains in existence, and it's just not viable to put all of them into the regex. Here's the list of all allowed top-level domains: https://data.iana.org/TLD/tlds-alpha-by-domain.txt. There are 1447 top-level domains registered so far.

Now, I'm updating the regex to allow any top-level domain that meets these criteria:

  • minimum length: 2
  • maximum length: 10 (this is long compared with commonly used TLDs such as .com and .io, but it was chosen to admit as many TLDs as possible while still excluding obscure ones)
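
As a rough illustration of the length criteria above, here is a minimal sketch. The pattern and the names `SKETCH_EMAIL_REGEX` and `looksLikeEmail` are hypothetical and deliberately simplified; this is not the actual EMAIL_REGEX changed in this PR. Only the final `{2,10}` quantifier reflects the criteria listed above, and the sketch assumes the TLD consists of ASCII letters only.

```typescript
// Hypothetical, simplified pattern: NOT the project's real EMAIL_REGEX.
// Local part: anything without whitespace or "@".
// Domain: one or more dot-terminated labels, then a TLD of 2 to 10 letters.
const SKETCH_EMAIL_REGEX = /^[^\s@]+@(?:[a-z0-9-]+\.)+[a-z]{2,10}$/i

function looksLikeEmail(value: string): boolean {
  return SKETCH_EMAIL_REGEX.test(value)
}

// Common TLDs pass.
console.log(looksLikeEmail("user@example.com"))
console.log(looksLikeEmail("user@example.io"))
// A one-letter TLD is below the minimum length and is rejected.
console.log(looksLikeEmail("user@example.c"))
// An 11-letter TLD is above the maximum length and is rejected.
console.log(looksLikeEmail("user@example.abcdefghijk"))
// A made-up 7-letter TLD still passes, because only the length is checked.
console.log(looksLikeEmail("user@example.notatld"))
```

The last call shows the trade-off of a length-based check: any letter sequence of the right length is accepted as a TLD, whether or not it is registered.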

Out of the 1447 TLDs, 1305 (90%) meet these criteria.
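
A count like the one above could be reproduced along these lines. This is a sketch under stated assumptions: the helper `countTldsByLength` is hypothetical, it filters by length only, and it assumes the IANA file format of `#`-prefixed comment lines followed by one TLD per line. The PR does not say how the 1305 figure was actually computed. The sample input is hand-written for illustration and is not the real list.

```typescript
// Hypothetical helper: counts list entries whose length is within [min, max].
// Assumes comment lines start with "#" and each other non-empty line is one TLD.
function countTldsByLength(
  listText: string,
  min: number,
  max: number
): { total: number; matching: number } {
  const tlds = listText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"))
  const matching = tlds.filter((tld) => tld.length >= min && tld.length <= max).length
  return { total: tlds.length, matching }
}

// Small hand-written sample, not the real IANA list.
const sample = [
  "# illustrative sample only",
  "COM",
  "IO",
  "MUSEUM",
  "PHOTOGRAPHY",
  "XN--VERMGENSBERATER-CTB"
].join("\n")

const result = countTldsByLength(sample, 2, 10)
const percent = Math.round((result.matching / result.total) * 100)
console.log(`${result.matching} of ${result.total} (${percent}%)`)
```

To run it against the real data, the text of the IANA file linked above would be passed as `listText`. The same arithmetic applied to the PR's figures gives 1305 / 1447 ≈ 90%.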

The new pattern is more liberal than the previous one, so it can produce false positives. But I think false positives are better than false negatives when it comes to email validation.

Subtasks Checklist

Breaking Changes

References

closes #insert number here

QA Steps

Member

@hanzlamateen hanzlamateen left a comment


@hanzlamateen hanzlamateen merged commit 9a578d1 into stg Aug 16, 2024
2 checks passed
@hanzlamateen hanzlamateen deleted the update-email-regex-temp branch August 16, 2024 09:12