Extracting email addresses from large blocks of text — web page source, log files, CSV exports, or forum archives — by hand is impractical. This extractor uses a robust email regex that matches the overwhelming majority of real-world email formats (including plus-addressing, subdomains, and quoted local parts) while filtering obvious false positives.
A fully RFC 5322-compliant email regex would run to thousands of characters, because the spec allows highly unusual formats that almost nobody uses. This tool uses a practical pattern instead. It matches addresses of the form local@domain.tld, including plus addresses (user+tag@example.com), multiple subdomain levels, and hyphens in domain names. It does not match comment syntax ((comment)local@domain) or other RFC edge cases that rarely appear in real-world email data.
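As a rough illustration of what such a practical pattern can look like, here is a minimal Python sketch. The regex, the name EMAIL_RE, and the helper extract_emails are assumptions made for this example; they are not the tool's actual pattern or code.

```python
import re

# Hypothetical practical pattern (an assumption, not the tool's real regex).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+"       # local part, including plus tags
    r"@"
    r"(?:[A-Za-z0-9-]+\.)+"    # one or more domain labels (subdomains, hyphens)
    r"[A-Za-z]{2,}"            # alphabetic top-level domain
)

def extract_emails(text: str) -> list[str]:
    """Return unique addresses in order of first appearance (case-insensitive)."""
    seen = set()
    result = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0)
        key = address.lower()
        if key not in seen:
            seen.add(key)
            result.append(address)
    return result

sample = (
    "Contact user+tag@example.com or admin@mail.eu-west.example.co.uk, "
    "not (comment)local@domain."
)
print(extract_emails(sample))
# ['user+tag@example.com', 'admin@mail.eu-west.example.co.uk']
```

The last candidate is skipped because "domain" has no top-level domain after it, which is the kind of obvious false positive a practical pattern can reject cheaply.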
Yes — the extractor scans the raw text including HTML markup. Email addresses in href="mailto:...", data- attributes, and plain text within tags will all be found.
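To show why scanning the raw markup is enough, the following Python sketch runs an assumed pattern (hypothetical, not the tool's real one) directly over an HTML fragment without parsing it. The sample markup and addresses are invented for the example.

```python
import re

# Hypothetical pattern, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

html = '''<a href="mailto:alice@example.com?subject=Hi">Mail us</a>
<div data-contact="bob@example.org"></div>
<p>Write to carol@example.net.</p>'''

# No HTML parsing: the pattern is applied to the markup as plain text.
found = EMAIL_RE.findall(html)
print(found)
# ['alice@example.com', 'bob@example.org', 'carol@example.net']
```

Characters such as the colon after mailto, the surrounding quotes, and the trailing ?subject= fall outside the pattern's character classes, so they act as natural boundaries around each address.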
No. Obfuscated formats (for example, user [at] example [dot] com) are designed to be unreadable by automated tools, and matching them would require pattern recognition that is too error-prone. Replace the obfuscation with the literal @ and . characters before extracting.
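If your text uses one known obfuscation convention, the replacement step can be scripted before you paste the text in. The sketch below is a hypothetical helper (deobfuscate is not part of this tool) that handles only the bracketed or parenthesised "at" and "dot" style; other conventions would need their own rules.

```python
import re

def deobfuscate(text: str) -> str:
    """Rewrite '[at]' / '(at)' and '[dot]' / '(dot)' markers to '@' and '.'.

    Only this one convention is handled; surrounding whitespace is removed.
    """
    text = re.sub(r"\s*[\[\(]\s*at\s*[\]\)]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", ".", text, flags=re.IGNORECASE)
    return text

print(deobfuscate("jane [at] example [dot] com"))
# jane@example.com
```

Because the replacement is blind, ordinary prose that happens to contain "(at)" or "(dot)" will also be rewritten, so it is worth reviewing the output before extracting.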
Addresses with internationalised domain names (IDN), such as 用户@例子.广告, are not matched by default; enable the Unicode domain option to include them. IDN emails are rare in practice.
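One way such an option could work is by switching between an ASCII-only pattern and a Unicode-aware one. The Python sketch below is an assumption about the general technique, not a description of this tool's internals; the names ASCII_RE, UNICODE_RE, extract, and unicode_domains are hypothetical.

```python
import re

# Hypothetical ASCII-only default pattern.
ASCII_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

# In Python 3, \w matches Unicode word characters in str patterns, and
# [^\W\d_] narrows that to word characters other than digits and underscore
# (roughly, letters in any script).
UNICODE_RE = re.compile(r"[\w.%+-]+@(?:[\w-]+\.)+[^\W\d_]{2,}")

def extract(text: str, unicode_domains: bool = False) -> list[str]:
    """Find addresses, optionally allowing non-ASCII local parts and domains."""
    pattern = UNICODE_RE if unicode_domains else ASCII_RE
    return pattern.findall(text)

sample = "Reach 用户@例子.广告 or user@example.com"
print(extract(sample))
# ['user@example.com']
print(extract(sample, unicode_domains=True))
# ['用户@例子.广告', 'user@example.com']
```

A Unicode-aware pattern is greedier: in scripts written without spaces, the local part can absorb the characters that precede the address, which is one reason to keep such an option off by default.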
Only with explicit consent. Sending unsolicited email to scraped addresses can violate GDPR, CAN-SPAM, CASL, and similar laws, and it damages sender reputation. Use extracted emails only for legitimate purposes, such as your own data migration or backups.
See also the URL Extractor and the Email Validator for related text extraction tools.