Email Extractor — Extract Email Addresses

WHAT "FIX BROKEN EMAILS" DOES

STRIPS TRAILING PUNCTUATION

Removes . , : ) ] fused to the end of emails.

e.g. name@site.com. → name@site.com

STRIPS JUNK TEXT FUSED ONTO TLDS

e.g. name@site.comSomeText → name@site.com

WORKS WITH ANY DOMAIN ENDING

Automatically detects any domain ending (.com, .tv, .photography, .co.uk) with no TLD list needed.
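The clean-up rules above could be sketched roughly as follows. This is an illustration, not the tool's actual code: the function names are hypothetical, and the rule for fused text (cut the last domain label where lowercase letters switch to uppercase) is an assumption about how junk can be detected without a TLD list.

```python
import re

# Characters the page lists as being stripped from the end of an address.
TRAILING_PUNCTUATION = ".,:)]"


def strip_trailing_punctuation(candidate: str) -> str:
    """Drop any run of trailing punctuation, e.g. 'name@site.com.'."""
    return candidate.rstrip(TRAILING_PUNCTUATION)


def strip_fused_text(candidate: str) -> str:
    """Cut the last domain label where lowercase switches to uppercase.

    Assumption: fused junk starts with a capital letter ('comSomeText').
    """
    if "@" not in candidate:
        return candidate
    local, _, domain = candidate.rpartition("@")
    head, dot, last_label = domain.rpartition(".")
    match = re.match(r"[a-z]+(?=[A-Z])", last_label)
    if match:
        last_label = match.group(0)
    return f"{local}@{head}{dot}{last_label}"


def fix_broken_email(candidate: str) -> str:
    """Apply both clean-up steps in order."""
    return strip_fused_text(strip_trailing_punctuation(candidate))


print(fix_broken_email("name@site.com."))
print(fix_broken_email("name@site.comSomeText"))
```

Under this assumption, junk that starts with a lowercase letter (for example name@site.comsometext) cannot be separated from the domain ending, because no list of known TLDs is consulted.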

Does it extract emails from HTML source code?

Yes — paste HTML source and the extractor will find all email addresses in attributes, text nodes, and mailto: links. The regex works on any text format.

Can it extract addresses from PDFs?

Paste the text content of a PDF (copy-paste from your PDF reader) and the tool will extract all email addresses found. It cannot read PDF binary format directly.

How to Use the Email Extractor

Paste any block of text — HTML source, a forum post, a CSV, or plain text — into the input.
Click Extract — all valid email addresses are found using regex pattern matching.
Duplicates are automatically removed; results are sorted alphabetically.
Copy the list or download it as a line-separated text file.
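For readers who want to reproduce these steps in a script, a minimal sketch might look like this. The pattern and the function name extract_emails are assumptions chosen for illustration; the tool's exact regex is not published here.

```python
import re

# Illustrative practical pattern (an assumption, not the tool's exact regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def extract_emails(text: str) -> list:
    """Find every match, remove duplicates, and sort alphabetically."""
    return sorted(set(EMAIL_RE.findall(text)))


sample = "Contact bob@example.com, alice@example.org or bob@example.com."
emails = extract_emails(sample)

# One address per line, like the downloadable text file.
print("\n".join(emails))
```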

Extracting email addresses by hand from large blocks of text (web page source, log files, CSV exports, or forum archives) is impractical. This extractor uses an email regex that matches the overwhelming majority of real-world email formats, including plus-addressing, subdomains, and quoted local parts, while filtering obvious false positives.

Email Regex Pattern Notes

A regex that fully conforms to RFC 5322 would run to thousands of characters, because the spec allows highly unusual formats that almost nobody uses. This tool uses a practical pattern that matches local@domain.tld, including plus addresses (user+tag@example.com), multiple subdomain levels, and hyphens in domain names. It does not match comment syntax such as (comment)local@domain or other RFC edge cases that rarely appear in real-world email data.
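As a rough illustration of what such a practical pattern can look like, here is an annotated sketch. It is an assumed approximation written for this example, not the tool's actual regex, and the helper name is_practical_email is hypothetical.

```python
import re

# Assumed approximation of a practical email pattern, written verbosely.
PRACTICAL_EMAIL = re.compile(
    r"""
    [A-Za-z0-9._%+-]+        # local part; '+' allows plus addressing
    @
    (?:[A-Za-z0-9-]+\.)+     # one or more labels: subdomains, hyphens
    [A-Za-z]{2,}             # final label: letters only, 2 or more
    """,
    re.VERBOSE,
)


def is_practical_email(candidate: str) -> bool:
    """True when the whole string fits the practical pattern."""
    return PRACTICAL_EMAIL.fullmatch(candidate) is not None


for candidate in (
    "user+tag@example.com",
    "info@mail.eu.my-site.co.uk",
    "(comment)local@domain.com",
):
    print(candidate, is_practical_email(candidate))
```

The first two candidates fit the pattern; the third does not, because parentheses are not allowed in the local part of this simplified pattern.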

Matches standard, plus-addressed, and subdomain email formats
Deduplicates results and sorts alphabetically
Shows total count and unique count
Optionally normalises to lowercase for deduplication accuracy
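The counting and lowercase options in the list above can be sketched as below. The function name summarize and its parameter are hypothetical. Lowercasing treats Bob@Example.com and bob@example.com as the same address, which is how most mail systems behave in practice even though local parts are technically case-sensitive.

```python
def summarize(emails, normalize_lowercase=True):
    """Return (total count, unique count, sorted unique addresses)."""
    total = len(emails)
    if normalize_lowercase:
        # Fold case first so differently-cased duplicates collapse.
        emails = [email.lower() for email in emails]
    unique = sorted(set(emails))
    return total, len(unique), unique


found = ["Bob@Example.com", "bob@example.com", "alice@example.org"]
print(summarize(found))
print(summarize(found, normalize_lowercase=False))
```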

Frequently Asked Questions

Will it find emails inside HTML attributes?

Yes — the extractor scans the raw text including HTML markup. Email addresses in href="mailto:...", data- attributes, and plain text within tags will all be found.
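Because the scan runs over raw text, no HTML parsing is needed. The sketch below shows the idea with an assumed pattern and made-up markup; addresses that are entity-encoded or URL-encoded (for example %40 instead of @) would not be found by this simple version.

```python
import re

# Illustrative pattern (an assumption, not the tool's exact regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

html = (
    '<a href="mailto:sales@example.com">Sales</a> '
    '<span data-contact="ops@example.net"></span> '
    "<p>Write to help@example.org today</p>"
)

# The markup is treated as plain text; tags and attributes are not parsed.
html_emails = EMAIL_RE.findall(html)
print(html_emails)
```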

Does it handle obfuscated emails like "user [at] domain [dot] com"?

No. Obfuscated formats are designed to defeat automated tools, and matching them reliably would require pattern recognition that is too error-prone. Replace the obfuscation with real @ and . characters before extracting.
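If the text uses one consistent obfuscation style, a small pre-processing step can undo it before extraction. The sketch below is a hypothetical helper that handles only the bracketed "[at]" / "[dot]" and "(at)" / "(dot)" forms.

```python
import re


def deobfuscate(text: str) -> str:
    """Replace bracketed 'at' and 'dot' tokens with '@' and '.'."""
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    return text


print(deobfuscate("user [at] domain [dot] com"))
```

This rewrites every bracketed "at" or "dot" in the text, including ones that are not part of an address, which is one reason automatic matching of obfuscated formats is error-prone.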

What about international email addresses with Unicode domain names?

Internationalised addresses such as 用户@例子.广告, which combine a Unicode local part with an internationalised domain name (IDN), are not matched by default; enable the Unicode domain option to include them. Such addresses are rare in practice.
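The difference between the default behaviour and a Unicode-aware option can be sketched with two patterns. Both are assumptions made for illustration; how the tool implements its Unicode domain option is not documented here.

```python
import re

# ASCII-only pattern, standing in for the default behaviour (assumed).
ASCII_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

# Unicode-aware variant: \w matches letters and digits in any script,
# and [^\W\d_] matches letters only (used for the final label).
UNICODE_EMAIL = re.compile(r"[\w.%+-]+@(?:[\w-]+\.)+[^\W\d_]{2,}")

text = "联系 用户@例子.广告 谢谢"

ascii_matches = ASCII_EMAIL.findall(text)
unicode_matches = UNICODE_EMAIL.findall(text)
print(ascii_matches)
print(unicode_matches)
```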

Should I use extracted emails for marketing?

Only with explicit consent. Sending unsolicited email to scraped addresses can violate GDPR, CAN-SPAM, CASL, and similar laws, and it damages sender reputation. Use extracted emails only for legitimate purposes such as your own data migration or backups.

See also the related URL Extractor and Email Validator tools.
