WHAT "FIX BROKEN URLs" DOES
STRIPS TRAILING PUNCTUATION
Removes . , ; : ! ? ) fused to the end of URLs.
e.g. https://site.com/page. → https://site.com/page
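As a rough illustration of the behaviour described above, the sketch below strips the listed characters from the end of a URL. The function name is hypothetical and this is not the tool's actual code. Note that this simple version would also remove a closing parenthesis that genuinely belongs to the URL.

```python
# Trailing characters named in the description above.
TRAILING = ".,;:!?)"

def strip_trailing_punctuation(url):
    """Remove any run of the listed punctuation from the end of a URL."""
    return url.rstrip(TRAILING)

print(strip_trailing_punctuation("https://site.com/page."))
# https://site.com/page
```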
STRIPS ANGLE BRACKET AND QUOTE WRAPPERS
URLs wrapped in <> or " from HTML source are cleaned automatically.
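One way such wrapper stripping might work is sketched below. The helper name and the choice to peel nested wrappers repeatedly are assumptions made for illustration only.

```python
# Opening wrapper -> matching closing wrapper (angle brackets and double quotes).
WRAPPERS = {"<": ">", '"': '"'}

def strip_wrappers(token):
    """Peel matching <...> or "..." wrappers off a candidate URL."""
    token = token.strip()
    # Repeat so that nested wrappers such as "<...>" are fully removed.
    while len(token) >= 2 and WRAPPERS.get(token[0]) == token[-1]:
        token = token[1:-1].strip()
    return token

print(strip_wrappers("<https://site.com/page>"))
# https://site.com/page
```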
HANDLES HTTP AND HTTPS
Extracts both http:// and https:// URLs unless the HTTPS Only option is enabled. Use the Strip query strings option to remove ?utm_source= and other tracking parameters.
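The two options could be approximated as follows, using Python's standard urllib.parse module. The function and parameter names are hypothetical, and keeping the #fragment while dropping the query string is an assumption, since the page does not say what the tool does with fragments.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the ?query part of a URL, keeping scheme, host, path and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))

def apply_options(urls, https_only=False, strip_query_strings=False):
    """Filter and clean a list of URLs according to the two options."""
    result = []
    for url in urls:
        if https_only and urlsplit(url).scheme.lower() != "https":
            continue  # skip plain http:// links when HTTPS Only is on
        result.append(strip_query(url) if strip_query_strings else url)
    return result

print(strip_query("https://site.com/page?utm_source=news&id=2"))
# https://site.com/page
```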
Does it extract URLs from HTML source code?
Yes — paste HTML source and the extractor will find all URLs in href, src, and other attributes, as well as bare URLs in text content.
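For readers who want to reproduce this kind of extraction in their own scripts, here is a minimal sketch. It is not the tool's implementation: it uses Python's standard html.parser module to walk the markup and a deliberately simple regular expression (an assumption) to pick URLs out of attribute values and text content.

```python
import re
from html.parser import HTMLParser

# Simplified pattern: http/https up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

class URLCollector(HTMLParser):
    """Collect URLs from every attribute value and from text nodes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for _name, value in attrs:
            if value:
                self.urls.extend(URL_RE.findall(value))

    def handle_data(self, data):
        self.urls.extend(URL_RE.findall(data))

def extract_from_html(html):
    collector = URLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls

sample = '<p>See <a href="https://a.com/x">link</a> and https://b.com/y here <img src="http://c.com/i.png"></p>'
print(extract_from_html(sample))
# ['https://a.com/x', 'https://b.com/y', 'http://c.com/i.png']
```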
Can it find URLs without http:// prefix?
By default the tool looks for URLs starting with http://, https://, or ftp://. Enable the smart detection mode to also catch bare domain-style URLs like www.example.com without a scheme prefix.
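The difference between the default and smart modes might look something like the sketch below. Both patterns are simplified assumptions, not the tool's real expressions, and the smart pattern here only recognises the www. form.

```python
import re

# Default mode: only URLs with an explicit scheme.
STRICT_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"]+", re.IGNORECASE)
# Smart mode: additionally accept candidates that start with "www.".
SMART_RE = re.compile(r"\b(?:(?:https?|ftp)://|www\.)[^\s<>\"]+", re.IGNORECASE)

def find_urls(text, smart=False):
    """Return URL-like matches in order of appearance."""
    pattern = SMART_RE if smart else STRICT_RE
    return pattern.findall(text)

text = "Visit www.example.com or https://site.com/page today"
print(find_urls(text))
# ['https://site.com/page']
print(find_urls(text, smart=True))
# ['www.example.com', 'https://site.com/page']
```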
How to Use the URL Extractor
- Paste or enter your input into the text field.
- Configure any options (such as HTTPS Only, Strip query strings, deduplication, sorting, or domain filtering) using the controls above the output.
- The result updates instantly — no submit button required for most operations.
- Click Copy or Download to take the output to your next step.
Pull every URL out of a block of text or HTML in one pass. The extractor uses a tuned regular expression that recognises http/https URLs, balances parentheses inside paths (so Wikipedia-style links survive), and trims stray trailing punctuation that isn’t really part of the address.
How the URL Extractor Works
Results can be deduplicated while preserving original order, sorted alphabetically, or filtered by domain. An optional repair step adds https:// to obvious bare domains (example.com → https://example.com). Total and unique counts are shown so you can spot duplicates at a glance.
- Finds http/https URLs in plain text or HTML
- Removes duplicate URLs while preserving order
- Strips trailing punctuation that’s not really part of the URL
- Optional scheme repair (adds https:// to bare domains)
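The post-processing steps listed above can be sketched in a few lines of Python. All function names are hypothetical, and treating subdomains as matches in the domain filter is an assumption rather than documented behaviour.

```python
from urllib.parse import urlsplit

def dedupe(urls):
    """Remove duplicates while keeping first-seen order (dicts preserve order)."""
    return list(dict.fromkeys(urls))

def repair_scheme(candidate):
    """Prefix https:// when the candidate has no scheme at all."""
    return candidate if "://" in candidate else "https://" + candidate

def filter_by_domain(urls, domain):
    """Keep URLs whose host is the domain itself or one of its subdomains."""
    domain = domain.lower()
    kept = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            kept.append(url)
    return kept

found = [
    "https://example.com/a",
    "example.com",
    "https://example.com/a",
    "https://docs.example.com/b",
    "https://other.org/",
]
repaired = [repair_scheme(u) for u in found]
unique = dedupe(repaired)
print(len(repaired), len(unique))  # total and unique counts
# 5 4
print(sorted(unique)[0])
# https://docs.example.com/b
```

Using dict.fromkeys keeps the first occurrence of each URL in place, which is what "preserving order" requires; converting to a set would lose that ordering.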
Frequently Asked Questions
What URL formats does the extractor recognise?
It catches http://, https://, and (optionally) bare domains like example.com/path. Mailto and ftp links can be enabled separately. The regex handles internationalised domain names and percent-encoded paths.
How are trailing punctuation marks handled?
Stray characters like commas, full stops, and closing brackets at the end of a URL (e.g. ‘see https://example.com.’) are stripped. Brackets are balanced so a Wikipedia-style URL with parentheses inside isn’t cut short.
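A minimal sketch of bracket-aware trimming follows. It assumes a simple counting rule for round brackets only: a trailing ")" is removed only when the URL contains more closing than opening parentheses. The tool's actual rule may differ.

```python
def trim_trailing(url):
    """Strip stray trailing punctuation but keep balanced parentheses."""
    while url:
        last = url[-1]
        if last in ".,;:!?":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Unmatched closing bracket: it belongs to the surrounding prose.
            url = url[:-1]
        else:
            break
    return url

print(trim_trailing("https://example.com."))
# https://example.com
print(trim_trailing("https://en.wikipedia.org/wiki/Rust_(programming_language))."))
# https://en.wikipedia.org/wiki/Rust_(programming_language)
```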
Can I extract URLs from raw HTML source?
Yes — paste HTML and the tool finds URLs both in href/src attributes and in plain-text content. Use the deduplicate option to collapse repeated links.
Is my input uploaded anywhere?
No. The extraction is a regular expression run inside your browser; nothing is sent over the network.