HTML carries a lot of noise (tags, classes, inline styles, scripts) when all you want is the readable content. This converter parses your input through the browser's own DOM, walks the tree to extract visible text, and decodes entities like &nbsp;, &copy; and &#8217; into the characters they actually represent. Block-level tags become real line breaks so paragraphs survive the conversion.
The HTML is loaded into a temporary DOM node so the browser does the heavy lifting: entities decode automatically, malformed markup is forgiven, and textContent returns the text with the tags removed. Before extraction, block-level openers (<br>, <p>, <div>, <h1>–<h6>, <li>, <tr>) are replaced with newline characters so paragraph breaks survive. Runs of three or more newlines are collapsed to two for readability.
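The three steps described here (newlines for block-level openers, tag removal with entity decoding, newline collapsing) can be sketched outside the browser. This is a minimal illustration in Python, assuming the standard library's html.parser as a stand-in for the DOM. The names html_to_text and BLOCK_OPENERS are hypothetical, not the tool's own code.

```python
import re
from html.parser import HTMLParser

# Opening tags that should become a newline (the list given in the text).
BLOCK_OPENERS = re.compile(r"<(?:br|p|div|h[1-6]|li|tr)\b[^>]*>", re.IGNORECASE)


class _TextCollector(HTMLParser):
    """Collects character data; the parser decodes entities for us."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    # 1. Replace block-level opening tags with newlines before extraction.
    marked = BLOCK_OPENERS.sub("\n", html)
    # 2. Drop the remaining tags and decode entities.
    collector = _TextCollector()
    collector.feed(marked)
    collector.close()  # flush any text still buffered by the parser
    text = "".join(collector.parts)
    # 3. Collapse runs of three or more newlines down to two.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


print(html_to_text("<h1>Title</h1><p>Tom &amp; Jerry</p>"))
```

The call prints "Title" and "Tom & Jerry" on separate lines. The \b in the pattern keeps <pre>, <link> and similar tags from being mistaken for <p> or <li>. Unlike the tool, this sketch does not remove script or style contents.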
Yes — leave "Preserve line breaks" ticked and the converter inserts real newlines wherever it finds <br>, <p>, <div>, <h1>–<h6>, <li> or <tr>. Untick the option to flatten everything into a single line.
By default the visible link text is kept and the URL is dropped — that's usually what you want for reading. Untick "Keep link text" to remove anchors entirely. The tool doesn't append URLs in brackets like some converters do.
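The two link behaviours can be illustrated with a small parser that tracks whether it is inside an anchor. This is a sketch in Python's standard html.parser, not the tool's browser code. The keep_link_text parameter is a hypothetical stand-in for the "Keep link text" checkbox.

```python
from html.parser import HTMLParser


class LinkAwareText(HTMLParser):
    """Extracts text; optionally drops everything inside <a>...</a>."""

    def __init__(self, keep_link_text=True):
        super().__init__(convert_charrefs=True)
        self.keep_link_text = keep_link_text  # hypothetical option name
        self.anchor_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1  # the href in attrs is never emitted

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Text outside anchors is always kept; text inside only if requested.
        if self.anchor_depth == 0 or self.keep_link_text:
            self.parts.append(data)


def extract(html, keep_link_text=True):
    parser = LinkAwareText(keep_link_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = 'See <a href="https://example.com">the docs</a> now.'
print(extract(sample))                        # See the docs now.
print(extract(sample, keep_link_text=False))  # See  now.
```

In both modes the URL never reaches the output, because only character data is collected and the href lives in the tag's attributes.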
Yes. The converter parses the HTML using the browser's own DOM, so all named entities (&copy;, &mdash;, &nbsp;) and numeric entities (&#8217;) are decoded into their actual characters automatically when "Decode entities" is ticked.
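For a feel of what entity decoding produces, here is a rough equivalent using Python's html.unescape in place of the browser DOM. The decode_entities wrapper and its enabled flag are hypothetical, standing in for the "Decode entities" checkbox.

```python
import html


def decode_entities(text, enabled=True):
    # `enabled` is a hypothetical stand-in for the "Decode entities" option.
    return html.unescape(text) if enabled else text


# Named and numeric references both resolve to real characters.
decoded = decode_entities("&copy; 2024 &mdash; it&#8217;s")
print(decoded == "\u00a9 2024 \u2014 it\u2019s")  # True

# &nbsp; becomes a non-breaking space (U+00A0), not an ordinary space.
print(decode_entities("a&nbsp;b") == "a\u00a0b")  # True

# With the option off, the text passes through untouched.
print(decode_entities("&copy;", enabled=False))  # &copy;
```

The non-breaking space is worth noting: it looks like a normal space but compares differently, which can matter when the output is searched or split on whitespace later.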
Yes — paste the page source (View Source in your browser, then copy). Scripts, styles and HTML comments are stripped along with the tags, so the output is just the visible content. <pre> and <code> contents are kept verbatim.
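Stripping scripts, styles and comments while leaving <pre> contents alone can be sketched as follows. This is an illustration in Python's standard html.parser under assumed names (VisibleText, visible_text), not the converter's actual implementation.

```python
from html.parser import HTMLParser

# Elements whose contents are never shown to a reader.
SKIPPED = {"script", "style"}


class VisibleText(HTMLParser):
    """Collects text, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    # Comments go to handle_comment, which does nothing by default,
    # so they never reach the output.


def visible_text(page_source):
    parser = VisibleText()
    parser.feed(page_source)
    parser.close()
    return "".join(parser.parts)


page = (
    "<style>p{color:red}</style>"
    "<p>Hi<!-- note --> there</p>"
    "<script>alert(1)</script>"
    "<pre>a  b\n  c</pre>"
)
print(repr(visible_text(page)))
```

The printed value is 'Hi therea  b\n  c': the style rule, the comment and the script are gone, while the double space and indentation inside <pre> are untouched because the sketch never normalises whitespace.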
Pair with the HTML Minifier, the Markdown to HTML Converter, or the Word Counter for follow-up text work.