.md file. Your PDF never leaves your browser.
This tool uses PDF.js to read the raw text content of each page, including each text item's font size, font name, and position. It groups text items into visual lines by Y-coordinate, takes the most common font size across the document as the body size, and classifies anything significantly larger as a heading (H1–H4). Bold and italic are detected from font names. Bullet and numbered lists are identified by their leading characters. Page boundaries are optionally marked with horizontal rules.
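The line-grouping and heading steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `Item` shape, the 2-unit Y tolerance, and the size-ratio thresholds for H1–H4 are all assumptions chosen for the example.

```typescript
// Hypothetical, simplified text item. PDF.js exposes similar information
// (string, position, size), but this interface is an assumption for the sketch.
interface Item {
  str: string;
  x: number;
  y: number;
  fontSize: number;
}

interface Line {
  y: number;
  fontSize: number;
  text: string;
}

// Group items into visual lines: items within `tolerance` of a line's Y share that line.
function groupIntoLines(items: Item[], tolerance = 2): Line[] {
  // PDF Y coordinates grow upward, so descending Y reads top to bottom.
  const sorted = items.slice().sort((a, b) => b.y - a.y);
  const groups: { y: number; items: Item[] }[] = [];
  let current: { y: number; items: Item[] } | undefined;
  for (const item of sorted) {
    if (current !== undefined && Math.abs(current.y - item.y) <= tolerance) {
      current.items.push(item);
    } else {
      current = { y: item.y, items: [item] };
      groups.push(current);
    }
  }
  return groups.map((g) => {
    // Within a line, order items left to right.
    const ordered = g.items.slice().sort((a, b) => a.x - b.x);
    return {
      y: g.y,
      fontSize: Math.max(...ordered.map((i) => i.fontSize)),
      text: ordered.map((i) => i.str).join(" ").trim(),
    };
  });
}

// Body size = the most common line font size (rounded to 0.1 to absorb noise).
function bodyFontSize(lines: Line[]): number {
  const counts: Record<string, number> = {};
  for (const line of lines) {
    const key = String(Math.round(line.fontSize * 10) / 10);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  let best = 0;
  let bestCount = 0;
  for (const key of Object.keys(counts)) {
    const count = counts[key] ?? 0;
    if (count > bestCount) {
      best = parseFloat(key);
      bestCount = count;
    }
  }
  return best;
}

// Map a size ratio to a Markdown heading prefix. Thresholds are illustrative guesses.
function headingPrefix(fontSize: number, body: number): string {
  if (body <= 0) return "";
  const ratio = fontSize / body;
  if (ratio >= 2.0) return "# ";
  if (ratio >= 1.6) return "## ";
  if (ratio >= 1.3) return "### ";
  if (ratio >= 1.15) return "#### ";
  return "";
}

function toMarkdown(items: Item[]): string {
  const lines = groupIntoLines(items);
  const body = bodyFontSize(lines);
  return lines.map((l) => headingPrefix(l.fontSize, body) + l.text).join("\n");
}

const sampleItems: Item[] = [
  { str: "Title", x: 50, y: 700, fontSize: 24 },
  { str: "world", x: 100, y: 650.5, fontSize: 12 },
  { str: "Hello", x: 50, y: 650, fontSize: 12 },
  { str: "More text", x: 50, y: 630, fontSize: 12 },
];
console.log(toMarkdown(sampleItems));
// prints:
// # Title
// Hello world
// More text
```

Counting sizes per line rather than per text item is a simplification; a real converter might weight by character count so that long body paragraphs dominate short headings.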
No. Scanned PDFs consist of page images rather than actual text. Extracting text from them requires OCR (optical character recognition), which this tool does not run in the browser. The tool will warn you if pages appear to be image-only.
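One plausible way to flag image-only pages is to check whether a page yielded any non-whitespace text at all. The sketch below assumes the extracted text of each page is already available as an array of strings; the function name and the "no text means image-only" heuristic are assumptions, and the tool's real check may differ.

```typescript
// Returns the 1-based numbers of pages whose extracted text is empty or whitespace only.
// `pages[i]` holds the text strings extracted from page i + 1 (hypothetical input shape).
function findImageOnlyPages(pages: string[][]): number[] {
  const result: number[] = [];
  pages.forEach((strings, index) => {
    const hasText = strings.some((s) => s.trim().length > 0);
    if (!hasText) {
      result.push(index + 1);
    }
  });
  return result;
}

const suspectPages = findImageOnlyPages([["Hello"], [], [" ", ""], ["x"]]);
console.log(suspectPages); // prints [ 2, 3 ]
```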
Yes. The Markdown output is clean structured text, well suited to chunking and embedding in retrieval-augmented generation (RAG) systems, LangChain document loaders, or any text-based AI pipeline. It produces the same kind of Markdown output as tools like OpenDataLoader PDF, but entirely in your browser.
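As an illustration of why heading-structured Markdown is convenient for chunking, the sketch below splits exported Markdown into one chunk per H1–H4 heading. The `Chunk` shape and `chunkByHeading` name are hypothetical and not part of any specific RAG library; the sketch also ignores fenced code blocks, which could contain lines starting with `#`.

```typescript
interface Chunk {
  heading: string; // heading text without the leading #s; "" for text before the first heading
  body: string;
}

// Split Markdown into chunks, starting a new chunk at every H1-H4 heading line.
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let lines: string[] = [];

  const flush = (): void => {
    const body = lines.join("\n").trim();
    // Skip the empty chunk that would otherwise precede a leading heading.
    if (heading !== "" || body !== "") {
      chunks.push({ heading, body });
    }
  };

  for (const line of markdown.split("\n")) {
    if (/^#{1,4}\s/.test(line)) {
      flush();
      heading = line.replace(/^#{1,4}\s+/, "").trim();
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}

const exampleChunks = chunkByHeading("Intro\n# A\ntext a\n## B\ntext b");
console.log(exampleChunks.map((c) => c.heading + ": " + c.body));
// prints [ ': Intro', 'A: text a', 'B: text b' ]
```

Each chunk can then be embedded on its own, with the heading kept as metadata so that retrieved passages carry their section context.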
Standard CommonMark-compatible Markdown — works in GitHub, Notion, Obsidian, VS Code, and all major Markdown renderers and static site generators.
No hard limit. Large PDFs will take longer to process. Very large files (100+ pages) may take several seconds depending on your device. The file never leaves your browser.
Markdown references image files separately rather than embedding them. The tool inserts placeholder markers where images are detected, so you know where to add them manually after export.