MDCopy: An Edge Extension to copy HTML as Markdown


MDCopy: A Lean Edge Extension That Turns Web Pages into Markdown

https://github.com/blakepell/MDCopyExtension

If you spend any meaningful amount of time pulling content from the web into a notes app, a wiki, a static-site generator, or pretty much anything that eats Markdown, you've probably lived through the same frustration I have. You highlight a section of a web page, copy it, and paste it into your app and you get a wall of unstyled plain text. Every heading, every bullet, every bold phrase gone. Then you spend five minutes reformatting it by hand. There are existing extensions that do this, but do you trust them?

Enter MDCopy. MDCopy is a Microsoft Edge extension built to solve exactly that problem, and nothing else. It's small, it's focused, and it does one thing well: it takes whatever you've selected on a page and converts the underlying HTML into clean, well-structured Markdown before it ever hits your clipboard.

Let's walk through how it's put together.


Overview

MDCopy is a Manifest V3 browser extension. It requests only the permissions it actually needs: activeTab (to talk to whatever tab is currently open), clipboardWrite (to write the result to your clipboard), contextMenus (to add the right-click option), and scripting (to inject the content script when necessary).

There are four moving pieces:

  1. manifest.json - the extension's configuration file
  2. background.js - the service worker that manages the right-click context menu
  3. content.js - the script that runs inside every page and does the actual HTML-to-Markdown conversion
  4. popup.html / popup.js - the toolbar button UI with a copy button and an inline preview

manifest.json: The Configuration

The manifest is where everything is wired together. The extension is named MDCopy, targets Manifest V3, and declares background.js as a service worker. The action block ties the popup UI to the toolbar button. The content_scripts block tells Edge to inject content.js into every page at document_idle, meaning the script loads after the page's DOM is ready, without blocking rendering.

The icon is a single app.png used at all three standard sizes (16px, 48px, 128px). Dedicated artwork for each size is an opportunity for future improvement.
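
Pulling those pieces together, a manifest along these lines would match the description so far. This is a sketch written as a JavaScript object for readability, not the repository's actual file: the version string, the <all_urls> match pattern, and the default_icon wiring are assumptions.

```javascript
// Hypothetical reconstruction of manifest.json as a plain object.
// The version string, '<all_urls>' pattern, and default_icon are assumptions.
const manifest = {
  manifest_version: 3,
  name: 'MDCopy',
  version: '1.0.0', // assumed
  permissions: ['activeTab', 'clipboardWrite', 'contextMenus', 'scripting'],
  background: { service_worker: 'background.js' },
  action: { default_popup: 'popup.html', default_icon: 'app.png' },
  content_scripts: [
    { matches: ['<all_urls>'], js: ['content.js'], run_at: 'document_idle' },
  ],
  // One image reused for every standard size.
  icons: { 16: 'app.png', 48: 'app.png', 128: 'app.png' },
};

// Serializing the object yields the JSON that would live in manifest.json.
console.log(JSON.stringify(manifest, null, 2));
```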


background.js: The Right-Click Menu

The background service worker is intentionally thin. When the extension is installed, it registers a single context menu item labeled "Copy as Markdown" that only appears when text is selected on the page (contexts: ['selection']).

chrome.runtime.onInstalled.addListener(() => {
  chrome.contextMenus.create({
    id: 'mdcopy-copy',
    title: 'Copy as Markdown',
    contexts: ['selection'],
  });
});

When that menu item is clicked, the background worker sends a copyAsMarkdown message to the content script running in the active tab. The content script then handles the conversion and writes directly to the clipboard. The background worker doesn't need to know the result: it fires the message and moves on, suppressing the benign "Receiving end does not exist" error that occurs on Edge's internal pages, where content scripts can't run.
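
A click handler in that spirit might look like the sketch below. The handleMenuClick name and the { action: 'copyAsMarkdown' } message shape are assumptions for illustration, and the extension API is passed in as a parameter only so the function can be exercised outside the browser.

```javascript
// Hypothetical handler: forwards the menu click to the tab's content script.
// The message shape { action: 'copyAsMarkdown' } is an assumption.
function handleMenuClick(info, tab, api) {
  if (info.menuItemId !== 'mdcopy-copy' || !tab || tab.id === undefined) return;
  api.tabs.sendMessage(tab.id, { action: 'copyAsMarkdown' }, () => {
    // Reading lastError marks it as handled, which silences the
    // "Receiving end does not exist" warning on pages without a content script.
    void api.runtime.lastError;
  });
}

// Register against the real API only when running inside the extension.
if (typeof chrome !== 'undefined' && chrome.contextMenus) {
  chrome.contextMenus.onClicked.addListener((info, tab) =>
    handleMenuClick(info, tab, chrome));
}
```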

This is a clean separation of concerns: the service worker handles the menu plumbing, and the content script owns the actual work.


content.js: The Engine

The content script has three responsibilities: capture the current HTML selection, convert it to Markdown, and write it to the clipboard.
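
As a rough outline of how those three steps could chain together, here is a minimal sketch. The copySelectionAsMarkdown name, the injected deps object, and the message shape are assumptions made so the flow can be shown in isolation; the real script may be organized differently.

```javascript
// Hypothetical pipeline: capture -> convert -> write to clipboard.
// `deps` bundles the three steps so the flow can be tested without a browser.
async function copySelectionAsMarkdown(deps) {
  const html = deps.getSelectedHtml();
  if (!html) return false; // nothing selected, nothing to copy
  const markdown = deps.htmlToMarkdown(html);
  await deps.writeText(markdown);
  return true;
}

// Inside the extension, wire the pipeline to the background worker's message.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message) => {
    if (message && message.action === 'copyAsMarkdown') { // assumed message shape
      copySelectionAsMarkdown({
        getSelectedHtml,
        htmlToMarkdown,
        writeText: (text) => navigator.clipboard.writeText(text),
      });
    }
  });
}
```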

Capturing the Selection

getSelectedHtml() uses the browser's native window.getSelection() API to grab the current selection range, clones the DOM fragment it represents, and serializes it to an HTML string. If nothing is selected, it returns null.

function getSelectedHtml() {
  const selection = window.getSelection();
  // Bail out when there is no selection or it is just a caret position.
  if (!selection || selection.rangeCount === 0 || selection.isCollapsed) return null;
  const range = selection.getRangeAt(0);
  // Copy the selected nodes into a detached container and serialize them.
  const container = document.createElement('div');
  container.appendChild(range.cloneContents());
  return container.innerHTML;
}

The key detail here is cloneContents(), which deep-copies the selected DOM nodes without touching the live page, so the selection state isn't disturbed.

Converting HTML to Markdown

The htmlToMarkdown() function parses the captured HTML string into a fresh DOM document using DOMParser, then walks the resulting node tree recursively via convertNode() and convertElement().
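
The overall shape of that recursive walk can be sketched as follows. This is a deliberately tiny version that handles only text, paragraphs, and bold; the convertChildren helper and the final blank-line cleanup are assumptions, not details taken from the source.

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Dispatch on node type: text passes through, elements are converted.
function convertNode(node) {
  if (node.nodeType === TEXT_NODE) return node.textContent;
  if (node.nodeType === ELEMENT_NODE) return convertElement(node);
  return ''; // comments and other node types are dropped
}

// Hypothetical helper: convert every child and concatenate the results.
function convertChildren(node) {
  return Array.from(node.childNodes).map(convertNode).join('');
}

// Only two tags are handled here; everything else falls through as its content.
function convertElement(el) {
  const inner = convertChildren(el);
  switch (el.tagName.toLowerCase()) {
    case 'p': return `\n\n${inner.trim()}\n\n`;
    case 'strong':
    case 'b': return `**${inner}**`;
    default: return inner;
  }
}

// Browser-only entry point: parse the HTML string, then walk the body.
function htmlToMarkdown(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return convertChildren(doc.body).replace(/\n{3,}/g, '\n\n').trim();
}
```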

The converter handles a comprehensive set of HTML elements:

Headings (h1 through h6) get converted to the appropriate number of # prefixes, surrounded by blank lines to ensure proper rendering in most Markdown parsers.
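
A minimal sketch of that heading rule, assuming a hypothetical headingToMarkdown helper that receives the tag name and the already-converted inner text:

```javascript
// Hypothetical helper: 'h3' + 'Title' -> blank line, '### Title', blank line.
function headingToMarkdown(tagName, text) {
  const match = /^h([1-6])$/i.exec(tagName);
  if (!match) return text; // not a heading, leave the text untouched
  const hashes = '#'.repeat(Number(match[1]));
  return `\n\n${hashes} ${text.trim()}\n\n`;
}
```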

Block elements like <p>, <div>, <section>, <article>, <blockquote>, and their semantic relatives are all wrapped with double newlines. Blockquotes are handled specially: each line of the inner content is prefixed with >, which keeps multi-line quotes intact.
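
The per-line prefixing for blockquotes can be sketched like this. The blockquoteToMarkdown name is hypothetical, and emitting a bare > for blank lines inside the quote is an assumption about how empty lines are treated.

```javascript
// Hypothetical helper: prefix every line of the converted content with '> '.
function blockquoteToMarkdown(inner) {
  const quoted = inner
    .trim()
    .split('\n')
    // trimEnd turns '> ' on an empty line into a bare '>' (assumed behavior).
    .map((line) => `> ${line}`.trimEnd())
    .join('\n');
  return `\n\n${quoted}\n\n`;
}
```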

Inline formatting covers the full gamut:

  • <strong> and <b> → **bold**
  • <em> and <i> → *italic*
  • <s>, <del>, <strike> → ~~strikethrough~~
  • <mark> → ==highlighted==
  • <sup> → ^superscript^
  • <sub> → ~subscript~
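
One compact way to express that mapping is a lookup table from tag name to delimiter. The INLINE_DELIMITERS table and wrapInline helper below are illustrative names, and skipping empty content is an assumption.

```javascript
// Tag name -> Markdown delimiter, mirroring the list above.
const INLINE_DELIMITERS = {
  strong: '**', b: '**',
  em: '*', i: '*',
  s: '~~', del: '~~', strike: '~~',
  mark: '==',
  sup: '^',
  sub: '~',
};

// Hypothetical helper: wrap already-converted text in the tag's delimiter.
function wrapInline(tagName, text) {
  const delimiter = INLINE_DELIMITERS[tagName.toLowerCase()];
  // Unknown tags and empty content pass through unchanged (assumed behavior).
  if (!delimiter || text.trim() === '') return text;
  return `${delimiter}${text}${delimiter}`;
}
```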

Code gets special treatment. Inline <code> elements become backtick-wrapped spans. If the code content itself contains a backtick (a common edge case people forget), the converter cleverly falls back to double-backtick wrapping to avoid breaking the Markdown. Block <pre> elements become fenced code blocks, and the converter even looks for a language-xyz class on the inner <code> element to populate the language hint in the fence:

const langClass = Array.from(codeEl.classList).find(c => c.startsWith('language-'));
if (langClass) lang = langClass.replace('language-', '');
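
Both code rules fit in a few lines. In this sketch the inlineCode and fencedCode names are hypothetical, the space padding inside the double backticks is an assumption (CommonMark strips one such space on each side), and content that itself contains a run of two backticks is not handled.

```javascript
// Inline code: single backticks normally, double backticks if the text has one.
function inlineCode(text) {
  if (!text.includes('`')) return '`' + text + '`';
  // Padding keeps a leading or trailing backtick from merging with the delimiter.
  return '`` ' + text + ' ``';
}

// Block code: fenced, with the language taken from a 'language-xyz' class.
function fencedCode(code, classNames) {
  const fence = '`'.repeat(3);
  const langClass = classNames.find((c) => c.startsWith('language-'));
  const lang = langClass ? langClass.replace('language-', '') : '';
  // Drop one trailing newline so the closing fence sits right after the code.
  return `\n\n${fence}${lang}\n${code.replace(/\n$/, '')}\n${fence}\n\n`;
}
```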

Lists are handled recursively. Both ordered (<ol>) and unordered (<ul>) lists are supported, and nested lists are indented correctly by tracking depth in a context object passed through the recursive calls. Each level of nesting adds two spaces of indentation.
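
The depth tracking is easiest to see on a simplified data structure. The sketch below renders a plain object tree rather than real DOM nodes; the renderList name, the { ordered, items } shape, and the - bullet marker are all assumptions for illustration.

```javascript
// Hypothetical list shape: { ordered: boolean, items: [{ text, children? }] }.
function renderList(list, depth = 0) {
  const indent = '  '.repeat(depth); // two spaces per nesting level
  const lines = [];
  list.items.forEach((item, index) => {
    const marker = list.ordered ? `${index + 1}.` : '-';
    lines.push(`${indent}${marker} ${item.text}`);
    // Recurse with depth + 1 so nested lists are indented one level further.
    if (item.children) lines.push(renderList(item.children, depth + 1));
  });
  return lines.join('\n');
}

const example = {
  ordered: false,
  items: [
    { text: 'Fruit', children: { ordered: true, items: [{ text: 'Apple' }, { text: 'Pear' }] } },
    { text: 'Bread' },
  ],
};
console.log(renderList(example));
```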

Tables are converted to the standard GitHub-flavored Markdown pipe table format. The converter reads all <tr> rows, extracts cell text, escapes any pipe characters inside cells so they don't break the table structure, and builds the header row, separator row, and data rows. It automatically detects whether the first row uses <th> elements to decide if it's a header row.
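
Once the cell text has been extracted, building the pipe table is plain string work. This sketch assumes the rows arrive as arrays of strings; the toPipeTable and escapeCell names are hypothetical, and emitting an empty header row when the first row has no <th> cells is an assumption, since the source doesn't say what happens in that case.

```javascript
// Escape pipes and collapse whitespace so a cell stays on one table line.
function escapeCell(text) {
  return text.replace(/\|/g, '\\|').replace(/\s+/g, ' ').trim();
}

// rows: string[][]; firstRowIsHeader: whether row 0 came from <th> cells.
function toPipeTable(rows, firstRowIsHeader) {
  if (rows.length === 0) return '';
  const width = Math.max(...rows.map((row) => row.length));
  // Pad short rows with empty cells so every line has the same column count.
  const pad = (row) => Array.from({ length: width }, (_, i) => escapeCell(row[i] ?? ''));
  const line = (cells) => `| ${cells.join(' | ')} |`;
  const header = firstRowIsHeader ? pad(rows[0]) : Array(width).fill('');
  const body = firstRowIsHeader ? rows.slice(1) : rows;
  return [
    line(header),
    line(Array(width).fill('---')),
    ...body.map((row) => line(pad(row))),
  ].join('\n');
}
```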

Links (<a>) become [text](href) pairs. If there's no text content, just the URL is output. If there's no href or it's just #, the text is output as


Summary

If you're interested in using this, the source code and the instructions for setting it up are over on its GitHub repository (setup only takes a minute or two): https://github.com/blakepell/MDCopyExtension

This has been released under the MIT license.