HTML to PDF: The Complete Developer Guide (2026)

18 min read

Quick Answer

The best way to convert HTML to PDF in 2026 is Playwright (for new projects) or Puppeteer (for existing Node.js codebases). Both drive headless Chromium, support full modern CSS and JavaScript, and produce pixel-accurate PDFs.

If you are using wkhtmltopdf, migrate now — it was archived in 2023 and has unpatched CVEs. If you generate more than a few hundred PDFs/day, a managed API removes the Chromium infrastructure entirely.

It starts simply enough. You need to generate a PDF — an invoice, a report, a contract. You find a library, write twenty lines of code, and it works. Then you push to production. The font is wrong. A table is split across two pages. The header is missing on page three. The whole thing looks like it was rendered on a computer from 2009.

Welcome to HTML to PDF conversion — one of the most deceptively hard problems in web development. This guide documents every major failure mode, why each one happens, and what actually fixes it — with working code in every major language.

1. Why HTML to PDF Is Fundamentally Hard

HTML was designed for continuous, reflowable documents that scroll. PDF was designed for fixed pages. This is not a minor implementation detail; it is a philosophical mismatch that causes almost every problem in this space.

A browser doesn't need to know where page 3 starts. A PDF renderer does. CSS properties like overflow: scroll, position: sticky, and viewport units (vw, vh) have no meaningful equivalent in a paginated document. Every HTML-to-PDF tool is, at its core, an attempt to bridge this gap — and each makes different tradeoffs about where to break down.

There are five fundamentally different approaches to converting HTML to PDF, each with different tradeoffs:

  • Headless browser rendering (Puppeteer, Playwright, Gotenberg) — sends the HTML through a real browser engine and prints the result. Highest fidelity, highest resource cost.
  • CSS Paged Media engines (PrinceXML, PDFreactor, WeasyPrint) — purpose-built renderers that interpret the W3C CSS Paged Media spec. Excellent for printed documents, but require PDF-specific CSS knowledge.
  • Legacy WebKit wrappers (wkhtmltopdf) — frozen circa 2015. Do not use.
  • Programmatic PDF builders (jsPDF, PDFKit, iTextSharp) — generate PDF structure directly rather than rendering HTML. Full control, but requires rewriting your templates in a PDF-specific API.
  • Managed cloud APIs (ConvertStack, DocRaptor, Browserless) — offload the rendering infrastructure entirely. Trade cost per conversion for zero ops burden.

The fundamental rule: anything designed to respond to a viewport will behave unpredictably in a PDF. Build your PDF templates for a fixed page, not a flexible screen, or use @media print to override layout for print contexts.

2. wkhtmltopdf Is Dead — Migrate Now

Critical — Action Required

The most popular HTML-to-PDF tool is officially abandoned

wkhtmltopdf was archived in January 2023. Its last stable release was June 2020. It runs on a Qt WebKit engine frozen circa 2015. It has unpatched CVEs, no arm64 binaries, and no future. If it's anywhere in your stack, you have a ticking clock.

For years, wkhtmltopdf was the default answer to "how do I convert HTML to PDF." It was free, it worked, and it ran anywhere. Many libraries were built on top of it: DinkToPdf, Rotativa, and NReco.PdfGenerator in .NET; barryvdh/laravel-snappy in PHP; WickedPDF in Ruby on Rails.

All of those wrappers are now wrappers around abandoned software. The rendering engine doesn't understand CSS Grid, CSS custom properties, modern flexbox behaviour, clamp(), min(), max(), or ES6+ JavaScript. If your design uses any of these — and modern designs all do — your PDFs will silently produce wrong output that's difficult to debug.

The migration path: replace wkhtmltopdf with Playwright or Puppeteer. Both ship an evergreen Chromium engine and produce PDFs that match what you see in Chrome. Playwright is faster and produces smaller files (see benchmarks); Puppeteer is a stable choice for existing Node.js codebases. Gotenberg wraps Chromium behind a Docker REST API for polyglot architectures.

3. CSS That Works in a Browser Breaks in a PDF

Very Common

Your layout collapses, your colors vanish, your fonts aren't there

PDF renderers interpret CSS differently from browsers. Some properties are ignored entirely. Others produce subtly wrong output. The visual result can look nothing like your browser preview.

The most common CSS casualties in HTML to PDF conversion:

Background colors and images. Most renderers suppress backgrounds by default to save ink — a sensible printer default that's disastrous for designed documents. In headless Chrome, pass printBackground: true. In CSS, add -webkit-print-color-adjust: exact; print-color-adjust: exact; to every element that needs it, or globally.

Flexbox and Grid. Modern headless Chromium tools handle these correctly. wkhtmltopdf and other WebKit-frozen tools render them incorrectly or ignore them entirely.

Viewport units. 100vw in a PDF context is meaningless or resolves to the paper width — usually not what you want. Use fixed px or % widths relative to your page size instead.

Position: fixed / sticky. Fixed positioning may repeat on every page, or may not appear at all, depending on the renderer. Use headers/footers via the renderer API instead.

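External stylesheets. Renderers running in sandboxed or network-restricted environments may be unable to fetch CSS referenced by <link> tags, and HTML passed in as a string has no base URL, so relative paths such as /css/app.css do not resolve. Inline your critical styles, or reference them by an absolute URL the renderer can reach.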

CSS variables. Fully supported in modern Chromium-based tools. Not supported in wkhtmltopdf.

/* ── Force backgrounds in PDF ── */
* {
  -webkit-print-color-adjust: exact;
  print-color-adjust: exact;
}

/* ── Target PDF output only ── */
@media print {
  .sidebar    { display: none; }
  .content    { width: 100%; }
  body        { font-size: 12pt; }

  /* Avoid viewport units in PDFs */
  .hero       { width: 100%; height: auto; }
}
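
The external-stylesheet problem can be handled before the HTML ever reaches the renderer. The sketch below is one way to do that in Python, under stated assumptions: the stylesheets exist as local files, each is looked up by file name in a single directory, and the <link> tags use plain quoted attributes. The names inline_stylesheets and css_dir are hypothetical, not part of any library.

```python
import re
from pathlib import Path

# Simple patterns for <link> tags with quoted attributes (an assumption;
# a full HTML parser would be more robust for arbitrary markup).
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF = re.compile(r"href\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)
REL_STYLESHEET = re.compile(r"rel\s*=\s*[\"']stylesheet[\"']", re.IGNORECASE)


def inline_stylesheets(html: str, css_dir: Path) -> str:
    """Replace local <link rel="stylesheet"> tags with inline <style> blocks."""

    def replace(match: re.Match) -> str:
        tag = match.group(0)
        href = HREF.search(tag)
        if href is None or not REL_STYLESHEET.search(tag):
            return tag  # not a stylesheet link: leave it untouched
        css_file = css_dir / Path(href.group(1)).name
        if not css_file.is_file():
            return tag  # no local copy: keep the original link
        css = css_file.read_text(encoding="utf-8")
        return f"<style>\n{css}\n</style>"

    return LINK_TAG.sub(replace, html)
```

The result can be passed straight to set_content(), so the renderer never has to resolve a stylesheet URL.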

4. Page Breaks Are a Nightmare

Very Common

Tables split mid-row. Headings appear alone at the bottom. Images get cut in half.

Automatic page breaking is the single most frustrating aspect of HTML to PDF conversion. The renderer almost never decides correctly by default.

CSS provides page-break properties, but they are easy to get wrong. Use both the old and new forms for maximum compatibility:

/* ── Never break inside these elements ── */
tr, img, figure, .card, .invoice-line {
  page-break-inside: avoid;  /* old */
  break-inside: avoid;       /* new */
}

/* ── Force a new page before major sections ── */
.page-section {
  page-break-before: always;
  break-before: page;
}

/* ── Keep headings attached to their content ── */
h1, h2, h3 {
  page-break-after: avoid;
  break-after: avoid;
}

/* ── Prevent orphaned/widowed lines ── */
p {
  orphans: 3;
  widows: 3;
}

For large tables where break-inside: avoid is not feasible, split the data into multiple <table> elements — one per page worth of rows — and apply page-break-before: always to each after the first.
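
Splitting a long table this way is easy to do at template-rendering time. Here is a minimal Python sketch, assuming the rows are already plain strings and that a fixed number of rows fits on a page; paginate_table and rows_per_page are hypothetical names, and the right row count depends on your row height and margins.

```python
from html import escape
from typing import Sequence


def paginate_table(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    rows_per_page: int,
) -> str:
    """Render rows as several <table> elements, one per page worth of rows."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")

    head = "".join(f"<th>{escape(cell)}</th>" for cell in header)
    tables = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
            for row in chunk
        )
        # Every table after the first is forced onto a new page.
        style = (
            ' style="page-break-before: always; break-before: page;"'
            if start
            else ""
        )
        tables.append(
            f"<table{style}><thead><tr>{head}</tr></thead>"
            f"<tbody>{body}</tbody></table>"
        )
    return "\n".join(tables)
```

Repeating the header in every chunk keeps each page readable on its own.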

5. Fonts That Work Everywhere Else Don't Render

Common

Google Fonts load in your browser. In a sandboxed renderer, the request times out and you get system fallback.

Font loading is a network operation. Your carefully designed typography disappears without an error.

Three fixes, which can be combined:

  1. Self-host fonts. Download the font files and serve them from the same origin as your HTML. No external network calls required.
  2. Wait for network idle. Use waitUntil: 'networkidle0' in Puppeteer or wait_until='networkidle' in Playwright to ensure all font requests complete before rendering.
  3. Base64-encode and embed. Convert font files to base64 and embed them directly in your @font-face declaration. Verbose but eliminates all network dependency.
/* Self-hosted font — works in sandboxed renderers */
@font-face {
  font-family: 'Inter';
  src: url('/fonts/Inter-Regular.woff2') format('woff2');
  font-weight: 400;
  font-display: block; /* avoid FOUT in renderer */
}

/* Base64 embed — no network dependency at all */
@font-face {
  font-family: 'Inter';
  src: url('data:font/woff2;base64,d09GMgABAA...') format('woff2');
}
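
The base64 rule above does not need to be written by hand. A minimal Python sketch, assuming a WOFF2 font file on disk; font_face_css and font_face_from_file are hypothetical helper names:

```python
import base64
from pathlib import Path


def font_face_css(family: str, font_bytes: bytes, weight: int = 400) -> str:
    """Build an @font-face rule with the WOFF2 data embedded as base64."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('data:font/woff2;base64,{encoded}') format('woff2');\n"
        f"  font-weight: {weight};\n"
        "}\n"
    )


def font_face_from_file(family: str, path: Path, weight: int = 400) -> str:
    """Read a .woff2 file and return the matching @font-face rule."""
    return font_face_css(family, path.read_bytes(), weight)
```

Place the returned rule inside a <style> block in the HTML you send to the renderer. Base64 adds roughly a third to the font's size, so embed only the weights you actually use.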

6. Headers, Footers, and Page Numbers

Common

Adding "Page 1 of N" to every page requires special renderer support — and every tool handles it differently.

There is no standard cross-tool approach.

In headless Chrome (Puppeteer/Playwright), headers and footers are separate HTML templates passed to the pdf() call. They run in a different execution context — your page's CSS and JavaScript do not apply. Inline all styles inside the template.

// Playwright / Puppeteer — header and footer with page numbers
const pdf = await page.pdf({
  format: 'A4',
  printBackground: true,
  displayHeaderFooter: true,
  headerTemplate: `
    <div style="font-family:sans-serif;font-size:9px;width:100%;
      padding:0 20px;display:flex;justify-content:space-between;
      color:#888;border-bottom:1px solid #eee;">
      <span>My Company — Confidential</span>
      <span>Generated <span class="date"></span></span>
    </div>`,
  footerTemplate: `
    <div style="font-family:sans-serif;font-size:9px;width:100%;
      padding:0 20px;display:flex;justify-content:space-between;
      color:#888;">
      <span>https://example.com</span>
      <span>
        Page <span class="pageNumber"></span>
        of <span class="totalPages"></span>
      </span>
    </div>`,
  margin: {
    top: '50px',    // must be at least the header height
    bottom: '40px', // must be at least the footer height
    left: '15mm',
    right: '15mm'
  }
});

Chromium automatically replaces .pageNumber, .totalPages, .title, .url, and .date spans. The margin values must equal or exceed the header/footer height — otherwise content overlaps.
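
The same options exist in Playwright's Python API under snake_case names. The sketch below builds them as a plain dictionary; footer_template and pdf_options are hypothetical helpers, and the styling and margin values are illustrative. The empty header template is there because enabling header/footer display without one makes Chromium fall back to its default header.

```python
from html import escape

# Inline styles only: the page's own CSS does not reach these templates.
_BAR_STYLE = (
    "font-family:sans-serif;font-size:9px;width:100%;padding:0 20px;"
    "display:flex;justify-content:space-between;color:#888;"
)


def footer_template(left_text: str) -> str:
    """Footer with free text on the left and 'Page X of N' on the right."""
    return (
        f'<div style="{_BAR_STYLE}">'
        f"<span>{escape(left_text)}</span>"
        '<span>Page <span class="pageNumber"></span>'
        ' of <span class="totalPages"></span></span>'
        "</div>"
    )


def pdf_options(left_text: str, footer_height: str = "40px") -> dict:
    """Keyword arguments for Playwright's page.pdf() in Python."""
    return {
        "format": "A4",
        "print_background": True,
        "display_header_footer": True,
        "header_template": "<span></span>",  # suppress the default header
        "footer_template": footer_template(left_text),
        "margin": {
            "top": "15mm",
            "bottom": footer_height,  # leave room for the footer
            "left": "15mm",
            "right": "15mm",
        },
    }
```

With a Playwright page in hand, the call is page.pdf(**pdf_options("example.com")).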

7. JavaScript-Rendered Content Is Missing

Common in SPAs

Your React or Vue app renders blank in the PDF.

Tools that don't execute JavaScript will capture only the initial HTML shell of a single-page app. If your content is rendered client-side, the PDF will be empty or partial.

Headless Chromium tools (Puppeteer, Playwright) execute JavaScript fully. Tell them to wait for your content:

// Wait for network activity to settle (fonts, API calls, images).
// Puppeteer uses 'networkidle0'; Playwright uses 'networkidle'.
await page.goto(url, { waitUntil: 'networkidle0' });

// OR wait for a specific element that signals content is ready.
// Playwright syntax; in Puppeteer pass { visible: true } instead.
await page.waitForSelector('.invoice-total', { state: 'visible' });

// OR wait for a custom signal from your app
await page.waitForFunction(() => window.PDF_READY === true);

For server-rendered applications (Next.js, Nuxt, Astro), this is less of an issue since HTML is fully formed before JavaScript executes. Building PDF-heavy features with SSR is the most reliable architecture.
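
The three waiting strategies can be wrapped in one helper. This is a sketch against Playwright's Python sync API, assuming a page object that has already navigated; wait_for_content is a hypothetical name, and the 15-second timeout is an arbitrary default.

```python
from typing import Optional


def wait_for_content(
    page,
    selector: Optional[str] = None,
    ready_flag: Optional[str] = None,
    timeout_ms: float = 15_000,
) -> None:
    """Block until the page signals that its content is rendered.

    `page` is a Playwright sync-API Page (or any object with the same methods).
    """
    if ready_flag is not None:
        # Most explicit signal: the app sets window.<ready_flag> = true itself.
        page.wait_for_function(
            f"() => window.{ready_flag} === true", timeout=timeout_ms
        )
    elif selector is not None:
        # Next best: an element that only exists once the data has rendered.
        page.wait_for_selector(selector, state="visible", timeout=timeout_ms)
    else:
        # Fallback: wait for network activity to settle.
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
```

Call it between page.goto(...) and page.pdf(...). An explicit flag or selector tends to be more dependable than network idle for apps that poll or keep connections open.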

8. Scale Breaks Everything

Production Issue

One conversion: fine. One hundred concurrent: browser crashes, memory pressure, queue timeouts.

A headless Chromium instance is a full browser. Running it at scale means managing process pools, crash recovery, memory limits, Docker images, and cold-start latency — none of which is your core product.

A single Chromium instance handles roughly 5–10 parallel renders before performance degrades. Each process uses 150–400 MB of RAM. In a serverless environment (Lambda, Cloud Run), cold-starting Chromium adds 1–3 seconds to the first request in a new instance.

At scale you need: a pool of warm browser instances, a job queue with retry logic, health checks and restart policies, and careful memory management to prevent leaks across renders. This is real infrastructure work that has nothing to do with your product. It is why managed API services exist.
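
To make two of those pieces concrete, here is a small asyncio sketch of bounded concurrency with retries. It is deliberately renderer-agnostic: render is any async function you supply that turns HTML into PDF bytes, for example one that opens a page on a shared, already-launched browser. The names render_all, max_parallel, and max_attempts are hypothetical, and the defaults are placeholders rather than tuned values.

```python
import asyncio
from typing import Awaitable, Callable, List

Renderer = Callable[[str], Awaitable[bytes]]


async def render_all(
    documents: List[str],
    render: Renderer,
    max_parallel: int = 5,
    max_attempts: int = 3,
) -> List[bytes]:
    """Render every document, limiting concurrency and retrying failures."""
    if max_parallel < 1 or max_attempts < 1:
        raise ValueError("max_parallel and max_attempts must be at least 1")
    slots = asyncio.Semaphore(max_parallel)

    async def render_one(html: str) -> bytes:
        async with slots:  # at most max_parallel renders hold a slot
            for attempt in range(1, max_attempts + 1):
                try:
                    return await render(html)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of retries: surface the failure
                    # Exponential backoff before the next attempt.
                    await asyncio.sleep(0.1 * 2 ** (attempt - 1))
        raise RuntimeError("unreachable")

    # gather() returns results in the same order as the inputs.
    return list(await asyncio.gather(*(render_one(doc) for doc in documents)))
```

A production setup would add what this sketch leaves out: a persistent queue, per-render timeouts, and browser restarts after crashes.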

9. HTML to PDF in Node.js / JavaScript

Node.js has the best HTML to PDF tooling ecosystem. Playwright and Puppeteer both have first-class Node.js support.

With Playwright (recommended)

// npm install playwright && npx playwright install chromium
const { chromium } = require('playwright');

async function htmlToPdf(htmlString) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();

    await page.setContent(htmlString, {
      waitUntil: 'networkidle' // wait for fonts, images
    });

    return await page.pdf({ // resolves to a Buffer
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' }
    });
  } finally {
    // Close the browser even if rendering throws, so no Chromium process leaks
    await browser.close();
  }
}

// Convert a live URL
async function urlToPdf(url) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle' });
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    await browser.close();
  }
}

With Puppeteer

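// npm install puppeteer express
const express = require('express');
const puppeteer = require('puppeteer');

async function htmlToPdf(htmlString) {
  // --no-sandbox is often needed when running as root in Docker. It disables
  // Chromium's sandbox, so only render HTML you trust with it.
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.setContent(htmlString, { waitUntil: 'networkidle0' });
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' }
    });
    return Buffer.from(pdf); // newer Puppeteer versions return a Uint8Array
  } finally {
    await browser.close(); // runs even if rendering throws
  }
}

// Express.js route example
const app = express();
app.use(express.json({ limit: '5mb' })); // example limit; HTML bodies can be large

app.post('/generate-pdf', async (req, res) => {
  try {
    const pdf = await htmlToPdf(req.body.html);
    res.set({ 'Content-Type': 'application/pdf', 'Content-Length': pdf.length });
    res.send(pdf);
  } catch (err) {
    res.status(500).json({ error: 'PDF generation failed' });
  }
});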

10. HTML to PDF in Python

Python has two strong options depending on your requirements: Playwright for full JavaScript support, and WeasyPrint for pure CSS Paged Media rendering without a browser dependency.

With Playwright (JavaScript support)

# pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def html_to_pdf(html_string: str) -> bytes:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html_string, wait_until='networkidle')
        pdf = page.pdf(
            format='A4',
            print_background=True,
            margin={'top': '20mm', 'bottom': '20mm',
                    'left': '15mm', 'right': '15mm'}
        )
        browser.close()
        return pdf  # bytes

# Convert a URL
def url_to_pdf(url: str) -> bytes:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until='networkidle')
        pdf = page.pdf(format='A4', print_background=True)
        browser.close()
        return pdf

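# Django view example
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.template.loader import render_to_string

from .models import Invoice  # assumed location of your Invoice model

def generate_invoice(request, invoice_id):
    invoice = get_object_or_404(Invoice, pk=invoice_id)  # 404 if the invoice is missing
    html = render_to_string('invoices/pdf.html', {'invoice': invoice})
    pdf = html_to_pdf(html)
    return HttpResponse(pdf, content_type='application/pdf')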

With WeasyPrint (no browser required)

# pip install weasyprint
# No Chromium required; WeasyPrint needs the Pango system library instead
from weasyprint import HTML, CSS

def html_to_pdf(html_string: str) -> bytes:
    return HTML(string=html_string).write_pdf()

# With external stylesheet
def html_to_pdf_with_styles(html_string: str, css_path: str) -> bytes:
    return HTML(string=html_string).write_pdf(
        stylesheets=[CSS(filename=css_path)]
    )

# WeasyPrint supports CSS Paged Media: running headers,
# footnotes, margin boxes. Chromium's support for these is more limited.
# WeasyPrint does NOT execute JavaScript.

WeasyPrint vs Playwright in Python: Use WeasyPrint when you need CSS Paged Media features (running headers, footnotes, @page rules) and have no JavaScript dependency. Use Playwright when your HTML requires JavaScript execution or relies on layout features that your WeasyPrint version handles incompletely (Flexbox and Grid support arrived later and varies by release, so check the release notes for the version you run).

11. HTML to PDF in PHP

PHP has historically relied on wkhtmltopdf wrappers (barryvdh/laravel-snappy, knplabs/knp-snappy). These should now be migrated. The best modern PHP approaches are Gotenberg (Docker REST API) or a managed API.

Via Gotenberg (Docker REST API)

# docker run --rm -p 3000:3000 gotenberg/gotenberg:8

// PHP — send HTML to Gotenberg and receive a PDF
function htmlToPdf(string $html): string {
    $boundary = uniqid();
    $body  = "--{$boundary}\r\n";
    $body .= 'Content-Disposition: form-data; name="files"; filename="index.html"' . "\r\n";
    $body .= "Content-Type: text/html\r\n\r\n";
    $body .= $html . "\r\n";
    $body .= "--{$boundary}--";

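    $response = file_get_contents('http://localhost:3000/forms/chromium/convert/html', false, stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: multipart/form-data; boundary={$boundary}\r\n",
            'content' => $body,
            'timeout' => 30, // seconds; fail fast if Gotenberg hangs
        ]
    ]));

    // file_get_contents returns false on connection errors and HTTP 4xx/5xx
    if ($response === false) {
        throw new RuntimeException('Gotenberg conversion failed');
    }

    return $response; // PDF bytes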
}

// Laravel — stream PDF to browser
public function invoice(Invoice $invoice) {
    $html = view('invoices.pdf', compact('invoice'))->render();
    return response(htmlToPdf($html), 200)
        ->header('Content-Type', 'application/pdf');
}
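
Because Gotenberg is just an HTTP endpoint, the same request works from any service. For comparison, here is the multipart call sketched in Python with only the standard library. The URL, field name, and index.html file name mirror the PHP example above; build_multipart, html_to_pdf, and the 30-second timeout are illustrative choices.

```python
import urllib.request
import uuid

GOTENBERG_URL = "http://localhost:3000/forms/chromium/convert/html"


def build_multipart(html: str, boundary: str) -> bytes:
    """Encode the HTML as a single multipart file part named index.html."""
    return (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n\r\n"
        f"{html}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")


def html_to_pdf(html: str, url: str = GOTENBERG_URL, timeout: float = 30.0) -> bytes:
    """POST the HTML to a running Gotenberg container and return PDF bytes."""
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        url,
        data=build_multipart(html, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping the body builder separate from the network call makes the request format easy to unit-test without a container running.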

12. HTML to PDF in Ruby / Rails

Ruby on Rails historically used WickedPDF (wkhtmltopdf wrapper) or Prawn. WickedPDF should be migrated. The modern path is Playwright via a subprocess or a REST API.

Via Playwright (Node.js subprocess)

# Gemfile: no gem needed, call Node.js subprocess
# node_modules/playwright must be installed

# app/services/pdf_generator.rb
require 'open3'

class PdfGenerator
  SCRIPT = Rails.root.join('lib/pdf/render.js')

  def self.from_html(html)
    pdf, status = Open3.capture2('node', SCRIPT.to_s, stdin_data: html, binmode: true)
    raise "PDF rendering failed (exit #{status.exitstatus})" unless status.success?
    pdf # PDF bytes
  end
end

// lib/pdf/render.js
const { chromium } = require('playwright');
let html = '';
process.stdin.setEncoding('utf8'); // keep multi-byte characters intact across chunks
process.stdin.on('data', d => html += d);
process.stdin.on('end', async () => {
  const b = await chromium.launch();
  try {
    const p = await b.newPage();
    await p.setContent(html, { waitUntil: 'networkidle' });
    const pdf = await p.pdf({ format: 'A4', printBackground: true });
    process.stdout.write(pdf);
  } finally {
    await b.close(); // always release the browser
  }
});

# Rails controller
def show
  html = render_to_string('invoices/pdf', layout: 'pdf')
  pdf  = PdfGenerator.from_html(html)
  send_data pdf, type: 'application/pdf', disposition: 'inline'
end

13. Performance Benchmarks

The numbers below are from PDF4.dev's 2026 HTML to PDF benchmark, generating a 5-page document with images, fonts, and CSS Grid on a 4-core VM. All headless browser tests used pre-warmed browser instances.

Tool          Warm (ms)   Cold (ms)   File size (KB)   RAM per instance (MB)
Playwright    ~3          ~42         59–125           ~180
Puppeteer     ~48         ~147        ~197             ~200
WeasyPrint    ~120        ~130        ~80              ~60
Gotenberg     ~60         ~800        ~130             ~250
wkhtmltopdf   ~200        ~200        ~90              ~30

Playwright's ~3 ms warm render is far below the other Chromium-based figures, so read it as specific to this benchmark's setup rather than a general guarantee, and measure with your own documents. Gotenberg's cold start is high because Docker container initialization includes Chromium startup; in a pre-warmed container it's competitive with Puppeteer.
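
Published numbers like these depend heavily on the document and the machine, so it is worth timing your own templates. The harness below is a minimal sketch: render is any zero-argument callable you provide that produces one PDF, and it is assumed to launch its browser lazily on the first call so that the first timing captures the cold start. time_renders and warm_runs are hypothetical names.

```python
import statistics
import time
from typing import Callable, Dict


def time_renders(render: Callable[[], object], warm_runs: int = 20) -> Dict[str, float]:
    """Time one cold call, then the median of several warm calls, in ms."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    start = time.perf_counter()
    render()  # first call pays any start-up cost
    cold_ms = (time.perf_counter() - start) * 1000

    warm_samples = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        render()
        warm_samples.append((time.perf_counter() - start) * 1000)

    # The median is less sensitive to a single slow run than the mean.
    return {"cold_ms": cold_ms, "warm_ms": statistics.median(warm_samples)}
```

Run it against the same HTML you generate in production; a template with large images or many web fonts can shift the results considerably.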

14. Full Tool Comparison

Tool                              Modern CSS   JS      Maintained   Languages                Ops burden   Cost
wkhtmltopdf                       ✗ No         ✗ No    ✗ Archived   Any (CLI)                Low          Free
Puppeteer                         ✓ Yes        ✓ Yes   ✓ Active     Node.js only             High         Free
Playwright                        ✓ Yes        ✓ Yes   ✓ Active     JS, Python, .NET, Java   High         Free
WeasyPrint                        ✓ Partial    ✗ No    ✓ Active     Python                   Medium       Free
Gotenberg                         ✓ Yes        ✓ Yes   ✓ Active     Any (REST)               Medium       Free
PrinceXML                         ✓ Best       ✗ No    ✓ Active     Any (CLI)                Low          $3,800+/yr
Managed API (e.g. ConvertStack)   ✓ Yes        ✓ Yes   ✓ Active     Any (REST)               None         Per conversion

15. Best Practices That Actually Help

Design for paper from the start. Don't try to make your web UI render as a PDF — build a separate template designed for fixed pages. Use @media print styles and fixed dimensions from the beginning.

Use a headless Chromium tool. It's the only category that correctly renders modern CSS and JavaScript. Playwright is the current performance leader; Puppeteer is a stable fallback.

Inline your styles and fonts. Any network dependency inside a PDF renderer is a potential failure. Self-host fonts, inline critical CSS, and test in a network-restricted environment to catch issues early.

Test page breaks explicitly. Generate test documents with content that spans multiple pages and verify nothing gets cut. Automate this in your CI pipeline with visual regression tools.

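Set a fixed viewport width. Layout that depends on viewport width can shift between your browser preview and the renderer's default viewport. Set an explicit size that matches your page: await page.setViewportSize({ width: 794, height: 1123 }) in Playwright (page.setViewport(...) in Puppeteer) for A4 at 96 DPI.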
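
The 794 × 1123 figure comes from converting the paper size to CSS pixels: millimetres divided by 25.4 gives inches, and CSS defines one inch as 96 px. A small Python sketch of that arithmetic follows; viewport_for and the PAPER_MM table are illustrative names.

```python
MM_PER_INCH = 25.4
CSS_DPI = 96  # CSS defines 1in = 96px

# Paper sizes in millimetres (width, height), portrait orientation.
PAPER_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}


def viewport_for(paper: str) -> dict:
    """Viewport size in CSS pixels for a paper format, rounded to whole px."""
    width_mm, height_mm = PAPER_MM[paper]
    return {
        "width": round(width_mm / MM_PER_INCH * CSS_DPI),
        "height": round(height_mm / MM_PER_INCH * CSS_DPI),
    }


print(viewport_for("A4"))      # {'width': 794, 'height': 1123}
print(viewport_for("Letter"))  # {'width': 816, 'height': 1056}
```

In Playwright's Python API the result can be passed directly to page.set_viewport_size(viewport_for("A4")).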

Treat scaling as infrastructure. If you're generating more than a few hundred PDFs per day, the operational cost of managing Chromium will exceed the cost of a managed API. Make that call deliberately rather than discovering it under load.

Add a PDF_READY signal. In complex apps with deferred rendering, set a flag (window.PDF_READY = true) when content is completely loaded, and wait for it with page.waitForFunction() before calling pdf().

The honest summary: HTML to PDF is not a solved problem. Headless Chromium renders nearly everything correctly — but the operational complexity of running it at scale is genuinely significant. Know what you're signing up for before you build. If you'd rather not manage any of it, ConvertStack handles the infrastructure.

16. Frequently Asked Questions

What is the best tool for HTML to PDF in 2026?
Playwright for new projects. It uses an evergreen Chromium engine, supports full CSS and JavaScript, and is 16× faster than Puppeteer on warm renders (~3ms vs ~48ms) with smaller file output. For Python stacks without JavaScript requirements, WeasyPrint is excellent. For production scale where you don't want to manage Chromium infrastructure, use a managed API.
Is wkhtmltopdf safe to use in 2026?
No. wkhtmltopdf was archived in January 2023 with unpatched CVEs and no arm64 binaries. Its Qt WebKit engine dates to circa 2015 and doesn't support CSS Grid, CSS custom properties, or ES6+ JavaScript. If it's in your stack, migrate to Playwright or Puppeteer now.
Why does my CSS look wrong in the PDF?
The two most common causes: (1) Backgrounds are suppressed — add printBackground: true in Puppeteer/Playwright and -webkit-print-color-adjust: exact; print-color-adjust: exact; in CSS. (2) External stylesheets fail to load in sandboxed renderers — inline critical CSS into the HTML string before passing it to the renderer.
How do I prevent tables from splitting across pages?
Apply page-break-inside: avoid; break-inside: avoid; to your tr elements. For large tables where this causes empty space, split data into multiple <table> elements with a page break between them. Also set orphans: 3; widows: 3 globally.
How do I add page numbers to a PDF from HTML?
Use displayHeaderFooter: true with a footerTemplate in Puppeteer/Playwright. Add <span class="pageNumber"></span> / <span class="totalPages"></span> — Chromium replaces these automatically. Inline all styles in the template (your page's CSS does not apply to it). Add a margin.bottom equal to the footer height to prevent overlap.
Why are my fonts not loading in the PDF?
Font loading is a network request. Sandboxed renderers may lack internet access or time out. Fix: (1) self-host fonts on the same server as your HTML, (2) use waitUntil: 'networkidle0' (Puppeteer) or 'networkidle' (Playwright) to ensure fonts load before render, or (3) base64-encode font files and embed them in @font-face, which removes the network dependency entirely and is the most reliable approach.
What is the difference between Puppeteer and Playwright for PDF generation?
Both drive headless Chromium, but Playwright is faster (~3ms warm vs ~48ms) and produces smaller files (59–125 KB vs ~197 KB). Playwright also supports Python, .NET, and Java in addition to Node.js, while Puppeteer is Node.js only. For new projects, choose Playwright. For existing Node.js codebases, Puppeteer is a safe, supported choice.
How do I convert a URL to PDF (not just HTML)?
Use page.goto(url, { waitUntil: 'networkidle0' }) in Puppeteer, or waitUntil: 'networkidle' in Playwright, instead of page.setContent(html). For pages requiring authentication, set cookies before calling goto(): await page.context().addCookies([...]) in Playwright, or await page.setCookie(...) in Puppeteer. For live public URLs, ConvertStack's API accepts a URL parameter directly.
What is Gotenberg and when should I use it?
Gotenberg is an open-source Docker microservice that wraps headless Chromium (and LibreOffice) behind a REST API. Any language can call it via HTTP POST. It's ideal for polyglot architectures and teams that want a language-agnostic PDF service. The tradeoff: you still manage the Docker container, scaling, and health checks yourself.
Is there a free HTML to PDF API?
Yes. ConvertStack gives waitlist members 1,000 free conversions per month at launch — no credit card required. Self-hosted options (Playwright, Puppeteer, Gotenberg) are also free but require your own server. Commercial managed APIs like DocRaptor and Browserless offer free tiers with lower limits.

Skip the Infrastructure

ConvertStack is a managed HTML to PDF API: Chromium-powered, zero-ops, with async jobs and a generous free tier.
