Ask Anvil

Answers to questions about automating PDFs, e-signatures, Webforms, and other paperwork problems.

Why does my generated PDF show boxes or missing characters instead of accents or non-Latin text?

What is actually happening

When a PDF renderer needs to draw a character, it looks up the glyph in the embedded font. If the font has no glyph for that code point, you get a hollow rectangle (often called tofu), a question mark, or nothing at all. The text stored in the PDF is correct; the font simply cannot draw it.

This usually appears when a standard Western font (Helvetica, Times, Arial) is asked to render CJK, Arabic, Cyrillic, Hebrew, or Devanagari text, when emoji show up in user-generated content, or when accented Latin characters fall outside a font's embedded subset.
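
To make the lookup concrete, here is a minimal sketch that models a font as a set of covered code points and reports which characters would render as tofu. The `WESTERN_ONLY` range is an assumption chosen for illustration (Basic Latin plus Latin-1 Supplement), not the coverage of any real font file.

```python
import unicodedata


def missing_glyphs(text: str, covered: set[int]) -> list[tuple[str, str]]:
    """Return (character, Unicode name) pairs the modeled font cannot draw."""
    missing = []
    seen = set()
    for ch in text:
        # Skip whitespace and characters already reported.
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in covered:
            missing.append((ch, unicodedata.name(ch, "UNKNOWN")))
    return missing


# Assumption: a Western-only font modeled as Basic Latin + Latin-1 Supplement.
WESTERN_ONLY = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

# "Caf\u00e9" is covered; the two CJK ideographs are not.
for ch, name in missing_glyphs("Caf\u00e9 \u6771\u4eac", WESTERN_ONLY):
    print(f"U+{ord(ch):04X} {name}")
```

Running a check like this over real content before generating a PDF can show whether the problem is limited to a few characters or to an entire script.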

How to confirm it is a font problem

Open the PDF in Adobe Acrobat and check File > Properties > Fonts. The dialog lists every font the document references, including subsets, and notes whether each is embedded. If the character you expect is missing, the font shown for that text run does not cover its code point. Copying the text out and pasting it into a plain text editor confirms that the underlying characters are correct, which isolates the issue to rendering rather than data.

How to fix it

Switch to a font with broad Unicode coverage. No single font file covers all of Unicode, but Google's Noto family was designed specifically to eliminate tofu and, taken together, covers nearly every script. Noto Sans handles Latin, Cyrillic, and Greek. Noto Sans CJK adds Chinese, Japanese, and Korean. Noto Color Emoji adds emoji.

Declare a fallback stack. For HTML to PDF tools that use a Chromium-based renderer, list a brand font followed by Noto Sans and Noto CJK so the renderer falls back per glyph rather than per text run:

body {
  font-family: 'Barlow', 'Noto Sans', 'Noto CJK', sans-serif;
}
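
Per-glyph fallback can be pictured with a small simulation: for each character, walk the stack and take the first font that covers it. This is only a sketch of the idea, not how Chromium is implemented, and the coverage ranges assigned to each font name are assumptions made for the example.

```python
def in_ranges(*ranges):
    """Build a coverage predicate from inclusive (low, high) code point ranges."""
    return lambda ch: any(lo <= ord(ch) <= hi for lo, hi in ranges)


def resolve_fonts(text, stack):
    """Split text into (font_name, run) pairs, choosing a font per character.

    `stack` is an ordered list of (font_name, covers) pairs. A font name of
    None means no font in the stack covers the character (tofu).
    """
    runs = []
    for ch in text:
        font = next((name for name, covers in stack if covers(ch)), None)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


# Assumed coverage, for illustration only.
STACK = [
    ("Barlow", in_ranges((0x20, 0x7E), (0xA0, 0xFF))),                 # Latin
    ("Noto Sans", in_ranges((0x20, 0x24F), (0x370, 0x52F))),           # + Greek, Cyrillic
    ("Noto Sans CJK", in_ranges((0x3000, 0x30FF), (0x4E00, 0x9FFF))),  # kana, Han
]

# Latin, Cyrillic, and CJK in one string.
for font, run in resolve_fonts("Hi \u042e\u043b\u044f \u6771\u4eac", STACK):
    print(font, repr(run))
```

The brand font is used wherever it can draw the character, and only the characters it lacks drop down to the next entry, which is why the order of the stack matters.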

Register a Unicode TTF in native PDF libraries. If you are using pdf-lib, ReportLab, or PDFKit, load the TTF for the script you need and assign it before drawing the text. Subsetting the embedded font keeps production file sizes small, but the subset must include every code point your content can contain at runtime, not only the ones present at template-design time.
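
The subsetting pitfall is easy to reproduce without any PDF library. The sketch below treats a subset as the set of code points seen in some sample strings; the helper names and sample values are hypothetical.

```python
def subset_code_points(samples):
    """Collect every code point that appears in the given sample strings."""
    return {ord(ch) for text in samples for ch in text}


def uncovered(text, subset):
    """Return the distinct characters in `text` missing from the subset, in order."""
    return [ch for ch in dict.fromkeys(text) if ord(ch) not in subset]


# Subset built only from the template's static labels (design time).
template_subset = subset_code_points(["Full name: ", "Address: "])

# A runtime value ("Zoe Muller" with diaeresis and umlaut) brings new characters.
runtime_value = "Zo\u00eb M\u00fcller"
print(uncovered(runtime_value, template_subset))

# Including representative runtime values when building the subset closes the gap.
runtime_subset = template_subset | subset_code_points([runtime_value])
print(uncovered(runtime_value, runtime_subset))
```

When the content is open-ended user input, sample values cannot anticipate every character, so it may be safer to subset by whole script ranges or to embed the full font.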

A note for HTML and CSS PDF generation

If you are generating PDFs with Anvil's HTML to PDF API, Noto Sans and Noto CJK are already the default fallbacks, so accented Latin and most CJK characters render correctly even without a custom font. Custom fonts are added through standard CSS @import or @font-face directives. Font files must be in TTF format and served with a Content-Type of font/ttf or application/x-font-ttf; otherwise the renderer ignores them.
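
If a custom font is being ignored, checking the Content-Type your server returns is a quick first step. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the accepted types are the two listed above.

```python
import urllib.request

# The two media types named above.
ACCEPTED_FONT_TYPES = {"font/ttf", "application/x-font-ttf"}


def is_accepted_font_type(content_type_header: str) -> bool:
    """True if the header's media type is one of the accepted TTF types."""
    # Drop parameters such as "; charset=..." and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_FONT_TYPES


def fetch_content_type(url: str) -> str:
    """Send a HEAD request and return the Content-Type header ('' if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type", "")


print(is_accepted_font_type("font/ttf"))                  # True
print(is_accepted_font_type("application/octet-stream"))  # False
```

To check a hosted font, pass the result of `fetch_content_type` for your own font URL to `is_accepted_font_type`. A generic type such as application/octet-stream is a common default for static file hosts and would fail this check.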
