Supported content formatting features

Export to Word converts HTML content to a Word document while preserving the original formatting of the content. The converter supports a wide range of HTML elements and CSS properties, which makes it possible to convert complex content structures.

This document provides an overview of the supported content formatting features, including text styles, inline elements, block elements, and common types of CSS properties.

You can style HTML content using inline styles, styles defined inside the <style> element, or styles provided via the css configuration option. The converter supports style inheritance, so any properties set on ancestor elements also work in descendant elements, as long as the given property is inherited in CSS.
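
As a minimal sketch of these styling routes and of inheritance, the snippet below combines a rule from a <style> element with an inline style. The class name note and the colors are made up for illustration. Since color is an inherited CSS property, the nested <strong> text is expected to take the color set on its ancestor.

```html
<style>
    /* Hypothetical class name, used only for this example. */
    .note {
        color: navy;
    }
</style>

<p class="note">
    Styled through a rule in the style element.
    <!-- color is inherited in CSS, so this text should also be navy. -->
    <strong>Bold text that inherits the color.</strong>
</p>
<p style="color: green">Styled through an inline style.</p>
```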

# Text

The converter supports many inline HTML elements that have default styling:

| Element name | Default style |
| --- | --- |
| abbr | |
| b | font-weight: bold |
| br | |
| cite | font-style: italic |
| code | font-family: monospace |
| del | text-decoration: line-through solid black |
| dfn | font-style: italic |
| em | font-style: italic |
| i | font-style: italic |
| ins | text-decoration: underline solid black |
| kbd | font-family: monospace |
| mark | background-color: yellow |
| s | text-decoration: line-through solid black |
| samp | font-family: monospace |
| small | font-size: smaller |
| span | |
| strong | font-weight: bold |
| sub | vertical-align: sub; font-size: smaller |
| sup | vertical-align: super; font-size: smaller |
| u | text-decoration: underline solid black |
| var | font-style: italic |

Additionally, all elements support the id, class, and style attributes, which can be used to format document content with various CSS properties. The following sections provide an overview of the supported CSS properties.

# Font size

The size of a font can be controlled via the font-size property.

# Font family

A font family can be set via the font-family property. When multiple font family names are specified, only the first one is used. For example, the following HTML will result in a document with Courier New font:

<span style="font-family: 'Courier New', Courier, monospace">
    Example text
</span>

[Image: Text with Courier New font]

Generic CSS font families are converted to well-known fonts as follows:

| Generic CSS font family | Word counterpart |
| --- | --- |
| serif | Times New Roman |
| sans-serif | Arial |
| cursive | Times New Roman |
| fantasy | Impact |
| monospace | Courier New |
| system-ui | Verdana |
| math | Times New Roman |
| fangsong | Times New Roman |
| kai | Times New Roman |
| nastaliq | Times New Roman |
| ui-serif | Times New Roman |
| ui-sans-serif | Arial |
| ui-monospace | Courier New |
| ui-rounded | Times New Roman |

# Known limitations

  • Not all fonts used in a document may be available in Word.

# Font style

The style of a font can be set via the font-style property:

  • normal: a font that is neither italic nor oblique
  • italic: an italic font
  • oblique: treated the same as italic

Custom slant angles in oblique fonts are not supported.

# Font weight

To make text bold, the font-weight property can be used. The following values result in bold text:

  • bold
  • Values greater than or equal to 700

Other values are not supported.
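
The font properties described so far can be sketched together as follows. The 14pt size is an arbitrary value chosen for illustration; the other values come from the lists above.

```html
<p>
    <span style="font-size: 14pt">Text with an explicit font size.</span>
    <span style="font-style: italic">Italic text.</span>
    <!-- oblique is treated the same as italic. -->
    <span style="font-style: oblique">Also italic in the Word document.</span>
    <span style="font-weight: bold">Bold text.</span>
    <!-- Numeric weights of 700 or more result in bold text as well. -->
    <span style="font-weight: 700">Also bold.</span>
</p>
```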

# Foreground and background colors

The foreground and background colors can be specified via the color and background-color properties respectively. The converter supports only colors in the sRGB color space:

  • Named colors such as red, green, or blue.
  • A hexadecimal color notation such as #FF0000, #00FF00, or #0000FF.
  • sRGB color functions: hsl(), hwb(), or rgb().

Other color spaces (such as CIELAB or Oklab) and custom color spaces are not supported.
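
A short sketch of the supported color notations, applied to both foreground and background colors. The specific color values are illustrative only.

```html
<p>
    <span style="color: red">Named color.</span>
    <span style="color: #0000FF">Hexadecimal notation.</span>
    <span style="color: rgb(0, 128, 0)">The rgb() function.</span>
    <span style="background-color: hsl(60, 100%, 50%)">
        The hsl() function used as a background color.
    </span>
</p>
```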

# Known limitations

  • Support for semi-transparent background colors is limited. Overlapping semi-transparent colors are not mixed together.

# Text decoration

Conversion of text decoration is supported to different extents based on the value of the text-decoration-line property. Only underline and line-through line types are supported.

# Underline

When the text-decoration-line property is set to underline, the converter underlines the text. The following other properties are supported for underlines:

  • text-decoration-style: supports solid, double, dotted, dashed, and wavy values.
  • text-decoration-color: supports the same values as foreground and background colors.

# Strike-through

When the text-decoration-line property is set to line-through, the converter generates strike-through text. The following other property is supported for strike-through text:

  • text-decoration-style: supports solid and double values.
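
The following sketch shows one underline and one strike-through built only from the properties and values listed above. The text and the color are placeholders.

```html
<p>
    <span
        style="text-decoration-line: underline;
               text-decoration-style: wavy;
               text-decoration-color: red"
    >
        Text with a wavy red underline.
    </span>
    <span
        style="text-decoration-line: line-through;
               text-decoration-style: double"
    >
        Text with a double strike-through.
    </span>
</p>
```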

# Subscript and superscript

To make text appear as subscript or superscript, the vertical-align property can be set to sub or super respectively.
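
For instance, a minimal sketch that uses vertical-align on plain <span> elements instead of the <sub> and <sup> elements:

```html
<p>
    H<span style="vertical-align: sub">2</span>O and
    E = mc<span style="vertical-align: super">2</span>
</p>
```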

# Links

The converter transforms <a> elements with the href attribute into hyperlinks.

Allowed URL schemes for the href attribute:

  • http
  • https
  • file
  • ftp
  • mailto
  • sms
  • tel

You can change the base URL of hyperlinks using the base_url configuration option. The option changes all relative links to absolute ones. Read more about this option in the API documentation.

<a> elements with the href attribute set to # or #top are converted to links to the top of the document in Word. If the document contains an <a> element whose id matches the #top fragment (id="top"), all <a> elements with the #top href point to that element instead.
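
A sketch of a few link variants covered above. All URLs and the email address are placeholders, and the relative link only becomes absolute when the base_url option is configured.

```html
<p>
    <a href="https://example.com/docs">An absolute link.</a>
    <a href="mailto:someone@example.com">An email link.</a>
    <!-- A relative link; base_url, when set, turns it into an absolute one. -->
    <a href="guides/intro.html">A relative link.</a>
    <!-- Links to the top of the document in Word. -->
    <a href="#top">Back to top.</a>
</p>
```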

# Bookmarks

The converter transforms <a> elements with the id attribute into bookmarks. This preserves the behavior of links that point to fragments of the HTML content.

The maximum length of the id attribute is 40 Unicode characters, which is the maximum length of a bookmark name allowed in Word. Any id longer than this is trimmed.
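
As an illustration, the snippet below pairs a fragment link with an <a> element that carries the matching id. The id summary is a made-up name, kept well under the 40-character limit.

```html
<p><a href="#summary">Jump to the summary.</a></p>

<p>Some content in between.</p>

<!-- The id is used as the bookmark name in Word. -->
<p><a id="summary">Summary</a></p>
```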

# Images

The converter supports the conversion of <img> elements. The following attributes are supported:

  • src
  • width
  • height
  • alt

Empty alt attributes (alt="") are treated interchangeably with missing alt attributes.

Allowed URL schemes for the src attribute:

  • http
  • https
  • data

You can change the base URL of images using the base_url configuration option. The option changes all relative image sources to absolute ones. Read more about this option in the API documentation.

Supported image formats include:

  • apng
  • avif
  • bmp
  • gif
  • heic*
  • jpg
  • png
  • svg
  • tiff*
  • webp

* Conversion of the image width and height might not work properly.

The converter downloads and embeds all images into the document itself. This means that all images have to be accessible via the Internet at the time of conversion. When an image is not accessible, the converter generates an image that links to its source instead. However, Word displays such images only when a user is online and the images are accessible.

Word replaces missing images with a placeholder image. The size of that placeholder is not set unless the source image has the width and height attributes explicitly defined.
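
A minimal sketch of an image that uses all four supported attributes. The URL is a placeholder. Setting width and height explicitly also gives Word a size to use for the placeholder if the image turns out to be missing.

```html
<!-- Placeholder URL; the image must be reachable at conversion time. -->
<img
    src="https://example.com/images/chart.png"
    width="320"
    height="180"
    alt="Quarterly sales chart"
/>
```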

# Image captions

You can create an image caption with the <figcaption> element. For this feature to work properly, both <figcaption> and <img> must be placed inside a <figure> element.

Example of converting an image with a caption:

<figure>
    <img src="https://placeholder.com/200"/>
    <figcaption>This is caption of example image.</figcaption>
</figure>

[Image: Captioned image]

# Known limitations

  • For a captioned image to be converted properly, the <figure> element must contain only the required <img> and <figcaption> elements.
  • The caption is always positioned below the image.

# Image positioning

Images inside a <figure> element can be additionally positioned. The following table is a breakdown of positioning types along with styles that must be applied to the <figure> element so that the converter properly generates positioned images.

| Positioning | Styles applied to <figure> element |
| --- | --- |
| Image horizontally aligned to the left | margin-left: 0; margin-right: auto |
| Image horizontally centered | margin-left: auto; margin-right: auto |
| Image horizontally aligned to the right | margin-left: auto; margin-right: 0 |
| Image with text wrapped from the left side | float: right |
| Image with text wrapped from the right side | float: left |
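
Two of these positioning types can be sketched as follows. The image URLs are placeholders, and the styles are applied to the <figure> element as the table requires.

```html
<!-- Horizontally centered image. -->
<figure style="margin-left: auto; margin-right: auto">
    <img src="https://example.com/images/photo.png" alt="Example photo" />
</figure>

<!-- Image on the right side with text wrapped from the left side. -->
<figure style="float: right">
    <img src="https://example.com/images/photo.png" alt="Example photo" />
</figure>
<p>This paragraph wraps along the left side of the floated image.</p>
```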

# Paragraphs

The converter converts <p> elements into paragraphs. Additionally, inline content placed outside any block element is automatically wrapped in a paragraph. Multiple subsequent inline elements outside a block element are wrapped in the same paragraph.

This is text content outside of any block element.<br />
<strong>That can be additionally styled.</strong>
<p>This is a paragraph</p>

[Image: Text content outside of any block element and a paragraph]

Additionally, paragraphs support formatting properties described in the following sections.

# Text alignment

Inline content can be aligned inside a paragraph via the text-align property. The following values are supported:

  • left
  • center
  • right
  • justify

# Line height

The height of a line can be changed via the line-height property. The converter supports both unitless values and values with a unit specified. The normal value is treated the same as 1.2.
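
A short sketch combining text-align with both kinds of line-height values. The numbers are arbitrary and serve only as an illustration.

```html
<p style="text-align: justify; line-height: 1.5">
    Justified paragraph with a unitless line height.
</p>
<p style="text-align: center; line-height: 18pt">
    Centered paragraph with a line height that has a unit.
</p>
```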

# Spacing and indentation

Spacing and indentation can be changed by applying the margin property:

  • Indentation is controlled by margin-left and margin-right properties.
  • Spacing above and below a paragraph is controlled by margin-top and margin-bottom properties respectively.

For example, consider the following HTML that sets the default paragraph spacing to 1 cm, and the left indentation of the middle paragraph to 4 cm:

<style>
p {
    margin-top: 1cm;
    margin-bottom: 1cm;
}
</style>

<p>First paragraph with the default indentation and 1cm spacing.</p>
<p style="margin-left: 4cm">Second paragraph with 4cm left indentation.</p>
<p>Third paragraph with the default indentation and 1cm spacing.</p>

[Image: Margins]

# Background color

It is possible to set the background color of a paragraph via the background-color property in the same way as the background color of text.

# Headings

The converter supports all levels of HTML headings <h1> to <h6>. The headings are converted to corresponding Word Styles according to their level. See the Word Styles section for more information.

Headings support the same content formatting as paragraphs.

# Lists

The converter supports both ordered <ol> and unordered <ul> lists.

Nested lists are supported up to 9 levels of nesting. Levels greater than 9 are coerced to 9, which is the maximum level that Word supports.

The starting value of ordered lists can be changed using the start attribute. Negative values are coerced to 0 as Word does not support them.

Example:

<ul>
    <li>First item</li>
    <li>
        Second item
        <ul>
            <li>Third item</li>
        </ul>
    </li>
</ul>
<ol start="4">
    <li>Fourth item</li>
    <li>Fifth item</li>
</ol>

[Image: Unordered and ordered lists]

# Marker type

The style of list item markers can be changed by applying the list-style-type property to <ol> and <ul> elements.

Only the following types are supported:

  • decimal
  • decimal-leading-zero
  • lower-alpha
  • lower-latin
  • upper-alpha
  • upper-latin
  • lower-roman
  • upper-roman
  • disc
  • circle
  • square
  • none

Unsupported list types are treated as disc.
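
For example, a sketch that sets the marker type on the list elements themselves, since list-style-type is not applied when set on <li> elements:

```html
<ol style="list-style-type: upper-roman">
    <li>First item</li>
    <li>Second item</li>
</ol>
<ul style="list-style-type: square">
    <li>First item</li>
    <li>Second item</li>
</ul>
```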

# Known limitations

  • Multi-level lists are not supported yet.
  • Formatting of list item markers is not supported. Similarly, list-style-type cannot be applied to <li> elements.
  • Indentation of lists and their content is not supported.
  • Reversed lists are not supported due to Word limitations.

# Tables

The converter converts any <table> element to a table. The <table> element can be placed in a <figure> element, inside table cells (allowing for nested tables), or just placed inside the document body.

<style>
    table {
        border-collapse: collapse;
    }
    th,
    td {
        border: 1px solid black;
        padding: 10px;
    }
</style>
<figure>
    <table>
        <thead>
            <tr>
                <th>Cell 1</th>
                <th>Cell 2</th>
                <th>Cell 3</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Cell 1</td>
                <td>Cell 2</td>
                <td>Cell 3</td>
            </tr>
            <tr>
                <td>Cell 4</td>
                <td>Cell 5</td>
                <td>Cell 6</td>
            </tr>
        </tbody>
    </table>
</figure>

[Image: Table]

Additionally, the converter supports the following obsolete table attributes:

  • align
  • bgcolor
  • border
  • cellpadding
  • cellspacing
  • frame
  • rules
  • width

# Table columns

Columns of a table can be defined with <colgroup> and <col> elements. Both are supported by the converter. The converter supports the span attribute on these elements, too.

# Table rows

The converter supports <thead>, <tbody>, and <tfoot> row group elements as well as the <tr> element. Only the first <thead> and <tfoot> elements are recognized as the table header and footer. If they’re placed in the wrong order, the converter moves them to the start or end of the table respectively.

Any rows inside the first <thead> element are treated as header rows and repeat on every new page in Word.

The following obsolete attributes are supported in these elements:

  • bgcolor
  • valign

# Table cells

Table cells are defined with <th> and <td> elements. Since Word does not support header cells, the <th> element is treated interchangeably with the <td> element.

Table cells may span multiple columns or rows by applying the colspan or rowspan attributes to them. However, rowspan cannot exceed the number of rows in the current row group. If it does, it is coerced to that number.
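
A minimal sketch of spanning cells. The rowspan value stays within the two rows of the <tbody> row group, so no coercion should be needed here.

```html
<table>
    <tbody>
        <tr>
            <td colspan="2">Spans two columns</td>
            <td rowspan="2">Spans two rows</td>
        </tr>
        <tr>
            <td>Cell A</td>
            <td>Cell B</td>
        </tr>
    </tbody>
</table>
```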

Table cells support the following obsolete attributes:

  • bgcolor
  • height
  • valign
  • width

Additionally, the scope attribute may be applied to <th> elements, although the converter ignores it.

# Table positioning

Tables can be positioned similarly to images, but the required styles may be applied either to the <table> element itself or the parent <figure> element. Left and right margins can be used to align a table horizontally.

| Positioning | Styles applied to <table> or <figure> element |
| --- | --- |
| Table horizontally aligned to the left | margin-left: 0; margin-right: auto |
| Table horizontally centered | margin-left: auto; margin-right: auto |
| Table horizontally aligned to the right | margin-left: auto; margin-right: 0 |

The float property can align a table to the left or right side with text wrapping on the opposite side.

| Positioning | Styles applied to <table> or <figure> element |
| --- | --- |
| Table with text wrapped from the left side | float: right |
| Table with text wrapped from the right side | float: left |
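
Both placements of the positioning styles can be sketched as follows: the first table carries the styles itself, while the second one relies on its parent <figure> element. The cell content is a placeholder.

```html
<!-- Centered table; the styles are set on the table element. -->
<table style="margin-left: auto; margin-right: auto">
    <tr>
        <td>Centered table</td>
    </tr>
</table>

<!-- Table on the left with text wrapped from the right side. -->
<figure style="float: left">
    <table>
        <tr>
            <td>Floated table</td>
        </tr>
    </table>
</figure>
<p>This paragraph wraps along the right side of the floated table.</p>
```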

# Table borders

All table elements can be used to style table borders with the following properties:

  • border-color
  • border-style
  • border-width

The ridge and groove line styles are converted to inset and outset respectively, which Word renders differently from browsers. Additionally, double borders are transformed into solid ones if their width is less than or equal to 1 px.

Border colors support the same values as the foreground and background colors of text. The minimum and maximum border width supported by Word is 0.333… px and 16 px respectively. Values outside this range are coerced to boundary values.

Moreover, the <table> element supports two extra border-related properties:

  • border-collapse
  • border-spacing

They can be used to control whether the borders of a table should be collapsed or separated. Word does not support separate borders without spacing, so in such cases, the converter sets the spacing to the minimum value of 0.05 pt.
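
As an illustration of the border properties, the sketch below uses separated borders with explicit spacing and a double border that is wider than 1 px, so it should not be turned into a solid one. The color and sizes are arbitrary.

```html
<style>
    table {
        border-collapse: separate;
        border-spacing: 2px;
    }
    td {
        border-width: 3px;
        border-style: double;
        border-color: #336699;
    }
</style>
<table>
    <tr>
        <td>Cell 1</td>
        <td>Cell 2</td>
    </tr>
</table>
```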

# Background color

The background color of any table element can be set via the background-color property similarly to the background color of text.

# Table and table cell width

The <table>, <th>, and <td> elements support the width property that can be used to set the width of the entire table or a single table cell. Percentages aren’t fully supported and they’re always converted into absolute units.

# Table row height

The <tr> element supports the height property, which is used to define the height of a table row. The converter always treats this value as the minimum table row height.

# Table cell margins

Margins of a table cell are specified via the padding property on <th> and <td> elements.

# Vertical content alignment

The vertical-align property can be set on <th> and <td> elements to define how the content of the cell should be aligned vertically. The supported values are top, middle, and bottom.
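
The sizing, cell margin, and alignment properties from the last few sections can be combined as in the sketch below. All dimensions are arbitrary. Absolute units are used for the widths because percentages are not fully supported.

```html
<table style="width: 12cm">
    <!-- The height is treated as the minimum height of the row. -->
    <tr style="height: 2cm">
        <td style="width: 4cm; padding: 8px; vertical-align: top">Top</td>
        <td style="padding: 8px; vertical-align: middle">Middle</td>
        <td style="padding: 8px; vertical-align: bottom">Bottom</td>
    </tr>
</table>
```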

# Known limitations

  • Table captions are not supported yet.
  • Word does not support tables inside list items, so they are moved out of lists.

# Preformatted text

The converter supports the <pre> element, which displays text in a fixed-width font. This is often used to display code snippets.

<pre>
 ________________
/                \
| How about moo? |  ^__^
\________________/  (oo)\_______
                  \ (__)\       )\/\
                        ||----w |
                        ||     ||
</pre>

[Image: Preformatted text]

# Quotations

The converter supports the <blockquote> element, which displays its content as a quotation.

Example:

<blockquote>
 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas eu
 scelerisque tortor. Cras venenatis iaculis velit sit amet sollicitudin.
</blockquote>

[Image: Quotation]

# Page breaks

The converter supports the page value for the break-after and break-before properties. Adding this style property to any inline or block HTML element will result in a page break generated before or after the corresponding element.

Page breaks are not supported inside tables.
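
A minimal sketch of both properties. The headings and paragraphs are placeholder content placed outside any table.

```html
<p>Last paragraph of the first page.</p>

<!-- A page break is generated before this heading. -->
<h1 style="break-before: page">Second page</h1>

<!-- A page break is generated after this paragraph. -->
<p style="break-after: page">Last paragraph of the second page.</p>

<p>First paragraph of the third page.</p>
```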

# Horizontal lines

The converter supports the <hr> element, allowing the insertion of horizontal lines in the Word document.

Horizontal lines do not support any styling.

# Ignored elements

Any element with the display property set to none or contents, or the visibility property set to hidden or collapse is ignored by the converter.
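
For example, in the following sketch only the first paragraph is expected to reach the Word document:

```html
<p>This paragraph is converted.</p>
<p style="display: none">This paragraph is ignored.</p>
<p style="visibility: hidden">This paragraph is ignored as well.</p>
```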

# Language

The converter supports the lang attribute on all supported HTML elements that contain text. In Word, the lang attribute specifies the language used for spell-checking the text within the element.
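
A short sketch of the lang attribute on block and inline elements. The language tags en-US, de, and fr are standard tags picked for illustration; whether a given language is available for spell-checking depends on the Word installation.

```html
<p lang="en-US">This paragraph is marked as American English.</p>
<p lang="de">Dieser Absatz ist als Deutsch markiert.</p>
<p lang="en-US">
    An English sentence with a French word:
    <span lang="fr">bonjour</span>.
</p>
```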