Supported content formatting features
Export to Word is capable of converting HTML content to a Word document while preserving the original formatting of the content. The converter supports a wide range of HTML elements and CSS properties, allowing for the conversion of complex content structures.
This document provides an overview of the supported content formatting features, including text styles, inline elements, block elements, and common types of CSS properties.
You can style HTML content using inline styles, styles defined inside the `<style>` element, or styles provided via the `css` configuration option. The converter supports style inheritance, so any properties set on ancestor elements also apply to descendant elements, as long as the given property is inherited in CSS.
# Text
The converter supports many inline HTML elements that have default styling:
Element name | Default style |
---|---|
`abbr` | – |
`b` | `font-weight: bold` |
`br` | – |
`cite` | `font-style: italic` |
`code` | `font-family: monospace` |
`del` | `text-decoration: line-through solid black` |
`dfn` | `font-style: italic` |
`em` | `font-style: italic` |
`i` | `font-style: italic` |
`ins` | `text-decoration: underline solid black` |
`kbd` | `font-family: monospace` |
`mark` | `background-color: yellow` |
`s` | `text-decoration: line-through solid black` |
`samp` | `font-family: monospace` |
`small` | `font-size: smaller` |
`span` | – |
`strong` | `font-weight: bold` |
`sub` | `vertical-align: sub; font-size: smaller` |
`sup` | `vertical-align: super; font-size: smaller` |
`u` | `text-decoration: underline solid black` |
`var` | `font-style: italic` |
Additionally, all elements support the `id`, `class`, and `style` attributes, which can be used to format document content with various CSS properties. The following sections provide an overview of the supported CSS properties.
# Font size
The size of a font can be controlled via the `font-size` property.
# Font family
A font family can be set via the `font-family` property. When multiple font family names are specified, only the first one is used. For example, the following HTML will result in a document with the `Courier New` font:

```html
<span style="font-family: 'Courier New', Courier, monospace">
  Example text
</span>
```
Generic CSS font families are converted to well-known fonts as follows:
Generic CSS font family | Word counterpart |
---|---|
`serif` | Times New Roman |
`sans-serif` | Arial |
`cursive` | Times New Roman |
`fantasy` | Impact |
`monospace` | Courier New |
`system-ui` | Verdana |
`math` | Times New Roman |
`fangsong` | Times New Roman |
`kai` | Times New Roman |
`nastaliq` | Times New Roman |
`ui-serif` | Times New Roman |
`ui-sans-serif` | Arial |
`ui-monospace` | Courier New |
`ui-rounded` | Times New Roman |
# Known limitations
- Not all fonts used in a document may be available in Word.
# Font style
The style of a font can be set via the `font-style` property:

- `normal`: a font that is neither italic nor oblique
- `italic`: an italic font
- `oblique`: treated the same as `italic`

Custom slant angles in oblique fonts are not supported.
# Font weight
To make text bold, use the `font-weight` property. The following values result in bold text:

- `bold`
- Values greater than or equal to `700`

Other values are not supported.
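A short HTML illustration of the rule above (the text is arbitrary): according to this section, the first two spans should become bold in Word, while the third uses a numeric value below `700`, which is not supported, so it is not expected to produce bold text.

```html
<!-- Keyword value: converted to bold text -->
<span style="font-weight: bold">Bold keyword</span>
<!-- Numeric value >= 700: also converted to bold text -->
<span style="font-weight: 800">Extra bold</span>
<!-- Numeric value below 700: not supported, so not expected to be bold -->
<span style="font-weight: 600">Semi-bold</span>
```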
# Foreground and background colors
The foreground and background colors can be specified via the `color` and `background-color` properties respectively. The converter supports only colors in the sRGB color space:

- Named colors such as `red`, `green`, or `blue`.
- A hexadecimal color notation such as `#FF0000`, `#00FF00`, or `#0000FF`.
- sRGB color functions: `hsl()`, `hwb()`, or `rgb()`.
Other color spaces (such as CIELAB or Oklab) and custom color spaces are not supported.
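The snippet below sketches one color of each supported kind, plus an Oklab color that, according to the rule above, is not supported. The specific colors and text are arbitrary examples.

```html
<p>
  <!-- Named color -->
  <span style="color: red">Named</span>
  <!-- Hexadecimal notation -->
  <span style="color: #00FF00">Hex</span>
  <!-- sRGB color functions -->
  <span style="color: rgb(0, 0, 255)">rgb()</span>
  <span style="background-color: hsl(60, 100%, 50%)">hsl()</span>
  <span style="color: hwb(300 0% 0%)">hwb()</span>
  <!-- Oklab is not an sRGB color, so this color is not supported -->
  <span style="color: oklab(0.6 0.1 0.1)">Oklab</span>
</p>
```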
# Known limitations
- Support for semi-transparent background colors is limited. Overlapping semi-transparent colors are not mixed together.
# Text decoration
Conversion of text decoration is supported to different extents depending on the value of the `text-decoration-line` property. Only the `underline` and `line-through` line types are supported.
# Underline
When the `text-decoration-line` property is set to `underline`, the converter underlines the text. The following additional properties are supported for underlines:

- `text-decoration-style`: supports the `solid`, `double`, `dotted`, `dashed`, and `wavy` values.
- `text-decoration-color`: supports the same values as foreground and background colors.
# Strike-through
When the `text-decoration-line` property is set to `line-through`, the converter generates strike-through text. The following additional property is supported for strike-through text:

- `text-decoration-style`: supports the `solid` and `double` values.
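A minimal sketch combining the text decoration properties described above. It uses longhand properties so that each declaration maps directly to one of the supported properties; the color and text are arbitrary.

```html
<!-- Red wavy underline -->
<span style="text-decoration-line: underline; text-decoration-style: wavy; text-decoration-color: red">
  Underlined text
</span>
<!-- Double strike-through -->
<span style="text-decoration-line: line-through; text-decoration-style: double">
  Struck-through text
</span>
```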
# Subscript and superscript
To make text appear as subscript or superscript, set the `vertical-align` property to `sub` or `super` respectively.
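For example, the following HTML is expected to produce a chemical formula with a subscript and an exponent with a superscript. The built-in `<sub>` and `<sup>` elements listed in the Text section give a similar result through their default styles, which additionally reduce the font size.

```html
<p>
  <!-- Subscript "2" -->
  H<span style="vertical-align: sub">2</span>O
  <!-- Superscript "2" -->
  and E = mc<span style="vertical-align: super">2</span>
</p>
```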
# Hyperlinks
The converter transforms `<a>` elements with the `href` attribute into hyperlinks.

Allowed URL schemes for the `href` attribute:

- `http`
- `https`
- `file`
- `ftp`
- `mailto`
- `sms`
- `tel`

You can change the base URL of hyperlinks using the `base_url` configuration option. The option changes all relative links to absolute ones. Read more about this option in the API documentation.
`<a>` elements with the `#` or `#top` `href` values are converted to links to the top of the document in Word. If the document contains an `<a>` element whose `id` is `top`, all `<a>` elements with the `#top` `href` point to that element instead.
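A sketch of the link types described above. The addresses and phone number are placeholders, and the comments describe the behavior this section documents rather than anything verified against a specific converter version.

```html
<!-- Target for all "#top" links in the document -->
<h1><a id="top">Report</a></h1>
<p>
  <a href="https://example.com/docs">Absolute link</a>
  <a href="mailto:team@example.com">Email link</a>
  <a href="tel:+15550100">Phone link</a>
  <!-- Relative link: made absolute when base_url is configured -->
  <a href="guide.html">Relative link</a>
  <!-- Points to the <a id="top"> element above -->
  <a href="#top">Back to top</a>
</p>
```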
# Bookmarks
The converter transforms `<a>` elements with the `id` attribute into bookmarks. This preserves the behavior of links that point to HTML text fragments.

The maximum length of the `id` attribute is 40 Unicode characters, which is the maximum bookmark name length allowed in Word. Any `id` longer than this is truncated.
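For example, an `<a>` element with an `id` can serve as a bookmark that an internal link elsewhere in the document points to. The `id` value here is an arbitrary example that stays well under the 40-character limit.

```html
<!-- Converted into a bookmark named "installation" -->
<h2><a id="installation">Installation</a></h2>
<p>Installation steps go here.</p>
<!-- Internal link to the bookmark above -->
<p>See the <a href="#installation">installation steps</a>.</p>
```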
# Images
The converter supports conversion of `<img>` elements. The following attributes are supported:

- `src`
- `width`
- `height`
- `alt`

Empty `alt` attributes (`alt=""`) are treated the same as missing `alt` attributes.

Allowed URL schemes for the `src` attribute:

- `http`
- `https`
- `data`

You can change the base URL of images using the `base_url` configuration option. The option changes all relative image sources to absolute ones. Read more about this option in the API documentation.
Supported image formats include:

- `apng`
- `avif`
- `bmp`
- `gif`
- `heic`
- `jpg`\*
- `png`
- `svg`
- `tiff`
- `webp`\*

\* For the marked formats, conversion of the image width and height might not work properly.
The converter downloads and embeds all images into the document itself. This means that all images have to be accessible via the Internet at the time of conversion. When an image is not accessible, the converter generates an image that links to its source instead. However, Word displays such images only when a user is online and the images are accessible.
Word replaces missing images with a placeholder image. The size of that placeholder is not set unless the source image has the `width` and `height` attributes explicitly defined.
# Image captions
You can create an image caption with the `<figcaption>` element. For this feature to work properly, both `<figcaption>` and `<img>` must be placed inside a `<figure>` element.

Example of converting an image with a caption:

```html
<figure>
  <img src="https://placeholder.com/200"/>
  <figcaption>This is the caption of an example image.</figcaption>
</figure>
```
# Known limitations
- For a captioned image to be converted properly, the `<figure>` element must contain only the required `<img>` and `<figcaption>` elements.
- The caption is always positioned below the image.
# Image positioning
Images inside a `<figure>` element can be additionally positioned. The following table lists the positioning types along with the styles that must be applied to the `<figure>` element so that the converter generates properly positioned images.

Positioning | Styles applied to `<figure>` element |
---|---|
Image horizontally aligned to the left | `margin-left: 0; margin-right: auto` |
Image horizontally centered | `margin-left: auto; margin-right: auto` |
Image horizontally aligned to the right | `margin-left: auto; margin-right: 0` |
Image with text wrapped from the left side | `float: right` |
Image with text wrapped from the right side | `float: left` |
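As an illustration, the following `<figure>` elements use styles from the table above: the first centers the image, and the second floats it to the left so that text wraps on its right side. The image URL is the same placeholder used elsewhere in this document.

```html
<!-- Horizontally centered image -->
<figure style="margin-left: auto; margin-right: auto">
  <img src="https://placeholder.com/200" width="200" height="200" alt="Centered image"/>
</figure>

<!-- Image on the left, text wrapped from the right side -->
<figure style="float: left">
  <img src="https://placeholder.com/200" width="200" height="200" alt="Floated image"/>
</figure>
<p>This paragraph wraps around the right side of the image.</p>
```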
# Paragraphs
The converter converts `<p>` elements into paragraphs. Additionally, inline content placed outside any block element is automatically wrapped in a paragraph. Multiple consecutive inline elements outside a block element are wrapped in the same paragraph.

```html
This is text content outside of any block element.<br />
<strong>That can be additionally styled.</strong>
<p>This is a paragraph</p>
```
Additionally, paragraphs support formatting properties described in the following sections.
# Text alignment
Inline content can be aligned inside a paragraph via the `text-align` property. The following values are supported:

- `left`
- `center`
- `right`
- `justify`
# Line height
The height of a line can be changed via the `line-height` property. The converter supports both unitless values and values with a unit specified. The `normal` value is treated the same as `1.2`.
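A short example combining the two paragraph properties above; the values are arbitrary. It shows both a unitless and a unit-based `line-height`, since this section states that the converter accepts either.

```html
<!-- Justified text, unitless line height -->
<p style="text-align: justify; line-height: 1.5">
  Justified paragraph with a unitless line height of 1.5.
</p>
<!-- Centered text, line height with a unit -->
<p style="text-align: center; line-height: 24pt">
  Centered paragraph with a 24pt line height.
</p>
```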
# Spacing and indentation
Spacing and indentation can be changed by applying the `margin` property:

- Indentation is controlled by the `margin-left` and `margin-right` properties.
- Spacing above and below a paragraph is controlled by the `margin-top` and `margin-bottom` properties respectively.

For example, consider the following HTML that sets the default paragraph spacing to 1 cm, and the left indentation of the middle paragraph to 4 cm:

```html
<style>
  p {
    margin-top: 1cm;
    margin-bottom: 1cm;
  }
</style>
<p>First paragraph with the default indentation and 1cm spacing.</p>
<p style="margin-left: 4cm">Second paragraph with 4cm left indentation.</p>
<p>Third paragraph with the default indentation and 1cm spacing.</p>
```
# Background color
It is possible to set the background color of a paragraph via the `background-color` property in the same way as the background color of text.
# Headings
The converter supports all levels of HTML headings, from `<h1>` to `<h6>`. The headings are converted to corresponding Word styles according to their level. See the Word Styles section for more information.

Headings support the same content formatting as paragraphs.
# Lists
The converter supports both ordered (`<ol>`) and unordered (`<ul>`) lists.

Nested lists are supported up to 9 levels of nesting. Deeper levels are coerced to 9, which is the maximum level that Word supports.

The starting value of ordered lists can be changed using the `start` attribute. Negative values are coerced to 0 because Word does not support them.

Example:

```html
<ul>
  <li>First item</li>
  <li>
    Second item
    <ul>
      <li>Third item</li>
    </ul>
  </li>
</ul>
<ol start="4">
  <li>Fourth item</li>
  <li>Fifth item</li>
</ol>
```
# Marker type
The style of list item markers can be changed by applying the `list-style-type` property to `<ol>` and `<ul>` elements.

Only the following types are supported:

- `decimal`
- `decimal-leading-zero`
- `lower-alpha`
- `lower-latin`
- `upper-alpha`
- `upper-latin`
- `lower-roman`
- `upper-roman`
- `disc`
- `circle`
- `square`
- `none`

Unsupported list types are treated as `disc`.
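A sketch of marker types applied to list containers, as this section requires (not to `<li>` elements). The last list uses `lower-greek`, a valid CSS value that is not in the supported list above, so it is expected to fall back to `disc` markers.

```html
<!-- Upper-case Roman numerals -->
<ol style="list-style-type: upper-roman">
  <li>First</li>
  <li>Second</li>
</ol>

<!-- Square bullets -->
<ul style="list-style-type: square">
  <li>Item</li>
</ul>

<!-- Unsupported type: treated as disc -->
<ol style="list-style-type: lower-greek">
  <li>Item</li>
</ol>
```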
# Known limitations
- Multi-level lists are not supported yet.
- Formatting of list item markers is not supported. Similarly, `list-style-type` cannot be applied to `<li>` elements.
- Indentation of lists and their content is not supported.
- Reversed lists are not supported due to Word limitations.
# Tables
The converter converts any `<table>` element to a table. The `<table>` element can be placed in a `<figure>` element, inside table cells (allowing for nested tables), or directly in the document body.

```html
<style>
  table {
    border-collapse: collapse;
  }

  th,
  td {
    border: 1px solid black;
    padding: 10px;
  }
</style>
<figure>
  <table>
    <thead>
      <tr>
        <th>Cell 1</th>
        <th>Cell 2</th>
        <th>Cell 3</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Cell 1</td>
        <td>Cell 2</td>
        <td>Cell 3</td>
      </tr>
      <tr>
        <td>Cell 4</td>
        <td>Cell 5</td>
        <td>Cell 6</td>
      </tr>
    </tbody>
  </table>
</figure>
```

Additionally, the converter supports the following obsolete table attributes:

- `align`
- `bgcolor`
- `border`
- `cellpadding`
- `cellspacing`
- `frame`
- `rules`
- `width`
# Table columns
Columns of a table can be defined with `<colgroup>` and `<col>` elements. Both are supported by the converter, including the `span` attribute on these elements.
# Table rows
The converter supports the `<thead>`, `<tbody>`, and `<tfoot>` row group elements as well as the `<tr>` element. Only the first `<thead>` and `<tfoot>` elements are recognized as the table header and footer. If they're placed in the wrong order, the converter moves them to the start and end of the table respectively.

Any rows inside the first `<thead>` element are treated as header rows and repeat on every new page in Word.

The following obsolete attributes are supported in these elements:

- `bgcolor`
- `valign`
# Table cells
Table cells are defined with `<th>` and `<td>` elements. Since Word does not support header cells, the `<th>` element is treated the same as the `<td>` element.

Table cells may span multiple columns or rows by applying the `colspan` or `rowspan` attributes to them. However, `rowspan` cannot exceed the number of rows in the current row group; otherwise, it's coerced to that number.

Table cells support the following obsolete attributes:

- `bgcolor`
- `height`
- `valign`
- `width`

Additionally, the `scope` attribute may be applied to `<th>` elements, although the converter ignores it.
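A minimal sketch of cell spanning; the cell contents are arbitrary. The `rowspan` value of 2 matches the number of rows in the `<tbody>` row group, so according to the rule above it should not be coerced.

```html
<table>
  <tbody>
    <tr>
      <!-- Spans both rows of this row group -->
      <td rowspan="2">Spans 2 rows</td>
      <!-- Spans two columns -->
      <td colspan="2">Spans 2 columns</td>
    </tr>
    <tr>
      <td>Cell A</td>
      <td>Cell B</td>
    </tr>
  </tbody>
</table>
```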
# Table positioning
Tables can be positioned similarly to images, but the required styles may be applied either to the `<table>` element itself or to the parent `<figure>` element. Left and right margins can be used to align a table horizontally.

Positioning | Styles applied to `<table>` or `<figure>` element |
---|---|
Table horizontally aligned to the left | `margin-left: 0; margin-right: auto` |
Table horizontally centered | `margin-left: auto; margin-right: auto` |
Table horizontally aligned to the right | `margin-left: auto; margin-right: 0` |

The `float` property can align a table to the left or right side with text wrapping on the opposite side.

Positioning | Styles applied to `<table>` or `<figure>` element |
---|---|
Table with text wrapped from the left side | `float: right` |
Table with text wrapped from the right side | `float: left` |
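For example, the first table below is centered through styles on the `<table>` element itself, while the second is floated to the right through styles on its parent `<figure>`, so text is expected to wrap on its left side. The table contents are placeholders.

```html
<!-- Styles on the table itself: horizontally centered -->
<table style="margin-left: auto; margin-right: auto">
  <tr><td>Centered table</td></tr>
</table>

<!-- Styles on the parent figure: text wrapped from the left side -->
<figure style="float: right">
  <table>
    <tr><td>Floated table</td></tr>
  </table>
</figure>
<p>This paragraph wraps around the left side of the table.</p>
```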
# Table borders
All table elements support the following properties for styling table borders:

- `border-color`
- `border-style`
- `border-width`

The `ridge` and `groove` line styles are converted to `inset` and `outset` respectively, which Word renders differently from browsers. Additionally, `double` borders are converted to `solid` ones if their width is less than or equal to 1 px.

Border colors support the same values as the foreground and background colors of text. The minimum and maximum border widths supported by Word are 0.333… px and 16 px respectively. Values outside this range are coerced to the nearest boundary value.

Moreover, the `<table>` element supports two extra border-related properties:

- `border-collapse`
- `border-spacing`

They control whether the borders of a table are collapsed or separated. Word does not support separate borders without spacing, so in such cases, the converter sets the spacing to the minimum value of 0.05 pt.
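The sketch below uses separated borders with explicit spacing, and an outer `3px` double border, which is wider than 1 px and therefore should stay double rather than be converted to solid. The widths and colors are arbitrary.

```html
<!-- Separated borders with 4px spacing; 3px double outer border -->
<table style="border-collapse: separate; border-spacing: 4px; border: 3px double #0000FF">
  <tr>
    <td style="border: 1px solid black">Cell 1</td>
    <td style="border: 1px solid black">Cell 2</td>
  </tr>
</table>
```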
# Background color
The background color of any table element can be set via the `background-color` property similarly to the background color of text.
# Table and table cell width
The `<table>`, `<th>`, and `<td>` elements support the `width` property, which sets the width of the entire table or of a single table cell. Percentages aren't fully supported and are always converted into absolute units.
# Table row height
The `<tr>` element supports the `height` property, which defines the height of a table row. The converter always treats this value as the minimum table row height.
# Table cell margins
Margins of a table cell are specified via the `padding` property on `<th>` and `<td>` elements.
# Vertical content alignment
The `vertical-align` property can be set on `<th>` and `<td>` elements to define how the content of the cell is aligned vertically. The supported values are `top`, `middle`, and `bottom`.
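The following sketch combines the sizing and alignment properties from the preceding sections: a table width, a row height (treated as a minimum), cell padding (converted to cell margins), and vertical alignment. The values are arbitrary.

```html
<table style="width: 12cm">
  <!-- Minimum row height of 2 cm -->
  <tr style="height: 2cm">
    <td style="vertical-align: top; padding: 0.2cm">Top</td>
    <td style="vertical-align: middle; padding: 0.2cm">Middle</td>
    <td style="vertical-align: bottom; padding: 0.2cm">Bottom</td>
  </tr>
</table>
```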
# Known limitations
- Table captions are not supported yet.
- Word does not support tables inside list items, so they are moved out of lists.
# Preformatted text
The converter supports the `<pre>` element, which displays text in a fixed-width font. This is often used to display code snippets.

```html
<pre>
 ________________
/                \
| How about moo? |  ^__^
\________________/  (oo)\_______
                  \ (__)\       )\/\
                        ||----w |
                        ||     ||
</pre>
```
# Quotations
The converter supports the `<blockquote>` element, which displays its content as a quotation.

Example:

```html
<blockquote>
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas eu
  scelerisque tortor. Cras venenatis iaculis velit sit amet sollicitudin.
</blockquote>
```
# Page breaks
The converter supports the `page` value for the `break-after` and `break-before` properties. Adding either property to any inline or block HTML element generates a page break after or before that element respectively.

Page breaks are not supported inside tables.
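For example, the following HTML is expected to start chapter 2 on a new page and to put a page break after its last paragraph, so that chapter 3 also starts on a new page. The headings and text are placeholders.

```html
<h2>Chapter 1</h2>
<p>Last paragraph of chapter 1.</p>
<!-- Page break before this heading -->
<h2 style="break-before: page">Chapter 2</h2>
<!-- Page break after this paragraph -->
<p style="break-after: page">Last paragraph of chapter 2.</p>
<h2>Chapter 3</h2>
```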
# Horizontal lines
The converter supports the `<hr>` element, which inserts a horizontal line into the Word document.

Horizontal lines do not support any styling.
# Ignored elements
Any element with the `display` property set to `none` or `contents`, or the `visibility` property set to `hidden` or `collapse`, is ignored by the converter.
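A small illustration of the rule above; the text is arbitrary. Only the first paragraph and the visible text in the second paragraph are expected to appear in the Word document.

```html
<p>This paragraph is converted.</p>
<!-- Ignored: display is none -->
<p style="display: none">Hidden paragraph.</p>
<p>
  Visible text
  <!-- Ignored: visibility is hidden -->
  <span style="visibility: hidden">hidden text</span>
</p>
```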
# Language
The converter supports the `lang` attribute on all supported HTML elements that contain text. In Word, the `lang` attribute specifies the language used for spell-checking the text within the element.
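For example, the following paragraph is marked as German with one French phrase inside it, so Word would be expected to spell-check each part in its own language. The language tags used here (`de-DE`, `fr-FR`) are illustrative; this section does not specify which tag formats the converter accepts.

```html
<!-- German paragraph with a nested French phrase -->
<p lang="de-DE">
  Guten Tag! <span lang="fr-FR">Bonjour tout le monde.</span>
</p>
```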