Supported content formatting features

Listed below are all content formatting features that are currently supported by the Import from Word converter.

# Inline formatting

Support for inline formatting covers formatting applied directly (with the UI toolbar buttons), through styles, and through default document properties.

The inline formatting is generated as native HTML tags (e.g. <strong> for bold text) or as <span> tags with proper CSS styling declarations. Applying multiple inline styles to the same content will produce nested HTML, like:

<p>
    <span style="font-size: 21.33px; font-family: Georgia, serif">
        <i>
            <u>Hello, World!</u>
        </i>
    </span>
</p>

All inline content is placed inside a paragraph (<p>) element.

# Basic styling

Provides support for basic inline styling, including bold, italics, underline, and strikethrough.

Basic styles are converted to semantic HTML tags, making the content more accessible and SEO-friendly.

| Feature name | Markup |
| --- | --- |
| Bold | <strong> |
| Italics | <i> |
| Underline | <u> |
| Strikethrough | <s> |

# Custom underline styling

The Import from Word converter is able to preserve custom underline styling, including line style, color, and thickness.

# Known limitations

  • CSS does not currently offer equivalents for every underline type found in Word. Some of these styles are adjusted to offer the best compatibility.

# Font styles

Provides support for different font styles, including font family, font size, font color, background color, and many more.

| Feature name | Markup |
| --- | --- |
| Font size | <span style="font-size: 16px"> |
| Font family | <span style="font-family: 'Comic Sans MS'"> |
| Font color | <span style="color: #ff0000"> |
| Font background | <span style="background-color: #00ffff"> |
| Subscript | <sub> |
| Superscript | <sup> |
| Small caps | <span style="font-variant-caps: small-caps"> |
| All caps | <span style="text-transform: uppercase"> |
| Letter spacing | <span style="letter-spacing: 1px"> |
| Font stretching | <span style="font-stretch: 125%"> |

# Links

Provides support for links.

| Feature name | Markup |
| --- | --- |
| Link | <a href="https://cksource.com"> |

Links can be fully styled just as any other inline content.

<p>
    <a href="https://cksource.com">
        <span style="color: #f12ec5; font-size: 26.67px;">
            <strong>
                <u>Colorful link!</u>
            </strong>
        </span>
    </a>
</p>

# Text language

The language of text is output as an RFC 5646 language tag in the lang attribute of the affected part of the text.

To ensure that CKEditor 5 Text Part Language properly recognizes the languages returned by the converter, make sure that the format of language tags configured in your editor matches exactly with the format of language tags returned by the converter.
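One way to verify this is to compare the lang values found in the converted HTML with the language codes configured in the editor. The sketch below is only an illustration: the entries follow the title and languageCode shape of the language.textPartLanguage editor configuration, while the helper names (extractLangTags, findUnconfiguredLanguages) and the sample HTML are hypothetical, and a regular expression stands in for a real HTML parser.

```typescript
// Shape of one entry in the editor's language.textPartLanguage configuration.
interface TextPartLanguageOption {
    title: string;
    languageCode: string;
}

// Hypothetical helper: collects the distinct values of lang attributes in an HTML string.
// A regular expression is enough for a sketch; prefer an HTML parser in production code.
function extractLangTags(html: string): string[] {
    const tags: string[] = [];
    const pattern = /\slang="([^"]+)"/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(html)) !== null) {
        if (tags.indexOf(match[1]) === -1) {
            tags.push(match[1]);
        }
    }
    return tags;
}

// Hypothetical helper: returns the language tags present in the HTML
// that have no exactly matching languageCode in the configuration.
function findUnconfiguredLanguages(html: string, options: TextPartLanguageOption[]): string[] {
    const configured = options.map(option => option.languageCode);
    // The comparison is exact on purpose: "en-US" and "en" are treated as different tags.
    return extractLangTags(html).filter(tag => configured.indexOf(tag) === -1);
}

// Sample configuration and sample converter output (both made up for illustration).
const textPartLanguage: TextPartLanguageOption[] = [
    { title: 'English (US)', languageCode: 'en-US' },
    { title: 'French', languageCode: 'fr-FR' }
];
const convertedHtml = '<p><span lang="en-US">Hello</span> <span lang="pl-PL">Cześć</span></p>';

const missing = findUnconfiguredLanguages(convertedHtml, textPartLanguage);
// missing: ['pl-PL']
```

Any tag reported by such a check would need a matching entry in the editor configuration before the editor can recognize it.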

# Hidden text

Hidden text is supported by simply not outputting any HTML markup for it.

# Images

Import from Word can recognize both embedded images in the document and images that come from external sources.

Images are converted to <img> tags with a proper src attribute.

| Feature name | Markup |
| --- | --- |
| Image | <img src="https://cksource.com/image.png"> |

Images that include an external hyperlink are converted to inline images, wrapped in an anchor (<a>) tag.

Additionally, the converter supports these Word features:

| Feature name | Markup |
| --- | --- |
| Alternative text | <img alt="Alternative text" src="..." /> |
| Image height | <img style="height: 100px" height="100" src="..." /> |
| Image width | <img style="width: 100px" width="100" src="..." /> |

# Image positioning

The converter supports the basic positioning of images:

  • Distance from the top and left edges or margins of the page is translated to appropriate CSS margins and, additionally, such images get the float: left CSS property.
  • Horizontally aligned images are converted to images with an appropriate value of the float CSS property or, in the case of centered images, to images centered by the margin-left: auto and margin-right: auto CSS properties.

Moreover, setting an image behind or in front of text works by applying the appropriate value of the z-index CSS property to the image. To fully support the behind-the-text position, the container (e.g. CKEditor 5 editor) must have a transparent background. For CKEditor 5 it can be achieved by the following CSS code:

.ck-editor__editable {
    background-color: transparent;
}

# Rotation

Image rotation, both in 2D and 3D, is supported, along with Word’s built-in camera presets. Of the camera presets, only those that define rotation angles are output.

| Feature name | Markup |
| --- | --- |
| Image rotation | <img style="transform: rotateZ(90deg)" src="..." /> |

# Known limitations

  • Positioning and text wrapping settings are supported only on a basic level.
  • Horizontal alignment does not work if the image is positioned behind or in front of the text.
  • Horizontal centering does not work if the image is also vertically positioned.
  • Captions are currently converted to regular paragraphs.
  • Image perspective is not supported.

# Known limitations for inline formatting

  • Due to their complicated representation in Word documents, tab characters are always converted to 4 spaces.

# Block formatting

Block-level Word features are converted to proper block-level HTML elements to represent the same semantic meaning and visual appearance. Some elements like images can be either inline or block-level, depending on their position in the document.

Similarly to inline formatting, block formatting can be directly applied to the Word content and (depending on the feature) defined as inline styling or a separate style, or it can be part of default document properties.

# Paragraphs

Any text that occurs in the document, regardless of its styling, is placed inside a paragraph element. Paragraphs are converted to <p> tags and can be nested inside table cells and list items.

<p>
    Representative paragraph.
</p>

Paragraphs in Word can additionally be styled with the following features:

| Feature name | Markup |
| --- | --- |
| Text alignment | <p style="text-align: right"> |
| Indentation | <p style="margin-left: 48px"> |
| First line indentation | <p style="text-indent: 190px"> |
| Hanging indentation | <p style="margin-left: 190px; text-indent: -190px"> |
| Line height | <p style="line-height: 1.5"> |
| Paragraph spacing | <p style="margin-top: 20px; margin-bottom: 10px"> |
| Paragraph borders | <p style="border-top: 1px solid #000000"> |
| Background color | <p style="background-color: #ffc000"> |

# Known limitations

  • The “at least” line height type is treated the same as the “exact” one.
  • Spacing between paragraphs of the same style is always preserved, even when explicitly disabled.

# Headings

Headings in Word documents are represented by paragraphs with a special formatting property called “outline level”. It can be applied either by using styles, such as the built-in “Heading 1” style, or by paragraph formatting.

This property determines which heading a paragraph is converted to. The converter only supports headings from <h1> to <h6>. Headings with levels outside this range are clamped to the closest supported level (i.e. levels below 1 are turned into <h1>, while levels above 6 are rendered as <h6>). In all other respects, headings behave like paragraphs.
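Clamping to the closest supported level can be sketched as follows. This is an illustration rather than the converter's actual code: the function name headingTagForOutlineLevel is hypothetical, and the sketch assumes that the outline level is expressed as a 1-based integer.

```typescript
// Hypothetical helper: maps a 1-based outline level to a heading tag name,
// clamping values outside the 1-6 range to the closest supported level.
function headingTagForOutlineLevel(outlineLevel: number): string {
    const clamped = Math.min(6, Math.max(1, outlineLevel));
    return 'h' + clamped;
}

const headingTags = [0, 1, 3, 6, 9].map(level => headingTagForOutlineLevel(level));
// headingTags: ['h1', 'h1', 'h3', 'h6', 'h6']
```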

Styles that can be applied to paragraphs can also be applied to headings.

<h1>Heading 1</h1>
<h2 style="color: #ffc000">Colored heading 2</h2>

# Lists

Import from Word supports both ordered and unordered lists that are available in Word, converting them to proper HTML elements. The text content of a list item is always wrapped in a <p> element.

The most basic list structure of an unordered list is shown below:

<ul style="list-style-type: disc">
  <li><p>List item 1</p></li>
  <li>
    <p>List item 2</p>
    <ul style="list-style-type: circle">
      <li><p>List item 3</p></li>
    </ul>
  </li>
</ul>

And the basic structure for an ordered list:

<ol style="list-style-type: decimal">
  <li><p>List item 1</p></li>
  <li>
    <p>List item 2</p>
    <ol style="list-style-type: lower-latin">
      <li><p>List item 3</p></li>
    </ol>
  </li>
</ol>

The converter supports multi-level lists that do not include some intermediary levels. Such lists are created by indenting list items. As an example, we can have a list that, after the first level, skips the second one and goes directly to the third one. This is supported by the converter, and the result will be the following:

<ol style="list-style-type: decimal">
  <li>
    <p>Level 1</p>
    <ul style="list-style-type: none">
      <li>
        <ol style="list-style-type: lower-roman">
          <li><p>Level 3</p></li>
        </ol>
      </li>
    </ul>
  </li>
</ol>
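The placeholder wrapping visible in this output can be sketched as follows. The function name wrapSkippedLevels is hypothetical, and the sketch assumes that every skipped level is represented by a marker-less <ul> with a single <li>, as in the example above. It is not the converter's actual implementation.

```typescript
// Hypothetical helper: wraps an already rendered nested list in one marker-less
// placeholder list per skipped level, so that the nesting depth is preserved.
function wrapSkippedLevels(nestedListHtml: string, skippedLevels: number): string {
    let html = nestedListHtml;
    for (let i = 0; i < skippedLevels; i++) {
        html = '<ul style="list-style-type: none"><li>' + html + '</li></ul>';
    }
    return html;
}

// A level 3 list placed directly under a level 1 item skips exactly one level.
const levelThreeList = '<ol style="list-style-type: lower-roman"><li><p>Level 3</p></li></ol>';
const wrappedList = wrapSkippedLevels(levelThreeList, 1);
```

With skippedLevels set to 0, the nested list is returned unchanged, which corresponds to regular nesting without gaps.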

Ordered lists can start with a different number than 1 by utilizing the available start HTML attribute that matches the proper list level from a Word document:

<ol start="4" style="list-style-type: decimal">
  <li><p>Item 4</p></li>
  <li><p>Item 5</p></li>
  <li><p>Item 6</p></li>
</ol>

Because Word lets the user set any character as a marker in unordered lists, the converter first tries to recognize the marker character and match it to the supported ones as closely as possible. If the marker cannot be recognized, the converter falls back to the disc list style type.

  • Default supported unordered list styles are: disc, circle, square, and none.
  • Default supported ordered list styles are: decimal, lower-latin, lower-roman, and decimal-leading-zero.
  • Other numbering types, such as Hebrew or Thai, are also supported. However, some of them do not have a CSS counterpart, and they are converted to some other numbering that is available in CSS. An example of such numbering is the numbering using the Russian alphabet that is converted to lower-latin.
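The marker matching with a disc fallback described above can be illustrated with a simple lookup. The character table below is an assumption made for this sketch and is not the converter's actual mapping. The function name listStyleForMarker is hypothetical, and the sketch ignores the marker font, which the real detection depends on.

```typescript
type UnorderedListStyle = 'disc' | 'circle' | 'square' | 'none';

// Hypothetical lookup table: the characters are illustrative assumptions,
// not the converter's actual marker table.
const markerStyles: { [marker: string]: UnorderedListStyle } = {
    '\u2022': 'disc',   // bullet
    '\u25CF': 'disc',   // black circle
    'o': 'circle',      // letter "o", assumed to stand for a hollow bullet
    '\u25CB': 'circle', // white circle
    '\u25A0': 'square', // black square
    '\u25AA': 'square', // black small square
    '': 'none'          // assumption: an empty marker means no visible marker
};

function listStyleForMarker(marker: string): UnorderedListStyle {
    // The hasOwnProperty check guards against inherited keys such as "toString".
    if (Object.prototype.hasOwnProperty.call(markerStyles, marker)) {
        return markerStyles[marker];
    }
    // Unrecognized markers fall back to the disc style.
    return 'disc';
}

const squareStyle = listStyleForMarker('\u25A0');   // 'square'
const fallbackStyle = listStyleForMarker('\u2605'); // 'disc' (black star is not in the table)
```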

# Known limitations

  • Due to differences between built-in DOCX and CSS numbering definitions, some numbering types may have discrepancies. The extent of those discrepancies depends on specific numbering, so for some lists, the first list item with different numbering may be the thousandth one, whereas for other ones the numbering may start being different at around the 30th item.
  • Custom list markers are not supported. This applies to some built-in list styles, and, in general, to markers that cannot be represented by the built-in CSS list style types.
  • Marker detection, due to DOCX representation of unordered lists, depends on fonts used for list markers and cannot be generalized. This means that there is always a possibility of producing an invalid marker.
  • Indentation of list items is not supported right now.

# Tables

Tables are represented as <figure> elements with a <table> element inside, to keep the same content semantics as CKEditor 5.

Every figure element includes a table class, giving the integrator an easy way to apply custom styles to all tables, e.g. to reposition them inside the document.

An example of a simple table that will be output by the converter (styles skipped for clarity):

<figure class="table">
  <table>
    <tbody>
      <tr>
        <td>
          <p>Cell 1</p>
        </td>
        <td>
          <p>Cell 2</p>
        </td>
      </tr>
    </tbody>
  </table>
</figure>

Tables support most features that are available in Word, including:

| Feature name | Markup |
| --- | --- |
| Table / cell width | <td style="width: 100px;"> |
| Table / cell height | <td style="height: 20px;"> |
| Cell merging | <td colspan="2"> |
| Cell padding | <td style="padding-top: 20px"> |
| Cell spacing | <table style="border-spacing: 10px"> |
| Cell’s vertical alignment | <td style="vertical-align: top"> |
| Table background color | <table style="background-color: #f4b083;"> |
| Cell background color | <td style="background-color: #f4b083;"> |
| Table border style | <table style="border-top: 1px solid #f4b083;"> |
| Cell border style | <td style="border-top: 1px solid #f4b083;"> |
| Table header | <th scope="col"> |
| Table alignment/floating | <figure style="margin-left: auto; margin-right: 0"> |

Table nesting is properly supported by the converter, so it’s possible to have multiple tables inside each other. Table captions are converted to normal paragraphs to keep the same representation of the caption as in Word.

# Known limitations

  • Conditional table formatting is not supported.
  • Captions are currently converted to regular paragraphs.
  • Some border styles may not be properly resolved in the browser, as the resolution of border conflicts differs between HTML and Word.

# Page breaks

Page breaks in Word can be added in two ways: by using the “Page Break” button in the toolbar, or by applying a special “Page break before” paragraph formatting that adds a page break before the paragraph.

Both methods are supported by Import from Word and produce the following HTML:

<div class="page-break" style="page-break-after: always">
  <span style="display: none">&nbsp;</span>
</div>
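Because the page break markup is always the same, an integrator who post-processes the converted HTML can locate it with a simple pattern. The sketch below splits an HTML string into per-page fragments. The function name splitAtPageBreaks and the sample input are hypothetical, and the regular expression assumes that the page break <div> never contains another <div>, as in the markup shown above.

```typescript
// Hypothetical helper: splits converted HTML into fragments at every page break element.
function splitAtPageBreaks(html: string): string[] {
    // Matches the whole page break element, from its opening tag to the first closing </div>.
    const pageBreak = /<div class="page-break"[^>]*>[\s\S]*?<\/div>/g;
    return html
        .split(pageBreak)
        .map(part => part.trim())
        .filter(part => part.length > 0);
}

// Sample input assembled from the page break markup shown above.
const convertedDocument =
    '<p>Page 1</p>' +
    '<div class="page-break" style="page-break-after: always">' +
    '<span style="display: none">&nbsp;</span>' +
    '</div>' +
    '<p>Page 2</p>';

const pages = splitAtPageBreaks(convertedDocument);
// pages: ['<p>Page 1</p>', '<p>Page 2</p>']
```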

# Known limitations

  • Page breaks applied through the paragraph formatting are not supported inside tables.

# Horizontal lines

Horizontal lines in Word are, similarly to HTML, designed to guide the flow of a text or to separate parts of a document. Import from Word properly recognizes and converts horizontal lines to semantically correct <hr> elements.

<p>Paragraph 1</p>
<hr />
<p>Paragraph 2</p>

# Known limitations

  • Horizontal line styling is not supported at the moment.

# Complex objects

Word incorporates more complex objects, such as a table of contents, text boxes, and citation fields. Import from Word attempts to support all of them, but some may not be converted properly due to the limitations of HTML and CSS.

# Known limitations

  • The content of a table of contents is preserved, but it does not support content bookmarks.
  • Form objects: only the text and styling are retained.