Configuring How the Connector Handles Special Characters

You can configure how the Connector handles special characters. This is important because it determines how special characters are displayed for your translators.

Background

The Connector exports your content from Sitecore and sends it for translation as XML files. A valid XML file cannot contain any of the special characters listed in the table below. Instead, a valid XML file must use the following entity references to represent special characters:

Special Character    Represented by This Entity Reference
<                    &lt;
>                    &gt;
&                    &amp;
'                    &apos;
"                    &quot;

Escaping is the term that describes creating valid XML by converting any XML special characters to their entity references.
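As a minimal sketch of escaping, the following Python example converts the five special characters to their entity references. Python is used here for illustration only and is not part of the Connector; the function name escape_xml is hypothetical. The standard library function xml.sax.saxutils.escape converts &, <, and > by default, so the two quote characters are passed as an extra mapping.

```python
from xml.sax.saxutils import escape

# escape() converts &, <, and > by default.
# The extra mapping adds the two quote characters.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_xml(text: str) -> str:
    """Convert the five XML special characters to their entity references."""
    return escape(text, QUOTE_ENTITIES)


print(escape_xml('Tom & "Jerry" <b>'))
# prints: Tom &amp; &quot;Jerry&quot; &lt;b&gt;
```

The ampersand is converted first, so the ampersands introduced by the other entity references are not escaped again.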

How does Sitecore handle special characters?

There are two types of text within Sitecore: 

  • plain text
  • rich text, which can contain HTML formatting, links, etc.

You can use the HTML editor in Sitecore’s Content Editor to view how Sitecore stores content.

Sitecore handles special characters differently depending on whether they are in plain text, such as a title, or in rich text, such as a paragraph.

  • In plain text, Sitecore does not escape special characters. It displays them as the actual characters. For example, it displays & as &.
  • In rich text, Sitecore escapes special characters. It displays them as their corresponding entity references. For example, it displays & as &amp;.

How does the Connector handle special characters?

When the Connector prepares content for translation, it packages all content into XML translation files. This necessitates escaping all special characters into their corresponding entity references. However, the Connector does not differentiate between plain text (which displays the actual special characters) and rich text (which displays the entity references instead of the special characters). Therefore, the Connector escapes all special characters.

As a result, the rich text that the translator receives may contain a combination of special characters and entity references.

For example:

Suppose that the rich-text paragraph contains the text that is rendered as follows in the Sitecore Content Editor: &.

Sitecore actually stores this rich-text content as its corresponding entity reference: &amp;.

When the Connector processes this rich-text content to create XML files to send out for translation, it escapes this content as follows:

  • & becomes &amp;
  • amp; remains amp;

As a result, this content is escaped twice (once by Sitecore and then by the Connector).

The Connector then stores this content as &amp;amp; in the XML translation file, which it sends to the translator.

However, some translation systems cannot handle double-escaped special characters such as &amp;amp; or a mix of single-escaped and double-escaped special characters.
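The double escaping described above can be reproduced with a short Python sketch. The function package_for_translation is a hypothetical stand-in for the Connector's escaping step, not its actual implementation, and the field values are illustrative only.

```python
from xml.sax.saxutils import escape, unescape

# Illustrative field values (assumptions, not taken from a real Sitecore item).
plain_text_stored = "Research & Development"     # plain text: stored unescaped
rich_text_stored = "Research &amp; Development"  # rich text: already escaped by Sitecore


def package_for_translation(value: str) -> str:
    """Hypothetical stand-in: escape every field the same way, whatever its type."""
    return escape(value)


print(package_for_translation(plain_text_stored))
# prints: Research &amp; Development      (escaped once)

print(package_for_translation(rich_text_stored))
# prints: Research &amp;amp; Development  (escaped twice)

# Unescaping once recovers the value exactly as Sitecore stored it.
print(unescape(package_for_translation(rich_text_stored)))
# prints: Research &amp; Development
```

Because both field types pass through the same escaping step, a single translation file can contain single-escaped plain text alongside double-escaped rich text.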