SYSTRAN 6 Desktop User Guide
Formatted Text Files
Formatted text files for import into SDM consist of a header and the
dictionary content.
The header is a sequence of lines, each starting with the “#” character
and containing a header field followed by its value.
The content is a sequence of lines, each representing one dictionary entry
whose fields are separated by tab characters.
The field types are defined in the header. Every line must have the
same number of fields, even if some of them are empty.
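The structure described above can be illustrated with a minimal Python sketch. It is not part of SDM: the function names and the sample text are hypothetical, and it assumes that every line starting with "#" can be treated as a header line and that blank lines carry no entry. The field-count check compares each row with the first row only, which is a simplification.

```python
def split_formatted_text(text):
    """Split file content into header lines and tab-separated entry rows."""
    header, entries = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are not dictionary entries
        if line.startswith("#"):
            header.append(line)
        else:
            # Keep empty fields: "a\t\tb" yields three fields, not two.
            entries.append(line.split("\t"))
    return header, entries


def rows_with_wrong_field_count(entries):
    """Return (row_number, field_count) for rows that differ from the first row."""
    if not entries:
        return []
    expected = len(entries[0])
    return [
        (number, len(row))
        for number, row in enumerate(entries, start=1)
        if len(row) != expected
    ]


# Hypothetical sample; the exact form of the column line is an assumption.
sample = (
    "#ENCODING=UTF-8\n"
    "#SUMMARY=demo\n"
    "#MULTI\n"
    "#EN\tFR\tUPOS\n"
    "house\tmaison\t\n"   # third field present but empty
    "car\tvoiture\n"      # third field missing: flagged below
)
header, entries = split_formatted_text(sample)
problems = rows_with_wrong_field_count(entries)
```

In this sample the second entry has only two fields, so it is reported, while the first entry passes because its empty third field is still delimited by a tab.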
Required and Optional Fields for Importing Files into SDM

Header                          Description of Input
#AUTHOR=                        Optional: contains the name of the creator of the
                                dictionary.
#EMAIL=                         Optional: contains the email address of the
                                creator of the dictionary.
#COVERED DOMAINS=               Optional: lists all domains configured in the
                                dictionary.
#ENCODING=                      Required: defines the encoding of the file. UTF-8
                                encoding is recommended.
#GENERAL DICTIONARY DOMAINS=    Optional: lists the system domains associated
                                with the dictionary.
#SUMMARY=                       Required: the name of the UD file.
#MULTI/TM/NORM/DNT              Required: these two lines are the end of the
#<Languages><Informational      header section.
columns>=                       #MULTI defines the dictionary as a User
                                Dictionary, #TM as a Translation Memory, and
                                #NORM as a Normalization Dictionary.
                                #DNT is used in a User Dictionary to separate
                                multilingual entries from DNT entries.
                                The second line describes the list of columns in
                                the content section. It is a list of codes
                                separated by tab characters, as described in the
                                following table.
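The header rules above can be sketched as a small check. This is an illustration only, not SDM's own validation: the function names are hypothetical, and it assumes that header fields take the form "#NAME=value" and that the dictionary type appears on a line of its own (#MULTI, #TM or #NORM). #DNT is left out of the type mapping because the table describes it as a separator.

```python
REQUIRED_FIELDS = ("ENCODING", "SUMMARY")  # marked "Required" in the table

DICTIONARY_TYPES = {
    "MULTI": "User Dictionary",
    "TM": "Translation Memory",
    "NORM": "Normalization Dictionary",
}


def read_header(header_lines):
    """Split header lines into name/value fields and bare markers such as MULTI."""
    fields, markers = {}, []
    for line in header_lines:
        body = line.lstrip("#").strip()
        if "=" in body:
            name, _, value = body.partition("=")
            fields[name.strip()] = value.strip()
        else:
            markers.append(body)
    return fields, markers


def check_header(header_lines):
    """Return (dictionary_type_or_None, list_of_missing_required_fields)."""
    fields, markers = read_header(header_lines)
    missing = [f"#{name}=" for name in REQUIRED_FIELDS if not fields.get(name)]
    kinds = [DICTIONARY_TYPES[m] for m in markers if m in DICTIONARY_TYPES]
    return (kinds[0] if kinds else None), missing
```

For example, `check_header(["#AUTHOR=Jane", "#TM"])` reports a Translation Memory with both required fields missing.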
Description of the different codes defining the content fields

Code        Description
XX          XX is a 2-letter ISO 639 code in uppercase. This represents a
            language (refer to Appendix B. Language Pairs and ISO 639
            Codes). The source language is always the first column, with
            target languages as the following columns.
XX_NO       For Normalization Dictionaries only. XX corresponds to the
            ISO 639 code for the source language. These columns represent
            the Normalized columns.
UPOS        User Part of Speech. This entry corresponds to the SDM
            Category column.
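The codes listed so far can be recognized with a short Python sketch. It is hypothetical and covers only XX, XX_NO and UPOS; any other code is reported as "other". It also assumes that the column line may start with "#" and end with "=", both of which are stripped before the line is split on tabs.

```python
import re

LANGUAGE_CODE = re.compile(r"[A-Z]{2}")       # e.g. EN, FR
NORMALIZED_CODE = re.compile(r"[A-Z]{2}_NO")  # e.g. EN_NO


def classify_column(code):
    """Map one column code to the kind of field it describes."""
    if code == "UPOS":
        return "user part of speech"
    if NORMALIZED_CODE.fullmatch(code):
        return "normalized"
    if LANGUAGE_CODE.fullmatch(code):
        return "language"
    return "other"  # codes not covered by this sketch


def describe_columns(column_line):
    """Return (code, kind) pairs for a tab-separated column line."""
    codes = column_line.lstrip("#").rstrip("=").split("\t")
    return [(code, classify_column(code)) for code in codes]


def source_language(column_line):
    """Return the first column if it is a language code, else None."""
    columns = describe_columns(column_line)
    if columns and columns[0][1] == "language":
        return columns[0][0]
    return None
```

Because the source language is always the first column, `source_language` only inspects the first code. Lowercase codes such as "en" are classified as "other", since the codes are given in uppercase.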