User Guide

SYSTRAN 6 Desktop User Guide 124

Formatted Text Files

A formatted text file for import into SDM consists of two parts: the document header and the dictionary content.

The header part of the dictionary is a sequence of lines, each starting with the “#” character and containing a header field followed by its value.

The content part is a sequence of lines, with each line representing a dictionary entry whose fields are separated by tab characters.

The field types are defined in the header. Every content line must have the same number of fields; a field with no value is left empty rather than omitted.
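
The layout described above can be checked before import with a few lines of Python. This is a minimal sketch, not part of SDM: the function names and the sample entries are hypothetical, and it assumes that every line starting with “#” belongs to the header and that all entries share one column layout.

```python
def split_sections(lines):
    """Separate '#'-prefixed header lines from tab-separated content lines.

    Assumption: any line starting with '#' is a header line, and blank
    lines are ignored.
    """
    header, content = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        (header if line.startswith("#") else content).append(line)
    return header, content


def field_count(row):
    """Number of tab-separated fields in a row; empty fields still count."""
    return len(row.split("\t"))


def inconsistent_rows(content):
    """Return (row_index, field_count) for rows that differ from the first row."""
    if not content:
        return []
    expected = field_count(content[0])
    return [(i, field_count(row))
            for i, row in enumerate(content)
            if field_count(row) != expected]


# Hypothetical sample data, for illustration only.
sample = [
    "#ENCODING=UTF-8",
    "#SUMMARY=demo",
    "cat\tchat\tnoun",
    "dog\t\tnoun",      # empty middle field: still three fields
    "bird\toiseau",     # one field short
]
sample_header, sample_content = split_sections(sample)
sample_problems = inconsistent_rows(sample_content)
```

Here the row with an empty middle field passes, because the tab is still present, while the row that drops its last field is reported.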

Required and Optional Fields for Importing Files into SDM

Header                                  Description of Input

#AUTHOR=                                Optional: contains the name of the creator of the dictionary.

#EMAIL=                                 Optional: contains the email address of the creator of the dictionary.

#COVERED DOMAINS=                       Optional: lists all domains configured in the dictionary.

#ENCODING=                              Required: defines the encoding of the file. UTF-8 encoding is recommended.

#GENERAL DICTIONARY DOMAINS=            Optional: lists the system domains associated with the dictionary.

#SUMMARY=                               Required: the name of the UD file.

#MULTI/TM/NORM/DNT                      Required: these two lines are the end of the header section.
#<Languages><Informational columns>=
                                        #MULTI defines the dictionary as a User Dictionary.
                                        #TM defines the dictionary as a Translation Memory.
                                        #NORM defines the dictionary as a Normalization Dictionary.
                                        #DNT is used in a User Dictionary to separate multilingual entries from DNT entries.

                                        The second line describes the list of columns in the content section. It is a list of codes separated by tab characters, as described in the following table.
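
As an illustration of the header fields listed above, the following Python sketch collects "#FIELD=value" lines into a dictionary and reports which required fields are absent. It is a sketch under stated assumptions rather than SDM's own parser: the function names are hypothetical, the required set is limited to the two "=" fields marked Required above (ENCODING and SUMMARY), and a field counts as present even if its value is empty.

```python
# Fields marked "Required" in the header table that use the FIELD=value form.
REQUIRED_FIELDS = ("ENCODING", "SUMMARY")


def parse_header_fields(header_lines):
    """Collect '#FIELD=value' lines into a dict (hypothetical helper).

    Lines without '=' (for example a bare '#MULTI') are skipped here.
    """
    fields = {}
    for line in header_lines:
        if line.startswith("#") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields


def missing_required(fields):
    """Return the required field names that do not appear at all."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


# Hypothetical header, for illustration only.
example_fields = parse_header_fields([
    "#AUTHOR=Jane Doe",
    "#ENCODING=UTF-8",
    "#MULTI",
])
example_missing = missing_required(example_fields)
```

In this example the header has an encoding but no "#SUMMARY=" line, so SUMMARY is the only name reported.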

Description of the different codes defining the content fields

Code        Description

XX          Where XX is a 2-letter ISO 639 code in uppercase. This represents a language (refer to Appendix B. Language Pairs and ISO 639 Codes). The source language is always the first column, with target languages as the following columns.

XX_NO       For Normalization Dictionaries only. XX corresponds to the ISO 639 code for the source language. These columns represent the Normalized columns.

UPOS        User Part of Speech. This entry corresponds to the SDM Category column.
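
The three codes described so far can be recognized by their shape alone. The Python sketch below is illustrative only: the function name and category labels are hypothetical, it checks only that a language code is two uppercase letters (not that it is a real ISO 639 code), and any code it does not recognize is labeled "other" rather than rejected.

```python
import re

# Shape checks only: two uppercase letters, optionally followed by "_NO".
LANGUAGE_CODE = re.compile(r"[A-Z]{2}")
NORMALIZED_CODE = re.compile(r"[A-Z]{2}_NO")


def classify_code(code):
    """Label a column code as 'upos', 'normalized', 'language', or 'other'."""
    if code == "UPOS":
        return "upos"
    if NORMALIZED_CODE.fullmatch(code):
        return "normalized"
    if LANGUAGE_CODE.fullmatch(code):
        return "language"
    return "other"


# Hypothetical column list: source language first, then a target and UPOS.
example_columns = ["EN", "FR", "UPOS"]
example_labels = [classify_code(code) for code in example_columns]
```

A lowercase code such as "en" falls into "other", since the table specifies uppercase.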