Specifications
Unicode and Protocol Interaction
142 SnapServer Administrator Guide
for example, you might see {!^AB in a file name. Mac OS X clients can edit such files,
and the names will be retained in their original form when written back to the file
system.
Mac OS 9 and earlier are not Unicode-compliant and use the MacRoman code page
to represent extended characters. AFP translates MacRoman into UTF8 when
writing to SnapServers. Any extended characters on the file system that cannot be
translated to MacRoman will also be returned with an escape sequence.
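
The MacRoman-to-UTF8 translation described above can be illustrated with a minimal Python sketch. It is not the SnapServer's implementation; the function names are hypothetical and it relies only on Python's built-in mac_roman codec.

```python
def macroman_to_utf8(name_bytes: bytes) -> bytes:
    """Re-encode a MacRoman file name as UTF-8 (illustrative helper)."""
    return name_bytes.decode("mac_roman").encode("utf-8")


def representable_in_macroman(name: str) -> bool:
    """Return True if every character of the name exists in MacRoman."""
    try:
        name.encode("mac_roman")
        return True
    except UnicodeEncodeError:
        # Per the text, names like this would come back with an escape sequence.
        return False


# In MacRoman, byte 0x8E is "é"; in UTF-8 the same character is C3 A9.
print(macroman_to_utf8(b"caf\x8e"))
print(representable_in_macroman("caf\u00e9"))
print(representable_in_macroman("\u65e5\u672c"))
```

A name containing Japanese characters, for example, has no MacRoman representation, which is the situation in which the text says an escape sequence is returned.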
NFS
The NFS protocol is neither Unicode-compliant nor Unicode-aware. Additionally, there is no
means for the SnapServer to determine which method the client is using to
represent extended characters. Currently, the code pages most commonly used in
Linux environments are ISO-8859-1, ISO-8859-15, and EUC-JP. The SnapServer must
make an assumption about the client's code page in order to translate to and from UTF8 on the file system.
Therefore, when in Unicode mode, you must configure the SnapServer’s NFS
protocol for the code page being used by NFS clients. Code page options include
ISO-8859-1, ISO-8859-15, EUC-JP, and UTF8.
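
To see why the configured code page matters, consider the following minimal Python sketch. It is an illustration only, not the SnapServer's code: the function names and the mapping from the option names above to Python codec names are assumptions made for this example.

```python
# Assumed mapping from the code page options named in the text
# to Python's built-in codec names.
CODECS = {
    "ISO-8859-1": "iso8859_1",
    "ISO-8859-15": "iso8859_15",
    "EUC-JP": "euc_jp",
    "UTF8": "utf_8",
}


def nfs_name_to_utf8(raw: bytes, code_page: str) -> bytes:
    """Translate a name received from an NFS client into UTF-8."""
    return raw.decode(CODECS[code_page]).encode("utf_8")


def utf8_to_nfs_name(stored: bytes, code_page: str) -> bytes:
    """Translate a UTF-8 name on the file system back to the client's code page."""
    return stored.decode("utf_8").encode(CODECS[code_page])


# The same client byte means different characters in different code pages:
# 0xA4 is the currency sign in ISO-8859-1 but the euro sign in ISO-8859-15.
print(nfs_name_to_utf8(b"\xa4", "ISO-8859-1"))
print(nfs_name_to_utf8(b"\xa4", "ISO-8859-15"))
```

If the server is configured for the wrong code page, the byte is stored as a different character than the client intended. In the reverse direction, utf8_to_nfs_name raises UnicodeEncodeError for a character the code page lacks (for example, the euro sign under ISO-8859-1).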
Any extended characters on the file system that cannot be translated to the
configured NFS code page will be returned to the NFS client with an escape
sequence. Escape sequences begin with {!^. The two characters that follow are the
hexadecimal value of the character in the file name; for example, you might see
{!^AB in a file name.
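
A client-side script could locate such escape sequences in a file name along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, and the pattern assumes the prefix {!^ is always followed by exactly two hexadecimal digits, in either upper or lower case.

```python
import re

# Assumed shape of an escape sequence: the literal prefix "{!^"
# followed by exactly two hexadecimal digits.
ESCAPE = re.compile(r"\{!\^([0-9A-Fa-f]{2})")


def find_escapes(name: str) -> list:
    """Return the numeric value of each escape sequence found in a name."""
    return [int(match.group(1), 16) for match in ESCAPE.finditer(name)]


print(find_escapes("report{!^AB.txt"))
print(find_escapes("plain.txt"))
```

An empty result means the name was delivered without any escaped characters.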
FTP
FTP supports only ASCII characters by specification. Some clients bend the
specification to allow extended characters, but there is no standard means of
representing them. Therefore, no translation is performed on extended characters
for FTP clients: all file names are written to and read from the file system as a
“bag of bytes.” This has two ramifications: extended characters written to the file
system by other protocols appear garbled to FTP clients, and
FTP clients can write invalid UTF8 characters to the file system. In the latter
case, when other protocols encounter invalid UTF8 characters on the file system
(which normally can be written only by FTP), the characters are returned in an
escape sequence. Escape sequences begin with {!^. The two characters that follow are
the hexadecimal value of the character in the file name; for example, you might see
{!^AB in a file name.
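
Both ramifications can be reproduced in a few lines of Python. The sketch below is illustrative only and does not describe the SnapServer's actual implementation: the function name is hypothetical, and it assumes that each invalid byte is escaped individually as {!^ followed by two uppercase hexadecimal digits.

```python
def escape_invalid_utf8(raw: bytes) -> str:
    """Decode a name as UTF-8, escaping each undecodable byte as {!^XX."""
    # surrogateescape maps an invalid byte 0xNN to the code point U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    return "".join(
        "{!^%02X" % (ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )


# Ramification 1: a UTF-8 name read byte-for-byte by a client that assumes
# ISO-8859-1 shows two garbled characters in place of one "é".
print("caf\u00e9".encode("utf-8").decode("iso8859_1"))

# Ramification 2: a client writing ISO-8859-1 bytes stores invalid UTF-8,
# which other protocols would then see in escaped form.
print(escape_invalid_utf8(b"caf\xe9"))
```

Names that are already valid UTF8 pass through escape_invalid_utf8 unchanged.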