Installation guide
mxODBC - Python ODBC Database Interface
8.8 Unicode and String Data Encodings
mxODBC also supports Unicode objects to interface with databases. As more
databases and ODBC drivers support Unicode natively, using Unicode for text
data stored in a database becomes more attractive than ever and allows you to
avoid the problems you typically face when having to deal with different text
encodings and code pages in databases.
Even if you don't have access to an ODBC driver capable of dealing with Unicode
natively, you can still take advantage of the auto-conversion mechanisms in
mxODBC to simulate Unicode capabilities.
mxODBC provides several different run-time configurations to deal with passing
Unicode to and fetching it from an ODBC driver. The .stringformat attribute of
connection and cursor objects allows defining how to convert string data into
Python objects and vice-versa.
Unicode conversions to and from 8-bit strings in Python usually assume the
Python default encoding (which is ASCII unless you modify the Python
installation). Since the database may be using a different encoding, mxODBC
allows defining the encoding to be used on a per-connection basis.
The .encoding attribute of connection and cursor objects is writeable for this
purpose. Its default value is None, meaning that Python's default encoding
(usually ASCII) is to be used. You can change the encoding by simply assigning a
valid encoding name to the attribute. Make sure that Python supports the
encoding (you can test this using the unicode() built-in).
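As a minimal sketch of such a check, the standard library's codecs.lookup() can
be used as an alternative to calling unicode(): it raises LookupError when no
codec is registered under the given name. The helper name
is_supported_encoding is hypothetical and not part of mxODBC.

```python
import codecs


def is_supported_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Check a candidate encoding name before assigning it to .encoding
utf8_ok = is_supported_encoding("utf-8")
bogus_ok = is_supported_encoding("no-such-encoding")
```

Running the check before assigning the name to the attribute avoids finding out
about a misspelled encoding only when the first conversion takes place.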
The default conversion mechanism used in mxODBC is EIGHTBIT_STRINGFORMAT
(Unicode gets converted to 8-bit strings before the data is passed to the
driver, and output is always an 8-bit string); the default encoding is Python's
default encoding.
To store Unicode in a database, one possibility is to use the
UNICODE_STRINGFORMAT and set the encoding attribute to e.g. 'utf-8'.
mxODBC will then convert the Unicode input data to UTF-8, store this in the
database and convert it back to Unicode during fetch operations. Note however
that UTF-8 encoded data usually takes up more room in the database than the
Unicode equivalent, so you may experience data truncations, which then cause the
decoding process to fail.
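The truncation problem can be reproduced without a database. The sketch below
simulates a column that holds a fixed number of bytes by slicing the encoded
string; the helper store_and_fetch and the column size are illustrative
assumptions, not mxODBC behaviour.

```python
def store_and_fetch(text, column_bytes, encoding="utf-8"):
    """Encode text, cut it to the column size, and decode it again."""
    stored = text.encode(encoding)[:column_bytes]  # simulated truncation
    return stored.decode(encoding)


text = u"Gr\u00fc\u00dfe"                 # 5 characters
utf8_length = len(text.encode("utf-8"))   # the two non-ASCII characters need 2 bytes each

# A column sized for 5 bytes cuts the second non-ASCII character in half,
# so the stored bytes are no longer valid UTF-8.
try:
    store_and_fetch(text, column_bytes=5)
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Sizing the column for the encoded byte length rather than the character count
avoids the failure in this simulation.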
Another possibility is to use the MIXED_STRINGFORMAT, which allows mxODBC to
interface to the database using the most suitable data type. For MS SQL Server,
for example, this usually means passing all string data as Unicode data to and
from the database. In MIXED_STRINGFORMAT mode mxODBC will return string data in
the default format of the database driver, leaving the conversion to the Python
program.
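The configuration described above might be wrapped as follows. This is only a
sketch: the helper use_unicode_storage is hypothetical, and it assumes that the
string format constants are exposed as attributes of the mxODBC subpackage that
created the connection. Stand-in objects replace a real connection and
subpackage so that the example runs without a database.

```python
from types import SimpleNamespace


def use_unicode_storage(connection, odbc, encoding="utf-8"):
    """Select UNICODE_STRINGFORMAT and an explicit encoding on a connection.

    `odbc` is assumed to be the subpackage that provides the constant.
    """
    connection.stringformat = odbc.UNICODE_STRINGFORMAT
    connection.encoding = encoding
    return connection


# Stand-ins for illustration only; real code would pass an open mxODBC
# connection and the subpackage it was created with.
fake_odbc = SimpleNamespace(UNICODE_STRINGFORMAT="UNICODE_STRINGFORMAT")
fake_connection = SimpleNamespace(stringformat=None, encoding=None)

use_unicode_storage(fake_connection, fake_odbc)
```

Passing the subpackage in as a parameter keeps the helper independent of which
ODBC driver manager is in use.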
Note:
mxODBC only supports Unicode objects at the data storage interface level
meaning that it can insert and fetch Unicode data from a database provided that
the database can handle Unicode and that the used mxODBC subpackage was