Installation guide

mxODBC - Python ODBC Database Interface

8.8 Unicode and String Data Encodings

mxODBC also supports Unicode objects to interface with databases. As more databases and ODBC drivers support Unicode natively, using Unicode for text data stored in a database becomes more attractive than ever and allows you to avoid the problems you typically face when having to deal with different text encodings and code pages in databases.

Even if you don't have access to an ODBC driver capable of dealing with Unicode natively, you can still take advantage of the auto-conversion mechanisms in mxODBC to simulate Unicode capabilities.

mxODBC provides several different run-time configurations to deal with passing Unicode to and fetching it from an ODBC driver. The .stringformat attribute of connection and cursor objects allows defining how to convert string data into Python objects and vice-versa.

Unicode conversions to and from 8-bit strings in Python usually assume the Python default encoding (which is ASCII unless you modify the Python installation). Since the database may be using a different encoding, mxODBC allows defining the encoding to be used on a per-connection basis.
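
The reason a per-connection encoding matters can be seen without a database. The following is a minimal sketch using only the Python standard library; the sample string is made up for illustration. It prints the interpreter's default encoding (which depends on the Python version and installation) and shows that non-ASCII text cannot be converted to an 8-bit string using ASCII.

```python
import sys

# The default encoding depends on the Python version and installation,
# so it is printed here rather than assumed.
print(sys.getdefaultencoding())

# Sample text containing two non-ASCII characters (illustrative only).
text = u"Gr\u00fc\u00dfe"

# Converting to an 8-bit string with ASCII fails for such text.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)
```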

The .encoding attribute of connection and cursor objects is writeable for this purpose. Its default value is None, meaning that Python's default encoding (usually ASCII) is to be used. You can change the encoding by simply assigning a valid encoding name to the attribute. Make sure that Python supports the encoding (you can test this using the unicode() built-in).
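
As a sketch of the check described above: the unicode() built-in exists only in Python 2, so this example uses codecs.lookup() from the standard library instead, which raises LookupError for unknown encoding names. The helper name is_supported_encoding is hypothetical and not part of mxODBC.

```python
import codecs


def is_supported_encoding(name):
    """Return True if this Python installation provides the codec `name`.

    Hypothetical helper for illustration; not part of mxODBC.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(is_supported_encoding("utf-8"))
print(is_supported_encoding("latin-1"))
print(is_supported_encoding("no-such-codec"))

# Only after such a check would the name be assigned, for example:
#     connection.encoding = "utf-8"
# (connection is assumed to be an open mxODBC connection object.)
```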

The default conversion mechanism used in mxODBC is EIGHTBIT_STRINGFORMAT (Unicode gets converted to 8-bit strings before passing the data to the driver, and output is always an 8-bit string). The default encoding is Python's default encoding.

To store Unicode in a database, one possibility is to use the UNICODE_STRINGFORMAT and set the encoding attribute to e.g. 'utf-8'. mxODBC will then convert the Unicode input data to UTF-8, store this in the database and convert it back to Unicode during fetch operations. Note however that UTF-8 encoded data usually takes up more room in the database than the Unicode equivalent, so you may experience data truncations which then cause the decoding process to fail.
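
The setup and the truncation risk described above can be sketched as follows. The attribute names and the UNICODE_STRINGFORMAT constant come from the text; the helper names, the sample string and the 3-byte column size are hypothetical. The example assumes the column size is counted in bytes, which depends on the database. The odbc parameter stands for whichever mxODBC subpackage is in use, so the block itself runs without a database.

```python
def use_unicode_stringformat(connection, odbc, encoding="utf-8"):
    """Hypothetical helper: configure a connection as described in the text.

    `odbc` is the mxODBC subpackage providing UNICODE_STRINGFORMAT and
    `connection` is a connection object created from it (both assumed).
    """
    connection.stringformat = odbc.UNICODE_STRINGFORMAT
    connection.encoding = encoding
    return connection


def fits_column(text, column_bytes, encoding="utf-8"):
    """Return True if the encoded text fits into `column_bytes` bytes."""
    return len(text.encode(encoding)) <= column_bytes


# Five characters, but seven bytes once encoded as UTF-8.
text = u"Gr\u00fc\u00dfe"
encoded = text.encode("utf-8")
print(len(text), len(encoded))

# Simulate a column that silently truncates to 3 bytes (assumed size).
# The cut falls in the middle of a two-byte character.
truncated = encoded[:3]
try:
    truncated.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)
print(fits_column(text, 5), fits_column(text, 7))
```

Checking the encoded size before an insert, as fits_column() does, is one way to detect values that would be truncated; sizing the column for the encoded data is another.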

Another possibility is to use the MIXED_STRINGFORMAT, which allows mxODBC to interface to the database using the most suitable data type. For MS SQL Server, for example, this usually means passing all string data as Unicode data to and from the database. In MIXED_STRINGFORMAT mode mxODBC will return string data in the default format of the database driver, leaving the conversion to the Python program.
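
Since the conversion is left to the Python program in this mode, a program may want to normalize fetched values itself. The following is a minimal sketch, assuming a fetched row can contain 8-bit strings, Unicode objects and non-string values side by side; the helper names to_text and normalize_row and the sample row are hypothetical and not part of mxODBC.

```python
def to_text(value, encoding="utf-8"):
    """Decode 8-bit strings to Unicode; pass every other value through.

    Hypothetical helper; `encoding` must match what the database uses.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_row(row, encoding="utf-8"):
    """Apply to_text() to every column of a fetched row."""
    return tuple(to_text(value, encoding) for value in row)


# Made-up row mixing an 8-bit string, a Unicode object, a number and NULL.
row = (b"Gr\xc3\xbc\xc3\x9fe", u"abc", 42, None)
print(normalize_row(row))
```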

Note: mxODBC only supports Unicode objects at the data storage interface level, meaning that it can insert and fetch Unicode data from a database provided that the database can handle Unicode and that the used mxODBC subpackage was
