CHAR and VARCHAR data types
Informix®
Informix supports the following character data types:
Informix data type | Description |
---|---|
CHAR(n) |
SBCS and MBCS character data (max is 32767 bytes) |
VARCHAR(n[,m]) |
SBCS and MBCS character data (max is 255 bytes) |
NCHAR(n) |
Same as CHAR , with specific collation
order |
NVARCHAR(n[,m]) |
Same as VARCHAR , with specific collation
order |
LVARCHAR(n) |
max size varies depending on the IDS version |
With Informix, both
CHAR/VARCHAR
and NCHAR/NVARCHAR
data types can be used to store
single-byte or multibyte encoded character strings. The only difference between
CHAR/VARCHAR
and NCHAR/NVARCHAR
is in how they use sorting:
N[VAR]CHAR
types use the collation order, while [VAR]CHAR
types
use the byte order.
The character set used to store strings in
CHAR/VARCHAR/NCHAR/NVARCHAR
columns is defined by the DB_LOCALE environment
variable.
Informix uses Byte Length Semantics (the size
N that you specify in [VAR]CHAR(N)
is expressed in bytes, not characters as in some
other databases)
Microsoft™ SQL Server
Microsoft SQL Server supports following data types to store character data:
Microsoft SQL Server data type | Description |
---|---|
CHAR(n) |
SBCS or MBCS encoding using the database collation (starting with SQL Server 2019 it can be UTF-8), where n is specified in bytes (max is 8000 bytes) |
VARCHAR(n) |
SBCS or MBCS encoding using the database collation (starting with SQL Server 2019 it can be UTF-8), where n is specified in bytes (max is 8000 bytes) |
VARCHAR(MAX) |
SBCS or MBCS encoding using the database collation (starting with SQL Server 2019 it can be UTF-8), to store large text data (max is 2^31-1 bytes) |
NCHAR(n) |
UCS-2/UTF-16 encoding, where n is specified in byte-pairs (max is 4000 byte-pairs) |
NVARCHAR(n) |
UCS-2/UTF-16 encoding, where n is specified in byte-pairs (max is 4000 byte-pairs) |
NVARCHAR(MAX) |
UCS-2/UTF-16 encoding, to store large text data (max is 2^31-1 bytes) |
VARCHAR(MAX)
/NVARCHAR(MAX)
type as a
replacement for the old TEXT
/NTEXT
types. See TEXT and BYTE (LOB) types for more details.The use of NCHAR
, NVARCHAR
character types is the same as
CHAR
, VARCHAR
respectively, except:
- For
NCHAR
/NVARCHAR
, the character encoding is UCS-2 or UTF-16, depending if the database collation is using the SC option. - The length
N
inN[VAR]CHAR(N)
defines a number of byte-pairs, not bytes. For UCS-2 this corresponds to a number of characters. But with UTF-16 there can be surrogate pairs using 4 bytes (2 byte-pairs). - The maximum size of
NCHAR(N)
andNVARCHAR(N)
column is 4000 byte-pairs, compared to 8000 bytes forCHAR/VARCHAR
using a single-byte character set. - Unicode string literals are specified with a leading N. For example: N'日本語'
- The LIKE statement behaves
differently with
CHAR
andNCHAR
columns when using the N prefix before the search pattern.
Note that SQL Server uses Byte Length Semantics to define the size of
CHAR/VARCHAR
columns, while NCHAR
and NVARCHAR
sizes are expressed in byte-pair units, which for most cases (UCS-2) corresponds to a number of
characters (but this is not true for UTF-16 characters encoded in surrogate pairs).
SQL Server defines the character encoding for CHAR
and VARCHAR
columns with the database collation. The database collation can be specified when creating a new
database. Character strings are always stored in the UCS-2 or UTF-16 encoding for
NCHAR/NVARCHAR
columns. UTF-16 is used when the DB collation has the SC option.
Automatic charset conversion is supported by SQL Server between the client application and the server. The client charset is defined by the Windows® operating system, in the language settings for non-Unicode applications.
Solution
Depending on the character set used by your application, you must either use
CHAR/VARCHAR
or NCHAR/NVARCHAR
columns with SQL Server. If the
charset is single-byte , you can use CHAR/VARCHAR
columns. If the charset set is
multibyte or Unicode (i.e. UTF-8), you must use NCHAR/NVARCHAR
columns in SQL
Server.
See also the section about Localization.
Make sure that the regional language settings for non-Unicode applications corresponds to the locale used by Genero programs.
Check that your database tables does not use CHAR
or VARCHAR
types with a length exceeding the SQL Server character types limits.
When using a multibyte character set (such as UTF-8), define database columns as
NCHAR
and NVARCHAR
, with the size in byte-pair units, and use
character length semantics in BDL programs with FGL_LENGTH_SEMANTICS=CHAR.
When extracting a database schema from a SQL Server database, the fgldbsch schema extractor uses the
size of the column in byte-pair units, not the byte length. If you have created an
NCHAR(10)
column in an SQL Server database, the .sch file will
get a size of 10, that will be interpreted as a number of bytes or characters, depending on
FGL_LENGTH_SEMANTICS.
Do not forget to properly define the database client character set, which must correspond to the runtime system character set.
CHAR/VARCHAR
type translation can be controlled with the following FGLPROFILE
entries:dbi.database.dsname.ifxemul.datatype.char = { true | false }
dbi.database.dsname.ifxemul.datatype.varchar = { true | false }
For
more details see IBM Informix emulation parameters in FGLPROFILE.Using the SNC driver
The SNC driver can work in char or in wide-char mode:
The char mode must be used with applications defining database columns with
CHAR/VARCHAR/TEXT
types. It is not mandatory (i.e. the wide-char mode
could be used), but it appears that SQL Server performs better, when char bindings are
used for CHAR/VARCHAR/TEXT
columns. In char mode, the SNC driver binds
SQL parameters (SQLBindParameter) with the SQL_CHAR/SQL_VARCHAR
ODBC SQL types.
When defining CHAR(N)/VARCHAR(N)
columns in SQL Server, you specify N as a number
of bytes. Therefore, it is recommended that you use byte length semantics in Genero programs, with
FGL_LENGTH_SEMANTICS=BYTE (this is the default).
The wide-char mode must be used for applications defining database columns with
NCHAR/NVARCHAR/NTEXT
types. These SQL types are used to store UNICODE data. In such
case, the runtime system must use a UTF-8 locale, with character length semantics
(FGL_LENGTH_SEMANTICS=CHAR). In wide-char mode, the SNC driver binds SQL parameters
(SQLBindParameter) with the SQL_WCHAR/SQL_WVARCHAR
ODBC SQL types. Furthermore, all
string literals of an SQL statement are automatically changed to get the N prefix. Thus, you don't
need to add the N prefix by hand in all of your programs. This solution makes your Genero code
portable to other databases. When defining NCHAR(N)/NVARCHAR(N)
columns in SQL
Server, you specify N as a number of byte-pairs that corresponds to a number of characters in UCS-2.
Therefore, it is recommended you use char length semantics in Genero programs, with
FGL_LENGTH_SEMANTICS=CHAR.
By default, the SNC database driver selects the expected char or wide-char mode, depending on the current application locale, assuming that the database column types fit to the application locale:
- If the application locale defines a single-byte encoding (such as ISO8859-1), the driver assumes
that the database columns are defined with
CHAR/VARCHAR/TEXT
types to store single-byte characters, and the SNC driver will useSQL_[VAR]CHAR
. - If the application locale defines a multibyte encoding (such as UTF-8 or BIG5), the driver
assumes that the database columns are defined with
NCHAR/NVARCHAR/NTEXT
types to store UNICODE characters, and the SNC driver will useSQL_W[VAR]CHAR
.
dbi.database.dbname.snc.widechar= { true | false }
dbi.database.dbname.snc.widechar
to
false
, only if you are using a multibyte encoding such as
BIG5, with CHAR/VARCHAR/TEXT
column types in the
database.Using the ESM driver
When using the ESM (Easysoft) database driver, SQL Statements are prepared with SQLPrepare(), by
using the current character set. Easysoft makes the necessary charset conversions from the client
charset to UCS-2 or UTF-16 for the server. ODBC SQL parameters with character string data are bound
(SQLBindParameter) with the C type SQL_C_CHAR
and with the SQL type
SQL_W[VAR]CHAR
(=UNICODE) or with SQL_[VAR]CHAR
, based on the
current locale. The SQL_W[VAR]CHAR
type is used if the current locale uses a
multibyte encoding. When using a single-byte encoding, parameters are bound with the
SQL_[VAR]CHAR
type. String literals get the N prefix only if the current locale
defines a multibyte encoding. String literals are not touched, if the locale uses a single-byte
character set. As a result, the necessary character set conversion is controlled by Easysoft and is
optimized for SQL Server when using a single-byte character set.
Client_CSet
" parameter, and the server character set is defined by
"Server_CSet
" or "Server_UCSet
" parameters. For example, to cover
all UNICODE characters,
define:Client_CSet = UTF-8
Server_UCSet = UTF-16LE
Using the FTM driver
With the FTM (FreeTDS) database driver, SQL Statements are prepared with SQLPrepare(), by using
the current character set. FreeTDS makes the necessary charset conversions from the client charset
to UCS-2/UTF-16 before sending the SQL text to the server. ODBC SQL parameters with character string data
are bound (SQLBindParameter) with the C type SQL_C_CHAR
and with the SQL type
SQL_W[VAR]CHAR
(=UNICODE) or with SQL_[VAR]CHAR
, based on the
current locale. The SQL_W[VAR]CHAR
type is used if the current locale uses a
multibyte encoding. When using a single-byte encoding, parameters are bound with the
SQL_[VAR]CHAR
type. String literals get the N prefix only if the current locale
defines a multibyte encoding. String literals are not touched, if the locale uses a single-byte
character set. As a result, the necessary character set conversion is controlled by FreeTDS and is
optimized for SQL Server when using a single-byte character set.
ClientCharset
" parameter in
odbc.ini.