2009-04-28 18:45:33

by Steve French

Subject: String conversions

In looking at various patches for more accurately sizing the required
buffer needed for conversions to UTF-8, the following question came up
more than once.

Functions which do string copying often null terminate the target
string (single byte of \0), and size strings one byte larger than
their string name (for UCS-2 this does not work since the null
termination is two bytes). Are there local nls codepages in Linux
kernel which require double null termination (e.g. DBCS asian code
pages), and if so how do you tell which ones require "double null
termination?"

--
Thanks,

Steve


2009-04-29 12:43:00

by Suresh Jayaraman

Subject: Re: String conversions

> Steve French <[email protected]> wrote:
>
> In looking at various patches for more accurately sizing the required
> buffer needed for conversions to UTF-8, the following question came up
> more than once.
>
> Functions which do string copying often null terminate the target
> string (single byte of \0), and size strings one byte larger than
> their string name (for UCS-2 this does not work since the null
> termination is two bytes). Are there local nls codepages in Linux
> kernel which require double null termination (e.g. DBCS asian code
> pages), and if so how do you tell which ones require "double null
> termination?"

A look at fs/nls and the supported charsets suggests that the Linux
kernel does not support any pure double-byte charsets. Some of the
supported East Asian charsets seem to be supersets of ASCII (e.g.
TIS 620, Big5), and only non-ASCII characters are expressed in 2 bytes.
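
To illustrate the difference in terminator size, here is a minimal
userspace sketch, not kernel code. The helper names (mb_buf_size,
ucs2_len, ucs2_buf_size, check_sizes) are hypothetical and do not exist
in fs/nls. It assumes that in an ASCII-superset charset such as Big5 a
zero byte never appears inside a multi-byte character, so a single \0
terminates the string, while UCS-2 needs a full 16-bit zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Buffer size for a string in an ASCII-superset charset:
 * payload bytes plus a single 0x00 terminator byte. */
size_t mb_buf_size(const char *s)
{
    return strlen(s) + 1;
}

/* Number of 16-bit code units before the 16-bit zero terminator. */
size_t ucs2_len(const uint16_t *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}

/* Buffer size for a UCS-2 string: the terminator takes two bytes. */
size_t ucs2_buf_size(const uint16_t *s)
{
    return (ucs2_len(s) + 1) * sizeof(uint16_t);
}

/* "A" followed by one CJK character, in both encodings. */
void check_sizes(void)
{
    /* Big5: 'A' is one byte, the CJK character is two bytes (0xA4 0xA4). */
    const char big5[] = { 0x41, (char)0xA4, (char)0xA4, 0x00 };
    /* UCS-2: every character, and the terminator, is one 16-bit unit. */
    const uint16_t ucs2[] = { 0x0041, 0x4E2D, 0x0000 };

    assert(mb_buf_size(big5) == 4);    /* 3 bytes + 1-byte terminator */
    assert(ucs2_buf_size(ucs2) == 6);  /* 2 units + 2-byte terminator */

    /* A byte-wise strlen() stops early on UCS-2 data, because the
     * 16-bit encoding of 'A' already contains a zero byte. */
    assert(strlen((const char *)ucs2) < 2);
}
```

If that assumption holds for every charset in fs/nls, only the
UCS-2/UTF-16 side of a conversion would need the extra terminator byte.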


Thanks,

--
Suresh Jayaraman