2002-11-05 10:45:12

by Samium Gromoff

Subject: [RFC] FS charset conversions

The root of the problem lies in the fact that in some languages (notably
Russian) there is more than one widely used charset. In Russian, for
example, there are koi8-r, iso8859-5, cp866 and the infamous but widely
used MS cp1251.

Trouble arises as soon as you need to access data whose names use the
upper half of the 8-bit table (bytes 0x80-0xFF). For example, the
situation I have here is that smbd provides a public share and people
create files there with names in the cp1251 encoding. Since my system
default charset is koi8-r, I naturally see garbage.
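
To make the mismatch concrete, here is a minimal sketch in Python. The filename is a made-up example, not one taken from the share described above; the codec names are the ones shipped with Python's standard library. It shows the same bytes read under two charsets, and the per-name conversion that the proposal would have to apply.

```python
# Hypothetical filename, chosen only for illustration.
name = "Привет.txt"

# Bytes as a cp1251 client would store them on the share.
raw = name.encode("cp1251")

# What a koi8-r system displays when it interprets those same bytes.
garbled = raw.decode("koi8_r")

# The conversion being asked for: decode with the source charset,
# then re-encode in the charset the local system expects.
converted = raw.decode("cp1251").encode("koi8_r")

print(garbled)                      # unreadable, but the ASCII ".txt" survives
print(converted.decode("koi8_r"))   # the original name again
```

Only the bytes above 0x7F change meaning between the two charsets, which is why plain ASCII names never show the problem.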

The proposed and seemingly natural solution is to add the ability
to mount --bind a subtree with a filename charset conversion applied.


regards, Samium Gromoff



2002-11-05 11:24:59

by Alan

Subject: Re: [RFC] FS charset conversions

On Tue, 2002-11-05 at 10:51, Samium Gromoff wrote:
> The proposed and seemingly natural solution is to add the ability
> to mount --bind a subtree with a filename charset conversion applied.

The traditional Unix approach is to declare the universe UTF-8. No
single character set is the right answer; UTF-8 preserves "/" and \0
semantics, so it works very well indeed.
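
The "/" and \0 point can be checked with a short sketch. This is an illustration under stated assumptions, not something from the original message: the helper name and the sample path are hypothetical. In UTF-8 every byte of a multi-byte sequence has the high bit set, so the bytes 0x2F and 0x00 only ever appear for a real "/" or NUL; UTF-16 is shown as a contrast.

```python
def utf8_keeps_separators(text: str) -> bool:
    """Return True if '/' (0x2F) and NUL (0x00) bytes occur in the UTF-8
    encoding exactly as often as those characters occur in the text."""
    encoded = text.encode("utf-8")
    return (encoded.count(b"/") == text.count("/")
            and encoded.count(b"\x00") == text.count("\x00"))

# Hypothetical path with Cyrillic components.
sample = "каталог/файл.txt"
print(utf8_keeps_separators(sample))

# Every byte of a non-ASCII character's UTF-8 encoding is >= 0x80.
print(all(b >= 0x80 for b in "файл".encode("utf-8")))

# Contrast: in UTF-16-LE the letter U+042F contains a raw 0x2F byte,
# which a byte-oriented path parser would mistake for '/'.
print(b"/" in "Я".encode("utf-16-le"))
```

This is why existing byte-oriented path handling can pass UTF-8 names through without knowing anything about the encoding.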

2002-11-05 12:19:00

by Alexander Viro

Subject: Re: [RFC] FS charset conversions



On Tue, 5 Nov 2002, Samium Gromoff wrote:

> The root of the problem lies in the fact that in some languages (notably
> Russian) there is more than one widely used charset. In Russian, for
> example, there are koi8-r, iso8859-5, cp866 and the infamous but widely
> used MS cp1251.
>
> Trouble arises as soon as you need to access data whose names use the
> upper half of the 8-bit table (bytes 0x80-0xFF). For example, the
> situation I have here is that smbd provides a public share and people
> create files there with names in the cp1251 encoding. Since my system
> default charset is koi8-r, I naturally see garbage.
>
> The proposed and seemingly natural solution is to add the ability
> to mount --bind a subtree with a filename charset conversion applied.

That will not work. Bindings do _NOT_ create an extra superblock, dentry
tree, etc., and they are invisible to the filesystem, e.g. to ->readdir().

(Besides, filesystems playing with case conversions are bad enough; now
let's have the VFS try charset ones?)