2006-05-22 21:18:53

by Steve French

[permalink] [raw]
Subject: charset2upper broken

Charset2upper is broken, at least for utf8 (see line 41 of nls_utf8.c)
Seems straightforward to fix it for the key characters a-z (0x61-0x7a),
unless the uppercasing rules are stranger than I think - especially
since other places have it right e.g. nls_base.c seems to have it right
in its charset2upper.

I need to uppercase passwords for cifs to be able to mount to older
servers (e.g. win9x and OS/2) but since I can't tell that
utf8->charset2upper is broken I can't know when to fall back to the
simpleminded way of uppercasing that smbfs does.

I wish we could make theis charset2upper (and charset2lower probably)
optional so it could be set to null when broken or at least return an
error for those cp that are broken such as utf8 so we could fall back
... (and unfortunately utf8 is the default ...)

Apparently this breaks other guys too see below ...

On Sat, Oct 29, 2005 at 12:07:40AM +0900, OGAWA Hirofumi wrote:
>/ OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx> writes:/
>/ /
>/ > Horms <horms@xxxxxxxxxxxx> writes:/
>/ >/
>/ >> static struct nls_table table = {/
>/ >> .charset = "utf8",/
>/ >> .uni2char = uni2char,/
>/ >> .char2uni = char2uni,/
>/ >> .charset2lower = identity, /* no conversion *//
>/ >> .charset2upper = identity,/
>/ >> .owner = THIS_MODULE,/
>/ >> };/
>/ >>/
>/ >> I guess it is charset2lower or charset2upper that vfat is calling,/
>/ >> which make no conversion, thus leading to the problem I outlined
above./
>/ >>/
>/ >> My question is: Is this behaviour correct, or is it a bug?/
>/ >/
>/ > This is known bug. For fixing this bug cleanly, we will need to much/
>/ > change the both of nls and filesystems./
>/ /
>/ And fatfs has "utf8" option, probably the behavior is preferable than/
>/ "iocharset=utf8". However, unfortunately "utf8" has problem too./



2006-05-24 04:31:36

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: charset2upper broken

Steve French wrote:
> Charset2upper is broken, at least for utf8 (see line 41 of nls_utf8.c)
> Seems straightforward to fix it for the key characters a-z (0x61-0x7a),
> unless the uppercasing rules are stranger than I think - especially
> since other places have it right e.g. nls_base.c seems to have it right
> in its charset2upper.

<troll>
Don't use UTF-8. Neither the kernel nor userspace is fully ready.

Also, it seems wrong to put such comples thing as a complete UNICODE upper/lower
case mapping into the kernel, especially since this mapping is different for
Turkish and non-Turkish cases (see
http://www.i18nguy.com/unicode/turkish-i18n.html). So someone should convert all
filesystems that use character conversion and case mapping to FUSE, so that they
can use glibc to do all of this dirty/complex work.
</troll>

--
Alexander E. Patrakov