2007-02-19 15:53:45

by Pierre Ossman

Subject: Racy NLS behaviour in FAT (and possible other fs)

Hi,

I'm seeing some rather odd behaviour with the character set conversion. If I
mount a vfat fs with utf8 and then create a file whose name contains invalid
UTF-8 sequences, the file briefly exists under the invalid name and then quickly
switches to a stripped version.

I haven't found an easy way to catch the race, but if I have nautilus open it
tends to catch it now and then (I get a file name with "<?>" replacing each bad
byte).
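
To make "invalid UTF-8 sequences" concrete, here is a minimal userspace sketch in
Python. The byte string BAD_NAME and the helper names are made up for
illustration, and the stripping and replacing only approximate what is visible
from userspace; this is not the kernel's NLS code.

```python
# Hypothetical file name: "bar", three bytes that can never occur in UTF-8, ".txt".
BAD_NAME = b"bar\xff\xfe\xfd.txt"


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_invalid(name: bytes) -> bytes:
    """Drop every byte that is not part of a valid UTF-8 sequence."""
    return name.decode("utf-8", errors="ignore").encode("utf-8")


def show_replaced(name: bytes) -> str:
    """Replace each undecodable byte with U+FFFD, as a file manager might display it."""
    return name.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(is_valid_utf8(BAD_NAME))         # False
    print(strip_invalid(BAD_NAME))         # b'bar.txt'
    print(ascii(show_replaced(BAD_NAME)))  # 'bar\ufffd\ufffd\ufffd.txt'
```

The stripped form corresponds to the name the file ends up with, and the
replaced form to the "<?>" glyphs seen during the race.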

The race also seems to corrupt the in-memory state of the fs now and then. I
managed to create a file where "ls" shows "?" for most fields. Data seemed to
have made it to disk ok though (fsck didn't complain and a remount showed
everything as it should be).

Third, there seems to be a problem with not all syscalls being subjected to the
NLS transformation. Example:

$ echo foo > bar???.txt
$ ls
foo.txt
$ echo foo > bar???.txt
bash: bar???.txt: File exists
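
The same check can be scripted. This is a rough sketch in Python, assuming a
scratch directory on the vfat mount; the function name probe_name and the
result keys are invented for illustration. It opens the file the way a shell
"> file" redirection does, checks whether the directory lists the name as
given, and then opens it a second time.

```python
import errno
import os


def probe_name(directory: bytes, name: bytes) -> dict:
    """Create `name` twice and report how the file system treats it."""
    path = os.path.join(directory, name)
    # Same flags a shell uses for "> file": create if missing, truncate if present.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    result = {"listed_as_created": False, "second_open_error": None}

    os.close(os.open(path, flags, 0o644))
    # With a bytes argument, os.listdir() returns the raw names as bytes.
    result["listed_as_created"] = name in os.listdir(directory)

    try:
        os.close(os.open(path, flags, 0o644))
    except OSError as exc:
        # Record the symbolic errno name, e.g. "EEXIST" for "File exists".
        result["second_open_error"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return result
```

On a file system that stores names unchanged, both opens succeed and the name
is listed as created. A result with listed_as_created False or a non-empty
second_open_error would match the mismatch shown above.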

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org


2007-02-19 16:55:50

by OGAWA Hirofumi

Subject: Re: Racy NLS behaviour in FAT (and possible other fs)

Pierre Ossman writes:

> Hi,

Hi,

> I'm seeing some rather odd behaviour with the character set
> conversion. If I mount a vfat fs with utf8 and then create a file
> whose name contains invalid UTF-8 sequences, the file briefly exists
> under the invalid name and then quickly switches to a stripped version.

Yes. utf8 support is broken, and it fails to convert letter case
in many cases. That is why it is not recommended.

> I haven't found an easy way to catch the race, but if I have
> nautilus open it tends to catch it now and then (I get a file name
> with "<?>" replacing each bad byte).

`?' means the character couldn't be converted.

> The race also seems to corrupt the in-memory state of the fs now and then. I
> managed to create a file where "ls" shows "?" for most fields. Data seemed to
> have made it to disk ok though (fsck didn't complain and a remount showed
> everything as it should be).

It seems readdir() succeeded but stat() failed. Also, dosfsck
doesn't check filenames correctly.

Those problems seem to be related to the poor utf8 support...
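
One way to look for such entries from userspace is to list a directory and try
to stat each name. This is only a sketch in Python; the function name
find_unstatable is made up for illustration. Entries that readdir() returns
but lstat() rejects are the ones "ls" would show with "?" in most fields.

```python
import os


def find_unstatable(directory: bytes) -> list:
    """Return (name, error message) pairs for entries that list but fail lstat()."""
    failures = []
    # A bytes argument makes os.listdir() return raw names without decoding them.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            os.lstat(path)
        except OSError as exc:
            failures.append((name, exc.strerror or str(exc)))
    return failures
```

An empty list means every listed name could also be looked up.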

> Third, there seems to be a problem with not all syscalls being subjected to the
> NLS transformation. Example:
>
> $ echo foo > bar???.txt
> $ ls
> foo.txt
> $ echo foo > bar???.txt
> bash: bar???.txt: File exists
--
OGAWA Hirofumi

2007-02-19 17:03:03

by Pierre Ossman

Subject: Re: Racy NLS behaviour in FAT (and possible other fs)

OGAWA Hirofumi wrote:
>> I'm seeing some rather odd behaviour with the character set
>> conversion. If I mount a vfat fs with utf8 and then create a file
>> whose name contains invalid UTF-8 sequences, the file briefly exists
>> under the invalid name and then quickly switches to a stripped version.
>>
>
> Yes. utf8 support is broken, and it fails to convert letter case
> in many cases. That is why it is not recommended.
>
>

Is there any ongoing work to fix this? UTF-8 is standard on more or less
every distribution these days.

Rgds

--
-- Pierre Ossman


2007-02-19 17:32:12

by OGAWA Hirofumi

Subject: Re: Racy NLS behaviour in FAT (and possible other fs)

Pierre Ossman writes:

> OGAWA Hirofumi wrote:
>>> I'm seeing some rather odd behaviour with the character set
>>> conversion. If I mount a vfat fs with utf8 and then create a file
>>> whose name contains invalid UTF-8 sequences, the file briefly exists
>>> under the invalid name and then quickly switches to a stripped version.
>>
>> Yes. utf8 support is broken, and it fails to convert letter case
>> in many cases. That is why it is not recommended.
>
> Is there any ongoing work to fix this? UTF-8 is standard on more or less
> every distribution these days.

Yes. But sorry, I have neither a plan nor the time to fix it right now.

If you are using "iocharset=utf8" now, "codepage=cp???,iocharset=xxx,utf8"
might help a bit.
--
OGAWA Hirofumi

2007-02-20 06:42:56

by Pierre Ossman

Subject: Re: Racy NLS behaviour in FAT (and possible other fs)

OGAWA Hirofumi wrote:
> Yes. But sorry, I have neither a plan nor the time to fix it right now.
>

I know the feeling. :)

I just wanted to know where things stand as I have little insight into
vfat development.

> If you are using "iocharset=utf8" now, "codepage=cp???,iocharset=xxx,utf8"
> might help a bit.
>

These tests were done with the "utf8" option.

Rgds

--
-- Pierre Ossman
