2006-01-08 20:21:57

by Alexey Dobriyan

Subject: [PATCH] It's UTF-8

Signed-off-by: Alexey Dobriyan <[email protected]>
---

Documentation/filesystems/isofs.txt | 4 ++--
Documentation/filesystems/jfs.txt | 2 +-
Documentation/filesystems/vfat.txt | 6 +++---
fs/befs/linuxvfs.c | 2 +-
fs/cifs/CHANGES | 2 +-
fs/fat/dir.c | 2 +-
fs/fat/inode.c | 2 +-
fs/isofs/joliet.c | 2 +-
fs/nls/Kconfig | 2 +-
include/asm-mips/termbits.h | 2 +-
include/linux/msdos_fs.h | 2 +-
11 files changed, 14 insertions(+), 14 deletions(-)

--- a/Documentation/filesystems/isofs.txt
+++ b/Documentation/filesystems/isofs.txt
@@ -9,9 +9,9 @@ when using discs encoded using Microsoft
iocharset=name Character set to use for converting from Unicode to
ASCII. Joliet filenames are stored in Unicode format, but
Unix for the most part doesn't know how to deal with Unicode.
- There is also an option of doing UTF8 translations with the
+ There is also an option of doing UTF-8 translations with the
utf8 option.
- utf8 Encode Unicode names in UTF8 format. Default is no.
+ utf8 Encode Unicode names in UTF-8 format. Default is no.

Mount options unique to the isofs filesystem.
block=512 Set the block size for the disk to 512 bytes
--- a/Documentation/filesystems/jfs.txt
+++ b/Documentation/filesystems/jfs.txt
@@ -6,7 +6,7 @@ The following mount options are supporte

iocharset=name Character set to use for converting from Unicode to
ASCII. The default is to do no conversion. Use
- iocharset=utf8 for UTF8 translations. This requires
+ iocharset=utf8 for UTF-8 translations. This requires
CONFIG_NLS_UTF8 to be set in the kernel .config file.
iocharset=none specifies the default behavior explicitly.

--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -28,16 +28,16 @@ iocharset=name -- Character set to use f
know how to deal with Unicode.
By default, FAT_DEFAULT_IOCHARSET setting is used.

- There is also an option of doing UTF8 translations
+ There is also an option of doing UTF-8 translations
with the utf8 option.

NOTE: "iocharset=utf8" is not recommended. If unsure,
you should consider the following option instead.

-utf8=<bool> -- UTF8 is the filesystem safe version of Unicode that
+utf8=<bool> -- UTF-8 is the filesystem safe version of Unicode that
is used by the console. It can be be enabled for the
filesystem with this option. If 'uni_xlate' gets set,
- UTF8 gets disabled.
+ UTF-8 gets disabled.

uni_xlate=<bool> -- Translate unhandled Unicode characters to special
escaped sequences. This would let you backup and
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -561,7 +561,7 @@ befs_utf2nls(struct super_block *sb, con
* @sb: Superblock
* @src: Input string buffer in NLS format
* @srclen: Length of input string in bytes
- * @dest: The output string in UTF8 format
+ * @dest: The output string in UTF-8 format
* @destlen: Length of the output buffer
*
* Converts input string @src, which is in the format of the loaded NLS map,
--- a/fs/cifs/CHANGES
+++ b/fs/cifs/CHANGES
@@ -150,7 +150,7 @@ improperly zeroed buffer in CIFS Unix ex
Version 1.25
------------
Fix internationalization problem in cifs readdir with filenames that map to
-longer UTF8 strings than the string on the wire was in Unicode. Add workaround
+longer UTF-8 strings than the string on the wire was in Unicode. Add workaround
for readdir to netapp servers. Fix search rewind (seek into readdir to return
non-consecutive entries). Do not do readdir when server negotiates
buffer size to small to fit filename. Add support for reading POSIX ACLs from
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -114,7 +114,7 @@ static inline int fat_get_entry(struct i
}

/*
- * Convert Unicode 16 to UTF8, translated Unicode, or ASCII.
+ * Convert Unicode 16 to UTF-8, translated Unicode, or ASCII.
* If uni_xlate is enabled and we can't get a 1:1 conversion, use a
* colon as an escape character since it is normally invalid on the vfat
* filesystem. The following four characters are the hexadecimal digits
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -1016,7 +1016,7 @@ static int parse_options(char *options,
return -EINVAL;
}
}
- /* UTF8 doesn't provide FAT semantics */
+ /* UTF-8 doesn't provide FAT semantics */
if (!strcmp(opts->iocharset, "utf8")) {
printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
" for FAT filesystems, filesystem will be case sensitive!\n");
--- a/fs/isofs/joliet.c
+++ b/fs/isofs/joliet.c
@@ -11,7 +11,7 @@
#include "isofs.h"

/*
- * Convert Unicode 16 to UTF8 or ASCII.
+ * Convert Unicode 16 to UTF-8 or ASCII.
*/
static int
uni16_to_x8(unsigned char *ascii, u16 *uni, int len, struct nls_table *nls)
--- a/fs/nls/Kconfig
+++ b/fs/nls/Kconfig
@@ -491,7 +491,7 @@ config NLS_KOI8_U
(koi8-u) and Belarusian (koi8-ru) character sets.

config NLS_UTF8
- tristate "NLS UTF8"
+ tristate "NLS UTF-8"
depends on NLS
help
If you want to display filenames with native language characters
--- a/include/asm-mips/termbits.h
+++ b/include/asm-mips/termbits.h
@@ -77,7 +77,7 @@ struct termios {
#define IXANY 0004000 /* Any character will restart after stop. */
#define IXOFF 0010000 /* Enable start/stop input control. */
#define IMAXBEL 0020000 /* Ring bell when input queue is full. */
-#define IUTF8 0040000 /* Input is UTF8 */
+#define IUTF8 0040000 /* Input is UTF-8 */

/* c_oflag bits */
#define OPOST 0000001 /* Perform output processing. */
--- a/include/linux/msdos_fs.h
+++ b/include/linux/msdos_fs.h
@@ -199,7 +199,7 @@ struct fat_mount_options {
sys_immutable:1, /* set = system files are immutable */
dotsOK:1, /* set = hidden and system files are named '.filename' */
isvfat:1, /* 0=no vfat long filename support, 1=vfat support */
- utf8:1, /* Use of UTF8 character set (Default) */
+ utf8:1, /* Use of UTF-8 character set (Default) */
unicode_xlate:1, /* create escape sequences for unhandled Unicode */
numtail:1, /* Does first alias have a numeric '~1' type tail? */
atari:1, /* Use Atari GEMDOS variation of MS-DOS fs */


2006-01-08 21:46:30

by Jan Engelhardt

Subject: Re: [PATCH] It's UTF-8


>Signed-off-by: Alexey Dobriyan <[email protected]>

I'd say ACK. However,

> iocharset=name Character set to use for converting from Unicode to
> ASCII. The default is to do no conversion. Use
>- iocharset=utf8 for UTF8 translations. This requires
>+ iocharset=utf8 for UTF-8 translations. This requires
> CONFIG_NLS_UTF8 to be set in the kernel .config file.

If you are really nitpicky about the "-", then it should also be
"iocharset=utf-8" (and wherever else). Or what's the real purpose of
adding the dashes in only half of the places, then?



Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

2006-01-08 22:08:38

by Alexey Dobriyan

Subject: Re: [PATCH] It's UTF-8

On Sun, Jan 08, 2006 at 10:46:22PM +0100, Jan Engelhardt wrote:
> > iocharset=name Character set to use for converting from Unicode to
> > ASCII. The default is to do no conversion. Use
> >- iocharset=utf8 for UTF8 translations. This requires
> >+ iocharset=utf8 for UTF-8 translations. This requires
> > CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
> If you are really nitpicky about the "-", then it should also be
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

I don't want to be shot by everyone who has "iocharset=utf8" in
/etc/fstab.

2006-01-08 22:09:23

by Måns Rullgård

Subject: Re: [PATCH] It's UTF-8

Jan Engelhardt <[email protected]> writes:

>>Signed-off-by: Alexey Dobriyan <[email protected]>
>
> I'd say ACK. However,
>
>> iocharset=name Character set to use for converting from Unicode to
>> ASCII. The default is to do no conversion. Use
>>- iocharset=utf8 for UTF8 translations. This requires
>>+ iocharset=utf8 for UTF-8 translations. This requires
>> CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
> If you are really nitpicky about the "-", then it should also be
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

The patch only changes documentation/comments. Changing other things
would break compatibility, and that's usually not a good idea for
cosmetic changes.

--
Måns Rullgård
[email protected]

2006-01-08 22:10:10

by Alistair John Strachan

Subject: Re: [PATCH] It's UTF-8

On Sunday 08 January 2006 21:46, Jan Engelhardt wrote:
> >Signed-off-by: Alexey Dobriyan <[email protected]>
>
> I'd say ACK. However,
>
> > iocharset=name Character set to use for converting from Unicode to
> > ASCII. The default is to do no conversion. Use
> >- iocharset=utf8 for UTF8 translations. This requires
> >+ iocharset=utf8 for UTF-8 translations. This requires
> > CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
> If you are really nitpicky about the "-", then it should also be
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

Also, what is "Unicode 16", as used in several places in the kernel? Surely this
should be changed to UTF-16, which is the _encoding_ for the Unicode
character space.

--
Cheers,
Alistair.

'No sense being pessimistic, it probably wouldn't work anyway.'
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-01-09 08:27:44

by Alexander E. Patrakov

Subject: Re: [PATCH] It's UTF-8

Alexey Dobriyan wrote:

> if (!strcmp(opts->iocharset, "utf8")) {
> printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
> " for FAT filesystems, filesystem will be case sensitive!\n");

This warning would read better as:

FAT: this is not the recommended filesystem for use with UTF-8 filenames.

Reason: the utf8 IO charset is the only IO charset that displays
filenames properly in UTF-8 locales. So the choice is really between
case-sensitive filenames (iocharset=utf8) and completely unreadable
filenames (everything else).

--
Alexander E. Patrakov

2006-01-09 09:04:52

by Vojtech Pavlik

Subject: Re: [PATCH] It's UTF-8

On Sun, Jan 08, 2006 at 10:10:09PM +0000, Alistair John Strachan wrote:

> On Sunday 08 January 2006 21:46, Jan Engelhardt wrote:
> > >Signed-off-by: Alexey Dobriyan <[email protected]>
> >
> > I'd say ACK. However,
> >
> > > iocharset=name Character set to use for converting from Unicode to
> > > ASCII. The default is to do no conversion. Use
> > >- iocharset=utf8 for UTF8 translations. This requires
> > >+ iocharset=utf8 for UTF-8 translations. This requires
> > > CONFIG_NLS_UTF8 to be set in the kernel .config file.
> >
> > If you are really nitpicky about the "-", then it should also be
> > "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> > adding the dashes in only half of the places, then?
>
> Also, what is "Unicode 16", as used in several places in the kernel? Surely this
> should be changed to UTF-16, which is the _encoding_ for the Unicode
> character space.

It might also be UCS-2 and not UTF-16 in some places. They do differ.

--
Vojtech Pavlik
SuSE Labs, SuSE CR
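
To make the UCS-2 versus UTF-16 distinction raised above concrete, here is a
minimal sketch in C. It is not kernel code: the function names ucs2_encode()
and utf16_encode() are hypothetical and exist only for illustration. UCS-2 is
a fixed-width encoding that covers only code points up to U+FFFF, while UTF-16
represents code points above U+FFFF as a pair of surrogate code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one code point as UCS-2.
 * UCS-2 uses exactly one 16-bit unit and cannot represent anything
 * above U+FFFF. Returns the number of units written (1), or 0 if the
 * code point is not representable.
 */
static size_t ucs2_encode(uint32_t cp, uint16_t *out)
{
	assert(out != NULL);
	if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	out[0] = (uint16_t)cp;
	return 1;
}

/*
 * Hypothetical helper: encode one code point as UTF-16.
 * Code points below U+10000 take one unit; U+10000..U+10FFFF take a
 * high/low surrogate pair. Returns the number of units written
 * (1 or 2), or 0 for an invalid code point.
 */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
	assert(out != NULL);
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	if (cp < 0x10000) {
		out[0] = (uint16_t)cp;
		return 1;
	}
	cp -= 0x10000;                              /* 20 bits remain */
	out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
	out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
	return 2;
}
```

For code points up to U+FFFF the two functions produce the same single unit,
so a comment that says "Unicode 16" is ambiguous only for names containing
characters beyond that range. Which of the two a given kernel code path
actually implements would have to be checked per call site.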

2006-01-09 11:38:16

by Krzysztof Halasa

Subject: Re: [PATCH] It's UTF-8

"Alexander E. Patrakov" <[email protected]> writes:

> Alexey Dobriyan wrote:
>
>> if (!strcmp(opts->iocharset, "utf8")) {
>> printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
>> " for FAT filesystems, filesystem will be case sensitive!\n");
>
> This warning would read better as:
>
> FAT: this is not the recommended filesystem for use with UTF-8 filenames.
>
> Reason: the utf8 IO charset is the only IO charset that displays
> filenames properly in UTF-8 locales. So the choice is really between
> case-sensitive filenames (iocharset=utf8) and completely unreadable
> filenames (everything else).

And a UTF-8 locale seems to be the only really sane one today. I'd kill the
whole warning.
--
Krzysztof Halasa

2006-01-09 12:48:46

by Kalin KOZHUHAROV

Subject: Re: [PATCH] It's UTF-8

Jan Engelhardt wrote:
>>Signed-off-by: Alexey Dobriyan <[email protected]>
>
>
> I'd say ACK. However,
>
>
>>iocharset=name Character set to use for converting from Unicode to
>> ASCII. The default is to do no conversion. Use
>>- iocharset=utf8 for UTF8 translations. This requires
>>+ iocharset=utf8 for UTF-8 translations. This requires
>> CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
>
> If you are really nitpicky about the "-", then it should also be
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

glibc started it, AFAIR. So both utf8 and UTF-8 are generally accepted, but utf-8 is not that
widespread.

Kalin.

--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
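
One way the spelling variants discussed in this thread ("utf8", "UTF-8",
"utf-8") could all be accepted without breaking existing /etc/fstab entries is
to normalize the name before comparing it. The following is only a sketch
under that assumption: charset_normalize() and charset_equal() are
hypothetical helpers written for this illustration, not functions from the
kernel or from glibc, and nothing here describes how either project actually
matches charset names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy @name into @out in lowercase, dropping '-'
 * and '_' so that "UTF-8", "utf-8" and "utf8" all become "utf8".
 * Returns 0 on success, -1 if @out is too small.
 */
static int charset_normalize(const char *name, char *out, size_t outlen)
{
	size_t n = 0;

	assert(name != NULL && out != NULL);
	if (outlen == 0)
		return -1;
	for (; *name; name++) {
		unsigned char c = (unsigned char)*name;

		if (c == '-' || c == '_')
			continue;
		if (n + 1 >= outlen)	/* keep room for the terminator */
			return -1;
		out[n++] = (char)tolower(c);
	}
	out[n] = '\0';
	return 0;
}

/* Hypothetical helper: nonzero if both names normalize to the same string. */
static int charset_equal(const char *a, const char *b)
{
	char na[32], nb[32];

	if (charset_normalize(a, na, sizeof(na)) ||
	    charset_normalize(b, nb, sizeof(nb)))
		return 0;
	return strcmp(na, nb) == 0;
}
```

With such a comparison, "iocharset=utf8" would keep working and
"iocharset=utf-8" would mean the same thing, so the documentation spelling
and the option spelling would no longer need to agree.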

2006-01-09 18:44:22

by Xavier Bestel

Subject: Re: [PATCH] It's UTF-8

On Monday, 09 January 2006 at 12:38 +0100, Krzysztof Halasa wrote:
> "Alexander E. Patrakov" <[email protected]> writes:

> > FAT: this is not the recommended filesystem for use with UTF-8 filenames.
> >
> > Reason: the utf8 IO charset is the only IO charset that displays
> > filenames properly in UTF-8 locales. So the choice is really between
> > case-sensitive filenames (iocharset=utf8) and completely unreadable
> > filenames (everything else).
>
> And a UTF-8 locale seems to be the only really sane one today. I'd kill the
> whole warning.

... on Unix. But FAT is a sort of lingua franca of filesystems, and is
the only one understandable by every (embedded) OS. So you'd better stay
compatible with everyone else.

Xav


2006-01-10 00:12:15

by Krzysztof Halasa

Subject: Re: [PATCH] It's UTF-8

Xavier Bestel <[email protected]> writes:

>> And a UTF-8 locale seems to be the only really sane one today. I'd kill the
>> whole warning.
>
> ... on Unix. But FAT is a sort of lingua franca of filesystems, and is
> the only one understandable by every (embedded) OS. So you'd better stay
> compatible with everyone else.

You stay compatible. And you can even read files with national
characters in names.
--
Krzysztof Halasa