The nls_utf8 module is broken in several ways. Despite its name, it does
not support full UTF-8: it cannot handle 4-byte UTF-8 sequences, and its
tolower/toupper tables are not implemented at all. This makes it
unsuitable for case-insensitive filesystems and for UTF-16 filesystems
(for example, because it does not process UTF-16 surrogate pairs).
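To make the 4-byte limitation concrete, here is a minimal userspace sketch (not kernel code; encode_utf8() and encode_utf16() are names made up for this illustration). It shows that a code point above U+FFFF needs four UTF-8 bytes and a UTF-16 surrogate pair, so it cannot pass through an interface that carries one 16-bit unit per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written, or 0 for an invalid code point. */
static size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogates are not valid code points here */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/* Hypothetical helper: encode one code point as UTF-16 code units.
 * Returns the number of units written, or 0 for an invalid code point. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;
		out[0] = (uint16_t)cp;
		return 1;
	}
	if (cp <= 0x10FFFF) {
		cp -= 0x10000;
		out[0] = (uint16_t)(0xD800 | (cp >> 10));	/* high surrogate */
		out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));	/* low surrogate */
		return 2;
	}
	return 0;
}
```

For example, U+1F600 becomes the four bytes F0 9F 98 80 in UTF-8 and the pair D83D DE00 in UTF-16, while U+00E9 needs two UTF-8 bytes and a single UTF-16 unit.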
This RFC patch series unifies and fixes the iocharset=utf8 mount option
in all fs drivers and converts all remaining fs drivers to use the
utf8s_to_utf16s(), utf16s_to_utf8s(), utf8_to_utf32() and utf32_to_utf8()
functions for implementing UTF-8 support instead of nls_utf8.
In the end, this allows the broken nls_utf8 module to be dropped completely.
For more details, see the email thread where the fs unification was discussed:
https://lore.kernel.org/linux-fsdevel/20200102211855.gg62r7jshp742d6i@pali/t/#u
This patch series is mostly untested and is presented as an RFC. Please
let me know what you think about it and whether this is the correct way
to fix the broken UTF-8 support in fs drivers. As explained in the email
thread above, I think it does not make sense to try to fix the whole NLS
framework; it is easier to just drop the nls_utf8 module.
Note: this patch series does not address the UTF-8 FAT case-sensitivity issue:
https://lore.kernel.org/linux-fsdevel/20200119221455.bac7dc55g56q2l4r@pali/
Pali Rohár (20):
fat: Fix iocharset=utf8 mount option
hfsplus: Add iocharset= mount option as alias for nls=
udf: Fix iocharset=utf8 mount option
isofs: joliet: Fix iocharset=utf8 mount option
ntfs: Undeprecate iocharset= mount option
ntfs: Fix error processing when load_nls() fails
befs: Fix printing iocharset= mount option
befs: Rename enum value Opt_charset to Opt_iocharset to match mount
option
befs: Fix error processing when load_nls() fails
befs: Allow to use native UTF-8 mode
hfs: Explicitly set hsb->nls_disk when hsb->nls_io is set
hfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
hfsplus: Do not use broken utf8 NLS table for iocharset=utf8 mount
option
jfs: Remove custom iso8859-1 implementation
jfs: Fix buffer overflow in jfs_strfromUCS_le() function
jfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
ntfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
cifs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
cifs: Remove usage of load_nls_default() calls
nls: Drop broken nls_utf8 module
fs/befs/linuxvfs.c | 22 ++++---
fs/cifs/cifs_unicode.c | 128 +++++++++++++++++++++++-------------
fs/cifs/cifs_unicode.h | 2 +-
fs/cifs/cifsfs.c | 2 +
fs/cifs/cifssmb.c | 8 +--
fs/cifs/connect.c | 8 ++-
fs/cifs/dfs_cache.c | 24 +++----
fs/cifs/dir.c | 28 ++++++--
fs/cifs/smb2pdu.c | 17 ++---
fs/cifs/winucase.c | 14 ++--
fs/fat/Kconfig | 15 -----
fs/fat/dir.c | 17 ++---
fs/fat/fat.h | 22 +++++++
fs/fat/inode.c | 28 ++++----
fs/fat/namei_vfat.c | 26 ++++++--
fs/hfs/super.c | 62 ++++++++++++++---
fs/hfs/trans.c | 62 +++++++++--------
fs/hfsplus/dir.c | 6 +-
fs/hfsplus/options.c | 39 ++++++-----
fs/hfsplus/super.c | 7 +-
fs/hfsplus/unicode.c | 31 ++++++++-
fs/hfsplus/xattr.c | 14 ++--
fs/hfsplus/xattr_security.c | 3 +-
fs/isofs/inode.c | 27 ++++----
fs/isofs/isofs.h | 1 -
fs/isofs/joliet.c | 4 +-
fs/jfs/jfs_dtree.c | 13 +++-
fs/jfs/jfs_unicode.c | 35 +++++-----
fs/jfs/jfs_unicode.h | 2 +-
fs/jfs/super.c | 29 ++++++--
fs/nls/Kconfig | 9 ---
fs/nls/Makefile | 1 -
fs/nls/nls_utf8.c | 67 -------------------
fs/ntfs/dir.c | 6 +-
fs/ntfs/inode.c | 5 +-
fs/ntfs/super.c | 60 ++++++++---------
fs/ntfs/unistr.c | 28 +++++++-
fs/udf/super.c | 50 ++++++--------
fs/udf/udf_sb.h | 2 -
fs/udf/unicode.c | 4 +-
40 files changed, 510 insertions(+), 418 deletions(-)
delete mode 100644 fs/nls/nls_utf8.c
--
2.20.1
Currently the iocharset=utf8 mount option is broken and a warning is
printed to dmesg when it is used. To use UTF-8 as the iocharset, the
utf8=1 mount option is required.
Fix the iocharset=utf8 mount option so that it is equivalent to the
utf8=1 mount option, and remove the message printed to dmesg.
FAT is by definition case-insensitive, but the current Linux
implementation is case-sensitive for non-ASCII characters when UTF-8 is
used. This patch does not change that behavior; it only adds comments
about it to the new fat_utf8_strnicmp() function and to vfat_hashi().
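The limitation can be illustrated with a small userspace sketch modelled on the fat_ascii_to_lower() and fat_utf8_strnicmp() helpers that this patch adds to fs/fat/fat.h (ascii_to_lower() and utf8_ascii_strnicmp() below are local stand-ins, not kernel API). Folding only the bytes 'A'..'Z' leaves every multi-byte UTF-8 sequence untouched, so names that differ only in the case of a non-ASCII letter still compare as different.

```c
#include <assert.h>
#include <stddef.h>

/* Lowercase a byte only if it is a 7-bit ASCII capital letter. */
static unsigned char ascii_to_lower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + 32) : c;
}

/* Returns 0 when the first len bytes match after ASCII-only folding,
 * 1 otherwise. Bytes >= 0x80 (UTF-8 lead and continuation bytes) are
 * compared verbatim. */
static int utf8_ascii_strnicmp(const char *a, const char *b, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (ascii_to_lower((unsigned char)a[i]) !=
		    ascii_to_lower((unsigned char)b[i]))
			return 1;
	}
	return 0;
}
```

With this folding, "README" and "readme" match, but "caf\xC3\xA9" (ending in U+00E9) and "CAF\xC3\x89" (ending in U+00C9) do not, which is the case-sensitivity issue that this series leaves open.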
After this patch iocharset=utf8 works, so there is no need for a
separate FAT_DEFAULT_UTF8 config option: setting FAT_DEFAULT_IOCHARSET
to utf8 now works as well. Remove the redundant FAT_DEFAULT_UTF8 option.
Signed-off-by: Pali Rohár <[email protected]>
---
fs/fat/Kconfig | 15 ---------------
fs/fat/dir.c | 17 +++++++----------
fs/fat/fat.h | 22 ++++++++++++++++++++++
fs/fat/inode.c | 28 +++++++++++-----------------
fs/fat/namei_vfat.c | 26 +++++++++++++++++++-------
5 files changed, 59 insertions(+), 49 deletions(-)
diff --git a/fs/fat/Kconfig b/fs/fat/Kconfig
index 66532a71e8fd..a31594137d5e 100644
--- a/fs/fat/Kconfig
+++ b/fs/fat/Kconfig
@@ -100,18 +100,3 @@ config FAT_DEFAULT_IOCHARSET
Enable any character sets you need in File Systems/Native Language
Support.
-
-config FAT_DEFAULT_UTF8
- bool "Enable FAT UTF-8 option by default"
- depends on VFAT_FS
- default n
- help
- Set this if you would like to have "utf8" mount option set
- by default when mounting FAT filesystems.
-
- Even if you say Y here can always disable UTF-8 for
- particular mount by adding "utf8=0" to mount options.
-
- Say Y if you use UTF-8 encoding for file names, N otherwise.
-
- See <file:Documentation/filesystems/vfat.rst> for more information.
diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index c4a274285858..49fe8dc6e5f0 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -33,11 +33,6 @@
#define FAT_MAX_UNI_CHARS ((MSDOS_SLOTS - 1) * 13 + 1)
#define FAT_MAX_UNI_SIZE (FAT_MAX_UNI_CHARS * sizeof(wchar_t))
-static inline unsigned char fat_tolower(unsigned char c)
-{
- return ((c >= 'A') && (c <= 'Z')) ? c+32 : c;
-}
-
static inline loff_t fat_make_i_pos(struct super_block *sb,
struct buffer_head *bh,
struct msdos_dir_entry *de)
@@ -258,10 +253,12 @@ static inline int fat_name_match(struct msdos_sb_info *sbi,
if (a_len != b_len)
return 0;
- if (sbi->options.name_check != 's')
- return !nls_strnicmp(sbi->nls_io, a, b, a_len);
- else
+ if (sbi->options.name_check == 's')
return !memcmp(a, b, a_len);
+ else if (sbi->options.utf8)
+ return !fat_utf8_strnicmp(a, b, a_len);
+ else
+ return !nls_strnicmp(sbi->nls_io, a, b, a_len);
}
enum { PARSE_INVALID = 1, PARSE_NOT_LONGNAME, PARSE_EOF, };
@@ -384,7 +381,7 @@ static int fat_parse_short(struct super_block *sb,
de->lcase & CASE_LOWER_BASE);
if (chl <= 1) {
if (!isvfat)
- ptname[i] = nocase ? c : fat_tolower(c);
+ ptname[i] = nocase ? c : fat_ascii_to_lower(c);
i++;
if (c != ' ') {
name_len = i;
@@ -421,7 +418,7 @@ static int fat_parse_short(struct super_block *sb,
if (chl <= 1) {
k++;
if (!isvfat)
- ptname[i] = nocase ? c : fat_tolower(c);
+ ptname[i] = nocase ? c : fat_ascii_to_lower(c);
i++;
if (c != ' ') {
name_len = i;
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 02d4d4234956..0cd15fb3b042 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -310,6 +310,28 @@ static inline void fatwchar_to16(__u8 *dst, const wchar_t *src, size_t len)
#endif
}
+static inline unsigned char fat_ascii_to_lower(unsigned char c)
+{
+ return ((c >= 'A') && (c <= 'Z')) ? c+32 : c;
+}
+
+static inline int fat_utf8_strnicmp(const unsigned char *a,
+ const unsigned char *b,
+ int len)
+{
+ int i;
+
+ /*
+ * FIXME: UTF-8 doesn't provide FAT semantics
+ * Case-insensitive support is only for 7-bit ASCII characters
+ */
+ for (i = 0; i < len; i++) {
+ if (fat_ascii_to_lower(a[i]) != fat_ascii_to_lower(b[i]))
+ return 1;
+ }
+ return 0;
+}
+
/* fat/cache.c */
extern void fat_cache_inval_inode(struct inode *inode);
extern int fat_get_cluster(struct inode *inode, int cluster,
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index de0c9b013a85..f8c8a739f8f0 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -957,7 +957,9 @@ static int fat_show_options(struct seq_file *m, struct dentry *root)
/* strip "cp" prefix from displayed option */
seq_printf(m, ",codepage=%s", &sbi->nls_disk->charset[2]);
if (isvfat) {
- if (sbi->nls_io)
+ if (opts->utf8)
+ seq_printf(m, ",iocharset=utf8");
+ else if (sbi->nls_io)
seq_printf(m, ",iocharset=%s", sbi->nls_io->charset);
switch (opts->shortname) {
@@ -994,8 +996,6 @@ static int fat_show_options(struct seq_file *m, struct dentry *root)
if (opts->nocase)
seq_puts(m, ",nocase");
} else {
- if (opts->utf8)
- seq_puts(m, ",utf8");
if (opts->unicode_xlate)
seq_puts(m, ",uni_xlate");
if (!opts->numtail)
@@ -1157,8 +1157,6 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
opts->errors = FAT_ERRORS_RO;
*debug = 0;
- opts->utf8 = IS_ENABLED(CONFIG_FAT_DEFAULT_UTF8) && is_vfat;
-
if (!options)
goto out;
@@ -1319,10 +1317,14 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
| VFAT_SFN_CREATE_WIN95;
break;
case Opt_utf8_no: /* 0 or no or false */
- opts->utf8 = 0;
+ fat_reset_iocharset(opts);
break;
case Opt_utf8_yes: /* empty or 1 or yes or true */
- opts->utf8 = 1;
+ fat_reset_iocharset(opts);
+ iocharset = kstrdup("utf8", GFP_KERNEL);
+ if (!iocharset)
+ return -ENOMEM;
+ opts->iocharset = iocharset;
break;
case Opt_uni_xl_no: /* 0 or no or false */
opts->unicode_xlate = 0;
@@ -1360,18 +1362,11 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
}
out:
- /* UTF-8 doesn't provide FAT semantics */
- if (!strcmp(opts->iocharset, "utf8")) {
- fat_msg(sb, KERN_WARNING, "utf8 is not a recommended IO charset"
- " for FAT filesystems, filesystem will be "
- "case sensitive!");
- }
+ opts->utf8 = !strcmp(opts->iocharset, "utf8") && is_vfat;
/* If user doesn't specify allow_utime, it's initialized from dmask. */
if (opts->allow_utime == (unsigned short)-1)
opts->allow_utime = ~opts->fs_dmask & (S_IWGRP | S_IWOTH);
- if (opts->unicode_xlate)
- opts->utf8 = 0;
if (opts->nfs == FAT_NFS_NOSTALE_RO) {
sb->s_flags |= SB_RDONLY;
sb->s_export_op = &fat_export_ops_nostale;
@@ -1832,8 +1827,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
goto out_fail;
}
- /* FIXME: utf8 is using iocharset for upper/lower conversion */
- if (sbi->options.isvfat) {
+ if (sbi->options.isvfat && !sbi->options.utf8) {
sbi->nls_io = load_nls(sbi->options.iocharset);
if (!sbi->nls_io) {
fat_msg(sb, KERN_ERR, "IO charset %s not found",
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 5369d82e0bfb..efb3cb9ea8a8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -134,6 +134,7 @@ static int vfat_hash(const struct dentry *dentry, struct qstr *qstr)
static int vfat_hashi(const struct dentry *dentry, struct qstr *qstr)
{
struct nls_table *t = MSDOS_SB(dentry->d_sb)->nls_io;
+ int utf8 = MSDOS_SB(dentry->d_sb)->options.utf8;
const unsigned char *name;
unsigned int len;
unsigned long hash;
@@ -142,8 +143,17 @@ static int vfat_hashi(const struct dentry *dentry, struct qstr *qstr)
len = vfat_striptail_len(qstr);
hash = init_name_hash(dentry);
- while (len--)
- hash = partial_name_hash(nls_tolower(t, *name++), hash);
+ if (utf8) {
+ /*
+ * FIXME: UTF-8 doesn't provide FAT semantics
+ * Case-insensitive support is only for 7-bit ASCII characters
+ */
+ while (len--)
+ hash = partial_name_hash(fat_ascii_to_lower(*name++), hash);
+ } else {
+ while (len--)
+ hash = partial_name_hash(nls_tolower(t, *name++), hash);
+ }
qstr->hash = end_name_hash(hash);
return 0;
@@ -156,16 +166,18 @@ static int vfat_cmpi(const struct dentry *dentry,
unsigned int len, const char *str, const struct qstr *name)
{
struct nls_table *t = MSDOS_SB(dentry->d_sb)->nls_io;
+ int utf8 = MSDOS_SB(dentry->d_sb)->options.utf8;
unsigned int alen, blen;
/* A filename cannot end in '.' or we treat it like it has none */
alen = vfat_striptail_len(name);
blen = __vfat_striptail_len(len, str);
- if (alen == blen) {
- if (nls_strnicmp(t, name->name, str, alen) == 0)
- return 0;
- }
- return 1;
+ if (alen != blen)
+ return 1;
+ else if (utf8)
+ return fat_utf8_strnicmp(name->name, str, alen);
+ else
+ return nls_strnicmp(t, name->name, str, alen);
}
/*
--
2.20.1
Ensure that the charset specified in the iocharset= mount option is
actually used. If load_nls() fails, report the error and propagate the
failure back to the caller instead of falling back to the previously
loaded charset.
Signed-off-by: Pali Rohár <[email protected]>
---
fs/ntfs/super.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index 02de1aa05b7c..69c7871b742e 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -94,7 +94,7 @@ static bool parse_options(ntfs_volume *vol, char *opt)
umode_t fmask = (umode_t)-1, dmask = (umode_t)-1;
int mft_zone_multiplier = -1, on_errors = -1;
int show_sys_files = -1, case_sensitive = -1, disable_sparse = -1;
- struct nls_table *nls_map = NULL, *old_nls;
+ struct nls_table *nls_map = NULL;
/* I am lazy... (-8 */
#define NTFS_GETOPT_WITH_DEFAULT(option, variable, default_value) \
@@ -195,20 +195,12 @@ static bool parse_options(ntfs_volume *vol, char *opt)
if (!v || !*v)
goto needs_arg;
use_utf8:
- old_nls = nls_map;
+ unload_nls(nls_map);
nls_map = load_nls(v);
if (!nls_map) {
- if (!old_nls) {
- ntfs_error(vol->sb, "NLS character set "
- "%s not found.", v);
- return false;
- }
- ntfs_error(vol->sb, "NLS character set %s not "
- "found. Using previous one %s.",
- v, old_nls->charset);
- nls_map = old_nls;
- } else /* nls_map */ {
- unload_nls(old_nls);
+ ntfs_error(vol->sb, "NLS character set "
+ "%s not found.", v);
+ return false;
}
} else if (!strcmp(p, "utf8")) {
bool val = false;
--
2.20.1
The jfs_strfromUCS_le() function writes at an unknown offset into a
buffer allocated by __get_free_page(GFP_KERNEL), so it cannot assume that
at least NLS_MAX_CHARSET_SIZE bytes remain before the end of that buffer.
Fix this by adding a new maxlen parameter to jfs_strfromUCS_le() and
using it to pass the remaining size of the buffer, which prevents a
buffer overflow in the kernel.
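The bounded-output pattern can be sketched in userspace as follows. This is only an illustration under stated assumptions: ucs_to_latin1_bounded() is a made-up stand-in for jfs_strfromUCS_le(), it takes host-endian input, and it maps every unit above 0xFF to '?' instead of calling an NLS uni2char() callback. The point is the loop condition, which mirrors the `outlen < maxlen-1` check added by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for jfs_strfromUCS_le(): convert up to len 16-bit
 * units to Latin-1 and never write more than maxlen bytes, including the
 * terminating NUL. Assumes maxlen >= 1.
 */
static int ucs_to_latin1_bounded(char *to, int maxlen,
				 const uint16_t *from, int len)
{
	int i, outlen = 0;

	/* Stop at the input end, at a NUL unit, or when only the NUL fits. */
	for (i = 0; i < len && from[i] && outlen < maxlen - 1; i++)
		to[outlen++] = (from[i] < 0x100) ? (char)from[i] : '?';

	to[outlen] = '\0';
	return outlen;
}
```

A caller that appends several name segments to the same buffer would recompute maxlen from the space still left before each call, as jfs_readdir() does in the diff below.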
Signed-off-by: Pali Rohár <[email protected]>
---
fs/jfs/jfs_dtree.c | 13 ++++++++++---
fs/jfs/jfs_unicode.c | 6 +++---
fs/jfs/jfs_unicode.h | 2 +-
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c
index 837d42f61464..6dbdce54f139 100644
--- a/fs/jfs/jfs_dtree.c
+++ b/fs/jfs/jfs_dtree.c
@@ -3013,6 +3013,7 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
int d_namleft, len, outlen;
unsigned long dirent_buf;
char *name_ptr;
+ int maxlen;
u32 dir_index;
int do_index = 0;
uint loop_count = 0;
@@ -3235,7 +3236,10 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
}
/* copy the name of head/only segment */
- outlen = jfs_strfromUCS_le(name_ptr, d->name, len,
+ maxlen = PAGE_SIZE - sizeof(struct jfs_dirent) -
+ (name_ptr - jfs_dirent->name);
+ outlen = jfs_strfromUCS_le(name_ptr, maxlen,
+ d->name, len,
codepage);
jfs_dirent->name_len = outlen;
@@ -3255,8 +3259,11 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
goto skip_one;
}
len = min(d_namleft, DTSLOTDATALEN);
- outlen = jfs_strfromUCS_le(name_ptr, t->name,
- len, codepage);
+ maxlen = PAGE_SIZE - sizeof(struct jfs_dirent) -
+ (name_ptr - jfs_dirent->name);
+ outlen = jfs_strfromUCS_le(name_ptr, maxlen,
+ t->name, len,
+ codepage);
jfs_dirent->name_len += outlen;
next = t->next;
diff --git a/fs/jfs/jfs_unicode.c b/fs/jfs/jfs_unicode.c
index 1d0f65d13b58..2db923872bf1 100644
--- a/fs/jfs/jfs_unicode.c
+++ b/fs/jfs/jfs_unicode.c
@@ -16,7 +16,7 @@
* FUNCTION: Convert little-endian unicode string to character string
*
*/
-int jfs_strfromUCS_le(char *to, const __le16 * from,
+int jfs_strfromUCS_le(char *to, int maxlen, const __le16 * from,
int len, struct nls_table *codepage)
{
int i;
@@ -25,12 +25,12 @@ int jfs_strfromUCS_le(char *to, const __le16 * from,
int warn = !!warn_again; /* once per string */
if (codepage) {
- for (i = 0; (i < len) && from[i]; i++) {
+ for (i = 0; (i < len) && from[i] && outlen < maxlen-1; i++) {
int charlen;
charlen =
codepage->uni2char(le16_to_cpu(from[i]),
&to[outlen],
- NLS_MAX_CHARSET_SIZE);
+ maxlen-1-outlen);
if (charlen > 0)
outlen += charlen;
else {
diff --git a/fs/jfs/jfs_unicode.h b/fs/jfs/jfs_unicode.h
index 9db62d047daa..8b5c74315e07 100644
--- a/fs/jfs/jfs_unicode.h
+++ b/fs/jfs/jfs_unicode.h
@@ -19,7 +19,7 @@ typedef struct {
extern signed char UniUpperTable[512];
extern UNICASERANGE UniUpperRange[];
extern int get_UCSname(struct component_name *, struct dentry *);
-extern int jfs_strfromUCS_le(char *, const __le16 *, int, struct nls_table *);
+extern int jfs_strfromUCS_le(char *, int, const __le16 *, int, struct nls_table *);
#define free_UCSname(COMP) kfree((COMP)->name)
--
2.20.1
Other fs drivers use the iocharset= mount option for specifying the charset.
Add it to hfsplus as well and mark the old nls= mount option as deprecated.
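The aliasing idea can be shown with a minimal userspace sketch. It does not use the kernel's match_token() machinery; parse_charset_opt() is a hypothetical helper that accepts both spellings, returns the charset value, and flags the old spelling so that a caller could print a deprecation warning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace parser: returns a pointer to the charset value
 * for either spelling of the option, or NULL if opt is a different option.
 * *deprecated is set to 1 only for the old nls= spelling.
 */
static const char *parse_charset_opt(const char *opt, int *deprecated)
{
	static const char new_key[] = "iocharset=";
	static const char old_key[] = "nls=";

	*deprecated = 0;
	if (strncmp(opt, new_key, sizeof(new_key) - 1) == 0)
		return opt + sizeof(new_key) - 1;
	if (strncmp(opt, old_key, sizeof(old_key) - 1) == 0) {
		*deprecated = 1;	/* caller may warn, then handle as iocharset= */
		return opt + sizeof(old_key) - 1;
	}
	return NULL;
}
```

Both spellings end up in the same code path, which is what the fallthrough from opt_nls to opt_iocharset achieves in the diff below.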
Signed-off-by: Pali Rohár <[email protected]>
---
fs/hfsplus/options.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
index 047e05c57560..a975548f6b91 100644
--- a/fs/hfsplus/options.c
+++ b/fs/hfsplus/options.c
@@ -23,6 +23,7 @@ enum {
opt_creator, opt_type,
opt_umask, opt_uid, opt_gid,
opt_part, opt_session, opt_nls,
+ opt_iocharset,
opt_nodecompose, opt_decompose,
opt_barrier, opt_nobarrier,
opt_force, opt_err
@@ -37,6 +38,7 @@ static const match_table_t tokens = {
{ opt_part, "part=%u" },
{ opt_session, "session=%u" },
{ opt_nls, "nls=%s" },
+ { opt_iocharset, "iocharset=%s" },
{ opt_decompose, "decompose" },
{ opt_nodecompose, "nodecompose" },
{ opt_barrier, "barrier" },
@@ -166,6 +168,9 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
}
break;
case opt_nls:
+ pr_warn("option nls= is deprecated, use iocharset=\n");
+ /* fallthrough */
+ case opt_iocharset:
if (sbi->nls) {
pr_err("unable to change nls mapping\n");
return 0;
@@ -230,7 +235,7 @@ int hfsplus_show_options(struct seq_file *seq, struct dentry *root)
if (sbi->session >= 0)
seq_printf(seq, ",session=%u", sbi->session);
if (sbi->nls)
- seq_printf(seq, ",nls=%s", sbi->nls->charset);
+ seq_printf(seq, ",iocharset=%s", sbi->nls->charset);
if (test_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags))
seq_puts(seq, ",nodecompose");
if (test_bit(HFSPLUS_SB_NOBARRIER, &sbi->flags))
--
2.20.1
NLS table for utf8 is broken and cannot be fixed.
So instead of the broken utf8 NLS functions char2uni() and uni2char(), use
the functions utf8s_to_utf16s() and utf16s_to_utf8s(), which implement
correct conversion between UTF-16 and UTF-8.
These functions also process UTF-16 surrogate pairs correctly, so after
this change the jfs driver is able to handle file names with 4-byte
UTF-8 sequences.
When iocharset=utf8 is used, set sbi->nls_tab to NULL and use that to
distinguish whether an NLS table or the native UTF-8 functions should be
used.
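As a rough userspace illustration of what surrogate-pair handling involves, the sketch below combines a high and a low surrogate into one code point and emits it as a 4-byte UTF-8 sequence. It is not the kernel's utf16s_to_utf8s() and makes no claim about that function's error handling: utf16_to_utf8_sketch() is a made-up name, the input is host-endian, and unpaired surrogates are simply skipped, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert host-endian UTF-16 to UTF-8.
 * Returns the number of bytes written; no NUL terminator is added. */
static size_t utf16_to_utf8_sketch(const uint16_t *in, size_t inlen,
				   unsigned char *out, size_t maxout)
{
	size_t i = 0, o = 0;

	while (i < inlen) {
		uint32_t cp = in[i++];
		size_t need;

		if (cp >= 0xD800 && cp <= 0xDBFF) {
			/* High surrogate: must be followed by a low surrogate. */
			if (i >= inlen || in[i] < 0xDC00 || in[i] > 0xDFFF)
				continue;	/* unpaired, skipped in this sketch */
			cp = 0x10000 + ((cp - 0xD800) << 10) +
			     (uint32_t)(in[i++] - 0xDC00);
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			continue;		/* stray low surrogate, skipped */
		}

		need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
		if (need > maxout - o)
			break;			/* not enough room left */

		switch (need) {
		case 1:
			out[o++] = (unsigned char)cp;
			break;
		case 2:
			out[o++] = (unsigned char)(0xC0 | (cp >> 6));
			out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
			break;
		case 3:
			out[o++] = (unsigned char)(0xE0 | (cp >> 12));
			out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
			break;
		default:
			out[o++] = (unsigned char)(0xF0 | (cp >> 18));
			out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
			out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
			break;
		}
	}
	return o;
}
```

A per-character uni2char() callback sees each surrogate in isolation and therefore cannot produce the 4-byte form; a string-level conversion like the one above can.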
Signed-off-by: Pali Rohár <[email protected]>
---
fs/jfs/jfs_unicode.c | 17 +++++++++++++++--
fs/jfs/super.c | 24 +++++++++++++++---------
2 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/fs/jfs/jfs_unicode.c b/fs/jfs/jfs_unicode.c
index 2db923872bf1..4c39b6b65bca 100644
--- a/fs/jfs/jfs_unicode.c
+++ b/fs/jfs/jfs_unicode.c
@@ -46,6 +46,9 @@ int jfs_strfromUCS_le(char *to, int maxlen, const __le16 * from,
}
}
}
+ } else {
+ outlen = utf16s_to_utf8s(from, len,
+ UTF16_LITTLE_ENDIAN, to, maxlen-1);
}
to[outlen] = 0;
return outlen;
@@ -61,6 +64,7 @@ static int jfs_strtoUCS(wchar_t * to, const unsigned char *from, int len,
struct nls_table *codepage)
{
int charlen;
+ int outlen;
int i;
if (codepage) {
@@ -75,10 +79,19 @@ static int jfs_strtoUCS(wchar_t * to, const unsigned char *from, int len,
return charlen;
}
}
+ outlen = i;
+ } else {
+ outlen = utf8s_to_utf16s(from, len, UTF16_LITTLE_ENDIAN,
+ to, len);
+ if (outlen < 1) {
+ jfs_err("jfs_strtoUCS: utf8s_to_utf16s returned %d.",
+ outlen);
+ return outlen;
+ }
}
- to[i] = 0;
- return i;
+ to[outlen] = 0;
+ return outlen;
}
/*
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 8ba2ac032292..f449fdd56654 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -261,16 +261,20 @@ static int parse_options(char *options, struct super_block *sb, s64 *newLVSize,
/* Don't do anything ;-) */
break;
case Opt_iocharset:
- if (nls_map && nls_map != (void *) -1)
+ if (nls_map && nls_map != (void *) -1) {
unload_nls(nls_map);
- /* compatibility alias none means ISO-8859-1 */
- if (strcmp(args[0].from, "none") == 0)
- nls_map = load_nls("iso8859-1");
- else
- nls_map = load_nls(args[0].from);
- if (!nls_map) {
- pr_err("JFS: charset not found\n");
- goto cleanup;
+ nls_map = NULL;
+ }
+ if (strcmp(args[0].from, "utf8") != 0) {
+ /* compatibility alias none means ISO-8859-1 */
+ if (strcmp(args[0].from, "none") == 0)
+ nls_map = load_nls("iso8859-1");
+ else
+ nls_map = load_nls(args[0].from);
+ if (!nls_map) {
+ pr_err("JFS: charset not found\n");
+ goto cleanup;
+ }
}
break;
case Opt_resize:
@@ -718,6 +722,8 @@ static int jfs_show_options(struct seq_file *seq, struct dentry *root)
seq_printf(seq, ",discard=%u", sbi->minblks_trim);
if (sbi->nls_tab)
seq_printf(seq, ",iocharset=%s", sbi->nls_tab->charset);
+ else
+ seq_puts(seq, ",iocharset=utf8");
if (sbi->flag & JFS_ERR_CONTINUE)
seq_printf(seq, ",errors=continue");
if (sbi->flag & JFS_ERR_PANIC)
--
2.20.1
NLS table for utf8 is broken and cannot be fixed.
So instead of the broken utf8 NLS functions char2uni() and uni2char(), use
the functions utf8s_to_utf16s() and utf16s_to_utf8s(), which implement
correct conversion between UTF-16 and UTF-8.
These functions also process UTF-16 surrogate pairs correctly, so after
this change the ntfs driver is able to handle file names with 4-byte
UTF-8 sequences.
When iocharset=utf8 is used, set vol->nls_map to NULL and use that to
distinguish whether an NLS table or the native UTF-8 functions should be
used.
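The "NULL table means native UTF-8" convention, and the buffer sizing that follows from it, can be sketched in userspace like this. The struct and helper names are hypothetical, and the constants are assumptions rather than something this patch states: the per-table expansion of 6 is meant to stand for NLS_MAX_CHARSET_SIZE and the name length of 255 for NTFS_MAX_NAME_LEN, so check both against the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct nls_table. */
struct charset_table {
	const char *charset;
	size_t max_bytes_per_unit;	/* assumed role of NLS_MAX_CHARSET_SIZE */
};

/* A NULL table selects the native UTF-8 conversion functions. */
static const char *effective_charset(const struct charset_table *t)
{
	return t ? t->charset : "utf8";
}

/* Worst-case output buffer for max_units UTF-16 code units, plus the NUL.
 * The native UTF-8 path uses 4 bytes per unit, as in the diff below. */
static size_t name_buf_size(const struct charset_table *t, size_t max_units)
{
	return max_units * (t ? t->max_bytes_per_unit : 4) + 1;
}
```

A single UTF-16 code unit never expands to more than three UTF-8 bytes, and a surrogate pair (two units) expands to four, so four bytes per unit is a conservative bound for the UTF-8 path.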
Signed-off-by: Pali Rohár <[email protected]>
---
fs/ntfs/dir.c | 6 ++++--
fs/ntfs/inode.c | 5 ++++-
fs/ntfs/super.c | 41 ++++++++++++++++++++++++-----------------
fs/ntfs/unistr.c | 27 ++++++++++++++++++++++++---
4 files changed, 56 insertions(+), 23 deletions(-)
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
index cd96083a12c8..035582b92aa2 100644
--- a/fs/ntfs/dir.c
+++ b/fs/ntfs/dir.c
@@ -1034,7 +1034,8 @@ static inline int ntfs_filldir(ntfs_volume *vol,
}
name_len = ntfs_ucstonls(vol, (ntfschar*)&ie->key.file_name.file_name,
ie->key.file_name.file_name_length, &name,
- NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1);
+ NTFS_MAX_NAME_LEN *
+ (vol->nls_map ? NLS_MAX_CHARSET_SIZE : 4) + 1);
if (name_len <= 0) {
ntfs_warning(vol->sb, "Skipping unrepresentable inode 0x%llx.",
(long long)MREF_LE(ie->data.dir.indexed_file));
@@ -1118,7 +1119,8 @@ static int ntfs_readdir(struct file *file, struct dir_context *actor)
* Allocate a buffer to store the current name being processed
* converted to format determined by current NLS.
*/
- name = kmalloc(NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1, GFP_NOFS);
+ name = kmalloc(NTFS_MAX_NAME_LEN *
+ (vol->nls_map ? NLS_MAX_CHARSET_SIZE : 4) + 1, GFP_NOFS);
if (unlikely(!name)) {
err = -ENOMEM;
goto err_out;
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
index 3676f185b4a0..1437944be66d 100644
--- a/fs/ntfs/inode.c
+++ b/fs/ntfs/inode.c
@@ -2303,7 +2303,10 @@ int ntfs_show_options(struct seq_file *sf, struct dentry *root)
seq_printf(sf, ",fmask=0%o", vol->fmask);
seq_printf(sf, ",dmask=0%o", vol->dmask);
}
- seq_printf(sf, ",iocharset=%s", vol->nls_map->charset);
+ if (vol->nls_map)
+ seq_printf(sf, ",iocharset=%s", vol->nls_map->charset);
+ else
+ seq_puts(sf, ",iocharset=utf8");
if (NVolCaseSensitive(vol))
seq_printf(sf, ",case_sensitive");
if (NVolShowSystemFiles(vol))
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index 69c7871b742e..358f5e9e3c46 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -84,7 +84,7 @@ static int simple_getbool(char *s, bool *setval)
*
* Parse the recognized options in @opt for the ntfs volume described by @vol.
*/
-static bool parse_options(ntfs_volume *vol, char *opt)
+static bool parse_options(ntfs_volume *vol, char *opt, int remount)
{
char *p, *v, *ov;
static char *utf8 = "utf8";
@@ -95,6 +95,7 @@ static bool parse_options(ntfs_volume *vol, char *opt)
int mft_zone_multiplier = -1, on_errors = -1;
int show_sys_files = -1, case_sensitive = -1, disable_sparse = -1;
struct nls_table *nls_map = NULL;
+ int have_iocharset = 0;
/* I am lazy... (-8 */
#define NTFS_GETOPT_WITH_DEFAULT(option, variable, default_value) \
@@ -196,12 +197,16 @@ static bool parse_options(ntfs_volume *vol, char *opt)
goto needs_arg;
use_utf8:
unload_nls(nls_map);
- nls_map = load_nls(v);
- if (!nls_map) {
- ntfs_error(vol->sb, "NLS character set "
- "%s not found.", v);
- return false;
+ nls_map = NULL;
+ if (strcmp(v, "utf8") != 0) {
+ nls_map = load_nls(v);
+ if (!nls_map) {
+ ntfs_error(vol->sb, "NLS character set "
+ "%s not found.", v);
+ return false;
+ }
}
+ have_iocharset = 1;
} else if (!strcmp(p, "utf8")) {
bool val = false;
ntfs_warning(vol->sb, "Option utf8 is no longer "
@@ -241,25 +246,27 @@ static bool parse_options(ntfs_volume *vol, char *opt)
return false;
}
}
- if (nls_map) {
- if (vol->nls_map && vol->nls_map != nls_map) {
+ if (have_iocharset) {
+ if (remount && vol->nls_map != nls_map) {
ntfs_error(vol->sb, "Cannot change NLS character set "
"on remount.");
return false;
- } /* else (!vol->nls_map) */
- ntfs_debug("Using NLS character set %s.", nls_map->charset);
- vol->nls_map = nls_map;
- } else /* (!nls_map) */ {
- if (!vol->nls_map) {
+ } else if (!remount) {
+ ntfs_debug("Using NLS character set %s.",
+ nls_map ? nls_map->charset : "utf8");
+ vol->nls_map = nls_map;
+ }
+ } else if (!remount) {
+ if (strcmp(CONFIG_NLS_DEFAULT, "utf8") != 0) {
vol->nls_map = load_nls_default();
if (!vol->nls_map) {
ntfs_error(vol->sb, "Failed to load default "
"NLS character set.");
return false;
}
- ntfs_debug("Using default NLS character set (%s).",
- vol->nls_map->charset);
}
+ ntfs_debug("Using default NLS character set (%s).",
+ vol->nls_map ? vol->nls_map->charset : "utf8");
}
if (mft_zone_multiplier != -1) {
if (vol->mft_zone_multiplier && vol->mft_zone_multiplier !=
@@ -534,7 +541,7 @@ static int ntfs_remount(struct super_block *sb, int *flags, char *opt)
// TODO: Deal with *flags.
- if (!parse_options(vol, opt))
+ if (!parse_options(vol, opt, 1))
return -EINVAL;
ntfs_debug("Done.");
@@ -2731,7 +2738,7 @@ static int ntfs_fill_super(struct super_block *sb, void *opt, const int silent)
NVolSetSparseEnabled(vol);
/* Important to get the mount options dealt with now. */
- if (!parse_options(vol, (char*)opt))
+ if (!parse_options(vol, (char*)opt, 0))
goto err_out_now;
/* We support sector sizes up to the PAGE_SIZE. */
diff --git a/fs/ntfs/unistr.c b/fs/ntfs/unistr.c
index 75a7f73bccdd..f29d83fb09bb 100644
--- a/fs/ntfs/unistr.c
+++ b/fs/ntfs/unistr.c
@@ -254,6 +254,16 @@ int ntfs_nlstoucs(const ntfs_volume *vol, const char *ins,
if (likely(ins)) {
ucs = kmem_cache_alloc(ntfs_name_cache, GFP_NOFS);
if (likely(ucs)) {
+ if (!nls) {
+ wc_len = utf8s_to_utf16s(ins, ins_len,
+ UTF16_LITTLE_ENDIAN, ucs,
+ NTFS_MAX_NAME_LEN);
+ if (wc_len < 0 || wc_len >= NTFS_MAX_NAME_LEN)
+ goto name_err;
+ ucs[wc_len] = 0;
+ *outs = ucs;
+ return wc_len;
+ }
for (i = o = 0; i < ins_len; i += wc_len) {
wc_len = nls->char2uni(ins + i, ins_len - i,
&wc);
@@ -283,7 +293,7 @@ int ntfs_nlstoucs(const ntfs_volume *vol, const char *ins,
if (wc_len < 0) {
ntfs_error(vol->sb, "Name using character set %s contains "
"characters that cannot be converted to "
- "Unicode.", nls->charset);
+ "Unicode.", nls ? nls->charset : "utf8");
i = -EILSEQ;
} else /* if (o >= NTFS_MAX_NAME_LEN) */ {
ntfs_error(vol->sb, "Name is too long (maximum length for a "
@@ -335,11 +345,22 @@ int ntfs_ucstonls(const ntfs_volume *vol, const ntfschar *ins,
goto conversion_err;
}
if (!ns) {
- ns_len = ins_len * NLS_MAX_CHARSET_SIZE;
+ ns_len = ins_len * (nls ? NLS_MAX_CHARSET_SIZE : 4);
ns = kmalloc(ns_len + 1, GFP_NOFS);
if (!ns)
goto mem_err_out;
}
+ if (!nls) {
+ o = utf16s_to_utf8s(ins, ins_len, UTF16_LITTLE_ENDIAN,
+ ns, ns_len);
+ if (o >= ns_len) {
+ wc = -ENAMETOOLONG;
+ goto conversion_err;
+ }
+ ns[o] = 0;
+ *outs = ns;
+ return o;
+ }
for (i = o = 0; i < ins_len; i++) {
retry: wc = nls->uni2char(le16_to_cpu(ins[i]), ns + o,
ns_len - o);
@@ -373,7 +394,7 @@ retry: wc = nls->uni2char(le16_to_cpu(ins[i]), ns + o,
ntfs_error(vol->sb, "Unicode name contains characters that cannot be "
"converted to character set %s. You might want to "
"try to use the mount option iocharset=utf8.",
- nls->charset);
+ nls ? nls->charset : "utf8");
if (ns != *outs)
kfree(ns);
if (wc != -ENAMETOOLONG)
--
2.20.1
NLS table for utf8 is broken and cannot be fixed.
Now that all filesystems use the utf8s_to_utf16s()/utf16s_to_utf8s()
functions for converting between UTF-8 and UTF-16, and the
utf8_to_utf32()/utf32_to_utf8() functions for converting between UTF-8
and Unicode code points, there is no need to keep this broken utf8 NLS
module in the kernel tree anymore.
The utf8 NLS module has no remaining users, so drop it completely.
Signed-off-by: Pali Rohár <[email protected]>
---
fs/nls/Kconfig | 9 -------
fs/nls/Makefile | 1 -
fs/nls/nls_utf8.c | 67 -----------------------------------------------
3 files changed, 77 deletions(-)
delete mode 100644 fs/nls/nls_utf8.c
diff --git a/fs/nls/Kconfig b/fs/nls/Kconfig
index c7857e36adbb..8f82cf30a493 100644
--- a/fs/nls/Kconfig
+++ b/fs/nls/Kconfig
@@ -608,13 +608,4 @@ config NLS_MAC_TURKISH
If unsure, say Y.
-config NLS_UTF8
- tristate "NLS UTF-8"
- help
- If you want to display filenames with native language characters
- from the Microsoft FAT file system family or from JOLIET CD-ROMs
- correctly on the screen, you need to include the appropriate
- input/output character sets. Say Y here for the UTF-8 encoding of
- the Unicode/ISO9646 universal character set.
-
endif # NLS
diff --git a/fs/nls/Makefile b/fs/nls/Makefile
index ac54db297128..e573db7fc173 100644
--- a/fs/nls/Makefile
+++ b/fs/nls/Makefile
@@ -42,7 +42,6 @@ obj-$(CONFIG_NLS_ISO8859_14) += nls_iso8859-14.o
obj-$(CONFIG_NLS_ISO8859_15) += nls_iso8859-15.o
obj-$(CONFIG_NLS_KOI8_R) += nls_koi8-r.o
obj-$(CONFIG_NLS_KOI8_U) += nls_koi8-u.o nls_koi8-ru.o
-obj-$(CONFIG_NLS_UTF8) += nls_utf8.o
obj-$(CONFIG_NLS_MAC_CELTIC) += mac-celtic.o
obj-$(CONFIG_NLS_MAC_CENTEURO) += mac-centeuro.o
obj-$(CONFIG_NLS_MAC_CROATIAN) += mac-croatian.o
diff --git a/fs/nls/nls_utf8.c b/fs/nls/nls_utf8.c
deleted file mode 100644
index afcfbc4a14db..000000000000
--- a/fs/nls/nls_utf8.c
+++ /dev/null
@@ -1,67 +0,0 @@
-/*
- * Module for handling utf8 just like any other charset.
- * By Urban Widmark 2000
- */
-
-#include <linux/module.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/nls.h>
-#include <linux/errno.h>
-
-static unsigned char identity[256];
-
-static int uni2char(wchar_t uni, unsigned char *out, int boundlen)
-{
- int n;
-
- if (boundlen <= 0)
- return -ENAMETOOLONG;
-
- n = utf32_to_utf8(uni, out, boundlen);
- if (n < 0) {
- *out = '?';
- return -EINVAL;
- }
- return n;
-}
-
-static int char2uni(const unsigned char *rawstring, int boundlen, wchar_t *uni)
-{
- int n;
- unicode_t u;
-
- n = utf8_to_utf32(rawstring, boundlen, &u);
- if (n < 0 || u > MAX_WCHAR_T) {
- *uni = 0x003f; /* ? */
- return -EINVAL;
- }
- *uni = (wchar_t) u;
- return n;
-}
-
-static struct nls_table table = {
- .charset = "utf8",
- .uni2char = uni2char,
- .char2uni = char2uni,
- .charset2lower = identity, /* no conversion */
- .charset2upper = identity,
-};
-
-static int __init init_nls_utf8(void)
-{
- int i;
- for (i=0; i<256; i++)
- identity[i] = i;
-
- return register_nls(&table);
-}
-
-static void __exit exit_nls_utf8(void)
-{
- unregister_nls(&table);
-}
-
-module_init(init_nls_utf8)
-module_exit(exit_nls_utf8)
-MODULE_LICENSE("Dual BSD/GPL");
--
2.20.1
cifs functions use UTF-8 encoding when nls_table is set to NULL, so there
is no need to load a "dummy" NLS table for some operations.
In a few places in dfs_cache, replace the utf8 nls table with the
utf8_to_utf32() function, which converts a UTF-8 sequence to a Unicode code
point (stored as type unicode_t). This should fix handling of (UTF-16) CIFS
paths with UTF-16 surrogate pairs, which the utf8 nls module cannot handle.
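As an illustration of why a 16-bit wchar_t is too small here, the following is a minimal userspace sketch of decoding one UTF-8 sequence into a code point. It is not the kernel's utf8_to_utf32(); decode_utf8() is a hypothetical helper that only mirrors the calling convention relied on in this patch (bytes consumed on success, negative on error), and it deliberately skips overlong-form and surrogate checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper, not the kernel's utf8_to_utf32().
 * Decodes one UTF-8 sequence from s (at most len bytes) into *out.
 * Returns the number of bytes consumed, or -1 on malformed or truncated
 * input. Simplification: overlong forms and surrogates are not rejected.
 */
static int decode_utf8(const unsigned char *s, size_t len, uint32_t *out)
{
	uint32_t cp;
	int need, i;

	assert(s && out);
	if (len == 0)
		return -1;
	if (s[0] < 0x80) {			/* 1-byte (ASCII) */
		*out = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {	/* 2-byte lead */
		cp = s[0] & 0x1f;
		need = 1;
	} else if ((s[0] & 0xf0) == 0xe0) {	/* 3-byte lead */
		cp = s[0] & 0x0f;
		need = 2;
	} else if ((s[0] & 0xf8) == 0xf0) {	/* 4-byte lead */
		cp = s[0] & 0x07;
		need = 3;
	} else {
		return -1;			/* stray continuation byte etc. */
	}
	if (len < (size_t)need + 1)
		return -1;
	for (i = 1; i <= need; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return -1;
		cp = (cp << 6) | (s[i] & 0x3f);
	}
	if (cp > 0x10ffff)
		return -1;
	*out = cp;
	return need + 1;
}
```

A 4-byte sequence such as F0 9F 98 80 decodes to U+1F600, which is above 0xFFFF and therefore cannot be stored in a single 16-bit unit.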
Signed-off-by: Pali Rohár <[email protected]>
---
fs/cifs/cifssmb.c | 8 ++------
fs/cifs/dfs_cache.c | 24 ++++++++----------------
fs/cifs/smb2pdu.c | 17 ++++-------------
3 files changed, 14 insertions(+), 35 deletions(-)
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 65d1a65bfc37..8a2eb380c97d 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -119,7 +119,6 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
int rc;
struct cifs_ses *ses;
struct TCP_Server_Info *server;
- struct nls_table *nls_codepage;
int retries;
/*
@@ -186,8 +185,6 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
if (!ses->need_reconnect && !tcon->need_reconnect)
return 0;
- nls_codepage = load_nls_default();
-
/*
* need to prevent multiple threads trying to simultaneously
* reconnect the same SMB session
@@ -207,7 +204,7 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
rc = cifs_negotiate_protocol(0, ses);
if (rc == 0 && ses->need_reconnect)
- rc = cifs_setup_session(0, ses, nls_codepage);
+ rc = cifs_setup_session(0, ses, NULL);
/* do we need to reconnect tcon? */
if (rc || !tcon->need_reconnect) {
@@ -216,7 +213,7 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
}
cifs_mark_open_files_invalid(tcon);
- rc = cifs_tree_connect(0, tcon, nls_codepage);
+ rc = cifs_tree_connect(0, tcon, NULL);
mutex_unlock(&ses->session_mutex);
cifs_dbg(FYI, "reconnect tcon rc = %d\n", rc);
@@ -252,7 +249,6 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
rc = -EAGAIN;
}
- unload_nls(nls_codepage);
return rc;
}
diff --git a/fs/cifs/dfs_cache.c b/fs/cifs/dfs_cache.c
index 283745592844..3ba748e59e64 100644
--- a/fs/cifs/dfs_cache.c
+++ b/fs/cifs/dfs_cache.c
@@ -66,8 +66,6 @@ static struct workqueue_struct *dfscache_wq __read_mostly;
static int cache_ttl;
static DEFINE_SPINLOCK(cache_ttl_lock);
-static struct nls_table *cache_cp;
-
/*
* Number of entries in the cache
*/
@@ -194,14 +192,14 @@ char *dfs_cache_canonical_path(const char *path, const struct nls_table *cp, int
if (!path || strlen(path) < 3 || (*path != '\\' && *path != '/'))
return ERR_PTR(-EINVAL);
- if (unlikely(strcmp(cp->charset, cache_cp->charset))) {
+ if (unlikely(cp)) {
tmp = (char *)cifs_strndup_to_utf16(path, strlen(path), &plen, cp, remap);
if (!tmp) {
cifs_dbg(VFS, "%s: failed to convert path to utf16\n", __func__);
return ERR_PTR(-EINVAL);
}
- npath = cifs_strndup_from_utf16(tmp, plen, true, cache_cp);
+ npath = cifs_strndup_from_utf16(tmp, plen, true, NULL);
kfree(tmp);
if (!npath) {
@@ -413,9 +411,6 @@ int dfs_cache_init(void)
INIT_HLIST_HEAD(&cache_htable[i]);
atomic_set(&cache_count, 0);
- cache_cp = load_nls("utf8");
- if (!cache_cp)
- cache_cp = load_nls_default();
cifs_dbg(FYI, "%s: initialized DFS referral cache\n", __func__);
return 0;
@@ -429,11 +424,11 @@ static int cache_entry_hash(const void *data, int size, unsigned int *hash)
{
int i, clen;
const unsigned char *s = data;
- wchar_t c;
+ unicode_t c;
unsigned int h = 0;
for (i = 0; i < size; i += clen) {
- clen = cache_cp->char2uni(&s[i], size - i, &c);
+ clen = utf8_to_utf32(&s[i], size - i, &c);
if (unlikely(clen < 0)) {
cifs_dbg(VFS, "%s: can't convert char\n", __func__);
return clen;
@@ -622,14 +617,14 @@ static int add_cache_entry_locked(struct dfs_info3_param *refs, int numrefs)
static bool dfs_path_equal(const char *s1, int len1, const char *s2, int len2)
{
int i, l1, l2;
- wchar_t c1, c2;
+ unicode_t c1, c2;
if (len1 != len2)
return false;
for (i = 0; i < len1; i += l1) {
- l1 = cache_cp->char2uni(&s1[i], len1 - i, &c1);
- l2 = cache_cp->char2uni(&s2[i], len2 - i, &c2);
+ l1 = utf8_to_utf32(&s1[i], len1 - i, &c1);
+ l2 = utf8_to_utf32(&s2[i], len2 - i, &c2);
if (unlikely(l1 < 0 && l2 < 0)) {
if (s1[i] != s2[i])
return false;
@@ -719,7 +714,6 @@ static struct cache_entry *lookup_cache_entry(const char *path)
void dfs_cache_destroy(void)
{
cancel_delayed_work_sync(&refresh_task);
- unload_nls(cache_cp);
free_mount_group_list();
flush_cache_ents();
kmem_cache_destroy(cache_slab);
@@ -767,10 +761,8 @@ static int get_dfs_referral(const unsigned int xid, struct cifs_ses *ses, const
if (!ses || !ses->server || !ses->server->ops->get_dfs_refer)
return -EOPNOTSUPP;
- if (unlikely(!cache_cp))
- return -EINVAL;
- rc = ses->server->ops->get_dfs_refer(xid, ses, path, refs, numrefs, cache_cp,
+ rc = ses->server->ops->get_dfs_refer(xid, ses, path, refs, numrefs, NULL,
NO_MAP_UNI_RSVD);
if (!rc) {
struct dfs_info3_param *ref = *refs;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 781d14e5f2af..b44f91dd2782 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -145,7 +145,6 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
struct TCP_Server_Info *server)
{
int rc;
- struct nls_table *nls_codepage;
struct cifs_ses *ses;
int retries;
@@ -233,8 +232,6 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
if (!tcon->ses->need_reconnect && !tcon->need_reconnect)
return 0;
- nls_codepage = load_nls_default();
-
/*
* need to prevent multiple threads trying to simultaneously reconnect
* the same SMB session
@@ -262,7 +259,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
rc = cifs_negotiate_protocol(0, tcon->ses);
if (!rc && tcon->ses->need_reconnect) {
- rc = cifs_setup_session(0, tcon->ses, nls_codepage);
+ rc = cifs_setup_session(0, tcon->ses, NULL);
if ((rc == -EACCES) && !tcon->retry) {
rc = -EHOSTDOWN;
ses->binding = false;
@@ -286,7 +283,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
if (tcon->use_persistent)
tcon->need_reopen_files = true;
- rc = cifs_tree_connect(0, tcon, nls_codepage);
+ rc = cifs_tree_connect(0, tcon, NULL);
mutex_unlock(&tcon->ses->session_mutex);
cifs_dbg(FYI, "reconnect tcon rc = %d\n", rc);
@@ -322,7 +319,6 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
rc = -EAGAIN;
}
failed:
- unload_nls(nls_codepage);
return rc;
}
@@ -481,12 +477,10 @@ build_encrypt_ctxt(struct smb2_encryption_neg_context *pneg_ctxt)
static unsigned int
build_netname_ctxt(struct smb2_netname_neg_context *pneg_ctxt, char *hostname)
{
- struct nls_table *cp = load_nls_default();
-
pneg_ctxt->ContextType = SMB2_NETNAME_NEGOTIATE_CONTEXT_ID;
/* copy up to max of first 100 bytes of server name to NetName field */
- pneg_ctxt->DataLength = cpu_to_le16(2 * cifs_strtoUTF16(pneg_ctxt->NetName, hostname, 100, cp));
+ pneg_ctxt->DataLength = cpu_to_le16(2 * cifs_strtoUTF16(pneg_ctxt->NetName, hostname, 100, NULL));
/* context size is DataLength + minimal smb2_neg_context */
return DIV_ROUND_UP(le16_to_cpu(pneg_ctxt->DataLength) +
sizeof(struct smb2_neg_context), 8) * 8;
@@ -2498,7 +2492,6 @@ alloc_path_with_tree_prefix(__le16 **out_path, int *out_size, int *out_len,
const char *treename, const __le16 *path)
{
int treename_len, path_len;
- struct nls_table *cp;
const __le16 sep[] = {cpu_to_le16('\\'), cpu_to_le16(0x0000)};
/*
@@ -2529,11 +2522,9 @@ alloc_path_with_tree_prefix(__le16 **out_path, int *out_size, int *out_len,
if (!*out_path)
return -ENOMEM;
- cp = load_nls_default();
- cifs_strtoUTF16(*out_path, treename, treename_len, cp);
+ cifs_strtoUTF16(*out_path, treename, treename_len, NULL);
UniStrcat(*out_path, sep);
UniStrcat(*out_path, path);
- unload_nls(cp);
return 0;
}
--
2.20.1
The NLS table for utf8 is broken and cannot be fixed.
So instead of the broken utf8 nls functions char2uni() and uni2char(), use
the functions utf8s_to_utf16s() and utf16s_to_utf8s(), which implement
correct conversion between UTF-16 and UTF-8.
When iocharset=utf8 is used, set ctx->iocharset to NULL and use that to
distinguish whether the NLS table or the native UTF-8 functions should be
used.
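The surrogate-pair step that a 16-bit NLS callback cannot express can be sketched in userspace as follows. encode_utf16() is a hypothetical helper, not a kernel function; it produces host-endian code units, whereas on-the-wire CIFS strings would additionally need conversion to little endian.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: write code point cp as UTF-16 code units into out,
 * which must have room for two units. Returns the number of units written.
 */
static int encode_utf16(uint32_t cp, uint16_t *out)
{
	assert(out && cp <= 0x10ffff);
	if (cp >= 0x10000) {
		/* Outside plane 0: split the 20-bit offset into two halves. */
		cp -= 0x10000;
		out[0] = (uint16_t)(0xd800 | ((cp >> 10) & 0x03ff)); /* high */
		out[1] = (uint16_t)(0xdc00 | (cp & 0x03ff));         /* low */
		return 2;
	}
	out[0] = (uint16_t)cp;
	return 1;
}
```

For U+1F600 this yields the pair D83D DE00. Counting 2 bytes per returned unit also gives the 2-or-4-byte output length that a UTF-16 byte-length calculation has to account for.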
Signed-off-by: Pali Rohár <[email protected]>
---
fs/cifs/cifs_unicode.c | 128 +++++++++++++++++++++++++++--------------
fs/cifs/cifs_unicode.h | 2 +-
fs/cifs/cifsfs.c | 2 +
fs/cifs/connect.c | 8 ++-
fs/cifs/dir.c | 28 +++++++--
fs/cifs/winucase.c | 14 +++--
6 files changed, 124 insertions(+), 58 deletions(-)
diff --git a/fs/cifs/cifs_unicode.c b/fs/cifs/cifs_unicode.c
index 9bd03a231032..b0f7f78da7c2 100644
--- a/fs/cifs/cifs_unicode.c
+++ b/fs/cifs/cifs_unicode.c
@@ -131,20 +131,17 @@ cifs_mapchar(char *target, const __u16 *from, const struct nls_table *cp,
convert_sfu_char(src_char, target))
return len;
- /* if character not one of seven in special remap set */
- len = cp->uni2char(src_char, target, NLS_MAX_CHARSET_SIZE);
- if (len <= 0)
- goto surrogate_pair;
-
- return len;
+ if (cp) {
+ /* if character not one of seven in special remap set */
+ len = cp->uni2char(src_char, target, NLS_MAX_CHARSET_SIZE);
+ if (len <= 0)
+ goto unknown;
+ } else {
+ len = utf16s_to_utf8s(from, 3, UTF16_LITTLE_ENDIAN, target, 6);
+ if (len <= 0)
+ goto unknown;
+ }
-surrogate_pair:
- /* convert SURROGATE_PAIR and IVS */
- if (strcmp(cp->charset, "utf8"))
- goto unknown;
- len = utf16s_to_utf8s(from, 3, UTF16_LITTLE_ENDIAN, target, 6);
- if (len <= 0)
- goto unknown;
return len;
unknown:
@@ -240,6 +237,37 @@ cifs_from_utf16(char *to, const __le16 *from, int tolen, int fromlen,
return outlen;
}
+static int cifs_utf8s_to_utf16s(const char *s, int inlen, __le16 *pwcs)
+{
+ __le16 *op;
+ int size;
+ unicode_t u;
+
+ op = pwcs;
+ while (inlen > 0 && *s) {
+ if (*s & 0x80) {
+ size = utf8_to_utf32(s, inlen, &u);
+ if (size <= 0) {
+ u = 0x003f; /* A question mark */
+ size = 1;
+ }
+ s += size;
+ inlen -= size;
+ if (u >= 0x10000) {
+ u -= 0x10000;
+ *op++ = __cpu_to_le16(0xd800 | ((u >> 10) & 0x03ff));
+ *op++ = __cpu_to_le16(0xdc00 | (u & 0x03ff));
+ } else {
+ *op++ = __cpu_to_le16(u);
+ }
+ } else {
+ *op++ = __cpu_to_le16(*s++);
+ inlen--;
+ }
+ }
+ return op - pwcs;
+}
+
/*
* NAME: cifs_strtoUTF16()
*
@@ -255,24 +283,14 @@ cifs_strtoUTF16(__le16 *to, const char *from, int len,
wchar_t wchar_to; /* needed to quiet sparse */
/* special case for utf8 to handle no plane0 chars */
- if (!strcmp(codepage->charset, "utf8")) {
+ if (!codepage) {
/*
* convert utf8 -> utf16, we assume we have enough space
* as caller should have assumed conversion does not overflow
- * in destination len is length in wchar_t units (16bits)
- */
- i = utf8s_to_utf16s(from, len, UTF16_LITTLE_ENDIAN,
- (wchar_t *) to, len);
-
- /* if success terminate and exit */
- if (i >= 0)
- goto success;
- /*
- * if fails fall back to UCS encoding as this
- * function should not return negative values
- * currently can fail only if source contains
- * invalid encoded characters
+ * in destination len is length in __le16 units
*/
+ i = cifs_utf8s_to_utf16s(from, len, to);
+ goto success;
}
for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
@@ -508,25 +526,29 @@ cifsConvertToUTF16(__le16 *target, const char *source, int srclen,
* as they use backslash as separator.
*/
if (dst_char == 0) {
- charlen = cp->char2uni(source + i, srclen - i, &tmp);
- dst_char = cpu_to_le16(tmp);
-
- /*
- * if no match, use question mark, which at least in
- * some cases serves as wild card
- */
- if (charlen > 0)
- goto ctoUTF16;
-
- /* convert SURROGATE_PAIR */
- if (strcmp(cp->charset, "utf8") || !wchar_to)
- goto unknown;
- if (*(source + i) & 0x80) {
- charlen = utf8_to_utf32(source + i, 6, &u);
- if (charlen < 0)
+ if (cp) {
+ charlen = cp->char2uni(source + i, srclen - i, &tmp);
+ dst_char = cpu_to_le16(tmp);
+
+ /*
+ * if no match, use question mark, which at least in
+ * some cases serves as wild card
+ */
+ if (charlen > 0)
+ goto ctoUTF16;
+ else
goto unknown;
- } else
+ }
+
+ /* UTF-8 to UTF-16 conversion */
+
+ if (!wchar_to)
goto unknown;
+
+ charlen = utf8_to_utf32(source + i, 6, &u);
+ if (charlen < 0)
+ goto unknown;
+
ret = utf8s_to_utf16s(source + i, charlen,
UTF16_LITTLE_ENDIAN,
wchar_to, 6);
@@ -595,8 +617,26 @@ cifs_local_to_utf16_bytes(const char *from, int len,
{
int charlen;
int i;
+ int outlen;
+ unicode_t u_to;
wchar_t wchar_to;
+ if (!codepage) {
+ outlen = 0;
+ for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
+ charlen = utf8_to_utf32(from, len, &u_to);
+ /* Failed conversion defaults to a question mark */
+ if (charlen < 1) {
+ charlen = 1;
+ outlen += 2;
+ } else if (u_to <= 0xFFFF)
+ outlen += 2;
+ else
+ outlen += 4;
+ }
+ return outlen;
+ }
+
for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
charlen = codepage->char2uni(from, len, &wchar_to);
/* Failed conversion defaults to a question mark */
diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
index 80b3d845419f..b9a3290faaf7 100644
--- a/fs/cifs/cifs_unicode.h
+++ b/fs/cifs/cifs_unicode.h
@@ -106,7 +106,7 @@ extern __le16 *cifs_strndup_to_utf16(const char *src, const int maxlen,
int remap);
#endif
-wchar_t cifs_toupper(wchar_t in);
+unicode_t cifs_toupper(unicode_t in);
/*
* UniStrcat: Concatenate the second string to the first
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 64b71c4e2a9d..9941bb6f2aad 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -569,6 +569,8 @@ cifs_show_options(struct seq_file *s, struct dentry *root)
cifs_sb->ctx->dir_mode);
if (cifs_sb->ctx->iocharset)
seq_printf(s, ",iocharset=%s", cifs_sb->ctx->iocharset);
+ else
+ seq_puts(s, ",iocharset=utf8");
if (tcon->seal)
seq_puts(s, ",seal");
else if (tcon->ses->server->ignore_signature)
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 3781eee9360a..d560fb7a9aed 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2338,7 +2338,11 @@ compare_mount_options(struct super_block *sb, struct cifs_mnt_data *mnt_data)
old->ctx->dir_mode != new->ctx->dir_mode)
return 0;
- if (strcmp(old->local_nls->charset, new->local_nls->charset))
+ if (old->local_nls && !new->local_nls)
+ return 0;
+ if (!old->local_nls && new->local_nls)
+ return 0;
+ if (old->local_nls && new->local_nls && strcmp(old->local_nls->charset, new->local_nls->charset))
return 0;
if (old->ctx->acregmax != new->ctx->acregmax)
@@ -2800,7 +2804,7 @@ int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb)
if (ctx->iocharset == NULL) {
/* load_nls_default cannot return null */
cifs_sb->local_nls = load_nls_default();
- } else {
+ } else if (strcmp(ctx->iocharset, "utf8") != 0) {
cifs_sb->local_nls = load_nls(ctx->iocharset);
if (cifs_sb->local_nls == NULL) {
cifs_dbg(VFS, "CIFS mount error: iocharset %s not found\n",
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 79402ca0ddfa..fa09fb5d3641 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -789,16 +789,22 @@ static int cifs_ci_hash(const struct dentry *dentry, struct qstr *q)
{
struct nls_table *codepage = CIFS_SB(dentry->d_sb)->local_nls;
unsigned long hash;
+ unicode_t u;
wchar_t c;
int i, charlen;
hash = init_name_hash(dentry);
for (i = 0; i < q->len; i += charlen) {
- charlen = codepage->char2uni(&q->name[i], q->len - i, &c);
+ if (codepage) {
+ charlen = codepage->char2uni(&q->name[i], q->len - i, &c);
+ if (likely(charlen > 0))
+ u = c;
+ } else
+ charlen = utf8_to_utf32(&q->name[i], q->len - i, &u);
/* error out if we can't convert the character */
if (unlikely(charlen < 0))
return charlen;
- hash = partial_name_hash(cifs_toupper(c), hash);
+ hash = partial_name_hash(cifs_toupper(u), hash);
}
q->hash = end_name_hash(hash);
@@ -809,6 +815,7 @@ static int cifs_ci_compare(const struct dentry *dentry,
unsigned int len, const char *str, const struct qstr *name)
{
struct nls_table *codepage = CIFS_SB(dentry->d_sb)->local_nls;
+ unicode_t u1, u2;
wchar_t c1, c2;
int i, l1, l2;
@@ -822,9 +829,18 @@ static int cifs_ci_compare(const struct dentry *dentry,
return 1;
for (i = 0; i < len; i += l1) {
- /* Convert characters in both strings to UTF-16. */
- l1 = codepage->char2uni(&str[i], len - i, &c1);
- l2 = codepage->char2uni(&name->name[i], name->len - i, &c2);
+ /* Convert characters in both strings to UTF-32. */
+ if (codepage) {
+ l1 = codepage->char2uni(&str[i], len - i, &c1);
+ l2 = codepage->char2uni(&name->name[i], name->len - i, &c2);
+ if (likely(l1 > 0))
+ u1 = c1;
+ if (likely(l2 > 0))
+ u2 = c2;
+ } else {
+ l1 = utf8_to_utf32(&str[i], len - i, &u1);
+ l2 = utf8_to_utf32(&name->name[i], name->len - i, &u2);
+ }
/*
* If we can't convert either character, just declare it to
@@ -845,7 +861,7 @@ static int cifs_ci_compare(const struct dentry *dentry,
return 1;
/* Now compare uppercase versions of these characters */
- if (cifs_toupper(c1) != cifs_toupper(c2))
+ if (cifs_toupper(u1) != cifs_toupper(u2))
return 1;
}
diff --git a/fs/cifs/winucase.c b/fs/cifs/winucase.c
index 59b6c577aa0a..fce38de59e13 100644
--- a/fs/cifs/winucase.c
+++ b/fs/cifs/winucase.c
@@ -18,7 +18,7 @@
#include <linux/nls.h>
-wchar_t cifs_toupper(wchar_t in); /* quiet sparse */
+unicode_t cifs_toupper(unicode_t in); /* quiet sparse */
static const wchar_t t2_00[256] = {
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
@@ -616,20 +616,24 @@ static const wchar_t *const toplevel[256] = {
};
/**
- * cifs_toupper - convert a wchar_t from lower to uppercase
+ * cifs_toupper - convert a unicode_t from lower to uppercase
* @in: character to convert from lower to uppercase
*
- * This function consults the static tables above to convert a wchar_t from
+ * This function consults the static tables above to convert a unicode_t from
* lower to uppercase. In the event that there is no mapping, the original
* "in" character is returned.
*/
-wchar_t
-cifs_toupper(wchar_t in)
+unicode_t
+cifs_toupper(unicode_t in)
{
unsigned char idx;
const wchar_t *tbl;
wchar_t out;
+ /* the cifs_toupper tables only define mappings for plane 0 */
+ if (in > 0xffff)
+ return in;
+
/* grab upper byte */
idx = (in & 0xff00) >> 8;
--
2.20.1
The NLS table for utf8 is broken and cannot be fixed.
So instead of the broken utf8 nls functions char2uni() and uni2char(), use
the functions utf8_to_utf32() and utf32_to_utf8(), which implement correct
encoding and decoding between Unicode code points and UTF-8 sequences.
Note that this fs driver does not support the full Unicode range; in
particular, UTF-16 surrogate pairs are unsupported. This patch does not change
this limitation, and support for UTF-16 surrogate pairs stays unimplemented.
When iocharset=utf8 is used, set sbi->nls to NULL and use that to
distinguish whether the NLS table or the native UTF-8 functions should be
used.
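The "NULL table means native UTF-8" dispatch can be sketched in userspace like this. struct charset_table, encode_utf8_bmp() and convert_unit() are hypothetical stand-ins, not kernel APIs. Unlike the patch, the sketch folds the length check into the encoder, and it does not reject surrogate code units.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nls_table; only the callback used here. */
struct charset_table {
	int (*uni2char)(uint16_t uni, unsigned char *out, int boundlen);
};

/* Hypothetical UTF-8 encoder for one 16-bit unit (at most 3 bytes). */
static int encode_utf8_bmp(uint16_t u, unsigned char *out, int boundlen)
{
	int n = u < 0x80 ? 1 : u < 0x800 ? 2 : 3;

	if (boundlen < n)
		return -ENAMETOOLONG;
	if (n == 1) {
		out[0] = (unsigned char)u;
	} else if (n == 2) {
		out[0] = (unsigned char)(0xc0 | (u >> 6));
		out[1] = (unsigned char)(0x80 | (u & 0x3f));
	} else {
		out[0] = (unsigned char)(0xe0 | (u >> 12));
		out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3f));
		out[2] = (unsigned char)(0x80 | (u & 0x3f));
	}
	return n;
}

/* A NULL table selects the native UTF-8 path, as sbi->nls == NULL does. */
static int convert_unit(const struct charset_table *tbl, uint16_t u,
			unsigned char *out, int boundlen)
{
	assert(out);
	if (tbl)
		return tbl->uni2char(u, out, boundlen);
	return encode_utf8_bmp(u, out, boundlen);
}
```

Since a single 16-bit unit never needs more than three UTF-8 bytes, a per-unit factor of 4 for buffer sizing leaves some headroom in the native path.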
Signed-off-by: Pali Rohár <[email protected]>
---
fs/hfsplus/dir.c | 6 ++++--
fs/hfsplus/options.c | 32 ++++++++++++++++++--------------
fs/hfsplus/super.c | 7 +------
fs/hfsplus/unicode.c | 31 ++++++++++++++++++++++++++++---
fs/hfsplus/xattr.c | 14 +++++++++-----
fs/hfsplus/xattr_security.c | 3 ++-
6 files changed, 62 insertions(+), 31 deletions(-)
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index 84714bbccc12..2caf0cd82221 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -144,7 +144,8 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
if (err)
return err;
- strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
+ strbuf = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
+ HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
if (!strbuf) {
err = -ENOMEM;
goto out;
@@ -203,7 +204,8 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
fd.entrylength);
type = be16_to_cpu(entry.type);
- len = NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN;
+ len = (HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
+ HFSPLUS_MAX_STRLEN;
err = hfsplus_uni2asc(sb, &fd.key->cat.name, strbuf, &len);
if (err)
goto out;
diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
index a975548f6b91..16c08cb5c4f8 100644
--- a/fs/hfsplus/options.c
+++ b/fs/hfsplus/options.c
@@ -104,6 +104,9 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
char *p;
substring_t args[MAX_OPT_ARGS];
int tmp, token;
+ int have_iocharset;
+
+ have_iocharset = 0;
if (!input)
goto done;
@@ -171,20 +174,24 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
pr_warn("option nls= is deprecated, use iocharset=\n");
/* fallthrough */
case opt_iocharset:
- if (sbi->nls) {
+ if (have_iocharset) {
pr_err("unable to change nls mapping\n");
return 0;
}
p = match_strdup(&args[0]);
- if (p)
- sbi->nls = load_nls(p);
- if (!sbi->nls) {
- pr_err("unable to load nls mapping \"%s\"\n",
- p);
- kfree(p);
+ if (!p)
return 0;
+ if (strcmp(p, "utf8") != 0) {
+ sbi->nls = load_nls(p);
+ if (!sbi->nls) {
+ pr_err("unable to load nls mapping "
+ "\"%s\"\n", p);
+ kfree(p);
+ return 0;
+ }
}
kfree(p);
+ have_iocharset = 1;
break;
case opt_decompose:
clear_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
@@ -207,13 +214,10 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
}
done:
- if (!sbi->nls) {
- /* try utf8 first, as this is the old default behaviour */
- sbi->nls = load_nls("utf8");
- if (!sbi->nls)
- sbi->nls = load_nls_default();
- if (!sbi->nls)
- return 0;
+ if (!have_iocharset) {
+ /* use utf8, as this is the old default behaviour */
+ pr_debug("using native UTF-8 without nls\n");
+ /* no sbi->nls means that native UTF-8 code is used */
}
return 1;
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index b9e3db3f855f..985662451bfc 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -403,11 +403,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
/* temporarily use utf8 to correctly find the hidden dir below */
nls = sbi->nls;
- sbi->nls = load_nls("utf8");
- if (!sbi->nls) {
- pr_err("unable to load nls for utf8\n");
- goto out_unload_nls;
- }
+ sbi->nls = NULL;
/* Grab the volume header */
if (hfsplus_read_wrapper(sb)) {
@@ -585,7 +581,6 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
}
}
- unload_nls(sbi->nls);
sbi->nls = nls;
return 0;
diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
index 73342c925a4b..1d8c31c5126f 100644
--- a/fs/hfsplus/unicode.c
+++ b/fs/hfsplus/unicode.c
@@ -190,7 +190,12 @@ int hfsplus_uni2asc(struct super_block *sb,
c0 = ':';
break;
}
- res = nls->uni2char(c0, op, len);
+ if (nls)
+ res = nls->uni2char(c0, op, len);
+ else if (len > 0)
+ res = utf32_to_utf8(c0, op, len);
+ else
+ res = -ENAMETOOLONG;
if (res < 0) {
if (res == -ENAMETOOLONG)
goto out;
@@ -233,7 +238,12 @@ int hfsplus_uni2asc(struct super_block *sb,
cc = c0;
}
done:
- res = nls->uni2char(cc, op, len);
+ if (nls)
+ res = nls->uni2char(cc, op, len);
+ else if (len > 0)
+ res = utf32_to_utf8(cc, op, len);
+ else
+ res = -ENAMETOOLONG;
if (res < 0) {
if (res == -ENAMETOOLONG)
goto out;
@@ -256,7 +266,22 @@ int hfsplus_uni2asc(struct super_block *sb,
static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
wchar_t *uc)
{
- int size = HFSPLUS_SB(sb)->nls->char2uni(astr, len, uc);
+ struct nls_table *nls = HFSPLUS_SB(sb)->nls;
+ unicode_t u;
+ int size;
+
+ if (nls)
+ size = nls->char2uni(astr, len, uc);
+ else {
+ size = utf8_to_utf32(astr, len, &u);
+ if (size >= 0) {
+ /* TODO: Add support for UTF-16 surrogate pairs */
+ if (u <= MAX_WCHAR_T)
+ *uc = u;
+ else
+ size = -EINVAL;
+ }
+ }
if (size <= 0) {
*uc = '?';
size = 1;
diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
index e2855ceefd39..9b2653f08a5f 100644
--- a/fs/hfsplus/xattr.c
+++ b/fs/hfsplus/xattr.c
@@ -425,7 +425,8 @@ int hfsplus_setxattr(struct inode *inode, const char *name,
char *xattr_name;
int res;
- xattr_name = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + 1,
+ xattr_name = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
+ HFSPLUS_ATTR_MAX_STRLEN + 1,
GFP_KERNEL);
if (!xattr_name)
return -ENOMEM;
@@ -579,7 +580,8 @@ ssize_t hfsplus_getxattr(struct inode *inode, const char *name,
int res;
char *xattr_name;
- xattr_name = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + 1,
+ xattr_name = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
+ HFSPLUS_ATTR_MAX_STRLEN + 1,
GFP_KERNEL);
if (!xattr_name)
return -ENOMEM;
@@ -699,8 +701,9 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
return err;
}
- strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN +
- XATTR_MAC_OSX_PREFIX_LEN + 1, GFP_KERNEL);
+ strbuf = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
+ HFSPLUS_ATTR_MAX_STRLEN + XATTR_MAC_OSX_PREFIX_LEN + 1,
+ GFP_KERNEL);
if (!strbuf) {
res = -ENOMEM;
goto out;
@@ -732,7 +735,8 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
if (be32_to_cpu(attr_key.cnid) != inode->i_ino)
goto end_listxattr;
- xattr_name_len = NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN;
+ xattr_name_len = (HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4)
+ * HFSPLUS_ATTR_MAX_STRLEN;
if (hfsplus_uni2asc(inode->i_sb,
(const struct hfsplus_unistr *)&fd.key->attr.key_name,
strbuf, &xattr_name_len)) {
diff --git a/fs/hfsplus/xattr_security.c b/fs/hfsplus/xattr_security.c
index c1c7a16cbf21..438ebcd1359b 100644
--- a/fs/hfsplus/xattr_security.c
+++ b/fs/hfsplus/xattr_security.c
@@ -41,7 +41,8 @@ static int hfsplus_initxattrs(struct inode *inode,
char *xattr_name;
int err = 0;
- xattr_name = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + 1,
+ xattr_name = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
+ HFSPLUS_ATTR_MAX_STRLEN + 1,
GFP_KERNEL);
if (!xattr_name)
return -ENOMEM;
--
2.20.1
The NLS table for utf8 is broken and cannot be fixed.
So instead of the broken utf8 nls functions char2uni() and uni2char(), use
the functions utf8_to_utf32() and utf32_to_utf8(), which implement correct
encoding and decoding between Unicode code points and UTF-8 sequences.
When iocharset=utf8 is used, set hsb->nls_io to NULL and use that to
distinguish whether the NLS table or the native UTF-8 functions should be
used.
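Once a NULL table pointer means "native UTF-8", the pointer alone can no longer tell whether the option was given at all, which is why a separate flag is needed. A minimal userspace sketch of that idea follows; struct io_choice and choose_iocharset() are hypothetical names, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of handling one iocharset= value. */
struct io_choice {
	bool have_iocharset;	/* option was given explicitly */
	const char *nls_name;	/* table to load, or NULL for native UTF-8 */
};

/* value is the option argument, or NULL when the option is absent. */
static void choose_iocharset(const char *value, struct io_choice *c)
{
	assert(c);
	c->have_iocharset = false;
	c->nls_name = NULL;
	if (!value)
		return;
	c->have_iocharset = true;
	if (strcmp(value, "utf8") != 0)
		c->nls_name = value;	/* a real NLS table must be loaded */
}
```

With this split, "option absent" and "iocharset=utf8" both leave nls_name NULL but differ in have_iocharset, so later default handling can tell them apart.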
Signed-off-by: Pali Rohár <[email protected]>
---
fs/hfs/super.c | 33 ++++++++++++++++++++++-----------
fs/hfs/trans.c | 24 ++++++++++++++++++++----
2 files changed, 42 insertions(+), 15 deletions(-)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 86bc46746c7f..076308df41cf 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -149,10 +149,13 @@ static int hfs_show_options(struct seq_file *seq, struct dentry *root)
seq_printf(seq, ",part=%u", sbi->part);
if (sbi->session >= 0)
seq_printf(seq, ",session=%u", sbi->session);
- if (sbi->nls_disk)
+ if (sbi->nls_disk) {
seq_printf(seq, ",codepage=%s", sbi->nls_disk->charset);
- if (sbi->nls_io)
- seq_printf(seq, ",iocharset=%s", sbi->nls_io->charset);
+ if (sbi->nls_io)
+ seq_printf(seq, ",iocharset=%s", sbi->nls_io->charset);
+ else
+ seq_puts(seq, ",iocharset=utf8");
+ }
if (sbi->s_quiet)
seq_printf(seq, ",quiet");
return 0;
@@ -225,6 +228,7 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
char *p;
substring_t args[MAX_OPT_ARGS];
int tmp, token;
+ int have_iocharset;
/* initialize the sb with defaults */
hsb->s_uid = current_uid();
@@ -239,6 +243,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
if (!options)
return 1;
+ have_iocharset = 0;
+
while ((p = strsep(&options, ",")) != NULL) {
if (!*p)
continue;
@@ -332,18 +338,22 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
kfree(p);
break;
case opt_iocharset:
- if (hsb->nls_io) {
+ if (have_iocharset) {
pr_err("unable to change iocharset\n");
return 0;
}
p = match_strdup(&args[0]);
- if (p)
- hsb->nls_io = load_nls(p);
- if (!hsb->nls_io) {
- pr_err("unable to load iocharset \"%s\"\n", p);
- kfree(p);
+ if (!p)
return 0;
+ if (strcmp(p, "utf8") != 0) {
+ hsb->nls_io = load_nls(p);
+ if (!hsb->nls_io) {
+ pr_err("unable to load iocharset \"%s\"\n", p);
+ kfree(p);
+ return 0;
+ }
}
+ have_iocharset = 1;
kfree(p);
break;
default:
@@ -351,7 +361,7 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
}
}
- if (hsb->nls_io && !hsb->nls_disk) {
+ if (have_iocharset && !hsb->nls_disk) {
/*
* Previous version of hfs driver did something unexpected:
* When codepage was not defined but iocharset was then
@@ -382,7 +392,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
return 0;
}
}
- if (hsb->nls_disk && !hsb->nls_io) {
+ if (hsb->nls_disk &&
+ !have_iocharset && strcmp(CONFIG_NLS_DEFAULT, "utf8") != 0) {
hsb->nls_io = load_nls_default();
if (!hsb->nls_io) {
pr_err("unable to load default iocharset\n");
diff --git a/fs/hfs/trans.c b/fs/hfs/trans.c
index c75682c61b06..bff8e54003ab 100644
--- a/fs/hfs/trans.c
+++ b/fs/hfs/trans.c
@@ -44,7 +44,7 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
srclen = HFS_NAMELEN;
dst = out;
dstlen = HFS_MAX_NAMELEN;
- if (nls_io) {
+ if (nls_disk) {
wchar_t ch;
while (srclen > 0) {
@@ -57,7 +57,12 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
srclen -= size;
if (ch == '/')
ch = ':';
- size = nls_io->uni2char(ch, dst, dstlen);
+ if (nls_io)
+ size = nls_io->uni2char(ch, dst, dstlen);
+ else if (dstlen > 0)
+ size = utf32_to_utf8(ch, dst, dstlen);
+ else
+ size = -ENAMETOOLONG;
if (size < 0) {
if (size == -ENAMETOOLONG)
goto out;
@@ -101,11 +106,22 @@ void hfs_asc2mac(struct super_block *sb, struct hfs_name *out, const struct qstr
srclen = in->len;
dst = out->name;
dstlen = HFS_NAMELEN;
- if (nls_io) {
+ if (nls_disk) {
wchar_t ch;
+ unicode_t u;
while (srclen > 0) {
- size = nls_io->char2uni(src, srclen, &ch);
+ if (nls_io)
+ size = nls_io->char2uni(src, srclen, &ch);
+ else {
+ size = utf8_to_utf32(src, srclen, &u);
+ if (size >= 0) {
+ if (u <= MAX_WCHAR_T)
+ ch = u;
+ else
+ size = -EINVAL;
+ }
+ }
if (size < 0) {
ch = '?';
size = 1;
--
2.20.1
The befs driver already has code which avoids use of NLS when befs_sb->nls
is not set.
But befs_fill_super() always sets befs_sb->nls, so activating native UTF-8
is not possible.
Fix it by not setting befs_sb->nls when iocharset is set to utf8. After
this change, the mount option iocharset=utf8 activates the native
UTF-8 code path in the befs driver.
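The decision made in befs_fill_super() can be sketched as a small userspace predicate. wants_native_utf8() is a hypothetical helper, and its dflt parameter stands in for the build-time CONFIG_NLS_DEFAULT string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper. iocharset is the mount option value, or NULL when
 * the option was not given; dflt stands in for CONFIG_NLS_DEFAULT.
 * Returns true when no NLS table should be loaded at all.
 */
static bool wants_native_utf8(const char *iocharset, const char *dflt)
{
	assert(dflt);
	return strcmp(iocharset ? iocharset : dflt, "utf8") == 0;
}
```

Note that the fallback matters: with a default of "utf8", a mount without any iocharset= option also takes the native path.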
Signed-off-by: Pali Rohár <[email protected]>
---
fs/befs/linuxvfs.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 963da3e9ab5d..000f946b92b6 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -770,6 +770,7 @@ static int befs_show_options(struct seq_file *m, struct dentry *root)
{
struct befs_sb_info *befs_sb = BEFS_SB(root->d_sb);
struct befs_mount_options *opts = &befs_sb->mount_opts;
+ struct nls_table *nls = befs_sb->nls;
if (!uid_eq(opts->uid, GLOBAL_ROOT_UID))
seq_printf(m, ",uid=%u",
@@ -777,8 +778,10 @@ static int befs_show_options(struct seq_file *m, struct dentry *root)
if (!gid_eq(opts->gid, GLOBAL_ROOT_GID))
seq_printf(m, ",gid=%u",
from_kgid_munged(&init_user_ns, opts->gid));
- if (opts->iocharset)
- seq_printf(m, ",iocharset=%s", opts->iocharset);
+ if (nls)
+ seq_printf(m, ",iocharset=%s", nls->charset);
+ else
+ seq_puts(m, ",iocharset=utf8");
if (opts->debug)
seq_puts(m, ",debug");
return 0;
@@ -908,8 +911,10 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
goto unacquire_priv_sbp;
}
+ if (strcmp(befs_sb->mount_opts.iocharset ? befs_sb->mount_opts.iocharset : CONFIG_NLS_DEFAULT, "utf8") == 0) {
+ befs_debug(sb, "Using native UTF-8 without nls");
/* load nls library */
- if (befs_sb->mount_opts.iocharset) {
+ } else if (befs_sb->mount_opts.iocharset) {
befs_debug(sb, "Loading nls: %s",
befs_sb->mount_opts.iocharset);
befs_sb->nls = load_nls(befs_sb->mount_opts.iocharset);
--
2.20.1
Mount option is named iocharset= and not charset=
Signed-off-by: Pali Rohár <[email protected]>
---
fs/befs/linuxvfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index c1ba13d19024..ed4d3afb8638 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -778,7 +778,7 @@ static int befs_show_options(struct seq_file *m, struct dentry *root)
seq_printf(m, ",gid=%u",
from_kgid_munged(&init_user_ns, opts->gid));
if (opts->iocharset)
- seq_printf(m, ",charset=%s", opts->iocharset);
+ seq_printf(m, ",iocharset=%s", opts->iocharset);
if (opts->debug)
seq_puts(m, ",debug");
return 0;
--
2.20.1
Currently the iocharset=utf8 mount option is broken. To use UTF-8 as the
iocharset, it is required to use the utf8 mount option.
Fix the iocharset=utf8 mount option so that it is equivalent to the utf8
mount option.
When UTF-8 is used as the iocharset, s_nls_map is set to NULL. This allows
simplifying the surrounding code and removing the UDF_FLAG_NLS_MAP and
UDF_FLAG_UTF8 flags: to distinguish between UTF-8 and non-UTF-8 it is
enough to check whether s_nls_map is NULL.
Signed-off-by: Pali Rohár <[email protected]>
---
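A minimal userspace sketch of the rule described in the commit message,
assuming simplified stand-ins for the kernel structures. The names
nls_table_sketch, sb_info_sketch, effective_iocharset() and
uses_native_utf8() are hypothetical and exist only for illustration; the
real driver works on struct udf_sb_info and struct nls_table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct nls_table. */
struct nls_table_sketch {
	const char *charset;
};

/* Hypothetical stand-in for a per-superblock info structure. */
struct sb_info_sketch {
	const struct nls_table_sketch *nls_map;	/* NULL means native UTF-8 */
};

/* Value a show_options() callback would print after ",iocharset=". */
const char *effective_iocharset(const struct sb_info_sketch *sbi)
{
	assert(sbi != NULL);
	return sbi->nls_map ? sbi->nls_map->charset : "utf8";
}

/* Nonzero when names should go through the native UTF-8 helpers. */
int uses_native_utf8(const struct sb_info_sketch *sbi)
{
	assert(sbi != NULL);
	return sbi->nls_map == NULL;
}
```

With a single pointer carrying both facts there is no flag that can get
out of sync with the loaded table, which is the simplification the patch
relies on.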
fs/udf/super.c | 50 ++++++++++++++++++------------------------------
fs/udf/udf_sb.h | 2 --
fs/udf/unicode.c | 4 ++--
3 files changed, 21 insertions(+), 35 deletions(-)
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 2f83c1204e20..6e8c29107b04 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -349,10 +349,10 @@ static int udf_show_options(struct seq_file *seq, struct dentry *root)
seq_printf(seq, ",lastblock=%u", sbi->s_last_block);
if (sbi->s_anchor != 0)
seq_printf(seq, ",anchor=%u", sbi->s_anchor);
- if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8))
- seq_puts(seq, ",utf8");
- if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP) && sbi->s_nls_map)
+ if (sbi->s_nls_map)
seq_printf(seq, ",iocharset=%s", sbi->s_nls_map->charset);
+ else
+ seq_puts(seq, ",iocharset=utf8");
return 0;
}
@@ -558,19 +558,24 @@ static int udf_parse_options(char *options, struct udf_options *uopt,
/* Ignored (never implemented properly) */
break;
case Opt_utf8:
- uopt->flags |= (1 << UDF_FLAG_UTF8);
+ if (!remount) {
+ unload_nls(uopt->nls_map);
+ uopt->nls_map = NULL;
+ }
break;
case Opt_iocharset:
if (!remount) {
- if (uopt->nls_map)
- unload_nls(uopt->nls_map);
- /*
- * load_nls() failure is handled later in
- * udf_fill_super() after all options are
- * parsed.
- */
+ unload_nls(uopt->nls_map);
+ uopt->nls_map = NULL;
+ }
+ /* When nls_map is not loaded then UTF-8 is used */
+ if (!remount && strcmp(args[0].from, "utf8") != 0) {
uopt->nls_map = load_nls(args[0].from);
- uopt->flags |= (1 << UDF_FLAG_NLS_MAP);
+ if (!uopt->nls_map) {
+ pr_err("iocharset %s not found\n",
+ args[0].from);
+ return 0;
+ }
}
break;
case Opt_uforget:
@@ -2139,21 +2144,6 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
if (!udf_parse_options((char *)options, &uopt, false))
goto parse_options_failure;
- if (uopt.flags & (1 << UDF_FLAG_UTF8) &&
- uopt.flags & (1 << UDF_FLAG_NLS_MAP)) {
- udf_err(sb, "utf8 cannot be combined with iocharset\n");
- goto parse_options_failure;
- }
- if ((uopt.flags & (1 << UDF_FLAG_NLS_MAP)) && !uopt.nls_map) {
- uopt.nls_map = load_nls_default();
- if (!uopt.nls_map)
- uopt.flags &= ~(1 << UDF_FLAG_NLS_MAP);
- else
- udf_debug("Using default NLS map\n");
- }
- if (!(uopt.flags & (1 << UDF_FLAG_NLS_MAP)))
- uopt.flags |= (1 << UDF_FLAG_UTF8);
-
fileset.logicalBlockNum = 0xFFFFFFFF;
fileset.partitionReferenceNum = 0xFFFF;
@@ -2308,8 +2298,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
error_out:
iput(sbi->s_vat_inode);
parse_options_failure:
- if (uopt.nls_map)
- unload_nls(uopt.nls_map);
+ unload_nls(uopt.nls_map);
if (lvid_open)
udf_close_lvid(sb);
brelse(sbi->s_lvid_bh);
@@ -2359,8 +2348,7 @@ static void udf_put_super(struct super_block *sb)
sbi = UDF_SB(sb);
iput(sbi->s_vat_inode);
- if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
- unload_nls(sbi->s_nls_map);
+ unload_nls(sbi->s_nls_map);
if (!sb_rdonly(sb))
udf_close_lvid(sb);
brelse(sbi->s_lvid_bh);
diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index 758efe557a19..4fa620543d30 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -20,8 +20,6 @@
#define UDF_FLAG_UNDELETE 6
#define UDF_FLAG_UNHIDE 7
#define UDF_FLAG_VARCONV 8
-#define UDF_FLAG_NLS_MAP 9
-#define UDF_FLAG_UTF8 10
#define UDF_FLAG_UID_FORGET 11 /* save -1 for uid to disk */
#define UDF_FLAG_GID_FORGET 12
#define UDF_FLAG_UID_SET 13
diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index 5fcfa96463eb..622569007b53 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -177,7 +177,7 @@ static int udf_name_from_CS0(struct super_block *sb,
return 0;
}
- if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
+ if (UDF_SB(sb)->s_nls_map)
conv_f = UDF_SB(sb)->s_nls_map->uni2char;
else
conv_f = NULL;
@@ -285,7 +285,7 @@ static int udf_name_to_CS0(struct super_block *sb,
if (ocu_max_len <= 0)
return 0;
- if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
+ if (UDF_SB(sb)->s_nls_map)
conv_f = UDF_SB(sb)->s_nls_map->char2uni;
else
conv_f = NULL;
--
2.20.1
Other fs drivers are using iocharset= mount option for specifying charset.
So mark iocharset= mount option as preferred and deprecate nls= mount
option.
Signed-off-by: Pali Rohár <[email protected]>
---
fs/ntfs/inode.c | 2 +-
fs/ntfs/super.c | 13 ++++---------
fs/ntfs/unistr.c | 3 ++-
3 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
index 4474adb393ca..3676f185b4a0 100644
--- a/fs/ntfs/inode.c
+++ b/fs/ntfs/inode.c
@@ -2303,7 +2303,7 @@ int ntfs_show_options(struct seq_file *sf, struct dentry *root)
seq_printf(sf, ",fmask=0%o", vol->fmask);
seq_printf(sf, ",dmask=0%o", vol->dmask);
}
- seq_printf(sf, ",nls=%s", vol->nls_map->charset);
+ seq_printf(sf, ",iocharset=%s", vol->nls_map->charset);
if (NVolCaseSensitive(vol))
seq_printf(sf, ",case_sensitive");
if (NVolShowSystemFiles(vol))
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index 0d7e948cb29c..02de1aa05b7c 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -192,11 +192,6 @@ static bool parse_options(ntfs_volume *vol, char *opt)
ntfs_warning(vol->sb, "Ignoring obsolete option %s.",
p);
else if (!strcmp(p, "nls") || !strcmp(p, "iocharset")) {
- if (!strcmp(p, "iocharset"))
- ntfs_warning(vol->sb, "Option iocharset is "
- "deprecated. Please use "
- "option nls=<charsetname> in "
- "the future.");
if (!v || !*v)
goto needs_arg;
use_utf8:
@@ -218,10 +213,10 @@ static bool parse_options(ntfs_volume *vol, char *opt)
} else if (!strcmp(p, "utf8")) {
bool val = false;
ntfs_warning(vol->sb, "Option utf8 is no longer "
- "supported, using option nls=utf8. Please "
- "use option nls=utf8 in the future and "
- "make sure utf8 is compiled either as a "
- "module or into the kernel.");
+ "supported, using option iocharset=utf8. "
+ "Please use option iocharset=utf8 in the "
+ "future and make sure utf8 is compiled "
+ "either as a module or into the kernel.");
if (!v || !*v)
val = true;
else if (!simple_getbool(v, &val))
diff --git a/fs/ntfs/unistr.c b/fs/ntfs/unistr.c
index a6b6c64f14a9..75a7f73bccdd 100644
--- a/fs/ntfs/unistr.c
+++ b/fs/ntfs/unistr.c
@@ -372,7 +372,8 @@ retry: wc = nls->uni2char(le16_to_cpu(ins[i]), ns + o,
conversion_err:
ntfs_error(vol->sb, "Unicode name contains characters that cannot be "
"converted to character set %s. You might want to "
- "try to use the mount option nls=utf8.", nls->charset);
+ "try to use the mount option iocharset=utf8.",
+ nls->charset);
if (ns != *outs)
kfree(ns);
if (wc != -ENAMETOOLONG)
--
2.20.1
Ensure that specified charset in iocharset= mount option is used. On error
correctly propagate error code back to the caller.
Signed-off-by: Pali Rohár <[email protected]>
---
fs/befs/linuxvfs.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index e071157bdaa3..963da3e9ab5d 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -914,10 +914,9 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
befs_sb->mount_opts.iocharset);
befs_sb->nls = load_nls(befs_sb->mount_opts.iocharset);
if (!befs_sb->nls) {
- befs_warning(sb, "Cannot load nls %s"
- " loading default nls",
+ befs_error(sb, "Cannot load nls %s",
befs_sb->mount_opts.iocharset);
- befs_sb->nls = load_nls_default();
+ goto unacquire_priv_sbp;
}
/* load default nls if none is specified in mount options */
} else {
--
2.20.1
Mount option is named iocharset= and not charset=
Signed-off-by: Pali Rohár <[email protected]>
---
fs/befs/linuxvfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index ed4d3afb8638..e071157bdaa3 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -678,13 +678,13 @@ static struct dentry *befs_get_parent(struct dentry *child)
}
enum {
- Opt_uid, Opt_gid, Opt_charset, Opt_debug, Opt_err,
+ Opt_uid, Opt_gid, Opt_iocharset, Opt_debug, Opt_err,
};
static const match_table_t befs_tokens = {
{Opt_uid, "uid=%d"},
{Opt_gid, "gid=%d"},
- {Opt_charset, "iocharset=%s"},
+ {Opt_iocharset, "iocharset=%s"},
{Opt_debug, "debug"},
{Opt_err, NULL}
};
@@ -745,7 +745,7 @@ parse_options(char *options, struct befs_mount_options *opts)
opts->gid = gid;
opts->use_gid = 1;
break;
- case Opt_charset:
+ case Opt_iocharset:
kfree(opts->iocharset);
opts->iocharset = match_strdup(&args[0]);
if (!opts->iocharset) {
--
2.20.1
It does not make any sense to set hsb->nls_io (NLS iocharset used between
VFS and hfs driver) when hsb->nls_disk (NLS codepage used between hfs
driver and disk) is not set.
Reverse engineering the driver code showed what it does in this special case:
When codepage was not defined but iocharset was then
hfs driver copied 8bit character from disk directly to
16bit unicode wchar_t type. Which means it did conversion
from Latin1 (ISO-8859-1) to Unicode because first 256
Unicode code points matches 8bit ISO-8859-1 codepage table.
So when iocharset was specified and codepage not, then
codepage used implicit value "iso8859-1".
So when hsb->nls_disk is not set and hsb->nls_io is set, explicitly set
hsb->nls_disk to "iso8859-1".
Such a setup is obviously incompatible with Mac OS systems as they do not
support the iso8859-1 encoding for hfs. So print a warning into dmesg about
this fact.
After this change hsb->nls_disk is always set, so remove the code paths for
the case when hsb->nls_disk was not set as they are not needed anymore.
Signed-off-by: Pali Rohár <[email protected]>
---
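A small userspace sketch of why the old "no codepage" path is the same as
ISO-8859-1, assuming nothing beyond standard C. The function names are
hypothetical; they mirror the two removed branches in fs/hfs/trans.c
(widening a disk byte, and replacing code points above 0xff with '?').

```c
#include <assert.h>
#include <stdint.h>

/* Old disk -> Unicode path when no codepage was given: widen the byte. */
uint16_t latin1_to_unicode(unsigned char disk_byte)
{
	return (uint16_t)disk_byte;
}

/* Old Unicode -> disk path: code points above 0xff cannot be stored. */
unsigned char unicode_to_latin1(uint16_t ch)
{
	return ch > 0xff ? '?' : (unsigned char)ch;
}

/* Every byte value survives a disk -> Unicode -> disk round trip. */
void check_latin1_round_trip(void)
{
	for (int b = 0; b <= 0xff; b++)
		assert(unicode_to_latin1(latin1_to_unicode((unsigned char)b)) == b);
}
```

For example, the disk byte 0xE9 becomes U+00E9, which is exactly the
ISO-8859-1 meaning of that byte, while a code point such as U+20AC has no
8-bit representation here and is written as '?'.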
fs/hfs/super.c | 31 +++++++++++++++++++++++++++++++
fs/hfs/trans.c | 38 ++++++++++++++------------------------
2 files changed, 45 insertions(+), 24 deletions(-)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 12d9bae39363..86bc46746c7f 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -351,6 +351,37 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
}
}
+ if (hsb->nls_io && !hsb->nls_disk) {
+ /*
+ * Previous version of hfs driver did something unexpected:
+ * When codepage was not defined but iocharset was then
+ * hfs driver copied 8bit character from disk directly to
+ * 16bit unicode wchar_t type. Which means it did conversion
+ * from Latin1 (ISO-8859-1) to Unicode because first 256
+ * Unicode code points matches 8bit ISO-8859-1 codepage table.
+ * So when iocharset was specified and codepage not, then
+ * codepage used implicit value "iso8859-1".
+ *
+ * To not change this previous default behavior as some users
+ * may depend on it, we load iso8859-1 NLS table explicitly
+ * to simplify code and make it more readable what happens.
+ *
+ * In context of hfs driver it is really strange to use
+ * ISO-8859-1 codepage table for storing data to disk, but
+ * nothing forbids it. Just it is highly incompatible with
+ * Mac OS systems. So via pr_warn() inform user that this
+ * is not probably what he wants.
+ */
+ pr_warn("iocharset was specified but codepage not, "
+ "using default codepage=iso8859-1\n");
+ pr_warn("this default codepage=iso8859-1 is incompatible with "
+ "Mac OS systems and may be changed in the future\n");
+ hsb->nls_disk = load_nls("iso8859-1");
+ if (!hsb->nls_disk) {
+ pr_err("unable to load iso8859-1 codepage\n");
+ return 0;
+ }
+ }
if (hsb->nls_disk && !hsb->nls_io) {
hsb->nls_io = load_nls_default();
if (!hsb->nls_io) {
diff --git a/fs/hfs/trans.c b/fs/hfs/trans.c
index 39f5e343bf4d..c75682c61b06 100644
--- a/fs/hfs/trans.c
+++ b/fs/hfs/trans.c
@@ -48,18 +48,13 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
wchar_t ch;
while (srclen > 0) {
- if (nls_disk) {
- size = nls_disk->char2uni(src, srclen, &ch);
- if (size <= 0) {
- ch = '?';
- size = 1;
- }
- src += size;
- srclen -= size;
- } else {
- ch = *src++;
- srclen--;
+ size = nls_disk->char2uni(src, srclen, &ch);
+ if (size <= 0) {
+ ch = '?';
+ size = 1;
}
+ src += size;
+ srclen -= size;
if (ch == '/')
ch = ':';
size = nls_io->uni2char(ch, dst, dstlen);
@@ -119,20 +114,15 @@ void hfs_asc2mac(struct super_block *sb, struct hfs_name *out, const struct qstr
srclen -= size;
if (ch == ':')
ch = '/';
- if (nls_disk) {
- size = nls_disk->uni2char(ch, dst, dstlen);
- if (size < 0) {
- if (size == -ENAMETOOLONG)
- goto out;
- *dst = '?';
- size = 1;
- }
- dst += size;
- dstlen -= size;
- } else {
- *dst++ = ch > 0xff ? '?' : ch;
- dstlen--;
+ size = nls_disk->uni2char(ch, dst, dstlen);
+ if (size < 0) {
+ if (size == -ENAMETOOLONG)
+ goto out;
+ *dst = '?';
+ size = 1;
}
+ dst += size;
+ dstlen -= size;
}
} else {
char ch;
--
2.20.1
Currently the iocharset=utf8 mount option is broken. To use UTF-8 as the
iocharset, it is required to use the utf8 mount option.
Fix the iocharset=utf8 mount option so that it is equivalent to the utf8
mount option.
When UTF-8 is used as the iocharset, s_nls_iocharset is set to NULL. This
allows simplifying the surrounding code and removing the s_utf8 field: to
distinguish between UTF-8 and non-UTF-8 it is enough to check whether
s_nls_iocharset is NULL.
Signed-off-by: Pali Rohár <[email protected]>
---
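A userspace sketch of the option handling described above, assuming a
simplified parser. The names opts_sketch, set_iocharset(),
parse_one_option() and needs_nls_table() are hypothetical; the real code
uses match_token(), kstrdup() and match_strdup() inside parse_options().
The sketch shows that "utf8" is stored as the charset name "utf8", that a
later option overrides an earlier one, and that an NLS table is needed
only when the effective name is not "utf8".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical options structure; only the relevant field is modelled. */
struct opts_sketch {
	char *iocharset;	/* NULL means "use the build-time default" */
};

/* Replace the stored charset name; returns 0 on allocation failure. */
int set_iocharset(struct opts_sketch *o, const char *name)
{
	char *copy;

	assert(o != NULL && name != NULL);
	copy = malloc(strlen(name) + 1);
	if (!copy)
		return 0;
	strcpy(copy, name);
	free(o->iocharset);
	o->iocharset = copy;
	return 1;
}

/* Handle one option token: "utf8" or "iocharset=<name>". */
int parse_one_option(struct opts_sketch *o, const char *token)
{
	if (strcmp(token, "utf8") == 0)
		return set_iocharset(o, "utf8");
	if (strncmp(token, "iocharset=", 10) == 0)
		return set_iocharset(o, token + 10);
	return 0;	/* unknown option */
}

/* Nonzero when an NLS table must be loaded (anything but utf8). */
int needs_nls_table(const struct opts_sketch *o, const char *default_charset)
{
	const char *p = o->iocharset ? o->iocharset : default_charset;

	return strcmp(p, "utf8") != 0;
}
```

The default_charset parameter plays the role of CONFIG_NLS_DEFAULT, so a
kernel built with utf8 as the default takes the native path even when no
option is given.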
fs/isofs/inode.c | 27 +++++++++++++--------------
fs/isofs/isofs.h | 1 -
fs/isofs/joliet.c | 4 +---
3 files changed, 14 insertions(+), 18 deletions(-)
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 21edc423b79f..678e2c51b855 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -155,7 +155,6 @@ struct iso9660_options{
unsigned int overriderockperm:1;
unsigned int uid_set:1;
unsigned int gid_set:1;
- unsigned int utf8:1;
unsigned char map;
unsigned char check;
unsigned int blocksize;
@@ -356,7 +355,6 @@ static int parse_options(char *options, struct iso9660_options *popt)
popt->gid = GLOBAL_ROOT_GID;
popt->uid = GLOBAL_ROOT_UID;
popt->iocharset = NULL;
- popt->utf8 = 0;
popt->overriderockperm = 0;
popt->session=-1;
popt->sbsector=-1;
@@ -389,10 +387,13 @@ static int parse_options(char *options, struct iso9660_options *popt)
case Opt_cruft:
popt->cruft = 1;
break;
+#ifdef CONFIG_JOLIET
case Opt_utf8:
- popt->utf8 = 1;
+ kfree(popt->iocharset);
+ popt->iocharset = kstrdup("utf8", GFP_KERNEL);
+ if (!popt->iocharset)
+ return 0;
break;
-#ifdef CONFIG_JOLIET
case Opt_iocharset:
kfree(popt->iocharset);
popt->iocharset = match_strdup(&args[0]);
@@ -495,7 +496,6 @@ static int isofs_show_options(struct seq_file *m, struct dentry *root)
if (sbi->s_nocompress) seq_puts(m, ",nocompress");
if (sbi->s_overriderockperm) seq_puts(m, ",overriderockperm");
if (sbi->s_showassoc) seq_puts(m, ",showassoc");
- if (sbi->s_utf8) seq_puts(m, ",utf8");
if (sbi->s_check) seq_printf(m, ",check=%c", sbi->s_check);
if (sbi->s_mapping) seq_printf(m, ",map=%c", sbi->s_mapping);
@@ -518,9 +518,10 @@ static int isofs_show_options(struct seq_file *m, struct dentry *root)
seq_printf(m, ",fmode=%o", sbi->s_fmode);
#ifdef CONFIG_JOLIET
- if (sbi->s_nls_iocharset &&
- strcmp(sbi->s_nls_iocharset->charset, CONFIG_NLS_DEFAULT) != 0)
+ if (sbi->s_nls_iocharset)
seq_printf(m, ",iocharset=%s", sbi->s_nls_iocharset->charset);
+ else
+ seq_puts(m, ",iocharset=utf8");
#endif
return 0;
}
@@ -863,14 +864,13 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent)
sbi->s_nls_iocharset = NULL;
#ifdef CONFIG_JOLIET
- if (joliet_level && opt.utf8 == 0) {
+ if (joliet_level) {
char *p = opt.iocharset ? opt.iocharset : CONFIG_NLS_DEFAULT;
- sbi->s_nls_iocharset = load_nls(p);
- if (! sbi->s_nls_iocharset) {
- /* Fail only if explicit charset specified */
- if (opt.iocharset)
+ if (strcmp(p, "utf8") != 0) {
+ sbi->s_nls_iocharset = opt.iocharset ?
+ load_nls(opt.iocharset) : load_nls_default();
+ if (!sbi->s_nls_iocharset)
goto out_freesbi;
- sbi->s_nls_iocharset = load_nls_default();
}
}
#endif
@@ -886,7 +886,6 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent)
sbi->s_gid = opt.gid;
sbi->s_uid_set = opt.uid_set;
sbi->s_gid_set = opt.gid_set;
- sbi->s_utf8 = opt.utf8;
sbi->s_nocompress = opt.nocompress;
sbi->s_overriderockperm = opt.overriderockperm;
/*
diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
index 055ec6c586f7..dcdc191ed183 100644
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -44,7 +44,6 @@ struct isofs_sb_info {
unsigned char s_session;
unsigned int s_high_sierra:1;
unsigned int s_rock:2;
- unsigned int s_utf8:1;
unsigned int s_cruft:1; /* Broken disks with high byte of length
* containing junk */
unsigned int s_nocompress:1;
diff --git a/fs/isofs/joliet.c b/fs/isofs/joliet.c
index be8b6a9d0b92..c0f04a1e7f69 100644
--- a/fs/isofs/joliet.c
+++ b/fs/isofs/joliet.c
@@ -41,14 +41,12 @@ uni16_to_x8(unsigned char *ascii, __be16 *uni, int len, struct nls_table *nls)
int
get_joliet_filename(struct iso_directory_record * de, unsigned char *outname, struct inode * inode)
{
- unsigned char utf8;
struct nls_table *nls;
unsigned char len = 0;
- utf8 = ISOFS_SB(inode->i_sb)->s_utf8;
nls = ISOFS_SB(inode->i_sb)->s_nls_iocharset;
- if (utf8) {
+ if (!nls) {
len = utf16s_to_utf8s((const wchar_t *) de->name,
de->name_len[0] >> 1, UTF16_BIG_ENDIAN,
outname, PAGE_SIZE);
--
2.20.1
When the iocharset= mount option is not specified or when it is set to
iocharset=none, the jfs driver uses its own custom iso8859-1 encoding
implementation.
NLS already provides an iso8859-1 module, so use it instead of the custom
jfs iso8859-1 implementation.
Signed-off-by: Pali Rohár <[email protected]>
---
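A userspace sketch of the name resolution this patch introduces, assuming
a simplified model in which arg == NULL means "iocharset= was not given".
The function name jfs_charset_to_load() is hypothetical; in the patch the
same decisions are made inline in parse_options() around load_nls().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map the iocharset= argument to the name of the NLS table to load.
 * Returns NULL when the currently loaded table should be kept, which
 * only happens on remount without an iocharset= option.
 */
const char *jfs_charset_to_load(const char *arg, int remount)
{
	assert(remount == 0 || remount == 1);
	if (arg == NULL)
		return remount ? NULL : "iso8859-1";	/* previous default */
	if (strcmp(arg, "none") == 0)
		return "iso8859-1";			/* compatibility alias */
	return arg;
}
```

After this mapping the driver always has a table to call, so the
table-less branches in jfs_unicode.c are no longer reachable and can be
removed.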
fs/jfs/jfs_unicode.c | 14 +-------------
fs/jfs/super.c | 29 +++++++++++++++++++----------
2 files changed, 20 insertions(+), 23 deletions(-)
diff --git a/fs/jfs/jfs_unicode.c b/fs/jfs/jfs_unicode.c
index 0c1e9027245a..1d0f65d13b58 100644
--- a/fs/jfs/jfs_unicode.c
+++ b/fs/jfs/jfs_unicode.c
@@ -33,13 +33,8 @@ int jfs_strfromUCS_le(char *to, const __le16 * from,
NLS_MAX_CHARSET_SIZE);
if (charlen > 0)
outlen += charlen;
- else
+ else {
to[outlen++] = '?';
- }
- } else {
- for (i = 0; (i < len) && from[i]; i++) {
- if (unlikely(le16_to_cpu(from[i]) & 0xff00)) {
- to[i] = '?';
if (unlikely(warn)) {
warn--;
warn_again--;
@@ -49,12 +44,8 @@ int jfs_strfromUCS_le(char *to, const __le16 * from,
printk(KERN_ERR
"mount with iocharset=utf8 to access\n");
}
-
}
- else
- to[i] = (char) (le16_to_cpu(from[i]));
}
- outlen = i;
}
to[outlen] = 0;
return outlen;
@@ -84,9 +75,6 @@ static int jfs_strtoUCS(wchar_t * to, const unsigned char *from, int len,
return charlen;
}
}
- } else {
- for (i = 0; (i < len) && from[i]; i++)
- to[i] = (wchar_t) from[i];
}
to[i] = 0;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 9030aeaf0f88..8ba2ac032292 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -231,7 +231,7 @@ static const match_table_t tokens = {
};
static int parse_options(char *options, struct super_block *sb, s64 *newLVSize,
- int *flag)
+ int *flag, int remount)
{
void *nls_map = (void *)-1; /* -1: no change; NULL: none */
char *p;
@@ -263,14 +263,14 @@ static int parse_options(char *options, struct super_block *sb, s64 *newLVSize,
case Opt_iocharset:
if (nls_map && nls_map != (void *) -1)
unload_nls(nls_map);
- if (!strcmp(args[0].from, "none"))
- nls_map = NULL;
- else {
+ /* compatibility alias none means ISO-8859-1 */
+ if (strcmp(args[0].from, "none") == 0)
+ nls_map = load_nls("iso8859-1");
+ else
nls_map = load_nls(args[0].from);
- if (!nls_map) {
- pr_err("JFS: charset not found\n");
- goto cleanup;
- }
+ if (!nls_map) {
+ pr_err("JFS: charset not found\n");
+ goto cleanup;
}
break;
case Opt_resize:
@@ -414,6 +414,15 @@ static int parse_options(char *options, struct super_block *sb, s64 *newLVSize,
}
}
+ if (!remount && nls_map == (void *) -1) {
+ /* Previously default NLS table was ISO-8859-1 */
+ nls_map = load_nls("iso8859-1");
+ if (!nls_map) {
+ pr_err("JFS: iso8859-1 charset not found\n");
+ goto cleanup;
+ }
+ }
+
if (nls_map != (void *) -1) {
/* Discard old (if remount) */
unload_nls(sbi->nls_tab);
@@ -435,7 +444,7 @@ static int jfs_remount(struct super_block *sb, int *flags, char *data)
int ret;
sync_filesystem(sb);
- if (!parse_options(data, sb, &newLVSize, &flag))
+ if (!parse_options(data, sb, &newLVSize, &flag, 1))
return -EINVAL;
if (newLVSize) {
@@ -513,7 +522,7 @@ static int jfs_fill_super(struct super_block *sb, void *data, int silent)
/* initialize the mount flag and determine the default error handler */
flag = JFS_ERR_REMOUNT_RO;
- if (!parse_options((char *) data, sb, &newLVSize, &flag))
+ if (!parse_options((char *) data, sb, &newLVSize, &flag, 0))
goto out_kfree;
sbi->flag = flag;
--
2.20.1
> On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
>
> It does not make any sense to set hsb->nls_io (NLS iocharset used between
> VFS and hfs driver) when hsb->nls_disk (NLS codepage used between hfs
> driver and disk) is not set.
>
> Reverse engineering driver code shown what is doing in this special case:
>
> When codepage was not defined but iocharset was then
> hfs driver copied 8bit character from disk directly to
> 16bit unicode wchar_t type. Which means it did conversion
> from Latin1 (ISO-8859-1) to Unicode because first 256
> Unicode code points matches 8bit ISO-8859-1 codepage table.
> So when iocharset was specified and codepage not, then
> codepage used implicit value "iso8859-1".
>
> So when hsb->nls_disk is not set and hsb->nls_io is then explicitly set
> hsb->nls_disk to "iso8859-1".
>
> Such setup is obviously incompatible with Mac OS systems as they do not
> support iso8859-1 encoding for hfs. So print warning into dmesg about this
> fact.
>
> After this change hsb->nls_disk is always set, so remove code paths for
> case when hsb->nls_disk was not set as they are not needed anymore.
>
Sounds reasonable. But it will be great to know that the change has been tested reasonably well.
Thanks,
Slava.
> Signed-off-by: Pali Rohár <[email protected]>
> ---
> fs/hfs/super.c | 31 +++++++++++++++++++++++++++++++
> fs/hfs/trans.c | 38 ++++++++++++++------------------------
> 2 files changed, 45 insertions(+), 24 deletions(-)
>
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index 12d9bae39363..86bc46746c7f 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -351,6 +351,37 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> }
> }
>
> + if (hsb->nls_io && !hsb->nls_disk) {
> + /*
> + * Previous version of hfs driver did something unexpected:
> + * When codepage was not defined but iocharset was then
> + * hfs driver copied 8bit character from disk directly to
> + * 16bit unicode wchar_t type. Which means it did conversion
> + * from Latin1 (ISO-8859-1) to Unicode because first 256
> + * Unicode code points matches 8bit ISO-8859-1 codepage table.
> + * So when iocharset was specified and codepage not, then
> + * codepage used implicit value "iso8859-1".
> + *
> + * To not change this previous default behavior as some users
> + * may depend on it, we load iso8859-1 NLS table explicitly
> + * to simplify code and make it more reable what happens.
> + *
> + * In context of hfs driver it is really strange to use
> + * ISO-8859-1 codepage table for storing data to disk, but
> + * nothing forbids it. Just it is highly incompatible with
> + * Mac OS systems. So via pr_warn() inform user that this
> + * is not probably what he wants.
> + */
> + pr_warn("iocharset was specified but codepage not, "
> + "using default codepage=iso8859-1\n");
> + pr_warn("this default codepage=iso8859-1 is incompatible with "
> + "Mac OS systems and may be changed in the future");
> + hsb->nls_disk = load_nls("iso8859-1");
> + if (!hsb->nls_disk) {
> + pr_err("unable to load iso8859-1 codepage\n");
> + return 0;
> + }
> + }
> if (hsb->nls_disk && !hsb->nls_io) {
> hsb->nls_io = load_nls_default();
> if (!hsb->nls_io) {
> diff --git a/fs/hfs/trans.c b/fs/hfs/trans.c
> index 39f5e343bf4d..c75682c61b06 100644
> --- a/fs/hfs/trans.c
> +++ b/fs/hfs/trans.c
> @@ -48,18 +48,13 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
> wchar_t ch;
>
> while (srclen > 0) {
> - if (nls_disk) {
> - size = nls_disk->char2uni(src, srclen, &ch);
> - if (size <= 0) {
> - ch = '?';
> - size = 1;
> - }
> - src += size;
> - srclen -= size;
> - } else {
> - ch = *src++;
> - srclen--;
> + size = nls_disk->char2uni(src, srclen, &ch);
> + if (size <= 0) {
> + ch = '?';
> + size = 1;
> }
> + src += size;
> + srclen -= size;
> if (ch == '/')
> ch = ':';
> size = nls_io->uni2char(ch, dst, dstlen);
> @@ -119,20 +114,15 @@ void hfs_asc2mac(struct super_block *sb, struct hfs_name *out, const struct qstr
> srclen -= size;
> if (ch == ':')
> ch = '/';
> - if (nls_disk) {
> - size = nls_disk->uni2char(ch, dst, dstlen);
> - if (size < 0) {
> - if (size == -ENAMETOOLONG)
> - goto out;
> - *dst = '?';
> - size = 1;
> - }
> - dst += size;
> - dstlen -= size;
> - } else {
> - *dst++ = ch > 0xff ? '?' : ch;
> - dstlen--;
> + size = nls_disk->uni2char(ch, dst, dstlen);
> + if (size < 0) {
> + if (size == -ENAMETOOLONG)
> + goto out;
> + *dst = '?';
> + size = 1;
> }
> + dst += size;
> + dstlen -= size;
> }
> } else {
> char ch;
> --
> 2.20.1
>
On Mon, Aug 09, 2021 at 10:31:55AM -0700, Viacheslav Dubeyko wrote:
> > On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
> >
> > It does not make any sense to set hsb->nls_io (NLS iocharset used between
> > VFS and hfs driver) when hsb->nls_disk (NLS codepage used between hfs
> > driver and disk) is not set.
> >
> > Reverse engineering driver code shown what is doing in this special case:
> >
> > When codepage was not defined but iocharset was then
> > hfs driver copied 8bit character from disk directly to
> > 16bit unicode wchar_t type. Which means it did conversion
> > from Latin1 (ISO-8859-1) to Unicode because first 256
> > Unicode code points matches 8bit ISO-8859-1 codepage table.
> > So when iocharset was specified and codepage not, then
> > codepage used implicit value "iso8859-1".
> >
> > So when hsb->nls_disk is not set and hsb->nls_io is then explicitly set
> > hsb->nls_disk to "iso8859-1".
> >
> > Such setup is obviously incompatible with Mac OS systems as they do not
> > support iso8859-1 encoding for hfs. So print warning into dmesg about this
> > fact.
> >
> > After this change hsb->nls_disk is always set, so remove code paths for
> > case when hsb->nls_disk was not set as they are not needed anymore.
>
>
> Sounds reasonable. But it will be great to know that the change has been tested reasonably well.
I don't think it's reasonable to ask Pali to test every single filesystem.
That's something the maintainer should do, as you're more likely to have
the infrastructure already set up to do testing of your filesystem and
be aware of fun corner cases and use cases than someone who's working
across all filesystems.
On Monday 09 August 2021 18:37:19 Matthew Wilcox wrote:
> On Mon, Aug 09, 2021 at 10:31:55AM -0700, Viacheslav Dubeyko wrote:
> > > On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
> > >
> > > It does not make any sense to set hsb->nls_io (NLS iocharset used between
> > > VFS and hfs driver) when hsb->nls_disk (NLS codepage used between hfs
> > > driver and disk) is not set.
> > >
> > > Reverse engineering driver code shown what is doing in this special case:
> > >
> > > When codepage was not defined but iocharset was then
> > > hfs driver copied 8bit character from disk directly to
> > > 16bit unicode wchar_t type. Which means it did conversion
> > > from Latin1 (ISO-8859-1) to Unicode because first 256
> > > Unicode code points matches 8bit ISO-8859-1 codepage table.
> > > So when iocharset was specified and codepage not, then
> > > codepage used implicit value "iso8859-1".
> > >
> > > So when hsb->nls_disk is not set and hsb->nls_io is then explicitly set
> > > hsb->nls_disk to "iso8859-1".
> > >
> > > Such setup is obviously incompatible with Mac OS systems as they do not
> > > support iso8859-1 encoding for hfs. So print warning into dmesg about this
> > > fact.
> > >
> > > After this change hsb->nls_disk is always set, so remove code paths for
> > > case when hsb->nls_disk was not set as they are not needed anymore.
> >
> >
> > Sounds reasonable. But it will be great to know that the change has been tested reasonably well.
>
> I don't think it's reasonable to ask Pali to test every single filesystem.
> That's something the maintainer should do, as you're more likely to have
> the infrastructure already set up to do testing of your filesystem and
> be aware of fun corner cases and use cases than someone who's working
> across all filesystems.
This patch series is currently in RFC form and, as stated in the cover
letter, mostly untested. So the patches are not in a form for merging or
detailed review. I would just like to know if this is the right direction
for the filesystems and if I should continue with this effort or not.
And I thought that sending "incomplete" RFC patches is a better way than
just describing what to do and how...
> On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
>
> Other fs drivers are using iocharset= mount option for specifying charset.
> So add it also for hfsplus and mark old nls= mount option as deprecated.
>
> Signed-off-by: Pali Rohár <[email protected]>
> ---
> fs/hfsplus/options.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
> index 047e05c57560..a975548f6b91 100644
> --- a/fs/hfsplus/options.c
> +++ b/fs/hfsplus/options.c
> @@ -23,6 +23,7 @@ enum {
> opt_creator, opt_type,
> opt_umask, opt_uid, opt_gid,
> opt_part, opt_session, opt_nls,
> + opt_iocharset,
> opt_nodecompose, opt_decompose,
> opt_barrier, opt_nobarrier,
> opt_force, opt_err
> @@ -37,6 +38,7 @@ static const match_table_t tokens = {
> { opt_part, "part=%u" },
> { opt_session, "session=%u" },
> { opt_nls, "nls=%s" },
> + { opt_iocharset, "iocharset=%s" },
> { opt_decompose, "decompose" },
> { opt_nodecompose, "nodecompose" },
> { opt_barrier, "barrier" },
> @@ -166,6 +168,9 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
> }
> break;
> case opt_nls:
> + pr_warn("option nls= is deprecated, use iocharset=\n");
> + /* fallthrough */
> + case opt_iocharset:
> if (sbi->nls) {
> pr_err("unable to change nls mapping\n");
> return 0;
> @@ -230,7 +235,7 @@ int hfsplus_show_options(struct seq_file *seq, struct dentry *root)
> if (sbi->session >= 0)
> seq_printf(seq, ",session=%u", sbi->session);
> if (sbi->nls)
> - seq_printf(seq, ",nls=%s", sbi->nls->charset);
> + seq_printf(seq, ",iocharset=%s", sbi->nls->charset);
> if (test_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags))
> seq_puts(seq, ",nodecompose");
> if (test_bit(HFSPLUS_SB_NOBARRIER, &sbi->flags))
> --
> 2.20.1
>
Looks reasonable. But I would like to be sure that the code has been reasonably tested.
Thanks,
Slava.
> On Aug 9, 2021, at 10:37 AM, Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Aug 09, 2021 at 10:31:55AM -0700, Viacheslav Dubeyko wrote:
>>> On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
>>>
>>> It does not make any sense to set hsb->nls_io (NLS iocharset used between
>>> VFS and hfs driver) when hsb->nls_disk (NLS codepage used between hfs
>>> driver and disk) is not set.
>>>
>>> Reverse engineering driver code shown what is doing in this special case:
>>>
>>> When codepage was not defined but iocharset was then
>>> hfs driver copied 8bit character from disk directly to
>>> 16bit unicode wchar_t type. Which means it did conversion
>>> from Latin1 (ISO-8859-1) to Unicode because first 256
>>> Unicode code points matches 8bit ISO-8859-1 codepage table.
>>> So when iocharset was specified and codepage not, then
>>> codepage used implicit value "iso8859-1".
>>>
>>> So when hsb->nls_disk is not set and hsb->nls_io is then explicitly set
>>> hsb->nls_disk to "iso8859-1".
>>>
>>> Such setup is obviously incompatible with Mac OS systems as they do not
>>> support iso8859-1 encoding for hfs. So print warning into dmesg about this
>>> fact.
>>>
>>> After this change hsb->nls_disk is always set, so remove code paths for
>>> case when hsb->nls_disk was not set as they are not needed anymore.
>>
>>
>> Sounds reasonable. But it will be great to know that the change has been tested reasonably well.
>
> I don't think it's reasonable to ask Pali to test every single filesystem.
> That's something the maintainer should do, as you're more likely to have
> the infrastructure already set up to do testing of your filesystem and
> be aware of fun corner cases and use cases than someone who's working
> across all filesystems.
I see the point. But the whole approach needs to be tested, at a minimum, on one particular file system. :) And it could be any favorite one.
Thanks,
Slava.
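
For readers following the hfs discussion: the implicit Latin1 behaviour
described in the quoted commit message can be sketched in user space roughly
as below. This is only an illustration, not code from the hfs driver; both
function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one on-disk byte read as a Unicode code point. */
static uint16_t example_latin1_to_unicode(unsigned char byte)
{
	/* ISO-8859-1 byte values equal the first 256 Unicode code points. */
	return (uint16_t)byte;
}

/* Hypothetical helper: widen a whole name; returns the units written. */
static size_t example_widen_name(const unsigned char *in, size_t len,
				 uint16_t *out)
{
	size_t i;

	for (i = 0; i < len; i++)
		out[i] = example_latin1_to_unicode(in[i]);
	return len;
}
```

Copying a byte straight into a 16-bit unit is therefore the same as decoding
it with an "iso8859-1" codepage, which is why the patch makes that codepage
explicit.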
> On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
>
> NLS table for utf8 is broken and cannot be fixed.
>
> So instead of the broken utf8 nls functions char2uni() and uni2char(), use
> the functions utf8_to_utf32() and utf32_to_utf8(), which implement correct
> encoding and decoding between Unicode code points and UTF-8 sequences.
>
> Note that this fs driver does not support the full Unicode range;
> specifically, UTF-16 surrogate pairs are unsupported. This patch does not
> change this limitation, and support for UTF-16 surrogate pairs stays
> unimplemented.
>
> When iocharset=utf8 is used, sbi->nls is set to NULL, and that NULL value
> is used to decide whether the NLS table or the native UTF-8 functions
> should be used.
>
> Signed-off-by: Pali Rohár <[email protected]>
> ---
> fs/hfsplus/dir.c | 6 ++++--
> fs/hfsplus/options.c | 32 ++++++++++++++++++--------------
> fs/hfsplus/super.c | 7 +------
> fs/hfsplus/unicode.c | 31 ++++++++++++++++++++++++++++---
> fs/hfsplus/xattr.c | 14 +++++++++-----
> fs/hfsplus/xattr_security.c | 3 ++-
> 6 files changed, 62 insertions(+), 31 deletions(-)
>
> diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
> index 84714bbccc12..2caf0cd82221 100644
> --- a/fs/hfsplus/dir.c
> +++ b/fs/hfsplus/dir.c
> @@ -144,7 +144,8 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
> if (err)
> return err;
> - strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
> + strbuf = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> + HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
Maybe, introduce some variable that will contain the length calculation?
> if (!strbuf) {
> err = -ENOMEM;
> goto out;
> @@ -203,7 +204,8 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
> fd.entrylength);
> type = be16_to_cpu(entry.type);
> - len = NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN;
> + len = (HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> + HFSPLUS_MAX_STRLEN;
> err = hfsplus_uni2asc(sb, &fd.key->cat.name, strbuf, &len);
> if (err)
> goto out;
> diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
> index a975548f6b91..16c08cb5c4f8 100644
> --- a/fs/hfsplus/options.c
> +++ b/fs/hfsplus/options.c
> @@ -104,6 +104,9 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
> char *p;
> substring_t args[MAX_OPT_ARGS];
> int tmp, token;
> + int have_iocharset;
> +
> + have_iocharset = 0;
How about using a boolean type with true/false?
>
> if (!input)
> goto done;
> @@ -171,20 +174,24 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
> pr_warn("option nls= is deprecated, use iocharset=\n");
> /* fallthrough */
> case opt_iocharset:
> - if (sbi->nls) {
> + if (have_iocharset) {
> pr_err("unable to change nls mapping\n");
> return 0;
> }
> p = match_strdup(&args[0]);
> - if (p)
> - sbi->nls = load_nls(p);
> - if (!sbi->nls) {
> - pr_err("unable to load nls mapping \"%s\"\n",
> - p);
> - kfree(p);
> + if (!p)
> return 0;
> + if (strcmp(p, "utf8") != 0) {
> + sbi->nls = load_nls(p);
> + if (!sbi->nls) {
> + pr_err("unable to load nls mapping "
> + "\"%s\"\n", p);
> + kfree(p);
> + return 0;
> + }
> }
> kfree(p);
> + have_iocharset = 1;
Ditto. How about true here?
> break;
> case opt_decompose:
> clear_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
> @@ -207,13 +214,10 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
> }
>
> done:
> - if (!sbi->nls) {
> - /* try utf8 first, as this is the old default behaviour */
> - sbi->nls = load_nls("utf8");
> - if (!sbi->nls)
> - sbi->nls = load_nls_default();
> - if (!sbi->nls)
> - return 0;
> + if (!have_iocharset) {
> + /* use utf8, as this is the old default behaviour */
> + pr_debug("using native UTF-8 without nls\n");
> + /* no sbi->nls means that native UTF-8 code is used */
> }
>
> return 1;
> diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
> index b9e3db3f855f..985662451bfc 100644
> --- a/fs/hfsplus/super.c
> +++ b/fs/hfsplus/super.c
> @@ -403,11 +403,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
>
> /* temporarily use utf8 to correctly find the hidden dir below */
> nls = sbi->nls;
> - sbi->nls = load_nls("utf8");
> - if (!sbi->nls) {
> - pr_err("unable to load nls for utf8\n");
> - goto out_unload_nls;
> - }
> + sbi->nls = NULL;
>
> /* Grab the volume header */
> if (hfsplus_read_wrapper(sb)) {
> @@ -585,7 +581,6 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
> }
> }
>
> - unload_nls(sbi->nls);
> sbi->nls = nls;
> return 0;
>
> diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> index 73342c925a4b..1d8c31c5126f 100644
> --- a/fs/hfsplus/unicode.c
> +++ b/fs/hfsplus/unicode.c
> @@ -190,7 +190,12 @@ int hfsplus_uni2asc(struct super_block *sb,
> c0 = ':';
> break;
> }
> - res = nls->uni2char(c0, op, len);
> + if (nls)
> + res = nls->uni2char(c0, op, len);
> + else if (len > 0)
> + res = utf32_to_utf8(c0, op, len);
> + else
> + res = -ENAMETOOLONG;
> if (res < 0) {
> if (res == -ENAMETOOLONG)
> goto out;
> @@ -233,7 +238,12 @@ int hfsplus_uni2asc(struct super_block *sb,
> cc = c0;
> }
> done:
> - res = nls->uni2char(cc, op, len);
> + if (nls)
> + res = nls->uni2char(cc, op, len);
> + else if (len > 0)
> + res = utf32_to_utf8(cc, op, len);
> + else
> + res = -ENAMETOOLONG;
> if (res < 0) {
> if (res == -ENAMETOOLONG)
> goto out;
> @@ -256,7 +266,22 @@ int hfsplus_uni2asc(struct super_block *sb,
> static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
> wchar_t *uc)
> {
> - int size = HFSPLUS_SB(sb)->nls->char2uni(astr, len, uc);
> + struct nls_table *nls = HFSPLUS_SB(sb)->nls;
> + unicode_t u;
> + int size;
> +
> + if (nls)
> + size = nls->char2uni(astr, len, uc);
> + else {
> + size = utf8_to_utf32(astr, len, &u);
> + if (size >= 0) {
> + /* TODO: Add support for UTF-16 surrogate pairs */
Did you forget to delete this comment? Or do you plan to implement this?
> + if (u <= MAX_WCHAR_T)
> + *uc = u;
> + else
> + size = -EINVAL;
> + }
> + }
> if (size <= 0) {
> *uc = '?';
> size = 1;
> diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
> index e2855ceefd39..9b2653f08a5f 100644
> --- a/fs/hfsplus/xattr.c
> +++ b/fs/hfsplus/xattr.c
> @@ -425,7 +425,8 @@ int hfsplus_setxattr(struct inode *inode, const char *name,
> char *xattr_name;
> int res;
>
> - xattr_name = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + 1,
> + xattr_name = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> + HFSPLUS_ATTR_MAX_STRLEN + 1,
> GFP_KERNEL);
How about introducing a variable for the length calculation?
> if (!xattr_name)
> return -ENOMEM;
> @@ -579,7 +580,8 @@ ssize_t hfsplus_getxattr(struct inode *inode, const char *name,
> int res;
> char *xattr_name;
>
> - xattr_name = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + 1,
> + xattr_name = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> + HFSPLUS_ATTR_MAX_STRLEN + 1,
> GFP_KERNEL);
Ditto. How about introducing a variable for the length calculation?
> if (!xattr_name)
> return -ENOMEM;
> @@ -699,8 +701,9 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
> return err;
> }
>
> - strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN +
> - XATTR_MAC_OSX_PREFIX_LEN + 1, GFP_KERNEL);
> + strbuf = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> + HFSPLUS_ATTR_MAX_STRLEN + XATTR_MAC_OSX_PREFIX_LEN + 1,
> + GFP_KERNEL);
Ditto. How about introducing a variable for the length calculation?
> if (!strbuf) {
> res = -ENOMEM;
> goto out;
> @@ -732,7 +735,8 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
> if (be32_to_cpu(attr_key.cnid) != inode->i_ino)
> goto end_listxattr;
>
> - xattr_name_len = NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN;
> + xattr_name_len = (HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4)
> + * HFSPLUS_ATTR_MAX_STRLEN;
> if (hfsplus_uni2asc(inode->i_sb,
> (const struct hfsplus_unistr *)&fd.key->attr.key_name,
> strbuf, &xattr_name_len)) {
> diff --git a/fs/hfsplus/xattr_security.c b/fs/hfsplus/xattr_security.c
> index c1c7a16cbf21..438ebcd1359b 100644
> --- a/fs/hfsplus/xattr_security.c
> +++ b/fs/hfsplus/xattr_security.c
> @@ -41,7 +41,8 @@ static int hfsplus_initxattrs(struct inode *inode,
> char *xattr_name;
> int err = 0;
>
> - xattr_name = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + 1,
> + xattr_name = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> + HFSPLUS_ATTR_MAX_STRLEN + 1,
> GFP_KERNEL);
Ditto. How about introducing a variable for the length calculation?
Thanks,
Slava.
> if (!xattr_name)
> return -ENOMEM;
> --
> 2.20.1
>
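
The two review suggestions above (a single place for the length calculation
and a boolean flag instead of an int) could look roughly like the user-space
sketch below. Every name and constant in it is a hypothetical stand-in and
not an existing kernel symbol; the values 6 and 255 are assumptions chosen
only to make the arithmetic concrete.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-ins for NLS_MAX_CHARSET_SIZE and HFSPLUS_MAX_STRLEN. */
#define EXAMPLE_NLS_MAX_CHARSET_SIZE	6
#define EXAMPLE_MAX_STRLEN		255
/* The literal 4 used by the patch for the native UTF-8 path. */
#define EXAMPLE_UTF8_MAX_BYTES		4

/*
 * Hypothetical helper: worst-case byte length of a converted name,
 * without the trailing NUL. uses_nls_table is true when an NLS table
 * is loaded and false when native UTF-8 conversion is in use.
 */
static size_t example_name_buf_len(bool uses_nls_table)
{
	size_t per_unit = uses_nls_table ? EXAMPLE_NLS_MAX_CHARSET_SIZE
					 : EXAMPLE_UTF8_MAX_BYTES;

	return per_unit * EXAMPLE_MAX_STRLEN;
}
```

A caller would then allocate example_name_buf_len(...) + 1 bytes and pass the
same value as the initial length, so the two expressions cannot drift apart.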
> On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
>
> NLS table for utf8 is broken and cannot be fixed.
>
> So instead of the broken utf8 nls functions char2uni() and uni2char(), use
> the functions utf8_to_utf32() and utf32_to_utf8(), which implement correct
> encoding and decoding between Unicode code points and UTF-8 sequences.
>
> When iocharset=utf8 is used, hsb->nls_io is set to NULL, and that NULL
> value is used to decide whether the NLS table or the native UTF-8
> functions should be used.
>
> Signed-off-by: Pali Rohár <[email protected]>
> ---
> fs/hfs/super.c | 33 ++++++++++++++++++++++-----------
> fs/hfs/trans.c | 24 ++++++++++++++++++++----
> 2 files changed, 42 insertions(+), 15 deletions(-)
>
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index 86bc46746c7f..076308df41cf 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -149,10 +149,13 @@ static int hfs_show_options(struct seq_file *seq, struct dentry *root)
> seq_printf(seq, ",part=%u", sbi->part);
> if (sbi->session >= 0)
> seq_printf(seq, ",session=%u", sbi->session);
> - if (sbi->nls_disk)
> + if (sbi->nls_disk) {
> seq_printf(seq, ",codepage=%s", sbi->nls_disk->charset);
Maybe, I am missing something. But where is the closing “}”?
> - if (sbi->nls_io)
> - seq_printf(seq, ",iocharset=%s", sbi->nls_io->charset);
> + if (sbi->nls_io)
> + seq_printf(seq, ",iocharset=%s", sbi->nls_io->charset);
> + else
> + seq_puts(seq, ",iocharset=utf8");
> + }
> if (sbi->s_quiet)
> seq_printf(seq, ",quiet");
> return 0;
> @@ -225,6 +228,7 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> char *p;
> substring_t args[MAX_OPT_ARGS];
> int tmp, token;
> + int have_iocharset;
How about a boolean type?
>
> /* initialize the sb with defaults */
> hsb->s_uid = current_uid();
> @@ -239,6 +243,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> if (!options)
> return 1;
>
> + have_iocharset = 0;
How about false here?
> +
> while ((p = strsep(&options, ",")) != NULL) {
> if (!*p)
> continue;
> @@ -332,18 +338,22 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> kfree(p);
> break;
> case opt_iocharset:
> - if (hsb->nls_io) {
> + if (have_iocharset) {
> pr_err("unable to change iocharset\n");
> return 0;
> }
> p = match_strdup(&args[0]);
> - if (p)
> - hsb->nls_io = load_nls(p);
> - if (!hsb->nls_io) {
> - pr_err("unable to load iocharset \"%s\"\n", p);
> - kfree(p);
> + if (!p)
> return 0;
> + if (strcmp(p, "utf8") != 0) {
> + hsb->nls_io = load_nls(p);
> + if (!hsb->nls_io) {
> + pr_err("unable to load iocharset \"%s\"\n", p);
> + kfree(p);
> + return 0;
> + }
> }
> + have_iocharset = 1;
How about true here?
> kfree(p);
> break;
> default:
> @@ -351,7 +361,7 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> }
> }
>
> - if (hsb->nls_io && !hsb->nls_disk) {
> + if (have_iocharset && !hsb->nls_disk) {
> /*
> * Previous version of hfs driver did something unexpected:
> * When codepage was not defined but iocharset was then
> @@ -382,7 +392,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> return 0;
> }
> }
> - if (hsb->nls_disk && !hsb->nls_io) {
> + if (hsb->nls_disk &&
> + !have_iocharset && strcmp(CONFIG_NLS_DEFAULT, "utf8") != 0) {
Maybe introduce a variable to hold the boolean value here? Then the if statement will look much cleaner.
> hsb->nls_io = load_nls_default();
> if (!hsb->nls_io) {
> pr_err("unable to load default iocharset\n");
> diff --git a/fs/hfs/trans.c b/fs/hfs/trans.c
> index c75682c61b06..bff8e54003ab 100644
> --- a/fs/hfs/trans.c
> +++ b/fs/hfs/trans.c
> @@ -44,7 +44,7 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
> srclen = HFS_NAMELEN;
> dst = out;
> dstlen = HFS_MAX_NAMELEN;
> - if (nls_io) {
> + if (nls_disk) {
> wchar_t ch;
>
I could be missing something here, but what about the closing “}”?
Thanks,
Slava.
> while (srclen > 0) {
> @@ -57,7 +57,12 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
> srclen -= size;
> if (ch == '/')
> ch = ':';
> - size = nls_io->uni2char(ch, dst, dstlen);
> + if (nls_io)
> + size = nls_io->uni2char(ch, dst, dstlen);
> + else if (dstlen > 0)
> + size = utf32_to_utf8(ch, dst, dstlen);
> + else
> + size = -ENAMETOOLONG;
> if (size < 0) {
> if (size == -ENAMETOOLONG)
> goto out;
> @@ -101,11 +106,22 @@ void hfs_asc2mac(struct super_block *sb, struct hfs_name *out, const struct qstr
> srclen = in->len;
> dst = out->name;
> dstlen = HFS_NAMELEN;
> - if (nls_io) {
> + if (nls_disk) {
> wchar_t ch;
> + unicode_t u;
>
> while (srclen > 0) {
> - size = nls_io->char2uni(src, srclen, &ch);
> + if (nls_io)
> + size = nls_io->char2uni(src, srclen, &ch);
> + else {
> + size = utf8_to_utf32(src, srclen, &u);
> + if (size >= 0) {
> + if (u <= MAX_WCHAR_T)
> + ch = u;
> + else
> + size = -EINVAL;
> + }
> + }
> if (size < 0) {
> ch = '?';
> size = 1;
> --
> 2.20.1
>
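
The decode-then-range-check step that both the hfs and hfsplus patches add
can be sketched in user space as below. The decoder is a hand-written
stand-in for the kernel's utf8_to_utf32() and is not claimed to match its
exact behaviour; all names and the 0xFFFF limit are assumptions made for
this illustration.

```c
#include <stdint.h>

/* Assumed limit of a 16-bit wchar_t, standing in for MAX_WCHAR_T. */
#define EXAMPLE_MAX_WCHAR_T 0xFFFF

/* Returns the number of bytes consumed, or -1 on malformed input. */
static int example_utf8_decode(const unsigned char *s, int len, uint32_t *cp)
{
	uint32_t c, min;
	int n, i;

	if (len <= 0)
		return -1;
	c = s[0];
	if (c < 0x80) {
		*cp = c;
		return 1;
	}
	if ((c & 0xE0) == 0xC0) {
		n = 2; c &= 0x1F; min = 0x80;
	} else if ((c & 0xF0) == 0xE0) {
		n = 3; c &= 0x0F; min = 0x800;
	} else if ((c & 0xF8) == 0xF0) {
		n = 4; c &= 0x07; min = 0x10000;
	} else {
		return -1;
	}
	if (len < n)
		return -1;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return -1;
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* Reject overlong forms, surrogates and out-of-range values. */
	if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return -1;
	*cp = c;
	return n;
}

/*
 * Mirrors the fallback in the patch: anything undecodable or above the
 * 16-bit limit becomes '?' and consumes a single byte. The caller is
 * expected to pass len > 0.
 */
static int example_char_to_unit(const unsigned char *s, int len,
				uint16_t *unit)
{
	uint32_t cp;
	int size = example_utf8_decode(s, len, &cp);

	if (size >= 0 && cp > EXAMPLE_MAX_WCHAR_T)
		size = -1;
	if (size < 0) {
		*unit = '?';
		return 1;
	}
	*unit = (uint16_t)cp;
	return size;
}
```

With this shape a 4-byte sequence is replaced by '?' one byte at a time,
which is the limitation the commit message describes for code points that
would need UTF-16 surrogate pairs.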
On Sun, Aug 08, 2021 at 06:24:38PM +0200, Pali Rohár wrote:
> Other fs drivers are using iocharset= mount option for specifying charset.
> So mark iocharset= mount option as preferred and deprecate nls= mount
> option.
Documentation needs to also be updated here.
For cifs.ko, I don't mind running our automated regression tests on
this patch when the patch (or patches) is ready, but I was thinking
about an earlier discussion a few months ago about path conversion in
cifs.ko, prompted by Al Viro, and whether additional changes should be
made to move the character conversion later as well (e.g. for
characters in the reserved range such as '\' to 0xF026, ':' to
0xF022, '>' to 0xF024 and '?' to 0xF025 etc) for the 10 special
characters which have to get remapped into the UCS-2 reserved
character range.
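
As a rough illustration of the remapping mentioned above, the sketch below
covers only the four characters named in this message; the remaining
special characters are deliberately left out because their target values
are not given here. The function name is hypothetical and this is not the
cifs.ko implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a character that is reserved on the server
 * side into the private range quoted in the message above. Characters
 * not listed there are returned unchanged.
 */
static uint16_t example_remap_reserved(uint16_t c)
{
	switch (c) {
	case '\\':
		return 0xF026;
	case ':':
		return 0xF022;
	case '>':
		return 0xF024;
	case '?':
		return 0xF025;
	default:
		return c;
	}
}
```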
On Mon, Aug 9, 2021 at 12:49 PM Pali Rohár <[email protected]> wrote:
>
> On Monday 09 August 2021 18:37:19 Matthew Wilcox wrote:
> > On Mon, Aug 09, 2021 at 10:31:55AM -0700, Viacheslav Dubeyko wrote:
> > > > On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
> > > >
> > > > It does not make any sense to set hsb->nls_io (NLS iocharset used between
> > > > VFS and hfs driver) when hsb->nls_disk (NLS codepage used between hfs
> > > > driver and disk) is not set.
> > > >
> > > > Reverse engineering the driver code showed what it does in this special case:
> > > >
> > > > When codepage was not defined but iocharset was then
> > > > hfs driver copied 8bit character from disk directly to
> > > > 16bit unicode wchar_t type. Which means it did conversion
> > > > from Latin1 (ISO-8859-1) to Unicode because first 256
> > > > Unicode code points matches 8bit ISO-8859-1 codepage table.
> > > > So when iocharset was specified and codepage not, then
> > > > codepage used implicit value "iso8859-1".
> > > >
> > > > So when hsb->nls_disk is not set but hsb->nls_io is, explicitly set
> > > > hsb->nls_disk to "iso8859-1".
> > > >
> > > > Such setup is obviously incompatible with Mac OS systems as they do not
> > > > support iso8859-1 encoding for hfs. So print warning into dmesg about this
> > > > fact.
> > > >
> > > > After this change hsb->nls_disk is always set, so remove code paths for
> > > > case when hsb->nls_disk was not set as they are not needed anymore.
> > >
> > >
> > > Sounds reasonable. But it will be great to know that the change has been tested reasonably well.
> >
> > I don't think it's reasonable to ask Pali to test every single filesystem.
> > That's something the maintainer should do, as you're more likely to have
> > the infrastructure already set up to do testing of your filesystem and
> > be aware of fun corner cases and use cases than someone who's working
> > across all filesystems.
>
> This patch series is currently in RFC form and, as stated in the cover
> letter, mostly untested. So the patches are not in a form for merging or
> detailed review. I would just like to know whether this is the right
> direction for filesystems and whether I should continue with this effort.
> And I thought that sending "incomplete" RFC patches is a better way than
> just describing what to do and how...
--
Thanks,
Steve
On Sun, Aug 08, 2021 at 06:24:35PM +0200, Pali Rohár wrote:
> Other fs drivers are using iocharset= mount option for specifying charset.
> So add it also for hfsplus and mark old nls= mount option as deprecated.
It would be good to also update Documentation/filesystems/hfsplus.rst.
On Monday 09 August 2021 23:49:21 Kari Argillander wrote:
> On Sun, Aug 08, 2021 at 06:24:35PM +0200, Pali Rohár wrote:
> > Other fs drivers are using iocharset= mount option for specifying charset.
> > So add it also for hfsplus and mark old nls= mount option as deprecated.
>
> It would be good to also update Documentation/filesystems/hfsplus.rst.
Good point! I'm making a note.
On Sun 08-08-21 18:24:36, Pali Rohár wrote:
> Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
> it is required to use utf8 mount option.
>
> Fix iocharset=utf8 mount option to be equivalent to the utf8 mount
> option.
>
> If UTF-8 is used as the iocharset, s_nls_map is set to NULL. This
> simplifies the surrounding code: the UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8
> flags are removed, because checking whether s_nls_map is NULL is enough
> to distinguish between UTF-8 and non-UTF-8.
>
> Signed-off-by: Pali Rohár <[email protected]>
Thanks for the cleanup. It looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Or should I take this patch through my tree?
Honza
> ---
> fs/udf/super.c | 50 ++++++++++++++++++------------------------------
> fs/udf/udf_sb.h | 2 --
> fs/udf/unicode.c | 4 ++--
> 3 files changed, 21 insertions(+), 35 deletions(-)
>
> diff --git a/fs/udf/super.c b/fs/udf/super.c
> index 2f83c1204e20..6e8c29107b04 100644
> --- a/fs/udf/super.c
> +++ b/fs/udf/super.c
> @@ -349,10 +349,10 @@ static int udf_show_options(struct seq_file *seq, struct dentry *root)
> seq_printf(seq, ",lastblock=%u", sbi->s_last_block);
> if (sbi->s_anchor != 0)
> seq_printf(seq, ",anchor=%u", sbi->s_anchor);
> - if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8))
> - seq_puts(seq, ",utf8");
> - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP) && sbi->s_nls_map)
> + if (sbi->s_nls_map)
> seq_printf(seq, ",iocharset=%s", sbi->s_nls_map->charset);
> + else
> + seq_puts(seq, ",iocharset=utf8");
>
> return 0;
> }
> @@ -558,19 +558,24 @@ static int udf_parse_options(char *options, struct udf_options *uopt,
> /* Ignored (never implemented properly) */
> break;
> case Opt_utf8:
> - uopt->flags |= (1 << UDF_FLAG_UTF8);
> + if (!remount) {
> + unload_nls(uopt->nls_map);
> + uopt->nls_map = NULL;
> + }
> break;
> case Opt_iocharset:
> if (!remount) {
> - if (uopt->nls_map)
> - unload_nls(uopt->nls_map);
> - /*
> - * load_nls() failure is handled later in
> - * udf_fill_super() after all options are
> - * parsed.
> - */
> + unload_nls(uopt->nls_map);
> + uopt->nls_map = NULL;
> + }
> + /* When nls_map is not loaded then UTF-8 is used */
> + if (!remount && strcmp(args[0].from, "utf8") != 0) {
> uopt->nls_map = load_nls(args[0].from);
> - uopt->flags |= (1 << UDF_FLAG_NLS_MAP);
> + if (!uopt->nls_map) {
> + pr_err("iocharset %s not found\n",
> + args[0].from);
> + return 0;
> + }
> }
> break;
> case Opt_uforget:
> @@ -2139,21 +2144,6 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
> if (!udf_parse_options((char *)options, &uopt, false))
> goto parse_options_failure;
>
> - if (uopt.flags & (1 << UDF_FLAG_UTF8) &&
> - uopt.flags & (1 << UDF_FLAG_NLS_MAP)) {
> - udf_err(sb, "utf8 cannot be combined with iocharset\n");
> - goto parse_options_failure;
> - }
> - if ((uopt.flags & (1 << UDF_FLAG_NLS_MAP)) && !uopt.nls_map) {
> - uopt.nls_map = load_nls_default();
> - if (!uopt.nls_map)
> - uopt.flags &= ~(1 << UDF_FLAG_NLS_MAP);
> - else
> - udf_debug("Using default NLS map\n");
> - }
> - if (!(uopt.flags & (1 << UDF_FLAG_NLS_MAP)))
> - uopt.flags |= (1 << UDF_FLAG_UTF8);
> -
> fileset.logicalBlockNum = 0xFFFFFFFF;
> fileset.partitionReferenceNum = 0xFFFF;
>
> @@ -2308,8 +2298,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
> error_out:
> iput(sbi->s_vat_inode);
> parse_options_failure:
> - if (uopt.nls_map)
> - unload_nls(uopt.nls_map);
> + unload_nls(uopt.nls_map);
> if (lvid_open)
> udf_close_lvid(sb);
> brelse(sbi->s_lvid_bh);
> @@ -2359,8 +2348,7 @@ static void udf_put_super(struct super_block *sb)
> sbi = UDF_SB(sb);
>
> iput(sbi->s_vat_inode);
> - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
> - unload_nls(sbi->s_nls_map);
> + unload_nls(sbi->s_nls_map);
> if (!sb_rdonly(sb))
> udf_close_lvid(sb);
> brelse(sbi->s_lvid_bh);
> diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
> index 758efe557a19..4fa620543d30 100644
> --- a/fs/udf/udf_sb.h
> +++ b/fs/udf/udf_sb.h
> @@ -20,8 +20,6 @@
> #define UDF_FLAG_UNDELETE 6
> #define UDF_FLAG_UNHIDE 7
> #define UDF_FLAG_VARCONV 8
> -#define UDF_FLAG_NLS_MAP 9
> -#define UDF_FLAG_UTF8 10
> #define UDF_FLAG_UID_FORGET 11 /* save -1 for uid to disk */
> #define UDF_FLAG_GID_FORGET 12
> #define UDF_FLAG_UID_SET 13
> diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
> index 5fcfa96463eb..622569007b53 100644
> --- a/fs/udf/unicode.c
> +++ b/fs/udf/unicode.c
> @@ -177,7 +177,7 @@ static int udf_name_from_CS0(struct super_block *sb,
> return 0;
> }
>
> - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
> + if (UDF_SB(sb)->s_nls_map)
> conv_f = UDF_SB(sb)->s_nls_map->uni2char;
> else
> conv_f = NULL;
> @@ -285,7 +285,7 @@ static int udf_name_to_CS0(struct super_block *sb,
> if (ocu_max_len <= 0)
> return 0;
>
> - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
> + if (UDF_SB(sb)->s_nls_map)
> conv_f = UDF_SB(sb)->s_nls_map->char2uni;
> else
> conv_f = NULL;
> --
> 2.20.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
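
The convention this patch relies on, "no loaded table means UTF-8", can be
sketched in user space as follows. The struct and function are hypothetical
stand-ins for struct nls_table and the show_options logic; they only
illustrate the NULL check, not the udf code itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the one nls_table field used here. */
struct example_table {
	const char *charset;
};

/*
 * Hypothetical helper: the iocharset name a show_options callback would
 * print. A NULL table is reported as "utf8", so no separate flag is
 * needed to remember that UTF-8 is in use.
 */
static const char *example_shown_iocharset(const struct example_table *t)
{
	return t ? t->charset : "utf8";
}
```

The same single NULL test replaces both UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8
in the diff above, which is where most of the removed lines come from.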
On Sun 08-08-21 18:24:37, Pali Rohár wrote:
> Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
> it is required to use utf8 mount option.
>
> Fix iocharset=utf8 mount option to be equivalent to the utf8 mount
> option.
>
> If UTF-8 as iocharset is used then s_nls_iocharset is set to NULL. So
> simplify code around, remove s_utf8 field as to distinguish between UTF-8
> and non-UTF-8 it is needed just to check if s_nls_iocharset is set to NULL
> or not.
>
> Signed-off-by: Pali Rohár <[email protected]>
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
I can also take this patch through my tree if you want.
Honza
> ---
> fs/isofs/inode.c | 27 +++++++++++++--------------
> fs/isofs/isofs.h | 1 -
> fs/isofs/joliet.c | 4 +---
> 3 files changed, 14 insertions(+), 18 deletions(-)
>
> diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
> index 21edc423b79f..678e2c51b855 100644
> --- a/fs/isofs/inode.c
> +++ b/fs/isofs/inode.c
> @@ -155,7 +155,6 @@ struct iso9660_options{
> unsigned int overriderockperm:1;
> unsigned int uid_set:1;
> unsigned int gid_set:1;
> - unsigned int utf8:1;
> unsigned char map;
> unsigned char check;
> unsigned int blocksize;
> @@ -356,7 +355,6 @@ static int parse_options(char *options, struct iso9660_options *popt)
> popt->gid = GLOBAL_ROOT_GID;
> popt->uid = GLOBAL_ROOT_UID;
> popt->iocharset = NULL;
> - popt->utf8 = 0;
> popt->overriderockperm = 0;
> popt->session=-1;
> popt->sbsector=-1;
> @@ -389,10 +387,13 @@ static int parse_options(char *options, struct iso9660_options *popt)
> case Opt_cruft:
> popt->cruft = 1;
> break;
> +#ifdef CONFIG_JOLIET
> case Opt_utf8:
> - popt->utf8 = 1;
> + kfree(popt->iocharset);
> + popt->iocharset = kstrdup("utf8", GFP_KERNEL);
> + if (!popt->iocharset)
> + return 0;
> break;
> -#ifdef CONFIG_JOLIET
> case Opt_iocharset:
> kfree(popt->iocharset);
> popt->iocharset = match_strdup(&args[0]);
> @@ -495,7 +496,6 @@ static int isofs_show_options(struct seq_file *m, struct dentry *root)
> if (sbi->s_nocompress) seq_puts(m, ",nocompress");
> if (sbi->s_overriderockperm) seq_puts(m, ",overriderockperm");
> if (sbi->s_showassoc) seq_puts(m, ",showassoc");
> - if (sbi->s_utf8) seq_puts(m, ",utf8");
>
> if (sbi->s_check) seq_printf(m, ",check=%c", sbi->s_check);
> if (sbi->s_mapping) seq_printf(m, ",map=%c", sbi->s_mapping);
> @@ -518,9 +518,10 @@ static int isofs_show_options(struct seq_file *m, struct dentry *root)
> seq_printf(m, ",fmode=%o", sbi->s_fmode);
>
> #ifdef CONFIG_JOLIET
> - if (sbi->s_nls_iocharset &&
> - strcmp(sbi->s_nls_iocharset->charset, CONFIG_NLS_DEFAULT) != 0)
> + if (sbi->s_nls_iocharset)
> seq_printf(m, ",iocharset=%s", sbi->s_nls_iocharset->charset);
> + else
> + seq_puts(m, ",iocharset=utf8");
> #endif
> return 0;
> }
> @@ -863,14 +864,13 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent)
> sbi->s_nls_iocharset = NULL;
>
> #ifdef CONFIG_JOLIET
> - if (joliet_level && opt.utf8 == 0) {
> + if (joliet_level) {
> char *p = opt.iocharset ? opt.iocharset : CONFIG_NLS_DEFAULT;
> - sbi->s_nls_iocharset = load_nls(p);
> - if (! sbi->s_nls_iocharset) {
> - /* Fail only if explicit charset specified */
> - if (opt.iocharset)
> + if (strcmp(p, "utf8") != 0) {
> + sbi->s_nls_iocharset = opt.iocharset ?
> + load_nls(opt.iocharset) : load_nls_default();
> + if (!sbi->s_nls_iocharset)
> goto out_freesbi;
> - sbi->s_nls_iocharset = load_nls_default();
> }
> }
> #endif
> @@ -886,7 +886,6 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent)
> sbi->s_gid = opt.gid;
> sbi->s_uid_set = opt.uid_set;
> sbi->s_gid_set = opt.gid_set;
> - sbi->s_utf8 = opt.utf8;
> sbi->s_nocompress = opt.nocompress;
> sbi->s_overriderockperm = opt.overriderockperm;
> /*
> diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
> index 055ec6c586f7..dcdc191ed183 100644
> --- a/fs/isofs/isofs.h
> +++ b/fs/isofs/isofs.h
> @@ -44,7 +44,6 @@ struct isofs_sb_info {
> unsigned char s_session;
> unsigned int s_high_sierra:1;
> unsigned int s_rock:2;
> - unsigned int s_utf8:1;
> unsigned int s_cruft:1; /* Broken disks with high byte of length
> * containing junk */
> unsigned int s_nocompress:1;
> diff --git a/fs/isofs/joliet.c b/fs/isofs/joliet.c
> index be8b6a9d0b92..c0f04a1e7f69 100644
> --- a/fs/isofs/joliet.c
> +++ b/fs/isofs/joliet.c
> @@ -41,14 +41,12 @@ uni16_to_x8(unsigned char *ascii, __be16 *uni, int len, struct nls_table *nls)
> int
> get_joliet_filename(struct iso_directory_record * de, unsigned char *outname, struct inode * inode)
> {
> - unsigned char utf8;
> struct nls_table *nls;
> unsigned char len = 0;
>
> - utf8 = ISOFS_SB(inode->i_sb)->s_utf8;
> nls = ISOFS_SB(inode->i_sb)->s_nls_iocharset;
>
> - if (utf8) {
> + if (!nls) {
> len = utf16s_to_utf8s((const wchar_t *) de->name,
> de->name_len[0] >> 1, UTF16_BIG_ENDIAN,
> outname, PAGE_SIZE);
> --
> 2.20.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
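
The charset selection in the isofs_fill_super() hunk above boils down to one
decision: fall back to the configured default when no iocharset was given,
and load a table only when the resulting name is not "utf8". A user-space
sketch of that decision, with a hypothetical function name and an assumed
default value standing in for CONFIG_NLS_DEFAULT:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration; the real default is a build option. */
#define EXAMPLE_NLS_DEFAULT "iso8859-1"

/*
 * Hypothetical helper: returns true when an NLS table has to be loaded,
 * false when the native UTF-8 path should be used. iocharset may be
 * NULL, meaning the mount option was not given.
 */
static bool example_wants_nls_table(const char *iocharset)
{
	const char *p = iocharset ? iocharset : EXAMPLE_NLS_DEFAULT;

	return strcmp(p, "utf8") != 0;
}
```

Note that with a default of "utf8" the function returns false even when no
option is passed, matching the patch's behaviour of leaving s_nls_iocharset
NULL in that configuration.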
On Thursday 12 August 2021 16:17:36 Jan Kara wrote:
> On Sun 08-08-21 18:24:36, Pali Rohár wrote:
> > Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
> > it is required to use utf8 mount option.
> >
> > > Fix iocharset=utf8 mount option to be equivalent to the utf8 mount
> > > option.
> >
> > If UTF-8 as iocharset is used then s_nls_map is set to NULL. So simplify
> > code around, remove UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8 flags as to
> > distinguish between UTF-8 and non-UTF-8 it is needed just to check if
> > s_nls_map set to NULL or not.
> >
> > Signed-off-by: Pali Rohár <[email protected]>
>
> Thanks for the cleanup. It looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
> Or should I take this patch through my tree?
Hello! Patches are just RFC, mostly untested and not ready for merging.
I will wait for feedback, then do more testing and prepare a new
patch series.
>
> Honza
>
>
> > ---
> > fs/udf/super.c | 50 ++++++++++++++++++------------------------------
> > fs/udf/udf_sb.h | 2 --
> > fs/udf/unicode.c | 4 ++--
> > 3 files changed, 21 insertions(+), 35 deletions(-)
> >
> > diff --git a/fs/udf/super.c b/fs/udf/super.c
> > index 2f83c1204e20..6e8c29107b04 100644
> > --- a/fs/udf/super.c
> > +++ b/fs/udf/super.c
> > @@ -349,10 +349,10 @@ static int udf_show_options(struct seq_file *seq, struct dentry *root)
> > seq_printf(seq, ",lastblock=%u", sbi->s_last_block);
> > if (sbi->s_anchor != 0)
> > seq_printf(seq, ",anchor=%u", sbi->s_anchor);
> > - if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8))
> > - seq_puts(seq, ",utf8");
> > - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP) && sbi->s_nls_map)
> > + if (sbi->s_nls_map)
> > seq_printf(seq, ",iocharset=%s", sbi->s_nls_map->charset);
> > + else
> > + seq_puts(seq, ",iocharset=utf8");
> >
> > return 0;
> > }
> > @@ -558,19 +558,24 @@ static int udf_parse_options(char *options, struct udf_options *uopt,
> > /* Ignored (never implemented properly) */
> > break;
> > case Opt_utf8:
> > - uopt->flags |= (1 << UDF_FLAG_UTF8);
> > + if (!remount) {
> > + unload_nls(uopt->nls_map);
> > + uopt->nls_map = NULL;
> > + }
> > break;
> > case Opt_iocharset:
> > if (!remount) {
> > - if (uopt->nls_map)
> > - unload_nls(uopt->nls_map);
> > - /*
> > - * load_nls() failure is handled later in
> > - * udf_fill_super() after all options are
> > - * parsed.
> > - */
> > + unload_nls(uopt->nls_map);
> > + uopt->nls_map = NULL;
> > + }
> > + /* When nls_map is not loaded then UTF-8 is used */
> > + if (!remount && strcmp(args[0].from, "utf8") != 0) {
> > uopt->nls_map = load_nls(args[0].from);
> > - uopt->flags |= (1 << UDF_FLAG_NLS_MAP);
> > + if (!uopt->nls_map) {
> > + pr_err("iocharset %s not found\n",
> > + args[0].from);
> > + return 0;
> > + }
> > }
> > break;
> > case Opt_uforget:
> > @@ -2139,21 +2144,6 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
> > if (!udf_parse_options((char *)options, &uopt, false))
> > goto parse_options_failure;
> >
> > - if (uopt.flags & (1 << UDF_FLAG_UTF8) &&
> > - uopt.flags & (1 << UDF_FLAG_NLS_MAP)) {
> > - udf_err(sb, "utf8 cannot be combined with iocharset\n");
> > - goto parse_options_failure;
> > - }
> > - if ((uopt.flags & (1 << UDF_FLAG_NLS_MAP)) && !uopt.nls_map) {
> > - uopt.nls_map = load_nls_default();
> > - if (!uopt.nls_map)
> > - uopt.flags &= ~(1 << UDF_FLAG_NLS_MAP);
> > - else
> > - udf_debug("Using default NLS map\n");
> > - }
> > - if (!(uopt.flags & (1 << UDF_FLAG_NLS_MAP)))
> > - uopt.flags |= (1 << UDF_FLAG_UTF8);
> > -
> > fileset.logicalBlockNum = 0xFFFFFFFF;
> > fileset.partitionReferenceNum = 0xFFFF;
> >
> > @@ -2308,8 +2298,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
> > error_out:
> > iput(sbi->s_vat_inode);
> > parse_options_failure:
> > - if (uopt.nls_map)
> > - unload_nls(uopt.nls_map);
> > + unload_nls(uopt.nls_map);
> > if (lvid_open)
> > udf_close_lvid(sb);
> > brelse(sbi->s_lvid_bh);
> > @@ -2359,8 +2348,7 @@ static void udf_put_super(struct super_block *sb)
> > sbi = UDF_SB(sb);
> >
> > iput(sbi->s_vat_inode);
> > - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
> > - unload_nls(sbi->s_nls_map);
> > + unload_nls(sbi->s_nls_map);
> > if (!sb_rdonly(sb))
> > udf_close_lvid(sb);
> > brelse(sbi->s_lvid_bh);
> > diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
> > index 758efe557a19..4fa620543d30 100644
> > --- a/fs/udf/udf_sb.h
> > +++ b/fs/udf/udf_sb.h
> > @@ -20,8 +20,6 @@
> > #define UDF_FLAG_UNDELETE 6
> > #define UDF_FLAG_UNHIDE 7
> > #define UDF_FLAG_VARCONV 8
> > -#define UDF_FLAG_NLS_MAP 9
> > -#define UDF_FLAG_UTF8 10
> > #define UDF_FLAG_UID_FORGET 11 /* save -1 for uid to disk */
> > #define UDF_FLAG_GID_FORGET 12
> > #define UDF_FLAG_UID_SET 13
> > diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
> > index 5fcfa96463eb..622569007b53 100644
> > --- a/fs/udf/unicode.c
> > +++ b/fs/udf/unicode.c
> > @@ -177,7 +177,7 @@ static int udf_name_from_CS0(struct super_block *sb,
> > return 0;
> > }
> >
> > - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
> > + if (UDF_SB(sb)->s_nls_map)
> > conv_f = UDF_SB(sb)->s_nls_map->uni2char;
> > else
> > conv_f = NULL;
> > @@ -285,7 +285,7 @@ static int udf_name_to_CS0(struct super_block *sb,
> > if (ocu_max_len <= 0)
> > return 0;
> >
> > - if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
> > + if (UDF_SB(sb)->s_nls_map)
> > conv_f = UDF_SB(sb)->s_nls_map->char2uni;
> > else
> > conv_f = NULL;
> > --
> > 2.20.1
> >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
On Thu 12-08-21 17:51:34, Pali Rohár wrote:
> On Thursday 12 August 2021 16:17:36 Jan Kara wrote:
> > On Sun 08-08-21 18:24:36, Pali Rohár wrote:
> > > Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
> > > it is required to use utf8 mount option.
> > >
> > > Fix iocharset=utf8 mount option to use be equivalent to the utf8 mount
> > > option.
> > >
> > > If UTF-8 as iocharset is used then s_nls_map is set to NULL. So simplify
> > > code around, remove UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8 flags as to
> > > distinguish between UTF-8 and non-UTF-8 it is needed just to check if
> > > s_nls_map set to NULL or not.
> > >
> > > Signed-off-by: Pali Rohár <[email protected]>
> >
> > Thanks for the cleanup. It looks good. Feel free to add:
> >
> > Reviewed-by: Jan Kara <[email protected]>
> >
> > Or should I take this patch through my tree?
>
> Hello! Patches are just RFC, mostly untested and not ready for merging.
> I will wait for feedback and then I do more testing nad prepare new
> patch series.
OK, FWIW I've also tested the UDF and isofs patches.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
Pali Rohár <[email protected]> writes:
> Currently iocharset=utf8 mount option is broken and error is printed to
> dmesg when it is used. To use UTF-8 as iocharset, it is required to use
> utf8=1 mount option.
>
> Fix iocharset=utf8 mount option to use be equivalent to the utf8=1 mount
> option and remove printing error from dmesg.
This change is not equivalent to utf8=1. In the case of utf8=1, vfat
uses iocharset's conversion table and it can handle more than ASCII.
So this patch is an incompatible change, and handles fewer chars than
utf8=1. So although I think this is cleaner, it would be a regression
for users of utf8=1.
Thanks.
--
OGAWA Hirofumi <[email protected]>
On Sunday 15 August 2021 12:42:47 OGAWA Hirofumi wrote:
> Pali Rohár <[email protected]> writes:
>
> > Currently iocharset=utf8 mount option is broken and error is printed to
> > dmesg when it is used. To use UTF-8 as iocharset, it is required to use
> > utf8=1 mount option.
> >
> > Fix iocharset=utf8 mount option to use be equivalent to the utf8=1 mount
> > option and remove printing error from dmesg.
>
> This change is not equivalent to utf8=1. In the case of utf8=1, vfat
> uses iocharset's conversion table and it can handle more than ascii.
>
> So this patch is incompatible changes, and handles less chars than
> utf8=1. So I think this is clean though, but this would be regression
> for user of utf8=1.
I do not think so... But please correct me, as the code around this is a mess.
Without this change, when utf8=1 is set, the iocharset= encoding is used
for the case-insensitivity implementation (toupper / tolower conversion).
All other parts use the correct utf8* conversion functions.
But if you use the toupper / tolower functions from the iocharset= encoding
on a stream of utf8 bytes, then you get either identity or some
unpredictable garbage in utf8. So when comparing two (different) non-ASCII
filenames via this method, in most cases you get that the filenames are
different, because converting their utf8 bytes via the toupper / tolower
functions from the iocharset= encoding results in two different byte
sequences in most cases, even for two utf8 strings that are the same
case-insensitively.
But you can play with it, and I guess it is possible to find two
different utf8 strings which, after toupper / tolower conversion from
some iocharset= encoding, would lead to the same byte sequence.
For utf8, this patch uses a simple 7-bit ASCII tolower / toupper
function. And so for 7-bit ASCII file names there is no change.
So this patch changes behavior when comparing non-7-bit-ASCII file
names, but only in cases where previously two different file names were
marked as the same. Now they are correctly marked as different. So this
is changed behavior, but I guess it is a bug fix which is needed.
If you want I can put this change into a separate patch.
The issue that two file names which are the same case-insensitively are
marked as different is not changed by this patch, and therefore this
issue stays.
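The 7-bit ASCII comparison described above can be sketched in userspace C
as follows. This is only an illustration under stated assumptions: the names
ascii_to_lower() and utf8_ascii_strnicmp() are hypothetical, and the code
mirrors the idea of the patch's fat_ascii_to_lower() / fat_utf8_strnicmp()
helpers rather than being the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only lowercase: every byte outside 'A'..'Z' passes through unchanged. */
static unsigned char ascii_to_lower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? c + 32 : c;
}

/*
 * Compare len bytes of two UTF-8 strings, folding case for 7-bit ASCII only.
 * Returns 0 when the strings are equal under that folding, 1 otherwise.
 * Bytes >= 0x80 (all parts of multi-byte UTF-8 sequences) are compared as-is.
 */
static int utf8_ascii_strnicmp(const unsigned char *a,
			       const unsigned char *b, size_t len)
{
	size_t i;

	assert(a != NULL && b != NULL);

	for (i = 0; i < len; i++) {
		if (ascii_to_lower(a[i]) != ascii_to_lower(b[i]))
			return 1;
	}
	return 0;
}
```

With this sketch, "README.TXT" and "readme.txt" compare as equal, while
"É" (bytes C3 89) and "é" (bytes C3 A9) compare as different, because no
byte of a multi-byte sequence is ever folded.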
> Thanks.
> --
> OGAWA Hirofumi <[email protected]>
To: Pali Rohár <[email protected]>
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], Alexander Viro <[email protected]>, Jan Kara <[email protected]>, "Theodore Y . Ts'o" <[email protected]>, Luis de Bethencourt <[email protected]>, Salah Triki <[email protected]>, Andrew Morton <[email protected]>, Dave Kleikamp <[email protected]>, Anton Altaparmakov <[email protected]>, Pavel Machek <[email protected]>, Marek Behún <[email protected]>, Christoph Hellwig <[email protected]>
Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option
From: OGAWA Hirofumi <[email protected]>
Pali Rohár <[email protected]> writes:
>> This change is not equivalent to utf8=1. In the case of utf8=1, vfat
>> uses iocharset's conversion table and it can handle more than ascii.
>>
>> So this patch is incompatible changes, and handles less chars than
>> utf8=1. So I think this is clean though, but this would be regression
>> for user of utf8=1.
>
> I do not think so... But please correct me, as this code around is mess.
>
> Without this change when utf8=1 is set then iocharset= encoding is used
> for case-insensitivity implementation (toupper / tolower conversion).
> For all other parts are use correct utf8* conversion functions.
>
> But you use touppper / tolower functions from iocharset= encoding on
> stream of utf8 bytes then you either get identity or some unpredictable
> garbage in utf8. So when comparing two (different) non-ASCII filenames
> via this method you in most cases get that filenames are different.
> Because converting their utf8 bytes via toupper / tolower functions from
> iocharset= encoding results in two different byte sequences in most
> cases. Even for two utf8 case-insensitive same strings.
>
> But you can play with it and I guess it is possible to find two
> different utf8 strings which after toupper / tolower conversion from
> some iocharset= encoding would lead to same byte sequence.
>
> This patch uses for utf8 tolower / touppser function simple 7-bit
> tolower / toupper ascii function. And so for 7-bit ascii file names
> there is no change.
>
> So this patch changes behavior when comparing non 7-bit ascii file
> names, but only in cases when previously two different file names were
> marked as same. As now they are marked correctly as different. So this
> is changed behavior, but I guess it is bug fix which is needed.
> If you want I can put this change into separate patch.
>
> Issue that two case-insensitive same files are marked as different is
> not changed by this patch and therefore this issue stay here.
OK, sure. utf8 looks more broken than I was thinking (although a user can
use iocharset=ascii and utf8=1 for this). The code could perhaps be
cleaned up a bit more, but basically it looks good.
One thing: please update the FAT_DEFAULT_IOCHARSET help in Kconfig and
Documentation/filesystems/vfat.rst (with a new warning about iocharset=utf8).
Thanks.
--
OGAWA Hirofumi <[email protected]>
On Sun, Aug 08, 2021 at 06:24:38PM +0200, Pali Rohár wrote:
> Other fs drivers are using iocharset= mount option for specifying charset.
> So mark iocharset= mount option as preferred and deprecate nls= mount
> option.
One idea is to also make this change in fs/fs_parser.c, and then when we
want we can drop support from all filesystems at the same time. This way we
can get more deprecated code out of the fs drivers. The drawback is that
then every filesystem has this deprecated nls= option if it supports
the iocharset= option. But that should imo be ok.
On Thursday 19 August 2021 04:21:08 Kari Argillander wrote:
> On Sun, Aug 08, 2021 at 06:24:38PM +0200, Pali Rohár wrote:
> > Other fs drivers are using iocharset= mount option for specifying charset.
> > So mark iocharset= mount option as preferred and deprecate nls= mount
> > option.
>
> One idea is also make this change to fs/fc_parser.c and then when we
> want we can drop support from all filesystem same time. This way we
> can get more deprecated code off the fs drivers. Draw back is that
> then every filesstem has this deprecated nls= option if it support
> iocharsets option. But that should imo be ok.
Beware that iocharset= is required only for filesystems which store
filenames in some specific encoding (in this case an extension to UTF-16).
For filesystems which store filenames as raw bytes this option should not
be parsed at all.
Therefore I'm not sure if this parsing should be in the global
fs/fs_parser.c file...
On Friday 13 August 2021 15:48:22 Jan Kara wrote:
> On Thu 12-08-21 17:51:34, Pali Rohár wrote:
> > On Thursday 12 August 2021 16:17:36 Jan Kara wrote:
> > > On Sun 08-08-21 18:24:36, Pali Rohár wrote:
> > > > Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
> > > > it is required to use utf8 mount option.
> > > >
> > > > Fix iocharset=utf8 mount option to use be equivalent to the utf8 mount
> > > > option.
> > > >
> > > > If UTF-8 as iocharset is used then s_nls_map is set to NULL. So simplify
> > > > code around, remove UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8 flags as to
> > > > distinguish between UTF-8 and non-UTF-8 it is needed just to check if
> > > > s_nls_map set to NULL or not.
> > > >
> > > > Signed-off-by: Pali Rohár <[email protected]>
> > >
> > > Thanks for the cleanup. It looks good. Feel free to add:
> > >
> > > Reviewed-by: Jan Kara <[email protected]>
> > >
> > > Or should I take this patch through my tree?
> >
> > Hello! Patches are just RFC, mostly untested and not ready for merging.
> > I will wait for feedback and then I do more testing nad prepare new
> > patch series.
>
> OK, FWIW I've also tested the UDF and isofs patches.
Well, if you have already done tests, the patches are correct and these fs
drivers are working fine, then feel free to take them through your tree.
I just wanted to warn people that the patches in this RFC are mostly
untested, to prevent some issues. But if somebody else was faster than
me, did testing + reviewing and there was no issue, I do not see any
problem with including them. I just cannot put my own Tested-by (yet) :-)
> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
On Thu, Aug 19, 2021 at 10:12:22AM +0200, Pali Rohár wrote:
> On Thursday 19 August 2021 04:21:08 Kari Argillander wrote:
> > On Sun, Aug 08, 2021 at 06:24:38PM +0200, Pali Rohár wrote:
> > > Other fs drivers are using iocharset= mount option for specifying charset.
> > > So mark iocharset= mount option as preferred and deprecate nls= mount
> > > option.
> >
> > One idea is also make this change to fs/fc_parser.c and then when we
> > want we can drop support from all filesystem same time. This way we
> > can get more deprecated code off the fs drivers. Draw back is that
> > then every filesstem has this deprecated nls= option if it support
> > iocharsets option. But that should imo be ok.
>
> Beware that iocharset= is required only for fs which store filenames in
> some specific encoding (in this case extension to UTF-16). For fs which
> store filenames in raw bytes this option should not be parsed at all.
Yeah, of course. I was thinking that what we do is that if the key is nls=
we change the key to iocharset, print a deprecation warning and then send
it to the driver parser as usual. This way the driver parser will never
know that the user specified nls=, because it just gets iocharset. But this
is probably too fancy a way to approach a simple problem. Just an idea.
> Therefore I'm not sure if this parsing should be in global
> fs/fc_parser.c file...
On Thu 19-08-21 10:34:32, Pali Rohár wrote:
> On Friday 13 August 2021 15:48:22 Jan Kara wrote:
> > On Thu 12-08-21 17:51:34, Pali Rohár wrote:
> > > On Thursday 12 August 2021 16:17:36 Jan Kara wrote:
> > > > On Sun 08-08-21 18:24:36, Pali Rohár wrote:
> > > > > Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
> > > > > it is required to use utf8 mount option.
> > > > >
> > > > > Fix iocharset=utf8 mount option to use be equivalent to the utf8 mount
> > > > > option.
> > > > >
> > > > > If UTF-8 as iocharset is used then s_nls_map is set to NULL. So simplify
> > > > > code around, remove UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8 flags as to
> > > > > distinguish between UTF-8 and non-UTF-8 it is needed just to check if
> > > > > s_nls_map set to NULL or not.
> > > > >
> > > > > Signed-off-by: Pali Rohár <[email protected]>
> > > >
> > > > Thanks for the cleanup. It looks good. Feel free to add:
> > > >
> > > > Reviewed-by: Jan Kara <[email protected]>
> > > >
> > > > Or should I take this patch through my tree?
> > >
> > > Hello! Patches are just RFC, mostly untested and not ready for merging.
> > > I will wait for feedback and then I do more testing nad prepare new
> > > patch series.
> >
> > OK, FWIW I've also tested the UDF and isofs patches.
>
> Well, if you have already done tests, patches are correct and these fs
> driver are working fine then fell free to take it through your tree.
>
> I just wanted to warn people that patches in this RFC are mostly
> untested to prevent some issues. But if somebody else was faster than
> me, did testing + reviewing and there was no issue, I do not see any
> problem with including them. Just I cannot put my own Tested-by (yet) :-)
OK, I've pulled the udf and isofs fixes to my tree.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thursday 19 August 2021 13:23:42 Kari Argillander wrote:
> On Thu, Aug 19, 2021 at 10:12:22AM +0200, Pali Rohár wrote:
> > On Thursday 19 August 2021 04:21:08 Kari Argillander wrote:
> > > On Sun, Aug 08, 2021 at 06:24:38PM +0200, Pali Rohár wrote:
> > > > Other fs drivers are using iocharset= mount option for specifying charset.
> > > > So mark iocharset= mount option as preferred and deprecate nls= mount
> > > > option.
> > >
> > > One idea is also make this change to fs/fc_parser.c and then when we
> > > want we can drop support from all filesystem same time. This way we
> > > can get more deprecated code off the fs drivers. Draw back is that
> > > then every filesstem has this deprecated nls= option if it support
> > > iocharsets option. But that should imo be ok.
> >
> > Beware that iocharset= is required only for fs which store filenames in
> > some specific encoding (in this case extension to UTF-16). For fs which
> > store filenames in raw bytes this option should not be parsed at all.
>
> Yeah of course. I was thinking that what we do is that if key is nls=
> we change key to iocharset, print deprecated and then send it to driver
> parser as usual. This way driver parser will never know that user
> specifie nls= because it just get iocharset. But this is probebly too
> fancy way to think simple problem. Just idea.
This has the issue that when you use the nls= option for e.g. an ext4 fs,
the kernel starts reporting that nls= for ext4 is deprecated. But there is
neither an nls= option nor an iocharset= option for ext4. So the kernel
should not start reporting such warnings for ext4.
> > Therefore I'm not sure if this parsing should be in global
> > fs/fc_parser.c file...
>
On Fri, Aug 20, 2021 at 12:04:12AM +0200, Pali Rohár wrote:
> On Thursday 19 August 2021 13:23:42 Kari Argillander wrote:
> > On Thu, Aug 19, 2021 at 10:12:22AM +0200, Pali Rohár wrote:
> > > On Thursday 19 August 2021 04:21:08 Kari Argillander wrote:
> > > > On Sun, Aug 08, 2021 at 06:24:38PM +0200, Pali Rohár wrote:
> > > > > Other fs drivers are using iocharset= mount option for specifying charset.
> > > > > So mark iocharset= mount option as preferred and deprecate nls= mount
> > > > > option.
> > > >
> > > > One idea is also make this change to fs/fc_parser.c and then when we
> > > > want we can drop support from all filesystem same time. This way we
> > > > can get more deprecated code off the fs drivers. Draw back is that
> > > > then every filesstem has this deprecated nls= option if it support
> > > > iocharsets option. But that should imo be ok.
> > >
> > > Beware that iocharset= is required only for fs which store filenames in
> > > some specific encoding (in this case extension to UTF-16). For fs which
> > > store filenames in raw bytes this option should not be parsed at all.
> >
> > Yeah of course. I was thinking that what we do is that if key is nls=
> > we change key to iocharset, print deprecated and then send it to driver
> > parser as usual. This way driver parser will never know that user
> > specifie nls= because it just get iocharset. But this is probebly too
> > fancy way to think simple problem. Just idea.
>
> This has an issue that when you use nls= option for e.g. ext4 fs then
> kernel starts reporting that nls= for ext4 is deprecated. But there is
> no nls= option and neither iocharset= option for ext4. So kernel should
> not start reporting such warnings for ext4.
It gets kinda messy. I was also thinking about that, but if this was
implemented then we could first send iocharset to the driver and only after
that print the deprecation warning, if it succeeded. If it does not succeed
then we print the same error messages as always.
I have not looked at how easily this can be done in the parser.
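The ordering discussed above (hand the renamed key to the driver first,
warn only if the driver accepted it) could look roughly like the userspace
sketch below. Everything in it is an assumption made for illustration:
driver_parse_fn, parse_with_alias() and the two example drivers are
hypothetical names, and none of this is the kernel's actual option parser
API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical driver callback: returns 0 if the driver accepts key=value. */
typedef int (*driver_parse_fn)(const char *key, const char *value);

/*
 * Hypothetical generic layer: map the deprecated "nls" key to "iocharset",
 * pass it to the driver first, and print the deprecation warning only when
 * the driver accepted it. *warned is set to 1 if a warning was printed.
 */
static int parse_with_alias(driver_parse_fn parse, const char *key,
			    const char *value, int *warned)
{
	int is_alias;
	int ret;

	assert(parse != NULL && key != NULL && value != NULL && warned != NULL);

	is_alias = strcmp(key, "nls") == 0;
	*warned = 0;

	ret = parse(is_alias ? "iocharset" : key, value);
	if (ret == 0 && is_alias) {
		fprintf(stderr, "nls=%s is deprecated, use iocharset=%s\n",
			value, value);
		*warned = 1;
	}
	return ret;
}

/* Example driver that knows iocharset= (illustrative only). */
static int charset_fs_parse(const char *key, const char *value)
{
	(void)value;
	return strcmp(key, "iocharset") == 0 ? 0 : -1;
}

/* Example driver that stores raw bytes and knows neither option. */
static int raw_bytes_fs_parse(const char *key, const char *value)
{
	(void)key;
	(void)value;
	return -1;
}
```

In this sketch a driver that rejects iocharset= (such as the raw-bytes
example) never triggers the deprecation message, which is the case raised
for ext4; the caller would report the usual "unknown option" error instead.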
>
> > > Therefore I'm not sure if this parsing should be in global
> > > fs/fc_parser.c file...
> >
On Sun, Aug 08, 2021 at 06:24:34PM +0200, Pali Rohár wrote:
> Currently iocharset=utf8 mount option is broken and error is printed to
> dmesg when it is used. To use UTF-8 as iocharset, it is required to use
> utf8=1 mount option.
>
> Fix iocharset=utf8 mount option to use be equivalent to the utf8=1 mount
> option and remove printing error from dmesg.
>
> FAT by definition is case-insensitive but current Linux implementation is
> case-sensitive for non-ASCII characters when UTF-8 is used. This patch does
> not change this UTF-8 behavior. Only more comments in fat_utf8_strnicmp()
> function are added about it.
>
> After this patch iocharset=utf8 starts working, so there is no need to have
> separate config option FAT_DEFAULT_UTF8 as FAT_DEFAULT_IOCHARSET for utf8
> also starts working. So remove redundant config option FAT_DEFAULT_UTF8.
>
> Signed-off-by: Pali Rohár <[email protected]>
> ---
> fs/fat/Kconfig | 15 ---------------
> fs/fat/dir.c | 17 +++++++----------
> fs/fat/fat.h | 22 ++++++++++++++++++++++
> fs/fat/inode.c | 28 +++++++++++-----------------
> fs/fat/namei_vfat.c | 26 +++++++++++++++++++-------
> 5 files changed, 59 insertions(+), 49 deletions(-)
>
> diff --git a/fs/fat/Kconfig b/fs/fat/Kconfig
> index 66532a71e8fd..a31594137d5e 100644
> --- a/fs/fat/Kconfig
> +++ b/fs/fat/Kconfig
> @@ -100,18 +100,3 @@ config FAT_DEFAULT_IOCHARSET
>
> Enable any character sets you need in File Systems/Native Language
> Support.
> -
> -config FAT_DEFAULT_UTF8
> - bool "Enable FAT UTF-8 option by default"
> - depends on VFAT_FS
> - default n
> - help
> - Set this if you would like to have "utf8" mount option set
> - by default when mounting FAT filesystems.
> -
> - Even if you say Y here can always disable UTF-8 for
> - particular mount by adding "utf8=0" to mount options.
> -
> - Say Y if you use UTF-8 encoding for file names, N otherwise.
> -
> - See <file:Documentation/filesystems/vfat.rst> for more information.
> diff --git a/fs/fat/dir.c b/fs/fat/dir.c
> index c4a274285858..49fe8dc6e5f0 100644
> --- a/fs/fat/dir.c
> +++ b/fs/fat/dir.c
> @@ -33,11 +33,6 @@
> #define FAT_MAX_UNI_CHARS ((MSDOS_SLOTS - 1) * 13 + 1)
> #define FAT_MAX_UNI_SIZE (FAT_MAX_UNI_CHARS * sizeof(wchar_t))
>
> -static inline unsigned char fat_tolower(unsigned char c)
> -{
> - return ((c >= 'A') && (c <= 'Z')) ? c+32 : c;
> -}
> -
> static inline loff_t fat_make_i_pos(struct super_block *sb,
> struct buffer_head *bh,
> struct msdos_dir_entry *de)
> @@ -258,10 +253,12 @@ static inline int fat_name_match(struct msdos_sb_info *sbi,
> if (a_len != b_len)
> return 0;
>
> - if (sbi->options.name_check != 's')
> - return !nls_strnicmp(sbi->nls_io, a, b, a_len);
> - else
> + if (sbi->options.name_check == 's')
> return !memcmp(a, b, a_len);
> + else if (sbi->options.utf8)
> + return !fat_utf8_strnicmp(a, b, a_len);
> + else
> + return !nls_strnicmp(sbi->nls_io, a, b, a_len);
> }
>
> enum { PARSE_INVALID = 1, PARSE_NOT_LONGNAME, PARSE_EOF, };
> @@ -384,7 +381,7 @@ static int fat_parse_short(struct super_block *sb,
> de->lcase & CASE_LOWER_BASE);
> if (chl <= 1) {
> if (!isvfat)
> - ptname[i] = nocase ? c : fat_tolower(c);
> + ptname[i] = nocase ? c : fat_ascii_to_lower(c);
> i++;
> if (c != ' ') {
> name_len = i;
> @@ -421,7 +418,7 @@ static int fat_parse_short(struct super_block *sb,
> if (chl <= 1) {
> k++;
> if (!isvfat)
> - ptname[i] = nocase ? c : fat_tolower(c);
> + ptname[i] = nocase ? c : fat_ascii_to_lower(c);
> i++;
> if (c != ' ') {
> name_len = i;
> diff --git a/fs/fat/fat.h b/fs/fat/fat.h
> index 02d4d4234956..0cd15fb3b042 100644
> --- a/fs/fat/fat.h
> +++ b/fs/fat/fat.h
> @@ -310,6 +310,28 @@ static inline void fatwchar_to16(__u8 *dst, const wchar_t *src, size_t len)
> #endif
> }
>
> +static inline unsigned char fat_ascii_to_lower(unsigned char c)
> +{
> + return ((c >= 'A') && (c <= 'Z')) ? c+32 : c;
> +}
> +
> +static inline int fat_utf8_strnicmp(const unsigned char *a,
> + const unsigned char *b,
> + int len)
> +{
> + int i;
> +
> + /*
> + * FIXME: UTF-8 doesn't provide FAT semantics
> + * Case-insensitive support is only for 7-bit ASCII characters
> + */
> + for (i = 0; i < len; i++) {
> + if (fat_ascii_to_lower(a[i]) != fat_ascii_to_lower(b[i]))
> + return 1;
> + }
> + return 0;
> +}
> +
> /* fat/cache.c */
> extern void fat_cache_inval_inode(struct inode *inode);
> extern int fat_get_cluster(struct inode *inode, int cluster,
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index de0c9b013a85..f8c8a739f8f0 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -957,7 +957,9 @@ static int fat_show_options(struct seq_file *m, struct dentry *root)
> /* strip "cp" prefix from displayed option */
> seq_printf(m, ",codepage=%s", &sbi->nls_disk->charset[2]);
> if (isvfat) {
> - if (sbi->nls_io)
> + if (opts->utf8)
> + seq_printf(m, ",iocharset=utf8");
checkpatch will probably warn you about this.
WARNING: Prefer seq_puts to seq_printf
> + else if (sbi->nls_io)
> seq_printf(m, ",iocharset=%s", sbi->nls_io->charset);
>
> switch (opts->shortname) {
> @@ -994,8 +996,6 @@ static int fat_show_options(struct seq_file *m, struct dentry *root)
> if (opts->nocase)
> seq_puts(m, ",nocase");
> } else {
> - if (opts->utf8)
> - seq_puts(m, ",utf8");
> if (opts->unicode_xlate)
> seq_puts(m, ",uni_xlate");
> if (!opts->numtail)
> @@ -1157,8 +1157,6 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
> opts->errors = FAT_ERRORS_RO;
> *debug = 0;
>
> - opts->utf8 = IS_ENABLED(CONFIG_FAT_DEFAULT_UTF8) && is_vfat;
> -
> if (!options)
> goto out;
>
> @@ -1319,10 +1317,14 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
> | VFAT_SFN_CREATE_WIN95;
> break;
> case Opt_utf8_no: /* 0 or no or false */
> - opts->utf8 = 0;
> + fat_reset_iocharset(opts);
> break;
> case Opt_utf8_yes: /* empty or 1 or yes or true */
> - opts->utf8 = 1;
> + fat_reset_iocharset(opts);
> + iocharset = kstrdup("utf8", GFP_KERNEL);
> + if (!iocharset)
> + return -ENOMEM;
> + opts->iocharset = iocharset;
> break;
> case Opt_uni_xl_no: /* 0 or no or false */
> opts->unicode_xlate = 0;
> @@ -1360,18 +1362,11 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
> }
>
> out:
> - /* UTF-8 doesn't provide FAT semantics */
> - if (!strcmp(opts->iocharset, "utf8")) {
> - fat_msg(sb, KERN_WARNING, "utf8 is not a recommended IO charset"
> - " for FAT filesystems, filesystem will be "
> - "case sensitive!");
> - }
> + opts->utf8 = !strcmp(opts->iocharset, "utf8") && is_vfat;
>
> /* If user doesn't specify allow_utime, it's initialized from dmask. */
> if (opts->allow_utime == (unsigned short)-1)
> opts->allow_utime = ~opts->fs_dmask & (S_IWGRP | S_IWOTH);
> - if (opts->unicode_xlate)
> - opts->utf8 = 0;
> if (opts->nfs == FAT_NFS_NOSTALE_RO) {
> sb->s_flags |= SB_RDONLY;
> sb->s_export_op = &fat_export_ops_nostale;
> @@ -1832,8 +1827,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
> goto out_fail;
> }
>
> - /* FIXME: utf8 is using iocharset for upper/lower conversion */
> - if (sbi->options.isvfat) {
> + if (sbi->options.isvfat && !sbi->options.utf8) {
> sbi->nls_io = load_nls(sbi->options.iocharset);
> if (!sbi->nls_io) {
> fat_msg(sb, KERN_ERR, "IO charset %s not found",
> diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
> index 5369d82e0bfb..efb3cb9ea8a8 100644
> --- a/fs/fat/namei_vfat.c
> +++ b/fs/fat/namei_vfat.c
> @@ -134,6 +134,7 @@ static int vfat_hash(const struct dentry *dentry, struct qstr *qstr)
> static int vfat_hashi(const struct dentry *dentry, struct qstr *qstr)
> {
> struct nls_table *t = MSDOS_SB(dentry->d_sb)->nls_io;
> + int utf8 = MSDOS_SB(dentry->d_sb)->options.utf8;
> const unsigned char *name;
> unsigned int len;
> unsigned long hash;
> @@ -142,8 +143,17 @@ static int vfat_hashi(const struct dentry *dentry, struct qstr *qstr)
> len = vfat_striptail_len(qstr);
>
> hash = init_name_hash(dentry);
> - while (len--)
> - hash = partial_name_hash(nls_tolower(t, *name++), hash);
> + if (utf8) {
> + /*
> + * FIXME: UTF-8 doesn't provide FAT semantics
> + * Case-insensitive support is only for 7-bit ASCII characters
> + */
> + while (len--)
> + hash = partial_name_hash(fat_ascii_to_lower(*name++), hash);
> + } else {
> + while (len--)
> + hash = partial_name_hash(nls_tolower(t, *name++), hash);
> + }
> qstr->hash = end_name_hash(hash);
>
> return 0;
> @@ -156,16 +166,18 @@ static int vfat_cmpi(const struct dentry *dentry,
> unsigned int len, const char *str, const struct qstr *name)
> {
> struct nls_table *t = MSDOS_SB(dentry->d_sb)->nls_io;
> + int utf8 = MSDOS_SB(dentry->d_sb)->options.utf8;
> unsigned int alen, blen;
>
> /* A filename cannot end in '.' or we treat it like it has none */
> alen = vfat_striptail_len(name);
> blen = __vfat_striptail_len(len, str);
> - if (alen == blen) {
> - if (nls_strnicmp(t, name->name, str, alen) == 0)
> - return 0;
> - }
> - return 1;
> + if (alen != blen)
> + return 1;
> + else if (utf8)
> + return fat_utf8_strnicmp(name->name, str, alen);
> + else
> + return nls_strnicmp(t, name->name, str, alen);
> }
>
> /*
> --
> 2.20.1
>
On Sun, Aug 08, 2021 at 06:24:33PM +0200, Pali Rohár wrote:
> Module nls_utf8 is broken in several ways. It does not support (full)
> UTF-8, despite its name. It cannot handle 4-byte UTF-8 sequences and
> tolower/toupper table is not implemented at all. Which means that it is
> not suitable for usage in case-insensitive filesystems or UTF-16
> filesystems (because of e.g. missing UTF-16 surrogate pairs processing).
>
> This is RFC patch series which unify and fix iocharset=utf8 mount
> option in all fs drivers and converts all remaining fs drivers to use
> utf8s_to_utf16s(), utf16s_to_utf8s(), utf8_to_utf32(), utf32_to_utf8
> functions for implementing UTF-8 support instead of nls_utf8.
>
> So at the end it allows to completely drop this broken nls_utf8 module.
Now that every filesystem will support nls=NULL, is it possible to just
drop default_table completely? Then the default has to be utf8, but is
that a problem?
I was also thinking that every nls "codepage module" could have in
Kconfig
select HAVE_NLS
HAVE_NLS will tell whether we can get anything other than nls=NULL. This
way a fs can drop some functions if it wants to. It would also be nice to
make the nls module as small as possible, because acpi, pci and usb also
select it. Many other drivers also seem to depend on it, and they do not
even seem to select it. Everything other than filesystems seems to just
need utf conversions, at least at a quick glance. Another option is to
separate nls and utf, but I'm not a fan of this idea just yet at least.
The whole point is to help small Linux and embedded devices a little bit.
I'm happy to do this, but it all really depends on whether utf8 can be the
default, and that we surely can think about beforehand.
Argillander
> For more details look at email thread where was discussed fs unification:
> https://lore.kernel.org/linux-fsdevel/20200102211855.gg62r7jshp742d6i@pali/t/#u
>
> This patch series is mostly untested and presented as RFC. Please let me
> know what do you think about it and if is the correct way how to fix
> broken UTF-8 support in fs drivers. As explained in above email thread I
> think it does not make sense to try fixing whole NLS framework and it is
> easier to just drop this nls_utf8 module.
>
> Note: this patch series does not address UTF-8 fat case-sensitivity issue:
> https://lore.kernel.org/linux-fsdevel/20200119221455.bac7dc55g56q2l4r@pali/
>
> Pali Rohár (20):
> fat: Fix iocharset=utf8 mount option
> hfsplus: Add iocharset= mount option as alias for nls=
> udf: Fix iocharset=utf8 mount option
> isofs: joliet: Fix iocharset=utf8 mount option
> ntfs: Undeprecate iocharset= mount option
> ntfs: Fix error processing when load_nls() fails
> befs: Fix printing iocharset= mount option
> befs: Rename enum value Opt_charset to Opt_iocharset to match mount
> option
> befs: Fix error processing when load_nls() fails
> befs: Allow to use native UTF-8 mode
> hfs: Explicitly set hsb->nls_disk when hsb->nls_io is set
> hfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
> hfsplus: Do not use broken utf8 NLS table for iocharset=utf8 mount
> option
> jfs: Remove custom iso8859-1 implementation
> jfs: Fix buffer overflow in jfs_strfromUCS_le() function
> jfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
> ntfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
> cifs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
> cifs: Remove usage of load_nls_default() calls
> nls: Drop broken nls_utf8 module
>
> fs/befs/linuxvfs.c | 22 ++++---
> fs/cifs/cifs_unicode.c | 128 +++++++++++++++++++++++-------------
> fs/cifs/cifs_unicode.h | 2 +-
> fs/cifs/cifsfs.c | 2 +
> fs/cifs/cifssmb.c | 8 +--
> fs/cifs/connect.c | 8 ++-
> fs/cifs/dfs_cache.c | 24 +++----
> fs/cifs/dir.c | 28 ++++++--
> fs/cifs/smb2pdu.c | 17 ++---
> fs/cifs/winucase.c | 14 ++--
> fs/fat/Kconfig | 15 -----
> fs/fat/dir.c | 17 ++---
> fs/fat/fat.h | 22 +++++++
> fs/fat/inode.c | 28 ++++----
> fs/fat/namei_vfat.c | 26 ++++++--
> fs/hfs/super.c | 62 ++++++++++++++---
> fs/hfs/trans.c | 62 +++++++++--------
> fs/hfsplus/dir.c | 6 +-
> fs/hfsplus/options.c | 39 ++++++-----
> fs/hfsplus/super.c | 7 +-
> fs/hfsplus/unicode.c | 31 ++++++++-
> fs/hfsplus/xattr.c | 14 ++--
> fs/hfsplus/xattr_security.c | 3 +-
> fs/isofs/inode.c | 27 ++++----
> fs/isofs/isofs.h | 1 -
> fs/isofs/joliet.c | 4 +-
> fs/jfs/jfs_dtree.c | 13 +++-
> fs/jfs/jfs_unicode.c | 35 +++++-----
> fs/jfs/jfs_unicode.h | 2 +-
> fs/jfs/super.c | 29 ++++++--
> fs/nls/Kconfig | 9 ---
> fs/nls/Makefile | 1 -
> fs/nls/nls_utf8.c | 67 -------------------
> fs/ntfs/dir.c | 6 +-
> fs/ntfs/inode.c | 5 +-
> fs/ntfs/super.c | 60 ++++++++---------
> fs/ntfs/unistr.c | 28 +++++++-
> fs/udf/super.c | 50 ++++++--------
> fs/udf/udf_sb.h | 2 -
> fs/udf/unicode.c | 4 +-
> 40 files changed, 510 insertions(+), 418 deletions(-)
> delete mode 100644 fs/nls/nls_utf8.c
>
> --
> 2.20.1
>
On Saturday 04 September 2021 00:26:16 Kari Argillander wrote:
> On Sun, Aug 08, 2021 at 06:24:33PM +0200, Pali Rohár wrote:
> > Module nls_utf8 is broken in several ways. It does not support (full)
> > UTF-8, despite its name. It cannot handle 4-byte UTF-8 sequences and
> > tolower/toupper table is not implemented at all. Which means that it is
> > not suitable for usage in case-insensitive filesystems or UTF-16
> > filesystems (because of e.g. missing UTF-16 surrogate pairs processing).
> >
> > This is RFC patch series which unify and fix iocharset=utf8 mount
> > option in all fs drivers and converts all remaining fs drivers to use
> > utf8s_to_utf16s(), utf16s_to_utf8s(), utf8_to_utf32(), utf32_to_utf8
> > functions for implementing UTF-8 support instead of nls_utf8.
> >
> > So at the end it allows to completely drop this broken nls_utf8 module.
>
> Now that every filesystem will support nls=NULL, is it possible to just
> drop default_table completely? Then the default has to be utf8, but is
> that a problem?
Currently the (default) fallback nls table is iso8859-1. I was planning to
merge the fallback nls table and the external iso8859-1 table into one, to
decrease code duplication.
There is also a config option for the default table. I do not think it is a
good idea to drop that config option, as many people are still using some
iso8859-X as their default encoding.
> Then I was also thinking that every nls "codepage module" can have in
> Kconfig
> select HAVE_NLS
>
> HAVE_NLS will tell if we can get anything other than nls=NULL. This way
> fs can drop some functions if they wanted to. It would be nice to also
> make nls module as small as possible because also acpi, pci and usb
> select it. Many other drivers also seem to depend on it and they do not
> even seem to select it. Everything other than filesystems seems to need only
> utf conversions, at least at a quick glance. Another option is to separate
> nls and utf, but I'm not a fan of this idea just yet.
nls tables can already be compiled as modules. There are also
inefficient implementations of some nls tables (e.g. ascii or
iso8859-1). So there are already places where the size of the nls
code can be decreased without losing any functionality.
> The whole point is to help small Linux and embedded devices a little. I'm
> happy to do this, but it all really depends on whether utf8 can be the
> default, and that is something we should think through beforehand.
I agree that on modern embedded systems there is no reason to use
non-utf8 encoding if you are not targeting some legacy userspace.
So allowing to compile filesystems also without nls code (in which case
they would use only utf-8) makes sense.
> Argillander
On Fri, Sep 03, 2021 at 11:37:03PM +0200, Pali Rohár wrote:
> On Saturday 04 September 2021 00:26:16 Kari Argillander wrote:
> > On Sun, Aug 08, 2021 at 06:24:33PM +0200, Pali Rohár wrote:
> > > Module nls_utf8 is broken in several ways. It does not support (full)
> > > UTF-8, despite its name. It cannot handle 4-byte UTF-8 sequences and
> > > tolower/toupper table is not implemented at all. Which means that it is
> > > not suitable for usage in case-insensitive filesystems or UTF-16
> > > filesystems (because of e.g. missing UTF-16 surrogate pairs processing).
> > >
> > > This is RFC patch series which unify and fix iocharset=utf8 mount
> > > option in all fs drivers and converts all remaining fs drivers to use
> > > utf8s_to_utf16s(), utf16s_to_utf8s(), utf8_to_utf32(), utf32_to_utf8
> > > functions for implementing UTF-8 support instead of nls_utf8.
> > >
> > > So at the end it allows to completely drop this broken nls_utf8 module.
> >
> > Now that every filesystem will support nls=NULL, is it possible to just
> > drop default_table completely? Then the default has to be utf8, but is
> > that a problem?
>
> Currently (default) fallback nls table is iso8859-1. I was planning to
> merge fallback nls table and external iso8859-1 table into one, to
> decrease code duplication.
>
> There is also config option for default table. I do not think it is a
> good idea to drop config option for default table as more people are
> using some iso8859-X as default encoding.
I'm not suggesting that we drop default config option. I just suggest we
make fallback default to utf8. So load_nls_default() will just return
NULL and it will be ok because every fs can handle that situation after
some tweaks at least. This way we can drop default_table (iso8859-1 as
you said) from nls_base.
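To illustrate the idea (a user-space sketch only, not code from this series): a
NULL table pointer selects the built-in UTF-8 path and any non-NULL table is used
as loaded. All names below are hypothetical stand-ins; the real kernel type is
struct nls_table and the real encoder is utf32_to_utf8().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct nls_table. */
struct nls_table_sketch {
	const char *charset;
	int (*uni2char)(uint16_t uni, unsigned char *out, int boundlen);
};

/* Hypothetical iso8859-1 table entry: only U+0000..U+00FF can be encoded. */
static int sketch_latin1_uni2char(uint16_t uni, unsigned char *out, int boundlen)
{
	if (boundlen < 1 || uni > 0xFF)
		return -1;
	out[0] = (unsigned char)uni;
	return 1;
}

/* Hypothetical UTF-8 encoder for a single 16-bit unit (no surrogate pairs). */
static int sketch_utf8_encode(uint16_t uni, unsigned char *out, int boundlen)
{
	if (uni >= 0xD800 && uni <= 0xDFFF)
		return -1;		/* lone surrogates are not valid */
	if (uni < 0x80) {
		if (boundlen < 1)
			return -1;
		out[0] = (unsigned char)uni;
		return 1;
	}
	if (uni < 0x800) {
		if (boundlen < 2)
			return -1;
		out[0] = (unsigned char)(0xC0 | (uni >> 6));
		out[1] = (unsigned char)(0x80 | (uni & 0x3F));
		return 2;
	}
	if (boundlen < 3)
		return -1;
	out[0] = (unsigned char)(0xE0 | (uni >> 12));
	out[1] = (unsigned char)(0x80 | ((uni >> 6) & 0x3F));
	out[2] = (unsigned char)(0x80 | (uni & 0x3F));
	return 3;
}

/* nls == NULL means "use native UTF-8"; otherwise use the loaded table. */
static int sketch_uni2char(const struct nls_table_sketch *nls, uint16_t uni,
			   unsigned char *out, int boundlen)
{
	if (nls)
		return nls->uni2char(uni, out, boundlen);
	return sketch_utf8_encode(uni, out, boundlen);
}
```

With this shape a filesystem needs no fallback table at all when only utf8 is
wanted; whether every caller can really cope with a NULL return from
load_nls_default() is exactly the part that needs the tweaks mentioned above.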
> > Then I was also thinking that every nls "codepage module" can have in
> > Kconfig
> > select HAVE_NLS
> >
> > HAVE_NLS will tell if we can get anything other than nls=NULL. This way
> > fs can drop some functions if they wanted to. It would be nice to also
> > make nls module as small as possible because also acpi, pci and usb
> > select it. Many other drivers also seem to depend on it and they do not
> > even seem to select it. Everything other than filesystems seems to need only
> > utf conversions, at least at a quick glance. Another option is to separate
> > nls and utf, but I'm not a fan of this idea just yet.
>
> nls tables can already be compiled as modules. There are also
> inefficient implementations of some nls tables (e.g. ascii or
> iso8859-1). So there are already places where the size of the nls
> code can be decreased without losing any functionality.
default_table will still be built in, and many times we won't need it
because we will only be using the utf conversions.
>
> > The whole point is to help small Linux and embedded devices a little. I'm
> > happy to do this, but it all really depends on whether utf8 can be the
> > default, and that is something we should think through beforehand.
>
> I agree that on modern embedded systems there is no reason to use
> non-utf8 encoding if you are not targeting some legacy userspace.
>
> So allowing to compile filesystems also without nls code (in which case
> they would use only utf-8) makes sense.
Now that I have looked at the code a little more, it kind of makes sense to
just separate nls and utf. Only filesystems need nls; the rest can make do
with just utf. The utf code probably also has no need to be a module,
because the things that usually select it (pci, acpi, usb) cannot be
modules themselves. But I'm no expert on what the drawbacks are here.
>
> > Argillander
Hello! Sorry for the long delay. My comments are below.
On Monday 09 August 2021 10:49:34 Viacheslav Dubeyko wrote:
> > On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
> >
> > NLS table for utf8 is broken and cannot be fixed.
> >
> > So instead of broken utf8 nls functions char2uni() and uni2char() use
> > functions utf8_to_utf32() and utf32_to_utf8() which implements correct
> > encoding and decoding between Unicode code points and UTF-8 sequence.
> >
> > When iocharset=utf8 is used, set hsb->nls_io to NULL and use that to
> > distinguish whether the NLS table or the native UTF-8 functions should
> > be used.
> >
> > Signed-off-by: Pali Rohár <[email protected]>
> > ---
> > fs/hfs/super.c | 33 ++++++++++++++++++++++-----------
> > fs/hfs/trans.c | 24 ++++++++++++++++++++----
> > 2 files changed, 42 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > index 86bc46746c7f..076308df41cf 100644
> > --- a/fs/hfs/super.c
> > +++ b/fs/hfs/super.c
> > @@ -149,10 +149,13 @@ static int hfs_show_options(struct seq_file *seq, struct dentry *root)
> > seq_printf(seq, ",part=%u", sbi->part);
> > if (sbi->session >= 0)
> > seq_printf(seq, ",session=%u", sbi->session);
> > - if (sbi->nls_disk)
> > + if (sbi->nls_disk) {
> > seq_printf(seq, ",codepage=%s", sbi->nls_disk->charset);
>
> Maybe, I am missing something. But where is the closing “}”?
See below...
>
> > - if (sbi->nls_io)
> > - seq_printf(seq, ",iocharset=%s", sbi->nls_io->charset);
> > + if (sbi->nls_io)
> > + seq_printf(seq, ",iocharset=%s", sbi->nls_io->charset);
> > + else
> > + seq_puts(seq, ",iocharset=utf8");
> > + }
^
... Closing "}" is marked above.
> > if (sbi->s_quiet)
> > seq_printf(seq, ",quiet");
> > return 0;
> > @@ -225,6 +228,7 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> > char *p;
> > substring_t args[MAX_OPT_ARGS];
> > int tmp, token;
> > + int have_iocharset;
>
> What’s about boolean type?
Ok! No problem, I can use the "bool" type. I was just under the impression
that the code style of this driver is to use the "int" type for booleans too.
The same goes for "false" and "true", as you mentioned below.
> >
> > /* initialize the sb with defaults */
> > hsb->s_uid = current_uid();
> > @@ -239,6 +243,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> > if (!options)
> > return 1;
> >
> > + have_iocharset = 0;
>
> What’s about false here?
>
> > +
> > while ((p = strsep(&options, ",")) != NULL) {
> > if (!*p)
> > continue;
> > @@ -332,18 +338,22 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> > kfree(p);
> > break;
> > case opt_iocharset:
> > - if (hsb->nls_io) {
> > + if (have_iocharset) {
> > pr_err("unable to change iocharset\n");
> > return 0;
> > }
> > p = match_strdup(&args[0]);
> > - if (p)
> > - hsb->nls_io = load_nls(p);
> > - if (!hsb->nls_io) {
> > - pr_err("unable to load iocharset \"%s\"\n", p);
> > - kfree(p);
> > + if (!p)
> > return 0;
> > + if (strcmp(p, "utf8") != 0) {
> > + hsb->nls_io = load_nls(p);
> > + if (!hsb->nls_io) {
> > + pr_err("unable to load iocharset \"%s\"\n", p);
> > + kfree(p);
> > + return 0;
> > + }
> > }
> > + have_iocharset = 1;
>
> What’s about true here?
>
> > kfree(p);
> > break;
> > default:
> > @@ -351,7 +361,7 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> > }
> > }
> >
> > - if (hsb->nls_io && !hsb->nls_disk) {
> > + if (have_iocharset && !hsb->nls_disk) {
> > /*
> > * Previous version of hfs driver did something unexpected:
> > * When codepage was not defined but iocharset was then
> > @@ -382,7 +392,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
> > return 0;
> > }
> > }
> > - if (hsb->nls_disk && !hsb->nls_io) {
> > + if (hsb->nls_disk &&
> > + !have_iocharset && strcmp(CONFIG_NLS_DEFAULT, "utf8") != 0) {
>
> Maybe, introduce the variable to calculate the boolean value here? Then if statement will look much cleaner.
I'm not sure how to do it to make code look cleaner.
Currently there is:
if (hsb->nls_disk &&
!have_iocharset && strcmp(CONFIG_NLS_DEFAULT, "utf8") != 0) {
hsb->nls_io = load_nls_default();
...
}
I can replace it e.g. by:
bool need_to_load_nls;
...
if (hsb->nls_disk &&
!have_iocharset && strcmp(CONFIG_NLS_DEFAULT, "utf8") != 0)
need_to_load_nls = true;
else
need_to_load_nls = false;
if (need_to_load_nls) {
hsb->nls_io = load_nls_default();
...
}
But that is just longer: the condition is still there, and it requires one
additional variable, which for me makes it less readable.
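For comparison, a third shape would be to move the condition into a small named
helper. This is only a user-space sketch with hypothetical names, not something
the patch does; in the driver the last argument would be CONFIG_NLS_DEFAULT and
the first two would come from hsb->nls_disk and have_iocharset.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: the default NLS table is needed only when a codepage
 * was given, no iocharset was given, and the configured default is not utf8.
 */
static bool sketch_need_default_nls(bool have_codepage, bool have_iocharset,
				    const char *default_charset)
{
	return have_codepage && !have_iocharset &&
	       strcmp(default_charset, "utf8") != 0;
}
```

The call site then becomes a single "if (sketch_need_default_nls(...))", at the
cost of one more function to read.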
> > hsb->nls_io = load_nls_default();
> > if (!hsb->nls_io) {
> > pr_err("unable to load default iocharset\n");
> > diff --git a/fs/hfs/trans.c b/fs/hfs/trans.c
> > index c75682c61b06..bff8e54003ab 100644
> > --- a/fs/hfs/trans.c
> > +++ b/fs/hfs/trans.c
> > @@ -44,7 +44,7 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
> > srclen = HFS_NAMELEN;
> > dst = out;
> > dstlen = HFS_MAX_NAMELEN;
> > - if (nls_io) {
> > + if (nls_disk) {
> > wchar_t ch;
> >
>
> I could miss something here. But what’s about the closing “}”?
The closing "}" is still in the same place as before. The "if" line had an
opening "{" before my change and it still has one after it, so the opening
"{" and the closing "}" are both there and they match.
> Thanks,
> Slava.
>
> > while (srclen > 0) {
> > @@ -57,7 +57,12 @@ int hfs_mac2asc(struct super_block *sb, char *out, const struct hfs_name *in)
> > srclen -= size;
> > if (ch == '/')
> > ch = ':';
> > - size = nls_io->uni2char(ch, dst, dstlen);
> > + if (nls_io)
> > + size = nls_io->uni2char(ch, dst, dstlen);
> > + else if (dstlen > 0)
> > + size = utf32_to_utf8(ch, dst, dstlen);
> > + else
> > + size = -ENAMETOOLONG;
> > if (size < 0) {
> > if (size == -ENAMETOOLONG)
> > goto out;
> > @@ -101,11 +106,22 @@ void hfs_asc2mac(struct super_block *sb, struct hfs_name *out, const struct qstr
> > srclen = in->len;
> > dst = out->name;
> > dstlen = HFS_NAMELEN;
> > - if (nls_io) {
> > + if (nls_disk) {
> > wchar_t ch;
> > + unicode_t u;
> >
> > while (srclen > 0) {
> > - size = nls_io->char2uni(src, srclen, &ch);
> > + if (nls_io)
> > + size = nls_io->char2uni(src, srclen, &ch);
> > + else {
> > + size = utf8_to_utf32(src, srclen, &u);
> > + if (size >= 0) {
> > + if (u <= MAX_WCHAR_T)
> > + ch = u;
> > + else
> > + size = -EINVAL;
> > + }
> > + }
> > if (size < 0) {
> > ch = '?';
> > size = 1;
> > --
> > 2.20.1
> >
>
Hello!
On Monday 09 August 2021 10:42:02 Viacheslav Dubeyko wrote:
> > On Aug 8, 2021, at 9:24 AM, Pali Rohár <[email protected]> wrote:
> >
> > NLS table for utf8 is broken and cannot be fixed.
> >
> > So instead of broken utf8 nls functions char2uni() and uni2char() use
> > functions utf8_to_utf32() and utf32_to_utf8() which implements correct
> > encoding and decoding between Unicode code points and UTF-8 sequence.
> >
> > Note that this fs driver does not support the full Unicode range;
> > specifically, UTF-16 surrogate pairs are unsupported. This patch does not
> > change this limitation, and support for UTF-16 surrogate pairs stays
> > unimplemented.
> >
> > When iocharset=utf8 is used, set sbi->nls to NULL and use that to
> > distinguish whether the NLS table or the native UTF-8 functions should
> > be used.
> >
> > Signed-off-by: Pali Rohár <[email protected]>
> > ---
> > fs/hfsplus/dir.c | 6 ++++--
> > fs/hfsplus/options.c | 32 ++++++++++++++++++--------------
> > fs/hfsplus/super.c | 7 +------
> > fs/hfsplus/unicode.c | 31 ++++++++++++++++++++++++++++---
> > fs/hfsplus/xattr.c | 14 +++++++++-----
> > fs/hfsplus/xattr_security.c | 3 ++-
> > 6 files changed, 62 insertions(+), 31 deletions(-)
> >
> > diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
> > index 84714bbccc12..2caf0cd82221 100644
> > --- a/fs/hfsplus/dir.c
> > +++ b/fs/hfsplus/dir.c
> > @@ -144,7 +144,8 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> > err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
> > if (err)
> > return err;
> > - strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
> > + strbuf = kmalloc((HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> > + HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
>
> Maybe, introduce some variable that will contain the length calculation?
Ok! I can introduce a variable holding the calculated length in all these places.
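Roughly what I have in mind, as a user-space sketch only. The helper name is
hypothetical, and the two constants are assumed to mirror the kernel values of
NLS_MAX_CHARSET_SIZE and HFSPLUS_MAX_STRLEN; they should be checked against the
headers rather than taken from here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel headers; verify before relying on them. */
#define SKETCH_NLS_MAX_CHARSET_SIZE	6
#define SKETCH_HFSPLUS_MAX_STRLEN	255
/* Upper bound per name unit used by the patch for the native UTF-8 path. */
#define SKETCH_UTF8_MAX_BYTES		4

/* Worst-case output length of a converted name, without the trailing NUL. */
static size_t sketch_name_buf_len(bool have_nls_table)
{
	size_t per_unit = have_nls_table ? SKETCH_NLS_MAX_CHARSET_SIZE
					 : SKETCH_UTF8_MAX_BYTES;

	return per_unit * SKETCH_HFSPLUS_MAX_STRLEN;
}
```

The kmalloc() call would then use this length plus one, and the later "len ="
assignment would reuse the same value instead of repeating the expression.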
> > if (!strbuf) {
> > err = -ENOMEM;
> > goto out;
> > @@ -203,7 +204,8 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> > hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
> > fd.entrylength);
> > type = be16_to_cpu(entry.type);
> > - len = NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN;
> > + len = (HFSPLUS_SB(sb)->nls ? NLS_MAX_CHARSET_SIZE : 4) *
> > + HFSPLUS_MAX_STRLEN;
> > err = hfsplus_uni2asc(sb, &fd.key->cat.name, strbuf, &len);
> > if (err)
> > goto out;
> > diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
> > index a975548f6b91..16c08cb5c4f8 100644
> > --- a/fs/hfsplus/options.c
> > +++ b/fs/hfsplus/options.c
> > @@ -104,6 +104,9 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
> > char *p;
> > substring_t args[MAX_OPT_ARGS];
> > int tmp, token;
> > + int have_iocharset;
> > +
> > + have_iocharset = 0;
>
> What’s about boolean type and to use true/false?
Ok. I can change type to "bool" and use "true"/"false" values.
> >
> > if (!input)
> > goto done;
> > @@ -171,20 +174,24 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
> > pr_warn("option nls= is deprecated, use iocharset=\n");
> > /* fallthrough */
> > case opt_iocharset:
> > - if (sbi->nls) {
> > + if (have_iocharset) {
> > pr_err("unable to change nls mapping\n");
> > return 0;
> > }
> > p = match_strdup(&args[0]);
> > - if (p)
> > - sbi->nls = load_nls(p);
> > - if (!sbi->nls) {
> > - pr_err("unable to load nls mapping \"%s\"\n",
> > - p);
> > - kfree(p);
> > + if (!p)
> > return 0;
> > + if (strcmp(p, "utf8") != 0) {
> > + sbi->nls = load_nls(p);
> > + if (!sbi->nls) {
> > + pr_err("unable to load nls mapping "
> > + "\"%s\"\n", p);
> > + kfree(p);
> > + return 0;
> > + }
> > }
> > kfree(p);
> > + have_iocharset = 1;
>
> Ditto. What’s about true here?
>
> > break;
> > case opt_decompose:
> > clear_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
...
> > @@ -256,7 +266,22 @@ int hfsplus_uni2asc(struct super_block *sb,
> > static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
> > wchar_t *uc)
> > {
> > - int size = HFSPLUS_SB(sb)->nls->char2uni(astr, len, uc);
> > + struct nls_table *nls = HFSPLUS_SB(sb)->nls;
> > + unicode_t u;
> > + int size;
> > +
> > + if (nls)
> > + size = nls->char2uni(astr, len, uc);
> > + else {
> > + size = utf8_to_utf32(astr, len, &u);
> > + if (size >= 0) {
> > + /* TODO: Add support for UTF-16 surrogate pairs */
>
> Have you forgot to delete this string? Or do you plan to implement this?
No, I have not forgotten it. The current version lacks support for UTF-16
surrogate pairs, and this patch still does not implement it.
So this is an issue / bug in the driver and it should at least be
documented. That way a reader of this code knows about it, and maybe
somebody will implement it in the future.
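For reference, the missing piece is the standard UTF-16 split of a code point
above U+FFFF into a high and a low surrogate. Below is a user-space sketch of
that step with a hypothetical function name; it is not code from this patch,
and the driver would additionally need room for two output units per input
character.

```c
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or -1 if the value
 * is not a valid Unicode scalar value.
 */
static int sketch_utf32_to_utf16(uint32_t u, uint16_t out[2])
{
	if (u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF))
		return -1;
	if (u <= 0xFFFF) {
		out[0] = (uint16_t)u;
		return 1;
	}
	u -= 0x10000;					/* 20 bits remain */
	out[0] = (uint16_t)(0xD800 | (u >> 10));	/* high surrogate */
	out[1] = (uint16_t)(0xDC00 | (u & 0x3FF));	/* low surrogate */
	return 2;
}
```

With the current "u <= MAX_WCHAR_T" check, anything that would take the
two-unit branch is rejected with -EINVAL instead.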
> > + if (u <= MAX_WCHAR_T)
> > + *uc = u;
> > + else
> > + size = -EINVAL;
> > + }
> > + }