2018-10-16 04:59:23

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 0/9] Support encoding awareness and casefold

Hi Ted,

These are the modifications to e2fsprogs needed to support encoding
awareness and case folding. This patch series is divided into 3 parts:

Patches 1 and 2 reserve superblock fields. Patch 1 is actually
unrelated; it just updates the super_block to resynchronize with the
kernel. Patch 2 reserves the feature bit and superblock fields for this
feature.

Patches 3 through 5 implement the changes to mke2fs and
chattr/lsattr to enable the encoding feature at mkfs time and to flip
the casefold flag on demand for specific directories.

Patches 6 through 9 are where things get a bit ugly. fsck needs to become
encoding aware, in order to calculate directory hashes correctly and
verify/fix inconsistencies. This requires a tiny bit of plumbing to
pass the encoding information up to the point where we calculate the
hash, as well as implementing a simple nls-like interface in e2fsprogs
to do normalization/casefolding. You'll see that in this series I've
actually dropped the utf8 part because that patch is huge and I'd rather
discuss it separately. I did it in a hacky way now, where we import the
utf8n code from linux. I thought about using libunistring but it
doesn't seem to support versioning and we risk being incompatible with
the kernel hashes. I think we could follow the kernel approach and make
ucd files available in e2fsprogs and generate the data at
compilation. What do you think?

If you want to see a full utf8 capable version of this series, please
clone from:

https://gitlab.collabora.com/krisman/e2fsprogs -b encoding-feature-merge

If you don't object to patch 1 & 2, can we get them merged before the
rest of the series is ready, so I can reserve the bits in the super
block for this feature (patch 2) and avoid more rebasing on my side?

Thanks,

Gabriel Krisman Bertazi (9):
e2fsprogs: Add timestamp extension bits to superblock
e2fsprogs: Reserve feature bit and SB field bit for filename encoding
libe2p: Helpers for configuring the encoding superblock fields
mke2fs: Configure encoding during superblock initialization
chattr/lsattr: Support casefold attribute
lib/ext2fs: Implement NLS support
lib/ext2fs: Support encoding when calculating dx hashes
debugfs/htree: Support encoding when printing the file hash
tune2fs: Prevent enabling encryption flag on encoding-aware fs

debugfs/htree.c | 27 +++++++++++----
e2fsck/dx_dirinfo.c | 4 ++-
e2fsck/e2fsck.h | 4 ++-
e2fsck/pass1.c | 11 ++++--
e2fsck/pass2.c | 7 +++-
e2fsck/rehash.c | 12 ++++---
lib/e2p/Makefile.in | 8 +++--
lib/e2p/e2p.h | 4 +++
lib/e2p/encoding.c | 76 +++++++++++++++++++++++++++++++++++++++++
lib/e2p/feature.c | 2 ++
lib/e2p/pf.c | 1 +
lib/ext2fs/Makefile.in | 10 ++++--
lib/ext2fs/dirhash.c | 49 +++++++++++++++++++++++---
lib/ext2fs/ext2_fs.h | 31 +++++++++++++++--
lib/ext2fs/ext2fs.h | 6 +++-
lib/ext2fs/initialize.c | 4 +++
lib/ext2fs/nls.h | 65 +++++++++++++++++++++++++++++++++++
lib/ext2fs/nls_ascii.c | 48 ++++++++++++++++++++++++++
misc/chattr.c | 3 +-
misc/mke2fs.c | 43 +++++++++++++++++++++++
misc/tune2fs.c | 6 ++++
21 files changed, 393 insertions(+), 28 deletions(-)
create mode 100644 lib/e2p/encoding.c
create mode 100644 lib/ext2fs/nls.h
create mode 100644 lib/ext2fs/nls_ascii.c

--
2.19.1


2018-10-16 04:59:26

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 1/9] e2fsprogs: Add timestamp extension bits to superblock

Re-sync the superblock structure declaration with its kernel counterpart
to include the fields added by kernel commit 6a0678a79bb3 ("ext4: super:
extend timestamps to 40 bits")
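
As a rough illustration of what the new `_hi` bytes are for, the sketch
below joins a 32-bit low field with an 8-bit high field into one 40-bit
value. The helper names `sb_time_combine` and `sb_time_split` are
hypothetical and are not part of e2fsprogs or the kernel. Note that the
six `_hi` bytes plus `s_pad[2]` take 8 bytes, which is why `s_reserved`
shrinks from 98 to 96 entries in the diff.

```c
#include <stdint.h>

/* Hypothetical helper: join the 32-bit low part and the 8-bit high
 * extension into a single 40-bit timestamp held in 64 bits. */
uint64_t sb_time_combine(uint32_t lo, uint8_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: split a timestamp back into the two on-disk
 * parts. Bits above the 40th are silently dropped. */
void sb_time_split(uint64_t t, uint32_t *lo, uint8_t *hi)
{
	*lo = (uint32_t)(t & 0xffffffffu);
	*hi = (uint8_t)((t >> 32) & 0xff);
}
```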

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
lib/ext2fs/ext2_fs.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 13c2c20e5c42..ab2595486d21 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -748,7 +748,14 @@ struct ext2_super_block {
/*268*/ __le32 s_lpf_ino; /* Location of the lost+found inode */
__le32 s_prj_quota_inum; /* inode for tracking project quota */
/*270*/ __le32 s_checksum_seed; /* crc32c(orig_uuid) if csum_seed set */
- __le32 s_reserved[98]; /* Padding to the end of the block */
+/*274*/ __u8 s_wtime_hi;
+ __u8 s_mtime_hi;
+ __u8 s_mkfs_time_hi;
+ __u8 s_lastcheck_hi;
+ __u8 s_first_error_time_hi;
+ __u8 s_last_error_time_hi;
+ __u8 s_pad[2];
+ __le32 s_reserved[96]; /* Padding to the end of the block */
/*3fc*/ __u32 s_checksum; /* crc32c(superblock) */
};

--
2.19.1

2018-10-16 04:59:56

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 9/9] tune2fs: Prevent enabling encryption flag on encoding-aware fs

The kernel will refuse to mount filesystems with the encryption and
encoding features enabled at the same time. The encoding feature can
only be set at mkfs time, so we can just prevent encryption from being
set at a later time by tune2fs.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
misc/tune2fs.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index a680b461cc86..cda4d8076f81 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -1459,6 +1459,12 @@ mmp_error:
}

if (FEATURE_ON(E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_ENCRYPT)) {
+ if (ext2fs_has_feature_fname_encoding(sb)) {
+ fputs(_("Cannot enable feature 'encrypt' on filesystems "
+ "with the 'encoding' feature enabled.\n"),
+ stderr);
+ return 1;
+ }
fs->super->s_encrypt_algos[0] =
EXT4_ENCRYPTION_MODE_AES_256_XTS;
fs->super->s_encrypt_algos[1] =
--
2.19.1

2018-10-16 04:59:45

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 6/9] lib/ext2fs: Implement NLS support

Basic NLS support is required in e2fsprogs because of fsck, which
needs to calculate dx hashes for encoding-aware filesystems. This patch
implements this infrastructure as well as ASCII support.

We don't need to do the whole versioning dance as we do in the kernel,
because we know beforehand which encodings and versions we
support (those we know how to store in the sb), so it is simpler just to
create static tables.
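
The static-table idea can be sketched in a self-contained form as
follows. This is a miniature of the approach, not the code in the
patch: all `demo_`/`enc_` names are hypothetical, and, unlike the patch,
the casefold callback here refuses input that does not fit in the
destination buffer.

```c
#include <stddef.h>
#include <string.h>

struct enc_table;	/* hypothetical miniature of struct nls_table */

struct enc_ops {
	int (*casefold)(const struct enc_table *t,
			const unsigned char *str, size_t len,
			unsigned char *dest, size_t dlen);
};

struct enc_table {
	const char *name;
	const struct enc_ops *ops;
};

/* Fold ASCII lowercase letters to uppercase; other bytes pass through.
 * Returns the output length, or -1 if dest is too small. */
int demo_ascii_casefold(const struct enc_table *t,
			const unsigned char *str, size_t len,
			unsigned char *dest, size_t dlen)
{
	size_t i;

	(void)t;
	if (len > dlen)
		return -1;
	for (i = 0; i < len; i++)
		dest[i] = (str[i] >= 'a' && str[i] <= 'z') ?
			(unsigned char)(str[i] & ~0x20) : str[i];
	return (int)len;
}

static const struct enc_ops demo_ascii_ops = {
	.casefold = demo_ascii_casefold,
};

/* The set of encodings is fixed at build time, so a plain array works. */
static const struct enc_table demo_tables[] = {
	{ "ascii", &demo_ascii_ops },
};

/* Look an encoding up by name; returns NULL if it is not compiled in. */
const struct enc_table *demo_load_table(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_tables) / sizeof(demo_tables[0]); i++)
		if (strcmp(demo_tables[i].name, name) == 0)
			return &demo_tables[i];
	return NULL;
}
```

A caller would look the table up once, by the name stored for the
superblock encoding, and then go through `ops->casefold` for every name
it needs to hash.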

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
lib/ext2fs/Makefile.in | 10 +++++--
lib/ext2fs/nls.h | 65 ++++++++++++++++++++++++++++++++++++++++++
lib/ext2fs/nls_ascii.c | 48 +++++++++++++++++++++++++++++++
3 files changed, 121 insertions(+), 2 deletions(-)
create mode 100644 lib/ext2fs/nls.h
create mode 100644 lib/ext2fs/nls_ascii.c

diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 4a197cdf4e4a..a2f07403c9ae 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -20,6 +20,9 @@ COMPILE_ET= _ET_DIR_OVERRIDE=$(srcdir)/../et ../et/compile_et
@TEST_IO_CMT@TEST_IO_LIB_OBJS = test_io.o
@IMAGER_CMT@E2IMAGE_LIB_OBJS = imager.o

+NLS_OBJS=nls_ascii.o
+NLS_SRCS=nls_ascii.c
+
DEBUG_OBJS= debug_cmds.o extent_cmds.o tst_cmds.o debugfs.o util.o \
ncheck.o icheck.o ls.o lsdel.o dump.o set_fields.o logdump.o \
htree.o unused.o e2freefrag.o filefrag.o extent_inode.o zap.o \
@@ -130,7 +133,8 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
unlink.o \
valid_blk.o \
version.o \
- rbtree.o
+ rbtree.o \
+ $(NLS_OBJS)

SRCS= ext2_err.c \
$(srcdir)/alloc.c \
@@ -222,7 +226,8 @@ SRCS= ext2_err.c \
$(srcdir)/write_bb_file.c \
$(srcdir)/rbtree.c \
$(srcdir)/tst_libext2fs.c \
- $(DEBUG_SRCS)
+ $(DEBUG_SRCS) \
+ $(NLS_SRCS)

HFILES= bitops.h ext2fs.h ext2_io.h ext2_fs.h ext2_ext_attr.h ext3_extents.h \
tdb.h qcow2.h hashmap.h
@@ -1412,3 +1417,4 @@ do_journal.o: $(top_srcdir)/debugfs/do_journal.c $(top_builddir)/lib/config.h \
$(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/kernel-jbd.h \
$(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h \
$(top_srcdir)/debugfs/journal.h $(srcdir)/../../e2fsck/jfs_user.h
+$(NLS_OBJS): $(srcdir)/nls.h
diff --git a/lib/ext2fs/nls.h b/lib/ext2fs/nls.h
new file mode 100644
index 000000000000..b7f6ebcd3b25
--- /dev/null
+++ b/lib/ext2fs/nls.h
@@ -0,0 +1,65 @@
+/*
+ * nls.h - Header for encoding support functions
+ *
+ * Copyright (C) 2017 Collabora Ltd.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef EXT2FS_NLS_H
+#define EXT2FS_NLS_H
+
+#include <unistd.h>
+#include <string.h>
+#include <stdio.h>
+
+struct nls_table;
+
+#define ARRAY_SIZE(array) \
+ (sizeof(array) / sizeof(array[0]))
+
+struct nls_ops {
+ int (*normalize)(const struct nls_table *charset,
+ const unsigned char *str, size_t len,
+ unsigned char *dest, size_t dlen);
+
+ int (*casefold)(const struct nls_table *charset,
+ const unsigned char *str, size_t len,
+ unsigned char *dest, size_t dlen);
+};
+
+struct nls_table {
+ char *name;
+ const struct nls_ops *ops;
+};
+
+extern const struct nls_table nls_ascii;
+
+static const struct nls_table *encoding_list[] = {
+ &nls_ascii
+};
+
+static const struct nls_table *nls_load_table(const char *name)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(encoding_list); i++) {
+ if (strcmp(encoding_list[i]->name, name) == 0)
+ return encoding_list[i];
+ }
+ return NULL;
+}
+
+#endif
diff --git a/lib/ext2fs/nls_ascii.c b/lib/ext2fs/nls_ascii.c
new file mode 100644
index 000000000000..22e819849f3a
--- /dev/null
+++ b/lib/ext2fs/nls_ascii.c
@@ -0,0 +1,48 @@
+#include "nls.h"
+#include <string.h>
+
+static unsigned char charset_tolower(const struct nls_table *table,
+ unsigned int c)
+{
+ if (c >= 'A' && c <= 'Z')
+ return (c | 0x20);
+ return c;
+}
+
+static unsigned char charset_toupper(const struct nls_table *table,
+ unsigned int c)
+{
+ if (c >= 'a' && c <= 'z')
+ return (c & ~0x20);
+ return c;
+}
+
+static int ascii_casefold(const struct nls_table *table,
+ const unsigned char *str, size_t len,
+ unsigned char *dest, size_t dlen)
+{
+ unsigned i;
+
+ for (i = 0; i < len; i++)
+ dest[i] = charset_toupper(table, str[i]);
+
+ return len;
+}
+
+static int ascii_normalize(const struct nls_table *table,
+ const unsigned char *str, size_t len,
+ unsigned char *dest, size_t dlen)
+{
+ memcpy(dest, str, len);
+ return len;
+}
+
+const static struct nls_ops ascii_ops = {
+ .casefold = ascii_casefold,
+ .normalize = ascii_normalize,
+};
+
+const struct nls_table nls_ascii = {
+ .name = "ascii",
+ .ops = &ascii_ops,
+};
--
2.19.1

2018-10-16 04:59:35

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields

Implement helper functions to convert the encoding name and specific
parameters requested by the user on the command line into the format
that is written to disk.
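
To make the flag conversion concrete, here is a minimal sketch of how a
single token such as "strict" or "nostrict" could be turned into a bit
in the 16-bit flags word. The names `demo_apply_flag` and
`DEMO_STRICT_FL` are hypothetical; the real helper in this patch is
e2p_str2encoding_flags(), which also splits the parameter string into
tokens.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_STRICT_FL (1 << 0)	/* mirrors EXT4_ENC_STRICT_MODE_FL */

/* Apply one flag token to *flags. A leading "no" clears the flag
 * instead of setting it. Returns 0 on success, -1 for an unknown
 * token, in which case *flags is left untouched. */
int demo_apply_flag(const char *tok, uint16_t *flags)
{
	int neg = 0;

	if (strncmp(tok, "no", 2) == 0) {
		neg = 1;
		tok += 2;
	}
	if (strcmp(tok, "strict") != 0)
		return -1;

	if (neg)
		*flags &= (uint16_t)~DEMO_STRICT_FL;
	else
		*flags |= DEMO_STRICT_FL;
	return 0;
}
```

One consequence of stripping a "no" prefix this way is that a future
flag whose own name starts with "no" would be misread, so flag names
would have to avoid that prefix.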

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
lib/e2p/Makefile.in | 8 +++--
lib/e2p/e2p.h | 4 +++
lib/e2p/encoding.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
lib/ext2fs/ext2_fs.h | 13 ++++++++
4 files changed, 99 insertions(+), 2 deletions(-)
create mode 100644 lib/e2p/encoding.c

diff --git a/lib/e2p/Makefile.in b/lib/e2p/Makefile.in
index 2b0aa1915130..68d534cdaf11 100644
--- a/lib/e2p/Makefile.in
+++ b/lib/e2p/Makefile.in
@@ -19,7 +19,8 @@ all:: e2p.pc
OBJS= feature.o fgetflags.o fsetflags.o fgetversion.o fsetversion.o \
getflags.o getversion.o hashstr.o iod.o ls.o ljs.o mntopts.o \
parse_num.o pe.o pf.o ps.o setflags.o setversion.o uuid.o \
- ostype.o percent.o crypto_mode.o fgetproject.o fsetproject.o
+ ostype.o percent.o crypto_mode.o fgetproject.o fsetproject.o \
+ encoding.o

SRCS= $(srcdir)/feature.c $(srcdir)/fgetflags.c \
$(srcdir)/fsetflags.c $(srcdir)/fgetversion.c \
@@ -29,7 +30,7 @@ SRCS= $(srcdir)/feature.c $(srcdir)/fgetflags.c \
$(srcdir)/pe.c $(srcdir)/pf.c $(srcdir)/ps.c \
$(srcdir)/setflags.c $(srcdir)/setversion.c $(srcdir)/uuid.c \
$(srcdir)/ostype.c $(srcdir)/percent.c $(srcdir)/crypto_mode.c \
- $(srcdir)/fgetproject.c $(srcdir)/fsetproject.c
+ $(srcdir)/fgetproject.c $(srcdir)/fsetproject.c $(srcdir)/encoding.c
HFILES= e2p.h

LIBRARY= libe2p
@@ -147,6 +148,9 @@ getversion.o: $(srcdir)/getversion.c $(top_builddir)/lib/config.h \
hashstr.o: $(srcdir)/hashstr.c $(top_builddir)/lib/config.h \
$(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
$(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
+encoding.o: $(srcdir)/encoding.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
+ $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
iod.o: $(srcdir)/iod.c $(top_builddir)/lib/config.h \
$(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
$(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
diff --git a/lib/e2p/e2p.h b/lib/e2p/e2p.h
index d70b59a5d358..c39074abe8eb 100644
--- a/lib/e2p/e2p.h
+++ b/lib/e2p/e2p.h
@@ -80,3 +80,7 @@ unsigned int e2p_percent(int percent, unsigned int base);

const char *e2p_encmode2string(int num);
int e2p_string2encmode(char *string);
+
+int e2p_str2encoding(const char *string);
+const char *e2p_encoding2str(int encoding);
+int e2p_str2encoding_flags(int encoding, char *param, __u16 *flags);
diff --git a/lib/e2p/encoding.c b/lib/e2p/encoding.c
new file mode 100644
index 000000000000..6904db73b94c
--- /dev/null
+++ b/lib/e2p/encoding.c
@@ -0,0 +1,76 @@
+/*
+ * encoding.c --- convert between encoding magic numbers and strings
+ *
+ * Copyright (C) 2018 Collabora Ltd.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include <stdio.h>
+
+#include "e2p.h"
+
+static const struct enc_flags {
+ __u16 flag;
+ char *param;
+} encoding_flags[] = {
+ { EXT4_ENC_STRICT_MODE_FL, "strict" },
+ {0, NULL},
+};
+
+/* Return a positive number < 0xff indicating the encoding magic number
+ * or a negative value indicating error. */
+int e2p_str2encoding(const char *string)
+{
+ int i;
+
+ for (i = 0 ; ext4_encoding_map[i].name; i++)
+ if (!strcmp(string, ext4_encoding_map[i].name))
+ return i;
+
+ return -EINVAL;
+}
+
+const char *e2p_encoding2str(int encoding)
+{
+ return ext4_encoding_map[encoding].name;
+}
+
+int e2p_str2encoding_flags(int encoding, char *param, __u16 *flags)
+{
+ char *f = strtok(param, "-");
+ const struct enc_flags *fl;
+ int neg = 0;
+
+ while (f) {
+ neg = 0;
+ if (!strncmp ("no", f, 2)) {
+ neg = 1;
+ f += 2;
+ }
+
+ for (fl = encoding_flags; fl->param; fl++) {
+ if (!strcmp(fl->param, f)) {
+ if (neg)
+ *flags &= ~fl->flag;
+ else
+ *flags |= fl->flag;
+
+ goto next_flag;
+ }
+ }
+ return -EINVAL;
+ next_flag:
+ f = strtok(NULL, "-");
+ }
+ return 0;
+}
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index f1c405b76339..df8ced088f38 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -1127,4 +1127,17 @@ struct mmp_struct {
*/
#define EXT4_INLINE_DATA_DOTDOT_SIZE (4)

+#define EXT4_ENC_STRICT_MODE_FL (1 << 0) /* Reject invalid sequences? */
+#define UTF8_NORMALIZATION_TYPE_NFKD (1 << 1)
+#define UTF8_CASEFOLD_TYPE_NFKDCF (1 << 4)
+
+static const struct ext4_sb_encoding_map {
+ char *name;
+ __u16 default_flags;
+} ext4_encoding_map[] = {
+ /* 0x0 */ { "ascii", 0x0},
+ /* 0x1 */ {"utf8-10.0.0", UTF8_NORMALIZATION_TYPE_NFKD|UTF8_CASEFOLD_TYPE_NFKDCF},
+ {0x0, 0x0},
+};
+
#endif /* _LINUX_EXT2_FS_H */
--
2.19.1

2018-10-16 04:59:41

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 5/9] chattr/lsattr: Support casefold attribute

This flag can be set on directories to request case-insensitive file name
lookups.

I used the letter 'F', referring to "caseFold" for lack of a better
option.
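
The mask changes in this patch amount to OR-ing the new flag bit into
the user-visible and user-modifiable masks. The sketch below checks that
arithmetic at compile time and shows a bit test; the `DEMO_` names and
`demo_is_casefolded` are hypothetical stand-ins that only copy the
constants from the diff.

```c
#include <stdint.h>

#define DEMO_CASEFOLD_FL		0x40000000u /* EXT4_CASEFOLD_FL */
#define DEMO_OLD_USER_VISIBLE		0x204BDFFFu
#define DEMO_NEW_USER_VISIBLE		0x604BDFFFu
#define DEMO_OLD_USER_MODIFIABLE	0x204B80FFu
#define DEMO_NEW_USER_MODIFIABLE	0x604B80FFu

/* The new masks are the old masks plus the casefold bit. */
_Static_assert((DEMO_OLD_USER_VISIBLE | DEMO_CASEFOLD_FL) ==
	       DEMO_NEW_USER_VISIBLE, "visible mask mismatch");
_Static_assert((DEMO_OLD_USER_MODIFIABLE | DEMO_CASEFOLD_FL) ==
	       DEMO_NEW_USER_MODIFIABLE, "modifiable mask mismatch");

/* Return nonzero if an inode's i_flags has the casefold bit set. */
int demo_is_casefolded(uint32_t i_flags)
{
	return (i_flags & DEMO_CASEFOLD_FL) != 0;
}
```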

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
lib/e2p/pf.c | 1 +
lib/ext2fs/ext2_fs.h | 5 +++--
misc/chattr.c | 3 ++-
3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/e2p/pf.c b/lib/e2p/pf.c
index 884f1671edae..0c6998c4b766 100644
--- a/lib/e2p/pf.c
+++ b/lib/e2p/pf.c
@@ -44,6 +44,7 @@ static struct flags_name flags_array[] = {
{ EXT2_TOPDIR_FL, "T", "Top_of_Directory_Hierarchies" },
{ EXT4_EXTENTS_FL, "e", "Extents" },
{ FS_NOCOW_FL, "C", "No_COW" },
+ { EXT4_CASEFOLD_FL, "F", "Casefold" },
{ EXT4_INLINE_DATA_FL, "N", "Inline_Data" },
{ EXT4_PROJINHERIT_FL, "P", "Project_Hierarchy" },
{ EXT4_VERITY_FL, "V", "Verity" },
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index df8ced088f38..b27380866198 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -339,10 +339,11 @@ struct ext2_dx_tail {
#define EXT4_SNAPFILE_SHRUNK_FL 0x08000000 /* Snapshot shrink has completed */
#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data */
#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
+#define EXT4_CASEFOLD_FL 0x40000000 /* Casefolded file */
#define EXT2_RESERVED_FL 0x80000000 /* reserved for ext2 lib */

-#define EXT2_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
-#define EXT2_FL_USER_MODIFIABLE 0x204B80FF /* User modifiable flags */
+#define EXT2_FL_USER_VISIBLE 0x604BDFFF /* User visible flags */
+#define EXT2_FL_USER_MODIFIABLE 0x604B80FF /* User modifiable flags */

/*
* ioctl commands
diff --git a/misc/chattr.c b/misc/chattr.c
index a5b401a741b7..a5d60170bdb6 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -86,7 +86,7 @@ static unsigned long sf;
static void usage(void)
{
fprintf(stderr,
- _("Usage: %s [-pRVf] [-+=aAcCdDeijPsStTu] [-v version] files...\n"),
+ _("Usage: %s [-pRVf] [-+=aAcCdDeijPsStTuF] [-v version] files...\n"),
program_name);
exit(1);
}
@@ -112,6 +112,7 @@ static const struct flags_char flags_array[] = {
{ EXT2_NOTAIL_FL, 't' },
{ EXT2_TOPDIR_FL, 'T' },
{ FS_NOCOW_FL, 'C' },
+ { EXT4_CASEFOLD_FL, 'F' },
{ 0, 0 }
};

--
2.19.1

2018-10-16 04:59:38

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 4/9] mke2fs: Configure encoding during superblock initialization

This patch implements two new extended options to mke2fs, allowing the
user to specify an encoding for file name operations and encoding flags
during filesystem creation. We provide default flags for each encoding,
which the user can override by passing -E encoding-flags to mkfs.
---
lib/ext2fs/initialize.c | 4 ++++
misc/mke2fs.c | 43 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)

diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index 8c9e97fee831..30b1ae033340 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -186,6 +186,10 @@ errcode_t ext2fs_initialize(const char *name, int flags,
set_field(s_flags, 0);
assign_field(s_backup_bgs[0]);
assign_field(s_backup_bgs[1]);
+
+ assign_field(s_encoding);
+ assign_field(s_encoding_flags);
+
if (super->s_feature_incompat & ~EXT2_LIB_FEATURE_INCOMPAT_SUPP) {
retval = EXT2_ET_UNSUPP_FEATURE;
goto cleanup;
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index f05003fc30b9..5ed7b987540e 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -790,6 +790,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
int len;
int r_usage = 0;
int ret;
+ int encoding = -1;
+ char *encoding_flags = NULL;

len = strlen(opts);
buf = malloc(len+1);
@@ -1056,6 +1058,26 @@ static void parse_extended_opts(struct ext2_super_block *param,
}
} else if (!strcmp(token, "android_sparse")) {
android_sparse_file = 1;
+ } else if (!strcmp(token, "encoding")) {
+ if (!arg) {
+ r_usage++;
+ continue;
+ }
+
+ encoding = e2p_str2encoding(arg);
+ if (encoding < 0) {
+ fprintf(stderr, _("Invalid encoding: %s"), arg);
+ r_usage++;
+ continue;
+ }
+ param->s_encoding = encoding;
+ ext2fs_set_feature_fname_encoding(param);
+ } else if (!strcmp(token, "encoding-flags")) {
+ if (!arg) {
+ r_usage++;
+ continue;
+ }
+ encoding_flags = arg;
} else {
r_usage++;
badopt = token;
@@ -1080,6 +1102,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
"\ttest_fs\n"
"\tdiscard\n"
"\tnodiscard\n"
+ "\tencoding=<encoding>\n"
+ "\tencoding-flags=<flags>\n"
"\tquotatype=<quota type(s) to be enabled>\n\n"),
badopt ? badopt : "");
free(buf);
@@ -1091,6 +1115,24 @@ static void parse_extended_opts(struct ext2_super_block *param,
"multiple of stride %u.\n\n"),
param->s_raid_stripe_width, param->s_raid_stride);

+ if (ext2fs_has_feature_fname_encoding(param)) {
+ param->s_encoding_flags =
+ ext4_encoding_map[encoding].default_flags;
+ if (encoding_flags &&
+ e2p_str2encoding_flags(encoding, encoding_flags,
+ &param->s_encoding_flags)) {
+ fprintf(stderr, _("error: Invalid encoding flag: %s\n"),
+ encoding_flags);
+ free(buf);
+ exit(1);
+ }
+ } else if (encoding_flags) {
+ fprintf(stderr, _("error: An encoding must be explicitly "
+ "specified when passing encoding-flags\n"));
+ free(buf);
+ exit(1);
+ }
+
free(buf);
}

@@ -1112,6 +1154,7 @@ static __u32 ok_features[3] = {
EXT4_FEATURE_INCOMPAT_64BIT|
EXT4_FEATURE_INCOMPAT_INLINE_DATA|
EXT4_FEATURE_INCOMPAT_ENCRYPT |
+ EXT4_FEATURE_INCOMPAT_FNAME_ENCODING |
EXT4_FEATURE_INCOMPAT_CSUM_SEED |
EXT4_FEATURE_INCOMPAT_LARGEDIR,
/* R/O compat */
--
2.19.1

2018-10-16 04:59:51

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 7/9] lib/ext2fs: Support encoding when calculating dx hashes

fsck must be aware of the superblock encoding and the casefold directory
setting, such that it is able to correctly calculate the dentry hashes.
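
The reason the hash depends on the casefold setting can be shown with a
small sketch: the name is folded first and the folded bytes are what
get hashed, so names differing only in case land on the same hash. The
hash below is 32-bit FNV-1a, used purely as a stand-in; it is not one of
the hashes ext4 uses, and the `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (32-bit FNV-1a), NOT the ext4 directory hash. */
uint32_t demo_hash(const unsigned char *s, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= s[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash a directory entry name. When casefold is nonzero, ASCII
 * lowercase letters are folded to uppercase before hashing.
 * Returns 0 on success, -1 if the name exceeds 255 bytes. */
int demo_dirhash(const char *name, size_t len, int casefold,
		 uint32_t *out)
{
	unsigned char buf[255];
	size_t i;

	if (len > sizeof(buf))
		return -1;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (casefold && c >= 'a' && c <= 'z')
			c = (unsigned char)(c & ~0x20);
		buf[i] = c;
	}
	*out = demo_hash(buf, len);
	return 0;
}
```

If fsck hashed the raw bytes of a name inside a casefolded directory, it
would compute a different value than the one stored in the htree and
would wrongly flag the directory as inconsistent.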

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
debugfs/htree.c | 7 ++++---
e2fsck/dx_dirinfo.c | 4 +++-
e2fsck/e2fsck.h | 4 +++-
e2fsck/pass1.c | 11 ++++++++--
e2fsck/pass2.c | 7 ++++++-
e2fsck/rehash.c | 12 +++++++----
lib/ext2fs/dirhash.c | 49 ++++++++++++++++++++++++++++++++++++++++----
lib/ext2fs/ext2fs.h | 5 ++++-
8 files changed, 82 insertions(+), 17 deletions(-)

diff --git a/debugfs/htree.c b/debugfs/htree.c
index 0c6a3852393e..51ae3fa94cc8 100644
--- a/debugfs/htree.c
+++ b/debugfs/htree.c
@@ -89,7 +89,7 @@ static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
}
strncpy(name, dirent->name, thislen);
name[thislen] = '\0';
- errcode = ext2fs_dirhash(hash_alg, name,
+ errcode = ext2fs_dirhash(NULL, hash_alg, 0, name,
thislen, fs->super->s_hash_seed,
&hash, &minor_hash);
if (errcode)
@@ -339,8 +339,9 @@ void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
"[-s hash_seed] filename");
return;
}
- err = ext2fs_dirhash(hash_version, argv[optind], strlen(argv[optind]),
- hash_seed, &hash, &minor_hash);
+ err = ext2fs_dirhash(NULL, hash_version, 0, argv[optind],
+ strlen(argv[optind]), hash_seed, &hash,
+ &minor_hash);
if (err) {
com_err(argv[0], err, "while calculating hash");
return;
diff --git a/e2fsck/dx_dirinfo.c b/e2fsck/dx_dirinfo.c
index c7b605685339..c0b0e9a41235 100644
--- a/e2fsck/dx_dirinfo.c
+++ b/e2fsck/dx_dirinfo.c
@@ -13,7 +13,8 @@
* entry. During pass1, the passed-in parent is 0; it will get filled
* in during pass2.
*/
-void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, int num_blocks)
+void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, struct ext2_inode *inode,
+ int num_blocks)
{
struct dx_dir_info *dir;
int i, j;
@@ -72,6 +73,7 @@ void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, int num_blocks)
dir->ino = ino;
dir->numblocks = num_blocks;
dir->hashversion = 0;
+ dir->casefolded_hash = inode->i_flags & EXT4_CASEFOLD_FL;
dir->dx_block = e2fsck_allocate_memory(ctx, num_blocks
* sizeof (struct dx_dirblock_info),
"dx_block info array");
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index cd5cba2f6031..1c7a67cba1ce 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -109,6 +109,7 @@ struct dx_dir_info {
int hashversion;
short depth; /* depth of tree */
struct dx_dirblock_info *dx_block; /* Array of size numblocks */
+ int casefolded_hash;
};

#define DX_DIRBLOCK_ROOT 1
@@ -471,7 +472,8 @@ extern int e2fsck_dir_info_get_dotdot(e2fsck_t ctx, ext2_ino_t ino,
ext2_ino_t *dotdot);

/* dx_dirinfo.c */
-extern void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, int num_blocks);
+extern void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino,
+ struct ext2_inode *inode, int num_blocks);
extern struct dx_dir_info *e2fsck_get_dx_dir_info(e2fsck_t ctx, ext2_ino_t ino);
extern void e2fsck_free_dx_dir_info(e2fsck_t ctx);
extern int e2fsck_get_num_dx_dirinfo(e2fsck_t ctx);
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 8abf0c33a1d3..65cf5e140edf 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -48,6 +48,8 @@

#include "e2fsck.h"
#include <ext2fs/ext2_ext_attr.h>
+#include <ext2fs/nls.h>
+#include <e2p/e2p.h>

#include "problem.h"

@@ -1169,7 +1171,7 @@ void e2fsck_pass1(e2fsck_t ctx)
struct problem_context pctx;
struct scan_callback_struct scan_struct;
struct ext2_super_block *sb = ctx->fs->super;
- const char *old_op;
+ const char *old_op, *encoding_name;
int imagic_fs, extent_fs, inlinedata_fs;
int low_dtime_check = 1;
unsigned int inode_size = EXT2_INODE_SIZE(fs->super);
@@ -1217,6 +1219,11 @@ void e2fsck_pass1(e2fsck_t ctx)
extent_fs = ext2fs_has_feature_extents(sb);
inlinedata_fs = ext2fs_has_feature_inline_data(sb);

+ if (ext2fs_has_feature_fname_encoding(sb)) {
+ encoding_name = e2p_encoding2str(sb->s_encoding);
+ fs->encoding = nls_load_table(encoding_name);
+ }
+
/*
* Allocate bitmaps structures
*/
@@ -3381,7 +3388,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
inode->i_flags &= ~EXT2_INDEX_FL;
dirty_inode++;
} else {
- e2fsck_add_dx_dir(ctx, ino, pb.last_block+1);
+ e2fsck_add_dx_dir(ctx, ino, inode, pb.last_block+1);
}
}

diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index b92eec1e149f..c1c2c6160512 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -933,6 +933,7 @@ static int check_dir_block(ext2_filsys fs,
int filetype = 0;
int encrypted = 0;
size_t max_block_size;
+ int hash_flags = 0;

cd = (struct check_dir_struct *) priv_data;
ibuf = buf = cd->buf;
@@ -1426,7 +1427,11 @@ skip_checksum:
dir_modified++;

if (dx_db) {
- ext2fs_dirhash(dx_dir->hashversion, dirent->name,
+ if (dx_dir->casefolded_hash)
+ hash_flags = EXT4_CASEFOLD_FL;
+
+ ext2fs_dirhash(fs->encoding, dx_dir->hashversion,
+ hash_flags, dirent->name,
ext2fs_dirent_name_len(dirent),
fs->super->s_hash_seed, &hash, 0);
if (hash < dx_db->min_hash)
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 7c4ab0836482..25e947615778 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -113,7 +113,7 @@ static int fill_dir_block(ext2_filsys fs,
struct ext2_dir_entry *dirent;
char *dir;
unsigned int offset, dir_offset, rec_len, name_len;
- int hash_alg;
+ int hash_alg, hash_flags;

if (blockcnt < 0)
return 0;
@@ -139,6 +139,7 @@ static int fill_dir_block(ext2_filsys fs,
if (fd->err)
return BLOCK_ABORT;
}
+ hash_flags = fd->inode->i_flags & EXT4_CASEFOLD_FL;
hash_alg = fs->super->s_def_hash_version;
if ((hash_alg <= EXT2_HASH_TEA) &&
(fs->super->s_flags & EXT2_FLAGS_UNSIGNED_HASH))
@@ -184,8 +185,9 @@ static int fill_dir_block(ext2_filsys fs,
if (fd->compress)
ent->hash = ent->minor_hash = 0;
else {
- fd->err = ext2fs_dirhash(hash_alg, dirent->name,
- name_len,
+ fd->err = ext2fs_dirhash(fs->encoding, hash_alg,
+ hash_flags,
+ dirent->name, name_len,
fs->super->s_hash_seed,
&ent->hash, &ent->minor_hash);
if (fd->err)
@@ -371,6 +373,7 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
char new_name[256];
unsigned int new_len;
int hash_alg;
+ int hash_flags = fd->inode->i_flags & EXT4_CASEFOLD_FL;

clear_problem_context(&pctx);
pctx.ino = ino;
@@ -415,7 +418,8 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
if (fix_problem(ctx, PR_2_NON_UNIQUE_FILE, &pctx)) {
memcpy(ent->dir->name, new_name, new_len);
ext2fs_dirent_set_name_len(ent->dir, new_len);
- ext2fs_dirhash(hash_alg, new_name, new_len,
+ ext2fs_dirhash(fs->encoding, hash_alg, hash_flags,
+ new_name, new_len,
fs->super->s_hash_seed,
&ent->hash, &ent->minor_hash);
fixed++;
diff --git a/lib/ext2fs/dirhash.c b/lib/ext2fs/dirhash.c
index 4ba3f35c091f..2198a6fd4d2a 100644
--- a/lib/ext2fs/dirhash.c
+++ b/lib/ext2fs/dirhash.c
@@ -14,9 +14,11 @@
#include "config.h"
#include <stdio.h>
#include <string.h>
+#include <limits.h>

#include "ext2_fs.h"
#include "ext2fs.h"
+#include "nls.h"

/*
* Keyed 32-bit hash function using TEA in a Davis-Meyer function
@@ -185,10 +187,10 @@ static void str2hashbuf(const char *msg, int len, __u32 *buf, int num,
* represented, and whether or not the returned hash is 32 bits or 64
* bits. 32 bit hashes will return 0 for the minor hash.
*/
-errcode_t ext2fs_dirhash(int version, const char *name, int len,
- const __u32 *seed,
- ext2_dirhash_t *ret_hash,
- ext2_dirhash_t *ret_minor_hash)
+errcode_t _ext2fs_dirhash(int version, const char *name, int len,
+ const __u32 *seed,
+ ext2_dirhash_t *ret_hash,
+ ext2_dirhash_t *ret_minor_hash)
{
__u32 hash;
__u32 minor_hash = 0;
@@ -257,3 +259,42 @@ errcode_t ext2fs_dirhash(int version, const char *name, int len,
*ret_minor_hash = minor_hash;
return 0;
}
+
+errcode_t ext2fs_dirhash(const struct nls_table *charset, int version,
+ int hash_flags, const char *name, int len,
+ const __u32 *seed,
+ ext2_dirhash_t *ret_hash,
+ ext2_dirhash_t *ret_minor_hash)
+{
+ errcode_t r;
+ int dlen;
+ unsigned char *buff;
+
+ if (len && charset) {
+ buff = calloc(sizeof (char), PATH_MAX);
+ if (!buff)
+ return -1;
+
+ if (hash_flags & EXT4_CASEFOLD_FL)
+ dlen = charset->ops->casefold(charset, name, len, buff,
+ PATH_MAX);
+ else
+ dlen = charset->ops->normalize(charset, name, len, buff,
+ PATH_MAX);
+
+ if (dlen < 0) {
+ free(buff);
+ goto opaque_seq;
+ }
+
+ r = _ext2fs_dirhash(version, buff, dlen, seed, ret_hash,
+ ret_minor_hash);
+
+ free(buff);
+ return r;
+ }
+
+opaque_seq:
+ return _ext2fs_dirhash(version, name, len, seed, ret_hash,
+ ret_minor_hash);
+}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 64c5b8758a40..e50d8a066ef3 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -307,6 +307,8 @@ struct struct_ext2_filsys {

/* hashmap for SHA of data blocks */
struct ext2fs_hashmap* block_sha_map;
+
+ const struct nls_table *encoding;
};

#if EXT2_FLAT_INCLUDES
@@ -1169,7 +1171,8 @@ extern errcode_t ext2fs_write_dir_block4(ext2_filsys fs, blk64_t block,
void *buf, int flags, ext2_ino_t ino);

/* dirhash.c */
-extern errcode_t ext2fs_dirhash(int version, const char *name, int len,
+extern errcode_t ext2fs_dirhash(const struct nls_table *charset, int version,
+ int hash_flags, const char *name, int len,
const __u32 *seed,
ext2_dirhash_t *ret_hash,
ext2_dirhash_t *ret_minor_hash);
--
2.19.1

2018-10-16 04:59:30

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 2/9] e2fsprogs: Reserve feature bit and SB field bit for filename encoding

The s_encoding field in the superblock stores a magic number indicating
the encoding format and version used globally by file and directory
names in the filesystem.

The s_encoding_flags defines policies for using the charset encoding,
like how to handle invalid sequences and what kind of normalization to
use.

A feature flag is also allocated to indicate whether this filesystem has
encoding awareness enabled.
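
The layout change can be sanity-checked with a simplified model of the
superblock tail. The two structs below are hypothetical stand-ins that
start at offset 0x274 (where the timestamp extension bytes begin); they
are not the real struct ext2_super_block. They show that two 16-bit
fields consume exactly one 32-bit reserved slot, so the checksum stays
at offset 0x3fc.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_TAIL_BASE 0x274	/* offset of the first modelled byte */

/* Tail of the superblock before this patch (simplified). */
struct sb_tail_before {
	uint8_t  time_hi[6];
	uint8_t  pad[2];
	uint32_t reserved[96];
	uint32_t checksum;
};

/* Tail of the superblock after this patch (simplified). */
struct sb_tail_after {
	uint8_t  time_hi[6];
	uint8_t  pad[2];
	uint16_t encoding;		/* s_encoding */
	uint16_t encoding_flags;	/* s_encoding_flags */
	uint32_t reserved[95];
	uint32_t checksum;
};

/* Total size and checksum position must not move. */
_Static_assert(sizeof(struct sb_tail_before) ==
	       sizeof(struct sb_tail_after), "tail size changed");
_Static_assert(DEMO_TAIL_BASE + offsetof(struct sb_tail_after, checksum)
	       == 0x3fc, "checksum moved");
_Static_assert(DEMO_TAIL_BASE + offsetof(struct sb_tail_after, encoding)
	       == 0x27c, "s_encoding not at 0x27c");
```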

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
lib/e2p/feature.c | 2 ++
lib/ext2fs/ext2_fs.h | 6 +++++-
lib/ext2fs/ext2fs.h | 1 +
3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
index e3b0dab83c81..294a56a40b52 100644
--- a/lib/e2p/feature.c
+++ b/lib/e2p/feature.c
@@ -109,6 +109,8 @@ static struct feature feature_list[] = {
"inline_data"},
{ E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_ENCRYPT,
"encrypt"},
+ { E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FNAME_ENCODING,
+ "encoding"},
{ 0, 0, 0 },
};

diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index ab2595486d21..f1c405b76339 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -755,7 +755,9 @@ struct ext2_super_block {
__u8 s_first_error_time_hi;
__u8 s_last_error_time_hi;
__u8 s_pad[2];
- __le32 s_reserved[96]; /* Padding to the end of the block */
+/*27c*/ __le16 s_encoding; /* Filename charset encoding */
+ __le16 s_encoding_flags; /* Filename charset encoding flags */
+ __le32 s_reserved[95]; /* Padding to the end of the block */
/*3fc*/ __u32 s_checksum; /* crc32c(superblock) */
};

@@ -846,6 +848,7 @@ struct ext2_super_block {
#define EXT4_FEATURE_INCOMPAT_LARGEDIR 0x4000 /* >2GB or 3-lvl htree */
#define EXT4_FEATURE_INCOMPAT_INLINE_DATA 0x8000 /* data in inode */
#define EXT4_FEATURE_INCOMPAT_ENCRYPT 0x10000
+#define EXT4_FEATURE_INCOMPAT_FNAME_ENCODING 0x20000

#define EXT4_FEATURE_COMPAT_FUNCS(name, ver, flagname) \
static inline int ext2fs_has_feature_##name(struct ext2_super_block *sb) \
@@ -939,6 +942,7 @@ EXT4_FEATURE_INCOMPAT_FUNCS(csum_seed, 4, CSUM_SEED)
EXT4_FEATURE_INCOMPAT_FUNCS(largedir, 4, LARGEDIR)
EXT4_FEATURE_INCOMPAT_FUNCS(inline_data, 4, INLINE_DATA)
EXT4_FEATURE_INCOMPAT_FUNCS(encrypt, 4, ENCRYPT)
+EXT4_FEATURE_INCOMPAT_FUNCS(fname_encoding, 4, FNAME_ENCODING)

#define EXT2_FEATURE_COMPAT_SUPP 0
#define EXT2_FEATURE_INCOMPAT_SUPP (EXT2_FEATURE_INCOMPAT_FILETYPE| \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 185be5df511f..64c5b8758a40 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -616,6 +616,7 @@ typedef struct ext2_icount *ext2_icount_t;
EXT4_FEATURE_INCOMPAT_64BIT|\
EXT4_FEATURE_INCOMPAT_INLINE_DATA|\
EXT4_FEATURE_INCOMPAT_ENCRYPT|\
+ EXT4_FEATURE_INCOMPAT_FNAME_ENCODING|\
EXT4_FEATURE_INCOMPAT_CSUM_SEED|\
EXT4_FEATURE_INCOMPAT_LARGEDIR)

--
2.19.1
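
The layout change in the ext2_super_block hunk above can be checked with a little arithmetic: the two new __le16 fields together take 4 bytes, exactly one __le32, which is why s_reserved shrinks from 96 to 95 entries while s_checksum stays at offset 0x3fc. The following is only a sketch of that check. The struct sketch_sb_tail type and the sketch_* helpers are hypothetical stand-ins, not part of e2fsprogs; only the 0x27c and 0x3fc offsets come from the comments in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of s_encoding, taken from the 27c marker in the hunk above. */
#define SKETCH_SB_TAIL_START 0x27c

/* Hypothetical stand-in for the tail of struct ext2_super_block. */
struct sketch_sb_tail {
	uint16_t s_encoding;       /* first half of the old s_reserved[0] */
	uint16_t s_encoding_flags; /* second half of the old s_reserved[0] */
	uint32_t s_reserved[95];   /* shrunk from 96 entries */
	uint32_t s_checksum;       /* expected at 0x3fc */
};

/* Absolute offset of s_checksum within the superblock. */
static size_t sketch_checksum_offset(void)
{
	return SKETCH_SB_TAIL_START +
	       offsetof(struct sketch_sb_tail, s_checksum);
}

/* Total superblock size implied by the tail layout. */
static size_t sketch_sb_size(void)
{
	return SKETCH_SB_TAIL_START + sizeof(struct sketch_sb_tail);
}

/* Aborts if the new fields moved the checksum or changed the size. */
static void sketch_check_layout(void)
{
	assert(sketch_checksum_offset() == 0x3fc);
	assert(sketch_sb_size() == 0x400);
}
```

Calling sketch_check_layout() passes only if the new fields consume exactly the space given up by s_reserved.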

2018-10-16 04:59:52

by Gabriel Krisman Bertazi

Subject: [PATCH e2fsprogs 8/9] debugfs/htree: Support encoding when printing the file hash

Implement two parameters -e and -c, to specify encoding and casefold
when printing the hash of a given file.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
debugfs/htree.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/debugfs/htree.c b/debugfs/htree.c
index 51ae3fa94cc8..afa297e65a85 100644
--- a/debugfs/htree.c
+++ b/debugfs/htree.c
@@ -27,6 +27,8 @@ extern char *optarg;
#include "uuid/uuid.h"
#include "e2p/e2p.h"

+#include "ext2fs/nls.h"
+
static FILE *pager;

static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
@@ -44,6 +46,7 @@ static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
ext2_dirhash_t hash, minor_hash;
unsigned int rec_len;
int hash_alg;
+ int hash_flags = inode->i_flags & EXT4_CASEFOLD_FL;
int csum_size = 0;

if (ext2fs_has_feature_metadata_csum(fs->super))
@@ -89,8 +92,8 @@ static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
}
strncpy(name, dirent->name, thislen);
name[thislen] = '\0';
- errcode = ext2fs_dirhash(NULL, hash_alg, 0, name,
- thislen, fs->super->s_hash_seed,
+ errcode = ext2fs_dirhash(fs->encoding, hash_alg, hash_flags,
+ name, thislen, fs->super->s_hash_seed,
&hash, &minor_hash);
if (errcode)
com_err("htree_dump_leaf_node", errcode,
@@ -306,11 +309,12 @@ errout:
void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
void *infop EXT2FS_ATTR((unused)))
{
- ext2_dirhash_t hash, minor_hash;
+ ext2_dirhash_t hash, minor_hash, hash_flags;
errcode_t err;
int c;
int hash_version = 0;
__u32 hash_seed[4];
+ const struct nls_table *encoding;

hash_seed[0] = hash_seed[1] = hash_seed[2] = hash_seed[3] = 0;

@@ -329,6 +333,15 @@ void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
return;
}
break;
+ case 'c':
+ hash_flags = EXT4_CASEFOLD_FL;
+ break;
+ case 'e':
+ encoding = nls_load_table(optarg);
+ if (!encoding)
+ fprintf(stderr, "Invalid encoding: %s\n",
+ optarg);
+ return;
default:
goto print_usage;
}
@@ -336,10 +349,11 @@ void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
if (optind != argc-1) {
print_usage:
com_err(argv[0], 0, "usage: dx_hash [-h hash_alg] "
- "[-s hash_seed] filename");
+ "[-s hash_seed] [-c] [-e encoding] filename");
return;
}
- err = ext2fs_dirhash(NULL, hash_version, 0, argv[optind],
+
+ err = ext2fs_dirhash(encoding, hash_version, hash_flags, argv[optind],
strlen(argv[optind]), hash_seed, &hash,
&minor_hash);
if (err) {
--
2.19.1

2018-11-19 13:58:09

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 1/9] e2fsprogs: Add timestamp extension bits to superblock

On Mon, Oct 15, 2018 at 05:12:12PM -0400, Gabriel Krisman Bertazi wrote:
> Re-sync the superblock structure declaration with its kernel counterpart
> to include the fields added by kernel commit 6a0678a79bb3 ("ext4: super:
> extend timestamps to 40 bits")
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Thanks, applied.

- Ted

2018-11-22 06:20:38

by Gabriel Krisman Bertazi

Subject: Re: [PATCH e2fsprogs 6/9] lib/ext2fs: Implement NLS support

"Theodore Y. Ts'o" <[email protected]> writes:

> On Mon, Oct 15, 2018 at 05:12:17PM -0400, Gabriel Krisman Bertazi wrote:
>> Basic NLS support is required in e2fsprogs because of fsck, which
>> needs to calculate dx hashes for encoding aware filesystems. This patch
>> implements this infrastructure as well as ascii support.
>>
>> We don't need to do all the dance of versioning as we do in the kernel,
>> because we know before-hand which encodings and versions we
>> support (those we know how to store in the sb), so it is simpler just to
>> create static tables.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> I see the case folding tables for ASCII, but it looks like the case
> folding tables for Unicode aren't included. Am I missing something?

Oh, I think I mentioned this in the cover letter, I removed the utf8 parts
from this version for no good reason. It is already part of v2. :)


--
Gabriel Krisman Bertazi

2018-11-22 06:21:56

by Gabriel Krisman Bertazi

Subject: Re: [PATCH e2fsprogs 9/9] tune2fs: Prevent enabling encryption flag on encoding-aware fs

"Theodore Y. Ts'o" <[email protected]> writes:

> On Mon, Oct 15, 2018 at 05:12:20PM -0400, Gabriel Krisman Bertazi wrote:
>> The kernel will refuse to mount filesystems with the encryption and
>> encoding features enabled at the same time. The encoding feature can
>> only be set at mkfs time, so we can just prevent encryption from being
>> set at a later time by tune2fs.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> This check also needs to be in mke2fs, right? I don't think I saw it
> in your mke2fs patch.

I hadn't noticed you could set the encryption flag at mkfs time.
Sorry, will fix in v3.
--
Gabriel Krisman Bertazi

2018-11-21 15:05:05

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields

On Mon, Nov 19, 2018 at 10:28:48AM -0500, Gabriel Krisman Bertazi wrote:
>
> >> +#define UTF8_NORMALIZATION_TYPE_NFKD (1 << 1)
> >> +#define UTF8_CASEFOLD_TYPE_NFKDCF (1 << 4)

Where do these values come from? And why are they (1 << 1) and (1 << 4),
respectively?

I just noticed that these are used in utf8's default flags, which then
end up getting set in the superblock. So if these are official ext4
code points, they should have a EXT4_ prefix, not a UTF8_ prefix. It
also seems that it's not possible to set them in mke2fs (only the
"strict" flag can be set or unset in e2p_str2encoding_flags).

So are we going to support something other than NFKD, or not? If it's
in the superblock, then we need to make sure the kernel does something
sane if they are something other than the default. And if we are just
going to make it be a rule that all ext4 file systems with encoding
type utf8 v10 will be NFKD, then we shouldn't let it be configurable in
the superblock.

> >> +
> >> +static const struct ext4_sb_encoding_map {
> >> + char *name;
> >> + __u16 default_flags;
> >> +} ext4_encoding_map[] = {
> >> + /* 0x0 */ { "ascii", 0x0},
> >> + /* 0x1 */ {"utf8-10.0.0", UTF8_NORMALIZATION_TYPE_NFKD|UTF8_CASEFOLD_TYPE_NFKDCF},

It might be enough to just use "utf8-10.0". Internally in the Unicode
standard, they only use the X.Y notation, and given that we're already
using the utf8 short-name, as opposed to something like "UTF-8
encoding of Unicode 10.0.0", it might be better to shorten it to utf-8.

I also noticed that Unicode 11.0 has been released in June 2018. For
people interested in scripts like Georgian Mtavruli (which has new
case folding rules, so it's not just academic on our part), Hanifi
Rohingya, Mayan Numerals, Historic Sanskrit etc., in their ext4 file
names, I'm sure they'll appreciate it. :-)

Oh, and I think the FSF will be happier if we use Unicode 11.0, since
it also features (in addition to a number of new emoji), the
Copyleft Symbol. :-)

- Ted

2018-11-20 01:52:48

by Gabriel Krisman Bertazi

Subject: Re: [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields

"Theodore Y. Ts'o" <[email protected]> writes:

> On Mon, Oct 15, 2018 at 05:12:14PM -0400, Gabriel Krisman Bertazi wrote:
>> +#define EXT4_ENC_STRICT_MODE_FL (1 << 0) /* Reject invalid sequences? */
>
> Why the question mark?

Hi Ted,

The question mark is very redundant for a flag, I admit :). It was meant to
say "Whether to reject invalid sequences" or something like that. Will
fix in the v2.

>> +#define UTF8_NORMALIZATION_TYPE_NFKD (1 << 1)
>> +#define UTF8_CASEFOLD_TYPE_NFKDCF (1 << 4)
>> +
>> +static const struct ext4_sb_encoding_map {
>> + char *name;
>> + __u16 default_flags;
>> +} ext4_encoding_map[] = {
>> + /* 0x0 */ { "ascii", 0x0},
>> + /* 0x1 */ {"utf8-10.0.0", UTF8_NORMALIZATION_TYPE_NFKD|UTF8_CASEFOLD_TYPE_NFKDCF},
>> + {0x0, 0x0},
>> +};
>> +
>> #endif /* _LINUX_EXT2_FS_H */
>
> What uses this? I can't find any other references in either the kernel or
> e2fsprogs patches.

Only the instance ext4_encoding_map, itself, is used in this patch and
in the next one, which modifies mke2fs. It stores the string for
comparison with what the user passed in the command line.

I guess naming the structure is unnecessary, since we have only this
single const static instance. I will change that in the v2, as well.

The current series doesn't include the huge utf8 stuff, which makes use
of the rest of the flags, but I will add that in v2 as well.


Thanks!

--
Gabriel Krisman Bertazi

2018-11-22 06:09:11

by Gabriel Krisman Bertazi

Subject: Re: [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields

"Theodore Y. Ts'o" <[email protected]> writes:

> On Mon, Nov 19, 2018 at 10:28:48AM -0500, Gabriel Krisman Bertazi wrote:
>>
>> >> +#define UTF8_NORMALIZATION_TYPE_NFKD (1 << 1)
>> >> +#define UTF8_CASEFOLD_TYPE_NFKDCF (1 << 4)
>
> Where do these values come from? And why are they (1 << 1) and (1 << 4),
> respectively?
>
> I just noticed that these are used in utf8's default flags, which then
> end up getting set in the superblock. So if these are official ext4
> code points, they should have a EXT4_ prefix, not a UTF8_ prefix. It
> also seems that it's not possible to set them in mke2fs (only the
> "strict" flag can be set or unset in e2p_str2encoding_flags).

Hi,

They come from the nls.h kernel header. These flags are passed to the
NLS system to describe the behavior of normalization/casefold functions.

In order to maintain compatibility with previous kernel users, the utf8
module (and others, eventually) still supports the "no
normalization/casefold" policy (which I call 'plain' in the kernel).
When I merged utf8n into utf8, it became up to a flag set when loading
the nls table to decide what kind of normalization, if any, should be
done.

> So are we going to support something other than NFKD, or not? If it's
> in the superblock, then we need to make sure the kernel does something
> sane if they are something other than the default. And if we are just
> going to make it be a rule that all ext4 file systems with encoding
> type utf8 v10 will be NFKD, then we shouldn't let it be configurable in
> the superblock.

The NLS code in the kernel supports PLAIN and NFKD, but there is no real
reason for ext4 users to request PLAIN at all, which is only for
backward compatibility with filesystems that used the utf8 module
beforehand, so it can't be configured in e2fsprogs. It still makes
sense to store the normalization type in the superblock though, in case
we support other normalization forms in the future and need to do some
conversion.

That said, I am not planning to support other normalization forms in
ext4 in the future.

If the kernel (nls_load_version) finds any value other than TYPE_PLAIN
(0x0) or TYPE_NFKD in the superblock when loading the NLS table, it will
fail the table creation, which, in turn, fails the mount operation.

If you agree with the design above, I will just fix the EXT4_ prefix.

>
>> >> +
>> >> +static const struct ext4_sb_encoding_map {
>> >> + char *name;
>> >> + __u16 default_flags;
>> >> +} ext4_encoding_map[] = {
>> >> + /* 0x0 */ { "ascii", 0x0},
>> >> + /* 0x1 */ {"utf8-10.0.0", UTF8_NORMALIZATION_TYPE_NFKD|UTF8_CASEFOLD_TYPE_NFKDCF},
>
> It might be enough to just use "utf8-10.0". Internally in the Unicode
> standard, they only use the X.Y notation, and given that we're already
> using the utf8 short-name, as opposed to something like "UTF-8
> encoding of Unicode 10.0.0", it might be better to shorten it to utf-8.
>
> I also noticed that Unicode 11.0 has been released in June 2018. For
> people interested in scripts like Georgian Mtavruli (which has new
> case folding rules, so it's not just academic on our part), Hanifi
> Rohingya, Mayan Numerals, Historic Sanskrit etc., in their ext4 file
> names, I'm sure they'll appreciate it. :-)
>
> Oh, and I think the FSF will be happier if we use Unicode 11.0, since
> it also features (in addition to a number of new emoji), the
> Copyleft Symbol. :-)

I can do the update!


--
Gabriel Krisman Bertazi

2018-11-21 15:35:55

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 9/9] tune2fs: Prevent enabling encryption flag on encoding-aware fs

On Mon, Oct 15, 2018 at 05:12:20PM -0400, Gabriel Krisman Bertazi wrote:
> The kernel will refuse to mount filesystems with the encryption and
> encoding features enabled at the same time. The encoding feature can
> only be set at mkfs time, so we can just prevent encryption from being
> set at a later time by tune2fs.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

This check also needs to be in mke2fs, right? I don't think I saw it
in your mke2fs patch.

- Ted

2018-11-21 15:27:55

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 4/9] mke2fs: Configure encoding during superblock initialization

On Mon, Oct 15, 2018 at 05:12:15PM -0400, Gabriel Krisman Bertazi wrote:
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index f05003fc30b9..5ed7b987540e 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -790,6 +790,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
> int len;
> int r_usage = 0;
> int ret;
> + int encoding = -1;
> + char *encoding_flags = NULL;

...

> + if (ext2fs_has_feature_fname_encoding(param)) {
> + param->s_encoding_flags =
> + ext4_encoding_map[encoding].default_flags;

This code is assuming that users will specify the encoding via "-E encoding=utf8-10.0"
and this will set the FNAME_ENCODING flag implicitly.

But consider what happens if the user runs a command like this:

mke2fs -t ext4 -O fname_encoding -E resize=12T

When parse_extended_opts gets called, the variable encoding will still
be -1, and so we'll end up trying to use a negative array index to
ext4_encoding_map[] which will be... unfortunate.

As I mentioned in another e-mail, I'm a bit dubious about having
per-encoding default flags. Those flags should either be global ext4
code points, or they should be forced to specific values given the
encoding that is specified.

We probably also want to have a default encoding if the user just
specifies "-O fname_encoding". Say, in /etc/mke2fs.conf:

[options]
default_encoding = utf8-11.0

Then at some point a few years from now, we might enable
fname_encoding by default, so we might have in /etc/mke2fs.conf:

[fs_types]
ext4 = {
features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,largedir,fname_encoding
inode_size = 256
}

So having a way to specify the default encoding in /etc/mke2fs.conf is
going to be important. What will probably happen is that in two years, we'll
be up to Unicode 13.0, and we might want to add support for Unicode
13.0 in some future kernel version, say, 5.8. But then we won't want
to make utf8-13.0 the default for some amount of time, since if the
file system is mounted on an older kernel, it won't work; the kernel
will have to reject mounting a file system with an unknown encoding.

So that's why I always like to make these sorts of configuration
defaults tunable in /etc/mke2fs.conf. Different distros will
have different backwards compatibility policies. For example, for
enterprise distros, they might want to wait 7 years before creating
file systems with utf8-13.0 as the default. For a community distro,
they might want to wait 2-3 years. And for a purpose-built Linux
gaming Valve box, where the kernel is under the control of the box
manufacturers, they might want to be super-aggressive about adopting a
new Unicode encoding, in order to crack that critical Ancient Sanskrit
market. :-)

- Ted
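
The negative-index problem and the mke2fs.conf default suggested above can be sketched together: resolve the encoding name first, fall back to a configured default when -E encoding= was not given, and only index the map once a valid index is known. This is a hedged illustration, not the v3 implementation. The sketch_* names and the conf_default parameter (standing in for a default_encoding value read from /etc/mke2fs.conf) are assumptions, while the two map entries and the flag bit positions are copied from the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit positions as posted in patch 3/9. */
#define SKETCH_NFKD	(1 << 1)
#define SKETCH_NFKDCF	(1 << 4)

/* Same two entries as ext4_encoding_map[] in the patch. */
static const struct {
	const char *name;
	unsigned short default_flags;
} sketch_encoding_map[] = {
	{ "ascii", 0x0 },
	{ "utf8-10.0.0", SKETCH_NFKD | SKETCH_NFKDCF },
};

#define SKETCH_MAP_LEN \
	(sizeof(sketch_encoding_map) / sizeof(sketch_encoding_map[0]))

/* Returns the map index for name, or -1 if the name is unknown. */
static int sketch_str2encoding(const char *name)
{
	size_t i;

	for (i = 0; i < SKETCH_MAP_LEN; i++)
		if (strcmp(sketch_encoding_map[i].name, name) == 0)
			return (int)i;
	return -1;
}

/*
 * requested:    value of "-E encoding=...", or NULL if not given.
 * conf_default: hypothetical default from the config file, or NULL.
 * Returns the encoding index and stores its default flags, or returns
 * -1 without ever indexing the map when no valid encoding is known.
 */
static int sketch_resolve_encoding(const char *requested,
				   const char *conf_default,
				   unsigned short *flags)
{
	const char *name = requested ? requested : conf_default;
	int idx;

	assert(flags != NULL);
	if (!name)
		return -1;
	idx = sketch_str2encoding(name);
	if (idx < 0)
		return -1;
	*flags = sketch_encoding_map[idx].default_flags;
	return idx;
}
```

In this sketch, the "-O fname_encoding -E resize=12T" case corresponds to requested being NULL: the caller either picks up the configured default or gets -1 and can report an error, instead of reading ext4_encoding_map[-1].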

2018-11-21 15:43:27

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 7/9] lib/ext2fs: Support encoding when calculating dx hashes

On Mon, Oct 15, 2018 at 05:12:18PM -0400, Gabriel Krisman Bertazi wrote:
> diff --git a/lib/ext2fs/dirhash.c b/lib/ext2fs/dirhash.c
> index 4ba3f35c091f..2198a6fd4d2a 100644
> --- a/lib/ext2fs/dirhash.c
> +++ b/lib/ext2fs/dirhash.c
> /* dirhash.c */
> -extern errcode_t ext2fs_dirhash(int version, const char *name, int len,
> +extern errcode_t ext2fs_dirhash(const struct nls_table *charset, int version,
> + int hash_flags, const char *name, int len,
> const __u32 *seed,
> ext2_dirhash_t *ret_hash,
> ext2_dirhash_t *ret_minor_hash);

We can't change function signatures in libext2fs, because this would
break the shared library's ABI.

So when you want to add a new function parameter to a shared library
function, what you should do is create a new function that has the new
parameter, and call it something like ext2fs_dirhash2(). Then
ext2fs_dirhash() can be implemented in terms of ext2fs_dirhash2().

Also, my preference would be to insert new input parameters after len, e.g.:

extern errcode_t ext2fs_dirhash2(int version, const char *name, int len,
int hash_flags,
const struct nls_table *charset,
const __u32 *seed,
ext2_dirhash_t *ret_hash,
ext2_dirhash_t *ret_minor_hash);

- Ted
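
The ABI-preserving pattern described above (keep the old entry point, add a "2" variant with the new parameters after len, and make the old function a thin wrapper) might look roughly like the following. This is a sketch under stated assumptions. The sketch_* names, the SKETCH_CASEFOLD flag value and the toy FNV-1a style hash with ASCII-only folding are illustrative and are not the real ext2fs_dirhash() algorithm. In the real library both functions would be exported rather than static.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CASEFOLD 0x1	/* hypothetical flag value */

struct sketch_nls_table;	/* opaque stand-in for struct nls_table */

/* New entry point: extra parameters are inserted after len. */
static int sketch_dirhash2(int version, const char *name, int len,
			   int hash_flags,
			   const struct sketch_nls_table *charset,
			   const uint32_t *seed,
			   uint32_t *ret_hash, uint32_t *ret_minor_hash)
{
	uint32_t h = 2166136261u;	/* toy hash, not the ext4 one */
	int i;

	(void)version;	/* unused in this toy hash */
	(void)charset;	/* a real version would casefold via the table */
	assert(name != NULL && ret_hash != NULL);

	if (seed)
		h ^= seed[0];
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (hash_flags & SKETCH_CASEFOLD)
			c = (unsigned char)tolower(c);	/* ASCII-only fold */
		h = (h ^ c) * 16777619u;
	}
	*ret_hash = h;
	if (ret_minor_hash)
		*ret_minor_hash = 0;
	return 0;
}

/* Old entry point keeps its signature, so existing callers still link. */
static int sketch_dirhash(int version, const char *name, int len,
			  const uint32_t *seed,
			  uint32_t *ret_hash, uint32_t *ret_minor_hash)
{
	/* No flags and no charset reproduces the old behaviour. */
	return sketch_dirhash2(version, name, len, 0, NULL, seed,
			       ret_hash, ret_minor_hash);
}
```

Existing callers of the old function are untouched, and only encoding-aware callers such as fsck or debugfs need to move to the new variant.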

2018-11-21 15:34:41

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 6/9] lib/ext2fs: Implement NLS support

On Mon, Oct 15, 2018 at 05:12:17PM -0400, Gabriel Krisman Bertazi wrote:
> Basic NLS support is required in e2fsprogs because of fsck, which
> needs to calculate dx hashes for encoding aware filesystems. This patch
> implements this infrastructure as well as ascii support.
>
> We don't need to do all the dance of versioning as we do in the kernel,
> because we know before-hand which encodings and versions we
> support (those we know how to store in the sb), so it is simpler just to
> create static tables.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

I see the case folding tables for ASCII, but it looks like the case
folding tables for Unicode aren't included. Am I missing something?

- Ted

2018-11-19 14:49:51

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 3/9] libe2p: Helpers for configuring the encoding superblock fields

On Mon, Oct 15, 2018 at 05:12:14PM -0400, Gabriel Krisman Bertazi wrote:
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index f1c405b76339..df8ced088f38 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -1127,4 +1127,17 @@ struct mmp_struct {
> */
> #define EXT4_INLINE_DATA_DOTDOT_SIZE (4)
>
> +#define EXT4_ENC_STRICT_MODE_FL (1 << 0) /* Reject invalid sequences? */

Why the question mark?

> +#define UTF8_NORMALIZATION_TYPE_NFKD (1 << 1)
> +#define UTF8_CASEFOLD_TYPE_NFKDCF (1 << 4)
> +
> +static const struct ext4_sb_encoding_map {
> + char *name;
> + __u16 default_flags;
> +} ext4_encoding_map[] = {
> + /* 0x0 */ { "ascii", 0x0},
> + /* 0x1 */ {"utf8-10.0.0", UTF8_NORMALIZATION_TYPE_NFKD|UTF8_CASEFOLD_TYPE_NFKDCF},
> + {0x0, 0x0},
> +};
> +
> #endif /* _LINUX_EXT2_FS_H */

What uses this? I can't find any other references in either the kernel or
e2fsprogs patches.

- Ted

2018-11-22 06:19:06

by Gabriel Krisman Bertazi

Subject: Re: [PATCH e2fsprogs 4/9] mke2fs: Configure encoding during superblock initialization

"Theodore Y. Ts'o" <[email protected]> writes:

> On Mon, Oct 15, 2018 at 05:12:15PM -0400, Gabriel Krisman Bertazi wrote:
>> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
>> index f05003fc30b9..5ed7b987540e 100644
>> --- a/misc/mke2fs.c
>> +++ b/misc/mke2fs.c
>> @@ -790,6 +790,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
>> int len;
>> int r_usage = 0;
>> int ret;
>> + int encoding = -1;
>> + char *encoding_flags = NULL;
>
> ...
>
>> + if (ext2fs_has_feature_fname_encoding(param)) {
>> + param->s_encoding_flags =
>> + ext4_encoding_map[encoding].default_flags;
>
> This code is assuming that users will specify the encoding via "-E encoding=utf8-10.0"
> and this will set the FNAME_ENCODING flag implicitly.
>
> But consider what happens if the user runs a command like this:
>
> mke2fs -t ext4 -O fname_encoding -E resize=12T
>
> When parse_extended_opts gets called, the variable encoding will still
> be -1, and so we'll end up trying to use a negative array index to
> ext4_encoding_map[] which will be... unfortunate.
>
> As I mentioned in another e-mail, I'm a bit dubious about having
> per-encoding default flags. Those flags should either be global ext4
> code points, or they should be forced to specific values given the
> encoding that is specified.

Normalization and casefold types are too specific to each encoding
not to be per-encoding. ASCII has no normalization, for instance.

If I understand you correctly, we should make them ext4 code points to
ensure they don't change in the future.

> We probably also want to have a default encoding if the user just
> specifies "-O fname_encoding". Say, in /etc/mke2fs.conf:

Right. That solves the case for -O fname_encoding. I will do this in v3.

>
> [options]
> default_encoding = utf8-11.0
>
> Then at some point a few years from now, we might enable
> fname_encoding by default, so we might have in /etc/mke2fs.conf:
> [fs_types]
> ext4 = {
> features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,largedir,fname_encoding
> inode_size = 256
> }
>
> So having a way to specify the default encoding in /etc/mke2fs.conf is
> going to be important. What will probably happen is that in two years, we'll
> be up to Unicode 13.0, and we might want to add support for Unicode
> 13.0 in some future kernel version, say, 5.8. But then we won't want
> to make utf8-13.0 the default for some amount of time, since if the
> file system is mounted on an older kernel, it won't work; the kernel
> will have to reject mounting a file system with an unknown encoding.
>
> So that's why I always like to make these sorts of configuration
> defaults tunable in /etc/mke2fs.conf. Different distros will
> have different backwards compatibility policies. For example, for
> enterprise distros, they might want to wait 7 years before creating
> file systems with utf8-13.0 as the default. For a community distro,
> they might want to wait 2-3 years. And for a purpose-built Linux
> gaming Valve box, where the kernel is under the control of the box
> manufacturers, they might want to be super-aggressive about adopting a
> new Unicode encoding, in order to crack that critical Ancient Sanskrit
> market. :-)

good point!

--
Gabriel Krisman Bertazi

2018-11-19 14:37:32

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 2/9] e2fsprogs: Reserve feature bit and SB field bit for filename encoding

On Mon, Oct 15, 2018 at 05:12:13PM -0400, Gabriel Krisman Bertazi wrote:
> The s_encoding field in the superblock stores a magic number indicating
> the encoding format and version used globally by file and directory
> names in the filesystem.
>
> The s_encoding_flags defines policies for using the charset encoding,
> like how to handle invalid sequences and what kind of normalization to
> use.
>
> A feature flag is also allocated to indicate whether this filesystem has
> encoding awareness enabled.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Thanks, applied.

- Ted

2018-11-21 15:33:36

by Theodore Ts'o

Subject: Re: [PATCH e2fsprogs 5/9] chattr/lsattr: Support casefold attribute

On Mon, Oct 15, 2018 at 05:12:16PM -0400, Gabriel Krisman Bertazi wrote:
> This flag can be set on directories to request case-insensitive file name
> lookups.
>
> I used the letter 'F', referring to "caseFold" for lack of a better
> option.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Could you include an update to the chattr man page? Thanks!

- Ted