2014-12-09 05:22:23

by Li Xi

[permalink] [raw]
Subject: [v8 0/5] ext4: add project quota support

The following patches propose an implementation of project quota
support for ext4. A project is an aggregate of unrelated inodes
which might scatter in different directories. Inodes that belong
to the same project possess an identical identification i.e.
'project ID', just like every inode has its user/group
identification. The following patches add project quota as
supplement to the former uer/group quota types.

The semantics of ext4 project quota is consistent with XFS. Each
directory can have EXT4_INODE_PROJINHERIT flag set. When the
EXT4_INODE_PROJINHERIT flag of a parent directory is not set, a
newly created inode under that directory will have a default project
ID (i.e. 0). And its EXT4_INODE_PROJINHERIT flag is not set either.
When this flag is set on a directory, following rules will be kept:

1) The newly created inode under that directory will inherit both
the EXT4_INODE_PROJINHERIT flag and the project ID from its parent
directory.

2) Hard-linking a inode with different project ID into that directory
will fail with errno EXDEV.

3) Renaming a inode with different project ID into that directory
will fail with errno EXDEV. However, 'mv' command will detect this
failure and copy the renamed inode to a new inode in the directory.
Thus, this new inode will inherit both the project ID and
EXT4_INODE_PROJINHERIT flag.

4) If the project quota of that ID is being enforced, statfs() on
that directory will take the quotas as another upper limits along
with the capacity of the file system, i.e. the total block/inode
number will be the minimum of the quota limits and file system
capacity.

Changelog:
* v8 <- v7:
- Rebase to newest dev branch of ext4 repository (3.18.0_rc3).
* v7 <- v6:
- Map ext4 inode flags to xflags of struct fsxattr;
- Add patch to cleanup ext4 inode flag definitions.
* v6 <- v5:
- Add project ID check for cross rename;
- Remove patch of EXT4_IOC_GETPROJECT/EXT4_IOC_SETPROJECT ioctl
* v5 <- v4:
- Check project feature when set/get project ID;
- Do not check project feature for project quota;
- Add support of FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR.
* v4 <- v3:
- Do not check project feature when set/get project ID;
- Use EXT4_MAXQUOTAS instead of MAXQUOTAS in ext4 patches;
- Remove unnecessary change of fs/quota/dquot.c;
- Remove CONFIG_QUOTA_PROJECT.
* v3 <- v2:
- Add EXT4_INODE_PROJINHERIT semantics.
* v2 <- v1:
- Add ioctl interface for setting/getting project;
- Add EXT4_FEATURE_RO_COMPAT_PROJECT;
- Add get_projid() method in struct dquot_operations;
- Add error check of ext4_inode_projid_set/get().

v7: http://www.spinics.net/lists/linux-fsdevel/msg80404.html
v6: http://www.spinics.net/lists/linux-fsdevel/msg80022.html
v5: http://www.spinics.net/lists/linux-api/msg04840.html
v4: http://lwn.net/Articles/612972/
v3: http://www.spinics.net/lists/linux-ext4/msg45184.html
v2: http://www.spinics.net/lists/linux-ext4/msg44695.html
v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153

Any comments or feedbacks are appreciated.

Regards,
- Li Xi

Li Xi (5):
vfs: adds general codes to enforces project quota limits
ext4: adds project ID support
ext4: adds project quota support
ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
ext4: cleanup inode flag definitions

fs/ext4/ext4.h | 190 +++++++++++++++++++++----
fs/ext4/ialloc.c | 6 +
fs/ext4/inode.c | 29 ++++
fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++-------------
fs/ext4/namei.c | 17 +++
fs/ext4/super.c | 96 +++++++++++--
fs/quota/dquot.c | 35 ++++-
fs/quota/quota.c | 8 +-
fs/quota/quotaio_v2.h | 6 +-
fs/xfs/xfs_fs.h | 47 +++----
include/linux/quota.h | 2 +
include/uapi/linux/fs.h | 59 ++++++++
include/uapi/linux/quota.h | 6 +-
13 files changed, 650 insertions(+), 181 deletions(-)


2014-12-09 05:22:24

by Li Xi

[permalink] [raw]
Subject: [v8 1/5] vfs: adds general codes to enforces project quota limits

This patch adds support for a new quota type PRJQUOTA for project quota
enforcement. Also a new method get_projid() is added into dquot_operations
structure.

Signed-off-by: Li Xi <[email protected]>
Signed-off-by: Dmitry Monakhov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/quota/dquot.c | 35 ++++++++++++++++++++++++++++++-----
fs/quota/quota.c | 8 ++++++--
fs/quota/quotaio_v2.h | 6 ++++--
include/linux/quota.h | 2 ++
include/uapi/linux/quota.h | 6 ++++--
5 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 6b45272..379ff77 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1154,8 +1154,8 @@ static int need_print_warning(struct dquot_warn *warn)
return uid_eq(current_fsuid(), warn->w_dq_id.uid);
case GRPQUOTA:
return in_group_p(warn->w_dq_id.gid);
- case PRJQUOTA: /* Never taken... Just make gcc happy */
- return 0;
+ case PRJQUOTA:
+ return 1;
}
return 0;
}
@@ -1394,6 +1394,9 @@ static void __dquot_initialize(struct inode *inode, int type)
/* First get references to structures we might need. */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
struct kqid qid;
+ kprojid_t projid;
+ int rc;
+
got[cnt] = NULL;
if (type != -1 && cnt != type)
continue;
@@ -1404,6 +1407,10 @@ static void __dquot_initialize(struct inode *inode, int type)
*/
if (inode->i_dquot[cnt])
continue;
+
+ if (!sb_has_quota_active(sb, cnt))
+ continue;
+
init_needed = 1;

switch (cnt) {
@@ -1413,6 +1420,12 @@ static void __dquot_initialize(struct inode *inode, int type)
case GRPQUOTA:
qid = make_kqid_gid(inode->i_gid);
break;
+ case PRJQUOTA:
+ rc = inode->i_sb->dq_op->get_projid(inode, &projid);
+ if (rc)
+ continue;
+ qid = make_kqid_projid(projid);
+ break;
}
got[cnt] = dqget(sb, qid);
}
@@ -2156,7 +2169,8 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id,
error = -EROFS;
goto out_fmt;
}
- if (!sb->s_op->quota_write || !sb->s_op->quota_read) {
+ if (!sb->s_op->quota_write || !sb->s_op->quota_read ||
+ (type == PRJQUOTA && sb->dq_op->get_projid == NULL)) {
error = -EINVAL;
goto out_fmt;
}
@@ -2397,8 +2411,19 @@ static void do_get_dqblk(struct dquot *dquot, struct fs_disk_quota *di)

memset(di, 0, sizeof(*di));
di->d_version = FS_DQUOT_VERSION;
- di->d_flags = dquot->dq_id.type == USRQUOTA ?
- FS_USER_QUOTA : FS_GROUP_QUOTA;
+ switch (dquot->dq_id.type) {
+ case USRQUOTA:
+ di->d_flags = FS_USER_QUOTA;
+ break;
+ case GRPQUOTA:
+ di->d_flags = FS_GROUP_QUOTA;
+ break;
+ case PRJQUOTA:
+ di->d_flags = FS_PROJ_QUOTA;
+ break;
+ default:
+ BUG();
+ }
di->d_id = from_kqid_munged(current_user_ns(), dquot->dq_id);

spin_lock(&dq_data_lock);
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 7562164..795d694 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -30,11 +30,15 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
case Q_XGETQSTATV:
case Q_XQUOTASYNC:
break;
- /* allow to query information for dquots we "own" */
+ /*
+ * allow to query information for dquots we "own"
+ * always allow querying project quota
+ */
case Q_GETQUOTA:
case Q_XGETQUOTA:
if ((type == USRQUOTA && uid_eq(current_euid(), make_kuid(current_user_ns(), id))) ||
- (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id))))
+ (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id))) ||
+ (type == PRJQUOTA))
break;
/*FALLTHROUGH*/
default:
diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
index f1966b4..4e95430 100644
--- a/fs/quota/quotaio_v2.h
+++ b/fs/quota/quotaio_v2.h
@@ -13,12 +13,14 @@
*/
#define V2_INITQMAGICS {\
0xd9c01f11, /* USRQUOTA */\
- 0xd9c01927 /* GRPQUOTA */\
+ 0xd9c01927, /* GRPQUOTA */\
+ 0xd9c03f14, /* PRJQUOTA */\
}

#define V2_INITQVERSIONS {\
1, /* USRQUOTA */\
- 1 /* GRPQUOTA */\
+ 1, /* GRPQUOTA */\
+ 1, /* PRJQUOTA */\
}

/* First generic header */
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 80d345a..f1b25f8 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -50,6 +50,7 @@

#undef USRQUOTA
#undef GRPQUOTA
+#undef PRJQUOTA
enum quota_type {
USRQUOTA = 0, /* element used for user quotas */
GRPQUOTA = 1, /* element used for group quotas */
@@ -312,6 +313,7 @@ struct dquot_operations {
/* get reserved quota for delayed alloc, value returned is managed by
* quota code only */
qsize_t *(*get_reserved_space) (struct inode *);
+ int (*get_projid) (struct inode *, kprojid_t *);/* Get project ID */
};

struct path;
diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h
index 3b6cfbe..b2d9486 100644
--- a/include/uapi/linux/quota.h
+++ b/include/uapi/linux/quota.h
@@ -36,11 +36,12 @@
#include <linux/errno.h>
#include <linux/types.h>

-#define __DQUOT_VERSION__ "dquot_6.5.2"
+#define __DQUOT_VERSION__ "dquot_6.6.0"

-#define MAXQUOTAS 2
+#define MAXQUOTAS 3
#define USRQUOTA 0 /* element used for user quotas */
#define GRPQUOTA 1 /* element used for group quotas */
+#define PRJQUOTA 2 /* element used for project quotas */

/*
* Definitions for the default names of the quotas files.
@@ -48,6 +49,7 @@
#define INITQFNAMES { \
"user", /* USRQUOTA */ \
"group", /* GRPQUOTA */ \
+ "project", /* PRJQUOTA */ \
"undefined", \
};

--
1.7.1


2014-12-09 05:24:15

by Li Xi

[permalink] [raw]
Subject: [v8 2/5] ext4: adds project ID support

This patch adds a new internal field of ext4 inode to save project
identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
inheriting project ID from parent directory.

Signed-off-by: Li Xi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 21 +++++++++++++++++----
fs/ext4/ialloc.c | 6 ++++++
fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
fs/ext4/namei.c | 17 +++++++++++++++++
fs/ext4/super.c | 1 +
include/uapi/linux/fs.h | 1 +
6 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 29c43e7..8bd1da9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -377,16 +377,18 @@ struct flex_groups {
#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
+#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
-#define EXT4_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */
+#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
+#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */

/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
- EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
+ EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
+ EXT4_PROJINHERIT_FL)

/* Flags that are appropriate for regular files (all but dir-specific ones). */
#define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
@@ -434,6 +436,7 @@ enum {
EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
+ EXT4_INODE_PROJINHERIT = 29, /* Create with parents projid */
EXT4_INODE_RESERVED = 31, /* reserved for ext4 lib */
};

@@ -683,6 +686,7 @@ struct ext4_inode {
__le32 i_crtime; /* File Creation time */
__le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
__le32 i_version_hi; /* high 32 bits for 64-bit version */
+ __le32 i_projid; /* Project ID */
};

struct move_extent {
@@ -934,6 +938,7 @@ struct ext4_inode_info {

/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
__u32 i_csum_seed;
+ kprojid_t i_projid;
};

/*
@@ -1518,6 +1523,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
* GDT_CSUM bits are mutually exclusive.
*/
#define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
+#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x1000 /* Project quota */

#define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
#define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
@@ -1567,7 +1573,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
- EXT4_FEATURE_RO_COMPAT_QUOTA)
+ EXT4_FEATURE_RO_COMPAT_QUOTA |\
+ EXT4_FEATURE_RO_COMPAT_PROJECT)

/*
* Default values for user and/or group using reserved blocks
@@ -1575,6 +1582,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
#define EXT4_DEF_RESUID 0
#define EXT4_DEF_RESGID 0

+/*
+ * Default project ID
+ */
+#define EXT4_DEF_PROJID 0
+
#define EXT4_DEF_INODE_READAHEAD_BLKS 32

/*
@@ -2131,6 +2143,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
+extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
extern void ext4_da_update_reserve_space(struct inode *inode,
int used, int quota_claim);

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index ac644c3..fefb948 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -756,6 +756,12 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
inode->i_gid = dir->i_gid;
} else
inode_init_owner(inode, dir, mode);
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
+ ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) {
+ ei->i_projid = EXT4_I(dir)->i_projid;
+ } else {
+ ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
+ }
dquot_initialize(inode);

if (!goal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5653fa4..29204d4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3863,6 +3863,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
EXT4_I(inode)->i_inline_off = 0;
}

+int ext4_get_projid(struct inode *inode, kprojid_t *projid)
+{
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+ return -EOPNOTSUPP;
+ *projid = EXT4_I(inode)->i_projid;
+ return 0;
+}
+
struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
{
struct ext4_iloc iloc;
@@ -3874,6 +3882,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
int block;
uid_t i_uid;
gid_t i_gid;
+ projid_t i_projid;

inode = iget_locked(sb, ino);
if (!inode)
@@ -3923,12 +3932,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
inode->i_mode = le16_to_cpu(raw_inode->i_mode);
i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+ i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
+ else
+ i_projid = EXT4_DEF_PROJID;
+
if (!(test_opt(inode->i_sb, NO_UID32))) {
i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
}
i_uid_write(inode, i_uid);
i_gid_write(inode, i_gid);
+ ei->i_projid = make_kprojid(&init_user_ns, i_projid);;
set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));

ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
@@ -4158,6 +4173,7 @@ static int ext4_do_update_inode(handle_t *handle,
int need_datasync = 0, set_large_file = 0;
uid_t i_uid;
gid_t i_gid;
+ projid_t i_projid;

spin_lock(&ei->i_raw_lock);

@@ -4170,6 +4186,7 @@ static int ext4_do_update_inode(handle_t *handle,
raw_inode->i_mode = cpu_to_le16(inode->i_mode);
i_uid = i_uid_read(inode);
i_gid = i_gid_read(inode);
+ i_projid = from_kprojid(&init_user_ns, ei->i_projid);
if (!(test_opt(inode->i_sb, NO_UID32))) {
raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
@@ -4249,6 +4266,18 @@ static int ext4_do_update_inode(handle_t *handle,
}
}

+ BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT) &&
+ i_projid != EXT4_DEF_PROJID);
+ if (i_projid != EXT4_DEF_PROJID &&
+ (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
+ (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
+ spin_unlock(&ei->i_raw_lock);
+ err = -EFBIG;
+ goto out_brelse;
+ }
+ raw_inode->i_projid = cpu_to_le32(i_projid);
+
ext4_inode_csum_set(inode, raw_inode, ei);

spin_unlock(&ei->i_raw_lock);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 2291923..09e8e55 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2938,6 +2938,11 @@ static int ext4_link(struct dentry *old_dentry,
if (inode->i_nlink >= EXT4_LINK_MAX)
return -EMLINK;

+ if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
+ (!projid_eq(EXT4_I(dir)->i_projid,
+ EXT4_I(old_dentry->d_inode)->i_projid)))
+ return -EXDEV;
+
dquot_initialize(dir);

retry:
@@ -3217,6 +3222,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
int credits;
u8 old_file_type;

+ if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
+ (!projid_eq(EXT4_I(new_dir)->i_projid,
+ EXT4_I(old_dentry->d_inode)->i_projid)))
+ return -EXDEV;
+
dquot_initialize(old.dir);
dquot_initialize(new.dir);

@@ -3395,6 +3405,13 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
u8 new_file_type;
int retval;

+ if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
+ ((!projid_eq(EXT4_I(new_dir)->i_projid,
+ EXT4_I(old_dentry->d_inode)->i_projid)) ||
+ (!projid_eq(EXT4_I(old_dir)->i_projid,
+ EXT4_I(new_dentry->d_inode)->i_projid))))
+ return -EXDEV;
+
dquot_initialize(old.dir);
dquot_initialize(new.dir);

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4fca81c..6b67795 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1067,6 +1067,7 @@ static const struct dquot_operations ext4_quota_operations = {
.write_info = ext4_write_info,
.alloc_dquot = dquot_alloc,
.destroy_dquot = dquot_destroy,
+ .get_projid = ext4_get_projid,
};

static const struct quotactl_ops ext4_qctl_operations = {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3735fa0..fcbf647 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -195,6 +195,7 @@ struct inodes_stat_t {
#define FS_EXTENT_FL 0x00080000 /* Extents */
#define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
+#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */

#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
--
1.7.1


2014-12-09 05:24:26

by Li Xi

[permalink] [raw]
Subject: [v8 3/5] ext4: adds project quota support

This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new specific inode is also used for
project quota accounting.

Signed-off-by: Li Xi <[email protected]>
Signed-off-by: Dmitry Monakhov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 8 +++-
fs/ext4/super.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8bd1da9..136e18c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -208,6 +208,7 @@ struct ext4_io_submit {
#define EXT4_UNDEL_DIR_INO 6 /* Undelete directory inode */
#define EXT4_RESIZE_INO 7 /* Reserved group descriptors inode */
#define EXT4_JOURNAL_INO 8 /* Journal inode */
+#define EXT4_PRJ_QUOTA_INO 9 /* Project quota inode */

/* First non-reserved inode for old ext4 filesystems */
#define EXT4_GOOD_OLD_FIRST_INO 11
@@ -982,6 +983,7 @@ struct ext4_inode_info {
#define EXT4_MOUNT_DIOREAD_NOLOCK 0x400000 /* Enable support for dio read nolocking */
#define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */
#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */
+#define EXT4_MOUNT_PRJQUOTA 0x2000000 /* Project quota support */
#define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */
#define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */
#define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */
@@ -1157,7 +1159,8 @@ struct ext4_super_block {
__le32 s_grp_quota_inum; /* inode for tracking group quota */
__le32 s_overhead_clusters; /* overhead blocks/clusters in fs */
__le32 s_backup_bgs[2]; /* groups with sparse_super2 SBs */
- __le32 s_reserved[106]; /* Padding to the end of the block */
+ __le32 s_prj_quota_inum; /* inode for tracking project quota */
+ __le32 s_reserved[105]; /* Padding to the end of the block */
__le32 s_checksum; /* crc32c(superblock) */
};

@@ -1172,7 +1175,7 @@ struct ext4_super_block {
#define EXT4_MF_FS_ABORTED 0x0002 /* Fatal error detected */

/* Number of quota types we support */
-#define EXT4_MAXQUOTAS 2
+#define EXT4_MAXQUOTAS 3

/*
* fourth extended-fs super-block data in memory
@@ -1364,6 +1367,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
ino == EXT4_BOOT_LOADER_INO ||
ino == EXT4_JOURNAL_INO ||
ino == EXT4_RESIZE_INO ||
+ ino == EXT4_PRJ_QUOTA_INO ||
(ino >= EXT4_FIRST_INO(sb) &&
ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6b67795..f5d8ca2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1035,8 +1035,8 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
}

#ifdef CONFIG_QUOTA
-#define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
-#define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
+static char *quotatypes[] = INITQFNAMES;
+#define QTYPE2NAME(t) (quotatypes[t])

static int ext4_write_dquot(struct dquot *dquot);
static int ext4_acquire_dquot(struct dquot *dquot);
@@ -1128,10 +1128,11 @@ enum {
Opt_journal_path, Opt_journal_checksum, Opt_journal_async_commit,
Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
Opt_data_err_abort, Opt_data_err_ignore,
- Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
+ Opt_usrjquota, Opt_grpjquota, Opt_prjjquota,
+ Opt_offusrjquota, Opt_offgrpjquota, Opt_offprjjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
- Opt_usrquota, Opt_grpquota, Opt_i_version,
+ Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_mblk_io_submit,
Opt_nomblk_io_submit, Opt_block_validity, Opt_noblock_validity,
Opt_inode_readahead_blks, Opt_journal_ioprio,
@@ -1183,6 +1184,8 @@ static const match_table_t tokens = {
{Opt_usrjquota, "usrjquota=%s"},
{Opt_offgrpjquota, "grpjquota="},
{Opt_grpjquota, "grpjquota=%s"},
+ {Opt_offprjjquota, "prjjquota="},
+ {Opt_prjjquota, "prjjquota=%s"},
{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
{Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
@@ -1190,6 +1193,7 @@ static const match_table_t tokens = {
{Opt_noquota, "noquota"},
{Opt_quota, "quota"},
{Opt_usrquota, "usrquota"},
+ {Opt_prjquota, "prjquota"},
{Opt_barrier, "barrier=%u"},
{Opt_barrier, "barrier"},
{Opt_nobarrier, "nobarrier"},
@@ -1404,12 +1408,17 @@ static const struct mount_opts {
MOPT_SET | MOPT_Q},
{Opt_grpquota, EXT4_MOUNT_QUOTA | EXT4_MOUNT_GRPQUOTA,
MOPT_SET | MOPT_Q},
+ {Opt_prjquota, EXT4_MOUNT_QUOTA | EXT4_MOUNT_PRJQUOTA,
+ MOPT_SET | MOPT_Q},
{Opt_noquota, (EXT4_MOUNT_QUOTA | EXT4_MOUNT_USRQUOTA |
- EXT4_MOUNT_GRPQUOTA), MOPT_CLEAR | MOPT_Q},
+ EXT4_MOUNT_GRPQUOTA | EXT4_MOUNT_PRJQUOTA),
+ MOPT_CLEAR | MOPT_Q},
{Opt_usrjquota, 0, MOPT_Q},
{Opt_grpjquota, 0, MOPT_Q},
+ {Opt_prjjquota, 0, MOPT_Q},
{Opt_offusrjquota, 0, MOPT_Q},
{Opt_offgrpjquota, 0, MOPT_Q},
+ {Opt_offprjjquota, 0, MOPT_Q},
{Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT},
{Opt_jqfmt_vfsv0, QFMT_VFS_V0, MOPT_QFMT},
{Opt_jqfmt_vfsv1, QFMT_VFS_V1, MOPT_QFMT},
@@ -1432,10 +1441,14 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
return set_qf_name(sb, USRQUOTA, &args[0]);
else if (token == Opt_grpjquota)
return set_qf_name(sb, GRPQUOTA, &args[0]);
+ else if (token == Opt_prjjquota)
+ return set_qf_name(sb, PRJQUOTA, &args[0]);
else if (token == Opt_offusrjquota)
return clear_qf_name(sb, USRQUOTA);
else if (token == Opt_offgrpjquota)
return clear_qf_name(sb, GRPQUOTA);
+ else if (token == Opt_offprjjquota)
+ return clear_qf_name(sb, PRJQUOTA);
#endif
switch (token) {
case Opt_noacl:
@@ -1661,19 +1674,28 @@ static int parse_options(char *options, struct super_block *sb,
}
#ifdef CONFIG_QUOTA
if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA) &&
- (test_opt(sb, USRQUOTA) || test_opt(sb, GRPQUOTA))) {
+ (test_opt(sb, USRQUOTA) ||
+ test_opt(sb, GRPQUOTA) ||
+ test_opt(sb, PRJQUOTA))) {
ext4_msg(sb, KERN_ERR, "Cannot set quota options when QUOTA "
"feature is enabled");
return 0;
}
- if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
+ if (sbi->s_qf_names[USRQUOTA] ||
+ sbi->s_qf_names[GRPQUOTA] ||
+ sbi->s_qf_names[PRJQUOTA]) {
if (test_opt(sb, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
clear_opt(sb, USRQUOTA);

if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
clear_opt(sb, GRPQUOTA);

- if (test_opt(sb, GRPQUOTA) || test_opt(sb, USRQUOTA)) {
+ if (test_opt(sb, PRJQUOTA) && sbi->s_qf_names[PRJQUOTA])
+ clear_opt(sb, PRJQUOTA);
+
+ if (test_opt(sb, GRPQUOTA) ||
+ test_opt(sb, USRQUOTA) ||
+ test_opt(sb, PRJQUOTA)) {
ext4_msg(sb, KERN_ERR, "old and new quota "
"format mixing");
return 0;
@@ -1733,6 +1755,9 @@ static inline void ext4_show_quota_options(struct seq_file *seq,

if (sbi->s_qf_names[GRPQUOTA])
seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
+
+ if (sbi->s_qf_names[PRJQUOTA])
+ seq_printf(seq, ",prjjquota=%s", sbi->s_qf_names[PRJQUOTA]);
#endif
}

@@ -5037,6 +5062,46 @@ restore_opts:
return err;
}

+static int ext4_statfs_project(struct super_block *sb,
+ kprojid_t projid, struct kstatfs *buf)
+{
+ struct kqid qid;
+ struct dquot *dquot;
+ u64 limit;
+ u64 curblock;
+
+ qid = make_kqid_projid(projid);
+ dquot = dqget(sb, qid);
+ if (!dquot)
+ return -ESRCH;
+ spin_lock(&dq_data_lock);
+
+ limit = dquot->dq_dqb.dqb_bsoftlimit ?
+ dquot->dq_dqb.dqb_bsoftlimit :
+ dquot->dq_dqb.dqb_bhardlimit;
+ if (limit && buf->f_blocks * buf->f_bsize > limit) {
+ curblock = dquot->dq_dqb.dqb_curspace / buf->f_bsize;
+ buf->f_blocks = limit / buf->f_bsize;
+ buf->f_bfree = buf->f_bavail =
+ (buf->f_blocks > curblock) ?
+ (buf->f_blocks - curblock) : 0;
+ }
+
+ limit = dquot->dq_dqb.dqb_isoftlimit ?
+ dquot->dq_dqb.dqb_isoftlimit :
+ dquot->dq_dqb.dqb_ihardlimit;
+ if (limit && buf->f_files > limit) {
+ buf->f_files = limit;
+ buf->f_ffree =
+ (buf->f_files > dquot->dq_dqb.dqb_curinodes) ?
+ (buf->f_files - dquot->dq_dqb.dqb_curinodes) : 0;
+ }
+
+ spin_unlock(&dq_data_lock);
+ dqput(dquot);
+ return 0;
+}
+
static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
@@ -5045,6 +5110,7 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
ext4_fsblk_t overhead = 0, resv_blocks;
u64 fsid;
s64 bfree;
+ struct inode *inode = dentry->d_inode;
resv_blocks = EXT4_C2B(sbi, atomic64_read(&sbi->s_resv_clusters));

if (!test_opt(sb, MINIX_DF))
@@ -5069,6 +5135,9 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL;
buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL;

+ if (ext4_test_inode_flag(inode, EXT4_INODE_PROJINHERIT) &&
+ sb_has_quota_limits_enabled(sb, PRJQUOTA))
+ ext4_statfs_project(sb, EXT4_I(inode)->i_projid, buf);
return 0;
}

@@ -5149,7 +5218,9 @@ static int ext4_mark_dquot_dirty(struct dquot *dquot)

/* Are we journaling quotas? */
if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA) ||
- sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
+ sbi->s_qf_names[USRQUOTA] ||
+ sbi->s_qf_names[GRPQUOTA] ||
+ sbi->s_qf_names[PRJQUOTA]) {
dquot_mark_dquot_dirty(dquot);
return ext4_write_dquot(dquot);
} else {
@@ -5233,7 +5304,8 @@ static int ext4_quota_enable(struct super_block *sb, int type, int format_id,
struct inode *qf_inode;
unsigned long qf_inums[EXT4_MAXQUOTAS] = {
le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
- le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
};

BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA));
@@ -5261,7 +5333,8 @@ static int ext4_enable_quotas(struct super_block *sb)
int type, err = 0;
unsigned long qf_inums[EXT4_MAXQUOTAS] = {
le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
- le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
};

sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
--
1.7.1


2014-12-09 05:24:36

by Li Xi

[permalink] [raw]
Subject: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
support for ext4. The interface is kept consistent with
XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.

Signed-off-by: Li Xi <[email protected]>
---
fs/ext4/ext4.h | 111 ++++++++++++++++
fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++++--------------
fs/xfs/xfs_fs.h | 47 +++-----
include/uapi/linux/fs.h | 58 ++++++++
4 files changed, 418 insertions(+), 128 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 136e18c..43a2a88 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -384,6 +384,115 @@ struct flex_groups {
#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */

+/* Transfer internal flags to xflags */
+static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
+{
+ __u32 xflags = 0;
+
+ if (iflags & EXT4_SECRM_FL)
+ xflags |= FS_XFLAG_SECRM;
+ if (iflags & EXT4_UNRM_FL)
+ xflags |= FS_XFLAG_UNRM;
+ if (iflags & EXT4_COMPR_FL)
+ xflags |= FS_XFLAG_COMPR;
+ if (iflags & EXT4_SYNC_FL)
+ xflags |= FS_XFLAG_SYNC;
+ if (iflags & EXT4_IMMUTABLE_FL)
+ xflags |= FS_XFLAG_IMMUTABLE;
+ if (iflags & EXT4_APPEND_FL)
+ xflags |= FS_XFLAG_APPEND;
+ if (iflags & EXT4_NODUMP_FL)
+ xflags |= FS_XFLAG_NODUMP;
+ if (iflags & EXT4_NOATIME_FL)
+ xflags |= FS_XFLAG_NOATIME;
+ if (iflags & EXT4_COMPRBLK_FL)
+ xflags |= FS_XFLAG_COMPRBLK;
+ if (iflags & EXT4_NOCOMPR_FL)
+ xflags |= FS_XFLAG_NOCOMPR;
+ if (iflags & EXT4_ECOMPR_FL)
+ xflags |= FS_XFLAG_ECOMPR;
+ if (iflags & EXT4_INDEX_FL)
+ xflags |= FS_XFLAG_INDEX;
+ if (iflags & EXT4_IMAGIC_FL)
+ xflags |= FS_XFLAG_IMAGIC;
+ if (iflags & EXT4_JOURNAL_DATA_FL)
+ xflags |= FS_XFLAG_JOURNAL_DATA;
+ if (iflags & EXT4_NOTAIL_FL)
+ xflags |= FS_XFLAG_NOTAIL;
+ if (iflags & EXT4_DIRSYNC_FL)
+ xflags |= FS_XFLAG_DIRSYNC;
+ if (iflags & EXT4_TOPDIR_FL)
+ xflags |= FS_XFLAG_TOPDIR;
+ if (iflags & EXT4_HUGE_FILE_FL)
+ xflags |= FS_XFLAG_HUGE_FILE;
+ if (iflags & EXT4_EXTENTS_FL)
+ xflags |= FS_XFLAG_EXTENTS;
+ if (iflags & EXT4_EA_INODE_FL)
+ xflags |= FS_XFLAG_EA_INODE;
+ if (iflags & EXT4_EOFBLOCKS_FL)
+ xflags |= FS_XFLAG_EOFBLOCKS;
+ if (iflags & EXT4_INLINE_DATA_FL)
+ xflags |= FS_XFLAG_INLINE_DATA;
+ if (iflags & EXT4_PROJINHERIT_FL)
+ xflags |= FS_XFLAG_PROJINHERIT;
+ return xflags;
+}
+
+/* Transfer xflags flags to internal */
+static inline unsigned long ext4_xflags_to_iflags(__u32 xflags)
+{
+ unsigned long iflags = 0;
+
+ if (xflags & FS_XFLAG_SECRM)
+ iflags |= EXT4_SECRM_FL;
+ if (xflags & FS_XFLAG_UNRM)
+ iflags |= EXT4_UNRM_FL;
+ if (xflags & FS_XFLAG_COMPR)
+ iflags |= EXT4_COMPR_FL;
+ if (xflags & FS_XFLAG_SYNC)
+ iflags |= EXT4_SYNC_FL;
+ if (xflags & FS_XFLAG_IMMUTABLE)
+ iflags |= EXT4_IMMUTABLE_FL;
+ if (xflags & FS_XFLAG_APPEND)
+ iflags |= EXT4_APPEND_FL;
+ if (xflags & FS_XFLAG_NODUMP)
+ iflags |= EXT4_NODUMP_FL;
+ if (xflags & FS_XFLAG_NOATIME)
+ iflags |= EXT4_NOATIME_FL;
+ if (xflags & FS_XFLAG_COMPRBLK)
+ iflags |= EXT4_COMPRBLK_FL;
+ if (xflags & FS_XFLAG_NOCOMPR)
+ iflags |= EXT4_NOCOMPR_FL;
+ if (xflags & FS_XFLAG_ECOMPR)
+ iflags |= EXT4_ECOMPR_FL;
+ if (xflags & FS_XFLAG_INDEX)
+ iflags |= EXT4_INDEX_FL;
+ if (xflags & FS_XFLAG_IMAGIC)
+ iflags |= EXT4_IMAGIC_FL;
+ if (xflags & FS_XFLAG_JOURNAL_DATA)
+ iflags |= EXT4_JOURNAL_DATA_FL;
+ if (xflags & FS_XFLAG_IMAGIC)
+ iflags |= FS_XFLAG_NOTAIL;
+ if (xflags & FS_XFLAG_DIRSYNC)
+ iflags |= EXT4_DIRSYNC_FL;
+ if (xflags & FS_XFLAG_TOPDIR)
+ iflags |= EXT4_TOPDIR_FL;
+ if (xflags & FS_XFLAG_HUGE_FILE)
+ iflags |= EXT4_HUGE_FILE_FL;
+ if (xflags & FS_XFLAG_EXTENTS)
+ iflags |= EXT4_EXTENTS_FL;
+ if (xflags & FS_XFLAG_EA_INODE)
+ iflags |= EXT4_EA_INODE_FL;
+ if (xflags & FS_XFLAG_EOFBLOCKS)
+ iflags |= EXT4_EOFBLOCKS_FL;
+ if (xflags & FS_XFLAG_INLINE_DATA)
+ iflags |= EXT4_INLINE_DATA_FL;
+ if (xflags & FS_XFLAG_PROJINHERIT)
+ iflags |= EXT4_PROJINHERIT_FL;
+
+ return iflags;
+}
+
/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
@@ -606,6 +715,8 @@ enum {
#define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
#define EXT4_IOC_SWAP_BOOT _IO('f', 17)
#define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
+#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
+#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR

#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/*
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f58a0d1..8332476 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -14,6 +14,8 @@
#include <linux/compat.h>
#include <linux/mount.h>
#include <linux/file.h>
+#include <linux/quotaops.h>
+#include <linux/quota.h>
#include <asm/uaccess.h>
#include "ext4_jbd2.h"
#include "ext4.h"
@@ -196,126 +198,220 @@ journal_err_out:
return err;
}

-long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int ext4_ioctl_setflags(struct file *filp, unsigned int flags)
{
struct inode *inode = file_inode(filp);
- struct super_block *sb = inode->i_sb;
struct ext4_inode_info *ei = EXT4_I(inode);
- unsigned int flags;
+ handle_t *handle = NULL;
+ int err, migrate = 0;
+ struct ext4_iloc iloc;
+ unsigned int oldflags, mask, i;
+ unsigned int jflag;

- ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
+ if (!inode_owner_or_capable(inode))
+ return -EACCES;

- switch (cmd) {
- case EXT4_IOC_GETFLAGS:
- ext4_get_inode_flags(ei);
- flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
- return put_user(flags, (int __user *) arg);
- case EXT4_IOC_SETFLAGS: {
- handle_t *handle = NULL;
- int err, migrate = 0;
- struct ext4_iloc iloc;
- unsigned int oldflags, mask, i;
- unsigned int jflag;
+ err = mnt_want_write_file(filp);
+ if (err)
+ return err;

- if (!inode_owner_or_capable(inode))
- return -EACCES;
+ flags = ext4_mask_flags(inode->i_mode, flags);

- if (get_user(flags, (int __user *) arg))
- return -EFAULT;
+ err = -EPERM;
+ mutex_lock(&inode->i_mutex);
+ /* Is it quota file? Do not allow user to mess with it */
+ if (IS_NOQUOTA(inode))
+ goto flags_out;

- err = mnt_want_write_file(filp);
- if (err)
- return err;
+ oldflags = ei->i_flags;

- flags = ext4_mask_flags(inode->i_mode, flags);
+ /* The JOURNAL_DATA flag is modifiable only by root */
+ jflag = flags & EXT4_JOURNAL_DATA_FL;

- err = -EPERM;
- mutex_lock(&inode->i_mutex);
- /* Is it quota file? Do not allow user to mess with it */
- if (IS_NOQUOTA(inode))
+ /*
+ * The IMMUTABLE and APPEND_ONLY flags can only be changed by
+ * the relevant capability.
+ *
+ * This test looks nicer. Thanks to Pauline Middelink
+ */
+ if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
+ if (!capable(CAP_LINUX_IMMUTABLE))
goto flags_out;
+ }

- oldflags = ei->i_flags;
-
- /* The JOURNAL_DATA flag is modifiable only by root */
- jflag = flags & EXT4_JOURNAL_DATA_FL;
-
- /*
- * The IMMUTABLE and APPEND_ONLY flags can only be changed by
- * the relevant capability.
- *
- * This test looks nicer. Thanks to Pauline Middelink
- */
- if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
- if (!capable(CAP_LINUX_IMMUTABLE))
- goto flags_out;
- }
-
- /*
- * The JOURNAL_DATA flag can only be changed by
- * the relevant capability.
- */
- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
- if (!capable(CAP_SYS_RESOURCE))
- goto flags_out;
- }
- if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
- migrate = 1;
-
+ /*
+ * The JOURNAL_DATA flag can only be changed by
+ * the relevant capability.
+ */
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
+ if (!capable(CAP_SYS_RESOURCE))
+ goto flags_out;
+ }
+ if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
+ migrate = 1;
if (flags & EXT4_EOFBLOCKS_FL) {
- /* we don't support adding EOFBLOCKS flag */
- if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
- err = -EOPNOTSUPP;
- goto flags_out;
- }
- } else if (oldflags & EXT4_EOFBLOCKS_FL)
- ext4_truncate(inode);
-
- handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
- if (IS_ERR(handle)) {
- err = PTR_ERR(handle);
+ /* we don't support adding EOFBLOCKS flag */
+ if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
+ err = -EOPNOTSUPP;
goto flags_out;
}
- if (IS_SYNC(inode))
- ext4_handle_sync(handle);
- err = ext4_reserve_inode_write(handle, inode, &iloc);
- if (err)
- goto flags_err;
-
- for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
- if (!(mask & EXT4_FL_USER_MODIFIABLE))
- continue;
- if (mask & flags)
- ext4_set_inode_flag(inode, i);
- else
- ext4_clear_inode_flag(inode, i);
- }
+ } else if (oldflags & EXT4_EOFBLOCKS_FL)
+ ext4_truncate(inode);

- ext4_set_inode_flags(inode);
- inode->i_ctime = ext4_current_time(inode);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto flags_out;
+ }
+ if (IS_SYNC(inode))
+ ext4_handle_sync(handle);
+ err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (err)
+ goto flags_err;
+
+ for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
+ if (!(mask & EXT4_FL_USER_MODIFIABLE))
+ continue;
+ if (mask & flags)
+ ext4_set_inode_flag(inode, i);
+ else
+ ext4_clear_inode_flag(inode, i);
+ }

- err = ext4_mark_iloc_dirty(handle, inode, &iloc);
-flags_err:
- ext4_journal_stop(handle);
- if (err)
- goto flags_out;
+ ext4_set_inode_flags(inode);
+ inode->i_ctime = ext4_current_time(inode);

- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
- err = ext4_change_inode_journal_flag(inode, jflag);
- if (err)
- goto flags_out;
- if (migrate) {
- if (flags & EXT4_EXTENTS_FL)
- err = ext4_ext_migrate(inode);
- else
- err = ext4_ind_migrate(inode);
- }
+ err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+flags_err:
+ ext4_journal_stop(handle);
+ if (err)
+ goto flags_out;
+
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
+ err = ext4_change_inode_journal_flag(inode, jflag);
+ if (err)
+ goto flags_out;
+ if (migrate) {
+ if (flags & EXT4_EXTENTS_FL)
+ err = ext4_ext_migrate(inode);
+ else
+ err = ext4_ind_migrate(inode);
+ }

flags_out:
- mutex_unlock(&inode->i_mutex);
- mnt_drop_write_file(filp);
+ mutex_unlock(&inode->i_mutex);
+ mnt_drop_write_file(filp);
+ return err;
+}
+
+static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
+{
+ struct inode *inode = file_inode(filp);
+ struct super_block *sb = inode->i_sb;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ int err;
+ handle_t *handle;
+ kprojid_t kprojid;
+ struct ext4_iloc iloc;
+ struct ext4_inode *raw_inode;
+
+ struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
+
+ /* Make sure caller can change project. */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ if (projid != EXT4_DEF_PROJID
+ && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT))
+ return -EOPNOTSUPP;
+
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+ BUG_ON(__kprojid_val(EXT4_I(inode)->i_projid)
+ != EXT4_DEF_PROJID);
+ if (projid != EXT4_DEF_PROJID)
+ return -EOPNOTSUPP;
+ else
+ return 0;
+ }
+
+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
+
+ if (projid_eq(kprojid, EXT4_I(inode)->i_projid))
+ return 0;
+
+ err = mnt_want_write_file(filp);
+ if (err)
return err;
+
+ err = -EPERM;
+ mutex_lock(&inode->i_mutex);
+ /* Is it quota file? Do not allow user to mess with it */
+ if (IS_NOQUOTA(inode))
+ goto project_out;
+
+ dquot_initialize(inode);
+
+ handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
+ EXT4_QUOTA_INIT_BLOCKS(sb) +
+ EXT4_QUOTA_DEL_BLOCKS(sb) + 3);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto project_out;
+ }
+
+ err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (err)
+ goto project_stop;
+
+ raw_inode = ext4_raw_inode(&iloc);
+ if ((EXT4_INODE_SIZE(sb) <=
+ EXT4_GOOD_OLD_INODE_SIZE) ||
+ (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))) {
+ err = -EFBIG;
+ goto project_stop;
}
+
+ transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid));
+ if (!transfer_to[PRJQUOTA])
+ goto project_set;
+
+ err = __dquot_transfer(inode, transfer_to);
+ dqput(transfer_to[PRJQUOTA]);
+ if (err)
+ goto project_stop;
+
+project_set:
+ EXT4_I(inode)->i_projid = kprojid;
+ inode->i_ctime = ext4_current_time(inode);
+ err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+project_stop:
+ ext4_journal_stop(handle);
+project_out:
+ mutex_unlock(&inode->i_mutex);
+ mnt_drop_write_file(filp);
+ return err;
+}
+
+long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct super_block *sb = inode->i_sb;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ unsigned int flags;
+
+ ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
+
+ switch (cmd) {
+ case EXT4_IOC_GETFLAGS:
+ ext4_get_inode_flags(ei);
+ flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
+ return put_user(flags, (int __user *) arg);
+ case EXT4_IOC_SETFLAGS:
+ if (get_user(flags, (int __user *) arg))
+ return -EFAULT;
+ return ext4_ioctl_setflags(filp, flags);
case EXT4_IOC_GETVERSION:
case EXT4_IOC_GETVERSION_OLD:
return put_user(inode->i_generation, (int __user *) arg);
@@ -615,7 +711,45 @@ resizefs_out:
}
case EXT4_IOC_PRECACHE_EXTENTS:
return ext4_ext_precache(inode);
+ case EXT4_IOC_FSGETXATTR:
+ {
+ struct fsxattr fa;
+
+ memset(&fa, 0, sizeof(struct fsxattr));
+
+ ext4_get_inode_flags(ei);
+ fa.fsx_xflags = ext4_iflags_to_xflags(ei->i_flags & EXT4_FL_USER_VISIBLE);
+
+ if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+ fa.fsx_projid = (__u32)from_kprojid(&init_user_ns,
+ EXT4_I(inode)->i_projid);
+ }
+
+ if (copy_to_user((struct fsxattr __user *)arg,
+ &fa, sizeof(fa)))
+ return -EFAULT;
+ return 0;
+ }
+ case EXT4_IOC_FSSETXATTR:
+ {
+ struct fsxattr fa;
+ int err;
+
+ if (copy_from_user(&fa, (struct fsxattr __user *)arg,
+ sizeof(fa)))
+ return -EFAULT;

+ err = ext4_ioctl_setflags(filp, ext4_xflags_to_iflags(fa.fsx_xflags));
+ if (err)
+ return err;
+
+ err = ext4_ioctl_setproject(filp, fa.fsx_projid);
+ if (err)
+ return err;
+
+ return 0;
+ }
default:
return -ENOTTY;
}
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 18dc721..64c7ae6 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -36,38 +36,25 @@ struct dioattr {
#endif

/*
- * Structure for XFS_IOC_FSGETXATTR[A] and XFS_IOC_FSSETXATTR.
- */
-#ifndef HAVE_FSXATTR
-struct fsxattr {
- __u32 fsx_xflags; /* xflags field value (get/set) */
- __u32 fsx_extsize; /* extsize field value (get/set)*/
- __u32 fsx_nextents; /* nextents field value (get) */
- __u32 fsx_projid; /* project identifier (get/set) */
- unsigned char fsx_pad[12];
-};
-#endif
-
-/*
* Flags for the bs_xflags/fsx_xflags field
* There should be a one-to-one correspondence between these flags and the
* XFS_DIFLAG_s.
*/
-#define XFS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
-#define XFS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
-#define XFS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
-#define XFS_XFLAG_APPEND 0x00000010 /* all writes append */
-#define XFS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
-#define XFS_XFLAG_NOATIME 0x00000040 /* do not update access time */
-#define XFS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
-#define XFS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
-#define XFS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
-#define XFS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
-#define XFS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
-#define XFS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
-#define XFS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
-#define XFS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
-#define XFS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
+#define XFS_XFLAG_REALTIME FS_XFLAG_REALTIME /* data in realtime volume */
+#define XFS_XFLAG_PREALLOC FS_XFLAG_PREALLOC /* preallocated file extents */
+#define XFS_XFLAG_IMMUTABLE FS_XFLAG_IMMUTABLE /* file cannot be modified */
+#define XFS_XFLAG_APPEND FS_XFLAG_APPEND /* all writes append */
+#define XFS_XFLAG_SYNC FS_XFLAG_SYNC /* all writes synchronous */
+#define XFS_XFLAG_NOATIME FS_XFLAG_NOATIME /* do not update access time */
+#define XFS_XFLAG_NODUMP FS_XFLAG_NODUMP /* do not include in backups */
+#define XFS_XFLAG_RTINHERIT FS_XFLAG_RTINHERIT /* create with rt bit set */
+#define XFS_XFLAG_PROJINHERIT FS_XFLAG_PROJINHERIT /* create with parents projid */
+#define XFS_XFLAG_NOSYMLINKS FS_XFLAG_NOSYMLINKS /* disallow symlink creation */
+#define XFS_XFLAG_EXTSIZE FS_XFLAG_EXTSIZE /* extent size allocator hint */
+#define XFS_XFLAG_EXTSZINHERIT FS_XFLAG_EXTSZINHERIT /* inherit inode extent size */
+#define XFS_XFLAG_NODEFRAG FS_XFLAG_NODEFRAG /* do not defragment */
+#define XFS_XFLAG_FILESTREAM FS_XFLAG_FILESTREAM /* use filestream allocator */
+#define XFS_XFLAG_HASATTR FS_XFLAG_HASATTR /* no DIFLAG for this */

/*
* Structure for XFS_IOC_GETBMAP.
@@ -503,8 +490,8 @@ typedef struct xfs_swapext
#define XFS_IOC_ALLOCSP _IOW ('X', 10, struct xfs_flock64)
#define XFS_IOC_FREESP _IOW ('X', 11, struct xfs_flock64)
#define XFS_IOC_DIOINFO _IOR ('X', 30, struct dioattr)
-#define XFS_IOC_FSGETXATTR _IOR ('X', 31, struct fsxattr)
-#define XFS_IOC_FSSETXATTR _IOW ('X', 32, struct fsxattr)
+#define XFS_IOC_FSGETXATTR FS_IOC_FSGETXATTR
+#define XFS_IOC_FSSETXATTR FS_IOC_FSSETXATTR
#define XFS_IOC_ALLOCSP64 _IOW ('X', 36, struct xfs_flock64)
#define XFS_IOC_FREESP64 _IOW ('X', 37, struct xfs_flock64)
#define XFS_IOC_GETBMAP _IOWR('X', 38, struct getbmap)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index fcbf647..872fed5 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -58,6 +58,62 @@ struct inodes_stat_t {
long dummy[5]; /* padding for sysctl ABI compatibility */
};

+/*
+ * Extend attribute flags. These should be or-ed together to figure out what
+ * is valid.
+ */
+#define FSX_XFLAGS (1 << 0)
+#define FSX_EXTSIZE (1 << 1)
+#define FSX_NEXTENTS (1 << 2)
+#define FSX_PROJID (1 << 3)
+
+/*
+ * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
+ */
+struct fsxattr {
+ __u32 fsx_xflags; /* xflags field value (get/set) */
+ __u32 fsx_extsize; /* extsize field value (get/set)*/
+ __u32 fsx_nextents; /* nextents field value (get) */
+ __u32 fsx_projid; /* project identifier (get/set) */
+ unsigned char fsx_pad[12];
+};
+
+/*
+ * Flags for the fsx_xflags field
+ */
+#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
+#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
+#define FS_XFLAG_SECRM 0x00000004 /* secure deletion */
+#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
+#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
+#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
+#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
+#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
+#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
+#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
+#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
+#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
+#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
+#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
+#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
+#define FS_XFLAG_UNRM 0x00008000 /* undelete */
+#define FS_XFLAG_COMPR 0x00010000 /* compress file */
+#define FS_XFLAG_COMPRBLK 0x00020000 /* one or more compressed clusters */
+#define FS_XFLAG_NOCOMPR 0x00040000 /* don't compress */
+#define FS_XFLAG_ECOMPR 0x00080000 /* compression error */
+#define FS_XFLAG_INDEX 0x00100000 /* hash-indexed directory */
+#define FS_XFLAG_IMAGIC 0x00200000 /* AFS directory */
+#define FS_XFLAG_JOURNAL_DATA 0x00400000 /* file data should be journaled */
+#define FS_XFLAG_NOTAIL 0x00800000 /* file tail should not be merged */
+#define FS_XFLAG_DIRSYNC 0x01000000 /* dirsync behaviour (directories only) */
+#define FS_XFLAG_TOPDIR 0x02000000 /* top of directory hierarchies*/
+#define FS_XFLAG_HUGE_FILE 0x04000000 /* set to each huge file */
+#define FS_XFLAG_EXTENTS 0x08000000 /* inode uses extents */
+#define FS_XFLAG_EA_INODE 0x10000000 /* inode used for large EA */
+#define FS_XFLAG_EOFBLOCKS 0x20000000 /* blocks allocated beyond EOF */
+#define FS_XFLAG_INLINE_DATA 0x40000000 /* inode has inline data. */
+#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
+

#define NR_FILE 8192 /* this can well be larger on a larger system */

@@ -163,6 +219,8 @@ struct inodes_stat_t {
#define FS_IOC_GETVERSION _IOR('v', 1, long)
#define FS_IOC_SETVERSION _IOW('v', 2, long)
#define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
+#define FS_IOC_FSGETXATTR _IOR('f', 31, struct fsxattr)
+#define FS_IOC_FSSETXATTR _IOW('f', 32, struct fsxattr)
#define FS_IOC32_GETFLAGS _IOR('f', 1, int)
#define FS_IOC32_SETFLAGS _IOW('f', 2, int)
#define FS_IOC32_GETVERSION _IOR('v', 1, int)
--
1.7.1


2014-12-09 22:57:58

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Tue, Dec 09, 2014 at 01:22:27PM +0800, Li Xi wrote:
> This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
> support for ext4. The interface is kept consistent with
> XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.
>
> Signed-off-by: Li Xi <[email protected]>
> ---
> fs/ext4/ext4.h | 111 ++++++++++++++++
> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++++--------------
> fs/xfs/xfs_fs.h | 47 +++-----
> include/uapi/linux/fs.h | 58 ++++++++
> 4 files changed, 418 insertions(+), 128 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 136e18c..43a2a88 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -384,6 +384,115 @@ struct flex_groups {
> #define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
> #define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>
> +/* Transfer internal flags to xflags */
> +static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
> +{
> + __u32 xflags = 0;
> +
> + if (iflags & EXT4_SECRM_FL)
> + xflags |= FS_XFLAG_SECRM;
> + if (iflags & EXT4_UNRM_FL)
> + xflags |= FS_XFLAG_UNRM;
> + if (iflags & EXT4_COMPR_FL)
> + xflags |= FS_XFLAG_COMPR;
....

NACK.

> + if (iflags & EXT4_SYNC_FL)
> + xflags |= FS_XFLAG_SYNC;
> + if (iflags & EXT4_IMMUTABLE_FL)
> + xflags |= FS_XFLAG_IMMUTABLE;
> + if (iflags & EXT4_APPEND_FL)
> + xflags |= FS_XFLAG_APPEND;
> + if (iflags & EXT4_NODUMP_FL)
> + xflags |= FS_XFLAG_NODUMP;
> + if (iflags & EXT4_NOATIME_FL)
> + xflags |= FS_XFLAG_NOATIME;

These are OK because they already exist in the interface, but all
the ext4 specific flags you've added have no place in this patchset.


> +
> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> @@ -606,6 +715,8 @@ enum {
> #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
> #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
> #define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
> +#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
> +#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR

NACK. Userspace only needs FS_IOC_FS[GS]ETXATTR so that it works across
all filesystems. We need to retain XFS_IOC_FS[GS]ETXATTR so we
don't break existing userspace tools, but we do not need new
filesystem specific definitions anywhere.

> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index fcbf647..872fed5 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -58,6 +58,62 @@ struct inodes_stat_t {
> long dummy[5]; /* padding for sysctl ABI compatibility */
> };
>
> +/*
> + * Extend attribute flags. These should be or-ed together to figure out what
> + * is valid.
> + */
> +#define FSX_XFLAGS (1 << 0)
> +#define FSX_EXTSIZE (1 << 1)
> +#define FSX_NEXTENTS (1 << 2)
> +#define FSX_PROJID (1 << 3)

NACK.

I've said this more than once: these are *private to XFS's
implementation* and are not be part of the user interface. Do not
move them from their existing location.


> +
> +/*
> + * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
> + */
> +struct fsxattr {
> + __u32 fsx_xflags; /* xflags field value (get/set) */
> + __u32 fsx_extsize; /* extsize field value (get/set)*/
> + __u32 fsx_nextents; /* nextents field value (get) */
> + __u32 fsx_projid; /* project identifier (get/set) */
> + unsigned char fsx_pad[12];
> +};
> +
> +/*
> + * Flags for the fsx_xflags field
> + */
> +#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
> +#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
> +#define FS_XFLAG_SECRM 0x00000004 /* secure deletion */

NACK - ext4 specific.

> +#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
> +#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
> +#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
> +#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
> +#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
> +#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
> +#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
> +#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
> +#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
> +#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
> +#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
> +#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */

existing flags.

> +#define FS_XFLAG_UNRM 0x00008000 /* undelete */
> +#define FS_XFLAG_COMPR 0x00010000 /* compress file */
> +#define FS_XFLAG_COMPRBLK 0x00020000 /* one or more compressed clusters */
> +#define FS_XFLAG_NOCOMPR 0x00040000 /* don't compress */
> +#define FS_XFLAG_ECOMPR 0x00080000 /* compression error */
> +#define FS_XFLAG_INDEX 0x00100000 /* hash-indexed directory */
> +#define FS_XFLAG_IMAGIC 0x00200000 /* AFS directory */
> +#define FS_XFLAG_JOURNAL_DATA 0x00400000 /* file data should be journaled */
> +#define FS_XFLAG_NOTAIL 0x00800000 /* file tail should not be merged */
> +#define FS_XFLAG_DIRSYNC 0x01000000 /* dirsync behaviour (directories only) */
> +#define FS_XFLAG_TOPDIR 0x02000000 /* top of directory hierarchies*/
> +#define FS_XFLAG_HUGE_FILE 0x04000000 /* set to each huge file */
> +#define FS_XFLAG_EXTENTS 0x08000000 /* inode uses extents */
> +#define FS_XFLAG_EA_INODE 0x10000000 /* inode used for large EA */
> +#define FS_XFLAG_EOFBLOCKS 0x20000000 /* blocks allocated beyond EOF */
> +#define FS_XFLAG_INLINE_DATA 0x40000000 /* inode has inline data. */

And a bunch more ext4 specific flags that *uses all the remaining
flag space*. At minimum, we need to keep space in this existing
flags field for flags to future indication of how the padding is
used, so that's yet another NACK.

Further, none of these have any relevance to project quotas so
should not be a part of this patchset. Nor are they relevant to any
other filesystem, and many are duplicated by information you can get
from FIEMAP and other interfaces. NACK, again.

Because I'm getting annoyed at being repeatedly ignored about what
needs to be done here, I'm now going to shout a bit. DO NOT CHANGE
THE INTERFACE. DO NOT ADD any new flags to the interface. DO NOT
REDEFINE how the interface works. DO NOT "ext4-ise" the interface.

The only changes to the interface code should be moving the XFS
definitions and renaming them so as to provide the new generic
ioctl definition as well as the historic XFS definitions. The ext4
implementation needs to be done in a separate patch to the interface
rename, and it must only implement the functionality the interface
already provides. Anything else is outside the scope of this
patchset and requires separate discussion.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-01-06 20:01:25

by Andreas Dilger

[permalink] [raw]
Subject: Re: [v8 3/5] ext4: adds project quota support

On Dec 8, 2014, at 10:22 PM, Li Xi <[email protected]> wrote:
>
> This patch adds mount options for enabling/disabling project quota
> accounting and enforcement. A new specific inode is also used for
> project quota accounting.

Judy looking through these patches again and saw a minor issue:

> Signed-off-by: Li Xi <[email protected]>
> Signed-off-by: Dmitry Monakhov <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
> fs/ext4/ext4.h | 8 +++-
> fs/ext4/super.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 90 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 8bd1da9..136e18c 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -208,6 +208,7 @@ struct ext4_io_submit {
> #define EXT4_UNDEL_DIR_INO 6 /* Undelete directory inode */
> #define EXT4_RESIZE_INO 7 /* Reserved group descriptors inode */
> #define EXT4_JOURNAL_INO 8 /* Journal inode */
> +#define EXT4_PRJ_QUOTA_INO 9 /* Project quota inode */
>
> /* First non-reserved inode for old ext4 filesystems */
> #define EXT4_GOOD_OLD_FIRST_INO 11
> @@ -982,6 +983,7 @@ struct ext4_inode_info {
> #define EXT4_MOUNT_DIOREAD_NOLOCK 0x400000 /* Enable support for dio read nolocking */
> #define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */
> #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */
> +#define EXT4_MOUNT_PRJQUOTA 0x2000000 /* Project quota support */
> #define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */
> #define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */
> #define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */
> @@ -1157,7 +1159,8 @@ struct ext4_super_block {
> __le32 s_grp_quota_inum; /* inode for tracking group quota */
> __le32 s_overhead_clusters; /* overhead blocks/clusters in fs */
> __le32 s_backup_bgs[2]; /* groups with sparse_super2 SBs */
> - __le32 s_reserved[106]; /* Padding to the end of the block */
> + __le32 s_prj_quota_inum; /* inode for tracking project quota */
> + __le32 s_reserved[105]; /* Padding to the end of the block */
> __le32 s_checksum; /* crc32c(superblock) */
> };
>
> @@ -1172,7 +1175,7 @@ struct ext4_super_block {
> #define EXT4_MF_FS_ABORTED 0x0002 /* Fatal error detected */
>
> /* Number of quota types we support */
> -#define EXT4_MAXQUOTAS 2
> +#define EXT4_MAXQUOTAS 3
>
> /*
> * fourth extended-fs super-block data in memory
> @@ -1364,6 +1367,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
> ino == EXT4_BOOT_LOADER_INO ||
> ino == EXT4_JOURNAL_INO ||
> ino == EXT4_RESIZE_INO ||
> + ino == EXT4_PRJ_QUOTA_INO ||
> (ino >= EXT4_FIRST_INO(sb) &&
> ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
> }
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 6b67795..f5d8ca2 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1035,8 +1035,8 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
> }
>
> #ifdef CONFIG_QUOTA
> -#define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
> -#define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
> +static char *quotatypes[] = INITQFNAMES;
> +#define QTYPE2NAME(t) (quotatypes[t])
>
> static int ext4_write_dquot(struct dquot *dquot);
> static int ext4_acquire_dquot(struct dquot *dquot);
> @@ -1128,10 +1128,11 @@ enum {
> Opt_journal_path, Opt_journal_checksum, Opt_journal_async_commit,
> Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
> Opt_data_err_abort, Opt_data_err_ignore,
> - Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
> + Opt_usrjquota, Opt_grpjquota, Opt_prjjquota,
> + Opt_offusrjquota, Opt_offgrpjquota, Opt_offprjjquota,
> Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
> Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> - Opt_usrquota, Opt_grpquota, Opt_i_version,
> + Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_mblk_io_submit,
> Opt_nomblk_io_submit, Opt_block_validity, Opt_noblock_validity,
> Opt_inode_readahead_blks, Opt_journal_ioprio,
> @@ -1183,6 +1184,8 @@ static const match_table_t tokens = {
> {Opt_usrjquota, "usrjquota=%s"},
> {Opt_offgrpjquota, "grpjquota="},
> {Opt_grpjquota, "grpjquota=%s"},
> + {Opt_offprjjquota, "prjjquota="},
> + {Opt_prjjquota, "prjjquota=%s"},

We don't need to be able to specify the project quota inodes on the
mount command line. This was only needed for supporting old (non-
journaled) user and group quotas, but there is no chance of having
old project quotas.

> {Opt_jqfmt_vfsold, "jqfmt=vfsold"},
> {Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
> {Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
> @@ -1190,6 +1193,7 @@ static const match_table_t tokens = {
> {Opt_noquota, "noquota"},
> {Opt_quota, "quota"},
> {Opt_usrquota, "usrquota"},
> + {Opt_prjquota, "prjquota"},

Do we want to allow non-journaled project quota at all? I don't
see that as a benefit, and it was only allowed for user/group
quota for compatibility reasons. Since the project quota support
will need an updated e2fsprogs anyway, it may as well be journaled
from the start.

The rest of the patch looks good.

Cheers, Andreas

> {Opt_barrier, "barrier=%u"},
> {Opt_barrier, "barrier"},
> {Opt_nobarrier, "nobarrier"},
> @@ -1404,12 +1408,17 @@ static const struct mount_opts {
> MOPT_SET | MOPT_Q},
> {Opt_grpquota, EXT4_MOUNT_QUOTA | EXT4_MOUNT_GRPQUOTA,
> MOPT_SET | MOPT_Q},
> + {Opt_prjquota, EXT4_MOUNT_QUOTA | EXT4_MOUNT_PRJQUOTA,
> + MOPT_SET | MOPT_Q},
> {Opt_noquota, (EXT4_MOUNT_QUOTA | EXT4_MOUNT_USRQUOTA |
> - EXT4_MOUNT_GRPQUOTA), MOPT_CLEAR | MOPT_Q},
> + EXT4_MOUNT_GRPQUOTA | EXT4_MOUNT_PRJQUOTA),
> + MOPT_CLEAR | MOPT_Q},
> {Opt_usrjquota, 0, MOPT_Q},
> {Opt_grpjquota, 0, MOPT_Q},
> + {Opt_prjjquota, 0, MOPT_Q},
> {Opt_offusrjquota, 0, MOPT_Q},
> {Opt_offgrpjquota, 0, MOPT_Q},
> + {Opt_offprjjquota, 0, MOPT_Q},
> {Opt_jqfmt_vfsold, QFMT_VFS_OLD, MOPT_QFMT},
> {Opt_jqfmt_vfsv0, QFMT_VFS_V0, MOPT_QFMT},
> {Opt_jqfmt_vfsv1, QFMT_VFS_V1, MOPT_QFMT},
> @@ -1432,10 +1441,14 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
> return set_qf_name(sb, USRQUOTA, &args[0]);
> else if (token == Opt_grpjquota)
> return set_qf_name(sb, GRPQUOTA, &args[0]);
> + else if (token == Opt_prjjquota)
> + return set_qf_name(sb, PRJQUOTA, &args[0]);
> else if (token == Opt_offusrjquota)
> return clear_qf_name(sb, USRQUOTA);
> else if (token == Opt_offgrpjquota)
> return clear_qf_name(sb, GRPQUOTA);
> + else if (token == Opt_offprjjquota)
> + return clear_qf_name(sb, PRJQUOTA);
> #endif
> switch (token) {
> case Opt_noacl:
> @@ -1661,19 +1674,28 @@ static int parse_options(char *options, struct super_block *sb,
> }
> #ifdef CONFIG_QUOTA
> if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA) &&
> - (test_opt(sb, USRQUOTA) || test_opt(sb, GRPQUOTA))) {
> + (test_opt(sb, USRQUOTA) ||
> + test_opt(sb, GRPQUOTA) ||
> + test_opt(sb, PRJQUOTA))) {
> ext4_msg(sb, KERN_ERR, "Cannot set quota options when QUOTA "
> "feature is enabled");
> return 0;
> }
> - if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
> + if (sbi->s_qf_names[USRQUOTA] ||
> + sbi->s_qf_names[GRPQUOTA] ||
> + sbi->s_qf_names[PRJQUOTA]) {
> if (test_opt(sb, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
> clear_opt(sb, USRQUOTA);
>
> if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
> clear_opt(sb, GRPQUOTA);
>
> - if (test_opt(sb, GRPQUOTA) || test_opt(sb, USRQUOTA)) {
> + if (test_opt(sb, PRJQUOTA) && sbi->s_qf_names[PRJQUOTA])
> + clear_opt(sb, PRJQUOTA);
> +
> + if (test_opt(sb, GRPQUOTA) ||
> + test_opt(sb, USRQUOTA) ||
> + test_opt(sb, PRJQUOTA)) {
> ext4_msg(sb, KERN_ERR, "old and new quota "
> "format mixing");
> return 0;
> @@ -1733,6 +1755,9 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
>
> if (sbi->s_qf_names[GRPQUOTA])
> seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
> +
> + if (sbi->s_qf_names[PRJQUOTA])
> + seq_printf(seq, ",prjjquota=%s", sbi->s_qf_names[PRJQUOTA]);
> #endif
> }
>
> @@ -5037,6 +5062,46 @@ restore_opts:
> return err;
> }
>
> +static int ext4_statfs_project(struct super_block *sb,
> + kprojid_t projid, struct kstatfs *buf)
> +{
> + struct kqid qid;
> + struct dquot *dquot;
> + u64 limit;
> + u64 curblock;
> +
> + qid = make_kqid_projid(projid);
> + dquot = dqget(sb, qid);
> + if (!dquot)
> + return -ESRCH;
> + spin_lock(&dq_data_lock);
> +
> + limit = dquot->dq_dqb.dqb_bsoftlimit ?
> + dquot->dq_dqb.dqb_bsoftlimit :
> + dquot->dq_dqb.dqb_bhardlimit;
> + if (limit && buf->f_blocks * buf->f_bsize > limit) {
> + curblock = dquot->dq_dqb.dqb_curspace / buf->f_bsize;
> + buf->f_blocks = limit / buf->f_bsize;
> + buf->f_bfree = buf->f_bavail =
> + (buf->f_blocks > curblock) ?
> + (buf->f_blocks - curblock) : 0;
> + }
> +
> + limit = dquot->dq_dqb.dqb_isoftlimit ?
> + dquot->dq_dqb.dqb_isoftlimit :
> + dquot->dq_dqb.dqb_ihardlimit;
> + if (limit && buf->f_files > limit) {
> + buf->f_files = limit;
> + buf->f_ffree =
> + (buf->f_files > dquot->dq_dqb.dqb_curinodes) ?
> + (buf->f_files - dquot->dq_dqb.dqb_curinodes) : 0;
> + }
> +
> + spin_unlock(&dq_data_lock);
> + dqput(dquot);
> + return 0;
> +}
> +
> static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
> {
> struct super_block *sb = dentry->d_sb;
> @@ -5045,6 +5110,7 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
> ext4_fsblk_t overhead = 0, resv_blocks;
> u64 fsid;
> s64 bfree;
> + struct inode *inode = dentry->d_inode;
> resv_blocks = EXT4_C2B(sbi, atomic64_read(&sbi->s_resv_clusters));
>
> if (!test_opt(sb, MINIX_DF))
> @@ -5069,6 +5135,9 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
> buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL;
> buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL;
>
> + if (ext4_test_inode_flag(inode, EXT4_INODE_PROJINHERIT) &&
> + sb_has_quota_limits_enabled(sb, PRJQUOTA))
> + ext4_statfs_project(sb, EXT4_I(inode)->i_projid, buf);
> return 0;
> }
>
> @@ -5149,7 +5218,9 @@ static int ext4_mark_dquot_dirty(struct dquot *dquot)
>
> /* Are we journaling quotas? */
> if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA) ||
> - sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
> + sbi->s_qf_names[USRQUOTA] ||
> + sbi->s_qf_names[GRPQUOTA] ||
> + sbi->s_qf_names[PRJQUOTA]) {
> dquot_mark_dquot_dirty(dquot);
> return ext4_write_dquot(dquot);
> } else {
> @@ -5233,7 +5304,8 @@ static int ext4_quota_enable(struct super_block *sb, int type, int format_id,
> struct inode *qf_inode;
> unsigned long qf_inums[EXT4_MAXQUOTAS] = {
> le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
> - le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
> + le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
> + le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
> };
>
> BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA));
> @@ -5261,7 +5333,8 @@ static int ext4_enable_quotas(struct super_block *sb)
> int type, err = 0;
> unsigned long qf_inums[EXT4_MAXQUOTAS] = {
> le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
> - le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
> + le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
> + le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
> };
>
> sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
> --
> 1.7.1
>


Cheers, Andreas






2015-01-06 21:52:56

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 3/5] ext4: adds project quota support

On Tue 06-01-15 13:01:21, Andreas Dilger wrote:
> On Dec 8, 2014, at 10:22 PM, Li Xi <[email protected]> wrote:
> >
> > This patch adds mount options for enabling/disabling project quota
> > accounting and enforcement. A new specific inode is also used for
> > project quota accounting.
>
> Judy looking through these patches again and saw a minor issue:
>
> > @@ -1128,10 +1128,11 @@ enum {
> > Opt_journal_path, Opt_journal_checksum, Opt_journal_async_commit,
> > Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
> > Opt_data_err_abort, Opt_data_err_ignore,
> > - Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
> > + Opt_usrjquota, Opt_grpjquota, Opt_prjjquota,
> > + Opt_offusrjquota, Opt_offgrpjquota, Opt_offprjjquota,
> > Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
> > Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > - Opt_usrquota, Opt_grpquota, Opt_i_version,
> > + Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> > Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_mblk_io_submit,
> > Opt_nomblk_io_submit, Opt_block_validity, Opt_noblock_validity,
> > Opt_inode_readahead_blks, Opt_journal_ioprio,
> > @@ -1183,6 +1184,8 @@ static const match_table_t tokens = {
> > {Opt_usrjquota, "usrjquota=%s"},
> > {Opt_offgrpjquota, "grpjquota="},
> > {Opt_grpjquota, "grpjquota=%s"},
> > + {Opt_offprjjquota, "prjjquota="},
> > + {Opt_prjjquota, "prjjquota=%s"},
>
> We don't need to be able to specify the project quota inodes on the
> mount command line. This was only needed for supporting old (non-
> journaled) user and group quotas, but there is no chance of having
> old project quotas.
>
> > {Opt_jqfmt_vfsold, "jqfmt=vfsold"},
> > {Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
> > {Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
> > @@ -1190,6 +1193,7 @@ static const match_table_t tokens = {
> > {Opt_noquota, "noquota"},
> > {Opt_quota, "quota"},
> > {Opt_usrquota, "usrquota"},
> > + {Opt_prjquota, "prjquota"},
>
> Do we want to allow non-journaled project quota at all? I don't
> see that as a benefit, and it was only allowed for user/group
> quota for compatibility reasons. Since the project quota support
> will need an updated e2fsprogs anyway, it may as well be journaled
> from the start.
Yeah, probably you are right that it doesn't make a whole lot of sense to
support project quotas in user visible files. So these could be removed.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-01-07 23:11:16

by Andreas Dilger

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Dec 8, 2014, at 10:22 PM, Li Xi <[email protected]> wrote:
> This patch adds a new internal field of ext4 inode to save project
> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> inheriting project ID from parent directory.
>
> Signed-off-by: Li Xi <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
> fs/ext4/ext4.h | 21 +++++++++++++++++----
> fs/ext4/ialloc.c | 6 ++++++
> fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
> fs/ext4/namei.c | 17 +++++++++++++++++
> fs/ext4/super.c | 1 +
> include/uapi/linux/fs.h | 1 +
> 6 files changed, 71 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 29c43e7..8bd1da9 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -377,16 +377,18 @@ struct flex_groups {
> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */

Sigh, only a single on-disk inode flag unused.

> #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
>
> -#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
> -#define EXT4_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */
> +#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
> +#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>
> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
> - EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
> + EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
> + EXT4_PROJINHERIT_FL)
>
> /* Flags that are appropriate for regular files (all but dir-specific ones). */
> #define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
> @@ -434,6 +436,7 @@ enum {
> EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
> EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
> EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
> + EXT4_INODE_PROJINHERIT = 29, /* Create with parents projid */
> EXT4_INODE_RESERVED = 31, /* reserved for ext4 lib */
> };
>
> @@ -683,6 +686,7 @@ struct ext4_inode {
> __le32 i_crtime; /* File Creation time */
> __le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
> __le32 i_version_hi; /* high 32 bits for 64-bit version */
> + __le32 i_projid; /* Project ID */
> };
>
> struct move_extent {
> @@ -934,6 +938,7 @@ struct ext4_inode_info {
>
> /* Precomputed uuid+inum+igen checksum for seeding inode checksums */
> __u32 i_csum_seed;
> + kprojid_t i_projid;
> };
>
> /*
> @@ -1518,6 +1523,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> * GDT_CSUM bits are mutually exclusive.
> */
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> +#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x1000 /* Project quota */

I wonder if it makes sense to add EXT4_FEATURE_RO_COMPAT_METADATA_CSUM
here as well, to make it clear why this is skipping a value?

> #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
> #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
> @@ -1567,7 +1573,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
> EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> - EXT4_FEATURE_RO_COMPAT_QUOTA)
> + EXT4_FEATURE_RO_COMPAT_QUOTA |\
> + EXT4_FEATURE_RO_COMPAT_PROJECT)
>
> /*
> * Default values for user and/or group using reserved blocks
> @@ -1575,6 +1582,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> #define EXT4_DEF_RESUID 0
> #define EXT4_DEF_RESGID 0
>
> +/*
> + * Default project ID
> + */
> +#define EXT4_DEF_PROJID 0
> +
> #define EXT4_DEF_INODE_READAHEAD_BLKS 32
>
> /*
> @@ -2131,6 +2143,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
> loff_t lstart, loff_t lend);
> extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
> extern qsize_t *ext4_get_reserved_space(struct inode *inode);
> +extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
> extern void ext4_da_update_reserve_space(struct inode *inode,
> int used, int quota_claim);
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index ac644c3..fefb948 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -756,6 +756,12 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
> inode->i_gid = dir->i_gid;
> } else
> inode_init_owner(inode, dir, mode);
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
> + ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) {
> + ei->i_projid = EXT4_I(dir)->i_projid;
> + } else {
> + ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
> + }

(style) no need for { } if only a single line in both if/else parts.

> dquot_initialize(inode);
>
> if (!goal)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5653fa4..29204d4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3863,6 +3863,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
> EXT4_I(inode)->i_inline_off = 0;
> }
>
> +int ext4_get_projid(struct inode *inode, kprojid_t *projid)
> +{
> + if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
> + return -EOPNOTSUPP;
> + *projid = EXT4_I(inode)->i_projid;
> + return 0;
> +}
> +
> struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> {
> struct ext4_iloc iloc;
> @@ -3874,6 +3882,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> int block;
> uid_t i_uid;
> gid_t i_gid;
> + projid_t i_projid;
>
> inode = iget_locked(sb, ino);
> if (!inode)
> @@ -3923,12 +3932,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
> i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
> + i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
> + else
> + i_projid = EXT4_DEF_PROJID;
> +
> if (!(test_opt(inode->i_sb, NO_UID32))) {
> i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
> i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
> }
> i_uid_write(inode, i_uid);
> i_gid_write(inode, i_gid);
> + ei->i_projid = make_kprojid(&init_user_ns, i_projid);;
> set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
>
> ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
> @@ -4158,6 +4173,7 @@ static int ext4_do_update_inode(handle_t *handle,
> int need_datasync = 0, set_large_file = 0;
> uid_t i_uid;
> gid_t i_gid;
> + projid_t i_projid;
>
> spin_lock(&ei->i_raw_lock);
>
> @@ -4170,6 +4186,7 @@ static int ext4_do_update_inode(handle_t *handle,
> raw_inode->i_mode = cpu_to_le16(inode->i_mode);
> i_uid = i_uid_read(inode);
> i_gid = i_gid_read(inode);
> + i_projid = from_kprojid(&init_user_ns, ei->i_projid);
> if (!(test_opt(inode->i_sb, NO_UID32))) {
> raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
> raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
> @@ -4249,6 +4266,18 @@ static int ext4_do_update_inode(handle_t *handle,
> }
> }
>
> + BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
> + EXT4_FEATURE_RO_COMPAT_PROJECT) &&
> + i_projid != EXT4_DEF_PROJID);
> + if (i_projid != EXT4_DEF_PROJID &&
> + (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
> + (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
> + spin_unlock(&ei->i_raw_lock);
> + err = -EFBIG;

I'm not sure if -EFBIG is the best error case, since that is a common
filesystem error. Maybe -EOVERFLOW would be better?

Also, returning the error from ext4_mark_iloc_dirty->ext4_do_update_inode()
is a bit late in the game, since the inode has already been modified at
this point. That callpath typically only returned an error if the disk
was bad or the journal aborted (again normally because the disk was bad),
so at that point the dirty in-memory data was lost anyway.

It would be better to check this in the caller before the inode is changed
so that an error can be returned without (essentially) corrupting the
in-memory state. Since the projid should only be set for new inodes (which
will always have enough space, assuming RO_COMPAT_PROJECT cannot be set on
filesystems with 128-byte inodes), or in case of rename into a project
directory, it shouldn't be too big a change.

> + goto out_brelse;
> + }
> + raw_inode->i_projid = cpu_to_le32(i_projid);
> +
> ext4_inode_csum_set(inode, raw_inode, ei);
>
> spin_unlock(&ei->i_raw_lock);
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 2291923..09e8e55 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2938,6 +2938,11 @@ static int ext4_link(struct dentry *old_dentry,
> if (inode->i_nlink >= EXT4_LINK_MAX)
> return -EMLINK;
>
> + if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
> + (!projid_eq(EXT4_I(dir)->i_projid,
> + EXT4_I(old_dentry->d_inode)->i_projid)))
> + return -EXDEV;
> +
> dquot_initialize(dir);
>
> retry:
> @@ -3217,6 +3222,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> int credits;
> u8 old_file_type;
>
> + if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
> + (!projid_eq(EXT4_I(new_dir)->i_projid,
> + EXT4_I(old_dentry->d_inode)->i_projid)))
> + return -EXDEV;
> +
> dquot_initialize(old.dir);
> dquot_initialize(new.dir);
>
> @@ -3395,6 +3405,13 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
> u8 new_file_type;
> int retval;
>
> + if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
> + ((!projid_eq(EXT4_I(new_dir)->i_projid,
> + EXT4_I(old_dentry->d_inode)->i_projid)) ||
> + (!projid_eq(EXT4_I(old_dir)->i_projid,
> + EXT4_I(new_dentry->d_inode)->i_projid))))
> + return -EXDEV;
> +
> dquot_initialize(old.dir);
> dquot_initialize(new.dir);
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 4fca81c..6b67795 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1067,6 +1067,7 @@ static const struct dquot_operations ext4_quota_operations = {
> .write_info = ext4_write_info,
> .alloc_dquot = dquot_alloc,
> .destroy_dquot = dquot_destroy,
> + .get_projid = ext4_get_projid,
> };
>
> static const struct quotactl_ops ext4_qctl_operations = {
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 3735fa0..fcbf647 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -195,6 +195,7 @@ struct inodes_stat_t {
> #define FS_EXTENT_FL 0x00080000 /* Extents */
> #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
> #define FS_NOCOW_FL 0x00800000 /* Do not cow file */
> +#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
>
> #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
> --
> 1.7.1
>


Cheers, Andreas






2015-01-08 08:26:25

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Tue 09-12-14 13:22:25, Li Xi wrote:
> This patch adds a new internal field of ext4 inode to save project
> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> inheriting project ID from parent directory.
I have noticed one thing you apparently changed in v7 of the patch set.
See below.

> Signed-off-by: Li Xi <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
> fs/ext4/ext4.h | 21 +++++++++++++++++----
> fs/ext4/ialloc.c | 6 ++++++
> fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
> fs/ext4/namei.c | 17 +++++++++++++++++
> fs/ext4/super.c | 1 +
> include/uapi/linux/fs.h | 1 +
> 6 files changed, 71 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 29c43e7..8bd1da9 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -377,16 +377,18 @@ struct flex_groups {
> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
How did FS_PROJINHERIT_FL get here? There used to be 0x20000000 in older
version of the patch set which is correct - this definition is defining
ext4 on-disk format. As such it is an ext4 specific flag and should be
definined to a fixed constant independed of any other filesystem. It seems
you are somewhat mixing what is an on-disk format flag value and what is a
flag value passed from userspace. These two may be different things and you
need to convert between the values when getting / setting flags...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-01-08 08:51:11

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Wed 07-01-15 16:11:16, Andreas Dilger wrote:
> On Dec 8, 2014, at 10:22 PM, Li Xi <[email protected]> wrote:
> > This patch adds a new internal field of ext4 inode to save project
> > identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> > inheriting project ID from parent directory.
> >
> > Signed-off-by: Li Xi <[email protected]>
> > Reviewed-by: Jan Kara <[email protected]>
> > ---
...
> > @@ -4249,6 +4266,18 @@ static int ext4_do_update_inode(handle_t *handle,
> > }
> > }
> >
> > + BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
> > + EXT4_FEATURE_RO_COMPAT_PROJECT) &&
> > + i_projid != EXT4_DEF_PROJID);
> > + if (i_projid != EXT4_DEF_PROJID &&
> > + (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
> > + (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
> > + spin_unlock(&ei->i_raw_lock);
> > + err = -EFBIG;
>
> I'm not sure if -EFBIG is the best error case, since that is a common
> filesystem error. Maybe -EOVERFLOW would be better?
So my thought has been that this is mostly a sanity check and we would
check in tune2fs whether all inodes have enough space when setting
EXT4_FEATURE_RO_COMPAT_PROJECT feature... Because sudden errors (regardless
whether it will be EFBIG or EOVERFLOW) when renaming files will be hard to
understand for sysadmins IMHO.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-01-08 22:20:24

by Andreas Dilger

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Jan 8, 2015, at 1:26 AM, Jan Kara <[email protected]> wrote:
> On Tue 09-12-14 13:22:25, Li Xi wrote:
>> This patch adds a new internal field of ext4 inode to save project
>> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
>> inheriting project ID from parent directory.
> I have noticed one thing you apparently changed in v7 of the patch set.
> See below.
>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 29c43e7..8bd1da9 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -377,16 +377,18 @@ struct flex_groups {
>> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
>> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
>> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
>> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
> How did FS_PROJINHERIT_FL get here? There used to be 0x20000000 in older
> version of the patch set which is correct - this definition is defining
> ext4 on-disk format. As such it is an ext4 specific flag and should be
> definined to a fixed constant independed of any other filesystem. It seems
> you are somewhat mixing what is an on-disk format flag value and what is a
> flag value passed from userspace. These two may be different things and
> you need to convert between the values when getting / setting flags...

Currently the EXT4_*_FL and FS_*_FL values are all identical, and there
is no reason to change that before it is actually needed. Since the
FS_PROJINHERIT_FL is used via chattr/lsattr from userspace, this value
must also be kept the same in the future to avoid API breakage, so there
is no reason to worry about incompatibilities.

See also the [v8 5/5] patch, which is changing the EXT4_*_FL values to
use FS_*_FL constants, where applicable, so that it is more clear that
these values need to be the same.

Cheers, Andreas






2015-01-09 09:48:01

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Thu 08-01-15 15:20:21, Andreas Dilger wrote:
> On Jan 8, 2015, at 1:26 AM, Jan Kara <[email protected]> wrote:
> > On Tue 09-12-14 13:22:25, Li Xi wrote:
> >> This patch adds a new internal field of ext4 inode to save project
> >> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> >> inheriting project ID from parent directory.
> > I have noticed one thing you apparently changed in v7 of the patch set.
> > See below.
> >
> >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >> index 29c43e7..8bd1da9 100644
> >> --- a/fs/ext4/ext4.h
> >> +++ b/fs/ext4/ext4.h
> >> @@ -377,16 +377,18 @@ struct flex_groups {
> >> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> >> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> >> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> >> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
> > How did FS_PROJINHERIT_FL get here? There used to be 0x20000000 in older
> > version of the patch set which is correct - this definition is defining
> > ext4 on-disk format. As such it is an ext4 specific flag and should be
> > definined to a fixed constant independed of any other filesystem. It seems
> > you are somewhat mixing what is an on-disk format flag value and what is a
> > flag value passed from userspace. These two may be different things and
> > you need to convert between the values when getting / setting flags...
>
> Currently the EXT4_*_FL and FS_*_FL values are all identical, and there
> is no reason to change that before it is actually needed. Since the
> FS_PROJINHERIT_FL is used via chattr/lsattr from userspace, this value
> must also be kept the same in the future to avoid API breakage, so there
> is no reason to worry about incompatibilities.
Agreed. I was somewhat worried about having on-disk flag defined through
the external non-ext4 define but you are right that neither can really
change once we ship a kernel with it.

> See also the [v8 5/5] patch, which is changing the EXT4_*_FL values to
> use FS_*_FL constants, where applicable, so that it is more clear that
> these values need to be the same.
OK, I've missed that. So if things will be consistent again, I'm fine
with the change.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-01-09 23:46:27

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Fri, Jan 09, 2015 at 10:47:58AM +0100, Jan Kara wrote:
> On Thu 08-01-15 15:20:21, Andreas Dilger wrote:
> > On Jan 8, 2015, at 1:26 AM, Jan Kara <[email protected]> wrote:
> > > On Tue 09-12-14 13:22:25, Li Xi wrote:
> > >> This patch adds a new internal field of ext4 inode to save project
> > >> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> > >> inheriting project ID from parent directory.
> > > I have noticed one thing you apparently changed in v7 of the patch set.
> > > See below.
> > >
> > >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > >> index 29c43e7..8bd1da9 100644
> > >> --- a/fs/ext4/ext4.h
> > >> +++ b/fs/ext4/ext4.h
> > >> @@ -377,16 +377,18 @@ struct flex_groups {
> > >> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> > >> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> > >> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> > >> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
> > > How did FS_PROJINHERIT_FL get here? There used to be 0x20000000 in older
> > > version of the patch set which is correct - this definition is defining
> > > ext4 on-disk format. As such it is an ext4 specific flag and should be
> > > definined to a fixed constant independed of any other filesystem. It seems
> > > you are somewhat mixing what is an on-disk format flag value and what is a
> > > flag value passed from userspace. These two may be different things and
> > > you need to convert between the values when getting / setting flags...
> >
> > Currently the EXT4_*_FL and FS_*_FL values are all identical, and there
> > is no reason to change that before it is actually needed. Since the
> > FS_PROJINHERIT_FL is used via chattr/lsattr from userspace, this value
> > must also be kept the same in the future to avoid API breakage, so there
> > is no reason to worry about incompatibilities.
> Agreed. I was somewhat worried about having on-disk flag defined through
> the external non-ext4 define but you are right that neither can really
> change once we ship a kernel with it.
>
> > See also the [v8 5/5] patch, which is changing the EXT4_*_FL values to
> > use FS_*_FL constants, where applicable, so that it is more clear that
> > these values need to be the same.
> OK, I've missed that. So if things will be consistent again, I'm fine
> with the change.

Except that I NACK'd that change (i.e patch 4/5) because it's out of
scope of a "support project quota" patchset. not to mention that it
is broken because it exhausts the flags space with ext4 specific
flags and prevents future expansion of the ioctl structure.

Any extension to the ioctl needs to be done in a spearate patch set,
with separate justification. This patch set should only implement
the very minimum needed to use the project quota ioctl flags....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-01-12 17:01:22

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Sat 10-01-15 10:46:27, Dave Chinner wrote:
> On Fri, Jan 09, 2015 at 10:47:58AM +0100, Jan Kara wrote:
> > On Thu 08-01-15 15:20:21, Andreas Dilger wrote:
> > > On Jan 8, 2015, at 1:26 AM, Jan Kara <[email protected]> wrote:
> > > > On Tue 09-12-14 13:22:25, Li Xi wrote:
> > > >> This patch adds a new internal field of ext4 inode to save project
> > > >> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> > > >> inheriting project ID from parent directory.
> > > > I have noticed one thing you apparently changed in v7 of the patch set.
> > > > See below.
> > > >
> > > >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > >> index 29c43e7..8bd1da9 100644
> > > >> --- a/fs/ext4/ext4.h
> > > >> +++ b/fs/ext4/ext4.h
> > > >> @@ -377,16 +377,18 @@ struct flex_groups {
> > > >> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> > > >> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> > > >> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> > > >> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
> > > > How did FS_PROJINHERIT_FL get here? There used to be 0x20000000 in older
> > > > version of the patch set which is correct - this definition is defining
> > > > ext4 on-disk format. As such it is an ext4 specific flag and should be
> > > > definined to a fixed constant independed of any other filesystem. It seems
> > > > you are somewhat mixing what is an on-disk format flag value and what is a
> > > > flag value passed from userspace. These two may be different things and
> > > > you need to convert between the values when getting / setting flags...
> > >
> > > Currently the EXT4_*_FL and FS_*_FL values are all identical, and there
> > > is no reason to change that before it is actually needed. Since the
> > > FS_PROJINHERIT_FL is used via chattr/lsattr from userspace, this value
> > > must also be kept the same in the future to avoid API breakage, so there
> > > is no reason to worry about incompatibilities.
> > Agreed. I was somewhat worried about having on-disk flag defined through
> > the external non-ext4 define but you are right that neither can really
> > change once we ship a kernel with it.
> >
> > > See also the [v8 5/5] patch, which is changing the EXT4_*_FL values to
> > > use FS_*_FL constants, where applicable, so that it is more clear that
> > > these values need to be the same.
> > OK, I've missed that. So if things will be consistent again, I'm fine
> > with the change.
>
> Except that I NACK'd that change (i.e patch 4/5) because it's out of
> scope of a "support project quota" patchset. not to mention that it
> is broken because it exhausts the flags space with ext4 specific
> flags and prevents future expansion of the ioctl structure.
I agree with your objections from that review (which is why I didn't
reply to that email since I didn't have more to say).

> Any extension to the ioctl needs to be done in a spearate patch set,
> with separate justification. This patch set should only implement
> the very minimum needed to use the project quota ioctl flags....
Agreed. I was just saying that I have nothing against defining ext4 flag
values using FS_*_FL where possible.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-01-15 07:52:47

by Li Xi

[permalink] [raw]
Subject: Re: [v8 2/5] ext4: adds project ID support

On Thu, Jan 8, 2015 at 7:11 AM, Andreas Dilger <[email protected]> wrote:
> On Dec 8, 2014, at 10:22 PM, Li Xi <[email protected]> wrote:
>> This patch adds a new internal field of ext4 inode to save project
>> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
>> inheriting project ID from parent directory.
>>
>> Signed-off-by: Li Xi <[email protected]>
>> Reviewed-by: Jan Kara <[email protected]>
>> ---
>> fs/ext4/ext4.h | 21 +++++++++++++++++----
>> fs/ext4/ialloc.c | 6 ++++++
>> fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
>> fs/ext4/namei.c | 17 +++++++++++++++++
>> fs/ext4/super.c | 1 +
>> include/uapi/linux/fs.h | 1 +
>> 6 files changed, 71 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 29c43e7..8bd1da9 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -377,16 +377,18 @@ struct flex_groups {
>> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
>> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
>> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
>> +#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
>
> Sigh, only a single on-disk inode flag unused.
>
>> #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
>>
>> -#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
>> -#define EXT4_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */
>> +#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
>> +#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>>
>> /* Flags that should be inherited by new inodes from their parent. */
>> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
>> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
>> EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
>> - EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
>> + EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
>> + EXT4_PROJINHERIT_FL)
>>
>> /* Flags that are appropriate for regular files (all but dir-specific ones). */
>> #define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
>> @@ -434,6 +436,7 @@ enum {
>> EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
>> EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
>> EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
>> + EXT4_INODE_PROJINHERIT = 29, /* Create with parents projid */
>> EXT4_INODE_RESERVED = 31, /* reserved for ext4 lib */
>> };
>>
>> @@ -683,6 +686,7 @@ struct ext4_inode {
>> __le32 i_crtime; /* File Creation time */
>> __le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
>> __le32 i_version_hi; /* high 32 bits for 64-bit version */
>> + __le32 i_projid; /* Project ID */
>> };
>>
>> struct move_extent {
>> @@ -934,6 +938,7 @@ struct ext4_inode_info {
>>
>> /* Precomputed uuid+inum+igen checksum for seeding inode checksums */
>> __u32 i_csum_seed;
>> + kprojid_t i_projid;
>> };
>>
>> /*
>> @@ -1518,6 +1523,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> * GDT_CSUM bits are mutually exclusive.
>> */
>> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
>> +#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x1000 /* Project quota */
>
> I wonder if it makes sense to add EXT4_FEATURE_RO_COMPAT_METADATA_CSUM
> here as well, to make it clear why this is skipping a value?
>
>> #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
>> #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
>> @@ -1567,7 +1573,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
>> EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
>> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
>> - EXT4_FEATURE_RO_COMPAT_QUOTA)
>> + EXT4_FEATURE_RO_COMPAT_QUOTA |\
>> + EXT4_FEATURE_RO_COMPAT_PROJECT)
>>
>> /*
>> * Default values for user and/or group using reserved blocks
>> @@ -1575,6 +1582,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> #define EXT4_DEF_RESUID 0
>> #define EXT4_DEF_RESGID 0
>>
>> +/*
>> + * Default project ID
>> + */
>> +#define EXT4_DEF_PROJID 0
>> +
>> #define EXT4_DEF_INODE_READAHEAD_BLKS 32
>>
>> /*
>> @@ -2131,6 +2143,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
>> loff_t lstart, loff_t lend);
>> extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
>> extern qsize_t *ext4_get_reserved_space(struct inode *inode);
>> +extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
>> extern void ext4_da_update_reserve_space(struct inode *inode,
>> int used, int quota_claim);
>>
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index ac644c3..fefb948 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -756,6 +756,12 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>> inode->i_gid = dir->i_gid;
>> } else
>> inode_init_owner(inode, dir, mode);
>> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
>> + ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) {
>> + ei->i_projid = EXT4_I(dir)->i_projid;
>> + } else {
>> + ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
>> + }
>
> (style) no need for { } if only a single line in both if/else parts.
>
>> dquot_initialize(inode);
>>
>> if (!goal)
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 5653fa4..29204d4 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3863,6 +3863,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
>> EXT4_I(inode)->i_inline_off = 0;
>> }
>>
>> +int ext4_get_projid(struct inode *inode, kprojid_t *projid)
>> +{
>> + if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
>> + return -EOPNOTSUPP;
>> + *projid = EXT4_I(inode)->i_projid;
>> + return 0;
>> +}
>> +
>> struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
>> {
>> struct ext4_iloc iloc;
>> @@ -3874,6 +3882,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
>> int block;
>> uid_t i_uid;
>> gid_t i_gid;
>> + projid_t i_projid;
>>
>> inode = iget_locked(sb, ino);
>> if (!inode)
>> @@ -3923,12 +3932,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
>> inode->i_mode = le16_to_cpu(raw_inode->i_mode);
>> i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
>> i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
>> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
>> + i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
>> + else
>> + i_projid = EXT4_DEF_PROJID;
>> +
>> if (!(test_opt(inode->i_sb, NO_UID32))) {
>> i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
>> i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
>> }
>> i_uid_write(inode, i_uid);
>> i_gid_write(inode, i_gid);
>> + ei->i_projid = make_kprojid(&init_user_ns, i_projid);;
>> set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
>>
>> ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
>> @@ -4158,6 +4173,7 @@ static int ext4_do_update_inode(handle_t *handle,
>> int need_datasync = 0, set_large_file = 0;
>> uid_t i_uid;
>> gid_t i_gid;
>> + projid_t i_projid;
>>
>> spin_lock(&ei->i_raw_lock);
>>
>> @@ -4170,6 +4186,7 @@ static int ext4_do_update_inode(handle_t *handle,
>> raw_inode->i_mode = cpu_to_le16(inode->i_mode);
>> i_uid = i_uid_read(inode);
>> i_gid = i_gid_read(inode);
>> + i_projid = from_kprojid(&init_user_ns, ei->i_projid);
>> if (!(test_opt(inode->i_sb, NO_UID32))) {
>> raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
>> raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
>> @@ -4249,6 +4266,18 @@ static int ext4_do_update_inode(handle_t *handle,
>> }
>> }
>>
>> + BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
>> + EXT4_FEATURE_RO_COMPAT_PROJECT) &&
>> + i_projid != EXT4_DEF_PROJID);
>> + if (i_projid != EXT4_DEF_PROJID &&
>> + (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
>> + (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
>> + spin_unlock(&ei->i_raw_lock);
>> + err = -EFBIG;
>
> I'm not sure if -EFBIG is the best error case, since that is a common
> filesystem error. Maybe -EOVERFLOW would be better?
>
> Also, returning the error from ext4_mark_iloc_dirty->ext4_do_update_inode()
> is a bit late in the game, since the inode has already been modified at
> this point. That callpath typically only returned an error if the disk
> was bad or the journal aborted (again normally because the disk was bad),
> so at that point the dirty in-memory data was lost anyway.
>
> It would be better to check this in the caller before the inode is changed
> so that an error can be returned without (essentially) corrupting the
> in-memory state. Since the projid should only be set for new inodes (which
> will always have enough space, assuming RO_COMPAT_PROJECT cannot be set on
> filesystems with 128-byte inodes), or in case of rename into a project
> directory, it shouldn't be too big a change.
Sorry, I am not sure what I should do to improve this because of the reason that
Jan Kara mentioned. Please let me know if I need to do anything about this.

2015-01-22 15:20:21

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On 10.12.2014 01:57, Dave Chinner wrote:
> On Tue, Dec 09, 2014 at 01:22:27PM +0800, Li Xi wrote:
>> This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
>> support for ext4. The interface is kept consistent with
>> XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.
>>
>> Signed-off-by: Li Xi <[email protected]>
>> ---
>> fs/ext4/ext4.h | 111 ++++++++++++++++
>> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++++--------------
>> fs/xfs/xfs_fs.h | 47 +++-----
>> include/uapi/linux/fs.h | 58 ++++++++
>> 4 files changed, 418 insertions(+), 128 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 136e18c..43a2a88 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -384,6 +384,115 @@ struct flex_groups {
>> #define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
>> #define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>>
>> +/* Transfer internal flags to xflags */
>> +static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
>> +{
>> + __u32 xflags = 0;
>> +
>> + if (iflags & EXT4_SECRM_FL)
>> + xflags |= FS_XFLAG_SECRM;
>> + if (iflags & EXT4_UNRM_FL)
>> + xflags |= FS_XFLAG_UNRM;
>> + if (iflags & EXT4_COMPR_FL)
>> + xflags |= FS_XFLAG_COMPR;
> ....
>
> NACK.
>
>> + if (iflags & EXT4_SYNC_FL)
>> + xflags |= FS_XFLAG_SYNC;
>> + if (iflags & EXT4_IMMUTABLE_FL)
>> + xflags |= FS_XFLAG_IMMUTABLE;
>> + if (iflags & EXT4_APPEND_FL)
>> + xflags |= FS_XFLAG_APPEND;
>> + if (iflags & EXT4_NODUMP_FL)
>> + xflags |= FS_XFLAG_NODUMP;
>> + if (iflags & EXT4_NOATIME_FL)
>> + xflags |= FS_XFLAG_NOATIME;
>
> These are OK because they already exist in the interface, but all
> the ext4 specific flags you've added have no place in this patchset.
>
>
>> +
>> /* Flags that should be inherited by new inodes from their parent. */
>> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
>> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
>> @@ -606,6 +715,8 @@ enum {
>> #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
>> #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
>> #define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
>> +#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
>> +#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR
>
> NACK. Userspace only needs FS_IOC_FS[GS]ETXATTR so that it works across
> all filesystems. We need to retain XFS_IOC_FS[GS]ETXATTR so we
> don't break existing userspace tools, but we do not need new
> filesystem specific definitions anywhere.
>
>> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> index fcbf647..872fed5 100644
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -58,6 +58,62 @@ struct inodes_stat_t {
>> long dummy[5]; /* padding for sysctl ABI compatibility */
>> };
>>
>> +/*
>> + * Extend attribute flags. These should be or-ed together to figure out what
>> + * is valid.
>> + */
>> +#define FSX_XFLAGS (1 << 0)
>> +#define FSX_EXTSIZE (1 << 1)
>> +#define FSX_NEXTENTS (1 << 2)
>> +#define FSX_PROJID (1 << 3)
>
> NACK.
>
> I've said this more than once: these are *private to XFS's
> implementation* and are not be part of the user interface. Do not
> move them from their existing location.
>
>
>> +
>> +/*
>> + * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
>> + */
>> +struct fsxattr {
>> + __u32 fsx_xflags; /* xflags field value (get/set) */
>> + __u32 fsx_extsize; /* extsize field value (get/set)*/
>> + __u32 fsx_nextents; /* nextents field value (get) */
>> + __u32 fsx_projid; /* project identifier (get/set) */
>> + unsigned char fsx_pad[12];
>> +};
>> +
>> +/*
>> + * Flags for the fsx_xflags field
>> + */
>> +#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
>> +#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
>> +#define FS_XFLAG_SECRM 0x00000004 /* secure deletion */
>
> NACK - ext4 specific.
>
>> +#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
>> +#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
>> +#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
>> +#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
>> +#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
>> +#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
>> +#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
>> +#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
>> +#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
>> +#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
>> +#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
>> +#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
>
> existing flags.
>
>> +#define FS_XFLAG_UNRM 0x00008000 /* undelete */
>> +#define FS_XFLAG_COMPR 0x00010000 /* compress file */
>> +#define FS_XFLAG_COMPRBLK 0x00020000 /* one or more compressed clusters */
>> +#define FS_XFLAG_NOCOMPR 0x00040000 /* don't compress */
>> +#define FS_XFLAG_ECOMPR 0x00080000 /* compression error */
>> +#define FS_XFLAG_INDEX 0x00100000 /* hash-indexed directory */
>> +#define FS_XFLAG_IMAGIC 0x00200000 /* AFS directory */
>> +#define FS_XFLAG_JOURNAL_DATA 0x00400000 /* file data should be journaled */
>> +#define FS_XFLAG_NOTAIL 0x00800000 /* file tail should not be merged */
>> +#define FS_XFLAG_DIRSYNC 0x01000000 /* dirsync behaviour (directories only) */
>> +#define FS_XFLAG_TOPDIR 0x02000000 /* top of directory hierarchies*/
>> +#define FS_XFLAG_HUGE_FILE 0x04000000 /* set to each huge file */
>> +#define FS_XFLAG_EXTENTS 0x08000000 /* inode uses extents */
>> +#define FS_XFLAG_EA_INODE 0x10000000 /* inode used for large EA */
>> +#define FS_XFLAG_EOFBLOCKS 0x20000000 /* blocks allocated beyond EOF */
>> +#define FS_XFLAG_INLINE_DATA 0x40000000 /* inode has inline data. */
>
> And a bunch more ext4 specific flags that *uses all the remaining
> flag space*. At minimum, we need to keep space in this existing
> flags field for flags to future indication of how the padding is
> used, so that's yet another NACK.
>
> Further, none of these have any relevance to project quotas so
> should not be a part of this patchset. Nor are they relevant to any
> other filesystem, and many are duplicated by information you can get
> from FIEMAP and other interfaces. NACK, again.
>
> Because I'm getting annoyed at being repeatedly ignored about what
> needs to be done here, I'm now going to shout a bit. DO NOT CHANGE
> THE INTERFACE. DO NOT ADD any new flags to the interface. DO NOT
> REDEFINE how the interface works. DO NOT "ext4-ise" the interface.
>
> The only changes to the interface code should be moving the XFS
> definitions and renaming them so as to provide the new generic
> ioctl definition as well as the historic XFS definitions. The ext4
> implementation needs to be done in a separate patch to the interface
> rename, and it must only implement the functionality the interface
> already provides. Anything else is outside the scope of this
> patchset and requires separate discussion.

What reason for reusing XFS ioctl?

As I see quota tools from xfsprogs checks filesystem name and seems
like they wouldn't work without upgrade. e2fsprogs have to be updated
updated anyway to support changes in layout. So, in this case we could
design new generic ioctl/syscall interface for that. For example add
new commands to quotactl() instead of yet another obscure ioctl.

Also: is quota for project-id '0' really required for something?
It adds overhead but I don't see any use-cases for that.



>
> Cheers,
>
> Dave.
>


2015-01-22 15:28:51

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On 09.12.2014 08:22, Li Xi wrote:
> This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
> support for ext4. The interface is kept consistent with
> XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.
>
> Signed-off-by: Li Xi <[email protected]>
> ---
> fs/ext4/ext4.h | 111 ++++++++++++++++
> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++++--------------
> fs/xfs/xfs_fs.h | 47 +++-----
> include/uapi/linux/fs.h | 58 ++++++++
> 4 files changed, 418 insertions(+), 128 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 136e18c..43a2a88 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -384,6 +384,115 @@ struct flex_groups {
> #define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
> #define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>
> +/* Transfer internal flags to xflags */
> +static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
> +{
> + __u32 xflags = 0;
> +
> + if (iflags & EXT4_SECRM_FL)
> + xflags |= FS_XFLAG_SECRM;
> + if (iflags & EXT4_UNRM_FL)
> + xflags |= FS_XFLAG_UNRM;
> + if (iflags & EXT4_COMPR_FL)
> + xflags |= FS_XFLAG_COMPR;
> + if (iflags & EXT4_SYNC_FL)
> + xflags |= FS_XFLAG_SYNC;
> + if (iflags & EXT4_IMMUTABLE_FL)
> + xflags |= FS_XFLAG_IMMUTABLE;
> + if (iflags & EXT4_APPEND_FL)
> + xflags |= FS_XFLAG_APPEND;
> + if (iflags & EXT4_NODUMP_FL)
> + xflags |= FS_XFLAG_NODUMP;
> + if (iflags & EXT4_NOATIME_FL)
> + xflags |= FS_XFLAG_NOATIME;
> + if (iflags & EXT4_COMPRBLK_FL)
> + xflags |= FS_XFLAG_COMPRBLK;
> + if (iflags & EXT4_NOCOMPR_FL)
> + xflags |= FS_XFLAG_NOCOMPR;
> + if (iflags & EXT4_ECOMPR_FL)
> + xflags |= FS_XFLAG_ECOMPR;
> + if (iflags & EXT4_INDEX_FL)
> + xflags |= FS_XFLAG_INDEX;
> + if (iflags & EXT4_IMAGIC_FL)
> + xflags |= FS_XFLAG_IMAGIC;
> + if (iflags & EXT4_JOURNAL_DATA_FL)
> + xflags |= FS_XFLAG_JOURNAL_DATA;
> + if (iflags & EXT4_NOTAIL_FL)
> + xflags |= FS_XFLAG_NOTAIL;
> + if (iflags & EXT4_DIRSYNC_FL)
> + xflags |= FS_XFLAG_DIRSYNC;
> + if (iflags & EXT4_TOPDIR_FL)
> + xflags |= FS_XFLAG_TOPDIR;
> + if (iflags & EXT4_HUGE_FILE_FL)
> + xflags |= FS_XFLAG_HUGE_FILE;
> + if (iflags & EXT4_EXTENTS_FL)
> + xflags |= FS_XFLAG_EXTENTS;
> + if (iflags & EXT4_EA_INODE_FL)
> + xflags |= FS_XFLAG_EA_INODE;
> + if (iflags & EXT4_EOFBLOCKS_FL)
> + xflags |= FS_XFLAG_EOFBLOCKS;
> + if (iflags & EXT4_INLINE_DATA_FL)
> + xflags |= FS_XFLAG_INLINE_DATA;
> + if (iflags & EXT4_PROJINHERIT_FL)
> + xflags |= FS_XFLAG_PROJINHERIT;
> + return xflags;
> +}
> +
> +/* Transfer xflags flags to internal */
> +static inline unsigned long ext4_xflags_to_iflags(__u32 xflags)
> +{
> + unsigned long iflags = 0;
> +
> + if (xflags & FS_XFLAG_SECRM)
> + iflags |= EXT4_SECRM_FL;
> + if (xflags & FS_XFLAG_UNRM)
> + iflags |= EXT4_UNRM_FL;
> + if (xflags & FS_XFLAG_COMPR)
> + iflags |= EXT4_COMPR_FL;
> + if (xflags & FS_XFLAG_SYNC)
> + iflags |= EXT4_SYNC_FL;
> + if (xflags & FS_XFLAG_IMMUTABLE)
> + iflags |= EXT4_IMMUTABLE_FL;
> + if (xflags & FS_XFLAG_APPEND)
> + iflags |= EXT4_APPEND_FL;
> + if (xflags & FS_XFLAG_NODUMP)
> + iflags |= EXT4_NODUMP_FL;
> + if (xflags & FS_XFLAG_NOATIME)
> + iflags |= EXT4_NOATIME_FL;
> + if (xflags & FS_XFLAG_COMPRBLK)
> + iflags |= EXT4_COMPRBLK_FL;
> + if (xflags & FS_XFLAG_NOCOMPR)
> + iflags |= EXT4_NOCOMPR_FL;
> + if (xflags & FS_XFLAG_ECOMPR)
> + iflags |= EXT4_ECOMPR_FL;
> + if (xflags & FS_XFLAG_INDEX)
> + iflags |= EXT4_INDEX_FL;
> + if (xflags & FS_XFLAG_IMAGIC)
> + iflags |= EXT4_IMAGIC_FL;
> + if (xflags & FS_XFLAG_JOURNAL_DATA)
> + iflags |= EXT4_JOURNAL_DATA_FL;
> + if (xflags & FS_XFLAG_IMAGIC)
> + iflags |= FS_XFLAG_NOTAIL;
> + if (xflags & FS_XFLAG_DIRSYNC)
> + iflags |= EXT4_DIRSYNC_FL;
> + if (xflags & FS_XFLAG_TOPDIR)
> + iflags |= EXT4_TOPDIR_FL;
> + if (xflags & FS_XFLAG_HUGE_FILE)
> + iflags |= EXT4_HUGE_FILE_FL;
> + if (xflags & FS_XFLAG_EXTENTS)
> + iflags |= EXT4_EXTENTS_FL;
> + if (xflags & FS_XFLAG_EA_INODE)
> + iflags |= EXT4_EA_INODE_FL;
> + if (xflags & FS_XFLAG_EOFBLOCKS)
> + iflags |= EXT4_EOFBLOCKS_FL;
> + if (xflags & FS_XFLAG_INLINE_DATA)
> + iflags |= EXT4_INLINE_DATA_FL;
> + if (xflags & FS_XFLAG_PROJINHERIT)
> + iflags |= EXT4_PROJINHERIT_FL;
> +
> + return iflags;
> +}
> +
> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> @@ -606,6 +715,8 @@ enum {
> #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
> #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
> #define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
> +#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
> +#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR
>
> #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
> /*
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index f58a0d1..8332476 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -14,6 +14,8 @@
> #include <linux/compat.h>
> #include <linux/mount.h>
> #include <linux/file.h>
> +#include <linux/quotaops.h>
> +#include <linux/quota.h>
> #include <asm/uaccess.h>
> #include "ext4_jbd2.h"
> #include "ext4.h"
> @@ -196,126 +198,220 @@ journal_err_out:
> return err;
> }
>
> -long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> +static int ext4_ioctl_setflags(struct file *filp, unsigned int flags)
> {
> struct inode *inode = file_inode(filp);
> - struct super_block *sb = inode->i_sb;
> struct ext4_inode_info *ei = EXT4_I(inode);
> - unsigned int flags;
> + handle_t *handle = NULL;
> + int err, migrate = 0;
> + struct ext4_iloc iloc;
> + unsigned int oldflags, mask, i;
> + unsigned int jflag;
>
> - ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
> + if (!inode_owner_or_capable(inode))
> + return -EACCES;
>
> - switch (cmd) {
> - case EXT4_IOC_GETFLAGS:
> - ext4_get_inode_flags(ei);
> - flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
> - return put_user(flags, (int __user *) arg);
> - case EXT4_IOC_SETFLAGS: {
> - handle_t *handle = NULL;
> - int err, migrate = 0;
> - struct ext4_iloc iloc;
> - unsigned int oldflags, mask, i;
> - unsigned int jflag;
> + err = mnt_want_write_file(filp);
> + if (err)
> + return err;
>
> - if (!inode_owner_or_capable(inode))
> - return -EACCES;
> + flags = ext4_mask_flags(inode->i_mode, flags);
>
> - if (get_user(flags, (int __user *) arg))
> - return -EFAULT;
> + err = -EPERM;
> + mutex_lock(&inode->i_mutex);
> + /* Is it quota file? Do not allow user to mess with it */
> + if (IS_NOQUOTA(inode))
> + goto flags_out;
>
> - err = mnt_want_write_file(filp);
> - if (err)
> - return err;
> + oldflags = ei->i_flags;
>
> - flags = ext4_mask_flags(inode->i_mode, flags);
> + /* The JOURNAL_DATA flag is modifiable only by root */
> + jflag = flags & EXT4_JOURNAL_DATA_FL;
>
> - err = -EPERM;
> - mutex_lock(&inode->i_mutex);
> - /* Is it quota file? Do not allow user to mess with it */
> - if (IS_NOQUOTA(inode))
> + /*
> + * The IMMUTABLE and APPEND_ONLY flags can only be changed by
> + * the relevant capability.
> + *
> + * This test looks nicer. Thanks to Pauline Middelink
> + */
> + if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
> + if (!capable(CAP_LINUX_IMMUTABLE))
> goto flags_out;
> + }
>
> - oldflags = ei->i_flags;
> -
> - /* The JOURNAL_DATA flag is modifiable only by root */
> - jflag = flags & EXT4_JOURNAL_DATA_FL;
> -
> - /*
> - * The IMMUTABLE and APPEND_ONLY flags can only be changed by
> - * the relevant capability.
> - *
> - * This test looks nicer. Thanks to Pauline Middelink
> - */
> - if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
> - if (!capable(CAP_LINUX_IMMUTABLE))
> - goto flags_out;
> - }
> -
> - /*
> - * The JOURNAL_DATA flag can only be changed by
> - * the relevant capability.
> - */
> - if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
> - if (!capable(CAP_SYS_RESOURCE))
> - goto flags_out;
> - }
> - if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
> - migrate = 1;
> -
> + /*
> + * The JOURNAL_DATA flag can only be changed by
> + * the relevant capability.
> + */
> + if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
> + if (!capable(CAP_SYS_RESOURCE))
> + goto flags_out;
> + }
> + if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
> + migrate = 1;
> if (flags & EXT4_EOFBLOCKS_FL) {
> - /* we don't support adding EOFBLOCKS flag */
> - if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
> - err = -EOPNOTSUPP;
> - goto flags_out;
> - }
> - } else if (oldflags & EXT4_EOFBLOCKS_FL)
> - ext4_truncate(inode);
> -
> - handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> - if (IS_ERR(handle)) {
> - err = PTR_ERR(handle);
> + /* we don't support adding EOFBLOCKS flag */
> + if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
> + err = -EOPNOTSUPP;
> goto flags_out;
> }
> - if (IS_SYNC(inode))
> - ext4_handle_sync(handle);
> - err = ext4_reserve_inode_write(handle, inode, &iloc);
> - if (err)
> - goto flags_err;
> -
> - for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
> - if (!(mask & EXT4_FL_USER_MODIFIABLE))
> - continue;
> - if (mask & flags)
> - ext4_set_inode_flag(inode, i);
> - else
> - ext4_clear_inode_flag(inode, i);
> - }
> + } else if (oldflags & EXT4_EOFBLOCKS_FL)
> + ext4_truncate(inode);
>
> - ext4_set_inode_flags(inode);
> - inode->i_ctime = ext4_current_time(inode);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + goto flags_out;
> + }
> + if (IS_SYNC(inode))
> + ext4_handle_sync(handle);
> + err = ext4_reserve_inode_write(handle, inode, &iloc);
> + if (err)
> + goto flags_err;
> +
> + for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
> + if (!(mask & EXT4_FL_USER_MODIFIABLE))
> + continue;
> + if (mask & flags)
> + ext4_set_inode_flag(inode, i);
> + else
> + ext4_clear_inode_flag(inode, i);
> + }
>
> - err = ext4_mark_iloc_dirty(handle, inode, &iloc);
> -flags_err:
> - ext4_journal_stop(handle);
> - if (err)
> - goto flags_out;
> + ext4_set_inode_flags(inode);
> + inode->i_ctime = ext4_current_time(inode);
>
> - if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
> - err = ext4_change_inode_journal_flag(inode, jflag);
> - if (err)
> - goto flags_out;
> - if (migrate) {
> - if (flags & EXT4_EXTENTS_FL)
> - err = ext4_ext_migrate(inode);
> - else
> - err = ext4_ind_migrate(inode);
> - }
> + err = ext4_mark_iloc_dirty(handle, inode, &iloc);
> +flags_err:
> + ext4_journal_stop(handle);
> + if (err)
> + goto flags_out;
> +
> + if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
> + err = ext4_change_inode_journal_flag(inode, jflag);
> + if (err)
> + goto flags_out;
> + if (migrate) {
> + if (flags & EXT4_EXTENTS_FL)
> + err = ext4_ext_migrate(inode);
> + else
> + err = ext4_ind_migrate(inode);
> + }
>
> flags_out:
> - mutex_unlock(&inode->i_mutex);
> - mnt_drop_write_file(filp);
> + mutex_unlock(&inode->i_mutex);
> + mnt_drop_write_file(filp);
> + return err;
> +}
> +
> +static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
> +{
> + struct inode *inode = file_inode(filp);
> + struct super_block *sb = inode->i_sb;
> + struct ext4_inode_info *ei = EXT4_I(inode);
> + int err;
> + handle_t *handle;
> + kprojid_t kprojid;
> + struct ext4_iloc iloc;
> + struct ext4_inode *raw_inode;
> +
> + struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
> +
> + /* Make sure caller can change project. */
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> + if (projid != EXT4_DEF_PROJID
> + && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
> + EXT4_FEATURE_RO_COMPAT_PROJECT))
> + return -EOPNOTSUPP;
> +
> + if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
> + EXT4_FEATURE_RO_COMPAT_PROJECT)) {
> + BUG_ON(__kprojid_val(EXT4_I(inode)->i_projid)
> + != EXT4_DEF_PROJID);
> + if (projid != EXT4_DEF_PROJID)
> + return -EOPNOTSUPP;
> + else
> + return 0;
> + }
> +
> + kprojid = make_kprojid(&init_user_ns, (projid_t)projid);

Maybe current_user_ns()?
This code should be user-namespace aware from the beginning.

> +
> + if (projid_eq(kprojid, EXT4_I(inode)->i_projid))
> + return 0;
> +
> + err = mnt_want_write_file(filp);
> + if (err)
> return err;
> +
> + err = -EPERM;
> + mutex_lock(&inode->i_mutex);
> + /* Is it quota file? Do not allow user to mess with it */
> + if (IS_NOQUOTA(inode))
> + goto project_out;
> +
> + dquot_initialize(inode);
> +
> + handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
> + EXT4_QUOTA_INIT_BLOCKS(sb) +
> + EXT4_QUOTA_DEL_BLOCKS(sb) + 3);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + goto project_out;
> + }
> +
> + err = ext4_reserve_inode_write(handle, inode, &iloc);
> + if (err)
> + goto project_stop;
> +
> + raw_inode = ext4_raw_inode(&iloc);
> + if ((EXT4_INODE_SIZE(sb) <=
> + EXT4_GOOD_OLD_INODE_SIZE) ||
> + (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))) {
> + err = -EFBIG;
> + goto project_stop;
> }
> +
> + transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid));
> + if (!transfer_to[PRJQUOTA])
> + goto project_set;
> +
> + err = __dquot_transfer(inode, transfer_to);
> + dqput(transfer_to[PRJQUOTA]);
> + if (err)
> + goto project_stop;
> +
> +project_set:
> + EXT4_I(inode)->i_projid = kprojid;
> + inode->i_ctime = ext4_current_time(inode);
> + err = ext4_mark_iloc_dirty(handle, inode, &iloc);
> +project_stop:
> + ext4_journal_stop(handle);
> +project_out:
> + mutex_unlock(&inode->i_mutex);
> + mnt_drop_write_file(filp);
> + return err;
> +}
> +
> +long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct super_block *sb = inode->i_sb;
> + struct ext4_inode_info *ei = EXT4_I(inode);
> + unsigned int flags;
> +
> + ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
> +
> + switch (cmd) {
> + case EXT4_IOC_GETFLAGS:
> + ext4_get_inode_flags(ei);
> + flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
> + return put_user(flags, (int __user *) arg);
> + case EXT4_IOC_SETFLAGS:
> + if (get_user(flags, (int __user *) arg))
> + return -EFAULT;
> + return ext4_ioctl_setflags(filp, flags);
> case EXT4_IOC_GETVERSION:
> case EXT4_IOC_GETVERSION_OLD:
> return put_user(inode->i_generation, (int __user *) arg);
> @@ -615,7 +711,45 @@ resizefs_out:
> }
> case EXT4_IOC_PRECACHE_EXTENTS:
> return ext4_ext_precache(inode);
> + case EXT4_IOC_FSGETXATTR:
> + {
> + struct fsxattr fa;
> +
> + memset(&fa, 0, sizeof(struct fsxattr));
> +
> + ext4_get_inode_flags(ei);
> + fa.fsx_xflags = ext4_iflags_to_xflags(ei->i_flags & EXT4_FL_USER_VISIBLE);
> +
> + if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
> + EXT4_FEATURE_RO_COMPAT_PROJECT)) {
> + fa.fsx_projid = (__u32)from_kprojid(&init_user_ns,
> + EXT4_I(inode)->i_projid);

Same here.

> + }
> +
> + if (copy_to_user((struct fsxattr __user *)arg,
> + &fa, sizeof(fa)))
> + return -EFAULT;
> + return 0;
> + }
> + case EXT4_IOC_FSSETXATTR:
> + {
> + struct fsxattr fa;
> + int err;
> +
> + if (copy_from_user(&fa, (struct fsxattr __user *)arg,
> + sizeof(fa)))
> + return -EFAULT;
>
> + err = ext4_ioctl_setflags(filp, ext4_xflags_to_iflags(fa.fsx_xflags));
> + if (err)
> + return err;
> +
> + err = ext4_ioctl_setproject(filp, fa.fsx_projid);
> + if (err)
> + return err;
> +
> + return 0;
> + }
> default:
> return -ENOTTY;
> }
> diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
> index 18dc721..64c7ae6 100644
> --- a/fs/xfs/xfs_fs.h
> +++ b/fs/xfs/xfs_fs.h
> @@ -36,38 +36,25 @@ struct dioattr {
> #endif
>
> /*
> - * Structure for XFS_IOC_FSGETXATTR[A] and XFS_IOC_FSSETXATTR.
> - */
> -#ifndef HAVE_FSXATTR
> -struct fsxattr {
> - __u32 fsx_xflags; /* xflags field value (get/set) */
> - __u32 fsx_extsize; /* extsize field value (get/set)*/
> - __u32 fsx_nextents; /* nextents field value (get) */
> - __u32 fsx_projid; /* project identifier (get/set) */
> - unsigned char fsx_pad[12];
> -};
> -#endif
> -
> -/*
> * Flags for the bs_xflags/fsx_xflags field
> * There should be a one-to-one correspondence between these flags and the
> * XFS_DIFLAG_s.
> */
> -#define XFS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
> -#define XFS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
> -#define XFS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
> -#define XFS_XFLAG_APPEND 0x00000010 /* all writes append */
> -#define XFS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
> -#define XFS_XFLAG_NOATIME 0x00000040 /* do not update access time */
> -#define XFS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
> -#define XFS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
> -#define XFS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
> -#define XFS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
> -#define XFS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
> -#define XFS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
> -#define XFS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
> -#define XFS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> -#define XFS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
> +#define XFS_XFLAG_REALTIME FS_XFLAG_REALTIME /* data in realtime volume */
> +#define XFS_XFLAG_PREALLOC FS_XFLAG_PREALLOC /* preallocated file extents */
> +#define XFS_XFLAG_IMMUTABLE FS_XFLAG_IMMUTABLE /* file cannot be modified */
> +#define XFS_XFLAG_APPEND FS_XFLAG_APPEND /* all writes append */
> +#define XFS_XFLAG_SYNC FS_XFLAG_SYNC /* all writes synchronous */
> +#define XFS_XFLAG_NOATIME FS_XFLAG_NOATIME /* do not update access time */
> +#define XFS_XFLAG_NODUMP FS_XFLAG_NODUMP /* do not include in backups */
> +#define XFS_XFLAG_RTINHERIT FS_XFLAG_RTINHERIT /* create with rt bit set */
> +#define XFS_XFLAG_PROJINHERIT FS_XFLAG_PROJINHERIT /* create with parents projid */
> +#define XFS_XFLAG_NOSYMLINKS FS_XFLAG_NOSYMLINKS /* disallow symlink creation */
> +#define XFS_XFLAG_EXTSIZE FS_XFLAG_EXTSIZE /* extent size allocator hint */
> +#define XFS_XFLAG_EXTSZINHERIT FS_XFLAG_EXTSZINHERIT /* inherit inode extent size */
> +#define XFS_XFLAG_NODEFRAG FS_XFLAG_NODEFRAG /* do not defragment */
> +#define XFS_XFLAG_FILESTREAM FS_XFLAG_FILESTREAM /* use filestream allocator */
> +#define XFS_XFLAG_HASATTR FS_XFLAG_HASATTR /* no DIFLAG for this */
>
> /*
> * Structure for XFS_IOC_GETBMAP.
> @@ -503,8 +490,8 @@ typedef struct xfs_swapext
> #define XFS_IOC_ALLOCSP _IOW ('X', 10, struct xfs_flock64)
> #define XFS_IOC_FREESP _IOW ('X', 11, struct xfs_flock64)
> #define XFS_IOC_DIOINFO _IOR ('X', 30, struct dioattr)
> -#define XFS_IOC_FSGETXATTR _IOR ('X', 31, struct fsxattr)
> -#define XFS_IOC_FSSETXATTR _IOW ('X', 32, struct fsxattr)
> +#define XFS_IOC_FSGETXATTR FS_IOC_FSGETXATTR
> +#define XFS_IOC_FSSETXATTR FS_IOC_FSSETXATTR
> #define XFS_IOC_ALLOCSP64 _IOW ('X', 36, struct xfs_flock64)
> #define XFS_IOC_FREESP64 _IOW ('X', 37, struct xfs_flock64)
> #define XFS_IOC_GETBMAP _IOWR('X', 38, struct getbmap)
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index fcbf647..872fed5 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -58,6 +58,62 @@ struct inodes_stat_t {
> long dummy[5]; /* padding for sysctl ABI compatibility */
> };
>
> +/*
> + * Extend attribute flags. These should be or-ed together to figure out what
> + * is valid.
> + */
> +#define FSX_XFLAGS (1 << 0)
> +#define FSX_EXTSIZE (1 << 1)
> +#define FSX_NEXTENTS (1 << 2)
> +#define FSX_PROJID (1 << 3)
> +
> +/*
> + * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
> + */
> +struct fsxattr {
> + __u32 fsx_xflags; /* xflags field value (get/set) */
> + __u32 fsx_extsize; /* extsize field value (get/set)*/
> + __u32 fsx_nextents; /* nextents field value (get) */
> + __u32 fsx_projid; /* project identifier (get/set) */
> + unsigned char fsx_pad[12];
> +};
> +
> +/*
> + * Flags for the fsx_xflags field
> + */
> +#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
> +#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
> +#define FS_XFLAG_SECRM 0x00000004 /* secure deletion */
> +#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
> +#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
> +#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
> +#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
> +#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
> +#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
> +#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
> +#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
> +#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
> +#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
> +#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
> +#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> +#define FS_XFLAG_UNRM 0x00008000 /* undelete */
> +#define FS_XFLAG_COMPR 0x00010000 /* compress file */
> +#define FS_XFLAG_COMPRBLK 0x00020000 /* one or more compressed clusters */
> +#define FS_XFLAG_NOCOMPR 0x00040000 /* don't compress */
> +#define FS_XFLAG_ECOMPR 0x00080000 /* compression error */
> +#define FS_XFLAG_INDEX 0x00100000 /* hash-indexed directory */
> +#define FS_XFLAG_IMAGIC 0x00200000 /* AFS directory */
> +#define FS_XFLAG_JOURNAL_DATA 0x00400000 /* file data should be journaled */
> +#define FS_XFLAG_NOTAIL 0x00800000 /* file tail should not be merged */
> +#define FS_XFLAG_DIRSYNC 0x01000000 /* dirsync behaviour (directories only) */
> +#define FS_XFLAG_TOPDIR 0x02000000 /* top of directory hierarchies*/
> +#define FS_XFLAG_HUGE_FILE 0x04000000 /* set to each huge file */
> +#define FS_XFLAG_EXTENTS 0x08000000 /* inode uses extents */
> +#define FS_XFLAG_EA_INODE 0x10000000 /* inode used for large EA */
> +#define FS_XFLAG_EOFBLOCKS 0x20000000 /* blocks allocated beyond EOF */
> +#define FS_XFLAG_INLINE_DATA 0x40000000 /* inode has inline data. */
> +#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
> +
>
> #define NR_FILE 8192 /* this can well be larger on a larger system */
>
> @@ -163,6 +219,8 @@ struct inodes_stat_t {
> #define FS_IOC_GETVERSION _IOR('v', 1, long)
> #define FS_IOC_SETVERSION _IOW('v', 2, long)
> #define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
> +#define FS_IOC_FSGETXATTR _IOR('f', 31, struct fsxattr)
> +#define FS_IOC_FSSETXATTR _IOW('f', 32, struct fsxattr)
> #define FS_IOC32_GETFLAGS _IOR('f', 1, int)
> #define FS_IOC32_SETFLAGS _IOW('f', 2, int)
> #define FS_IOC32_GETVERSION _IOR('v', 1, int)
>


2015-01-22 15:59:00

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Thu 22-01-15 18:20:15, Konstantin Khlebnikov wrote:
> On 10.12.2014 01:57, Dave Chinner wrote:
> >On Tue, Dec 09, 2014 at 01:22:27PM +0800, Li Xi wrote:
> >>This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
> >>support for ext4. The interface is kept consistent with
> >>XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.
> >>
> >>Signed-off-by: Li Xi <[email protected]>
> >>---
> >> fs/ext4/ext4.h | 111 ++++++++++++++++
> >> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++++--------------
> >> fs/xfs/xfs_fs.h | 47 +++-----
> >> include/uapi/linux/fs.h | 58 ++++++++
> >> 4 files changed, 418 insertions(+), 128 deletions(-)
> >>
> >>diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >>index 136e18c..43a2a88 100644
> >>--- a/fs/ext4/ext4.h
> >>+++ b/fs/ext4/ext4.h
> >>@@ -384,6 +384,115 @@ struct flex_groups {
> >> #define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
> >> #define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
> >>
> >>+/* Transfer internal flags to xflags */
> >>+static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
> >>+{
> >>+ __u32 xflags = 0;
> >>+
> >>+ if (iflags & EXT4_SECRM_FL)
> >>+ xflags |= FS_XFLAG_SECRM;
> >>+ if (iflags & EXT4_UNRM_FL)
> >>+ xflags |= FS_XFLAG_UNRM;
> >>+ if (iflags & EXT4_COMPR_FL)
> >>+ xflags |= FS_XFLAG_COMPR;
> >....
> >
> >NACK.
> >
> >>+ if (iflags & EXT4_SYNC_FL)
> >>+ xflags |= FS_XFLAG_SYNC;
> >>+ if (iflags & EXT4_IMMUTABLE_FL)
> >>+ xflags |= FS_XFLAG_IMMUTABLE;
> >>+ if (iflags & EXT4_APPEND_FL)
> >>+ xflags |= FS_XFLAG_APPEND;
> >>+ if (iflags & EXT4_NODUMP_FL)
> >>+ xflags |= FS_XFLAG_NODUMP;
> >>+ if (iflags & EXT4_NOATIME_FL)
> >>+ xflags |= FS_XFLAG_NOATIME;
> >
> >These are OK because they already exist in the interface, but all
> >the ext4 specific flags you've added have no place in this patchset.
> >
> >
> >>+
> >> /* Flags that should be inherited by new inodes from their parent. */
> >> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
> >> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> >>@@ -606,6 +715,8 @@ enum {
> >> #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
> >> #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
> >> #define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
> >>+#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
> >>+#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR
> >
> >NACK. Userspace only needs FS_IOC_FS[GS]ETXATTR so that it works across
> >all filesystems. We need to retain XFS_IOC_FS[GS]ETXATTR so we
> >don't break existing userspace tools, but we do not need new
> >filesystem specific definitions anywhere.
> >
> >>diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> >>index fcbf647..872fed5 100644
> >>--- a/include/uapi/linux/fs.h
> >>+++ b/include/uapi/linux/fs.h
> >>@@ -58,6 +58,62 @@ struct inodes_stat_t {
> >> long dummy[5]; /* padding for sysctl ABI compatibility */
> >> };
> >>
> >>+/*
> >>+ * Extend attribute flags. These should be or-ed together to figure out what
> >>+ * is valid.
> >>+ */
> >>+#define FSX_XFLAGS (1 << 0)
> >>+#define FSX_EXTSIZE (1 << 1)
> >>+#define FSX_NEXTENTS (1 << 2)
> >>+#define FSX_PROJID (1 << 3)
> >
> >NACK.
> >
> >I've said this more than once: these are *private to XFS's
> >implementation* and are not be part of the user interface. Do not
> >move them from their existing location.
> >
> >
> >>+
> >>+/*
> >>+ * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
> >>+ */
> >>+struct fsxattr {
> >>+ __u32 fsx_xflags; /* xflags field value (get/set) */
> >>+ __u32 fsx_extsize; /* extsize field value (get/set)*/
> >>+ __u32 fsx_nextents; /* nextents field value (get) */
> >>+ __u32 fsx_projid; /* project identifier (get/set) */
> >>+ unsigned char fsx_pad[12];
> >>+};
> >>+
> >>+/*
> >>+ * Flags for the fsx_xflags field
> >>+ */
> >>+#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
> >>+#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
> >>+#define FS_XFLAG_SECRM 0x00000004 /* secure deletion */
> >
> >NACK - ext4 specific.
> >
> >>+#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
> >>+#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
> >>+#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
> >>+#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
> >>+#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
> >>+#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
> >>+#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
> >>+#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
> >>+#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
> >>+#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
> >>+#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
> >>+#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> >
> >existing flags.
> >
> >>+#define FS_XFLAG_UNRM 0x00008000 /* undelete */
> >>+#define FS_XFLAG_COMPR 0x00010000 /* compress file */
> >>+#define FS_XFLAG_COMPRBLK 0x00020000 /* one or more compressed clusters */
> >>+#define FS_XFLAG_NOCOMPR 0x00040000 /* don't compress */
> >>+#define FS_XFLAG_ECOMPR 0x00080000 /* compression error */
> >>+#define FS_XFLAG_INDEX 0x00100000 /* hash-indexed directory */
> >>+#define FS_XFLAG_IMAGIC 0x00200000 /* AFS directory */
> >>+#define FS_XFLAG_JOURNAL_DATA 0x00400000 /* file data should be journaled */
> >>+#define FS_XFLAG_NOTAIL 0x00800000 /* file tail should not be merged */
> >>+#define FS_XFLAG_DIRSYNC 0x01000000 /* dirsync behaviour (directories only) */
> >>+#define FS_XFLAG_TOPDIR 0x02000000 /* top of directory hierarchies*/
> >>+#define FS_XFLAG_HUGE_FILE 0x04000000 /* set to each huge file */
> >>+#define FS_XFLAG_EXTENTS 0x08000000 /* inode uses extents */
> >>+#define FS_XFLAG_EA_INODE 0x10000000 /* inode used for large EA */
> >>+#define FS_XFLAG_EOFBLOCKS 0x20000000 /* blocks allocated beyond EOF */
> >>+#define FS_XFLAG_INLINE_DATA 0x40000000 /* inode has inline data. */
> >
> >And a bunch more ext4 specific flags that *uses all the remaining
> >flag space*. At minimum, we need to keep space in this existing
> >flags field for flags to future indication of how the padding is
> >used, so that's yet another NACK.
> >
> >Further, none of these have any relevance to project quotas so
> >should not be a part of this patchset. Nor are they relevant to any
> >other filesystem, and many are duplicated by information you can get
> >from FIEMAP and other interfaces. NACK, again.
> >
> >Because I'm getting annoyed at being repeatedly ignored about what
> >needs to be done here, I'm now going to shout a bit. DO NOT CHANGE
> >THE INTERFACE. DO NOT ADD any new flags to the interface. DO NOT
> >REDEFINE how the interface works. DO NOT "ext4-ise" the interface.
> >
> >The only changes to the interface code should be moving the XFS
> >definitions and renaming them so as to provide the new generic
> >ioctl definition as well as the historic XFS definitions. The ext4
> >implementation needs to be done in a separate patch to the interface
> >rename, and it must only implement the functionality the interface
> >already provides. Anything else is outside the scope of this
> >patchset and requires separate discussion.
>
> What reason for reusing XFS ioctl?
>
> As I see quota tools from xfsprogs checks filesystem name and seems
> like they wouldn't work without upgrade. e2fsprogs have to be updated
> updated anyway to support changes in layout. So, in this case we could
> design new generic ioctl/syscall interface for that. For example add
> new commands to quotactl() instead of yet another obscure ioctl.
Using quotactl() for setting / getting project ID is IMO a wrong fit.
quotactl() is used to manipulate quota usage & limits but not file
properties. And reusing XFS ioctl is IMHO better than inventing a new
ioctl - ext4 can use the same ioctl as XFS just fine. It's only that Li Xi
mixed in unrelated changes to the ext4 support for that ioctl.

> Also: is quota for project-id '0' really required for something?
> It adds overhead but I don't see any use-cases for that.
But only if filesystem has project quota feature enabled, no? That
doesn't concern me too much since the overhead doesn't seem to big and when
you enable project quotas you likely want to use them ;-). But if you are
concerned, I'm not strictly opposed to special casing of project id 0. But
I'd like to see how much speed you gain by that before complicating the
code...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-01-22 18:35:26

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Thu, Jan 22, 2015 at 6:59 PM, Jan Kara <[email protected]> wrote:
> On Thu 22-01-15 18:20:15, Konstantin Khlebnikov wrote:
>> On 10.12.2014 01:57, Dave Chinner wrote:
>> >On Tue, Dec 09, 2014 at 01:22:27PM +0800, Li Xi wrote:
>> >>This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
>> >>support for ext4. The interface is kept consistent with
>> >>XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.
>> >>
>> >>Signed-off-by: Li Xi <[email protected]>
>> >>---
>> >> fs/ext4/ext4.h | 111 ++++++++++++++++
>> >> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++++--------------
>> >> fs/xfs/xfs_fs.h | 47 +++-----
>> >> include/uapi/linux/fs.h | 58 ++++++++
>> >> 4 files changed, 418 insertions(+), 128 deletions(-)
>> >>
>> >>diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> >>index 136e18c..43a2a88 100644
>> >>--- a/fs/ext4/ext4.h
>> >>+++ b/fs/ext4/ext4.h
>> >>@@ -384,6 +384,115 @@ struct flex_groups {
>> >> #define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
>> >> #define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>> >>
>> >>+/* Transfer internal flags to xflags */
>> >>+static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
>> >>+{
>> >>+ __u32 xflags = 0;
>> >>+
>> >>+ if (iflags & EXT4_SECRM_FL)
>> >>+ xflags |= FS_XFLAG_SECRM;
>> >>+ if (iflags & EXT4_UNRM_FL)
>> >>+ xflags |= FS_XFLAG_UNRM;
>> >>+ if (iflags & EXT4_COMPR_FL)
>> >>+ xflags |= FS_XFLAG_COMPR;
>> >....
>> >
>> >NACK.
>> >
>> >>+ if (iflags & EXT4_SYNC_FL)
>> >>+ xflags |= FS_XFLAG_SYNC;
>> >>+ if (iflags & EXT4_IMMUTABLE_FL)
>> >>+ xflags |= FS_XFLAG_IMMUTABLE;
>> >>+ if (iflags & EXT4_APPEND_FL)
>> >>+ xflags |= FS_XFLAG_APPEND;
>> >>+ if (iflags & EXT4_NODUMP_FL)
>> >>+ xflags |= FS_XFLAG_NODUMP;
>> >>+ if (iflags & EXT4_NOATIME_FL)
>> >>+ xflags |= FS_XFLAG_NOATIME;
>> >
>> >These are OK because they already exist in the interface, but all
>> >the ext4 specific flags you've added have no place in this patchset.
>> >
>> >
>> >>+
>> >> /* Flags that should be inherited by new inodes from their parent. */
>> >> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
>> >> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
>> >>@@ -606,6 +715,8 @@ enum {
>> >> #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
>> >> #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
>> >> #define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
>> >>+#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
>> >>+#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR
>> >
>> >NACK. Userspace only needs FS_IOC_FS[GS]ETXATTR so that it works across
>> >all filesystems. We need to retain XFS_IOC_FS[GS]ETXATTR so we
>> >don't break existing userspace tools, but we do not need new
>> >filesystem specific definitions anywhere.
>> >
>> >>diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> >>index fcbf647..872fed5 100644
>> >>--- a/include/uapi/linux/fs.h
>> >>+++ b/include/uapi/linux/fs.h
>> >>@@ -58,6 +58,62 @@ struct inodes_stat_t {
>> >> long dummy[5]; /* padding for sysctl ABI compatibility */
>> >> };
>> >>
>> >>+/*
>> >>+ * Extend attribute flags. These should be or-ed together to figure out what
>> >>+ * is valid.
>> >>+ */
>> >>+#define FSX_XFLAGS (1 << 0)
>> >>+#define FSX_EXTSIZE (1 << 1)
>> >>+#define FSX_NEXTENTS (1 << 2)
>> >>+#define FSX_PROJID (1 << 3)
>> >
>> >NACK.
>> >
>> >I've said this more than once: these are *private to XFS's
>> >implementation* and are not be part of the user interface. Do not
>> >move them from their existing location.
>> >
>> >
>> >>+
>> >>+/*
>> >>+ * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
>> >>+ */
>> >>+struct fsxattr {
>> >>+ __u32 fsx_xflags; /* xflags field value (get/set) */
>> >>+ __u32 fsx_extsize; /* extsize field value (get/set)*/
>> >>+ __u32 fsx_nextents; /* nextents field value (get) */
>> >>+ __u32 fsx_projid; /* project identifier (get/set) */
>> >>+ unsigned char fsx_pad[12];
>> >>+};
>> >>+
>> >>+/*
>> >>+ * Flags for the fsx_xflags field
>> >>+ */
>> >>+#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
>> >>+#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
>> >>+#define FS_XFLAG_SECRM 0x00000004 /* secure deletion */
>> >
>> >NACK - ext4 specific.
>> >
>> >>+#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
>> >>+#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
>> >>+#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
>> >>+#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
>> >>+#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
>> >>+#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
>> >>+#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
>> >>+#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
>> >>+#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
>> >>+#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
>> >>+#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
>> >>+#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
>> >
>> >existing flags.
>> >
>> >>+#define FS_XFLAG_UNRM 0x00008000 /* undelete */
>> >>+#define FS_XFLAG_COMPR 0x00010000 /* compress file */
>> >>+#define FS_XFLAG_COMPRBLK 0x00020000 /* one or more compressed clusters */
>> >>+#define FS_XFLAG_NOCOMPR 0x00040000 /* don't compress */
>> >>+#define FS_XFLAG_ECOMPR 0x00080000 /* compression error */
>> >>+#define FS_XFLAG_INDEX 0x00100000 /* hash-indexed directory */
>> >>+#define FS_XFLAG_IMAGIC 0x00200000 /* AFS directory */
>> >>+#define FS_XFLAG_JOURNAL_DATA 0x00400000 /* file data should be journaled */
>> >>+#define FS_XFLAG_NOTAIL 0x00800000 /* file tail should not be merged */
>> >>+#define FS_XFLAG_DIRSYNC 0x01000000 /* dirsync behaviour (directories only) */
>> >>+#define FS_XFLAG_TOPDIR 0x02000000 /* top of directory hierarchies*/
>> >>+#define FS_XFLAG_HUGE_FILE 0x04000000 /* set to each huge file */
>> >>+#define FS_XFLAG_EXTENTS 0x08000000 /* inode uses extents */
>> >>+#define FS_XFLAG_EA_INODE 0x10000000 /* inode used for large EA */
>> >>+#define FS_XFLAG_EOFBLOCKS 0x20000000 /* blocks allocated beyond EOF */
>> >>+#define FS_XFLAG_INLINE_DATA 0x40000000 /* inode has inline data. */
>> >
>> >And a bunch more ext4 specific flags that *uses all the remaining
>> >flag space*. At minimum, we need to keep space in this existing
>> >flags field for flags to future indication of how the padding is
>> >used, so that's yet another NACK.
>> >
>> >Further, none of these have any relevance to project quotas so
>> >should not be a part of this patchset. Nor are they relevant to any
>> >other filesystem, and many are duplicated by information you can get
>> >from FIEMAP and other interfaces. NACK, again.
>> >
>> >Because I'm getting annoyed at being repeatedly ignored about what
>> >needs to be done here, I'm now going to shout a bit. DO NOT CHANGE
>> >THE INTERFACE. DO NOT ADD any new flags to the interface. DO NOT
>> >REDEFINE how the interface works. DO NOT "ext4-ise" the interface.
>> >
>> >The only changes to the interface code should be moving the XFS
>> >definitions and renaming them so as to provide the new generic
>> >ioctl definition as well as the historic XFS definitions. The ext4
>> >implementation needs to be done in a separate patch to the interface
>> >rename, and it must only implement the functionality the interface
>> >already provides. Anything else is outside the scope of this
>> >patchset and requires separate discussion.
>>
>> What reason for reusing XFS ioctl?
>>
>> As I see quota tools from xfsprogs checks filesystem name and seems
>> like they wouldn't work without upgrade. e2fsprogs have to be updated
>> updated anyway to support changes in layout. So, in this case we could
>> design new generic ioctl/syscall interface for that. For example add
>> new commands to quotactl() instead of yet another obscure ioctl.
> Using quotactl() for setting / getting project ID is IMO a wrong fit.
> quotactl() is used to manipulate quota usage & limits but not file
> properties. And reusing XFS ioctl is IMHO better than inventing a new
> ioctl - ext4 can use the same ioctl as XFS just fine. It's only that Li Xi
> mixed in unrelated changes to the ext4 support for that ioctl.

XFS interface looks really strange for that:

struct fsxattr {
__u32 fsx_xflags; /* xflags field value (get/set) */
__u32 fsx_extsize; /* extsize field value (get/set)*/
__u32 fsx_nextents; /* nextents field value (get) */
__u32 fsx_projid; /* project identifier (get/set) */
unsigned char fsx_pad[12];
};

Where we need only one flag and one field, everything else is just a legacy.

Moreover that flag: FS_PROJINHERIT_FL is redundant. For files it's meaningless.
For directories only difference between clearing it and changing
project_id to zero is
in accounting directory itself into inode/space quota. I'm not sure if there any
use-case for accounting directory itself but not charging its content.

Well. Maybe pair of fcntl()? fcntl(fd, F_GET_PROJECT/F_SET_PROJECT, ...);

>
>> Also: is quota for project-id '0' really required for something?
>> It adds overhead but I don't see any use-cases for that.
> But only if filesystem has project quota feature enabled, no? That
> doesn't concern me too much since the overhead doesn't seem to big and when
> you enable project quotas you likely want to use them ;-). But if you are
> concerned, I'm not strictly opposed to special casing of project id 0. But
> I'd like to see how much speed you gain by that before complicating the
> code...

For non-journalled quota it's nothing but for journalled difference
might be significant.
This might be implemented without any complication as a fake inactive
quota block
which just plugs particular id.

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-01-23 01:39:32

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Thu, Jan 22, 2015 at 04:59:00PM +0100, Jan Kara wrote:
> On Thu 22-01-15 18:20:15, Konstantin Khlebnikov wrote:
> > Also: is quota for project-id '0' really required for something?
> > It adds overhead but I don't see any use-cases for that.
> But only if filesystem has project quota feature enabled, no? That
> doesn't concern me too much since the overhead doesn't seem to big and when
> you enable project quotas you likely want to use them ;-). But if you are
> concerned, I'm not strictly opposed to special casing of project id 0. But
> I'd like to see how much speed you gain by that before complicating the
> code...

Except that doing so will result in different behaviour between
filesystems. XFS always *accounts* inodes when quotas are enabled,
but does not allow limits to be placed on quota ID 0. Hence projid
= 0 accounts all the space not used by other project groups so
admins can easily see how much uncontrolled space is being used.
I've had admins tell me this is one of the features they liked most
about using project quotas because you can't hide from it, even as
root....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-01-23 01:53:07

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
> On 09.12.2014 08:22, Li Xi wrote:
> >+static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
> >+{
> >+ struct inode *inode = file_inode(filp);
> >+ struct super_block *sb = inode->i_sb;
> >+ struct ext4_inode_info *ei = EXT4_I(inode);
> >+ int err;
> >+ handle_t *handle;
> >+ kprojid_t kprojid;
> >+ struct ext4_iloc iloc;
> >+ struct ext4_inode *raw_inode;
> >+
> >+ struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
> >+
> >+ /* Make sure caller can change project. */
> >+ if (!capable(CAP_SYS_ADMIN))
> >+ return -EACCES;
> >+
> >+ if (projid != EXT4_DEF_PROJID
> >+ && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
> >+ EXT4_FEATURE_RO_COMPAT_PROJECT))
> >+ return -EOPNOTSUPP;
> >+
> >+ if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
> >+ EXT4_FEATURE_RO_COMPAT_PROJECT)) {
> >+ BUG_ON(__kprojid_val(EXT4_I(inode)->i_projid)
> >+ != EXT4_DEF_PROJID);
> >+ if (projid != EXT4_DEF_PROJID)
> >+ return -EOPNOTSUPP;
> >+ else
> >+ return 0;
> >+ }
> >+
> >+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
>
> Maybe current_user_ns()?
> This code should be user-namespace aware from the beginning.

No, the code is correct. Project quotas have nothing to do with
UIDs and so should never have been included in the uid/gid
namespace mapping infrastructure in the first place.

Point in case: directory subtree quotas can be used as a resource
controller for limiting space usage within separate containers that
share the same underlying (large) filesystem via mount namespaces.

Cheers,

Dave.

PS: can you please trim your reply to just the sections of the
patch you are commenting to? Finding replies like this in a large
patch is like searching for a needle in a haystack...
--
Dave Chinner
[email protected]

2015-01-23 11:58:09

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On 23.01.2015 04:53, Dave Chinner wrote:
> On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
>> On 09.12.2014 08:22, Li Xi wrote:
>>> +static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
>>> +{
>>> + struct inode *inode = file_inode(filp);
>>> + struct super_block *sb = inode->i_sb;
>>> + struct ext4_inode_info *ei = EXT4_I(inode);
>>> + int err;
>>> + handle_t *handle;
>>> + kprojid_t kprojid;
>>> + struct ext4_iloc iloc;
>>> + struct ext4_inode *raw_inode;
>>> +
>>> + struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
>>> +
>>> + /* Make sure caller can change project. */
>>> + if (!capable(CAP_SYS_ADMIN))
>>> + return -EACCES;
>>> +
>>> + if (projid != EXT4_DEF_PROJID
>>> + && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
>>> + EXT4_FEATURE_RO_COMPAT_PROJECT))
>>> + return -EOPNOTSUPP;
>>> +
>>> + if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
>>> + EXT4_FEATURE_RO_COMPAT_PROJECT)) {
>>> + BUG_ON(__kprojid_val(EXT4_I(inode)->i_projid)
>>> + != EXT4_DEF_PROJID);
>>> + if (projid != EXT4_DEF_PROJID)
>>> + return -EOPNOTSUPP;
>>> + else
>>> + return 0;
>>> + }
>>> +
>>> + kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
>>
>> Maybe current_user_ns()?
>> This code should be user-namespace aware from the beginning.
>
> No, the code is correct. Project quotas have nothing to do with
> UIDs and so should never have been included in the uid/gid
> namespace mapping infrastructure in the first place.

Right, but user-namespace provides id mapping for project-id too.
This infrastructure adds support for nested project quotas with
virtualized ids in sub-containers. I couldn't say that this is
must have feature but implementation is trivial because whole
infrastructure is already here.

>
> Point in case: directory subtree quotas can be used as a resource
> controller for limiting space usage within separate containers that
> share the same underlying (large) filesystem via mount namespaces.

That's exactly my use-case: 'sub-volumes' for containers with
quota for space usage/inodes count.

2015-01-23 23:30:26

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
> On 23.01.2015 04:53, Dave Chinner wrote:
> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
> >>>+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
> >>
> >>Maybe current_user_ns()?
> >>This code should be user-namespace aware from the beginning.
> >
> >No, the code is correct. Project quotas have nothing to do with
> >UIDs and so should never have been included in the uid/gid
> >namespace mapping infrastructure in the first place.
>
> Right, but user-namespace provides id mapping for project-id too.
> This infrastructure adds support for nested project quotas with
> virtualized ids in sub-containers. I couldn't say that this is
> must have feature but implementation is trivial because whole
> infrastructure is already here.

This is an extremely common misunderstanding of project IDs. Project
IDs are completely separate to the UID/GID namespace. Project
quotas were originally designed specifically for
accounting/enforcing quotas in situations where uid/gid
accounting/enforcing is not possible. This design intent goes back
25 years - it predates XFS...

IOWs, mapping prids via user namespaces defeats the purpose
for which prids were originally intended for.

> >Point in case: directory subtree quotas can be used as a resource
> >controller for limiting space usage within separate containers that
> >share the same underlying (large) filesystem via mount namespaces.
>
> That's exactly my use-case: 'sub-volumes' for containers with
> quota for space usage/inodes count.

That doesn't require mapped project IDs. Hard container space limits
can only be controlled by the init namespace, and because inodes can
hold only one project ID the current ns cannot be allowed to change
the project ID on the inode because that allows them to escape the
resource limits set on the project ID associated with the sub-mount
set up by the init namespace...

i.e.

/mnt prid = 0, default for entire fs.
/mnt/container1/ prid = 1, inherit, 10GB space limit
/mnt/container2/ prid = 2, inherit, 50GB space limit
.....
/mnt/containerN/ prid = N, inherit, 20GB space limit

And you clone the mount namespace for each container so the root is
at the appropriate /mnt/containerX/. Now the containers have a
fixed amount of space they can use in the parent filesystem they
know nothing about, and it is enforced by directory subquotas
controlled by the init namespace. This "fixed amount of space" is
reflected in the container namespace when "df" is run as it will
report the project quota space limits. Adding or removing space to a
container is as simple as changing the project quota limits from the
init namespace. i.e. an admin operation controlled by the host, not
the container....

Allowing the container to modify the prid and/or the inherit bit of
inodes in it's namespace then means the user can define their own
space usage limits, even turn them off. It's not a resource
container at that point because the user can define their own
limits. Hence, only if the current_ns cannot change project quotas
will we have a hard fence on space usage that the container *cannot
exceed*.

Yes, I know there are other use cases for project quotas *within* a
container as controlled by the user (same as existing project quota
usages), but we don't have the capability of storing multiple
project IDs on each inode, nor accounting/enforcement across
multiple project IDs on an inode. Nor, really, do we want to (on
disk format changes required) and hence we can have one or the
other but not both.

Further, in a containerised system, providing the admin with a
trivial and easy to manage mechanism to provide hard limits on
shared filesystem space usage of each container is far more
important than catering to the occasional user who might have a need
for project quotas inside a container.

These are the points I brought up when I initially reviewed the user
namespace patches - the userns developer ignored my concerns and the
code was merged without acknowledging them, let alone addressing
them.

As we (the XFS guys) have no way of knowing when such a distinction
should be made, and with the user ns developers being completely
unresponsive on the subject, we made the decision ourselves. Our
only concern was to be consistent, safe and predictable and that
means we choose to only allow project quotas to be used as an
external container resource hardwall limit and hence *never* allow
access to project quotas inside container namespaces.

That's the long and the short of it. project IDs are independent of
user IDs and they cannot be sanely used both inside and outside user
namespaces at the same time. Hence they should never have been
included in the user namespace mappings in the first place.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-01-23 23:59:04

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
> On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
>> On 23.01.2015 04:53, Dave Chinner wrote:
>> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
>> >>>+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
>> >>
>> >>Maybe current_user_ns()?
>> >>This code should be user-namespace aware from the beginning.
>> >
>> >No, the code is correct. Project quotas have nothing to do with
>> >UIDs and so should never have been included in the uid/gid
>> >namespace mapping infrastructure in the first place.
>>
>> Right, but user-namespace provides id mapping for project-id too.
>> This infrastructure adds support for nested project quotas with
>> virtualized ids in sub-containers. I couldn't say that this is
>> must have feature but implementation is trivial because whole
>> infrastructure is already here.
>
> This is an extremely common misunderstanding of project IDs. Project
> IDs are completely separate to the UID/GID namespace. Project
> quotas were originally designed specifically for
> accounting/enforcing quotas in situations where uid/gid
> accounting/enforcing is not possible. This design intent goes back
> 25 years - it predates XFS...
>
> IOWs, mapping prids via user namespaces defeats the purpose
> for which prids were originally intended for.
>
>> >Point in case: directory subtree quotas can be used as a resource
>> >controller for limiting space usage within separate containers that
>> >share the same underlying (large) filesystem via mount namespaces.
>>
>> That's exactly my use-case: 'sub-volumes' for containers with
>> quota for space usage/inodes count.
>
> That doesn't require mapped project IDs. Hard container space limits
> can only be controlled by the init namespace, and because inodes can
> hold only one project ID the current ns cannot be allowed to change
> the project ID on the inode because that allows them to escape the
> resource limits set on the project ID associated with the sub-mount
> set up by the init namespace...
>
> i.e.
>
> /mnt prid = 0, default for entire fs.
> /mnt/container1/ prid = 1, inherit, 10GB space limit
> /mnt/container2/ prid = 2, inherit, 50GB space limit
> .....
> /mnt/containerN/ prid = N, inherit, 20GB space limit
>
> And you clone the mount namespace for each container so the root is
> at the appropriate /mnt/containerX/. Now the containers have a
> fixed amount of space they can use in the parent filesystem they
> know nothing about, and it is enforced by directory subquotas
> controlled by the init namespace. This "fixed amount of space" is
> reflected in the container namespace when "df" is run as it will
> report the project quota space limits. Adding or removing space to a
> container is as simple as changing the project quota limits from the
> init namespace. i.e. an admin operation controlled by the host, not
> the container....
>
> Allowing the container to modify the prid and/or the inherit bit of
> inodes in it's namespace then means the user can define their own
> space usage limits, even turn them off. It's not a resource
> container at that point because the user can define their own
> limits. Hence, only if the current_ns cannot change project quotas
> will we have a hard fence on space usage that the container *cannot
> exceed*.

I think I must be missing something simple here. In a hypothetical
world where the code used nsown_capable, if an admin wants to stick a
container in /mnt/container1 with associated prid 1 and a userns,
shouldn't it just map only prid 1 into the user ns? Then a user in
that userns can't try to change the prid of a file to 2 because the
number "2" is unmapped for that user and translation will fail.

--Andy

2015-01-26 18:46:39

by jon ernst

[permalink] [raw]
Subject: Re: [v8 0/5] ext4: add project quota support

Xi,
Just a question, do you have any test scripts or xfstest cases for
this patchset?

Thanks,
Jon

On Tue, Dec 9, 2014 at 12:22 AM, Li Xi <[email protected]> wrote:
> The following patches propose an implementation of project quota
> support for ext4. A project is an aggregate of unrelated inodes
> which might scatter in different directories. Inodes that belong
> to the same project possess an identical identification i.e.
> 'project ID', just like every inode has its user/group
> identification. The following patches add project quota as
> supplement to the former uer/group quota types.
>
> The semantics of ext4 project quota is consistent with XFS. Each
> directory can have EXT4_INODE_PROJINHERIT flag set. When the
> EXT4_INODE_PROJINHERIT flag of a parent directory is not set, a
> newly created inode under that directory will have a default project
> ID (i.e. 0). And its EXT4_INODE_PROJINHERIT flag is not set either.
> When this flag is set on a directory, following rules will be kept:
>
> 1) The newly created inode under that directory will inherit both
> the EXT4_INODE_PROJINHERIT flag and the project ID from its parent
> directory.
>
> 2) Hard-linking a inode with different project ID into that directory
> will fail with errno EXDEV.
>
> 3) Renaming a inode with different project ID into that directory
> will fail with errno EXDEV. However, 'mv' command will detect this
> failure and copy the renamed inode to a new inode in the directory.
> Thus, this new inode will inherit both the project ID and
> EXT4_INODE_PROJINHERIT flag.
>
> 4) If the project quota of that ID is being enforced, statfs() on
> that directory will take the quotas as another upper limits along
> with the capacity of the file system, i.e. the total block/inode
> number will be the minimum of the quota limits and file system
> capacity.
>
> Changelog:
> * v8 <- v7:
> - Rebase to newest dev branch of ext4 repository (3.18.0_rc3).
> * v7 <- v6:
> - Map ext4 inode flags to xflags of struct fsxattr;
> - Add patch to cleanup ext4 inode flag definitions.
> * v6 <- v5:
> - Add project ID check for cross rename;
> - Remove patch of EXT4_IOC_GETPROJECT/EXT4_IOC_SETPROJECT ioctl
> * v5 <- v4:
> - Check project feature when set/get project ID;
> - Do not check project feature for project quota;
> - Add support of FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR.
> * v4 <- v3:
> - Do not check project feature when set/get project ID;
> - Use EXT4_MAXQUOTAS instead of MAXQUOTAS in ext4 patches;
> - Remove unnecessary change of fs/quota/dquot.c;
> - Remove CONFIG_QUOTA_PROJECT.
> * v3 <- v2:
> - Add EXT4_INODE_PROJINHERIT semantics.
> * v2 <- v1:
> - Add ioctl interface for setting/getting project;
> - Add EXT4_FEATURE_RO_COMPAT_PROJECT;
> - Add get_projid() method in struct dquot_operations;
> - Add error check of ext4_inode_projid_set/get().
>
> v7: http://www.spinics.net/lists/linux-fsdevel/msg80404.html
> v6: http://www.spinics.net/lists/linux-fsdevel/msg80022.html
> v5: http://www.spinics.net/lists/linux-api/msg04840.html
> v4: http://lwn.net/Articles/612972/
> v3: http://www.spinics.net/lists/linux-ext4/msg45184.html
> v2: http://www.spinics.net/lists/linux-ext4/msg44695.html
> v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153
>
> Any comments or feedbacks are appreciated.
>
> Regards,
> - Li Xi
>
> Li Xi (5):
> vfs: adds general codes to enforces project quota limits
> ext4: adds project ID support
> ext4: adds project quota support
> ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
> ext4: cleanup inode flag definitions
>
> fs/ext4/ext4.h | 190 +++++++++++++++++++++----
> fs/ext4/ialloc.c | 6 +
> fs/ext4/inode.c | 29 ++++
> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++-------------
> fs/ext4/namei.c | 17 +++
> fs/ext4/super.c | 96 +++++++++++--
> fs/quota/dquot.c | 35 ++++-
> fs/quota/quota.c | 8 +-
> fs/quota/quotaio_v2.h | 6 +-
> fs/xfs/xfs_fs.h | 47 +++----
> include/linux/quota.h | 2 +
> include/uapi/linux/fs.h | 59 ++++++++
> include/uapi/linux/quota.h | 6 +-
> 13 files changed, 650 insertions(+), 181 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-01-27 08:02:39

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
> On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
> > On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
> >> On 23.01.2015 04:53, Dave Chinner wrote:
> >> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
> >> >>>+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
> >> >>
> >> >>Maybe current_user_ns()?
> >> >>This code should be user-namespace aware from the beginning.
> >> >
> >> >No, the code is correct. Project quotas have nothing to do with
> >> >UIDs and so should never have been included in the uid/gid
> >> >namespace mapping infrastructure in the first place.
> >>
> >> Right, but user-namespace provides id mapping for project-id too.
> >> This infrastructure adds support for nested project quotas with
> >> virtualized ids in sub-containers. I couldn't say that this is
> >> must have feature but implementation is trivial because whole
> >> infrastructure is already here.
> >
> > This is an extremely common misunderstanding of project IDs. Project
> > IDs are completely separate to the UID/GID namespace. Project
> > quotas were originally designed specifically for
> > accounting/enforcing quotas in situations where uid/gid
> > accounting/enforcing is not possible. This design intent goes back
> > 25 years - it predates XFS...
> >
> > IOWs, mapping prids via user namespaces defeats the purpose
> > for which prids were originally intended for.
> >
> >> >Point in case: directory subtree quotas can be used as a resource
> >> >controller for limiting space usage within separate containers that
> >> >share the same underlying (large) filesystem via mount namespaces.
> >>
> >> That's exactly my use-case: 'sub-volumes' for containers with
> >> quota for space usage/inodes count.
> >
> > That doesn't require mapped project IDs. Hard container space limits
> > can only be controlled by the init namespace, and because inodes can
> > hold only one project ID the current ns cannot be allowed to change
> > the project ID on the inode because that allows them to escape the
> > resource limits set on the project ID associated with the sub-mount
> > set up by the init namespace...
> >
> > i.e.
> >
> > /mnt prid = 0, default for entire fs.
> > /mnt/container1/ prid = 1, inherit, 10GB space limit
> > /mnt/container2/ prid = 2, inherit, 50GB space limit
> > .....
> > /mnt/containerN/ prid = N, inherit, 20GB space limit
> >
> > And you clone the mount namespace for each container so the root is
> > at the appropriate /mnt/containerX/. Now the containers have a
> > fixed amount of space they can use in the parent filesystem they
> > know nothing about, and it is enforced by directory subquotas
> > controlled by the init namespace. This "fixed amount of space" is
> > reflected in the container namespace when "df" is run as it will
> > report the project quota space limits. Adding or removing space to a
> > container is as simple as changing the project quota limits from the
> > init namespace. i.e. an admin operation controlled by the host, not
> > the container....
> >
> > Allowing the container to modify the prid and/or the inherit bit of
> > inodes in it's namespace then means the user can define their own
> > space usage limits, even turn them off. It's not a resource
> > container at that point because the user can define their own
> > limits. Hence, only if the current_ns cannot change project quotas
> > will we have a hard fence on space usage that the container *cannot
> > exceed*.
>
> I think I must be missing something simple here. In a hypothetical
> world where the code used nsown_capable, if an admin wants to stick a
> container in /mnt/container1 with associated prid 1 and a userns,
> shouldn't it just map only prid 1 into the user ns? Then a user in
> that userns can't try to change the prid of a file to 2 because the
> number "2" is unmapped for that user and translation will fail.

You've effectively said "yes, project quotas are enabled, but you
only have a single ID, it's always turned on and you can't change it
to anything else.

So, why do they need to be mapped via user namespaces to enable
this? Think about it a little harder:

- Project IDs are not user IDs.
- Project IDs are not a security/permission mechanism.
- Project quotas only provide a mechanism for
resource usage control.

Think about that last one some more. Perhaps, as a hint, I should
relate it to control groups? :) i.e:

- Project quotas can be used as an effective mount ns space
usage controller.

But this can only be safely and reliably by keeping the project IDs
inaccessible from the containers themselves. I don't see why a
mechanism that controls the amount of filesystem space used by a
container should be considered any differently to a memory control
group that limits the amount of memory the container can use.

However, nobody on the container side of things would answer any of
my questions about how project quotas were going to be used,
limited, managed, etc back when we had to make a decision to enable
XFS user ns support, I did what was needed to support the obvious
container use case and close any possible loop hole that containers
might be able to use to subvert that use case.

If we want to do anything different, then there's a *lot* of
userns aware regression tests needed to be written for xfstests....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-01-27 10:45:17

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On 27.01.2015 11:02, Dave Chinner wrote:
> On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
>> On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
>>> On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
>>>> On 23.01.2015 04:53, Dave Chinner wrote:
>>>>> On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
>>>>>>> + kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
>>>>>>
>>>>>> Maybe current_user_ns()?
>>>>>> This code should be user-namespace aware from the beginning.
>>>>>
>>>>> No, the code is correct. Project quotas have nothing to do with
>>>>> UIDs and so should never have been included in the uid/gid
>>>>> namespace mapping infrastructure in the first place.
>>>>
>>>> Right, but user-namespace provides id mapping for project-id too.
>>>> This infrastructure adds support for nested project quotas with
>>>> virtualized ids in sub-containers. I couldn't say that this is
>>>> must have feature but implementation is trivial because whole
>>>> infrastructure is already here.
>>>
>>> This is an extremely common misunderstanding of project IDs. Project
>>> IDs are completely separate to the UID/GID namespace. Project
>>> quotas were originally designed specifically for
>>> accounting/enforcing quotas in situations where uid/gid
>>> accounting/enforcing is not possible. This design intent goes back
>>> 25 years - it predates XFS...
>>>
>>> IOWs, mapping prids via user namespaces defeats the purpose
>>> for which prids were originally intended for.
>>>
>>>>> Point in case: directory subtree quotas can be used as a resource
>>>>> controller for limiting space usage within separate containers that
>>>>> share the same underlying (large) filesystem via mount namespaces.
>>>>
>>>> That's exactly my use-case: 'sub-volumes' for containers with
>>>> quota for space usage/inodes count.
>>>
>>> That doesn't require mapped project IDs. Hard container space limits
>>> can only be controlled by the init namespace, and because inodes can
>>> hold only one project ID the current ns cannot be allowed to change
>>> the project ID on the inode because that allows them to escape the
>>> resource limits set on the project ID associated with the sub-mount
>>> set up by the init namespace...
>>>
>>> i.e.
>>>
>>> /mnt prid = 0, default for entire fs.
>>> /mnt/container1/ prid = 1, inherit, 10GB space limit
>>> /mnt/container2/ prid = 2, inherit, 50GB space limit
>>> .....
>>> /mnt/containerN/ prid = N, inherit, 20GB space limit
>>>
>>> And you clone the mount namespace for each container so the root is
>>> at the appropriate /mnt/containerX/. Now the containers have a
>>> fixed amount of space they can use in the parent filesystem they
>>> know nothing about, and it is enforced by directory subquotas
>>> controlled by the init namespace. This "fixed amount of space" is
>>> reflected in the container namespace when "df" is run as it will
>>> report the project quota space limits. Adding or removing space to a
>>> container is as simple as changing the project quota limits from the
>>> init namespace. i.e. an admin operation controlled by the host, not
>>> the container....
>>>
>>> Allowing the container to modify the prid and/or the inherit bit of
>>> inodes in it's namespace then means the user can define their own
>>> space usage limits, even turn them off. It's not a resource
>>> container at that point because the user can define their own
>>> limits. Hence, only if the current_ns cannot change project quotas
>>> will we have a hard fence on space usage that the container *cannot
>>> exceed*.
>>
>> I think I must be missing something simple here. In a hypothetical
>> world where the code used nsown_capable, if an admin wants to stick a
>> container in /mnt/container1 with associated prid 1 and a userns,
>> shouldn't it just map only prid 1 into the user ns? Then a user in
>> that userns can't try to change the prid of a file to 2 because the
>> number "2" is unmapped for that user and translation will fail.
>
> You've effectively said "yes, project quotas are enabled, but you
> only have a single ID, it's always turned on and you can't change it
> to anything else.
>
> So, why do they need to be mapped via user namespaces to enable
> this? Think about it a little harder:
>
> - Project IDs are not user IDs.
> - Project IDs are not a security/permission mechanism.
> - Project quotas only provide a mechanism for
> resource usage control.
>
> Think about that last one some more. Perhaps, as a hint, I should
> relate it to control groups? :) i.e:
>
> - Project quotas can be used as an effective mount ns space
> usage controller.
>
> But this can only be safely and reliably by keeping the project IDs
> inaccessible from the containers themselves. I don't see why a
> mechanism that controls the amount of filesystem space used by a
> container should be considered any differently to a memory control
> group that limits the amount of memory the container can use.
>
> However, nobody on the container side of things would answer any of
> my questions about how project quotas were going to be used,
> limited, managed, etc back when we had to make a decision to enable
> XFS user ns support, I did what was needed to support the obvious
> container use case and close any possible loop hole that containers
> might be able to use to subvert that use case.

I have a solution: Hierarchical Project Quota! Each project might have
parent project and so on. Each level keeps usage, limits and also keeps
some preallocation from parent level to reduce count of quota updates.

This might be useful even without containers : normal user quota has
two levels and admins might classify users into groups and set group
quota for them. Project quota is flat and cannot provide any control
if we want classify projects.

For containers hierarchy provide full virtualization: user-namespace
maps maps second-level and projects into subset of real projects.

Changing limits and other managing for second-level project quotas
could be done in user-space by system service (systemd I suppose. lol),
so we don't have to manage this stuff inside the kernel.

[ I'm already working on prototype for ext4 ]

>
> If we want to do anything different, then there's a *lot* of
> userns aware regression tests needed to be written for xfstests....
>
> Cheers,
>
> Dave.
>

2015-01-28 00:37:52

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Tue, Jan 27, 2015 at 01:45:17PM +0300, Konstantin Khlebnikov wrote:
> On 27.01.2015 11:02, Dave Chinner wrote:
> >On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
> >>On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
> >>>On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
> >>
> >>I think I must be missing something simple here. In a hypothetical
> >>world where the code used nsown_capable, if an admin wants to stick a
> >>container in /mnt/container1 with associated prid 1 and a userns,
> >>shouldn't it just map only prid 1 into the user ns? Then a user in
> >>that userns can't try to change the prid of a file to 2 because the
> >>number "2" is unmapped for that user and translation will fail.
> >
> >You've effectively said "yes, project quotas are enabled, but you
> >only have a single ID, it's always turned on and you can't change it
> >to anything else.
> >
> >So, why do they need to be mapped via user namespaces to enable
> >this? Think about it a little harder:
> >
> > - Project IDs are not user IDs.
> > - Project IDs are not a security/permission mechanism.
> > - Project quotas only provide a mechanism for
> > resource usage control.
> >
> >Think about that last one some more. Perhaps, as a hint, I should
> >relate it to control groups? :) i.e:
> >
> > - Project quotas can be used as an effective mount ns space
> > usage controller.
> >
> >But this can only be safely and reliably by keeping the project IDs
> >inaccessible from the containers themselves. I don't see why a
> >mechanism that controls the amount of filesystem space used by a
> >container should be considered any differently to a memory control
> >group that limits the amount of memory the container can use.
> >
> >However, nobody on the container side of things would answer any of
> >my questions about how project quotas were going to be used,
> >limited, managed, etc back when we had to make a decision to enable
> >XFS user ns support, I did what was needed to support the obvious
> >container use case and close any possible loop hole that containers
> >might be able to use to subvert that use case.
>
> I have a solution: Hierarchical Project Quota! Each project might have
> parent project and so on. Each level keeps usage, limits and also keeps
> some preallocation from parent level to reduce count of quota updates.

That's an utter nightmare to manage - just ask the gluster guys who
thought this was a good idea when they first implemented quotas.

Besides, following down the path of heirarchical control groups
doesn't seem like a good idea to me because that path has already
proven to be a bad idea for container resource controllers. There's
good reason why control groups have gone back to a flattened ID
space like we already have for project quotas, so I don't think we
want to go that way.

> This might be useful even without containers : normal user quota has
> two levels and admins might classify users into groups and set group
> quota for them. Project quota is flat and cannot provide any control
> if we want classify projects.

I don't follow. project ID is exactly what allows you to control
project classification.

> For containers hierarchy provide full virtualization: user-namespace
> maps maps second-level and projects into subset of real projects.

It's not the mapping that matters - if project quotas are used
outside containers as a resource controller, then they can't be
used inside containers even with a unique mapping range because
we can only store a single project ID per inode.

Besides, I'm struggling to see the use case for project quotas
inside small containers that run single applications and typically
only have a single user. Project quotas have traditionally been used
to manage space in large filesystems shared by many users along
bounds that don't follow any specific heirarchy or permission set.

IOWs, you haven't described your use case for needing project quotas
inside containers, so I've got no idea what problem you are trying
to solve or whether project quotas are even appropriate as a
solution.

> Changing limits and other managing for second-level project quotas
> could be done in user-space by system service (systemd I suppose. lol),
> so we don't have to manage this stuff inside the kernel.

So you are proposing a fourth on-disk quota here i.e. user, group,
project, new_2nd_level_project? If so, forget about using systemd to
manage it, the first thing we'll need is need full support in
existing quota tools so that the regression tests you write for
xfstests are self contained.

> [ I'm already working on prototype for ext4 ]

You really need to post a use case description and adesign document
for review so we can actually discuss what you are planning. So far
everything you are doing strikes me as a "because they are there and
it sounds cool" type of development, not because people actually
need them. Let's have a discussion about the real problems and
architecture, not waste time on a stupid "solution looking for a
problem" discussion...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-01-28 00:45:38

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Tue, Jan 27, 2015 at 12:02 AM, Dave Chinner <[email protected]> wrote:
> On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
>> On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
>> > On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
>> >> On 23.01.2015 04:53, Dave Chinner wrote:
>> >> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
>> >> >>>+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
>> >> >>
>> >> >>Maybe current_user_ns()?
>> >> >>This code should be user-namespace aware from the beginning.
>> >> >
>> >> >No, the code is correct. Project quotas have nothing to do with
>> >> >UIDs and so should never have been included in the uid/gid
>> >> >namespace mapping infrastructure in the first place.
>> >>
>> >> Right, but user-namespace provides id mapping for project-id too.
>> >> This infrastructure adds support for nested project quotas with
>> >> virtualized ids in sub-containers. I couldn't say that this is
>> >> must have feature but implementation is trivial because whole
>> >> infrastructure is already here.
>> >
>> > This is an extremely common misunderstanding of project IDs. Project
>> > IDs are completely separate to the UID/GID namespace. Project
>> > quotas were originally designed specifically for
>> > accounting/enforcing quotas in situations where uid/gid
>> > accounting/enforcing is not possible. This design intent goes back
>> > 25 years - it predates XFS...
>> >
>> > IOWs, mapping prids via user namespaces defeats the purpose
>> > for which prids were originally intended for.
>> >
>> >> >Point in case: directory subtree quotas can be used as a resource
>> >> >controller for limiting space usage within separate containers that
>> >> >share the same underlying (large) filesystem via mount namespaces.
>> >>
>> >> That's exactly my use-case: 'sub-volumes' for containers with
>> >> quota for space usage/inodes count.
>> >
>> > That doesn't require mapped project IDs. Hard container space limits
>> > can only be controlled by the init namespace, and because inodes can
>> > hold only one project ID the current ns cannot be allowed to change
>> > the project ID on the inode because that allows them to escape the
>> > resource limits set on the project ID associated with the sub-mount
>> > set up by the init namespace...
>> >
>> > i.e.
>> >
>> > /mnt prid = 0, default for entire fs.
>> > /mnt/container1/ prid = 1, inherit, 10GB space limit
>> > /mnt/container2/ prid = 2, inherit, 50GB space limit
>> > .....
>> > /mnt/containerN/ prid = N, inherit, 20GB space limit
>> >
>> > And you clone the mount namespace for each container so the root is
>> > at the appropriate /mnt/containerX/. Now the containers have a
>> > fixed amount of space they can use in the parent filesystem they
>> > know nothing about, and it is enforced by directory subquotas
>> > controlled by the init namespace. This "fixed amount of space" is
>> > reflected in the container namespace when "df" is run as it will
>> > report the project quota space limits. Adding or removing space to a
>> > container is as simple as changing the project quota limits from the
>> > init namespace. i.e. an admin operation controlled by the host, not
>> > the container....
>> >
>> > Allowing the container to modify the prid and/or the inherit bit of
>> > inodes in it's namespace then means the user can define their own
>> > space usage limits, even turn them off. It's not a resource
>> > container at that point because the user can define their own
>> > limits. Hence, only if the current_ns cannot change project quotas
>> > will we have a hard fence on space usage that the container *cannot
>> > exceed*.
>>
>> I think I must be missing something simple here. In a hypothetical
>> world where the code used nsown_capable, if an admin wants to stick a
>> container in /mnt/container1 with associated prid 1 and a userns,
>> shouldn't it just map only prid 1 into the user ns? Then a user in
>> that userns can't try to change the prid of a file to 2 because the
>> number "2" is unmapped for that user and translation will fail.
>
> You've effectively said "yes, project quotas are enabled, but you
> only have a single ID, it's always turned on and you can't change it
> to anything else.

It's got to be a assigned somehow. Inheritance from the parent
directory probably works too, though.

>
> So, why do they need to be mapped via user namespaces to enable
> this? Think about it a little harder:
>
> - Project IDs are not user IDs.
> - Project IDs are not a security/permission mechanism.
> - Project quotas only provide a mechanism for
> resource usage control.
>
> Think about that last one some more. Perhaps, as a hint, I should
> relate it to control groups? :) i.e:
>
> - Project quotas can be used as an effective mount ns space
> usage controller.
>
> But this can only be safely and reliably by keeping the project IDs
> inaccessible from the containers themselves. I don't see why a
> mechanism that controls the amount of filesystem space used by a
> container should be considered any differently to a memory control
> group that limits the amount of memory the container can use.
>

Cgroups are ephemeral, and I'd want my containers' quotas to survive
container restarts and even reboots. I'm sure it *could* be done,
though.

> However, nobody on the container side of things would answer any of
> my questions about how project quotas were going to be used,
> limited, managed, etc back when we had to make a decision to enable
> XFS user ns support, I did what was needed to support the obvious
> container use case and close any possible loop hole that containers
> might be able to use to subvert that use case.
>
> If we want to do anything different, then there's a *lot* of
> userns aware regression tests needed to be written for xfstests....

Agreed.

--Andy

2015-02-04 15:22:01

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On 28.01.2015 03:37, Dave Chinner wrote:
> On Tue, Jan 27, 2015 at 01:45:17PM +0300, Konstantin Khlebnikov wrote:
>> On 27.01.2015 11:02, Dave Chinner wrote:
>>> On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
>>>> On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
>>>>> On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
>>>>
>>>> I think I must be missing something simple here. In a hypothetical
>>>> world where the code used nsown_capable, if an admin wants to stick a
>>>> container in /mnt/container1 with associated prid 1 and a userns,
>>>> shouldn't it just map only prid 1 into the user ns? Then a user in
>>>> that userns can't try to change the prid of a file to 2 because the
>>>> number "2" is unmapped for that user and translation will fail.
>>>
>>> You've effectively said "yes, project quotas are enabled, but you
>>> only have a single ID, it's always turned on and you can't change it
>>> to anything else.
>>>
>>> So, why do they need to be mapped via user namespaces to enable
>>> this? Think about it a little harder:
>>>
>>> - Project IDs are not user IDs.
>>> - Project IDs are not a security/permission mechanism.
>>> - Project quotas only provide a mechanism for
>>> resource usage control.
>>>
>>> Think about that last one some more. Perhaps, as a hint, I should
>>> relate it to control groups? :) i.e:
>>>
>>> - Project quotas can be used as an effective mount ns space
>>> usage controller.
>>>
>>> But this can only be safely and reliably by keeping the project IDs
>>> inaccessible from the containers themselves. I don't see why a
>>> mechanism that controls the amount of filesystem space used by a
>>> container should be considered any differently to a memory control
>>> group that limits the amount of memory the container can use.
>>>
>>> However, nobody on the container side of things would answer any of
>>> my questions about how project quotas were going to be used,
>>> limited, managed, etc back when we had to make a decision to enable
>>> XFS user ns support, I did what was needed to support the obvious
>>> container use case and close any possible loop hole that containers
>>> might be able to use to subvert that use case.
>>
>> I have a solution: Hierarchical Project Quota! Each project might have
>> parent project and so on. Each level keeps usage, limits and also keeps
>> some preallocation from parent level to reduce count of quota updates.
>
> That's an utter nightmare to manage - just ask the gluster guys who
> thought this was a good idea when they first implemented quotas.
>
> Besides, following down the path of heirarchical control groups
> doesn't seem like a good idea to me because that path has already
> proven to be a bad idea for container resource controllers. There's
> good reason why control groups have gone back to a flattened ID
> space like we already have for project quotas, so I don't think we
> want to go that way.
>
>> This might be useful even without containers : normal user quota has
>> two levels and admins might classify users into groups and set group
>> quota for them. Project quota is flat and cannot provide any control
>> if we want classify projects.
>
> I don't follow. project ID is exactly what allows you to control
> project classification.

I mean hierarchy allows to group several projects into one super-project
which sums all disk usage and could have its own limit too.

>
>> For containers hierarchy provide full virtualization: user-namespace
>> maps maps second-level and projects into subset of real projects.
>
> It's not the mapping that matters - if project quotas are used
> outside containers as a resource controller, then they can't be
> used inside containers even with a unique mapping range because
> we can only store a single project ID per inode.
>
> Besides, I'm struggling to see the use case for project quotas
> inside small containers that run single applications and typically
> only have a single user. Project quotas have traditionally been used
> to manage space in large filesystems shared by many users along
> bounds that don't follow any specific heirarchy or permission set.
>
> IOWs, you haven't described your use case for needing project quotas
> inside containers, so I've got no idea what problem you are trying
> to solve or whether project quotas are even appropriate as a
> solution.

Some people run inside containers complete distributives with multiple
services or even nested virtualization.

I've poked this code and played with some use-cases.
Hierarchical project quotas are cool and it seems the only option for
virtualization and providing seamless nested project quotas inside
containers. But, right now I'm not so interested in this feature.
Let's leave this for the future.


For now I'm more interested in participation disk space among services
in one system. As I see security model of project quota in XFS almost
non-existent for this case: it forbids linking/renaming files between
different projects but any unprivileged user might change project id
for its own files. That's strange, this operation should be privileged.

Also if user have permission for changing project id he could be
permitted to link and rename file into directory with any project id,
because he anyway could change project, move, and revert it back.


For me perfect interface looks like couple fcntls for getting/changing
project id:

int fcntl(fd, F_GET_PROJECT, projid_t *);
int fcntl(fd, F_SET_PROJECT, projid_t);

F_GET_PROJECT is allowed for everybody
F_SET_PROJECT requires CAP_SYS_ADMIN (or maybe CAP_FOWNER?)
(for virtualization id also must be mapped in user-ns)

ioctl XFS_IOC_FSSETXATTR should stay xfs specific.
And XFS_DIFLAG_PROJINHERIT should stay XFS-only feature too.
I don't see any use cases for that flag. For files is has no effect
for directories it's mostly equal to setting directory project id
to zero. The only difference in accounting directory itself.

Cross-project renaming/linking must be allowed if user have
permissions for changing project id at file and directory.
This is useful for sharing files between containers.

--
Konstantin

2015-02-05 09:32:02

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On 05.02.2015 01:58, Dave Chinner wrote:
> On Wed, Feb 04, 2015 at 06:22:01PM +0300, Konstantin Khlebnikov wrote:
>> On 28.01.2015 03:37, Dave Chinner wrote:
>>> On Tue, Jan 27, 2015 at 01:45:17PM +0300, Konstantin Khlebnikov wrote:
>>>> On 27.01.2015 11:02, Dave Chinner wrote:
>>>>> On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
>>>>>> On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <[email protected]> wrote:
>>>>>>> On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
>>>>>>
>>>>>> I think I must be missing something simple here. In a hypothetical
>>>>>> world where the code used nsown_capable, if an admin wants to stick a
>>>>>> container in /mnt/container1 with associated prid 1 and a userns,
>>>>>> shouldn't it just map only prid 1 into the user ns? Then a user in
>>>>>> that userns can't try to change the prid of a file to 2 because the
>>>>>> number "2" is unmapped for that user and translation will fail.
>>>>>
>>>>> You've effectively said "yes, project quotas are enabled, but you
>>>>> only have a single ID, it's always turned on and you can't change it
>>>>> to anything else.
>>>>>
>>>>> So, why do they need to be mapped via user namespaces to enable
>>>>> this? Think about it a little harder:
>>>>>
>>>>> - Project IDs are not user IDs.
>>>>> - Project IDs are not a security/permission mechanism.
>
> First, I'll just point this out again...

Ok, I get it.

>>>> This might be useful even without containers : normal user quota has
>>>> two levels and admins might classify users into groups and set group
>>>> quota for them. Project quota is flat and cannot provide any control
>>>> if we want classify projects.
>>>
>>> I don't follow. project ID is exactly what allows you to control
>>> project classification.
>>
>> I mean hierarchy allows to group several projects into one super-project
>> which sums all disk usage and could have its own limit too.
>
> Yes, I know, but you can also do this resource management from
> userspace with the existing project quota tools. It's just a matter
> of layering heirarchical limit management on top of the existing
> infrastructure.

Yes but not in all cases: it's impossible to overcommit disk limits on
project level without overcommiting on super-project level.
Hierarchical quotas can handle this [ hypothetically useful ] use case.

>>
>> For now I'm more interested in participation disk space among services
>> in one system. As I see security model of project quota in XFS almost
>> non-existent for this case: it forbids linking/renaming files between
>> different projects but any unprivileged user might change project id
>> for its own files. That's strange, this operation should be privileged.
>
> <sigh>
>
> It's clear you don't understand the design/architecture of project
> quotas. You've clearly read the code, but you haven't understood
> the design that lead to the specific implementation in XFS.
>
> Users have *always* been allowed to set the project ID of
> their own files. How else are they going to set the project ID on
> files they create in random directories so to account them to the
> correct project they are working on?

In this case project disk limits are almost useless and even dangerous
because any unprivileged user could add files into limited project
witch belongs to other user.

>
> However, you keep making the assumption that project quotas ==
> directory subtree quotas. Project quotas are *not limited* to
> directory subtrees - the subtree quota implementation is just an
> implementation that *sets the default project ID* on files as they
> are created.
>
> e.g. there are production systems out there where project quotas are
> used to track home directory space usage rather than user quotas.
> This means users can take actions like "this file actually belongs
> to project X and it shouldn't be accounted against my home
> directory". Users can create their own sub directories that account
> everything by default to project X rather than their own home
> directory.
>
> Again: project quotas are an *accounting* mechanism, not a security
> mechanism.
>
> Containers are *security mechanism* and hence we need a security
> model for container resource controller mechanisms. Project quotas
> do not provide a directory heirarchy access security model - that's
> what we use mount namespaces for. The resource controller security
> model only has to prevent users inside the container from subverting
> the resource controller mechanism, not anything else.
>
> Not surprisingly, we've implemented *exactly* the model you are
> suggesting: that modification of the resource accounting mechanism
> is a privileged operation that cannot be accessed from within the
> container. i.e. inside a userns container you can't change the
> project ID on a file, not even as root.
>
>> Also if user have permission for changing project id he could be
>> permitted to link and rename file into directory with any project
>> id, because he anyway could change project, move, and revert it
>> back.
>
> You don't appear to understand why XFS forbids linking/renaming
> across directories different project IDs. Hint: it's resource
> accounting simplification, *not a security mechanism*.
>
> Linking is obvious: you can't have the same inode accounted to
> multiple projects - it belongs to a single project and so can't be
> accounted to multiple projects. Hence if you want to link across
> different directory-based project quotas, you have to use symlinks.
>
> That's much simpler than having to decide what project the inode is
> accounted to, especially when removing links and link that owns the
> project ID is removed. How do you even know the link you are
> removing is the last link in the current project? IOWs, you have to
> search for the other owners of the inode to determine who the
> project quota is now accounted to...

But you have to search hardlinks everywhere (inode owner can hardlink it
into any directory where he has write access because project can be
changed temporary). And after that you have to search broken symlinks.
Also symlinks cannot share file between isolated containers which run in
chroot while creating hardlinks is still possible but requires some
extra steps like changing project id or creating temporary directories
even if you're root.

Not so useful too. Probably that's the reason why this feature seems
never been implemented anywhere except xfs.

Could we change that? For example by adding flag into quota-info block
which makes project id more restrictive and useful?

>
> Same for rename: there are a multitude of nasty corner cases when it
> comes to accounting the quotas correctly. So, either we try to do
> something complex and likely expensive and buggy, or we can return
> EXDEV. EXDEV was very carefully chosen here, and it's not for
> security reasons. It was chosen because applications know that if a
> rename returns EXDEV, they've got to *copy* the file instead. And,
> well, that create/write/unlink process results in correct project
> quota accounting at both the source and destination.
>
> IOWs: EXDEV not a security mechanism, it's an accounting mechanism.
>
> If you can implement project quota rename accounting and handle the
> multiple handlinks problem efficiently, then you can allow those
> things to be done directly in the filesystem rather than returning
> EXDEV.
>
>> For me perfect interface looks like couple fcntls for
>> getting/changing project id:
>>
>> int fcntl(fd, F_GET_PROJECT, projid_t *);
>> int fcntl(fd, F_SET_PROJECT, projid_t);
>>
>> F_GET_PROJECT is allowed for everybody
>> F_SET_PROJECT requires CAP_SYS_ADMIN (or maybe CAP_FOWNER?)
>
> Sure, it's nice, but you're ignoring the entire the point of making
> FS_IOC_SETXATTR generic: so that the *existing tools* that manage
> project quotas work on all project quota enabled filesystems.
> i.e. so that all filesystems *behave the same* and can *run
> identical regression tests*.

As i see quota tools in xfsprogs checks file-system name and doesn't
work for anything except "xfs", so we have to patch it anywas.
xfstests are cool but I think fixing one ioctl isn't a problem.
Something else?

>
> We do not want different project quota implementations on different
> filesystems. Like user and group quotas, they need to be
> consistently implemented across all filesystems. If you want
> something new, different and incompatible with existing
> infrastructure, then that's a separate line of development and
> discussion....
>
> Cheers,
>
> Dave.
>


--
Konstantin

2015-02-05 16:38:15

by Jan Kara

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

Hello,

> Users have *always* been allowed to set the project ID of
> their own files. How else are they going to set the project ID on
> files they create in random directories so to account them to the
> correct project they are working on?
>
> However, you keep making the assumption that project quotas ==
> directory subtree quotas. Project quotas are *not limited* to
> directory subtrees - the subtree quota implementation is just an
> implementation that *sets the default project ID* on files as they
> are created.
>
> e.g. there are production systems out there where project quotas are
> used to track home directory space usage rather than user quotas.
> This means users can take actions like "this file actually belongs
> to project X and it shouldn't be accounted against my home
> directory". Users can create their own sub directories that account
> everything by default to project X rather than their own home
> directory.
>
> Again: project quotas are an *accounting* mechanism, not a security
> mechanism.
OK, but now I got confused ;) So if users can change project ID of files
they own, what's the point of project quotas? If I need to create a file
and project quota doesn't allow me, I just set its project ID to some
random number and I'm done with that... So are really project quotas just
"advisory" - i.e., all users of a system cooperate so that project X
doesn't use more space than it should (and project quotas make this
cooperation somewhat simpler) or is there something which limits which
project IDs user can set? I didn't find anything...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-02-05 21:05:18

by Dave Chinner

[permalink] [raw]
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

On Thu, Feb 05, 2015 at 05:38:15PM +0100, Jan Kara wrote:
> Hello,
>
> > Users have *always* been allowed to set the project ID of
> > their own files. How else are they going to set the project ID on
> > files they create in random directories so to account them to the
> > correct project they are working on?
> >
> > However, you keep making the assumption that project quotas ==
> > directory subtree quotas. Project quotas are *not limited* to
> > directory subtrees - the subtree quota implementation is just an
> > implementation that *sets the default project ID* on files as they
> > are created.
> >
> > e.g. there are production systems out there where project quotas are
> > used to track home directory space usage rather than user quotas.
> > This means users can take actions like "this file actually belongs
> > to project X and it shouldn't be accounted against my home
> > directory". Users can create their own sub directories that account
> > everything by default to project X rather than their own home
> > directory.
> >
> > Again: project quotas are an *accounting* mechanism, not a security
> > mechanism.
> OK, but now I got confused ;) So if users can change project ID of files
> they own, what's the point of project quotas? If I need to create a file
> and project quota doesn't allow me, I just set its project ID to some
> random number and I'm done with that...

Sure, but the admin is going to notice those random numbers in the
next quota report they run and then a user is going to get
re-educated with a clue bat.

> So are really project quotas just
> "advisory" - i.e., all users of a system cooperate so that project X
> doesn't use more space than it should (and project quotas make this
> cooperation somewhat simpler) or is there something which limits which
> project IDs user can set? I didn't find anything...

Not directly. Project quotas have historically been used in tandem
with user quotas - user quotas cannot be escaped and that puts a
limit on the shenanigans that users can play with project quota.

[ Indeed, Irix only had user and project quotas - it *never* had group
quotas. e.g. when XFS was ported to Linux the project quota inode
was re-purposed for group quotas in 2001:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=749b2bf3ed5ff064efd69370e6b31ea44c4a78a6
]

However, if you have a fileystem system that users can't directly
access (e.g. an NFS server) then you can use project quotas as a
space enforcement mechanism because users can't change the project
ID on the files on the server. We've used the same model with
containers - for the host to be able to use project quotas as space
resource controller, users inside a container must not be able to
change the project ID of a file....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-02-13 02:00:28

by Li Xi

[permalink] [raw]
Subject: Re: [v8 0/5] ext4: add project quota support

Hi Jon,

Yeah, I've written a script to test the project quota which has about
a dozen of test suites. However, sorry, I didn't add xfstest cases for
it. And we need to apply a series of e2fsprogs patches to enable
project quota feature though.

Thanks,
Li Xi

On Tue, Jan 27, 2015 at 2:46 AM, jon ernst <[email protected]> wrote:
> Xi,
> Just a question, do you have any test scripts or xfstest cases for
> this patchset?
>
> Thanks,
> Jon
>
> On Tue, Dec 9, 2014 at 12:22 AM, Li Xi <[email protected]> wrote:
>> The following patches propose an implementation of project quota
>> support for ext4. A project is an aggregate of unrelated inodes
>> which might scatter in different directories. Inodes that belong
>> to the same project possess an identical identification i.e.
>> 'project ID', just like every inode has its user/group
>> identification. The following patches add project quota as
>> supplement to the former uer/group quota types.
>>
>> The semantics of ext4 project quota is consistent with XFS. Each
>> directory can have EXT4_INODE_PROJINHERIT flag set. When the
>> EXT4_INODE_PROJINHERIT flag of a parent directory is not set, a
>> newly created inode under that directory will have a default project
>> ID (i.e. 0). And its EXT4_INODE_PROJINHERIT flag is not set either.
>> When this flag is set on a directory, following rules will be kept:
>>
>> 1) The newly created inode under that directory will inherit both
>> the EXT4_INODE_PROJINHERIT flag and the project ID from its parent
>> directory.
>>
>> 2) Hard-linking a inode with different project ID into that directory
>> will fail with errno EXDEV.
>>
>> 3) Renaming a inode with different project ID into that directory
>> will fail with errno EXDEV. However, 'mv' command will detect this
>> failure and copy the renamed inode to a new inode in the directory.
>> Thus, this new inode will inherit both the project ID and
>> EXT4_INODE_PROJINHERIT flag.
>>
>> 4) If the project quota of that ID is being enforced, statfs() on
>> that directory will take the quotas as another upper limits along
>> with the capacity of the file system, i.e. the total block/inode
>> number will be the minimum of the quota limits and file system
>> capacity.
>>
>> Changelog:
>> * v8 <- v7:
>> - Rebase to newest dev branch of ext4 repository (3.18.0_rc3).
>> * v7 <- v6:
>> - Map ext4 inode flags to xflags of struct fsxattr;
>> - Add patch to cleanup ext4 inode flag definitions.
>> * v6 <- v5:
>> - Add project ID check for cross rename;
>> - Remove patch of EXT4_IOC_GETPROJECT/EXT4_IOC_SETPROJECT ioctl
>> * v5 <- v4:
>> - Check project feature when set/get project ID;
>> - Do not check project feature for project quota;
>> - Add support of FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR.
>> * v4 <- v3:
>> - Do not check project feature when set/get project ID;
>> - Use EXT4_MAXQUOTAS instead of MAXQUOTAS in ext4 patches;
>> - Remove unnecessary change of fs/quota/dquot.c;
>> - Remove CONFIG_QUOTA_PROJECT.
>> * v3 <- v2:
>> - Add EXT4_INODE_PROJINHERIT semantics.
>> * v2 <- v1:
>> - Add ioctl interface for setting/getting project;
>> - Add EXT4_FEATURE_RO_COMPAT_PROJECT;
>> - Add get_projid() method in struct dquot_operations;
>> - Add error check of ext4_inode_projid_set/get().
>>
>> v7: http://www.spinics.net/lists/linux-fsdevel/msg80404.html
>> v6: http://www.spinics.net/lists/linux-fsdevel/msg80022.html
>> v5: http://www.spinics.net/lists/linux-api/msg04840.html
>> v4: http://lwn.net/Articles/612972/
>> v3: http://www.spinics.net/lists/linux-ext4/msg45184.html
>> v2: http://www.spinics.net/lists/linux-ext4/msg44695.html
>> v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153
>>
>> Any comments or feedbacks are appreciated.
>>
>> Regards,
>> - Li Xi
>>
>> Li Xi (5):
>> vfs: adds general codes to enforces project quota limits
>> ext4: adds project ID support
>> ext4: adds project quota support
>> ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
>> ext4: cleanup inode flag definitions
>>
>> fs/ext4/ext4.h | 190 +++++++++++++++++++++----
>> fs/ext4/ialloc.c | 6 +
>> fs/ext4/inode.c | 29 ++++
>> fs/ext4/ioctl.c | 330 +++++++++++++++++++++++++++++++-------------
>> fs/ext4/namei.c | 17 +++
>> fs/ext4/super.c | 96 +++++++++++--
>> fs/quota/dquot.c | 35 ++++-
>> fs/quota/quota.c | 8 +-
>> fs/quota/quotaio_v2.h | 6 +-
>> fs/xfs/xfs_fs.h | 47 +++----
>> include/linux/quota.h | 2 +
>> include/uapi/linux/fs.h | 59 ++++++++
>> include/uapi/linux/quota.h | 6 +-
>> 13 files changed, 650 insertions(+), 181 deletions(-)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html