2015-04-09 15:14:32

by Li Xi

Subject: [v12 0/5] ext4: add project quota support

The following patches propose an implementation of project quota
support for ext4. A project is an aggregate of unrelated inodes
which might be scattered across different directories. Inodes that
belong to the same project share an identical identifier, i.e. the
'project ID', just as every inode has its user/group
identification. The following patches add project quota as a
supplement to the existing user/group quota types.

The semantics of ext4 project quota are consistent with XFS. Each
directory can have the EXT4_INODE_PROJINHERIT flag set. When the
EXT4_INODE_PROJINHERIT flag of a parent directory is not set, a
newly created inode under that directory will have the default
project ID (i.e. 0), and its EXT4_INODE_PROJINHERIT flag is not set
either. When this flag is set on a directory, the following rules
apply:

1) A newly created inode under that directory will inherit both
the EXT4_INODE_PROJINHERIT flag and the project ID from its parent
directory.

2) Hard-linking an inode with a different project ID into that
directory will fail with errno EXDEV.

3) Renaming an inode with a different project ID into that directory
will fail with errno EXDEV. However, the 'mv' command will detect
this failure and copy the renamed inode to a new inode in the
directory. Thus, the new inode will inherit both the project ID and
the EXT4_INODE_PROJINHERIT flag.

4) If the project quota of that ID is being enforced, statfs() on
that directory will treat the quota limits as additional upper
bounds alongside the capacity of the file system, i.e. the total
block/inode count will be the minimum of the quota limit and the
file system capacity.
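
The four rules above can be sketched as a small userspace model. This is
only an illustration, not kernel code: the toy_* names and struct
toy_inode are hypothetical, and treating a quota limit of 0 as "no
limit" is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of an inode; not the kernel's struct. */
struct toy_inode {
	uint32_t projid;      /* project ID, 0 is the default */
	bool     projinherit; /* models EXT4_INODE_PROJINHERIT */
};

/* Rule 1: a new inode takes the flag and the ID only from a flagged parent. */
struct toy_inode toy_create(const struct toy_inode *parent)
{
	struct toy_inode child = { .projid = 0, .projinherit = false };

	assert(parent != NULL);
	if (parent->projinherit) {
		child.projid = parent->projid;
		child.projinherit = true;
	}
	return child;
}

/* Rules 2 and 3: linking or renaming into a flagged directory needs a
 * matching project ID, otherwise the operation fails with EXDEV. */
int toy_link_or_rename(const struct toy_inode *dir,
		       const struct toy_inode *inode)
{
	if (dir->projinherit && dir->projid != inode->projid)
		return -EXDEV;
	return 0;
}

/* Rule 4: statfs() reports the smaller of the quota limit and the file
 * system capacity. A limit of 0 means "no limit" in this model only. */
uint64_t toy_statfs_total(uint64_t fs_total, uint64_t quota_limit)
{
	if (quota_limit == 0 || quota_limit > fs_total)
		return fs_total;
	return quota_limit;
}
```

In this model, a file created in a flagged directory with project ID 42
gets ID 42 and the flag, while a file with ID 0 cannot be linked or
renamed into that directory.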

Changelog:
* v12 <- v11:
- Relax the permission check when setting project ID.
* v11 <- v10:
- Remove project quota mount option;
- Fix permission check when setting project ID
* v10 <- v9:
- Remove non-journaled project quota interface;
- Only allow admin to read project quota info;
- Cleanup FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface.
* v9 <- v8:
- Remove non-journaled project quota;
- Rebase to newest dev branch of ext4 repository (3.19.0-rc3).
* v8 <- v7:
- Rebase to newest dev branch of ext4 repository (3.18.0_rc3).
* v7 <- v6:
- Map ext4 inode flags to xflags of struct fsxattr;
- Add patch to cleanup ext4 inode flag definitions.
* v6 <- v5:
- Add project ID check for cross rename;
- Remove patch of EXT4_IOC_GETPROJECT/EXT4_IOC_SETPROJECT ioctl
* v5 <- v4:
- Check project feature when set/get project ID;
- Do not check project feature for project quota;
- Add support of FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR.
* v4 <- v3:
- Do not check project feature when set/get project ID;
- Use EXT4_MAXQUOTAS instead of MAXQUOTAS in ext4 patches;
- Remove unnecessary change of fs/quota/dquot.c;
- Remove CONFIG_QUOTA_PROJECT.
* v3 <- v2:
- Add EXT4_INODE_PROJINHERIT semantics.
* v2 <- v1:
- Add ioctl interface for setting/getting project;
- Add EXT4_FEATURE_RO_COMPAT_PROJECT;
- Add get_projid() method in struct dquot_operations;
- Add error check of ext4_inode_projid_set/get().

v11: http://www.spinics.net/lists/linux-ext4/msg47450.html
v10: http://www.spinics.net/lists/linux-ext4/msg47413.html
v9: http://www.spinics.net/lists/linux-ext4/msg47326.html
v8: http://www.spinics.net/lists/linux-ext4/msg46545.html
v7: http://www.spinics.net/lists/linux-fsdevel/msg80404.html
v6: http://www.spinics.net/lists/linux-fsdevel/msg80022.html
v5: http://www.spinics.net/lists/linux-api/msg04840.html
v4: http://lwn.net/Articles/612972/
v3: http://www.spinics.net/lists/linux-ext4/msg45184.html
v2: http://www.spinics.net/lists/linux-ext4/msg44695.html
v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153

Any comments or feedback would be appreciated.

Regards,
- Li Xi

Li Xi (5):
vfs: adds general codes to enforces project quota limits
ext4: adds project ID support
ext4: adds project quota support
ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
ext4: cleanup inode flag definitions

fs/ext4/ext4.h | 86 +++++++----
fs/ext4/ialloc.c | 5 +
fs/ext4/inode.c | 29 ++++
fs/ext4/ioctl.c | 361 +++++++++++++++++++++++++++++++++-----------
fs/ext4/namei.c | 18 +++
fs/ext4/super.c | 57 +++++++-
fs/quota/dquot.c | 35 ++++-
fs/quota/quota.c | 5 +-
fs/quota/quotaio_v2.h | 6 +-
fs/xfs/xfs_fs.h | 47 ++----
include/linux/quota.h | 2 +
include/uapi/linux/fs.h | 33 ++++
include/uapi/linux/quota.h | 6 +-
13 files changed, 527 insertions(+), 163 deletions(-)


2015-04-09 15:14:33

by Li Xi

Subject: [v12 1/5] vfs: adds general codes to enforces project quota limits

This patch adds support for a new quota type, PRJQUOTA, for project
quota enforcement. It also adds a new method, get_projid(), to the
dquot_operations structure.
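
As a rough userspace sketch of the idea (not the kernel code itself), the
quota core can pick the ID to charge per quota type, and refuse project
quota when the filesystem does not supply a get_projid() callback. All
toy_* names below are hypothetical. The NULL check in toy_quota_id() is
an extra safety net added for this model; the patch relies on the check
done when the quota file is loaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum toy_quota_type { TOY_USRQUOTA = 0, TOY_GRPQUOTA = 1, TOY_PRJQUOTA = 2 };

/* Hypothetical inode model carrying the three IDs a quota can key on. */
struct toy_qinode {
	uint32_t uid, gid, projid;
};

/* Hypothetical stand-in for dquot_operations: only the optional callback. */
struct toy_dquot_ops {
	int (*get_projid)(const struct toy_qinode *inode, uint32_t *projid);
};

/* Enabling project quota is refused when there is no get_projid(). */
int toy_quota_enable(const struct toy_dquot_ops *ops, enum toy_quota_type type)
{
	if (type == TOY_PRJQUOTA && ops->get_projid == NULL)
		return -EINVAL;
	return 0;
}

/* Pick the ID that a quota of the given type is charged to. */
int toy_quota_id(const struct toy_dquot_ops *ops,
		 const struct toy_qinode *inode,
		 enum toy_quota_type type, uint32_t *id)
{
	assert(ops != NULL && inode != NULL && id != NULL);
	switch (type) {
	case TOY_USRQUOTA:
		*id = inode->uid;
		return 0;
	case TOY_GRPQUOTA:
		*id = inode->gid;
		return 0;
	case TOY_PRJQUOTA:
		if (ops->get_projid == NULL)
			return -EINVAL;
		return ops->get_projid(inode, id);
	}
	return -EINVAL;
}

/* Example callback, playing the role a filesystem's get_projid() would. */
int toy_get_projid(const struct toy_qinode *inode, uint32_t *projid)
{
	*projid = inode->projid;
	return 0;
}
```

Keeping the callback optional means filesystems without project IDs are
unaffected; they simply cannot turn on PRJQUOTA.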

Signed-off-by: Li Xi <[email protected]>
Signed-off-by: Dmitry Monakhov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/quota/dquot.c | 35 ++++++++++++++++++++++++++++++-----
fs/quota/quota.c | 5 ++++-
fs/quota/quotaio_v2.h | 6 ++++--
include/linux/quota.h | 2 ++
include/uapi/linux/quota.h | 6 ++++--
5 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 8f0acef..a02bb68 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1159,8 +1159,8 @@ static int need_print_warning(struct dquot_warn *warn)
return uid_eq(current_fsuid(), warn->w_dq_id.uid);
case GRPQUOTA:
return in_group_p(warn->w_dq_id.gid);
- case PRJQUOTA: /* Never taken... Just make gcc happy */
- return 0;
+ case PRJQUOTA:
+ return 1;
}
return 0;
}
@@ -1399,6 +1399,9 @@ static void __dquot_initialize(struct inode *inode, int type)
/* First get references to structures we might need. */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
struct kqid qid;
+ kprojid_t projid;
+ int rc;
+
got[cnt] = NULL;
if (type != -1 && cnt != type)
continue;
@@ -1409,6 +1412,10 @@ static void __dquot_initialize(struct inode *inode, int type)
*/
if (i_dquot(inode)[cnt])
continue;
+
+ if (!sb_has_quota_active(sb, cnt))
+ continue;
+
init_needed = 1;

switch (cnt) {
@@ -1418,6 +1425,12 @@ static void __dquot_initialize(struct inode *inode, int type)
case GRPQUOTA:
qid = make_kqid_gid(inode->i_gid);
break;
+ case PRJQUOTA:
+ rc = inode->i_sb->dq_op->get_projid(inode, &projid);
+ if (rc)
+ continue;
+ qid = make_kqid_projid(projid);
+ break;
}
got[cnt] = dqget(sb, qid);
}
@@ -2161,7 +2174,8 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id,
error = -EROFS;
goto out_fmt;
}
- if (!sb->s_op->quota_write || !sb->s_op->quota_read) {
+ if (!sb->s_op->quota_write || !sb->s_op->quota_read ||
+ (type == PRJQUOTA && sb->dq_op->get_projid == NULL)) {
error = -EINVAL;
goto out_fmt;
}
@@ -2402,8 +2416,19 @@ static void do_get_dqblk(struct dquot *dquot, struct fs_disk_quota *di)

memset(di, 0, sizeof(*di));
di->d_version = FS_DQUOT_VERSION;
- di->d_flags = dquot->dq_id.type == USRQUOTA ?
- FS_USER_QUOTA : FS_GROUP_QUOTA;
+ switch (dquot->dq_id.type) {
+ case USRQUOTA:
+ di->d_flags = FS_USER_QUOTA;
+ break;
+ case GRPQUOTA:
+ di->d_flags = FS_GROUP_QUOTA;
+ break;
+ case PRJQUOTA:
+ di->d_flags = FS_PROJ_QUOTA;
+ break;
+ default:
+ BUG();
+ }
di->d_id = from_kqid_munged(current_user_ns(), dquot->dq_id);

spin_lock(&dq_data_lock);
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 2aa4151..33b30b1 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -30,7 +30,10 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
case Q_XGETQSTATV:
case Q_XQUOTASYNC:
break;
- /* allow to query information for dquots we "own" */
+ /*
+ * allow to query information for dquots we "own"
+ * always allow querying project quota
+ */
case Q_GETQUOTA:
case Q_XGETQUOTA:
if ((type == USRQUOTA && uid_eq(current_euid(), make_kuid(current_user_ns(), id))) ||
diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
index f1966b4..4e95430 100644
--- a/fs/quota/quotaio_v2.h
+++ b/fs/quota/quotaio_v2.h
@@ -13,12 +13,14 @@
*/
#define V2_INITQMAGICS {\
0xd9c01f11, /* USRQUOTA */\
- 0xd9c01927 /* GRPQUOTA */\
+ 0xd9c01927, /* GRPQUOTA */\
+ 0xd9c03f14, /* PRJQUOTA */\
}

#define V2_INITQVERSIONS {\
1, /* USRQUOTA */\
- 1 /* GRPQUOTA */\
+ 1, /* GRPQUOTA */\
+ 1, /* PRJQUOTA */\
}

/* First generic header */
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 50978b7..ba51f7e 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -50,6 +50,7 @@

#undef USRQUOTA
#undef GRPQUOTA
+#undef PRJQUOTA
enum quota_type {
USRQUOTA = 0, /* element used for user quotas */
GRPQUOTA = 1, /* element used for group quotas */
@@ -317,6 +318,7 @@ struct dquot_operations {
/* get reserved quota for delayed alloc, value returned is managed by
* quota code only */
qsize_t *(*get_reserved_space) (struct inode *);
+ int (*get_projid) (struct inode *, kprojid_t *);/* Get project ID */
};

struct path;
diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h
index 3b6cfbe..b2d9486 100644
--- a/include/uapi/linux/quota.h
+++ b/include/uapi/linux/quota.h
@@ -36,11 +36,12 @@
#include <linux/errno.h>
#include <linux/types.h>

-#define __DQUOT_VERSION__ "dquot_6.5.2"
+#define __DQUOT_VERSION__ "dquot_6.6.0"

-#define MAXQUOTAS 2
+#define MAXQUOTAS 3
#define USRQUOTA 0 /* element used for user quotas */
#define GRPQUOTA 1 /* element used for group quotas */
+#define PRJQUOTA 2 /* element used for project quotas */

/*
* Definitions for the default names of the quotas files.
@@ -48,6 +49,7 @@
#define INITQFNAMES { \
"user", /* USRQUOTA */ \
"group", /* GRPQUOTA */ \
+ "project", /* PRJQUOTA */ \
"undefined", \
};

--
1.7.1


2015-04-09 15:14:34

by Li Xi

Subject: [v12 2/5] ext4: adds project ID support

This patch adds a new field to the ext4 inode to store the project
identifier. It also adds a new flag, EXT4_INODE_PROJINHERIT, for
inheriting the project ID from the parent directory.
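
The constants this patch touches can be cross-checked with a short
userspace sketch. The numeric values are copied from the ext4.h hunk of
this patch; the TOY_* and toy_* names are hypothetical, and the decoding
helper only mimics what le32_to_cpu() does for the new on-disk field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the ext4.h hunk in this patch. */
#define TOY_INODE_PROJINHERIT_BIT 29
#define TOY_PROJINHERIT_FL        0x20000000u
#define TOY_OLD_USER_VISIBLE      0x004BDFFFu
#define TOY_OLD_USER_MODIFIABLE   0x004380FFu
#define TOY_DEF_PROJID            0u

/* Decode a little-endian 32-bit on-disk field on any host endianness. */
uint32_t toy_le32_to_cpu(const unsigned char raw[4])
{
	return (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
	       ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
}

/* Without the project feature the on-disk field is ignored and the
 * default project ID is used instead. */
uint32_t toy_read_projid(int has_project_feature, const unsigned char raw[4])
{
	assert(raw != NULL);
	return has_project_feature ? toy_le32_to_cpu(raw) : TOY_DEF_PROJID;
}
```

Bit 29 corresponds to the mask 0x20000000, and OR-ing that mask into the
old user-visible and user-modifiable masks yields the new values
0x204BDFFF and 0x204380FF used in the hunk.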

Signed-off-by: Li Xi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 21 +++++++++++++++++----
fs/ext4/ialloc.c | 5 +++++
fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
fs/ext4/namei.c | 18 ++++++++++++++++++
fs/ext4/super.c | 1 +
include/uapi/linux/fs.h | 1 +
6 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7fec2ef..7acb2da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -378,16 +378,18 @@ struct flex_groups {
#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
+#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
-#define EXT4_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */
+#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
+#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */

/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
- EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
+ EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
+ EXT4_PROJINHERIT_FL)

/* Flags that are appropriate for regular files (all but dir-specific ones). */
#define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
@@ -435,6 +437,7 @@ enum {
EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
+ EXT4_INODE_PROJINHERIT = 29, /* Create with parents projid */
EXT4_INODE_RESERVED = 31, /* reserved for ext4 lib */
};

@@ -684,6 +687,7 @@ struct ext4_inode {
__le32 i_crtime; /* File Creation time */
__le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
__le32 i_version_hi; /* high 32 bits for 64-bit version */
+ __le32 i_projid; /* Project ID */
};

struct move_extent {
@@ -939,6 +943,7 @@ struct ext4_inode_info {

/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
__u32 i_csum_seed;
+ kprojid_t i_projid;
};

/*
@@ -1531,6 +1536,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
*/
#define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
#define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
+#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000

#define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
#define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
@@ -1581,7 +1587,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
- EXT4_FEATURE_RO_COMPAT_QUOTA)
+ EXT4_FEATURE_RO_COMPAT_QUOTA |\
+ EXT4_FEATURE_RO_COMPAT_PROJECT)

/*
* Default values for user and/or group using reserved blocks
@@ -1589,6 +1596,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
#define EXT4_DEF_RESUID 0
#define EXT4_DEF_RESGID 0

+/*
+ * Default project ID
+ */
+#define EXT4_DEF_PROJID 0
+
#define EXT4_DEF_INODE_READAHEAD_BLKS 32

/*
@@ -2141,6 +2153,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
+extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
extern void ext4_da_update_reserve_space(struct inode *inode,
int used, int quota_claim);

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index ac644c3..10ca9dd 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -756,6 +756,11 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
inode->i_gid = dir->i_gid;
} else
inode_init_owner(inode, dir, mode);
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
+ ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
+ ei->i_projid = EXT4_I(dir)->i_projid;
+ else
+ ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
dquot_initialize(inode);

if (!goal)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4df6d01..6e4833f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3870,6 +3870,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
EXT4_I(inode)->i_inline_off = 0;
}

+int ext4_get_projid(struct inode *inode, kprojid_t *projid)
+{
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+ return -EOPNOTSUPP;
+ *projid = EXT4_I(inode)->i_projid;
+ return 0;
+}
+
struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
{
struct ext4_iloc iloc;
@@ -3881,6 +3889,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
int block;
uid_t i_uid;
gid_t i_gid;
+ projid_t i_projid;

inode = iget_locked(sb, ino);
if (!inode)
@@ -3930,12 +3939,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
inode->i_mode = le16_to_cpu(raw_inode->i_mode);
i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
+ i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
+ else
+ i_projid = EXT4_DEF_PROJID;
+
if (!(test_opt(inode->i_sb, NO_UID32))) {
i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
}
i_uid_write(inode, i_uid);
i_gid_write(inode, i_gid);
+ ei->i_projid = make_kprojid(&init_user_ns, i_projid);
set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));

ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
@@ -4165,6 +4180,7 @@ static int ext4_do_update_inode(handle_t *handle,
int need_datasync = 0, set_large_file = 0;
uid_t i_uid;
gid_t i_gid;
+ projid_t i_projid;

spin_lock(&ei->i_raw_lock);

@@ -4177,6 +4193,7 @@ static int ext4_do_update_inode(handle_t *handle,
raw_inode->i_mode = cpu_to_le16(inode->i_mode);
i_uid = i_uid_read(inode);
i_gid = i_gid_read(inode);
+ i_projid = from_kprojid(&init_user_ns, ei->i_projid);
if (!(test_opt(inode->i_sb, NO_UID32))) {
raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
@@ -4256,6 +4273,18 @@ static int ext4_do_update_inode(handle_t *handle,
}
}

+ BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT) &&
+ i_projid != EXT4_DEF_PROJID);
+ if (i_projid != EXT4_DEF_PROJID &&
+ (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
+ (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
+ spin_unlock(&ei->i_raw_lock);
+ err = -EFBIG;
+ goto out_brelse;
+ }
+ raw_inode->i_projid = cpu_to_le32(i_projid);
+
ext4_inode_csum_set(inode, raw_inode, ei);

spin_unlock(&ei->i_raw_lock);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 2291923..63a9623 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2938,6 +2938,11 @@ static int ext4_link(struct dentry *old_dentry,
if (inode->i_nlink >= EXT4_LINK_MAX)
return -EMLINK;

+ if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
+ (!projid_eq(EXT4_I(dir)->i_projid,
+ EXT4_I(old_dentry->d_inode)->i_projid)))
+ return -EXDEV;
+
dquot_initialize(dir);

retry:
@@ -3217,6 +3222,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
int credits;
u8 old_file_type;

+ if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
+ (!projid_eq(EXT4_I(new_dir)->i_projid,
+ EXT4_I(old_dentry->d_inode)->i_projid)))
+ return -EXDEV;
+
dquot_initialize(old.dir);
dquot_initialize(new.dir);

@@ -3395,6 +3405,14 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
u8 new_file_type;
int retval;

+ if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT) &&
+ !projid_eq(EXT4_I(new_dir)->i_projid,
+ EXT4_I(old_dentry->d_inode)->i_projid)) ||
+ (ext4_test_inode_flag(old_dir, EXT4_INODE_PROJINHERIT) &&
+ !projid_eq(EXT4_I(old_dir)->i_projid,
+ EXT4_I(new_dentry->d_inode)->i_projid)))
+ return -EXDEV;
+
dquot_initialize(old.dir);
dquot_initialize(new.dir);

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index bff3427..04c6cc3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1073,6 +1073,7 @@ static const struct dquot_operations ext4_quota_operations = {
.write_info = ext4_write_info,
.alloc_dquot = dquot_alloc,
.destroy_dquot = dquot_destroy,
+ .get_projid = ext4_get_projid,
};

static const struct quotactl_ops ext4_qctl_operations = {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3735fa0..fcbf647 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -195,6 +195,7 @@ struct inodes_stat_t {
#define FS_EXTENT_FL 0x00080000 /* Extents */
#define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
+#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */

#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
--
1.7.1


2015-04-09 15:14:37

by Li Xi

Subject: [v12 5/5] ext4: cleanup inode flag definitions

The inode flags defined in uapi/linux/fs.h were migrated from
ext4.h. This patch redefines the inode flags in ext4.h in terms of
the VFS definitions, which makes the gaps between the two sets clearer.

Signed-off-by: Li Xi <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
---
fs/ext4/ext4.h | 50 +++++++++++++++++++++++++-------------------------
1 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 377fec0..05d0e8d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -352,34 +352,34 @@ struct flex_groups {
/*
* Inode flags
*/
-#define EXT4_SECRM_FL 0x00000001 /* Secure deletion */
-#define EXT4_UNRM_FL 0x00000002 /* Undelete */
-#define EXT4_COMPR_FL 0x00000004 /* Compress file */
-#define EXT4_SYNC_FL 0x00000008 /* Synchronous updates */
-#define EXT4_IMMUTABLE_FL 0x00000010 /* Immutable file */
-#define EXT4_APPEND_FL 0x00000020 /* writes to file may only append */
-#define EXT4_NODUMP_FL 0x00000040 /* do not dump file */
-#define EXT4_NOATIME_FL 0x00000080 /* do not update atime */
+#define EXT4_SECRM_FL FS_SECRM_FL /* Secure deletion */
+#define EXT4_UNRM_FL FS_UNRM_FL /* Undelete */
+#define EXT4_COMPR_FL FS_COMPR_FL /* Compress file */
+#define EXT4_SYNC_FL FS_SYNC_FL /* Synchronous updates */
+#define EXT4_IMMUTABLE_FL FS_IMMUTABLE_FL /* Immutable file */
+#define EXT4_APPEND_FL FS_APPEND_FL /* writes to file may only append */
+#define EXT4_NODUMP_FL FS_NODUMP_FL /* do not dump file */
+#define EXT4_NOATIME_FL FS_NOATIME_FL /* do not update atime */
/* Reserved for compression usage... */
-#define EXT4_DIRTY_FL 0x00000100
-#define EXT4_COMPRBLK_FL 0x00000200 /* One or more compressed clusters */
-#define EXT4_NOCOMPR_FL 0x00000400 /* Don't compress */
+#define EXT4_DIRTY_FL FS_DIRTY_FL
+#define EXT4_COMPRBLK_FL FS_COMPRBLK_FL /* One or more compressed clusters */
+#define EXT4_NOCOMPR_FL FS_NOCOMP_FL /* Don't compress */
/* nb: was previously EXT2_ECOMPR_FL */
-#define EXT4_ENCRYPT_FL 0x00000800 /* encrypted file */
+#define EXT4_ENCRYPT_FL 0x00000800 /* encrypted file */
/* End compression flags --- maybe not all used */
-#define EXT4_INDEX_FL 0x00001000 /* hash-indexed directory */
-#define EXT4_IMAGIC_FL 0x00002000 /* AFS directory */
-#define EXT4_JOURNAL_DATA_FL 0x00004000 /* file data should be journaled */
-#define EXT4_NOTAIL_FL 0x00008000 /* file tail should not be merged */
-#define EXT4_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */
-#define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
-#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
-#define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
-#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
-#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
-#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
-#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
-#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
+#define EXT4_INDEX_FL FS_INDEX_FL /* hash-indexed directory */
+#define EXT4_IMAGIC_FL FS_IMAGIC_FL /* AFS directory */
+#define EXT4_JOURNAL_DATA_FL FS_JOURNAL_DATA_FL /* file data should be journaled */
+#define EXT4_NOTAIL_FL FS_NOTAIL_FL /* file tail should not be merged */
+#define EXT4_DIRSYNC_FL FS_DIRSYNC_FL /* dirsync behaviour (directories only) */
+#define EXT4_TOPDIR_FL FS_TOPDIR_FL /* Top of directory hierarchies*/
+#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
+#define EXT4_EXTENTS_FL FS_EXTENT_FL /* Inode uses extents */
+#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
+#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
+#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
+#define EXT4_PROJINHERIT_FL FS_PROJINHERIT_FL /* Create with parents projid */
+#define EXT4_RESERVED_FL FS_RESERVED_FL /* reserved for ext4 lib */

#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
--
1.7.1


2015-04-09 15:14:36

by Li Xi

Subject: [v12 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
support for ext4. The interface is kept consistent with
XFS_IOC_FSSETXATTR/XFS_IOC_FSGETXATTR.
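
From userspace, the get side of this interface could be used roughly as
follows. This is a sketch under assumptions: it presumes a kernel whose
<linux/fs.h> exports struct fsxattr, FS_IOC_FSGETXATTR and
FS_XFLAG_PROJINHERIT (as the diffstat of this patch proposes), and
toy_get_project() is a hypothetical helper name. On a filesystem
without support, the ioctl is expected to fail and the helper returns
the negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h> /* struct fsxattr, FS_IOC_FSGETXATTR, FS_XFLAG_* */

/* Hypothetical helper: read the project ID and the inherit flag of a
 * path. Returns 0 on success or a negative errno value on failure. */
int toy_get_project(const char *path, unsigned int *projid, int *inherit)
{
	struct fsxattr fa;
	int fd, saved;

	assert(path != NULL && projid != NULL && inherit != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0) {
		saved = errno;   /* close() may overwrite errno */
		close(fd);
		return -saved;
	}
	close(fd);

	*projid = fa.fsx_projid;
	*inherit = (fa.fsx_xflags & FS_XFLAG_PROJINHERIT) != 0;
	return 0;
}
```

Setting a project ID would follow the same pattern: read the structure
with FS_IOC_FSGETXATTR, change fsx_projid or fsx_xflags, and write it
back with FS_IOC_FSSETXATTR.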

Signed-off-by: Li Xi <[email protected]>
---
fs/ext4/ext4.h | 9 ++
fs/ext4/ioctl.c | 361 +++++++++++++++++++++++++++++++++++-----------
fs/xfs/xfs_fs.h | 47 +++----
include/uapi/linux/fs.h | 32 ++++
4 files changed, 332 insertions(+), 117 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8ddc723..377fec0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -384,6 +384,13 @@ struct flex_groups {
#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */

+#define EXT4_FL_XFLAG_VISIBLE (EXT4_SYNC_FL | \
+ EXT4_IMMUTABLE_FL | \
+ EXT4_APPEND_FL | \
+ EXT4_NODUMP_FL | \
+ EXT4_NOATIME_FL | \
+ EXT4_PROJINHERIT_FL)
+
/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
@@ -606,6 +613,8 @@ enum {
#define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
#define EXT4_IOC_SWAP_BOOT _IO('f', 17)
#define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
+#define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR
+#define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR

#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/*
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f58a0d1..b1e40ca 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -14,6 +14,8 @@
#include <linux/compat.h>
#include <linux/mount.h>
#include <linux/file.h>
+#include <linux/quotaops.h>
+#include <linux/quota.h>
#include <asm/uaccess.h>
#include "ext4_jbd2.h"
#include "ext4.h"
@@ -196,6 +198,222 @@ journal_err_out:
return err;
}

+static int ext4_ioctl_setflags(struct inode *inode,
+ unsigned int flags)
+{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ handle_t *handle = NULL;
+ int err = -EPERM, migrate = 0;
+ struct ext4_iloc iloc;
+ unsigned int oldflags, mask, i;
+ unsigned int jflag;
+
+ /* Is it quota file? Do not allow user to mess with it */
+ if (IS_NOQUOTA(inode))
+ goto flags_out;
+
+ oldflags = ei->i_flags;
+
+ /* The JOURNAL_DATA flag is modifiable only by root */
+ jflag = flags & EXT4_JOURNAL_DATA_FL;
+
+ /*
+ * The IMMUTABLE and APPEND_ONLY flags can only be changed by
+ * the relevant capability.
+ *
+ * This test looks nicer. Thanks to Pauline Middelink
+ */
+ if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
+ if (!capable(CAP_LINUX_IMMUTABLE))
+ goto flags_out;
+ }
+
+ /*
+ * The JOURNAL_DATA flag can only be changed by
+ * the relevant capability.
+ */
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
+ if (!capable(CAP_SYS_RESOURCE))
+ goto flags_out;
+ }
+ if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
+ migrate = 1;
+
+ if (flags & EXT4_EOFBLOCKS_FL) {
+ /* we don't support adding EOFBLOCKS flag */
+ if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
+ err = -EOPNOTSUPP;
+ goto flags_out;
+ }
+ } else if (oldflags & EXT4_EOFBLOCKS_FL)
+ ext4_truncate(inode);
+
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto flags_out;
+ }
+ if (IS_SYNC(inode))
+ ext4_handle_sync(handle);
+ err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (err)
+ goto flags_err;
+
+ for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
+ if (!(mask & EXT4_FL_USER_MODIFIABLE))
+ continue;
+ if (mask & flags)
+ ext4_set_inode_flag(inode, i);
+ else
+ ext4_clear_inode_flag(inode, i);
+ }
+
+ ext4_set_inode_flags(inode);
+ inode->i_ctime = ext4_current_time(inode);
+
+ err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+flags_err:
+ ext4_journal_stop(handle);
+ if (err)
+ goto flags_out;
+
+ if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
+ err = ext4_change_inode_journal_flag(inode, jflag);
+ if (err)
+ goto flags_out;
+ if (migrate) {
+ if (flags & EXT4_EXTENTS_FL)
+ err = ext4_ext_migrate(inode);
+ else
+ err = ext4_ind_migrate(inode);
+ }
+
+flags_out:
+ return err;
+}
+
+static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
+{
+ struct inode *inode = file_inode(filp);
+ struct super_block *sb = inode->i_sb;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ int err;
+ handle_t *handle;
+ kprojid_t kprojid;
+ struct ext4_iloc iloc;
+ struct ext4_inode *raw_inode;
+
+ struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
+
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+ BUG_ON(__kprojid_val(EXT4_I(inode)->i_projid)
+ != EXT4_DEF_PROJID);
+ if (projid != EXT4_DEF_PROJID)
+ return -EOPNOTSUPP;
+ else
+ return 0;
+ }
+
+ kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
+
+ if (projid_eq(kprojid, EXT4_I(inode)->i_projid))
+ return 0;
+
+ err = mnt_want_write_file(filp);
+ if (err)
+ return err;
+
+ err = -EPERM;
+ mutex_lock(&inode->i_mutex);
+ /* Is it quota file? Do not allow user to mess with it */
+ if (IS_NOQUOTA(inode))
+ goto project_out;
+
+ dquot_initialize(inode);
+
+ handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
+ EXT4_QUOTA_INIT_BLOCKS(sb) +
+ EXT4_QUOTA_DEL_BLOCKS(sb) + 3);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto project_out;
+ }
+
+ err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (err)
+ goto project_stop;
+
+ raw_inode = ext4_raw_inode(&iloc);
+ if ((EXT4_INODE_SIZE(sb) <=
+ EXT4_GOOD_OLD_INODE_SIZE) ||
+ (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))) {
+ err = -EFBIG;
+ goto project_stop;
+ }
+
+ transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid));
+ if (!transfer_to[PRJQUOTA])
+ goto project_set;
+
+ err = __dquot_transfer(inode, transfer_to);
+ dqput(transfer_to[PRJQUOTA]);
+ if (err)
+ goto project_stop;
+
+project_set:
+ EXT4_I(inode)->i_projid = kprojid;
+ inode->i_ctime = ext4_current_time(inode);
+ err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+project_stop:
+ ext4_journal_stop(handle);
+project_out:
+ mutex_unlock(&inode->i_mutex);
+ mnt_drop_write_file(filp);
+ return err;
+}
+
+/* Transfer internal flags to xflags */
+static inline __u32 ext4_iflags_to_xflags(unsigned long iflags)
+{
+ __u32 xflags = 0;
+
+ if (iflags & EXT4_SYNC_FL)
+ xflags |= FS_XFLAG_SYNC;
+ if (iflags & EXT4_IMMUTABLE_FL)
+ xflags |= FS_XFLAG_IMMUTABLE;
+ if (iflags & EXT4_APPEND_FL)
+ xflags |= FS_XFLAG_APPEND;
+ if (iflags & EXT4_NODUMP_FL)
+ xflags |= FS_XFLAG_NODUMP;
+ if (iflags & EXT4_NOATIME_FL)
+ xflags |= FS_XFLAG_NOATIME;
+ if (iflags & EXT4_PROJINHERIT_FL)
+ xflags |= FS_XFLAG_PROJINHERIT;
+ return xflags;
+}
+
+/* Transfer xflags flags to internal */
+static inline unsigned long ext4_xflags_to_iflags(__u32 xflags)
+{
+ unsigned long iflags = 0;
+
+ if (xflags & FS_XFLAG_SYNC)
+ iflags |= EXT4_SYNC_FL;
+ if (xflags & FS_XFLAG_IMMUTABLE)
+ iflags |= EXT4_IMMUTABLE_FL;
+ if (xflags & FS_XFLAG_APPEND)
+ iflags |= EXT4_APPEND_FL;
+ if (xflags & FS_XFLAG_NODUMP)
+ iflags |= EXT4_NODUMP_FL;
+ if (xflags & FS_XFLAG_NOATIME)
+ iflags |= EXT4_NOATIME_FL;
+ if (xflags & FS_XFLAG_PROJINHERIT)
+ iflags |= EXT4_PROJINHERIT_FL;
+
+ return iflags;
+}
+
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -211,11 +429,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
return put_user(flags, (int __user *) arg);
case EXT4_IOC_SETFLAGS: {
- handle_t *handle = NULL;
- int err, migrate = 0;
- struct ext4_iloc iloc;
- unsigned int oldflags, mask, i;
- unsigned int jflag;
+ int err;

if (!inode_owner_or_capable(inode))
return -EACCES;
@@ -229,89 +443,8 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)

flags = ext4_mask_flags(inode->i_mode, flags);

- err = -EPERM;
mutex_lock(&inode->i_mutex);
- /* Is it quota file? Do not allow user to mess with it */
- if (IS_NOQUOTA(inode))
- goto flags_out;
-
- oldflags = ei->i_flags;
-
- /* The JOURNAL_DATA flag is modifiable only by root */
- jflag = flags & EXT4_JOURNAL_DATA_FL;
-
- /*
- * The IMMUTABLE and APPEND_ONLY flags can only be changed by
- * the relevant capability.
- *
- * This test looks nicer. Thanks to Pauline Middelink
- */
- if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
- if (!capable(CAP_LINUX_IMMUTABLE))
- goto flags_out;
- }
-
- /*
- * The JOURNAL_DATA flag can only be changed by
- * the relevant capability.
- */
- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL)) {
- if (!capable(CAP_SYS_RESOURCE))
- goto flags_out;
- }
- if ((flags ^ oldflags) & EXT4_EXTENTS_FL)
- migrate = 1;
-
- if (flags & EXT4_EOFBLOCKS_FL) {
- /* we don't support adding EOFBLOCKS flag */
- if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
- err = -EOPNOTSUPP;
- goto flags_out;
- }
- } else if (oldflags & EXT4_EOFBLOCKS_FL)
- ext4_truncate(inode);
-
- handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
- if (IS_ERR(handle)) {
- err = PTR_ERR(handle);
- goto flags_out;
- }
- if (IS_SYNC(inode))
- ext4_handle_sync(handle);
- err = ext4_reserve_inode_write(handle, inode, &iloc);
- if (err)
- goto flags_err;
-
- for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
- if (!(mask & EXT4_FL_USER_MODIFIABLE))
- continue;
- if (mask & flags)
- ext4_set_inode_flag(inode, i);
- else
- ext4_clear_inode_flag(inode, i);
- }
-
- ext4_set_inode_flags(inode);
- inode->i_ctime = ext4_current_time(inode);
-
- err = ext4_mark_iloc_dirty(handle, inode, &iloc);
-flags_err:
- ext4_journal_stop(handle);
- if (err)
- goto flags_out;
-
- if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
- err = ext4_change_inode_journal_flag(inode, jflag);
- if (err)
- goto flags_out;
- if (migrate) {
- if (flags & EXT4_EXTENTS_FL)
- err = ext4_ext_migrate(inode);
- else
- err = ext4_ind_migrate(inode);
- }
-
-flags_out:
+ err = ext4_ioctl_setflags(inode, flags);
mutex_unlock(&inode->i_mutex);
mnt_drop_write_file(filp);
return err;
@@ -615,7 +748,61 @@ resizefs_out:
}
case EXT4_IOC_PRECACHE_EXTENTS:
return ext4_ext_precache(inode);
+ case EXT4_IOC_FSGETXATTR:
+ {
+ struct fsxattr fa;
+
+ memset(&fa, 0, sizeof(struct fsxattr));

+ ext4_get_inode_flags(ei);
+ fa.fsx_xflags = ext4_iflags_to_xflags(ei->i_flags & EXT4_FL_USER_VISIBLE);
+
+ if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_PROJECT)) {
+ fa.fsx_projid = (__u32)from_kprojid(&init_user_ns,
+ EXT4_I(inode)->i_projid);
+ }
+
+ if (copy_to_user((struct fsxattr __user *)arg,
+ &fa, sizeof(fa)))
+ return -EFAULT;
+ return 0;
+ }
+ case EXT4_IOC_FSSETXATTR:
+ {
+ struct fsxattr fa;
+ int err;
+
+ if (copy_from_user(&fa, (struct fsxattr __user *)arg,
+ sizeof(fa)))
+ return -EFAULT;
+
+ /* Make sure caller has proper permission */
+ if (!inode_owner_or_capable(inode))
+ return -EACCES;
+
+ err = mnt_want_write_file(filp);
+ if (err)
+ return err;
+
+ flags = ext4_xflags_to_iflags(fa.fsx_xflags);
+ flags = ext4_mask_flags(inode->i_mode, flags);
+
+ mutex_lock(&inode->i_mutex);
+ flags = (ei->i_flags & ~EXT4_FL_XFLAG_VISIBLE) |
+ (flags & EXT4_FL_XFLAG_VISIBLE);
+ err = ext4_ioctl_setflags(inode, flags);
+ mutex_unlock(&inode->i_mutex);
+ mnt_drop_write_file(filp);
+ if (err)
+ return err;
+
+ err = ext4_ioctl_setproject(filp, fa.fsx_projid);
+ if (err)
+ return err;
+
+ return 0;
+ }
default:
return -ENOTTY;
}
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 18dc721..64c7ae6 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -36,38 +36,25 @@ struct dioattr {
#endif

/*
- * Structure for XFS_IOC_FSGETXATTR[A] and XFS_IOC_FSSETXATTR.
- */
-#ifndef HAVE_FSXATTR
-struct fsxattr {
- __u32 fsx_xflags; /* xflags field value (get/set) */
- __u32 fsx_extsize; /* extsize field value (get/set)*/
- __u32 fsx_nextents; /* nextents field value (get) */
- __u32 fsx_projid; /* project identifier (get/set) */
- unsigned char fsx_pad[12];
-};
-#endif
-
-/*
* Flags for the bs_xflags/fsx_xflags field
* There should be a one-to-one correspondence between these flags and the
* XFS_DIFLAG_s.
*/
-#define XFS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
-#define XFS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
-#define XFS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
-#define XFS_XFLAG_APPEND 0x00000010 /* all writes append */
-#define XFS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
-#define XFS_XFLAG_NOATIME 0x00000040 /* do not update access time */
-#define XFS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
-#define XFS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
-#define XFS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
-#define XFS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
-#define XFS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
-#define XFS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
-#define XFS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
-#define XFS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
-#define XFS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
+#define XFS_XFLAG_REALTIME FS_XFLAG_REALTIME /* data in realtime volume */
+#define XFS_XFLAG_PREALLOC FS_XFLAG_PREALLOC /* preallocated file extents */
+#define XFS_XFLAG_IMMUTABLE FS_XFLAG_IMMUTABLE /* file cannot be modified */
+#define XFS_XFLAG_APPEND FS_XFLAG_APPEND /* all writes append */
+#define XFS_XFLAG_SYNC FS_XFLAG_SYNC /* all writes synchronous */
+#define XFS_XFLAG_NOATIME FS_XFLAG_NOATIME /* do not update access time */
+#define XFS_XFLAG_NODUMP FS_XFLAG_NODUMP /* do not include in backups */
+#define XFS_XFLAG_RTINHERIT FS_XFLAG_RTINHERIT /* create with rt bit set */
+#define XFS_XFLAG_PROJINHERIT FS_XFLAG_PROJINHERIT /* create with parents projid */
+#define XFS_XFLAG_NOSYMLINKS FS_XFLAG_NOSYMLINKS /* disallow symlink creation */
+#define XFS_XFLAG_EXTSIZE FS_XFLAG_EXTSIZE /* extent size allocator hint */
+#define XFS_XFLAG_EXTSZINHERIT FS_XFLAG_EXTSZINHERIT /* inherit inode extent size */
+#define XFS_XFLAG_NODEFRAG FS_XFLAG_NODEFRAG /* do not defragment */
+#define XFS_XFLAG_FILESTREAM FS_XFLAG_FILESTREAM /* use filestream allocator */
+#define XFS_XFLAG_HASATTR FS_XFLAG_HASATTR /* no DIFLAG for this */

/*
* Structure for XFS_IOC_GETBMAP.
@@ -503,8 +490,8 @@ typedef struct xfs_swapext
#define XFS_IOC_ALLOCSP _IOW ('X', 10, struct xfs_flock64)
#define XFS_IOC_FREESP _IOW ('X', 11, struct xfs_flock64)
#define XFS_IOC_DIOINFO _IOR ('X', 30, struct dioattr)
-#define XFS_IOC_FSGETXATTR _IOR ('X', 31, struct fsxattr)
-#define XFS_IOC_FSSETXATTR _IOW ('X', 32, struct fsxattr)
+#define XFS_IOC_FSGETXATTR FS_IOC_FSGETXATTR
+#define XFS_IOC_FSSETXATTR FS_IOC_FSSETXATTR
#define XFS_IOC_ALLOCSP64 _IOW ('X', 36, struct xfs_flock64)
#define XFS_IOC_FREESP64 _IOW ('X', 37, struct xfs_flock64)
#define XFS_IOC_GETBMAP _IOWR('X', 38, struct getbmap)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index fcbf647..69dda62 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -58,6 +58,36 @@ struct inodes_stat_t {
long dummy[5]; /* padding for sysctl ABI compatibility */
};

+/*
+ * Structure for FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR.
+ */
+struct fsxattr {
+ __u32 fsx_xflags; /* xflags field value (get/set) */
+ __u32 fsx_extsize; /* extsize field value (get/set)*/
+ __u32 fsx_nextents; /* nextents field value (get) */
+ __u32 fsx_projid; /* project identifier (get/set) */
+ unsigned char fsx_pad[12];
+};
+
+/*
+ * Flags for the fsx_xflags field
+ */
+#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
+#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
+#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
+#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
+#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
+#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
+#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
+#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
+#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
+#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
+#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
+#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
+#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
+#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
+#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
+

#define NR_FILE 8192 /* this can well be larger on a larger system */

@@ -163,6 +193,8 @@ struct inodes_stat_t {
#define FS_IOC_GETVERSION _IOR('v', 1, long)
#define FS_IOC_SETVERSION _IOW('v', 2, long)
#define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
+#define FS_IOC_FSGETXATTR _IOR('X', 31, struct fsxattr)
+#define FS_IOC_FSSETXATTR _IOW('X', 32, struct fsxattr)
#define FS_IOC32_GETFLAGS _IOR('f', 1, int)
#define FS_IOC32_SETFLAGS _IOW('f', 2, int)
#define FS_IOC32_GETVERSION _IOR('v', 1, int)
--
1.7.1


2015-04-09 15:14:35

by Li Xi

Subject: [v12 3/5] ext4: adds project quota support

This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new specific inode is also used for
project quota accounting.

Signed-off-by: Li Xi <[email protected]>
Signed-off-by: Dmitry Monakhov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 6 +++-
fs/ext4/super.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7acb2da..8ddc723 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1169,7 +1169,8 @@ struct ext4_super_block {
__le32 s_overhead_clusters; /* overhead blocks/clusters in fs */
__le32 s_backup_bgs[2]; /* groups with sparse_super2 SBs */
__u8 s_encrypt_algos[4]; /* Encryption algorithms in use */
- __le32 s_reserved[105]; /* Padding to the end of the block */
+ __le32 s_prj_quota_inum; /* inode for tracking project quota */
+ __le32 s_reserved[104]; /* Padding to the end of the block */
__le32 s_checksum; /* crc32c(superblock) */
};

@@ -1184,7 +1185,7 @@ struct ext4_super_block {
#define EXT4_MF_FS_ABORTED 0x0002 /* Fatal error detected */

/* Number of quota types we support */
-#define EXT4_MAXQUOTAS 2
+#define EXT4_MAXQUOTAS 3

/*
* fourth extended-fs super-block data in memory
@@ -1376,6 +1377,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
ino == EXT4_BOOT_LOADER_INO ||
ino == EXT4_JOURNAL_INO ||
ino == EXT4_RESIZE_INO ||
+ ino == le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum) ||
(ino >= EXT4_FIRST_INO(sb) &&
ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 04c6cc3..476e46f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1036,8 +1036,8 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
}

#ifdef CONFIG_QUOTA
-#define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
-#define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
+static char *quotatypes[] = INITQFNAMES;
+#define QTYPE2NAME(t) (quotatypes[t])

static int ext4_write_dquot(struct dquot *dquot);
static int ext4_acquire_dquot(struct dquot *dquot);
@@ -3944,7 +3944,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
sb->s_qcop = &ext4_qctl_sysfile_operations;
else
sb->s_qcop = &ext4_qctl_operations;
- sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP;
+ sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
#endif
memcpy(sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));

@@ -5040,6 +5040,46 @@ restore_opts:
return err;
}

+static int ext4_statfs_project(struct super_block *sb,
+ kprojid_t projid, struct kstatfs *buf)
+{
+ struct kqid qid;
+ struct dquot *dquot;
+ u64 limit;
+ u64 curblock;
+
+ qid = make_kqid_projid(projid);
+ dquot = dqget(sb, qid);
+ if (!dquot)
+ return -ESRCH;
+ spin_lock(&dq_data_lock);
+
+ limit = dquot->dq_dqb.dqb_bsoftlimit ?
+ dquot->dq_dqb.dqb_bsoftlimit :
+ dquot->dq_dqb.dqb_bhardlimit;
+ if (limit && buf->f_blocks * buf->f_bsize > limit) {
+ curblock = dquot->dq_dqb.dqb_curspace / buf->f_bsize;
+ buf->f_blocks = limit / buf->f_bsize;
+ buf->f_bfree = buf->f_bavail =
+ (buf->f_blocks > curblock) ?
+ (buf->f_blocks - curblock) : 0;
+ }
+
+ limit = dquot->dq_dqb.dqb_isoftlimit ?
+ dquot->dq_dqb.dqb_isoftlimit :
+ dquot->dq_dqb.dqb_ihardlimit;
+ if (limit && buf->f_files > limit) {
+ buf->f_files = limit;
+ buf->f_ffree =
+ (buf->f_files > dquot->dq_dqb.dqb_curinodes) ?
+ (buf->f_files - dquot->dq_dqb.dqb_curinodes) : 0;
+ }
+
+ spin_unlock(&dq_data_lock);
+ dqput(dquot);
+ return 0;
+}
+
static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
@@ -5048,6 +5088,7 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
ext4_fsblk_t overhead = 0, resv_blocks;
u64 fsid;
s64 bfree;
+ struct inode *inode = dentry->d_inode;
resv_blocks = EXT4_C2B(sbi, atomic64_read(&sbi->s_resv_clusters));

if (!test_opt(sb, MINIX_DF))
@@ -5072,6 +5113,9 @@ static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL;
buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL;

+ if (ext4_test_inode_flag(inode, EXT4_INODE_PROJINHERIT) &&
+ sb_has_quota_limits_enabled(sb, PRJQUOTA))
+ ext4_statfs_project(sb, EXT4_I(inode)->i_projid, buf);
return 0;
}

@@ -5236,7 +5280,8 @@ static int ext4_quota_enable(struct super_block *sb, int type, int format_id,
struct inode *qf_inode;
unsigned long qf_inums[EXT4_MAXQUOTAS] = {
le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
- le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
};

BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_QUOTA));
@@ -5264,7 +5309,8 @@ static int ext4_enable_quotas(struct super_block *sb)
int type, err = 0;
unsigned long qf_inums[EXT4_MAXQUOTAS] = {
le32_to_cpu(EXT4_SB(sb)->s_es->s_usr_quota_inum),
- le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum)
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_grp_quota_inum),
+ le32_to_cpu(EXT4_SB(sb)->s_es->s_prj_quota_inum)
};

sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
--
1.7.1


2015-04-10 23:37:14

by Andreas Dilger

Subject: Re: [v12 2/5] ext4: adds project ID support

On Apr 9, 2015, at 9:14 AM, Li Xi <[email protected]> wrote:
>
> This patch adds a new internal field of ext4 inode to save project
> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> inheriting project ID from parent directory.
>
> Signed-off-by: Li Xi <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
> fs/ext4/ext4.h | 21 +++++++++++++++++----
> fs/ext4/ialloc.c | 5 +++++
> fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
> fs/ext4/namei.c | 18 ++++++++++++++++++
> fs/ext4/super.c | 1 +
> include/uapi/linux/fs.h | 1 +
> 6 files changed, 71 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 7fec2ef..7acb2da 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -378,16 +378,18 @@ struct flex_groups {
> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
> +#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
>
> -#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
> -#define EXT4_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */
> +#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
> +#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */

I just noticed that EXT4_INLINE_DATA_FL isn't in EXT4_FL_USER_VISIBLE,
but it probably should be. If this patch is refreshed it wouldn't hurt
to add that flag also.

> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
> - EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
> + EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
> + EXT4_PROJINHERIT_FL)
>
> /* Flags that are appropriate for regular files (all but dir-specific ones). */
> #define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
> @@ -435,6 +437,7 @@ enum {
> EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
> EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
> EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
> + EXT4_INODE_PROJINHERIT = 29, /* Create with parents projid */
> EXT4_INODE_RESERVED = 31, /* reserved for ext4 lib */
> };
>
> @@ -684,6 +687,7 @@ struct ext4_inode {
> __le32 i_crtime; /* File Creation time */
> __le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
> __le32 i_version_hi; /* high 32 bits for 64-bit version */
> + __le32 i_projid; /* Project ID */
> };
>
> struct move_extent {
> @@ -939,6 +943,7 @@ struct ext4_inode_info {
>
> /* Precomputed uuid+inum+igen checksum for seeding inode checksums */
> __u32 i_csum_seed;
> + kprojid_t i_projid;
> };
>
> /*
> @@ -1531,6 +1536,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> */
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
> +#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
>
> #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
> #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
> @@ -1581,7 +1587,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
> EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> - EXT4_FEATURE_RO_COMPAT_QUOTA)
> + EXT4_FEATURE_RO_COMPAT_QUOTA |\
> + EXT4_FEATURE_RO_COMPAT_PROJECT)
>
> /*
> * Default values for user and/or group using reserved blocks
> @@ -1589,6 +1596,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> #define EXT4_DEF_RESUID 0
> #define EXT4_DEF_RESGID 0
>
> +/*
> + * Default project ID
> + */
> +#define EXT4_DEF_PROJID 0
> +
> #define EXT4_DEF_INODE_READAHEAD_BLKS 32
>
> /*
> @@ -2141,6 +2153,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
> loff_t lstart, loff_t lend);
> extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
> extern qsize_t *ext4_get_reserved_space(struct inode *inode);
> +extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
> extern void ext4_da_update_reserve_space(struct inode *inode,
> int used, int quota_claim);
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index ac644c3..10ca9dd 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -756,6 +756,11 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
> inode->i_gid = dir->i_gid;
> } else
> inode_init_owner(inode, dir, mode);
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
> + ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
> + ei->i_projid = EXT4_I(dir)->i_projid;
> + else
> + ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
> dquot_initialize(inode);
>
> if (!goal)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 4df6d01..6e4833f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3870,6 +3870,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
> EXT4_I(inode)->i_inline_off = 0;
> }
>
> +int ext4_get_projid(struct inode *inode, kprojid_t *projid)
> +{
> + if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
> + return -EOPNOTSUPP;
> + *projid = EXT4_I(inode)->i_projid;
> + return 0;
> +}
> +
> struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> {
> struct ext4_iloc iloc;
> @@ -3881,6 +3889,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> int block;
> uid_t i_uid;
> gid_t i_gid;
> + projid_t i_projid;
>
> inode = iget_locked(sb, ino);
> if (!inode)
> @@ -3930,12 +3939,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
> inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
> i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
> + i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
> + else
> + i_projid = EXT4_DEF_PROJID;
> +
> if (!(test_opt(inode->i_sb, NO_UID32))) {
> i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
> i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
> }
> i_uid_write(inode, i_uid);
> i_gid_write(inode, i_gid);
> + ei->i_projid = make_kprojid(&init_user_ns, i_projid);;
> set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
>
> ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
> @@ -4165,6 +4180,7 @@ static int ext4_do_update_inode(handle_t *handle,
> int need_datasync = 0, set_large_file = 0;
> uid_t i_uid;
> gid_t i_gid;
> + projid_t i_projid;
>
> spin_lock(&ei->i_raw_lock);
>
> @@ -4177,6 +4193,7 @@ static int ext4_do_update_inode(handle_t *handle,
> raw_inode->i_mode = cpu_to_le16(inode->i_mode);
> i_uid = i_uid_read(inode);
> i_gid = i_gid_read(inode);
> + i_projid = from_kprojid(&init_user_ns, ei->i_projid);
> if (!(test_opt(inode->i_sb, NO_UID32))) {
> raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
> raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
> @@ -4256,6 +4273,18 @@ static int ext4_do_update_inode(handle_t *handle,
> }
> }
>
> + BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
> + EXT4_FEATURE_RO_COMPAT_PROJECT) &&
> + i_projid != EXT4_DEF_PROJID);
> + if (i_projid != EXT4_DEF_PROJID &&
> + (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
> + (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
> + spin_unlock(&ei->i_raw_lock);
> + err = -EFBIG;

I mentioned the following back on v8 of the patch, but I'll write it again:

I don't think -EFBIG "File too large" is a good error here. Better would
be -EOPNOTSUPP for EXT4_INODE_SIZE() <= EXT4_GOOD_OLD_INODE_SIZE, since
this will never work for this filesystem. For the !EXT4_FITS_IN_INODE()
case -EOVERFLOW would be better?

Also, returning the error from ext4_mark_iloc_dirty->ext4_do_update_inode()
is a bit late in the game, since the inode has already been modified at
this point. That callpath typically only returned an error if the disk
was bad or the journal aborted (again normally because the disk was bad),
so at that point the dirty in-memory data was lost anyway.

It would be better to check this in the caller before the inode is changed
so that an error can be returned without (essentially) corrupting the
in-memory state. Since the projid should only be set for new inodes (which
will always have enough space, assuming RO_COMPAT_PROJECT cannot be set on
filesystems with 128-byte inodes), or in case of rename into a project
directory, it shouldn't be too big a change.
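
As a rough illustration of the suggestion above (this is not code from
the patch), the size check could be done by a small helper that runs
before the in-memory inode is modified. The helper name and both
constants below are hypothetical stand-ins; PROJID_FIELD_END assumes
that i_projid is the 4-byte field directly after i_version_hi, as in
the struct ext4_inode hunk quoted earlier.

```c
#include <errno.h>

/* Hypothetical stand-ins for the ext4 definitions, for illustration only. */
#define GOOD_OLD_INODE_SIZE 128u  /* inode size without any extra fields */
#define PROJID_FIELD_END    160u  /* assumed offset just past i_projid */

/*
 * Decide whether a non-default project ID can be stored, before the
 * inode is touched.
 *   inode_size:  on-disk inode size of the filesystem
 *   extra_isize: bytes of extra fields present in this particular inode
 * Returns 0 if i_projid fits, or a negative errno otherwise.
 */
static int projid_can_store(unsigned int inode_size, unsigned int extra_isize)
{
	if (inode_size <= GOOD_OLD_INODE_SIZE)
		return -EOPNOTSUPP;  /* can never work on this filesystem */
	if (GOOD_OLD_INODE_SIZE + extra_isize < PROJID_FIELD_END)
		return -EOVERFLOW;   /* this inode's extra area is too small */
	return 0;
}
```

A caller would run such a check before assigning the new project ID,
so that a failure leaves the in-memory inode unchanged.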

> + goto out_brelse;
> + }
> + raw_inode->i_projid = cpu_to_le32(i_projid);
> +
> ext4_inode_csum_set(inode, raw_inode, ei);
>
> spin_unlock(&ei->i_raw_lock);
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 2291923..63a9623 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2938,6 +2938,11 @@ static int ext4_link(struct dentry *old_dentry,
> if (inode->i_nlink >= EXT4_LINK_MAX)
> return -EMLINK;
>
> + if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
> + (!projid_eq(EXT4_I(dir)->i_projid,
> + EXT4_I(old_dentry->d_inode)->i_projid)))
> + return -EXDEV;
> +
> dquot_initialize(dir);
>
> retry:
> @@ -3217,6 +3222,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> int credits;
> u8 old_file_type;
>
> + if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
> + (!projid_eq(EXT4_I(new_dir)->i_projid,
> + EXT4_I(old_dentry->d_inode)->i_projid)))
> + return -EXDEV;
> +
> dquot_initialize(old.dir);
> dquot_initialize(new.dir);
>
> @@ -3395,6 +3405,14 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
> u8 new_file_type;
> int retval;
>
> + if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT) &&
> + !projid_eq(EXT4_I(new_dir)->i_projid,
> + EXT4_I(old_dentry->d_inode)->i_projid)) ||
> + (ext4_test_inode_flag(old_dir, EXT4_INODE_PROJINHERIT) &&
> + !projid_eq(EXT4_I(old_dir)->i_projid,
> + EXT4_I(new_dentry->d_inode)->i_projid)))
> + return -EXDEV;
> +
> dquot_initialize(old.dir);
> dquot_initialize(new.dir);
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index bff3427..04c6cc3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1073,6 +1073,7 @@ static const struct dquot_operations ext4_quota_operations = {
> .write_info = ext4_write_info,
> .alloc_dquot = dquot_alloc,
> .destroy_dquot = dquot_destroy,
> + .get_projid = ext4_get_projid,
> };
>
> static const struct quotactl_ops ext4_qctl_operations = {
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 3735fa0..fcbf647 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -195,6 +195,7 @@ struct inodes_stat_t {
> #define FS_EXTENT_FL 0x00080000 /* Extents */
> #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
> #define FS_NOCOW_FL 0x00800000 /* Do not cow file */
> +#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
>
> #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
> --
> 1.7.1
>


Cheers, Andreas

2015-04-12 13:49:11

by Alban Crequy

Subject: Re: [v12 1/5] vfs: adds general codes to enforces project quota limits

On 9 April 2015 at 17:14, Li Xi <[email protected]> wrote:
> This patch adds support for a new quota type PRJQUOTA for project quota
> enforcement. Also a new method get_projid() is added into dquot_operations
> structure.
>(...)
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 2aa4151..33b30b1 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -30,7 +30,10 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
> case Q_XGETQSTATV:
> case Q_XQUOTASYNC:
> break;
> - /* allow to query information for dquots we "own" */
> + /*
> + * allow to query information for dquots we "own"
> + * always allow querying project quota
> + */

I would make the comment more precise:

always allow querying project quota when the project id is mapped in
the current user namespace.

id is the id in the current user namespace. So quotas for unmapped
users, unmapped groups or unmapped projects cannot be queried.

2015-04-12 15:36:53

by Alban Crequy

Subject: Re: [v12 0/5] ext4: add project quota support

On 9 April 2015 at 17:14, Li Xi <[email protected]> wrote:
> The following patches propose an implementation of project quota
> support for ext4. A project is an aggregate of unrelated inodes
> which might scatter in different directories. Inodes that belong
> to the same project possess an identical identification i.e.
> 'project ID', just like every inode has its user/group
> identification. The following patches add project quota as
> supplement to the former uer/group quota types.
> (...)

Thanks for this work, I would like to use this for containers. I am
adding containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org in Cc.

To make sure I understand correctly, I will describe the configuration
I have in mind and hopefully someone can tell me if it makes sense.

Containers created by rkt (https://github.com/coreos/rkt) use an
overlay filesystem as root and the lowerdir/upperdir directories are
based on an ext4 filesystem outside of the container's reach. The
lowerdir is the base image, and several container instances can
potentially use the same lowerdir. Each container has its own upperdir
containing its changes.

With your patch set, I could assign a different projid to the upperdir
of each container with a specific quota. Then it will limit how much
the container will be able to write. I don't know if the overlay's
workdir would need to have a projid too.

When a quota warning is sent on netlink, it is received only in the
initial user namespace and the processes in a different user namespace
will not be able to receive the netlink warnings. The user will only
receive a warning through the controlling terminal.

Since rkt does not use user namespaces yet, a rkt container could
unfortunately receive quota warnings through netlink concerning the
host or other containers. Or is it restricted to init_net?

quotactl() can be used in a rkt container if the processes in the
container can guess somehow which block device is used by the
filesystem hosting the overlay's upperdir and if they can mknod it
somewhere. Usually, containers don't restrict mknod but just restrict
read-write access through the device cgroup. The read-write access is
irrelevant for quotactl(): quotactl() just checks that the device node
exists and that it is not on a nodev mount. The nodev check does not
restrict containers here because they usually have a /dev mounted as
tmpfs without the nodev option.

Containers that don't use user namespaces (so no projid mapping) would
be able to query quotas for projids assigned to other containers
(unfortunately). They would be able to change the quota of other
containers if they are privileged enough to be given CAP_SYS_RESOURCE.

Containers using user namespaces would not be able to change any quota
config because they don't have CAP_SYS_RESOURCE in the init user
namespace. If they are configured with a proper projid mapping, they
would only be able to query the projid they are assigned (they could
guess which projid to query by looking at /proc/self/projid_map).
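
As a sketch of that lookup: projid_map uses the same line format as
uid_map, i.e. "<first ID inside> <first ID outside> <count>". A process
could check whether a given projid is mapped with a helper like the one
below. The helper name is made up, and it works on text that the caller
has already read from /proc/self/projid_map.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Return 1 if projid (as seen inside the namespace) is covered by a
 * mapping in map_text, 0 otherwise. map_text holds lines of the form
 * "<first-inside> <first-outside> <count>".
 */
static int projid_is_mapped(const char *map_text, uint32_t projid)
{
	unsigned long inside, outside, count;
	int consumed;

	/* %n records how many characters this line consumed. */
	while (sscanf(map_text, "%lu %lu %lu%n",
		      &inside, &outside, &count, &consumed) == 3) {
		if (projid >= inside && projid - inside < count)
			return 1;
		map_text += consumed;
	}
	return 0;
}
```

For example, with the single line "0 100000 1000", projids 0 to 999
are mapped and projid 1000 is not.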

Do you know if someone is working on the documentation? It would be
nice if filesystems/quota.txt could say who can receive the quota
warnings on netlink (which namespace) and if it could give some
information about projid. But maybe this belongs in the proc(5) and
user_namespaces(7) manpages as well.

Are there any suggestions on how to allocate projids in userspace?
Something like /etc/subprojid similar to /etc/subuid?

Thanks!
Alban

2015-04-13 16:33:30

by Li Xi

Subject: Re: [v12 2/5] ext4: adds project ID support

Sorry Andreas, I will update the patch soon.

On Fri, Apr 10, 2015 at 5:37 PM, Andreas Dilger <[email protected]> wrote:
> On Apr 9, 2015, at 9:14 AM, Li Xi <[email protected]> wrote:
>>
>> This patch adds a new internal field of ext4 inode to save project
>> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
>> inheriting project ID from parent directory.
>>
>> Signed-off-by: Li Xi <[email protected]>
>> Reviewed-by: Jan Kara <[email protected]>
>> ---
>> fs/ext4/ext4.h | 21 +++++++++++++++++----
>> fs/ext4/ialloc.c | 5 +++++
>> fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++
>> fs/ext4/namei.c | 18 ++++++++++++++++++
>> fs/ext4/super.c | 1 +
>> include/uapi/linux/fs.h | 1 +
>> 6 files changed, 71 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 7fec2ef..7acb2da 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -378,16 +378,18 @@ struct flex_groups {
>> #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
>> #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
>> #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
>> +#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
>> #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
>>
>> -#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
>> -#define EXT4_FL_USER_MODIFIABLE 0x004380FF /* User modifiable flags */
>> +#define EXT4_FL_USER_VISIBLE 0x204BDFFF /* User visible flags */
>> +#define EXT4_FL_USER_MODIFIABLE 0x204380FF /* User modifiable flags */
>
> I just noticed that EXT4_INLINE_DATA_FL isn't in EXT4_FL_USER_VISIBLE,
> but it probably should be. If this patch is refreshed it wouldn't hurt
> to add that flag also.
>
>> /* Flags that should be inherited by new inodes from their parent. */
>> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
>> EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
>> EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
>> - EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
>> + EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL |\
>> + EXT4_PROJINHERIT_FL)
>>
>> /* Flags that are appropriate for regular files (all but dir-specific ones). */
>> #define EXT4_REG_FLMASK (~(EXT4_DIRSYNC_FL | EXT4_TOPDIR_FL))
>> @@ -435,6 +437,7 @@ enum {
>> EXT4_INODE_EA_INODE = 21, /* Inode used for large EA */
>> EXT4_INODE_EOFBLOCKS = 22, /* Blocks allocated beyond EOF */
>> EXT4_INODE_INLINE_DATA = 28, /* Data in inode. */
>> + EXT4_INODE_PROJINHERIT = 29, /* Create with parents projid */
>> EXT4_INODE_RESERVED = 31, /* reserved for ext4 lib */
>> };
>>
>> @@ -684,6 +687,7 @@ struct ext4_inode {
>> __le32 i_crtime; /* File Creation time */
>> __le32 i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
>> __le32 i_version_hi; /* high 32 bits for 64-bit version */
>> + __le32 i_projid; /* Project ID */
>> };
>>
>> struct move_extent {
>> @@ -939,6 +943,7 @@ struct ext4_inode_info {
>>
>> /* Precomputed uuid+inum+igen checksum for seeding inode checksums */
>> __u32 i_csum_seed;
>> + kprojid_t i_projid;
>> };
>>
>> /*
>> @@ -1531,6 +1536,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> */
>> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
>> #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
>> +#define EXT4_FEATURE_RO_COMPAT_PROJECT 0x2000
>>
>> #define EXT4_FEATURE_INCOMPAT_COMPRESSION 0x0001
>> #define EXT4_FEATURE_INCOMPAT_FILETYPE 0x0002
>> @@ -1581,7 +1587,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
>> EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
>> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
>> - EXT4_FEATURE_RO_COMPAT_QUOTA)
>> + EXT4_FEATURE_RO_COMPAT_QUOTA |\
>> + EXT4_FEATURE_RO_COMPAT_PROJECT)
>>
>> /*
>> * Default values for user and/or group using reserved blocks
>> @@ -1589,6 +1596,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> #define EXT4_DEF_RESUID 0
>> #define EXT4_DEF_RESGID 0
>>
>> +/*
>> + * Default project ID
>> + */
>> +#define EXT4_DEF_PROJID 0
>> +
>> #define EXT4_DEF_INODE_READAHEAD_BLKS 32
>>
>> /*
>> @@ -2141,6 +2153,7 @@ extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
>> loff_t lstart, loff_t lend);
>> extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
>> extern qsize_t *ext4_get_reserved_space(struct inode *inode);
>> +extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
>> extern void ext4_da_update_reserve_space(struct inode *inode,
>> int used, int quota_claim);
>>
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index ac644c3..10ca9dd 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -756,6 +756,11 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>> inode->i_gid = dir->i_gid;
>> } else
>> inode_init_owner(inode, dir, mode);
>> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT) &&
>> + ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
>> + ei->i_projid = EXT4_I(dir)->i_projid;
>> + else
>> + ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
>> dquot_initialize(inode);
>>
>> if (!goal)
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 4df6d01..6e4833f 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3870,6 +3870,14 @@ static inline void ext4_iget_extra_inode(struct inode *inode,
>> EXT4_I(inode)->i_inline_off = 0;
>> }
>>
>> +int ext4_get_projid(struct inode *inode, kprojid_t *projid)
>> +{
>> + if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
>> + return -EOPNOTSUPP;
>> + *projid = EXT4_I(inode)->i_projid;
>> + return 0;
>> +}
>> +
>> struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
>> {
>> struct ext4_iloc iloc;
>> @@ -3881,6 +3889,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
>> int block;
>> uid_t i_uid;
>> gid_t i_gid;
>> + projid_t i_projid;
>>
>> inode = iget_locked(sb, ino);
>> if (!inode)
>> @@ -3930,12 +3939,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
>> inode->i_mode = le16_to_cpu(raw_inode->i_mode);
>> i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
>> i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
>> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_PROJECT))
>> + i_projid = (projid_t)le32_to_cpu(raw_inode->i_projid);
>> + else
>> + i_projid = EXT4_DEF_PROJID;
>> +
>> if (!(test_opt(inode->i_sb, NO_UID32))) {
>> i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
>> i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
>> }
>> i_uid_write(inode, i_uid);
>> i_gid_write(inode, i_gid);
>> + ei->i_projid = make_kprojid(&init_user_ns, i_projid);
>> set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
>>
>> ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
>> @@ -4165,6 +4180,7 @@ static int ext4_do_update_inode(handle_t *handle,
>> int need_datasync = 0, set_large_file = 0;
>> uid_t i_uid;
>> gid_t i_gid;
>> + projid_t i_projid;
>>
>> spin_lock(&ei->i_raw_lock);
>>
>> @@ -4177,6 +4193,7 @@ static int ext4_do_update_inode(handle_t *handle,
>> raw_inode->i_mode = cpu_to_le16(inode->i_mode);
>> i_uid = i_uid_read(inode);
>> i_gid = i_gid_read(inode);
>> + i_projid = from_kprojid(&init_user_ns, ei->i_projid);
>> if (!(test_opt(inode->i_sb, NO_UID32))) {
>> raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
>> raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
>> @@ -4256,6 +4273,18 @@ static int ext4_do_update_inode(handle_t *handle,
>> }
>> }
>>
>> + BUG_ON(!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
>> + EXT4_FEATURE_RO_COMPAT_PROJECT) &&
>> + i_projid != EXT4_DEF_PROJID);
>> + if (i_projid != EXT4_DEF_PROJID &&
>> + (EXT4_INODE_SIZE(inode->i_sb) <= EXT4_GOOD_OLD_INODE_SIZE ||
>> + (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)))) {
>> + spin_unlock(&ei->i_raw_lock);
>> + err = -EFBIG;
>
> I mentioned the following back on v8 of the patch, but I'll write it again:
>
> I don't think -EFBIG "File too large" is a good error here. Better would
> be -EOPNOTSUPP for EXT4_INODE_SIZE() <= EXT4_GOOD_OLD_INODE_SIZE, since
> this will never work for this filesystem. For the !EXT4_FITS_IN_INODE()
> case -EOVERFLOW would be better?
>
> Also, returning the error from ext4_mark_iloc_dirty->ext4_do_update_inode()
> is a bit late in the game, since the inode has already been modified at
> this point. That callpath typically only returned an error if the disk
> was bad or the journal aborted (again normally because the disk was bad),
> so at that point the dirty in-memory data was lost anyway.
>
> It would be better to check this in the caller before the inode is changed
> so that an error can be returned without (essentially) corrupting the
> in-memory state. Since the projid should only be set for new inodes (which
> will always have enough space, assuming RO_COMPAT_PROJECT cannot be set on
> filesystems with 128-byte inodes), or in case of rename into a project
> directory, it shouldn't be too big a change.
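
The ordering suggested above (validate first, modify afterwards) can be sketched in userspace C roughly as follows. This is only a toy model under stated assumptions: struct toy_inode, toy_projid_check() and toy_set_projid() are hypothetical names, and the projid_fits field merely stands in for the EXT4_FITS_IN_INODE() test. The error codes are the ones proposed in the review.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GOOD_OLD_INODE_SIZE 128	/* the 128-byte inode mentioned above */
#define TOY_DEF_PROJID 0

/* Hypothetical, simplified stand-in for the state the check needs. */
struct toy_inode {
	size_t inode_size;	/* on-disk inode size of the filesystem */
	bool projid_fits;	/* stand-in for EXT4_FITS_IN_INODE(..., i_projid) */
	unsigned int projid;	/* in-memory project ID */
};

/* Decide whether new_projid can be stored, without touching the inode. */
static int toy_projid_check(const struct toy_inode *ino, unsigned int new_projid)
{
	assert(ino != NULL);
	if (new_projid == TOY_DEF_PROJID)
		return 0;		/* the default ID needs no on-disk field */
	if (ino->inode_size <= TOY_GOOD_OLD_INODE_SIZE)
		return -EOPNOTSUPP;	/* can never work on this filesystem */
	if (!ino->projid_fits)
		return -EOVERFLOW;	/* no room in this particular inode */
	return 0;
}

/* Caller-side pattern: validate first, modify only on success. */
static int toy_set_projid(struct toy_inode *ino, unsigned int new_projid)
{
	int err = toy_projid_check(ino, new_projid);

	if (err)
		return err;		/* in-memory state is left untouched */
	ino->projid = new_projid;
	return 0;
}
```

With this shape a failed request leaves ino->projid at its previous value, so the write-back path has nothing new to reject.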
>
>> + goto out_brelse;
>> + }
>> + raw_inode->i_projid = cpu_to_le32(i_projid);
>> +
>> ext4_inode_csum_set(inode, raw_inode, ei);
>>
>> spin_unlock(&ei->i_raw_lock);
>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
>> index 2291923..63a9623 100644
>> --- a/fs/ext4/namei.c
>> +++ b/fs/ext4/namei.c
>> @@ -2938,6 +2938,11 @@ static int ext4_link(struct dentry *old_dentry,
>> if (inode->i_nlink >= EXT4_LINK_MAX)
>> return -EMLINK;
>>
>> + if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
>> + (!projid_eq(EXT4_I(dir)->i_projid,
>> + EXT4_I(old_dentry->d_inode)->i_projid)))
>> + return -EXDEV;
>> +
>> dquot_initialize(dir);
>>
>> retry:
>> @@ -3217,6 +3222,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>> int credits;
>> u8 old_file_type;
>>
>> + if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT)) &&
>> + (!projid_eq(EXT4_I(new_dir)->i_projid,
>> + EXT4_I(old_dentry->d_inode)->i_projid)))
>> + return -EXDEV;
>> +
>> dquot_initialize(old.dir);
>> dquot_initialize(new.dir);
>>
>> @@ -3395,6 +3405,14 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
>> u8 new_file_type;
>> int retval;
>>
>> + if ((ext4_test_inode_flag(new_dir, EXT4_INODE_PROJINHERIT) &&
>> + !projid_eq(EXT4_I(new_dir)->i_projid,
>> + EXT4_I(old_dentry->d_inode)->i_projid)) ||
>> + (ext4_test_inode_flag(old_dir, EXT4_INODE_PROJINHERIT) &&
>> + !projid_eq(EXT4_I(old_dir)->i_projid,
>> + EXT4_I(new_dentry->d_inode)->i_projid)))
>> + return -EXDEV;
>> +
>> dquot_initialize(old.dir);
>> dquot_initialize(new.dir);
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index bff3427..04c6cc3 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -1073,6 +1073,7 @@ static const struct dquot_operations ext4_quota_operations = {
>> .write_info = ext4_write_info,
>> .alloc_dquot = dquot_alloc,
>> .destroy_dquot = dquot_destroy,
>> + .get_projid = ext4_get_projid,
>> };
>>
>> static const struct quotactl_ops ext4_qctl_operations = {
>> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> index 3735fa0..fcbf647 100644
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -195,6 +195,7 @@ struct inodes_stat_t {
>> #define FS_EXTENT_FL 0x00080000 /* Extents */
>> #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
>> #define FS_NOCOW_FL 0x00800000 /* Do not cow file */
>> +#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
>> #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
>>
>> #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
>> --
>> 1.7.1
>>
>
>
> Cheers, Andreas

2015-04-14 08:21:15

by Jan Kara

Subject: Re: [v12 0/5] ext4: add project quota support

On Sun 12-04-15 17:36:53, Alban Crequy wrote:
> On 9 April 2015 at 17:14, Li Xi <[email protected]> wrote:
> > The following patches propose an implementation of project quota
> > support for ext4. A project is an aggregate of unrelated inodes
> > which might scatter in different directories. Inodes that belong
> > to the same project possess an identical identification i.e.
> > 'project ID', just like every inode has its user/group
> > identification. The following patches add project quota as
> > supplement to the former user/group quota types.
> > (...)
>
> Thanks for this work, I would like to use this for containers. I am
> adding containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org in Cc.
>
> To make sure I understand correctly, I will describe the configuration
> I have in mind and hopefully someone can tell me if it makes sense.
>
> Containers created by rkt (https://github.com/coreos/rkt) use an
> overlay filesystem as root and the lowerdir/upperdir directories are
> based on an ext4 filesystem outside of the container's reach. The
> lowerdir is the base image, and several container instances can
> potentially use the same lowerdir. Each container has its upperdir
> containing their changes.
>
> With your patch set, I could assign a different projid to the upperdir
> of each container with a specific quota. Then it will limit how much
> the container will be able to write. I don't know if the overlay's
> workdir would need to have projid too.
I don't think overlay's workdir needs a project ID. Limits will simply be
checked when overlayfs stores data into upperdir. Overlayfs will get
EDQUOT, which it will report back to the user.

> When a quota warning is sent on netlink, it is received only in the
> initial user namespace and the processes in a different user namespace
> will not be able to receive the netlink warnings. The user will only
> receive a warning through the control terminal.
I don't know much about namespaces, but I don't see how quota netlink
messages would be connected with *user* namespaces. You are right, though,
that quota netlink messages will contain the ID of the violator mapped into
the init user namespace, so they won't make sense to processes in other
user namespaces even if those processes were able to receive them.

> Since rkt does not use user namespaces yet, a rkt container could
> unfortunately receive quota warnings through netlink concerning the
> host or other containers. Or is it restricted to init_net?
Quota netlink messages are sent only in the init_net namespace (since the
quota netlink protocol wasn't made namespace-aware). So this shouldn't be
an issue.

> quotactl() can be used in a rkt container if the processes in the
> container can guess somehow which block device is used by the
> filesystem hosting the overlay's upperdir and if they can mknod it
> somewhere. Usually, containers don't restrict mknod but just restrict
> read-write access through the device cgroup. The read-write access is
> irrelevant for quotactl(): quotactl() just checks that the device node
> exists and that it is not on a nodev mount. The nodev check does not
> restrict containers here because they usually have a /dev mounted as
> tmpfs without the nodev option.
Correct. This raises a somewhat unrelated question: does this mean that a
container is able to mount an arbitrary block device? There, too, we just
pass a device path to the kernel...

> Containers that don't use user namespaces (so no projid mapping) would
> be able to query quotas for projid assigned to other containers
> (unfortunately). They would be able to change the quota of other
> containers if they are privileged enough to be given CAP_SYS_RESOURCE.
Yes.

> Containers using user namespaces would not be able to change any quota
> config because they don't have CAP_SYS_RESOURCE in the init user
> namespace. If they are configured with a proper projid mapping, they
> would only be able to query the projid they are assigned (they could
> guess which projid to query by looking at /proc/self/projid_map).
Yes.
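
For reference, a projid_map line uses the same three-field layout as uid_map: first ID inside the namespace, first ID outside, and the length of the range. Below is a minimal userspace sketch of how a projid seen inside a container translates through such a mapping. struct id_extent, parse_map_line() and map_id_down() are hypothetical names chosen for this example, not kernel or libc interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One line of a projid_map-style file: "<inside> <outside> <count>". */
struct id_extent {
	unsigned int first;		/* first ID inside the namespace */
	unsigned int lower_first;	/* first ID outside the namespace */
	unsigned int count;		/* number of consecutive IDs mapped */
};

/* Parse one map line; returns true only if all three fields were read. */
static bool parse_map_line(const char *line, struct id_extent *ext)
{
	assert(line != NULL && ext != NULL);
	return sscanf(line, "%u %u %u",
		      &ext->first, &ext->lower_first, &ext->count) == 3;
}

/* Translate an inside ID to the outside ID; false if it is unmapped. */
static bool map_id_down(const struct id_extent *ext, size_t n,
			unsigned int id, unsigned int *out)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= ext[i].first && id - ext[i].first < ext[i].count) {
			*out = ext[i].lower_first + (id - ext[i].first);
			return true;
		}
	}
	return false;
}
```

With a mapping line of "0 100000 65536", projid 42 inside the container corresponds to 100042 outside, and IDs beyond the range are simply unmapped.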

> Do you know if someone is working on the documentation? It would be
> nice if filesystems/quota.txt could say who can receive the quota
> warnings on netlink (which namespace) and if it could give some
I have added that.

> information about projid. But maybe this belongs in the proc(5) and
> user_namespaces(7) manpages as well.
Project ID in VFS quotas is a fairly new thing. Once ext4 gains support for
it, I can add some documentation.

> Are there any suggestions on how to allocate projid in userspace?
> Something like /etc/subprojid similar to /etc/subuid?
I guess you need some coordination between namespaces? I only know that
xfsprogs traditionally uses /etc/projid for name->project ID translation,
and that /etc/projects contains the roots of the directory trees for which
you wish to maintain directory quota, together with the project ID of each
tree.
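
As an illustration of the name->project ID file, here is a minimal sketch that parses a single "name:id" line such as "cage:10". parse_projid_line() is a hypothetical helper written for this example and is not the xfsprogs implementation; it is deliberately strict and rejects anything other than a non-empty name, a colon and a decimal number.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "name:id" (optionally newline-terminated) into name/id.
 * Returns false on any malformed input and leaves the outputs untouched.
 */
static bool parse_projid_line(const char *line, char *name, size_t name_len,
			      unsigned long *id)
{
	const char *colon;
	char *end;
	unsigned long value;
	size_t n;

	assert(line != NULL && name != NULL && id != NULL && name_len > 0);

	colon = strchr(line, ':');
	if (colon == NULL || colon == line)
		return false;		/* no separator, or empty name */
	n = (size_t)(colon - line);
	if (n >= name_len)
		return false;		/* name would not fit in the buffer */
	if (!isdigit((unsigned char)colon[1]))
		return false;		/* reject signs, spaces, empty IDs */

	errno = 0;
	value = strtoul(colon + 1, &end, 10);
	if (errno != 0 || (*end != '\0' && *end != '\n'))
		return false;		/* overflow or trailing junk */

	memcpy(name, line, n);
	name[n] = '\0';
	*id = value;
	return true;
}
```

A userspace allocator shared between container runtimes could build on a parser like this, but how such coordination should work is exactly the open question here.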

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-04-14 10:07:50

by Alban Crequy

Subject: Re: [v12 0/5] ext4: add project quota support

On Tue, Apr 14, 2015 at 10:21 AM, Jan Kara <[email protected]> wrote:
> On Sun 12-04-15 17:36:53, Alban Crequy wrote:
>> On 9 April 2015 at 17:14, Li Xi <[email protected]> wrote:
>> > The following patches propose an implementation of project quota
>> > support for ext4. A project is an aggregate of unrelated inodes
>> > which might scatter in different directories. Inodes that belong
>> > to the same project possess an identical identification i.e.
>> > 'project ID', just like every inode has its user/group
>> > identification. The following patches add project quota as
>> > supplement to the former user/group quota types.
>> > (...)
>>
>> Thanks for this work, I would like to use this for containers. I am
>> adding containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org in Cc.
>>
>> To make sure I understand correctly, I will describe the configuration
>> I have in mind and hopefully someone can tell me if it makes sense.
>>
>> Containers created by rkt (https://github.com/coreos/rkt) use an
>> overlay filesystem as root and the lowerdir/upperdir directories are
>> based on an ext4 filesystem outside of the container's reach. The
>> lowerdir is the base image, and several container instances can
>> potentially use the same lowerdir. Each container has its upperdir
>> containing their changes.
>>
>> With your patch set, I could assign a different projid to the upperdir
>> of each container with a specific quota. Then it will limit how much
>> the container will be able to write. I don't know if the overlay's
>> workdir would need to have projid too.
> I don't think overlay's workdir needs a project ID. Limits will simply be
> checked when overlayfs stores data into upperdir. Overlayfs will get
> EDQUOT, which it will report back to the user.

Noted, thanks.

>> When a quota warning is sent on netlink, it is received only in the
>> initial user namespace and the processes in a different user namespace
>> will not be able to receive the netlink warnings. The user will only
>> receive a warning through the control terminal.
> I don't know much about namespaces, but I don't see how quota netlink
> messages would be connected with *user* namespaces. You are right, though,
> that quota netlink messages will contain the ID of the violator mapped into
> the init user namespace, so they won't make sense to processes in other
> user namespaces even if those processes were able to receive them.
>
>> Since rkt does not use user namespaces yet, a rkt container could
>> unfortunately receive quota warnings through netlink concerning the
>> host or other containers. Or is it restricted to init_net?
> Quota netlink messages are sent only in the init_net namespace (since the
> quota netlink protocol wasn't made namespace-aware). So this shouldn't be
> an issue.

You're right, I misread it: it references the init network namespace,
not the user namespace.

fs/quota/netlink.c:quota_send_warning() uses genlmsg_multicast(), which
specifically references init_net:

	return genlmsg_multicast_netns(family, &init_net, skb,
				       portid, group, flags);

>> quotactl() can be used in a rkt container if the processes in the
>> container can guess somehow which block device is used by the
>> filesystem hosting the overlay's upperdir and if they can mknod it
>> somewhere. Usually, containers don't restrict mknod but just restrict
>> read-write access through the device cgroup. The read-write access is
>> irrelevant for quotactl(): quotactl() just checks that the device node
>> exists and that it is not on a nodev mount. The nodev check does not
>> restrict containers here because they usually have a /dev mounted as
>> tmpfs without the nodev option.
> Correct. This raises a somewhat unrelated question: does this mean that a
> container is able to mount an arbitrary block device? There, too, we just
> pass a device path to the kernel...

The process would still need CAP_SYS_ADMIN and there are additional
checks when the user namespace is not the initial user namespace:

fs/namespace.c, do_new_mount():

	if (user_ns != &init_user_ns) {
		if (!(type->fs_flags & FS_USERNS_MOUNT)) {
			put_filesystem(type);
			return -EPERM;
		}
		...

For example, FS_USERNS_MOUNT is set on devpts_fs_type but not on
ext4_fs_type. So it's not possible to mount ext4 in a different user
namespace. Containers that don't use user namespaces can avoid giving
CAP_SYS_ADMIN or restrict mount with some AppArmor rules.
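
Put together, the decision described above can be modelled in a few lines of userspace C. This is only a toy sketch under simplifying assumptions: toy_may_mount(), struct toy_fs_type and the TOY_FS_USERNS_MOUNT bit are hypothetical stand-ins, and the capability check is reduced to a boolean rather than the kernel's namespace-relative test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit standing in for FS_USERNS_MOUNT. */
#define TOY_FS_USERNS_MOUNT 0x8

/* Hypothetical, reduced view of a filesystem type. */
struct toy_fs_type {
	const char *name;
	int fs_flags;
};

/*
 * Model of the two checks discussed above:
 *  1. the caller needs CAP_SYS_ADMIN at all;
 *  2. outside the initial user namespace, the filesystem type must
 *     explicitly allow user-namespace mounts.
 */
static int toy_may_mount(bool has_cap_sys_admin, bool in_init_user_ns,
			 const struct toy_fs_type *type)
{
	assert(type != NULL);
	if (!has_cap_sys_admin)
		return -EPERM;
	if (!in_init_user_ns && !(type->fs_flags & TOY_FS_USERNS_MOUNT))
		return -EPERM;
	return 0;
}
```

In this model an ext4-like type with no flag set is refused outside the initial user namespace, while a devpts-like type carrying the flag is accepted.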

>> Containers that don't use user namespaces (so no projid mapping) would
>> be able to query quotas for projid assigned to other containers
>> (unfortunately). They would be able to change the quota of other
>> containers if they are privileged enough to be given CAP_SYS_RESOURCE.
> Yes.
>
>> Containers using user namespaces would not be able to change any quota
>> config because they don't have CAP_SYS_RESOURCE in the init user
>> namespace. If they are configured with a proper projid mapping, they
>> would only be able to query the projid they are assigned (they could
>> guess which projid to query by looking at /proc/self/projid_map).
> Yes.
>
>> Do you know if someone is working on the documentation? It would be
>> nice if filesystems/quota.txt could say who can receive the quota
>> warnings on netlink (which namespace) and if it could give some
> I have added that.
>
>> information about projid. But maybe this belongs in the proc(5) and
>> user_namespaces(7) manpages as well.
> Project ID in VFS quotas is a fairly new thing. Once ext4 gains support for
> it, I can add some documentation.
>
>> Are there any suggestions on how to allocate projid in userspace?
>> Something like /etc/subprojid similar to /etc/subuid?
> I guess you need some coordination between namespaces?

Yes, I was thinking of the case where Docker uses some projids for its
containers, rkt uses other projids for its containers, and the sysadmin
also defines some projids manually.

> I only know that
> xfsprogs traditionally uses /etc/projid for name->project ID translation,
> and that /etc/projects contains the roots of the directory trees for which
> you wish to maintain directory quota, together with the project ID of each
> tree.

Thanks for the pointer.

Alban
