2014-08-08 16:39:58

by Li Xi

Subject: [PATCH v2 0/4] quota: add project quota support

Hi all,

The following patches propose an implementation of project support
for ext4. A project is an aggregate of unrelated inodes which might
be scattered across different directories. Inodes that belong to a
project share the same identifier, i.e. a 'project ID', just as every
inode has its user/group identifier. The following patches add
project quota as a supplement to the existing user/group quota types.

The project ID of an inode is inherited from its parent directory
and saved as an internal field of the ext4 inode.

This is not the first attempt to add project quota support for ext4.
The subtree quota patches posted by Dmitry Monakhov in 2012
(http://lwn.net/Articles/506064/) implemented a similar feature in a
different way. Rather than saving the project (or subtree) ID as an
internal inode field, those patches manage the ID as an extended
attribute.

We rebased both patch sets onto the same kernel version and ran
benchmarks on each to compare their performance.
It is worth noting that the patches from Lai Siyao and Niu Yawei
(quota: remove-dqptr_sem,
http://article.gmane.org/gmane.comp.file-systems.ext4/44341/)
improve the performance of quota enforcement significantly, which
can be seen clearly in the following results.

It is obvious that the extended attribute implementation has a
performance impact when creating files. That is why we chose to push
the patches which use an internal inode field to save the project ID.

Kernel: 3.16.0-rc5
Server: Dell R620 (2 x [email protected], 256GB memory)
Storage: 10 x 15K SAS disks(RAID10)
Test tool: mdtest-1.9.3. mdtest created 800K files in total. Each
thread created files in its own directory.

File Creation:
                        1thr    2thr    4thr    8thr   16thr
- vanilla
    quota disabled     66094  105781  178968  186647  172536
    quotaon(ug)        60337   99582  157396  171463  162872

- vanilla + remove-dqptr_sem patches
    quota disabled     65955  112082  185550  181511  171988
    quotaon(ug)        62391  101905  171013  190570  168914

- prjquota(xattr)
    quota disabled     61396   97580  147852  146423  164895
    quotaon(ug)        57009   93435  140589  135748  153196
    quotaon(ugP)       57500   89419  133604  125291  105127

- prjquota(xattr) + remove-dqptr_sem patches
    quota disabled     64053  100078  147608  139403  163960
    quotaon(ug)        60754  104726  149231  139053  165990
    quotaon(ugP)       59238   93606  148921  138434  163931

- prjquota(internal) + remove-dqptr_sem patches
    quota disabled     65826  111828  181486  189227  171241
    quotaon(ug)        65418  107745  173584  180562  173752
    quotaon(ugP)       64669  103890  169176  186426  172192


File Removal:
                        1thr    2thr    4thr    8thr   16thr
- vanilla
    quota disabled    118059  169825  234661  291812  345656
    quotaon(ug)       106675  135834  153532  100437   87489

- vanilla + remove-dqptr_sem patches
    quota disabled    120374  168437  236818  291754  331141
    quotaon(ug)       110709  161954  238333  293700  329015

- prjquota(xattr)
    quota disabled    116680  161662  229190  295642  332959
    quotaon(ug)       104783  134359  154950  100516   87923
    quotaon(ugP)      100240  125978  108653   68286   58991

- prjquota(xattr) + remove-dqptr_sem patches
    quota disabled    116281  168938  233733  286663  344002
    quotaon(ug)       109775  164995  236001  299389  340683
    quotaon(ugP)      113935  162979  236112  300033  356117

- prjquota(internal) + remove-dqptr_sem patches
    quota disabled    119537  171565  247418  291068  350138
    quotaon(ug)       121756  159580  240778  298012  342437
    quotaon(ugP)      118954  168022  241206  289055  334008
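
To put the file-creation figures in relative terms, the sketch below
computes the change of each project-quota implementation against the
patched vanilla kernel with quota enabled. The numbers are copied from
the File Creation table; the helper name relative_change is only an
illustration and not part of any tool used for the benchmark.

```python
# File-creation figures copied from the table above.
# Columns: 1, 2, 4, 8 and 16 threads.
THREADS = [1, 2, 4, 8, 16]

# vanilla + remove-dqptr_sem patches, quotaon(ug)
BASELINE_UG = [62391, 101905, 171013, 190570, 168914]
# prjquota(xattr) + remove-dqptr_sem patches, quotaon(ugP)
XATTR_UGP = [59238, 93606, 148921, 138434, 163931]
# prjquota(internal) + remove-dqptr_sem patches, quotaon(ugP)
INTERNAL_UGP = [64669, 103890, 169176, 186426, 172192]


def relative_change(candidate, baseline):
    """Return the per-column change of candidate vs. baseline, in percent."""
    return [100.0 * (c - b) / b for c, b in zip(candidate, baseline)]


xattr_delta = relative_change(XATTR_UGP, BASELINE_UG)
internal_delta = relative_change(INTERNAL_UGP, BASELINE_UG)

for n, x, i in zip(THREADS, xattr_delta, internal_delta):
    print(f"{n:>2} thr  xattr {x:+6.1f}%  internal {i:+6.1f}%")
```

With these figures the xattr variant falls roughly 27% below the
baseline at 8 threads, while the internal-field variant stays within
about 4% of it at every thread count.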

Changelog:
* v2 <- v1:
- Add ioctl interface for setting/getting project;
- Add EXT4_FEATURE_RO_COMPAT_PROJECT;
- Add get_projid() method in struct dquot_operations;
- Add error check of ext4_inode_projid_set/get().

v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153

Any comments or feedback are appreciated.

Regards,
- Li Xi

Li Xi(4):
quota: Adds general codes to enforces project quota limites
ext4: Adds project ID support for ext4
ext4: Adds project quota support for ext4
ext4: Adds ioctl interface support for ext4 project

Documentation/filesystems/ext4.txt | 4 +
fs/ext4/Kconfig | 11 --
fs/ext4/Makefile | 1 -
fs/ext4/ext4.h | 19 +++-
fs/ext4/ialloc.c | 16 +--
fs/ext4/inode.c | 85 +++++++++++++-
fs/ext4/ioctl.c | 100 ++++++++++++++++
fs/ext4/project.c | 224 ------------------------------------
fs/ext4/project.h | 58 ---------
fs/ext4/super.c | 45 ++++++--
fs/ext4/xattr.c | 6 -
fs/ext4/xattr.h | 2 -
fs/quota/Kconfig | 9 ++
fs/quota/dquot.c | 120 ++++++++++++++-----
fs/quota/quota.c | 5 +-
fs/quota/quotaio_v2.h | 4 +-
include/linux/fs.h | 1 -
include/linux/quota.h | 8 ++
include/uapi/linux/xattr.h | 2 -
19 files changed, 345 insertions(+), 375 deletions(-)


2014-08-08 16:58:19

by Li Xi

Subject: Re: [PATCH v2 0/4] quota: add project quota support

Sorry, please ignore the wrong patch summary at the end of my last
email. The correct one follows:

Documentation/filesystems/ext4.txt | 4 +
fs/ext4/ext4.h | 19 +++++-
fs/ext4/ialloc.c | 4 +
fs/ext4/inode.c | 80 +++++++++++++++++++++++-
fs/ext4/ioctl.c | 100 +++++++++++++++++++++++++++++
fs/ext4/super.c | 77 +++++++++++++++++++---
fs/quota/Kconfig | 9 +++
fs/quota/dquot.c | 123 ++++++++++++++++++++++++++++--------
fs/quota/quota.c | 5 +-
fs/quota/quotaio_v2.h | 6 +-
include/linux/quota.h | 9 +++
include/uapi/linux/quota.h | 6 +-
12 files changed, 398 insertions(+), 44 deletions(-)

2014-08-08 22:33:44

by Theodore Ts'o

Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, Aug 09, 2014 at 12:39:58AM +0800, Li Xi wrote:
>
> It is obvious that extended attribute implementation has performance
> impact when creating files. That is why we choose to push the patches
> which use internal inode field to save project ID.

Were you using 256-byte inodes or 128-byte inodes when you benchmarked
using xattr versus an in-inode project quota?

The other major comment I have is that, as much as possible, the
semantics should be compatible with XFS's project quota. In particular,
this bit:

A managed tree must be setup initially using the -s option to the
project command. The specified project name or identifier is matched to
one or more trees defined in /etc/projects, and these trees are then
recursively descended to mark the affected inodes as being part of that
tree. This process sets an inode flag and the project identifier on
every file in the affected tree. Once this has been done, new files
created in the tree will automatically be accounted to the tree based
on their project identifier. An attempt to create a hard link to a
file in the tree will only succeed if the project identifier matches
the project identifier for the tree. The xfs_io utility can be used to
set the project ID for an arbitrary file, but this can only be done by
a privileged user.

Note the hard link restriction. And we should check with the XFS
folks what happens if you move a file from one directory which belongs
to one project quota to another directory which has a different
project quota (or no quota whatsoever). I suspect the right answer is
that the quota gets transferred from one project to another, so that
it is a true directory-tree quota system, but regardless, if we're
going to go down this path, let's stay consistent with how XFS does
things.

Cheers,

- Ted

2014-08-09 14:25:00

by Li Xi

Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, Aug 9, 2014 at 6:33 AM, Theodore Ts'o <[email protected]> wrote:
> On Sat, Aug 09, 2014 at 12:39:58AM +0800, Li Xi wrote:
>>
>> It is obvious that extended attribute implementation has performance
>> impact when creating files. That is why we choose to push the patches
>> which use internal inode field to save project ID.
>
> Were you using 256-byte inodes or 128-byte inodes when you benchmarked
> using xattr versus an in-inode project quota?
We didn't change any options when formatting ext4, so the inode size
was the default 256 bytes when we ran the benchmarks.
>
> The other major comment I have is that as much as possible, the
> semantics should be compatible xfs's project quota. In particular,
> this bit:
>
> A managed tree must be setup initially using the -s option to the
> project command. The specified project name or identifier is matched to
> one or more trees defined in /etc/projects, and these trees are then
> recursively descended to mark the affected inodes as being part of that
> tree. This process sets an inode flag and the project identifier on
> every file in the affected tree. Once this has been done, new files
> created in the tree will automatically be accounted to the tree based
> on their project identifier. An attempt to create a hard link to a
> file in the tree will only succeed if the project identifier matches
> the project identifier for the tree. The xfs_io utility can be used to
> set the project ID for an arbitrary file, but this can only be done by
> a privileged user.
>
> Note the hard link restriction. And we should check with the XFS
> folks what happens if you move a file from one directory which belongs
> to one project quota to another directory which has a different
> project quota (or no quota whatsoever). I suspect the right answer is
> that the quota gets transferred from one project to another, so that
> it is a true directory-tree quota system, but regardless, if we're
> going to go down this path, let's stay consistent with how XFS does
> things.
I agree that compatibility is important. And of course, it would be nice if
XFS and ext4 could have the same project quota semantics. However, I am
wondering whether it is more important for ext4's project quota to be
self-consistent than to be consistent with XFS.

Given that a project identifier is naturally like a UID/GID, project IDs
should be managed in a similar way to UIDs/GIDs. And that is what people are
actually doing right now. We already have kuid_t/kgid_t/kprojid_t and a series
of similar functions for both UID/GID and project ID. I don't think it is
necessary to break this kind of self-consistency when we are trying to add
project quota support. It would be really straightforward for a new user to
understand what project ID and project quota mean if we are able to explain
that project quota has exactly the same semantics as user/group quota.

User/group management works well without enforcing unnecessary restrictions.
There seems to be no obvious reason to enforce such restrictions on projects
either. I guess the idea behind the restrictions of XFS project quota is that
inodes belonging to different projects should be under different subtrees.
(Sorry if I am wrong, because I am already confused by its limits. But, in
this sense, yeah, project quota of XFS is actually a directory-tree quota
system.) However, I am wondering whether this limit is really necessary. It
is a common use case that different users have separate home directories and
that a user's files don't scatter into other users' directories. And this use
case is enabled by a little bit of extra system management, without enforcing
any system-level limit on file operations across users. Since a project ID can
only be changed by a privileged user, usually the system administrator, there
is no difficulty in ensuring that projects with different project IDs stay in
separate directory trees.

Compared to real project quota, another problem of directory-tree quota is
the handling of file renames across directories. Again, it is caused by the
unnecessary restriction. In order to enforce that restriction, there are two
solutions. First, return an error when renaming files, i.e. do not allow
renaming across projects. Second, do a quota transfer when renaming. Like the
limit that disallows links across projects, the first solution would cause
annoying failures which easily confuse users. A simple message about the errno
just won't be enough for users to realize that they hit a project quota limit,
let alone users who don't know project quota well. And in the second solution,
quota transfer would cause significant performance degradation, given that
rename is a frequently used operation (probably much more frequent than
chmod or chgrp) and is widely considered to be atomic and thus lightweight.

Another advantage of real project quota without any unnecessary limits is that
it is more flexible and thus enables more potential use cases. For example, we
can easily combine the 'find' or 'grep' command with project quota to calculate
the total number or disk usage of inodes with common attributes or contents. I
am not sure whether it is possible for XFS to do so because of its restrictions.

The implementation of XFS project quota is a good example and point of
comparison for implementing project quota support for ext4. But I don't think
it is necessary to take everything in XFS as a standard, because XFS itself
doesn't implement everything in a standard way. For example, XFS does not use
the standard quota framework like ext4 does, and it is still struggling to map
its internal IDs to standard UID/GID/project ID
(https://lkml.org/lkml/2013/2/17/229).

In a word, for developers, real project quota needs less code, requires
less exception handling, and provides better self-consistency. And for
users, it imposes fewer limits, provides cleaner semantics and enables more
use cases. I'd suggest we choose a straightforward way to implement project
quota rather than be restrained by an existing but suboptimal design.

Regards,

-Li Xi

2014-08-09 17:24:45

by Theodore Ts'o

Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, Aug 09, 2014 at 10:24:59PM +0800, Li Xi wrote:
> Given the fact that project indentifier is naturally like UID/GID, project ID
> should be managed in the similar way like UID/GID. And that is what people are
> actually doing right now. We already have kuid_t/kgid_t/kprojid_t and a series
> of similar functions for both UIG/GID and project ID. I don't think it is
> necessary to break this kind of self-consistency when we are trying to add
> project quota support. It would be really straightfoward for a new user to
> understand what project ID and project quota means, if we are able to explain
> that project quota has exactly the same semantics with user/group quota.

There is a very big and fundamental difference about project id's
versus user/group id's, and that is that project id's are not first
class objects in the file system. That is, stat() doesn't understand
them. Processes do not belong to one (or more) projects, the way they
do for user and group id's.

As a result, project id's are a massive administration nightmare. You
can't easily see which project a file belongs to, since "ls" and
"stat" have no support for the project id. So if a file is created in
one directory, the file will inherit that project id from its parent
directory. Suppose that file is 100 gigabytes, and it chews up most
of the project quota for that project. Now suppose that file is moved
to some other directory.

How in the world is the administrator supposed to find the file which
is chewing up 100GB of quota? Find doesn't support project id's
either....

The last time I asked why in the world anyone would want to use this
feature, the only use case that I heard was people who were using
containers, and where all of the project id's were inside a
chroot. Hence, any questions I asked about what happens when a file
gets moved out from the hierarchy were hand-waved away, since inside a
chroot, it could never happen.

The question is what are the sane semantics when you don't have the
chroot restriction, and having free-range inodes with project quotas
that can be moved all over the file system seems to me to result in a
not very usable system in the end.

Regards,

- Ted

2014-08-09 22:14:53

by Dave Chinner

Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Fri, Aug 08, 2014 at 06:33:35PM -0400, Theodore Ts'o wrote:
> On Sat, Aug 09, 2014 at 12:39:58AM +0800, Li Xi wrote:
> >
> > It is obvious that extended attribute implementation has performance
> > impact when creating files. That is why we choose to push the patches
> > which use internal inode field to save project ID.
>
> Were you using 256-byte inodes or 128-byte inodes when you benchmarked
> using xattr versus an in-inode project quota?
>
> The other major comment I have is that as much as possible, the
> semantics should be compatible xfs's project quota. In particular,
> this bit:
>
> A managed tree must be setup initially using the -s option to the
> project command. The specified project name or identifier is matched to
> one or more trees defined in /etc/projects, and these trees are then
> recursively descended to mark the affected inodes as being part of that
> tree. This process sets an inode flag and the project identifier on
> every file in the affected tree. Once this has been done, new files
> created in the tree will automatically be accounted to the tree based
> on their project identifier. An attempt to create a hard link to a
> file in the tree will only succeed if the project identifier matches
> the project identifier for the tree. The xfs_io utility can be used to
> set the project ID for an arbitrary file, but this can only be done by
> a privileged user.
>
> Note the hard link restriction. And we should check with the XFS
> folks what happens if you move a file from one directory which belongs
> to one project quota to another directory which has a different
> project quota (or no quota whatsoever).

Rename to a destination with a different project quota gives EXDEV,
same as if you were trying to rename across different filesystems.
See xfs_rename().

> I suspect the right answer is
> that the quota gets transferred from one project to another, so that
> it is a true directory-tree quota system,

XFS doesn't transfer the quota from projid to projid because it's
borderline impossible to correctly track all the metadata
allocation/free operations that can happen in a rename operation and
account them to the correct quota. Hence all those corner cases are
avoided by treating it as EXDEV and forcing userspace to cp/unlink
the files rather than rename.

That's really an implementation detail.
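
The userspace side of that behaviour can be sketched as follows: try
rename(), and on EXDEV fall back to copy plus unlink. This is a minimal
Python illustration under stated assumptions, not code from XFS or from
any existing tool; the function name move_file and its injectable
rename parameter are made up for the example.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Rename src to dst, falling back to copy + unlink on EXDEV.

    `rename` is injectable only so that the fallback path can be
    exercised without two real project trees; it is an assumption of
    this sketch, not part of any existing interface.
    """
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        # Anything other than "cross-device" is a genuine error.
        if exc.errno != errno.EXDEV:
            raise
    # Cross-project (or cross-filesystem) move: copy data and metadata,
    # then remove the source. The new blocks are allocated through the
    # ordinary path, so they are charged to the destination's quota.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"
```

Python's own shutil.move performs a comparable fallback when os.rename
fails, so tools that already handle cross-filesystem moves need no
special knowledge of project quota.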

> but regardless, if we're
> going to go down this path, let's stay consistent with how XFS does
> things.

Agreed. I also think they should use the same userspace quota
interface, so you can use xfs_quota to manage project quotas on both
ext4 and XFS. That way all the xfstests that test project quota
behaviour will work on ext4 without modification, and tests written
for ext4 should also work on XFS.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-08-09 22:17:19

by Theodore Ts'o

Subject: Re: [PATCH v2 0/4] quota: add project quota support

An additional philosophical question. If your argument is that you
want project quotas to be as fully general and to work like group
quotas --- then this brings up a fundamental question --- why can't
you just use group quotas?

What is the use case where you need to have two different quotas that
work exactly like group quotas? And following in the general design
rule of "Zero, one, or infinity: there is no two", for whatever use
case where you might argue that you need _two_ quotas with identical
semantics as group quotas, who is to say that there won't be someone
that comes up with some other use case where you need _three_ quotas
with identical semantics as group quotas. Or _four_ group quotas
being tracked simultaneously. Etc, etc., etc.

The advantage of doing the directory hierarchy based quota system is
not just that it's compatible with XFS; it is that it is *different*
from group quotas. Not more restrictive, but *different*. There will
certainly be scenarios where someone wants to enforce a restriction on
the size or number of inodes in a directory hierarchy, and where when
you move a file out of a directory hierarchy into another one, you
*want* the usage quota to be transferred from the source to the
destination hierarchy.

It may not be what *you* want, but let me ask you this --- why is it
that you can't use the group quota system, and need to invent an
entirely new project quota? The only excuse I've heard is for people
who are doing container virtualization. I don't know if that's your
reason, but let's examine that use case in detail. The reason why the
container virtualization folks want project quotas is because they
want to have quotas imposed on a portion of the directory hierarchy
that is given to a customer to use in a chrooted container-style "VM".
And since the user is going to be using their own user and group id's,
virtualized using the user and group namespaces, they need a third
dimension, called project id's.

That's all very fine and good, but if you make it fully general, where
support for it is in ls, find, a new "chproj" command, etc., it starts
becoming an attractive nuisance which either systemd or GNOME might
start using for their own nefarious purposes. And once they start
using that, and it's incorporated into a Fedora release, now someone
who wants to run Fedora inside a container, and use project id's for a
quota system for a container, will collide with the use of project
id's for Fedora! Oops.

And this is where the "zero, one, or infinity" rule comes into play.
You can either keep project quotas very tightly constrained for a
single use case --- namely, virtualization for containers, in which
case what you want really *is* based on directory hierarchies --- or
you make it be something fully general, where these different quota
types are stored as extended attributes, so you can have multiple
different namespaces --- one for the Parallels container group name,
for the container quota system; another one for the GNOME use of the
"project quota", and so instead of having a single "project quota"
inode, let that reserved inode be used for a directory, so you can
have multiple "quota inodes" for the different dimensions of quota
usage.

Personally, I think this latter approach is way too complicated, and
I'd much rather implement a single directory hierarchy based quota
system which is compatible with XFS and has XFS's semantics. But at
least this second approach is *fully* general, if you are going to
argue for a more general solution.

Regards,

- Ted

2014-08-09 23:38:37

by Dave Chinner

Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, Aug 09, 2014 at 06:17:10PM -0400, Theodore Ts'o wrote:
> An additional philosophical question. If your argument is that you
> want project quotas to be as fully general and to work like group
> quotas --- then this brings up a fundamental question --- why can't
> you just use group quotas?
>
> What is the use case where you need to have two different quotas that
> work exactly like group quotas? And following in the general design
> rule of "Zero, one, or infinity: there is no two", for whatever use
> case where you might argue that you need _two_ quotas with identical
> semantics as group quotas, who is to say that there won't be someone
> that comes up with some other use case where you need _three_ quotas
> with identical semantics as group quotas. Or _four_ group quotas
> being tracked simultaneously. Etc, etc., etc.
>
> The advantage of doing the directory hierarcy based quota system is
> not just that it's compatible with XFS; it is that it is *different*
> from group quotas. Not more restrictive, but *different*. There will
> certainly be scenarios where someone wants to enforce a restriction on
> the size or number of inodes in a directory hierarcy, and where when
> you move a file out of a directory hierarcy into another one, you
> *want* the usage quota to be transfered from the source to the
> destination hierarcy.
>
> It may not be what *you* want, but let me ask you this --- why is it
> that you can't use the group quota system, and need to invent an
> entirely new project quota? The only excuse I've heard is for people
> who are doing container virtualization.

Step into the enterprise or the HPC world where you are managing
thousands of users spread across departmental/research groups and
undertaking a few tens of distinct projects at the same time.

Users have space limits, departments are billed for their users'
space usage, and project space usage needs to be accounted (and
maybe limited) to ensure the shared storage doesn't run out of space
inappropriately.

I've seen this sort of thing quite a bit over the past 10 years.
Most of the time on storage systems measured in the high tens to
hundreds of TB of storage, which puts it way out of the scope of
knowledge of most Linux distro and application developers. That's
most likely why you don't get any other answer to your questions -
most people can't see how project quotas get used because they've
never worked in a large, multi-project environment before.

> Personally, I think this latter approach is way too complicated, and
> I'd much rather implement a single directory hierarcy based quota
> system which is compatible with XFS and has XFS's semantics. But at
> least this second approach is *fully* general, if you are going to
> argue for a more general solution.

AFAICT, the 90% solution is "compatible with XFS" solution. It's
also the simplest and lowest cost, given that you should be able
to do it with a few hundred lines of kernel code. Userspace doesn't
need immediate work, because you can use the XFS tools initially
and hence all the xfstests validation. Don't be different just
because of NIH syndrome....

If we need a more *complex* solution because people need more than
just what the simple solution gives them, then that is a topic for
-fsdevel and probably LSFMM because there's all sorts of semantic
and interface discussions that are needed and a lot more code that
needs to be written. i.e. the simple solution can be deployed within
a couple of kernel releases, a generic solution is more likely a
couple of *years* of work to deploy...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-08-10 00:09:51

by Theodore Ts'o

Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sun, Aug 10, 2014 at 09:38:32AM +1000, Dave Chinner wrote:
>
> I've seen this sort of thing quite a bit over the past 10 years.
> Most of the time on storage systems measured in the high tens to
> hundreds of TB of storage, which puts it way out of the scope of
> knowledge of most Linux distro and application developers. That's
> most likely why you don't get any other answer to your questions -
> most people can't see how project quotas get used because they've
> never worked in a large, multi-project environment before.

Sure, but the people who are advocating for project quotas had better
understand how they plan to use them, both so that (a) if they do a
design different from XFS, they can justify why the differences are
necessary, and (b) to justify whether we need it in ext4 to begin
with.

The "directory hierarchy quota" is easy to understand; it's something
that the Andrew File System had --- down to the restriction that you
can't move a file between different AFS volumes, but instead have to
copy and unlink.

> If we need a more *complex* solution because people need more than
> just what the simple solution gives them, then that is a topic for
> -fsdevel and probably LSFMM because there's all sorts of semantic
> and interface discussions that are needed and a lot more code that
> needs to be written. i.e. the simple solution can be deployed within
> a couple of kernel releases, a generic solution is more likely a
> coupleof *years* of work to deploy...

100% agreed. And I have yet to see a compelling case that even the
simple form of project id's would get a lot of use in the ext4 world.
Which is why I want to know from those who want to add project quotas
to ext4: how do you plan to use them? What's the use case
scenario?

- Ted

2014-08-10 00:38:19

by Li Xi

Subject: Re: [PATCH v2 0/4] quota: add project quota support

> There is a very big and fundamental difference about project id's
> versus user/group id's, and that is that project id's are not first
> class objects in the file system. That is, stat() doesn't understand
> them. Processes do not belong to one (or more) projects, the way they
> do for user and group id's.
Yeah, I totally agree that project ID is different from UID/GID in this
sense. And I think that happens to be exactly why we need project quota
rather than the existing group quota.
>
> As a result, project id's are a massive administration nightmare. You
> can't easily see which project a file belongs to, since "ls" and
> "stat" has no support for the project id. So if a file is created in
> one directory, the file will inherent that project id from its parent
> directory. Suppose that file is 100 gigabytes, and it chews up most
> of the project quota for that project. Now suppose that file is moved
> to some other directory.
>
> How in the world is the administrator supposed to find the file which
> is chewing up 100GB of quota? Find doesn't support project id's
> either....
That is the same problem as when we scatter inodes with different
UIDs/GIDs across directory trees. I don't think there is any difficulty in
writing a script to locate the files with a specific project ID. It
might be slightly slower than native find, but I don't think there is
any fundamental difference.
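
A minimal sketch of such a script is shown below, assuming some way of
reading a file's project ID exists. The get_projid argument is a
caller-supplied placeholder for whatever interface the filesystem ends
up exposing (for instance the ioctl proposed in this series); both
function names are hypothetical.

```python
import os


def find_by_project(root, wanted_projid, get_projid):
    """Yield (path, size_in_bytes) for files under root in one project.

    get_projid maps a path to its project ID. It is a placeholder
    supplied by the caller, not an existing library function.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if get_projid(path) == wanted_projid:
                    yield path, os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue


def largest_files(root, wanted_projid, get_projid, limit=10):
    """Return up to `limit` matching files, largest first."""
    hits = sorted(find_by_project(root, wanted_projid, get_projid),
                  key=lambda item: item[1], reverse=True)
    return hits[:limit]
```

Like 'du', this walks the whole tree, so its cost grows with the number
of files rather than with the size of the project.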
>
> The last time I asked why in the world anyone would want to use this
> feature, the only use case that I heard was people who were using
> containers, and where the all of the project id's were inside a
> chroot. Hence, any questions I asked about what happens when a file
> gets moved out from the hierarchy were hand-waved away, since inside a
> chroot, it could never happen.
I think administrators in HPC or datacenters would like this feature in a
flexible form. Customers have always asked me 'how can I know
the space/inodes used by directories shared by multiple users/groups?'
'du' doesn't help much because nobody wants to wait such a long time if
disk space is at PB level. In this use case of project quota, I think
it is common for users to move files into or out of directory trees.

Regards,
-Li Xi

2014-08-10 02:15:36

by Li Xi

Subject: Re: [PATCH v2 0/4] quota: add project quota support

> An additional philosophical question. If your argument is that you
> want project quotas to be as fully general and to work like group
> quotas --- then this brings up a fundamental question --- why can't
> you just use group quotas?
I think the reason that people want another quota type is that the UID/GID
is used in privilege management. And it is always troublesome when things
are related to privilege, because no administrator wants to be blamed for
file access by unauthorized users. UIDs/GIDs usually have a fixed
mapping to customers in reality. In this sense, I don't think
administrators want to (or can) change user/group settings frequently.
And that is why we need an extra quota type. It looks like group quota,
but it is more flexible and safer to manage.
>
> What is the use case where you need to have two different quotas that
> work exactly like group quotas? And following in the general design
> rule of "Zero, one, or infinity: there is no two", for whatever use
> case where you might argue that you need _two_ quotas with identical
> semantics as group quotas, who is to say that there won't be someone
> that comes up with some other use case where you need _three_ quotas
> with identical semantics as group quotas. Or _four_ group quotas
> being tracked simultaneously. Etc, etc., etc.
"Zero, one, or infinity: there is no two" looks like a beautiful philosophy.
I totally agree with it. However, I think general project quota happens to
be the one, not two. If it is an acceptable conclusion that administrators
can't change global group settings and group attributes of inodes frequently,
project quota becomes the first flexible and fully controlled quota type,
which administrators can use freely for space accounting and limit
enforcement. Unfortunately, this feature has been missing for such a long
time, and the need for it has grown to the point that a lot of
distributed file systems have found their own ways of providing similar
features, e.g. directory/volume/file-set/project quotas. These features look
extremely familiar, yet aim at different use cases and have different
restrictions. I prefer a version without any unnecessary restrictions because
it retains all possibilities.
>
> The advantage of doing the directory hierarcy based quota system is
> not just that it's compatible with XFS; it is that it is *different*
> from group quotas. Not more restrictive, but *different*. There will
> certainly be scenarios where someone wants to enforce a restriction on
> the size or number of inodes in a directory hierarcy, and where when
> you move a file out of a directory hierarcy into another one, you
> *want* the usage quota to be transfered from the source to the
> destination hierarcy.
I agree that project quota and group quota are different. And I agree that
enforcing directory-tree quota is a very important use case,
probably the most important one. What I am suggesting is that, with an
unrestricted general project quota, we can enable other potential use cases
without harming this use case at all. For example, if we want to move
a file out of one directory hierarchy into another and want the usage quota
to be transferred, why can't we run a 'setproject' command after the
'rename/mv' command? In this use case, the operation is usually done by an
administrator. And I guess it can be safely assumed that an administrator
is trained well enough to know what should be done when managing
project-related directories.
> It may not be what *you* want, but let me ask you this --- why is it
> that you can't use the group quota system, and need to invent an
> entirely new project quota? The only excuse I've heard is for people
> who are doing container virtualization. I don't know if that's your
> reason, but let's examine that use case in detail. The reason why the
> container virtualization folks want project quotas is because they
> want to have quotas imposed on a portion of the directory hiearchy
> that is given to a customer to use in a chrooted container-style "VM".
> And since the user is going to be using their own user and group id's,
> virtualized using the user and group namespaces, they need a third
> dimension, called project id's.
> That's all very fine and good, but if you make it fully general, where
> support for it is in ls, find, a new "chproj" command, etc., it start
> becoming an attractive nuisance which either systemd or GNOME might
> start using for their own nefarious purposes. And once they start
> using that, and it's incorporated into a Fedora release, now someone
> who wants to run Fedora inside a container, and use project id's for a
> quota system for a container, will collide with the use of project
> id's for Fedora! Oops.
>
> And this is where the "zero, one, or infinity" rule comes into play.
> You can either keep project quotas very tightly constrained for a
> single use case --- namely, virtualization for containers, in which
Yeah, that is truly a problem. However, I don't think it is possible
for a feature to limit how it is used. Even if we put restrictions on
the project quota feature, the project ID can still be messed up by
multiple selfish applications anyway.
> case what you want really *is* based on directory hierarcies --- or
> you make it be something fully general, where these different quota
> types are stored as extended attributes, so you can have multiple
> different namespaces --- one for the Parallel's container group name,
> for the container quota system; another one for the GNOME use of the
> "project quota", and so instead of having a single "project quota"
> inode, let that reserved inode be used for a directory, so you can
> have multiple "quota inodes" for the different dimensions of quota
> usage.
Again, without project quota, administrators do not have any similar tool.
Either they get project quota, or nothing. So our concern now is
to provide project quota as the first 'one'. If it turns out in the future
that project quota is not enough any more, we will definitely need to
provide 'infinity'. :)

Regards,
-Li Xi

2014-08-10 08:38:25

by Shuichi Ihara

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

Hello,

One of the reasons we (Li Xi and ourselves) need project quota support in ext4 is to provide project quota support in the Lustre filesystem.
Lustre has mainly been run as a high-performance parallel filesystem providing scratch space for the HPC system of a single organization, site, or group. For that, organization/site-level UID/GID quota might be enough.

However, recently, multiple sites, groups, or organizations have been using the same Lustre filesystem not only for scratch space but also for a variety of other use cases.
On the other hand, filesystems are getting larger and larger, and people (both users and administrators) prefer a single namespace over many small filesystems, for performance and for simpler administration. (It also helps reduce the cost of copying from filesystem to filesystem.)

Therefore, even under the same GID, users have different projects/purposes on the filesystem and are involved in several of these projects. Also, budgets are sometimes allocated per project or task (or across multiple projects and tasks). If a project has a large budget, the administrator can allocate a lot of storage resources to it; a smaller budget means a smaller storage allocation, for fair cost/resource management. In this use case, storage management with GID-based quota is very hard.

Although Lustre is a distributed filesystem, it is built on aggregated local filesystems, mainly ext4 today. Lustre has a cluster-wide quota mechanism, but it relies on the backend local filesystem's quota. An additional "id" for a different type of quota accounting (project) is now required in ext4 to cover this new type of quota management, which can't easily be handled with UID/GID today.

Regards,
Ihara

On Aug 10, 2014, at 7:17 AM, Theodore Ts'o <[email protected]> wrote:

> An additional philosophical question. If your argument is that you
> want project quotas to be as fully general and to work like group
> quotas --- then this brings up a fundamental question --- why can't
> you just use group quotas?
>
> What is the use case where you need to have two different quotas that
> work exactly like group quotas? And following in the general design
> rule of "Zero, one, or infinity: there is no two", for whatever use
> case where you might argue that you need _two_ quotas with identical
> semantics as group quotas, who is to say that there won't be someone
> that comes up with some other use case where you need _three_ quotas
> with identical semantics as group quotas. Or _four_ group quotas
> being tracked simultaneously. Etc, etc., etc.
>
> The advantage of doing the directory hierarcy based quota system is
> not just that it's compatible with XFS; it is that it is *different*
> from group quotas. Not more restrictive, but *different*. There will
> certainly be scenarios where someone wants to enforce a restriction on
> the size or number of inodes in a directory hierarcy, and where when
> you move a file out of a directory hierarcy into another one, you
> *want* the usage quota to be transfered from the source to the
> destination hierarcy.
>
> It may not be what *you* want, but let me ask you this --- why is it
> that you can't use the group quota system, and need to invent an
> entirely new project quota? The only excuse I've heard is for people
> who are doing container virtualization. I don't know if that's your
> reason, but let's examine that use case in detail. The reason why the
> container virtualization folks want project quotas is because they
> want to have quotas imposed on a portion of the directory hiearchy
> that is given to a customer to use in a chrooted container-style "VM".
> And since the user is going to be using their own user and group id's,
> virtualized using the user and group namespaces, they need a third
> dimension, called project id's.
>
> That's all very fine and good, but if you make it fully general, where
> support for it is in ls, find, a new "chproj" command, etc., it start
> becoming an attractive nuisance which either systemd or GNOME might
> start using for their own nefarious purposes. And once they start
> using that, and it's incorporated into a Fedora release, now someone
> who wants to run Fedora inside a container, and use project id's for a
> quota system for a container, will collide with the use of project
> id's for Fedora! Oops.
>
> And this is where the "zero, one, or infinity" rule comes into play.
> You can either keep project quotas very tightly constrained for a
> single use case --- namely, virtualization for containers, in which
> case what you want really *is* based on directory hierarcies --- or
> you make it be something fully general, where these different quota
> types are stored as extended attributes, so you can have multiple
> different namespaces --- one for the Parallel's container group name,
> for the container quota system; another one for the GNOME use of the
> "project quota", and so instead of having a single "project quota"
> inode, let that reserved inode be used for a directory, so you can
> have multiple "quota inodes" for the different dimensions of quota
> usage.
>
> Personally, I think this latter approach is way too complicated, and
> I'd much rather implement a single directory hierarcy based quota
> system which is compatible with XFS and has XFS's semantics. But at
> least this second approach is *fully* general, if you are going to
> argue for a more general solution.
>
> Regards,
>
> - Ted


2014-08-10 16:52:53

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sun, Aug 10, 2014 at 08:38:25AM +0000, Shuichi Ihara wrote:
>
> One of our (Li Xi and ourself) purpose of what we need project quota
> support in ext4, is project quota support in the Lustre filesystem.

OK, but for Lustre, you completely bypass the VFS when you write the
back end files. Yes? So if we implement something which is XFS
compatible vis-a-vis a directory tree quota, it doesn't matter if
Lustre is creating many different files that belong to project id's.


This being said, for this particular use case, I'm not entirely sure
why you can't just create separate groups for each project, and then
let group inheritance take care of things:

	mkdir top-level
	chgrp project1 top-level
	chmod g+s top-level

Now all of the files created in top-level will be accounted in
project1's quota.
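
As a rough illustration of the inheritance step above, the following Python sketch creates a setgid directory and checks that a file created inside it ends up with the directory's group. It is only a sketch: it skips the chgrp step (which needs membership in the target group), so the directory keeps whatever group it was created with, and the helper names are made up for this example.

```python
import os
import stat
import tempfile

def make_setgid_dir(parent, name):
    """Create a directory and set the setgid bit on it, like `chmod g+s`."""
    path = os.path.join(parent, name)
    os.mkdir(path)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_ISGID)
    return path

def group_is_inherited(directory):
    """Create a file in `directory`; report whether it got the directory's GID."""
    child = os.path.join(directory, "data")
    with open(child, "w") as f:
        f.write("x")
    return os.stat(child).st_gid == os.stat(directory).st_gid

with tempfile.TemporaryDirectory() as tmp:
    top = make_setgid_dir(tmp, "top-level")
    inherited = group_is_inherited(top)
    print("file inherited directory group:", inherited)
```

With group quota enabled, every block of such a file would be charged to the directory's group, which is the accounting effect described above.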

If the answer is that it's too easy to evade quota controls by using
the "chgrp" command, note that if you are going to allow users to mv
files around, they can easily evade the project quota anyway, by
creating the file in the top-level directory of project2, and then mv'ing
it into the top-level directory of project1.

Or are you saying you really need to track quota from a group
perspective and from a project perspective at the same time?
If so, why?

Regards,

- Ted

2014-08-10 20:47:27

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support


On Sat, 2014-08-09 at 13:24 -0400, Theodore Ts'o wrote:
> The last time I asked why in the world anyone would want to use this
> feature, the only use case that I heard was people who were using
> containers, and where the all of the project id's were inside a
> chroot. Hence, any questions I asked about what happens when a file
> gets moved out from the hierarchy were hand-waved away, since inside a
> chroot, it could never happen.

Actually, I don't believe that's entirely accurate. The performance
problem with shared filesystem roots for containers has meant OpenVZ has
been using a block root for a while. However, we still support the old
shared filesystem root, but for quotas within the chroot, we use a subtree
quota system (not a project quota) for which Dmitry Monakhov
posted the patches several times a couple of years ago.

James



2014-08-10 21:49:43

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sun, Aug 10, 2014 at 01:47:24PM -0700, James Bottomley wrote:
>
> Actually, I don't believe that's entirely accurate. The performance
> problem with shared filesystem roots for containers has meant OpenVZ has
> been using a block root for a while. However, we still support the old
> shared filesystem root, but for quota's within the chroot, we use a subtree
> quota system (not a project quota) for which Dmitry Monakhov
> posted the patches several times a couple of years ago.

The XFS-compatible project quota is effectively a subtree quota
system. My argument is that if we're going to try to get something
like this upstream, it should have the same properties as the XFS
project quota system; and that should be semantically compatible with
the patches you are using.

(If we end up using the same ioctl's as xfs_quota uses, which in
theory I'm in favor of, but which I haven't studied yet, then it might
not be ABI compatible with Dmitry's patches, but it should simplify
the patches that OpenVZ would need to carry.)

- Ted

2014-08-10 22:18:25

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, Aug 09, 2014 at 08:09:51PM -0400, Theodore Ts'o wrote:
> On Sun, Aug 10, 2014 at 09:38:32AM +1000, Dave Chinner wrote:
> >
> > I've seen this sort of thing quite a bit over the past 10 years.
> > Most of the time on storage systems measured in the high tens to
> > hundreds of TB of storage, which puts it way out of the scope of
> > knowledge of most Linux distro and application developers. That's
> > most likely why you don't get any other answer to your questions -
> > most people can't see how project quotas get used because they've
> > never worked in a large, multi-project environment before.
>
> Sure, but the people who are advocating for project quotas had better
> understand how they plan to use them, both so that (a) if they do a
> design different from XFS, they can justify why the differences are
> necessary, and (b) to justify whether we need it in ext4 to begin
> with.
>
> The "directory hierarchy quota" is easy to understand, it's something
> that the Andrew File System had --- down to restriction that you can't
> move a file between different AFS volumes, but instead have to copy
> and unlink.

*nod* - those are the semantics that the EXDEV error from a cross-project
rename in the XFS rename code gives you. Looking back at some of the comments
in the thread, I suspect that the behaviour this triggers in
userspace utilities isn't clear: separately managed directory
hierarchies are designed to appear to userspace as separate
filesystems from an accounting and behavioural POV.

Note, also, that this means running df on an XFS filesystem with a
path inside a directory tree quota hierarchy will report the space
used by the directory tree quota, not the overall filesystem...
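
For readers unfamiliar with where df gets its numbers: it derives them from the statfs/statvfs call on the given path. The Python sketch below (the function name is made up for illustration) computes the same kind of report via os.statvfs. On an ordinary filesystem it describes the whole filesystem; the point above is that, for a path inside an XFS directory tree quota, the same call would describe the project instead.

```python
import os

def space_report(path):
    """Return total/free/used bytes for `path`, as reported by statvfs()."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize   # size of the reporting domain
    free = st.f_bfree * st.f_frsize     # free space in that domain
    return {"total_bytes": total, "free_bytes": free, "used_bytes": total - free}

report = space_report(".")
print(report)
```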

> > If we need a more *complex* solution because people need more than
> > just what the simple solution gives them, then that is a topic for
> > -fsdevel and probably LSFMM because there's all sorts of semantic
> > and interface discussions that are needed and a lot more code that
> > needs to be written. i.e. the simple solution can be deployed within
> > a couple of kernel releases, a generic solution is more likely a
> > coupleof *years* of work to deploy...
>
> 100% agreed. And I have yet to see a compelling case that even the
> simple form of project id's would get a lot of use in the ext4 world.
> Which is why I want to know from those who want to add project quotas
> in to ext4. How do you plan to use them? What's the use case
> scenario?

That's fair enough, though I think you'll find that the plain
project quota will find many different uses that filesystem
developers will never have thought of if it is there. e.g. I came
across an embedded NAS device a few years ago that was implemented with a
centralised object store but had per-export space usage accounting
and enforcement, done by assigning every object associated with a specific
exported volume the same project quota....

Fundamentally, project quotas provide a quota mechanism that is
independent of both the filesystem hierarchy and the owner
credentials of the file. That means it can be used for all sorts of
things traditional u/g quotas cannot be used for and directory tree
quota is just one of them. As fs developers, we usually talk about
directory tree quota because it's the only project quota use case
that I've come across that needs kernel/fs help to implement sanely.
;)

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-08-11 00:06:42

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

> This being said, for this particular use case, I'm not entirely sure
> why you can't just create separate groups for each project, and then
> let group inheritance take care of things:
>
> mkdir top-level
> chgrp project1 top-level
> chmod g+s top-level
>
> Now all of the files created in top-level will be accounted in
> project1's quota.
>
> If the answer is that it's too easy to evade quota controls by using
> the "chgrp" command, note that if you are going to allow users to mv
> files around, they can easily evade the project quota anyway, by
> creating the file in top-level dirctory of project2, and then mv'ing
> it into the top-level directory of project1.
Yeah, we don't want common users to change the project ID of their
files, so setting the project ID is only allowed for the administrator
in this implementation. And since the project ID of an inode won't change
when it is renamed, common users have no way to evade
project quota.

Regards,
-Li Xi

2014-08-11 00:19:46

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

> That's fair enough, though I think you'll find that the plain
> project quota will find many different uses that filesystem
> developers will have never thought of if it is there. e.g. I came
> across an embedded NAS device a few years ago implemented with a
> centralised object stores but had per-export space usage accounting
> and enforcement by assigning every object associated with a specific
> exported volume the same project quota....
100% agreed. I can imagine that a lot of users will find that project quota
meets their space management requirements. In this sense, general project
quota is like an extended attribute compared with an internal attribute. For
unknown use cases in the future, I'd suggest keeping everything flexible. We
don't really want to hear from customers that project quota looks attractive
but needs to be hacked to be usable for them.

Regards,
Li Xi

2014-08-11 10:23:54

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

>
> Or are you really saying you really need to simultaneously track quota
> from a group perspective, and a project perspectively, at the same
> time? If so, why?
One reason that we don't want to use group quota for the project use case is
that we don't want to lose track of the usage of the group that the user
belongs to. Let's assume GROUP1 has only two users, USER1 and USER2. Normally,
we query the disk usage of GROUP1 with 'quota -g GROUP1 -v'. The disk
usage should be the sum of 'quota -u USER1 -v' and 'quota -u USER2 -v', if
we never do any unusual chgrp/chown. But if we change the GID of files
created by USER1 to PROJECT1, there will be a mismatch between
'quota -g GROUP1 -v' and ('quota -u USER1 -v' + 'quota -u USER2 -v').
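
The arithmetic can be spelled out with a small Python sketch. The block counts are invented for illustration, and the per-owner sums stand in for what the 'quota' commands above would report.

```python
from collections import defaultdict

def usage_by(files, key):
    """Sum block usage per owner, keyed by 'uid' or 'gid'."""
    totals = defaultdict(int)
    for f in files:
        totals[f[key]] += f["blocks"]
    return dict(totals)

# Hypothetical files: GROUP1 contains exactly USER1 and USER2.
files = [
    {"uid": "USER1", "gid": "GROUP1", "blocks": 100},
    {"uid": "USER1", "gid": "GROUP1", "blocks": 50},
    {"uid": "USER2", "gid": "GROUP1", "blocks": 30},
]

before_users = usage_by(files, "uid")
before_groups = usage_by(files, "gid")

# Re-purpose the GID as a project tag for one of USER1's files.
files[0]["gid"] = "PROJECT1"

after_users = usage_by(files, "uid")
after_groups = usage_by(files, "gid")

print(before_groups["GROUP1"], sum(before_users.values()))  # group total equals user total
print(after_groups["GROUP1"], sum(after_users.values()))    # group total no longer matches
```

After the chgrp, the users still own 180 blocks between them, but GROUP1 is only charged for 80 of them.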

As mentioned by Ihara, project quota support for ext4 is an important
part of project quota support for Lustre. I'd like to explain a little
bit about the expected usage of project quota for Lustre.

As a distributed file system, Lustre is able to use hundreds of separate
ext4 file systems to store its data as well as metadata, yet provides a
unified global name space. Some users have started to use SSD devices for
better performance on Lustre. However, as we can expect, they might want to
replace only part of the drives with SSDs, since SSDs are expensive. That
means some of the ext4 file systems are on SSDs and the others are on hard
disks. From Lustre's point of view, users can choose to
locate files on SSDs or hard disks using features of Lustre, namely 'stripe'
and 'OST pool'. Here comes the problem: how do we limit the usage of SSD
space, since all end users badly want good performance?

The quota system is designed to allot this kind of limited resource. However,
unfortunately, the existing UID/GID based quotas won't help in this case.
UID/GID based quotas work well for allotting one fixed kind of resource, i.e.
global space and inode usage. But when the resource itself has to be
divided into separate parts, either for management reasons (e.g.
virtualization) or physical reasons (e.g. SSD vs. hard disk vs. tape),
UID/GID based quota does not help, simply because UID/GID is not a suitable
way to distinguish resources. That is why we need another dimension of quota.

Of course, we might be able to find some workarounds using group quota.
However, because the owners of files can change the group attributes
freely, it is easy for users to evade the group quota and steal the
scarce resources. For example, in order to steal SSD space, a user can just
create the files using the specific group ID and then change it back.
And administrators can never expect users to cooperate on this. Users always
have enough excuses to ignore requests from administrators to delete
unnecessary data on a shared file system if there are no hard quota
limits on that system. In the current implementation of project quota, the
project ID can only be changed by privileged users, so that won't be a
problem for it.

Thanks,
Li Xi

2014-08-11 10:49:50

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sun 10-08-14 10:15:36, Li Xi wrote:
> > An additional philosophical question. If your argument is that you
> > want project quotas to be as fully general and to work like group
> > quotas --- then this brings up a fundamental question --- why can't
> > you just use group quotas?
> I think the reason that people want another quota type is that the UID/GID
> is used in privilege management. And it is always troublesome when things
> are related to privilege, because no administrator want to be blamed for
> file access from unauthorized users. UID/GID usually has a determined
> mapping rule to customers in reality. In this sense, I don't think
> administrators want to (or can) change user/group settings frequently.
> And that is why we need an extra quota type. It looks like group quota,
> but it is flexible and safer for management.
Well, but you can easily set up the system in such a way that GID (outside
of several system groups) has no privilege implications (you can use ACLs
if you need more users to access files) and use group quotas if you really
need the full flexibility of an additional file ID... So I'm not sure I buy
this fear of security implications.

> > What is the use case where you need to have two different quotas that
> > work exactly like group quotas? And following in the general design
> > rule of "Zero, one, or infinity: there is no two", for whatever use
> > case where you might argue that you need _two_ quotas with identical
> > semantics as group quotas, who is to say that there won't be someone
> > that comes up with some other use case where you need _three_ quotas
> > with identical semantics as group quotas. Or _four_ group quotas
> > being tracked simultaneously. Etc, etc., etc.
> "Zero, one, or infinity: there is no two" looks like a beautiful philosophy.
> I totally agree on that. However, I think general project quota happens to
> be the one, not two. If it is an acceptable conclusion that administrator
> can't change global group settings and group attributes of inodes frequently,
But why would an administrator have to change group settings or attributes
frequently?

> project quota become the first flexible and fully controled quota type,
> which administrators can use for freewill space accountment and limit
> enforcement. Unfortunately, this feature has been missing for such a long
> time and the requirement for it grows to a critical point that a lot of
> distributed file systems find their own way of providing similar features,
> e.g. directory/volume/file-set/project quotas. These features looks
> extremely familar, yet aim at different use cases and havs different
> restraints. I prefer a version without any unnecessary restraints becasue
> it retains all possibility.
I agree an additional ID for quota purposes is useful. The question is
whether we leave it as 'just another ID attached to a file' or constrain it
in some way.

> > The advantage of doing the directory hierarcy based quota system is
> > not just that it's compatible with XFS; it is that it is *different*
> > from group quotas. Not more restrictive, but *different*. There will
> > certainly be scenarios where someone wants to enforce a restriction on
> > the size or number of inodes in a directory hierarcy, and where when
> > you move a file out of a directory hierarcy into another one, you
> > *want* the usage quota to be transfered from the source to the
> > destination hierarcy.
> I agree that project quota and group quota is different. And I gree that
> the use case of enforcing directory-tree quota is a very important use case,
> probably the most important use case. What I am suggesting is that, with an
> unlimited general project quota, we can enable other potential use cases
> without harming this use case at all. For example, if we want to move
> a file out of a directory hierarcy into another and want the usage quota
> to be transfered, why can't we add a 'setproject' command following with
> 'rename/mv' command? In this use case, this operation is usually done by
> administrator. And I guess it can be safely assumed that a administrator
> is well trained to know what should be done when managing project related
> directories.
So I actually don't think project quota as implemented by XFS (with which
Ted and I want to stay compatible) is different from what you
want. Let me explain what XFS does:
1) Each file has an additional ID - the project ID
2) Each dir can have XFS_DIFLAG_PROJINHERIT flag set. When this flag is
   set, all files and directories created in the directory inherit project
   ID, directories also inherit XFS_DIFLAG_PROJINHERIT - this is
   equivalent of sgid bit on directories for gids.
3) When you hard-link a file into directory with XFS_DIFLAG_PROJINHERIT
   set, the file already has to have the same project ID as the directory
   you are linking into.
4) When you rename a file into a directory with XFS_DIFLAG_PROJINHERIT
   set, the file already has to have the same project ID as the directory
   you are renaming into.
5) If you call statfs() on a directory with XFS_DIFLAG_PROJINHERIT set
   and project quota is being enforced, statfs() will return free/used
   blocks of the corresponding project instead of number of free/used
   blocks in the filesystem.
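
Rules 1) to 4) can be modelled in a few lines of Python. This is a toy model written for illustration, not XFS code: the class and function names are made up, and giving project ID 0 to files created in a directory without the flag is an assumption of the sketch.

```python
import errno

class Inode:
    """A toy inode carrying only the fields the rules above talk about."""
    def __init__(self, name, projid=0, is_dir=False, projinherit=False):
        self.name = name
        self.projid = projid            # rule 1: every inode has a project ID
        self.is_dir = is_dir
        self.projinherit = projinherit  # stands in for XFS_DIFLAG_PROJINHERIT

def create(parent, name, is_dir=False):
    """Rule 2: children inherit the ID; child directories also inherit the flag."""
    if parent.projinherit:
        return Inode(name, parent.projid, is_dir, projinherit=is_dir)
    return Inode(name, 0, is_dir)       # assumption: default project ID is 0

def check_link_or_rename(inode, target_dir):
    """Rules 3 and 4: a flagged target directory only accepts matching project IDs."""
    if target_dir.projinherit and inode.projid != target_dir.projid:
        raise OSError(errno.EXDEV, "project ID differs from target directory")

mail = Inode("mail", projid=1, is_dir=True, projinherit=True)
maps = Inode("maps", projid=2, is_dir=True, projinherit=True)
plain = Inode("scratch", is_dir=True)   # flag never set: fully generic ID

huge = create(mail, "huge_file")
subdir = create(mail, "archive", is_dir=True)

try:
    check_link_or_rename(huge, maps)
    cross_project = "allowed"
except OSError as e:
    cross_project = errno.errorcode[e.errno]

check_link_or_rename(huge, plain)       # no flag on the target: no restriction
print(huge.projid, subdir.projinherit, cross_project)
```

The last call shows the escape hatch: a directory that never has the flag set accepts files with any project ID.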

Now if, as an administrator, you decide you need a completely generic
additional ID, you can have one. You just never set XFS_DIFLAG_PROJINHERIT.
Quota accounting and enforcement work just fine without that flag.

So the discussion really is about the semantics of the
XFS_DIFLAG_PROJINHERIT flag. If you want automatic inheritance of project
ids, you set XFS_DIFLAG_PROJINHERIT. With that you'll get additional
limitations described in 3) and 4). And I am of the opinion that these
limitations help to maintain sanity in a system where project quotas are
used. I can imagine XFS_DIFLAG_PROJINHERIT would be split in two flags
for ext4 - one controlling whether project ID is inherited, another
controlling whether we enforce rules 3) and 4) but such difference from XFS
would have to be very well justified because different filesystems having
subtly different semantics is a real administrative nightmare, much worse
than the additional cp + unlink done when rename() returns EXDEV because
you tried to rename from one project to another.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-08-11 13:48:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Mon, Aug 11, 2014 at 06:23:53PM +0800, Li Xi wrote:
> As a distributed file system, Lustre is able to use hundreds of seperate
> ext4 file systems to store its data as well as metadata, yet provides a
> united global name space. Some of users start to use SSD devices for better
> performance on Lustre. However as we can expect, they might want to replace
> only part of the drivers to SSD, since SSD is expensive. That means, part
> of the ext4 file systems are using SSD and the other part of the ext4 file
> systems are using hard disks. In the sight of Lustre, users can choose to
> locate files on SSDs or hard disks using features of Lustre, namely 'stripe'
> and 'OST pool'. Here comes the problem, how to limit the usage of SSD since
> all end users want good performance badly?

Ext4 quotas are per-disk, and storage technologies are per disk. So
if *I* were designing a clustered file system, and we had different
cost centers, say, "mail", "maps", "social", and "search", each of
which might have different amounts of disk drive and SSD space, which
might be based on how much SSD each of the product area budgets is
willing to pay for, and what the requirements of each of the products
might be, I'd simply assign different groups to each of these cost centers.

For the purposes of usages of clustered file systems, you don't want
to do quota enforcement. If you've spent tens or hundreds of CPU
years working on some distributed computation, you don't want to throw
it all away due to a quota failure. Or if you are running an
international web-based service, causing even a partial downtime of
everyone's maps or e-mail due to quota failure is also considered,
well, not cool.

So let's assume that you're only doing usage tracking, but even if you
wanted to do usage control, the files will be scattered across many
different servers and file systems, and so it doesn't make sense to do
quota control, or even usage tracking, on a disk by disk basis.

Hence, the clustered file system will have to sum up the usage quotas
of each underlying file system, with different sums for the
HDD's and SSD's, by group. Fortunately, Map Reduce is your friend.

Then for each group the cluster file system can report usage of HDD
and SSD space and inodes, separately. When a project gets within a
few terabytes of being filled, or the overall free space in the
cluster drops below a few petabytes, you page your SRE or devops
team so they can take care of things, perhaps by negotiating an
emergency quota increase, or moving files around, or deleting old
files, etc.
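
A minimal Python sketch of that aggregation follows. It assumes each backend file system can report its per-group usage together with its media type; the report layout, the numbers (think of them as TB), and the function names are all hypothetical. A real cluster would run this as a distributed job rather than a single loop.

```python
from collections import defaultdict

def cluster_usage(reports):
    """Sum per-group usage across backends, split by storage technology."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in reports:
        for group, used in r["usage"].items():
            totals[group][r["media"]] += used
    return {group: dict(by_media) for group, by_media in totals.items()}

def near_limit(totals, limits, margin):
    """List (group, media, used, limit) entries within `margin` of their limit."""
    alerts = []
    for (group, media), limit in sorted(limits.items()):
        used = totals.get(group, {}).get(media, 0)
        if limit - used <= margin:
            alerts.append((group, media, used, limit))
    return alerts

# Hypothetical per-filesystem group quota reports.
reports = [
    {"fs": "ost0", "media": "ssd", "usage": {"mail": 40, "maps": 10}},
    {"fs": "ost1", "media": "hdd", "usage": {"mail": 500, "search": 900}},
    {"fs": "ost2", "media": "ssd", "usage": {"mail": 55}},
]
limits = {("mail", "ssd"): 100, ("search", "hdd"): 2000}

totals = cluster_usage(reports)
alerts = near_limit(totals, limits, margin=10)
print(totals)
print(alerts)   # entries here are the ones that would page someone
```

Note that this only tracks usage and raises alerts; nothing in it refuses a write, which matches the tracking-only policy argued for above.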

The bottom line is that you *can* run an exabyte+ cluster file system
supporting many different budget/cost centers with only group-level
quotas and nothing else. And you can do this even supporting both
HDD's and SSD's, with separate quota tracking of the two storage
technologies.

Can you go into more detail about how Lustre would use project quotas
from a cluster file system centric perspective, such as I've
sketched out above?

> Of course, we might be able to find some walk-around ways using group quota.
> However, because the owners of the files can change the group attributes
> freely, it is so easy for the users to evade the group quota and steal the
> tight resources.

But all of the users will be sending chgrp requests through Lustre, or
whatever the cluster file system is. So Lustre can enforce whatever
permissions policy it would like.

> For example, in order to steal SSD space, a user can just
> creating the files using the sepcific group ID and then change it back.

But since you've been arguing that the project id should get preserved
across renames, they can evade quota usage by doing:

touch /product/mail/huge_file
mv /product/mail/huge_file /product/maps

And if you allow the rename, and allow the project id to be preserved
across renames, then the quota evasion is just as easy. And yes, you
could prevent renames at the cluster file system level. But the
question remains what makes sense on a single disk system, and if
users can trivially subvert the project quota by creating the file in
one directory, where it inherits the quota of project A, and then be
able to move the file to another directory, they have evaded quota
enforcement just as surely as if they had used chgrp.

Hence, to prevent this, you need to restrict administrator changes to
the superuser, *and* not allow renames across project hierarchies.
And surprise! That looks exactly like what XFS has built.

Cheers,

- Ted

2014-08-11 14:16:05

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

> So I actually don't think project quota as implemented by XFS (and with
> which me and Ted want to stay compatible with) isn't different from what you
> want. Let me explain what XFS does:
> 1) Each file has an additional ID - the project ID
> 2) Each dir can have XFS_DIFLAG_PROJINHERIT flag set. When this flag is
> set, all files and directories created in the directory inherit project
> ID, directories also inherit XFS_DIFLAG_PROJINHERIT - this is
> equivalent of sgid bit on directories for gids.
> 3) When you hard-link a file into directory with XFS_DIFLAG_PROJINHERIT
> set, the file already has to have the same project ID as the directory
> you are linking into.
> 4) When you rename a file into a directory with XFS_DIFLAG_PROJINHERIT
> set, the file already has to have the same project ID as the directory
> you are renaming into.
> 5) If you call statfs() on a directory with XFS_DIFLAG_PROJINHERIT set
> and project quota is being enforced, statfs() will return free/used
> blocks of the corresponding project instead of number of free/used
> blocks in the filesystem.
>
> Now if, as an administrator, you decide you need completely generic
> additional ID, you can do so. You just never set XFS_DIFLAG_PROJINHERIT.
> Quota accounting and enforcement works just fine without that flag.
>
> So the discussion really is about the semantics of the
> XFS_DIFLAG_PROJINHERIT flag. If you want automatic inheritance of project
> ids, you set XFS_DIFLAG_PROJINHERIT. With that you'll get additional
> limitations described in 3) and 4). And I am of the opinion that these
> limitations help to maintain sanity in a system where project quotas are
> used. I can imagine XFS_DIFLAG_PROJINHERIT would be split in two flags
> for ext4 - one controlling whether project ID is inherited, another
> controlling whether we enforce rules 3) and 4) but such difference from XFS
> would have to be very well justified because different filesystems having
> subtly different semantics is a real administrative nightmare, much worse
> than the additional cp + unlink done when rename() returns EXDEV because
> you tried to rename from one project to another.
Thank you so much for your detailed introduction! I didn't know there
was a tunable XFS_DIFLAG_PROJINHERIT flag, and I thought that rules 3)
and 4) always took effect. I would like to keep everything compatible
with XFS if this flag can turn rules 3) and 4) on and off freely. I will
check the implementation details of XFS to avoid making similar
mistakes.
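
Rules 3) and 4) as described in the quoted text can be modelled with a small shell function. This is a toy model of the decision only, not kernel code; the name check_move and its arguments are made up for illustration. Returning EXDEV for the hard-link case as well as for rename is an assumption of this sketch; the quoted text only mentions EXDEV for rename.

```shell
#!/bin/sh
# Toy model of rules 3) and 4): only the decision logic, not kernel code.
EXDEV=18   # value of EXDEV on Linux

# check_move <file_projid> <dir_projid> <dir_has_projinherit: 0|1>
# Succeeds if linking or renaming the file into the directory is allowed.
check_move() {
    file_projid=$1 dir_projid=$2 dir_inherit=$3
    if [ "$dir_inherit" -eq 1 ] && [ "$file_projid" -ne "$dir_projid" ]; then
        return "$EXDEV"
    fi
    return 0
}

# Usage example.
check_move 1000 1000 1 && echo "same project: allowed"
check_move 1000 2000 0 && echo "flag not set on target: allowed"
check_move 1000 2000 1 || echo "different project, flag set: refused, errno $?"
```

Without the flag on the target directory the project ID behaves as a completely generic additional ID, which is the case where the restrictions do not apply.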

Regards,
Li Xi

2014-08-11 14:40:39

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

> But since you've been arguing that the project id should get preserved
> across renames, they can evade quota usage by doing:
>
> touch /product/mail/huge_file
> mv /product/mail/huge_file /product/maps
I don't really understand why these commands can evade project quota,
since:
1) A newly created file will inherit the project ID from its parent inode.
2) The project ID will be preserved across renames.
3) Project quota won't be transferred unless the project ID is changed.
4) Only the root user has the right to change the project ID.
Rules 2) and 3) are just the same semantics as UID/GID quotas.
So, because of rule 1), after 'touch /product/mail/huge_file', the project
ID of 'huge_file' is 'mail', and its usage is accounted to project 'mail'.
Even if we do 'mv /product/mail/huge_file /product/maps', because
of rule 2), there is no project ID update and no quota transfer. Therefore,
the usage of file 'huge_file' is always accounted to 'mail',
from beginning to end. And that is why I think the project quota
of 'mail' can't be evaded in this way.

According to the description by Jan Kara, I think the current implementation
of project quota support for ext4 is just a subset of XFS project quota, i.e.
project quota without the XFS_DIFLAG_PROJINHERIT flag. I would like to
add such a flag for sure.

Regards,
Li Xi

2014-08-11 14:41:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, Aug 09, 2014 at 12:39:58AM +0800, Li Xi wrote:
> It is obvious that extended attribute implementation has performance
> impact when creating files. That is why we choose to push the patches
> which use internal inode field to save project ID.

Looking at your numbers more closely, I bet you are parsing the
extended attributes each time you need to adjust the project quota for
the file, correct? I suspect that if you cache the project ID in the
in-memory struct ext4_inode_info, the performance difference between
using an extended attribute versus an internal inode field will be
negligible. The only difference would be a tiny amount of CPU time
when you first create the inode, and when you read the inode from the
inode table block on disk, since the project ID will under normal
circumstances never or hardly ever change.

Regards,

- Ted

2014-08-11 14:45:35

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Mon, Aug 11, 2014 at 10:40:38PM +0800, Li Xi wrote:
> > But since you've been arguing that the project id should get preserved
> > across renames, they can evade quota usage by doing:
> >
> > touch /product/mail/huge_file
> > mv /product/mail/huge_file /product/maps
> I don't really understand why these commands can evade project quota
> since:
> 1) A newly created file will inherit project ID from its parent inode.
> 2) Project ID will be preserved across renames
> 3) Project quota won't be transfered unless its project ID is changed.
> 4) Only root user has the right to change project ID.
> The rule 2) and 3) are just the same sematics with UID/GID quotas.
> So, becasue of rule 1), after 'touch /product/mail/huge_file', the project
> ID of 'huge_file' is 'mail', and its usage is accouted as project 'mail'.
> Even we do 'mv /product/mail/huge_file /product/maps', because
> of rule 2), there is no project ID updating and no quota transfer. Since
> so, the project quota of file 'huge_file' is always accounted as 'mail',
> from the first beginning to the end. And that is why I think project quota
> of 'mail' can't be evaded in this way.

Yes, and *that* is the quota evasion. There is no difference in terms
of who ends up owning the quota between:

touch /product/mail/huge_file
mv /product/mail/huge_file /product/maps

and

touch /product/maps/huge_file
chgrp mail /product/maps/huge_file

Either way, a file that is storing maps information (that is why it is
in /product/maps/huge_file) ends up getting accounted against the mail
product's quota.

So if you say, ok, we're using project quota, we won't allow:

chproject mail /product/maps/huge_file

But then the user can just do this instead:

touch /product/mail/huge_file
mv /product/mail/huge_file /product/maps

This is why we MUST either not allow the rename, or force the project
quota to change when you move the inode to a different directory
hierarchy owned by a different project.
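
The three possible behaviours can be compared with a toy model. The policy names (preserve, reassign, refuse) and the function charged_after_rename are made up for this sketch; it only tracks which project ends up charged for the file.

```shell
#!/bin/sh
# Toy model: which project is charged after renaming a file that belongs to
# <file_proj> into a directory hierarchy owned by <dst_proj>?
# charged_after_rename <preserve|reassign|refuse> <file_proj> <dst_proj>
charged_after_rename() {
    policy=$1 file_proj=$2 dst_proj=$3
    case $policy in
        preserve)
            # Project ID kept across rename: the old project stays charged.
            echo "$file_proj" ;;
        reassign)
            # Quota is transferred to the project owning the target.
            echo "$dst_proj" ;;
        refuse)
            # Cross-project rename fails with EXDEV (18 on Linux).
            [ "$file_proj" = "$dst_proj" ] || return 18
            echo "$file_proj" ;;
        *)
            return 2 ;;
    esac
}

# Usage example: the file is created under /product/mail, moved to /product/maps.
charged_after_rename preserve mail maps
charged_after_rename reassign mail maps
charged_after_rename refuse mail maps || echo "rename refused"
```

Only the "preserve" policy leaves maps data charged to the mail project, which is the evasion described above.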

- Ted

2014-08-11 14:49:59

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Mon, Aug 11, 2014 at 10:41 PM, Theodore Ts'o <[email protected]> wrote:
> On Sat, Aug 09, 2014 at 12:39:58AM +0800, Li Xi wrote:
>> It is obvious that extended attribute implementation has performance
>> impact when creating files. That is why we choose to push the patches
>> which use internal inode field to save project ID.
>
> Looking at your numbers more closely, I bit you are parsing the
> extended attributes each time you need to adjust the project quota for
> the file, correct? I suspect that if you cache the project ID in the
> in-memory struct ext4_inode_info, the performance difference between
> using an extended attribute versus an internal inode field will be
> negligible. The only difference would be a tiny amount of CPU time
> when you first create the inode, and when you read the inode from the
> inode table block on disk, since the project ID will under normal
> circumstances never or hardly ever change.
Yeah, I cached the project ID in memory, not only for this internal inode field
implementation but also for the xattr based implementation. And yes, it is
right that reading the project ID doesn't impact performance. However, as
the results show, creating the xattr costs extra time. That is why creating
files with the xattr based implementation is significantly slower than with
the internal inode field implementation. Other operations aren't affected at
all. We confirmed this in various ways. If we remove the xattr saving when
creating files, the performance goes up immediately. We also confirmed that
ACLs have a similar performance problem.

Regards,
Li Xi

2014-08-11 15:03:49

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Mon, Aug 11, 2014 at 10:45 PM, Theodore Ts'o <[email protected]> wrote:
> On Mon, Aug 11, 2014 at 10:40:38PM +0800, Li Xi wrote:
>> > But since you've been arguing that the project id should get preserved
>> > across renames, they can evade quota usage by doing:
>> >
>> > touch /product/mail/huge_file
>> > mv /product/mail/huge_file /product/maps
>> I don't really understand why these commands can evade project quota
>> since:
>> 1) A newly created file will inherit project ID from its parent inode.
>> 2) Project ID will be preserved across renames
>> 3) Project quota won't be transfered unless its project ID is changed.
>> 4) Only root user has the right to change project ID.
>> The rule 2) and 3) are just the same sematics with UID/GID quotas.
>> So, becasue of rule 1), after 'touch /product/mail/huge_file', the project
>> ID of 'huge_file' is 'mail', and its usage is accouted as project 'mail'.
>> Even we do 'mv /product/mail/huge_file /product/maps', because
>> of rule 2), there is no project ID updating and no quota transfer. Since
>> so, the project quota of file 'huge_file' is always accounted as 'mail',
>> from the first beginning to the end. And that is why I think project quota
>> of 'mail' can't be evaded in this way.
>
> Yes, and *that* is the quota evasion. There is no difference in terms
> of who ends up owning the quota between:
>
> touch /product/mail/huge_file
> mv /product/mail/huge_file /product/maps
>
> and
>
> touch /product/maps/huge_file
> chgrp mail /product/maps/huge_file
>
> Either way, a file that is storing maps information (that is why it is
> in /product/maps/huge_file) ends up getting accounted against the mail
> product's quota.
Ah... I am getting the point. :) Yeah, it seems strange that huge_file
is under directory 'maps' but using the space of 'mail'. But that looks normal
immediately if we judge a file by what it has (i.e. project ID) rather
than where it is (i.e. directory path). Actually we are doing the same thing
for UID/GID. For example, let's assume User1 and User2 are using /home/user1
and /home/user2. We don't think the following commands are a way of evading
quota:

touch /home/user1/huge_file
mv /home/user1/huge_file /home/user2

Then why should things be different for project quota?

Regards,
- Li Xi

2014-08-12 15:35:58

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Sat, 9 Aug 2014 00:39:58 +0800, Li Xi <[email protected]> wrote:
> Hi all,
>
> The following patches propose an implementation of project support
> for ext4. A project is an aggregate of unrelated inodes which might
> scatter in different directories. Inodes belongs to a project
> possesses a same identification i.e. 'project ID', just like every
> inode has its user/group indentification. The following patches adds
> project quota as supplement to the former uer/group quota types.
>
> This project ID of an inode is iherited from its parent direcotry
> and saved as an internal field of ext4 inode.
>
> This is not the first existed attepmtion to add project quta support
> for ext4. Patches of subtree quota support which was posted by Dmity
> Monakhov in 2012 (http://lwn.net/Articles/506064/) implemented the
> similar feature in a different way. Rather than saving the project
> (or subtree) ID as an internal inode field, those patches manages
> the ID as extented attributes.
>
> We rebased both patch sets onto the same kernel version and run
> benchmakrs respectively to comparing the peformance difference.
> It is worth noting that patches from Lai Siyao and Niu Yawei
> (quota: remove-dqptr_sem,
> http://article.gmane.org/gmane.comp.file-systems.ext4/44341/)
> improve the performance of quota enforcement significantly, which
> can be seen clearly from following results.
>
> It is obvious that extended attribute implementation has performance
> impact when creating files. That is why we choose to push the patches
> which use internal inode field to save project ID.
I'll bet a box of "russian caviar" that Al Viro will never ever allow
placing a project ID in the generic inode, because there is obviously no
reason for it in any filesystem except xfs/ext4.

BTW, which quota options did you use for performance testing? It looks like
you used non-journaled quota. But this means that you have to fully
recalculate quota in case of power failure. It is reasonable to enable
journaled quota, but it results in visible journaling overhead.
>
> Kernel: 3.16.0-rc5
> Server: Dell R620 (2 x [email protected], 256GB memory)
> Storage: 10 x 15K SAS disks(RAID10)
> Test tool: mdtest-1.9.3. Mdtest created 800K files in total. Each
> thread created files in unique directory.
>
> File Creation:
> 1thr 2thr 4thr 8thr 16thr
> - vanilla
> quota disabled 66094 105781 178968 186647 172536
> quotaon(ug) 60337 99582 157396 171463 162872
>
> - vanilla + remove-dqptr_sem patches
> quota disabled 65955 112082 185550 181511 171988
> quotaon(ug) 62391 101905 171013 190570 168914
>
> - prjquota(xattr)
> quota disabled 61396 97580 147852 146423 164895
> quotaon(ug) 57009 93435 140589 135748 153196
> quotaon(ugP) 57500 89419 133604 125291 105127
>
> - prjquota(xattr) + remove-dqptr_sem patches
> quota disabled 64053 100078 147608 139403 163960
> quotaon(ug) 60754 104726 149231 139053 165990
> quotaon(ugP) 59238 93606 148921 138434 163931
>
> - prjquota(internal) + remove-dqptr_sem patches
> quota disabled 65826 111828 181486 189227 171241
> quotaon(ug) 65418 107745 173584 180562 173752
> quotaon(ugP) 64669 103890 169176 186426 172192
>
>
> File Removal:
> 1thr 2thr 4thr 8thr 16thr
> - vanilla
> quota disabled 118059 169825 234661 291812 345656
> quotaon(ug) 106675 135834 153532 100437 87489
>
> - vanilla + remove-dqptr_sem patches
> quota disabled 120374 168437 236818 291754 331141
> quotaon(ug) 110709 161954 238333 293700 329015
>
> - prjquota(xattr)
> quota disabled 116680 161662 229190 295642 332959
> quotaon(ug) 104783 134359 154950 100516 87923
> quotaon(ugP) 100240 125978 108653 68286 58991
>
> - prjquota(xattr) + remove-dqptr_sem patches
> quota disabled 116281 168938 233733 286663 344002
> quotaon(ug) 109775 164995 236001 299389 340683
> quotaon(ugP) 113935 162979 236112 300033 356117
>
> - prjquota(internal) + remove-dqptr_sem patches
> quota disabled 119537 171565 247418 291068 350138
> quotaon(ug) 121756 159580 240778 298012 342437
> quotaon(ugP) 118954 168022 241206 289055 334008
>
> Changelog:
> * v2 <- v1:
> - Add ioctl interface for setting/getting project;
> - Add EXT4_FEATURE_RO_COMPAT_PROJECT;
> - Add get_projid() method in struct dquot_operations;
> - Add error check of ext4_inode_projid_set/get().
>
> v1: http://article.gmane.org/gmane.comp.file-systems.ext4/45153
>
> Any comments or feedbacks are appreciated.
>
> Regards,
> - Li Xi
>
> Li Xi(4):
> quota: Adds general codes to enforces project quota limites
> ext4: Adds project ID support for ext4
> ext4: Adds project quota support for ext4
> ext4: Adds ioctl interface support for ext4 project
>
> Documentation/filesystems/ext4.txt | 4 +
> fs/ext4/Kconfig | 11 --
> fs/ext4/Makefile | 1 -
> fs/ext4/ext4.h | 19 +++-
> fs/ext4/ialloc.c | 16 +--
> fs/ext4/inode.c | 85 +++++++++++++-
> fs/ext4/ioctl.c | 100 ++++++++++++++++
> fs/ext4/project.c | 224 ------------------------------------
> fs/ext4/project.h | 58 ---------
> fs/ext4/super.c | 45 ++++++--
> fs/ext4/xattr.c | 6 -
> fs/ext4/xattr.h | 2 -
> fs/quota/Kconfig | 9 ++
> fs/quota/dquot.c | 120 ++++++++++++++-----
> fs/quota/quota.c | 5 +-
> fs/quota/quotaio_v2.h | 4 +-
> include/linux/fs.h | 1 -
> include/linux/quota.h | 8 ++
> include/uapi/linux/xattr.h | 2 -
> 19 files changed, 345 insertions(+), 375 deletions(-)

2014-08-13 02:32:31

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Tue, Aug 12, 2014 at 11:35 PM, Dmitry Monakhov <[email protected]> wrote:
> I'll bet a box of "russian caviar" that AlViro will never ever allow
> to place project ID to generic inode. Because it is obviously has no
> reason for any other filesystem except xfs/ext4.
Yeah, understood. Maybe the description is misleading. I added a
project ID field to the ext4_inode structure. The generic inode
structure is not changed.
>
> BTW. Which quota options you use for performance testing? It looks like
> you use non-journaled quota. But this means that you have to fully
> recalculate quota in case of power failure. It is reasonable to enable
> journaled-quota, but it result in visible journaling overhead.
Yeah, we were using non-journaled quota. And we were doing this
benchmark to confirm that xattr based implementation has extra
overhead. We will run benchmarks on journaled-quota, and let's see
what is the performance difference between non-journaled and
journaled quotas.

Regards,
-Li Xi

2014-08-13 13:22:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Wed, Aug 13, 2014 at 10:32:31AM +0800, Li Xi wrote:
> Yeah, we were using non-journaled quota. And we were doing this
> benchmark to confirm that xattr based implementation has extra
> overhead. We will run benchmarks on journaled-quota, and let's see
> what is the performance difference between non-journaled and
> journaled quotas.

Can you give a lot of details about exactly how you ran the benchmark
(and run future benchmarks)? Was this on a ramdisk? An SSD? A HDD?
How many CPU's, how many threads were creating files, etc. And do you
understand where the performance overhead was coming from? Was it CPU
overhead? Locking overhead?

It just doesn't make sense that storing the value in the xattr, when
the xattr is stored in the on-disk inode, that it should make a huge
difference; the cost of the I/O should completely dominate the cost of
whether we format the bytes as an integer or storing it in the
in-inode xattr. So either there is a bug in the benchmark, or a bug
in our code somewhere. Either way, we should find and fix it.

Regards,

- Ted

2014-08-14 01:34:55

by Li Xi

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] quota: add project quota support

On Wed, Aug 13, 2014 at 9:22 PM, Theodore Ts'o <[email protected]> wrote:
> On Wed, Aug 13, 2014 at 10:32:31AM +0800, Li Xi wrote:
>> Yeah, we were using non-journaled quota. And we were doing this
>> benchmark to confirm that xattr based implementation has extra
>> overhead. We will run benchmarks on journaled-quota, and let's see
>> what is the performance difference between non-journaled and
>> journaled quotas.
>
> Can you give a lot of details about exactly how you ran the benchmark
> (and run future benchmarks)? Was this on a ramdisk? An SSD? A HDD?
> How many CPU's, how many threads were creating files, etc. And do you
> understand where the performance overhead was coming from? Was it CPU
> overhead? Locking overhead?
>
> It just doesn't make sense that storing the value in the xattr, when
> the xattr is stored in the on-disk inode, that it should make a huge
> difference; the cost of the I/O should completely dominate the cost of
> whether we format the bytes as an integer or storing it in the
> in-inode xattr. So either there is a bug in the benchmark, or a bug
> in our code somewhere. Either way, we should find and fix it.
All these tests were run in the following environment:

Kernel: 3.16.0-rc5
Server: Dell R620 (2 x [email protected], 16 CPU cores, 256GB memory(1866MHz))
Storage: Dothill 3730(10 x 15K RPM SAS, RAID1+0, 2GB RAID Cache, 8Gbps FC)
Test tool: mdtest-1.9.3. Mdtest created 800K files in total. Each
thread created files in unique directory.

And we are using the default settings of ext4, except that the journal size
is set to 4GB. Following is the output of dumpe2fs:

Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 98b77829-6154-4c5f-a0a5-da1306e82d8d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 6111232
Block count: 24414048
Reserved block count: 1220702
Free blocks: 22966646
Free inodes: 6111221
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1018
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stripe width: 256
Flex block group size: 16
Filesystem created: Wed Aug 13 23:52:54 2014
Last mount time: n/a
Last write time: Wed Aug 13 23:52:57 2014
Mount count: 0
Maximum mount count: -1
Last checked: Wed Aug 13 23:52:54 2014
Check interval: 0 (<none>)
Lifetime writes: 4104 MB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 96a5bb0c-3fcf-4114-bba3-049f15a2d8f4
Journal backup: inode blocks
Journal features: (none)
Journal size: 4096M
Journal length: 1048576
Journal sequence: 0x00000001
Journal start: 0

We did several rounds of file creation and removal:
1) vanilla kernel
1a) disabling quota
1b) enabling user/group quota

2) vanilla kernel with remove-dqptr_sem patches
2a) disabling quota
2b) enabling user/group quota

3) kernel with xattr based project quota
3a) disabling quota
3b) enabling user/group quota
3c) enabling user/group/project quota

4) kernel with remove-dqptr_sem patches as well as xattr based project quota
4a) disabling quota
4b) enabling user/group quota
4c) enabling user/group/project quota

5) kernel with remove-dqptr_sem patches as well as inode-internal project
quota
5a) disabling quota
5b) enabling user/group quota
5c) enabling user/group/project quota

6) kernel with remove-dqptr_sem patches as well as xattr based project quota,
but with writing the project ID xattr to disk skipped.
6a) disabling quota
6b) enabling user/group quota
6c) enabling user/group/project quota

7) kernel with remove-dqptr_sem patches, but tested in a directory with a
default ACL. That means file creation will have to inherit the ACL xattr from
the directory too.
7a) disabling quota
7b) enabling user/group quota


We reached the following conclusions:

A) The results of 1a) and 1b) show that quota enforcement brings a
significant performance regression, especially to file removal, if the
remove-dqptr_sem patches are not landed.

B) The results of 1b) and 2b) show that the remove-dqptr_sem patches indeed
improve performance when quota is enabled.

C) The results of 2a) and 2b) show that the remove-dqptr_sem patches
eliminate the overhead of enforcing quota.

D) The results of 3a) and 1a) show that xattr based quota brings a
significant performance regression to file creation. Even when quota
enforcement is disabled, the inheriting of the project ID harms
performance.

E) The results of 3b) and 2b) confirm D).

F) The results of 3c) and 3b) show that extra quota enforcement
brings a performance regression to both file creation and removal, if the
remove-dqptr_sem patches are not landed.

G) The results of 4b) and 3b) confirm B).

H) The results of 4a), 4b) and 4c) confirm C). Extra quota enforcement
does not necessarily harm performance if the remove-dqptr_sem patches
are landed.

I) The results of 4c) and 3c) confirm C).

J) The results of 5a) and 4a) show that the inode-internal project ID
improves performance, especially for file creation. Even though quota
enforcement is disabled in both runs, the overhead of inheriting the
project ID harms the performance of 4a).

K) The results of 5b) and 4b) confirm J).

L) The results of 5c) and 4c) confirm J).

M) The results of 6) and 4) confirm that saving the project ID xattr is the
main cause of the performance regression.

N) The results of 7) and 2) show that inheriting a default ACL
causes a performance regression for file creation. This result confirms M).


Following are the test results:
File Creation:
                          1thr    2thr    4thr    8thr   16thr
1) vanilla
1a) quota disabled       66094  105781  178968  186647  172536
1b) quotaon(ug)          60337   99582  157396  171463  162872

2) vanilla + remove-dqptr_sem patches
2a) quota disabled       65955  112082  185550  181511  171988
2b) quotaon(ug)          62391  101905  171013  190570  168914

3) prjquota(xattr)
3a) quota disabled       61396   97580  147852  146423  164895
3b) quotaon(ug)          57009   93435  140589  135748  153196
3c) quotaon(ugP)         57500   89419  133604  125291  105127

4) prjquota(xattr) + remove-dqptr_sem patches
4a) quota disabled       64053  100078  147608  139403  163960
4b) quotaon(ug)          60754  104726  149231  139053  165990
4c) quotaon(ugP)         59238   93606  148921  138434  163931

5) prjquota(internal) + remove-dqptr_sem patches
5a) quota disabled       65826  111828  181486  189227  171241
5b) quotaon(ug)          65418  107745  173584  180562  173752
5c) quotaon(ugP)         64669  103890  169176  186426  172192

6) prjquota(xattr) + remove-dqptr_sem patches + skip xattr saving
6a) quota disabled       68590  112022  181028  185626  174231
6b) quotaon(ug)          58189   99716  167318  179360  180188
6c) quotaon(ugP)         59049   99110  172885  181841  172034

7) vanilla + remove-dqptr_sem patches + in directory with default ACL
7a) quota disabled       63630  103930  134828  114534  137676
7b) quotaon(ug)          62274   93960  130317  111247  138620

File Removal:
                          1thr    2thr    4thr    8thr   16thr
1) vanilla
1a) quota disabled      118059  169825  234661  291812  345656
1b) quotaon(ug)         106675  135834  153532  100437   87489

2) vanilla + remove-dqptr_sem patches
2a) quota disabled      120374  168437  236818  291754  331141
2b) quotaon(ug)         110709  161954  238333  293700  329015

3) prjquota(xattr)
3a) quota disabled      116680  161662  229190  295642  332959
3b) quotaon(ug)         104783  134359  154950  100516   87923
3c) quotaon(ugP)        100240  125978  108653   68286   58991

4) prjquota(xattr) + remove-dqptr_sem patches
4a) quota disabled      116281  168938  233733  286663  344002
4b) quotaon(ug)         109775  164995  236001  299389  340683
4c) quotaon(ugP)        113935  162979  236112  300033  356117

5) prjquota(internal) + remove-dqptr_sem patches
5a) quota disabled      119537  171565  247418  291068  350138
5b) quotaon(ug)         121756  159580  240778  298012  342437
5c) quotaon(ugP)        118954  168022  241206  289055  334008

6) prjquota(xattr) + remove-dqptr_sem patches + skip xattr saving
6a) quota disabled      120573  170316  239318  300070  344562
6b) quotaon(ug)         102690  156876  233259  302307  330111
6c) quotaon(ugP)        104816  161109  239573  294188  332578

7) vanilla + remove-dqptr_sem patches + in directory with default ACL
7a) quota disabled      123580  173847  246271  302456  335549
7b) quotaon(ug)         115557  156577  236049  305340  339873
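
To put a number on comparisons such as conclusion J), the relative difference between two rows can be computed from the tables. The helper pct_slower is a hypothetical name used only for this sketch; the two figures in the example are the 8-thread file creation results of rows 5c) and 4c).

```shell
#!/bin/sh
# pct_slower <baseline> <measured>
# Prints by how many percent <measured> is lower than <baseline>.
pct_slower() {
    awk -v b="$1" -v m="$2" 'BEGIN { printf "%.1f\n", (b - m) * 100 / b }'
}

# File creation, 8 threads: 5c) internal field versus 4c) xattr.
pct_slower 186426 138434
```

A negative result would mean the measured row is faster than the baseline row.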



Following is the script we used to get results 3), 4), 5) and 6). (Please
note that we wrote an xattr interface for the inode-internal implementation
too. That is why we are able to use this script to get the results of 5).)
--
#!/bin/sh

export PATH=/work/tools/openmpi-1.6.2/bin:/usr/local/projquota/bin:/usr/local/projquota/sbin:$PATH
export LD_LIBRARY_PATH=/work/tools/openmpi-1.6.2/lib/
NFILES=800000
DIR=/mnt/ext4/testdir/
DEV=/dev/mapper/LUN01
MNTPT=/mnt/ext4
MPIRUN="mpirun -mca btl ^openib"
USER_ID=1000
PROJ_ID=1000

do_mount()
{
	local opts="$1"

	sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
	sudo umount $MNTPT
	sudo sh -c "echo y | mkfs.ext4 -J size=4096 $DEV"
	sudo mount -t ext4 $DEV $opts $MNTPT
	sudo mkdir $DIR
	#sudo setfacl -m d:user1:--- $DIR
	sudo chmod 777 $DIR
}

quota_on()
{
	local opts="$1"

	sudo -i quotacheck $opts $MNTPT
	sudo -i quotaon $opts $MNTPT
}

run_pretest()
{
	local logfile=$1
	local opts=$2
	local nfiles=1000

	$MPIRUN -np 1 ./mdtest -n $nfiles -u -i 1 -F -C -d $DIR
	echo "#### quota $opts after file creation ####" > $logfile 2>&1
	sudo -i quota $opts >> $logfile 2>&1
	$MPIRUN -np 1 ./mdtest -n $nfiles -u -i 1 -F -r -d $DIR
	echo "#### quota $opts after file removal ####" >> $logfile 2>&1
	sudo -i quota $opts >> $logfile 2>&1
}

run_mdtest()
{
	local logfile=$1

	for n in 1 2 4 8 16; do
		nfiles=$((NFILES/n))
		$MPIRUN -np $n ./mdtest -n $nfiles -u -i 5 -F -d $DIR >> $logfile 2>&1
	done
}

# test with disabled quota
test_disable_quota()
{
	do_mount
	run_mdtest mdtest-disable-noopt.log
}

# test with enabled quota, but only user/group quota enforced
test_enable_quota_ug()
{
	do_mount "-o usrquota,grpquota"
	quota_on "-ug"
	sudo -i setquota -u $USER_ID 0 0 0 0 $MNTPT
	run_pretest "mdtest-enable-ug.log" "-vu $USER_ID"
	run_mdtest mdtest-enable-ug.log
}

# test with enabled quota and enforced project quota as well as user/group quota
test_enable_quota_ugP()
{
	do_mount "-o usrquota,grpquota,prjquota"
	quota_on "-ugP"
	sudo -i setfattr -n "system.project" -v $PROJ_ID $DIR
	sudo -i setquota -P $PROJ_ID 0 0 0 0 $MNTPT
	run_pretest "mdtest-enable-ugP.log" "-vP $PROJ_ID"
	run_mdtest mdtest-enable-ugP.log
}

sudo sh -c ./tune.sh

test_disable_quota
test_enable_quota_ug
test_enable_quota_ugP