2012-05-30 14:58:54

by Jeff Liu

Subject: container disk quota

Hello All,

Following Glauber's comments on container disk quota, it should be bound to the mount
namespace rather than to a cgroup.

In my trial it works fine this way in combination with the userland quota utilities.
However, IMHO something has to be done in the user tools as well.

The patchset is still at a very early stage; I'd like to post it early to get more
feedback from you.

Hopefully I can explain my ideas clearly.

Kernel part:
* Container quota can be enabled independently of VFS quota or any particular file system quota.
Per-user/group quotas are kept in memory instead of being saved in separate files as with general quota.
There is no need to remount the rootfs inside the container with the general quota option strings; quota
can be enabled through quotaon/off directly.

* Always honor the underlying file system quota check first, i.e. the exported quota billing
routines take effect only after the file system quota check is done, if that is enabled at the
same time. Hence space allocation or inode creation inside a container will fail if the
outside quota limits are exceeded.

* Make use of the general VFS Q_XXXX quota control flags.

* Introduce a new disk quota structure, as well as its operations, in the mount namespace data
structure; it should only be allocated and initialized at the CLONE stage for a container.

* Modify quotactl(2) to examine whether it is invoked from inside a container.
This is implemented by checking the quota device name ("rootfs" for an lxc guest) or whether the
current pid namespace is not the initial one; then do the mount namespace quotactl if required,
or else go to the normal quotactl procedure.

* Introduce a new quota format "QFMT_NS" for containers. It is used by the userland tools to
identify the quota format, so that quotacheck does the container quota IO initialization and the
subsequent operations. This flag is returned when Q_GETQINFO is issued.

* Export a couple of container quota billing routines to the underlying
file systems that want them. They take effect if container quota is enabled in the kernel
configuration; otherwise they are just inline functions without much overhead.

* Also, there are a couple of things I have not handled yet.
. I think the container quota code should be isolated in Jan's fs/quota/ directory.
. There are dozens of helper routines in general quota, e.g.
the struct if_dqblk <-> struct fs_disk_quota conversions and
the dquot space and inode billing.
They can be refactored into shared routines to some extent.
. quotastats(8) has not been taught to be container-aware yet.
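
To illustrate the "file system quota first" ordering described in the kernel list above, here is a minimal userspace sketch in Python rather than kernel code. The names Dquot and alloc_blocks are hypothetical and do not come from the patchset; the model only checks hard limits and ignores soft limits and grace times.

```python
import errno
from dataclasses import dataclass


@dataclass
class Dquot:
    """Toy per-user block quota; units are arbitrary (e.g. 1 KiB blocks)."""
    used: int = 0
    hard: int = 0  # 0 means "no limit", as in the quota tools

    def would_exceed_hard(self, nblocks: int) -> bool:
        return self.hard != 0 and self.used + nblocks > self.hard


def alloc_blocks(fs_dq, ns_dq, nblocks):
    """Charge nblocks, checking the file system quota before the namespace quota.

    Either dquot may be None, meaning that level of quota is not enabled.
    Returns 0 on success or -EDQUOT; nothing is charged on failure.
    """
    # 1) The underlying file system quota is always consulted first.
    if fs_dq is not None and fs_dq.would_exceed_hard(nblocks):
        return -errno.EDQUOT
    # 2) Only then is the (hypothetical) namespace quota consulted.
    if ns_dq is not None and ns_dq.would_exceed_hard(nblocks):
        return -errno.EDQUOT
    # 3) Both checks passed: bill every enabled level.
    for dq in (fs_dq, ns_dq):
        if dq is not None:
            dq.used += nblocks
    return 0
```

With a file system dquot at 90 of 100 blocks, a 20-block request from inside the container is rejected even if the namespace limit is far away, which matches the "will fail if the outside quota limits are exceeded" rule.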

Changes in quota userland utility:
* Introduce a new quota format string "lxc" in all quota control utilities, to
let each utility know that the user wants to run container quota control, e.g.:
quotacheck -cvugm -F "lxc" /
quotaon -u -F "lxc" /
....

* Currently, I manually create the underlying device (by editing the cgroup
device access list and running mknod /dev/sdaX x x) for the rootfs
inside the container, so that the mount point caching routine passes when
executing quotacheck against the "/" directory. Actually, this step could be
omitted.

* Add a new quotaio_lxc.c[.h] for container quota IO. It is basically the same as
the VFS quotaio logic; I just want to isolate the container stuff here.

Issues:
* How to detect, in a reasonable way, that quotactl(2) is called from inside a container.

* Do we need container quota to work for a cgroup combined with unshare(1)?
For now the patchset mainly works for an lxc guest. IMHO it could also be used outside
a guest if the user wants. In that case the quota limits can take effect
across different underlying file systems, provided they use the exported quota billing
routines.

* As the config entry for printing warning info to the TTY has been marked
obsolete, do we still need to support that?

* The format of the warning info sent through the netlink interface.
VFS quota fills a device parameter into the warnings; how do we define the
format for a container?

* The hash table definitions (hash table size) for dquot caching of each type are
modeled on kernel/user.c; maybe it's better to define a separate array for
performance. Of course, that all depends on whether my current
implementation is on the right track. :)

* Container quota statistics: should they be calculated and exposed in /proc/fs/quota?
If the underlying file system also has quotas enabled, they will be
mixed up, so how about adding a new proc file like "ns_quota" there?

* Memory shrinking requested by kswapd.
As all dquots are cached in memory, if the user executes quotaoff, maybe
I need to handle quota being disabled but still kept in memory.
Also, add another routine that disables quota and removes all dquots from memory,
to free the memory directly.

* Project quota (i.e. tree quota) support.
The quota is currently implemented without project quota support, but it would
not be complex to add on top of the current code: adding a new parameter to
ns_dquot_alloc_block(), etc. would do.
However, XFS supports project quota setup through the xfs tools, and I noticed there
is already a patchset for this feature on the ext4 mailing list; is it possible
to provide a unified interface and implementation in the quota tools in the
future?
AFAICS, project quota can be set up in a container, because we can
fetch the super block from the path passed in. Hence the required
ioctl(2) for the underlying file system can be invoked.

* Security checks for mount namespace quotactl(2).
In this version I only do a basic security check to see whether the caller
has the proper permissions. I think I must be missing a lot
on this point.

Testing:
Currently the patch is lacking tests; I only did a few checks to make sure the
basic operations work.

First of all, we need to invoke quotacheck with the "--no-remount" option (-m),
since the rootfs inside the container guest cannot be remounted:
root@debian:~/# quotacheck -cvugm -F "lxc" /
quotacheck: quotacheck: Scanning rootfs [/] done
quotacheck: Old user file name could not been determined. Usage will not be subtracted.
quotacheck: Old group file name could not been determined. Usage will not be subtracted.
quotacheck: Old user file name could not been determined. Usage will not be subtracted.
quotacheck: Old group file name could not been determined. Usage will not be subtracted.
quotacheck: Checked 3370 directories and 39434 files

By default, user/group quota is off:
root@debian:~/# quotaon -u -F "lxc" -p /
user quota on / (rootfs) is off

root@debian:~/# quotaon -g -F "lxc" -p /
group quota on / (rootfs) is off

Turn them on:
root@debian:~/# quotaon -u -F "lxc" /
root@debian:~/# quotaon -g -F "lxc" /
root@debian:~/# quotaon -u -F "lxc" -p /
user quota on / (rootfs) is on
root@debian:~/# quotaon -g -F "lxc" -p /
group quota on / (rootfs) is on

Edit quota; the soft/hard limits for both space and inodes are zero by default, so
configure them to the desired values:
root@debian:~/# edquota -u -F "lxc" /
Disk quotas for user jeff (uid 1000):
Filesystem blocks soft hard inodes soft hard
rootfs 2025740 2025840 2026000 42786 42790 42800

The configuration is saved properly:
root@debian:~/# repquota -u -F "lxc" /
Block grace time: 00:00; Inode grace time: 00:00
Block limits File limits
User used soft hard grace used soft hard grace
----------------------------------------------------------------------
root -- 44 0 0 20 0 0
jeff -- 2025740 2025840 2026000 42786 42790 42800

Check the block and inode limits:
root@debian:~/# su - jeff
jeff@debian:/$ dd if=/dev/zero of=abc bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.19014 s, 8.8 MB/s
root@debian:~/# repquota -u -F "lxc" /
Jeff *** report() type=0 handle index=0
*** Report for user quotas on device rootfs
Block grace time: 00:00; Inode grace time: 00:00
Block limits File limits
User used soft hard grace used soft hard grace
----------------------------------------------------------------------
root -- 44 0 0 20 0 0
jeff +- 2025980 2025840 2026000 7days 42786 42790 42800

root@debian:~/# repquota -g -F "lxc" /
*** Report for group quotas on device rootfs
Block grace time: 00:00; Inode grace time: 00:00
Block limits File limits
Group used soft hard grace used soft hard grace
----------------------------------------------------------------------
root -- 8564 0 0 390 0 0
adm -- 220 0 0 6 0 0
tty -- 0 0 0 1 0 0
utmp -- 4 0 0 1 0 0
jeff -- 2021268 0 0 42716 0 0

root@debian:~/# su - jeff
jeff@debian:/$ dd if=/dev/zero of=test_space bs=1M count=100
dd: writing `test_space': Disk quota exceeded
11+0 records in
10+0 records out
10506240 bytes (11 MB) copied, 1.24721 s, 8.4 MB/s

root@debian:~/# repquota -u -F "lxc" /
Jeff *** report() type=0 handle index=0
*** Report for user quotas on device rootfs
Block grace time: 00:00; Inode grace time: 00:00
Block limits File limits
User used soft hard grace used soft hard grace
----------------------------------------------------------------------
root -- 44 0 0 20 0 0
jeff +- 2026000 2025840 2026000 7days 42786 42790 42800

root@debian:~/# su - jeff
jeff@debian:/$ for ((i=0; i<20; i++)); do touch test_file_cnt.$i; done
touch: cannot touch `test_file_cnt.14': Disk quota exceeded
touch: cannot touch `test_file_cnt.16': Disk quota exceeded
touch: cannot touch `test_file_cnt.18': Disk quota exceeded

root@debian:~/# repquota -u -F "lxc" /
Block grace time: 00:00; Inode grace time: 00:00
Block limits File limits
User used soft hard grace used soft hard grace
----------------------------------------------------------------------
root -- 44 0 0 20 0 0
jeff ++ 2026000 2025840 2026000 6days 42800 42790 42800 7days

Any comments are appreciated, have a nice day!

-Jeff


2012-05-31 08:54:25

by Glauber Costa

Subject: Re: container disk quota

On 05/30/2012 06:58 PM, [email protected] wrote:
> Hello All,
>
> According to glauber's comments regarding container disk quota, it should be binded to mount
> namespace rather than cgroup.
>
> Per my try out, it works just fine by combining with userland quota utilitly in this way.
that's great.

I'll take a look at the patches.


>
> * Modify quotactl(2) to examine if the caller is invoked inside container.
> implemented by checking the quota device name("rootfs" for lxc guest) or current pid namespace
> is not the initial one, then do mount namespace quotactl if required, or goto
> the normal quotactl procedure.

I dislike the use of "lxc" name. There is nothing lxc-specific in this,
this is namespace-specific. lxc is just one of the container solutions
out there, so let's keep it generic.

>
> * Also, I have not handle a couple of things for now.
> . I think the container quota should be isolated to Jan's fs/quota/ directory.
> . There are a dozens of helper routines at general quota, e.g,
> struct if_dqblk<-> struct fs_disk_quota converts.
> dquot space and inodes bill up.
> They can be refactored as shared routines to some extents.
> . quotastats(8) is not teached to aware container for now.
>
> Changes in quota userland utility:
> * Introduce a new quota format string "lxc" to all quota control utility, to
> let each utility know that the user want to run container quota control. e.g:
> quotacheck -cvugm -F "lxc" /
> quotaon -u -F "lxc" /
> ....
>
> * Currently, I manually created the underlying device(by editing cgroup
> device access list and running mknod /dev/sdaX x x) for the rootfs
> inside containers to let the cache mount points routine pass for
> executing quotacheck against the "/" directory. Actually, it can be
> omitted here.
>
> * Add a new quotaio_lxc.c[.h] for container quota IO, it basically same to
> VFS quotaio logic, I just hope to isolate container stuff here.
>
> Issues:
> * How to detect quotactl(2) is launched from container in a reasonable way.

It's a system call. It is always called by a process. The process
belongs to a namespace. What else is needed?
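
As a userspace illustration of "the process belongs to a namespace": on kernels that expose /proc/<pid>/ns/mnt, two processes share a mount namespace exactly when those entries report the same device and inode numbers. The sketch below is only an example under that assumption and is not part of the patches; the helper names are made up for illustration. Inside the kernel, the equivalent information is reachable from the calling task (current->nsproxy->mnt_ns).

```python
import os


def same_namespace(ns_link_a: str, ns_link_b: str) -> bool:
    """Return True if both paths refer to the same object.

    For /proc/<pid>/ns/<type> entries this means "same namespace":
    stat() reports identical (st_dev, st_ino) pairs for both.
    """
    a = os.stat(ns_link_a)
    b = os.stat(ns_link_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def in_initial_mount_namespace(pid: str = "self") -> bool:
    """Compare a process's mount namespace with that of PID 1.

    Assumes PID 1 is visible and lives in the initial namespace, which
    only holds on the host, and that the caller may read /proc/1/ns.
    """
    return same_namespace(f"/proc/{pid}/ns/mnt", "/proc/1/ns/mnt")
```

A tool could call in_initial_mount_namespace() to decide which quota it is about to manipulate; the check needs no device name such as "rootfs".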

> * Do we need to let container quota works for cgroup combine with unshare(1)?
> Now the patchset is mainly works for lxc guest. IMHO, it can be used outside
> guest if the user desired. In this case, the quota limits can take effort
> among different underlying file systems if they have exported quota billing
> routines.

I still don't understand what is the business of cgroups here. If you
are attaching it to mount namespace, you can always infer the context
from the calling process. I still need to look at your patches, but I
believe that dropping the "feature" of manipulating this from outside of
the container will save you a lot of trouble.

Please note that a process can temporarily join a namespace with
setns(). So you can have a *utility* that does it from the outer world,
but the kernel has no business with that. As far as we're concerned, I
believe that you should always get your context from the current
namespace, and forbid any usage from outside.

> * The hash table list defines(hash table size)for dquot caching for each type is
> referred to kernel/user.c, maybe its better to define an array separatly for
> performance optimizations. Of course, that's all depending on my current
> implementation is on the right road. :)
>
> * Container quota statistics, should them be calculated and exposed to /proc/fs/quota? If the underlying file system also enabled with quotas, they will be
> mixed up, so how about add a new proc file like "ns_quota" there?
No, this should be transferred to the process-specific proc and then
symlinked. Take a look at "/proc/self".

>
> * Memory shrinks acquired from kswap.
> As all dquot are cached in memory, and if the user executing quotaoff, maybe
> I need to handle quota disable but still be kept at memory.
> Also, add another routine to disable and remove all quotas from memory to
> save memory directly.

I didn't read your patches yet, so take it with a grain of salt here.
But I don't understand why you make this distinction of keeping it in
memory only.

You could keep quota files outside of the container, and then bind mount
them to the current location in the setup-phase.

2012-05-31 09:19:07

by Glauber Costa

Subject: Re: container disk quota

On 05/31/2012 12:54 PM, Glauber Costa wrote:
> On 05/30/2012 06:58 PM, [email protected] wrote:
>> Hello All,
>>
>> According to glauber's comments regarding container disk quota, it
>> should be binded to mount
>> namespace rather than cgroup.
>>
>> Per my try out, it works just fine by combining with userland quota
>> utilitly in this way.
> that's great.
>
> I'll take a look at the patches.

Despite my criticism, I do believe those are a lot better than the
previous version. It seems to me to be the right direction, but some
more transparency is lacking.

I believe you should be able to do it with a lot fewer changes to the
tools, if any.

2012-05-31 12:31:42

by Jeff Liu

Subject: Re: container disk quota

Hi Glauber,

Thanks for your comments!

On 05/31/2012 04:54 PM, Glauber Costa wrote:

> On 05/30/2012 06:58 PM, [email protected] wrote:
>> Hello All,
>>
>> According to glauber's comments regarding container disk quota, it
>> should be binded to mount
>> namespace rather than cgroup.
>>
>> Per my try out, it works just fine by combining with userland quota
>> utilitly in this way.
> that's great.
>
> I'll take a look at the patches.
>
>
>>
>> * Modify quotactl(2) to examine if the caller is invoked inside
>> container.
>> implemented by checking the quota device name("rootfs" for lxc
>> guest) or current pid namespace
>> is not the initial one, then do mount namespace quotactl if
>> required, or goto
>> the normal quotactl procedure.
>
> I dislike the use of "lxc" name. There is nothing lxc-specific in this,
> this is namespace-specific. lxc is just one of the container solutions
> out there, so let's keep it generic.

I think I should forget all things regarding LXC, just treat it as a new
quota feature with regard to namespace.

>>
>> * Also, I have not handle a couple of things for now.
>> . I think the container quota should be isolated to Jan's fs/quota/
>> directory.
>> . There are a dozens of helper routines at general quota, e.g,
>> struct if_dqblk<-> struct fs_disk_quota converts.
>> dquot space and inodes bill up.
>> They can be refactored as shared routines to some extents.
>> . quotastats(8) is not teached to aware container for now.
>>
>> Changes in quota userland utility:
>> * Introduce a new quota format string "lxc" to all quota control
>> utility, to
>> let each utility know that the user want to run container quota
>> control. e.g:
>> quotacheck -cvugm -F "lxc" /
>> quotaon -u -F "lxc" /
>> ....
>>
>> * Currently, I manually created the underlying device(by editing cgroup
>> device access list and running mknod /dev/sdaX x x) for the rootfs
>> inside containers to let the cache mount points routine pass for
>> executing quotacheck against the "/" directory. Actually, it can be
>> omitted here.
>>
>> * Add a new quotaio_lxc.c[.h] for container quota IO, it basically
>> same to
>> VFS quotaio logic, I just hope to isolate container stuff here.
>>
>> Issues:
>> * How to detect quotactl(2) is launched from container in a reasonable
>> way.
>
> It's a system call. It is always called by a process. The process
> belongs to a namespace. What else is needed?

nothing now. :)

>
>> * Do we need to let container quota works for cgroup combine with
>> unshare(1)?
>> Now the patchset is mainly works for lxc guest. IMHO, it can be
>> used outside
>> guest if the user desired. In this case, the quota limits can take
>> effort
>> among different underlying file systems if they have exported quota
>> billing
>> routines.
>
> I still don't understand what is the business of cgroups here. If you
> are attaching it to mount namespace, you can always infer the context
> from the calling process. I still need to look at your patches, but I
> believe that dropping the "feature" of manipulating this from outside of
> the container will save you a lot of trouble.

Yup, I'll just treat it as namespace-specific; there is nothing to
consider regarding the cgroup interface.

>
> Please note that a process can temporarily join a namespace with
> setns(). So you can have a *utility* that does it from the outer world,
> but the kernel has no business with that. As far as we're concerned, I
> believe that you should always get your context from the current
> namespace, and forbid any usage from outside.

I'll investigate that further.

>
>> * The hash table list defines(hash table size)for dquot caching for
>> each type is
>> referred to kernel/user.c, maybe its better to define an array
>> separatly for
>> performance optimizations. Of course, that's all depending on my
>> current
>> implementation is on the right road. :)
>>
>> * Container quota statistics, should them be calculated and exposed to
>> /proc/fs/quota? If the underlying file system also enabled with
>> quotas, they will be
>> mixed up, so how about add a new proc file like "ns_quota" there?
> No, this should be transferred to the process-specific proc and them
> symlinked. Take a look at "/proc/self".
>
>>
>> * Memory shrinks acquired from kswap.
>> As all dquot are cached in memory, and if the user executing
>> quotaoff, maybe
>> I need to handle quota disable but still be kept at memory.
>> Also, add another routine to disable and remove all quotas from
>> memory to
>> save memory directly.
>
> I didn't read your patches yet, so take it with a grain of salt here.
> But I don't understand why you make this distinction of keeping it in
> memory only.
>
> You could keep quota files outside of the container, and then bind mount
> them to the current location in the setup-phase.

I originally tried to keep the quota files outside, but changed my
mind afterwards, for three reasons at that time:

1) The quota files could be overwritten if the container's rootfs is
located at the root directory of a storage partition, and this partition
is mounted with quota limits enabled.

2) To deal with quota files, it looks like I would have to tweak
quota_read()/quota_write(), which for ext4, say, correspond to
ext4_quota_read()/ext4_quota_write().

3) As a mount namespace can be created and destroyed at any time,
it has no memory of which inodes are quota files. However, as far as I
remember, the quota tools need to restore a few things from those files,
but I cannot recall all of them right now. :( I'll check to
refresh my memory on this point.

Sure, considering that we can bind mount them at setup phase, the first
concern could be ignored.


Thanks,
-Jeff

2012-05-31 13:04:50

by Jeff Liu

Subject: Re: container disk quota

On 05/31/2012 05:19 PM, Glauber Costa wrote:

> On 05/31/2012 12:54 PM, Glauber Costa wrote:
>> On 05/30/2012 06:58 PM, [email protected] wrote:
>>> Hello All,
>>>
>>> According to glauber's comments regarding container disk quota, it
>>> should be binded to mount
>>> namespace rather than cgroup.
>>>
>>> Per my try out, it works just fine by combining with userland quota
>>> utilitly in this way.
>> that's great.
>>
>> I'll take a look at the patches.
>
> Despite my criticism, I do believe those are a lot better than the
> previous version. It seems to me to be the right direction, but some
> more transparency is lacking.

Thanks for your timely response and verification.

>
> I believe you should be able to do it without a lot less changes to the
> tools - if any.

Let me try!

-Jeff

>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-06-01 15:54:57

by Jan Kara

Subject: Re: container disk quota

Hello,

On Wed 30-05-12 22:58:54, [email protected] wrote:
> According to glauber's comments regarding container disk quota, it should be binded to mount
> namespace rather than cgroup.
>
> Per my try out, it works just fine by combining with userland quota utilitly in this way.
> However, they are something has to be done at user tools too IMHO.
>
> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
> feedbacks from you guys.
>
> Hopefully I can clarify my ideas clearly.
So what I miss in this introductory email is some high-level description
of what the desired functionality you are trying to implement is and what it is
good for. Looking at the examples below, it seems you want to be able to
set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
right?

If yes, then I would like to understand one thing: When writing to a
file, used space is accounted to the owner of the file. Now how do we
determine the owning namespace? Do you implicitly assume that only processes
from one namespace will be able to access the file?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-06-01 16:04:21

by Serge Hallyn

Subject: Re: container disk quota

Quoting Jan Kara ([email protected]):
> Hello,
>
> On Wed 30-05-12 22:58:54, [email protected] wrote:
> > According to glauber's comments regarding container disk quota, it should be binded to mount
> > namespace rather than cgroup.
> >
> > Per my try out, it works just fine by combining with userland quota utilitly in this way.
> > However, they are something has to be done at user tools too IMHO.
> >
> > Currently, the patchset is in very initial phase, I'd like to post it early to seek more
> > feedbacks from you guys.
> >
> > Hopefully I can clarify my ideas clearly.
> So what I miss in this introductory email is some highlevel description
> like what is the desired functionality you try to implement and what is it
> good for. Looking at the examples below, it seems you want to be able to
> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
> right?
>
> If yes, then I would like to understand one thing: When writing to a
> file, used space is accounted to the owner of the file. Now how do we
> determine owning namespace? Do you implicitely assume that only processes
> from one namespace will be able to access the file?
>
> Honza

Not having looked closely at the original patchset, let me ask - is this
feature going to be a freebie with Eric's usernamespace patches?

There, a container can be started in its own user namespace. Its uid
1000 will be mapped to something like 1101000 on the host. So the actual
uid against which the quota is counted is 1101000. In another container,
uid 1000 will be mapped to 1201000, and again quota will be counted against
1201000.
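
The mapping arithmetic above can be sketched as follows. This is only an illustration: the extent layout (first uid inside, first uid outside, count) mirrors the /proc/<pid>/uid_map format, and the concrete extents are hypothetical values picked so that the results match the numbers in the paragraph above.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    inside_start: int   # first uid as seen inside the container
    outside_start: int  # corresponding first uid on the host
    count: int          # number of consecutive uids covered


def to_host_uid(uid_map: List[Extent], uid: int) -> Optional[int]:
    """Translate a container uid to the host uid, or None if unmapped."""
    for ext in uid_map:
        if ext.inside_start <= uid < ext.inside_start + ext.count:
            return ext.outside_start + (uid - ext.inside_start)
    return None


# Hypothetical maps for two containers; not taken from any real setup.
container_a = [Extent(0, 1100000, 100000)]
container_b = [Extent(0, 1200000, 100000)]
```

Here to_host_uid(container_a, 1000) yields 1101000 and to_host_uid(container_b, 1000) yields 1201000, so an ordinary per-uid quota on the host already distinguishes the two containers.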

Note that this won't work with bind mounts, as a file can only be owned
by one uid, be it 1000, 1101000, or 1201000. So for the quota to work
each container would need its own files. (Of course the underlying
metadata can be shared through whatever ways - btrfs, lvm snapshotting,
etc)

-serge

2012-06-02 05:42:18

by Jeff Liu

Subject: Re: container disk quota

Hi Jan,

On 06/01/2012 11:54 PM, Jan Kara wrote:

> Hello,
>
> On Wed 30-05-12 22:58:54, [email protected] wrote:
>> According to glauber's comments regarding container disk quota, it should be binded to mount
>> namespace rather than cgroup.
>>
>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>> However, they are something has to be done at user tools too IMHO.
>>
>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>> feedbacks from you guys.
>>
>> Hopefully I can clarify my ideas clearly.
> So what I miss in this introductory email is some highlevel description
> like what is the desired functionality you try to implement and what is it
> good for. Looking at the examples below, it seems you want to be able to
> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
> right?

Sorry for the lack of a high-level description.

The main idea is to introduce a quota mechanism that a container can use as transparently as
possible, just like VFS quota and the quota tools.

However, our general quota subsystem works with one global hash table list and bases its accounting
on the superblock and global UID/GID, which does not look convenient for supporting container quota
without many changes, IMHO.

The current idea is to bind it to the mount namespace, since a container can be set up with
CLONE_NEWNS to have its private mount tree.

>
> If yes, then I would like to understand one thing: When writing to a
> file, used space is accounted to the owner of the file. Now how do we
> determine owning namespace?

That's also the main point I am concerned about.
I think the latest USER_NS feature would get involved here.
Also, we can detect the mount namespace by inferring it from the process context. We only bill those
space operations if the mount namespace has quota set up.

There is another issue: when do we initialize the quota info on a mount namespace?
In my current implementation, this is done simply by checking whether the mount namespace was cloned,
and doing quotactl(2) if the input device is "rootfs", or just by checking whether the mount namespace
is not the initial global one.
quotactl(2) will be processed if either of the above conditions is met, and space operations will be
counted if the dquot for the given UID/GID can be found.

> Do you implicitely assume that only processes
> from one namespace will be able to access the file?

All processes can access the file, but only the space operations of processes residing in a mount
namespace with quota set up will be billed.
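
A toy model of this billing rule, in userspace Python rather than kernel code, might look like the sketch below. MntNamespace and bill_write are hypothetical names for illustration only; the patchset's real structures differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MntNamespace:
    """Toy mount namespace carrying an optional table of per-uid usage."""
    name: str
    quota_enabled: bool = False
    usage: Dict[int, int] = field(default_factory=dict)  # uid -> blocks billed


def bill_write(ns: MntNamespace, uid: int, nblocks: int) -> bool:
    """Bill a write made by a process living in namespace ns.

    Returns True only if the namespace has quota set up and a dquot
    for this uid can be found; otherwise the write goes unbilled.
    """
    if not ns.quota_enabled:
        return False
    if uid not in ns.usage:
        return False
    ns.usage[uid] += nblocks
    return True
```

In this model a write to the same file from a process in the host namespace is not billed at all, which is the case the question about the owning namespace is probing.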


Thanks,
-Jeff

> Honza

2012-06-02 05:59:23

by Jeff Liu

Subject: Re: container disk quota

Hi Serge,

On 06/02/2012 12:04 AM, Serge Hallyn wrote:

> Quoting Jan Kara ([email protected]):
>> Hello,
>>
>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>> namespace rather than cgroup.
>>>
>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>> However, they are something has to be done at user tools too IMHO.
>>>
>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>> feedbacks from you guys.
>>>
>>> Hopefully I can clarify my ideas clearly.
>> So what I miss in this introductory email is some highlevel description
>> like what is the desired functionality you try to implement and what is it
>> good for. Looking at the examples below, it seems you want to be able to
>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>> right?
>>
>> If yes, then I would like to understand one thing: When writing to a
>> file, used space is accounted to the owner of the file. Now how do we
>> determine owning namespace? Do you implicitely assume that only processes
>> from one namespace will be able to access the file?
>>
>> Honza
>
> Not having looked closely at the original patchset, let me ask - is this
> feature going to be a freebie with Eric's usernamespace patches?

If we can reach a consensus to bind quota to the mount namespace, for
containers or maybe other things, I think it definitely should depend
on the user namespace.

>
> There, a container can be started in its own user namespace. It's uid
> 1000 will be mapped to something like 1101000 on the host. So the actual
> uid against who the quota is counted is 1101000. In another container,
> uid 1000 will be mapped to 1201000, and again quota will be counted against
> 1201000.

Does it also imply that we can decide whether or not to do container quota
based on the uid/gid number?

>
> Note that this won't work with bind mounts, as a file can only be owned
> by one uid, be it 1000, 1101000, or 1201000. So for the quota to work
> each container would need its own files. (Of course the underlying
> metadata can be shared through whatever ways - btrfs, lvm snapshotting,
> etc)

Do you mean that we cannot bind mount outside files into the container to
serve as the general aquota.user/aquota.group files?

If so, per Glauber's comments, binding quota to the mount namespace should be a
generic feature, and containers are just one of the users that could make use of it.

Again, if binding quota to the mount namespace is the right direction, and it
only makes sense for containers for now, maybe we don't need such
files. IMHO, a container is a lightweight virtualization solution, so maybe
it's fine to keep it as simple as possible. If the server admin needs to
configure hundreds of user/group dquots per container, perhaps he should
consider KVM/XEN.


Thanks,
-Jeff



>
> -serge

2012-06-02 06:07:11

by Kirill Korotaev

Subject: Re: container disk quota


On Jun 2, 2012, at 09:59 , Jeff Liu wrote:

> Hi Serge,
>
> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>
>> Quoting Jan Kara ([email protected]):
>>> Hello,
>>>
>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>>> namespace rather than cgroup.
>>>>
>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>>> However, they are something has to be done at user tools too IMHO.
>>>>
>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>>> feedbacks from you guys.
>>>>
>>>> Hopefully I can clarify my ideas clearly.
>>> So what I miss in this introductory email is some highlevel description
>>> like what is the desired functionality you try to implement and what is it
>>> good for. Looking at the examples below, it seems you want to be able to
>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>>> right?
>>>
>>> If yes, then I would like to understand one thing: When writing to a
>>> file, used space is accounted to the owner of the file. Now how do we
>>> determine owning namespace? Do you implicitely assume that only processes
>>> from one namespace will be able to access the file?
>>>
>>> Honza
>>
>> Not having looked closely at the original patchset, let me ask - is this
>> feature going to be a freebie with Eric's usernamespace patches?
>
> It we can reach a consensus to bind quota on mount namespace for
> container or other things maybe.

1. OpenVZ doesn't use mount namespaces and still has quotas per container.

2. BTW, have you seen Dmitry Monakhov's patches for the same container quotas via an additional inode attribute? It allows the quota to be journaled.
How are quotas stored in your case?

3. I tend to think that nowadays such quotas may be less needed. Quota code doesn't scale well, and it's easier to put a container in an image file (as OpenVZ recently introduced).

Thanks,
Kirill


2012-06-02 06:24:03

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

On 06/02/2012 02:06 PM, Kirill Korotaev wrote:

>
> On Jun 2, 2012, at 09:59 , Jeff Liu wrote:
>
>> Hi Serge,
>>
>> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>>
>>> Quoting Jan Kara ([email protected]):
>>>> Hello,
>>>>
>>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>>>> namespace rather than cgroup.
>>>>>
>>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>>>> However, they are something has to be done at user tools too IMHO.
>>>>>
>>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>>>> feedbacks from you guys.
>>>>>
>>>>> Hopefully I can clarify my ideas clearly.
>>>> So what I miss in this introductory email is some highlevel description
>>>> like what is the desired functionality you try to implement and what is it
>>>> good for. Looking at the examples below, it seems you want to be able to
>>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>>>> right?
>>>>
>>>> If yes, then I would like to understand one thing: When writing to a
>>>> file, used space is accounted to the owner of the file. Now how do we
>>>> determine owning namespace? Do you implicitely assume that only processes
>>>> from one namespace will be able to access the file?
>>>>
>>>> Honza
>>>
>>> Not having looked closely at the original patchset, let me ask - is this
>>> feature going to be a freebie with Eric's usernamespace patches?
>>
>> It we can reach a consensus to bind quota on mount namespace for
>> container or other things maybe.
>
> 1. OpenVZ doesn't use mount namespaces and still has quotas per container.

AFAICS, OpenVZ ships its own quota tools to provide this feature.

>
> 2. BTW, have you seen Dmitry Monakhov patches for same containers quotas via additional inode attribute? it allows to make it journaled.

You mean the directory/project quota on ext4?
If yes, I saw this feature at the end of last year on the ext4
mailing list.

> How quotas are stored in your case?

It is simply cached in memory for now. I think it could also be tweaked
to be journaled by introducing corresponding quota_read/quota_write
routines for the particular journaling file system.

>
> 3. I tend to think nowdays such quotas maybe of less need. Quota code doesn't scale well. And it's easier to put container in image file (as OpenVZ recently introduced).

Such requirements have been raised on the LXC mailing list recently.
Directory quota is pretty cool, and it is also useful from the
containers' perspective.

However, those are two different quota mechanisms.

"Quota code doesn't scale well".
Do you mean that it has a global locking mechanism and only one quota
structure to bill quota for all file systems with VFS quota enabled?

I noticed that OpenVZ has introduced an image file to provide container
quota, especially with container migration in mind.
However, could it be a general solution for LXC?


Thanks,
-Jeff

>
> Thanks,
> Kirill
>

2012-06-02 15:21:17

by Kirill Korotaev

[permalink] [raw]
Subject: Re: container disk quota

>>>>
>>>> Not having looked closely at the original patchset, let me ask - is this
>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>
>>> It we can reach a consensus to bind quota on mount namespace for
>>> container or other things maybe.
>>
>> 1. OpenVZ doesn't use mount namespaces and still has quotas per container.
>
> AFAICS, OpenVZ has self-released quota tools to supply this feature.

But standard quota tools work inside a container w/o any modifications.
This is very important for us, because we run unmodified distros inside.

Actually, this is unrelated. I meant that OpenVZ needs the ability to have group quotas w/o mount namespaces.

>
>>
>> 2. BTW, have you seen Dmitry Monakhov patches for same containers quotas via additional inode attribute? it allows to make it journaled.
>
> You means the directly/project quota on ext4?
> If yes, I have observed this feature back to the end of last year in
> EXT4 mail list.

yes

>
>> How quotas are stored in your case?
>
> It simply cached at memory for now, it also can be tweak up to journaled
> I think, if introducing corresponding routines quota_read/quota_write to
> particular journal file system.

Quotas that are only cached are bad: you are never sure they are correct.
Journaled quotas (as standard) are much better.

>
>>
>> 3. I tend to think nowdays such quotas maybe of less need. Quota code doesn't scale well. And it's easier to put container in image file (as OpenVZ recently introduced).
>
> There have such requirements dropped to LXC mail list nowadays.
> Directory quota is pretty cool and it also useful to containers perspective.
>
> However, that's two different quota mechanism.
>
> "Quota code doesn't scale well".
> Do you means it have global locking mechanism and only quota structure
> to bill up quota for all file systems with VFS quota enabled?

yes.

Kirill

2012-06-03 04:23:39

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

Hi Kirill,

On 06/02/2012 11:21 PM, Kirill Korotaev wrote:

>>>>>
>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>
>>>> It we can reach a consensus to bind quota on mount namespace for
>>>> container or other things maybe.
>>>
>>> 1. OpenVZ doesn't use mount namespaces and still has quotas per container.
>>
>> AFAICS, OpenVZ has self-released quota tools to supply this feature.
>
> but standard quota tools work inside container w/o any modifications.
> This is very important for us, cause we run unmodified distros inside.

Yes, I agree.
I can work out new patches so that the quota tools work on top of the mount namespace w/o any modification.

>
> Actually, this is unrelated. I meant that OpenVZ needs ability to have group quotas w/o mount namespaces.
>
>>
>>>
>>> 2. BTW, have you seen Dmitry Monakhov patches for same containers quotas via additional inode attribute? it allows to make it journaled.
>>
>> You means the directly/project quota on ext4?
>> If yes, I have observed this feature back to the end of last year in
>> EXT4 mail list.
>
> yes
>
>>
>>> How quotas are stored in your case?
>>
>> It simply cached at memory for now, it also can be tweak up to journaled
>> I think, if introducing corresponding routines quota_read/quota_write to
>> particular journal file system.
>
> just cached quotas are bad - you never sure they are correct.
> journaled quotas (as standart) are much better.

Exactly.

>
>>
>>>
>>> 3. I tend to think nowdays such quotas maybe of less need. Quota code doesn't scale well. And it's easier to put container in image file (as OpenVZ recently introduced).
>>
>> There have such requirements dropped to LXC mail list nowadays.
>> Directory quota is pretty cool and it also useful to containers perspective.
>>
>> However, that's two different quota mechanism.
>>
>> "Quota code doesn't scale well".
>> Do you means it have global locking mechanism and only quota structure
>> to bill up quota for all file systems with VFS quota enabled?
>
> yes.

That also means there is a potential opportunity to improve scalability.

Thanks for your info!
-Jeff

>
> Kirill
>

2012-06-03 05:47:56

by Kirill Korotaev

[permalink] [raw]
Subject: Re: container disk quota


On Jun 3, 2012, at 08:23 , Jeff Liu wrote:

> Hi Kirill,
>
> On 06/02/2012 11:21 PM, Kirill Korotaev wrote:
>
>>>>>>
>>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>>
>>>>> It we can reach a consensus to bind quota on mount namespace for
>>>>> container or other things maybe.
>>>>
>>>> 1. OpenVZ doesn't use mount namespaces and still has quotas per container.
>>>
>>> AFAICS, OpenVZ has self-released quota tools to supply this feature.
>>
>> but standard quota tools work inside container w/o any modifications.
>> This is very important for us, cause we run unmodified distros inside.
>
> Yes, am agree.
> I can work out a new patches regarding quota tools based on mount namespace w/o any modification.

Jeff, why do you need fs namespace for quotas? OpenVZ works w/o it.
It sounds like too strict a use case limitation. Or do I understand your patchset description wrong?

Thanks,
Kirill

2012-06-03 06:02:33

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

On 06/03/2012 01:47 PM, Kirill Korotaev wrote:

>
> On Jun 3, 2012, at 08:23 , Jeff Liu wrote:
>
>> Hi Kirill,
>>
>> On 06/02/2012 11:21 PM, Kirill Korotaev wrote:
>>
>>>>>>>
>>>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>>>
>>>>>> It we can reach a consensus to bind quota on mount namespace for
>>>>>> container or other things maybe.
>>>>>
>>>>> 1. OpenVZ doesn't use mount namespaces and still has quotas per container.
>>>>
>>>> AFAICS, OpenVZ has self-released quota tools to supply this feature.
>>>
>>> but standard quota tools work inside container w/o any modifications.
>>> This is very important for us, cause we run unmodified distros inside.
>>
>> Yes, am agree.
>> I can work out a new patches regarding quota tools based on mount namespace w/o any modification.
>
> Jeff, why do you need fs namespace for quotas? OpenVZ works w/o it.

This idea comes from Glauber's comments on my previous patches, which
bound quota to cgroup, and per my tryout, it really works to some
extent.

> It sounds as too strict use case limitation. Or do I understand your patchset description wrong?

As I understand it, binding quota to the mount namespace is another
quota strategy; containers are one potential use case, and it could
perhaps be used for other things.

I will certainly study OpenVZ's quota code.

Thanks,
-Jeff

>
> Thanks,
> Kirill
>

2012-06-03 09:50:45

by Glauber Costa

[permalink] [raw]
Subject: Re: container disk quota

On 06/03/2012 09:47 AM, Kirill Korotaev wrote:
>
> On Jun 3, 2012, at 08:23 , Jeff Liu wrote:
>
>> Hi Kirill,
>>
>> On 06/02/2012 11:21 PM, Kirill Korotaev wrote:
>>
>>>>>>>
>>>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>>>
>>>>>> It we can reach a consensus to bind quota on mount namespace for
>>>>>> container or other things maybe.
>>>>>
>>>>> 1. OpenVZ doesn't use mount namespaces and still has quotas per container.
>>>>
>>>> AFAICS, OpenVZ has self-released quota tools to supply this feature.
>>>
>>> but standard quota tools work inside container w/o any modifications.
>>> This is very important for us, cause we run unmodified distros inside.
>>
>> Yes, am agree.
>> I can work out a new patches regarding quota tools based on mount namespace w/o any modification.
>
> Jeff, why do you need fs namespace for quotas? OpenVZ works w/o it.
> It sounds as too strict use case limitation. Or do I understand your patchset description wrong?
>

OpenVZ has its own kernel, with its very own isolation capabilities.
So in upstream, something has to be used instead.



2012-06-04 02:57:16

by Serge Hallyn

[permalink] [raw]
Subject: Re: container disk quota

Quoting Jeff Liu ([email protected]):
> Hi Serge,
>
> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>
> > Quoting Jan Kara ([email protected]):
> >> Hello,
> >>
> >> On Wed 30-05-12 22:58:54, [email protected] wrote:
> >>> According to glauber's comments regarding container disk quota, it should be binded to mount
> >>> namespace rather than cgroup.
> >>>
> >>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
> >>> However, they are something has to be done at user tools too IMHO.
> >>>
> >>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
> >>> feedbacks from you guys.
> >>>
> >>> Hopefully I can clarify my ideas clearly.
> >> So what I miss in this introductory email is some highlevel description
> >> like what is the desired functionality you try to implement and what is it
> >> good for. Looking at the examples below, it seems you want to be able to
> >> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
> >> right?
> >>
> >> If yes, then I would like to understand one thing: When writing to a
> >> file, used space is accounted to the owner of the file. Now how do we
> >> determine owning namespace? Do you implicitely assume that only processes
> >> from one namespace will be able to access the file?
> >>
> >> Honza
> >
> > Not having looked closely at the original patchset, let me ask - is this
> > feature going to be a freebie with Eric's usernamespace patches?
>
> It we can reach a consensus to bind quota on mount namespace for
> container or other things maybe.
> I think it definitely should depends on user namespace.
>
> >
> > There, a container can be started in its own user namespace. It's uid
> > 1000 will be mapped to something like 1101000 on the host. So the actual
> > uid against who the quota is counted is 1101000. In another container,
> > uid 1000 will be mapped to 1201000, and again quota will be counted against
> > 1201000.
>
> Is it also an implications that we can examine do container quota or not
> based on the uid/gid number?

I'm sorry I don't understand the question.

As an attempt at an answer: the quota code wouldn't change at all. We would
simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
which is different from the real uid 102100 assigned to uid 1000 in container2
and from real uid 1000 (uid 1000 on the host).
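
The mapping described above can be sketched as follows. This is only an illustration: the three-column "inside outside length" layout is modeled on the /proc/<pid>/uid_map format, and the base values 1100000 and 1200000 are hypothetical, chosen to match the earlier example in this thread where uid 1000 maps to 1101000 and 1201000. The helper names are made up for the sketch.

```python
from typing import List, NamedTuple, Optional


class UidMapEntry(NamedTuple):
    inside_start: int   # first uid as seen inside the container
    outside_start: int  # corresponding first uid on the host
    length: int         # number of consecutive uids covered


def parse_uid_map(text: str) -> List[UidMapEntry]:
    """Parse lines of the form 'inside outside length'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(UidMapEntry(*(int(f) for f in fields)))
    return entries


def to_host_uid(entries: List[UidMapEntry], container_uid: int) -> Optional[int]:
    """Return the host uid that quota would be charged to, or None if unmapped."""
    for e in entries:
        if e.inside_start <= container_uid < e.inside_start + e.length:
            return e.outside_start + (container_uid - e.inside_start)
    return None


# Hypothetical maps for two containers (assumed values, not from any patch).
CONTAINER1_MAP = parse_uid_map("0 1100000 65536\n")
CONTAINER2_MAP = parse_uid_map("0 1200000 65536\n")
```

Because the two containers resolve uid 1000 to different host uids, the existing per-uid quota accounting keeps them apart without any change to the quota code itself.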

> > Note that this won't work with bind mounts, as a file can only be owned
> > by one uid, be it 1000, 1101000, or 1201000. So for the quota to work
> > each container would need its own files. (Of course the underlying
> > metadata can be shared through whatever ways - btrfs, lvm snapshotting,
> > etc)
>
> Do you means that we can not bind mount outside files to container for
> as general adquot.user/adquot.group purpose?

Right, not without some sort of stackable filesystem which masks the uid.

Actually there may be a way around it (simply provide a mount option,
requiring privilege in the original user namespace, saying mask uid x to
look like uid y for this bind mount), but it's too early to say how
cleanly that could be done.

> If so, per glauber's comments, bind quota to mount namespace should be a
> generic feature, and container just one of users could make use of it.
>
> Again, if bind quota to mount namespace is on right direction, and it
> only does make sense to container for now, maybe we don't need such
> files. IMHO, container is a lightweight virtualization solution, maybe
> its fine to make it as simple as possible. If the server admin need to
> configure hundreds of user/group dquot per container, perhaps he should
> consider KVM/XEN.

Server admin doesn't need to do that.

-serge

2012-06-04 04:46:49

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

On 06/04/2012 10:57 AM, Serge Hallyn wrote:

> Quoting Jeff Liu ([email protected]):
>> Hi Serge,
>>
>> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>>
>>> Quoting Jan Kara ([email protected]):
>>>> Hello,
>>>>
>>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>>>> namespace rather than cgroup.
>>>>>
>>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>>>> However, they are something has to be done at user tools too IMHO.
>>>>>
>>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>>>> feedbacks from you guys.
>>>>>
>>>>> Hopefully I can clarify my ideas clearly.
>>>> So what I miss in this introductory email is some highlevel description
>>>> like what is the desired functionality you try to implement and what is it
>>>> good for. Looking at the examples below, it seems you want to be able to
>>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>>>> right?
>>>>
>>>> If yes, then I would like to understand one thing: When writing to a
>>>> file, used space is accounted to the owner of the file. Now how do we
>>>> determine owning namespace? Do you implicitely assume that only processes
>>>> from one namespace will be able to access the file?
>>>>
>>>> Honza
>>>
>>> Not having looked closely at the original patchset, let me ask - is this
>>> feature going to be a freebie with Eric's usernamespace patches?
>>
>> It we can reach a consensus to bind quota on mount namespace for
>> container or other things maybe.
>> I think it definitely should depends on user namespace.
>>
>>>
>>> There, a container can be started in its own user namespace. It's uid
>>> 1000 will be mapped to something like 1101000 on the host. So the actual
>>> uid against who the quota is counted is 1101000. In another container,
>>> uid 1000 will be mapped to 1201000, and again quota will be counted against
>>> 1201000.
>>
>> Is it also an implications that we can examine do container quota or not
>> based on the uid/gid number?
>
> I'm sorry I don't understand the question.

Sorry for my poor English.

>
> As an attempt at an answer: the quota code wouldn't change at all. We would
> simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
> which is different from the real uid 102100 assigned to uid 1000 in container2
> and from real uid 1000 (uid 1000 on the host).

In that case, it looks like we only need to figure out how to let the
quota tools work inside a container.
I'll build a new kernel with user_ns to give it a try.

>
>>> Note that this won't work with bind mounts, as a file can only be owned
>>> by one uid, be it 1000, 1101000, or 1201000. So for the quota to work
>>> each container would need its own files. (Of course the underlying
>>> metadata can be shared through whatever ways - btrfs, lvm snapshotting,
>>> etc)
>>
>> Do you means that we can not bind mount outside files to container for
>> as general adquot.user/adquot.group purpose?
>
> Right, not without some sort of stackable filesystem which masks the uid.
>
> Actually there may be a way around it (simply provide a mount option,
> requiring privilege in the original user namespace, saying mask uid x to
> look like uid y for this bind mount), but it's too early to say how
> cleanly that could be done.

>
>> If so, per glauber's comments, bind quota to mount namespace should be a
>> generic feature, and container just one of users could make use of it.
>>
>> Again, if bind quota to mount namespace is on right direction, and it
>> only does make sense to container for now, maybe we don't need such
>> files. IMHO, container is a lightweight virtualization solution, maybe
>> its fine to make it as simple as possible. If the server admin need to
>> configure hundreds of user/group dquot per container, perhaps he should
>> consider KVM/XEN.
>
> Server admin doesn't need to do that.

Thanks for the info!

-Jeff

>
> -serge

2012-06-04 09:42:24

by Jan Kara

[permalink] [raw]
Subject: Re: container disk quota

On Mon 04-06-12 12:46:49, Jeff Liu wrote:
> On 06/04/2012 10:57 AM, Serge Hallyn wrote:
> > Quoting Jeff Liu ([email protected]):
> >> Hi Serge,
> >>
> >> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
> >>
> >>> Quoting Jan Kara ([email protected]):
> >>>> Hello,
> >>>>
> >>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
> >>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
> >>>>> namespace rather than cgroup.
> >>>>>
> >>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
> >>>>> However, they are something has to be done at user tools too IMHO.
> >>>>>
> >>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
> >>>>> feedbacks from you guys.
> >>>>>
> >>>>> Hopefully I can clarify my ideas clearly.
> >>>> So what I miss in this introductory email is some highlevel description
> >>>> like what is the desired functionality you try to implement and what is it
> >>>> good for. Looking at the examples below, it seems you want to be able to
> >>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
> >>>> right?
> >>>>
> >>>> If yes, then I would like to understand one thing: When writing to a
> >>>> file, used space is accounted to the owner of the file. Now how do we
> >>>> determine owning namespace? Do you implicitely assume that only processes
> >>>> from one namespace will be able to access the file?
> >>>>
> >>>> Honza
> >>>
> >>> Not having looked closely at the original patchset, let me ask - is this
> >>> feature going to be a freebie with Eric's usernamespace patches?
> >>
> >> It we can reach a consensus to bind quota on mount namespace for
> >> container or other things maybe.
> >> I think it definitely should depends on user namespace.
> >>
> >>>
> >>> There, a container can be started in its own user namespace. It's uid
> >>> 1000 will be mapped to something like 1101000 on the host. So the actual
> >>> uid against who the quota is counted is 1101000. In another container,
> >>> uid 1000 will be mapped to 1201000, and again quota will be counted against
> >>> 1201000.
> >>
> >> Is it also an implications that we can examine do container quota or not
> >> based on the uid/gid number?
> >
> > I'm sorry I don't understand the question.
>
> Sorry for my poor english.
>
> >
> > As an attempt at an answer: the quota code wouldn't change at all. We would
> > simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
> > which is different from the real uid 102100 assigned to uid 1000 in container2
> > and from real uid 1000 (uid 1000 on the host).
>
> In that case, looks we only need to figure out how to let quota tools
> works at container.
> I'll build a new kernel with user_ns to give a try.
GETQUOTA or SETQUOTA quotactls should work just fine inside a container
(for those quota-tools just need access to /proc/mounts). QUOTAON should
also work for e.g. XFS or ext4 with hidden quota files. When quota files
are visible in fs namespace (as for ext3 or so), things would be a bit
tricky because they possibly won't be visible from the container, and QUOTAON
needs that.
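
As a rough illustration of why access to /proc/mounts matters to the tools, the sketch below scans mount-table text for entries carrying quota mount options. The option names listed are the common ones (usrquota, grpquota, and the journaled variants); the sample table and the function name are assumptions made for this example, and real quota-tools do considerably more than this.

```python
# Mount options commonly used to request quota (assumed list for this sketch).
QUOTA_OPTIONS = ("quota", "usrquota", "grpquota", "usrjquota", "grpjquota")


def quota_mounts(mounts_text):
    """Return (device, mountpoint, fstype, quota_options) for quota-enabled mounts."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount table line
        device, mountpoint, fstype, options = fields[:4]
        # Options look like "rw,usrjquota=aquota.user,jqfmt=vfsv0"; keep the names only.
        names = [opt.split("=", 1)[0] for opt in options.split(",")]
        enabled = [name for name in names if name in QUOTA_OPTIONS]
        if enabled:
            result.append((device, mountpoint, fstype, enabled))
    return result


# Hypothetical mount table as it might appear inside a container.
SAMPLE = """\
rootfs / rootfs rw 0 0
/dev/sda1 / ext4 rw,relatime,usrquota,grpquota 0 0
/dev/sdb1 /srv xfs rw,noatime 0 0
"""

if __name__ == "__main__":
    for entry in quota_mounts(SAMPLE):
        print(entry)
```

If the container's mount table does not expose the real device (for example, only a "rootfs" line is visible), a scan like this finds nothing to operate on, which is one reason the tools need a usable /proc/mounts.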

Also with QUOTAON there is the fundamental problem that quotas either are or
are not enabled for the whole filesystem. So probably the only reasonable
choice when you would like to support quotas in the container would be to
have quotas enabled all the time, and inside the container, you would just
set some quota limits or you won't...
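
A toy model of that suggestion, assuming accounting is always on and a limit of 0 means "no limit" (the convention the standard quota tools use): usage is tracked for every owner regardless, and enforcement only starts once somebody sets a limit. The class and field names are invented for this sketch and do not correspond to kernel structures.

```python
from dataclasses import dataclass


@dataclass
class Dquot:
    """Per-owner usage record; hypothetical simplification of a quota entry."""
    used_blocks: int = 0
    hard_limit: int = 0  # 0 means no limit has been set

    def charge(self, blocks: int) -> bool:
        """Account an allocation; refuse it only if a limit is set and exceeded."""
        new_total = self.used_blocks + blocks
        if self.hard_limit and new_total > self.hard_limit:
            return False  # allocation would exceed the hard limit
        self.used_blocks = new_total
        return True


if __name__ == "__main__":
    dq = Dquot()
    dq.charge(100)        # accounted even though no limit is set
    dq.hard_limit = 150   # limit set later, e.g. from inside the container
    print(dq.charge(100), dq.used_blocks)
```

The point of the design is that setting a limit later never requires a rescan, because the usage figure was kept current the whole time.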

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-06-04 13:35:06

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

On 06/04/2012 05:42 PM, Jan Kara wrote:

> On Mon 04-06-12 12:46:49, Jeff Liu wrote:
>> On 06/04/2012 10:57 AM, Serge Hallyn wrote:
>>> Quoting Jeff Liu ([email protected]):
>>>> Hi Serge,
>>>>
>>>> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>>>>
>>>>> Quoting Jan Kara ([email protected]):
>>>>>> Hello,
>>>>>>
>>>>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>>>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>>>>>> namespace rather than cgroup.
>>>>>>>
>>>>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>>>>>> However, they are something has to be done at user tools too IMHO.
>>>>>>>
>>>>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>>>>>> feedbacks from you guys.
>>>>>>>
>>>>>>> Hopefully I can clarify my ideas clearly.
>>>>>> So what I miss in this introductory email is some highlevel description
>>>>>> like what is the desired functionality you try to implement and what is it
>>>>>> good for. Looking at the examples below, it seems you want to be able to
>>>>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>>>>>> right?
>>>>>>
>>>>>> If yes, then I would like to understand one thing: When writing to a
>>>>>> file, used space is accounted to the owner of the file. Now how do we
>>>>>> determine owning namespace? Do you implicitely assume that only processes
>>>>>> from one namespace will be able to access the file?
>>>>>>
>>>>>> Honza
>>>>>
>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>
>>>> It we can reach a consensus to bind quota on mount namespace for
>>>> container or other things maybe.
>>>> I think it definitely should depends on user namespace.
>>>>
>>>>>
>>>>> There, a container can be started in its own user namespace. It's uid
>>>>> 1000 will be mapped to something like 1101000 on the host. So the actual
>>>>> uid against who the quota is counted is 1101000. In another container,
>>>>> uid 1000 will be mapped to 1201000, and again quota will be counted against
>>>>> 1201000.
>>>>
>>>> Is it also an implications that we can examine do container quota or not
>>>> based on the uid/gid number?
>>>
>>> I'm sorry I don't understand the question.
>>
>> Sorry for my poor english.
>>
>>>
>>> As an attempt at an answer: the quota code wouldn't change at all. We would
>>> simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
>>> which is different from the real uid 102100 assigned to uid 1000 in container2
>>> and from real uid 1000 (uid 1000 on the host).
>>
>> In that case, looks we only need to figure out how to let quota tools
>> works at container.
>> I'll build a new kernel with user_ns to give a try.
> GETQUOTA or SETQUOTA quotactls should work just fine inside a container
> (for those quota-tools just need access to /proc/mounts). QUOTAON should
> also work for e.g. XFS or ext4 with hidden quota files. When quota files
> are visible in fs namespace (as for ext3 or so), things would be a bit
> tricky because they won't be possibly visible from container and QUOTAON
> needs that.

I still wonder whether we can cache the container dquots in memory to make this feature as simple as possible. :)

Also, quotacheck is the major issue I faced previously, since we need a reasonable approach to calculate
and save the current inode/block usage first.
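
To make the cost of that step concrete, here is a minimal sketch of the kind of scan a quotacheck-style pass has to perform: walk the tree once and total inodes and 512-byte blocks per owning uid. It is not how quotacheck(8) is implemented; the function name is made up, and it only assumes the usual lstat fields (st_uid, st_blocks, st_dev, st_ino) available on Linux.

```python
import os
from collections import defaultdict


def scan_usage(root):
    """Return {uid: (inode_count, blocks_512)} for everything under root."""
    usage = defaultdict(lambda: [0, 0])
    seen = set()  # (device, inode) pairs, so hard links are counted once

    def account(path):
        st = os.lstat(path)  # lstat: charge the symlink itself, not its target
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return
        seen.add(key)
        entry = usage[st.st_uid]
        entry[0] += 1             # one more inode owned by this uid
        entry[1] += st.st_blocks  # allocated space in 512-byte units

    account(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            account(os.path.join(dirpath, name))
    return {uid: tuple(counts) for uid, counts in usage.items()}


if __name__ == "__main__":
    for uid, (inodes, blocks) in sorted(scan_usage(".").items()):
        print(uid, inodes, blocks)
```

A full walk like this touches every inode, which is why it is expensive on a large filesystem and awkward to run from inside a container that sees only part of the tree.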

>
> Also with QUOTAON there is the principial problem that quotas either are or
> are not enabled for the whole filesystem.

IMHO, we could provide uid/gid quota for the whole filesystem only (i.e., the "/" rootfs), and we could support project
quota among sub-directories in the future if possible.

> So probably the only reasonable
> choice when you would like to supporot quotas in the container would be to
> have quotas enabled all the time, and inside the container, you would just
> set some quota limits or you won't...

I remember that ext4 already supports quota as a first-class feature; it looks like we can treat container quota the same way.

So we can ignore the quotacheck step and only focus on setting up quota limits inside the container?


Thanks for the teaching!
-Jeff

>
> Honza

2012-06-04 13:56:18

by Jan Kara

[permalink] [raw]
Subject: Re: container disk quota

On Mon 04-06-12 21:35:06, Jeff Liu wrote:
> On 06/04/2012 05:42 PM, Jan Kara wrote:
>
> > On Mon 04-06-12 12:46:49, Jeff Liu wrote:
> >> On 06/04/2012 10:57 AM, Serge Hallyn wrote:
> >>> Quoting Jeff Liu ([email protected]):
> >>>> Hi Serge,
> >>>>
> >>>> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
> >>>>
> >>>>> Quoting Jan Kara ([email protected]):
> >>>>>> Hello,
> >>>>>>
> >>>>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
> >>>>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
> >>>>>>> namespace rather than cgroup.
> >>>>>>>
> >>>>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
> >>>>>>> However, they are something has to be done at user tools too IMHO.
> >>>>>>>
> >>>>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
> >>>>>>> feedbacks from you guys.
> >>>>>>>
> >>>>>>> Hopefully I can clarify my ideas clearly.
> >>>>>> So what I miss in this introductory email is some highlevel description
> >>>>>> like what is the desired functionality you try to implement and what is it
> >>>>>> good for. Looking at the examples below, it seems you want to be able to
> >>>>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
> >>>>>> right?
> >>>>>>
> >>>>>> If yes, then I would like to understand one thing: When writing to a
> >>>>>> file, used space is accounted to the owner of the file. Now how do we
> >>>>>> determine owning namespace? Do you implicitely assume that only processes
> >>>>>> from one namespace will be able to access the file?
> >>>>>>
> >>>>>> Honza
> >>>>>
> >>>>> Not having looked closely at the original patchset, let me ask - is this
> >>>>> feature going to be a freebie with Eric's usernamespace patches?
> >>>>
> >>>> It we can reach a consensus to bind quota on mount namespace for
> >>>> container or other things maybe.
> >>>> I think it definitely should depends on user namespace.
> >>>>
> >>>>>
> >>>>> There, a container can be started in its own user namespace. It's uid
> >>>>> 1000 will be mapped to something like 1101000 on the host. So the actual
> >>>>> uid against who the quota is counted is 1101000. In another container,
> >>>>> uid 1000 will be mapped to 1201000, and again quota will be counted against
> >>>>> 1201000.
> >>>>
> >>>> Is it also an implications that we can examine do container quota or not
> >>>> based on the uid/gid number?
> >>>
> >>> I'm sorry I don't understand the question.
> >>
> >> Sorry for my poor english.
> >>
> >>>
> >>> As an attempt at an answer: the quota code wouldn't change at all. We would
> >>> simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
> >>> which is different from the real uid 102100 assigned to uid 1000 in container2
> >>> and from real uid 1000 (uid 1000 on the host).
> >>
> >> In that case, looks we only need to figure out how to let quota tools
> >> works at container.
> >> I'll build a new kernel with user_ns to give a try.
> > GETQUOTA or SETQUOTA quotactls should work just fine inside a container
> > (for those quota-tools just need access to /proc/mounts). QUOTAON should
> > also work for e.g. XFS or ext4 with hidden quota files. When quota files
> > are visible in fs namespace (as for ext3 or so), things would be a bit
> > tricky because they won't be possibly visible from container and QUOTAON
> > needs that.
>
> I still think if we can cache container dquot on memory to make this
> feature as simple as possible. :)
Sorry, I don't understand. Quota structures are already cached in memory.
Also, what would become simpler if you did some additional caching in a
container?

> And also, quotacheck is the major issue I have faced previously, since we need a reasonable approach to calculate
> and save the current inodes/blocks usage firstly.
Yes, quotacheck inside a container is a problem. But as with quotaon(8),
I think such a global operation should rather be done outside the container.

> > Also with QUOTAON there is the principial problem that quotas either are or
> > are not enabled for the whole filesystem.
>
> IMHO, we could supply uid/gid quota for the whole filesystem only(i.e,
> the "/" rootfs), and we can support project quota among sub-directories
> in the future if possible.
>
> > So probably the only reasonable
> > choice when you would like to supporot quotas in the container would be to
> > have quotas enabled all the time, and inside the container, you would just
> > set some quota limits or you won't...
>
> I remember that ext4 has already supported quota as the first class,
> looks we can consider container quota same to that.
>
> So we can ignore the quotacheck step, only focus on quota limits setup
> inside container?
Yes, that would be my suggestion.
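
As a rough illustration of what "only set quota limits" means from
userspace, here is a minimal sketch around quotactl(2) with Q_SETQUOTA.
The helper names fill_limits() and set_user_limits() are made up for this
example. Whether the call is permitted from inside a container depends on
the kernel's permission checks, which this sketch does not address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/quota.h>

/* Hypothetical helper: fill in limits only, leaving usage fields at zero. */
void fill_limits(struct dqblk *dq, uint64_t bsoft, uint64_t bhard,
                 uint64_t isoft, uint64_t ihard)
{
    assert(dq != NULL);
    memset(dq, 0, sizeof(*dq));
    dq->dqb_bsoftlimit = bsoft;   /* block limits, in quota blocks */
    dq->dqb_bhardlimit = bhard;
    dq->dqb_isoftlimit = isoft;   /* inode limits */
    dq->dqb_ihardlimit = ihard;
    /* Tell the kernel that only the limit fields are meaningful. */
    dq->dqb_valid = QIF_LIMITS;
}

/*
 * Hypothetical helper: apply the limits to one uid on the filesystem
 * backed by the block device `special`. Returns 0 on success, -1 with
 * errno set on failure (for example EPERM without enough privilege).
 */
int set_user_limits(const char *special, uid_t uid, struct dqblk *dq)
{
    return quotactl(QCMD(Q_SETQUOTA, USRQUOTA), special, (int)uid, (char *)dq);
}
```

With user namespaces, the uid passed here would be the mapped one, e.g.
1101000 for uid 1000 of the first container in Serge's example. Usage
accounting stays with the filesystem, so no quotacheck step is involved.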

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-06-04 14:55:32

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

On 06/04/2012 09:56 PM, Jan Kara wrote:

> On Mon 04-06-12 21:35:06, Jeff Liu wrote:
>> On 06/04/2012 05:42 PM, Jan Kara wrote:
>>
>>> On Mon 04-06-12 12:46:49, Jeff Liu wrote:
>>>> On 06/04/2012 10:57 AM, Serge Hallyn wrote:
>>>>> Quoting Jeff Liu ([email protected]):
>>>>>> Hi Serge,
>>>>>>
>>>>>> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>>>>>>
>>>>>>> Quoting Jan Kara ([email protected]):
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>>>>>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>>>>>>>> namespace rather than cgroup.
>>>>>>>>>
>>>>>>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>>>>>>>> However, they are something has to be done at user tools too IMHO.
>>>>>>>>>
>>>>>>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>>>>>>>> feedbacks from you guys.
>>>>>>>>>
>>>>>>>>> Hopefully I can clarify my ideas clearly.
>>>>>>>> So what I miss in this introductory email is some highlevel description
>>>>>>>> like what is the desired functionality you try to implement and what is it
>>>>>>>> good for. Looking at the examples below, it seems you want to be able to
>>>>>>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>>>>>>>> right?
>>>>>>>>
>>>>>>>> If yes, then I would like to understand one thing: When writing to a
>>>>>>>> file, used space is accounted to the owner of the file. Now how do we
>>>>>>>> determine owning namespace? Do you implicitely assume that only processes
>>>>>>>> from one namespace will be able to access the file?
>>>>>>>>
>>>>>>>> Honza
>>>>>>>
>>>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>>>
>>>>>> It we can reach a consensus to bind quota on mount namespace for
>>>>>> container or other things maybe.
>>>>>> I think it definitely should depends on user namespace.
>>>>>>
>>>>>>>
>>>>>>> There, a container can be started in its own user namespace. It's uid
>>>>>>> 1000 will be mapped to something like 1101000 on the host. So the actual
>>>>>>> uid against who the quota is counted is 1101000. In another container,
>>>>>>> uid 1000 will be mapped to 1201000, and again quota will be counted against
>>>>>>> 1201000.
>>>>>>
>>>>>> Is it also an implications that we can examine do container quota or not
>>>>>> based on the uid/gid number?
>>>>>
>>>>> I'm sorry I don't understand the question.
>>>>
>>>> Sorry for my poor english.
>>>>
>>>>>
>>>>> As an attempt at an answer: the quota code wouldn't change at all. We would
>>>>> simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
>>>>> which is different from the real uid 102100 assigned to uid 1000 in container2
>>>>> and from real uid 1000 (uid 1000 on the host).
>>>>
>>>> In that case, looks we only need to figure out how to let quota tools
>>>> works at container.
>>>> I'll build a new kernel with user_ns to give a try.
>>> GETQUOTA or SETQUOTA quotactls should work just fine inside a container
>>> (for those quota-tools just need access to /proc/mounts). QUOTAON should
>>> also work for e.g. XFS or ext4 with hidden quota files. When quota files
>>> are visible in fs namespace (as for ext3 or so), things would be a bit
>>> tricky because they won't be possibly visible from container and QUOTAON
>>> needs that.
>>
>> I still think if we can cache container dquot on memory to make this
>> feature as simple as possible. :)
> Sorry, I don't understand. Quota structures are cached in memory.

I mean teaching the Q_SETQUOTA routine not to write that info to the quota
file if it was issued from a container during the quotacheck stage. Instead,
allocate a dquot object in memory and keep it until quotaoff or container
destruction, maybe.
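
To make the idea concrete, here is a small userspace model of it, not
kernel code and not taken from the patchset. All names (ns_dquot, ns_quota,
ns_quota_set_limit, ns_quota_charge, ns_quota_off) are hypothetical, and a
linked list stands in for whatever lookup structure would really be used.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory quota record for one uid inside a container. */
struct ns_dquot {
    uint32_t uid;
    uint64_t bhardlimit;      /* 0 means "no limit" */
    uint64_t curspace;        /* bytes currently charged */
    struct ns_dquot *next;
};

/* Hypothetical per-mount-namespace quota state. */
struct ns_quota {
    struct ns_dquot *head;
};

/* Find the record for uid, allocating it on first use. */
struct ns_dquot *ns_dquot_get(struct ns_quota *q, uint32_t uid)
{
    struct ns_dquot *d;

    assert(q != NULL);
    for (d = q->head; d != NULL; d = d->next)
        if (d->uid == uid)
            return d;
    d = calloc(1, sizeof(*d));
    if (d == NULL)
        return NULL;
    d->uid = uid;
    d->next = q->head;
    q->head = d;
    return d;
}

/* Q_SETQUOTA analogue: keep the limit in memory, write no quota file. */
int ns_quota_set_limit(struct ns_quota *q, uint32_t uid, uint64_t bhard)
{
    struct ns_dquot *d = ns_dquot_get(q, uid);

    if (d == NULL)
        return -ENOMEM;
    d->bhardlimit = bhard;
    return 0;
}

/* Charge bytes to uid; refuse if the hard limit would be exceeded. */
int ns_quota_charge(struct ns_quota *q, uint32_t uid, uint64_t bytes)
{
    struct ns_dquot *d = ns_dquot_get(q, uid);

    if (d == NULL)
        return -ENOMEM;
    if (d->bhardlimit != 0 && d->curspace + bytes > d->bhardlimit)
        return -EDQUOT;
    d->curspace += bytes;
    return 0;
}

/* quotaoff or container teardown analogue: drop all records. */
void ns_quota_off(struct ns_quota *q)
{
    assert(q != NULL);
    while (q->head != NULL) {
        struct ns_dquot *d = q->head;
        q->head = d->next;
        free(d);
    }
}
```

Nothing here survives ns_quota_off(), which is the point: the limits and
usage live only as long as the container's quota is switched on.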

> Also what would be simpler if you also do some caching in a container?

Sorry, does that mean doing the caching in quota files?
Currently, I have no good idea on this point. :(

>
>> And also, quotacheck is the major issue I have faced previously, since we need a reasonable approach to calculate
>> and save the current inodes/blocks usage firstly.
> Yes, quotacheck inside a container is a problem. But similarly as with
> quotaon(8), I think such global operation should rather be done outside.
>
>>> Also with QUOTAON there is the principial problem that quotas either are or
>>> are not enabled for the whole filesystem.
>>
>> IMHO, we could supply uid/gid quota for the whole filesystem only(i.e,
>> the "/" rootfs), and we can support project quota among sub-directories
>> in the future if possible.
>>
>>> So probably the only reasonable
>>> choice when you would like to supporot quotas in the container would be to
>>> have quotas enabled all the time, and inside the container, you would just
>>> set some quota limits or you won't...
>>
>> I remember that ext4 has already supported quota as the first class,
>> looks we can consider container quota same to that.
>>
>> So we can ignore the quotacheck step, only focus on quota limits setup
>> inside container?
> Yes, that would be my suggestion.

Yeah, that would be fine.

Thanks,
-Jeff

>
> Honza

2012-06-04 15:50:07

by Jeff Liu

[permalink] [raw]
Subject: Re: container disk quota

On 06/04/2012 10:55 PM, Jeff Liu wrote:

> On 06/04/2012 09:56 PM, Jan Kara wrote:
>
>> On Mon 04-06-12 21:35:06, Jeff Liu wrote:
>>> On 06/04/2012 05:42 PM, Jan Kara wrote:
>>>
>>>> On Mon 04-06-12 12:46:49, Jeff Liu wrote:
>>>>> On 06/04/2012 10:57 AM, Serge Hallyn wrote:
>>>>>> Quoting Jeff Liu ([email protected]):
>>>>>>> Hi Serge,
>>>>>>>
>>>>>>> On 06/02/2012 12:04 AM, Serge Hallyn wrote:
>>>>>>>
>>>>>>>> Quoting Jan Kara ([email protected]):
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> On Wed 30-05-12 22:58:54, [email protected] wrote:
>>>>>>>>>> According to glauber's comments regarding container disk quota, it should be binded to mount
>>>>>>>>>> namespace rather than cgroup.
>>>>>>>>>>
>>>>>>>>>> Per my try out, it works just fine by combining with userland quota utilitly in this way.
>>>>>>>>>> However, they are something has to be done at user tools too IMHO.
>>>>>>>>>>
>>>>>>>>>> Currently, the patchset is in very initial phase, I'd like to post it early to seek more
>>>>>>>>>> feedbacks from you guys.
>>>>>>>>>>
>>>>>>>>>> Hopefully I can clarify my ideas clearly.
>>>>>>>>> So what I miss in this introductory email is some highlevel description
>>>>>>>>> like what is the desired functionality you try to implement and what is it
>>>>>>>>> good for. Looking at the examples below, it seems you want to be able to
>>>>>>>>> set quota limits for namespace-uid (and also namespace-gid???) pairs, am I
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> If yes, then I would like to understand one thing: When writing to a
>>>>>>>>> file, used space is accounted to the owner of the file. Now how do we
>>>>>>>>> determine owning namespace? Do you implicitely assume that only processes
>>>>>>>>> from one namespace will be able to access the file?
>>>>>>>>>
>>>>>>>>> Honza
>>>>>>>>
>>>>>>>> Not having looked closely at the original patchset, let me ask - is this
>>>>>>>> feature going to be a freebie with Eric's usernamespace patches?
>>>>>>>
>>>>>>> It we can reach a consensus to bind quota on mount namespace for
>>>>>>> container or other things maybe.
>>>>>>> I think it definitely should depends on user namespace.
>>>>>>>
>>>>>>>>
>>>>>>>> There, a container can be started in its own user namespace. It's uid
>>>>>>>> 1000 will be mapped to something like 1101000 on the host. So the actual
>>>>>>>> uid against who the quota is counted is 1101000. In another container,
>>>>>>>> uid 1000 will be mapped to 1201000, and again quota will be counted against
>>>>>>>> 1201000.
>>>>>>>
>>>>>>> Is it also an implications that we can examine do container quota or not
>>>>>>> based on the uid/gid number?
>>>>>>
>>>>>> I'm sorry I don't understand the question.
>>>>>
>>>>> Sorry for my poor english.
>>>>>
>>>>>>
>>>>>> As an attempt at an answer: the quota code wouldn't change at all. We would
>>>>>> simply exploit the fact that uid 1000 in container1 has a real uid of 101100,
>>>>>> which is different from the real uid 102100 assigned to uid 1000 in container2
>>>>>> and from real uid 1000 (uid 1000 on the host).
>>>>>
>>>>> In that case, looks we only need to figure out how to let quota tools
>>>>> works at container.
>>>>> I'll build a new kernel with user_ns to give a try.
>>>> GETQUOTA or SETQUOTA quotactls should work just fine inside a container
>>>> (for those quota-tools just need access to /proc/mounts). QUOTAON should
>>>> also work for e.g. XFS or ext4 with hidden quota files. When quota files
>>>> are visible in fs namespace (as for ext3 or so), things would be a bit
>>>> tricky because they won't be possibly visible from container and QUOTAON
>>>> needs that.
>>>
>>> I still think if we can cache container dquot on memory to make this
>>> feature as simple as possible. :)
>> Sorry, I don't understand. Quota structures are cached in memory.
>
> I means teaching Q_SETQUOTA routine, don't write those info to quota
> file if it was issued from container in quotacheck stage. Instead,
> allocate a dquot object at memory and keep it until quotaoff or
> container destory procedures maybe.

Sorry, I must have misled you.
We cannot save quota usage info to an in-memory cache without changing the
quota tools.

Originally, I introduced a new format option ("lxc") to quotacheck, etc.,
so that if they were invoked with the -F "lxc" option, those tools would not
operate on a quota file path as usual.

Since we would not like to change the quota tools, that approach is
absolutely wrong.


Thanks,
-Jeff

>
>> Also what would be simpler if you also do some caching in a container?
>
> Sorry, does it means do caching in quota files?
> currently, I have no good idea in this point. :(
>
>>
>>> And also, quotacheck is the major issue I have faced previously, since we need a reasonable approach to calculate
>>> and save the current inodes/blocks usage firstly.
>> Yes, quotacheck inside a container is a problem. But similarly as with
>> quotaon(8), I think such global operation should rather be done outside.
>>
>>>> Also with QUOTAON there is the principial problem that quotas either are or
>>>> are not enabled for the whole filesystem.
>>>
>>> IMHO, we could supply uid/gid quota for the whole filesystem only(i.e,
>>> the "/" rootfs), and we can support project quota among sub-directories
>>> in the future if possible.
>>>
>>>> So probably the only reasonable
>>>> choice when you would like to supporot quotas in the container would be to
>>>> have quotas enabled all the time, and inside the container, you would just
>>>> set some quota limits or you won't...
>>>
>>> I remember that ext4 has already supported quota as the first class,
>>> looks we can consider container quota same to that.
>>>
>>> So we can ignore the quotacheck step, only focus on quota limits setup
>>> inside container?
>> Yes, that would be my suggestion.
>
> Yeah, that would be fine.
>
> Thanks,
> -Jeff
>
>>
>> Honza
>
>