2010-07-29 02:08:43

by Akira Fujita

Subject: BUG? ext3: Allocate blocks over quota limit with mmap

Hi,

I found a problem: a user can allocate blocks beyond the quota limit
on ext3 (and ext2) by using mmap.
You can reproduce it with the following steps:

1. Enable user quota on ext3
[akira@bsd086 mnt]$ uname -r
2.6.35-rc6

[root@bsd086 mnt]# cat /proc/mounts | grep /dev/sda9
/dev/sda9 /mnt/mp1 ext3 rw,relatime,errors=continue,barrier=0,data=ordered,usrquota 0 0

[root@bsd086 mnt]# quotaon -p /mnt/mp1
group quota on /mnt/mp1 (/dev/sda9) is off
user quota on /mnt/mp1 (/dev/sda9) is on

[root@bsd086 mnt]# repquota -v /mnt/mp1
*** Report for user quotas on device /dev/sda9
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --    1229       0       0              4     0     0
akira     --       0     100    1000              0     0     0


2. Create a sparse file on ext3
[akira@bsd086 mnt]$ df -T /mnt/mp1
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda9     ext3       23300      1236     20861   6% /mnt/mp1

[akira@bsd086 mnt]$ dd if=/dev/zero of=/mnt/mp1/file bs=4096 seek=1MB count=1

[akira@bsd086 mnt]$ ls -ls /mnt/mp1
total 26
7 -rw------- 1 root root 7168 Jul 28 15:53 aquota.user
7 -rw-rw-r-- 1 akira akira 4096004096 Jul 28 15:53 file
12 drwx------ 2 root root 12288 Jul 28 14:49 lost+found

[root@bsd086 mnt]# repquota -v /mnt/mp1
*** Report for user quotas on device /dev/sda9
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --    1228       0       0              3     0     0
akira     --       8     100    1000              2     0     0
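
The numbers above are consistent with a sparse file: dd's seek=1MB skips
1,000,000 blocks of 4096 bytes, so the file size is
1,000,000 * 4096 + 4096 = 4096004096 bytes, while only 8 KB are charged to
the user. A minimal sketch of how that gap can be checked from user space
follows. make_sparse_file() and file_usage() are hypothetical helper names,
not part of the original test, and the 512-byte st_blocks unit is the Linux
convention.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BS 4096 /* same block size as the dd command above */

/* Write one zero-filled block at block index `seek_blocks`, leaving a hole
 * in front of it (roughly what dd does with seek=N count=1).
 * Returns 0 on success, -1 on error. */
static int make_sparse_file(const char *path, off_t seek_blocks)
{
    static const char zeros[BS];
    int rc = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (lseek(fd, seek_blocks * (off_t)BS, SEEK_SET) != (off_t)-1 &&
        write(fd, zeros, BS) == BS)
        rc = 0;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Report the apparent size and the bytes actually allocated on disk.
 * Returns 0 on success, -1 if stat() fails. */
static int file_usage(const char *path, long long *apparent,
                      long long *allocated)
{
    struct stat st;

    assert(path && apparent && allocated);
    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512; /* 512-byte units on Linux */
    return 0;
}
```

On a filesystem that supports holes, `allocated` stays far below `apparent`
until the holes are written to.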

3. Write data to "file" with mmap and msync.
(This time the write size is 50 MB, which is larger than the partition size.)
e.g.
long long contents = 0x0002;
fd = open(file, O_APPEND | O_RDWR, 0666);
p = mmap(NULL, psize, PROT_WRITE, MAP_SHARED, fd, offset);
memset(p, contents++, psize);
offset += psize;
munmap(p, psize);
close(fd);
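
For anyone who wants to try this, the fragment above can be fleshed out
roughly as follows. This is a sketch under assumptions, not the original
test program: dirty_range_via_mmap() is a hypothetical name, the msync()
call is added because step 3 mentions msync, and error handling is minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Dirty `len` bytes of `path` starting at `offset` through a shared mapping.
 * `offset` must be a multiple of the page size and the range must lie
 * inside the file. Returns 0 on success, -1 on error (errno is set).
 */
static int dirty_range_via_mmap(const char *path, off_t offset, size_t len,
                                int byte)
{
    assert(len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The first store to each page takes a write fault and dirties the
     * page. Per this report, ext3 allocates no block at this point. */
    memset(p, byte, len);

    /* Ask for writeback now instead of waiting for the periodic flush. */
    int rc = msync(p, len, MS_SYNC);

    munmap(p, len);
    close(fd);
    return rc;
}
```

Calling it in a loop over the sparse file from step 2, advancing `offset`
by `len` each time, would correspond to the 50 MB write described above.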

4. The disk then runs out of space; the user has consumed all of the blocks.
[akira@bsd086 mnt]$ df -T /mnt/mp1
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda9     ext3       23300     23300         0 100% /mnt/mp1
                                   ~~~~~
[root@bsd086 mnt]# repquota -v /mnt/mp1
*** Report for user quotas on device /dev/sda9
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --    1228       0       0              3     0     0
akira     +-   22065     100    1000  6days       2     0     0
               ~~~~~

memset() after mmap() triggers a page fault, and __do_fault then marks
every page covering the range we touched as dirty.
After 5 seconds (or on sync), kjournald tries to write out all of the dirty pages,
allocating blocks on disk as it goes.
kjournald has the CAP_SYS_RESOURCE capability, so it can ignore
quota limits (and can also use the blocks reserved for root).
As a result, a user can hold blocks beyond the quota limit
even though quota is enabled.
Note: ext4 has its own page_mkwrite, so this problem does not occur there.
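
To make the bypass concrete, here is a toy model of a hard-limit check.
It is only an illustration: toy_charge_blocks() and its parameters are
hypothetical names, not kernel API, and `privileged` merely stands in for
"the caller holds CAP_SYS_RESOURCE".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Charge `nblocks` to a user whose current usage is `*used`.
 * A hard_limit of 0 means "no limit". Returns 0 on success or -EDQUOT
 * when an unprivileged caller would exceed the hard limit.
 */
static int toy_charge_blocks(long *used, long hard_limit, long nblocks,
                             bool privileged)
{
    assert(used && nblocks >= 0);

    if (!privileged && hard_limit && *used + nblocks > hard_limit)
        return -EDQUOT;   /* an ordinary write() would be refused here */

    *used += nblocks;     /* privileged writeback is charged regardless */
    return 0;
}
```

With the numbers from the report (8 blocks used, hard limit 1000), charging
another 22057 blocks is refused for an unprivileged caller but succeeds for
a privileged one, leaving usage at 22065.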

I believe kjournald's behavior is correct (it writes out all dirty pages of the file),
so we need to reconsider the page fault behavior for ext3 and ext2.

Is this a bug?

Regards,
Akira Fujita



2010-08-02 05:11:15

by Akira Fujita

Subject: Re: BUG? ext3: Allocate blocks over quota limit with mmap

Hi ext3 maintainers,

Could you look into this?
If it turns out not to be a problem, that is fine too.

Regards,
Akira Fujita



2010-08-02 05:22:16

by Dmitry Monakhov

Subject: Re: BUG? ext3: Allocate blocks over quota limit with mmap

Akira Fujita <[email protected]> writes:

> Hi ext3 maintainers,
>
> Could you look into this?
> If this is not a problem, it is good though.
Actually, this is a problem, because it makes quota just a fake
limit. I've run this test on ext4 and was satisfied with the result,
but was too lazy to run it on ext3/2 :(
At the very least we need a test case for this in xfstest-qa.
It seems that a private page_mkwrite will be sufficient.
I'm working on that.

2010-08-02 05:59:25

by Akira Fujita

Subject: Re: BUG? ext3: Allocate blocks over quota limit with mmap

Hi Dmitry,

> It seems that private page_mkwrite will be sufficient.

Agreed. This problem also breaks the "reserved blocks count" semantics,
so a private page_mkwrite for ext2/3 will be necessary.
Thank you for working on this.

Regards,
Akira Fujita



--
Akira Fujita <[email protected]>

The First Fundamental Software Development Group,
Software Development Division,
NEC Software Tohoku, Ltd.


2010-08-02 12:44:36

by Jan Kara

Subject: Re: BUG? ext3: Allocate blocks over quota limit with mmap

On Mon 02-08-10 09:22:12, Dmitry Monakhov wrote:
> Akira Fujita <[email protected]> writes:
>
> > Hi ext3 maintainers,
> >
> > Could you look into this?
> > If this is not a problem, it is good though.
> Actually this is a problem. Because this issue makes quota just a fake
> limit. I've done this test for ext4 and was satisfied with result,
> but was too lazy to perform it on ext3/2 :(
> At least we have to have testcase for that in xfstest-qa.
> It seems that private page_mkwrite will be sufficient.
> I'm working on that.
Yes, it's a long-standing bug. Another manifestation of the bug is that
we just throw away the user's data without warning if we really cannot find
space for it. Fixing it isn't completely trivial: doing block allocation
during page_mkwrite really sucks performance-wise (tried that), so we
basically have to implement delayed allocation for ext3 (and other
filesystems) for mmapped writes, doing reservation at page_mkwrite time and
allocation at writepage time. I already have patches doing that, but they
depended on the truncate rewrite patch series, and that was dragging on and
on for half a year or so. Now I guess it's the right time to rebase them and
start pushing them again...
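
The reserve-at-fault, allocate-at-writeback split can be pictured with a
small user-space model. This is a toy sketch only, not the patches mentioned
above: struct toy_quota and both functions are hypothetical names, and real
filesystem code would also have to reserve free space and handle truncate.

```c
#include <assert.h>
#include <errno.h>

/* Toy per-user block accounting. */
struct toy_quota {
    long used;       /* blocks already allocated on disk */
    long reserved;   /* blocks promised to dirty pages, not yet allocated */
    long hard_limit; /* 0 means "no limit" */
};

/* Fault time (the page_mkwrite step): promise one block or refuse.
 * Refusing here lets the faulting process see the error instead of
 * losing data silently at writeback. */
static int toy_reserve_block(struct toy_quota *q)
{
    if (q->hard_limit && q->used + q->reserved + 1 > q->hard_limit)
        return -EDQUOT;
    q->reserved++;
    return 0;
}

/* Writeback time (the writepage step): turn a promise into a real block.
 * No quota check is needed, because the limit was enforced at fault time. */
static void toy_allocate_reserved_block(struct toy_quota *q)
{
    assert(q->reserved > 0);
    q->reserved--;
    q->used++;
}
```

The point of the split is that the expensive allocation stays in writeback,
while the cheap counter update at fault time is enough to enforce the limit.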

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2010-08-02 12:47:13

by Jan Kara

Subject: Re: BUG? ext3: Allocate blocks over quota limit with mmap

On Mon 02-08-10 14:10:34, Akira Fujita wrote:
> Hi ext3 maintainers,
>
> Could you look into this?
> If this is not a problem, it is good though.
It's a bug, and I've been aware of problems of this sort for some time already.
But I never realized this particular effect, which is really nasty. Thanks
for letting me know. I'll give more priority to rebasing my patches fixing
this and pushing them upstream.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2010-08-02 13:00:29

by Dmitry Monakhov

Subject: Re: BUG? ext3: Allocate blocks over quota limit with mmap

Jan Kara <[email protected]> writes:

> On Mon 02-08-10 09:22:12, Dmitry Monakhov wrote:
>> Akira Fujita <[email protected]> writes:
>>
>> > Hi ext3 maintainers,
>> >
>> > Could you look into this?
>> > If this is not a problem, it is good though.
>> Actually this is a problem. Because this issue makes quota just a fake
>> limit. I've done this test for ext4 and was satisfied with result,
>> but was too lazy to perform it on ext3/2 :(
>> At least we have to have testcase for that in xfstest-qa.
>> It seems that private page_mkwrite will be sufficient.
>> I'm working on that.
> Yes, it's a long standing bug. Another manifestation of the bug is that
> we just throw away user's data without warning if we really cannot find
> space for it. Fixing it isn't completely trivial - doing block allocation
> during page_mkwrite really sucks performance-wise (tried that) so we
> basically have to implement delayed allocation for ext3 (and other
> filesystems) for mmaped writes and do reservation on page_mkwrite time and
> allocation on writepage time. I already have patches doing that but they
> depended on the truncate rewrite patch series and that was dragging on and
> on for half an year or so. Now I guess it's right time to rebase them and
> start pushing them again...
Indeed. Let's implement it similarly to ext4: "do not reserve quota space for
metadata, but only for data", and speculatively charge metadata during
allocation. This makes page_mkwrite() simple and clean.
>
> Honza