On the latest kernel, 6.0.0-0.rc2, the user quota limit on an ext4
mount behaves inconsistently: after several successful "write file,
then delete" loops, a write eventually fails with "Disk quota
exceeded". The problem can be reproduced on at least
kernel-6.0.0-0.rc2 and kernel-5.14.0-*, but not on the kernel-4.18.0
based RHEL8 kernel.
Reproducer (also attached):
```
#!/bin/bash
# setup
groupadd -f quota_test
useradd -g quota_test quota_test_user1
dd if=/dev/null of=ext4_5G.img bs=1G seek=5
lo_dev=$(losetup -f --show ext4_5G.img)
mkdir /mntpt
mkfs.ext4 -F ext4_5G.img
mount -o usrquota ext4_5G.img /mntpt
chmod 777 /mntpt
quotacheck -u /mntpt
setquota -u quota_test_user1 200000 300000 2000 3000 /mntpt
quotaon -u /mntpt
# test
for i in $(seq 1 100); do
    echo "*** Run#$i ***"
    echo "--- Quota before writing file ---"; quota -uv quota_test_user1; echo "--- ---"
    # write a 300000 KiB file as the test user; remember a failure so the loop can stop
    su - quota_test_user1 -c "dd if=/dev/zero of=/mntpt/test_300m bs=1024 count=300000" || break_flag=1
    echo "--- Quota after writing file ---"; quota -uv quota_test_user1; echo "--- ---"
    rm -f /mntpt/test_300m
    sleep 10s # in case slow deletion
    echo "--- Quota after deleting file ---"; quota -uv quota_test_user1; echo "--- ---"
    [[ $break_flag -eq 1 ]] && break
done
# cleanup
umount /mntpt
losetup -D
rm -rf ext4_5G.img /mntpt
userdel -r quota_test_user1
groupdel quota_test
```
Run log on kernel-6.0.0-0.rc2:
```
(...skip successful Run#[1-2]...)
*** Run#3 ***
--- Quota before writing file ---
Disk quotas for user quota_test_user1 (uid 1003):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/loop0       0  200000  300000               0    2000    3000
--- ---
dd: error writing '/mntpt/test_300m': Disk quota exceeded
299997+0 records in
299996+0 records out
307195904 bytes (307 MB, 293 MiB) copied, 1.44836 s, 212 MB/s
--- Quota after writing file ---
Disk quotas for user quota_test_user1 (uid 1003):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/loop0 300000*  200000  300000   7days       1    2000    3000
--- ---
--- Quota after deleting file ---
Disk quotas for user quota_test_user1 (uid 1003):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/loop0       0  200000  300000               0    2000    3000
--- ---
```
The kernel under test can be found at
https://koji.fedoraproject.org/koji/buildinfo?buildID=2050107
Hello!
On Tue 23-08-22 12:16:46, Boyang Xue wrote:
> On the latest kernel, 6.0.0-0.rc2, the user quota limit on an ext4
> mount behaves inconsistently: after several successful "write file,
> then delete" loops, a write eventually fails with "Disk quota
> exceeded". The problem can be reproduced on at least
> kernel-6.0.0-0.rc2 and kernel-5.14.0-*, but not on the kernel-4.18.0
> based RHEL8 kernel.
<snip reproducer>
> Run log on kernel-6.0.0-0.rc2
> ```
> (...skip successful Run#[1-2]...)
> *** Run#3 ***
> --- Quota before writing file ---
> Disk quotas for user quota_test_user1 (uid 1003):
> Filesystem blocks quota limit grace files quota limit grace
> /dev/loop0 0 200000 300000 0 2000 3000
> --- ---
> dd: error writing '/mntpt/test_300m': Disk quota exceeded
> 299997+0 records in
> 299996+0 records out
> 307195904 bytes (307 MB, 293 MiB) copied, 1.44836 s, 212 MB/s
So this shows that we failed to allocate the last filesystem block. I
suspect this happens because the file gets allocated from several free space
extents and so one extra indirect tree block needs to be allocated (or
something like that). To verify that, you can check the created file with
"filefrag -v".
Anyway, I don't think it is quite correct to assume the filesystem can fit
300000 data blocks within a 300000 block quota, because the metadata overhead
gets accounted into the quota as well and the user has no direct control over
that. So you should probably give the filesystem some slack space in your
tests for metadata overhead.
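One way to leave that slack, as a rough sketch only: derive the dd count from the hard limit minus a margin instead of writing exactly up to the limit. The helper name `safe_write_kib` and the 1024 KiB default margin are assumptions made for illustration; they do not come from the quota tools or from ext4.

```shell
#!/bin/bash
# Hypothetical helper: how many KiB of data to write so that usage stays
# below the hard block limit even if a few metadata blocks get charged.
#   $1 = hard limit in KiB (the value passed to setquota)
#   $2 = slack in KiB (assumed default: 1024)
safe_write_kib() {
    local hard_limit_kib=$1
    local slack_kib=${2:-1024}
    if (( slack_kib >= hard_limit_kib )); then
        echo "slack must be smaller than the hard limit" >&2
        return 1
    fi
    echo $(( hard_limit_kib - slack_kib ))
}

# Example use in the reproducer (not executed here):
#   count=$(safe_write_kib 300000)
#   su - quota_test_user1 -c "dd if=/dev/zero of=/mntpt/test_300m bs=1024 count=$count"
```

How much slack is enough depends on how fragmented the allocation ends up, so the margin is a guess to tune rather than a guaranteed bound.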
> --- Quota after writing file ---
> Disk quotas for user quota_test_user1 (uid 1003):
> Filesystem blocks quota limit grace files quota limit grace
> /dev/loop0 300000* 200000 300000 7days 1 2000 3000
> --- ---
> --- Quota after deleting file ---
> Disk quotas for user quota_test_user1 (uid 1003):
> Filesystem blocks quota limit grace files quota limit grace
> /dev/loop0 0 200000 300000 0 2000 3000
> --- ---
> ```
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
Hi Jan,
On Thu, Sep 22, 2022 at 8:02 PM Jan Kara <[email protected]> wrote:
>
> Hello!
>
> On Tue 23-08-22 12:16:46, Boyang Xue wrote:
> > On the latest kernel, 6.0.0-0.rc2, the user quota limit on an ext4
> > mount behaves inconsistently: after several successful "write file,
> > then delete" loops, a write eventually fails with "Disk quota
> > exceeded". The problem can be reproduced on at least
> > kernel-6.0.0-0.rc2 and kernel-5.14.0-*, but not on the kernel-4.18.0
> > based RHEL8 kernel.
>
> <snip reproducer>
>
> > Run log on kernel-6.0.0-0.rc2
> > ```
> > (...skip successful Run#[1-2]...)
> > *** Run#3 ***
> > --- Quota before writing file ---
> > Disk quotas for user quota_test_user1 (uid 1003):
> > Filesystem blocks quota limit grace files quota limit grace
> > /dev/loop0 0 200000 300000 0 2000 3000
> > --- ---
> > dd: error writing '/mntpt/test_300m': Disk quota exceeded
> > 299997+0 records in
> > 299996+0 records out
> > 307195904 bytes (307 MB, 293 MiB) copied, 1.44836 s, 212 MB/s
>
> So this shows that we failed to allocate the last filesystem block. I
> suspect this happens because the file gets allocated from several free space
> extents and so one extra indirect tree block needs to be allocated (or
> something like that). To verify that, you can check the created file with
> "filefrag -v".
After adding a "filefrag -v" call to each run, I see a pattern: only
when the dd command runs out of disk quota does "filefrag -v" show an
"unwritten" extent, like this:
```
Filesystem type is: ef53
File size of /mntpt/test_300m is 307195904 (74999 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    1023:      98976..     99999:   1024:
   1:     1024..   18431:     112640..    130047:  17408:     100000:
   2:    18432..   51199:     131072..    163839:  32768:     130048:
   3:    51200..   55236:     165888..    169924:   4037:     163840: unwritten
   4:    55237..   74998:          0..         0:      0:             last,unknown_loc,delalloc,eof
/mntpt/test_300m: 5 extents found
```
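A small filter along these lines could summarize the "filefrag -v" output of each run. It is only a sketch, and the function name `summarize_extents` is made up here. It counts the extent rows and the rows carrying the unwritten flag, relying on the row layout shown above (an index followed by a colon).

```shell
#!/bin/bash
# Hypothetical helper: read "filefrag -v" output on stdin and print
# "<extent rows> <rows flagged unwritten>".
summarize_extents() {
    awk '
        # Extent rows start with a numeric index and a colon, e.g. "   3:".
        /^[[:space:]]*[0-9]+:/ {
            rows++
            if ($0 ~ /unwritten/) unwritten++
        }
        END { printf "%d %d\n", rows, unwritten }
    '
}

# Example use inside the test loop (not executed here):
#   filefrag -v /mntpt/test_300m | summarize_extents
```

Comparing the first number against 4 in each run would show directly whether a failing run is one where the file needed more extents than fit in the inode.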
>
> Anyway, I don't think it is quite correct to assume the filesystem can fit
> 300000 data blocks within a 300000 block quota, because the metadata overhead
> gets accounted into the quota as well and the user has no direct control over
> that. So you should probably give the filesystem some slack space in your
> tests for metadata overhead.
That makes sense to me. My test should indeed account for the metadata
overhead. Thanks for the explanation!
-Boyang
On Fri 23-09-22 15:37:55, Boyang Xue wrote:
> Hi Jan,
>
> On Thu, Sep 22, 2022 at 8:02 PM Jan Kara <[email protected]> wrote:
> >
> > Hello!
> >
> > On Tue 23-08-22 12:16:46, Boyang Xue wrote:
> > > On the latest kernel, 6.0.0-0.rc2, the user quota limit on an ext4
> > > mount behaves inconsistently: after several successful "write file,
> > > then delete" loops, a write eventually fails with "Disk quota
> > > exceeded". The problem can be reproduced on at least
> > > kernel-6.0.0-0.rc2 and kernel-5.14.0-*, but not on the kernel-4.18.0
> > > based RHEL8 kernel.
> >
> > <snip reproducer>
> >
> > > Run log on kernel-6.0.0-0.rc2
> > > ```
> > > (...skip successful Run#[1-2]...)
> > > *** Run#3 ***
> > > --- Quota before writing file ---
> > > Disk quotas for user quota_test_user1 (uid 1003):
> > > Filesystem blocks quota limit grace files quota limit grace
> > > /dev/loop0 0 200000 300000 0 2000 3000
> > > --- ---
> > > dd: error writing '/mntpt/test_300m': Disk quota exceeded
> > > 299997+0 records in
> > > 299996+0 records out
> > > 307195904 bytes (307 MB, 293 MiB) copied, 1.44836 s, 212 MB/s
> >
> > So this shows that we failed to allocate the last filesystem block. I
> > suspect this happens because the file gets allocated from several free space
> > extents and so one extra indirect tree block needs to be allocated (or
> > something like that). To verify that, you can check the created file with
> > "filefrag -v".
>
> After adding a "filefrag -v" call to each run, I see a pattern: only
> when the dd command runs out of disk quota does "filefrag -v" show an
> "unwritten" extent, like this:
> ```
> Filesystem type is: ef53
> File size of /mntpt/test_300m is 307195904 (74999 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..    1023:      98976..     99999:   1024:
>    1:     1024..   18431:     112640..    130047:  17408:     100000:
>    2:    18432..   51199:     131072..    163839:  32768:     130048:
>    3:    51200..   55236:     165888..    169924:   4037:     163840: unwritten
>    4:    55237..   74998:          0..         0:      0:             last,unknown_loc,delalloc,eof
> /mntpt/test_300m: 5 extents found
> ```
OK, this matches what I said. The unwritten extent is there because the
inode is just undergoing writeback, and that may (at least temporarily)
increase the number of extents. The inode can hold 4 extents; once a fifth
extent is added we have to allocate an indirect block, which is what breaks
your test. So nothing unexpected here. Thanks for checking!
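As a back-of-the-envelope check, the figures in the log are consistent with exactly one extra block being charged. The sketch below assumes the usual ext4 on-disk sizes (a 60-byte i_block area, a 12-byte extent header, 12-byte extent entries), which are not stated anywhere in this thread; the other numbers are taken from the dd and filefrag output quoted above.

```shell
#!/bin/bash
# Assumed ext4 on-disk sizes (not taken from this thread).
i_block_bytes=60      # extent area inside the inode
header_bytes=12       # extent header
entry_bytes=12        # one extent entry
echo "extents that fit in the inode: $(( (i_block_bytes - header_bytes) / entry_bytes ))"

# Figures taken from the run log quoted above.
data_kib=299996       # 1 KiB records dd wrote before failing
fs_block_kib=4        # 4096-byte filesystem blocks (per filefrag)
hard_limit_kib=300000 # hard block limit set with setquota
# One extra block for the extent tree is charged to the user as well.
echo "data plus one extra block: $(( data_kib + fs_block_kib )) KiB of ${hard_limit_kib} KiB"
```

With the data plus one 4 KiB block already at the limit, the next 1 KiB record would need a fresh block, which fits the "300000*" usage reported after the failed write.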
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR