2021-10-14 13:16:05

by Tommi Rantala

Subject: Inode 2885482 (000000008e814f64): i_reserved_data_blocks (2) not cleared!

Hi,

I'm seeing these i_reserved_data_blocks not cleared! messages when using ext4
with nodelalloc, message added in:

commit 6fed83957f21eff11c8496e9f24253b03d2bc1dc
Author: Jeffle Xu <[email protected]>
Date: Mon Aug 23 14:13:58 2021 +0800

ext4: fix reserved space counter leakage

I can quickly reproduce in 5.15.0-rc5-00041-g348949d9a444 by doing some
filesystem I/O while toggling delalloc:


while true; do mount -o remount,nodelalloc /; sleep 1; mount -o remount,delalloc /; sleep 1; done &
git clone linux xxx; rm -rf xxx

[ 222.928341] EXT4-fs (vdb1): re-mounted. Opts: delalloc. Quota mode: disabled.
[ 223.932516] EXT4-fs (vdb1): re-mounted. Opts: nodelalloc. Quota mode: disabled.
[ 224.183741] EXT4-fs (vdb1): Inode 2885482 (000000008e814f64): i_reserved_data_blocks (2) not cleared!
[ 224.185064] EXT4-fs (vdb1): Inode 2885478 (00000000862b48ad): i_reserved_data_blocks (2) not cleared!
[ 224.186434] EXT4-fs (vdb1): Inode 2885474 (00000000a20bdd95): i_reserved_data_blocks (7) not cleared!
[ 224.187649] EXT4-fs (vdb1): Inode 2885476 (00000000028005e1): i_reserved_data_blocks (2) not cleared!
[ 224.189016] EXT4-fs (vdb1): Inode 2885475 (0000000025d9617d): i_reserved_data_blocks (2) not cleared!
[ 224.190370] EXT4-fs (vdb1): Inode 2885480 (00000000d0722d90): i_reserved_data_blocks (7) not cleared!
[ 224.191732] EXT4-fs (vdb1): Inode 2885481 (000000009b50d6cb): i_reserved_data_blocks (1) not cleared!
[ 224.193093] EXT4-fs (vdb1): Inode 2885472 (00000000fe907f54): i_reserved_data_blocks (1) not cleared!
[ 227.946984] EXT4-fs: 9213 callbacks suppressed
[ 227.946989] EXT4-fs (vdb1): re-mounted. Opts: nodelalloc. Quota mode: disabled.


-Tommi


2021-10-15 03:00:00

by Theodore Ts'o

Subject: Re: Inode 2885482 (000000008e814f64): i_reserved_data_blocks (2) not cleared!

On Fri, Oct 15, 2021 at 02:06:52AM +0800, Gao Xiang wrote:
> On Thu, Oct 14, 2021 at 12:54:14PM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> > Hi,
> >
> > I'm seeing these i_reserved_data_blocks not cleared! messages when using ext4
> > with nodelalloc, message added in:
> >
> > commit 6fed83957f21eff11c8496e9f24253b03d2bc1dc
> > Author: Jeffle Xu <[email protected]>
> > Date: Mon Aug 23 14:13:58 2021 +0800
> >
> > ext4: fix reserved space counter leakage
> >
> > I can quickly reproduce in 5.15.0-rc5-00041-g348949d9a444 by doing some
> > filesystem I/O while toggling delalloc:
> >
> >
> > while true; do mount -o remount,nodelalloc /; sleep 1; mount -o remount,delalloc /; sleep 1; done &
> > git clone linux xxx; rm -rf xxx
>
> If I understand correctly, switching such option implies
> sync inodes to write back exist delayed allocation blocks.

Well, no. What it implies is that all writes after the remount into
an unallocated portion of the file will be allocated at the time when
the page is dirtied, instead of when the page is written back. It's
possible for some pages to be written using delayed allocation, and
some other pages in the legacy "allocate on page dirty" mechanism.
This can happen when the file system is remounted; it can also happen
when the file system starts getting close to 100% full. See the
comment in ext4_nonda_switch:

/*
* switch to non delalloc mode if we are running low
* on free block. The free block accounting via percpu
* counters can get slightly wrong with percpu_counter_batch getting
* accumulated on each CPU without updating global counters
* Delalloc need an accurate free block accounting. So switch
* to non delalloc when we are near to error range.
*/
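To illustrate the drift that comment describes, here is a minimal userspace sketch of a batched per-CPU counter. It is a toy model, not kernel code: the class `BatchedCounter`, its methods, and the numbers (4 CPUs, batch of 32, 1000 free blocks) are all hypothetical and chosen only to show how a cheap read of the global value can lag behind the true total.

```python
class BatchedCounter:
    """Toy model of a per-CPU counter with batching (hypothetical, not kernel code)."""

    def __init__(self, ncpus, batch, initial):
        self.batch = batch
        self.global_count = initial      # what a cheap read sees
        self.local = [0] * ncpus         # per-CPU deltas not yet folded in

    def add(self, cpu, delta):
        # Accumulate locally; fold into the global value only once the
        # local delta reaches the batch size.
        self.local[cpu] += delta
        if abs(self.local[cpu]) >= self.batch:
            self.global_count += self.local[cpu]
            self.local[cpu] = 0

    def read_fast(self):
        # Cheap read: ignores the unfolded per-CPU deltas.
        return self.global_count

    def read_exact(self):
        # Expensive read: sums every per-CPU delta as well.
        return self.global_count + sum(self.local)


free = BatchedCounter(ncpus=4, batch=32, initial=1000)
for cpu in range(4):
    for _ in range(31):                  # stay just under the batch size
        free.add(cpu, -1)

print(free.read_fast(), free.read_exact())   # prints "1000 876"
```

In this model the cheap read can be off by almost `ncpus * batch` blocks, which is the kind of error range in which accurate free-block accounting for delayed allocation can no longer be relied on.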

Cheers,

- Ted

2021-10-15 08:17:09

by Gao Xiang

Subject: Re: Inode 2885482 (000000008e814f64): i_reserved_data_blocks (2) not cleared!

On Thu, Oct 14, 2021 at 05:57:32PM -0400, Theodore Ts'o wrote:
> On Fri, Oct 15, 2021 at 02:06:52AM +0800, Gao Xiang wrote:
> > On Thu, Oct 14, 2021 at 12:54:14PM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> > > Hi,
> > >
> > > I'm seeing these i_reserved_data_blocks not cleared! messages when using ext4
> > > with nodelalloc, message added in:
> > >
> > > commit 6fed83957f21eff11c8496e9f24253b03d2bc1dc
> > > Author: Jeffle Xu <[email protected]>
> > > Date: Mon Aug 23 14:13:58 2021 +0800
> > >
> > > ext4: fix reserved space counter leakage
> > >
> > > I can quickly reproduce in 5.15.0-rc5-00041-g348949d9a444 by doing some
> > > filesystem I/O while toggling delalloc:
> > >
> > >
> > > while true; do mount -o remount,nodelalloc /; sleep 1; mount -o remount,delalloc /; sleep 1; done &
> > > git clone linux xxx; rm -rf xxx
> >
> > If I understand correctly, switching such option implies
> > sync inodes to write back exist delayed allocation blocks.
>
> Well, no. What it implies is that all writes after the remount into
> an unallocated portion of the file will be allocated at the time when
> the page is dirtied, instead of when the page is written back. It's
> possible for some pages to be written using delayed allocation, and
> some other pages in the legacy "allocate on page dirty" mechanism.
> This can happen when the file system is remounted; it can also happen
> when the file system starts getting close to 100% full. See the
> comment in ext4_nonda_switch:
>
> /*
> * switch to non delalloc mode if we are running low
> * on free block. The free block accounting via percpu
> * counters can get slightly wrong with percpu_counter_batch getting
> * accumulated on each CPU without updating global counters
> * Delalloc need an accurate free block accounting. So switch
> * to non delalloc when we are near to error range.
> */

Hi Ted,

OK, thanks for the detailed explanation of the behavior. I guess several
of the "test_opt(inode->i_sb, DELALLOC)" checks could then be somewhat
racy? For example, the check in __es_remove_extent() in extents_status.c?

Thanks,
Gao Xiang

>
> Cheers,
>
> - Ted

2021-10-18 04:25:54

by Jingbo Xu

Subject: Re: Inode 2885482 (000000008e814f64): i_reserved_data_blocks (2) not cleared!



On 10/15/21 5:57 AM, Theodore Ts'o wrote:
> On Fri, Oct 15, 2021 at 02:06:52AM +0800, Gao Xiang wrote:
>> On Thu, Oct 14, 2021 at 12:54:14PM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
>>> Hi,
>>>
>>> I'm seeing these i_reserved_data_blocks not cleared! messages when using ext4
>>> with nodelalloc, message added in:
>>>
>>> commit 6fed83957f21eff11c8496e9f24253b03d2bc1dc
>>> Author: Jeffle Xu <[email protected]>
>>> Date: Mon Aug 23 14:13:58 2021 +0800
>>>
>>> ext4: fix reserved space counter leakage
>>>
>>> I can quickly reproduce in 5.15.0-rc5-00041-g348949d9a444 by doing some
>>> filesystem I/O while toggling delalloc:
>>>
>>>
>>> while true; do mount -o remount,nodelalloc /; sleep 1; mount -o remount,delalloc /; sleep 1; done &
>>> git clone linux xxx; rm -rf xxx
>>
>> If I understand correctly, switching such option implies
>> sync inodes to write back exist delayed allocation blocks.
>
> Well, no. What it implies is that all writes after the remount into
> an unallocated portion of the file will be allocated at the time when
> the page is dirtied, instead of when the page is written back. It's
> possible for some pages to be written using delayed allocation, and
> some other pages in the legacy "allocate on page dirty" mechanism.
> This can happen when the file system is remounted; it can also happen
> when the file system starts getting close to 100% full. See the
> comment in ext4_nonda_switch:
>
> /*
> * switch to non delalloc mode if we are running low
> * on free block. The free block accounting via percpu
> * counters can get slightly wrong with percpu_counter_batch getting
> * accumulated on each CPU without updating global counters
> * Delalloc need an accurate free block accounting. So switch
> * to non delalloc when we are near to error range.
> */
>

So it seems possible that the s_dirtyclusters_counter/i_reserved_data_blocks
counters are no longer maintained once the filesystem is remounted from
'delalloc' to 'nodelalloc', even while writing back page cache that was
delay-allocated when the filesystem was still mounted with 'delalloc'.
Thus these counters can be non-zero when the inode is finally evicted
and destroyed.

I think this inconsistency is problematic. For example, when the
filesystem is remounted from 'delalloc' to 'nodelalloc' and then runs
for a while, the s_dirtyclusters_counter/i_reserved_data_blocks counters
become inconsistent. If it is then remounted back to 'delalloc', it
starts out with counters that are already incorrect.
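
The scenario above can be sketched with a small userspace model. This is only an illustration under stated assumptions, not the ext4 implementation: `ToyFs`, `ToyInode`, and `run` are hypothetical names, and the choice to gate the release of reservations on the current mount option is modeled after the test_opt(inode->i_sb, DELALLOC) check mentioned earlier in the thread. Whether that check is the actual cause is not established here.

```python
class ToyInode:
    def __init__(self):
        self.reserved = 0        # stands in for i_reserved_data_blocks
        self.delayed = set()     # logical blocks that are dirty but unallocated


class ToyFs:
    def __init__(self):
        self.delalloc = True     # current mount option
        self.dirty_clusters = 0  # stands in for s_dirtyclusters_counter

    def write(self, inode, blk):
        if self.delalloc and blk not in inode.delayed:
            # Delayed allocation: only reserve space at page-dirty time.
            inode.delayed.add(blk)
            inode.reserved += 1
            self.dirty_clusters += 1
        # With nodelalloc the block is allocated immediately, so nothing
        # is reserved in this model.

    def remove_blocks(self, inode):
        # Drop all delayed blocks, e.g. when the file is truncated or removed.
        pending = len(inode.delayed)
        inode.delayed.clear()
        # ASSUMPTION: reservations are released only if the mount option
        # is set *now*, not if it was set when the blocks were reserved.
        if self.delalloc:
            inode.reserved -= pending
            self.dirty_clusters -= pending


def run(toggle):
    fs, inode = ToyFs(), ToyInode()
    fs.write(inode, 0)
    fs.write(inode, 1)           # two blocks reserved under delalloc
    if toggle:
        fs.delalloc = False      # remount,nodelalloc
    fs.remove_blocks(inode)      # rm -rf before writeback
    return inode.reserved, fs.dirty_clusters


print(run(toggle=False))         # prints "(0, 0)"
print(run(toggle=True))          # prints "(2, 2)"
```

With the toggle, the model ends with a non-zero per-inode reservation at eviction time, and the filesystem-wide counter stays wrong even after a later remount back to 'delalloc'.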



--
Thanks,
Jeffle