Date:   Tue, 14 Feb 2023 23:32:02 -0500
From:   "Theodore Ts'o" <tytso@mit.edu>
To:     Jun Nie <jun.nie@linaro.org>
Cc:     "Darrick J. Wong" <djwong@kernel.org>, adilger.kernel@dilger.ca,
        linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
        tudor.ambarus@linaro.org
Subject: Re: [PATCH] ext4: reject 1k block fs on the first block of disk
Message-ID: <Y+xgQklC81XCB+q4@mit.edu>
References: <20221229014502.2322727-1-jun.nie@linaro.org>
 <Y7R/QKIbYQ2TCP+W@magnolia>
 <CABymUCPCT9KbMQDUTxwf6A+Cg9fWJNkefbMHD7SZD3Fc7FMFHg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CABymUCPCT9KbMQDUTxwf6A+Cg9fWJNkefbMHD7SZD3Fc7FMFHg@mail.gmail.com>
Precedence: bulk

On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
> Darrick J. Wong <djwong@kernel.org> wrote on Wed, Jan 4, 2023 at 03:17:
> >
> > On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
> > > For 1k-block filesystems, the filesystem starts at block 1, not block 0.
> > > If start_fsb is 0, it will be bumped up to s_first_data_block. Then
> > > ext4_get_group_no_and_offset doesn't know what to do and returns garbage
> > > results (blockgroup 2^32-1). The underflow makes the index
> > > exceed es->s_groups_count in ext4_get_group_info() and triggers the BUG_ON.
> > >
> > > Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
> > > Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
> > > Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
> > > Signed-off-by: Jun Nie <jun.nie@linaro.org>
> > > ---
> > >  fs/ext4/fsmap.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
> > > index 4493ef0c715e..1aef127b0634 100644
> > > --- a/fs/ext4/fsmap.c
> > > +++ b/fs/ext4/fsmap.c
> > > @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
> > >               if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
> > >                       memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
> > >
> > > +             /*
> > > +              * Re-check the range after above limit operation and reject
> > > +              * 1K fs on block 0 as fs should start block 1. */
> > > +             if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
> > > +                     continue;
> >
> > ...and if this filesystem has 4k blocks, and therefore *does* define a
> > block 0?
> 
> Yes, this is a real corner case test :-)

So I'm really nervous about this change.  I don't understand the code,
and I don't understand how the reproducer works.  I can certainly
reproduce it using the reproducer found here[1], but it seems to
require running multiple processes all creating loop devices and then
running FS_IOC_GETFSMAP.

[1] https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002

If I change the reproducer to run execute_one() just once, it
doesn't trigger the bug.  It seems to trigger only when you have
multiple processes all racing to create a loop device, mount the file
system, try running FS_IOC_GETFSMAP --- and then delete the loop device
without actually unmounting the file system.  Which is **weird**.

I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
and that doesn't trigger it either.

And I don't understand the reply to Darrick's question about why it's
safe to add the check since for 4k block file systems, block 0 *is*
valid.
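
For reference, the arithmetic described in the quoted commit message can
be sketched in user space.  This is only a simplified model under stated
assumptions, not the kernel code: the function names are hypothetical,
the group lookup is reduced to "subtract the first data block, divide by
blocks per group, truncate to 32 bits", and the blocks-per-group values
used below (8192 for 1k blocks, 32768 for 4k blocks) are assumed typical
defaults.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified user-space model (an assumption, not the kernel source) of
 * mapping a filesystem block number to a block group number.
 */
static uint32_t model_group_of_block(uint64_t blocknr,
				     uint32_t first_data_block,
				     uint32_t blocks_per_group)
{
	assert(blocks_per_group != 0);

	/* Unsigned subtraction wraps around when blocknr < first_data_block. */
	uint64_t rel = blocknr - first_data_block;

	/* Group numbers are 32 bits wide here, so the quotient is truncated. */
	return (uint32_t)(rel / blocks_per_group);
}

/*
 * Model of the range test added by the patch: it looks only at the low
 * and high physical keys and never consults the block size.
 */
static int model_patch_skips_range(uint64_t low_physical,
				   uint64_t high_physical)
{
	return low_physical == 0 && high_physical == 0;
}
```

In this model, model_group_of_block(0, 1, 8192) returns 0xFFFFFFFF, the
"blockgroup 2^32-1" from the commit message, while
model_group_of_block(0, 0, 32768) returns 0, a perfectly valid group.
model_patch_skips_range(0, 0) returns 1 in both cases, which is the
substance of Darrick's question.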

So if someone can explain to me what is going on here with this code
(there are too many abstractions and what's going on with keys is just
making my head hurt), *and* what the change actually does, and how to
reproduce the problem with a ***simple*** reproducer (the syzbot
mess doesn't count), that would be great.  But applying a change that I
don't understand to code I don't understand, to fix a reproducer which
I also don't understand, just doesn't make me feel comfortable.

Regards,

					- Ted