2024-03-15 04:42:12

by Kent Overstreet

Subject: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

there's a bug in 6.7: filesystems that are mid-upgrade and then get
downgraded don't get marked in the superblock as downgraded. this
translates to a really horrific bug in splitbrain detection: the old
version doesn't update member sequence numbers, so when you go back to
the new version, every device gets kicked out of the fs.
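To make the failure mode above concrete, here is a toy model in Python. It is not bcachefs code: the class, field, and function names are hypothetical, and it assumes a simplified scheme in which every device's superblock carries its own sequence number plus the sequence number it last recorded for each member. A member is flagged as split-brained when another device's record of it is older than the member's own superblock sequence number.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified superblock: NOT the real bcachefs on-disk layout.
@dataclass
class Superblock:
    seq: int = 0  # bumped on every superblock write
    member_seq: dict[int, int] = field(default_factory=dict)  # member idx -> last seq recorded for it


def write_superblocks(sbs: list[Superblock], updates_member_seq: bool) -> None:
    """Write all superblocks together. A writer that does not update
    member_seq stands in for the downgraded (older) kernel."""
    new_seq = max(sb.seq for sb in sbs) + 1
    for sb in sbs:
        sb.seq = new_seq
        if updates_member_seq:
            for idx in range(len(sbs)):
                sb.member_seq[idx] = new_seq


def splitbrain_suspects(sbs: list[Superblock]) -> set[int]:
    """Flag member idx if any device recorded an older seq for it than the
    member's own superblock has, i.e. it looks like it was written behind
    the other devices' backs."""
    suspects = set()
    for a in sbs:
        for idx, b in enumerate(sbs):
            recorded = a.member_seq.get(idx, 0)
            if recorded and recorded < b.seq:
                suspects.add(idx)
    return suspects


devs = [Superblock() for _ in range(3)]
write_superblocks(devs, updates_member_seq=True)   # newer kernel
print(splitbrain_suspects(devs))                   # set()
write_superblocks(devs, updates_member_seq=False)  # older kernel after downgrade
print(splitbrain_suspects(devs))                   # {0, 1, 2}
```

In this model, a single superblock write by a version that skips the member sequence update is enough for every device to look stale to every other device, which matches the "every device kicked out" symptom.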

and our backports are not being picked up by the stable team, so: do
not run 6.7, and switch to 6.8 immediately. running 6.7 with new
bcachefs-tools will trigger it.

if you are affected:

- 6.9 (once Linus merges it) will have a new no_splitbrain_check option,
which runs the splitbrain checks in dry mode and won't kick your
devices out

- we have new repair code landing soon that can recover from
missing/unreadable btree roots by scanning the entire device(s) for
btree nodes (which, fortunately, we have sufficient metadata in btree
node headers to do safely; reiserfs famously did not). i've seen some
crazy corruption resulting from this, but it might still be
recoverable


2024-03-15 09:07:37

by Martin Steigerwald

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

Hi Kent, hi.

Kent Overstreet - 15.03.24, 05:41:09 CET:
> there's a bug in 6.7 with filesystems that are mid upgrade and then get
> downgraded not getting marked in the superblock as downgraded, and this
> translates to a really horrific bug in splitbrain detection when the old
> version isn't updating member sequence numbers and you go back to the
> new version - this results in every device being kicked out of the fs.

I take it that single device BCacheFS filesystems can be upgraded just fine?

I can also recreate and repopulate once I have upgraded to 6.8. I am still
waiting a bit.

Best.
--
Martin



2024-03-15 17:58:12

by Kent Overstreet

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

On Fri, Mar 15, 2024 at 09:57:34AM +0100, Martin Steigerwald wrote:
> Hi Kent, hi.
>
> Kent Overstreet - 15.03.24, 05:41:09 CET:
> > there's a bug in 6.7 with filesystems that are mid upgrade and then get
> > downgraded not getting marked in the superblock as downgraded, and this
> > translates to a really horrific bug in splitbrain detection when the old
> > version isn't updating member sequence numbers and you go back to the
> > new version - this results in every device being kicked out of the fs.
>
> I take it that single device BCacheFS filesystems can be upgraded just fine?
>
> I can also recreate and repopulate once I upgraded to 6.8. Still waiting a
> bit.

No need to recreate and repopulate - you just don't want to be going
back to 6.7 from a newer version.

2024-03-16 16:18:47

by Martin Steigerwald

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

Kent Overstreet - 15.03.24, 18:57:51 CET:
> > I take it that single device BCacheFS filesystems can be upgraded just
> > fine?
> >
> > I can also recreate and repopulate once I upgraded to 6.8. Still
> > waiting a bit.
>
> No need to recreate and repopulate - you just don't want to be going
> back to 6.7 from a newer version.

Unfortunately I need to do exactly that, as 6.8.1 breaks hibernation on
ThinkPad T14 AMD Gen 1:

[regression] 6.8.1: fails to hibernate with
pm_runtime_force_suspend+0x0/0x120 returns -16

https://lore.kernel.org/linux-pm/[email protected]/T/#t

No luck with kernel upgrades these days. :(

I will back up the test filesystem so that, in case something breaks, I can
recreate and repopulate it. I'm downgrading to 6.7.10 then and will see
what happens. I can repopulate from backup again if need be, and also when
upgrading to 6.8 again after the regression has been fixed.

Best,
--
Martin



2024-03-16 16:41:27

by Kent Overstreet

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

On Sat, Mar 16, 2024 at 05:18:30PM +0100, Martin Steigerwald wrote:
> Kent Overstreet - 15.03.24, 18:57:51 CET:
> > > I take it that single device BCacheFS filesystems can be upgraded just
> > > fine?
> > >
> > > I can also recreate and repopulate once I upgraded to 6.8. Still
> > > waiting a bit.
> >
> > No need to recreate and repopulate - you just don't want to be going
> > back to 6.7 from a newer version.
>
> Unfortunately I need to do exactly that, as 6.8.1 breaks hibernation on
> ThinkPad T14 AMD Gen 1:
>
> [regression] 6.8.1: fails to hibernate with
> pm_runtime_force_suspend+0x0/0x120 returns -16
>
> https://lore.kernel.org/linux-pm/[email protected]/T/#t
>
> No luck with kernel upgrades these days. :(
>
> I will backup the test filesystem so in case something breaks I can
> recreate and repopulate it again. Downgrading to 6.7.10 then and will see
> what happens. Can repopulate from backup again if need be. Also when
> upgrading to 6.8 again after the regression has been fixed.

run this tree then:

https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-for-v6.7

2024-03-16 16:50:57

by Martin Steigerwald

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

Kent Overstreet - 16.03.24, 17:41:08 CET:
> > > No need to recreate and repopulate - you just don't want to be going
> > > back to 6.7 from a newer version.
> >
> > Unfortunately I need to do exactly that, as 6.8.1 breaks hibernation
> > on ThinkPad T14 AMD Gen 1:
[…]
> run this tree then:
>
> https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-for-v6.7

Wonderful. Thanks! Compiling this one instead. Shall I report something to
you once I have booted into it? I read you had difficulties getting those
patches into stable.

It's based on 6.7.9, but this laptop has no use for the Intel Atom
mitigation in 6.7.10, so it will work perfectly.

--
Martin



2024-03-16 17:32:16

by Martin Steigerwald

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

Martin Steigerwald - 16.03.24, 17:49:52 CET:
> > > Unfortunately I need to do exactly that, as 6.8.1 breaks hibernation
> > > on ThinkPad T14 AMD Gen 1:
> […]
>
> > run this tree then:
> >
> > https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-for-v6.7
>
> Wonderful. Thanks! Compiling this one instead. Shall I report something
> to you once I booted into it? I read you had difficulties getting those
> patches into stable.

It seems that the downgrade succeeded.

First mount:

[ 22.565053] bcachefs (dm-5): mounting version 1.4: (unknown version) opts=metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4
[ 22.565686] bcachefs (dm-5): recovering from clean shutdown, journal seq 116996
[ 22.565717] bcachefs (dm-5): Version downgrade required:

[ 22.590487] bcachefs (dm-5): alloc_read... done
[ 22.597896] bcachefs (dm-5): stripes_read... done
[ 22.597930] bcachefs (dm-5): snapshots_read... done
[ 22.651106] bcachefs (dm-5): journal_replay... done
[ 22.651667] bcachefs (dm-5): resume_logged_ops... done
[ 22.651736] bcachefs (dm-5): going read-write

I wonder whether some text was supposed to follow
"Version downgrade required:". The line feed was in the output.

Second mount:

[ 113.059224] bcachefs (dm-5): mounting version 1.3: rebalance_work opts=metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4
[ 113.059259] bcachefs (dm-5): recovering from clean shutdown, journal seq 117013
[ 113.083911] bcachefs (dm-5): alloc_read... done
[ 113.091268] bcachefs (dm-5): stripes_read... done
[ 113.091281] bcachefs (dm-5): snapshots_read... done
[ 113.142374] bcachefs (dm-5): journal_replay... done
[ 113.142390] bcachefs (dm-5): resume_logged_ops... done
[ 113.142406] bcachefs (dm-5): going read-write

Thanks,
--
Martin



2024-03-16 18:09:24

by Kent Overstreet

Subject: Re: bcachefs: do not run 6.7: upgrade to 6.8 immediately if you have a multi device fs

On Sat, Mar 16, 2024 at 06:31:58PM +0100, Martin Steigerwald wrote:
> Martin Steigerwald - 16.03.24, 17:49:52 CET:
> > > > Unfortunately I need to do exactly that, as 6.8.1 breaks hibernation
> > > > on ThinkPad T14 AMD Gen 1:
> > […]
> >
> > > run this tree then:
> > >
> > > https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-for-v6.7
> >
> > Wonderful. Thanks! Compiling this one instead. Shall I report something
> > to you once I booted into it? I read you had difficulties getting those
> > patches into stable.
>
> It seems that the downgrade succeeded.
>
> First mount:
>
> [ 22.565053] bcachefs (dm-5): mounting version 1.4: (unknown version) opts=metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4
> [ 22.565686] bcachefs (dm-5): recovering from clean shutdown, journal seq 116996
> [ 22.565717] bcachefs (dm-5): Version downgrade required:
>
> [ 22.590487] bcachefs (dm-5): alloc_read... done
> [ 22.597896] bcachefs (dm-5): stripes_read... done
> [ 22.597930] bcachefs (dm-5): snapshots_read... done
> [ 22.651106] bcachefs (dm-5): journal_replay... done
> [ 22.651667] bcachefs (dm-5): resume_logged_ops... done
> [ 22.651736] bcachefs (dm-5): going read-write
>
> I wonder whether there was some text supposed to follow
> "Version downgrade required:". The line feed was in the output.
>
> Second mount:
>
> [ 113.059224] bcachefs (dm-5): mounting version 1.3: rebalance_work opts=metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4
> [ 113.059259] bcachefs (dm-5): recovering from clean shutdown, journal seq 117013
> [ 113.083911] bcachefs (dm-5): alloc_read... done
> [ 113.091268] bcachefs (dm-5): stripes_read... done
> [ 113.091281] bcachefs (dm-5): snapshots_read... done
> [ 113.142374] bcachefs (dm-5): journal_replay... done
> [ 113.142390] bcachefs (dm-5): resume_logged_ops... done
> [ 113.142406] bcachefs (dm-5): going read-write

Yup, looks good.