2023-12-29 11:43:55

by Daniel J Blueman

Subject: Stack corruption in bch2_nocow_write

Hi Kent et al,

On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
[1], with a crafted workload [2] I'm able to trigger stack corruption
in bch2_nocow_write [3].

Let me know if you can't reproduce it and I'll check reproducibility
on another platform, and let me know for any patch testing.

Reported-by: Daniel J Blueman <[email protected]>

Thanks,
Daniel

-- [1] https://pastebin.com/WrhtGzck

-- [2]

modprobe brd rd_size=536870912 rd_nr=7
bcachefs format -f --nocow --foreground_target=/dev/ram4 \
    --promote_target=/dev/ram5 /dev/ram1 /dev/ram2 /dev/ram5 /dev/ram0 \
    /dev/ram4 /dev/ram6 /dev/ram3
mount -t bcachefs \
    /dev/ram1:/dev/ram2:/dev/ram5:/dev/ram0:/dev/ram4:/dev/ram6:/dev/ram3 \
    /mnt
fio --group_reporting --ioengine=io_uring --directory=/mnt --size=16m \
    --time_based --runtime=60s --iodepth=256 --verify_async=8 --bs=4k-64k \
    --norandommap --random_distribution=zipf:0.5 --numjobs=16 --rw=randrw \
    --name=A --direct=1 --name=B --direct=0 >/dev/null &
sleep 10
bcachefs device offline /dev/ram5

-- [3]

================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1258:11
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1259:11
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1262:30
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1260:11
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1264:4
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1284:55
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1285:41
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1289:29
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1293:67
index 4 is out of range for type '<unknown> [4]'
================================================================================
================================================================================
UBSAN: array-index-out-of-bounds in fs/bcachefs/io_write.c:1293:45
index 4 is out of range for type '<unknown> [4]'
================================================================================
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: bch2_nocow_write (fs/bcachefs/io_write.c:1284)
CPU: 29 PID: 4362 Comm: iou-wrk-3332 Not tainted 6.7.0-rc7+ #25
Hardware name: Supermicro AS -3014TS-i/H12SSL-i, BIOS 2.5 09/08/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
dump_stack (lib/dump_stack.c:114)
panic (kernel/panic.c:344)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? bch2_nocow_write (fs/bcachefs/io_write.c:1284)
__stack_chk_fail (??:?)
bch2_nocow_write (fs/bcachefs/io_write.c:1284)
? __lock_acquire (kernel/locking/lockdep.c:4599 (discriminator 1)
kernel/locking/lockdep.c:5091 (discriminator 1))
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? lock_release (kernel/locking/lockdep.c:5430 kernel/locking/lockdep.c:5774)
? bch2_nocow_write (fs/bcachefs/bcachefs.h:1220 (discriminator 3)
fs/bcachefs/btree_iter.h:441 (discriminator 3)
fs/bcachefs/btree_iter.h:485 (discriminator 3)
fs/bcachefs/io_write.c:1231 (discriminator 3))
__bch2_write (fs/bcachefs/io_write.c:1393)
? __bch2_write (fs/bcachefs/io_write.c:1393)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
? __bch2_increment_clock (fs/bcachefs/clock.c:153)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? lock_release (kernel/locking/lockdep.c:5430 kernel/locking/lockdep.c:5774)
? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
bch2_write (fs/bcachefs/io_write.c:1613)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? bch2_write (fs/bcachefs/io_write.c:1613)
bch2_direct_write (fs/bcachefs/fs-io-direct.c:528
fs/bcachefs/fs-io-direct.c:644)
? lock_acquire (kernel/locking/lockdep.c:467 (discriminator 4)
kernel/locking/lockdep.c:5756 (discriminator 4)
kernel/locking/lockdep.c:5719 (discriminator 4))
? __entry_text_end (??:?)
bch2_write_iter (fs/bcachefs/fs-io-buffered.c:1055)
? bch2_read_iter (fs/bcachefs/errcode.h:266 (discriminator 1)
fs/bcachefs/fs-io-direct.c:206 (discriminator 1))
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? lock_acquire (kernel/locking/lockdep.c:467 (discriminator 4)
kernel/locking/lockdep.c:5756 (discriminator 4)
kernel/locking/lockdep.c:5719 (discriminator 4))
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? lock_release (kernel/locking/lockdep.c:5430 kernel/locking/lockdep.c:5774)
io_write (./include/linux/fs.h:2020 io_uring/rw.c:1029)
io_issue_sqe (io_uring/io_uring.c:1888)
io_wq_submit_work (io_uring/io_uring.c:1969)
io_worker_handle_work (io_uring/io-wq.c:540 io_uring/io-wq.c:597)
io_wq_worker (io_uring/io-wq.c:258 io_uring/io-wq.c:648)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? lock_release (kernel/locking/lockdep.c:5430 kernel/locking/lockdep.c:5774)
? __pfx_io_wq_worker (io_uring/io-wq.c:627)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:63)
? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
? __pfx_io_wq_worker (io_uring/io-wq.c:627)
ret_from_fork (arch/x86/kernel/process.c:153)
? __pfx_io_wq_worker (io_uring/io-wq.c:627)
ret_from_fork_asm (arch/x86/entry/entry_64.S:250)
RIP: 0033:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.

Code starting with the faulting instruction
===========================================
RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000564239f39ea3
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000006
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 00007f1b2dfd25c0
R13: 0000000000000002 R14: 0000000000000002 R15: 000056423a4e7b20
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: stack-protector: Kernel stack is
corrupted in: bch2_nocow_write+0x13e9/0x1770 ]---
--
Daniel J Blueman


2023-12-29 18:55:07

by Kent Overstreet

Subject: Re: Stack corruption in bch2_nocow_write

On Fri, Dec 29, 2023 at 07:43:13PM +0800, Daniel J Blueman wrote:
> Hi Kent et al,
>
> On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
> [1], with a crafted workload [2] I'm able to trigger stack corruption
> in bch2_nocow_write [3].
>
> Let me know if you can't reproduce it and I'll check reproducibility
> on another platform, and let me know for any patch testing.

this should be fixed in the testing branch:

commit ab35f724070ccdaa31f6376a1890473e7d031ed0
Author: Kent Overstreet <[email protected]>
Date: Fri Dec 29 13:54:00 2023 -0500

bcachefs: fix nocow write path when writing to multiple extents

Signed-off-by: Kent Overstreet <[email protected]>

diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index c5961bac19f0..7c5963cd0b85 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -1316,6 +1316,7 @@ static void bch2_nocow_write(struct bch_write_op *op)
closure_get(&op->cl);
bch2_submit_wbio_replicas(to_wbio(bio), c, BCH_DATA_user,
op->insert_keys.top, true);
+ nr_buckets = 0;

bch2_keylist_push(&op->insert_keys);
if (op->flags & BCH_WRITE_DONE)
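
[Editor's illustration, not part of the original message.] The one-line change above resets a counter between extents. A minimal userspace sketch of why a stale counter matters is below; it assumes a fixed array of 4 slots (taken from the '[4]' type in the UBSAN report) and uses hypothetical names (MAX_BUCKETS, peak_index) that do not appear in io_write.c. It only computes the highest index that would be written, so it is safe to run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity; chosen to match the '[4]' array in the UBSAN report. */
#define MAX_BUCKETS 4

/*
 * Simulate a write that spans n_extents extents, each with ptrs_per_extent
 * pointers, where every pointer takes the next slot of a MAX_BUCKETS-sized
 * array. Returns the highest slot index that would be used. No array is
 * actually written, so an out-of-range result is reported, not triggered.
 */
size_t peak_index(size_t n_extents, size_t ptrs_per_extent, int reset_per_extent)
{
	size_t nr_buckets = 0;
	size_t peak = 0;

	for (size_t e = 0; e < n_extents; e++) {
		for (size_t p = 0; p < ptrs_per_extent; p++) {
			if (nr_buckets > peak)
				peak = nr_buckets;
			nr_buckets++;
		}
		/* Analogue of the "nr_buckets = 0;" line added by the patch. */
		if (reset_per_extent)
			nr_buckets = 0;
	}
	return peak;
}
```

With three extents of two pointers each, the counter reaches index 5 without the reset and stays at index 1 with it. As the follow-up messages show, this reset alone did not resolve the reported corruption.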

2023-12-30 08:35:08

by Daniel J Blueman

Subject: Re: Stack corruption in bch2_nocow_write

On Sat, 30 Dec 2023 at 02:54, Kent Overstreet <[email protected]> wrote:
>
> On Fri, Dec 29, 2023 at 07:43:13PM +0800, Daniel J Blueman wrote:
> > Hi Kent et al,
> >
> > On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
> > [1], with a crafted workload [2] I'm able to trigger stack corruption
> > in bch2_nocow_write [3].
> >
> > Let me know if you can't reproduce it and I'll check reproducibility
> > on another platform, and let me know for any patch testing.
>
> this should be fixed in the testing branch:
>
> commit ab35f724070ccdaa31f6376a1890473e7d031ed0
> Author: Kent Overstreet <[email protected]>
> Date: Fri Dec 29 13:54:00 2023 -0500
>
> bcachefs: fix nocow write path when writing to multiple extents
>
> Signed-off-by: Kent Overstreet <[email protected]>
>
> diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
> index c5961bac19f0..7c5963cd0b85 100644
> --- a/fs/bcachefs/io_write.c
> +++ b/fs/bcachefs/io_write.c
> @@ -1316,6 +1316,7 @@ static void bch2_nocow_write(struct bch_write_op *op)
> closure_get(&op->cl);
> bch2_submit_wbio_replicas(to_wbio(bio), c, BCH_DATA_user,
> op->insert_keys.top, true);
> + nr_buckets = 0;
>
> bch2_keylist_push(&op->insert_keys);
> if (op->flags & BCH_WRITE_DONE)

Thanks for the quick update, Kent.

With this change and a few runs of the reproducer, I still hit this
stack corruption with the same backtrace.

Let me know for any further testing,
Dan
--
Daniel J Blueman

2023-12-30 19:24:26

by Kent Overstreet

Subject: Re: Stack corruption in bch2_nocow_write

On Sat, Dec 30, 2023 at 04:34:39PM +0800, Daniel J Blueman wrote:
> On Sat, 30 Dec 2023 at 02:54, Kent Overstreet <[email protected]> wrote:
> >
> > On Fri, Dec 29, 2023 at 07:43:13PM +0800, Daniel J Blueman wrote:
> > > Hi Kent et al,
> > >
> > > On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
> > > [1], with a crafted workload [2] I'm able to trigger stack corruption
> > > in bch2_nocow_write [3].
> > >
> > > Let me know if you can't reproduce it and I'll check reproducibility
> > > on another platform, and let me know for any patch testing.
> >
> > this should be fixed in the testing branch:
> >
> > commit ab35f724070ccdaa31f6376a1890473e7d031ed0
> > Author: Kent Overstreet <[email protected]>
> > Date: Fri Dec 29 13:54:00 2023 -0500
> >
> > bcachefs: fix nocow write path when writing to multiple extents
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> >
> > diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
> > index c5961bac19f0..7c5963cd0b85 100644
> > --- a/fs/bcachefs/io_write.c
> > +++ b/fs/bcachefs/io_write.c
> > @@ -1316,6 +1316,7 @@ static void bch2_nocow_write(struct bch_write_op *op)
> > closure_get(&op->cl);
> > bch2_submit_wbio_replicas(to_wbio(bio), c, BCH_DATA_user,
> > op->insert_keys.top, true);
> > + nr_buckets = 0;
> >
> > bch2_keylist_push(&op->insert_keys);
> > if (op->flags & BCH_WRITE_DONE)
>
> Thanks for the quick update, Kent.
>
> With this change and a few runs of the reproducer, I still hit this
> stack corruption with the same backtrace.

Reproduced it - my first fix was bogus. Turns out I didn't consider cached
extents; those can exceed the BCH_REPLICAS_MAX limit, and there's
another issue I just spotted - the bucket invalidate path doesn't
respect nocow locking.

And I'm wondering if there's a lock inversion between nocow locks and
btree locks as well; need to add lockdep support for that.
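
[Editor's illustration, not part of the original message.] The overflow described above, where an extent carries more pointers than a fixed-size on-stack array can hold, can be guarded against with a bounded append. The sketch below is a generic userspace example under stated assumptions: REPLICAS_MAX is set to 4 to match the '[4]' type in the UBSAN report, and struct bucket_list and bucket_list_push() are hypothetical names. How the actual fix in bcachefs-testing handles cached extents is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed capacity, matching the '[4]' array type in the UBSAN report. */
#define REPLICAS_MAX 4

/* Hypothetical stand-in for a fixed-size, stack-allocated bucket array. */
struct bucket_list {
	size_t   nr;
	unsigned buckets[REPLICAS_MAX];
};

/*
 * Append a bucket only if there is room. Returns false instead of writing
 * past the end of the array, leaving the caller to pick a fallback path.
 */
bool bucket_list_push(struct bucket_list *l, unsigned bucket)
{
	if (l->nr >= REPLICAS_MAX)
		return false;

	l->buckets[l->nr++] = bucket;
	return true;
}
```

A caller that sees a false return knows the extent has more pointers than the array can hold and can take another path rather than corrupting the stack.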

2023-12-30 22:41:14

by Kent Overstreet

Subject: Re: Stack corruption in bch2_nocow_write

On Fri, Dec 29, 2023 at 07:43:13PM +0800, Daniel J Blueman wrote:
> Hi Kent et al,
>
> On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
> [1], with a crafted workload [2] I'm able to trigger stack corruption
> in bch2_nocow_write [3].
>
> Let me know if you can't reproduce it and I'll check reproducibility
> on another platform, and let me know for any patch testing.

Can you give the bcachefs-testing branch a try?

2023-12-31 09:54:35

by Daniel J Blueman

Subject: Re: Stack corruption in bch2_nocow_write

On Sun, 31 Dec 2023 at 06:41, Kent Overstreet <[email protected]> wrote:
>
> On Fri, Dec 29, 2023 at 07:43:13PM +0800, Daniel J Blueman wrote:
> > Hi Kent et al,
> >
> > On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
> > [1], with a crafted workload [2] I'm able to trigger stack corruption
> > in bch2_nocow_write [3].
> >
> > Let me know if you can't reproduce it and I'll check reproducibility
> > on another platform, and let me know for any patch testing.
>
> Can you give the bcachefs-testing branch a try?

Good fixes! I can't reproduce the issue on SHA 67503f8d4 in
bcachefs.git/bcachefs-testing, and no warnings are triggered with
CONFIG_PROVE_LOCKING.

Tested-by: Daniel J Blueman <[email protected]>

Thanks,
Dan
--
Daniel J Blueman