2022-02-03 21:59:41

by Su Yue

Subject: Re: [PATCH 2/2] btrfs: prevent copying too big compressed lzo segment


On Wed 02 Feb 2022 at 23:44, Dāvis Mosāns <[email protected]>
wrote:

> Compressed length can be corrupted to be a lot larger than memory
> we have allocated for buffer.
> This will cause memcpy in copy_compressed_segment to write outside
> of allocated memory.
>
> This mostly results in stuck read syscall but sometimes when using
> btrfs send can get #GP
>
> kernel: general protection fault, probably for non-canonical address 0x841551d5c1000: 0000 [#1] PREEMPT SMP NOPTI
> kernel: CPU: 17 PID: 264 Comm: kworker/u256:7 Tainted: P OE 5.17.0-rc2-1 #12
> kernel: Workqueue: btrfs-endio btrfs_work_helper [btrfs]
> kernel: RIP: 0010:lzo_decompress_bio (./include/linux/fortify-string.h:225 fs/btrfs/lzo.c:322 fs/btrfs/lzo.c:394) btrfs
> Code starting with the faulting instruction
> ===========================================
> 0:* 48 8b 06 mov (%rsi),%rax <-- trapping instruction
> 3: 48 8d 79 08 lea 0x8(%rcx),%rdi
> 7: 48 83 e7 f8 and $0xfffffffffffffff8,%rdi
> b: 48 89 01 mov %rax,(%rcx)
> e: 44 89 f0 mov %r14d,%eax
> 11: 48 8b 54 06 f8 mov -0x8(%rsi,%rax,1),%rdx
> kernel: RSP: 0018:ffffb110812efd50 EFLAGS: 00010212
> kernel: RAX: 0000000000001000 RBX: 000000009ca264c8 RCX: ffff98996e6d8ff8
> kernel: RDX: 0000000000000064 RSI: 000841551d5c1000 RDI: ffffffff9500435d
> kernel: RBP: ffff989a3be856c0 R08: 0000000000000000 R09: 0000000000000000
> kernel: R10: 0000000000000000 R11: 0000000000001000 R12: ffff98996e6d8000
> kernel: R13: 0000000000000008 R14: 0000000000001000 R15: 000841551d5c1000
> kernel: FS: 0000000000000000(0000) GS:ffff98a09d640000(0000) knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 00001e9f984d9ea8 CR3: 000000014971a000 CR4: 00000000003506e0
> kernel: Call Trace:
> kernel: <TASK>
> kernel: end_compressed_bio_read (fs/btrfs/compression.c:104 fs/btrfs/compression.c:1363 fs/btrfs/compression.c:323) btrfs
> kernel: end_workqueue_fn (fs/btrfs/disk-io.c:1923) btrfs
> kernel: btrfs_work_helper (fs/btrfs/async-thread.c:326) btrfs
> kernel: process_one_work (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:212 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2312)
> kernel: worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2455)
> kernel: ? process_one_work (kernel/workqueue.c:2397)
> kernel: kthread (kernel/kthread.c:377)
> kernel: ? kthread_complete_and_exit (kernel/kthread.c:332)
> kernel: ret_from_fork (arch/x86/entry/entry_64.S:301)
> kernel: </TASK>
>
> Signed-off-by: Dāvis Mosāns <[email protected]>
> ---
> fs/btrfs/lzo.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index 31319dfcc9fb..ebaa5083f2ae 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -383,6 +383,13 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
> kunmap(cur_page);
> cur_in += LZO_LEN;
>
> + if (seg_len > WORKSPACE_CBUF_LENGTH) {
> + // seg_len shouldn't be larger than we have allocated for workspace->cbuf
>
Makes sense.
Was the corrupted lzo compressed extent produced by a normal fs, or
was it crafted manually? If it came from a normal fs, something insane
happened in the extent compression path.

--
Su

> + btrfs_err(fs_info, "unexpectedly large lzo segment len %u", seg_len);
> + ret = -EUCLEAN;
> + goto out;
> + }
> +
> /* Copy the compressed segment payload into workspace */
> copy_compressed_segment(cb, workspace->cbuf, seg_len, &cur_in);
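
For anyone reading along, here is a minimal userspace sketch of the kind of check the patch adds: refuse to copy when the length read from disk does not fit the destination buffer. It is not the kernel code. CBUF_LEN is a hypothetical stand-in for WORKSPACE_CBUF_LENGTH (not the kernel's value), copy_segment_checked() is an invented helper, the extra input-length check is not part of the patch, and -EINVAL is used instead of -EUCLEAN only to keep the sketch portable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for WORKSPACE_CBUF_LENGTH; not the kernel's value. */
#define CBUF_LEN 4608u

/*
 * Copy one compressed segment into a fixed-size buffer.
 * seg_len comes from an on-disk header and may be corrupted, so it is
 * validated against both the destination size and the available input
 * before memcpy() runs.
 * Returns 0 on success, -EINVAL if the length cannot be trusted.
 */
static int copy_segment_checked(uint8_t *cbuf, const uint8_t *in,
				size_t in_len, uint32_t seg_len)
{
	assert(cbuf != NULL && in != NULL);

	if (seg_len > CBUF_LEN)
		return -EINVAL;	/* would write past the end of cbuf */
	if (seg_len > in_len)
		return -EINVAL;	/* would read past the end of the input */

	memcpy(cbuf, in, seg_len);
	return 0;
}
```

The point of the design is that the check happens before the copy, so a bogus length is reported as an error instead of turning into an out-of-bounds access.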


2022-02-04 12:16:10

by Dāvis Mosāns

Subject: Re: [PATCH 2/2] btrfs: prevent copying too big compressed lzo segment

On Thu, 3 Feb 2022 at 15:33, Su Yue (<[email protected]>) wrote:
>
>
> Makes sense.
> Was the corrupted lzo compressed extent produced by a normal fs, or
> was it crafted manually? If it came from a normal fs, something insane
> happened in the extent compression path.
>

It happened normally, but back in 2016. It's a RAID1 where the HBA
dropped some disks and some sectors didn't get written, so most likely
that section contains previous, unrelated data.
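
To illustrate why stale sectors end up as a huge segment length, here is a small userspace sketch. It assumes the segment header is a 4-byte little-endian length (LZO_LEN taken as 4) and uses invented helper names and a hypothetical limit; none of this is the kernel code. When the header bytes are really leftover file content, they decode to an essentially arbitrary 32-bit number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each segment header is a 4-byte little-endian length. */
#define LZO_LEN 4

/* Decode a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/*
 * Return 1 if the header decodes to a length that fits max_len,
 * 0 otherwise. max_len plays the role of the buffer size limit.
 */
static int seg_len_plausible(const uint8_t hdr[LZO_LEN], uint32_t max_len)
{
	assert(hdr != NULL);
	return read_le32(hdr) <= max_len;
}
```

For example, if the four header bytes are the ASCII text "data" (0x64 0x61 0x74 0x61), they decode to 0x61746164, which is about 1.6 GB and far beyond any per-segment buffer.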