Hello,
syzbot found the following issue on:
HEAD commit: 39676dfe5233 Add linux-next specific files for 20231222
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=172080a1e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=178f6e26e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c399e9e80000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/360542c2ca67/disk-39676dfe.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/900dfb21ca8a/vmlinux-39676dfe.xz
kernel image: https://storage.googleapis.com/syzbot-assets/c94a2a3ea0e0/bzImage-39676dfe.xz
The issue was bisected to:
commit 7bc134496bbbaacb0d4423b819da4eca850a839d
Author: Chengming Zhou <[email protected]>
Date: Mon Dec 18 11:50:31 2023 +0000
mm/zswap: change dstmem size to one page
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15f60c36e80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=17f60c36e80000
console output: https://syzkaller.appspot.com/x/log.txt?x=13f60c36e80000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 7bc134496bbb ("mm/zswap: change dstmem size to one page")
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5065 Comm: syz-executor140 Not tainted 6.7.0-rc6-next-20231222-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
crypto_acomp_compress include/crypto/acompress.h:302 [inline]
zswap_store+0x98b/0x2430 mm/zswap.c:1666
swap_writepage+0x8e/0x220 mm/page_io.c:198
pageout+0x399/0x9e0 mm/vmscan.c:656
shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
walk_pmd_range mm/pagewalk.c:143 [inline]
walk_pud_range mm/pagewalk.c:221 [inline]
walk_p4d_range mm/pagewalk.c:256 [inline]
walk_pgd_range+0xa48/0x1870 mm/pagewalk.c:293
__walk_page_range+0x630/0x770 mm/pagewalk.c:395
walk_page_range+0x626/0xa80 mm/pagewalk.c:521
madvise_pageout_page_range mm/madvise.c:585 [inline]
madvise_pageout+0x32c/0x820 mm/madvise.c:612
madvise_vma_behavior+0x1cc/0x1b50 mm/madvise.c:1031
madvise_walk_vmas+0x1cf/0x2c0 mm/madvise.c:1260
do_madvise+0x333/0x660 mm/madvise.c:1440
__do_sys_madvise mm/madvise.c:1453 [inline]
__se_sys_madvise mm/madvise.c:1451 [inline]
__x64_sys_madvise+0xa9/0x110 mm/madvise.c:1451
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x62/0x6a
RIP: 0033:0x7f15a5e14b69
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffde7b4a5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f15a5e14b69
RDX: 0000000000000015 RSI: 0000000000c00304 RDI: 0000000020000000
RBP: 0000000000000000 R08: 00005555565c6610 R09: 00005555565c6610
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffde7b4a808 R14: 0000000000000001 R15: 0000000000000001
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: f0 48 c1 e8 03 lock shr $0x3,%rax
5: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1)
9: 0f 85 81 01 00 00 jne 0x190
f: 49 8d 44 24 08 lea 0x8(%r12),%rax
14: 4d 89 26 mov %r12,(%r14)
17: 48 bf 00 00 00 00 00 movabs $0xdffffc0000000000,%rdi
1e: fc ff df
21: 48 89 44 24 10 mov %rax,0x10(%rsp)
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 0f b6 04 38 movzbl (%rax,%rdi,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 74 08 je 0x3a
32: 3c 03 cmp $0x3,%al
34: 0f 8e 47 01 00 00 jle 0x181
3a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
3f: 41 rex.B
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
Hi Nhat,
Thanks for the first stab at this bug.
On Tue, Dec 26, 2023 at 1:08 PM Nhat Pham <[email protected]> wrote:
>
> On Tue, Dec 26, 2023 at 7:28 AM syzbot
> <[email protected]> wrote:
> >
> > [...]
> >
> > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline] <========= This is the offending line.
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
This is the call stack with inlined functions. The annotations on the
inline function expansions in the call stack are very nice.
> > [...]
> > Call Trace:
> > <TASK>
> > scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
> > scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
>
> Looks like it's this line:
>
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,1);
Yes indeed.
The offending instruction is actually this one:
The inlined part of the call stack:
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline] <========= This is the offending line.
static inline void scatterwalk_start(struct scatter_walk *walk,
struct scatterlist *sg)
{
walk->sg = sg;
walk->offset = sg->offset; <============== RIP pointer
}
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
static inline void scatterwalk_pagedone(struct scatter_walk *walk, int out,
unsigned int more)
{
if (out) {
struct page *page;
page = sg_page(walk->sg) + ((walk->offset - 1) >> PAGE_SHIFT);
flush_dcache_page(page);
}
if (more && walk->offset >= walk->sg->offset + walk->sg->length)
scatterwalk_start(walk, sg_next(walk->sg)); <==========================
}
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
size_t nbytes, int out)
{
for (;;) {
unsigned int len_this_page = scatterwalk_pagelen(walk);
u8 *vaddr;
if (len_this_page > nbytes)
len_this_page = nbytes;
if (out != 2) {
vaddr = scatterwalk_map(walk);
memcpy_dir(buf, vaddr, len_this_page, out);
scatterwalk_unmap(vaddr);
}
scatterwalk_advance(walk, len_this_page);
if (nbytes == len_this_page)
break;
buf += len_this_page;
nbytes -= len_this_page;
scatterwalk_pagedone(walk, out & 1, 1); <=========================
}
}
then the non-inlined portion of the call stack:
scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
unsigned int start, unsigned int nbytes, int out)
{
struct scatter_walk walk;
struct scatterlist tmp[2];
if (!nbytes)
return;
sg = scatterwalk_ffwd(tmp, sg, start);
scatterwalk_start(&walk, sg);
scatterwalk_copychunks(buf, &walk, nbytes, out); <===========================
scatterwalk_done(&walk, out, 0);
}
scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
if (dir)
ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
scratch->dst, &req->dlen, *ctx);
else
ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
scratch->dst, &req->dlen, *ctx);
if (!ret) {
if (!req->dst) {
req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
if (!req->dst) {
ret = -ENOMEM;
goto out;
}
}
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
                         1); <=======================
}
crypto_acomp_compress include/crypto/acompress.h:302 [inline]
zswap_store+0x98b/0x2430 mm/zswap.c:1666
swap_writepage+0x8e/0x220 mm/page_io.c:198
pageout+0x399/0x9e0 mm/vmscan.c:656
shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
walk_pmd_range mm/pagewalk.c:143 [inline]
>
> My guess is dlen here exceeds 1 page - maybe the data is
> incompressible, so the output exceeds the original input? Based on the
> included kernel config, the algorithm used here is lzo, and it seems
> that can happen for this particular compressor:
>
> https://github.com/nemequ/lzo/blob/master/doc/LZO.FAQ#L200C1-L203C1
> https://github.com/nemequ/lzo/blob/master/doc/LZO.TXT#L157C2-L157C62
The decompressed output can be bigger than the input. However, here we
specify the output size limit as one page, so the decompressor should
not write more than one page into the dst buffer.
I did check the lzo1x_decompress_safe() function.
It is supposed to use the NEED_OP(x) check before it stores to the
output buffer. However, I did find one place that seems to be missing
that check, or at least it is not obvious to me how the check is
effective there. I will paste it here so other experts can take a look
as well:
Line 228:
if (op - m_pos >= 8) {
unsigned char *oe = op + t;
if (likely(HAVE_OP(t + 15))) {
do {
COPY8(op, m_pos);
op += 8;
m_pos += 8;
COPY8(op, m_pos);
op += 8;
m_pos += 8;
} while (op < oe);
op = oe;
if (HAVE_IP(6)) {
state = next;
COPY4(op, ip); <========================= This COPY4 does not have an
obvious NEED_OP() check. As far as I can tell, this output is not
covered by the above HAVE_OP(t + 15) check, which is meant to protect
the two unaligned 8-byte stores inside the "do while" loop.
op += next;
ip += next;
continue;
}
} else {
NEED_OP(t);
do {
*op++ = *m_pos++;
} while (op < oe);
}
} else
>
> Not 100% sure about linux kernel's implementation though. I'm no
> compression expert :)
Me neither. Anyway, suppose it is a decompression issue. For this bug,
it would be good to have some debug print/assert to check that, after
decompression, *dlen is not bigger than the original output length. If
it does blow past the decompression buffer, that is a bug that needs
to be fixed on the decompression function side, not in the zswap code.
Chris
>
> If this is the case though, perhaps we can't reduce the output buffer
> size to 1 page after all, unless we contractually obligate the output
> size to be <= input size (i.e new function in the compression API).
>
> > [...]
On Tue, Dec 26, 2023 at 2:32 PM Chris Li <[email protected]> wrote:
>
> Hi Nhat,
>
> Thanks for the first stab on this bug.
>
> On Tue, Dec 26, 2023 at 1:08 PM Nhat Pham <[email protected]> wrote:
> >
> > On Tue, Dec 26, 2023 at 7:28 AM syzbot
> > <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 39676dfe5233 Add linux-next specific files for 20231222
> > > git tree: linux-next
> > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=172080a1e80000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
> > > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=178f6e26e80000
> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c399e9e80000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/360542c2ca67/disk-39676dfe.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/900dfb21ca8a/vmlinux-39676dfe.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/c94a2a3ea0e0/bzImage-39676dfe.xz
> > >
> > > The issue was bisected to:
> > >
> > > commit 7bc134496bbbaacb0d4423b819da4eca850a839d
> > > Author: Chengming Zhou <[email protected]>
> > > Date: Mon Dec 18 11:50:31 2023 +0000
> > >
> > > mm/zswap: change dstmem size to one page
> > >
> > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15f60c36e80000
> > > final oops: https://syzkaller.appspot.com/x/report.txt?x=17f60c36e80000
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=13f60c36e80000
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> > > Fixes: 7bc134496bbb ("mm/zswap: change dstmem size to one page")
> > >
> > > general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
> > > KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> > > CPU: 0 PID: 5065 Comm: syz-executor140 Not tainted 6.7.0-rc6-next-20231222-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> > > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline] <========= This is the offending line.
> > > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
>
> This is the call stack with an inline function. Very nice annotations
> on the inline function expansions call stacks.
>
> > > Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
> > > RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> > > RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
> > > RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
> > > R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
> > > R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
> > > FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > <TASK>
> > > scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
> > > scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
> >
> > Looks like it's this line:
> >
> > scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,1);
>
> Yes indeed.
>
> The offending instruction is actually this one:
>
> The inlined part of the call stack:
> RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
> <========= This is the offending line.
> static inline void scatterwalk_start(struct scatter_walk *walk,
> struct scatterlist *sg)
> {
> walk->sg = sg;
> walk->offset = sg->offset; <============== RIP pointer
> }
>
> RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
>
> static inline void scatterwalk_pagedone(struct scatter_walk *walk, int out,
> unsigned int more)
> {
> if (out) {
> struct page *page;
>
> page = sg_page(walk->sg) + ((walk->offset - 1) >> PAGE_SHIFT);
> flush_dcache_page(page);
> }
>
> if (more && walk->offset >= walk->sg->offset + walk->sg->length)
> scatterwalk_start(walk, sg_next(walk->sg)); <==========================
> }
>
> RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
>
> void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
> size_t nbytes, int out)
> {
> for (;;) {
> unsigned int len_this_page = scatterwalk_pagelen(walk);
> u8 *vaddr;
>
> if (len_this_page > nbytes)
> len_this_page = nbytes;
>
> if (out != 2) {
> vaddr = scatterwalk_map(walk);
> memcpy_dir(buf, vaddr, len_this_page, out);
> scatterwalk_unmap(vaddr);
> }
>
> scatterwalk_advance(walk, len_this_page);
>
> if (nbytes == len_this_page)
> break;
>
> buf += len_this_page;
> nbytes -= len_this_page;
>
> scatterwalk_pagedone(walk, out & 1, 1); <=========================
> }
> }
>
> then the unlined portion of the call stack:
> scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
>
> void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
> unsigned int start, unsigned int nbytes, int out)
> {
> struct scatter_walk walk;
> struct scatterlist tmp[2];
>
> if (!nbytes)
> return;
>
> sg = scatterwalk_ffwd(tmp, sg, start);
>
> scatterwalk_start(&walk, sg);
> scatterwalk_copychunks(buf, &walk, nbytes, out); <===========================
> scatterwalk_done(&walk, out, 0);
> }
>
> scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
> if (dir)
> ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
> scratch->dst, &req->dlen, *ctx);
> else
> ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
> scratch->dst, &req->dlen, *ctx);
> if (!ret) {
> if (!req->dst) {
> req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
> if (!req->dst) {
> ret = -ENOMEM;
> goto out;
> }
> }
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> <=======================
> 1);
> }
>
> crypto_acomp_compress include/crypto/acompress.h:302 [inline]
> zswap_store+0x98b/0x2430 mm/zswap.c:1666
> swap_writepage+0x8e/0x220 mm/page_io.c:198
> pageout+0x399/0x9e0 mm/vmscan.c:656
> shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
> reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
> reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
> madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
> walk_pmd_range mm/pagewalk.c:143 [inline]
> >
> > My guess is dlen here exceeds 1 page - maybe the data is
> > incompressible, so the output exceeds the original input? Based on the
> > included kernel config, the algorithm used here is lzo, and it seems
> > that can happen for this particular compressor:
> >
> > https://github.com/nemequ/lzo/blob/master/doc/LZO.FAQ#L200C1-L203C1
> > https://github.com/nemequ/lzo/blob/master/doc/LZO.TXT#L157C2-L157C62
>
> The decompressed output can be bigger than input. However, here we
> specify the output size limit to be one page. The decompressor should
> not output more than the 1 page of the dst buffer.
>
> I did check the lzo1x_decompress_safe() function.
I think you meant lzo1x_1_do_compress(), right? This error happens on
the zswap_store() path, so it is a compression bug.
> It is supposed to use the NEED_OP(x) check before it stores the output buffer.
I can't seem to find any such check in the compression code. But that
function is 300 LoC, with no comments :)
> [...]
>
> >
> > Not 100% sure about linux kernel's implementation though. I'm no
> > compression expert :)
>
> Me neither. Anyway, if it is a decompression issue. For this bug, it
> is good to have some debug print assert to check that after
> decompression, the *dlen is not bigger than the original output
> length. If it does blow over the decompression buffer, it is a bug
> that needs to be fixed in the decompression function side, not the
> zswap code.
But regardless, I agree. We should enforce the condition that the
output must not exceed the given buffer's size, and gracefully fail
if it does (i.e., return an interpretable error code as opposed to
just walking off the cliff like this).
I imagine that many compression use cases are best-effort
optimization, i.e., it's fine to fail. zswap is exactly this - do your
best to compress the data, but if it's too incompressible, failure is
acceptable (we have plenty of fallback options - swapping, keeping the
page in memory, etc.).
Furthermore, this is a bit of a leaky abstraction - currently there's
no contract on how much bigger the "compressed" output can be with
respect to the input. It's compression-algorithm-dependent, which
defeats the point of the abstraction. In zswap's case, 2 * PAGE_SIZE is
just a rule of thumb - now imagine we use a compression algorithm that
behaves very well in most cases, but outputs 3 * PAGE_SIZE in some
edge cases. That would still break the code. Better to give the
caller the ability to bound the output size, and gracefully fail if
that bound cannot be respected for a particular input.
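To make the contract concrete, here is a hypothetical userspace sketch
(compress_bounded and its fake 4-byte header are made up for
illustration - this is not the actual crypto API):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded-output contract: the caller passes the real dst
 * capacity, and the compressor fails gracefully instead of overrunning.
 * The "compressor" body is a stand-in that, like LZO on incompressible
 * input, can produce more bytes than it consumes. */
static int compress_bounded(const unsigned char *src, size_t slen,
			    unsigned char *dst, size_t dst_cap,
			    size_t *dlen)
{
	size_t produced = slen + 4;	/* worst case for this stand-in */

	if (produced > dst_cap)
		return -ENOSPC;		/* interpretable error, no overrun */

	memset(dst, 0, 4);		/* fake 4-byte header */
	memcpy(dst + 4, src, slen);
	*dlen = produced;		/* report bytes actually produced */
	return 0;
}
```

With such a contract, zswap could pass dst_cap = PAGE_SIZE and simply
treat -ENOSPC as "incompressible, fall back to swap".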
>
> Chris
>
> >
> > If this is the case though, perhaps we can't reduce the output buffer
> > size to 1 page after all, unless we contractually obligate the output
> > size to be <= input size (i.e new function in the compression API).
> >
> >
> > > crypto_acomp_compress include/crypto/acompress.h:302 [inline]
> > > zswap_store+0x98b/0x2430 mm/zswap.c:1666
> > > swap_writepage+0x8e/0x220 mm/page_io.c:198
> > > pageout+0x399/0x9e0 mm/vmscan.c:656
> > > shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
> > > reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
> > > reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
> > > madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
> > > walk_pmd_range mm/pagewalk.c:143 [inline]
> > > walk_pud_range mm/pagewalk.c:221 [inline]
> > > walk_p4d_range mm/pagewalk.c:256 [inline]
> > > walk_pgd_range+0xa48/0x1870 mm/pagewalk.c:293
> > > __walk_page_range+0x630/0x770 mm/pagewalk.c:395
> > > walk_page_range+0x626/0xa80 mm/pagewalk.c:521
> > > madvise_pageout_page_range mm/madvise.c:585 [inline]
> > > madvise_pageout+0x32c/0x820 mm/madvise.c:612
> > > madvise_vma_behavior+0x1cc/0x1b50 mm/madvise.c:1031
> > > madvise_walk_vmas+0x1cf/0x2c0 mm/madvise.c:1260
> > > do_madvise+0x333/0x660 mm/madvise.c:1440
> > > __do_sys_madvise mm/madvise.c:1453 [inline]
> > > __se_sys_madvise mm/madvise.c:1451 [inline]
> > > __x64_sys_madvise+0xa9/0x110 mm/madvise.c:1451
> > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
> > > entry_SYSCALL_64_after_hwframe+0x62/0x6a
> > > RIP: 0033:0x7f15a5e14b69
> > > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > RSP: 002b:00007ffde7b4a5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
> > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f15a5e14b69
> > > RDX: 0000000000000015 RSI: 0000000000c00304 RDI: 0000000020000000
> > > RBP: 0000000000000000 R08: 00005555565c6610 R09: 00005555565c6610
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > > R13: 00007ffde7b4a808 R14: 0000000000000001 R15: 0000000000000001
> > > </TASK>
> > > Modules linked in:
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
> > > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
> > > Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
> > > RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> > > RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
> > > RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
> > > R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
> > > R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
> > > FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > ----------------
> > > Code disassembly (best guess):
> > > 0: f0 48 c1 e8 03 lock shr $0x3,%rax
> > > 5: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1)
> > > 9: 0f 85 81 01 00 00 jne 0x190
> > > f: 49 8d 44 24 08 lea 0x8(%r12),%rax
> > > 14: 4d 89 26 mov %r12,(%r14)
> > > 17: 48 bf 00 00 00 00 00 movabs $0xdffffc0000000000,%rdi
> > > 1e: fc ff df
> > > 21: 48 89 44 24 10 mov %rax,0x10(%rsp)
> > > 26: 48 c1 e8 03 shr $0x3,%rax
> > > * 2a: 0f b6 04 38 movzbl (%rax,%rdi,1),%eax <-- trapping instruction
> > > 2e: 84 c0 test %al,%al
> > > 30: 74 08 je 0x3a
> > > 32: 3c 03 cmp $0x3,%al
> > > 34: 0f 8e 47 01 00 00 jle 0x181
> > > 3a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
> > > 3f: 41 rex.B
> > >
> > >
> > > ---
> > > This report is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at [email protected].
> > >
> > > syzbot will keep track of this issue. See:
> > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> > >
> > > If the report is already addressed, let syzbot know by replying with:
> > > #syz fix: exact-commit-title
> > >
> > > If you want syzbot to run the reproducer, reply with:
> > > #syz test: git://repo/address.git branch-or-commit-hash
> > > If you attach or paste a git patch, syzbot will apply it before testing.
> > >
> > > If you want to overwrite report's subsystems, reply with:
> > > #syz set subsystems: new-subsystem
> > > (See the list of subsystem names on the web dashboard)
> > >
> > > If the report is a duplicate of another one, reply with:
> > > #syz dup: exact-subject-of-another-report
> > >
> > > If you want to undo deduplication, reply with:
> > > #syz undup
> >
Hi Nhat,
On Tue, Dec 26, 2023 at 3:11 PM Nhat Pham <[email protected]> wrote:
> > The decompressed output can be bigger than input. However, here we
> > specify the output size limit to be one page. The decompressor should
> > not output more than the 1 page of the dst buffer.
> >
> > I did check the lzo1x_decompress_safe() function.
>
> I think you meant lzo1x_1_do_compress() right? This error happens on
> the zswap_store() path, so it is a compression bug.
Ah, my bad. I don't know why I was looking at the decompression side
rather than the compression side.
Thanks for catching that.
>
> > It is supposed to use the NEED_OP(x) check before it stores the output buffer.
>
> I can't seem to find any check in compression code. But that function
> is 300 LoC, with no comment :)
It seems the compression side does not have a safe version of the
function that respects the output buffer size. I agree with you that
there seems to be no check on the output buffer size before writing to
the output buffer. The "size_t *out_len" parameter works as an
output-only pointer; it does not enforce an output size limit.
The only place out_len is used is at lzo1x_compress.c:298:
*out_len = op - out;
So that confirms it is output-only :-(
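Since the compressor never checks the dst size, the caller's only safe
option is to provision the documented worst case up front. The macro
below is my reading of the bound in linux/lzo.h - please verify against
the tree before relying on it:

```c
#include <stddef.h>

/* Worst-case LZO1X output size for x input bytes, as documented in
 * linux/lzo.h (reproduced from memory - double-check the header).
 * For one 4KB page this gives 4096 + 256 + 64 + 3 = 4419 bytes:
 * more than one page, but comfortably under two. */
#define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)

static size_t lzo_dst_size_for_page(size_t page_size)
{
	return lzo1x_worst_compress(page_size);
}
```

This is consistent with the observed crash: a 1-page dst buffer can be
overrun by incompressible input, while the old 2-page buffer covers the
bound.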
>
> > However I do find one place that seems to be missing that check, at
> > least it is not obvious to me how that check is effective. I will
> > paste it here let other experts take a look as well:
> > Line 228:
> >
> > if (op - m_pos >= 8) {
> > unsigned char *oe = op + t;
> > if (likely(HAVE_OP(t + 15))) {
> > do {
> > COPY8(op, m_pos);
> > op += 8;
> > m_pos += 8;
> > COPY8(op, m_pos);
> > op += 8;
> > m_pos += 8;
> > } while (op < oe);
> > op = oe;
> > if (HAVE_IP(6)) {
> > state = next;
> > COPY4(op, ip); <========================= This COPY4 does not have an
> > obvious NEED_OP() check. As far as I can tell, this output is not
> > covered by the above HAVE_OP(t + 15) check, which is meant to protect
> > the two unaligned 8-byte stores inside the "do while" loop.
> > op += next;
> > ip += next;
> > continue;
> > }
> > } else {
> > NEED_OP(t);
> > do {
> > *op++ = *m_pos++;
> > } while (op < oe);
> > }
> > } else
> >
> >
> > >
> > > Not 100% sure about linux kernel's implementation though. I'm no
> > > compression expert :)
> >
> > Me neither. Anyway, if it is a decompression issue. For this bug, it
> > is good to have some debug print assert to check that after
> > decompression, the *dlen is not bigger than the original output
> > length. If it does blow over the decompression buffer, it is a bug
> > that needs to be fixed in the decompression function side, not the
> > zswap code.
>
> But regardless, I agree. We should enforce the condition that the
> output should not exceed the given buffer's size, and gracefully fails
> if it does (i.e returns an interpretable error code as opposed to just
> walking off the cliff like this).
Again, sorry, I was looking at the decompression side rather than the
compression side. The compression side does not even offer a safe
version of the compression function.
That seems dangerous. For now, it seems we should roll zswap back to
the 2-page buffer until we have a safe way to do compression without
overwriting the output buffer.
>
> I imagine that many compression use-cases are best-effort
> optimization, i.e it's fine to fail. zswap is exactly this - do your
> best to compress the data, but if it's too incompressible, failure is
> acceptable (we have plenty of fallback options - swapping, keeping the
> page in memory, etc.).
>
> Furthermore, this is a bit of a leaky abstraction - currently there's
> no contract on how much bigger can the "compressed" output be with
> respect to the input. It's compression algorithm-dependent, which
> defeats the point of the abstraction. In zswap case, 2 * PAGE_SIZE is
> just a rule of thumb - now imagine we use a compression algorithm that
> behaves super well in most of the cases, but would output 3 *
> PAGE_SIZE in some edge cases. This will still break the code. Better
> to give the caller the ability to bound the output size, and
> gracefully fail if that bound cannot be respected for a particular
> input.
Agree.
Chris
On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
>
> Again, sorry I was looking at the decompression side rather than the
> compression side. The compression side does not even offer a safe
> version of the compression function.
> That seems to be dangerous. It seems for now we should make the zswap
> roll back to 2 page buffer until we have a safe way to do compression
> without overwriting the output buffers.
Unfortunately, I think this is the way - at least until we rework the
crypto/compression API (if that's even possible?).
I still think the 2 page buffer is dumb, but it is what it is :(
On 2023/12/27 08:23, Nhat Pham wrote:
> On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
>>
>> Again, sorry I was looking at the decompression side rather than the
>> compression side. The compression side does not even offer a safe
>> version of the compression function.
>> That seems to be dangerous. It seems for now we should make the zswap
>> roll back to 2 page buffer until we have a safe way to do compression
>> without overwriting the output buffers.
>
> Unfortunately, I think this is the way - at least until we rework the
> crypto/compression API (if that's even possible?).
> I still think the 2 page buffer is dumb, but it is what it is :(
Hi,
I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
the caller-passed "src" and "dst" scatterlists. Instead, it uses its own
per-cpu "scomp_scratch", which has 128KB src and dst buffers.
When compression is done, it uses the output req->dlen to copy scomp_scratch->dst
to our dstmem, which now has only one page, so this problem happened.
I still don't know why alg->compress(src, slen, dst, &dlen) doesn't
check dlen. It seems like an obvious bug, right?
As for this problem in `scomp_acomp_comp_decomp()`, this patch below
should fix it. I will set up a few tests to check later.
Thanks!
diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7d..e654a120ae5a 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
struct crypto_scomp *scomp = *tfm_ctx;
void **ctx = acomp_request_ctx(req);
struct scomp_scratch *scratch;
+ unsigned int dlen;
int ret;
if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
@@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
req->dlen = SCOMP_SCRATCH_SIZE;
+ dlen = req->dlen;
+
scratch = raw_cpu_ptr(&scomp_scratch);
spin_lock(&scratch->lock);
@@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
ret = -ENOMEM;
goto out;
}
+ } else if (req->dlen > dlen) {
+ ret = -ENOMEM;
+ goto out;
}
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
1);
On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/27 08:23, Nhat Pham wrote:
> > On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
> >>
> >> Again, sorry I was looking at the decompression side rather than the
> >> compression side. The compression side does not even offer a safe
> >> version of the compression function.
> >> That seems to be dangerous. It seems for now we should make the zswap
> >> roll back to 2 page buffer until we have a safe way to do compression
> >> without overwriting the output buffers.
> >
> > Unfortunately, I think this is the way - at least until we rework the
> > crypto/compression API (if that's even possible?).
> > I still think the 2 page buffer is dumb, but it is what it is :(
>
> Hi,
>
> I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
> the caller passed "src" and "dst" scatterlist. Instead, it uses its own
> per-cpu "scomp_scratch", which have 128KB src and dst.
>
> When compression done, it uses the output req->dlen to copy scomp_scratch->dst
> to our dstmem, which has only one page now, so this problem happened.
>
> I still don't know why the alg->compress(src, slen, dst, &dlen) doesn't
> check the dlen? It seems an obvious bug, right?
>
> As for this problem in `scomp_acomp_comp_decomp()`, this patch below
> should fix it. I will set up a few tests to check later.
>
> Thanks!
>
> diff --git a/crypto/scompress.c b/crypto/scompress.c
> index 442a82c9de7d..e654a120ae5a 100644
> --- a/crypto/scompress.c
> +++ b/crypto/scompress.c
> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> struct crypto_scomp *scomp = *tfm_ctx;
> void **ctx = acomp_request_ctx(req);
> struct scomp_scratch *scratch;
> + unsigned int dlen;
> int ret;
>
> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
> req->dlen = SCOMP_SCRATCH_SIZE;
>
> + dlen = req->dlen;
> +
> scratch = raw_cpu_ptr(&scomp_scratch);
> spin_lock(&scratch->lock);
>
> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> ret = -ENOMEM;
> goto out;
> }
> + } else if (req->dlen > dlen) {
> + ret = -ENOMEM;
> + goto out;
> }
This can't fix the problem, as crypto_scomp_compress() has already
written the overflowing data.
BTW, in many cases hardware accelerators in drivers/crypto can do
compression and decompression by offloading the CPU; we won't have a
chance to let the hardware check the dst buffer size. So giving the
hardware's DMA engine a dst buffer of sufficient length is the right
way. I mean, we shouldn't change dst from 2 pages to 1 page.
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> 1);
Thanks
Barry
On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/27 14:25, Barry Song wrote:
> > On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
> > <[email protected]> wrote:
> >>
> >> On 2023/12/27 08:23, Nhat Pham wrote:
> >>> On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
> >>>>
> >>>> Again, sorry I was looking at the decompression side rather than the
> >>>> compression side. The compression side does not even offer a safe
> >>>> version of the compression function.
> >>>> That seems to be dangerous. It seems for now we should make the zswap
> >>>> roll back to 2 page buffer until we have a safe way to do compression
> >>>> without overwriting the output buffers.
> >>>
> >>> Unfortunately, I think this is the way - at least until we rework the
> >>> crypto/compression API (if that's even possible?).
> >>> I still think the 2 page buffer is dumb, but it is what it is :(
> >>
> >> Hi,
> >>
> >> I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
> >> the caller passed "src" and "dst" scatterlist. Instead, it uses its own
> >> per-cpu "scomp_scratch", which have 128KB src and dst.
> >>
> >> When compression done, it uses the output req->dlen to copy scomp_scratch->dst
> >> to our dstmem, which has only one page now, so this problem happened.
> >>
> >> I still don't know why the alg->compress(src, slen, dst, &dlen) doesn't
> >> check the dlen? It seems an obvious bug, right?
> >>
> >> As for this problem in `scomp_acomp_comp_decomp()`, this patch below
> >> should fix it. I will set up a few tests to check later.
> >>
> >> Thanks!
> >>
> >> diff --git a/crypto/scompress.c b/crypto/scompress.c
> >> index 442a82c9de7d..e654a120ae5a 100644
> >> --- a/crypto/scompress.c
> >> +++ b/crypto/scompress.c
> >> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> struct crypto_scomp *scomp = *tfm_ctx;
> >> void **ctx = acomp_request_ctx(req);
> >> struct scomp_scratch *scratch;
> >> + unsigned int dlen;
> >> int ret;
> >>
> >> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
> >> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
> >> req->dlen = SCOMP_SCRATCH_SIZE;
> >>
> >> + dlen = req->dlen;
> >> +
> >> scratch = raw_cpu_ptr(&scomp_scratch);
> >> spin_lock(&scratch->lock);
> >>
> >> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> ret = -ENOMEM;
> >> goto out;
> >> }
> >> + } else if (req->dlen > dlen) {
> >> + ret = -ENOMEM;
> >> + goto out;
> >> }
> >
> > This can't fix the problem as crypto_scomp_compress() has written overflow data.
>
> No, crypto_scomp_compress() writes to its own scomp_scratch->dst memory, then copy
> to our dstmem.
>
> >
> > BTW, in many cases, hardware-accelerators drivers/crypto can do compression and
> > decompression by off-loading CPU;
> > we won't have a chance to let hardware check the dst buffer size. so
> > giving the dst buffer
> > with enough length to the hardware's dma engine is the right way. I
> > mean, we shouldn't
> > change dst from 2pages to 1page.
>
> But how do we know 2 pages is enough for any compression algorithm?
>
There is no guarantee that 2 pages is enough.
We could invent our own compression algorithm that adds a lot of
metadata - for example, producing more than 4KB of output when the
source data is incompressible - and then dst could be larger than
2 * PAGE_SIZE. But that is not the case :-) Such an algorithm would
probably never be accepted.
For the in-tree algorithms, we have WORST_COMPR_FACTOR. In ubifs,
zram and zswap, we all use "2" as the worst compression factor.
/*
* Some compressors, like LZO, may end up with more data then the input buffer.
* So UBIFS always allocates larger output buffer, to be sure the compressor
* will not corrupt memory in case of worst case compression.
*/
#define WORST_COMPR_FACTOR 2
#ifdef CONFIG_FS_ENCRYPTION
#define UBIFS_CIPHER_BLOCK_SIZE FSCRYPT_CONTENTS_ALIGNMENT
#else
#define UBIFS_CIPHER_BLOCK_SIZE 0
#endif
/*
* How much memory is needed for a buffer where we compress a data node.
*/
#define COMPRESSED_DATA_NODE_BUF_SZ \
(UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)
For years, we have never seen 2 pages be a problem, but 1 page is
definitely not enough. I remember I once saw many cases where
accelerators' DMA engines could write more than 1 page.
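Concretely, the ubifs sizing quoted above boils down to the following
(a userspace sketch; UBIFS_DATA_NODE_SZ here is a placeholder value,
not the real kernel constant):

```c
#include <stddef.h>

/* Sketch of the ubifs worst-case buffer sizing: provision the output
 * buffer at WORST_COMPR_FACTOR times the block size (plus the node
 * header), so even an expanding "compression" cannot overrun it. The
 * factor 2 is an empirical bound, not a guarantee from the algorithms
 * themselves. */
#define WORST_COMPR_FACTOR	2
#define UBIFS_BLOCK_SIZE	4096
#define UBIFS_DATA_NODE_SZ	64	/* placeholder header size */

#define COMPRESSED_DATA_NODE_BUF_SZ \
	(UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)

static size_t compressed_buf_size(void)
{
	return COMPRESSED_DATA_NODE_BUF_SZ;
}
```

The same factor-of-2 provisioning is what zswap's old 2 * PAGE_SIZE
dstmem implemented.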
> Thanks.
>
> >
> >> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> >> 1);
> >
> >
Thanks
Barry
On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/27 14:25, Barry Song wrote:
> > On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
> > <[email protected]> wrote:
> >>
> >> On 2023/12/27 08:23, Nhat Pham wrote:
> >>> On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
> >>>>
> >>>> Again, sorry I was looking at the decompression side rather than the
> >>>> compression side. The compression side does not even offer a safe
> >>>> version of the compression function.
> >>>> That seems to be dangerous. It seems for now we should make the zswap
> >>>> roll back to 2 page buffer until we have a safe way to do compression
> >>>> without overwriting the output buffers.
> >>>
> >>> Unfortunately, I think this is the way - at least until we rework the
> >>> crypto/compression API (if that's even possible?).
> >>> I still think the 2 page buffer is dumb, but it is what it is :(
> >>
> >> Hi,
> >>
> >> I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
> >> the caller passed "src" and "dst" scatterlist. Instead, it uses its own
> >> per-cpu "scomp_scratch", which have 128KB src and dst.
> >>
> >> When compression done, it uses the output req->dlen to copy scomp_scratch->dst
> >> to our dstmem, which has only one page now, so this problem happened.
> >>
> >> I still don't know why the alg->compress(src, slen, dst, &dlen) doesn't
> >> check the dlen? It seems an obvious bug, right?
> >>
> >> As for this problem in `scomp_acomp_comp_decomp()`, this patch below
> >> should fix it. I will set up a few tests to check later.
> >>
> >> Thanks!
> >>
> >> diff --git a/crypto/scompress.c b/crypto/scompress.c
> >> index 442a82c9de7d..e654a120ae5a 100644
> >> --- a/crypto/scompress.c
> >> +++ b/crypto/scompress.c
> >> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> struct crypto_scomp *scomp = *tfm_ctx;
> >> void **ctx = acomp_request_ctx(req);
> >> struct scomp_scratch *scratch;
> >> + unsigned int dlen;
> >> int ret;
> >>
> >> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
> >> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
> >> req->dlen = SCOMP_SCRATCH_SIZE;
> >>
> >> + dlen = req->dlen;
> >> +
> >> scratch = raw_cpu_ptr(&scomp_scratch);
> >> spin_lock(&scratch->lock);
> >>
> >> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> ret = -ENOMEM;
> >> goto out;
> >> }
> >> + } else if (req->dlen > dlen) {
> >> + ret = -ENOMEM;
> >> + goto out;
> >> }
> >
> > This can't fix the problem as crypto_scomp_compress() has written overflow data.
>
> No, crypto_scomp_compress() writes to its own scomp_scratch->dst memory, then copy
> to our dstmem.
Thanks, I got your point - you are using scomp. I used to depend on
acomp, where that wasn't the case.
>
> >
> > BTW, in many cases, hardware-accelerators drivers/crypto can do compression and
> > decompression by off-loading CPU;
> > we won't have a chance to let hardware check the dst buffer size. so
> > giving the dst buffer
> > with enough length to the hardware's dma engine is the right way. I
> > mean, we shouldn't
> > change dst from 2pages to 1page.
>
> But how do we know 2 pages is enough for any compression algorithm?
>
> Thanks.
>
> >
> >> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> >> 1);
Thanks
Barry
On 2023/12/27 15:01, Barry Song wrote:
> On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
> <[email protected]> wrote:
>>
>> On 2023/12/27 14:25, Barry Song wrote:
>>> On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
>>> <[email protected]> wrote:
>>>>
>>>> On 2023/12/27 08:23, Nhat Pham wrote:
>>>>> On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
>>>>>>
>>>>>> Again, sorry I was looking at the decompression side rather than the
>>>>>> compression side. The compression side does not even offer a safe
>>>>>> version of the compression function.
>>>>>> That seems to be dangerous. It seems for now we should make the zswap
>>>>>> roll back to 2 page buffer until we have a safe way to do compression
>>>>>> without overwriting the output buffers.
>>>>>
>>>>> Unfortunately, I think this is the way - at least until we rework the
>>>>> crypto/compression API (if that's even possible?).
>>>>> I still think the 2 page buffer is dumb, but it is what it is :(
>>>>
>>>> Hi,
>>>>
>>>> I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
>>>> the caller passed "src" and "dst" scatterlist. Instead, it uses its own
>>>> per-cpu "scomp_scratch", which have 128KB src and dst.
>>>>
>>>> When compression done, it uses the output req->dlen to copy scomp_scratch->dst
>>>> to our dstmem, which has only one page now, so this problem happened.
>>>>
>>>> I still don't know why the alg->compress(src, slen, dst, &dlen) doesn't
>>>> check the dlen? It seems an obvious bug, right?
>>>>
>>>> As for this problem in `scomp_acomp_comp_decomp()`, this patch below
>>>> should fix it. I will set up a few tests to check later.
>>>>
>>>> Thanks!
>>>>
>>>> diff --git a/crypto/scompress.c b/crypto/scompress.c
>>>> index 442a82c9de7d..e654a120ae5a 100644
>>>> --- a/crypto/scompress.c
>>>> +++ b/crypto/scompress.c
>>>> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>>> struct crypto_scomp *scomp = *tfm_ctx;
>>>> void **ctx = acomp_request_ctx(req);
>>>> struct scomp_scratch *scratch;
>>>> + unsigned int dlen;
>>>> int ret;
>>>>
>>>> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
>>>> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>>> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
>>>> req->dlen = SCOMP_SCRATCH_SIZE;
>>>>
>>>> + dlen = req->dlen;
>>>> +
>>>> scratch = raw_cpu_ptr(&scomp_scratch);
>>>> spin_lock(&scratch->lock);
>>>>
>>>> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>>> ret = -ENOMEM;
>>>> goto out;
>>>> }
>>>> + } else if (req->dlen > dlen) {
>>>> + ret = -ENOMEM;
>>>> + goto out;
>>>> }
>>>
>>> This can't fix the problem as crypto_scomp_compress() has written overflow data.
>>
>> No, crypto_scomp_compress() writes to its own scomp_scratch->dst memory, then copy
>> to our dstmem.
>>
>>>
>>> BTW, in many cases, hardware-accelerators drivers/crypto can do compression and
>>> decompression by off-loading CPU;
>>> we won't have a chance to let hardware check the dst buffer size. so
>>> giving the dst buffer
>>> with enough length to the hardware's dma engine is the right way. I
>>> mean, we shouldn't
>>> change dst from 2pages to 1page.
>>
>> But how do we know 2 pages is enough for any compression algorithm?
>>
>
> There is no guarrette 2 pages is enough.
>
> We can invent our own compression algorithm, in our algorithm, we can
> add a lot of metadata, for example, longer than 4KB when the source data
> is uncompress-able. then dst can be larger than 2 * PAGE_SIZE. but this
> is not the case :-) This kind of algorithm may never succeed.
>
> For those in-tree algorithms, we have a WORST_COMPR_FACTOR. in
> ubifs, zram and zswap, we all use "2" as the worst comp factor.
Thanks for your explanation! Maybe it's best for us to return to 2 pages
if there are no other comments. And this really needs more documentation :-)
since there is currently no comment or check in the acomp compress interface.
/*
* @src: Source Data
* @dst: Destination data
* @slen: Size of the input buffer
* @dlen: Size of the output buffer and number of bytes produced
* @flags: Internal flags
* @__ctx: Start of private context data
*/
struct acomp_req {
struct crypto_async_request base;
struct scatterlist *src;
struct scatterlist *dst;
unsigned int slen;
unsigned int dlen;
u32 flags;
void *__ctx[] CRYPTO_MINALIGN_ATTR;
};
>
> /*
> * Some compressors, like LZO, may end up with more data then the input buffer.
> * So UBIFS always allocates larger output buffer, to be sure the compressor
> * will not corrupt memory in case of worst case compression.
> */
> #define WORST_COMPR_FACTOR 2
>
> #ifdef CONFIG_FS_ENCRYPTION
> #define UBIFS_CIPHER_BLOCK_SIZE FSCRYPT_CONTENTS_ALIGNMENT
> #else
> #define UBIFS_CIPHER_BLOCK_SIZE 0
> #endif
>
> /*
> * How much memory is needed for a buffer where we compress a data node.
> */
> #define COMPRESSED_DATA_NODE_BUF_SZ \
> (UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)
>
>
> For years, we have never seen 2 pages that can be a problem. but 1
> page is definitely
> not enough, I remember I once saw many cases where accelerators' dmaengine
> can write more than 1 page.
>
>> Thanks.
>>
>>>
>>>> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
>>>> 1);
>>>
>>>
>
> Thanks
> Barry
From: Chengming Zhou <[email protected]>
The req->dst buffer size should be checked before copying from the
scomp_scratch->dst to avoid req->dst buffer overflow problem.
Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Chengming Zhou <[email protected]>
---
v2:
- change error code to ENOSPC.
---
crypto/scompress.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7d..b108a30a7600 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
struct crypto_scomp *scomp = *tfm_ctx;
void **ctx = acomp_request_ctx(req);
struct scomp_scratch *scratch;
+ unsigned int dlen;
int ret;
if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
@@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
req->dlen = SCOMP_SCRATCH_SIZE;
+ dlen = req->dlen;
+
scratch = raw_cpu_ptr(&scomp_scratch);
spin_lock(&scratch->lock);
@@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
ret = -ENOMEM;
goto out;
}
+ } else if (req->dlen > dlen) {
+ ret = -ENOSPC;
+ goto out;
}
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
1);
--
2.40.1
On Wed, Dec 27, 2023 at 5:35 PM <[email protected]> wrote:
>
> From: Chengming Zhou <[email protected]>
>
> The req->dst buffer size should be checked before copying from the
> scomp_scratch->dst to avoid req->dst buffer overflow problem.
>
> Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Chengming Zhou <[email protected]>
> ---
> v2:
> - change error code to ENOSPC.
Reviewed-by: Barry Song <[email protected]>
> ---
> crypto/scompress.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/crypto/scompress.c b/crypto/scompress.c
> index 442a82c9de7d..b108a30a7600 100644
> --- a/crypto/scompress.c
> +++ b/crypto/scompress.c
> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> struct crypto_scomp *scomp = *tfm_ctx;
> void **ctx = acomp_request_ctx(req);
> struct scomp_scratch *scratch;
> + unsigned int dlen;
> int ret;
>
> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
> req->dlen = SCOMP_SCRATCH_SIZE;
>
> + dlen = req->dlen;
> +
> scratch = raw_cpu_ptr(&scomp_scratch);
> spin_lock(&scratch->lock);
>
> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> ret = -ENOMEM;
> goto out;
> }
> + } else if (req->dlen > dlen) {
> + ret = -ENOSPC;
> + goto out;
> }
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> 1);
> --
> 2.40.1
>
On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/27 15:01, Barry Song wrote:
> > On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
> > <[email protected]> wrote:
> >>
> >> On 2023/12/27 14:25, Barry Song wrote:
> >>> On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 2023/12/27 08:23, Nhat Pham wrote:
> >>>>> On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
> >>>>>>
> >>>>>> Again, sorry I was looking at the decompression side rather than the
> >>>>>> compression side. The compression side does not even offer a safe
> >>>>>> version of the compression function.
> >>>>>> That seems to be dangerous. It seems for now we should make the zswap
> >>>>>> roll back to 2 page buffer until we have a safe way to do compression
> >>>>>> without overwriting the output buffers.
> >>>>>
> >>>>> Unfortunately, I think this is the way - at least until we rework the
> >>>>> crypto/compression API (if that's even possible?).
> >>>>> I still think the 2 page buffer is dumb, but it is what it is :(
> >>>>
> >>>> Hi,
> >>>>
> >>>> I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
> >>>> the caller-passed "src" and "dst" scatterlists. Instead, it uses its own
> >>>> per-cpu "scomp_scratch", which has 128KB src and dst buffers.
> >>>>
> >>>> When compression is done, it uses the output req->dlen to copy scomp_scratch->dst
> >>>> to our dstmem, which now has only one page, so this problem happens.
> >>>>
> >>>> I still don't know why the alg->compress(src, slen, dst, &dlen) doesn't
> >>>> check dlen. It seems like an obvious bug, right?
> >>>>
> >>>> As for this problem in `scomp_acomp_comp_decomp()`, this patch below
> >>>> should fix it. I will set up a few tests to check later.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> diff --git a/crypto/scompress.c b/crypto/scompress.c
> >>>> index 442a82c9de7d..e654a120ae5a 100644
> >>>> --- a/crypto/scompress.c
> >>>> +++ b/crypto/scompress.c
> >>>> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >>>> struct crypto_scomp *scomp = *tfm_ctx;
> >>>> void **ctx = acomp_request_ctx(req);
> >>>> struct scomp_scratch *scratch;
> >>>> + unsigned int dlen;
> >>>> int ret;
> >>>>
> >>>> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
> >>>> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >>>> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
> >>>> req->dlen = SCOMP_SCRATCH_SIZE;
> >>>>
> >>>> + dlen = req->dlen;
> >>>> +
> >>>> scratch = raw_cpu_ptr(&scomp_scratch);
> >>>> spin_lock(&scratch->lock);
> >>>>
> >>>> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >>>> ret = -ENOMEM;
> >>>> goto out;
> >>>> }
> >>>> + } else if (req->dlen > dlen) {
> >>>> + ret = -ENOMEM;
> >>>> + goto out;
> >>>> }
> >>>
> >>> This can't fix the problem as crypto_scomp_compress() has written overflow data.
> >>
> >> No, crypto_scomp_compress() writes to its own scomp_scratch->dst memory, then copies
> >> it to our dstmem.
> >>
> >>>
> >>> BTW, in many cases, hardware accelerators in drivers/crypto can do compression
> >>> and decompression by off-loading the CPU; we won't have a chance to let the
> >>> hardware check the dst buffer size, so giving the hardware's dma engine a dst
> >>> buffer of sufficient length is the right way. I mean, we shouldn't change dst
> >>> from 2 pages to 1 page.
> >>
> >> But how do we know 2 pages is enough for any compression algorithm?
> >>
> >
> > There is no guarantee that 2 pages is enough.
> >
> > We could invent our own compression algorithm that adds a lot of metadata, so
> > that the output is longer than 4KB when the source data is incompressible;
> > then dst could be larger than 2 * PAGE_SIZE. But this is not the case :-)
> > That kind of algorithm may never succeed.
> >
> > For those in-tree algorithms, we have a WORST_COMPR_FACTOR. in
> > ubifs, zram and zswap, we all use "2" as the worst comp factor.
>
> Thanks for your explanation! Maybe it's best for us to return to 2 pages
> if there are no other comments. And this really needs more documentation :-)
I agree. We need some docs.
Besides, I actually think we can skip the zswap frontend if over-compression
is really happening.
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1318,7 +1318,7 @@ bool zswap_store(struct folio *folio)
if (zpool_malloc_support_movable(zpool))
gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
ret = zpool_malloc(zpool, dlen, gfp, &handle);
- if (ret == -ENOSPC) {
+ if (ret == -ENOSPC || dlen > PAGE_SIZE) {
zswap_reject_compress_poor++;
goto put_dstmem;
}
since we are not saving but wasting space in this corner case?
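The decision in the diff above can be sketched as a small pure function; `classify_store()`, the `reject` enum, and the fixed `PAGE_SIZE` are illustrative stand-ins for the zswap_store() error path, not the real zswap code:

```c
/* Sketch of the zswap_store() idea above: poor compression is either
 * the allocator reporting -ENOSPC or the compressed output exceeding
 * one page; any other allocator error is an allocation failure. */
#include <errno.h>

#define PAGE_SIZE 4096

enum reject { REJECT_NONE, REJECT_COMPRESS_POOR, REJECT_ALLOC_FAIL };

static enum reject classify_store(int zpool_ret, unsigned int dlen)
{
	if (zpool_ret == -ENOSPC || dlen > PAGE_SIZE)
		return REJECT_COMPRESS_POOR;	/* skip zswap: no space saved */
	if (zpool_ret)
		return REJECT_ALLOC_FAIL;
	return REJECT_NONE;
}
```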
> since there is no comment or check in the acomp compress interface.
>
> /*
> * @src: Source Data
> * @dst: Destination data
> * @slen: Size of the input buffer
> * @dlen: Size of the output buffer and number of bytes produced
> * @flags: Internal flags
> * @__ctx: Start of private context data
> */
> struct acomp_req {
> struct crypto_async_request base;
> struct scatterlist *src;
> struct scatterlist *dst;
> unsigned int slen;
> unsigned int dlen;
> u32 flags;
> void *__ctx[] CRYPTO_MINALIGN_ATTR;
> };
>
> >
> > /*
> > * Some compressors, like LZO, may end up with more data then the input buffer.
> > * So UBIFS always allocates larger output buffer, to be sure the compressor
> > * will not corrupt memory in case of worst case compression.
> > */
> > #define WORST_COMPR_FACTOR 2
> >
> > #ifdef CONFIG_FS_ENCRYPTION
> > #define UBIFS_CIPHER_BLOCK_SIZE FSCRYPT_CONTENTS_ALIGNMENT
> > #else
> > #define UBIFS_CIPHER_BLOCK_SIZE 0
> > #endif
> >
> > /*
> > * How much memory is needed for a buffer where we compress a data node.
> > */
> > #define COMPRESSED_DATA_NODE_BUF_SZ \
> > (UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)
> >
> >
> > For years, we have never seen 2 pages be a problem, but 1 page is definitely
> > not enough; I remember I once saw many cases where accelerators' dmaengine
> > can write more than 1 page.
> >
> >> Thanks.
> >>
> >>>
> >>>> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> >>>> 1);
> >>>
> >>>
>
Thanks
Barry
On Wed, Dec 27, 2023 at 3:10 AM Barry Song <[email protected]> wrote:
>
> On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
> <[email protected]> wrote:
> >
> > Thanks for your explanation! Maybe it's best for us to return to 2 pages
> > if there are no other comments. And this really needs more documentation :-)
Fine by me. Hmm we're basically wasting one extra page per CPU (since
these buffers are per-CPU), correct? That's not ideal, but not *too*
bad for now I suppose...
>
> I agree. we need some doc.
>
> Besides, I actually think we can skip the zswap frontend if over-compression
> is really happening.
IIUC, zsmalloc already checks that - and most people are (or should
be) using zsmalloc for zswap anyway. I wouldn't be opposed to adding
another layer of protection on the zswap side, but it's not a super
high priority I'd say.
On Thu, Dec 28, 2023 at 7:26 AM Nhat Pham <[email protected]> wrote:
>
> On Wed, Dec 27, 2023 at 3:10 AM Barry Song <[email protected]> wrote:
> >
> > On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
> > <[email protected]> wrote:
> > >
> > > Thanks for your explanation! Maybe it's best for us to return to 2 pages
> > > if there are no other comments. And this really needs more documentation :-)
>
> Fine by me. Hmm we're basically wasting one extra page per CPU (since
> these buffers are per-CPU), correct? That's not ideal, but not *too*
> bad for now I suppose...
>
> >
> > I agree. we need some doc.
> >
> > Besides, I actually think we can skip the zswap frontend if over-compression
> > is really happening.
>
> IIUC, zsmalloc already checked that - and most people are (or should
> be) using zsmalloc for zswap anyway. I wouldn't be opposed to adding
> an added layer of protection on the zswap side, but not super high
> priority I'd say.
Thanks for this info. I guess you mean the below?
unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
{
	...
	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
		return (unsigned long)ERR_PTR(-EINVAL);
}
I find zbud also has similar code:
static int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp,
		      unsigned long *handle)
{
	int chunks, i, freechunks;
	struct zbud_header *zhdr = NULL;
	enum buddy bud;
	struct page *page;

	if (!size || (gfp & __GFP_HIGHMEM))
		return -EINVAL;
	if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
		return -ENOSPC;
and z3fold,
static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
			unsigned long *handle)
{
	int chunks = size_to_chunks(size);
	struct z3fold_header *zhdr = NULL;
	struct page *page = NULL;
	enum buddy bud;
	bool can_sleep = gfpflags_allow_blocking(gfp);

	if (!size || (gfp & __GFP_HIGHMEM))
		return -EINVAL;

	if (size > PAGE_SIZE)
		return -ENOSPC;
Thus, I agree that another layer of size checks in zswap isn't necessary now.
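Side by side, the three size gates quoted above share one shape and differ mainly in the errno for an oversized request. A userspace sketch (the limit macros here are illustrative placeholders, not the real kernel values):

```c
/* Unified sketch of the zsmalloc/zbud/z3fold size checks quoted above:
 * each rejects a zero size with -EINVAL; for an oversized request,
 * zsmalloc returns -EINVAL while zbud and z3fold return -ENOSPC. */
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define ZS_MAX_ALLOC_SIZE PAGE_SIZE	/* illustrative */
#define ZHDR_SIZE_ALIGNED 64		/* illustrative */
#define CHUNK_SIZE 64			/* illustrative */

static int zs_check(size_t size)
{
	if (!size || size > ZS_MAX_ALLOC_SIZE)
		return -EINVAL;		/* zsmalloc: EINVAL even when oversized */
	return 0;
}

static int zbud_check(size_t size)
{
	if (!size)
		return -EINVAL;
	if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
		return -ENOSPC;		/* zbud: ENOSPC when oversized */
	return 0;
}

static int z3fold_check(size_t size)
{
	if (!size)
		return -EINVAL;
	if (size > PAGE_SIZE)
		return -ENOSPC;		/* z3fold: ENOSPC when oversized */
	return 0;
}
```

The asymmetry matters to a caller like zswap, which counts -ENOSPC as poor compression but any other error as an allocation failure.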
Thanks
Barry
On Thu, Dec 28, 2023 at 9:43 AM Barry Song <[email protected]> wrote:
>
> On Thu, Dec 28, 2023 at 7:26 AM Nhat Pham <[email protected]> wrote:
> >
> > On Wed, Dec 27, 2023 at 3:10 AM Barry Song <[email protected]> wrote:
> > >
> > > On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
> > > <[email protected]> wrote:
> > > >
> > > > Thanks for your explanation! Maybe it's best for us to return to 2 pages
> > > > if no other people's comments. And this really need more documentation :-)
> >
> > Fine by me. Hmm we're basically wasting one extra page per CPU (since
> > these buffers are per-CPU), correct? That's not ideal, but not *too*
> > bad for now I suppose...
> >
> > >
> > > I agree. we need some doc.
> > >
> > > besides, i actually think we can skip zswap frontend if
> > > over-compression is really
> > > happening.
> >
> > IIUC, zsmalloc already checked that - and most people are (or should
> > be) using zsmalloc for zswap anyway. I wouldn't be opposed to adding
> > an added layer of protection on the zswap side, but not super high
> > priority I'd say.
>
> Thanks for this info. I guess you mean the below ?
> unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
> {
> ...
>
> if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
> return (unsigned long)ERR_PTR(-EINVAL);
BTW, do you think zsmalloc is worth a patch to change the ret from
EINVAL to ENOSPC?
This seems more sensible and matches the behaviour of zswap, zbud, and z3fold.
ret = zpool_malloc(zpool, dlen, gfp, &handle);
if (ret == -ENOSPC) {
	zswap_reject_compress_poor++;
	goto put_dstmem;
}
if (ret) {
	zswap_reject_alloc_fail++;
	goto put_dstmem;
}
buf = z
>
> }
>
> i find zbud also has similar code:
> static int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp,
> unsigned long *handle)
> {
> int chunks, i, freechunks;
> struct zbud_header *zhdr = NULL;
> enum buddy bud;
> struct page *page;
>
> if (!size || (gfp & __GFP_HIGHMEM))
> return -EINVAL;
> if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
> return -ENOSPC;
>
> and z3fold,
>
> static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
> unsigned long *handle)
> {
> int chunks = size_to_chunks(size);
> struct z3fold_header *zhdr = NULL;
> struct page *page = NULL;
> enum buddy bud;
> bool can_sleep = gfpflags_allow_blocking(gfp);
>
> if (!size || (gfp & __GFP_HIGHMEM))
> return -EINVAL;
>
> if (size > PAGE_SIZE)
> return -ENOSPC;
>
>
> Thus, I agree that another layer to check size in zswap isn't necessary now.
Thanks
Barry
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-and-tested-by: [email protected]
Tested on:
commit: 39676dfe Add linux-next specific files for 20231222
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=104d06d9e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=13756fe9e80000
Note: testing is done by a robot and is best-effort only.
On Wed, Dec 27, 2023 at 5:47 PM Barry Song <[email protected]> wrote:
>
> On Thu, Dec 28, 2023 at 9:43 AM Barry Song <[email protected]> wrote:
> >
> > On Thu, Dec 28, 2023 at 7:26 AM Nhat Pham <[email protected]> wrote:
> > >
> > > On Wed, Dec 27, 2023 at 3:10 AM Barry Song <[email protected]> wrote:
> > > >
> > > > On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
> > > > <[email protected]> wrote:
> > > > >
> > > > > Thanks for your explanation! Maybe it's best for us to return to 2 pages
> > > > > if no other people's comments. And this really need more documentation :-)
> > >
> > > Fine by me. Hmm we're basically wasting one extra page per CPU (since
> > > these buffers are per-CPU), correct? That's not ideal, but not *too*
> > > bad for now I suppose...
> > >
> > > >
> > > > I agree. we need some doc.
> > > >
> > > > besides, i actually think we can skip zswap frontend if
> > > > over-compression is really
> > > > happening.
> > >
> > > IIUC, zsmalloc already checked that - and most people are (or should
> > > be) using zsmalloc for zswap anyway. I wouldn't be opposed to adding
> > > an added layer of protection on the zswap side, but not super high
> > > priority I'd say.
> >
> > Thanks for this info. I guess you mean the below ?
> > unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
> > {
> > ...
> >
> > if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
> > return (unsigned long)ERR_PTR(-EINVAL);
>
> BTW, do you think zsmalloc is worth a patch to change the ret from
> EINVAL to ENOSPC?
> This seems more sensible and matches the behaviour of zswap, and zbud, z3fold.
>
> ret = zpool_malloc(zpool, dlen, gfp, &handle);
> if (ret == -ENOSPC) {
> zswap_reject_compress_poor++;
> goto put_dstmem;
> }
> if (ret) {
> zswap_reject_alloc_fail++;
> goto put_dstmem;
> }
> buf = z
>
I haven't looked at the code surrounding it too closely, but IMHO this
would be nice. Poor compressibility of the workload's memory is
something we should monitor more carefully. This has come up a couple
of times in discussion:
https://lore.kernel.org/linux-mm/CAJD7tkZXS-UJVAFfvxJ0nNgTzWBiqepPYA4hEozi01_qktkitg@mail.gmail.com/
https://lore.kernel.org/all/[email protected]/T/#u
> >
> > }
> >
> > i find zbud also has similar code:
> > static int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp,
> > unsigned long *handle)
> > {
> > int chunks, i, freechunks;
> > struct zbud_header *zhdr = NULL;
> > enum buddy bud;
> > struct page *page;
> >
> > if (!size || (gfp & __GFP_HIGHMEM))
> > return -EINVAL;
> > if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
> > return -ENOSPC;
> >
> > and z3fold,
> >
> > static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
> > unsigned long *handle)
> > {
> > int chunks = size_to_chunks(size);
> > struct z3fold_header *zhdr = NULL;
> > struct page *page = NULL;
> > enum buddy bud;
> > bool can_sleep = gfpflags_allow_blocking(gfp);
> >
> > if (!size || (gfp & __GFP_HIGHMEM))
> > return -EINVAL;
> >
> > if (size > PAGE_SIZE)
> > return -ENOSPC;
> >
> >
> > Thus, I agree that another layer to check size in zswap isn't necessary now.
>
> Thanks
> Barry
On Wed, Dec 27, 2023 at 09:35:23AM +0000, [email protected] wrote:
> From: Chengming Zhou <[email protected]>
>
> The req->dst buffer size should be checked before copying from the
> scomp_scratch->dst to avoid req->dst buffer overflow problem.
>
> Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Chengming Zhou <[email protected]>
> ---
> v2:
> - change error code to ENOSPC.
> ---
> crypto/scompress.c | 6 ++++++
> 1 file changed, 6 insertions(+)
Patch applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/27 14:25, Barry Song wrote:
> > On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
> > <[email protected]> wrote:
> >>
> >> On 2023/12/27 08:23, Nhat Pham wrote:
> >>> On Tue, Dec 26, 2023 at 3:30 PM Chris Li <[email protected]> wrote:
> >>>>
> >>>> Again, sorry I was looking at the decompression side rather than the
> >>>> compression side. The compression side does not even offer a safe
> >>>> version of the compression function.
> >>>> That seems to be dangerous. It seems for now we should make the zswap
> >>>> roll back to 2 page buffer until we have a safe way to do compression
> >>>> without overwriting the output buffers.
> >>>
> >>> Unfortunately, I think this is the way - at least until we rework the
> >>> crypto/compression API (if that's even possible?).
> >>> I still think the 2 page buffer is dumb, but it is what it is :(
> >>
> >> Hi,
> >>
> >> I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
> >> the caller-passed "src" and "dst" scatterlists. Instead, it uses its own
> >> per-cpu "scomp_scratch", which has 128KB src and dst buffers.
> >>
> >> When compression is done, it uses the output req->dlen to copy scomp_scratch->dst
> >> to our dstmem, which now has only one page, so this problem happens.
> >>
> >> I still don't know why the alg->compress(src, slen, dst, &dlen) doesn't
> >> check dlen. It seems like an obvious bug, right?
> >>
> >> As for this problem in `scomp_acomp_comp_decomp()`, this patch below
> >> should fix it. I will set up a few tests to check later.
> >>
> >> Thanks!
> >>
> >> diff --git a/crypto/scompress.c b/crypto/scompress.c
> >> index 442a82c9de7d..e654a120ae5a 100644
> >> --- a/crypto/scompress.c
> >> +++ b/crypto/scompress.c
> >> @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> struct crypto_scomp *scomp = *tfm_ctx;
> >> void **ctx = acomp_request_ctx(req);
> >> struct scomp_scratch *scratch;
> >> + unsigned int dlen;
> >> int ret;
> >>
> >> if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
> >> @@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
> >> req->dlen = SCOMP_SCRATCH_SIZE;
> >>
> >> + dlen = req->dlen;
> >> +
> >> scratch = raw_cpu_ptr(&scomp_scratch);
> >> spin_lock(&scratch->lock);
> >>
> >> @@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >> ret = -ENOMEM;
> >> goto out;
> >> }
> >> + } else if (req->dlen > dlen) {
> >> + ret = -ENOMEM;
> >> + goto out;
> >> }
> >
> > This can't fix the problem as crypto_scomp_compress() has written overflow data.
>
> No, crypto_scomp_compress() writes to its own scomp_scratch->dst memory, then copies
> it to our dstmem.
Hi Chengming,
I still feel these two memcpys are too big and unnecessary, so I sent an RFC[1]
to remove them, as well as another one removing a memcpy in zswap[2].
But unfortunately I don't have real hardware to run on and collect data, so I
wonder if you are interested in testing and collecting data, as you are
actively contributing to zswap.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2]
https://lore.kernel.org/linux-mm/[email protected]/
https://lore.kernel.org/linux-mm/[email protected]/
Thanks
Barry