Hi Linus
I have a pile of (still under triage) sysbot reports coming after one of your patch
Typical stack trace:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 24889 at mm/util.c:597 kvmalloc_node+0x111/0x120 mm/util.c:597
Modules linked in:
CPU: 1 PID: 24889 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:kvmalloc_node+0x111/0x120 mm/util.c:597
Code: 01 00 00 00 4c 89 e7 e8 8d 12 0d 00 49 89 c5 e9 69 ff ff ff e8 f0 21 d1 ff 41 89 ed 41 81 cd 00 20 01 00 eb 95 e8 df 21 d1 ff <0f> 0b e9 4c ff ff ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53 e8 c6
RSP: 0018:ffffc90002abf268 EFLAGS: 00010216
RAX: 0000000000000431 RBX: ffffc90002abf380 RCX: ffffc90014157000
RDX: 0000000000040000 RSI: ffffffff81a4f621 RDI: 0000000000000003
RBP: 0000000000400dc0 R08: 000000007fffffff R09: 00000000ffffffff
R10: ffffffff81a4f5de R11: 000000000000001f R12: 0000000200000018
R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8880121f0a00
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0063) knlGS:00000000f5562b40
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000561d64b67478 CR3: 00000000196e8000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
hash_ipport_create+0x3dd/0x1220 net/netfilter/ipset/ip_set_hash_gen.h:1524
ip_set_create+0x782/0x15a0 net/netfilter/ipset/ip_set_core.c:1100
nfnetlink_rcv_msg+0xbc9/0x13f0 net/netfilter/nfnetlink.c:296
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:654
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
RIP: 0023:0xf7f68549
Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f55625fc EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000020000100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
----------------
Caused by
commit 7661809d493b426e979f39ab512e3adf41fbcc69
Author: Linus Torvalds <[email protected]>
Date: Wed Jul 14 09:45:49 2021 -0700
mm: don't allow oversized kvmalloc() calls
Do we want to fix all problematic callers, with ad-hoc patches like
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 16ae92054baa80b232995661ef72f5c8e6866663..ed14af7761166aebffcd1d98597d9023b147ddf3 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -250,6 +250,8 @@ EXPORT_SYMBOL_GPL(ip_set_type_unregister);
void *
ip_set_alloc(size_t size)
{
+ if (size > INT_MAX)
+ return NULL;
return kvzalloc(size, GFP_KERNEL_ACCOUNT);
}
EXPORT_SYMBOL_GPL(ip_set_alloc);
Thanks !
On Fri, Sep 3, 2021 at 1:44 PM Eric Dumazet <[email protected]> wrote:
>
> I have a pile of (still under triage) sysbot reports coming after one of your patch
>
> Typical stack trace:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 24889 at mm/util.c:597 kvmalloc_node+0x111/0x120 mm/util.c:597
> Call Trace:
> hash_ipport_create+0x3dd/0x1220 net/netfilter/ipset/ip_set_hash_gen.h:1524
> ip_set_create+0x782/0x15a0 net/netfilter/ipset/ip_set_core.c:1100
> nfnetlink_rcv_msg+0xbc9/0x13f0 net/netfilter/nfnetlink.c:296
So the real question is mainly just whether those huge allocations
actually make sense or not.
If they truly are sensible, we can remove the warning. But it would be
good to perhaps look at them more.
Because no:
> Do we want to fix all problematic callers, with ad-hoc patches like
Not insane patches like this, no.
> ip_set_alloc(size_t size)
> {
> + if (size > INT_MAX)
> + return NULL;
> return kvzalloc(size, GFP_KERNEL_ACCOUNT);
> }
But does that kind of size really make sense? I'm looking at the
particular caller, is it *really* senseible to have a 4GB+ hash table
size?
IOW, I don't think we want that warning to cause the above kinds of
ad-hoc patches.
But I _do_ want that warning to make people go "is that allocation
truly sensible"?
IOW, it sounds like you can send some netlink message that causes
insane hash size allocations. Shouldn't _that_ be fixed?
Linus
On Fri, Sep 3, 2021 at 3:21 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, Sep 3, 2021 at 1:44 PM Eric Dumazet <[email protected]> wrote:
> >
> > I have a pile of (still under triage) sysbot reports coming after one of your patch
> >
> > Typical stack trace:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 24889 at mm/util.c:597 kvmalloc_node+0x111/0x120 mm/util.c:597
> > Call Trace:
> > hash_ipport_create+0x3dd/0x1220 net/netfilter/ipset/ip_set_hash_gen.h:1524
> > ip_set_create+0x782/0x15a0 net/netfilter/ipset/ip_set_core.c:1100
> > nfnetlink_rcv_msg+0xbc9/0x13f0 net/netfilter/nfnetlink.c:296
>
> So the real question is mainly just whether those huge allocations
> actually make sense or not.
>
> If they truly are sensible, we can remove the warning. But it would be
> good to perhaps look at them more.
>
> Because no:
>
> > Do we want to fix all problematic callers, with ad-hoc patches like
>
> Not insane patches like this, no.
>
> > ip_set_alloc(size_t size)
> > {
> > + if (size > INT_MAX)
> > + return NULL;
> > return kvzalloc(size, GFP_KERNEL_ACCOUNT);
> > }
>
> But does that kind of size really make sense? I'm looking at the
> particular caller, is it *really* senseible to have a 4GB+ hash table
> size?
>
> IOW, I don't think we want that warning to cause the above kinds of
> ad-hoc patches.
>
> But I _do_ want that warning to make people go "is that allocation
> truly sensible"?
>
> IOW, it sounds like you can send some netlink message that causes
> insane hash size allocations. Shouldn't _that_ be fixed?
>
Probably, but as I said there are many different reports.
If there was only one or two, I would simply have sent a fix(es).
I will probably release these bugs, so that they can be spread among
interested parties.
WARNING: CPU: 1 PID: 26011 at mm/util.c:597 kvmalloc_node+0x111/0x120
mm/util.c:597
Modules linked in:
CPU: 1 PID: 26011 Comm: syz-executor.2 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
RIP: 0010:kvmalloc_node+0x111/0x120 mm/util.c:597
Code: 01 00 00 00 4c 89 e7 e8 8d 12 0d 00 49 89 c5 e9 69 ff ff ff e8
f0 21 d1 ff 41 89 ed 41 81 cd 00 20 01 00 eb 95 e8 df 21 d1 ff <0f> 0b
e9 4c ff ff ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53 e8 c6
RSP: 0018:ffffc90003a77720 EFLAGS: 00010216
RAX: 0000000000000e48 RBX: ffffc90003a77e18 RCX: ffffc9000df6d000
RDX: 0000000000040000 RSI: ffffffff81a4f621 RDI: 0000000000000003
RBP: 0000000000002dc0 R08: 000000007fffffff R09: 00000000ffffffff
R10: ffffffff81a4f5de R11: 0000000000000000 R12: 000000020008a100
R13: 0000000000000000 R14: 00000000ffffffff R15: ffff88802032c000
FS: 00007fbfc5618700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000540198 CR3: 000000006e529000 CR4: 0000000000350ee0
Call Trace:
kvmalloc include/linux/mm.h:806 [inline]
kvmalloc_array include/linux/mm.h:824 [inline]
kvcalloc include/linux/mm.h:829 [inline]
check_btf_line+0x1a9/0xad0 kernel/bpf/verifier.c:9925
check_btf_info kernel/bpf/verifier.c:10049 [inline]
bpf_check+0x1636/0xbd20 kernel/bpf/verifier.c:13759
bpf_prog_load+0xe57/0x21f0 kernel/bpf/syscall.c:2301
__sys_bpf+0x67e/0x5df0 kernel/bpf/syscall.c:4587
__do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
__se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
__x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
On Fri, Sep 3, 2021 at 4:00 PM Eric Dumazet <[email protected]> wrote:
>
> > IOW, it sounds like you can send some netlink message that causes
> > insane hash size allocations. Shouldn't _that_ be fixed?
>
> Probably, but as I said there are many different reports.
>
> If there was only one or two, I would simply have sent a fix(es).
>
> I will probably release these bugs, so that they can be spread among
> interested parties.
Sure.
Let's keep the warning in place. We can remove it before the actual
release if things don't get better, but it does look like it's
actually finding places where people should have checked limits more,
rather than apparently just relying on the allocation failing.
Because with enough memory, the allocations traditionally didn't fail
- they just succeed with completely insane allocations and absolutely
horrendous latencies (ie allocating and possibly clearing gigabytes
and gigabytes of data).
This other one:
> WARNING: CPU: 1 PID: 26011 at mm/util.c:597 kvmalloc_node+0x111/0x120
> mm/util.c:597
> Modules linked in:
> CPU: 1 PID: 26011 Comm: syz-executor.2 Not tainted 5.14.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> RIP: 0010:kvmalloc_node+0x111/0x120 mm/util.c:597
> Call Trace:
> check_btf_line+0x1a9/0xad0 kernel/bpf/verifier.c:9925
Yeah, that code should check "nr_linfo" a lot more than it seems to do.
It had just added __GFP_NOWARN to hide the fact that it did crazy
allocations and just wanted the craziest ones to fail silently.
I think it should just limit itself to something sane.
Linus
On Fri, Sep 3, 2021 at 4:08 PM Linus Torvalds
<[email protected]> wrote:
>
> It had just added __GFP_NOWARN to hide the fact that it did crazy
> allocations and just wanted the craziest ones to fail silently.
.. and yes, we can make __GFP_NOWARN limit that WARN_ON_ONCE() too,
but the whole point of this really was to actually find people who
simply didn't check their arguments. So I'd rather add a few sanity
checks, than say "__GFP_NOWARN silences the sanity check too".
Linus
From: Linus Torvalds
> Sent: 03 September 2021 23:21
>
> On Fri, Sep 3, 2021 at 1:44 PM Eric Dumazet <[email protected]> wrote:
> >
> > I have a pile of (still under triage) sysbot reports coming after one of your patch
> >
> > Typical stack trace:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 24889 at mm/util.c:597 kvmalloc_node+0x111/0x120 mm/util.c:597
> > Call Trace:
> > hash_ipport_create+0x3dd/0x1220 net/netfilter/ipset/ip_set_hash_gen.h:1524
> > ip_set_create+0x782/0x15a0 net/netfilter/ipset/ip_set_core.c:1100
> > nfnetlink_rcv_msg+0xbc9/0x13f0 net/netfilter/nfnetlink.c:296
>
> So the real question is mainly just whether those huge allocations
> actually make sense or not.
>
> If they truly are sensible, we can remove the warning. But it would be
> good to perhaps look at them more.
>
> Because no:
>
> > Do we want to fix all problematic callers, with ad-hoc patches like
>
> Not insane patches like this, no.
>
> > ip_set_alloc(size_t size)
> > {
> > + if (size > INT_MAX)
> > + return NULL;
> > return kvzalloc(size, GFP_KERNEL_ACCOUNT);
> > }
>
> But does that kind of size really make sense? I'm looking at the
> particular caller, is it *really* senseible to have a 4GB+ hash table
> size?
I wonder if there should be a GFP_LARGE_ALLOC flag that must
be set in order to allow allocates over a few MB?
(Probably with warn + allocate for now.)
That will generate even more warnings for oversize allocates
but stop accidental huge allocates in places that really
don't expect them.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)