From: Daniel Borkmann
To: torvalds@linux-foundation.org
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, Daniel Borkmann,
    syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com,
    Björn Töpel, Magnus Karlsson, Willy Tarreau, Andrew Morton,
    Alexei Starovoitov, Andrii Nakryiko, Jakub Kicinski,
    "David S. Miller"
Subject: [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
Date: Fri, 4 Mar 2022 15:26:32 +0100
Message-Id: <8a99a175d25f4bcce6b78cee8fa536e40b987b0a.1646403182.git.daniel@iogearbox.net>
X-Mailer: git-send-email 2.21.0
X-Mailing-List: linux-kernel@vger.kernel.org

syzkaller was recently triggering an oversized kvmalloc() warning via
xdp_umem_create(). The triggered warning was added back in 7661809d493b
("mm: don't allow oversized kvmalloc() calls"). The rationale for the
warning for huge kvmalloc sizes was as a reaction to a security bug
where the size was more than UINT_MAX but not everything was prepared
to handle unsigned long sizes.
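
As an illustration of the bug class referred to above (not taken from
the original report), the following user-space sketch shows how a size
above UINT_MAX is silently truncated once it passes through a 32-bit
type. narrow_len() is a hypothetical helper, and the example assumes a
platform where size_t is wider than unsigned int, which the
static_assert checks at compile time.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption of this sketch: size_t can hold values above UINT_MAX. */
static_assert(sizeof(size_t) > sizeof(unsigned int),
	      "example assumes size_t is wider than unsigned int");

/*
 * Hypothetical helper: models a callee that keeps a length in an
 * unsigned int. The conversion is well defined in C and reduces the
 * value modulo UINT_MAX + 1, so the high bits are dropped.
 */
unsigned int narrow_len(size_t size)
{
	return (unsigned int)size;
}
```

With a 32-bit unsigned int, narrow_len((size_t)UINT_MAX + 2) yields 1,
so a callee would act on a one-byte length while the caller believed it
had requested more than 4 GiB.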

Anyway, the AF_XDP related call trace from this syzkaller report was:

  kvmalloc include/linux/mm.h:806 [inline]
  kvmalloc_array include/linux/mm.h:824 [inline]
  kvcalloc include/linux/mm.h:829 [inline]
  xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
  xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
  xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
  xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
  __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
  __do_sys_setsockopt net/socket.c:2187 [inline]
  __se_sys_setsockopt net/socket.c:2184 [inline]
  __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Björn mentioned that requests for >2GB allocation can still be valid:

  The structure that is being allocated is the page-pinning accounting.
  AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
  still fewer than what memcg allows (PAGE_COUNTER_MAX is a
  LONG_MAX/PAGE_SIZE on 64 bit systems). [...]

  I could just change from U32_MAX to INT_MAX, but as I stated earlier
  that has a hacky feeling to it. [...] From my perspective, the code
  isn't broken, with the memcg limits in consideration. [...]

Linus says:

  [...] Pretty much every time this has come up, the kernel warning has
  shown that yes, the code was broken and there really wasn't a reason
  for doing allocations that big.

  Of course, some people would be perfectly fine with the allocation
  failing, they just don't want the warning. I didn't want __GFP_NOWARN
  to shut it up originally because I wanted people to see all those
  cases, but these days I think we can just say "yeah, people can shut
  it up explicitly by saying 'go ahead and fail this allocation, don't
  warn about it'".

  So enough time has passed that by now I'd certainly be ok with [it].

Thus allow call-sites to silence such userspace triggered splats if the
allocation requests have __GFP_NOWARN.
For xdp_umem_pin_pages()'s call to kvcalloc() this is already the
case, so nothing else is needed there.

Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds
Signed-off-by: Daniel Borkmann
Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Cc: Björn Töpel
Cc: Magnus Karlsson
Cc: Willy Tarreau
Cc: Andrew Morton
Cc: Alexei Starovoitov
Cc: Andrii Nakryiko
Cc: Jakub Kicinski
Cc: David S. Miller
Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
---
 [ Hi Linus, just to follow up on the discussion from here [0], I've
   cooked up a proper and tested patch. Feel free to take it directly
   to your tree if you prefer, or we could also either route it via
   bpf or mm, whichever way is best. Thanks!

   [0] https://lore.kernel.org/bpf/CAHk-=wiRq+_jd_O1gz3J6-ANtXMY7iLpi8XFUcmtB3rBixvUXQ@mail.gmail.com/ ]

 mm/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..d3102081add0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -587,8 +587,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 		return ret;
 
 	/* Don't even allow crazy sizes */
-	if (WARN_ON_ONCE(size > INT_MAX))
+	if (unlikely(size > INT_MAX)) {
+		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
 		return NULL;
+	}
 
 	return __vmalloc_node(size, 1, flags, node,
 			__builtin_return_address(0));
-- 
2.21.0
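
For readers who want to try the new behaviour outside the kernel, here
is a rough user-space model of the size check in the hunk above. It is
only a sketch: MODEL_NOWARN, model_warn_on_once(), model_warn_count and
model_kvmalloc() are hypothetical names, malloc() stands in for the
real kmalloc/vmalloc paths, and the warning is reduced to a counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for __GFP_NOWARN; the bit value is arbitrary. */
#define MODEL_NOWARN 0x1u

static_assert(SIZE_MAX > INT_MAX, "size_t must be able to exceed INT_MAX");

/* Number of warnings the model has emitted; stands in for the kernel log. */
unsigned int model_warn_count;

/* Models WARN_ON_ONCE() for a single call site: fires at most once. */
static void model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warn_count++;
	}
}

/* Models only the oversized-request check of the patched kvmalloc_node(). */
void *model_kvmalloc(size_t size, unsigned int flags)
{
	if (size > INT_MAX) {
		/* The request always fails; the flag only mutes the warning. */
		model_warn_on_once(!(flags & MODEL_NOWARN));
		return NULL;
	}

	return malloc(size);
}
```

In this model an oversized request returns NULL in both cases, and only
a caller that omits MODEL_NOWARN bumps model_warn_count, at most once.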