Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp1414723rwb; Fri, 7 Oct 2022 12:15:34 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7yXtJY2iXGFj+gRkc6hiFGah4FiEOLiXMvHzZBNO+3D93E+2khOFIxqDkwkyTFw8fZXuIU X-Received: by 2002:a05:6402:5110:b0:459:e912:17c8 with SMTP id m16-20020a056402511000b00459e91217c8mr6151371edd.137.1665170133953; Fri, 07 Oct 2022 12:15:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665170133; cv=none; d=google.com; s=arc-20160816; b=dWaQeNmOgbgcbGORb5+je5+v3aGav8Y492jssmvk2OQPDJpIyRUnWrpB/44ZR8f2nh FUQ30qleQkN8f6zhynrnHu+hdNgkvA2SeRhJz2hSnmKwXr4cxIsgPYUFKEarhFc8uRd7 99nuHYT4xZzkgTHPowHgGNwtCfzrAdlzQFS6EOWlai22L/t0m7Bb2TP/KVVnkcy8oHAu TMs39exbQuZat3Oh0UF/fym1hN15V1IrPtOFwiKWj9Cac46n7MzgRIP7YI8ltTsN19Ka YIABmjF7rlcU2ykob8pUpvjNadMWxJfkaOu6K44JRr34d8f0VEyyy3mD4RgHtBW9CJ7G w1rg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=21A6ZjPuJYw0WXuNQnO8404OZpL79Fxvlo9cifFtwEE=; b=qO5y/iPpr10gy21Ii/TdealnBLhzo70pg+hp6149MwK6adTb5vXja6XVrBfvkhOaHB KJ+dphG0GgSwOMvCG1+bZpRqqglmEXc21OAxQVvPda2YKsfXwwpFLJqt6Pigu7xbGtH8 VUEhDNLI4vj7Wjsq3xDBJaU1B2SwV9DigdoSJ+LSsWoU6CK0yexehQvr2LfFXaj0cnHi 5MjQ71p4n+603somVQQM0qzHjVc0NTsGTkpio+2R6LMXy1B3TX3ChjgbJTqqMC1FhscM NuNbFkCC2ZKC5S3jYIOM1K74bphppRHOS2DKDJ10an/aQ8TtWJ+QxQ7ctptU2NBVYovt lhGA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OgjwR2mp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 7-20020a508747000000b0044615ee1b6fsi2885150edv.218.2022.10.07.12.15.07; Fri, 07 Oct 2022 12:15:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OgjwR2mp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229506AbiJGTE1 (ORCPT + 99 others); Fri, 7 Oct 2022 15:04:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229511AbiJGTEZ (ORCPT ); Fri, 7 Oct 2022 15:04:25 -0400 Received: from mail-pl1-x62a.google.com (mail-pl1-x62a.google.com [IPv6:2607:f8b0:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4BAA94E401 for ; Fri, 7 Oct 2022 12:04:24 -0700 (PDT) Received: by mail-pl1-x62a.google.com with SMTP id c24so5367205plo.3 for ; Fri, 07 Oct 2022 12:04:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=21A6ZjPuJYw0WXuNQnO8404OZpL79Fxvlo9cifFtwEE=; b=OgjwR2mpvL7at1QiCptYJGYayigIS2Mnch4J2YrXJaQ42rnGB/rkvrX9a67tle7xCW DV71jD5VbMXXT2fc+6W0GIBPw1RiWVdwt6xGnnYBPGuFo/EzsOy/rYI7EuOCIe3RPp6f Yl/EYakcL4uTGWudHHVZseK0jFW5qg1nLAG4ilXwFF+7zrcOlHvO6YaFMLo01IDsHNMz /aLxG6AryM6U/BcZnz6+L5Z7priSsJeGNyF5GYnJIqCU0z3nnctU3XBYqlY72vmrxcCm oD+C6JsVYJXXYXNb2pFrfb5IRNVRWQy7LL3JjivtxT2cs4CraTszZRiSB9sC+s0w6SnG 3gmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=21A6ZjPuJYw0WXuNQnO8404OZpL79Fxvlo9cifFtwEE=; b=6sBu8/gLqzzqd/XEkdBY5uqEjXfiECD6ESocIaikR7P4Sq/nVCaAKVYBerX/Apg+Ar 9lmfc4hqFlChWxXB7y0NE1Xo9gQb1Wmqva3xQFcQ2n6bT2PVShtTvfDkkz+AcWRPI4qZ IGtx+WniPFXCoX/iamOrGTI/hFNO3OUEEbEHKREIs10FVb+8OXc5sl9D5cFhkYZOL6e8 shwgRUDp4js0VX/rCc+ItW71KqKnKArdftat2vGHkYDQK0BoqKPVdmtKRDjpLXf1/km9 KNE5Ej7tr/cIq7VBkWtNYz8UYR6l228zJXsAd/3VR9f2B/V3B5HHbr4suiXQyU/T9D0b xKqw== X-Gm-Message-State: ACrzQf23LOeOnSauUeCxNaN5s1xxxtwDTfRWktM3PS0m983Ttrga6f2p YdKqArErVaWZ+L8+WiLBVvJtAbYKXqE2JaOEYTZHWQ== X-Received: by 2002:a17:90b:3a88:b0:209:f35d:ad53 with SMTP id om8-20020a17090b3a8800b00209f35dad53mr17920889pjb.102.1665169463422; Fri, 07 Oct 2022 12:04:23 -0700 (PDT) MIME-Version: 1.0 References: <20190307090146.1874906-1-arnd@arndb.de> <20221006222124.aabaemy7ofop7ccz@google.com> In-Reply-To: From: Nick Desaulniers Date: Fri, 7 Oct 2022 12:04:11 -0700 Message-ID: Subject: Re: [PATCH] fs/select: avoid clang stack usage warning To: Arnd Bergmann , Kees Cook Cc: linux-fsdevel@vger.kernel.org, Alexander Viro , Andrew Morton , Andi Kleen , Christoph Hellwig , Eric Dumazet , "Darrick J. Wong" , Greg Kroah-Hartman , linux-kernel@vger.kernel.org, llvm@lists.linux.dev, Paul Kirth Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org + Kees, Paul On Fri, Oct 7, 2022 at 1:28 AM Arnd Bergmann wrote: > > On Fri, Oct 7, 2022, at 12:21 AM, Nick Desaulniers wrote: > > On Thu, Mar 07, 2019 at 10:01:36AM +0100, Arnd Bergmann wrote: > >> The select() implementation is carefully tuned to put a sensible amount > >> of data on the stack for holding a copy of the user space fd_set, > >> but not too large to risk overflowing the kernel stack. > >> > >> When building a 32-bit kernel with clang, we need a little more space > >> than with gcc, which often triggers a warning: > >> > >> fs/select.c:619:5: error: stack frame size of 1048 bytes in function 'core_sys_select' [-Werror,-Wframe-larger-than=] > >> int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, > >> > >> I experimentally found that for 32-bit ARM, reducing the maximum > >> stack usage by 64 bytes keeps us reliably under the warning limit > >> again. > >> > >> Signed-off-by: Arnd Bergmann > >> --- > >> include/linux/poll.h | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/include/linux/poll.h b/include/linux/poll.h > >> index 7e0fdcf905d2..1cdc32b1f1b0 100644 > >> --- a/include/linux/poll.h > >> +++ b/include/linux/poll.h > >> @@ -16,7 +16,11 @@ > >> extern struct ctl_table epoll_table[]; /* for sysctl */ > >> /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating > >> additional memory. */ > >> +#ifdef __clang__ > >> +#define MAX_STACK_ALLOC 768 > > > > Hi Arnd, > > Upon a toolchain upgrade for Android, our 32b x86 image used for > > first-party developer VMs started tripping -Wframe-larger-than= again > > (thanks -Werror) which is blocking our ability to upgrade our toolchain. > > > > I've attached the zstd compressed .config file that reproduces with ToT > > LLVM: > > > > $ cd linux > > $ zstd -d path/to/config.zst -o .config > > $ make ARCH=i386 LLVM=1 -j128 fs/select.o > > fs/select.c:625:5: error: stack frame size (1028) exceeds limit (1024) > > in 'core_sys_select' [-Werror,-Wframe-larger-than] > > int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, > > ^ > > > > As you can see, we're just barely tipping over the limit. Should I send > > a patch to reduce this again? If so, any thoughts by how much? > > Decrementing the current value by 4 builds the config in question, but > > seems brittle. > > > > Do we need to only do this if !CONFIG_64BIT? > > commit ad312f95d41c ("fs/select: avoid clang stack usage warning") > > seems to allude to this being more problematic on 32b targets? > > I think we should keep the limit consistent between 32 bit and 64 bit > kernels. Lowering the allocation a bit more would of course have a > performance impact for users that are just below the current limit, > so I think it would be best to first look at what might be going > wrong in the compiler. > > I managed to reproduce the issue and had a look at what happens > here. A few random observations: > > - the kernel is built with -fsanitize=local-bounds, dropping this > option reduces the stack allocation for this function by around > 100 bytes, which would be the easiest change for you to build > those kernels again without any source changes, but it may also > be possible to change the core_sys_select function in a way that > avoids the insertion of runtime bounds checks. Thanks for taking a look Arnd; ++beers_owed; I'm not sure we'd want to disable CONFIG_UBSAN_LOCAL_BOUNDS=y for this particular configuration of the kernel over this, or remove -fsanitize=local-bounds for this translation unit (even if we did so specifically for 32b targets). FWICT, the parameter n of function core_sys_select() is used to index into the stack allocated stack_fds, which is what -fsanitize=local-bounds is inserting runtime guards for. If I dump the compiler IR (before register allocation), the only explicit stack allocations I observe once the middle end optimizations have run are: 1. a single 64b value...looks like the ktime_t passed to poll_schedule_timeout IIUC. 2. a struct poll_wqueues inlined from do_select. 3. 64x32b array, probably stack_fds. (oh, yeah, those are correct, if I rebuild with `KCFLAGS="-g0 -fno-discard-value-names"` the IR retains identifiers for locals. I should send a patch for that for kbuild). I think that implies that the final stack slots emitted are a result of the register allocator failing to keep all temporary values live in registers; they are spilled to the stack. Paul has been playing with visualizing stack slots recently, and might be able to provide more color. I worry that the back end might do tail duplication or if conversion and potentially split large stack values into two distinct (non-overlapping) stack slots, but haven't seen that yet in reality. We've also seen poor stack slot reuse with KASAN with clang as well... > > - If I mark 'do_select' as noinline_for_stack, the reported frame > size is decreased a lot and is suddenly independent of > -fsanitize=local-bounds: > fs/select.c:625:5: error: stack frame size (336) exceeds limit (100) in 'core_sys_select' [-Werror,-Wframe-larger-than] > int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, > fs/select.c:479:21: error: stack frame size (684) exceeds limit (100) in 'do_select' [-Werror,-Wframe-larger-than] > static noinline int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time) I think this approach makes the most sense to me; the caller core_sys_select() has a large stack allocation `stack_fds`, and so does the callee do_select with `table`. Add in inlining and long live ranges and it makes sense that stack spills are going to tip us over the threshold set by -Wframe-larger-than. Whether you make do_select() `noinline_for_stack` conditional on additional configs like CC_IS_CLANG or CONFIG_UBSAN_LOCAL_BOUNDS is perhaps also worth considering. How would you feel about a patch that: 1. reverts commit ad312f95d41c ("fs/select: avoid clang stack usage warning") 2. marks do_select noinline_for_stack ? I assume the point of "small string optimization" going on with `stack_fds` in core_sys_select() is that the potential overhead for kmalloc is much much higher than the cost of not inlining do_select() into core_sys_select(). The above approach does solve this .config's instance, and seems slightly less brittle to me. > However, I don't even see how this makes sense at all, given that > the actual frame size should be at least SELECT_STACK_ALLOC! I think the math checks out: #define FRONTEND_STACK_ALLOC 256 #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC long stack_fds[SELECT_STACK_ALLOC/sizeof(long)]; sizeof(long) == 4; // i386 ilp32 sizeof(stack_fds) == sizeof(long) * 256 / sizeof(long) == 256 > > - The behavior of -ftrivial-auto-var-init= is a bit odd here: with =zero or > =pattern, the stack usage is just below the limit (1020), without the > option it is up to 1044. It looks like your .config picks =zero, which > was dropped in the latest clang version, so it falls back to not Huh? What do you mean by "was dropped?" The config I sent has: CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_BARE=y CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y # CONFIG_INIT_STACK_NONE is not set CONFIG_INIT_STACK_ALL_ZERO=y Disabling INIT_STACK_ALL_ZERO from the config provided doesn't elide the diagnostic. Enabling INIT_STACK_ALL_PATTERN does... explicit stack allocations in IR haven't changed. In the generated assembly we're pushing 3x 4B GPRs, subtracting 8B from the stack pointer, then another 1008B. (Filed: https://github.com/llvm/llvm-project/issues/58237) So that's 3 * 4B + 8B + 1008B == 1028. But CONFIG_FRAME_WARN is set to 1024. I wonder if this diagnostic is not as precise as it could be, or my math is wrong? It looks like for arrays INIT_STACK_ALL_PATTERN uses memset to fill the array with 0xFF rather than 0x00 used by INIT_STACK_ALL_ZERO. Not sure why that would make a difference, but curious that it does. Looking at the delta in the (massive) IR between the two, it looks like the second for loop preheader and body differ. That's going to result in different choices by the register allocator. The second loop is referencing `can_busy_loop`, `busy_flag`, and `retval1` FWICT when looking at IR. That looks like one of the loops in do_select. > initializing. Setting it to =pattern should give you the old > behavior, but I don't understand why clang uses more stack without > the initialization, rather than using less, as it would likely cause > fewer spills > > Arnd -- Thanks, ~Nick Desaulniers