Received: by 2002:ac0:e350:0:0:0:0:0 with SMTP id g16csp534493imn; Wed, 3 Aug 2022 14:33:46 -0700 (PDT) X-Google-Smtp-Source: AA6agR4WwXqT8n2PY3wwYtqmS6FEKL12o1X6VW2V/J4Cf64WLkbl0xoQviwgHOpuxJHGTBT/87n2 X-Received: by 2002:a17:90b:4a50:b0:1f4:f1b9:d21f with SMTP id lb16-20020a17090b4a5000b001f4f1b9d21fmr6736121pjb.185.1659562425767; Wed, 03 Aug 2022 14:33:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659562425; cv=none; d=google.com; s=arc-20160816; b=IYQmhw3GNCI2qkKc2HAu3kd3BD3yKL+vOIdT25i6x3rDRwPuB1B5cxXOq2bXCVUc67 9W60Smqk8Ol5TD5B5Huc8jqm+VpT4Fi6XGGcbK4SIJXFa06IxNyCcwHcm58RCyf+AGCm X6bTwuzumQJX3NWCMdzBfO1ctr8GFKkIotBRxKgj4MRL7hmDhlwnTzWOIqWsFDIyw05T DtHDBUJBoieZiRzWnxsGcLKgR8cjcv7n8x5yBxG6X0qVNAA/3BH1JUD4BtS0+eGeXHIv HBm9S/N22aZa0YORlYay5VXzt0aZW8sXzpnEQvhF/j4gbAk31hjhmZinCu4xZ+3PyEnM HYVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=W31+WmZqDoTeRfHZKbgwNjG6H/G021VFwQTB4BGaHxs=; b=W9+gWG4iOyTUzGJnd2jLCEFOU5w48HqyZku70xNIRQNlcbcyqLhYgJAL08uXwHrcvs 7ABlfpNeocOPOKLs+tIlnBMOBWE7ZjF5aeDHUCZTqfAvTK/9CmLC/UUJeEwbH+QcdiRP nwcY2hcNVu9UFPyAf9y71BHw7aD+7cfa2lyr/skolwvBhiTU24aFLZh33UYrl43pT8Sl TzQvD6O6HEv1qAXx9Ial/MNx5iyLmm57pY1azxvhkKqoyFzhMLpB2Zxa2yP5UhI2oXc2 MRjbYCdlaUWiljdB5MYfpIcrxzuueElWfzzhBjsBl9nj+8VT0kwnGLfznmVz7Szn2tC4 7CCA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OURmV5cy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u9-20020a170902e5c900b0016d72804653si3660204plf.17.2022.08.03.14.33.32; Wed, 03 Aug 2022 14:33:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OURmV5cy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236115AbiHCVWC (ORCPT + 99 others); Wed, 3 Aug 2022 17:22:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232909AbiHCVWA (ORCPT ); Wed, 3 Aug 2022 17:22:00 -0400 Received: from mail-ua1-x92a.google.com (mail-ua1-x92a.google.com [IPv6:2607:f8b0:4864:20::92a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E78DC2A43E for ; Wed, 3 Aug 2022 14:21:58 -0700 (PDT) Received: by mail-ua1-x92a.google.com with SMTP id l7so7492625ual.9 for ; Wed, 03 Aug 2022 14:21:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=W31+WmZqDoTeRfHZKbgwNjG6H/G021VFwQTB4BGaHxs=; b=OURmV5cyPLxoH0AtByuPXsZsNGZZTChd2lpUNVTODewfq+G2biyliNUWMykdko/uK9 3gQoCh/LzjtBfYzUKWV5rOdxSfDyhN1vYsE3L29Xb39pDaxFKh+tJiKvjxFeO36Jyynj tvPOntSEF0ADSfvX4R7L91jeQZcYHXm2OBmTTCsBEIazhSo+Ra/vckPyuydrnyYV5fcr ltjlfNx5zZqNB6k3d0KCUyOVFN4F3vFs0TLJZqQfJnJ1h7+piR4wJT7O0rDAAUYHfdOR J7qnxF7QZGAhPnzSzvjnG1s02qMGeezikmXNJEC4mytiDgs6otKmuKD/ei+C4yhWxi90 nV9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=W31+WmZqDoTeRfHZKbgwNjG6H/G021VFwQTB4BGaHxs=; b=0zKFotAERuIeJjTQvGccwSx31ZJHiuVslIy+9lJWdWlC3Jn3YqLBGKOCqQ9heZQxoz ZEjKwq+ZI/H8kLhp/M4BFS7Fb+MwuG12JObPSeYTAER0SCMC/didud2HdwNh6TKJBa8S FKo6ut3RBs7VARs7m7gU1sP+JAfyLoKw3+4+GkoJa8SKq1ciSjKsvfcQhyMgjZohtQqM LEzc25sAZUtVeXC4eC78aZqaaSRwQSx7Q2sMye6gCaNILwN3zWutCZTXIVEGx+DElxz5 x9wN7DBk0hqYO90BsGeb88WuUuEtT9YjmT189wMjkmH3QxYZUicHlEEFSUmNxl4wn0pw dvHA== X-Gm-Message-State: ACgBeo1Sh5ispqS+/wxD4oggq5Absb1MtcBSZJ3AaPkTcP8ee9pd1kvY wGDz8cdQqOlYEv7dGOY0/lN3APkQGsg/4ILjnfpBBNSZ+t3ExRVX X-Received: by 2002:a9f:3869:0:b0:387:8a7f:4222 with SMTP id q38-20020a9f3869000000b003878a7f4222mr3135381uad.119.1659561717985; Wed, 03 Aug 2022 14:21:57 -0700 (PDT) MIME-Version: 1.0 References: <20220802214041.2656586-1-neelnatu@google.com> In-Reply-To: From: Neel Natu Date: Wed, 3 Aug 2022 14:21:46 -0700 Message-ID: Subject: Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap(). To: Yury Norov Cc: Andy Shevchenko , Rasmus Villemoes , Peter Zijlstra , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 3, 2022 at 11:52 AM Yury Norov wrote: > > On Wed, Aug 03, 2022 at 10:49:57AM -0700, Neel Natu wrote: > > On Tue, Aug 2, 2022 at 6:22 PM Yury Norov wrote: > > > > > > On Tue, Aug 2, 2022 at 2:41 PM Neel Natu wrote: > > > > > > > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK. > > > > This in turn can change the set of cpus visited by for_each_cpu_wrap() > > > > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS). > > > > > > > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate > > > > over cpus outside the 'cpu_possible_mask'. > > > > > > > > Fix this to make its behavior match for_each_cpu() which always limits > > > > the iteration to the range [0, nr_cpu_ids). > > > > > > > > Signed-off-by: Neel Natu > > > > > > The patch itself doesn't look correct because it randomly switches a piece > > > of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch > > > others. > > > > > > However... > > > > > > I don't know the story behind having 2 variables holding the max possible > > > number of cpus, and it looks like it dates back to prehistoric times. In > > > modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids > > > for sure: it's CONFIG_CPUMASK_OFFSTACK=y and > > > CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig > > > of every popular architecture. > > > > > > > Hmm, in a kernel with CONFIG_HOTPLUG_CPU=y but not CONFIG_CPUMASK_OFFSTACK > > I see "nr_cpu_ids = 20, nr_cpumask_bits = 512". FYI since it doesn't > > match the observation > > above that nr_cpumask_bits == nr_cpu_ids when CONFIG_HOTPLUG_CPU=y. > > I said this because the comment in include/linux/cpumaks.h says: > * If HOTPLUG is enabled, then cpu_possible_mask is forced to have > * all NR_CPUS bits set, otherwise it is just the set of CPUs that > * ACPI reports present at boot. > > And because of the code that sets nr_cpu_ids: > > void __init setup_nr_cpu_ids(void) > { > nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1; > } > > Some arches override it, so it may be an issue. Are you running x86, > or maybe you have "nr_cpus" boot parameter? > > But anyways, I feel like this should be investigated for more... Can you > please share the config of your system and boot params? > Yeah, this is a stock defconfig compiled on an x86_64 host and booted inside a 20 vcpu virtual machine (virtme). There are no command line params to modify the number of cpus. I think everything is working as expected based on some debug prints I added during boot: [ 0.641798] smp: setup_nr_cpu_ids: nr_cpu_ids 20, cpu_possible_mask 0-19 [ 0.648424] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:20 nr_node_ids:2 The first one is from setup_nr_cpu_ids() in kernel/smp.c. The second one is from setup_per_cpu_areas() from arch/x86/setup_percpu.c. > > > In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids > > > and nr_cpumask_bits different - what case should it cover?... Interestingly, in > > > comments to cpumask functions and in the code those two are referred > > > interchangeably. > > > > > > Even more interestingly, we have a function bitmap_setall() that sets all bits > > > up to nr_cpumask_bits, and it could trigger the problem that you described, > > > > I think you mean cpumask_setall() that in turn calls > > bitmap_fill(nr_cpumask_bits)? > > Yes, sorry. > > > > so that someone would complain. (Are there any other valid reasons to set > > > bits behind nr_cpu_ids intentionally?) > > > > > > > I don't know of any although this wasn't the case that trigger in my case. > > > > > Can you share more details about how you triggered that? If you observe > > > those bits set, something else is probably already wrong... > > > > The non-intuitive behavior of for_each_cpu_wrap() was triggered when iterating > > over a cpumask passed by userspace that set a bit in the [nr_cpu_ids, > > nr_cpumask_bits) > > range. > > If you receive bitmap from userspace, you need to copy it with > bitmap_copy_clear_tail(), or bitmap_from_arr{32,64}, as appropriate. > That will clear unneeded bits. > > For user-to-kernel communications, I'd recommend passing bitmaps in a > human-readable formats - hex string or bitmap list. Check the functions > cpumask_parse_user() and cpumask_parselist_user(). This would help to > avoid all possible issues related to endianness and 32/64 word length > mismatch. > > > The kernel code is out of tree but open source so happy to provide a > > pointer if needed. > > Yes please > Here is where we copy the user supplied cpumask using the get_user_cpu_mask() helper: https://github.com/google/ghost-kernel/blob/c21b36f87663efa2189876b2caa701fe74e72adf/kernel/sched/ghost.c#L5729 For performance reasons we cannot use human readable cpu masks in this code path. Note that the helper copies up to 'nr_cpumask_bits' which in some kernels (!CONFIG_CPUMASK_OFFSTACK) can copy bits beyond 'nr_cpu_ids'. A possible option could be to fix this helper but I do feel that for_each_cpu() and for_each_cpu_wrap() should visit the same set of cpus given the same cpumask (ordering can be different but the set of cpus should remain the same). What do you think? best Neel > > I considered ANDing the user supplied mask with 'cpu_possible_mask' > > but that felt > > like working around an inconsistency in the kernel API (hence the proposed fix). > > > > > So, if there is no real case in modern kernel to have nr_cpumask_bits and > > > nr_cpu_ids different, the proper fix would be to just drop the first. > > > > > > > I'll let other people more knowledgable than me in this area chime in. > > I'll be happy either > > way if that fixes the problem at hand. > > > > best > > Neel > > > > > If there is such a case, this is probably your case, and we'd know more > > > details to understand how to deal with that. > > > > > > Thanks, > > > Yury