Subject: Re: [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks
Date: Wed, 25 Oct 2023 18:55:54 +0100
From: James Morse
To: Reinette Chatre, x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
    Babu Moger, shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
    carl@os.amperecomputing.com, lcherian@marvell.com,
    bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
    xingxin.hx@openanolis.org, baolin.wang@linux.alibaba.com, Jamie Iles,
    Xin Hao, peternewman@google.com, dfustini@baylibre.com,
    amitsinght@marvell.com
References: <20230914172138.11977-1-james.morse@arm.com>
 <20230914172138.11977-25-james.morse@arm.com>
 <9d4282af-7237-0c26-7c48-e2b9e2fb7795@intel.com>
In-Reply-To: <9d4282af-7237-0c26-7c48-e2b9e2fb7795@intel.com>

Hi Reinette,

On 03/10/2023 22:28, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> resctrl has one mutex that is taken by the architecture specific code,
>> and the filesystem parts.
>> The two interact via cpuhp, where the
>> architecture code updates the domain list. Filesystem handlers that
>> walk the domains list should not run concurrently with the cpuhp
>> callback modifying the list.
>>
>> Exposing a lock from the filesystem code means the interface is not
>> cleanly defined, and creates the possibility of cross-architecture
>> lock ordering headaches. The interaction only exists so that certain
>> filesystem paths are serialised against CPU hotplug. The CPU hotplug
>> code already has a mechanism to do this using cpus_read_lock().
>>
>> MPAM's monitors have an overflow interrupt, so it needs to be possible
>> to walk the domains list in irq context. RCU is ideal for this,
>> but some paths need to be able to sleep to allocate memory.
>>
>> Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part
>> of a cpuhp callback, cpus_read_lock() must always be taken first.
>> rdtgroup_schemata_write() already does this.
>>
>> Most of the filesystem code's domain list walkers are currently
>> protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live().
>> The exceptions are rdt_bit_usage_show() and the mon_config helpers
>> which take the lock directly.
>>
>> Make the domain list protected by RCU. An architecture-specific
>> lock prevents concurrent writers. rdt_bit_usage_show() could
>> walk the domain list using RCU, but to keep all the filesystem
>> operations the same, this is changed to call cpus_read_lock().
>> The mon_config helpers send multiple IPIs, take the cpus_read_lock()
>> in these cases.
>>
>> The other filesystem list walkers need to be able to sleep.
>> Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the
>> cpuhp callbacks can't be invoked when file system operations are
>> occurring.
>>
>> Add lockdep_assert_cpus_held() in the cases where the
>> rdtgroup_kn_lock_live() call isn't obvious.

> One place that does not seem to have this annotation that
> I think is needed is within get_domain_from_cpu(). Starting
> with this series it is called from resctrl_offline_cpu()
> called via CPU hotplug code. From now on extra care needs to be
> taken when trying to call it from anywhere else.

Excellent! This shows that the overflow/limbo threads are now exposed to
CPUs going offline while they run - I'll fix that.

But, this gets called via IPI from rdt_ctrl_update(), and lockdep can't
know who the IPI came from to check the lock was held, so it triggers
false positives. This one will look a bit funny:
|	/*
|	 * Walking r->domains, ensure it can't race with cpuhp.
|	 * Because this is called via IPI by rdt_ctrl_update(), assertions
|	 * about locks this thread holds will lead to false positives. Check
|	 * someone is holding the CPUs lock.
|	 */
|	if (IS_ENABLED(CONFIG_LOCKDEP))
|		lockdep_is_cpus_held();

>> Resctrl's domain online/offline calls now need to take the
>> rdtgroup_mutex themselves.

>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 1a10f567bbe5..8fd0510d767b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -25,8 +25,15 @@
>>  #include
>>  #include "internal.h"
>>
>> -/* Mutex to protect rdtgroup access. */
>> -DEFINE_MUTEX(rdtgroup_mutex);
>> +/*
>> + * rdt_domain structures are kfree()d when their last CPU goes offline,
>> + * and allocated when the first CPU in a new domain comes online.
>> + * The rdt_resource's domain list is updated when this happens. Readers of
>> + * the domain list must either take cpus_read_lock(), or rely on an RCU
>> + * read-side critical section, to avoid observing concurrent modification.
>> + * All writers take this mutex:
>> + */
>> +static DEFINE_MUTEX(domain_list_lock);
>>

> I assume that you have not followed the SNC work. Please note that in
> that work the domain list is split between a monitoring domain list and
> control domain list. I expect this lock would cover both and both would
> be rcu lists?

It's on my list to read through, but too much arm stuff comes up for me
to get to it.

I agree that one write-lock to protect both RCU lists makes sense; those
would only ever be modified together. The case I have for needing to walk
the list without taking a lock only applies to the monitors - but keeping
the rules the same makes it easier to think about.
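For reference, a minimal, self-contained sketch of the generic pattern being
discussed here: an RCU-protected list whose writers all serialise on one
mutex. This is not the resctrl code - the fake_domain* names are invented
for illustration, and in resctrl the sleeping readers rely on
cpus_read_lock() rather than an RCU read-side critical section.

	#include <linux/list.h>
	#include <linux/rculist.h>
	#include <linux/rcupdate.h>
	#include <linux/mutex.h>
	#include <linux/slab.h>

	struct fake_domain {
		int			id;
		struct list_head	list;
		struct rcu_head		rcu;
	};

	static LIST_HEAD(fake_domain_list);
	/* All writers (the cpuhp online/offline paths) take this mutex. */
	static DEFINE_MUTEX(fake_domain_list_lock);

	/* Writer: first CPU of a new domain comes online. */
	static int fake_domain_online(int id)
	{
		struct fake_domain *d = kzalloc(sizeof(*d), GFP_KERNEL);

		if (!d)
			return -ENOMEM;
		d->id = id;

		mutex_lock(&fake_domain_list_lock);
		list_add_tail_rcu(&d->list, &fake_domain_list);
		mutex_unlock(&fake_domain_list_lock);
		return 0;
	}

	/* Writer: last CPU of a domain goes offline. */
	static void fake_domain_offline(struct fake_domain *d)
	{
		mutex_lock(&fake_domain_list_lock);
		list_del_rcu(&d->list);
		mutex_unlock(&fake_domain_list_lock);
		/* Free only after all RCU readers are done with it. */
		kfree_rcu(d, rcu);
	}

	/* Reader that may run in irq context, e.g. an overflow interrupt. */
	static struct fake_domain *fake_domain_find(int id)
	{
		struct fake_domain *d, *found = NULL;

		rcu_read_lock();
		list_for_each_entry_rcu(d, &fake_domain_list, list) {
			if (d->id == id) {
				found = d;
				break;
			}
		}
		rcu_read_unlock();

		/* Only valid while the caller prevents teardown. */
		return found;
	}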
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index b4ed4e1b4938..0620dfc72036 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -535,7 +541,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>  	int cpu;
>>
>>  	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
>> -	lockdep_assert_held(&rdtgroup_mutex);
>> +	lockdep_assert_cpus_held();
>>

> Only now is that comment accurate. Could it be moved to this patch?

Before this patch resctrl_arch_offline_cpu() took the mutex; if this
thread held the mutex, then cpuhp would get blocked in
resctrl_arch_offline_cpu() until it was released. What has changed is how
that mutual-exclusion is provided, but the comment describes why
mutual-exclusion is needed.

>> @@ -3801,6 +3832,13 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>>  	domain_destroy_mon_state(d);
>>  }
>>
>> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	mutex_lock(&rdtgroup_mutex);
>> +	_resctrl_offline_domain(r, d);
>> +	mutex_unlock(&rdtgroup_mutex);
>> +}
>> +

> This seems unnecessary. Why not keep resctrl_offline_domain() as-is and just
> take the lock within it?

For offline there is nothing in it, but ....

>> @@ -3870,12 +3908,23 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>>  	return 0;
>>  }
>>
>> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	int err;
>> +
>> +	mutex_lock(&rdtgroup_mutex);
>> +	err = _resctrl_online_domain(r, d);
>> +	mutex_unlock(&rdtgroup_mutex);
>> +
>> +	return err;
>> +}
>> +

> Same here.

resctrl_online_domain() has four exit paths; like this they can just
return an error, and the locking is taken care of here to keep the churn
down. But it's just preference - I've changed it to do this with a
handful of gotos.


Thanks,

James
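As a footnote on the "handful of gotos" shape mentioned above: a rough,
purely illustrative sketch of a single lock/unlock pair with goto-based
error unwinding. The fake_* names and the two setup steps are invented
stand-ins, not code from this series.

	#include <linux/mutex.h>

	struct fake_resource;
	struct fake_domain;

	static DEFINE_MUTEX(fake_rdtgroup_mutex);

	/* Placeholder setup steps that may fail. */
	int fake_setup_ctrl_state(struct fake_resource *r, struct fake_domain *d);
	int fake_setup_mon_state(struct fake_resource *r, struct fake_domain *d);

	int fake_online_domain(struct fake_resource *r, struct fake_domain *d)
	{
		int err;

		mutex_lock(&fake_rdtgroup_mutex);

		err = fake_setup_ctrl_state(r, d);
		if (err)
			goto out_unlock;

		err = fake_setup_mon_state(r, d);
		if (err)
			goto out_unlock;

	out_unlock:
		/* Every exit path funnels through the one unlock site. */
		mutex_unlock(&fake_rdtgroup_mutex);
		return err;
	}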