Date: Thu, 11 Aug 2022 16:11:21 +0200
From: Michal Hocko
To: Abel Wu
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Muchun Song,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Wei Yang
Subject: Re: [PATCH v2] mm/mempolicy: fix lock contention on mems_allowed
References: <20220811124157.74888-1-wuyun.abel@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

fix the lkml address (fat fingers, sorry)

On Thu 11-08-22 16:06:37, Michal Hocko wrote:
> [Cc Wei Yang who is author of 78b132e9bae9]
>
> On Thu 11-08-22 20:41:57, Abel Wu wrote:
> > The mems_allowed field can be modified by other tasks, so it isn't
> > safe to access it with alloc_lock unlocked even in the current
> > process context.
> >
> > Say there are two tasks: A from cpusetA is performing set_mempolicy(2),
> > and B is changing cpusetA's cpuset.mems:
> >
> >   A (set_mempolicy)             B (echo xx > cpuset.mems)
> >   -------------------------------------------------------
> >   pol = mpol_new();
> >                                 update_tasks_nodemask(cpusetA) {
> >                                   foreach t in cpusetA {
> >                                     cpuset_change_task_nodemask(t) {
> >   mpol_set_nodemask(pol) {
> >                                       task_lock(t); // t could be A
> >     new = f(A->mems_allowed);
> >                                       update t->mems_allowed;
> >     pol.create(pol, new);
> >                                       task_unlock(t);
> >   }
> >                                     }
> >                                   }
> >                                 }
> >   task_lock(A);
> >   A->mempolicy = pol;
> >   task_unlock(A);
> >
> > In this case A's pol->nodes is computed by old mems_allowed, and could
> > be inconsistent with A's new mems_allowed.
>
> Just to clarify. With an unfortunate timing and those two nodemasks
> overlap the end user effect could be a premature OOM because some nodes
> wouldn't be considered, right?
>
> > While it is different when replacing vmas' policy: the pol->nodes is
> > gone wild only when current_cpuset_is_being_rebound():
> >
> >   A (mbind)                     B (echo xx > cpuset.mems)
> >   -------------------------------------------------------
> >   pol = mpol_new();
> >   mmap_write_lock(A->mm);
> >                                 cpuset_being_rebound = cpusetA;
> >                                 update_tasks_nodemask(cpusetA) {
> >                                   foreach t in cpusetA {
> >                                     cpuset_change_task_nodemask(t) {
> >   mpol_set_nodemask(pol) {
> >                                       task_lock(t); // t could be A
> >     mask = f(A->mems_allowed);
> >                                       update t->mems_allowed;
> >     pol.create(pol, mask);
> >                                       task_unlock(t);
> >   }
> >                                     }
> >   foreach v in A->mm {
> >     if (cpuset_being_rebound == cpusetA)
> >       pol.rebind(pol, cpuset.mems);
> >     v->vma_policy = pol;
> >   }
> >   mmap_write_unlock(A->mm);
> >                                     mmap_write_lock(t->mm);
> >                                     mpol_rebind_mm(t->mm);
> >                                     mmap_write_unlock(t->mm);
> >                                   }
> >                                 }
> >                                 cpuset_being_rebound = NULL;
> >
> > In this case, the cpuset.mems, which has already done updating, is
> > finally used for calculating pol->nodes, rather than A->mems_allowed.
> > So it is OK to call mpol_set_nodemask() with alloc_lock unlocked when
> > doing mbind(2).
> >
> > Fixes: 78b132e9bae9 ("mm/mempolicy: remove or narrow the lock on current")
> > Signed-off-by: Abel Wu
>
> The fix looks correct.
>
> > ---
> >  mm/mempolicy.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index d39b01fd52fe..61e4e6f5cfe8 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -855,12 +855,14 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
> >  		goto out;
> >  	}
> >
> > +	task_lock(current);
> >  	ret = mpol_set_nodemask(new, nodes, scratch);
> >  	if (ret) {
> > +		task_unlock(current);
> >  		mpol_put(new);
> >  		goto out;
> >  	}
> > -	task_lock(current);
> > +
> >  	old = current->mempolicy;
> >  	current->mempolicy = new;
> >  	if (new && new->mode == MPOL_INTERLEAVE)
> > --
> > 2.31.1
>
> --
> Michal Hocko
> SUSE Labs

--
Michal Hocko
SUSE Labs
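
For readers who want to poke at the set_mempolicy(2) interleaving quoted
above, here is a minimal userspace sketch in C. It is not kernel code. The
struct and every model_* name are hypothetical, f() is assumed to be a plain
intersection with mems_allowed, and the rebind done by B is simplified to a
mask. The replay is single-threaded: it runs the steps in the order the first
diagram shows instead of racing real threads, so the result is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_task {
	uint64_t mems_allowed;	/* bitmask of nodes the task may use */
	uint64_t policy_nodes;	/* stands in for the task policy's nodes */
};

/* Assumed f(): restrict the requested nodes to the allowed ones. */
static uint64_t model_f(uint64_t requested, uint64_t mems_allowed)
{
	return requested & mems_allowed;
}

/* B's critical section: update the mask and rebind the installed policy. */
static void model_change_nodemask(struct model_task *t, uint64_t new_mask)
{
	assert(t != 0);
	t->mems_allowed = new_mask;
	t->policy_nodes &= new_mask;	/* simplified rebind */
}

/*
 * Replays the diagram: A computes its nodes before B's update and
 * installs them after it. Returns the nodes left outside mems_allowed.
 */
uint64_t model_racy(uint64_t requested, uint64_t old_mask, uint64_t new_mask)
{
	struct model_task a = { .mems_allowed = old_mask, .policy_nodes = 0 };
	uint64_t nodes = model_f(requested, a.mems_allowed);	/* A, unlocked */

	model_change_nodemask(&a, new_mask);	/* B, under task_lock */
	a.policy_nodes = nodes;			/* A, under task_lock */
	return a.policy_nodes & ~a.mems_allowed;
}

/*
 * Models the patched ordering: A's compute and install form one critical
 * section, so B's update runs entirely before it or entirely after it.
 */
uint64_t model_fixed(uint64_t requested, uint64_t old_mask, uint64_t new_mask,
		     int b_first)
{
	struct model_task a = { .mems_allowed = old_mask, .policy_nodes = 0 };

	if (b_first)
		model_change_nodemask(&a, new_mask);
	a.policy_nodes = model_f(requested, a.mems_allowed);
	if (!b_first)
		model_change_nodemask(&a, new_mask);
	return a.policy_nodes & ~a.mems_allowed;
}
```

With requested = 0x0f, old_mask = 0xff and new_mask = 0x3c, model_racy()
returns 0x03 (nodes 0 and 1 stay in the policy although they are no longer
allowed), while model_fixed() returns 0 for either ordering of A and B.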