Date: Fri, 12 Aug 2022 09:07:24 +0200
From: Michal Hocko
To: Abel Wu
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Muchun Song, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Wei Yang
Subject: Re: [PATCH v2] mm/mempolicy: fix lock contention on mems_allowed
References: <20220811124157.74888-1-wuyun.abel@bytedance.com>

On Thu 11-08-22 16:11:23, Michal Hocko wrote:
> fix the lkml address (fat fingers, sorry)
>
> On Thu 11-08-22 16:06:37, Michal Hocko wrote:
> > [Cc Wei Yang who is author of 78b132e9bae9]
> >
> > On Thu 11-08-22 20:41:57, Abel Wu wrote:
> > > The mems_allowed field can be modified by other tasks, so it isn't
> > > safe to access it with alloc_lock unlocked even in the current
> > > process context.
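The claim that whoever derives a policy from mems_allowed must hold the same lock as the task that rewrites it can be illustrated with a minimal user-space model. This is a hypothetical sketch, not kernel code: a pthread mutex stands in for task_lock()/alloc_lock, nodemasks are plain bitmasks, the helpers cpuset_update(), old_compute(), old_install() and new_set_policy() are invented for illustration, and the cpuset rebind is simplified to "requested & allowed".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code: the mutex stands in
 * for task_lock()/alloc_lock, a nodemask is a bitmask of node ids. */
typedef unsigned long nodemask;

struct task {
	pthread_mutex_t alloc_lock;
	nodemask mems_allowed;   /* rewritten by the cpuset side (task B) */
	nodemask user_nodes;     /* nodes requested via set_mempolicy */
	nodemask policy_nodes;   /* effective nodes of the installed policy */
};

/* Task B: update mems_allowed and rebind the installed policy, both
 * under the lock. The rebind is simplified to "requested & allowed". */
static void cpuset_update(struct task *t, nodemask new_mems)
{
	pthread_mutex_lock(&t->alloc_lock);
	t->mems_allowed = new_mems;
	t->policy_nodes = t->user_nodes & new_mems;
	pthread_mutex_unlock(&t->alloc_lock);
}

/* Pre-fix ordering, split in two so an interleaving can be replayed:
 * the mask is derived from mems_allowed without holding the lock... */
static nodemask old_compute(const struct task *t, nodemask requested)
{
	return requested & t->mems_allowed;
}

/* ...and only installing the new policy is done under the lock. */
static void old_install(struct task *t, nodemask requested, nodemask mask)
{
	pthread_mutex_lock(&t->alloc_lock);
	t->user_nodes = requested;
	t->policy_nodes = mask;
	pthread_mutex_unlock(&t->alloc_lock);
}

/* Post-fix ordering: derive and install under one lock, so the writer
 * runs either entirely before or entirely after this section. */
static void new_set_policy(struct task *t, nodemask requested)
{
	pthread_mutex_lock(&t->alloc_lock);
	t->user_nodes = requested;
	t->policy_nodes = requested & t->mems_allowed;
	pthread_mutex_unlock(&t->alloc_lock);
}

/* True when the installed policy agrees with the current mems_allowed. */
static bool consistent(const struct task *t)
{
	return t->policy_nodes == (t->user_nodes & t->mems_allowed);
}
```

Replaying the pre-fix interleaving (A computes, B updates, A installs) with requested nodes 0x6 and mems_allowed changing from 0x3 to 0x6 leaves the installed policy at 0x2 in this model, so node 2 is never used although it is now allowed. new_set_policy() ends consistent whether the writer runs before or after it.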
> > >
> > > Say there are two tasks: A from cpusetA is performing set_mempolicy(2),
> > > and B is changing cpusetA's cpuset.mems:
> > >
> > >   A (set_mempolicy)             B (echo xx > cpuset.mems)
> > >   -------------------------------------------------------
> > >   pol = mpol_new();
> > >                                 update_tasks_nodemask(cpusetA) {
> > >                                   foreach t in cpusetA {
> > >                                     cpuset_change_task_nodemask(t) {
> > >   mpol_set_nodemask(pol) {
> > >                                       task_lock(t); // t could be A
> > >     new = f(A->mems_allowed);
> > >                                       update t->mems_allowed;
> > >     pol.create(pol, new);
> > >                                       task_unlock(t);
> > >   }
> > >                                     }
> > >                                   }
> > >                                 }
> > >   task_lock(A);
> > >   A->mempolicy = pol;
> > >   task_unlock(A);
> > >
> > > In this case A's pol->nodes is computed from the old mems_allowed and
> > > could be inconsistent with A's new mems_allowed.
> >
> > Just to clarify: with unfortunate timing, and if those two nodemasks
> > overlap, the end-user effect could be a premature OOM because some
> > nodes wouldn't be considered, right?
> >
> > > It is different when replacing the vmas' policy: pol->nodes can go
> > > wrong only when current_cpuset_is_being_rebound() is true:
> > >
> > >   A (mbind)                     B (echo xx > cpuset.mems)
> > >   -------------------------------------------------------
> > >   pol = mpol_new();
> > >   mmap_write_lock(A->mm);
> > >                                 cpuset_being_rebound = cpusetA;
> > >                                 update_tasks_nodemask(cpusetA) {
> > >                                   foreach t in cpusetA {
> > >                                     cpuset_change_task_nodemask(t) {
> > >   mpol_set_nodemask(pol) {
> > >                                       task_lock(t); // t could be A
> > >     mask = f(A->mems_allowed);
> > >                                       update t->mems_allowed;
> > >     pol.create(pol, mask);
> > >                                       task_unlock(t);
> > >   }
> > >                                     }
> > >   foreach v in A->mm {
> > >     if (cpuset_being_rebound == cpusetA)
> > >       pol.rebind(pol, cpuset.mems);
> > >     v->vma_policy = pol;
> > >   }
> > >   mmap_write_unlock(A->mm);
> > >                                     mmap_write_lock(t->mm);
> > >                                     mpol_rebind_mm(t->mm);
> > >                                     mmap_write_unlock(t->mm);
> > >                                   }
> > >                                 }
> > >                                 cpuset_being_rebound = NULL;
> > >
> > > In this case cpuset.mems, which has already been updated, is what
> > > finally determines pol->nodes, rather than A->mems_allowed. So it is
> > > OK to call mpol_set_nodemask() with alloc_lock unlocked when doing
> > > mbind(2).
> > >
> > > Fixes: 78b132e9bae9 ("mm/mempolicy: remove or narrow the lock on current")
> > > Signed-off-by: Abel Wu
> >
> > The fix looks correct.

Forgot
Acked-by: Michal Hocko

> >
> > > ---
> > >  mm/mempolicy.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index d39b01fd52fe..61e4e6f5cfe8 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -855,12 +855,14 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
> > >  		goto out;
> > >  	}
> > >  
> > > +	task_lock(current);
> > >  	ret = mpol_set_nodemask(new, nodes, scratch);
> > >  	if (ret) {
> > > +		task_unlock(current);
> > >  		mpol_put(new);
> > >  		goto out;
> > >  	}
> > > -	task_lock(current);
> > > +
> > >  	old = current->mempolicy;
> > >  	current->mempolicy = new;
> > >  	if (new && new->mode == MPOL_INTERLEAVE)
> > > -- 
> > > 2.31.1
> >
> > -- 
> > Michal Hocko
> > SUSE Labs
>
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs
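The diff moves task_lock(current) in front of the fallible mpol_set_nodemask() call, which is why the error path now needs its own task_unlock() before mpol_put(). A minimal user-space sketch of that ordering follows, assuming a pthread mutex in place of task_lock() and malloc()/free() in place of mpol_new()/mpol_put(); set_nodemask() and set_policy() are hypothetical stand-ins, not the kernel functions, and the failure condition (empty intersection) is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space model of the patched lock ordering; the mutex
 * stands in for task_lock(), malloc()/free() for mpol_new()/mpol_put(). */
typedef unsigned long nodemask;

struct policy {
	nodemask nodes;
};

struct task {
	pthread_mutex_t alloc_lock;
	nodemask mems_allowed;
	struct policy *mempolicy;
};

/* Stand-in for mpol_set_nodemask(): restricts the request to
 * mems_allowed and fails if nothing is left. Caller holds alloc_lock. */
static int set_nodemask(struct policy *pol, nodemask requested,
			nodemask allowed)
{
	nodemask nodes = requested & allowed;

	if (nodes == 0)
		return -EINVAL;
	pol->nodes = nodes;
	return 0;
}

static long set_policy(struct task *t, nodemask requested)
{
	struct policy *pol, *old;
	int ret;

	pol = malloc(sizeof(*pol));
	if (!pol)
		return -ENOMEM;

	/* Lock before mems_allowed is read, as in the patched code. */
	pthread_mutex_lock(&t->alloc_lock);
	ret = set_nodemask(pol, requested, t->mems_allowed);
	if (ret) {
		/* Error path: drop the lock before freeing the new policy. */
		pthread_mutex_unlock(&t->alloc_lock);
		free(pol);
		return ret;
	}
	old = t->mempolicy;
	t->mempolicy = pol;
	pthread_mutex_unlock(&t->alloc_lock);
	free(old);		/* the old policy is released after unlocking */
	return 0;
}
```

Without the added unlock, a failing call would return with the lock still held; in this model that can be checked with pthread_mutex_trylock() after a failing set_policy().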