Date: Tue, 25 Jan 2022 10:23:13 +0100
From: Michal Hocko
To: Minchan Kim
Cc: Andrew Morton, David Hildenbrand, linux-mm, LKML,
    Suren Baghdasaryan, John Dias
Subject: Re: [RESEND][PATCH v2] mm: don't call lru draining in the nested lru_cache_disable
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 24-01-22 14:22:03, Minchan Kim wrote:
[...]
> CPU 0                              CPU 1
>
> lru_cache_disable                  lru_cache_disable
>   ret = atomic_inc_return;(ret = 1)
>                                      ret = atomic_inc_return;(ret = 2)
>
>   lru_add_drain_all(true);
>                                      lru_add_drain_all(false)
>                                      mutex_lock() is holding
>   mutex_lock() is waiting
>
>                                      IPI with !force_all_cpus
>                                      ...
>                                      ...
>                                      IPI done but it skipped some CPUs
>  ..
>  ..
>
> Thus, lru_cache_disable on CPU 1 doesn't run on every CPU, so it
> introduces a race on lru_disable_count: cores which didn't run the
> IPI could still accept upcoming pages into their per-cpu cache.

Yes, that is certainly possible, but the question is whether it really
matters all that much. The race would also require another racer to be
adding a page to an _empty_ pcp list at the same time:

pagevec_add_and_need_flush
  1) pagevec_add           # add to the pcp list
  2) lru_cache_disabled
       atomic_read(lru_disable_count) = 0
  # no flush, but the page is on the pcp list

There is no strong memory ordering between 1 and 2, and that is why we
need an IPI to enforce it in general, IIRC.

But lru_cache_disable is not a strong synchronization primitive. It
aims at providing a best-effort means to reduce false positives, right?
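[Editor's illustration: to make the 1)/2) window above concrete, here
is a minimal stand-alone sketch using C11 atomics. The struct and
function names mirror their mm/swap.c counterparts for readability,
but they are simplified stand-ins for illustration, not the kernel
implementation.]

/*
 * Sketch only: stand-ins for the kernel's pagevec and atomic_t,
 * built on C11 atomics.  Not the real mm/swap.c code.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGEVEC_SIZE 15

static atomic_int lru_disable_count;    /* mirrors mm/swap.c's counter */

struct pagevec {
        unsigned int nr;
        void *pages[PAGEVEC_SIZE];
};

/* 1) plain, non-atomic store of the page onto the per-cpu list */
static bool pagevec_add(struct pagevec *pvec, void *page)
{
        pvec->pages[pvec->nr++] = page;
        return pvec->nr < PAGEVEC_SIZE; /* false once the pagevec is full */
}

/* 2) relaxed read: nothing orders it against the store in 1) */
static bool lru_cache_disabled(void)
{
        return atomic_load_explicit(&lru_disable_count,
                                    memory_order_relaxed) != 0;
}

static void pagevec_add_and_need_flush(struct pagevec *pvec, void *page)
{
        bool full = !pagevec_add(pvec, page);

        /*
         * A disabler that has already incremented lru_disable_count on
         * another CPU, but whose drain/IPI has not run here yet, can
         * still be observed as 0, so the flush is skipped and the page
         * stays cached behind lru_cache_disable()'s back.
         */
        if (full || lru_cache_disabled())
                printf("flush the pcp list\n");
}

int main(void)
{
        struct pagevec pvec = { 0 };
        int page;                       /* stands in for a struct page */

        pagevec_add_and_need_flush(&pvec, &page);
        return 0;
}

[Nothing orders the plain store in 1) against the relaxed read in 2),
so a disabler that has bumped the counter but whose IPI has not reached
this CPU yet is still observed as 0; that is the ordering the IPI and
the scheduling guarantees have to provide.]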
IMHO it doesn't make much sense to aim for perfection, because all
users of this interface already have to live with temporary failures,
and pcp caches are not the only reason to fail - e.g. short-lived page
pins.

That being said, I would rather live with a best-effort and simpler
implementation than aim for perfection in this case. The scheme is
already quite complex, and another lock in the mix doesn't make it any
easier to follow. If others believe that another lock makes the
implementation more straightforward I will not object, but I would go
with the following:

diff --git a/mm/swap.c b/mm/swap.c
index ae8d56848602..c140c3743b9e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -922,7 +922,8 @@ atomic_t lru_disable_count = ATOMIC_INIT(0);
  */
 void lru_cache_disable(void)
 {
-	atomic_inc(&lru_disable_count);
+	int count = atomic_inc_return(&lru_disable_count);
+
 #ifdef CONFIG_SMP
 	/*
 	 * lru_add_drain_all in the force mode will schedule draining on
@@ -931,8 +932,28 @@ void lru_cache_disable(void)
 	 * The atomic operation doesn't need to have stronger ordering
 	 * requirements because that is enforeced by the scheduling
 	 * guarantees.
+	 * Please note that there is a potential for a race condition:
+	 * CPU0                       CPU1                   CPU2
+	 * pagevec_add_and_need_flush
+	 *   pagevec_add # to the empty list
+	 *   lru_cache_disabled
+	 *     atomic_read # 0
+	 *                            lru_cache_disable      lru_cache_disable
+	 *                              atomic_inc_return (1)
+	 *                                                     atomic_inc_return (2)
+	 *                              __lru_add_drain_all(true)
+	 *                                                     __lru_add_drain_all(false)
+	 *                                                     mutex_lock
+	 *                              mutex_lock
+	 *                                                     # skip cpu0 (pagevec_add not visible yet)
+	 *                                                     mutex_unlock
+	 *                              # fail because of pcp(0) pin
+	 *                              queue_work_on(0)
+	 *
+	 * but the scheme is a best effort and the above race is quite
+	 * unlikely to matter in real life.
 	 */
-	__lru_add_drain_all(true);
+	__lru_add_drain_all(count == 1);
 #else
 	lru_add_and_bh_lrus_drain();
 #endif

--
Michal Hocko
SUSE Labs