From: John Ogness
To: Peter Zijlstra
Cc: "Ahmed S. Darwish", Ingo Molnar, Will Deacon, Thomas Gleixner,
    "Paul E. McKenney", "Sebastian A. Siewior",
    Steven Rostedt, LKML, Andrew Morton, Konstantin Khlebnikov,
    linux-mm@kvack.org
Subject: Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
References: <20200519214547.352050-1-a.darwish@linutronix.de>
    <20200519214547.352050-3-a.darwish@linutronix.de>
    <20200522145707.GO325280@hirez.programming.kicks-ass.net>
Date: Mon, 25 May 2020 18:10:40 +0200
In-Reply-To: <20200522145707.GO325280@hirez.programming.kicks-ass.net>
    (Peter Zijlstra's message of "Fri, 22 May 2020 16:57:07 +0200")
Message-ID: <87y2pg9erj.fsf@vostro.fn.ogness.net>

Hi,

This optimization is broken. The main concern here: Is it possible that
lru_add_drain_all() _would_ have drained pagevec X, but then aborted
because another lru_add_drain_all() is underway and that other task will
_not_ drain pagevec X? I claim the answer is yes!

My suggested changes are inline below. I attached a litmus test to
verify it.

On 2020-05-22, Peter Zijlstra wrote:
> On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
>> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>>   */
>>  void lru_add_drain_all(void)
>>  {
>
>> +	static unsigned int lru_drain_gen;
>>  	static struct cpumask has_work;
>> +	static DEFINE_MUTEX(lock);
>> +	int cpu, this_gen;
>>
>>  	/*
>>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
>> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>>  	if (WARN_ON(!mm_percpu_wq))
>>  		return;
>>

An smp_mb() is needed here.

	/*
	 * Guarantee the pagevec counter stores visible by
	 * this CPU are visible to other CPUs before loading
	 * the current drain generation.
	 */
	smp_mb();

>> +	this_gen = READ_ONCE(lru_drain_gen);
>> +	smp_rmb();
>
> 	this_gen = smp_load_acquire(&lru_drain_gen);

>>
>>  	mutex_lock(&lock);
>>
>>  	/*
>> +	 * (C) Exit the draining operation if a newer generation, from another
>> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>>  	 */
>> +	if (unlikely(this_gen != lru_drain_gen))
>> 		goto done;
>>
>

>> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
>> +	smp_wmb();

Instead of smp_wmb(), this needs to be a full memory barrier.

	/*
	 * Guarantee the new drain generation is stored before
	 * loading the pagevec counters.
	 */
	smp_mb();

> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().
>
>>  	cpumask_clear(&has_work);
>> -
>>  	for_each_online_cpu(cpu) {
>>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>>
>
> While you're here, do:
>
> s/cpumask_set_cpu/__&/
>
>> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>>  {
>>  	lru_add_drain();
>>  }
>> -#endif
>> +#endif /* CONFIG_SMP */
>>
>>  /**
>>   * release_pages - batched put_page()

For the litmus test:

1:rx=0               (P1 did not see the pagevec counter)
2:rx=1               (P2 _would_ have seen the pagevec counter)
2:ry1=0 /\ 2:ry2=1   (P2 aborted due to optimization)

Changing the smp_mb() back to smp_wmb() in P1 and removing the smp_mb()
in P2 represents this patch. And it shows that sometimes P2 will abort
even though it would have drained the pagevec and P1 did not drain the
pagevec.

This is ugly as hell. And there may be other memory barrier types to
make it pretty. But as is, memory barriers are missing.
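Putting the pieces together, the hot path would read roughly as follows.
This is only a sketch assembled from the hunks quoted above plus my two
suggested smp_mb() calls; the parts of the v1 patch not quoted here are
marked with /* ... */ placeholders, and the READ_ONCE()/smp_rmb() pair
could equally be Peter's smp_load_acquire():

	void lru_add_drain_all(void)
	{
		static unsigned int lru_drain_gen;
		static struct cpumask has_work;
		static DEFINE_MUTEX(lock);
		int cpu, this_gen;

		/*
		 * Make sure nobody triggers this path before mm_percpu_wq
		 * is fully initialized.
		 */
		if (WARN_ON(!mm_percpu_wq))
			return;

		/*
		 * Guarantee the pagevec counter stores visible by this CPU
		 * are visible to other CPUs before loading the current
		 * drain generation.
		 */
		smp_mb();

		this_gen = READ_ONCE(lru_drain_gen);
		smp_rmb();

		mutex_lock(&lock);

		/*
		 * Exit the draining operation if a newer generation, from
		 * another lru_add_drain_all(), was already scheduled for
		 * draining.
		 */
		if (unlikely(this_gen != lru_drain_gen))
			goto done;

		WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);

		/*
		 * Guarantee the new drain generation is stored before
		 * loading the pagevec counters.
		 */
		smp_mb();

		cpumask_clear(&has_work);
		for_each_online_cpu(cpu) {
			struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

			/* ... pagevec_count() checks and queue_work_on() ... */
		}

		/* ... flush the queued work, sketch only ... */
	done:
		mutex_unlock(&lock);
	}

With both full barriers, either P2 observes P1's new generation (and may
legitimately skip), or P1 observes P2's pagevec stores (and drains them).
The attached litmus test shows the smp_wmb() variant does not provide
that guarantee.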
John Ogness

Attachment: lru_add_drain_all.litmus

C lru_add_drain_all

(*
 * x is a pagevec counter
 * y is @lru_drain_gen
 * z is @lock
 *)

{
}

P0(int *x)
{
	// mark pagevec for draining
	WRITE_ONCE(*x, 1);
}

P1(int *x, int *y, int *z)
{
	int rx;
	int rz;

	// mutex_lock(&lock);
	rz = cmpxchg_acquire(z, 0, 1);
	if (rz == 0) {
		// WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
		WRITE_ONCE(*y, 1);

		// guarantee lru_drain_gen store before loading pagevec
		smp_mb();

		// if (pagevec_count(...))
		rx = READ_ONCE(*x);

		// mutex_unlock(&lock);
		rz = cmpxchg_release(z, 1, 2);
	}
}

P2(int *x, int *y, int *z)
{
	int rx;
	int ry1;
	int ry2;
	int rz;

	// the pagevec counter as visible now to this CPU
	rx = READ_ONCE(*x);

	// guarantee pagevec store before loading lru_drain_gen
	smp_mb();

	// this_gen = READ_ONCE(lru_drain_gen); smp_rmb();
	ry1 = smp_load_acquire(y);

	// mutex_lock(&lock) - acquired after P1
	rz = cmpxchg_acquire(z, 2, 3);
	if (rz == 2) {
		// if (unlikely(this_gen != lru_drain_gen))
		ry2 = READ_ONCE(*y);
	}
}

locations [x; y; z]
exists (1:rx=0 /\ 2:rx=1 /\ 2:ry1=0 /\ 2:ry2=1)
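For completeness, here is a sketch of the weakened variant described
above (smp_wmb() in P1, no smp_mb() in P2), i.e. the barriers as in the
patch as posted. This is my reconstruction from that description, not a
second attachment; everything except the two barrier sites is unchanged:

	C lru_add_drain_all_patched

	(* same mapping of x, y, z as above *)

	{
	}

	P0(int *x)
	{
		WRITE_ONCE(*x, 1);
	}

	P1(int *x, int *y, int *z)
	{
		int rx;
		int rz;

		rz = cmpxchg_acquire(z, 0, 1);
		if (rz == 0) {
			WRITE_ONCE(*y, 1);

			// patch as posted: write barrier only, does not
			// order the store to y against the load of x
			smp_wmb();

			rx = READ_ONCE(*x);

			rz = cmpxchg_release(z, 1, 2);
		}
	}

	P2(int *x, int *y, int *z)
	{
		int rx;
		int ry1;
		int ry2;
		int rz;

		rx = READ_ONCE(*x);

		// patch as posted: no smp_mb() here

		ry1 = smp_load_acquire(y);

		rz = cmpxchg_acquire(z, 2, 3);
		if (rz == 2) {
			ry2 = READ_ONCE(*y);
		}
	}

	locations [x; y; z]
	exists (1:rx=0 /\ 2:rx=1 /\ 2:ry1=0 /\ 2:ry2=1)

Both tests can be checked against the kernel memory model with herd7
from the kernel tree's tools/memory-model, e.g.:

	herd7 -conf linux-kernel.cfg lru_add_drain_all.litmus

If the analysis above is right, the exists clause should be reported as
reachable ("Sometimes") only for this weakened variant, not for the
attached test with both full barriers.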