From: "Ahmed S. Darwish"
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton,
 Konstantin Khlebnikov, linux-mm@kvack.org
Cc: Thomas Gleixner, "Sebastian A. Siewior", LKML, "Ahmed S. Darwish"
Subject: [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t latching API
Date: Thu, 27 Aug 2020 13:40:38 +0200
Message-Id: <20200827114044.11173-3-a.darwish@linutronix.de>
In-Reply-To: <20200827114044.11173-1-a.darwish@linutronix.de>
References: <20200519214547.352050-1-a.darwish@linutronix.de>
 <20200827114044.11173-1-a.darwish@linutronix.de>

Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
implemented an optimization mechanism to exit the to-be-started LRU
drain operation (name it A) if another drain operation *started and
finished* while (A) was blocked on the LRU draining mutex.

This was done through a seqcount_t latch, which is an abuse of its
semantics:

  1. seqcount_t latching should be used for the purpose of switching
     between two storage places with sequence protection to allow
     interruptible, preemptible, writer sections. The referenced
     optimization mechanism has absolutely nothing to do with that.

  2. The used raw_write_seqcount_latch() has two SMP write memory
     barriers to ensure one consistent storage place out of the two
     storage places available. A full memory barrier is required
     instead: to guarantee that the pagevec counter stores visible by
     the local CPU are visible to other CPUs -- before loading the
     current drain generation.

Besides the seqcount_t API abuse, the semantics of a latch sequence
counter were force-fitted into the referenced optimization. What was
meant is to track "generations" of LRU draining operations, where
"global lru draining generation = x" implies that all generations
0 < n <= x are already *scheduled* for draining -- thus nothing needs
to be done if the current generation number n <= x.

Remove the conceptually-inappropriate seqcount_t latch usage. Manually
implement the referenced optimization using a counter and SMP memory
barriers.
Note, while at it, use the non-atomic variant of cpumask_set_cpu(),
__cpumask_set_cpu(), due to the already existing mutex protection.

Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
Link: https://lkml.kernel.org/r/87y2pg9erj.fsf@vostro.fn.ogness.net
Signed-off-by: Ahmed S. Darwish
---
 mm/swap.c | 65 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index d16d65d9b4e0..a1ec807e325d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -763,10 +763,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
-	static DEFINE_MUTEX(lock);
+	/*
+	 * lru_drain_gen - Global pages generation number
+	 *
+	 * (A) Definition: global lru_drain_gen = x implies that all generations
+	 *     0 < n <= x are already *scheduled* for draining.
+	 *
+	 * This is an optimization for the highly-contended use case where a
+	 * user space workload keeps constantly generating a flow of pages for
+	 * each CPU.
+	 */
+	static unsigned int lru_drain_gen;
 	static struct cpumask has_work;
-	int cpu, seq;
+	static DEFINE_MUTEX(lock);
+	unsigned cpu, this_gen;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -775,21 +785,54 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
-	seq = raw_read_seqcount_latch(&seqcount);
+	/*
+	 * Guarantee pagevec counter stores visible by this CPU are visible to
+	 * other CPUs before loading the current drain generation.
+	 */
+	smp_mb();
+
+	/*
+	 * (B) Locally cache global LRU draining generation number
+	 *
+	 * The read barrier ensures that the counter is loaded before the mutex
+	 * is taken. It pairs with smp_mb() inside the mutex critical section
+	 * at (D).
+	 */
+	this_gen = smp_load_acquire(&lru_drain_gen);
 
 	mutex_lock(&lock);
 
 	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
+	 * (C) Exit the draining operation if a newer generation, from another
+	 *     lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (__read_seqcount_retry(&seqcount, seq))
+	if (unlikely(this_gen != lru_drain_gen))
 		goto done;
 
-	raw_write_seqcount_latch(&seqcount);
+	/*
+	 * (D) Increment global generation number
+	 *
+	 * Pairs with smp_load_acquire() at (B), outside of the critical
+	 * section. Use a full memory barrier to guarantee that the new global
+	 * drain generation number is stored before loading pagevec counters.
+	 *
+	 * This pairing must be done here, before the for_each_online_cpu loop
+	 * below which drains the page vectors.
+	 *
+	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
+	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
+	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
+	 * along, adds some pages to its per-cpu vectors, then calls
+	 * lru_add_drain_all().
+	 *
+	 * If the paired barrier is done at any later step, e.g. after the
+	 * loop, CPU #x will just exit at (C) and miss flushing out all of its
+	 * added pages.
+	 */
+	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
+	smp_mb();
 
 	cpumask_clear(&has_work);
-
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
@@ -801,7 +844,7 @@ void lru_add_drain_all(void)
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
-			cpumask_set_cpu(cpu, &has_work);
+			__cpumask_set_cpu(cpu, &has_work);
 		}
 	}
 
@@ -816,7 +859,7 @@ void lru_add_drain_all(void)
 {
 	lru_add_drain();
 }
-#endif
+#endif /* CONFIG_SMP */
 
 /**
  * release_pages - batched put_page()
-- 
2.28.0