Subject: Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
To: "Ahmed S. Darwish", Peter Zijlstra, Ingo Molnar, Will Deacon
Cc: Thomas Gleixner, "Paul E. McKenney", "Sebastian A. Siewior", Steven Rostedt, LKML, Andrew Morton, linux-mm@kvack.org
References: <20200519214547.352050-1-a.darwish@linutronix.de> <20200519214547.352050-3-a.darwish@linutronix.de>
From: Konstantin Khlebnikov
Message-ID: <706c697c-c951-e0c3-40f8-f6e429b2226a@yandex-team.ru>
Date: Wed, 20 May 2020 15:22:15 +0300
In-Reply-To: <20200519214547.352050-3-a.darwish@linutronix.de>
On 20/05/2020 00.45, Ahmed S. Darwish wrote:
> Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
> implemented an optimization mechanism to exit the to-be-started LRU
> drain operation (name it A) if another drain operation *started and
> finished* while (A) was blocked on the LRU draining mutex.
>
> This was done through a seqcount latch, which is an abuse of its
> semantics:
>
> 1. Seqcount latching should be used for the purpose of switching
> between two storage places with sequence protection to allow
> interruptible, preemptible writer sections. The optimization
> mechanism has absolutely nothing to do with that.
>
> 2. The used raw_write_seqcount_latch() has two smp write memory
> barriers to always insure one consistent storage place out of the
> two storage places available. This extra smp_wmb() is redundant for
> the optimization use case.
>
> Beside the API abuse, the semantics of a latch sequence counter was
> force fitted into the optimization. What was actually meant is to track
> generations of LRU draining operations, where "current lru draining
> generation = x" implies that all generations 0 < n <= x are already
> *scheduled* for draining.
>
> Remove the conceptually-inappropriate seqcount latch usage and manually
> implement the optimization using a counter and SMP memory barriers.

Well, I thought it fits perfectly =)

Maybe it's worth adding helpers with the appropriate semantics? This is a
pretty common pattern. (A rough sketch of what such helpers might look
like follows the quoted patch below.)

>
> Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
> Signed-off-by: Ahmed S. Darwish
> ---
>  mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 10 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index bf9a79fed62d..d6910eeed43d 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>   */
>  void lru_add_drain_all(void)
>  {
> -	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
> -	static DEFINE_MUTEX(lock);
> +	/*
> +	 * lru_drain_gen - Current generation of pages that could be in vectors
> +	 *
> +	 * (A) Definition: lru_drain_gen = x implies that all generations
> +	 *     0 < n <= x are already scheduled for draining.
> +	 *
> +	 * This is an optimization for the highly-contended use case where a
> +	 * user space workload keeps constantly generating a flow of pages
> +	 * for each CPU.
> +	 */
> +	static unsigned int lru_drain_gen;
>  	static struct cpumask has_work;
> -	int cpu, seq;
> +	static DEFINE_MUTEX(lock);
> +	int cpu, this_gen;
>
>  	/*
>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>  	if (WARN_ON(!mm_percpu_wq))
>  		return;
>
> -	seq = raw_read_seqcount_latch(&seqcount);
> +	/*
> +	 * (B) Cache the LRU draining generation number
> +	 *
> +	 * smp_rmb() ensures that the counter is loaded before the mutex is
> +	 * taken. It pairs with the smp_wmb() inside the mutex critical section
> +	 * at (D).
> +	 */
> +	this_gen = READ_ONCE(lru_drain_gen);
> +	smp_rmb();
>
>  	mutex_lock(&lock);
>
>  	/*
> -	 * Piggyback on drain started and finished while we waited for lock:
> -	 * all pages pended at the time of our enter were drained from vectors.
> +	 * (C) Exit the draining operation if a newer generation, from another
> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>  	 */
> -	if (__read_seqcount_retry(&seqcount, seq))
> +	if (unlikely(this_gen != lru_drain_gen))
>  		goto done;
>
> -	raw_write_seqcount_latch(&seqcount);
> +	/*
> +	 * (D) Increment generation number
> +	 *
> +	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
> +	 * section.
> +	 *
> +	 * This pairing must be done here, before the for_each_online_cpu loop
> +	 * below which drains the page vectors.
> +	 *
> +	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
> +	 * Assume CPU #z is is in the middle of the for_each_online_cpu loop
> +	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
> +	 * along, adds some pages to its per-cpu vectors, then calls
> +	 * lru_add_drain_all().
> +	 *
> +	 * If the paired smp_wmb() below is done at any later step, e.g. after
> +	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
> +	 * its added pages.
> +	 */
> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> +	smp_wmb();
>
>  	cpumask_clear(&has_work);
> -
>  	for_each_online_cpu(cpu) {
>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>
> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>  {
>  	lru_add_drain();
>  }
> -#endif
> +#endif /* CONFIG_SMP */
>
>  /**
>   * release_pages - batched put_page()
>
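To make the helper idea above more concrete, here is a minimal, untested
sketch of how the generation-counter pattern from this patch could be
wrapped. All of the names (struct gen_counter, gen_counter_read(),
gen_counter_is_stale(), gen_counter_invalidate()) are invented for
illustration only and are not an existing kernel API; the sketch only
assumes the usual READ_ONCE()/WRITE_ONCE() and smp_rmb()/smp_wmb()
primitives.

struct gen_counter {
	unsigned int gen;
};

/*
 * Snapshot the current generation *before* taking the serializing lock.
 * The smp_rmb() pairs with the smp_wmb() in gen_counter_invalidate().
 */
static inline unsigned int gen_counter_read(struct gen_counter *gc)
{
	unsigned int snap = READ_ONCE(gc->gen);

	smp_rmb();
	return snap;
}

/*
 * With the serializing lock held: true if another caller already
 * published a newer generation, i.e. everything pending at snapshot
 * time has already been scheduled and the caller can bail out.
 */
static inline bool gen_counter_is_stale(struct gen_counter *gc,
					unsigned int snap)
{
	return snap != gc->gen;
}

/*
 * With the serializing lock held: publish a new generation before doing
 * the actual work, so that concurrent callers with an older snapshot
 * can skip it. The smp_wmb() pairs with the smp_rmb() in
 * gen_counter_read().
 */
static inline void gen_counter_invalidate(struct gen_counter *gc)
{
	WRITE_ONCE(gc->gen, gc->gen + 1);
	smp_wmb();
}

With something like that, lru_add_drain_all() would boil down to
gen_counter_read() before mutex_lock(), gen_counter_is_stale() for the
early exit at (C), and gen_counter_invalidate() at (D) before the
for_each_online_cpu() loop, with the barrier pairing hidden inside the
helpers instead of being open-coded at every user.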