Subject: Re: [PATCH] mremap: Avoid TLB flushing anonymous pages that are not in swap cache
From: Nadav Amit
To: Mel Gorman
Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Aaron Lu, Dave Hansen,
 Linux Kernel Mailing List, linux-mm@kvack.org
Date: Tue, 5 Jun 2018 12:53:57 -0700
In-Reply-To: <20180605171319.uc5jxdkxopio6kg3@techsingularity.net>
References: <20180605171319.uc5jxdkxopio6kg3@techsingularity.net>
List-ID: linux-kernel@vger.kernel.org

Mel Gorman wrote:

> Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning")
> fixed races between mremap and other operations for both file-backed and
> anonymous mappings. The file-backed case was the most critical, as it allowed
> the possibility that data could be changed on a physical page after
> page_mkclean returned, which could trigger data loss or data integrity
> issues.
> A customer reported that the cost of the TLB flushes for anonymous mappings
> was excessive, resulting in a 30-50% drop in overall performance on a
> microbenchmark since this commit. Unfortunately, I neither have access to the
> test case nor can I describe what it does other than saying that mremap
> operations dominate heavily.
>
> The anonymous page race fix is overkill for two reasons. Pages that are not
> in the swap cache are not going to be issued for IO, and if a stale TLB entry
> is used, the write still occurs on the same physical page. Any race with
> mmap replacing the address space is handled by mmap_sem. As anonymous pages
> are often dirty, this can mean that mremap always has to flush even when it
> is not necessary.
>
> This patch special-cases anonymous pages to only flush if the page is in
> swap cache and can potentially be queued for IO. It uses the page lock to
> serialise against any potential reclaim. If the page is added to the swap
> cache on the reclaim side after the page lock is dropped on the mremap
> side, then reclaim will call try_to_unmap_flush_dirty() before issuing
> any IO, so there is no data integrity issue. This means that in the common
> case, where a workload avoids swap entirely, mremap is a much cheaper
> operation due to the lack of TLB flushes.
>
> Using another testcase that simply calls mremap heavily with a varying
> number of threads, it was found that, very broadly speaking, TLB shootdowns
> were reduced by 31% on average throughout the entire test case, but your
> mileage may vary.
>
> Signed-off-by: Mel Gorman
> ---
>  mm/mremap.c | 42 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 049470aa1e3e..d26c5a00fd9d 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -24,6 +24,7 @@
>  #include <...>
>  #include <...>
>  #include <...>
> +#include <...>
>
>  #include <...>
>  #include <...>
> @@ -112,6 +113,41 @@ static pte_t move_soft_dirty_pte(pte_t pte)
>  	return pte;
>  }
>
> +/* Returns true if a TLB must be flushed before PTL is dropped */
> +static bool should_force_flush(pte_t *pte)
> +{
> +	bool is_swapcache;
> +	struct page *page;
> +
> +	if (!pte_present(*pte) || !pte_dirty(*pte))
> +		return false;
> +
> +	/*
> +	 * If we are remapping a dirty file PTE, make sure to flush TLB
> +	 * before we drop the PTL for the old PTE or we may race with
> +	 * page_mkclean().
> +	 */
> +	page = pte_page(*pte);
> +	if (page_is_file_cache(page))
> +		return true;
> +
> +	/*
> +	 * For anonymous pages, only flush swap cache pages that could
> +	 * be unmapped and queued for swap since flush_tlb_batched_pending was
> +	 * last called. Reclaim itself takes care that the TLB is flushed
> +	 * before IO is queued. If a page is not in swap cache and a stale TLB
> +	 * entry is used before mremap is complete, then the write hits the
> +	 * same physical page and there is no data loss. Check under the
> +	 * page lock to avoid any potential race with reclaim.
> +	 */
> +	if (!trylock_page(page))
> +		return true;
> +	is_swapcache = PageSwapCache(page);
> +	unlock_page(page);
> +
> +	return is_swapcache;
> +}

While I do not have a specific reservation regarding the logic, I find the
current TLB invalidation scheme hard to follow and inconsistent. I guess
should_force_flush() could be extended and used more widely to make things
clearer.

To be more specific and to give an example: can should_force_flush() be used
in zap_pte_range() to set force_flush instead of the current code?
	if (!PageAnon(page)) {
		if (pte_dirty(ptent)) {
			force_flush = 1;
			...
		}