Date: Wed, 6 Jun 2018 10:22:07 +0200
From: Michal Hocko
To: Mel Gorman
Cc: Dave Hansen, Andrew Morton, vbabka@suse.cz, Aaron Lu,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mremap: Avoid TLB flushing anonymous pages that are not in swap cache
Message-ID: <20180606082207.GC32433@dhcp22.suse.cz>
References: <20180605171319.uc5jxdkxopio6kg3@techsingularity.net>
 <20180605191245.3owve7gfut22tyob@techsingularity.net>
 <20180605195140.afc7xzgbre26m76l@techsingularity.net>
In-Reply-To: <20180605195140.afc7xzgbre26m76l@techsingularity.net>
On Tue 05-06-18 20:51:40, Mel Gorman wrote:
[...]
> mremap: Avoid excessive TLB flushing for anonymous pages that are not in swap cache
> 
> Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning")
> fixed races between mremap and other operations for both file-backed and
> anonymous mappings. The file-backed case was the most critical as it allowed
> the possibility that data could be changed on a physical page after
> page_mkclean() returned, which could trigger data loss or data integrity
> issues. A customer reported that the cost of the TLB flushes for anonymous
> mappings was excessive, resulting in an overall 30-50% drop in performance
> on a microbenchmark since this commit. Unfortunately I neither have access
> to the test case nor can I describe what it does other than saying that
> mremap operations dominate heavily.
> 
> The anonymous page race fix is overkill for two reasons. First, pages that
> are not in the swap cache are not going to be issued for IO and, if a stale
> TLB entry is used, the write still occurs on the same physical page. Second,
> any race with mmap replacing the address space is handled by mmap_sem. As
> anonymous pages are often dirty, the race fix can mean that mremap always
> has to flush even when it is not necessary.
> 
> This patch special-cases anonymous pages to only flush ranges under the
> page table lock if the page is in swap cache and can potentially be queued
> for IO. Note that the full range being mremapped is still flushed, so TLB
> flushes are not eliminated entirely.
> 
> The patch uses the page lock to serialise against any potential reclaim. If
> the page is added to the swap cache on the reclaim side after the page lock
> is dropped on the mremap side, then reclaim will call
> try_to_unmap_flush_dirty() before issuing any IO, so there is no data
> integrity issue. This means that in the common case, where a workload avoids
> swap entirely, mremap is a much cheaper operation due to the lack of TLB
> flushes.
> 
> Using another test case that simply calls mremap heavily with a varying
> number of threads, it was found that, very broadly speaking, TLB shootdowns
> were reduced by 31% on average throughout the entire test case, but your
> mileage will vary.

LGTM, and it would be great to add some perf numbers for the specific
workload which triggered this (a mremap-heavy workload which is,
unfortunately, a real one). A minimal sketch of a generic mremap-heavy
microbenchmark is appended below the patch for illustration.

> Signed-off-by: Mel Gorman

Acked-by: Michal Hocko

> ---
>  mm/mremap.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 049470aa1e3e..5b9767b0446e 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -24,6 +24,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/mm-arch-hooks.h>
>  #include <linux/userfaultfd_k.h>
> +#include <linux/mm_inline.h>
> 
>  #include <asm/cacheflush.h>
>  #include <asm/tlbflush.h>
> @@ -112,6 +113,44 @@ static pte_t move_soft_dirty_pte(pte_t pte)
>  	return pte;
>  }
> 
> +/* Returns true if a TLB must be flushed before PTL is dropped */
> +static bool should_force_flush(pte_t pte)
> +{
> +	bool is_swapcache;
> +	struct page *page;
> +
> +	if (!pte_present(pte) || !pte_dirty(pte))
> +		return false;
> +
> +	/*
> +	 * If we are remapping a dirty file PTE, make sure to flush TLB
> +	 * before we drop the PTL for the old PTE or we may race with
> +	 * page_mkclean().
> +	 */
> +	page = pte_page(pte);
> +	if (page_is_file_cache(page))
> +		return true;
> +
> +	/*
> +	 * For anonymous pages, only flush swap cache pages that could
> +	 * be unmapped and queued for swap since flush_tlb_batched_pending was
> +	 * last called. Reclaim itself takes care that the TLB is flushed
> +	 * before IO is queued. If a page is not in swap cache and a stale TLB
> +	 * entry is used before mremap is complete then the write hits the same
> +	 * physical page and there is no data loss.
> +	 *
> +	 * Check under the page lock to avoid any potential race with reclaim.
> +	 * trylock is necessary as spinlocks are currently held. In the unlikely
> +	 * event of contention, flush the TLB to be safe.
> +	 */
> +	if (!trylock_page(page))
> +		return true;
> +	is_swapcache = PageSwapCache(page);
> +	unlock_page(page);
> +
> +	return is_swapcache;
> +}
> +
>  static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
>  		unsigned long old_addr, unsigned long old_end,
>  		struct vm_area_struct *new_vma, pmd_t *new_pmd,
> @@ -163,15 +202,11 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
> 
>  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
>  		/*
> -		 * If we are remapping a dirty PTE, make sure
> -		 * to flush TLB before we drop the PTL for the
> -		 * old PTE or we may race with page_mkclean().
> -		 *
>  		 * This check has to be done after we removed the
>  		 * old PTE from page tables or another thread may
>  		 * dirty it after the check and before the removal.
>  		 */
> -		if (pte_present(pte) && pte_dirty(pte))
> +		if (should_force_flush(pte))
>  			force_flush = true;
>  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
>  		pte = move_soft_dirty_pte(pte);
> 
> -- 
> Mel Gorman
> SUSE Labs

-- 
Michal Hocko
SUSE Labs
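
For illustration, a minimal sketch of a generic mremap-heavy microbenchmark
in the spirit of the test case described in the changelog. It is emphatically
not the customer workload or Mel's test case, and the region size, iteration
count and destination addresses are arbitrary: it just dirties anonymous
memory that never enters the swap cache and moves it repeatedly, i.e. the
path the patch stops force-flushing. Comparing TLB shootdown counts before
and after the patch (for example with "perf stat -e tlb:tlb_flush", or the
TLB line in /proc/interrupts on x86) would be one way to get numbers.

/*
 * Hypothetical mremap-heavy microbenchmark (illustration only).
 * Dirties an anonymous region so its PTEs are pte_dirty(), then
 * repeatedly moves it with mremap().  Before the patch, every move
 * force-flushes the TLB under the PTL; after it, dirty anonymous
 * pages that are not in the swap cache no longer do.
 */
#define _GNU_SOURCE		/* for mremap() and MREMAP_FIXED */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE	(64UL << 20)	/* 64MB, arbitrary */
#define ITERATIONS	1000		/* arbitrary */

int main(void)
{
	char *addr;
	int i;

	addr = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Dirty every page; purely anonymous, so never in swap cache */
	memset(addr, 1, REGION_SIZE);

	for (i = 0; i < ITERATIONS; i++) {
		/* Move the region so move_ptes() handles every page */
		char *new_addr = mremap(addr, REGION_SIZE, REGION_SIZE,
					MREMAP_MAYMOVE | MREMAP_FIXED,
					addr + REGION_SIZE);
		if (new_addr == MAP_FAILED) {
			perror("mremap");
			return 1;
		}
		addr = new_addr;
	}

	munmap(addr, REGION_SIZE);
	return 0;
}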