Date: Tue, 5 Jun 2018 20:12:46 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Hansen
Cc: Andrew Morton, mhocko@kernel.org, vbabka@suse.cz, Aaron Lu,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mremap: Avoid TLB flushing anonymous pages that are not in swap cache
Message-ID: <20180605191245.3owve7gfut22tyob@techsingularity.net>
References: <20180605171319.uc5jxdkxopio6kg3@techsingularity.net>
On Tue, Jun 05, 2018 at 11:18:18AM -0700, Dave Hansen wrote:
> On 06/05/2018 10:13 AM, Mel Gorman wrote:
> > The anonymous page race fix is overkill for two reasons. Pages that
> > are not in the swap cache are not going to be issued for IO, and if a
> > stale TLB entry is used, the write still occurs on the same physical
> > page. Any race with mmap replacing the address space is handled by
> > mmap_sem. As anonymous pages are often dirty, it can mean that mremap
> > always has to flush even when it is not necessary.
>
> This looks fine to me. One nit on the description: I found myself
> wondering if we skip the flush under the ptl where the flush is
> eventually done. That code is a bit out of the context, so we don't see
> it in the patch.
>

That's fair enough. I updated part of the changelog to read:

    This patch special-cases anonymous pages to only flush ranges under
    the page table lock if the page is in swap cache and can potentially
    be queued for IO. Note that the full range being mremapped is still
    flushed at the end, so TLB flushes are not eliminated entirely.

Does that work for you?

> We have two modes of flushing during move_ptes():
>
> 1. The flush_tlb_range() while holding the ptl in move_ptes().
> 2. A flush_tlb_range() at the end of move_page_tables(), driven by
>    'need_flush', which will be set any time move_ptes() does *not*
>    flush.
>
> This patch broadens the scope where move_ptes() does not flush and
> shifts the burden to the flush inside move_page_tables().
>
> Right?
>

Yes. While this does not eliminate TLB flushes, it reduces the number
considerably, as we are potentially replacing one-flush-per-LATENCY_LIMIT
with one flush.

> Other minor nits:
>
> > +/* Returns true if a TLB must be flushed before PTL is dropped */
> > +static bool should_force_flush(pte_t *pte)
> > +{
>
> I usually try to make the non-pte-modifying functions take a pte_t
> instead of 'pte_t *' to make it obvious that there is no modification
> going on.
> Any reason not to do that here?

No, it was just a minor saving on stack usage.

> > +	if (!trylock_page(page))
> > +		return true;
> > +	is_swapcache = PageSwapCache(page);
> > +	unlock_page(page);
> > +
> > +	return is_swapcache;
> > +}
>
> I was hoping we didn't have to go as far as taking the page lock, but I
> guess the proof is in the pudding that this tradeoff is worth it.
>

In the interest of full disclosure, the feedback I have from the
customer is based on a patch that modifies the LATENCY_LIMIT. That
helped but did not eliminate the problem. This patch was potentially a
better solution, but I wanted review first. I'm waiting on confirmation
that this one definitely behaves better.

> BTW, do you want to add a tiny comment about why we do the
> trylock_page()? I assume it's because we don't want to wait on finding
> an exact answer: we just assume it is in the swap cache if the page is
> locked and flush regardless.

It's really because calling lock_page() while holding a spinlock is
eventually going to ruin your day.

Thanks.

-- 
Mel Gorman
SUSE Labs