Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Date:   Thu, 24 May 2018 12:50:11 +0200
From:   Jan Kara <jack@suse.cz>
To:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc:     linux-kernel@vger.kernel.org, stable@vger.kernel.org,
        Mel Gorman <mgorman@techsingularity.net>,
        Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 4.4 50/92] mm: filemap: avoid unnecessary calls to
 lock_page when waiting for IO to complete during a read
Message-ID: <20180524105011.jkmjrmoyqtogtgnn@quack2.suse.cz>
References: <20180524093159.286472249@linuxfoundation.org>
 <20180524093204.290399449@linuxfoundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180524093204.290399449@linuxfoundation.org>
User-Agent: NeoMutt/20170421 (1.8.2)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.

Just one objection: Why does stable care about this (and the previous
patch)? I've checked the stable queue and I don't see anything that would
have these patches as a prerequisite. And on their own, they are only
cleanups without substantial gains.

								Honza

> 
> ------------------
> 
> From: Mel Gorman <mgorman@techsingularity.net>
> 
> commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream.
> 
> In the generic read paths the kernel looks up a page in the page cache
> and if it's up to date, it is used.  If not, the page lock is acquired
> to wait for IO to complete and then check the page.  If multiple
> processes are waiting on IO, they all serialise against the lock and
> duplicate the checks.  This is unnecessary.
> 
> The page lock in itself does not give any guarantees to the callers
> about the page state as it can be immediately truncated or reclaimed
> after the page is unlocked.  It's sufficient to wait_on_page_locked and
> then continue if the page is up to date on wakeup.
> 
> It is possible that a truncated but up-to-date page is returned but the
> reference taken during read prevents it disappearing underneath the
> caller and the data is still valid if PageUptodate.
> 
> The overall impact is small as even if processes serialise on the lock,
> the lock section is tiny once the IO is complete.  Profiles indicated
> that unlock_page and friends are generally a tiny portion of a
> read-intensive workload.  An artificial test was created in which
> instances of dd read a cache-cold file on an ext4 filesystem and the
> time taken by each read was measured.
> 
> paralleldd
>                                     4.4.0                 4.4.0
>                                   vanilla             avoidlock
> Amean    Elapsd-1          5.28 (  0.00%)        5.15 (  2.50%)
> Amean    Elapsd-4          5.29 (  0.00%)        5.17 (  2.12%)
> Amean    Elapsd-7          5.28 (  0.00%)        5.18 (  1.78%)
> Amean    Elapsd-12         5.20 (  0.00%)        5.33 ( -2.50%)
> Amean    Elapsd-21         5.14 (  0.00%)        5.21 ( -1.41%)
> Amean    Elapsd-30         5.30 (  0.00%)        5.12 (  3.38%)
> Amean    Elapsd-48         5.78 (  0.00%)        5.42 (  6.21%)
> Amean    Elapsd-79         6.78 (  0.00%)        6.62 (  2.46%)
> Amean    Elapsd-110        9.09 (  0.00%)        8.99 (  1.15%)
> Amean    Elapsd-128       10.60 (  0.00%)       10.43 (  1.66%)
> 
> The impact is small but intuitively, it makes sense to avoid unnecessary
> calls to lock_page.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> ---
>  mm/filemap.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1581,6 +1581,15 @@ find_page:
>  					index, last_index - index);
>  		}
>  		if (!PageUptodate(page)) {
> +			/*
> +			 * See comment in do_read_cache_page on why
> +			 * wait_on_page_locked is used to avoid unnecessary
> +			 * serialisations and why it's safe.
> +			 */
> +			wait_on_page_locked_killable(page);
> +			if (PageUptodate(page))
> +				goto page_ok;
> +
>  			if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
>  					!mapping->a_ops->is_partially_uptodate)
>  				goto page_not_up_to_date;
> @@ -2253,12 +2262,52 @@ filler:
>  	if (PageUptodate(page))
>  		goto out;
>  
> +	/*
> +	 * Page is not up to date and may be locked due to one of the following
> +	 * case a: Page is being filled and the page lock is held
> +	 * case b: Read/write error clearing the page uptodate status
> +	 * case c: Truncation in progress (page locked)
> +	 * case d: Reclaim in progress
> +	 *
> +	 * Case a, the page will be up to date when the page is unlocked.
> +	 *    There is no need to serialise on the page lock here as the page
> +	 *    is pinned so the lock gives no additional protection. Even if
> +	 *    the page is truncated, the data is still valid if PageUptodate as
> +	 *    it's a read vs truncate race.
> +	 * Case b, the page will not be up to date
> +	 * Case c, the page may be truncated but in itself, the data may still
> +	 *    be valid after IO completes as it's a read vs truncate race. The
> +	 *    operation must restart if the page is not uptodate on unlock but
> +	 *    otherwise serialising on page lock to stabilise the mapping gives
> +	 *    no additional guarantees to the caller as the page lock is
> +	 *    released before return.
> +	 * Case d, similar to truncation. If reclaim holds the page lock, it
> +	 *    will be a race with remove_mapping that determines if the mapping
> +	 *    is valid on unlock but otherwise the data is valid and there is
> +	 *    no need to serialise with page lock.
> +	 *
> +	 * As the page lock gives no additional guarantee, we optimistically
> +	 * wait on the page to be unlocked and check if it's up to date and
> +	 * use the page if it is. Otherwise, the page lock is required to
> +	 * distinguish between the different cases. The motivation is that we
> +	 * avoid spurious serialisations and wakeups when multiple processes
> +	 * wait on the same page for IO to complete.
> +	 */
> +	wait_on_page_locked(page);
> +	if (PageUptodate(page))
> +		goto out;
> +
> +	/* Distinguish between all the cases under the safety of the lock */
>  	lock_page(page);
> +
> +	/* Case c or d, restart the operation */
>  	if (!page->mapping) {
>  		unlock_page(page);
>  		page_cache_release(page);
>  		goto repeat;
>  	}
> +
> +	/* Someone else locked and filled the page in a very small window */
>  	if (PageUptodate(page)) {
>  		unlock_page(page);
>  		goto out;
> 
> 
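
The wait-then-check pattern described in the quoted comment can be
sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel code: struct toy_page, the toy_* helpers and
enum read_result are hypothetical names, and a pthread mutex plus a
condition variable stand in for the page lock bit and its wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct toy_page {
	pthread_mutex_t mu;		/* protects the fields below */
	pthread_cond_t unlocked;	/* broadcast when 'locked' clears */
	bool locked;			/* models the page lock bit */
	bool uptodate;			/* models PageUptodate() */
	bool mapped;			/* models page->mapping != NULL */
};

enum read_result {
	READ_OK_NO_LOCK,	/* up to date after waiting, lock never taken */
	READ_OK_LOCKED,		/* someone filled it in the small window */
	READ_RESTART,		/* case c or d: mapping gone, retry lookup */
	READ_NEEDS_IO,		/* case b: caller must (re)read the page */
};

static void toy_page_init(struct toy_page *pg)
{
	pthread_mutex_init(&pg->mu, NULL);
	pthread_cond_init(&pg->unlocked, NULL);
	pg->locked = false;
	pg->uptodate = false;
	pg->mapped = true;
}

/* Models lock_page(): sleep until the lock is free, then take it. */
static void toy_lock_page(struct toy_page *pg)
{
	pthread_mutex_lock(&pg->mu);
	while (pg->locked)
		pthread_cond_wait(&pg->unlocked, &pg->mu);
	pg->locked = true;
	pthread_mutex_unlock(&pg->mu);
}

/* Models unlock_page(): drop the lock and wake every waiter. */
static void toy_unlock_page(struct toy_page *pg)
{
	pthread_mutex_lock(&pg->mu);
	assert(pg->locked);
	pg->locked = false;
	pthread_cond_broadcast(&pg->unlocked);
	pthread_mutex_unlock(&pg->mu);
}

/*
 * Models wait_on_page_locked() followed by PageUptodate(): wait for the
 * lock to clear WITHOUT acquiring it, then report the uptodate state.
 */
static bool toy_wait_unlocked_and_check(struct toy_page *pg)
{
	bool ok;

	pthread_mutex_lock(&pg->mu);
	while (pg->locked)
		pthread_cond_wait(&pg->unlocked, &pg->mu);
	ok = pg->uptodate;
	pthread_mutex_unlock(&pg->mu);
	return ok;
}

static enum read_result toy_read_page(struct toy_page *pg)
{
	enum read_result res;

	/* Optimistic path: no serialisation between concurrent readers. */
	if (toy_wait_unlocked_and_check(pg))
		return READ_OK_NO_LOCK;

	/* Slow path: take the lock to tell the remaining cases apart. */
	toy_lock_page(pg);
	pthread_mutex_lock(&pg->mu);
	if (!pg->mapped)
		res = READ_RESTART;
	else if (pg->uptodate)
		res = READ_OK_LOCKED;
	else
		res = READ_NEEDS_IO;
	pthread_mutex_unlock(&pg->mu);
	/* Simplification: the toy always drops the lock before returning. */
	toy_unlock_page(pg);
	return res;
}
```

A caller would loop on READ_RESTART and issue the read on READ_NEEDS_IO.
In this sketch, readers that wake up after IO completes all return through
the first branch without queueing on the lock one at a time.
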
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR