Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp1975096imm; Thu, 24 May 2018 03:53:19 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpc2PTwqVL5I9x+KkXP/TtjLviQLkrxR6E1YQQiumOvj4yx03DZvNfcXf6xai843dh+jB00 X-Received: by 2002:a17:902:bb84:: with SMTP id m4-v6mr6752267pls.339.1527159199450; Thu, 24 May 2018 03:53:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527159199; cv=none; d=google.com; s=arc-20160816; b=C6a/Y5SEdCPxADH+q4bl/FmMBr/1Nrd0ndC2AcwpX2MkJsEz2pGsrzEAVPpOm/hPeS g/D1IBfeH4svK8dSWgppqLkWg5MJqiez6LVAf3lO1AeRS0Fe/qzBkRzZtf5dkJLdzQoQ hvcKJJNDmCpsVw8/TGlYazYSq4Tt3tih2g4fjXTDBICakcOShXEXKzC6okl6AyPg87bi ylxewrF8rkwkFX/C6fc6DqulGzWqLMr8tQRvT1iSQ1ccVZ96SW5F/2lrXArSOBZgdRAt hYjNRer0mj0MgmFI6hfRvqEyeZHAufqlLigY4l+UKy89+XhQuJEALNGdT/iuGBi9ZpF9 SOoQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=qkxXghPjKmeKly0clD5d/iNVb5RRRfT6SleOWjoXccs=; b=yDUwh4qZCBUP7QiZJdjmDxBim56cIKGAvFhKp21yNGfla/FrZbXPW0KRTN8I16L7gx Gsv78OetibkijlDA5b8JDfYNvLvQ1qbr/Efw7Ntp8b9kwHtvSRswpR49yJ3HrRerzoiP FROXuAWG3nVL++sM0KPHLLZSxLE7gHsC+MwhVCiIVwPqLbYTrZNXFN069X4t1oeqhEhb ikjFnAytsrLB2mNtBxbznS4idfqEEb2joMFT4G042HwUXk2mNEcUWwLohhNOOE8W1cAS u3O+C+PdLkMqI6NtiGdH0M4RrDIDthFdQEibqSJKYDlgSaMPwYOwXR+/lfj7ZgnwDc/R ZiOw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v20-v6si19959272pff.363.2018.05.24.03.53.04; Thu, 24 May 2018 03:53:19 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1032436AbeEXKvT (ORCPT + 99 others); Thu, 24 May 2018 06:51:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:35268 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1030472AbeEXKuN (ORCPT ); Thu, 24 May 2018 06:50:13 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id D669AAF20; Thu, 24 May 2018 10:50:11 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 48F201E050C; Thu, 24 May 2018 12:50:11 +0200 (CEST) Date: Thu, 24 May 2018 12:50:11 +0200 From: Jan Kara To: Greg Kroah-Hartman Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Mel Gorman , Jan Kara , Hugh Dickins , Andrew Morton , Linus Torvalds , Mel Gorman Subject: Re: [PATCH 4.4 50/92] mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read Message-ID: <20180524105011.jkmjrmoyqtogtgnn@quack2.suse.cz> References: <20180524093159.286472249@linuxfoundation.org> <20180524093204.290399449@linuxfoundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180524093204.290399449@linuxfoundation.org> User-Agent: NeoMutt/20170421 (1.8.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote: > 4.4-stable review patch. If anyone has any objections, please let me know. Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains. Honza > > ------------------ > > From: Mel Gorman > > commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream. > > In the generic read paths the kernel looks up a page in the page cache > and if it's up to date, it is used. If not, the page lock is acquired > to wait for IO to complete and then check the page. If multiple > processes are waiting on IO, they all serialise against the lock and > duplicate the checks. This is unnecessary. > > The page lock in itself does not give any guarantees to the callers > about the page state as it can be immediately truncated or reclaimed > after the page is unlocked. It's sufficient to wait_on_page_locked and > then continue if the page is up to date on wakeup. > > It is possible that a truncated but up-to-date page is returned but the > reference taken during read prevents it disappearing underneath the > caller and the data is still valid if PageUptodate. > > The overall impact is small as even if processes serialise on the lock, > the lock section is tiny once the IO is complete. Profiles indicated > that unlock_page and friends are generally a tiny portion of a > read-intensive workload. An artificial test was created that had > instances of dd access a cache-cold file on an ext4 filesystem and > measure how long the read took. > > paralleldd > 4.4.0 4.4.0 > vanilla avoidlock > Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%) > Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%) > Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%) > Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%) > Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%) > Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%) > Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%) > Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%) > Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%) > Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%) > > The impact is small but intuitively, it makes sense to avoid unnecessary > calls to lock_page. > > Signed-off-by: Mel Gorman > Reviewed-by: Jan Kara > Cc: Hugh Dickins > Signed-off-by: Andrew Morton > Signed-off-by: Linus Torvalds > Signed-off-by: Mel Gorman > Signed-off-by: Greg Kroah-Hartman > > --- > mm/filemap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 49 insertions(+) > > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1581,6 +1581,15 @@ find_page: > index, last_index - index); > } > if (!PageUptodate(page)) { > + /* > + * See comment in do_read_cache_page on why > + * wait_on_page_locked is used to avoid unnecessarily > + * serialisations and why it's safe. > + */ > + wait_on_page_locked_killable(page); > + if (PageUptodate(page)) > + goto page_ok; > + > if (inode->i_blkbits == PAGE_CACHE_SHIFT || > !mapping->a_ops->is_partially_uptodate) > goto page_not_up_to_date; > @@ -2253,12 +2262,52 @@ filler: > if (PageUptodate(page)) > goto out; > > + /* > + * Page is not up to date and may be locked due one of the following > + * case a: Page is being filled and the page lock is held > + * case b: Read/write error clearing the page uptodate status > + * case c: Truncation in progress (page locked) > + * case d: Reclaim in progress > + * > + * Case a, the page will be up to date when the page is unlocked. > + * There is no need to serialise on the page lock here as the page > + * is pinned so the lock gives no additional protection. Even if the > + * the page is truncated, the data is still valid if PageUptodate as > + * it's a race vs truncate race. > + * Case b, the page will not be up to date > + * Case c, the page may be truncated but in itself, the data may still > + * be valid after IO completes as it's a read vs truncate race. The > + * operation must restart if the page is not uptodate on unlock but > + * otherwise serialising on page lock to stabilise the mapping gives > + * no additional guarantees to the caller as the page lock is > + * released before return. > + * Case d, similar to truncation. If reclaim holds the page lock, it > + * will be a race with remove_mapping that determines if the mapping > + * is valid on unlock but otherwise the data is valid and there is > + * no need to serialise with page lock. > + * > + * As the page lock gives no additional guarantee, we optimistically > + * wait on the page to be unlocked and check if it's up to date and > + * use the page if it is. Otherwise, the page lock is required to > + * distinguish between the different cases. The motivation is that we > + * avoid spurious serialisations and wakeups when multiple processes > + * wait on the same page for IO to complete. > + */ > + wait_on_page_locked(page); > + if (PageUptodate(page)) > + goto out; > + > + /* Distinguish between all the cases under the safety of the lock */ > lock_page(page); > + > + /* Case c or d, restart the operation */ > if (!page->mapping) { > unlock_page(page); > page_cache_release(page); > goto repeat; > } > + > + /* Someone else locked and filled the page in a very small window */ > if (PageUptodate(page)) { > unlock_page(page); > goto out; > > -- Jan Kara SUSE Labs, CR