Date: Tue, 24 Feb 2015 12:50:20 -0800 (PST)
From: David Rientjes
To: Rafael Aquini
cc: linux-mm@kvack.org, Andrew Morton, jweiner@redhat.com, riel@redhat.com, linux-kernel@vger.kernel.org, loberman@redhat.com, lwoodman@redhat.com, raghavendra.kt@linux.vnet.ibm.com, Linus Torvalds
Subject: Re: [PATCH] mm: readahead: get back a sensible upper limit
In-Reply-To: <9cc2b63100622f5fd17fa5e4adc59233a2b41877.1424779443.git.aquini@redhat.com>

On Tue, 24 Feb 2015, Rafael Aquini wrote:

> commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for memoryless NUMA
> nodes and limit readahead pages")[1] imposed 2 MB hard limits to readahead by
> changing max_sane_readahead() to sort out a corner case where a thread runs on
> a memoryless NUMA node and it would have its readahead capability disabled.
>
> The aforementioned change, despite fixing that corner case, is detrimental to
> other ordinary workloads that memory map big files and rely on readahead() or
> posix_fadvise(WILLNEED) syscalls to get most of the file populating system's cache.
>
> Laurence Oberman reports, via https://bugzilla.redhat.com/show_bug.cgi?id=1187940,
> slowdowns up to 3-4 times when changes for mentioned commit [1] got introduced in
> RHEL kernel.
> We also have an upstream bugzilla opened for similar complaint:
> https://bugzilla.kernel.org/show_bug.cgi?id=79111
>
> This patch brings back the old behavior of max_sane_readahead() where we used to
> consider NR_INACTIVE_FILE and NR_FREE_PAGES pages to derive a sensible / adjustable
> readahead upper limit. This patch also keeps the 2 MB ceiling scheme introduced by
> commit [1] to avoid regressions on CONFIG_HAVE_MEMORYLESS_NODES systems,
> where numa_mem_id(), by any buggy reason, might end up not returning
> the 'local memory' for a memoryless node CPU.
>
> Reported-by: Laurence Oberman
> Tested-by: Laurence Oberman
> Signed-off-by: Rafael Aquini
> ---
>  mm/readahead.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 9356758..73f934d 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -203,6 +203,7 @@ out:
>  	return ret;
>  }
>  
> +#define MAX_READAHEAD   ((512 * 4096) / PAGE_CACHE_SIZE)
>  /*
>   * Chunk the readahead into 2 megabyte units, so that we don't pin too much
>   * memory at once.
> @@ -217,7 +218,7 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  	while (nr_to_read) {
>  		int err;
>  
> -		unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_CACHE_SIZE;
> +		unsigned long this_chunk = MAX_READAHEAD;
>  
>  		if (this_chunk > nr_to_read)
>  			this_chunk = nr_to_read;
> @@ -232,14 +233,15 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  	return 0;
>  }
>  
> -#define MAX_READAHEAD ((512*4096)/PAGE_CACHE_SIZE)
>  /*
>   * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
>   * sensible upper limit.
>   */
>  unsigned long max_sane_readahead(unsigned long nr)
>  {
> -	return min(nr, MAX_READAHEAD);
> +	return min(nr, max(MAX_READAHEAD,
> +			(node_page_state(numa_mem_id(), NR_INACTIVE_FILE) +
> +			 node_page_state(numa_mem_id(), NR_FREE_PAGES)) / 2));
>  }
>  
>  /*

I think Linus suggested avoiding heuristics based on the per-node memory
state here because of their complexity, specifically in
http://www.kernelhub.org/?msg=413344&p=2, and suggested the fixed
MAX_READAHEAD size instead.

If we are to go forward with this revert, then I believe the change to
numa_mem_id() will fix the memoryless node issue as pointed out in that
thread.
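
For reference, the two variants of max_sane_readahead() discussed in this thread can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: PAGE_CACHE_SIZE is taken to be 4096 (so MAX_READAHEAD is 512 pages), the per-node counters are passed in as plain parameters instead of being read via node_page_state(numa_mem_id(), ...), and the helper and function names (min_ul, max_ul, max_sane_readahead_capped, max_sane_readahead_patched) are made up for illustration.

```c
/* Assumption: 4 KiB pages, so the 2 MB cap equals 512 pages. */
#define PAGE_CACHE_SIZE 4096UL
#define MAX_READAHEAD   ((512 * 4096) / PAGE_CACHE_SIZE)

/* Hypothetical stand-ins for the kernel's min()/max() macros. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Model of the code being removed: a fixed cap of MAX_READAHEAD pages. */
unsigned long max_sane_readahead_capped(unsigned long nr)
{
	return min_ul(nr, MAX_READAHEAD);
}

/*
 * Model of the code being added. inactive_file and free_pages stand in
 * for the NR_INACTIVE_FILE and NR_FREE_PAGES counters of the local
 * memory node (hypothetical parameters, not the kernel interface).
 * The limit is half of their sum, but never less than MAX_READAHEAD.
 */
unsigned long max_sane_readahead_patched(unsigned long nr,
					 unsigned long inactive_file,
					 unsigned long free_pages)
{
	return min_ul(nr, max_ul(MAX_READAHEAD,
				 (inactive_file + free_pages) / 2));
}
```

In this model, a request for 100000 pages on a node reporting 30000 inactive file pages and 10000 free pages is limited to 20000 pages by the patched variant and to 512 pages by the capped one. When both counters read zero, as in the memoryless node case, the patched variant still allows 512 pages.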