Date: Mon, 20 Aug 2012 09:54:59 -0700 (PDT)
From: Sage Weil
To: Mel Gorman
cc: davem@davemloft.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    ceph-devel@vger.kernel.org, neilb@suse.de, a.p.zijlstra@chello.nl,
    michaelc@cs.wisc.edu, emunson@mgebm.net, eric.dumazet@gmail.com,
    sebastian@breakpoint.cc, cl@linux.com, akpm@linux-foundation.org,
    torvalds@linux-foundation.org
Subject: Re: regression with poll(2)
In-Reply-To: <20120820090443.GA3275@suse.de>
References: <20120820090443.GA3275@suse.de>

On Mon, 20 Aug 2012, Mel Gorman wrote:
> On Sun, Aug 19, 2012 at 11:49:31AM -0700, Sage Weil wrote:
> > I've bisected and identified this commit:
> >
> >     netvm: propagate page->pfmemalloc to skb
> >
> >     The skb->pfmemalloc flag gets set to true iff during the slab allocation
> >     of data in __alloc_skb that the the PFMEMALLOC reserves were used.  If the
> >     packet is fragmented, it is possible that pages will be allocated from the
> >     PFMEMALLOC reserve without propagating this information to the skb.  This
> >     patch propagates page->pfmemalloc from pages allocated for fragments to
> >     the skb.
> >
> >     Signed-off-by: Mel Gorman
> >     Acked-by: David S. Miller
> >     Cc: Neil Brown
> >     Cc: Peter Zijlstra
> >     Cc: Mike Christie
> >     Cc: Eric B Munson
> >     Cc: Eric Dumazet
> >     Cc: Sebastian Andrzej Siewior
> >     Cc: Mel Gorman
> >     Cc: Christoph Lameter
> >     Signed-off-by: Andrew Morton
> >     Signed-off-by: Linus Torvalds
> >
>
> Ok, thanks.
>
> > I've retested several times and confirmed that this change leads to the
> > breakage, and also confirmed that reverting it on top of -rc1 also fixes
> > the problem.
> >
> > I've also added some additional instrumentation to my code and confirmed
> > that the process is blocking on poll(2) while netstat is reporting
> > data available on the socket.
> >
> > What can I do to help track this down?
> >
>
> Can the following patch be tested please? It is reported to fix an fio
> regression that may be similar to what you are experiencing but has not
> been picked up yet.

This patch appears to resolve things for me as well, at least after a
couple of passes.  I'll let you know if I see any further problems come up
with more testing.

Thanks!
sage

>
> ---8<---
> From: Alex Shi
> Subject: [PATCH] mm: correct page->pfmemalloc to fix deactivate_slab regression
>
> commit cfd19c5a9ec (mm: only set page->pfmemalloc when
> ALLOC_NO_WATERMARKS was used) try to narrow down page->pfmemalloc
> setting, but it missed some places the pfmemalloc should be set.
>
> So, in __slab_alloc, the unalignment pfmemalloc and ALLOC_NO_WATERMARKS
> cause incorrect deactivate_slab() on our core2 server:
>
>     64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
>                      |
>                      --- _raw_spin_lock
>                         |
>                         |---0.34%-- deactivate_slab
>                         |          __slab_alloc
>                         |          kmem_cache_alloc
>                         |          |
>
> That causes our fio sync write performance has 40% regression.
>
> This patch move the checking in get_page_from_freelist, that resolved
> this issue.
>
> Signed-off-by: Alex Shi
> ---
>  mm/page_alloc.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 009ac28..07f1924 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1928,6 +1928,17 @@ this_zone_full:
>  		zlc_active = 0;
>  		goto zonelist_scan;
>  	}
> +
> +	if (page)
> +		/*
> +		 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> +		 * necessary to allocate the page. The expectation is
> +		 * that the caller is taking steps that will free more
> +		 * memory. The caller should avoid the page being used
> +		 * for !PFMEMALLOC purposes.
> +		 */
> +		page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> +
>  	return page;
>  }
>  
> @@ -2389,14 +2400,6 @@ rebalance:
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
>  		if (page) {
> -			/*
> -			 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
> -			 * necessary to allocate the page. The expectation is
> -			 * that the caller is taking steps that will free more
> -			 * memory. The caller should avoid the page being used
> -			 * for !PFMEMALLOC purposes.
> -			 */
> -			page->pfmemalloc = true;
>  			goto got_pg;
>  		}
>  	}
> @@ -2569,8 +2572,6 @@ retry_cpuset:
>  		page = __alloc_pages_slowpath(gfp_mask, order,
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, migratetype);
> -	else
> -		page->pfmemalloc = false;
>  
>  	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>  
> -- 
> 1.7.5.4
>
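
The symptom reported in this thread (a process blocked in poll(2) while netstat shows data queued on the socket) can also be checked from user space. What follows is a minimal sketch under stated assumptions, not the instrumentation that was actually used: the helper names `poll_agrees_with_fionread` and `self_check` are hypothetical, and the sketch assumes a Linux socket on which the FIONREAD ioctl reports the number of queued bytes.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: compare the kernel's queued-byte count (FIONREAD)
 * with what a non-blocking poll() reports for the same descriptor.
 * Returns 1 if the two views agree, 0 if bytes are queued but poll()
 * does not report POLLIN (the symptom from this thread), -1 on error.
 */
static int poll_agrees_with_fionread(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int queued = 0;
	int n;

	if (ioctl(fd, FIONREAD, &queued) < 0)
		return -1;

	n = poll(&pfd, 1, 0);	/* timeout 0: report state, never block */
	if (n < 0)
		return -1;

	if (queued > 0 && !(n == 1 && (pfd.revents & POLLIN)))
		return 0;	/* data is queued, yet poll() stays silent */

	return 1;
}

/*
 * Hypothetical self-check on a local socket pair: the views should
 * agree both before and after one byte is written.
 * Returns 1 if they do, 0 if not, -1 if the pair cannot be created.
 */
static int self_check(void)
{
	int sv[2];
	int ok;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	ok = poll_agrees_with_fionread(sv[1]) == 1;	/* nothing queued yet */
	if (write(sv[0], "x", 1) != 1)
		ok = 0;
	ok = ok && poll_agrees_with_fionread(sv[1]) == 1;	/* one byte queued */

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

Calling `poll_agrees_with_fionread()` on the affected socket just before the blocking poll() would show whether the kernel's queue and the poll result disagree. A local socket pair is only a sanity check of the helper and is not expected to reproduce the regression discussed here.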