Date: Thu, 28 Apr 2016 16:37:10 +0200
From: Michal Hocko
To: Dave Hansen
Cc: "Odzioba, Lukasz", linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Shutemov, Kirill", "Anaczkowski, Lukasz"
Subject: Re: mm: pages are not freed from lru_add_pvecs after process termination
Message-ID: <20160428143710.GC31496@dhcp22.suse.cz>
References: <5720F2A8.6070406@intel.com>
In-Reply-To: <5720F2A8.6070406@intel.com>

On Wed 27-04-16 10:11:04, Dave Hansen wrote:
> On 04/27/2016 10:01 AM, Odzioba, Lukasz wrote:
[...]
> > 1. We need some statistics on the number and total *SIZES* of all pages
> > in the lru pagevecs. It's too opaque now.
> > 2. We need to make darn sure we drain the lru pagevecs before failing
> > any kind of allocation.

lru_add_drain_all is unfortunately too costly (especially on large
machines). You are right that failing an allocation while a lot of
pages are still cached is less than optimal, though. So maybe we can
do it from the slow path after the first round of direct reclaim has
failed to allocate anything.
Something like the following:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dd65d9fb76a..0743c58c2e9d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3559,6 +3559,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	bool drained_lru = false;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3667,6 +3668,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	if (!drained_lru) {
+		drained_lru = true;
+		lru_add_drain_all();
+	}
+
 	/* Do not loop if specifically requested */
 	if (gfp_mask & __GFP_NORETRY)
 		goto noretry;

The downside is that we would then really depend on the workqueue (WQ)
to make any progress here. If we are truly out of memory, we are
screwed, so we would need a flush_work_timeout() or something else
that guarantees a bounded wait. That something else might be to stop
using the WQ and move the flushing into IRQ context. That isn't free
either, but at least it doesn't depend on having some memory available
to make progress.

> > 3. We need some way to drain the lru pagevecs directly. Maybe the buddy
> > pcp lists too.
> > 4. We need to make sure that a zone_reclaim_mode=0 system still drains
> > too.
> > 5. The VM stats and their updates are now related to how often
> > drain_zone_pages() gets run. That might be interacting here too.
> > 6. Perhaps don't use the LRU pagevecs for large pages. It limits the
> severity of the problem.

7. Hook into vmstat and flush from there? This would drain them
periodically, but it would also introduce nondeterministic
interference.

-- 
Michal Hocko
SUSE Labs
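The one-shot drain in the diff above can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: `struct model`, `take_pages()`, `drain_all_pvecs()` and `alloc_slowpath()` are hypothetical names. The model assumes that pages parked in per-CPU pagevecs become free as soon as they are drained (as with pages of an exited process) and that direct reclaim makes no progress. It only shows the pattern the patch uses: the costly global drain runs at most once per slow-path call, and only after a first attempt has failed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* arbitrary CPU count for the toy model */

/* Hypothetical toy state: a global free pool plus per-CPU pagevecs. */
struct model {
	long free_pages;
	long pvec_pages[NR_CPUS]; /* pages parked in per-CPU pagevecs */
	int drain_calls;          /* how often the expensive drain ran */
};

/* Stand-in for the freelist/direct-reclaim attempt. Reclaim is assumed
 * to make no progress, so only already-free pages can satisfy it. */
static bool take_pages(struct model *m, long nr)
{
	assert(nr > 0);
	if (m->free_pages < nr)
		return false;
	m->free_pages -= nr;
	return true;
}

/* Stand-in for lru_add_drain_all(): it touches every CPU, hence it is
 * costly. The model assumes every drained page is immediately free. */
static void drain_all_pvecs(struct model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		m->free_pages += m->pvec_pages[cpu];
		m->pvec_pages[cpu] = 0;
	}
	m->drain_calls++;
}

/* Loosely follows the patch: after the first failed attempt, drain at
 * most once per call, then retry up to max_retries extra times. */
static bool alloc_slowpath(struct model *m, long nr, int max_retries)
{
	bool drained_lru = false; /* per call, like the local in the patch */

	for (int retry = 0; ; retry++) {
		if (take_pages(m, nr))
			return true;
		if (!drained_lru) {
			drained_lru = true;
			drain_all_pvecs(m);
		}
		if (retry >= max_retries)
			return false;
	}
}
```

For example, with 10 free pages and 5 pages parked on each of the 4 CPUs, a 20-page request with `max_retries = 3` fails on its first attempt, succeeds after the single drain, and leaves 10 free pages. A later 50-page request is a new slow-path call, so it drains once more, finds the pagevecs empty, and fails without draining again on its remaining retries. Note that the model drains synchronously, so it says nothing about the workqueue dependency discussed in the mail.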