Date: Wed, 23 Mar 2016 13:45:24 +0100
From: Michal Hocko
To: Rik van Riel
Cc: Ebru Akagunduz, linux-mm@kvack.org, hughd@google.com,
	akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
	n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
	iamjoonsoo.kim@lge.com, gorcunov@openvz.org,
	linux-kernel@vger.kernel.org, mgorman@suse.de,
	rientjes@google.com, vbabka@suse.cz,
	aneesh.kumar@linux.vnet.ibm.com, hannes@cmpxchg.org,
	boaz@plexistor.com
Subject: Re: [PATCH v4 2/2] mm, thp: avoid unnecessary swapin in khugepaged

On Tue 22-03-16 15:21:16, Rik van Riel wrote:
> On Mon, 2016-03-21 at 16:36 +0100, Michal Hocko wrote:
> > On Sun 20-03-16 20:07:39, Ebru Akagunduz wrote:
> > > Currently khugepaged does swapin readahead to improve the THP
> > > collapse rate. This patch checks vm statistics to avoid the
> > > swapin work when it is unnecessary, so that khugepaged won't
> > > consume resources on swapin while the system is under pressure.
> >
> > OK, so you want to disable the optimization when under memory
> > pressure. That sounds like a good idea in general.
> >
> > > @@ -2493,7 +2494,14 @@ static void collapse_huge_page(struct mm_struct *mm,
> > >  		goto out;
> > >  	}
> > >
> > > -	__collapse_huge_page_swapin(mm, vma, address, pmd);
> > > +	swap = get_mm_counter(mm, MM_SWAPENTS);
> > > +	curr_allocstall = sum_vm_event(ALLOCSTALL);
> > > +	/*
> > > +	 * When system under pressure, don't swapin readahead.
> > > +	 * So that avoid unnecessary resource consuming.
> > > +	 */
> > > +	if (allocstall == curr_allocstall && swap != 0)
> > > +		__collapse_huge_page_swapin(mm, vma, address, pmd);
> >
> > This criterion doesn't really make much sense to me. So we are
> > checking whether direct reclaim has been invoked since some point
> > in time (more on that below) and we take that as a signal of
> > strong memory pressure, right? What if that was quite some time
> > ago? What if we didn't have a single direct reclaim but kswapd was
> > busy the whole time? Or what if the allocstall came from a
> > different NUMA node?
>
> Do you have a measure in mind that the code should test
> against, instead?

vmpressure provides reclaim pressure feedback. I am not sure it could
be used here, though.

> I don't think we want page cache turnover to prevent
> khugepaged collapsing THPs, but if the system gets
> to the point where kswapd is doing pageout IO, or
> swapout IO, or kswapd cannot keep up, we should
> probably slow down khugepaged.

I agree.

Would using gfp_mask & ~___GFP_DIRECT_RECLAIM allocation requests for
the opportunistic swapin be something to try out? If kswapd doesn't
keep up with the load to the point where we have to enter direct
reclaim, then it doesn't really make sense to increase the memory
pressure by additional direct reclaim.
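Roughly something like the following is what I have in mind. This is
an untested sketch only: khugepaged_swapin_page() is a made-up helper
name and the GFP_HIGHUSER_MOVABLE base mask is just an assumption for
illustration, not what the patch does.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/swap.h>

/*
 * Untested sketch: do the opportunistic swapin with a gfp mask that
 * lacks __GFP_DIRECT_RECLAIM, so the page allocation fails fast
 * instead of stalling in direct reclaim when kswapd cannot keep up.
 */
static struct page *khugepaged_swapin_page(swp_entry_t entry,
					   struct vm_area_struct *vma,
					   unsigned long address)
{
	gfp_t gfp = GFP_HIGHUSER_MOVABLE & ~__GFP_DIRECT_RECLAIM;

	/*
	 * read_swap_cache_async() allocates the swapin page with the
	 * given mask; without __GFP_DIRECT_RECLAIM it returns NULL
	 * rather than entering reclaim.
	 */
	return read_swap_cache_async(entry, gfp, vma, address);
}

If the allocation fails, khugepaged would simply skip the swapin and
leave the collapse for a later pass rather than add more reclaim
pressure to an already loaded system.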
> If another NUMA node is under significant memory
> pressure, we probably want the programs from that
> node to be able to do some allocations from this
> node, rather than have khugepaged consume the memory.

This is hard to tell because those tasks might be bound to that node
and would not leave it. Anyway, I just wanted to point out that
relying on a global counter is rather dubious.

--
Michal Hocko
SUSE Labs