Subject: Re: [PATCH 0/3] OOM detection rework v4
From: Vlastimil Babka
To: Michal Hocko
Cc: Hugh Dickins, Joonsoo Kim, Andrew Morton, Linus Torvalds, Johannes Weiner, Mel Gorman, David Rientjes, Tetsuo Handa, Hillf Danton, KAMEZAWA Hiroyuki, linux-mm@kvack.org, LKML
Date: Wed, 2 Mar 2016 14:22:32 +0100
Message-ID: <56D6E918.5030109@suse.cz>
In-Reply-To: <20160302122410.GD26686@dhcp22.suse.cz>

On 03/02/2016 01:24 PM, Michal Hocko wrote:
> On Tue 01-03-16 19:14:08, Vlastimil Babka wrote:
>>
>> I was under the impression that checks similar to compaction_suitable()
>> were also done in compact_finished(), to stop compacting if memory got low
>> due to parallel activity. But I guess it was a patch from Joonsoo that
>> didn't get merged.
>>
>> My only other theory so far is that watermark checks fail in
>> __isolate_free_page() when we want to grab page(s) as migration targets.
>
> Yes, this certainly contributes to the problem, and it triggered a lot in
> my case:
> $ grep __isolate_free_page trace.log | wc -l
> 181
> $ grep __alloc_pages_direct_compact: trace.log | wc -l
> 7
>
>> I would suggest enabling all compaction tracepoints and the migration
>> tracepoint. Looking at the trace could hopefully help faster than
>> going one trace_printk() per attempt.
>
> OK, here we go with both watermark checks removed and hopefully all the
> compaction-related tracepoints enabled:
> echo 1 > /debug/tracing/events/compaction/enable
> echo 1 > /debug/tracing/events/migrate/mm_migrate_pages/enable

The trace shows only 4 direct compaction attempts with order=2. The rest
are order=9, i.e. THP, which has little chance of success under such
pressure; that explains those failures and deferrals. The few order=2
attempts all appear to be successful (defer_reset is called). So it seems
your system is mostly fine with just reclaim, there is little need for
order-2 compaction, and that's also why you can't reproduce the OOMs.

So I'm afraid we'll learn nothing here, and it looks like Hugh will have to
try those watermark check adjustments/removals and/or provide the same kind
of trace.
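The per-order breakdown above (a handful of order=2 attempts, the rest order=9) can be pulled out of a trace like Michal's with a small extension of his grep one-liners. This is only a sketch, assuming that direct compaction attempts are marked by the mm_compaction_try_to_compact_pages tracepoint and that each event line carries an "order=N" field; the helper name tally_compaction_orders is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count trace events of one type, grouped by order.
# Assumption: trace lines look like "...: <event_name>: ... order=N ..."
# (compaction tracepoints print an order=N field).
# Usage: tally_compaction_orders trace.log [event_name]
# Output: one "<order> <count>" line per order, sorted by order.
tally_compaction_orders() {
  log=$1
  # Default to the event assumed to mark a direct compaction attempt.
  event=${2:-mm_compaction_try_to_compact_pages}
  grep -F "$event:" "$log" \
    | grep -oE 'order=-?[0-9]+' \
    | sort | uniq -c \
    | awk '{ sub("order=", "", $2); print $2, $1 }' \
    | sort -n
}
```

Passing mm_compaction_defer_reset as the second argument would, under the same assumption, show which orders had their deferral reset after a successful attempt. With 4 KiB base pages, order=2 means 4 contiguous pages (16 KiB), while order=9 means 512 contiguous pages (2 MiB), which is why the THP attempts are so much harder to satisfy under memory pressure.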