Date: Sat, 12 Dec 2015 12:00:32 -0500
From: Johannes Weiner
To: Tetsuo Handa
Cc: mhocko@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, rientjes@google.com, oleg@redhat.com, kwalker@redhat.com, cl@linux.com, akpm@linux-foundation.org, vdavydov@parallels.com, skozina@redhat.com, mgorman@suse.de, riel@redhat.com, arekm@maven.pl
Subject: Re: [PATCH v4] mm,oom: Add memory allocation watchdog kernel thread.
Message-ID: <20151212170032.GB7107@cmpxchg.org>
In-Reply-To: <201512130033.ABH90650.FtFOMOFLVOJHQS@I-love.SAKURA.ne.jp>

On Sun, Dec 13, 2015 at 12:33:04AM +0900, Tetsuo Handa wrote:
> +Currently, when something went wrong inside memory allocation request,
> +the system will stall with either 100% CPU usage (if memory allocating
> +tasks are doing busy loop) or 0% CPU usage (if memory allocating tasks
> +are waiting for file data to be flushed to storage).
> +But /proc/sys/kernel/hung_task_warnings is not helpful because memory
> +allocating tasks unlikely sleep in uninterruptible state for
> +/proc/sys/kernel/hung_task_timeout_secs seconds.

Yes, this is very annoying. Other tasks in the system get dumped out
because they are blocked for too long, but not the allocating task
itself, since it's busy looping.
That being said, I'm not entirely sure why we need a daemon to do this,
which then requires us to duplicate allocation state in task_struct.
There is no scenario where the allocating task is not moving at all
anymore, right? So can't we dump the allocation state from within the
allocator and leave the rest to the hung task detector?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05ef7fb..fbfc581 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3004,6 +3004,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum migrate_mode migration_mode = MIGRATE_ASYNC;
 	bool deferred_compaction = false;
 	int contended_compaction = COMPACT_CONTENDED_NONE;
+	unsigned int nr_retries = 0;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3033,6 +3034,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 retry:
+	if (++nr_retries % 1000 == 0)
+		warn_alloc_failed(gfp_mask, order, "Potential GFP deadlock\n");
+
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 

Basing it on nr_retries alone might be too crude and take too long when
each cycle spends time waiting for IO. If that turns out to be a
problem, though, we can make it time-based instead, like your
memalloc_timer, to catch tasks that spend too much time in a single
alloc attempt.

> +	start_memalloc_timer(alloc_mask, order);
> 	page = __alloc_pages_slowpath(alloc_mask, order, &ac);
> +	stop_memalloc_timer(alloc_mask);