Date: Thu, 6 Apr 2017 11:52:43 -0700 (PDT)
From: Hugh Dickins
To: Mel Gorman, Michal Hocko
cc: Hugh Dickins, Andrew Morton, Tejun Heo, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Is it safe for kthreadd to drain_all_pages?
In-Reply-To: <20170406130614.a6ygueggpwseqysd@techsingularity.net>

On Thu, 6 Apr 2017, Mel Gorman wrote:
> On Wed, Apr 05, 2017 at 01:59:49PM -0700, Hugh Dickins wrote:
> > Hi Mel,
> >
> > I suspect that it's not safe for kthreadd to drain_all_pages();
> > but I haven't studied flush_work() etc, so don't really know what
> > I'm talking about: hoping that you will jump to a realization.
> >
>
> You're right, it's not safe. If kthreadd is creating the workqueue
> thread to do the drain and it'll recurse into itself.
>
> > 4.11-rc has been giving me hangs after hours of swapping load. At
> > first they looked like memory leaks ("fork: Cannot allocate memory");
> > but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> > before looking at /proc/meminfo one time, and the stat_refresh stuck
> > in D state, waiting for completion of flush_work like many kworkers.
> > kthreadd waiting for completion of flush_work in drain_all_pages().
> >
>
> It's asking itself to do work in all likelihood.
>
> > Patch below has been running well for 36 hours now:
> > a bit too early to be sure, but I think it's time to turn to you.
> >
>
> I think the patch is valid but like Michal, would appreciate if you
> could run the patch he linked to see if it also side-steps the same
> problem.
>
> Good spot!

Thank you both for explanations, and direction to the two "drainging"
patches. I've put those on to 4.11-rc5 (and double-checked that I've
taken mine off), and set it going. Fine so far but much too soon to
tell - mine did 56 hours with clean /var/log/messages before I switched,
so I demand no less of Michal's :).

I'll report back tomorrow and the day after (unless badness appears
sooner once I'm home).

Hugh
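Mel's explanation is that kthreadd, the thread that creates kernel threads, ends up waiting in flush_work() for drain work whose worker may itself need kthreadd to create it. The Python sketch below is a toy userspace model of that self-wait, not kernel code: `PoolModel`, `flush_work`, `serve_one_spawn` and `run_scenario` are hypothetical names, the creator thread merely stands in for kthreadd, and a timeout is added so the demonstration terminates where the real flush_work() would simply block.

```python
import threading


class PoolModel:
    """Hypothetical model (not kernel code): queued drain work cannot run until
    a single 'creator' thread, standing in for kthreadd, spawns a worker for it."""

    def __init__(self, timeout: float) -> None:
        self.cond = threading.Condition()
        self.spawn_requested = False  # someone asked the creator for a worker
        self.work_done = False        # the drain work has completed
        self.timeout = timeout        # assumption: real flush_work() has no timeout

    def flush_work(self) -> bool:
        """Queue the drain, ask the creator for a worker, then wait for it.
        Returns False if the wait times out instead of hanging forever."""
        with self.cond:
            self.spawn_requested = True
            self.cond.notify_all()
            return self.cond.wait_for(lambda: self.work_done, timeout=self.timeout)

    def serve_one_spawn(self) -> None:
        """Creator duty: wait for a spawn request, then 'spawn' a worker,
        modelled here as running the drain work inline."""
        with self.cond:
            self.cond.wait_for(lambda: self.spawn_requested)
            self.work_done = True
            self.cond.notify_all()


def run_scenario(creator_drains_itself: bool, timeout: float = 1.0) -> bool:
    """Return True if the flush completed, False if it would have deadlocked."""
    pool = PoolModel(timeout)
    result = {}

    def creator() -> None:
        if creator_drains_itself:
            # The creator needs the drain itself (kthreadd inside drain_all_pages()),
            # so nobody is left to serve the spawn request it just made.
            result["ok"] = pool.flush_work()
        pool.serve_one_spawn()

    t = threading.Thread(target=creator)
    t.start()
    if not creator_drains_itself:
        # An ordinary task flushes; the creator is free to spawn the worker.
        result["ok"] = pool.flush_work()
    t.join()
    return result["ok"]


if __name__ == "__main__":
    print("ordinary task:", run_scenario(False))
    print("creator drains itself:", run_scenario(True))
```

`run_scenario(False)` models an ordinary task and returns True; `run_scenario(True)` models the creator draining on its own behalf and returns False once the timeout expires, because the only thread that could spawn the worker is the one doing the waiting.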