Date: Thu, 6 Apr 2017 14:06:14 +0100
From: Mel Gorman
To: Hugh Dickins
Cc: Andrew Morton, Tejun Heo, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Is it safe for kthreadd to drain_all_pages?
Message-ID: <20170406130614.a6ygueggpwseqysd@techsingularity.net>

On Wed, Apr 05, 2017 at 01:59:49PM -0700, Hugh Dickins wrote:
> Hi Mel,
>
> I suspect that it's not safe for kthreadd to drain_all_pages();
> but I haven't studied flush_work() etc, so don't really know what
> I'm talking about: hoping that you will jump to a realization.
>

You're right, it's not safe. If kthreadd is the one creating the workqueue
thread to do the drain, it'll recurse into itself.

> 4.11-rc has been giving me hangs after hours of swapping load. At
> first they looked like memory leaks ("fork: Cannot allocate memory");
> but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> before looking at /proc/meminfo one time, and the stat_refresh stuck
> in D state, waiting for completion of flush_work like many kworkers.
> kthreadd waiting for completion of flush_work in drain_all_pages().
>

In all likelihood, it's asking itself to do work.

> Patch below has been running well for 36 hours now:
> a bit too early to be sure, but I think it's time to turn to you.
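A toy user-space model of that recursion may make the hang easier to picture. This is a sketch only, not kernel code. All names (`Model`, `creator_loop`, `flush_work_timed`, `drain_from_other_task`, `drain_from_creator`) are hypothetical. `creator_loop` stands in for kthreadd, the only thread allowed to create workers. `flush_work_timed` stands in for queueing the drain work and waiting in flush_work(). As a simplifying assumption, the queued work always needs a freshly created worker; real workqueues can often reuse an idle one. A timeout replaces the indefinite D-state wait so the model terminates.

```python
import threading

# Hypothetical model, not kernel code: creator_loop() plays kthreadd (the only
# thread that may create workers); flush_work_timed() plays queueing drain work
# and waiting for it, with a timeout instead of an unbounded wait.

class Model:
    def __init__(self):
        self.cond = threading.Condition()
        self.spawn_requested = False
        self.work_done = False
        self.stop = False
        self.worker = None

    def _worker(self):
        with self.cond:
            self.work_done = True          # the "drain" itself
            self.cond.notify_all()

    def creator_loop(self):
        # Service loop: spawn one worker whenever queued work asks for one.
        with self.cond:
            while not self.stop:
                if self.spawn_requested and self.worker is None:
                    self.spawn_requested = False
                    self.worker = threading.Thread(target=self._worker)
                    self.worker.start()
                self.cond.wait()

    def flush_work_timed(self, timeout):
        # Queue work that needs a new worker, then wait for it to complete.
        # Returns True if the work finished, False if the wait timed out.
        with self.cond:
            self.spawn_requested = True
            self.cond.notify_all()         # wake the creator, if it is listening
            return self.cond.wait_for(lambda: self.work_done, timeout)


def drain_from_other_task(timeout=1.0):
    # An ordinary task drains while the creator is free to spawn the worker.
    m = Model()
    creator = threading.Thread(target=m.creator_loop)
    creator.start()
    done = m.flush_work_timed(timeout)
    with m.cond:
        m.stop = True
        m.cond.notify_all()
    creator.join()
    if m.worker is not None:
        m.worker.join()
    return done


def drain_from_creator(timeout=1.0):
    # The creator itself drains instead of running its service loop, so no
    # thread is left to spawn the worker the drain is waiting for.
    m = Model()
    result = []
    creator = threading.Thread(
        target=lambda: result.append(m.flush_work_timed(timeout)))
    creator.start()
    creator.join()
    return result[0]


if __name__ == "__main__":
    print("drain from other task:", drain_from_other_task())  # True
    print("drain from creator:   ", drain_from_creator())     # False, after the timeout
```

The point of the model is only that a thread which is the sole source of new workers must not block waiting for work that needs a new worker. It says nothing about which fix the kernel should use.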
I think the patch is valid but, like Michal, I would appreciate it if you
could run the patch he linked to see if it also side-steps the same
problem.

Good spot!

-- 
Mel Gorman
SUSE Labs