Date: Wed, 13 Apr 2016 21:23:14 +0200
From: Michal Hocko
To: Tejun Heo
Cc: Petr Mladek, cgroups@vger.kernel.org, Cyril Hrubis, linux-kernel@vger.kernel.org, Johannes Weiner
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups
Message-ID: <20160413192313.GA30260@dhcp22.suse.cz>
In-Reply-To: <20160413183309.GG3676@htj.duckdns.org>

On Wed 13-04-16 14:33:09, Tejun Heo wrote:
> Hello, Petr.
>
> (cc'ing Johannes)
>
> On Wed, Apr 13, 2016 at 11:42:16AM +0200, Petr Mladek wrote:
> ...
> > In other words, "memcg_move_char/2860" flushes a work. But it cannot
> > get flushed because one worker is blocked and another one could not
> > get created. All these operations are blocked by the very same
> > "memcg_move_char/2860".
> >
> > Note that "systemd/1" is also waiting for "cgroup_mutex" in
> > proc_cgroup_show(). But it seems that it is not in the main
> > cycle causing the deadlock.
> >
> > I am able to reproduce this problem quite easily (within a few
> > minutes). There are often even more tasks waiting for the
> > cgroups-related locks, but they are not causing the deadlock.
> >
> > The question is how to solve this problem.
> > I see several possibilities:
> >
> >   + avoid using workqueues in lru_add_drain_all()
> >
> >   + make lru_add_drain_all() killable and restartable
> >
> >   + do not block fork() when lru_add_drain_all() is running,
> >     e.g. using some lazy techniques like RCU, workqueues
> >
> >   + at least do not block the fork of workers; AFAIK, they have
> >     limited cgroups usage anyway because they are marked with
> >     PF_NO_SETAFFINITY
> >
> > I am willing to test any potential fix or even work on the fix,
> > but I do not have that deep an insight into the problem, so I
> > would need some pointers.
>
> An easy solution would be to make lru_add_drain_all() use a
> WQ_MEM_RECLAIM workqueue.

I think we can live without lru_add_drain_all() in the migration path.
We are talking about 4 pagevecs, so 56 pages. The charge migration is
racy anyway.

What concerns me more is how fragile all of this is. It sounds just too
easy to add a dependency on per-cpu sync work later and reintroduce this
issue, which is quite hard to detect. Can't we come up with something
more robust? Or at least warn when we try to use per-cpu workers with
problematic locks held?

Thanks!
--
Michal Hocko
SUSE Labs
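[Editorial note: Tejun's WQ_MEM_RECLAIM suggestion could be sketched roughly as below. This is a hedged illustration, not the patch that was merged; the queue name, init-function name, and the shape of lru_add_drain_all() are assumptions. The key property is that a WQ_MEM_RECLAIM workqueue is guaranteed a rescuer thread, so its work items can make progress even when no new kworker can be forked, which breaks the fork dependency in the deadlock cycle described above.]

```c
#include <linux/workqueue.h>
#include <linux/cpu.h>
#include <linux/mutex.h>
#include <linux/percpu.h>

/* Dedicated queue with a rescuer, independent of kworker forking. */
static struct workqueue_struct *lru_add_drain_wq;

static int __init lru_drain_wq_init(void)
{
	lru_add_drain_wq = alloc_workqueue("lru-add-drain",
					   WQ_MEM_RECLAIM, 0);
	if (!lru_add_drain_wq)
		panic("failed to create lru-add-drain workqueue");
	return 0;
}
early_initcall(lru_drain_wq_init);

void lru_add_drain_all(void)
{
	static DEFINE_MUTEX(lock);
	static DEFINE_PER_CPU(struct work_struct, lru_drain_work);
	int cpu;

	mutex_lock(&lock);
	get_online_cpus();

	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_drain_work, cpu);

		INIT_WORK(work, lru_add_drain_per_cpu);
		/* Queue on the rescuer-backed queue, not system_wq. */
		queue_work_on(cpu, lru_add_drain_wq, work);
	}

	for_each_online_cpu(cpu)
		flush_work(&per_cpu(lru_drain_work, cpu));

	put_online_cpus();
	mutex_unlock(&lock);
}
```

Note this addresses only the worker-creation leg of the cycle; Michal's point about a later per-cpu sync dependency reintroducing the problem still stands.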