Date: Wed, 13 Apr 2016 14:33:09 -0400
From: Tejun Heo
To: Petr Mladek
Cc: cgroups@vger.kernel.org, Michal Hocko, Cyril Hrubis, linux-kernel@vger.kernel.org, Johannes Weiner
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups
Message-ID: <20160413183309.GG3676@htj.duckdns.org>
References: <20160413094216.GC5774@pathway.suse.cz>
In-Reply-To: <20160413094216.GC5774@pathway.suse.cz>

Hello, Petr.

(cc'ing Johannes)

On Wed, Apr 13, 2016 at 11:42:16AM +0200, Petr Mladek wrote:
...
> In other words, "memcg_move_char/2860" flushes a work, but that work
> cannot get flushed because one worker is blocked and another one
> could not get created.  All of these operations are blocked by the
> very same "memcg_move_char/2860".
>
> Note that "systemd/1" is also waiting for "cgroup_mutex" in
> proc_cgroup_show(), but it does not seem to be part of the main
> cycle causing the deadlock.
>
> I am able to reproduce this problem quite easily (within a few
> minutes).  There are often even more tasks waiting for the
> cgroup-related locks, but they are not part of the deadlock.
>
>
> The question is how to solve this problem.  I see several
> possibilities:
>
>   + avoid using workqueues in lru_add_drain_all()
>
>   + make lru_add_drain_all() killable and restartable
>
>   + do not block fork() while lru_add_drain_all() is running,
>     e.g. using some lazy technique like RCU or workqueues
>
>   + at least do not block forking of workers; AFAIK, they have
>     limited cgroup usage anyway because they are marked with
>     PF_NO_SETAFFINITY
>
>
> I am willing to test any potential fix or even work on the fix
> myself, but I do not have deep insight into this code, so I would
> need some pointers.

An easy solution would be to make lru_add_drain_all() use a
WQ_MEM_RECLAIM workqueue.  A better way would be to make charge moving
asynchronous, similar to cpuset node migration, but I don't know
whether that's realistic.  Will prep a patch to add a rescuer to
lru_add_drain_all().

Thanks.

-- 
tejun
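[For readers following along: the WQ_MEM_RECLAIM approach suggested above could look roughly like the sketch below.  This is kernel-style pseudocode, not the actual patch; the workqueue name, the initcall, and the surrounding lru_add_drain_all() structure are assumptions for illustration.  The workqueue API calls (alloc_workqueue, queue_work_on, flush_work) are real.]

    /* Sketch only: give lru_add_drain_all() a dedicated workqueue
     * created with WQ_MEM_RECLAIM.  Such a workqueue has a rescuer
     * thread allocated up front, so queued work can make forward
     * progress even when no new kworker can be forked -- which is
     * exactly the condition in the reported deadlock. */
    static struct workqueue_struct *lru_add_drain_wq;

    static int __init lru_init(void)
    {
            lru_add_drain_wq = alloc_workqueue("lru-add-drain",
                                               WQ_MEM_RECLAIM, 0);
            if (!lru_add_drain_wq)
                return -ENOMEM;
            return 0;
    }
    early_initcall(lru_init);

    void lru_add_drain_all(void)
    {
            ...
            for_each_online_cpu(cpu) {
                    struct work_struct *work =
                            &per_cpu(lru_add_drain_work, cpu);

                    INIT_WORK(work, lru_add_drain_per_cpu);
                    /* Queue on the rescuer-backed workqueue instead
                     * of system_wq, so the flush below can never
                     * depend on forking a new worker. */
                    queue_work_on(cpu, lru_add_drain_wq, work);
            }
            for_each_online_cpu(cpu)
                    flush_work(&per_cpu(lru_add_drain_work, cpu));
            ...
    }

The key property is that flush_work() on a WQ_MEM_RECLAIM workqueue cannot wait on worker creation, so the fork()-side cgroup locks never enter the wait cycle.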