Date: Wed, 13 Apr 2016 14:33:09 -0400
From: Tejun Heo
To: Petr Mladek
Cc: cgroups@vger.kernel.org, Michal Hocko, Cyril Hrubis, linux-kernel@vger.kernel.org, Johannes Weiner
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups
Message-ID: <20160413183309.GG3676@htj.duckdns.org>
References: <20160413094216.GC5774@pathway.suse.cz>
In-Reply-To: <20160413094216.GC5774@pathway.suse.cz>

Hello, Petr.

(cc'ing Johannes)

On Wed, Apr 13, 2016 at 11:42:16AM +0200, Petr Mladek wrote:
...
> In other words, "memcg_move_char/2860" flushes a work, but that work
> cannot get flushed because one worker is blocked and another one
> could not get created.  All of these operations are blocked by the
> very same "memcg_move_char/2860".
>
> Note that "systemd/1" is also waiting for "cgroup_mutex" in
> proc_cgroup_show(), but it does not seem to be part of the main
> cycle causing the deadlock.
>
> I am able to reproduce this problem quite easily (within a few
> minutes).  There are often even more tasks waiting for the
> cgroup-related locks, but they are not part of the deadlock.
>
>
> The question is how to solve this problem.  I see several
> possibilities:
>
>   + avoid using workqueues in lru_add_drain_all()
>
>   + make lru_add_drain_all() killable and restartable
>
>   + do not block fork() while lru_add_drain_all() is running,
>     e.g. using some lazy technique like RCU or workqueues
>
>   + at least do not block forking of workers; AFAIK, they have
>     limited cgroup usage anyway because they are marked with
>     PF_NO_SETAFFINITY
>
>
> I am willing to test any potential fix or even work on the fix
> myself, but I do not have deep insight into this code, so I would
> need some pointers.

An easy solution would be to make lru_add_drain_all() use a
WQ_MEM_RECLAIM workqueue.  A better way would be to make charge moving
asynchronous, similar to cpuset node migration, but I don't know
whether that's realistic.  Will prep a patch to add a rescuer to
lru_add_drain_all().

Thanks.

-- 
tejun
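[For readers following along: the WQ_MEM_RECLAIM approach suggested above could look roughly like the sketch below.  This is kernel-style pseudocode, not the actual patch; the workqueue name, the initcall, and the surrounding lru_add_drain_all() structure are assumptions for illustration.  The workqueue API calls (alloc_workqueue, queue_work_on, flush_work) are real.]

    /* Sketch only: give lru_add_drain_all() a dedicated workqueue
     * created with WQ_MEM_RECLAIM.  Such a workqueue has a rescuer
     * thread allocated up front, so queued work can make forward
     * progress even when no new kworker can be forked -- which is
     * exactly the condition in the reported deadlock. */
    static struct workqueue_struct *lru_add_drain_wq;

    static int __init lru_init(void)
    {
            lru_add_drain_wq = alloc_workqueue("lru-add-drain",
                                               WQ_MEM_RECLAIM, 0);
            if (!lru_add_drain_wq)
                return -ENOMEM;
            return 0;
    }
    early_initcall(lru_init);

    void lru_add_drain_all(void)
    {
            ...
            for_each_online_cpu(cpu) {
                    struct work_struct *work =
                            &per_cpu(lru_add_drain_work, cpu);

                    INIT_WORK(work, lru_add_drain_per_cpu);
                    /* Queue on the rescuer-backed workqueue instead
                     * of system_wq, so the flush below can never
                     * depend on forking a new worker. */
                    queue_work_on(cpu, lru_add_drain_wq, work);
            }
            for_each_online_cpu(cpu)
                    flush_work(&per_cpu(lru_add_drain_work, cpu));
            ...
    }

The key property is that flush_work() on a WQ_MEM_RECLAIM workqueue cannot wait on worker creation, so the fork()-side cgroup locks never enter the wait cycle.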