Date: Wed, 13 Apr 2016 21:28:17 +0200
From: Michal Hocko
To: Tejun Heo
Cc: Petr Mladek, cgroups@vger.kernel.org, Cyril Hrubis,
    linux-kernel@vger.kernel.org, Johannes Weiner
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups
Message-ID: <20160413192817.GB30260@dhcp22.suse.cz>
References: <20160413094216.GC5774@pathway.suse.cz>
 <20160413183309.GG3676@htj.duckdns.org>
 <20160413192313.GA30260@dhcp22.suse.cz>
In-Reply-To: <20160413192313.GA30260@dhcp22.suse.cz>

On Wed 13-04-16 21:23:13, Michal Hocko wrote:
> On Wed 13-04-16 14:33:09, Tejun Heo wrote:
> > Hello, Petr.
> >
> > (cc'ing Johannes)
> >
> > On Wed, Apr 13, 2016 at 11:42:16AM +0200, Petr Mladek wrote:
> > ...
> > > In other words, "memcg_move_char/2860" flushes a work item, but it
> > > can never get flushed because one worker is blocked and another one
> > > cannot be created. All of these operations are blocked by the very
> > > same "memcg_move_char/2860".
> > >
> > > Note that "systemd/1" is also waiting for "cgroup_mutex" in
> > > proc_cgroup_show(), but it does not seem to be part of the main
> > > cycle causing the deadlock.
> > >
> > > I am able to reproduce this problem quite easily (within a few
> > > minutes). There are often even more tasks waiting for the
> > > cgroup-related locks, but they are not causing the deadlock.
> > >
> > > The question is how to solve this problem. I see several
> > > possibilities:
> > >
> > >   + avoid using workqueues in lru_add_drain_all()
> > >
> > >   + make lru_add_drain_all() killable and restartable
> > >
> > >   + do not block fork() while lru_add_drain_all() is running,
> > >     e.g. using some lazy technique like RCU or workqueues
> > >
> > >   + at least do not block forking of workers; AFAIK, their cgroup
> > >     usage is limited anyway because they are marked with
> > >     PF_NO_SETAFFINITY
> > >
> > > I am willing to test any potential fix or even work on the fix
> > > myself, but I do not have that much insight into the problem, so I
> > > would need some pointers.
> >
> > An easy solution would be to make lru_add_drain_all() use a
> > WQ_MEM_RECLAIM workqueue.
>
> I think we can live without lru_add_drain_all() in the migration path.
> We are talking about 4 pagevecs, so 56 pages. The charge migration is

I meant to say 56 * num_cpus, of course.

> racy anyway. What concerns me more is how fragile all of this is. It
> sounds just too easy to add a dependency on per-cpu sync work later and
> reintroduce this issue, which is quite hard to detect.
>
> Can't we come up with something more robust? Or at least warn when we
> try to use per-cpu workers with problematic locks held?
>
> Thanks!
--
Michal Hocko
SUSE Labs
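
A note on the "56 * num_cpus" correction above: assuming the
PAGEVEC_SIZE of 14 used by kernels of that time, each CPU caches
4 pagevecs * 14 pages = 56 pages, so the pages that a skipped
lru_add_drain_all() could leave sitting on per-cpu pagevecs (and hence
miss during charge migration) amount to roughly 56 * num_online_cpus()
system-wide.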
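
For illustration, a minimal sketch of the WQ_MEM_RECLAIM approach Tejun
suggests follows. The workqueue name "lru_drain", the init hook and the
simplified drain loop are assumptions made for this sketch, not the
actual mm/swap.c code of that time (which queues the per-cpu works on
the system workqueue via schedule_work_on()). The point is only that a
rescuer-backed workqueue lets the per-cpu drain works run even when no
new kworker can be forked.

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

/* Sketch only: names and structure are assumptions, not mm/swap.c. */
static struct workqueue_struct *lru_drain_wq;
static DEFINE_PER_CPU(struct work_struct, lru_drain_work);

static void lru_drain_per_cpu(struct work_struct *work)
{
	/* would drain this CPU's pagevecs, i.e. call lru_add_drain() */
}

static int __init lru_drain_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM guarantees a rescuer thread, so queued drain
	 * work can still make progress when no new kworker can be
	 * forked, which is exactly the situation in this deadlock.
	 */
	lru_drain_wq = alloc_workqueue("lru_drain", WQ_MEM_RECLAIM, 0);
	return lru_drain_wq ? 0 : -ENOMEM;
}
core_initcall(lru_drain_wq_init);

static void lru_drain_all_sketch(void)
{
	int cpu;

	/*
	 * Simplified: the real code also takes get_online_cpus() and
	 * skips CPUs whose pagevecs are empty.
	 */
	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_drain_work, cpu);

		INIT_WORK(work, lru_drain_per_cpu);
		/* queue on the rescuer-backed wq instead of system_wq */
		queue_work_on(cpu, lru_drain_wq, work);
	}

	for_each_online_cpu(cpu)
		flush_work(&per_cpu(lru_drain_work, cpu));
}

A dedicated reclaim workqueue removes this particular dependency on
worker creation, but it does not address the broader concern raised
above: any future per-cpu sync work flushed while cgroup_mutex is held
could reintroduce the same cycle.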