Date: Fri, 1 Jun 2018 16:07:01 +0200
From: Michal Hocko
To: "Eric W. Biederman"
Cc: Andrew Morton, Johannes Weiner, Kirill Tkhai, peterz@infradead.org,
    viro@zeniv.linux.org.uk, mingo@kernel.org, paulmck@linux.vnet.ibm.com,
    keescook@chromium.org, riel@redhat.com, tglx@linutronix.de,
    kirill.shutemov@linux.intel.com, marcos.souza.org@gmail.com,
    hoeun.ryu@gmail.com, pasha.tatashin@oracle.com, gs051095@gmail.com,
    dhowells@redhat.com, rppt@linux.vnet.ibm.com,
    linux-kernel@vger.kernel.org, Balbir Singh, Tejun Heo, Oleg Nesterov
Subject: Re: [PATCH 0/2] mm->owner to mm->memcg fixes
Message-ID: <20180601140701.GF15278@dhcp22.suse.cz>
In-Reply-To: <87wovjacrh.fsf@xmission.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 31-05-18 13:41:38, Eric W. Biederman wrote:
> Michal Hocko writes:
>
> > On Thu 24-05-18 14:16:35, Andrew Morton wrote:
> >> On Thu, 24 May 2018 13:10:02 +0200 Michal Hocko wrote:
> >>
> >> > I would really prefer and appreciate a repost with all the fixes folded
> >> > in.
> >>
> >> [1/2]
> >
> > Thanks Andrew and sorry it took so long! This seems to be missing the
> > fix for the issue I've mentioned in http://lkml.kernel.org/r/20180510121838.GE5325@dhcp22.suse.cz
> > and Eric wanted to handle by http://lkml.kernel.org/r/87wovu889o.fsf@xmission.com.
> > I do not think that this fix is really correct one though. I will
> > comment in a reply to that email.
>
> Agreed.
> That was not a correct fix to the possible infinite loop in
> get_mem_cgroup_from_mm. Which in net leaves all of this broken and
> not ready to be merged.
>
> > In any case I really think this should better be reposted in one patch
> > series and restart the discussion. I strongly suggest revisiting my
> > previous attempt http://lkml.kernel.org/r/1436358472-29137-8-git-send-email-mhocko@kernel.org
> > including the follow up discussion regarding the unfortunate CLONE_VM
> > outside of thread group case and potential solution for that.
>
> With my fix for get_mem_cgroup_from_mm falling flat that limits our
> possible future directions:
> - Make memory cgroups honest and require all tasks of an mm to be in the
>   same memory cgroup. [BEST]

Agreed. This should be possible with the multi-pid migration support.
Just do not allow the migration if the set of pids doesn't contain all
processes sharing the same mm. We can add a special MMF_ flag for an mm
that is generated by CLONE_VM without CLONE_THREAD to make the normal
path fast. Or we can simply fail the migration for these odd cases and
implement it only if somebody actually complains. The flag would be
useful for other cases (e.g. in the oom path where we have to handle
these tasks as well).

[...]
> >> -	/* We move charges only when we move a owner of the mm */
> >> -	if (mm->owner == p) {
> >> +
> >> +	/* We move charges except for creative uses of CLONE_VM */
> >> +	if (mm->memcg == from) {
> >
> > I must be missing something here but how do you prevent those cases?
> > AFAICS processes sharing the mm will simply allow to migrate.
>
> The mm will only migrate once per set of tasks. A task that does not
> migrate with the mm will not have mm->memcg == from. Which is all I was
> referring to. Perhaps I did not say that well.

OK, I see now (I guess). I would probably just drop the comment. The
code is much clearer when dealing with the memcg than with a task...
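To make the two points above concrete, here is a minimal userspace sketch
of the all-or-nothing migration rule and of the "mm->memcg == from" check.
It is only a model under stated assumptions, not kernel code: the types
memcg_model, mm_model and task_model, the shared_vm field (a stand-in for
the proposed MMF_ flag) and both helpers are hypothetical names invented
for illustration. It also assumes that the mm's memcg pointer is updated
as part of moving the charges, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model. None of these types or helpers exist in
 * the kernel; they only mirror the ideas discussed in this thread. */
struct memcg_model {
	const char *name;
};

struct mm_model {
	struct memcg_model *memcg; /* models mm->memcg from the patch */
	int nr_procs;              /* processes sharing this mm */
	bool shared_vm;            /* stand-in for the proposed MMF_ flag */
};

struct task_model {
	int pid;
	struct mm_model *mm;
};

/*
 * All-or-nothing rule: a set of distinct tasks may migrate only if, for
 * every shared mm it touches, it contains all processes using that mm.
 */
static bool can_migrate_set(struct task_model **set, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct mm_model *mm = set[i]->mm;
		int covered = 0;

		if (!mm->shared_vm)
			continue; /* fast path: mm has a single process */

		for (size_t j = 0; j < n; j++)
			if (set[j]->mm == mm)
				covered++;
		if (covered != mm->nr_procs)
			return false;
	}
	return true;
}

/*
 * Charge-move check modeled on "if (mm->memcg == from)". Assumption:
 * the mm's memcg pointer is updated as part of the move, so a second
 * task sharing the mm no longer matches. Returns true if charges moved.
 */
static bool move_charges(struct task_model *t, struct memcg_model *from,
			 struct memcg_model *to)
{
	assert(from != to);
	if (t->mm->memcg != from)
		return false;
	t->mm->memcg = to;
	return true;
}
```

With two tasks sharing one mm, a set containing only one of them is
rejected by can_migrate_set(), while a set containing both passes. In the
same model, the first task moved from one group to another takes the
charges with it and the second one finds mm->memcg already changed, so
the charges move once per mm.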
> >>  		VM_BUG_ON(mc.from);
> >>  		VM_BUG_ON(mc.to);
> >>  		VM_BUG_ON(mc.precharge);
> >
> > Other than that the patch makes sense to me.
>
> I will take a bit and respin this. Because of algorithmic bug-fix
> nature of where this started I want to avoid depending on fixing the
> semantics. With my third option for fixing get_mem_cgroup_from_mm I see
> how to do that now. Then I will include a separate patch to fix the
> semantics.

Thanks!
-- 
Michal Hocko
SUSE Labs