Date: Fri, 23 Mar 2018 10:07:04 +0100
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Johannes Weiner, "Kirill A. Shutemov", Vlastimil Babka, linux-mm@kvack.org, LKML
Subject: Re: [PATCH] memcg, thp: do not invoke oom killer on thp charges
Message-ID: <20180323090704.GK23100@dhcp22.suse.cz>
References: <20180321205928.22240-1-mhocko@kernel.org> <20180321214104.GT23100@dhcp22.suse.cz> <20180322085611.GY23100@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 22-03-18 13:29:37, David Rientjes wrote:
> On Thu, 22 Mar 2018, Michal Hocko wrote:
[...]
> > They simply cannot because kmalloc performs the change under the cover.
> > So you would have to use kmalloc(gfp|__GFP_NORETRY) to be absolutely
> > sure to not trigger _any_ oom killer. This is just wrong thing to do.
> >
>
> Examples of where this isn't already done? It certainly wasn't a problem
> before __GFP_NORETRY was dropped in commit 2516035499b9 but you suspect
> it's a problem now.

It is not a problem _right now_ as I've already pointed out a few times.
We do not trigger the OOM killer for anything but the #PF path. But this
is an implementation detail which can change in the future and there is
actually some demand for the change. Once we start triggering the oom
killer for all charges then we do not really want to have the disparity.

> > > You're diverging from it because the memcg charge path has never had this
> > > heuristic.
> >
> > Which is arguably a bug which just didn't matter because we do not
> > have costly order oom eligible charges in general and THP was subtly
> > different and turned out to be error prone.
> >
>
> It was inadvertently dropped from commit 2516035499b9. There were no
> high-order charge oom kill problems before this commit. People know how
> to use __GFP_NORETRY or leave it off, which you don't trust them to do
> because you're hardcoding a heuristic in the charge path.

No. Just read what I wrote. I am worried that the current disparity
between the page allocator and the memcg charging will _force_ them to
do hacks, and sometimes (e.g. kmalloc) they will not have any option but
to use __GFP_NORETRY even when that is not really needed and it has
different semantics than they would like.

Behavior with and without memcgs should be as similar as possible,
otherwise you will see different sets of bugs when running under the
memcg and without. I really fail to see what is so hard to understand
about this.

[...]
> > > Your change is broken and I wouldn't push it to Linus for rc7 if my life
> > > depended on it.
> > > What is the response when someone complains that they
> > > start getting a ton of MEMCG_OOM notifications for every thp fallback?
> > > They will, because yours is a broken implementation.
> >
> > I fail to see what is broken. Could you be more specific?
> >
>
> I said MEMCG_OOM notifications on thp fallback. You modified
> mem_cgroup_oom(). What is called before mem_cgroup_oom()?
> mem_cgroup_event(mem_over_limit, MEMCG_OOM). That increments the
> MEMCG_OOM event and anybody waiting on the events file gets notified it
> changed. They read a MEMCG_OOM event. It's thp fallback, it's not memcg
> oom.

MEMCG_OOM doesn't count the number of oom killer invocations. That has
never been the case.

> Really, I can't continue to write 100 emails in this thread.

Then try to read and even try to understand concerns expressed in those
emails. Repeating the same set of arguments and ignoring the rest is not
really helpful.
--
Michal Hocko
SUSE Labs