Date: Thu, 19 Mar 2020 08:09:11 +0100
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Tetsuo Handa, Vlastimil Babka, Robert Kolchmeyer, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v3] mm, oom: prevent soft lockup on memcg oom for UP systems
Message-ID: <20200319070911.GU21362@dhcp22.suse.cz>

On Wed 18-03-20 15:03:52, David Rientjes wrote:
> When a process is oom killed as a result of memcg limits and the victim
> is waiting to exit, nothing ends up actually yielding the processor back
> to the victim on UP systems with preemption disabled. Instead, the
> charging process simply loops in memcg reclaim and eventually soft
> lockups.
>
> For example, on an UP system with a memcg limited to 100MB, if three
> processes each charge 40MB of heap with swap disabled, one of the charging
> processes can loop endlessly trying to charge memory which starves the oom
> victim.

This only happens if there is no reclaimable memory in the hierarchy.
That is a very specific condition. I do not see how it could arise
other than on a misconfigured system with min protection preventing
any reclaim. Otherwise we have cond_resched both in the slab shrinking
code (do_shrink_slab) and in the LRU shrinking code (shrink_lruvec).
If I am wrong and those are insufficient, then please be explicit
about the scenario. This is very important information to have in the
changelog!

[...]
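The argument about cond_resched can be made concrete with a toy user-space model. This is a sketch, not kernel code: struct model_memcg, reclaim_hierarchy(), SCAN_BATCH and the resched_points counter are all hypothetical names. The model assumes that reclaim hits a scheduling point (standing in for cond_resched()) only once per batch of pages actually scanned, as the review says happens in do_shrink_slab and shrink_lruvec, and that a memcg under min protection is skipped before any shrinking runs.

```c
#include <stdbool.h>

#define SCAN_BATCH 32 /* hypothetical batch size, not the kernel's value */

struct model_memcg {
	bool min_protected; /* reclaim skips this memcg entirely */
	long reclaimable;   /* pages reclaim could free */
};

struct reclaim_stats {
	long reclaimed;     /* pages freed by this pass */
	int resched_points; /* times a cond_resched() stand-in was reached */
};

/* Shrink one memcg; a scheduling point is reached only per batch of real work. */
static void shrink_memcg(struct model_memcg *cg, struct reclaim_stats *st)
{
	while (cg->reclaimable > 0) {
		long batch = cg->reclaimable < SCAN_BATCH ? cg->reclaimable : SCAN_BATCH;

		cg->reclaimable -= batch;
		st->reclaimed += batch;
		st->resched_points++; /* stand-in for cond_resched() */
	}
}

/* One reclaim pass over the hierarchy; protected memcgs are skipped. */
static struct reclaim_stats reclaim_hierarchy(struct model_memcg *cgs, int n)
{
	struct reclaim_stats st = { 0, 0 };

	for (int i = 0; i < n; i++) {
		if (cgs[i].min_protected)
			continue; /* no shrinking, so no scheduling point either */
		shrink_memcg(&cgs[i], &st);
	}
	return st;
}
```

For a hierarchy of one unprotected memcg with 100 reclaimable pages plus one protected memcg, a pass reclaims 100 pages and reaches four scheduling points (batches of 32, 32, 32 and 4). If every memcg is protected or empty, the pass reaches none, which in this model is the only case where the reclaim path itself never yields.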

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1576,6 +1576,12 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 */
>  	ret = should_force_charge() || out_of_memory(&oc);
>  	mutex_unlock(&oom_lock);
> +	/*
> +	 * Give a killed process a good chance to exit before trying to
> +	 * charge memory again.
> +	 */
> +	if (ret)
> +		schedule_timeout_killable(1);

Why are you making this conditional? Say there is no victim to kill.
The charge path would simply bail out, and whether there is a
scheduling point would then depend entirely on the call chain. Isn't
it simply safer to call schedule_timeout_killable unconditionally at
this stage?

>  	return ret;
>  }
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3861,6 +3861,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	}
>  out:
>  	mutex_unlock(&oom_lock);
> +	/*
> +	 * Give a killed process a good chance to exit before trying to
> +	 * allocate memory again.
> +	 */
> +	if (*did_some_progress)
> +		schedule_timeout_killable(1);

This doesn't make much sense either. Please remember that the primary
reason you are adding this schedule_timeout_killable in this path is
that you want to somehow reduce the priority inversion problem
mentioned by Tetsuo. The page allocator path does not lack regular
scheduling points: compaction, reclaim, should_reclaim_retry etc. all
have them.

>  	return page;
>  }

-- 
Michal Hocko
SUSE Labs
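The question about the conditional in the mem_cgroup_out_of_memory() hunk (the v3 patch yields only `if (ret)`, the review asks for an unconditional call) can be illustrated with a second toy user-space model. It is a sketch under loudly stated assumptions, not kernel code: charge(), yield_cpu(), struct model and enum yield_policy are hypothetical. The model assumes a single CPU without preemption, a caller that retries the charge with no scheduling point of its own, and an OOM attempt that reports no progress while some already exiting task still holds memory and needs the CPU once to release it. That premise is chosen to mirror the "depends on the call chain" concern; it is not a claim about what out_of_memory() actually returns.

```c
#include <stdbool.h>

enum yield_policy {
	YIELD_IF_OOM_REPORTED_PROGRESS, /* like the v3 patch: if (ret) */
	YIELD_ALWAYS,                   /* like the review's suggestion */
};

struct model {
	int usage_mb;        /* memory charged to the memcg */
	int limit_mb;        /* memcg limit */
	int victim_mb;       /* memory held by a task that is exiting */
	bool victim_exiting; /* it needs the CPU once to exit and uncharge */
	int yields;          /* times the charger gave up the CPU */
};

/* Stand-in for schedule_timeout_killable(1): the exiting task gets to run. */
static void yield_cpu(struct model *m)
{
	m->yields++;
	if (m->victim_exiting) {
		m->usage_mb -= m->victim_mb;
		m->victim_mb = 0;
		m->victim_exiting = false;
	}
}

/*
 * Hypothetical caller that retries the charge without any scheduling
 * point of its own. oom_ret[i] is what the OOM attempt reports on try i.
 * Returns the try on which the charge fit, or -1 after max_tries.
 */
static int charge(struct model *m, int request_mb, const bool *oom_ret,
		  int max_tries, enum yield_policy policy)
{
	for (int i = 0; i < max_tries; i++) {
		if (m->usage_mb + request_mb <= m->limit_mb) {
			m->usage_mb += request_mb;
			return i;
		}
		/* Nothing is reclaimable in this model, so each retry ends in OOM. */
		if (policy == YIELD_ALWAYS || oom_ret[i])
			yield_cpu(m);
	}
	return -1;
}
```

Starting from 80MB charged against a 100MB limit, with an exiting task holding 40MB and every OOM attempt reporting no progress, a 40MB charge never fits under the conditional policy: charge() returns -1 without a single yield, which stands in for a CPU-bound loop. The unconditional policy yields once, lets the exiting task release its memory, and succeeds on the second try (charge() returns 1).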