Date: Tue, 10 Mar 2020 17:18:02 -0700
From: Andrew Morton
To: David Rientjes
Cc: Vlastimil Babka, Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems
Message-Id: <20200310171802.128129f6817ef3f77d230ccd@linux-foundation.org>

On Tue, 10 Mar 2020 14:39:48 -0700 (PDT) David Rientjes wrote:

> When a process is oom killed as a result of memcg limits and the victim
> is waiting to exit, nothing ends up actually yielding the processor back
> to the victim on UP systems with preemption disabled.  Instead, the
> charging process simply loops in memcg reclaim and eventually triggers
> a soft lockup.
>
> Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB oom_score_adj:0
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [repro:806]
> CPU: 0 PID: 806 Comm: repro Not tainted 5.6.0-rc5+ #136
> RIP: 0010:shrink_lruvec+0x4e9/0xa40
> ...
> Call Trace:
>  shrink_node+0x40d/0x7d0
>  do_try_to_free_pages+0x13f/0x470
>  try_to_free_mem_cgroup_pages+0x16d/0x230
>  try_charge+0x247/0xac0
>  mem_cgroup_try_charge+0x10a/0x220
>  mem_cgroup_try_charge_delay+0x1e/0x40
>  handle_mm_fault+0xdf2/0x15f0
>  do_user_addr_fault+0x21f/0x420
>  page_fault+0x2f/0x40
>
> Make sure that something ends up actually yielding the processor back to
> the victim to allow for memory freeing.  The most appropriate place appears
> to be shrink_node_memcgs(), where the iteration over all descendant memcgs
> could be particularly lengthy.
>

That's a bit sad.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  		unsigned long reclaimed;
>  		unsigned long scanned;
>  
> +		cond_resched();
> +
>  		switch (mem_cgroup_protected(target_memcg, memcg)) {
>  		case MEMCG_PROT_MIN:
>  			/*

Obviously better, but this will still spin its wheels until this task's
timeslice expires, and we might want to do something to help ensure that
the victim runs next (or soon)?

(And why is shrink_node_memcgs compiled in when CONFIG_MEMCG=n?)
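To make the two points above concrete, here is a toy user-space model in C. It is not kernel code, and every name in it (struct model, model_cond_resched(), charger_loop()) is hypothetical. It assumes a single CPU with no preemption, a charger that spins in reclaim without freeing anything, and a victim that needs just one turn on the CPU to exit. Without any yield, the victim never runs. With a cond_resched()-style yield that fires only once the tick has set need_resched, the victim does run, but only after the charger has burned through its whole timeslice.

```c
/*
 * Toy model of a UP, non-preemptive CPU (hypothetical names, user space).
 * A "charger" loops in reclaim; an OOM "victim" must get the CPU once to
 * exit and free its memory. Nothing preempts the charger.
 */
#include <assert.h>
#include <stdbool.h>

struct model {
	int  timeslice;      /* reclaim passes per timeslice (models the tick) */
	int  ran;            /* passes run in the current timeslice */
	bool need_resched;   /* set by the "tick" when the timeslice expires */
	bool victim_exited;  /* victim has run, exited and freed its memory */
};

/* Like cond_resched(): give up the CPU only if a reschedule is pending. */
static void model_cond_resched(struct model *m)
{
	if (m->need_resched) {
		m->victim_exited = true;  /* the victim runs and exits */
		m->need_resched = false;
		m->ran = 0;               /* charger starts a fresh timeslice */
	}
}

/*
 * Run the charger's reclaim loop for at most max_iters passes.
 * Returns the number of passes wasted before the victim exited,
 * or -1 if it never did (the soft-lockup case).
 */
static int charger_loop(struct model *m, bool use_cond_resched, int max_iters)
{
	assert(m->timeslice > 0);
	for (int i = 0; i < max_iters; i++) {
		if (use_cond_resched)
			model_cond_resched(m);
		if (m->victim_exited)
			return i;  /* memory was freed; the charge can succeed */
		/* One reclaim pass that frees nothing, then a possible tick. */
		if (++m->ran >= m->timeslice)
			m->need_resched = true;
	}
	return -1;
}
```

For example, with `struct model m = { .timeslice = 5 };`, `charger_loop(&m, false, 1000)` returns -1. The same call on a fresh model with `use_cond_resched` set to true returns 5, which is one full timeslice of wasted reclaim passes before the victim gets to run.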