Date: Wed, 11 Mar 2020 09:27:36 +0100
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Vlastimil Babka, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems
Message-ID: <20200311082736.GA23944@dhcp22.suse.cz>
References: <20200310221019.GE8447@dhcp22.suse.cz>

On Tue 10-03-20 16:02:23, David Rientjes wrote:
> On Tue, 10 Mar 2020, Michal Hocko wrote:
>
> > > When a process is oom killed as a result of memcg limits and the victim
> > > is waiting to exit, nothing ends up actually yielding the processor back
> > > to the victim on UP systems with preemption disabled. Instead, the
> > > charging process simply loops in memcg reclaim and eventually soft
> > > lockups.
> > >
> > > Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB oom_score_adj:0
> > > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [repro:806]
> > > CPU: 0 PID: 806 Comm: repro Not tainted 5.6.0-rc5+ #136
> > > RIP: 0010:shrink_lruvec+0x4e9/0xa40
> > > ...
> > > Call Trace:
> > >  shrink_node+0x40d/0x7d0
> > >  do_try_to_free_pages+0x13f/0x470
> > >  try_to_free_mem_cgroup_pages+0x16d/0x230
> > >  try_charge+0x247/0xac0
> > >  mem_cgroup_try_charge+0x10a/0x220
> > >  mem_cgroup_try_charge_delay+0x1e/0x40
> > >  handle_mm_fault+0xdf2/0x15f0
> > >  do_user_addr_fault+0x21f/0x420
> > >  page_fault+0x2f/0x40
> > >
> > > Make sure that something ends up actually yielding the processor back to
> > > the victim to allow for memory freeing.
> > > Most appropriate place appears to be shrink_node_memcgs() where the
> > > iteration of all descendant memcgs could be particularly lengthy.
> >
> > There is a cond_resched in shrink_lruvec and another one in
> > shrink_page_list. Why doesn't any of them hit? Is it because there are
> > no pages on the LRU list? Because rss data suggests there should be
> > enough pages to go that path. Or maybe it is shrink_slab path that takes
> > too long?
> >
> I think it can be a number of cases, most notably mem_cgroup_protected()
> checks which is why the cond_resched() is added above it. Rather than add
> cond_resched() only for MEMCG_PROT_MIN and for certain MEMCG_PROT_LOW, the
> cond_resched() is added above the switch clause because the iteration
> itself may be potentially very lengthy.

Was any of the above the cause of your soft lockup? How have you managed
to trigger it? As I've said, I am not against the patch, but I would
really like to see an actual explanation of what happened rather than
speculation about what might have happened, if for nothing else then for
future reference.

If this is really about the whole hierarchy being MEMCG_PROT_MIN
protected, resulting in a very expensive and pointless reclaim walk that
can trigger a soft lockup, then that should be explicitly mentioned in
the changelog.
-- 
Michal Hocko
SUSE Labs
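The scenario raised in this exchange can be sketched with a small userspace model. In that scenario every memcg in the hierarchy is MEMCG_PROT_MIN protected, so the walk skips each one, never calls shrink_lruvec(), and never reaches the cond_resched() inside it. This is a hypothetical illustration, not kernel code. model_walk_memcgs(), model_shrink_lruvec(), model_cond_resched() and enum model_prot are made-up names that only mirror the kernel functions mentioned above. The "yield" only counts how often the walk could have given up the CPU; it does not schedule anything.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, NOT kernel code. Names only mirror
 * shrink_node_memcgs(), shrink_lruvec(), cond_resched() and
 * MEMCG_PROT_MIN as discussed in the thread.
 */
enum model_prot {
	MODEL_PROT_NONE,	/* unprotected: the memcg gets reclaimed */
	MODEL_PROT_MIN,		/* hard protection: the memcg is skipped */
};

/* How many times the walk reached a point where it could yield the CPU. */
static unsigned long model_resched_points;

/* Stand-in for cond_resched(): it only counts, it does not schedule. */
static void model_cond_resched(void)
{
	model_resched_points++;
}

/* Stand-in for shrink_lruvec(), which contains a cond_resched(). */
static void model_shrink_lruvec(void)
{
	model_cond_resched();
}

/*
 * Stand-in for the shrink_node_memcgs() iteration over nr memcgs.
 * With resched_at_top set, the walk yields once per memcg before the
 * protection check, which is where the patch places cond_resched().
 * Returns how many yield points were reached during the walk.
 */
static unsigned long model_walk_memcgs(const enum model_prot *prot, size_t nr,
				       int resched_at_top)
{
	assert(prot != NULL || nr == 0);
	model_resched_points = 0;

	for (size_t i = 0; i < nr; i++) {
		if (resched_at_top)
			model_cond_resched();

		switch (prot[i]) {
		case MODEL_PROT_MIN:
			continue;	/* skipped: shrink_lruvec() is never called */
		case MODEL_PROT_NONE:
			break;
		}

		model_shrink_lruvec();
	}
	return model_resched_points;
}
```

For an array of n MODEL_PROT_MIN entries, model_walk_memcgs(prot, n, 0) returns 0 and model_walk_memcgs(prot, n, 1) returns n. With unprotected memcgs, both variants reach at least one yield point per memcg. The model says nothing about how long each iteration actually takes in the kernel, where mem_cgroup_iter() and mem_cgroup_protected() do real work. That cost is what an explanation of the reported lockup would still need to establish.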