Date: Wed, 25 Nov 2015 11:44:36 +0900
From: Joonsoo Kim
To: Andrew Morton
Cc: Michal Hocko, Tetsuo Handa, Tejun Heo, Cristopher Lameter,
    Arkadiusz Miśkiewicz, linux-mm@kvack.org, LKML, Michal Hocko,
    Christoph Lameter
Subject: Re: [PATCH] mm, vmstat: Allow WQ concurrency to discover memory reclaim doesn't make any progress
Message-ID: <20151125024435.GB9563@js1304-P5Q-DELUXE>
References: <1447936253-18134-1-git-send-email-mhocko@kernel.org>
    <20151124154448.ac124e62528db313279224ef@linux-foundation.org>
In-Reply-To: <20151124154448.ac124e62528db313279224ef@linux-foundation.org>

On Tue, Nov 24, 2015 at 03:44:48PM -0800, Andrew Morton wrote:
> On Thu, 19 Nov 2015 13:30:53 +0100 Michal Hocko wrote:
>
> > From: Michal Hocko
> >
> > Tetsuo Handa has reported that the system might basically livelock in OOM
> > condition without triggering the OOM killer. The issue is caused by
> > internal dependency of the direct reclaim on vmstat counter updates (via
> > zone_reclaimable) which are performed from the workqueue context.
> > If all the current workers get assigned to an allocation request,
> > though, they will be looping inside the allocator trying to reclaim
> > memory but zone_reclaimable can see stalled numbers so it will consider
> > a zone reclaimable even though it has been scanned way too much. WQ
> > concurrency logic will not consider this situation as a congested workqueue
> > because it relies that worker would have to sleep in such a situation.
> > This also means that it doesn't try to spawn new workers or invoke
> > the rescuer thread if the one is assigned to the queue.
> >
> > In order to fix this issue we need to do two things. First we have to
> > let wq concurrency code know that we are in trouble so we have to do
> > a short sleep. In order to prevent from issues handled by 0e093d99763e
> > ("writeback: do not sleep on the congestion queue if there are no
> > congested BDIs or if significant congestion is not being encountered in
> > the current zone") we limit the sleep only to worker threads which are
> > the ones of the interest anyway.
> >
> > The second thing to do is to create a dedicated workqueue for vmstat and
> > mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
> > have a spare worker thread for it.
>
> This vmstat update thing is being a problem. Please see Joonsoo's
> "mm/vmstat: retrieve more accurate vmstat value".
>
> Joonsoo, might this patch help with that issue?

That issue cannot be solved by this patch. This patch solves the
problem of the vmstat updater being blocked, but that issue is caused
by a long update delay (not blocking). In that case, the update still
happens every 1 sec as usual.

Thanks.
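
To illustrate the difference between a blocked updater and a merely
delayed one, here is a toy model in Python. It is only a sketch and not
kernel code: the class name, the method names, the CPU count and the
threshold of 32 are all made up for illustration. It assumes a counter
whose per-CPU deltas are folded into a global value either when a delta
grows past a threshold or when a periodic update runs. Even when the
periodic update is never blocked, a reader of the global value can see
a stale number between two updates.

```python
class LaggyCounter:
    """Toy counter: per-CPU deltas are folded into a global value lazily."""

    def __init__(self, n_cpus, threshold):
        self.global_value = 0
        self.deltas = [0] * n_cpus      # hypothetical per-CPU pending deltas
        self.threshold = threshold      # hypothetical fold threshold

    def add(self, cpu, amount):
        # Fast path: accumulate locally, fold only once the delta is large.
        self.deltas[cpu] += amount
        if abs(self.deltas[cpu]) > self.threshold:
            self._fold(cpu)

    def _fold(self, cpu):
        # Move one CPU's pending delta into the global value.
        self.global_value += self.deltas[cpu]
        self.deltas[cpu] = 0

    def periodic_update(self):
        # Stands in for the deferred update that runs once per interval.
        for cpu in range(len(self.deltas)):
            self._fold(cpu)

    def read_fast(self):
        # Cheap read: ignores deltas that have not been folded yet.
        return self.global_value

    def read_snapshot(self):
        # Slower read: also sums the pending per-CPU deltas.
        return self.global_value + sum(self.deltas)


counter = LaggyCounter(n_cpus=4, threshold=32)
for cpu in range(4):
    counter.add(cpu, 30)            # each delta stays below the threshold

stale = counter.read_fast()         # taken before the periodic update
exact = counter.read_snapshot()     # includes the pending deltas
counter.periodic_update()
after = counter.read_fast()         # taken after the periodic update
```

In this model the fast read lags behind the real value until the next
periodic update, no matter how reliably that update runs. Making sure
the updater cannot be blocked does not shorten that window.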