To: mhocko@kernel.org, akpm@linux-foundation.org
Cc: tj@kernel.org, clameter@sgi.com, arekm@maven.pl, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, js1304@gmail.com, cl@linux.com
Subject: Re: [PATCH] mm, vmstat: Allow WQ concurrency to discover memory
    reclaim doesn't make any progress
From: Tetsuo Handa
References: <1447936253-18134-1-git-send-email-mhocko@kernel.org>
    <20151124154448.ac124e62528db313279224ef@linux-foundation.org>
    <20151125110705.GC27283@dhcp22.suse.cz>
In-Reply-To: <20151125110705.GC27283@dhcp22.suse.cz>
Message-Id: <201511252054.DEC87052.MSLVJHFQtOFOFO@I-love.SAKURA.ne.jp>
Date: Wed, 25 Nov 2015 20:54:13 +0900
X-Mailing-List: linux-kernel@vger.kernel.org

Michal Hocko wrote:
> Anyway I think that the issue is not solely theoretical. WQ_MEM_RECLAIM
> is simply not working if the allocation path doesn't sleep currently and
> my understanding of what Tejun claims [2] is that that reimplementing WQ
> concurrency would be too intrusive and lacks sufficient justification
> because other kernel paths do sleep. This patch tries to reduce the
> sleep only to worker threads which should not cause any problems to
> regular tasks.

I received many unexplained hang/reboot reports from customers when I was
working at a support center. But we can't say whether real people ever hit
this problem, because we have no watchdog for memory allocation stalls.
I want one like
http://lkml.kernel.org/r/201511250024.AAE78692.QVOtFFOSFOMLJH@I-love.SAKURA.ne.jp
as I wrote off-list ("mm,oom: The reason why I continue proposing timeout
based approach."). It will help with judging when we tackle the TIF_MEMDIE
livelock problem.

What I can say is that RHEL6 (a 2.6.32-based distro) backported the
wait_iff_congested() changes, and therefore people might really hit this
problem.