To: mhocko@kernel.org
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, hannes@cmpxchg.org,
    mgorman@techsingularity.net, david@fromorbit.com, apolyakov@beget.ru
Subject: Re: [PATCH v7] mm: Add memory allocation watchdog kernel thread.
From: Tetsuo Handa
References: <201703091946.GDC21885.OQFFOtJHSOFVML@I-love.SAKURA.ne.jp>
    <20170309143751.05bddcbad82672384947de5f@linux-foundation.org>
    <20170310104047.GF3753@dhcp22.suse.cz>
    <201703102019.JHJ58283.MQHtVFOOFOLFJS@I-love.SAKURA.ne.jp>
    <20170310152611.GM3753@dhcp22.suse.cz>
In-Reply-To: <20170310152611.GM3753@dhcp22.suse.cz>
Message-Id: <201703111046.FBB87020.OVOOQFMHFSJLtF@I-love.SAKURA.ne.jp>
Date: Sat, 11 Mar 2017 10:46:58 +0900

Michal Hocko wrote:
> So, we have means to debug these issues. Some of them are rather coarse
> and your watchdog can collect much more and maybe give us a clue much
> quicker but we still have to judge whether all this is really needed
> because it doesn't come for free. Have you considered this aspect?

Sigh... You are ultimately ignoring reality. Educating everybody to
master debugging tools does not come for free either. If I liken your
argument to security modules, it looks like this:

  "There is already SELinux. SELinux can do everything. Thus, AppArmor
   is not needed. I don't care about users/customers who cannot
   administer SELinux."

The reality is different. We need tools which users/customers can afford
to use. You had better get away from the existing debug tools that
kernel developers use.

First of all, SysRq is an emergency tool and therefore requires an
administrator's intervention. Your argument sounds to me like "Give up
debugging unless you can sit in front of the console of your Linux
systems 24-7", which is already impossible. SysRq-t cannot print the
seq= and delay= fields because information about in-flight allocation
requests is not accessible from "struct task_struct", which makes it
extremely difficult to judge whether progress is being made when several
SysRq-t snapshots are taken.

Also, year by year it is getting more difficult to use vmcore for
analysis, because a vmcore might include sensitive data (even after
filtering out user pages). I have seen cases where a vmcore could not be
sent to support centers due to, e.g., the organization's information
control rules. Sometimes we have to analyze from kernel messages alone.
Some pieces of information extracted by running scripts against
/usr/bin/crash on the customer's side might be available, but in general
we can't assume that the whole memory image, which may include any kind
of information, is available.

In most cases, administrators can't even capture SysRq-t, let alone a
vmcore. Therefore, an automatic watchdog would be highly appreciated.
Have you considered this aspect?