Date: Thu, 6 Sep 2018 19:03:34 +0200
From: Michal Hocko
To: Alexander Duyck
Cc: Dave Hansen, linux-mm, LKML, "Duyck, Alexander H", pavel.tatashin@microsoft.com, Andrew Morton, Ingo Molnar, "Kirill A.
 Shutemov"
Subject: Re: [PATCH v2 1/2] mm: Move page struct poisoning to CONFIG_DEBUG_VM_PAGE_INIT_POISON
Message-ID: <20180906170334.GE14951@dhcp22.suse.cz>
References: <20180905211041.3286.19083.stgit@localhost.localdomain>
 <20180905211328.3286.71674.stgit@localhost.localdomain>
 <20180906054735.GJ14951@dhcp22.suse.cz>
 <0c1c36f7-f45a-8fe9-dd52-0f60b42064a9@intel.com>
 <20180906151336.GD14951@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 06-09-18 08:41:52, Alexander Duyck wrote:
> On Thu, Sep 6, 2018 at 8:13 AM Michal Hocko wrote:
> >
> > On Thu 06-09-18 07:59:03, Dave Hansen wrote:
> > > On 09/05/2018 10:47 PM, Michal Hocko wrote:
> > > > why do you have to keep DEBUG_VM enabled for workloads where the boot
> > > > time matters so much that a few seconds matter?
> > >
> > > There are a number of distributions that run with it enabled in the
> > > default build. Fedora, for one. We've basically assumed for a while
> > > that we have to live with it in production environments.
> > >
> > > So, where does that leave us? I think we either need a _generic_ debug
> > > option like:
> > >
> > >     CONFIG_DEBUG_VM_SLOW_AS_HECK
> > >
> > > under which we can put this and other really slow VM debugging. Or, we
> > > need some kind of boot-time parameter to trigger the extra checking
> > > instead of a new CONFIG option.
> >
> > I strongly suspect nobody will ever enable such a scary looking config
> > TBH. Besides I am not sure what should go under that config option.
> > Something that takes a few cycles but is called often, or one time stuff
> > that takes quite long but less than the aggregated overhead of the former?
> >
> > Just consider this particular case.
> > It basically re-adds an overhead
> > that has always been there before the struct page init optimization
> > went in. The poisoning just returns it in a different form to catch
> > potential leftovers. And we would like to have as many people as possible
> > willing to run in debug mode to test those paths because they are
> > basically impossible to review by code inspection. More importantly
> > the major overhead is boot time so my question still stands. Is this
> > worth a separate config option almost nobody is going to enable?
> >
> > Enabling DEBUG_VM by Fedora and others gives us very good testing
> > coverage and I appreciate that because it has generated some useful bug
> > reports. Those people are paying quite a lot of overhead in runtime
> > which can aggregate over time. Is it so much to ask for a one time boot
> > overhead?
>
> The kind of boot time add-on I saw as a result of this was about 170
> seconds, or 2 minutes and 50 seconds on a 12TB system.

Just curious. How long does it take to get from power on to even reach
the boot loader on that machine... ;)

> I spent a
> couple minutes wondering if I had built a bad kernel or not as I was
> staring at a dead console the entire time after the grub prompt since
> I hit this so early in the boot. That is the reason why I am so eager
> to slice this off and make it something separate. I could easily see
> this as something that would get in the way of other debugging that is
> going on in a system.

But you would have had the same overhead a kernel release ago, before the
memmap init optimization was merged. So you are basically back to what we
used to have for years. Unless I misremember.

> If we don't want to do a config option, then what about adding a
> kernel parameter to put a limit on how much memory we will initialize
> like this before we just start skipping it. We could put a default
> limit on it like 256GB and then once we cross that threshold we just
> don't bother poisoning any more memory.
> With that we would probably be
> able to at least cover most of the early memory init, and that value
> should cover most systems without getting into delays on the order of
> minutes.

No, this will defeat the purpose of the check.
-- 
Michal Hocko
SUSE Labs
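
For scale, the numbers quoted in this thread (about 170 seconds on a 12TB system, and a proposed 256GB cutoff) can be put into rough per-byte terms. The sketch below is only a back-of-the-envelope estimate and not anything posted in the thread. It assumes 4 KiB base pages and a 64-byte struct page (typical for x86_64 but not stated here), reads "12TB" and "256GB" as binary units, and assumes the whole delay is spent poisoning the memmap and scales linearly with memory size. The helper names are made up for illustration.

```python
# Rough estimate of memmap size and poisoning delay.
# All constants below are assumptions, not values taken from the thread.
PAGE_SIZE = 4096         # assumed: 4 KiB base pages
STRUCT_PAGE_SIZE = 64    # assumed: bytes per struct page (typical x86_64)
GIB = 1 << 30
TIB = 1 << 40


def memmap_bytes(ram_bytes):
    """Bytes of struct page array needed to describe ram_bytes of memory."""
    return (ram_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE


def scaled_delay(ram_bytes, ref_ram_bytes, ref_delay_s):
    """Delay for ram_bytes, assuming cost scales linearly with memory size."""
    return ref_delay_s * ram_bytes / ref_ram_bytes


if __name__ == "__main__":
    ram = 12 * TIB       # the "12TB system" from the thread, read as TiB
    delay = 170.0        # seconds of extra boot time reported in the thread

    mm = memmap_bytes(ram)
    print(f"memmap size:      {mm / GIB:.0f} GiB")
    print(f"implied rate:     {mm / GIB / delay:.2f} GiB/s")
    print(f"delay at 256 GiB: {scaled_delay(256 * GIB, ram, delay):.1f} s")
```

Under these assumptions the memmap for 12 TiB is 192 GiB, so the reported delay corresponds to writing a little over 1 GiB/s, and a machine at the proposed 256GB threshold would see a delay of a few seconds rather than minutes.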