Subject: Re: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
From: Qian Cai
Date: Wed, 29 Jan 2020 14:06:54 -0500
To: Grzegorz Halat
Cc: Linux Kernel Mailing List, Linux-MM, linux-fsdevel@vger.kernel.org,
    linux-doc@vger.kernel.org, ssaner@redhat.com, atomlin@redhat.com,
    oleksandr@redhat.com, vbendel@redhat.com, kirill@shutemov.name,
    khlebnikov@yandex-team.ru, borntraeger@de.ibm.com, Andrew Morton,
    Iurii Zaikin, Kees Cook, Luis Chamberlain, Jonathan Corbet, Tetsuo Handa
Message-Id: <526F3E1C-87D3-4049-BC93-A4F0EDA45608@lca.pw>
In-Reply-To: <20200129180851.551109-1-ghalat@redhat.com>
References: <20200129180851.551109-1-ghalat@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jan 29, 2020, at 1:08 PM, Grzegorz Halat wrote:
>
> Memory management subsystem performs various checks at runtime,
> if an inconsistency is detected then such event is being logged and kernel
> continues to run.
> While debugging such problems it is helpful to collect
> memory dump as early as possible. Currently, there is no easy way to panic
> kernel when such error is detected.

Also, why can’t you have a simple script that periodically checks the
tainted flags and then triggers a crash dump once that happens?

>
> It was proposed[1] to panic the kernel if panic_on_oops is set but this
> approach was not accepted. One of alternative proposals was introduction of
> a new sysctl.
>
> Add a new sysctl - panic_on_inconsistent_mm. If the sysctl is set then the
> kernel will be crashed when an inconsistency is detected by memory
> management. This currently means panic when bad page or bad PTE
> is detected(this may be extended to other places in MM).
>
> Another use case of this sysctl may be in security-wise environments,
> it may be more desired to crash machine than continue to run with
> potentially damaged data structures.
>
> Changes since v1 [2]:
> - rename the sysctl to panic_on_inconsistent_mm
> - move the sysctl from kernel to vm table
> - print modules in print_bad_pte() only before calling panic
>
> [1] https://lore.kernel.org/linux-mm/1426495021-6408-1-git-send-email-borntraeger@de.ibm.com/
> [2] https://lore.kernel.org/lkml/20200127101100.92588-1-ghalat@redhat.com/
>
> Signed-off-by: Grzegorz Halat
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
>  include/linux/kernel.h                  |  1 +
>  kernel/sysctl.c                         |  9 +++++++++
>  mm/memory.c                             |  8 ++++++++
>  mm/page_alloc.c                         |  4 +++-
>  5 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 64aeee1009ca..57f7926a64b8 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
>  - overcommit_memory
>  - overcommit_ratio
>  - page-cluster
> +- panic_on_inconsistent_mm
>  - panic_on_oom
>  - percpu_pagelist_fraction
>  - stat_interval
> @@ -741,6 +742,19 @@ extra faults and I/O delays for following faults if they would have been part of
>  that consecutive pages readahead would have brought in.
>
>
> +panic_on_inconsistent_mm
> +========================
> +
> +Controls the kernel's behaviour when inconsistency is detected
> +by memory management code, for example bad page state or bad PTE.
> +
> +0: try to continue operation.
> +
> +1: panic immediately.
> +
> +The default value is 0.
> +
> +
>  panic_on_oom
>  ============
>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..b3bd94c558ab 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -518,6 +518,7 @@ extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in
>  extern int panic_timeout;
>  extern unsigned long panic_print;
>  extern int panic_on_oops;
> +extern int panic_on_inconsistent_mm;
>  extern int panic_on_unrecovered_nmi;
>  extern int panic_on_io_nmi;
>  extern int panic_on_warn;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e..a9733311e3a1 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1303,6 +1303,15 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= &two,
>  	},
> +	{
> +		.procname	= "panic_on_inconsistent_mm",
> +		.data		= &panic_on_inconsistent_mm,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
>  	{
>  		.procname	= "panic_on_oom",
>  		.data		= &sysctl_panic_on_oom,
> diff --git a/mm/memory.c b/mm/memory.c
> index 45442d9a4f52..b29a18077a6a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -71,6 +71,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>
> @@ -88,6 +89,8 @@
>  #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
>  #endif
>
> +int panic_on_inconsistent_mm __read_mostly;
> +
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  /* use the per-pgdat data instead for discontigmem - mbligh */
>  unsigned long max_mapnr;
> @@ -543,6 +546,11 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>  		 vma->vm_ops ? vma->vm_ops->fault : NULL,
>  		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
>  		 mapping ? mapping->a_ops->readpage : NULL);
> +
> +	if (panic_on_inconsistent_mm) {
> +		print_modules();
> +		panic("Bad page map detected");
> +	}
>  	dump_stack();
>  	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d047bf7d8fd4..a20cd3ece5ba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
>  	if (bad_flags)
>  		pr_alert("bad because of flags: %#lx(%pGp)\n",
>  						bad_flags, &bad_flags);
> -	dump_page_owner(page);
>
> +	dump_page_owner(page);
>  	print_modules();
> +	if (panic_on_inconsistent_mm)
> +		panic("Bad page state detected");
>  	dump_stack();
>  out:
>  	/* Leave bad fields for debug, except PageBuddy could make trouble */
> --
> 2.21.1
>
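
To make the polling idea from the question above concrete, here is a rough
userspace sketch in C. It is not part of the patch and only an illustration
under stated assumptions: the helper names (parse_taint_mask,
has_bad_page_taint, read_taint_mask, trigger_crash) and the TAINT_WATCH_MAIN
build switch are made up for this example, the one-second interval is
arbitrary, and it assumes kdump is already configured and the process runs as
root. It relies on TAINT_BAD_PAGE being bit 5 of /proc/sys/kernel/tainted and
on writing 'c' to /proc/sysrq-trigger to force a crash.

```c
/* Hypothetical taint watcher; an illustration only, not part of the patch. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* TAINT_BAD_PAGE ('B') is bit 5 of the mask in /proc/sys/kernel/tainted. */
#define TAINT_BAD_PAGE_BIT 5

/* Parse the decimal taint mask; returns 0 on success, -1 on malformed input. */
int parse_taint_mask(const char *text, unsigned long *mask)
{
	char *end;

	errno = 0;
	*mask = strtoul(text, &end, 10);
	if (errno || end == text)
		return -1;
	return 0;
}

/* Returns 1 if the bad-page taint bit is set in the mask, 0 otherwise. */
int has_bad_page_taint(unsigned long mask)
{
	return (int)((mask >> TAINT_BAD_PAGE_BIT) & 1UL);
}

/* Read and parse the taint mask from the given file. */
int read_taint_mask(const char *path, unsigned long *mask)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_taint_mask(buf, mask);
	fclose(f);
	return ret;
}

/* Write 'c' to the sysrq trigger file to crash the kernel (needs root). */
int trigger_crash(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputc('c', f);
	return fclose(f) == 0 ? 0 : -1;
}

#ifdef TAINT_WATCH_MAIN
int main(void)
{
	unsigned long mask;

	/* Poll once per second (arbitrary) until the bad-page taint appears. */
	for (;;) {
		if (read_taint_mask("/proc/sys/kernel/tainted", &mask) == 0 &&
		    has_bad_page_taint(mask))
			return trigger_crash("/proc/sysrq-trigger") ?
				EXIT_FAILURE : EXIT_SUCCESS;
		sleep(1);
	}
}
#endif
```

Built with -DTAINT_WATCH_MAIN this becomes a standalone poller. Note that the
dump is taken up to one polling interval after the taint is set, so it
captures the state somewhat later than a panic at the point of detection
would.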