From: Naoya Horiguchi
To: Oscar Salvador
CC: Matthew Wilcox, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
    Andrew Morton, Michal Hocko, "mingo@kernel.org",
    "dan.j.williams@intel.com", Huang Ying, Pavel Tatashin
Subject: Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM
Date: Wed, 6 Jun 2018 09:06:30 +0000
Message-ID: <20180606090630.GA27065@hori1.linux.bs1.fc.nec.co.jp>
References: <20180605005402.GA22975@hori1.linux.bs1.fc.nec.co.jp>
    <20180605011836.GA32444@bombadil.infradead.org>
    <20180605073500.GA23766@hori1.linux.bs1.fc.nec.co.jp>
    <20180606051624.GA16021@hori1.linux.bs1.fc.nec.co.jp>
    <20180606080408.GA31794@techadventures.net>
    <20180606085319.GA32052@techadventures.net>
In-Reply-To: <20180606085319.GA32052@techadventures.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 06, 2018 at 10:53:19AM +0200, Oscar Salvador wrote:
> On Wed, Jun 06, 2018 at 10:04:08AM +0200, Oscar Salvador wrote:
> > On Wed, Jun 06, 2018 at 05:16:24AM +0000, Naoya Horiguchi wrote:
> > > On Tue, Jun 05, 2018 at 07:35:01AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> > > > On Mon, Jun 04, 2018 at 06:18:36PM -0700, Matthew Wilcox wrote:
> > > > > On Tue, Jun 05, 2018 at 12:54:03AM +0000, Naoya Horiguchi wrote:
> > > > > > Reproduction procedure is like this:
> > > > > >  - enable RAM based PMEM (with a kernel boot parameter like memmap=1G!4G)
> > > > > >  - read /proc/kpageflags (or call tools/vm/page-types with no arguments)
> > > > > >  (- my kernel config is attached)
> > > > > >
> > > > > > I spent a few days on this, but didn't reach any solutions.
> > > > > > So let me report this with some details below ...
> > > > > >
> > > > > > In the critical page request, stable_page_flags() is called with an argument
> > > > > > page whose ->compound_head was somehow filled with '0xffffffffffffffff'.
> > > > > > And compound_head() returns (struct page *)(head - 1), which explains the
> > > > > > address 0xfffffffffffffffe in the above message.
> > > > >
> > > > > Hm.  compound_head shares with:
> > > > >
> > > > >         struct list_head lru;
> > > > >         struct list_head slab_list;     /* uses lru */
> > > > >         struct {        /* Partial pages */
> > > > >                 struct page *next;
> > > > >         unsigned long _compound_pad_1;  /* compound_head */
> > > > >         unsigned long _pt_pad_1;        /* compound_head */
> > > > >         struct dev_pagemap *pgmap;
> > > > >         struct rcu_head rcu_head;
> > > > >
> > > > > None of them should be -1.
> > > > >
> > > > > > It seems that this kernel panic happens when reading kpageflags of pfn range
> > > > > > [0xbffd7, 0xc0000), which corresponds to a 'reserved' range.
> > > > > >
> > > > > > [    0.000000] user-defined physical RAM map:
> > > > > > [    0.000000] user: [mem 0x0000000000000000-0x000000000009fbff] usable
> > > > > > [    0.000000] user: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > > > > > [    0.000000] user: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > > > > > [    0.000000] user: [mem 0x0000000000100000-0x00000000bffd6fff] usable
> > > > > > [    0.000000] user: [mem 0x00000000bffd7000-0x00000000bfffffff] reserved
> > > > > > [    0.000000] user: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > > > > > [    0.000000] user: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > > > > > [    0.000000] user: [mem 0x0000000100000000-0x000000013fffffff] persistent (type 12)
> > > > > >
> > > > > > So I guess 'memmap=' parameter might badly affect the memory initialization process.
> > > > > >
> > > > > > This problem doesn't reproduce on v4.17, so some pre-released patch introduces it.
> > > > > > I hope this info helps you find the solution/workaround.
> > > > >
> > > > > Can you try bisecting this?  It could be one of my patches to reorder struct
> > > > > page, or it could be one of Pavel's deferred page initialisation patches.
> > > > > Or something else ;-)
> > > >
> > > > Thank you for the comment. I'm bisecting now and will let you know the result later.
> > > >
> > > > And I found that my statement "not reproduce on v4.17" was wrong (I used
> > > > different kvm guests, which gave a different test condition and misled me),
> > > > this seems to be an older (at least < 4.15) bug.
> > >
> > > (Cc: Pavel)
> > >
> > > Bisection showed that the following commit introduced this issue:
> > >
> > >   commit f7f99100d8d95dbcf09e0216a143211e79418b9f
> > >   Author: Pavel Tatashin
> > >   Date:   Wed Nov 15 17:36:44 2017 -0800
> > >
> > >       mm: stop zeroing memory during allocation in vmemmap
> > >
> > > This patch postpones struct page zeroing to a later stage of memory initialization.
> > > My kernel config disabled CONFIG_DEFERRED_STRUCT_PAGE_INIT so two callsites of
> > > __init_single_page() were never reached. So in such a case, struct pages populated
> > > by vmemmap_pte_populate() could be left uninitialized?
> > > And I'm not sure yet how this issue becomes visible with the memmap= setting.
> >
> > I think that this becomes visible because memmap=x!y creates a persistent memory region:
> >
> > parse_memmap_one
> > {
> >         ...
> >         } else if (*p == '!') {
> >                 start_at = memparse(p+1, &p);
> >                 e820__range_add(start_at, mem_size, E820_TYPE_PRAM);
> >         ...
> > }
> >
> > and this region is added neither to memblock.memory nor to memblock.reserved.
> > Ranges in memblock.memory get zeroed in memmap_init_zone(), while memblock.reserved get zeroed
> > in free_low_memory_core_early():
> >
> > static unsigned long __init free_low_memory_core_early(void)
> > {
> >         ...
> >         for_each_reserved_mem_region(i, &start, &end)
> >                 reserve_bootmem_region(start, end);
> >         ...
> > }
> >
> >
> > Maybe I am mistaken, but I think that persistent memory regions should be marked as reserved.
> > A comment in do_mark_busy() suggests this:
> >
> > static bool __init do_mark_busy(enum e820_type type, struct resource *res)
> > {
> >
> >         ...
> >         /*
> >          * Treat persistent memory like device memory, i.e. reserve it
> >          * for exclusive use of a driver
> >          */
> >         ...
> > }
> >
> >
> > I wonder if something like this could work and if so, if it is right (I haven't tested it yet):
> >
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index 71c11ad5643e..3c9686ef74e5 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -1247,6 +1247,11 @@ void __init e820__memblock_setup(void)
> >                 if (end != (resource_size_t)end)
> >                         continue;
> >
> > +               if (entry->type == E820_TYPE_PRAM || entry->type == E820_TYPE_PMEM) {
> > +                       memblock_reserve(entry->addr, entry->size);
> > +                       continue;
> > +               }
> > +
> >                 if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> >                         continue;
>
> It does not seem to work, so the reasoning might be incorrect.

Thank you for the comment.

One note is that the memory region with "broken struct page" is a typical
reserved region, not a pmem region. Strangely, reading offset 0xbffd7 of
/proc/kpageflags is OK if no pmem region exists, but fails if one does.
Reading an offset like 0x100000 (in the pmem region) does not cause the crash,
so the pmem region seems to be properly set up.

[    0.000000] user-defined physical RAM map:
[    0.000000] user: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] user: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] user: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] user: [mem 0x0000000000100000-0x00000000bffd6fff] usable
[    0.000000] user: [mem 0x00000000bffd7000-0x00000000bfffffff] reserved   ===> "broken struct page" region
[    0.000000] user: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] user: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] user: [mem 0x0000000100000000-0x000000013fffffff] persistent (type 12) => pmem region
[    0.000000] user: [mem 0x0000000140000000-0x000000023fffffff] usable

Thanks,
Naoya Horiguchi
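
For reference, the address arithmetic discussed in this thread can be checked with a small userspace sketch. This is not kernel code: the sketch_* names are hypothetical helpers introduced only for illustration, 4 KiB pages are assumed, and the model of compound_head() covers only the "bit 0 set means head is (word - 1)" decoding described above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12), as on x86-64. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical userspace model of the decoding done by compound_head():
 * if bit 0 of the ->compound_head word is set, the page is treated as a
 * tail page and the head pointer is (word - 1); otherwise the page itself
 * is returned.
 */
static inline uint64_t sketch_compound_head(uint64_t page, uint64_t word)
{
    return (word & 1) ? word - 1 : page;
}

/* Physical address of the first byte of a page frame. */
static inline uint64_t sketch_pfn_to_phys(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}

/* Byte offset of a pfn's entry in /proc/kpageflags (one 64-bit word per pfn). */
static inline uint64_t sketch_kpageflags_offset(uint64_t pfn)
{
    return pfn * sizeof(uint64_t);
}

/* Checks the helpers against the values quoted in the thread. */
static inline void sketch_check_thread_values(void)
{
    /* An all-ones word has bit 0 set, so it decodes to 0x...fffe. */
    assert(sketch_compound_head(0x1000, UINT64_MAX) == 0xfffffffffffffffeULL);
    /* pfn range [0xbffd7, 0xc0000) covers 0xbffd7000-0xbfffffff. */
    assert(sketch_pfn_to_phys(0xbffd7) == 0xbffd7000ULL);
    assert(sketch_pfn_to_phys(0xc0000) - 1 == 0xbfffffffULL);
}
```

Calling sketch_check_thread_values() from a main() exercises the checks. It only confirms that an uninitialized, all-ones struct page word is consistent with the faulting address 0xfffffffffffffffe and that the pfn range maps onto the reserved e820 entry; it says nothing about why the struct pages were left uninitialized.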