Date: Wed, 6 Jan 2021 10:05:53 +0200
From: Mike Rapoport
To: Qian Cai
Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
    Mel Gorman, Michal Hocko, Mike Rapoport, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org,
    Stephen Rothwell, Linux Next Mailing List
Subject: Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout
Message-ID: <20210106080553.GB1106298@kernel.org>
References: <20201209214304.6812-1-rppt@kernel.org>
    <20201209214304.6812-3-rppt@kernel.org>
    <768cb57d6ef0989293b3f9fbe0af8e8851723ea1.camel@redhat.com>
    <20210105082403.GA1106298@kernel.org>
    <67ef893f27551f80ecf49ef78c0ebc05d3e41b46.camel@redhat.com>
In-Reply-To: <67ef893f27551f80ecf49ef78c0ebc05d3e41b46.camel@redhat.com>

On Tue, Jan 05, 2021 at 01:45:37PM -0500, Qian Cai wrote:
> On Tue, 2021-01-05 at 10:24 +0200, Mike Rapoport wrote:
> > Hi,
> >
> > On Mon, Jan 04, 2021 at 02:03:00PM -0500, Qian Cai wrote:
> > > On Wed, 2020-12-09 at 23:43 +0200, Mike Rapoport wrote:
> > > > From: Mike Rapoport
> > > >
> > > > Interleave initialization of pages that correspond to holes with the
> > > > initialization of memory map, so that zone and node information will be
> > > > properly set on such pages.
> > > >
> > > > Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
> > > > Reported-by: Andrea Arcangeli
> > > > Signed-off-by: Mike Rapoport
> > >
> > > Reverting this commit on the top of today's linux-next fixed a crash while
> > > reading /proc/kpagecount on a NUMA server.
> >
> > Can you please post the entire dmesg?
>
> http://people.redhat.com/qcai/dmesg.txt
>
> > Is it possible to get the pfn that triggered the crash?
>
> Do you have any idea how to convert that fffffffffffffffe to pfn as it is always
> that address? I don't understand what that address is though. I tried to catch
> it from struct page pointer and page_address() without luck.

I think we trigger PF_POISONED_CHECK() in PageSlab(), then fffffffffffffffe
is "accessed" from VM_BUG_ON_PAGE().

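One plausible way to see where fffffffffffffffe comes from, as a user-space
sketch rather than kernel code: assume the uninitialized struct page is still
filled with the 0xff poison pattern, and assume compound_head() treats bit 0
of the compound_head word as a "tail page" marker and returns that word minus
one. The names fake_page, fake_compound_head() and poisoned_head_address()
below are hypothetical stand-ins, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct page:
 * only the two words that matter for this illustration. */
struct fake_page {
	uint64_t flags;
	uint64_t compound_head;	/* bit 0 set => tail page, rest => head pointer */
};

/* Assumed to mirror the idea behind compound_head(): a tail page
 * stores the address of its head page with bit 0 set. */
static uint64_t fake_compound_head(const struct fake_page *page)
{
	uint64_t head = page->compound_head;

	if (head & 1)
		return head - 1;		/* strip the tail marker */
	return (uint64_t)(uintptr_t)page;	/* not a tail: page is its own head */
}

/* Build a "poisoned" page (every byte 0xff) and compute the address
 * that would be treated as its head page. */
static uint64_t poisoned_head_address(void)
{
	struct fake_page page;

	memset(&page, 0xff, sizeof(page));	/* assumed poison pattern */
	assert(page.compound_head & 1);		/* poison looks like a tail page */
	return fake_compound_head(&page);
}
```

Under these assumptions poisoned_head_address() yields 0xfffffffffffffffe, so
the faulting address is derived from the poison pattern and not from any real
pfn, which would explain why it is always the same.
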
It seems to me that we are not initializing struct pages for holes at the node
boundaries because zones are already clamped to exclude those holes.

Can you please try to see if the patch below will produce any useful info:

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 4dcbcd506cb6..708f8211dcc0 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -66,10 +66,14 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
 		 */
 		ppage = pfn_to_online_page(pfn);
 
-		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
+		if (ppage && PagePoisoned(ppage)) {
+			pr_info("%s: pfn %lx is poisoned\n", __func__, pfn);
 			pcount = 0;
-		else
+		} else if (!ppage || PageSlab(ppage) || page_has_type(ppage)) {
+			pcount = 0;
+		} else {
 			pcount = page_mapcount(ppage);
+		}
 
 		if (put_user(pcount, out)) {
 			ret = -EFAULT;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 124b8c654ec6..1b3a37ace1b1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6271,6 +6271,8 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
 	unsigned long pfn;
 	u64 pgcnt = 0;
 
+	pr_info("%s: spfn: %lx, epfn: %lx, zone: %s, node: %d\n", __func__, spfn, epfn, zone_names[zone], node);
+
 	for (pfn = spfn; pfn < epfn; pfn++) {
 		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
 			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)

> >
> > > [ 8858.006726][T99897] BUG: unable to handle page fault for address: fffffffffffffffe
> > > [ 8858.014814][T99897] #PF: supervisor read access in kernel mode
> > > [ 8858.020686][T99897] #PF: error_code(0x0000) - not-present page
> > > [ 8858.026557][T99897] PGD 1371417067 P4D 1371417067 PUD 1371419067 PMD 0
> > > [ 8858.033224][T99897] Oops: 0000 [#1] SMP KASAN NOPTI
> > > [ 8858.038710][T99897] CPU: 28 PID: 99897 Comm: proc01 Tainted: G O 5.11.0-rc1-next-20210104 #1
> > > [ 8858.048515][T99897] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
> > > [ 8858.057794][T99897] RIP: 0010:kpagecount_read+0x1be/0x5e0
> > > PageSlab at include/linux/page-flags.h:342
> > > (inlined by) kpagecount_read at fs/proc/page.c:69
>

-- 
Sincerely yours,
Mike.