Date: Fri, 25 Jan 2019 19:15:49 +0100
From: Michal Hocko
To: robert shteynfeld
Cc: Linus Torvalds, Mikhail Zaslonko, Linux List Kernel Mailing,
    Gerald Schaefer, Mikhail Gavrilov, Dave Hansen, Alexander Duyck,
    Andrew Morton, Pavel Tatashin, Steven Sistare, Daniel Jordan,
    Bob Picco
Subject: Re: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684
Message-ID: <20190125181549.GE20411@dhcp22.suse.cz>
In-Reply-To: <20190125173315.GC20411@dhcp22.suse.cz>
References: <20190125073704.GC3560@dhcp22.suse.cz>
 <20190125081924.GF3560@dhcp22.suse.cz>
 <20190125082952.GG3560@dhcp22.suse.cz>
 <20190125155810.GQ3560@dhcp22.suse.cz>
 <20190125163938.GA20411@dhcp22.suse.cz>
 <20190125173315.GC20411@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 25-01-19 18:33:15, Michal Hocko wrote:
> On Fri 25-01-19 17:39:38, Michal Hocko wrote:
> > On Fri 25-01-19 11:16:30, robert shteynfeld wrote:
> > > Attached is the dmesg from patched kernel.
> >
> > Your Node1 physical memory range precedes Node0, which is quite
> > unusual but shouldn't be a huge problem on its own. The memory
> > ranges, however, are not aligned to the memory section:
> >
> > [    0.286954] Early memory node ranges
> > [    0.286955]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
> > [    0.286955]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
> > [    0.286956]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
> > [    0.286956]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]
> >
> > As you can see, the last pfn of node1 is inside a section and node0
> > starts right after it. This is quite unusual as well. If for no other
> > reason, the memmap of those struct pages will be remote for one node
> > or the other. Actually I am not even sure we can handle that properly,
> > because we do expect a 1:1 mapping between sections and nodes.
> >
> > Now it also makes some sense why 2830bf6f05fb ("mm, memory_hotplug:
> > initialize struct pages for the full memory section") made any
> > difference. We simply write over a potentially initialized struct
> > page and blow up on that. I strongly suspect that the commit just
> > uncovered a pre-existing problem. Let me think about what we can do
> > about that.
>
> Apart from force-aligning the node's start, the only other option is
> to revert 2830bf6f05fb and handle the underlying issue in the hotplug
> code.

We cannot really align because we have things like ZONE_DMA starting at
0x1000, and who knows what else. So let's go with the revert.
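As a quick cross-check, assuming the usual x86_64 SPARSEMEM geometry (4 KiB pages and 128 MiB sections, so PAGES_PER_SECTION == 32768; not stated in the mail itself), the reported node boundaries can be mapped to section numbers:

```python
# Map the boundary pfns from the dmesg above to sparse section numbers.
# Assumed geometry: 4 KiB pages, 128 MiB sections (x86_64 defaults).
PAGE_SHIFT = 12
PAGES_PER_SECTION = 1 << 15

node1_last_pfn = 0x1423ffffff >> PAGE_SHIFT   # end of node 1's last range
node0_first_pfn = 0x1424000000 >> PAGE_SHIFT  # start of node 0's range

for name, pfn in [("node1 last", node1_last_pfn),
                  ("node0 first", node0_first_pfn)]:
    print(f"{name} pfn {pfn:#x}: section {pfn // PAGES_PER_SECTION}, "
          f"offset {pfn % PAGES_PER_SECTION}")
```

Both boundary pfns land in section 644 (offsets 16383 and 16384): node 0 begins halfway through a section whose first half belongs to node 1, which is exactly the layout the analysis above calls out.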
Hotplug simply needs larger surgery to get rid of its inherent
PAGES_PER_SECTION assumptions.

Linus, could you take the revert please?

From 817b18d3db36a6900ca9043af8c1416c56358be3 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Fri, 25 Jan 2019 19:08:58 +0100
Subject: [PATCH] Revert "mm, memory_hotplug: initialize struct pages for the
 full memory section"

This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684.

The underlying assumption that one sparse section belongs to a single
NUMA node does not really hold. Robert Shteynfeld has reported a boot
failure. The boot log was not captured, but his memory layout is as
follows:

[    0.286954] Early memory node ranges
[    0.286955]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
[    0.286955]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
[    0.286956]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
[    0.286956]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]

This means that node0 starts in the middle of a memory section which is
also in node1. memmap_init_zone tries to initialize the padding of a
section even when it is outside of the given pfn range, because there
are code paths (e.g. memory hotplug) which assume that a full memory
section is always initialized. In this particular case, though, such a
range is already initialized and most likely already managed by the
page allocator. Scribbling over those pages corrupts the internal state
and likely blows up when any of those pages gets used.
Reported-by: Robert Shteynfeld
Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
Cc: stable
Signed-off-by: Michal Hocko
---
 mm/page_alloc.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d295c9bc01a8..35fdde041f5c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5701,18 +5701,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			cond_resched();
 		}
 	}
-#ifdef CONFIG_SPARSEMEM
-	/*
-	 * If the zone does not span the rest of the section then
-	 * we should at least initialize those pages. Otherwise we
-	 * could blow up on a poisoned page in some paths which depend
-	 * on full sections being initialized (e.g. memory hotplug).
-	 */
-	while (end_pfn % PAGES_PER_SECTION) {
-		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
-		end_pfn++;
-	}
-#endif
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-- 
2.20.1

-- 
Michal Hocko
SUSE Labs
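To make the failure mode concrete, the effect of the reverted padding loop on the layout reported in this thread can be sketched outside the kernel (assuming x86_64 geometry, 4 KiB pages and PAGES_PER_SECTION == 32768; this only counts which pfns the loop would have touched, it is not kernel code):

```python
# Count the pfns the removed padding loop would re-initialize, given
# that node 1 ends mid-section and node 0 starts at the very next pfn.
# Assumed geometry: 4 KiB pages, PAGES_PER_SECTION == 1 << 15 (x86_64).
PAGES_PER_SECTION = 1 << 15

node0_first_pfn = 0x1424000000 >> 12  # node 0's first pfn...
end_pfn = node0_first_pfn             # ...is also one past node 1's last pfn

padded = []
while end_pfn % PAGES_PER_SECTION:    # the loop removed by this revert
    padded.append(end_pfn)
    end_pfn += 1

print(f"{len(padded)} pfns padded, all in node 0: "
      f"{min(padded) >= node0_first_pfn}")
```

All 16384 padded struct pages already belong to node 0 and may already be live in the page allocator, which is why writing over them corrupts state and panics later.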