Date: Fri, 25 Jan 2019 17:39:38 +0100
From: Michal Hocko
To: robert shteynfeld
Cc: Linus Torvalds, Mikhail Zaslonko, Linux List Kernel Mailing, Gerald Schaefer, Mikhail Gavrilov, Dave Hansen, Alexander Duyck, Andrew Morton, Pavel Tatashin, Steven Sistare, Daniel Jordan, Bob Picco
Subject: Re: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684
Message-ID: <20190125163938.GA20411@dhcp22.suse.cz>
References: <20190125073704.GC3560@dhcp22.suse.cz> <20190125081924.GF3560@dhcp22.suse.cz> <20190125082952.GG3560@dhcp22.suse.cz> <20190125155810.GQ3560@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 25-01-19 11:16:30, robert shteynfeld wrote:
> Attached is the dmesg from patched kernel.

Your Node1 physical memory range precedes Node0, which is quite unusual
but shouldn't be a huge problem on its own. But the memory ranges are
not aligned to the memory section:

[    0.286954] Early memory node ranges
[    0.286955]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
[    0.286955]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
[    0.286956]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
[    0.286956]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]

As you can see, the last pfn of Node1 is inside a section and Node0
starts right after it. This is quite unusual as well. If for no other
reason, the memmap of those struct pages will be remote for one node or
the other. Actually, I am not even sure we can handle that properly
because we do expect a 1:1 mapping between sections and nodes.

Now it also makes some sense why 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") made a
difference. We simply write over a potentially initialized struct page
and blow up on that. I strongly suspect that the commit just uncovered a
pre-existing problem. Let me think about what we can do about that.

> I'm not an expert at debugging the kernel, obviously. I tried setting
> up a serial console before without much luck as part of this debugging
> session.

Ubuntu has a nice howto for netconsole configuration:
https://wiki.ubuntu.com/Kernel/Netconsole. It is quite important to get
the actual failure.
-- 
Michal Hocko
SUSE Labs
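
For readers who want to check the alignment claim against the quoted "Early memory node ranges", here is a rough sketch of the arithmetic. It assumes the x86_64 sparsemem defaults (4 KiB pages, 128 MiB sections), which the thread does not state, and the helper names (`pfn_to_section_nr`, `offset_in_section`, `shared_sections`) are made up for illustration; this is not kernel code.

```python
# Assumptions (not stated in the thread): x86_64 sparsemem defaults.
PAGE_SHIFT = 12          # 4 KiB pages
SECTION_SIZE_BITS = 27   # 128 MiB memory sections
PFN_SECTION_SHIFT = SECTION_SIZE_BITS - PAGE_SHIFT

# (node, start, end inclusive), copied from the dmesg excerpt.
RANGES = [
    (1, 0x0000000000001000, 0x0000000000090FFF),
    (1, 0x0000000000100000, 0x00000000DBDF8FFF),
    (1, 0x0000000100000000, 0x0000001423FFFFFF),
    (0, 0x0000001424000000, 0x0000002023FFFFFF),
]


def pfn_to_section_nr(pfn):
    """Section number that contains the given page frame number."""
    return pfn >> PFN_SECTION_SHIFT


def offset_in_section(addr):
    """Byte offset of a physical address within its memory section."""
    return addr & ((1 << SECTION_SIZE_BITS) - 1)


def shared_sections(ranges):
    """Return {section_nr: nodes} for sections holding pages of more than one node."""
    owners = {}
    for node, start, end in ranges:
        first = pfn_to_section_nr(start >> PAGE_SHIFT)
        last = pfn_to_section_nr(end >> PAGE_SHIFT)
        for sec in range(first, last + 1):
            owners.setdefault(sec, set()).add(node)
    return {sec: nodes for sec, nodes in owners.items() if len(nodes) > 1}


if __name__ == "__main__":
    for sec, nodes in sorted(shared_sections(RANGES).items()):
        base = sec << SECTION_SIZE_BITS
        print(f"section {sec} at {base:#x} is shared by nodes {sorted(nodes)}")
    boundary = 0x1424000000
    print(f"node 0 starts {offset_in_section(boundary) >> 20} MiB into its section")
```

Under those assumptions the Node1/Node0 boundary at 0x1424000000 falls in the middle of a section rather than on a section boundary, so that one section contains struct pages for both nodes, which is the situation described above.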