From: Bhupesh Sharma
Date: Thu, 29 Aug 2019 01:24:05 +0530
Subject: Re: /proc/vmcore and wrong PAGE_OFFSET
To: Donald Buczek
Cc: iommu, linux-pci@vger.kernel.org, x86@kernel.org, kexec mailing list, Linux Kernel Mailing List, Paul Menzel, Simon Horman
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Donald,

On Wed, Aug 28, 2019 at 8:38 PM Donald Buczek wrote:
>
> On 8/20/19 11:21 PM, Donald Buczek wrote:
> > Dear Linux folks,
> >
> > I'm investigating a problem, that the crash utility fails to work with our crash dumps:
> >
> >     buczek@kreios:/mnt$ crash vmlinux crash.vmcore
> >     crash 7.2.6
> >     Copyright (C) 2002-2019 Red Hat, Inc.
> >     Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> >     Copyright (C) 1999-2006 Hewlett-Packard Co
> >     Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> >     Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> >     Copyright (C) 2005, 2011 NEC Corporation
> >     Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> >     Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> >     This program is free software, covered by the GNU General Public License,
> >     and you are welcome to change it and/or distribute copies of it under
> >     certain conditions. Enter "help copying" to see the conditions.
> >     This program has absolutely no warranty. Enter "help warranty" for details.
> >     GNU gdb (GDB) 7.6
> >     Copyright (C) 2013 Free Software Foundation, Inc.
> >     License GPLv3+: GNU GPL version 3 or later
> >     This is free software: you are free to change and redistribute it.
> >     There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >     and "show warranty" for details.
> >     This GDB was configured as "x86_64-unknown-linux-gnu"...
> >     crash: read error: kernel virtual address: ffff89807ff77000 type: "memory section root table"
> >
> > The crash file is a copy of /proc/vmcore taken by a crashkernel after a sysctl-forced panic.
> >
> > It looks to me, that 0xffff89807ff77000 is not readable, because the virtual addresses stored in the elf header of the dump file are off by 0x0000008000000000:
> >
> >     buczek@kreios:/mnt$ readelf -a crash.vmcore | grep LOAD | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
> >     LOAD 0x000000000000d000 0xffffffff81000000 0x000001007d000000 (fffffeff04000000)
> >     LOAD 0x0000000001c33000 0xffff880000001000 0x0000000000001000 (ffff880000000000)
> >     LOAD 0x0000000001cc1000 0xffff880000090000 0x0000000000090000 (ffff880000000000)
> >     LOAD 0x0000000001cd1000 0xffff880000100000 0x0000000000100000 (ffff880000000000)
> >     LOAD 0x0000000001cd2070 0xffff880000100070 0x0000000000100070 (ffff880000000000)
> >     LOAD 0x0000000019bd2000 0xffff880038000000 0x0000000038000000 (ffff880000000000)
> >     LOAD 0x000000004e6a1000 0xffff88006ffff000 0x000000006ffff000 (ffff880000000000)
> >     LOAD 0x000000004e6a2000 0xffff880100000000 0x0000000100000000 (ffff880000000000)
> >     LOAD 0x0000001fcda22000 0xffff882080000000 0x0000002080000000 (ffff880000000000)
> >     LOAD 0x0000003fcd9a2000 0xffff884080000000 0x0000004080000000 (ffff880000000000)
> >     LOAD 0x0000005fcd922000 0xffff886080000000 0x0000006080000000 (ffff880000000000)
> >     LOAD 0x0000007fcd8a2000 0xffff888080000000 0x0000008080000000 (ffff880000000000)
> >     LOAD 0x0000009fcd822000 0xffff88a080000000 0x000000a080000000 (ffff880000000000)
> >     LOAD 0x000000bfcd7a2000 0xffff88c080000000 0x000000c080000000 (ffff880000000000)
> >     LOAD 0x000000dfcd722000 0xffff88e080000000 0x000000e080000000 (ffff880000000000)
> >     LOAD 0x000000fc4d722000 0xffff88fe00000000 0x000000fe00000000 (ffff880000000000)
> >
> > (Columns are File offset, Virtual Address, Physical Address and computed offset).
> >
> > I would expect the offset between the virtual and the physical address to be PAGE_OFFSET, which is 0xffff888000000000 on x86_64, not 0xffff880000000000. Unlike /proc/vmcore, /proc/kcore shows the same physical memory (of the last memory section above) with a correct offset:
> >
> >     buczek@kreios:/mnt$ sudo readelf -a /proc/kcore | grep 0x000000fe00000000 | perl -lane 'printf "%s (%016x)\n",$_,hex($F[2])-hex($F[3])'
> >     LOAD 0x0000097e00004000 0xffff897e00000000 0x000000fe00000000 (ffff888000000000)
> >
> > The failing address 0xffff89807ff77000 happens to be at the end of the last memory section. It is the mem_section array, which crash wants to load and which is visible in the running system:
> >
> >     buczek@kreios:/mnt$ sudo gdb vmlinux /proc/kcore
> >     [...]
> >     (gdb) print mem_section
> >     $1 = (struct mem_section **) 0xffff89807ff77000
> >     (gdb) print *mem_section
> >     $2 = (struct mem_section *) 0xffff88a07f37b000
> >     (gdb) print **mem_section
> >     $3 = {section_mem_map = 18446719884453740551, pageblock_flags = 0xffff88a07f36f040}
> >
> > I can read the same information from the crash dump, if I account for the 0x0000008000000000 error:
> >
> >     buczek@kreios:/mnt$ gdb vmlinux crash.vmcore
> >     [...]
> >     (gdb) print mem_section
> >     $1 = (struct mem_section **) 0xffff89807ff77000
> >     (gdb) print *mem_section
> >     Cannot access memory at address 0xffff89807ff77000
> >     (gdb) set $t=(struct mem_section **) ((char *)mem_section - 0x0000008000000000)
> >     (gdb) print *$t
> >     $2 = (struct mem_section *) 0xffff88a07f37b000
> >     (gdb) set $s=(struct mem_section *)((char *)*$t - 0x0000008000000000 )
> >     (gdb) print *$s
> >     $3 = {section_mem_map = 18446719884453740551, pageblock_flags = 0xffff88a07f36f040}
> >
> > In the above example, the running kernel, the crashed kernel and the crashkernel are all the same 4.19.57 compilation. But I've tried with several other versions (crashkernel 4.4, running kernel from 4.0 to linux master) with the same result.
> >
> > The machine in the above example has several numa nodes (this is why there are so many LOAD headers). But I've tried this with a small kvm virtual machine and got the same result.
> >
> >     buczek@kreios:/mnt/linux-4.19.57-286.x86_64/build$ grep RANDOMIZE_BASE .config
> >     # CONFIG_RANDOMIZE_BASE is not set
> >     buczek@kreios:/mnt/linux-4.19.57-286.x86_64/build$ grep SPARSEMEM .config
> >     CONFIG_ARCH_SPARSEMEM_ENABLE=y
> >     CONFIG_ARCH_SPARSEMEM_DEFAULT=y
> >     CONFIG_SPARSEMEM_MANUAL=y
> >     CONFIG_SPARSEMEM=y
> >     CONFIG_SPARSEMEM_EXTREME=y
> >     CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> >     CONFIG_SPARSEMEM_VMEMMAP=y
> >     buczek@kreios:/mnt/linux-4.19.57-286.x86_64/build$ grep PAGE_TABLE_ISOLATION .config
> >     CONFIG_PAGE_TABLE_ISOLATION=y
> >
> > Any ideas?
> >
> > Donald
>
> To answer my own question for the records:

Thanks for the update. I think Paul (maybe from your organization?) posted a similar issue, and I had asked him for a few environment details to help debug it (see ). But he seems to be OOO.
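
For the record, the address arithmetic in the quoted report can be double-checked with a few lines of Python. This is only a sketch built from the values quoted above; the variable names are illustrative and do not come from crash, gdb or kexec-tools.

```python
# Values copied from the quoted report; all names here are illustrative.
KCORE_OFFSET = 0xffff897e00000000 - 0x000000fe00000000   # vaddr - paddr in /proc/kcore
VMCORE_OFFSET = 0xffff88fe00000000 - 0x000000fe00000000  # vaddr - paddr in the vmcore

# Gap between the direct-map base of the running kernel and the one
# recorded in the dump's ELF headers.
delta = KCORE_OFFSET - VMCORE_OFFSET

# mem_section as the kernel sees it, and the address under which the
# same memory is reachable in the dump.
mem_section = 0xffff89807ff77000
mem_section_in_dump = mem_section - delta

# Both views must resolve to the same physical address.
phys_from_kernel = mem_section - KCORE_OFFSET
phys_from_dump = mem_section_in_dump - VMCORE_OFFSET

print(f"delta               : {delta:#018x}")
print(f"mem_section in dump : {mem_section_in_dump:#018x}")
print(f"physical address    : {phys_from_kernel:#018x}")
```

The delta is the same constant that is subtracted by hand in the gdb session on the vmcore.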
> Our kexec command line is
>
>     /usr/sbin/kexec -p /boot/bzImage.crash --initrd=/boot/grub/initramfs.igz --command-line="root=LABEL=root ro console=ttyS1,115200n8 console=tty0 irqpoll nr_cpus=1 reset_devices panic=5 CRASH"
>
> So we neither gave -s (--kexec-file-syscall) nor -a ( --kexec-syscall-auto ). For this reason, kexec used the kexec_load() syscall instead of the newer kexec_file_load syscall.

The '-p' flag indicates a kdump operation (i.e. you want to load a crash kernel and execute it if the primary kernel crashes), which is different from a kexec load (the '-l' or '-s' operation, where you want to load and execute another kernel).

> With kexec_load(), the elf headers for the crash, which include program header for the old system ram, are not computed by the kernel, but by the userspace program from kexec-tools.

See above: kdump and kexec load are completely different operations, and I am not sure how using kdump options would help your case if kexec_load() / kexec_file_load() don't work. However, looking at your and Paul's original emails, I can tell that you are able to generate a vmcore (although an incomplete one), so I am pretty sure you are using the kexec -p (i.e. kdump) feature rather than kexec'ing to another kernel :)

> Linux kernel commit d52888aa ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") changed the base of the direct mapping from 0xffff880000000000 to 0xffff888000000000. This was merged into v4.20-rc2.
>
> kexec-tools, however, still has the old address hard coded:
>     buczek@avaritia:/scratch/cluster/buczek/kexec-tools (master)$ git grep X86_64_PAGE_OFFSET
>     kexec/arch/i386/crashdump-x86.c: elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_2_6_27;
>     kexec/arch/i386/crashdump-x86.c: elf_info->page_offset = X86_64_PAGE_OFFSET;
>     kexec/arch/i386/crashdump-x86.h:#define X86_64_PAGE_OFFSET_PRE_2_6_27 0xffff810000000000ULL
>     kexec/arch/i386/crashdump-x86.h:#define X86_64_PAGE_OFFSET 0xffff880000000000ULL

Good catch.
I see. While other user-space tools (e.g. makedumpfile) have migrated to using the available PT_LOADs, for example in the '/proc/kcore' file (see [0] for reference), to determine the correct PAGE_OFFSET value, it seems kexec-tools still uses hard-coded macro values for this, which are hard to maintain and have to be updated whenever the kernel changes.

I will try to reproduce this at my end (I think it should be easy to do so on Qemu) and send a kexec-tools fix shortly. I will Cc you on the fix patch. Please feel free to test it and let me know if you face any further issues.

Thanks,
Bhupesh
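
As a rough illustration of the PT_LOAD-based approach mentioned above (this is neither makedumpfile's actual code nor the eventual kexec-tools patch), PAGE_OFFSET can be derived from the program headers instead of a hard-coded macro. The sketch assumes the (virtual, physical) address pairs have already been extracted, for example from readelf output. `derive_page_offset` is a hypothetical helper. `START_KERNEL_MAP` is the base of the x86_64 kernel text mapping, used here to skip the LOAD entry that describes the kernel image rather than the direct mapping.

```python
from typing import Iterable, Optional, Tuple

# Base of the x86_64 kernel text mapping (__START_KERNEL_map in the kernel).
START_KERNEL_MAP = 0xffffffff80000000


def derive_page_offset(loads: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return vaddr - paddr of the first direct-map PT_LOAD, or None.

    `loads` holds (virtual address, physical address) pairs, one per
    PT_LOAD program header. Entries at or above START_KERNEL_MAP belong
    to the kernel image mapping and are skipped.
    """
    for vaddr, paddr in loads:
        if vaddr < START_KERNEL_MAP:
            return vaddr - paddr
    return None


# A subset of the PT_LOAD entries quoted in the report.
kcore_loads = [(0xffff897e00000000, 0x000000fe00000000)]
vmcore_loads = [
    (0xffffffff81000000, 0x000001007d000000),  # kernel text mapping, skipped
    (0xffff880000001000, 0x0000000000001000),
    (0xffff88fe00000000, 0x000000fe00000000),
]

print(hex(derive_page_offset(kcore_loads)))   # base seen by the running kernel
print(hex(derive_page_offset(vmcore_loads)))  # base written by kexec-tools
```

With the values from this thread, the two results differ by exactly the 0x0000008000000000 discrepancy reported above, which a macro-free lookup against /proc/kcore would have avoided.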