Date: Mon, 21 May 2018 18:46:59 +0900
From: AKASHI Takahiro
To: James Morse
Cc: catalin.marinas@arm.com, will.deacon@arm.com, dhowells@redhat.com,
    vgoyal@redhat.com, herbert@gondor.apana.org.au, davem@davemloft.net,
    dyoung@redhat.com, bhe@redhat.com, arnd@arndb.de,
    ard.biesheuvel@linaro.org, bhsharma@redhat.com,
    kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
Message-ID: <20180521094658.GB9887@linaro.org>
References: <20180425062629.29404-1-takahiro.akashi@linaro.org>
    <20180425062629.29404-8-takahiro.akashi@linaro.org>
    <20180518103925.GP2737@linaro.org>

James,

On Fri, May
18, 2018 at 05:00:55PM +0100, James Morse wrote:
> Hi Akashi,
>
> On 18/05/18 11:39, AKASHI Takahiro wrote:
> > On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
> >> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>> Enabling crash dump (kdump) includes
> >>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >>>   using crash_prepare_elf64_headers(), and
> >>> * add two device tree properties, "linux,usable-memory-range" and
> >>>   "linux,elfcorehdr", which represent repsectively a memory range
>
> >>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> >>> index 37c0a9dc2e47..ec674f4d267c 100644
> >>> --- a/arch/arm64/kernel/machine_kexec_file.c
> >>> +++ b/arch/arm64/kernel/machine_kexec_file.c
>
> >>> +static void fill_property(void *buf, u64 val64, int cells)
> >>> +{
> >>> +	u32 val32;
> >>> +
> >>> +	if (cells == 1) {
> >>> +		val32 = cpu_to_fdt32((u32)val64);
> >>> +		memcpy(buf, &val32, sizeof(val32));
> >>> +	} else {
> >>
> >>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> >>> +		buf += cells * sizeof(u32) - sizeof(u64);
> >>
> >> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> >> 'bottom' 2? I'm pretty sure this isn't endian safe.
> >>
> >> Do we really expect a system to have #address-cells > 2?
> >
> > I don't know, but just for safety.
>
> Okay, so this is aiming to be a cover-all-cases library function.
>
>
> >>> +		val64 = cpu_to_fdt64(val64);
> >>> +		memcpy(buf, &val64, sizeof(val64));
> >>> +	}
> >>> +}
> >>> +
> >>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> >>> +				unsigned long addr, unsigned long size)
> >>
> >> (the device-tree spec describes a 'ranges' property, which had me confused. This
> >> is encoding a prop-encoded-array)
> >
> > Should we rename it to, say, fdt_setprop_reg()?
>
> Sure, but I'd really like this code to come from libfdt.
> I'm hoping for some temporary workaround, lets see what the DT folk say.

OK, I will follow Rob's suggestion.

> >>> +	if (!buf)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	fill_property(prop, addr, __dt_root_addr_cells);
> >>> +	prop += __dt_root_addr_cells * sizeof(u32);
> >>> +
> >>> +	fill_property(prop, size, __dt_root_size_cells);
> >>> +
> >>> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> >>> +
> >>> +	vfree(buf);
> >>> +
> >>> +	return result;
> >>> +}
> >>
> >> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
> >> because this the first time we've wanted to create a node with more than
> >> key=fixed-size-value.
> >>
> >> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
> >> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
> >> this will find it, until we can (re)move it?
> >
> > I will temporarily move all fdt-related stuff to a separate file, but
> >
> >> I have no idea how that happens... it looks like the devicetree list is the
> >> place to ask.
> >
> > should we always sync with the original dtc/libfdt repository?
>
> I thought so, libfdt is one of those external libraries that the kernel
> consumes, like acpica. For acpica at least the rule is changes go upstream, then
> get sync'd back.

Same as above.
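
For readers following the fill_property() / fdt_setprop_range() exchange above, here is a minimal userspace sketch of the encoding being discussed: an (address, size) pair laid out as big-endian 32-bit cells. It is only an illustration under stated assumptions. put_cells() and encode_reg() are hypothetical names, not libfdt or kernel APIs, and the byte-wise writes are a deliberate substitute for cpu_to_fdt32()/cpu_to_fdt64(), so that the result does not depend on host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a libfdt API): store val as `cells` big-endian
 * 32-bit cells at buf and return the number of bytes written. With more
 * than two cells the leading cells are zero; with one cell only the low
 * 32 bits are stored, so the caller must check the value fits beforehand.
 */
static size_t put_cells(uint8_t *buf, uint64_t val, int cells)
{
	size_t len = (size_t)cells * 4;
	size_t i;

	assert(cells >= 1);
	memset(buf, 0, len);
	/* Walk backwards from the last byte, least significant byte first. */
	for (i = 0; i < len && i < 8; i++)
		buf[len - 1 - i] = (uint8_t)(val >> (8 * i));
	return len;
}

/* Hypothetical helper: encode one (address, size) entry back to back. */
static size_t encode_reg(uint8_t *buf, uint64_t addr, int addr_cells,
			 uint64_t size, int size_cells)
{
	size_t off = put_cells(buf, addr, addr_cells);

	off += put_cells(buf + off, size, size_cells);
	return off;
}
```

With two address cells and one size cell, encode_reg(buf, 0x123456789a, 2, 0x1000, 1) fills 12 bytes: the address occupies the first eight, the size the last four.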
> >>>  static int setup_dtb(struct kimage *image,
> >>>  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >>>  		char *cmdline, unsigned long cmdline_len,
> >>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >>>  	int range_len;
> >>>  	int ret;
> >>>
> >>> +	/* check ranges against root's #address-cells and #size-cells */
> >>> +	if (image->type == KEXEC_TYPE_CRASH &&
> >>> +		(!cells_size_fitted(image->arch.elf_load_addr,
> >>> +				image->arch.elf_headers_sz) ||
> >>> +		 !cells_size_fitted(crashk_res.start,
> >>> +				crashk_res.end - crashk_res.start + 1))) {
> >>> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> >>> +		ret = -EINVAL;
> >>> +		goto out_err;
> >>> +	}
> >>
> >> To check I've understood this properly: This can happen if the firmware provided
> >> a DTB with 32bit address/size cells, but at least some of the memory requires 64
> >> bit address/size cells. This could only happen on a UEFI system where the
> >> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.
> >
> > Probably, yes. I assumed the case where #address-cells and #size-cells
> > were just missing in fdt.
>
> Ah, that's another one. I just wanted to check we could boot on a system where
> this can happen.
>
>
> >>>  	/* duplicate dt blob */
> >>>  	buf_size = fdt_totalsize(initial_boot_params);
> >>>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >>>
> >>> +	if (image->type == KEXEC_TYPE_CRASH)
> >>> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> >>> +				+ fdt_prop_len("linux,usable-memory-range",
> >>> +						range_len);
>
> >                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [...]
>
> >> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?
> >
> > I think the code exists. See above.
>
> Sorry, turns out I can't read!
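
As an aside on the cells check quoted above: the body of cells_size_fitted() is not quoted in this mail, so the following is only a guess at the kind of test it performs, written as standalone C. fits_in_cells() and range_fits() are hypothetical names, and the sketch assumes a value always fits once two or more 32-bit cells are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: can val be represented in `cells` 32-bit cells?
 * One cell holds at most 32 bits; two or more cells hold any u64.
 */
static bool fits_in_cells(uint64_t val, int cells)
{
	assert(cells >= 1);
	return cells >= 2 || val <= UINT32_MAX;
}

/* Hypothetical check for a whole (address, size) pair. */
static bool range_fits(uint64_t addr, uint64_t size,
		       int addr_cells, int size_cells)
{
	return fits_in_cells(addr, addr_cells) &&
	       fits_in_cells(size, size_cells);
}
```

This corresponds to the failure case James describes: a DTB with one-cell root #address-cells/#size-cells cannot describe a crash region placed above 4GB, so range_fits(0x880000000, 0x10000, 1, 1) is false.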
>
>
> >>> +		if (ret)
> >>> +			goto out_err;
> >>> +	}
> >>
> >>> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
> >>
> >>> +static struct crash_mem *get_crash_memory_ranges(void)
> >>> +{
> >>> +	unsigned int nr_ranges;
> >>> +	struct crash_mem *cmem;
> >>> +
> >>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
> >>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> >>> +
> >>> +	cmem = vmalloc(sizeof(struct crash_mem) +
> >>> +			sizeof(struct crash_mem_range) * nr_ranges);
> >>> +	if (!cmem)
> >>> +		return NULL;
> >>> +
> >>> +	cmem->max_nr_ranges = nr_ranges;
> >>> +	cmem->nr_ranges = 0;
> >>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> >>> +
> >>> +	/* Exclude crashkernel region */
> >>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> >>> +		vfree(cmem);
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	return cmem;
> >>> +}
> >>
> >> Could this function be included in prepare_elf_headers() so that the alloc() and
> >> free() occur together.
> >
> > Or aiming that arm64 and x86 have similar-look code?
>
> What's the advantage in things looking the same? If they are the same, it
> probably shouldn't be in per-arch code. Otherwise it should be as simple as
> possible, otherwise we can't spot the bugs/leaks.
>
> But I think walking memblock here will remove all 'looks the same' properties here.

OK, I will fold the function into prepare_elf_headers().
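
To illustrate why the quoted code starts with nr_ranges = 1 "for exclusion of crashkernel region": carving a hole out of the middle of one RAM range turns it into two, so the table needs one spare slot. The sketch below is a simplified userspace model of that idea, not the kernel's crash_exclude_mem_range(); struct range_list, MAX_RANGES and exclude_range() are hypothetical names, and ranges are inclusive on both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 8

struct range { uint64_t start, end; };	/* inclusive bounds */

struct range_list {
	size_t max;			/* usable slots, <= MAX_RANGES */
	size_t nr;			/* slots currently in use */
	struct range r[MAX_RANGES];
};

/*
 * Hypothetical model: remove [start, end] from every range in the list.
 * Returns 0 on success, -1 if a split is needed but no slot is free.
 */
static int exclude_range(struct range_list *l, uint64_t start, uint64_t end)
{
	size_t i = 0;

	assert(l->max <= MAX_RANGES && l->nr <= l->max);
	while (i < l->nr) {
		struct range *r = &l->r[i];

		if (end < r->start || start > r->end) {
			i++;			/* no overlap */
		} else if (start <= r->start && end >= r->end) {
			/* fully covered: drop this entry */
			memmove(r, r + 1, (l->nr - i - 1) * sizeof(*r));
			l->nr--;
		} else if (start > r->start && end < r->end) {
			/* hole in the middle: split, needs one more slot */
			if (l->nr == l->max)
				return -1;
			memmove(r + 2, r + 1, (l->nr - i - 1) * sizeof(*r));
			r[1].start = end + 1;
			r[1].end = r->end;
			r->end = start - 1;
			l->nr++;
			i += 2;
		} else if (start <= r->start) {
			r->start = end + 1;	/* trim the head */
			i++;
		} else {
			r->end = start - 1;	/* trim the tail */
			i++;
		}
	}
	return 0;
}
```

With a single range 0x80000000-0xffffffff and max = 2, excluding 0xa0000000-0xafffffff leaves two ranges; with max = 1 the same call fails, which is the situation the extra slot is meant to avoid.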
>
> >>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
> >>> +{
> >>> +	struct crash_mem *cmem;
> >>> +	int ret = 0;
> >>> +
> >>> +	cmem = get_crash_memory_ranges();
> >>> +	if (!cmem)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
> >>> +
> >>> +	vfree(cmem);
> >>
> >>> +	return ret;
> >>> +}
> >>
> >> All this is moving memory-range information from core-code's
> >> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> >> crashk_res, which again is accessible to the core code.
> >>
> >> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> >> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> >> of when IS_ENABLED(CONFIG_X86_64).
> >> If we can abstract just those two, more of this could be moved to core code
> >> where powerpc can make use of it if they want to support kdump with
> >> kexec_file_load().
> >>
> >> But, its getting late for cross-architecture dependencies, lets put that on the
> >> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
> >> third copy of this)
> >
> > Sure. X86 code has so many exceptional lines in the code :)
>
> They also pass the e820 'usable-memory' map on the cmdline...

Well, according to Dave (Red Hat)'s past comment, this type of kernel
parameter is old-style, and x86 now has a dedicated memory region passed
for this purpose.

Thanks,
-Takahiro AKASHI

>
> Thanks,
>
> James