Date: Fri, 18 May 2018 19:39:26 +0900
From: AKASHI Takahiro
To: James Morse
Cc: catalin.marinas@arm.com, will.deacon@arm.com, dhowells@redhat.com,
	vgoyal@redhat.com, herbert@gondor.apana.org.au, davem@davemloft.net,
	dyoung@redhat.com, bhe@redhat.com, arnd@arndb.de,
	ard.biesheuvel@linaro.org, bhsharma@redhat.com,
	kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
Message-ID: <20180518103925.GP2737@linaro.org>
References: <20180425062629.29404-1-takahiro.akashi@linaro.org>
	<20180425062629.29404-8-takahiro.akashi@linaro.org>

On Tue, May 15, 2018 at 06:11:15PM +0100, James
Morse wrote:
> Hi Akashi,
>
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > Enabling crash dump (kdump) includes
> > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >   using crash_prepare_elf64_headers(), and
> > * add two device tree properties, "linux,usable-memory-range" and
> >   "linux,elfcorehdr", which represent repsectively a memory range
>
> (Nit: respectively)

Will fix.

>
> > to be used by crash dump kernel and the header's location
>
> >  arch/arm64/include/asm/kexec.h         |   4 +
> >  arch/arm64/kernel/kexec_image.c        |   9 +-
> >  arch/arm64/kernel/machine_kexec_file.c | 202 +++++++++++++++++++++++++
>
> In this patch, machine_kexec_file.c gains its own private fdt array encoder.
> See below.
>
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > index 37c0a9dc2e47..ec674f4d267c 100644
> > --- a/arch/arm64/kernel/machine_kexec_file.c
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> >  	return ret;
> >  }
> >
> > +static int __init arch_kexec_file_init(void)
> > +{
> > +	/* Those values are used later on loading the kernel */
> > +	__dt_root_addr_cells = dt_root_addr_cells;
> > +	__dt_root_size_cells = dt_root_size_cells;
> > +
> > +	return 0;
> > +}
> > +late_initcall(arch_kexec_file_init);
>
> If we need these is it worth taking them out of __initdata? I note they've been
> 'temporary' for quite a long time.

I think that I had some reason that I didn't do that, but don't remember now.
If there's no problem, I will take your suggestion.
>
> > +
> > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > +
> > +static int fdt_prop_len(const char *prop_name, int len)
> > +{
> > +	return (strlen(prop_name) + 1) +
> > +		sizeof(struct fdt_property) +
> > +		FDT_TAGALIGN(len);
> > +}
>
> This stuff should really be in libfdt.h. Those macros come from
> libfdt_internal.h, so we're probably doing something wrong here.
>
>
> > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> > +{
> > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > +		return false;
> > +
> > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > +		return false;
>
> Using '> U32_MAX' here may be more readable.

OK

>
> > +	return true;
> > +}
> > +
> > +static void fill_property(void *buf, u64 val64, int cells)
> > +{
> > +	u32 val32;
> > +
> > +	if (cells == 1) {
> > +		val32 = cpu_to_fdt32((u32)val64);
> > +		memcpy(buf, &val32, sizeof(val32));
> > +	} else {
>
> > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > +		buf += cells * sizeof(u32) - sizeof(u64);
>
> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> 'bottom' 2? I'm pretty sure this isn't endian safe.
>
> Do we really expect a system to have #address-cells > 2?

I don't know, but just for safety.

>
> > +		val64 = cpu_to_fdt64(val64);
> > +		memcpy(buf, &val64, sizeof(val64));
> > +	}
> > +}
> > +
> > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > +				unsigned long addr, unsigned long size)
>
> (the device-tree spec describes a 'ranges' property, which had me confused. This
> is encoding a prop-encoded-array)

Should we rename it to, say, fdt_setprop_reg()?

> > +{
> > +	void *buf, *prop;
> > +	size_t buf_size;
> > +	int result;
> > +
> > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > +	prop = buf = vmalloc(buf_size);
>
> virtual memory allocation for something less than PAGE_SIZE?

I've never cared about that. Let me think again.

>
> > +	if (!buf)
> > +		return -ENOMEM;
> > +
> > +	fill_property(prop, addr, __dt_root_addr_cells);
> > +	prop += __dt_root_addr_cells * sizeof(u32);
> > +
> > +	fill_property(prop, size, __dt_root_size_cells);
> > +
> > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > +
> > +	vfree(buf);
> > +
> > +	return result;
> > +}
>
> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
> because this is the first time we've wanted to create a node with more than
> key=fixed-size-value.
>
> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
> this will find it, until we can (re)move it?

I will temporarily move all fdt-related stuff to a separate file, but

> I have no idea how that happens... it looks like the devicetree list is the
> place to ask.

should we always sync with the original dtc/libfdt repository?

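On the endianness question raised for fill_property() above, one option is to write the value a byte at a time, so the result does not depend on host byte order. What follows is only a minimal sketch in plain userspace C, not kernel code. The name put_cells_be() is hypothetical and is not a libfdt API. With a single cell the value is truncated to 32 bits, so a caller would still need a range check beforehand.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libfdt): store 'val' into 'cells'
 * consecutive 32-bit cells in big-endian order, as a flattened device
 * tree expects.  Bytes are filled from the least significant end, so
 * any cells above the low 64 bits end up zero.
 */
static void put_cells_be(uint8_t *buf, uint64_t val, int cells)
{
	size_t i = (size_t)cells * sizeof(uint32_t);

	while (i-- > 0) {
		buf[i] = (uint8_t)(val & 0xff);
		val >>= 8;
	}
}
```

Because no pointer arithmetic on void * and no byte-swap macros are involved, the same code covers 1, 2, or more cells.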
>
> >  static int setup_dtb(struct kimage *image,
> >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >  		char *cmdline, unsigned long cmdline_len,
> > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >  	int range_len;
> >  	int ret;
> >
> > +	/* check ranges against root's #address-cells and #size-cells */
> > +	if (image->type == KEXEC_TYPE_CRASH &&
> > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz) ||
> > +		 !cells_size_fitted(crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1))) {
> > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > +		ret = -EINVAL;
> > +		goto out_err;
> > +	}
>
> To check I've understood this properly: This can happen if the firmware provided
> a DTB with 32bit address/size cells, but at least some of the memory requires 64
> bit address/size cells. This could only happen on a UEFI system where the
> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.

Probably, yes.
I assumed the case where #address-cells and #size-cells were just missing in fdt.

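The fit check discussed above, written with the '> U32_MAX' style comparison that was suggested, might look roughly like the sketch below. It is userspace C, so UINT32_MAX stands in for the kernel's U32_MAX. The cell counts are passed as parameters instead of being read from the __dt_root_*_cells globals, and the name range_fits_cells() is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: report whether 'base' and 'size' can be encoded
 * with the given number of address and size cells.  Two or more cells
 * can always hold a 64-bit value, so only the one-cell case is checked.
 */
static bool range_fits_cells(uint64_t base, uint64_t size,
			     int addr_cells, int size_cells)
{
	if (addr_cells == 1 && base > UINT32_MAX)
		return false;

	if (size_cells == 1 && size > UINT32_MAX)
		return false;

	return true;
}
```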
>
> >  	/* duplicate dt blob */
> >  	buf_size = fdt_totalsize(initial_boot_params);
> >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >
> > +	if (image->type == KEXEC_TYPE_CRASH)
> > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > +				+ fdt_prop_len("linux,usable-memory-range",
> > +						range_len);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> >  	if (initrd_load_addr)
> >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> >  	if (nodeoffset < 0)
> >  		goto out_err;
> >
> > +	if (image->type == KEXEC_TYPE_CRASH) {
> > +		/* add linux,elfcorehdr */
> > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > +				image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz);
> > +		if (ret)
> > +			goto out_err;
> > +
> > +		/* add linux,usable-memory-range */
> > +		ret = fdt_setprop_range(buf, nodeoffset,
> > +				"linux,usable-memory-range",
> > +				crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1);
>
> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?

I think the code exists. See above.

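The arithmetic behind the buf_size estimate can be worked through in isolation. The following is a sketch in userspace C, assuming the property header consists of three 32-bit fields (tag, len, nameoff; 12 bytes in total) and that FDT_TAGSIZE is 4. TAG_SIZE, PROP_HDR_SIZE, ALIGN_UP() and prop_len() are stand-in names used only here.

```c
#include <stddef.h>
#include <string.h>

#define TAG_SIZE	4	/* assumed value of FDT_TAGSIZE */
#define PROP_HDR_SIZE	12	/* assumed size of the tag, len, nameoff header */

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Upper bound on the space one property adds to the blob: its name in
 * the strings block (with terminating NUL), the property header, and
 * the value padded to a tag boundary.
 */
static size_t prop_len(const char *name, size_t len)
{
	return (strlen(name) + 1) + PROP_HDR_SIZE + ALIGN_UP(len, TAG_SIZE);
}
```

With two address cells and two size cells, range_len is 16 bytes, so under these assumptions "linux,elfcorehdr" accounts for 17 + 12 + 16 = 45 bytes and "linux,usable-memory-range" for 26 + 12 + 16 = 54 bytes.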
>
> > +		if (ret)
> > +			goto out_err;
> > +	}
>
> > @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
>
> > +static struct crash_mem *get_crash_memory_ranges(void)
> > +{
> > +	unsigned int nr_ranges;
> > +	struct crash_mem *cmem;
> > +
> > +	nr_ranges = 1; /* for exclusion of crashkernel region */
> > +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> > +
> > +	cmem = vmalloc(sizeof(struct crash_mem) +
> > +			sizeof(struct crash_mem_range) * nr_ranges);
> > +	if (!cmem)
> > +		return NULL;
> > +
> > +	cmem->max_nr_ranges = nr_ranges;
> > +	cmem->nr_ranges = 0;
> > +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> > +
> > +	/* Exclude crashkernel region */
> > +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> > +		vfree(cmem);
> > +		return NULL;
> > +	}
> > +
> > +	return cmem;
> > +}
>
> Could this function be included in prepare_elf_headers() so that the alloc() and
> free() occur together. Or aiming that arm64 and x86 have similar-look code?
>
> > +static int prepare_elf_headers(void **addr, unsigned long *sz)
> > +{
> > +	struct crash_mem *cmem;
> > +	int ret = 0;
> > +
> > +	cmem = get_crash_memory_ranges();
> > +	if (!cmem)
> > +		return -ENOMEM;
> > +
> > +	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
> > +
> > +	vfree(cmem);
>
> > +	return ret;
> > +}
>
> All this is moving memory-range information from core-code's
> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> crashk_res, which again is accessible to the core code.
>
> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> of when IS_ENABLED(CONFIG_X86_64).
> If we can abstract just those two, more of this could be moved to core code
> where powerpc can make use of it if they want to support kdump with
> kexec_file_load().
>
> But, it's getting late for cross-architecture dependencies, let's put that on the
> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
> third copy of this)

Sure.
X86 code has so many exceptional lines in the code :)

Thanks,
-Takahiro AKASHI

>
> Thanks,
>
> James