Subject: Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
To: AKASHI Takahiro
References: <20180425062629.29404-1-takahiro.akashi@linaro.org>
 <20180425062629.29404-8-takahiro.akashi@linaro.org>
 <20180518103925.GP2737@linaro.org>
From: James Morse
Cc: catalin.marinas@arm.com, will.deacon@arm.com, dhowells@redhat.com,
 vgoyal@redhat.com, herbert@gondor.apana.org.au, davem@davemloft.net,
 dyoung@redhat.com, bhe@redhat.com, arnd@arndb.de,
 ard.biesheuvel@linaro.org, bhsharma@redhat.com,
 kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
Date: Fri, 18 May 2018 17:00:55 +0100
In-Reply-To: <20180518103925.GP2737@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Akashi,

On 18/05/18 11:39, AKASHI Takahiro wrote:
> On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>> Enabling crash dump (kdump) includes
>>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>>>   using crash_prepare_elf64_headers(), and
>>> * add two device tree properties, "linux,usable-memory-range" and
>>>   "linux,elfcorehdr", which represent respectively a memory range

>>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>>> index 37c0a9dc2e47..ec674f4d267c 100644
>>> --- a/arch/arm64/kernel/machine_kexec_file.c
>>> +++ b/arch/arm64/kernel/machine_kexec_file.c

>>> +static void fill_property(void *buf, u64 val64, int cells)
>>> +{
>>> +	u32 val32;
>>> +
>>> +	if (cells == 1) {
>>> +		val32 = cpu_to_fdt32((u32)val64);
>>> +		memcpy(buf, &val32, sizeof(val32));
>>> +	} else {
>>
>>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
>>> +		buf += cells * sizeof(u32) - sizeof(u64);
>>
>> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
>> 'bottom' 2? I'm pretty sure this isn't endian safe.
>>
>> Do we really expect a system to have #address-cells > 2?
>
> I don't know, but just for safety.

Okay, so this is aiming to be a cover-all-cases library function.


>>> +		val64 = cpu_to_fdt64(val64);
>>> +		memcpy(buf, &val64, sizeof(val64));
>>> +	}
>>> +}
>>> +
>>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
>>> +				unsigned long addr, unsigned long size)
>>
>> (the device-tree spec describes a 'ranges' property, which had me confused. This
>> is encoding a prop-encoded-array)
>
> Should we rename it to, say, fdt_setprop_reg()?

Sure, but I'd really like this code to come from libfdt. I'm hoping for some
temporary workaround, let's see what the DT folk say.


>>> +	if (!buf)
>>> +		return -ENOMEM;
>>> +
>>> +	fill_property(prop, addr, __dt_root_addr_cells);
>>> +	prop += __dt_root_addr_cells * sizeof(u32);
>>> +
>>> +	fill_property(prop, size, __dt_root_size_cells);
>>> +
>>> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
>>> +
>>> +	vfree(buf);
>>> +
>>> +	return result;
>>> +}
>>
>> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
>> because this is the first time we've wanted to create a node with more than
>> key=fixed-size-value.
>>
>> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
>> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
>> this will find it, until we can (re)move it?
>
> I will temporarily move all fdt-related stuff to a separate file, but
>
>> I have no idea how that happens... it looks like the devicetree list is the
>> place to ask.
>
> should we always sync with the original dtc/libfdt repository?

I thought so, libfdt is one of those external libraries that the kernel
consumes, like acpica. For acpica at least the rule is changes go upstream, then
get sync'd back.

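
For illustration only, a minimal user-space sketch of packing an
(address, size) pair into big-endian cells one byte at a time, so the
result does not depend on host endianness. encode_cells() and
encode_reg() are hypothetical names; they are neither libfdt API nor
code from this series, and the cell counts are assumed to come from the
root node's #address-cells and #size-cells.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not libfdt, not from the patch): store val as
 * `cells` 32-bit big-endian cells, most significant cell first.
 * With cells == 1 the upper 32 bits of val are silently dropped, so the
 * caller is assumed to have range-checked val beforehand.
 */
static void encode_cells(uint8_t *buf, uint64_t val, int cells)
{
	for (int i = cells - 1; i >= 0; i--) {
		uint32_t cell = (uint32_t)val;	/* low 32 bits go last */

		buf[i * 4 + 0] = (uint8_t)(cell >> 24);
		buf[i * 4 + 1] = (uint8_t)(cell >> 16);
		buf[i * 4 + 2] = (uint8_t)(cell >> 8);
		buf[i * 4 + 3] = (uint8_t)cell;
		val >>= 32;			/* becomes 0 for cells > 2 */
	}
}

/*
 * Hypothetical helper: encode addr followed by size and return the
 * number of bytes written. buf must hold at least
 * (addr_cells + size_cells) * 4 bytes.
 */
static size_t encode_reg(uint8_t *buf, uint64_t addr, int addr_cells,
			 uint64_t size, int size_cells)
{
	encode_cells(buf, addr, addr_cells);
	encode_cells(buf + (size_t)addr_cells * 4, size, size_cells);
	return ((size_t)addr_cells + (size_t)size_cells) * 4;
}
```

The resulting buffer is what a caller would hand to fdt_setprop() as the
property value. Cells above the low two are zero-filled by the shift,
which covers the #address-cells > 2 case without pointer arithmetic.
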
>>>  static int setup_dtb(struct kimage *image,
>>>  		unsigned long initrd_load_addr, unsigned long initrd_len,
>>>  		char *cmdline, unsigned long cmdline_len,
>>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
>>>  	int range_len;
>>>  	int ret;
>>>
>>> +	/* check ranges against root's #address-cells and #size-cells */
>>> +	if (image->type == KEXEC_TYPE_CRASH &&
>>> +		(!cells_size_fitted(image->arch.elf_load_addr,
>>> +				image->arch.elf_headers_sz) ||
>>> +		 !cells_size_fitted(crashk_res.start,
>>> +				crashk_res.end - crashk_res.start + 1))) {
>>> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
>>> +		ret = -EINVAL;
>>> +		goto out_err;
>>> +	}
>>
>> To check I've understood this properly: This can happen if the firmware provided
>> a DTB with 32bit address/size cells, but at least some of the memory requires 64
>> bit address/size cells. This could only happen on a UEFI system where the
>> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.
>
> Probably, yes. I assumed the case where #address-cells and #size-cells
> were just missing in fdt.

Ah, that's another one. I just wanted to check we could boot on a system where
this can happen.


>>>  	/* duplicate dt blob */
>>>  	buf_size = fdt_totalsize(initial_boot_params);
>>>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>>>
>>> +	if (image->type == KEXEC_TYPE_CRASH)
>>> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
>>> +				+ fdt_prop_len("linux,usable-memory-range",
>>> +								range_len);
>                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[...]

>> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?
>
> I think the code exists. See above.

Sorry, turns out I can't read!

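
As a rough sketch of the kind of check discussed above: the body of
cells_size_fitted() is not quoted in this thread, so fits_in_cells() and
range_fits() below are hypothetical stand-ins written for user space. The
assumption is that a value fits if it can be represented in the given
number of 32-bit cells.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in, not the patch's cells_size_fitted():
 * true if val can be represented in `cells` 32-bit cells.
 */
static bool fits_in_cells(uint64_t val, int cells)
{
	if (cells <= 0)
		return false;		/* nothing can be encoded */
	if (cells >= 2)
		return true;		/* a u64 always fits in two or more cells */
	return val <= UINT32_MAX;	/* single cell: 32 bits only */
}

/* Hypothetical: check an (address, size) pair against the root cell counts. */
static bool range_fits(uint64_t addr, uint64_t size,
		       int addr_cells, int size_cells)
{
	return fits_in_cells(addr, addr_cells) &&
	       fits_in_cells(size, size_cells);
}
```

With a firmware DTB that uses one address cell and one size cell, a crash
kernel region placed above 4GB would fail this check, which is the
situation the pr_err() in the quoted hunk reports.
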
>>> +		if (ret)
>>> +			goto out_err;
>>> +	}
>>
>>> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
>>
>>> +static struct crash_mem *get_crash_memory_ranges(void)
>>> +{
>>> +	unsigned int nr_ranges;
>>> +	struct crash_mem *cmem;
>>> +
>>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
>>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
>>> +
>>> +	cmem = vmalloc(sizeof(struct crash_mem) +
>>> +			sizeof(struct crash_mem_range) * nr_ranges);
>>> +	if (!cmem)
>>> +		return NULL;
>>> +
>>> +	cmem->max_nr_ranges = nr_ranges;
>>> +	cmem->nr_ranges = 0;
>>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
>>> +
>>> +	/* Exclude crashkernel region */
>>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
>>> +		vfree(cmem);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	return cmem;
>>> +}
>>
>> Could this function be included in prepare_elf_headers() so that the alloc() and
>> free() occur together.
>
> Or aiming that arm64 and x86 have similar-look code?

What's the advantage in things looking the same? If they are the same, it
probably shouldn't be in per-arch code. Otherwise it should be as simple as
possible, or we can't spot the bugs/leaks.

But I think walking memblock here will remove all the 'looks the same'
properties.


>>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
>>> +{
>>> +	struct crash_mem *cmem;
>>> +	int ret = 0;
>>> +
>>> +	cmem = get_crash_memory_ranges();
>>> +	if (!cmem)
>>> +		return -ENOMEM;
>>> +
>>> +	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
>>> +
>>> +	vfree(cmem);
>>
>>> +	return ret;
>>> +}
>>
>> All this is moving memory-range information from core-code's
>> walk_system_ram_res() into core-code's struct crash_mem, and excluding
>> crashk_res, which again is accessible to the core code.
>>
>> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
>> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
>> of when IS_ENABLED(CONFIG_X86_64).
>> If we can abstract just those two, more of this could be moved to core code
>> where powerpc can make use of it if they want to support kdump with
>> kexec_file_load().
>>
>> But, it's getting late for cross-architecture dependencies, let's put that on the
>> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
>> third copy of this)
>
> Sure. X86 code has so many exceptional lines in the code :)

They also pass the e820 'usable-memory' map on the cmdline...


Thanks,

James