Subject: Re: [PATCH v11 0/6] arm64: MMU enabled kexec relocation
From: James Morse
To: Pavel Tatashin
Cc: jmorris@namei.org, sashal@kernel.org, ebiederm@xmission.com,
    kexec@lists.infradead.org, linux-kernel@vger.kernel.org, corbet@lwn.net,
    catalin.marinas@arm.com, will@kernel.org,
    linux-arm-kernel@lists.infradead.org, maz@kernel.org,
    vladimir.murzin@arm.com, matthias.bgg@gmail.com, linux-mm@kvack.org,
    mark.rutland@arm.com, steve.capper@arm.com, rfontana@redhat.com,
    tglx@linutronix.de, selindag@gmail.com, tyhicks@linux.microsoft.com
In-Reply-To: <20210127172706.617195-1-pasha.tatashin@soleen.com>
Date: Mon, 1 Feb 2021 18:32:52 +0000
Hi Pavel,

On 27/01/2021 17:27, Pavel Tatashin wrote:
> Enable the MMU during kexec relocation in order to improve reboot
> performance.
>
> If kexec is used for a fast system update with minimal downtime, the
> relocation of kernel + initramfs takes a significant portion of the
> reboot. Relocation is slow because it is done with the MMU off, and so
> does not benefit from the D-cache.
>
> Performance data
> ----------------
> For this experiment, the size of kernel plus initramfs is small, only
> 25M. If the initramfs were larger, the improvement would be greater, as
> time spent in relocation is proportional to the size of the relocation.
>
> Previously:
> kernel shutdown 0.022131328s
> relocation      0.440510736s
> kernel startup  0.294706768s
>
> Relocation took 58.2% of reboot time.
>
> Now:
> kernel shutdown 0.032066576s
> relocation      0.022158152s
> kernel startup  0.296055880s
>
> Relocation now takes 6.3% of reboot time, and the total reboot is
> 2.16x faster.
>
> With a bigger userland (fitImage 380M), reboot time improves by 3.57s,
> from 3.9s down to 0.33s.
>
> Previous approaches and discussions
> -----------------------------------

The problem I see with this is that it rewrites the relocation code. It
needs to work whether the machine has enough memory to enable the MMU
during kexec, or not.

In off-list mail to Pavel I proposed an alternative implementation here:
https://gitlab.arm.com/linux-arm/linux-jm/-/tree/kexec+mmu/v0

By using a copy of the linear map, and passing the phys_to_virt offset
into arm64_relocate_new_kernel(), it's possible to use the same code when
we fail to allocate the page tables, and run with the MMU off as it does
today. I'm convinced someone will crawl out of the woodwork screaming
'regression' if we substantially increase the amount of memory needed to
kexec at all.

From that discussion: this didn't meet Pavel's timing needs.

If you depend on having all the src/dst pages lined up in a single line,
it sounds like you've over-tuned this to depend on the CPU's streaming
mode. What causes the CPU to start/stop that stuff is very implementation
specific (and firmware configurable). I don't think we should let this
rule out systems that can kexec today, but don't have enough extra memory
for the page tables. Having two copies of the relocation code is
obviously a bad idea.

(as before:) Instead of trying to make the relocations run quickly, can
we reduce them? This would benefit other architectures too.

Can the kexec core code allocate higher-order pages, instead of doing
everything a page at a time?

If you have a crash kernel reservation, can we use that to eliminate the
relocations completely? (I think this suggestion has been lost in
translation each time I make it. I mean like this:
https://gitlab.arm.com/linux-arm/linux-jm/-/tree/kexec/kexec_in_crashk/v0

Runes to test it:
| sudo ./kexec -p -u
| sudo cat /proc/iomem | grep Crash
|   b0200000-f01fffff : Crash kernel
| sudo ./kexec --mem-min=0xb0200000 --mem-max=0xf01fffff -l ~/Image --reuse-cmdline

I bet it's even faster!)

I think 'as fast as possible' and 'memory constrained' are mutually
exclusive requirements.
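To make the single-code-path point concrete, here is a rough C sketch of
the idea (the real routine is arm64 assembly, and all the names below are
illustrative, not the kernel's). The caller passes va_offset as zero when
the MMU is off, so the loop copies through physical addresses as it does
today; when we managed to allocate a copy of the linear map, it passes
the phys-to-virt offset and the very same loop runs with the D-cache on:

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* One entry per page that kexec has to move. */
  struct reloc_entry {
          uintptr_t src_phys;     /* where the page sits now */
          uintptr_t dst_phys;     /* where the new kernel wants it */
  };

  /*
   * Single relocation loop for both cases: va_offset is 0 with the MMU
   * off, or the linear-map phys-to-virt offset when page tables were
   * successfully allocated.
   */
  static void relocate_pages(const struct reloc_entry *list, size_t n,
                             ptrdiff_t va_offset, size_t page_size)
  {
          for (size_t i = 0; i < n; i++) {
                  void *src = (void *)(list[i].src_phys + va_offset);
                  void *dst = (void *)(list[i].dst_phys + va_offset);

                  memcpy(dst, src, page_size);
          }
  }

Whether or not we got the page tables then only changes an argument, not
the code we have to test.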
We need to make the page tables optional with a single implementation.


Thanks,

James