Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp7723121ybi; Tue, 9 Jul 2019 03:18:46 -0700 (PDT) X-Google-Smtp-Source: APXvYqxNQuydeBIO/kFNpNrPAazjccB1uNwoldcQyfzbijhwrn4thochymbKOeXVue/gaNeXYs8y X-Received: by 2002:a17:90b:f12:: with SMTP id br18mr30571286pjb.127.1562667526609; Tue, 09 Jul 2019 03:18:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1562667526; cv=none; d=google.com; s=arc-20160816; b=y3/EMV/sfKl0R7vufkRrhSYGBRkDkNW/K6Orm0tEy9vNDGyiBKDk+B4zVv6rhp7atA 8ga/PI+t7B/TgQ2uEF44rUpwupHKJglCBQOCVS305UkWHf0qbEm3/BC3NYajnugprnVY 0thYZZcfbGZxpLt6TZ5aWWLKrG7PToPVEPtuWNIabTiEaR1/zLJNkSHkz3N6paKcVw4k IUh30uRcZbMvgllorMsr3AOu/0PdZidb7ovBlOYZk0bggk5z2YnDw0rYo0a5YwX10Egf psdfnmf3QP/k6OxLw/tuKoEtOClcsG/Lsnt/ZxVw3I0XgqXU6IT5bQ0L6Uwi4BtE6stX KiQA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=E4RzCAwWsxbJTTOrYouTV+2tIaHNpTBJDcnS7mg4ba4=; b=fxLGiPs8fTK9i1jGY55Mo7BnkpIbCDwx0Ppwm4h6CQ1wlZKs50wNp1pGYYxxeQxK+5 Nei7bFgX6KIImuhEOhfQllXCKaD6Vu0spKaAyLrzvCBvhIXNDmjJbazxUBbqq7uLfCCt xKxT2uwUUG/o9da6N+HNGSBaOEKWBOobcVV74eYLbFk1XyLKcS2gfAauNZ9XWodFyGhU xmkfNE6aXozhZHUXkYOG0xRbWQzQiKgtcQOagrHhExUmBc3aSAcbj2c9z6dxjnJACqHu J6FDsS9rWfdNCEO25vin46CksniusOsSIGqrR41bdAQaRURekTO/aSFYY5wjRRk2fqsB GEDw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 94si21253103plb.224.2019.07.09.03.18.30; Tue, 09 Jul 2019 03:18:46 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726149AbfGIKSG (ORCPT + 99 others); Tue, 9 Jul 2019 06:18:06 -0400 Received: from foss.arm.com ([217.140.110.172]:40874 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725989AbfGIKSG (ORCPT ); Tue, 9 Jul 2019 06:18:06 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0F96B2B; Tue, 9 Jul 2019 03:18:05 -0700 (PDT) Received: from [10.1.196.105] (eglon.cambridge.arm.com [10.1.196.105]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AB2AC3F738; Tue, 9 Jul 2019 03:18:02 -0700 (PDT) Subject: Re: [v1 0/5] allow to reserve memory for normal kexec kernel To: Pavel Tatashin Cc: "Eric W. Biederman" , James Morris , Sasha Levin , kexec@lists.infradead.org, LKML , corbet@lwn.net, Catalin Marinas , will@kernel.org, linux-doc@vger.kernel.org, Linux ARM References: <20190708211528.12392-1-pasha.tatashin@soleen.com> <87sgrgjd6i.fsf@xmission.com> From: James Morse Message-ID: <9a61d38a-66cb-132c-de07-ee65d5a238c0@arm.com> Date: Tue, 9 Jul 2019 11:18:01 +0100 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Pavel, Eric, (Subject-Nit: 'arm64:' is needed to match the style for arm64's arch code. Without it the maintainer is likely to skip the patches as being for core code.) On 09/07/2019 01:09, Pavel Tatashin wrote: >> Something is very very wrong there. >> >> Last I measured memory bandwidth seriously I could touch a Gigabyte per >> second easily, and that was nearly 20 years ago. Did you manage to >> disable caching or have some particularly slow code that does the >> reolocations. >> >> There is a serious cost to reserving memory in that it is simply not >> available at other times. For kexec on panic there is no other reliable >> way to get memory that won't be DMA'd to. > Indeed, but sometimes fast reboot is more important than the cost of > reserving 32M-64M of memory. >> We have options in this case and I would strongly encourage you to track >> down why that copy in relocation is so very slow. I suspect a 4KiB page >> size is large enough that it can swamp pointer following costs. >> >> My back of the napkin math says even 20 years ago your copying costs >> should be only 0.037s. The only machine I have ever tested on where >> the copy costs were noticable was my old 386. >> Maybe I am out to lunch here but a claim that your memory only runs >> at 100MiB/s (the speed of my spinning rust hard drive) is rather >> incredible. > I agree, my measurement on this machine was 2,857MB/s. Perhaps when > MMU is disabled ARM64 also has caching disabled? The function that > loops through array of pages and relocates them to final destination > is this: > A comment before calling it: > > 205 /* > 206 * cpu_soft_restart will shutdown the MMU, disable data caches, then > 207 * transfer control to the reboot_code_buffer which contains a copy of > 208 * the arm64_relocate_new_kernel routine. arm64_relocate_new_kernel > 209 * uses physical addressing to relocate the new image to its final > 210 * position and transfers control to the image entry point when the > 211 * relocation is complete. > 212 * In kexec case, kimage->start points to purgatory assuming that > 213 * kernel entry and dtb address are embedded in purgatory by > 214 * userspace (kexec-tools). > 215 * In kexec_file case, the kernel starts directly without purgatory. > 216 */ > So, as I understand at least data caches are disabled, and MMU is > disabled, perhaps this is why this function is so incredibly slow? Yup, spot on. Kexec typically wants to place the new kernel over the top of the old one, so its guaranteed to overwrite the live swapper_pg_dir. There is also nothing to prevent the other parts of the page-tables being overwritten as we relocate the kernel. The way the the kexec series chose to make this safe was the simplest: turn the MMU off. We need to enter purgatory with the MMU off anyway. (Its worth checking your kexec-tools purgatory isn't spending a decade generating a SHA256 of the kernel while the MMU is off. This is pointless as we don't suspect the previous kernel of corrupting memory, and we can't debug/report the problem if we detect a different SHA256. Newer kexec-tools have some commandline option to turn this thing off.) > Perhaps, there is a better way to fix this problem by keeping caches > enabled while still relocating? Any suggestions from Aarch64 > developers? Turning the MMU off is the simplest. The alternative is a lot more complicated: (To get the benefit of the caches, we need the MMU enabled to tell the hardware what the cache-ability attributes of each page of memory are.) We'd need to copy the page tables to build a new set out of memory we know won't get overwritten. Switching to this 'safe set' is tricky, as it also maps the code we're executing. To do that we'd need to use TTBR0 to hold another 'safe mapping' of the code we're running, while we change our view of the linear-map. Hibernate does exactly this, so its possible to re-use some of that logic. From memory, I think the reason that didn't get done is kexec doesn't provide an allocator, and needs the MMU off at some point anyway. Thanks, James