Subject: Re: [PATCH v11 0/3] remain and optimize memblock_next_valid_pfn on arm and arm64
From: Hanjun Guo
To: Ard Biesheuvel
CC: Will Deacon, Ard Biesheuvel, Mark Rutland, Michal Hocko, Catalin Marinas, Kemi Wang, Wei Yang, Linux-MM, Eugeniu Rosca, Petr Tesarik, Nikolay Borisov, Russell King, Daniel Jordan, AKASHI Takahiro, Mel Gorman, Andrey Ryabinin, Laura Abbott, Daniel Vacek, Vladimir Murzin, Kees Cook, Vlastimil Babka, Johannes Weiner, YASUAKI ISHIMATSU, Jia He, Jia He, Gioh Kim, linux-arm-kernel, Steve Capper, Linux Kernel Mailing List, James Morse, Philip Derrin, Andrew Morton
Date: Tue, 11 Jun 2019 23:18:48 +0800
Message-ID: <2de74de9-35b0-5e62-d822-1be59f0ef605@huawei.com>
References: <1534907237-2982-1-git-send-email-jia.he@hxt-semitech.com> <20180907144447.GD12788@arm.com> <84b8e874-2a52-274c-4806-968470e66a08@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Ard,

Thanks for the reply, please see my comments inline.

On 2019/6/10 21:16, Ard Biesheuvel wrote:
> On Sat, 8 Jun 2019 at 06:22, Hanjun Guo wrote:
>>
>> Hi Ard, Will,
>>
>> This week we were trying to debug an issue of time consuming in mem_init(),
>> and leading to this similar solution from Jia He, so I would like to bring this
>> thread back, please see my detail test result below.
>>
>> On 2018/9/7 22:44, Will Deacon wrote:
>>> On Thu, Sep 06, 2018 at 01:24:22PM +0200, Ard Biesheuvel wrote:
>>>> On 22 August 2018 at 05:07, Jia He wrote:
>>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>>> where possible") optimized the loop in memmap_init_zone(). But it causes
>>>>> possible panic bug. So Daniel Vacek reverted it later.
>>>>>
>>>>> But as suggested by Daniel Vacek, it is fine to using memblock to skip
>>>>> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>>>>
>>>>> More from what Daniel said:
>>>>> "On arm and arm64, memblock is used by default. But generic version of
>>>>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>>>>> not always return the next valid one but skips more resulting in some
>>>>> valid frames to be skipped (as if they were invalid). And that's why
>>>>> kernel was eventually crashing on some !arm machines."
>>>>>
>>>>> About the performance consideration:
>>>>> As said by James in b92df1de5,
>>>>> "I have tested this patch on a virtual model of a Samurai CPU with a
>>>>> sparse memory map. The kernel boot time drops from 109 to 62 seconds."
>>>>> Thus it would be better if we remain memblock_next_valid_pfn on arm/arm64.
>>>>>
>>>>> Besides we can remain memblock_next_valid_pfn, there is still some room
>>>>> for improvement. After this set, I can see the time overhead of memmap_init
>>>>> is reduced from 27956us to 13537us in my armv8a server(QDF2400 with 96G
>>>>> memory, pagesize 64k). I believe arm server will benefit more if memory is
>>>>> larger than TBs
>>>>>
>>>>
>>>> OK so we can summarize the benefits of this series as follows:
>>>> - boot time on a virtual model of a Samurai CPU drops from 109 to 62 seconds
>>>> - boot time on a QDF2400 arm64 server with 96 GB of RAM drops by ~15
>>>> *milliseconds*
>>>>
>>>> Google was not very helpful in figuring out what a Samurai CPU is and
>>>> why we should care about the boot time of Linux running on a virtual
>>>> model of it, and the 15 ms speedup is not that compelling either.
>>
>> Testing this patch set on top of Kunpeng 920 based ARM64 server, with
>> 384G memory in total, we got the time consuming below
>>
>>               without this patch set    with this patch set
>> mem_init()    13310ms                   1415ms
>>
>> So we got about 8x speedup on this machine, which is very impressive.
>>
>
> Yes, this is impressive. But does it matter in the grand scheme of
> things?

It matters for this machine: it is used for storage and has a watchdog,
and the time spent here triggers the watchdog.

> How much time does this system take to arrive at this point
> from power on?

Sorry, I don't have that data: the arch timer is not initialized and I
didn't see timestamps at this point. What I did was read the cycle count
from the arch timer before and after the time-consuming function to work
out how much time it took.
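
A minimal sketch of the before/after measurement described above, in plain C. On arm64 the counter would typically be read from CNTVCT_EL0 and its frequency from CNTFRQ_EL0, but the thread does not say which registers were used, so here the two samples and the frequency are plain parameters. `cycles_to_ms` is a hypothetical helper name, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert two samples of a free-running counter into elapsed milliseconds.
 * start/end: counter values sampled before and after the measured function.
 * freq_hz:   counter frequency in Hz (assumed to be known to the caller).
 */
uint64_t cycles_to_ms(uint64_t start, uint64_t end, uint64_t freq_hz)
{
    assert(freq_hz != 0);

    /* Unsigned subtraction stays correct if the counter wrapped once. */
    uint64_t delta = end - start;

    /* Split into whole seconds and remainder so delta * 1000 cannot overflow. */
    uint64_t whole_seconds = delta / freq_hz;
    uint64_t remainder = delta % freq_hz;

    return whole_seconds * 1000 + (remainder * 1000) / freq_hz;
}
```

As an illustration with made-up numbers (not figures from this thread): with a 50 MHz counter, a delta of 100,000,000 cycles comes out as 2000 ms.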
>
>> The time consuming is related to the memory DIMM size and where to locate those
>> memory DIMMs in the slots. In above case, we are using 16G memory DIMM.
>> We also tested 1T memory with 64G size for each memory DIMM on another ARM64
>> machine, the time consuming reduced from 20s to 2s (I think it's related to
>> firmware implementations).
>>
>
> I agree that this optimization looks good in isolation, but the fact
> that you spotted a bug justifies my skepticism at the time. On the
> other hand, now that we have several independent reports (from you,
> but also from the Renesas folks) that the speedup is worthwhile for
> real world use cases, I think it does make sense to revisit it.

Thank you very much for taking care of this :)

>
> So what I would like to see is the patch set being proposed again,
> with the new data points added for documentation. Also, the commit
> logs need to be crystal clear about how the meaning of PFN validity
> differs between ARM and other architectures, and why the assumptions
> that the optimization is based on are guaranteed to hold.

I think Jia He no longer works for HXT. If you don't mind, I can repost
this patch set with Jia He's authorship unchanged.

Thanks
Hanjun
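
For readers following the thread, the general idea of skipping gaps by consulting a sorted list of memory regions can be sketched as below. This is only an illustration and not the code from the patch set: `struct pfn_range` and `next_valid_pfn` are hypothetical names, the regions are assumed to be sorted, non-overlapping, half-open [start, end) ranges, and "valid" here simply means "inside one of those ranges".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range of page frame numbers: [start, end). */
struct pfn_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return the smallest pfn greater than 'pfn' that lies inside one of the
 * ranges, or UINT64_MAX if there is none. 'r' must be sorted by start and
 * must not contain overlapping ranges.
 */
uint64_t next_valid_pfn(const struct pfn_range *r, size_t n, uint64_t pfn)
{
    if (pfn == UINT64_MAX)
        return UINT64_MAX;

    uint64_t next = pfn + 1;
    size_t lo = 0, hi = n;

    /* Binary search for the first range whose end lies beyond 'next'. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (r[mid].end <= next)
            lo = mid + 1;
        else
            hi = mid;
    }

    if (lo == n)
        return UINT64_MAX;      /* no range left after 'pfn' */

    /* Either 'next' is already inside the range, or we jump over the gap. */
    return next >= r[lo].start ? next : r[lo].start;
}
```

With ranges [0x100, 0x200) and [0x400, 0x500), a caller iterating from 0x1ff would jump straight to 0x400 instead of testing every frame in the gap. A lookup like this only returns every valid frame if the range list and the validity check agree on what "valid" means.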