Subject: Re: [PATCH v11 0/3] remain and optimize memblock_next_valid_pfn on arm and arm64
To: Hanjun Guo, Ard Biesheuvel
Cc: Will Deacon, Ard Biesheuvel, Mark Rutland, Michal Hocko,
    Catalin Marinas, Kemi Wang, Wei Yang, Linux-MM, Eugeniu Rosca,
    Petr Tesarik, Nikolay Borisov, Russell King, Daniel Jordan,
    AKASHI Takahiro, Mel Gorman, Andrey Ryabinin, Laura Abbott,
    Daniel Vacek, Vladimir Murzin, Kees Cook, Vlastimil Babka,
    Johannes Weiner, YASUAKI ISHIMATSU, Jia He, Gioh Kim,
    linux-arm-kernel, Steve Capper, Linux Kernel Mailing List,
    James Morse, Philip Derrin, Andrew Morton
References: <1534907237-2982-1-git-send-email-jia.he@hxt-semitech.com>
    <20180907144447.GD12788@arm.com>
    <84b8e874-2a52-274c-4806-968470e66a08@huawei.com>
    <2de74de9-35b0-5e62-d822-1be59f0ef605@huawei.com>
From: Jia He
Organization: ARM
Message-ID: <8fdf5545-21b7-354c-4c4b-e1e92048864f@gmail.com>
Date: Wed, 12 Jun 2019 09:05:59 +0800
In-Reply-To: <2de74de9-35b0-5e62-d822-1be59f0ef605@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Hanjun

On 2019/6/11 23:18, Hanjun Guo wrote:
> Hello Ard,
>
> Thanks for the reply, please see my comments inline.
>
> On 2019/6/10 21:16, Ard Biesheuvel wrote:
>> On Sat, 8 Jun 2019 at 06:22, Hanjun Guo wrote:
>>> Hi Ard, Will,
>>>
>>> This week we were trying to debug a time-consuming issue in mem_init(),
>>> which led us to this similar solution from Jia He, so I would like to bring
>>> this thread back. Please see my detailed test results below.
>>>
>>> On 2018/9/7 22:44, Will Deacon wrote:
>>>> On Thu, Sep 06, 2018 at 01:24:22PM +0200, Ard Biesheuvel wrote:
>>>>> On 22 August 2018 at 05:07, Jia He wrote:
>>>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>>>> where possible") optimized the loop in memmap_init_zone(). But it causes
>>>>>> a possible panic bug. So Daniel Vacek reverted it later.
>>>>>>
>>>>>> But as suggested by Daniel Vacek, it is fine to use memblock to skip
>>>>>> gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>>>>>
>>>>>> More from what Daniel said:
>>>>>> "On arm and arm64, memblock is used by default. But generic version of
>>>>>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>>>>>> not always return the next valid one but skips more resulting in some
>>>>>> valid frames to be skipped (as if they were invalid). And that's why
>>>>>> kernel was eventually crashing on some !arm machines."
>>>>>>
>>>>>> About the performance consideration:
>>>>>> As said by James in b92df1de5,
>>>>>> "I have tested this patch on a virtual model of a Samurai CPU with a
>>>>>> sparse memory map. The kernel boot time drops from 109 to 62 seconds."
>>>>>> Thus it would be better if we retain memblock_next_valid_pfn on arm/arm64.
>>>>>>
>>>>>> Besides retaining memblock_next_valid_pfn, there is still some room
>>>>>> for improvement. After this set, I can see the time overhead of memmap_init
>>>>>> is reduced from 27956us to 13537us on my armv8a server (QDF2400 with 96G
>>>>>> memory, pagesize 64k). I believe arm servers will benefit more if memory is
>>>>>> on the order of TBs.
>>>>>>
>>>>> OK so we can summarize the benefits of this series as follows:
>>>>> - boot time on a virtual model of a Samurai CPU drops from 109 to 62 seconds
>>>>> - boot time on a QDF2400 arm64 server with 96 GB of RAM drops by ~15
>>>>>   *milliseconds*
>>>>>
>>>>> Google was not very helpful in figuring out what a Samurai CPU is and
>>>>> why we should care about the boot time of Linux running on a virtual
>>>>> model of it, and the 15 ms speedup is not that compelling either.
>>> Testing this patch set on top of a Kunpeng 920 based ARM64 server, with
>>> 384G memory in total, we measured the following times:
>>>
>>>               without this patch set    with this patch set
>>> mem_init()    13310ms                   1415ms
>>>
>>> So we got about 8x speedup on this machine, which is very impressive.
>>>
>> Yes, this is impressive. But does it matter in the grand scheme of
>> things?
> It matters for this machine, because it's for storage and there is
> a watchdog, and the time consumed triggers the watchdog.
>
>> How much time does this system take to arrive at this point
>> from power on?
> Sorry, I don't have such data, as the arch timer is not initialized
> and I didn't see the time stamp at this point, but I read the cycles
> from the arch timer before and after the time-consuming function to get
> how much time was consumed.
>
>>> The time consumed is related to the memory DIMM size and where those
>>> memory DIMMs are located in the slots. In the above case, we are using
>>> 16G memory DIMMs.
>>> We also tested 1T memory with 64G size for each memory DIMM on another ARM64
>>> machine; the time consumed was reduced from 20s to 2s (I think it's related
>>> to firmware implementations).
>>>
>> I agree that this optimization looks good in isolation, but the fact
>> that you spotted a bug justifies my skepticism at the time. On the
>> other hand, now that we have several independent reports (from you,
>> but also from the Renesas folks) that the speedup is worthwhile for
>> real world use cases, I think it does make sense to revisit it.
> Thank you very much for taking care of this :)
>
>> So what I would like to see is the patch set being proposed again,
>> with the new data points added for documentation. Also, the commit
>> logs need to be crystal clear about how the meaning of PFN validity
>> differs between ARM and other architectures, and why the assumptions
>> that the optimization is based on are guaranteed to hold.
> I think Jia He no longer works for HXT. If you don't mind, I can repost
> this patch set with Jia He's authorship unchanged.

Ok, I don't mind that, thanks for your followup :)

---
Cheers,
Justin (Jia He)
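
For readers following the thread, the idea discussed above (using the memblock region list to jump over holes instead of testing every pfn) can be modelled in user space. This is only a rough sketch and not the kernel code: the region list, the helper names pfn_valid/next_valid_pfn/walk and the toy memory map are all assumptions made for illustration. Regions are taken to be sorted, non-overlapping, half-open (start_pfn, end_pfn) ranges.

```python
def pfn_valid(regions, pfn):
    """Toy stand-in for a memblock-backed validity check (assumption)."""
    return any(start <= pfn < end for start, end in regions)


def next_valid_pfn(regions, pfn, max_pfn):
    """Return the first valid pfn after `pfn`, capped at max_pfn.

    Binary-searches the sorted region list for pfn + 1. If that frame
    lies in a hole, jump to the start of the next region instead.
    """
    candidate = pfn + 1
    left, right = 0, len(regions)
    while left < right:
        mid = (left + right) // 2
        start, end = regions[mid]
        if candidate < start:
            right = mid
        elif candidate >= end:
            left = mid + 1
        else:
            # candidate is inside a region, so it is valid
            return min(candidate, max_pfn)
    if right == len(regions):
        # no region starts after candidate
        return max_pfn
    return min(regions[right][0], max_pfn)


def walk(regions, start_pfn, max_pfn, skip_gaps):
    """Model of an init loop; returns (initialised pfns, loop iterations)."""
    initialised, iterations = [], 0
    pfn = start_pfn
    while pfn < max_pfn:
        iterations += 1
        if pfn_valid(regions, pfn):
            initialised.append(pfn)
            pfn += 1
        elif skip_gaps:
            pfn = next_valid_pfn(regions, pfn, max_pfn)
        else:
            pfn += 1
    return initialised, iterations


if __name__ == "__main__":
    toy_map = [(0, 4), (100, 104)]  # two small banks with a 96-frame hole
    print(walk(toy_map, 0, 104, skip_gaps=False)[1])  # 104 iterations
    print(walk(toy_map, 0, 104, skip_gaps=True)[1])   # 9 iterations
```

Both walks initialise the same eight frames; only the number of loop iterations differs. That holds in this model because validity and the skip are derived from the same region list, which is the assumption the thread asks the commit logs to spell out for arm/arm64.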