Date: Sat, 08 Jan 2022 12:51:23 +0000
Message-ID: <87pmp2tmpg.wl-maz@kernel.org>
From: Marc Zyngier
To: He Ying
Subject: Re: [PATCH] arm64: Make CONFIG_ARM64_PSEUDO_NMI macro
 wrap all the pseudo-NMI code
In-Reply-To: <20220107085536.214501-1-heying24@huawei.com>
References: <20220107085536.214501-1-heying24@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 07 Jan 2022 08:55:36 +0000,
He Ying wrote:
>
> Our product has been updating its kernel from 4.4 to 5.10 recently and
> found a performance issue. We do a bussiness test called ARP test, which
> tests the latency for a ping-pong packets traffic with a certain payload.
> The result is as following.
>
> - 4.4 kernel: avg = ~20s
> - 5.10 kernel (CONFIG_ARM64_PSEUDO_NMI is not set): avg = ~40s
>
> I have been just learning arm64 pseudo-NMI code and have a question,
> why is the related code not wrapped by CONFIG_ARM64_PSEUDO_NMI?
> I wonder if this brings some performance regression.
>
> First, I make this patch and then do the test again. Here's the result.
>
> - 5.10 kernel with this patch not applied: avg = ~40s
> - 5.10 kernel with this patch applied: avg = ~23s
>
> Amazing! Note that all kernel is built with CONFIG_ARM64_PSEUDO_NMI not
> set. It seems the pseudo-NMI feature actually brings some overhead to
> performance event if CONFIG_ARM64_PSEUDO_NMI is not set.
>
> Furthermore, I find the feature also brings some overhead to vmlinux size.
> I build 5.10 kernel with this patch applied or not while
> CONFIG_ARM64_PSEUDO_NMI is not set.
>
> - 5.10 kernel with this patch not applied: vmlinux size is 384060600 Bytes.
> - 5.10 kernel with this patch applied: vmlinux size is 383842936 Bytes.
>
> That means arm64 pseudo-NMI feature may bring ~200KB overhead to
> vmlinux size.
>
> Above all, arm64 pseudo-NMI feature brings some overhead to vmlinux size
> and performance even if config is not set. To avoid it, add macro control
> all around the related code.

This obviously attracted my attention, and I took this patch for a
ride on 5.16-rc8 on a machine that doesn't support GICv3 NMIs to make
sure that any extra code would only result in pure overhead.

There was no measurable difference with this patch applied or not,
with CONFIG_ARM64_PSEUDO_NMI selected or not for the workloads I tried
(I/O heavy virtual machines, hackbench).

Mark already asked a number of questions (test case, implementation,
test on a modern kernel). Please provide as much detail as you
possibly can, because such a regression really isn't expected, and
doesn't show up on the systems I have at hand. Some profiling numbers
could also be interesting, in case this is a result of a particular
resource being thrashed (TLB, cache...).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.