Date: Thu, 14 Jul 2022 11:10:26 +0200
From: Sascha Hauer
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Ingo Molnar, kernel@pengutronix.de
Subject: Re: Performance impact of CONFIG_FUNCTION_TRACER
Message-ID: <20220714091026.GM2387@pengutronix.de>
References: <20220705105416.GE5208@pengutronix.de> <20220705103901.41a70cf0@rorschach.local.home> <20220705215948.GK5208@pengutronix.de> <20220705182746.4ce53681@rorschach.local.home>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20220705182746.4ce53681@rorschach.local.home>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 05, 2022 at 06:27:46PM -0400, Steven Rostedt wrote:
> On Tue, 5 Jul 2022 23:59:48 +0200
> Sascha Hauer wrote:
>
> > > I believe that, due to using a link register for function calls, ARM
> > > requires adding two 4-byte nops to every function, whereas x86 only
> > > adds a single 5-byte nop.
> > >
> > > Although nops are very fast (they should not be processed in the CPU's
> > > pipeline, but I don't know if that's true for every arch), they also
> > > affect instruction cache misses, as adding 8 bytes around the code
> > > will cause more cache misses than when they do not exist.
> >
> > I just dug around a bit and saw that on ARM it's not even a real nop.
> > The compiler emits:
> >
> >         push    {lr}
> >         bl      8010e7c0 <__gnu_mcount_nc>
> >
> > which is then turned into a nop by replacing the second instruction with
> >
> >         add     sp, sp, #4
> >
> > to bring the stack pointer back to its original value. This indeed must
> > be processed by the CPU pipeline. I wonder if that could be optimized by
> > replacing both instructions with a nop.
> > I have no idea though if that's feasible at all, or if the overhead
> > would even get smaller by that.
>
> The problem is that there's no easy way to do that, because a task
> could have been preempted after doing the 'push {lr}' and before the
> 'bl'. Thus, you create a race by changing either one to a nop first.
>
> I wonder if it would have been better to change the first one to a jump
> past the second :-/

I gave this a try, but the performance was no better than with the
stack push/pop operations we have now. I also tried replacing both
instructions with nops (mov r0, r0); still no better performance.

I guess we have to live with it then.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |