Date: Tue, 5 Jul 2022 23:59:48 +0200
From: Sascha Hauer
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    Ingo Molnar, kernel@pengutronix.de
Subject: Re: Performance impact of CONFIG_FUNCTION_TRACER
Message-ID: <20220705215948.GK5208@pengutronix.de>
References: <20220705105416.GE5208@pengutronix.de>
    <20220705103901.41a70cf0@rorschach.local.home>
In-Reply-To: <20220705103901.41a70cf0@rorschach.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 05, 2022 at 10:39:01AM -0400, Steven Rostedt wrote:
> On Tue, 5 Jul 2022 12:54:16 +0200
> Sascha Hauer wrote:
>
> > Hi,
> >
> > I ran some lmbench subtests on a ARMv7 machine (NXP i.MX6q) with and
> > without CONFIG_FUNCTION_TRACER enabled (with CONFIG_DYNAMIC_FTRACE
> > enabled and no tracing active), see below. The Kconfig help text of this
> > option reads as:
> >
> > > If it's runtime disabled (the bootup default), then the overhead of
> > > the instructions is very small and not measurable even in
> > > micro-benchmarks.
>
> Well, this is true for x86 ;-)

That was my assumption ;)

> >
> > In my tests the overhead is small, but it surely exists and is
> > measurable at least on ARMv7 machines. Is this expected? Should the help
> > text be rephrased a little less optimistic?
>
> You mean "(but may vary by architecture)"

Something like that, yes.

>
> As I believe due to using a link register for function calls, ARM
> requires adding two 4 byte nops to every function where as x86 only
> adds a single 5 byte nop.
>
> Although nops are very fast (they should not be processed in the CPU's
> pipe line, but I don't know if that's true for every arch).
> It also affects instruction cache misses, as adding 8 bytes around the code
> will cause more cache misses than when they do not exist.

I just dug around a bit and saw that on ARM it's not even a real nop. The
compiler emits:

	push	{lr}
	bl	8010e7c0 <__gnu_mcount_nc>

which is then turned into a nop by replacing the second instruction with

	add	sp, sp, #4

to bring the stack pointer back to its original value. This indeed must
be processed by the CPU pipeline. I wonder if that could be optimized by
replacing both instructions with a nop. I have no idea though whether
that's feasible at all, or whether the overhead would even get smaller
that way.

>
> Also, there's some configurations that use the old mcount that does add
> some more code to handle the mcount case.
>
> So if this is just to have us change the kconfig, I'm happy to do that.

Yes, it would be good to make the kconfig text clear. The overhead itself
is fine when people know that's the price to pay for getting the function
tracer.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
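
For illustration, the call-site patching described in the mail (push {lr}
followed by bl __gnu_mcount_nc, with the bl rewritten to add sp, sp, #4 when
tracing is off) can be modelled in a few lines of Python. This is only a
sketch, not kernel code: the helper names, the function address 0x80100000
and the choice of 0xe320f000 as the "real" nop are assumptions made for the
example. Only the two instructions and the mcount address 8010e7c0 come from
the mail. It assumes ARM (A32) mode, not Thumb-2.

```python
# Illustrative model only; not kernel code. Encodings are for A32 (ARM mode).
PUSH_LR = 0xE52DE004   # push {lr}, i.e. str lr, [sp, #-4]!
ADD_SP_4 = 0xE28DD004  # add sp, sp, #4
NOP = 0xE320F000       # architectural nop (ARMv6K and later); assumed choice


def encode_bl(insn_addr, target):
    """Encode an A32 'bl target' placed at insn_addr."""
    offset = target - (insn_addr + 8)  # A32 reads PC as insn address + 8
    if offset % 4 or not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target not reachable by bl")
    return 0xEB000000 | ((offset >> 2) & 0xFFFFFF)


def traced_site(func_addr, mcount_addr):
    """The two words the compiler emits at function entry."""
    return [PUSH_LR, encode_bl(func_addr + 4, mcount_addr)]


def disabled_site(site):
    """Tracing off: keep the push, undo it with add sp, sp, #4."""
    return [site[0], ADD_SP_4]


def sp_delta(site):
    """Net stack pointer change of a patched (non-branching) site."""
    effect = {PUSH_LR: -4, ADD_SP_4: +4, NOP: 0}
    return sum(effect[word] for word in site)


site = traced_site(0x80100000, 0x8010E7C0)  # function address is made up
off = disabled_site(site)
print([hex(w) for w in off], sp_delta(off))
# Hypothetical variant wondered about in the mail: two real nops.
print(sp_delta([NOP, NOP]))
```

Both variants leave the stack pointer unchanged. The difference is that the
current sequence still executes a store and an add at every function entry,
while the two-nop variant would not. Whether both words can be patched
safely at runtime is not something this model says anything about.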