Date: Tue, 11 May 2021 05:51:54 -0500
From: Segher Boessenkool
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] powerpc: Force inlining of csum_add()
Message-ID: <20210511105154.GJ10366@gate.crashing.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi!

On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
> Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
> avoid multiple csum_partial() with GCC10") inlined csum_partial().
>
> Now that csum_partial() is inlined, GCC outlines csum_add() when
> called by csum_partial().

> c064fb28 :
> c064fb28:	7c 63 20 14 	addc    r3,r3,r4
> c064fb2c:	7c 63 01 94 	addze   r3,r3
> c064fb30:	4e 80 00 20 	blr

Could you build this with -fdump-tree-einline-all and send me the
results?  Or open a GCC PR yourself :-)

Something seems to have decided this asm is more expensive than it is.
That isn't always avoidable -- the compiler cannot look inside asms --
but it seems it could be improved here.

Do you have (or can make) a self-contained testcase?

> The sum with 0 is useless, should have been skipped.

That isn't something the compiler can do anything about (not sure if you
were suggesting that); it has to be done in the user code (and it tries
to already, see below).

> And there is even one completely unused instance of csum_add().

That is strange, that should never happen.

> ./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
> ./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
>    94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
>       |                      ^~~~~~~~
> ./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
>   172 |    sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At least we say what happened.  Progress!  :-)

> In the non-inlined version, the first sum with 0 was performed.
> Here it is skipped.

That is because of how __builtin_constant_p works, most likely.  As we
discussed elsewhere it is evaluated before all forms of loop unrolling.
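
To make the constant-zero shortcut discussed above concrete, here is a minimal sketch in portable C for GCC or Clang. It is not the kernel source: the names sketch_csum_add() and sketch_selftest() are hypothetical, plain uint32_t stands in for __wsum, and a 64-bit add with the carry folded back in stands in for the addc/addze pair. The __builtin_constant_p() guards can only fire once the call has been inlined and the argument is known to be a constant at that point, which is presumably why forcing the inlining matters here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for csum_add(): a 32-bit ones' complement add.
 * The always_inline attribute mirrors what __always_inline asks for.
 */
static inline __attribute__((always_inline))
uint32_t sketch_csum_add(uint32_t csum, uint32_t addend)
{
	/* Skip the add when an operand is known at compile time to be 0. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* Add in 64 bits, then fold the carry back into the low word. */
	uint64_t res = (uint64_t)csum + addend;
	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Small self-check; the result is the same whether or not the guards fire. */
static void sketch_selftest(void)
{
	assert(sketch_csum_add(0u, 5u) == 5u);
	assert(sketch_csum_add(7u, 0u) == 7u);
	assert(sketch_csum_add(0xffffffffu, 1u) == 1u);	/* carry wraps around */
}
```

Whether the guards fire changes only the generated code, never the result, so the self-check passes at any optimization level.
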
The patch looks perfect of course :-)

Reviewed-by: Segher Boessenkool


Segher


> --- a/arch/powerpc/include/asm/checksum.h
> +++ b/arch/powerpc/include/asm/checksum.h
> @@ -91,7 +91,7 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
>  }
>  
>  #define HAVE_ARCH_CSUM_ADD
> -static inline __wsum csum_add(__wsum csum, __wsum addend)
> +static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
>  {
>  #ifdef __powerpc64__
>  	u64 res = (__force u64)csum;