From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: "David S. Miller", Jakub Kicinski
Cc: Christophe Leroy, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org, Masahiro Yamada
Subject: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h
Date: Thu, 17 Feb 2022 13:19:07 +0100
Message-Id: <978951d76d8cb84bab347c7623bc163e9a038452.1645100305.git.christophe.leroy@csgroup.eu>

All functions defined as static inline in net/checksum.h are meant to
be inlined for performance reasons. But since commit ac7c3e4ff401
("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") the compiler is
allowed to uninline functions when it wants. That is fair enough in the
general case, but for tiny performance-critical checksum helpers it is
counter-productive.

The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE:
because those helpers are 'static inline' in header files, they end up
duplicated many times in the resulting vmlinux.

Here is a typical example when building powerpc pmac32_defconfig with
CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times:

c04a23cc <csum_sub>:
c04a23cc:	7c 84 20 f8	not	r4,r4
c04a23d0:	7c 63 20 14	addc	r3,r3,r4
c04a23d4:	7c 63 01 94	addze	r3,r3
c04a23d8:	4e 80 00 20	blr
	...
c04a2ce8:	4b ff f6 e5	bl	c04a23cc <csum_sub>
	...
c04a2d2c:	4b ff f6 a1	bl	c04a23cc <csum_sub>
	...
c04a2d54:	4b ff f6 79	bl	c04a23cc <csum_sub>
	...
c04a754c <csum_sub>:
c04a754c:	7c 84 20 f8	not	r4,r4
c04a7550:	7c 63 20 14	addc	r3,r3,r4
c04a7554:	7c 63 01 94	addze	r3,r3
c04a7558:	4e 80 00 20	blr
	...
c04ac930:	4b ff ac 1d	bl	c04a754c <csum_sub>
	...
c04ad264:	4b ff a2 e9	bl	c04a754c <csum_sub>
	...
c04e3b08 <csum_sub>:
c04e3b08:	7c 84 20 f8	not	r4,r4
c04e3b0c:	7c 63 20 14	addc	r3,r3,r4
c04e3b10:	7c 63 01 94	addze	r3,r3
c04e3b14:	4e 80 00 20	blr
	...
c04e5788:	4b ff e3 81	bl	c04e3b08 <csum_sub>
	...
c04e65c8:	4b ff d5 41	bl	c04e3b08 <csum_sub>
	...
c0512d34 <csum_sub>:
c0512d34:	7c 84 20 f8	not	r4,r4
c0512d38:	7c 63 20 14	addc	r3,r3,r4
c0512d3c:	7c 63 01 94	addze	r3,r3
c0512d40:	4e 80 00 20	blr
	...
c0512dfc:	4b ff ff 39	bl	c0512d34 <csum_sub>
	...
c05138bc:	4b ff f4 79	bl	c0512d34 <csum_sub>
	...
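[Editor's illustration, not part of the patch: a minimal, self-contained
userspace sketch of the mechanism described above. The demo_* names are
invented, uint32_t stands in for the kernel's u32/__wsum types, and the
raw GCC always_inline attribute stands in for the kernel's
__always_inline macro.]

#include <stdint.h>

/*
 * A plain "static inline" helper in a shared header: once "inline" is
 * only a hint (CONFIG_OPTIMIZE_INLINING) the compiler may decline to
 * inline it, typically at -Os, and emit an out-of-line copy in every
 * object file that calls it -- which is how csum_sub() ended up
 * duplicated four times in the vmlinux dump above.
 */
static inline uint32_t demo_csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);	/* fold the carry back in */
}

static inline uint32_t demo_csum_sub(uint32_t csum, uint32_t addend)
{
	return demo_csum_add(csum, ~addend);
}

/*
 * Forcing inlining (in the kernel: the __always_inline macro, i.e.
 * inline plus the always_inline attribute) removes that freedom: every
 * call site gets the body inlined regardless of -Os heuristics.
 */
static inline __attribute__((always_inline)) uint32_t
demo_csum_sub_forced(uint32_t csum, uint32_t addend)
{
	return demo_csum_sub(csum, addend);
}

[With the first form, each translation unit may carry its own copy of
the helper; with the second, no out-of-line copy can be generated.]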
Restore the expected behaviour by using __always_inline for all
functions defined in net/checksum.h.

vmlinux size is even reduced by 256 bytes with this patch:

	   text	   data	    bss	    dec	    hex	filename
	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now

Fixes: ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
Cc: Masahiro Yamada
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v3: Added fixes tag and handled checkpatch warning about length of lines.
v2: Rebased on net tree
---
 include/net/checksum.h | 45 +++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 5218041e5c8f..712b554a23be 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -22,7 +22,7 @@
 #include <asm/checksum.h>
 
 #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
-static inline
+static __always_inline
 __wsum csum_and_copy_from_user (const void __user *src, void *dst,
 				      int len)
 {
@@ -33,7 +33,7 @@ __wsum csum_and_copy_from_user (const void __user *src, void *dst,
 #endif
 
 #ifndef HAVE_CSUM_COPY_USER
-static __inline__ __wsum csum_and_copy_to_user
+static __always_inline __wsum csum_and_copy_to_user
 (const void *src, void __user *dst, int len)
 {
 	__wsum sum = csum_partial(src, len, ~0U);
@@ -45,7 +45,7 @@ static __inline__ __wsum csum_and_copy_to_user
 #endif
 
 #ifndef _HAVE_ARCH_CSUM_AND_COPY
-static inline __wsum
+static __always_inline __wsum
 csum_partial_copy_nocheck(const void *src, void *dst, int len)
 {
 	memcpy(dst, src, len);
@@ -54,7 +54,7 @@ csum_partial_copy_nocheck(const void *src, void *dst, int len)
 #endif
 
 #ifndef HAVE_ARCH_CSUM_ADD
-static inline __wsum csum_add(__wsum csum, __wsum addend)
+static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
 {
 	u32 res = (__force u32)csum;
 	res += (__force u32)addend;
@@ -62,12 +62,12 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)
 }
 #endif
 
-static inline __wsum csum_sub(__wsum csum, __wsum addend)
+static __always_inline __wsum csum_sub(__wsum csum, __wsum addend)
 {
 	return csum_add(csum, ~addend);
 }
 
-static inline __sum16 csum16_add(__sum16 csum, __be16 addend)
+static __always_inline __sum16 csum16_add(__sum16 csum, __be16 addend)
 {
 	u16 res = (__force u16)csum;
 
@@ -75,12 +75,12 @@ static inline __sum16 csum16_add(__sum16 csum, __be16 addend)
 	return (__force __sum16)(res + (res < (__force u16)addend));
 }
 
-static inline __sum16 csum16_sub(__sum16 csum, __be16 addend)
+static __always_inline __sum16 csum16_sub(__sum16 csum, __be16 addend)
 {
 	return csum16_add(csum, ~addend);
 }
 
-static inline __wsum csum_shift(__wsum sum, int offset)
+static __always_inline __wsum csum_shift(__wsum sum, int offset)
 {
 	/* rotate sum to align it with a 16b boundary */
 	if (offset & 1)
@@ -88,42 +88,43 @@ static inline __wsum csum_shift(__wsum sum, int offset)
 	return sum;
 }
 
-static inline __wsum
+static __always_inline __wsum
 csum_block_add(__wsum csum, __wsum csum2, int offset)
 {
 	return csum_add(csum, csum_shift(csum2, offset));
 }
 
-static inline __wsum
+static __always_inline __wsum
 csum_block_add_ext(__wsum csum, __wsum csum2, int offset, int len)
 {
 	return csum_block_add(csum, csum2, offset);
 }
 
-static inline __wsum
+static __always_inline __wsum
 csum_block_sub(__wsum csum, __wsum csum2, int offset)
 {
 	return csum_block_add(csum, ~csum2, offset);
 }
 
-static inline __wsum csum_unfold(__sum16 n)
+static __always_inline __wsum csum_unfold(__sum16 n)
 {
 	return (__force __wsum)n;
 }
 
-static inline __wsum csum_partial_ext(const void *buff, int len, __wsum sum)
+static __always_inline
+__wsum csum_partial_ext(const void *buff, int len, __wsum sum)
 {
 	return csum_partial(buff, len, sum);
 }
 
 #define CSUM_MANGLED_0 ((__force __sum16)0xffff)
 
-static inline void csum_replace_by_diff(__sum16 *sum, __wsum diff)
+static __always_inline void csum_replace_by_diff(__sum16 *sum, __wsum diff)
 {
 	*sum = csum_fold(csum_add(diff, ~csum_unfold(*sum)));
 }
 
-static inline void csum_replace4(__sum16 *sum, __be32 from, __be32 to)
+static __always_inline void csum_replace4(__sum16 *sum, __be32 from, __be32 to)
 {
 	__wsum tmp = csum_sub(~csum_unfold(*sum), (__force __wsum)from);
 
@@ -136,7 +137,7 @@ static inline void csum_replace4(__sum16 *sum, __be32 from, __be32 to)
  *  m : old value of a 16bit field
  *  m' : new value of a 16bit field
  */
-static inline void csum_replace2(__sum16 *sum, __be16 old, __be16 new)
+static __always_inline void csum_replace2(__sum16 *sum, __be16 old, __be16 new)
 {
 	*sum = ~csum16_add(csum16_sub(~(*sum), old), new);
 }
@@ -150,15 +151,15 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
 void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
 				     __wsum diff, bool pseudohdr);
 
-static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
-					    __be16 from, __be16 to,
-					    bool pseudohdr)
+static __always_inline
+void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
+			      __be16 from, __be16 to, bool pseudohdr)
 {
 	inet_proto_csum_replace4(sum, skb, (__force __be32)from,
 				 (__force __be32)to, pseudohdr);
 }
 
-static inline __wsum remcsum_adjust(void *ptr, __wsum csum,
+static __always_inline __wsum remcsum_adjust(void *ptr, __wsum csum,
 				    int start, int offset)
 {
 	__sum16 *psum = (__sum16 *)(ptr + offset);
@@ -175,12 +176,12 @@ static inline __wsum remcsum_adjust(void *ptr, __wsum csum,
 	return delta;
 }
 
-static inline void remcsum_unadjust(__sum16 *psum, __wsum delta)
+static __always_inline void remcsum_unadjust(__sum16 *psum, __wsum delta)
 {
 	*psum = csum_fold(csum_sub(delta, (__force __wsum)*psum));
 }
 
-static inline __wsum wsum_negate(__wsum val)
+static __always_inline __wsum wsum_negate(__wsum val)
 {
 	return (__force __wsum)-((__force u32)val);
 }
-- 
2.34.1