Date: Sat, 25 Mar 2023 09:49:49 +0100
From: Greg KH
To: Julia Lawall
Cc: Alex Elder, Menna Mahmoud, outreachy@lists.linux.dev, johan@kernel.org,
    elder@kernel.org, linux-kernel@vger.kernel.org, linux-staging@lists.linux.dev
Subject: Re: [PATCH v2] staging: greybus: use inline function for macros
References: <20230321183456.10385-1-eng.mennamahmoud.mm@gmail.com>
    <2e869677-2693-6419-ea25-f0cc2efcf3dd@ieee.org>
    <5efa6e6d-8573-31de-639a-d15b2e9deca0@ieee.org>
    <48674d8f-9753-780c-f37c-f83ea2855ae6@ieee.org>

On Thu, Mar 23, 2023 at 10:52:35AM +0100, Julia Lawall wrote:
> 
> 
> On Thu, 23 Mar 2023, Greg KH wrote:
> 
> > On Wed, Mar 22, 2023 at 11:00:41AM +0100, Julia Lawall wrote:
> > > Greg raised the question of whether the inline function is really as
> > > efficient as a macro.
> > >
> > > I tried the following definitions:
> > >
> > > #define to_gbphy_dev(d) container_of(d, struct gbphy_device, dev)
> > >
> > > static inline struct gbphy_device *extra_to_gbphy_dev(const struct device *_dev)
> > > {
> > >         return container_of(_dev, struct gbphy_device, dev);
> > > }
> > >
> > > And the following uses:
> > >
> > > ssize_t macro_protocol_id_show(struct device *dev,
> > >                                struct device_attribute *attr, char *buf)
> > > {
> > >         struct gbphy_device *gbphy_dev = to_gbphy_dev(dev);
> > >
> > >         return sprintf(buf, "%c macro 0x%02x\n", *buf, gbphy_dev->cport_desc->protocol_id);
> > > }
> > > ssize_t extra_protocol_id_show(struct device *dev,
> > >                                struct device_attribute *attr, char *buf)
> > > {
> > >         struct gbphy_device *gbphy_dev = extra_to_gbphy_dev(dev);
> > >
> > >         return sprintf(buf, "extra 0x%02x %c\n", gbphy_dev->cport_desc->protocol_id, *buf);
> > > }
> > >
> > > They are a little bit different to avoid too much compiler optimization.
> > >
> > > After doing make drivers/staging/greybus/gbphy.s, I get similar looking
> > > code in both cases:
> > >
> > > Macro version:
> > >
> > >         .type   macro_protocol_id_show, @function
> > > macro_protocol_id_show:
> > >         endbr64
> > > 1:      call    __fentry__
> > >         .section __mcount_loc, "a",@progbits
> > >         .quad 1b
> > >         .previous
> > >         pushq   %rbp    #
> > >         movq    %rdx, %rbp      # tmp96, buf
> > >         pushq   %rbx    #
> > > # drivers/staging/greybus/gbphy.c:40: {
> > >         movq    %rdi, %rbx      # tmp95, dev
> > > # drivers/staging/greybus/gbphy.c:43: return sprintf(buf, "%c macro 0x%02x\n", *buf, gbphy_dev->cport_desc->protocol_id);
> > >         call    __sanitizer_cov_trace_pc        #
> > > # drivers/staging/greybus/gbphy.c:43: return sprintf(buf, "%c macro 0x%02x\n", *buf, gbphy_dev->cport_desc->protocol_id);
> > >         movq    -32(%rbx), %rax # MEM[(struct gbphy_device *)dev_7(D) + -40B].cport_desc, MEM[(struct gbphy_device *)dev_7(D) + -40B].cport_desc
> > > # drivers/staging/greybus/gbphy.c:43: return sprintf(buf, "%c macro 0x%02x\n", *buf, gbphy_dev->cport_desc->protocol_id);
> > >         movzbl  0(%rbp), %edx   # *buf_9(D), *buf_9(D)
> > >         movq    %rbp, %rdi      # buf,
> > >         movq    $.LC18, %rsi    #,
> > >         movzbl  3(%rax), %ecx   # _1->protocol_id, _1->protocol_id
> > >         call    sprintf #
> > > # drivers/staging/greybus/gbphy.c:44: }
> > >         movl    $13, %eax       #,
> > >         popq    %rbx    #
> > >         popq    %rbp    #
> > >         jmp     __x86_return_thunk
> > >         .size   macro_protocol_id_show, .-macro_protocol_id_show
> > >
> > > Function version:
> > >
> > >         .type   extra_protocol_id_show, @function
> > > extra_protocol_id_show:
> > >         endbr64
> > > 1:      call    __fentry__
> > >         .section __mcount_loc, "a",@progbits
> > >         .quad 1b
> > >         .previous
> > >         pushq   %rbp    #
> > >         movq    %rdx, %rbp      # tmp96, buf
> > >         pushq   %rbx    #
> > > # drivers/staging/greybus/gbphy.c:47: {
> > >         movq    %rdi, %rbx      # tmp95, dev
> > > # drivers/staging/greybus/gbphy.c:50: return sprintf(buf, "extra 0x%02x %c\n", gbphy_dev->cport_desc->protocol_id, *buf);
> > >         call    __sanitizer_cov_trace_pc        #
> > > # drivers/staging/greybus/gbphy.c:50: return sprintf(buf, "extra 0x%02x %c\n", gbphy_dev->cport_desc->protocol_id, *buf);
> > >         movq    -32(%rbx), %rax # MEM[(struct gbphy_device *)dev_8(D) + -40B].cport_desc, MEM[(struct gbphy_device *)dev_8(D) + -40B].cport_desc
> > > # drivers/staging/greybus/gbphy.c:50: return sprintf(buf, "extra 0x%02x %c\n", gbphy_dev->cport_desc->protocol_id, *buf);
> > >         movzbl  0(%rbp), %ecx   # *buf_9(D), *buf_9(D)
> > >         movq    %rbp, %rdi      # buf,
> > >         movq    $.LC19, %rsi    #,
> > >         movzbl  3(%rax), %edx   # _3->protocol_id, _3->protocol_id
> > >         call    sprintf #
> > > # drivers/staging/greybus/gbphy.c:51: }
> > >         movl    $13, %eax       #,
> > >         popq    %rbx    #
> > >         popq    %rbp    #
> > >         jmp     __x86_return_thunk
> > >         .size   extra_protocol_id_show, .-extra_protocol_id_show
> > >
> > > Both seem to access the memory directly.  Maybe the example is too simple,
> > > and the compiler is more likely to optimize aggressively?
> >
> > Nice, that shows that it is the same both ways, as the compiler version
> > you are using is smart enough.
> >
> > Which compiler and version is this?  Does it work the same for all of
> > the supported versions we have to support (i.e. really old gcc?)
>
> gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
>
> I got a similar result for gcc-5:
>
> macro_protocol_id_show:
> 1:      call    __fentry__
>         .section __mcount_loc, "a",@progbits
>         .quad 1b
>         .previous
>         movq    %rdx, %rax      # buf, buf
>         movq    -32(%rdi), %rdx # MEM[(struct gbphy_device *)dev_1(D) + -40B].cport_desc, MEM[(struct gbphy_device *)dev_1(D) + -40B].cport_desc
>         movq    $.LC19, %rsi    #,
>         movq    %rax, %rdi      # buf,
>         movzbl  3(%rdx), %ecx   # _3->protocol_id, D.44996
>         movzbl  (%rax), %edx    # *buf_6(D), D.44996
>         call    sprintf #
>         cltq
>         jmp     __x86_return_thunk
>         .size   macro_protocol_id_show, .-macro_protocol_id_show
>         .section        .text.unlikely
> .LCOLDE20:
>         .text
> .LHOTE20:
>         .section        .rodata.str1.1
> .LC21:
>         .string "extra 0x%02x %c\n"
>         .section        .text.unlikely
> .LCOLDB22:
>         .text
> .LHOTB22:
>         .p2align 6,,63
>         .globl  extra_protocol_id_show
>         .type   extra_protocol_id_show, @function
> extra_protocol_id_show:
> 1:      call    __fentry__
>         .section __mcount_loc, "a",@progbits
>         .quad 1b
>         .previous
>         movq    %rdx, %rax      # buf, buf
>         movzbl  (%rdx), %ecx    # *buf_3(D), D.45003
>         movq    -32(%rdi), %rdx # MEM[(struct gbphy_device *)dev_2(D) + -40B].cport_desc, MEM[(struct gbphy_device *)dev_2(D) + -40B].cport_desc
>         movq    $.LC21, %rsi    #,
>         movq    %rax, %rdi      # buf,
>         movzbl  3(%rdx), %edx   # _6->protocol_id, D.45003
>         call    sprintf #
>         cltq
>         jmp     __x86_return_thunk
>         .size   extra_protocol_id_show, .-extra_protocol_id_show
>         .section        .text.unlikely

Ah nice!  Thanks for checking.

Ok, so it's not a performance issue at all either way; compilers are
smarter than we think :)

> > For the most part, sysfs files are not on any sort of "fast path" so a
> > function call is fine, but as I mentioned before, sometimes we are
> > forced to move calls to container_of() to container_of_const() and that
> > cannot be an inline function, but must remain a macro :(
>
> It seems that this is because there is not a unique return type, but not a
> performance issue?

Yes, container_of_const() will return a different type (const or not)
depending on what is passed into it.  This allows the "const-ness" of a
pointer to be kept across the call, whereas container_of() will always
cast away the const, which can be dangerous if you don't know what you
are doing.

thanks,

greg k-h
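
The const-preserving behaviour described above is also why container_of_const()
has to stay a macro: a C function, inline or not, has exactly one return type,
while a _Generic selection can pick a different cast depending on whether the
pointer passed in is const-qualified. A rough sketch of the idea, simplified
from the container_of_const() definition in include/linux/container_of.h (see
that header for the exact, authoritative version), looks like this:

    /* Sketch only: simplified from include/linux/container_of.h.
     * _Generic selects the const-qualified cast when @ptr points to
     * const data and the plain cast otherwise, which is something a
     * single inline function cannot express. */
    #define container_of_const(ptr, type, member)                           \
            _Generic(ptr,                                                   \
                    const typeof(*(ptr)) *:                                 \
                            ((const type *)container_of(ptr, type, member)),\
                    default:                                                \
                            ((type *)container_of(ptr, type, member))       \
            )

With this, a const struct device * comes back as a const struct gbphy_device *,
while a non-const pointer stays non-const, so the caller cannot silently drop a
const qualifier the way a plain container_of() call would.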