Date: Wed, 4 Oct 2023 18:40:42 +0200
From: Ingo Molnar
To: Uros Bizjak
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Andy Lutomirski,
    Nadav Amit, Brian Gerst, Denys Vlasenko, "H. Peter Anvin",
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
    Josh Poimboeuf
Subject: Re: [PATCH 4/4] x86/percpu: Use C for percpu read/write accessors
References: <20231004145137.86537-1-ubizjak@gmail.com>
    <20231004145137.86537-5-ubizjak@gmail.com>

* Ingo Molnar wrote:

>
> * Uros Bizjak wrote:
>
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows to use C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
> >
> > Some examples of propagations:
> >
> > a) into sign/zero extensions:
> >
> >   110b54:  65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
> >   11ab90:  65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
> >   14484a:  65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
> >   1a08a9:  65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
> >   1a08f9:  65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
> >
> >   4ab29a:  65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
> >   4be128:  65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
> >   547468:  65 48 63 1f             movslq %gs:(%rdi),%rbx
> >   5474e7:  65 48 63 0a             movslq %gs:(%rdx),%rcx
> >   54d05d:  65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx
>
> Could you please also quote a 'before' assembly sequence, at least once
> per group of propagations?

Ie. for any changes to x86 code generation, please follow the changelog
format of:

  7c097ca50d2b ("x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op")

    ...
    Move the load of %rsi outside inline asm, so the compiler can
    reuse the value. The code in slub.o improves from:

        55ac:  49 8b 3c 24       mov    (%r12),%rdi
        55b0:  48 8d 4a 40       lea    0x40(%rdx),%rcx
        55b4:  49 8b 1c 07       mov    (%r15,%rax,1),%rbx
        55b8:  4c 89 f8          mov    %r15,%rax
        55bb:  48 8d 37          lea    (%rdi),%rsi
        55be:  e8 00 00 00 00    callq  55c3 <...>
                 55bf: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
        55c3:  75 a3             jne    5568 <...>
        55c5:  ...

      0000000000000000 <.altinstr_replacement>:
        5:     65 48 0f c7 0f    cmpxchg16b %gs:(%rdi)

    to:

        55ac:  49 8b 34 24       mov    (%r12),%rsi
        55b0:  48 8d 4a 40       lea    0x40(%rdx),%rcx
        55b4:  49 8b 1c 07       mov    (%r15,%rax,1),%rbx
        55b8:  4c 89 f8          mov    %r15,%rax
        55bb:  e8 00 00 00 00    callq  55c0 <...>
                 55bc: R_X86_64_PLT32  this_cpu_cmpxchg16b_emu-0x4
        55c0:  75 a6             jne    5568 <...>
        55c2:  ...

    Where the alternative replacement instruction now uses %rsi:

      0000000000000000 <.altinstr_replacement>:
        5:     65 48 0f c7 0e    cmpxchg16b %gs:(%rsi)

    The instruction (effectively a reg-reg move) at 55bb: in the original
    assembly is removed. Also, both the CALL and replacement CMPXCHG16B
    are 5 bytes long, removing the need for NOPs in the asm code.
    ...

Thanks,

	Ingo