From: Rasmus Villemoes
Date: Wed, 28 Sep 2022 09:24:04 +0200
Subject: Re: [PATCH v3] x86, mem: move memmove to out of line assembler
To: Nick Desaulniers, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
Cc: x86@kernel.org, "H . Peter Anvin", Peter Zijlstra, Kees Cook, linux-kernel@vger.kernel.org, Linus Torvalds, llvm@lists.linux.dev, Andy Lutomirski
References: <202209271333.10AE3E1D@keescook> <20220927210248.3950201-1-ndesaulniers@google.com>
In-Reply-To: <20220927210248.3950201-1-ndesaulniers@google.com>

On 27/09/2022 23.02, Nick Desaulniers wrote:

> +	/* Decide forward/backward copy mode */
> +	cmpl	dest, src
> +	jb	.Lbackwards_header

I know you're mostly just moving existing code, but for my own education
I'd like to understand this.

> +	/*
> +	 * movs instruction have many startup latency
> +	 * so we handle small size by general register.
> +	 */
> +	cmpl	$680, n
> +	jb	.Ltoo_small_forwards

OK, this I get, there's some overhead, and hence we need _some_ cutoff
value; 680 is probably chosen by some trial-and-error, but the exact
value likely doesn't matter too much.

> +	/*
> +	 * movs instruction is only good for aligned case.
> +	 */
> +	movl	src, tmp0
> +	xorl	dest, tmp0
> +	andl	$0xff, tmp0
> +	jz	.Lforward_movs

But this part I don't understand at all. This checks that the src and
dest have the same %256 value, which is a rather odd thing, and very
unlikely to ever be hit in practice. I could understand if it checked
that they were both 4 or 8 or 16-byte aligned (i.e., (src|dest)&FOO),
or if it checked that they had the same offset within a cacheline [say
(src^dest)&0x3f].

Any idea where that comes from? Or am I just incapable of reading x86 asm?
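
For reference, the three checks contrasted above can be written out in C. This is only an illustrative sketch, not code from the patch: the function names are hypothetical, and the 64-byte cacheline size behind the 0x3f mask is an assumption.

```c
#include <stdint.h>

/* What the quoted asm tests: src and dest agree in their low 8 bits. */
static inline int same_mod_256(uintptr_t src, uintptr_t dest)
{
	return ((src ^ dest) & 0xff) == 0;
}

/* Both pointers aligned to `align`, which must be a power of two (4, 8, 16). */
static inline int both_aligned(uintptr_t src, uintptr_t dest, uintptr_t align)
{
	return ((src | dest) & (align - 1)) == 0;
}

/* Same offset within a cacheline, assuming 64-byte lines (mask 0x3f). */
static inline int same_cacheline_offset(uintptr_t src, uintptr_t dest)
{
	return ((src ^ dest) & 0x3f) == 0;
}
```

With src = 0x1010 and dest = 0x2050, both pointers are 16-byte aligned and share a cacheline offset, yet the %256 test fails because the low bytes (0x10 and 0x50) differ.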

> +.Ltoo_small_forwards:
> +	subl	$0x10, n
> +
> +	/*
> +	 * We gobble 16 bytes forward in each loop.
> +	 */
> +.L16_byteswap_forwards_loop:
> +	subl	$0x10, n
> +	movl	0*4(src), tmp0
> +	movl	1*4(src), tmp1
> +	movl	tmp0, 0*4(dest)
> +	movl	tmp1, 1*4(dest)
> +	movl	2*4(src), tmp0
> +	movl	3*4(src), tmp1
> +	movl	tmp0, 2*4(dest)
> +	movl	tmp1, 3*4(dest)
> +	leal	0x10(src), src
> +	leal	0x10(dest), dest
> +	jae	.L16_byteswap_forwards_loop
> +	addl	$0x10, n
> +	jmp	.L16_byteswap
> +
> +	/*
> +	 * Handle data forward by movs.
> +	 */
> +.p2align 4
> +.Lforward_movs:
> +	movl	-4(src, n), tmp0
> +	leal	-4(dest, n), tmp1
> +	shrl	$2, n
> +	rep	movsl
> +	movl	tmp0, (tmp1)
> +	jmp	.Ldone

So in the original code, %1 was forced to be %esi and %2 was forced to
be %edi and they were initialized by src and dest. But here I fail to
see how those registers have been properly set up before the rep movs;
your names for those are tmp0 and tmp2. You have just loaded the last
word of the source to %edi, and AFAICT %esi aka tmp2 is entirely
uninitialized at this point (the only use is in L16_byteswap).

I must be missing something. Please enlighten me.

Rasmus
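
As a reading aid, the data movement that the quoted .Lforward_movs block appears to intend can be modelled in C. This is a hedged sketch only: the function name is hypothetical, it assumes n >= 4 and that a forward copy is safe (dest below src, or no overlap), and it says nothing about register assignment. On x86, rep movsl implicitly reads through %esi, writes through %edi and counts in %ecx, which is exactly what the question above is about.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the forward movs path; not code from the patch. */
static inline void forward_movs_model(unsigned char *dest,
				      const unsigned char *src, size_t n)
{
	uint32_t tail;
	unsigned char *tail_dst = dest + n - 4;	/* leal -4(dest, n), tmp1 */
	size_t i, dwords = n >> 2;		/* shrl $2, n */

	/* movl -4(src, n), tmp0: save the last 4 source bytes up front,
	 * since an overlapping copy may overwrite them below. */
	memcpy(&tail, src + n - 4, 4);

	/* rep movsl: copy n/4 dwords, lowest address first. */
	for (i = 0; i < dwords; i++) {
		uint32_t w;

		memcpy(&w, src + 4 * i, 4);
		memcpy(dest + 4 * i, &w, 4);
	}

	/* movl tmp0, (tmp1): store the saved tail, which also covers the
	 * n % 4 bytes the dword loop did not reach. */
	memcpy(tail_dst, &tail, 4);
}
```

Saving the tail before the bulk copy is what lets the final store handle lengths that are not a multiple of 4 without a separate byte loop.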