From: Eric Dumazet
Date: Fri, 26 May 2023 18:37:09 +0200
Subject: Re: x86 copy performance regression
To: Linus Torvalds
Cc: LKML, netdev
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 26, 2023 at 6:30 PM Linus Torvalds wrote:
>
> On Fri, May 26, 2023 at 8:00 AM Eric Dumazet wrote:
> >
> > We can see rep_movs_alternative() using more cycles in kernel profiles
> > than the previous variant (copy_user_enhanced_fast_string, which was
> > simply using "rep movsb"), and we can not reach line rate (as we
> > could before the series)
>
> Hmm. I assume the attached patch ends up fixing the regression?
>
> That hack to generate the two-byte 'jae' instruction even for the
> alternative is admittedly not pretty, but I just couldn't deal with
> the alternative that generated pointlessly bad code.
>
> We could make the constant in the comparison depend on whether it is
> for the unrolled or for the erms case too, I guess, but I think erms
> is probably "good enough" with 64-byte copies.
>
> I was really hoping we could avoid this, but hey, a regression is a regression.
>
> Can you verify this patch fixes things for you?

Hmm.. my build environment does not like this yet :)

arch/x86/lib/copy_user_64.S:40:30: error: unexpected token in argument list
0: alternative ".byte 0x73," ".Lunrolled" "-0b-2", ".byte 0x73," ".Llarge" "-0b-2", X86_FEATURE_ERMS
                             ^
make[3]: *** [scripts/Makefile.build:374: arch/x86/lib/copy_user_64.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:494: arch/x86/lib] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:2026: .] Error 2
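
For reference, the `.byte 0x73, <target>-0b-2` construct in the failing line hand-encodes a short `jae`: opcode 0x73 followed by a signed 8-bit displacement counted from the end of the two-byte instruction, which is why 2 is subtracted from the distance to label `0`. A minimal Python sketch of that arithmetic follows. The function name and the offsets are hypothetical and chosen only for illustration; they are not taken from the patch or from copy_user_64.S.

```python
def short_jae(insn_offset: int, target_offset: int) -> bytes:
    """Encode a two-byte 'jae rel8' placed at insn_offset that jumps to target_offset.

    Hypothetical helper for illustration only; offsets are byte positions
    within the same section.
    """
    # rel8 is relative to the address of the *next* instruction,
    # i.e. the start of this instruction plus its length (2 bytes).
    disp = target_offset - insn_offset - 2
    if not -128 <= disp <= 127:
        raise ValueError(f"target out of rel8 range: displacement {disp}")
    # 0x73 is the opcode for JAE/JNB rel8; the displacement is stored
    # as a two's-complement byte.
    return bytes([0x73, disp & 0xFF])


if __name__ == "__main__":
    # Forward jump from offset 0 to offset 0x20 (made-up numbers).
    print(short_jae(0x00, 0x20).hex())
    # Backward jump from offset 0x10 to offset 0 (made-up numbers).
    print(short_jae(0x10, 0x00).hex())
```

The range check mirrors the constraint that makes the hand-written encoding fragile: if either target ends up more than 127 bytes past the end of the jump, a two-byte form cannot represent it.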