From: Eric Dumazet
Date: Fri, 26 May 2023 20:55:22 +0200
Subject: Re: x86 copy performance regression
To: Linus Torvalds
Cc: LKML, netdev
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 26, 2023 at 8:33 PM Linus Torvalds wrote:
>
> On Fri, May 26, 2023 at 10:51 AM Eric Dumazet wrote:
> >
> > Hmmm
> >
> > [   25.532236] RIP: 0010:0xffffffffa5a85134
> > [   25.536173] Code: Unable to access opcode bytes at 0xffffffffa5a8510a.
>
> This was the other reason I really didn't want to use alternatives on
> the conditional branch instructions. The relocations are really not
> very natural, and we have odd rules for those things. So I suspect our
> instruction rewriting simply gets this wrong, because that's such a
> nasty pattern.
>
> I really wanted my "just hardcode the instruction bytes" to work. Not
> only did it get me the small 2-byte conditional jump, it meant that
> there was no relocation on it. But objtool really hates not
> understanding what the alternatives code does.
>
> Which is fair enough, but it's frustrating here when it only results
> in more problems.
>
> Anyway, I guess *this* avoids all issues. It creates an extra jump to
> a jump for the case where the CPU doesn't have ERMS, but I guess we
> don't really care about those CPUs anyway.
>
> And it avoids all the "alternative instructions have relocations"
> issues. And it creates all small two-byte jumps, and the "rep movsb"
> fits exactly on that same 2 bytes too. Which I guess all argues for
> this being what I should have started with.
>
> This time it *really* works.
>

Indeed, this one is working and fixes the issue for me, thanks a lot!

New numbers look similar to 6.3 ones.

Tested-by: Eric Dumazet

 Performance counter stats for 'taskset 02 ./tcp_mmap -H 2002:a05:6608:297::':

          2,833.29 msec task-clock                #    0.970 CPUs utilized
             1,065      context-switches          #  375.888 /sec
                 1      cpu-migrations            #    0.353 /sec
               128      page-faults               #   45.177 /sec
    10,297,389,329      cycles                    #    3.634 GHz
     7,213,189,594      instructions              #    0.70  insn per cycle
     1,220,821,121      branches                  #  430.884 M/sec
        10,430,907      branch-misses             #    0.85% of all branches

       2.921180547 seconds time elapsed

       0.005304000 seconds user
       2.478561000 seconds sys
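
For anyone comparing these numbers against another run, the derived
columns on the right-hand side of the perf output can be recomputed from
the raw counters. The snippet below is only a sketch: the dictionary keys
and the derive_metrics() helper are names made up for illustration, not
part of perf, and the formulas are the usual ratios (task-clock over wall
time, cycles over task-clock, and so on), which appear consistent with
the values printed above.

```python
# Raw counters copied from the perf stat output in this message.
counters = {
    "task_clock_msec": 2833.29,
    "elapsed_sec": 2.921180547,
    "cycles": 10_297_389_329,
    "instructions": 7_213_189_594,
    "branches": 1_220_821_121,
    "branch_misses": 10_430_907,
}


def derive_metrics(c):
    """Recompute perf's derived columns (hypothetical helper, not a perf API)."""
    task_clock_sec = c["task_clock_msec"] / 1000.0
    return {
        # CPU time consumed divided by wall-clock time.
        "cpus_utilized": task_clock_sec / c["elapsed_sec"],
        # Cycles per second of CPU time, expressed in GHz.
        "ghz": c["cycles"] / task_clock_sec / 1e9,
        # Instructions retired per cycle.
        "insn_per_cycle": c["instructions"] / c["cycles"],
        # Mispredicted branches as a percentage of all branches.
        "branch_miss_pct": 100.0 * c["branch_misses"] / c["branches"],
    }


metrics = derive_metrics(counters)
for name, value in metrics.items():
    print(f"{name:>16}: {value:.3f}")
```

Feeding the counters from a different kernel into the same helper gives a
like-for-like comparison of insn per cycle and branch-miss rate.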