From: Linus Torvalds
Date: Thu, 16 Nov 2023 12:24:58 -0500
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
To: David Laight
Cc: David Howells, Borislav Petkov, kernel test robot, "oe-lkp@lists.linux.dev", "lkp@intel.com", "linux-kernel@vger.kernel.org", Christian Brauner, Alexander Viro, Jens Axboe, Christoph Hellwig, Christian Brauner, Matthew Wilcox, "ying.huang@intel.com", "feng.tang@intel.com", "fengwei.yin@intel.com"
In-Reply-To: <4cfd4808cc694f169aa8b83547ebc74d@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 16 Nov 2023 at 11:55, David
Laight wrote:
>
> I presume lack of coffee is responsible for the s/movs/stos/ :-)

Yes.

> How much difference does FSRM actually make?
> Especially when compared to the cost of a function call (even
> without the horrid return thunk).

It can be a big deal. The subject line here is an example. On that
machine, using the call to 'memcpy_orig' clearly performs *noticeably*
better. So that 16% regression was apparently at least partly because
of

    -11.0   perf-profile.self.cycles-pp.memcpy_orig
    +14.7   perf-profile.self.cycles-pp.copy_page_from_iter_atomic

where that inlined copy (that used 'rep movsq' and other things around
it) was noticeably worse than just calling memcpy_orig that does a
basic unrolled loop.

Now, *why* it matters a lot is unclear. Some machines literally have
the "fast rep string" code disabled, and then "rep movsb" is just
horrendous. That's arguably a machine setup issue, but people have
been known to do those things because of problems (most recently
"reptar").

And in most older microarchitectures it's not just the cycles in the
repeat thing, it is also a pipeline stall and I think it's also a
(partial? full?) barrier for OoO execution. That pipeline stall was
most noticeable on P4, but it's most definitely there on other cores
too.

And the OoO execution barrier can mean that it *benchmarks* fairly well
when you just do "rep movs" in a loop to test, but then if you have
code *around* it, it causes problems for the instructions around it.

I have this memory from my "push for -Os" (which is from over a decade
ago, so take my memory with a pinch of salt) of seeing "rep movsb"
followed by a load of the result causing a horrid stall on the load. A
regular load-store loop will have the store data forwarded to any
subsequent load, but "rep movs" might not do that and if it works on a
cacheline level you might lose out on those kinds of things.

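[Editorial illustration, not part of the original message.] To make the
two copy strategies being compared concrete, here is a minimal
user-space sketch in C. It assumes x86-64 with GCC or Clang, and all
three function names (copy_rep_movsb, copy_unrolled, copies_match) are
hypothetical. It is not the kernel's memcpy_orig or the inlined
'rep movsq' path, and it is not a benchmark; it only shows the shape of
"one rep string instruction" versus "a plain unrolled load/store loop".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the whole copy is one "rep movsb" instruction. */
static void *copy_rep_movsb(void *dst, const void *src, size_t len)
{
	void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* RDI = destination, RSI = source, RCX = byte count. */
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
#else
	memcpy(dst, src, len);	/* assumption: fallback so this builds elsewhere */
#endif
	return ret;
}

/* Hypothetical helper: ordinary loads and stores, unrolled 4 x 8 bytes. */
static void *copy_unrolled(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len >= 32) {
		uint64_t a, b, c, e;

		/* Fixed-size memcpy is the portable way to do unaligned 8-byte accesses. */
		memcpy(&a, s, 8);
		memcpy(&b, s + 8, 8);
		memcpy(&c, s + 16, 8);
		memcpy(&e, s + 24, 8);
		memcpy(d, &a, 8);
		memcpy(d + 8, &b, 8);
		memcpy(d + 16, &c, 8);
		memcpy(d + 24, &e, 8);
		s += 32;
		d += 32;
		len -= 32;
	}
	while (len--)		/* byte-wise tail */
		*d++ = *s++;
	return dst;
}

/* Hypothetical self-check: returns 1 if both helpers copied len bytes correctly. */
static int copies_match(size_t len)
{
	unsigned char src[256], a[256], b[256];
	size_t i;

	if (len > sizeof(src))
		return 0;
	for (i = 0; i < len; i++)
		src[i] = (unsigned char)(i * 7 + 1);
	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	copy_rep_movsb(a, src, len);
	copy_unrolled(b, src, len);
	return memcmp(a, src, len) == 0 && memcmp(b, src, len) == 0;
}
```

Timing these two helpers in a tight loop would run into exactly the
caveat described above: an isolated loop says little about how either
variant interacts with the surrounding code. An optimizing compiler may
also rewrite the unrolled loop, so the generated assembly is worth
checking before drawing any conclusion.
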
Don't get me wrong - I really like the rep string instructions, and
while they have issues I'd *love* for CPUs to basically do "memcpy"
and "memset" without any library call overhead. The security
mitigations have made indirect calls much worse, but they have made
regular function call overhead worse too (and there's the I$ footprint
thing etc etc).

So I like "rep movs" a lot when it works well, but it most definitely
does not work well everywhere.

Of course, while the kernel test robot doesn't seem to like the
inlined "rep movsq", clearly the machine David is on absolutely
*hates* the call to memcpy_orig. Possibly due to mitigation overhead.

The problem with code generation at this level is that you win some,
you lose some. You can seldom make everybody happy.

              Linus