From: Linus Torvalds
Date: Fri, 17 Nov 2023 08:36:03 -0500
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
To: David Laight
Cc: Borislav Petkov, David Howells, kernel test robot,
    "oe-lkp@lists.linux.dev", "lkp@intel.com",
    "linux-kernel@vger.kernel.org", Christian Brauner, Alexander Viro,
    Jens Axboe, Christoph Hellwig, Christian Brauner, Matthew Wilcox,
    "ying.huang@intel.com", "feng.tang@intel.com",
    "fengwei.yin@intel.com", linux-toolchains ML
In-Reply-To: <67acdf70c3fd4adf9bc0dddd4b10d4a1@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 17 Nov 2023 at 08:09, David Laight wrote:
>
> Zero length copies are different, they always take ~60 clocks.

That zero-length thing is some odd microcode implementation issue,
and I think intel actually made a FZRM cpuid bit available for it
("Fast Zero-size Rep Movs").

I don't think we care in the kernel, but somebody else did (or maybe
Intel added a flag for "we fixed it" just because they noticed)

I at some point did some profiling, and we do have zero-length memcpy
cases occasionally (at least for user copies, which was what I was
looking at), but they aren't common enough to worry about some small
extra strange overhead.

(In case you care, it was for things like an ioctl doing "copy the
base part of the ioctl data, then copy the rest separately". Where
"the rest" was then often nothing at all).

> My current guess for the 5000 clocks is that the logic to
> decode 'rep movsb' is loaded into a buffer that is also used
> to decode some other instructions.

Unlikely.

I would guess it's the "power up the AVX2 side". The memory copy uses
those same resources internally.

You could try to see if "first AVX memory access" (or similar) has
the same extra initial cpu cycle issue.

Anyway, the CPU you are testing is new enough to have ERMS - that's
the "we do pretty well on string instructions" flag. It does indeed
do pretty well on string instructions, but has a few oddities in
addition to the zero-sized thing.

The other bad cases tend to be along the line of "it falls flat on
its face when the source and destination address are not mutually
aligned, but they are the same virtual address modulo 4096".

Or something like that. I forget the exact details. The details do
exist, but I forget where (I suspect either Agner Fog or some
footnote in some Intel architecture manual).

So it's very much not as simple as "fixed initial cost and then a
fairly fixed cost per 32B", even if that is *one* pattern.

             Linus
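
For reference, the two CPUID flags mentioned above (ERMS and FZRM) can
be decoded roughly as in the sketch below. The bit positions follow
Intel's documented layout of CPUID leaf 7 (ERMS: subleaf 0, EBX bit 9;
FZRM: subleaf 1, EAX bit 10). The struct and function names are
hypothetical and are not kernel identifiers; the function only decodes
register values the caller has already read, so it makes no claim about
how any particular CPU behaves.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 7, subleaf 0, EBX bit 9: Enhanced REP MOVSB/STOSB (ERMS). */
#define CPUID7_0_EBX_ERMS (UINT32_C(1) << 9)
/* CPUID leaf 7, subleaf 1, EAX bit 10: fast zero-length REP MOVSB (FZRM). */
#define CPUID7_1_EAX_FZRM (UINT32_C(1) << 10)

/* Hypothetical container for the two flags discussed in this thread. */
struct rep_movs_flags {
    bool erms;
    bool fzrm;
};

/*
 * Decode the flags from raw register values.
 *
 * leaf7_0_ebx: EBX returned by CPUID with EAX=7, ECX=0.
 * leaf7_1_eax: EAX returned by CPUID with EAX=7, ECX=1. The caller
 *              should pass 0 here when subleaf 1 is not supported
 *              (leaf 7 subleaf 0 reports the maximum subleaf in EAX).
 */
static struct rep_movs_flags
decode_rep_movs_flags(uint32_t leaf7_0_ebx, uint32_t leaf7_1_eax)
{
    struct rep_movs_flags flags;

    flags.erms = (leaf7_0_ebx & CPUID7_0_EBX_ERMS) != 0;
    flags.fzrm = (leaf7_1_eax & CPUID7_1_EAX_FZRM) != 0;
    return flags;
}
```

On x86 with GCC or Clang, the register values could be obtained with
__get_cpuid_count() from <cpuid.h>; keeping the decode step separate
means it can be checked on any architecture.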