From: David Laight
To: 'David Howells'
CC: 'Linus Torvalds', Borislav Petkov, kernel test robot,
    "oe-lkp@lists.linux.dev", "lkp@intel.com",
    "linux-kernel@vger.kernel.org", Christian Brauner, Alexander Viro,
    Jens Axboe, Christoph Hellwig, Christian Brauner, Matthew Wilcox,
    "ying.huang@intel.com", "feng.tang@intel.com", "fengwei.yin@intel.com"
Subject: RE: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
Date: Thu, 16 Nov 2023 11:38:08 +0000
In-Reply-To: <97468.1700129643@warthog.procyon.org.uk>
References: <4c0c3ee6cfa84d21a807055bc1aa27b8@AcuMS.aculab.com>
    <202311061616.cd495695-oliver.sang@intel.com>
    <3865842.1700061614@warthog.procyon.org.uk>
    <20231115190938.GGZVUXcuUjI3i1JRAB@fat_crate.local>
    <97468.1700129643@warthog.procyon.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

From: David Howells
> Sent: 16 November 2023 10:14
>
> David Laight wrote:
>
> > On haswell (which is now quite old) both 'rep movsb' and
> > 'rep movsq' copy 16 bytes/clock unless the destination
> > is 32 byte aligned when they copy 32 bytes/clock.
> > Source alignment makes no difference, neither does byte
> > alignment.
>
> I think the i3-4170 cpu I'm using is Haswell.  Does that mean for my
> particular cpu, just using inline "rep movsb" is the best choice?

I've just looked at a slightly old copy of the instruction timing doc from
https://www.agner.org/optimize

Apart from P4 (130 clock setup!) the setup cost for 'rep movs'
is relatively small.
I think everything since sandy bridge and bulldozer (except atoms,
but including silvermont) does fast copies for 'rep movsb'.
(But the C2758 atom we use claims erms.)

I'd bet that the overhead of using 'rep movsb' for a short copy
is less than that of the mispredicted branch (or two) needed to
select the required code.

That rather implies always using 'rep movsb' is best unless someone
is compiling explicitly for an old cpu.
In that case (and apart from P4) an explicit 'rep movsl' will be
fastest because its setup cost is minimal/zero.

The cutoff for using 'rep movsb' for constant sized copies is
probably also a lot less than you might expect, especially
assuming a cold cache.

This all makes that POS that gcc is inlining even more stupid.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
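
For illustration only, an unconditional inline 'rep movsb' copy of the kind
discussed above might look roughly like the sketch below. The helper name
copy_rep_movsb() is hypothetical and is not taken from the kernel or from
the patch under discussion. It assumes GCC or Clang extended asm on x86 and
falls back to memcpy() elsewhere so that it still builds on other
architectures.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * copy 'len' bytes from 'src' to 'dst' with a single 'rep movsb' and no
 * size-based branching, then return the original destination pointer.
 */
static inline void *copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	void *ret = dst;

	/*
	 * 'rep movsb' copies (r/e)cx bytes from [(r/e)si] to [(r/e)di].
	 * All three registers are modified, hence the "+" constraints.
	 * The ABI guarantees the direction flag is clear on entry.
	 */
	__asm__ volatile("rep movsb"
			 : "+D" (dst), "+S" (src), "+c" (len)
			 :
			 : "memory");
	return ret;
#else
	/* Non-x86 fallback so the sketch compiles everywhere. */
	return memcpy(dst, src, len);
#endif
}
```

The point of the sketch is that there is no length check before the copy:
every size, including zero, goes through the same instruction, so there is
no branch to mispredict. Whether that actually wins on a given cpu would
still need measuring.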