From: David Laight
To: 'David Howells'
CC: Linus Torvalds, Al Viro, Jens Axboe, Christoph Hellwig, Christian Brauner, Matthew Wilcox, Jeff Layton, "linux-fsdevel@vger.kernel.org", "linux-block@vger.kernel.org", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
Date: Fri, 18 Aug 2023 21:39:20 +0000
Message-ID: <04ee44bc6c2d4c5bb1c143bcb6803b7b@AcuMS.aculab.com>
In-Reply-To: <2093413.1692377320@warthog.procyon.org.uk>

From: David Howells
> Sent: Friday, August 18, 2023 5:49 PM
>
> David Laight wrote:
>
> > > iov_iter_init inc 0x27 -> 0x31 +0xa
> >
> > Are you hitting the gcc bug that loads the
> > constant from memory?
>
> I'm not sure what that looks like.  For your perusal, here's a disassembly of
> the use-switch-on-enum variant:
>
>    0xffffffff8177726c <+0>:     cmp    $0x1,%esi
>    0xffffffff8177726f <+3>:     jbe    0xffffffff81777273
>    0xffffffff81777271 <+5>:     ud2
>    0xffffffff81777273 <+7>:     test   %esi,%esi
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff8177727e <+18>:    xor    %eax,%eax
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
>    0xffffffff81777288 <+28>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177728c <+32>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777290 <+36>:    mov    %r8,0x18(%rdi)
>    0xffffffff81777294 <+40>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777298 <+44>:    jmp    0xffffffff81d728a0 <__x86_return_thunk>
>
> versus the use-bitmap variant:
>
>    0xffffffff81777311 <+0>:     cmp    $0x1,%esi
>    0xffffffff81777314 <+3>:     jbe    0xffffffff81777318
>    0xffffffff81777316 <+5>:     ud2
>    0xffffffff81777318 <+7>:     test   %esi,%esi
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)
>    0xffffffff81777321 <+16>:    xor    %eax,%eax
>    0xffffffff81777323 <+18>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777327 <+22>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177732b <+26>:    mov    %r8,0x18(%rdi)
>    0xffffffff8177732f <+30>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777333 <+34>:    jmp    0xffffffff81d72960 <__x86_return_thunk>
>
> It seems to be that the former is loading byte constants individually, whereas
> Linus combined all those fields into a single byte and eliminated one of them.

I think you need to re-order the structure.
The top set writes to bytes 0..4 with:
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
Note that the 'setne' writes into the middle of the constants.
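
To make the layout point concrete, here is a rough C sketch of the two
orderings. It is only an illustration: the struct and field names are
hypothetical (they are not the real struct iov_iter members), and the
values simply mirror the stores in the first listing. Whether a given
compiler actually merges the grouped constant stores is an assumption
that has to be checked in the generated code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout matching the stores in the first listing:
 * constants at offsets 0..2 and 4, the run-time flag at offset 3. */
struct hdr_mixed {
	uint8_t k0;	/* offset 0: constant 1 (low byte of the movw) */
	uint8_t k1;	/* offset 1: constant 0 (high byte of the movw) */
	uint8_t k2;	/* offset 2: constant 0 (movb) */
	bool    cond;	/* offset 3: written by setne */
	uint8_t k4;	/* offset 4: constant 1 (movb) */
};

/* Re-ordered layout: the four constants are contiguous at 0..3 and
 * the run-time flag follows them at offset 4. */
struct hdr_grouped {
	uint8_t k0;
	uint8_t k1;
	uint8_t k2;
	uint8_t k3;
	bool    cond;
};

static_assert(offsetof(struct hdr_mixed, cond) == 3,
	      "flag sits between the constants");
static_assert(offsetof(struct hdr_grouped, cond) == 4,
	      "flag follows the constants");

/* The flag store splits the constant bytes into two separate runs. */
void init_mixed(struct hdr_mixed *h, unsigned int dir)
{
	h->k0 = 1;
	h->k1 = 0;
	h->k2 = 0;
	h->cond = dir != 0;
	h->k4 = 1;
}

/* The constants form one aligned 4-byte run, so a compiler is free to
 * emit a single 32-bit store for them (not guaranteed). */
void init_grouped(struct hdr_grouped *h, unsigned int dir)
{
	h->k0 = 1;
	h->k1 = 0;
	h->k2 = 0;
	h->k3 = 1;
	h->cond = dir != 0;
}
```

Comparing the output of something like 'gcc -O2 -S' for the two init
functions shows whether the grouped layout really ends up with fewer
stores on a particular compiler.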
The lower set writes bytes 0..1 with:
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)

I think that if you move the 'conditional' value to offset 4 you'll
get fewer writes.
Probably a 32-bit load into %eax and then a write.

I don't think gcc likes generating 16-bit immediates.
In some tests I did, it loaded a 32-bit value into %eax and then
wrote the low bits.
So the code is much the same (on x86) for 2 or 4 bytes of constants.
I'm sure you can use the 'data-16' prefix with an immediate.

I'm not sure why you have two non-zero values when Linus only had one, though.

OTOH you don't want to be writing 3 bytes of constants.
Also gcc won't generate:
	movl $0xaabbccdd,%eax
	setne %al  // overwriting the dd
	movl %eax,(%rdi)
and I suspect the partial write (to %al) will be a stall.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)