Received: by 2002:a05:7412:b130:b0:e2:908c:2ebd with SMTP id az48csp2217684rdb; Mon, 20 Nov 2023 05:33:24 -0800 (PST) X-Google-Smtp-Source: AGHT+IG0aHuBy9Kk1q7f6qN/rjP1fuRwiolenAbJkth4tBbMTcBm+BdFQzhpz1zodvZ8Q1YOF9e9 X-Received: by 2002:a05:6a00:4c81:b0:6bb:aaf:d7db with SMTP id eb1-20020a056a004c8100b006bb0aafd7dbmr9829596pfb.29.1700487204416; Mon, 20 Nov 2023 05:33:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700487204; cv=none; d=google.com; s=arc-20160816; b=DEbJd1F54q8aEiMpaGe+6AzyFEJUapwb1WDlFyo+sj/jwes2zoVyHusyzmG5hMa8cm cWGm5XTZzq6vcQd2DhEAWNV3zScRjCdCp1Kddbm4B5jq47MEjG6+mNs9GG66aLJhrL8l +4yI0i7mDx/CXmMkcLPHQ4JFOmKOMm3f3QQE0C/kYWlgPGMnI/rs7cqNNErU3FX1TkJG JxPaIoJDRISbD2EJUi+AC+s7yCNJII9xjD3F/8lNLPCEOxk6OmyHWcfUu8Z4g14jxiJ1 AS3Ij3HgD9v/l3kSaYSf/AY5/W48hitHwVN8dIT3eMu9Dh98q86Gw6Yg3uD0epNWVJRW ykPw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:content-id:mime-version:subject :cc:to:references:in-reply-to:from:organization:dkim-signature; bh=GKAjxb5L8GXcEiZp1p1xgiANkfwmq+6HRFgDC3/1RH8=; fh=bs4gBEdRv/LpQG4o42LMtvHwD9MQgsarePZqPBxmj54=; b=wYefnnAPSbYnpQSJjsP7KEnTGr+SS5kKIGiOAj6M/eu09a98TjQmtXP7WV9lYjVf5d NmPO2Qzbd+qHqjetE8y5lEVne9BtfgoKb7Yi10E3D12tee5f6voMobxyvSL2KhUQ3jAq ZtlbtYuaYK6E7Dp00pd/eMP9Ejm8rhZTJ+iJEDqobJHQHG9XRW1ZVDt2qQjRbL7EF2+N VeiwGRqG7zzOcv1ESjM0kyxTiNeHfyivoDFcXN2mxND4lUO86Xfd9CYZtE7aQqsHQttY /raQRS2wZ2b0mf5iD825SjkoGyMCLaONdFyYjpqTuY6uI6800WnlxxvFfhT0eA6vIZFB sPwg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=JHzQjg1R; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id fa16-20020a056a002d1000b006be30cdc3d8si7986463pfb.163.2023.11.20.05.33.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Nov 2023 05:33:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=JHzQjg1R; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 8B9ED80D8F4A; Mon, 20 Nov 2023 05:33:22 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232565AbjKTNdL (ORCPT + 99 others); Mon, 20 Nov 2023 08:33:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232135AbjKTNdK (ORCPT ); Mon, 20 Nov 2023 08:33:10 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E7ABD2 for ; Mon, 20 Nov 2023 05:33:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700487185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=GKAjxb5L8GXcEiZp1p1xgiANkfwmq+6HRFgDC3/1RH8=; b=JHzQjg1RWJmHk0kdGCcnbuJXhX80SajqMv6Jd9Lp2q6APpw8QdcZEhAPMlB6a75qCuLHFM PBOgF1stxwyBPFS2sQDwAX6FBu0cgpDbUUIimbw9VDTpRgbBGzdZYQi4qjVnQLfeOaRbLR zxt8NPnWYHhjWEuqGrxSLu2fMkYM4wA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-677-WX_z4KetNXKpjlICiS3MVQ-1; Mon, 20 Nov 2023 08:33:02 -0500 X-MC-Unique: WX_z4KetNXKpjlICiS3MVQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AD5308F15C7; Mon, 20 Nov 2023 13:33:01 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.42.28.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6F889C1596F; Mon, 20 Nov 2023 13:32:58 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells In-Reply-To: References: <20231115190938.GGZVUXcuUjI3i1JRAB@fat_crate.local> <232440.1700153046@warthog.procyon.org.uk> <864270.1700230337@warthog.procyon.org.uk> <20231117160940.GGZVeQRLgLjJZXBLE1@fat_crate.local> <20231117191243.GHZVe7K4vN9n5M92gb@fat_crate.local> To: Linus Torvalds Cc: dhowells@redhat.com, Borislav Petkov , kernel test robot , oe-lkp@lists.linux.dev, lkp@intel.com, linux-kernel@vger.kernel.org, Christian Brauner , Alexander Viro , Jens Axboe , Christoph Hellwig , Christian Brauner , Matthew Wilcox , David Laight , ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <2284218.1700487177.1@warthog.procyon.org.uk> Date: Mon, 20 Nov 2023 13:32:57 +0000 Message-ID: <2284219.1700487177@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.8 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 20 Nov 2023 05:33:22 -0800 (PST) Linus Torvalds wrote: > So I don't think we should use either of these benchmarks as a "we > need to optimize for *this*", but it is another example of how much > memcpy() does matter. Even if the end result is then "but different > microarchitectrues react so differently that we can't please > everybody". So what, if anything, should I change? Should I make it directly call __memcpy? Or should we just leave it to the compiler? I would prefer to leave memcpy_from_iter() and memcpy_to_iter() as __always_inline to eliminate the function pointer call we otherwise end up with and to eliminate the return value (which is always 0 in this case). How about something along the attached lines? (With the inline asm appropriate pushed out to an arch header file). On the other hand, it might be better to have memcpy_to/from_iter() just call __memcpy() as it doesn't seem to make much difference to the time taken and the inline func can still return a constant 0 return value that can be optimised away. David --- diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S index 0ae2e1712e2e..7354982dc433 100644 --- a/arch/x86/lib/memcpy_64.S +++ b/arch/x86/lib/memcpy_64.S @@ -43,7 +43,7 @@ EXPORT_SYMBOL(__memcpy) SYM_FUNC_ALIAS_MEMFUNC(memcpy, __memcpy) EXPORT_SYMBOL(memcpy) -SYM_FUNC_START_LOCAL(memcpy_orig) +SYM_TYPED_FUNC_START(memcpy_orig) movq %rdi, %rax cmpq $0x20, %rdx @@ -169,4 +169,4 @@ SYM_FUNC_START_LOCAL(memcpy_orig) .Lend: RET SYM_FUNC_END(memcpy_orig) - +EXPORT_SYMBOL(memcpy_orig) diff --git a/lib/iov_iter.c b/lib/iov_iter.c index de7d11cf4c63..de73edb9ffcc 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -62,7 +62,17 @@ static __always_inline size_t memcpy_to_iter(void *iter_to, size_t progress, size_t len, void *from, void *priv2) { - memcpy(iter_to, from + progress, len); + size_t len2 = len; + from += progress; + /* + * If CPU has FSRM feature, use 'rep movs'. + * Otherwise, use rep_movs_alternative. + */ + asm volatile( + ALTERNATIVE("rep movsb", + "call memcpy_orig", ALT_NOT(X86_FEATURE_FSRM)) + :"+D" (iter_to), "+S" (from), "+d" (len), "+c"(len2), ASM_CALL_CONSTRAINT + :: "memory", "rax", "r8", "r9", "r10", "r11"); return 0; } @@ -70,7 +80,18 @@ static __always_inline size_t memcpy_from_iter(void *iter_from, size_t progress, size_t len, void *to, void *priv2) { - memcpy(to + progress, iter_from, len); + size_t len2 = len; + to += progress; + /* + * If CPU has FSRM feature, use 'rep movs'. + * Otherwise, use rep_movs_alternative. + */ + asm volatile( + ALTERNATIVE("rep movsb", + "call memcpy_orig", + ALT_NOT(X86_FEATURE_FSRM)) + :"+D" (to), "+S" (iter_from), "+d" (len), "+c" (len2), ASM_CALL_CONSTRAINT + :: "memory", "rax", "r8", "r9", "r10", "r11"); return 0; }