Subject: Re: [PATCH v5 08/14] arm64: Import latest optimization of memcpy
To: Sunil Kovvuri, Oliver Swede
Cc: Catalin Marinas, will@kernel.org, linux-arm-kernel@lists.infradead.org, LKML, Sunil Goutham, George Cherian
References: <20200914150958.2200-1-oli.swede@arm.com> <20200914150958.2200-9-oli.swede@arm.com>
From: Robin Murphy
Message-ID: <5156db7f-09a7-b0fa-d246-b024e40775fc@arm.com>
Date: Tue, 1 Jun 2021 13:06:32 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021-06-01 11:03, Sunil Kovvuri wrote:
> On Mon, Sep 14, 2020 at
8:44 PM Oliver Swede wrote:
>>
>> From: Sam Tebbs
>>
>> Import the latest memcpy implementation into memcpy,
>> copy_{from, to and in}_user.
>> The implementation of the user routines is separated into two forms:
>> one for when UAO is enabled and one for when UAO is disabled, with
>> the two being chosen between with a runtime patch.
>> This avoids executing the many NOPs emitted when UAO is disabled.
>>
>> The project containing optimized implementations for various library
>> functions has now been renamed from 'cortex-strings' to
>> 'optimized-routines', and the new upstream source is
>> string/aarch64/memcpy.S as of commit 4c175c8be12 in
>> https://github.com/ARM-software/optimized-routines.
>>
>> Signed-off-by: Sam Tebbs
>> [ rm: add UAO fixups, streamline copy_exit paths, expand commit message ]
>> Signed-off-by: Robin Murphy
>> [ os: import newer memcpy algorithm, update commit message ]
>> Signed-off-by: Oliver Swede
>> ---
>>  arch/arm64/include/asm/alternative.h |  36 ---
>>  arch/arm64/lib/copy_from_user.S      | 113 ++++++--
>>  arch/arm64/lib/copy_in_user.S        | 129 +++++++--
>>  arch/arm64/lib/copy_template.S       | 375 +++++++++++++++------------
>>  arch/arm64/lib/copy_template_user.S  |  24 ++
>>  arch/arm64/lib/copy_to_user.S        | 112 ++++++--
>>  arch/arm64/lib/copy_user_fixup.S     |  14 +
>>  arch/arm64/lib/memcpy.S              |  47 ++--
>>  8 files changed, 557 insertions(+), 293 deletions(-)
>>  create mode 100644 arch/arm64/lib/copy_template_user.S
>>  create mode 100644 arch/arm64/lib/copy_user_fixup.S
>
> Do you have any performance data with this patch ?
> I see these patches are still not pushed to mainline, any reasons ?

Funny you should pick up on the 6-month-old thread days after I've been
posting new versions of the relevant parts[1] :)

I think this series mostly stalled on the complexity of the usercopy
parts, which then turned into even more of a moving target anyway, hence
why I decided to split it up.

> Also curious to know why 128bit registers are not considered, similar to
> https://android.googlesource.com/platform/bionic.git/+/a71b4c3f144a516826e8ac5b262099b920c49ce0/libc/arch-arm64/generic-neon/bionic/memcpy.S

The overhead of kernel_neon_begin() etc. is significant, and usually only
worth it in places like the crypto routines where there's enough benefit
from actual ASIMD computation to outweigh the save/restore cost. On
smaller cores where the L1 interface is only 128 bits wide anyway there
is no possible gain in memcpy() throughput to ever offset that cost, and
even for wider microarchitectures it's only likely to start breaking even
at relatively large copy sizes. Plus we can't necessarily assume the
ASIMD registers are even present (apparently the lack of a soft-float ABI
hasn't stopped people from wanting to run Linux on such systems...)

Robin.

[1] https://lore.kernel.org/linux-arm-kernel/cover.1622128527.git.robin.murphy@arm.com/
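
To put the break-even argument above in concrete terms, here is a rough
user-space sketch of a cost model in C. It is not kernel code and not
anything from the series: struct copy_cost, copy_time_ns() and
break_even_bytes() are hypothetical names, and the model simply assumes
each copy path costs a fixed per-call overhead (standing in for the
ASIMD save/restore) plus a constant per-byte rate.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical cost model for one copy path. Both fields are assumed
 * inputs for illustration, not measurements from any real core.
 */
struct copy_cost {
	double fixed_ns;	/* per-call overhead, e.g. SIMD state save/restore */
	double bytes_per_ns;	/* sustained copy throughput */
};

/* Modelled time to copy len bytes: fixed overhead plus len / throughput. */
static double copy_time_ns(const struct copy_cost *c, size_t len)
{
	assert(c->bytes_per_ns > 0.0);
	return c->fixed_ns + (double)len / c->bytes_per_ns;
}

/*
 * Copy size (in bytes) at which the SIMD path stops being slower than
 * the general-purpose-register path under this model.
 * Returns a negative value if the SIMD path never catches up, i.e. its
 * throughput is no better than the GPR path's.
 */
static double break_even_bytes(const struct copy_cost *gpr,
			       const struct copy_cost *simd)
{
	double extra_fixed = simd->fixed_ns - gpr->fixed_ns;
	double gain_per_byte = 1.0 / gpr->bytes_per_ns -
			       1.0 / simd->bytes_per_ns;

	if (extra_fixed <= 0.0)
		return 0.0;	/* no extra overhead to recover */
	if (gain_per_byte <= 0.0)
		return -1.0;	/* no throughput gain: never breaks even */
	return extra_fixed / gain_per_byte;
}
```

With made-up inputs of 16 bytes/ns and no overhead for the GPR path
against 32 bytes/ns and 200 ns of overhead for the SIMD path, the model
puts the break-even point at 6400 bytes. Setting both throughputs equal,
as for a core whose L1 interface is only 128 bits wide, makes the
function report that the SIMD path never breaks even.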