From: Andreas Schwab <schwab@linux-m68k.org>
To: Palmer Dabbelt
Cc: akira.tsukamoto@gmail.com, Paul Walmsley, linux@roeck-us.net,
    geert@linux-m68k.org, qiuwenbo@kylinos.com.cn, aou@eecs.berkeley.edu,
    linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] riscv: __asm_copy_to-from_user: Improve using word copy if size < 9*SZREG
Date: Mon, 16 Aug 2021 21:00:16 +0200
In-Reply-To: (Palmer Dabbelt's message of "Mon, 16 Aug 2021 11:09:45 -0700 (PDT)")
Message-ID: <87zgthjjun.fsf@igel.home>

On Aug 16 2021, Palmer Dabbelt wrote:

> On Fri, 30 Jul 2021 06:52:44 PDT (-0700), akira.tsukamoto@gmail.com wrote:
>> Reduce the number of slow byte_copy when the size is in between
>> 2*SZREG to 9*SZREG by using none unrolled word_copy.
>>
>> Without it any size smaller than 9*SZREG will be using slow byte_copy
>> instead of none unrolled word_copy.
>>
>> Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
>> ---
>>  arch/riscv/lib/uaccess.S | 46 ++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
>> index 63bc691cff91..6a80d5517afc 100644
>> --- a/arch/riscv/lib/uaccess.S
>> +++ b/arch/riscv/lib/uaccess.S
>> @@ -34,8 +34,10 @@ ENTRY(__asm_copy_from_user)
>>  	/*
>>  	 * Use byte copy only if too small.
>>  	 * SZREG holds 4 for RV32 and 8 for RV64
>> +	 * a3 - 2*SZREG is minimum size for word_copy
>> +	 *      1*SZREG for aligning dst + 1*SZREG for word_copy
>>  	 */
>> -	li	a3, 9*SZREG /* size must be larger than size in word_copy */
>> +	li	a3, 2*SZREG
>>  	bltu	a2, a3, .Lbyte_copy_tail
>>
>>  	/*
>> @@ -66,9 +68,40 @@ ENTRY(__asm_copy_from_user)
>>  	andi	a3, a1, SZREG-1
>>  	bnez	a3, .Lshift_copy
>>
>> +.Lcheck_size_bulk:
>> +	/*
>> +	 * Evaluate the size if possible to use unrolled.
>> +	 * The word_copy_unlrolled requires larger than 8*SZREG
>> +	 */
>> +	li	a3, 8*SZREG
>> +	add	a4, a0, a3
>> +	bltu	a4, t0, .Lword_copy_unlrolled
>> +
>>  .Lword_copy:
>> -	/*
>> -	 * Both src and dst are aligned, unrolled word copy
>> +	/*
>> +	 * Both src and dst are aligned
>> +	 * None unrolled word copy with every 1*SZREG iteration
>> +	 *
>> +	 * a0 - start of aligned dst
>> +	 * a1 - start of aligned src
>> +	 * t0 - end of aligned dst
>> +	 */
>> +	bgeu	a0, t0, .Lbyte_copy_tail /* check if end of copy */
>> +	addi	t0, t0, -(SZREG) /* not to over run */
>> +1:
>> +	REG_L	a5, 0(a1)
>> +	addi	a1, a1, SZREG
>> +	REG_S	a5, 0(a0)
>> +	addi	a0, a0, SZREG
>> +	bltu	a0, t0, 1b
>> +
>> +	addi	t0, t0, SZREG /* revert to original value */
>> +	j	.Lbyte_copy_tail
>> +
>> +.Lword_copy_unlrolled:
>> +	/*
>> +	 * Both src and dst are aligned
>> +	 * Unrolled word copy with every 8*SZREG iteration
>>  	 *
>>  	 * a0 - start of aligned dst
>>  	 * a1 - start of aligned src
>> @@ -97,7 +130,12 @@ ENTRY(__asm_copy_from_user)
>>  	bltu	a0, t0, 2b
>>
>>  	addi	t0, t0, 8*SZREG /* revert to original value */
>> -	j	.Lbyte_copy_tail
>> +
>> +	/*
>> +	 * Remaining might large enough for word_copy to reduce slow byte
>> +	 * copy
>> +	 */
>> +	j	.Lcheck_size_bulk
>>
>>  .Lshift_copy:
>
> I'm still not convinced that going all the way to such a large unrolling
> factor is a net win, but this at least provides a much smoother cost
> curve.
>
> That said, this is causing my 32-bit configs to hang.

It's missing fixups for the loads in the loop.

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index a835df6bd68f..12ed1f76bd1f 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -89,9 +89,9 @@ ENTRY(__asm_copy_from_user)
 	bgeu	a0, t0, .Lbyte_copy_tail /* check if end of copy */
 	addi	t0, t0, -(SZREG) /* not to over run */
 1:
-	REG_L	a5, 0(a1)
+	fixup REG_L	a5, 0(a1), 10f
 	addi	a1, a1, SZREG
-	REG_S	a5, 0(a0)
+	fixup REG_S	a5, 0(a0), 10f
 	addi	a0, a0, SZREG
 	bltu	a0, t0, 1b

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
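
For readers not familiar with uaccess.S: the loads and stores in this loop
touch user memory, so each one needs an entry in the kernel's exception
table, which is what the fixup macro provides.  It pairs the access with a
resume point (the 10f target in the diff, i.e. the function's existing
error exit), so a fault on a user pointer lets the copy return the number
of uncopied bytes instead of being handled as an unexpected kernel fault.
The C fragment below is only a rough sketch of that exception-table idea;
the names, types, addresses and the lookup are made up for illustration and
are not the kernel's actual structures or code.

/*
 * Rough sketch of the exception-table idea behind the "fixup" macro.
 * Illustrative names and addresses only.
 */
#include <stddef.h>
#include <stdint.h>

struct ex_entry {
	uintptr_t insn;		/* address of an instruction allowed to fault */
	uintptr_t fixup;	/* where to continue if that instruction faults */
};

/* Each "fixup REG_L/REG_S ..., 10f" conceptually emits one such entry,
 * pairing the user-memory access with the error-exit label. */
static const struct ex_entry table[] = {
	{ 0x80201000, 0x80201080 },	/* made-up addresses */
};

/* Fault-handler side: if the faulting PC has an entry, resume at its
 * fixup address so the copy routine can report a short copy; otherwise
 * the fault is treated as a genuine kernel bug. */
uintptr_t lookup_fixup(uintptr_t fault_pc)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].insn == fault_pc)
			return table[i].fixup;
	return 0;
}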
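
As for the patch itself, a rough C rendering of the copy strategy it ends
up with may help when reading the assembly.  This is an illustration only:
the function name is made up, it ignores the shift_copy path for
misaligned src/dst, the co-alignment check, and the user-access fixups
discussed above, and the loop bounds are simplified compared to the
register-based checks in uaccess.S.

/*
 * Illustration of the copy strategy after the patch: byte copy for very
 * small sizes, 8*SZREG unrolled copy while enough remains, then the new
 * single-word copy, then a byte tail.  Not the kernel's code.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SZREG sizeof(long)	/* 4 on RV32, 8 on RV64 */

void copy_sketch(unsigned char *dst, const unsigned char *src, size_t n)
{
	/* Below 2*SZREG it is not worth aligning dst: byte copy only. */
	if (n < 2 * SZREG)
		goto byte_tail;

	/* Byte-copy until dst is SZREG-aligned. */
	while ((uintptr_t)dst % SZREG) {
		*dst++ = *src++;
		n--;
	}

	/* Unrolled path: one 8*SZREG block per iteration. */
	while (n >= 8 * SZREG) {
		memcpy(dst, src, 8 * SZREG);	/* stands in for 8 REG_L/REG_S pairs */
		dst += 8 * SZREG;
		src += 8 * SZREG;
		n -= 8 * SZREG;
	}

	/* New in the patch: word-at-a-time copy for the remainder that
	 * previously fell through to the byte loop. */
	while (n >= SZREG) {
		memcpy(dst, src, SZREG);	/* stands in for one REG_L/REG_S pair */
		dst += SZREG;
		src += SZREG;
		n -= SZREG;
	}

byte_tail:
	while (n--)
		*dst++ = *src++;
}

Each stage mirrors one label in the assembly (.Lbyte_copy_tail, the
unrolled loop, and the new non-unrolled word copy), which is why the
byte-copy cutoff could drop from 9*SZREG to 2*SZREG.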