Date: Mon, 16 Aug 2021 11:09:45 -0700 (PDT)
Subject: Re: [PATCH 1/1] riscv: __asm_copy_to-from_user: Improve using word copy if size < 9*SZREG
CC: Paul Walmsley, linux@roeck-us.net, geert@linux-m68k.org, qiuwenbo@kylinos.com.cn, aou@eecs.berkeley.edu, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
From: Palmer Dabbelt
To: akira.tsukamoto@gmail.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 30 Jul 2021 06:52:44 PDT (-0700), akira.tsukamoto@gmail.com wrote:
> Reduce the number of slow byte_copy when the size is in between
> 2*SZREG to 9*SZREG by using none unrolled word_copy.
>
> Without it any size smaller than 9*SZREG will be using slow byte_copy
> instead of none unrolled word_copy.
>
> Signed-off-by: Akira Tsukamoto
> ---
>  arch/riscv/lib/uaccess.S | 46 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
> index 63bc691cff91..6a80d5517afc 100644
> --- a/arch/riscv/lib/uaccess.S
> +++ b/arch/riscv/lib/uaccess.S
> @@ -34,8 +34,10 @@ ENTRY(__asm_copy_from_user)
>  	/*
>  	 * Use byte copy only if too small.
>  	 * SZREG holds 4 for RV32 and 8 for RV64
> +	 * a3 - 2*SZREG is minimum size for word_copy
> +	 *      1*SZREG for aligning dst + 1*SZREG for word_copy
>  	 */
> -	li	a3, 9*SZREG /* size must be larger than size in word_copy */
> +	li	a3, 2*SZREG
>  	bltu	a2, a3, .Lbyte_copy_tail
>
>  	/*
> @@ -66,9 +68,40 @@ ENTRY(__asm_copy_from_user)
>  	andi	a3, a1, SZREG-1
>  	bnez	a3, .Lshift_copy
>
> +.Lcheck_size_bulk:
> +	/*
> +	 * Evaluate the size if possible to use unrolled.
> +	 * The word_copy_unlrolled requires larger than 8*SZREG
> +	 */
> +	li	a3, 8*SZREG
> +	add	a4, a0, a3
> +	bltu	a4, t0, .Lword_copy_unlrolled
> +
>  .Lword_copy:
> -	/*
> -	 * Both src and dst are aligned, unrolled word copy
> +	/*
> +	 * Both src and dst are aligned
> +	 * None unrolled word copy with every 1*SZREG iteration
> +	 *
> +	 * a0 - start of aligned dst
> +	 * a1 - start of aligned src
> +	 * t0 - end of aligned dst
> +	 */
> +	bgeu	a0, t0, .Lbyte_copy_tail /* check if end of copy */
> +	addi	t0, t0, -(SZREG) /* not to over run */
> +1:
> +	REG_L	a5, 0(a1)
> +	addi	a1, a1, SZREG
> +	REG_S	a5, 0(a0)
> +	addi	a0, a0, SZREG
> +	bltu	a0, t0, 1b
> +
> +	addi	t0, t0, SZREG /* revert to original value */
> +	j	.Lbyte_copy_tail
> +
> +.Lword_copy_unlrolled:
> +	/*
> +	 * Both src and dst are aligned
> +	 * Unrolled word copy with every 8*SZREG iteration
>  	 *
>  	 * a0 - start of aligned dst
>  	 * a1 - start of aligned src
> @@ -97,7 +130,12 @@ ENTRY(__asm_copy_from_user)
>  	bltu	a0, t0, 2b
>
>  	addi	t0, t0, 8*SZREG /* revert to original value */
> -	j	.Lbyte_copy_tail
> +
> +	/*
> +	 * Remaining might large enough for word_copy to reduce slow byte
> +	 * copy
> +	 */
> +	j	.Lcheck_size_bulk
>
>  .Lshift_copy:

I'm still not convinced that going all the way to such a large unrolling
factor is a net win, but this at least provides a much smoother cost
curve.

That said, this is causing my 32-bit configs to hang. There were a few
conflicts so I may have messed something up, but nothing is jumping out
at me. I've put what I ended up with on a branch, if you have time to
look that'd be great but if not then I'll take another shot at this when
I get back around to it.

https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=wip-word_user_copy

Here's the backtrace, though that's probably not all that useful:

[    0.703694] Unable to handle kernel NULL pointer dereference at virtual address 000005a8
[    0.704194] Oops [#1]
[    0.704301] Modules linked in:
[    0.704463] CPU: 2 PID: 1 Comm: init Not tainted 5.14.0-rc1-00016-g59461ddb9dbd #5
[    0.704660] Hardware name: riscv-virtio,qemu (DT)
[    0.704802] epc : walk_stackframe+0xac/0xc2
[    0.704941]  ra : dump_backtrace+0x1a/0x22
[    0.705074] epc : c0004558 ra : c0004588 sp : c1c5fe10
[    0.705216]  gp : c18b41c8 tp : c1cd8000 t0 : 00000000
[    0.705357]  t1 : ffffffff t2 : 00000000 s0 : c1c5fe40
[    0.705506]  s1 : c11313dc a0 : 00000000 a1 : 00000000
[    0.705647]  a2 : c06fd2c2 a3 : c11313dc a4 : c084292d
[    0.705787]  a5 : 00000000 a6 : c1864cb8 a7 : 3fffffff
[    0.705926]  s2 : 00000000 s3 : c1123e88 s4 : 00000000
[    0.706066]  s5 : c11313dc s6 : c06fd2c2 s7 : 00000001
[    0.706206]  s8 : 00000000 s9 : 95af6e28 s10: 00000000
[    0.706345]  s11: 00000001 t3 : 00000000 t4 : 00000000
[    0.706482]  t5 : 00000001 t6 : 00000000
[    0.706594] status: 00000100 badaddr: 000005a8 cause: 0000000d
[    0.706809] [] walk_stackframe+0xac/0xc2
[    0.707019] [] dump_backtrace+0x1a/0x22
[    0.707149] [] show_stack+0x2c/0x38
[    0.707271] [] dump_stack_lvl+0x40/0x58
[    0.707400] [] dump_stack+0x12/0x1a
[    0.707521] [] panic+0xfa/0x2a6
[    0.707632] [] do_exit+0x7a8/0x7ac
[    0.707749] [] do_group_exit+0x2a/0x7e
[    0.707872] [] __wake_up_parent+0x0/0x20
[    0.707999] [] ret_from_syscall+0x0/0x2
[    0.708385] ---[ end trace 260976561a3770d1 ]---
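
For reference, the entry check changed by the first hunk of the quoted
patch can be sketched in C roughly as below. This only models the
threshold comparison (li a3, N*SZREG followed by bltu a2, a3,
.Lbyte_copy_tail) using the SZREG values given in the quoted comment. The
helper names are made up for illustration and are not kernel code, and
nothing here models dst alignment, the copy loops, or the hang described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SZREG values as stated in the quoted comment: 4 on RV32, 8 on RV64. */
enum { SZREG_RV32 = 4, SZREG_RV64 = 8 };

/*
 * Hypothetical helper (not kernel code). Models the entry check
 *     li   a3, factor*SZREG
 *     bltu a2, a3, .Lbyte_copy_tail
 * and returns true when the whole copy is left to the byte loop.
 */
bool byte_copy_only(size_t size, size_t szreg, size_t factor)
{
	return size < factor * szreg;
}

/* Entry check before the patch: threshold of 9*SZREG. */
bool byte_copy_only_before(size_t size, size_t szreg)
{
	return byte_copy_only(size, szreg, 9);
}

/* Entry check after the patch: threshold of 2*SZREG. */
bool byte_copy_only_after(size_t size, size_t szreg)
{
	return byte_copy_only(size, szreg, 2);
}

/* Spot checks at the threshold boundaries for both register widths. */
void check_thresholds(void)
{
	/* RV32: thresholds are 36 bytes before and 8 bytes after. */
	assert(byte_copy_only_before(35, SZREG_RV32));
	assert(!byte_copy_only_before(36, SZREG_RV32));
	assert(byte_copy_only_after(7, SZREG_RV32));
	assert(!byte_copy_only_after(8, SZREG_RV32));

	/* RV64: thresholds are 72 bytes before and 16 bytes after. */
	assert(byte_copy_only_before(71, SZREG_RV64));
	assert(!byte_copy_only_before(72, SZREG_RV64));
	assert(byte_copy_only_after(15, SZREG_RV64));
	assert(!byte_copy_only_after(16, SZREG_RV64));
}
```

Under this model a 20-byte copy on RV32 takes the byte-only path with the
old 9*SZREG threshold but not with the new 2*SZREG one, which is the range
the commit message is aimed at.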