Subject: Re: [PATCH v2 0/5] riscv: improving uaccess with logs from network bench
To: Akira Tsukamoto, Paul Walmsley, Palmer Dabbelt, Albert Ou,
    linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org
References: <5a5c07ac-8c11-79d3-46a3-a255d4148f76@gmail.com>
From: Ben Dooks
Organization: Codethink Limited.
Message-ID: <542310bc-840d-d5c9-a7b3-40f58504e7b5@codethink.co.uk>
Date: Sun, 20 Jun 2021 11:02:39 +0100
In-Reply-To: <5a5c07ac-8c11-79d3-46a3-a255d4148f76@gmail.com>

On 19/06/2021 12:21, Akira Tsukamoto wrote:
> Optimizing copy_to_user and copy_from_user.
>
> I rewrote the functions in v2, heavily influenced by Garry's memcpy
> function [1].
> The functions must be written in assembler to handle page faults manually
> inside the function.
>
> With the changes, improves in the percentage usage and some performance
> of network speed in UDP packets.
> Only patching copy_user. Using the original memcpy.
>
> All results are from the same base kernel, same rootfs and same
> BeagleV beta board.

Is there a git tree for these to try them out?

> Comparison by "perf top -Ue task-clock" while running iperf3.
>
> --- TCP recv ---
>  * Before
>   40.40%  [kernel]  [k] memcpy
>   33.09%  [kernel]  [k] __asm_copy_to_user
>  * After
>   50.35%  [kernel]  [k] memcpy
>   13.76%  [kernel]  [k] __asm_copy_to_user
>
> --- TCP send ---
>  * Before
>   19.96%  [kernel]  [k] memcpy
>    9.84%  [kernel]  [k] __asm_copy_to_user
>  * After
>   14.27%  [kernel]  [k] memcpy
>    7.37%  [kernel]  [k] __asm_copy_to_user
>
> --- UDP send ---
>  * Before
>   25.18%  [kernel]  [k] memcpy
>   22.50%  [kernel]  [k] __asm_copy_to_user
>  * After
>   28.90%  [kernel]  [k] memcpy
>    9.49%  [kernel]  [k] __asm_copy_to_user
>
> --- UDP recv ---
>  * Before
>   44.45%  [kernel]  [k] memcpy
>   31.04%  [kernel]  [k] __asm_copy_to_user
>  * After
>   55.62%  [kernel]  [k] memcpy
>   11.22%  [kernel]  [k] __asm_copy_to_user

What is the memcpy figure in the above? Could you explain the figures, please?
> Processing network packets require a lot of unaligned access for the
> packet header, which is not able to change the design of the header
> format to be aligned.

Isn't there an option to allow padding of network packets in the skbuff
to make the fields aligned for architectures which do not have efficient
unaligned loads (looking at you, arm32)? Has this been looked at?

> And user applications call system calls with a large buffer for send/recf()
> and sendto/recvfrom() to repeat less function calls for the optimization.
>
> v1 -> v2:
> - Added shift copy
> - Separated patches for readability of changes in assembler
> - Using perf results
>
> [1] https://lkml.org/lkml/2021/2/16/778
>
> Akira Tsukamoto (5):
>   riscv: __asm_to/copy_from_user: delete existing code
>   riscv: __asm_to/copy_from_user: Adding byte copy first
>   riscv: __asm_to/copy_from_user: Copy until dst is aligned address
>   riscv: __asm_to/copy_from_user: Bulk copy while shifting misaligned
>     data
>   riscv: __asm_to/copy_from_user: Bulk copy when both src dst are
>     aligned
>
>  arch/riscv/lib/uaccess.S | 181 +++++++++++++++++++++++++++++++--------
>  1 file changed, 146 insertions(+), 35 deletions(-)

I'm concerned that deleting the code and then re-adding it is going to
either make the series un-bisectable or leave a point where the kernel
is very broken.

-- 
Ben Dooks                       http://www.codethink.co.uk/
Senior Engineer                 Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html
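For readers following the series, the strategy the patch titles outline (byte copy until the destination is aligned, then word-wise bulk copy, then a byte-wise tail) can be sketched in plain C. This is a hypothetical user-space model for illustration only, not the actual implementation: the real code is RISC-V assembler in arch/riscv/lib/uaccess.S, and it additionally handles page faults and uses shifted loads when the source remains misaligned.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical model of the copy strategy described by the series:
 * 1. byte copy until dst reaches natural word alignment,
 * 2. word-wise bulk copy when src is also aligned,
 * 3. byte-wise copy for the tail (and the fully-misaligned case).
 */
void *sketch_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	const size_t mask = sizeof(uintptr_t) - 1;

	/* Step 1: byte copy until dst is word-aligned. */
	while (n && ((uintptr_t)d & mask)) {
		*d++ = *s++;
		n--;
	}

	/* Step 2: bulk copy when src is aligned too; the real assembler
	 * instead uses shifted loads when src stays misaligned. */
	if (((uintptr_t)s & mask) == 0) {
		while (n >= sizeof(uintptr_t)) {
			*(uintptr_t *)d = *(const uintptr_t *)s;
			d += sizeof(uintptr_t);
			s += sizeof(uintptr_t);
			n -= sizeof(uintptr_t);
		}
	}

	/* Step 3: remaining bytes one at a time. */
	while (n--)
		*d++ = *s++;

	return dst;
}
```

Calling, say, sketch_copy(dst + 1, src + 3, n) exercises both the alignment prologue and the byte-wise fallback, which is the kind of misaligned buffer the packet-header discussion above is about.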