Subject: Re: [PATCH] crypto: x86/aesni - implement accelerated CBCMAC, CMAC and XCBC shashes
To: Ard Biesheuvel
Cc: Linux Crypto Mailing List, Herbert Xu, Eric Biggers
References: <20200802090616.1328-1-ardb@kernel.org> <25776a56-4c6a-3976-f4bc-fa53ba4a1550@candelatech.com>
 <9c137bbf-2892-df7a-e6fa-8cce417ecd45@candelatech.com>
From: Ben Greear
Organization: Candela Technologies
Date: Tue, 4 Aug 2020 06:22:39 -0700
X-Mailing-List: linux-crypto@vger.kernel.org

On 8/4/20 6:08 AM, Ard Biesheuvel wrote:
> On Tue, 4 Aug 2020 at 15:01, Ben Greear wrote:
>>
>> On 8/4/20 5:55 AM, Ard Biesheuvel wrote:
>>> On Mon, 3 Aug 2020 at 21:11, Ben Greear wrote:
>>>>
>>>> Hello,
>>>>
>>>> This helps a bit...now download sw-crypt performance is about 150Mbps,
>>>> but still not as good as with my patch on 5.4 kernel, and fpu is still
>>>> high in perf top:
>>>>
>>>>     13.89%  libc-2.29.so  [.] __memset_sse2_unaligned_erms
>>>>      6.62%  [kernel]      [k] kernel_fpu_begin
>>>>      4.14%  [kernel]      [k] _aesni_enc1
>>>>      2.06%  [kernel]      [k] __crypto_xor
>>>>      1.95%  [kernel]      [k] copy_user_generic_string
>>>>      1.93%  libjvm.so     [.] SpinPause
>>>>      1.01%  [kernel]      [k] aesni_encrypt
>>>>      0.98%  [kernel]      [k] crypto_ctr_crypt
>>>>      0.93%  [kernel]      [k] udp_sendmsg
>>>>      0.78%  [kernel]      [k] crypto_inc
>>>>      0.74%  [kernel]      [k] __ip_append_data.isra.53
>>>>      0.65%  [kernel]      [k] aesni_cbc_enc
>>>>      0.64%  [kernel]      [k] __dev_queue_xmit
>>>>      0.62%  [kernel]      [k] ipt_do_table
>>>>      0.62%  [kernel]      [k] igb_xmit_frame_ring
>>>>      0.59%  [kernel]      [k] ip_route_output_key_hash_rcu
>>>>      0.57%  [kernel]      [k] memcpy
>>>>      0.57%  libjvm.so     [.] InstanceKlass::oop_follow_contents
>>>>      0.56%  [kernel]      [k] irq_fpu_usable
>>>>      0.56%  [kernel]      [k] mac_do_update
>>>>
>>>> If you'd like help setting up a test rig and have an ath10k pcie NIC or ath9k pcie NIC,
>>>> then I can help.  Possibly hwsim would also be a good test case, but I have not tried
>>>> that.
>>>>
>>>
>>> I don't think this is likely to be reproducible on other
>>> micro-architectures, so setting up a test rig is unlikely to help.
>>>
>>> I'll send out a v2 which implements an ahash instead of a shash (and
>>> implements some other tweaks) so that kernel_fpu_begin() is only
>>> called twice for each packet on the cbcmac path.
>>>
>>> Do you have any numbers for the old kernel without your patch? This
>>> pathological FPU preserve/restore behavior could be caused by the
>>> optimizations, or by other changes that landed in the meantime, so I
>>> would like to know if kernel_fpu_begin() is as prominent in those
>>> traces as well.
>>>
>>
>> This same patch makes i7 mobile processors able to handle 1Gbps+ software
>> decrypt rates, where without the patch, the rate was badly constrained and CPU
>> load was much higher, so it is definitely noticeable on other processors too.
>
> OK
>
>> The weak processor on the current test rig is convenient because the problem
>> is so noticeable even at slower wifi speeds.
>>
>> We can do some tests on 5.4 with our patch reverted.
>>
>
> The issue with your CCM patch is that it keeps the FPU enabled for the
> entire input, which also means that preemption is disabled, which
> makes the -rt people grumpy. (Of course, it also uses APIs that no
> longer exist, but that should be easy to fix)

So, if there is no other way to get back the performance, can it be a compile or runtime
option (disabled by default for -RT type folks) to re-enable the feature that helps
our CPU usage?

Or, can you do an add-on patch to enable keeping fpu enabled so that I can test how that
affects our performance?

>
> Do you happen to have any ballpark figures for the packet sizes and
> the time spent doing encryption?

This test was using MTU-sized UDP frames, I think, and mostly it is just sending
and receiving frames.  perf top output gives you as much detail as I have about
what the kernel is spending time doing.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
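
The packet-size question in the thread can be given a rough scale with plain arithmetic. The sketch below is not a measurement and is not from the original message. It assumes 1500-byte frames (the message only says "MTU UDP frames"), ignores 802.11/IP/UDP header overhead, and uses the 150Mbps and 1Gbps rates mentioned in the thread. The function name `per_packet_figures` is hypothetical.

```python
import math

AES_BLOCK_BYTES = 16  # AES block size, fixed by the cipher


def per_packet_figures(rate_bps: float, frame_bytes: int = 1500) -> dict:
    """Estimate packet rate, wall-clock budget per packet, and AES blocks per packet.

    frame_bytes=1500 is an assumed Ethernet-style MTU, not a value from the thread.
    """
    packets_per_sec = rate_bps / (frame_bytes * 8)
    return {
        "packets_per_sec": packets_per_sec,
        # Total time available per packet at this rate, for all processing,
        # not only encryption.
        "budget_us_per_packet": 1e6 / packets_per_sec,
        # Number of 16-byte blocks covering the frame; CCM runs both a CTR
        # pass and a CBC-MAC pass over them, so roughly two AES calls per block.
        "aes_blocks_per_packet": math.ceil(frame_bytes / AES_BLOCK_BYTES),
    }


if __name__ == "__main__":
    for label, rate in (("150 Mbps", 150e6), ("1 Gbps", 1e9)):
        print(label, per_packet_figures(rate))
```

Under these assumptions, 150Mbps corresponds to about 12,500 packets per second, or 80 microseconds of total budget per packet, with 94 AES blocks per frame. How much of that budget goes to kernel_fpu_begin() depends on how often it is called per packet, which this estimate does not model.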