Subject: Re: [PATCH] crypto: x86/aesni - implement accelerated CBCMAC, CMAC and XCBC shashes
To: Ard Biesheuvel
Cc: Linux Crypto Mailing List, Herbert Xu, Eric Biggers
References: <20200802090616.1328-1-ardb@kernel.org> <25776a56-4c6a-3976-f4bc-fa53ba4a1550@candelatech.com>
From: Ben Greear
Organization: Candela Technologies
Message-ID: <9c137bbf-2892-df7a-e6fa-8cce417ecd45@candelatech.com>
Date: Tue, 4 Aug 2020 06:01:00 -0700
X-Mailing-List: linux-crypto@vger.kernel.org

On 8/4/20 5:55 AM, Ard Biesheuvel wrote:
> On Mon, 3 Aug 2020 at 21:11, Ben Greear wrote:
>>
>> Hello,
>>
>> This helps a bit...now download sw-crypt performance is about 150Mbps,
>> but still not as good as with my patch on 5.4 kernel, and fpu is still
>> high in perf top:
>>
>>     13.89%  libc-2.29.so  [.] __memset_sse2_unaligned_erms
>>      6.62%  [kernel]      [k] kernel_fpu_begin
>>      4.14%  [kernel]      [k] _aesni_enc1
>>      2.06%  [kernel]      [k] __crypto_xor
>>      1.95%  [kernel]      [k] copy_user_generic_string
>>      1.93%  libjvm.so     [.] SpinPause
>>      1.01%  [kernel]      [k] aesni_encrypt
>>      0.98%  [kernel]      [k] crypto_ctr_crypt
>>      0.93%  [kernel]      [k] udp_sendmsg
>>      0.78%  [kernel]      [k] crypto_inc
>>      0.74%  [kernel]      [k] __ip_append_data.isra.53
>>      0.65%  [kernel]      [k] aesni_cbc_enc
>>      0.64%  [kernel]      [k] __dev_queue_xmit
>>      0.62%  [kernel]      [k] ipt_do_table
>>      0.62%  [kernel]      [k] igb_xmit_frame_ring
>>      0.59%  [kernel]      [k] ip_route_output_key_hash_rcu
>>      0.57%  [kernel]      [k] memcpy
>>      0.57%  libjvm.so     [.] InstanceKlass::oop_follow_contents
>>      0.56%  [kernel]      [k] irq_fpu_usable
>>      0.56%  [kernel]      [k] mac_do_update
>>
>> If you'd like help setting up a test rig and have an ath10k pcie NIC or ath9k pcie NIC,
>> then I can help.  Possibly hwsim would also be a good test case, but I have not tried
>> that.
>>
>
> I don't think this is likely to be reproducible on other
> micro-architectures, so setting up a test rig is unlikely to help.
>
> I'll send out a v2 which implements an ahash instead of a shash (and
> implements some other tweaks) so that kernel_fpu_begin() is only
> called twice for each packet on the cbcmac path.
>
> Do you have any numbers for the old kernel without your patch? This
> pathological FPU preserve/restore behavior could be caused by the
> optimizations, or by other changes that landed in the meantime, so I
> would like to know if kernel_fpu_begin() is as prominent in those
> traces as well.
>

This same patch makes i7 mobile processors able to handle 1Gbps+ software
decrypt rates, where without the patch, the rate was badly constrained and CPU
load was much higher, so it is definitely noticeable on other processors too.
The weak processor on the current test rig is convenient because the problem
is so noticeable even at slower wifi speeds.

We can do some tests on 5.4 with our patch reverted.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com