From: Binoy Jayan
Subject: Re: [RFC PATCH v4] IV Generation algorithms for dm-crypt
Date: Mon, 20 Mar 2017 20:01:56 +0530
Message-ID:
References: <1486463731-6224-1-git-send-email-binoy.jayan@linaro.org> <68f70534-a309-46ba-a84d-8acc1e6620e5@gmail.com> <2aef6e54-805f-e09b-ae66-c198f8c05335@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2271408809661336662=="
Cc: Rajendra, Herbert Xu, Oded, Ondrej Mosnacek, Mike Snitzer, Linux kernel mailing list, Milan Broz, linux-raid@vger.kernel.org, dm-devel@redhat.com, Mark Brown, Arnd Bergmann, linux-crypto@vger.kernel.org, Shaohua Li, "David S. Miller", Alasdair Kergon, Ofir
To: Gilad Ben-Yossef
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
List-Id: linux-crypto.vger.kernel.org

--===============2271408809661336662==
Content-Type: multipart/alternative; boundary=001a114e1d94b6ab1f054b2a639b

--001a114e1d94b6ab1f054b2a639b
Content-Type: text/plain; charset=UTF-8

Hi,

On 8 March 2017 at 13:49, Binoy Jayan wrote:
> Hi Gilad,
>
>> I gave it a spin on a x86_64 with 8 CPUs with AES-NI using cryptd and
>> on Arm using CryptoCell hardware accelerator.
>>
>> There was no difference in performance between 512 and 4096 bytes
>> cluster size on the x86_64 (800 MB loop file system)
>>
>> There was an improvement in latency of 3.2% between 512 and 4096 bytes
>> cluster size on the Arm. I expect the performance benefits for this
>> test for Binoy's patch to be the same.
>>
>> In both cases the very naive test was a simple dd with block size of
>> 4096 bytes or the raw block device.
>>
>> I do not know what effect having a bigger cluster size would have on
>> have on other more complex file system operations.
>> Is there any specific benchmark worth testing with?

The multiple instances issue in /proc/crypto is fixed.
It was caused by the IV code itself inadvertently modifying the algorithm
name in the global crypto algorithm lookup table when it split
"plain(cbc(aes))" into "plain" and "cbc(aes)" in order to invoke the child
algorithm.

I ran a few tests with dd, bonnie and FIO under QEMU (x86) using the
automated script [1] that I wrote to make the testing easier.
The tests were done on software implementations of the algorithms,
as the real hardware was not available to me. According to the tests,
sequential reads and writes show a good improvement (5.7%) in the
data rate with the proposed solution, while random reads show very
little improvement. When tested with FIO, random writes also show
a small improvement (2.2%), but random reads show a slight
deterioration in performance (4%).

When tested on Arm hardware, only the sequential writes with bonnie
show an improvement (5.6%). All other tests show degraded performance
in the absence of crypto hardware.

[1] https://github.com/binoyjayan/utilities/blob/master/utils/dmtest
Dependencies: dd [Full version], bonnie, fio

Thanks,
Binoy

--001a114e1d94b6ab1f054b2a639b
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a114e1d94b6ab1f054b2a639b-- --===============2271408809661336662== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline --===============2271408809661336662==--
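The /proc/crypto fix described in the message comes down to not writing into a shared algorithm name while splitting it. The following is only a userspace sketch of that idea, not code from the patch: the helper name split_iv_alg, its signature, and its return values are hypothetical. It takes the full name as a const string and copies the two parts into caller-provided buffers, so the original string is never modified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the patch): split a template name
 * such as "plain(cbc(aes))" into the IV generator name ("plain") and the
 * child algorithm name ("cbc(aes)").
 *
 * The input is const and is never written to; both parts are copied into
 * caller-provided buffers. Returns 0 on success, -1 on malformed input or
 * when an output buffer is too small.
 */
int split_iv_alg(const char *full,
                 char *iv, size_t iv_len,
                 char *child, size_t child_len)
{
    const char *open;
    size_t full_len, iv_n, child_n;

    assert(full && iv && child);

    open = strchr(full, '(');
    full_len = strlen(full);

    /* Expect "name(...)": a non-empty name and a trailing ')'. */
    if (!open || open == full || full[full_len - 1] != ')')
        return -1;

    iv_n = (size_t)(open - full);
    child_n = full_len - iv_n - 2;   /* drop the outer '(' and ')' */

    if (child_n == 0 || iv_n >= iv_len || child_n >= child_len)
        return -1;

    /* Copy out both parts instead of patching a NUL into the source. */
    memcpy(iv, full, iv_n);
    iv[iv_n] = '\0';
    memcpy(child, open + 1, child_n);
    child[child_n] = '\0';
    return 0;
}
```

Called with "plain(cbc(aes))" and two sufficiently large buffers, the sketch fills them with "plain" and "cbc(aes)" and leaves the input untouched. An in-place split that overwrote the '(' with a terminator would instead change the name seen by every other user of that string, which is the kind of side effect the message describes.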