Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp674945ybe; Fri, 13 Sep 2019 04:32:35 -0700 (PDT) X-Google-Smtp-Source: APXvYqxzGiEoG70ylx3i+x9/tCscEFq+4f2uRDMbOJW5gq5K8vmlNG73Tl6XhQ5HHoRfNdsg/2kH X-Received: by 2002:aa7:d98b:: with SMTP id u11mr45538468eds.110.1568374355347; Fri, 13 Sep 2019 04:32:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568374355; cv=none; d=google.com; s=arc-20160816; b=AM5zyzvw0D2FIfWy/l3dDQQZXnzu9DWU4+QOqnHPWeethGqRj0COz8cm8v4CVSfCan aApAplbiMUted5U7wHiAn3nyjqK7zg35Wi2WqoaS/hCr8/q+KXQagQWhyZ+se1GkTVlg shlN7hVwxqFM4HQ7Zcny1GFPKTtY2Y9ybWKnZjbLOatrmkHWXht6cics+ER0iUqSVE+A 8el8oa4dsWsPyUsiLryMRkxHW33+7skYBjLrjSmuO/IrZabNrgiSkl5k5pZUhJEvnHMF 8SDvLAaykYtjPOTL+zRNcD5NIuxrk99ieP3Zj9Om81i3tUKmfDJRqF8JHBZHM1kkqV5V Wx5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=DtSTn5fbHw+QXwOQrpj5NfpmWj8Jm05lqrs1XqH1FTM=; b=u2SLhbAeV3xjPk7Lg316CdEcQcZxvBKjouyu/nlF0jjlgPt9blaOwujXwkFNh0JSsw 8L/L4TdMpB6xk0bY3teeUn3ce1vsxhwu0J3QpkZL5L4Pc4aPkyuoLTxDPHniZd9KPBjE uDJXZwwRIzO71qgai/X4NymVo2cj8MLL3mdaEicVtVXoEVKXAWyeMZMfBnz7iX82fBAY ywJyKzg5huHfK/gu2XMTtDlyLwAzcu94S+4YU+Vm41RrVqNt+B0oprXODyR/ojsDotAn n4xKfytJjE3RWGbX31dyDBcbFPa57jvFAxsGwRaPlYTJLuiTGN1Dtd4KJjD9zvDQHKHk hgXw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id qt15si14426864ejb.252.2019.09.13.04.32.11; Fri, 13 Sep 2019 04:32:35 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387588AbfIML3Y (ORCPT + 99 others); Fri, 13 Sep 2019 07:29:24 -0400 Received: from helcar.hmeau.com ([216.24.177.18]:33588 "EHLO fornost.hmeau.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387584AbfIML3Y (ORCPT ); Fri, 13 Sep 2019 07:29:24 -0400 Received: from gwarestrin.arnor.me.apana.org.au ([192.168.0.7]) by fornost.hmeau.com with smtp (Exim 4.89 #2 (Debian)) id 1i8jlH-0001Jw-UM; Fri, 13 Sep 2019 21:29:13 +1000 Received: by gwarestrin.arnor.me.apana.org.au (sSMTP sendmail emulation); Fri, 13 Sep 2019 21:29:06 +1000 Date: Fri, 13 Sep 2019 21:29:06 +1000 From: Herbert Xu To: Daniel Jordan Cc: Steffen Klassert , Lai Jiangshan , Peter Zijlstra , Tejun Heo , linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3 0/9] padata: use unbound workqueues for parallel jobs Message-ID: <20190913112905.GB14972@gondor.apana.org.au> References: <20190906014029.3345-1-daniel.m.jordan@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190906014029.3345-1-daniel.m.jordan@oracle.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org On Thu, Sep 05, 2019 at 09:40:20PM -0400, Daniel Jordan wrote: > v2 -> v3: > - Rebase onto cryptodev. > > v1 -> v2: > - Updated patch 8 to avoid queueing the reorder work if the next object > by sequence number isn't ready yet (Herbert) > - Added Steffen's ack to all but patch 8 since that one changed. > > RFC -> v1: > - Included Tejun's acks. > - Added testing section to cover letter. > > Padata binds the parallel part of a job to a single CPU and round-robins > over all CPUs in the system for each successive job. Though the serial > parts rely on per-CPU queues for correct ordering, they're not necessary > for parallel work, and it improves performance to run the job locally on > NUMA machines and let the scheduler pick the CPU within a node on a busy > system. > > This series makes parallel padata jobs run on unbound workqueues. > > Patch Description > ----- ----------- > > 1 Make a padata instance allocate its workqueue internally. > > 2 Unconfine some recently-confined workqueue interfaces. > > 3-6 Address recursive CPU hotplug locking issue. > > padata_alloc* requires its callers to hold this lock, but allocating > an unbound workqueue and calling apply_workqueue_attrs also take it. > Fix by removing the requirement for callers of padata_alloc*. > > 7-8 Add a second workqueue for each padata instance that's dedicated to > parallel jobs. > > 9 Small cleanup. > > Performance > ----------- > > Measurements are from a 2-socket, 20-core, 40-CPU Xeon server. > > For repeatability, modprobe was bound to a CPU and the serial cpumasks > for both pencrypt and pdecrypt were also restricted to a CPU different > from modprobe's. > > # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3 > # modprobe tcrypt mode=211 sec=1 > # modprobe tcrypt mode=215 sec=1 > > Busy system (tcrypt run while 10 stress-ng tasks were burning 100% CPU) > > base test > ---------------- --------------- > speedup key_sz blk_sz ops/sec stdev ops/sec stdev > > (pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=211) > > 117.2x 160 16 960 30 112555 24775 > 135.1x 160 64 845 246 114145 25124 > 113.2x 160 256 993 17 112395 24714 > 111.3x 160 512 1000 0 111252 23755 > 110.0x 160 1024 983 16 108153 22374 > 104.2x 160 2048 985 22 102563 20530 > 98.5x 160 4096 998 3 98346 18777 > 86.2x 160 8192 1000 0 86173 14480 > > (pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=211) > > 127.2x 160 16 997 5 126834 24244 > 128.4x 160 64 1000 0 128438 23261 > 127.6x 160 256 992 7 126627 23493 > 124.0x 160 512 1000 0 123958 22746 > 122.8x 160 1024 989 20 121372 22632 > 112.8x 160 2048 998 3 112602 18287 > 106.9x 160 4096 994 10 106255 16111 > 91.7x 160 8192 1000 0 91742 11670 > > multibuffer (pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=215) > > 242.2x 160 16 2363 141 572189 16846 > 242.1x 160 64 2397 151 580424 11923 > 231.1x 160 256 2472 21 571387 16364 > 237.6x 160 512 2429 24 577264 8692 > 238.3x 160 1024 2384 97 568155 6621 > 216.3x 160 2048 2453 74 530627 3480 > 209.2x 160 4096 2381 206 498192 19177 > 176.5x 160 8192 2323 157 410013 9903 > > multibuffer (pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=215) > > 220.3x 160 16 2341 228 515733 91317 > 216.6x 160 64 2467 33 534381 101262 > 217.7x 160 256 2451 45 533443 85418 > 213.8x 160 512 2485 26 531293 83767 > 211.0x 160 1024 2472 28 521677 80339 > 200.8x 160 2048 2459 67 493808 63587 > 188.8x 160 4096 2491 9 470325 58055 > 159.9x 160 8192 2459 51 393147 25756 > > Idle system (tcrypt run by itself) > > base test > ---------------- --------------- > speedup key_sz blk_sz ops/sec stdev ops/sec stdev > > (pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=211) > > 2.5x 160 16 63412 43075 161615 1034 > 4.1x 160 64 39554 24006 161653 981 > 6.0x 160 256 26504 1436 160110 1158 > 6.2x 160 512 25500 40 157018 951 > 5.9x 160 1024 25777 1094 151852 915 > 5.8x 160 2048 24653 218 143756 508 > 5.6x 160 4096 24333 20 136752 548 > 5.0x 160 8192 23310 15 117660 481 > > (pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=211) > > 2.4x 160 16 53471 48279 128047 31328 > 3.4x 160 64 37712 20855 128187 31074 > 4.5x 160 256 27911 4378 126430 31084 > 4.9x 160 512 25346 175 123870 29099 > 3.1x 160 1024 38452 23118 120817 26846 > 4.7x 160 2048 24612 187 115036 23942 > 4.5x 160 4096 24217 114 109583 21559 > 4.2x 160 8192 23144 108 96850 16686 > > multibuffer (pcrypt(rfc4106-gcm-aesni)) encryption (tcrypt mode=215) > > 1.0x 160 16 412157 3855 426973 1591 > 1.0x 160 64 412600 4410 431920 4224 > 1.1x 160 256 410352 3254 453691 17831 > 1.2x 160 512 406293 4948 473491 39818 > 1.2x 160 1024 395123 7804 478539 27660 > 1.2x 160 2048 385144 7601 453720 17579 > 1.2x 160 4096 371989 3631 449923 15331 > 1.2x 160 8192 346723 1617 399824 18559 > > multibuffer (pcrypt(rfc4106-gcm-aesni)) decryption (tcrypt mode=215) > > 1.1x 160 16 407317 1487 452619 14404 > 1.1x 160 64 411821 4261 464059 23541 > 1.2x 160 256 408941 4945 477483 36576 > 1.2x 160 512 406451 611 472661 11038 > 1.2x 160 1024 394813 2667 456357 11452 > 1.2x 160 2048 390291 4175 448928 8957 > 1.2x 160 4096 371904 1068 449344 14225 > 1.2x 160 8192 344227 1973 404397 19540 > > Testing > ------- > > In addition to the bare metal performance runs above, this series was > tested in a kvm guest with the tcrypt module (mode=215). All > combinations of CPUs among parallel_cpumask, serial_cpumask, and CPU > hotplug online/offline were run with 3 possible CPUs, and over 2000 > random combinations of these were run with 8 possible CPUs. Workqueue > events were used throughout to verify that all parallel and serial > workers executed on only the CPUs allowed by the cpumask sysfs files. > > Finally, tcrypt mode=215 was run at each patch in the series when built > with and without CONFIG_PADATA/CONFIG_CRYPTO_PCRYPT. > > v2: https://lore.kernel.org/linux-crypto/20190829173038.21040-1-daniel.m.jordan@oracle.com/ > v1: https://lore.kernel.org/linux-crypto/20190813005224.30779-1-daniel.m.jordan@oracle.com/ > RFC: https://lore.kernel.org/lkml/20190725212505.15055-1-daniel.m.jordan@oracle.com/ > > Daniel Jordan (9): > padata: allocate workqueue internally > workqueue: unconfine alloc/apply/free_workqueue_attrs() > workqueue: require CPU hotplug read exclusion for > apply_workqueue_attrs > padata: make padata_do_parallel find alternate callback CPU > pcrypt: remove padata cpumask notifier > padata, pcrypt: take CPU hotplug lock internally in > padata_alloc_possible > padata: use separate workqueues for parallel and serial work > padata: unbind parallel jobs from specific CPUs > padata: remove cpu_index from the parallel_queue > > Documentation/padata.txt | 12 +-- > crypto/pcrypt.c | 167 ++++--------------------------- > include/linux/padata.h | 16 +-- > include/linux/workqueue.h | 4 + > kernel/padata.c | 201 ++++++++++++++++++++++---------------- > kernel/workqueue.c | 25 +++-- > 6 files changed, 170 insertions(+), 255 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt