From: Barry Song <[email protected]>
acomp's users might want to know algorithm flags to optimize
themselves. One typical user which can benefit from exposed
alg flags is zswap.
In zswap, zsmalloc is the most commonly used allocator for
(and perhaps the only one). For zsmalloc, we cannot sleep
while we map the compressed memory, so we copy it to a
temporary buffer. By knowing the alg won't sleep can help
zswap to avoid the need for a buffer. This shows noticeable
improvement in load/store latency of zswap.
This patch also fixes the missing ASYNC cra_flags in Intel
iaa and Hisilicon zip drivers.
Cc: Yang Shen <[email protected]>
Cc: Zhou Wang <[email protected]>
Cc: Tom Zanussi <[email protected]>
Signed-off-by: Barry Song <[email protected]>
---
drivers/crypto/hisilicon/zip/zip_crypto.c | 1 +
drivers/crypto/intel/iaa/iaa_crypto_main.c | 1 +
include/crypto/acompress.h | 5 +++++
include/linux/crypto.h | 5 +++++
4 files changed, 12 insertions(+)
diff --git a/drivers/crypto/hisilicon/zip/zip_crypto.c b/drivers/crypto/hisilicon/zip/zip_crypto.c
index c650c741a18d..94e2d66b04b6 100644
--- a/drivers/crypto/hisilicon/zip/zip_crypto.c
+++ b/drivers/crypto/hisilicon/zip/zip_crypto.c
@@ -591,6 +591,7 @@ static struct acomp_alg hisi_zip_acomp_deflate = {
.base = {
.cra_name = "deflate",
.cra_driver_name = "hisi-deflate-acomp",
+ .cra_flags = CRYPTO_ALG_ASYNC,
.cra_module = THIS_MODULE,
.cra_priority = HZIP_ALG_PRIORITY,
.cra_ctxsize = sizeof(struct hisi_zip_ctx),
diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index dfd3baf0a8d8..91adf9d76a2e 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -1916,6 +1916,7 @@ static struct acomp_alg iaa_acomp_fixed_deflate = {
.base = {
.cra_name = "deflate",
.cra_driver_name = "deflate-iaa",
+ .cra_flags = CRYPTO_ALG_ASYNC,
.cra_ctxsize = sizeof(struct iaa_compression_ctx),
.cra_module = THIS_MODULE,
.cra_priority = IAA_ALG_PRIORITY,
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 574cffc90730..07bd8f6bc79a 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -160,6 +160,11 @@ static inline void acomp_request_set_tfm(struct acomp_req *req,
req->base.tfm = crypto_acomp_tfm(tfm);
}
+static inline u32 crypto_acomp_get_alg_flags(struct crypto_acomp *tfm)
+{
+ return crypto_tfm_alg_flags(crypto_acomp_tfm(tfm));
+}
+
static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req)
{
return __crypto_acomp_tfm(req->base.tfm);
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index b164da5e129e..811bfaf8b6f8 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -467,6 +467,11 @@ static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
return tfm->__crt_alg->cra_blocksize;
}
+static inline unsigned int crypto_tfm_alg_flags(struct crypto_tfm *tfm)
+{
+ return tfm->__crt_alg->cra_flags;
+}
+
static inline unsigned int crypto_tfm_alg_alignmask(struct crypto_tfm *tfm)
{
return tfm->__crt_alg->cra_alignmask;
--
2.34.1
On Tue, Feb 20, 2024 at 5:14 PM Herbert Xu <[email protected]> wrote:
>
> On Tue, Feb 20, 2024 at 03:55:43PM +1300, Barry Song wrote:
> >
> > diff --git a/drivers/crypto/hisilicon/zip/zip_crypto.c b/drivers/crypto/hisilicon/zip/zip_crypto.c
> > index c650c741a18d..94e2d66b04b6 100644
> > --- a/drivers/crypto/hisilicon/zip/zip_crypto.c
> > +++ b/drivers/crypto/hisilicon/zip/zip_crypto.c
> > @@ -591,6 +591,7 @@ static struct acomp_alg hisi_zip_acomp_deflate = {
> > .base = {
> > .cra_name = "deflate",
> > .cra_driver_name = "hisi-deflate-acomp",
> > + .cra_flags = CRYPTO_ALG_ASYNC,
> > .cra_module = THIS_MODULE,
> > .cra_priority = HZIP_ALG_PRIORITY,
> > .cra_ctxsize = sizeof(struct hisi_zip_ctx),
> > diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
> > index dfd3baf0a8d8..91adf9d76a2e 100644
> > --- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
> > +++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
> > @@ -1916,6 +1916,7 @@ static struct acomp_alg iaa_acomp_fixed_deflate = {
> > .base = {
> > .cra_name = "deflate",
> > .cra_driver_name = "deflate-iaa",
> > + .cra_flags = CRYPTO_ALG_ASYNC,
> > .cra_ctxsize = sizeof(struct iaa_compression_ctx),
> > .cra_module = THIS_MODULE,
> > .cra_priority = IAA_ALG_PRIORITY,
>
> Good catch. I think this should go into a separate bug-fix patch.
done.
>
> > diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
> > index 574cffc90730..07bd8f6bc79a 100644
> > --- a/include/crypto/acompress.h
> > +++ b/include/crypto/acompress.h
> > @@ -160,6 +160,11 @@ static inline void acomp_request_set_tfm(struct acomp_req *req,
> > req->base.tfm = crypto_acomp_tfm(tfm);
> > }
> >
> > +static inline u32 crypto_acomp_get_alg_flags(struct crypto_acomp *tfm)
> > +{
> > + return crypto_tfm_alg_flags(crypto_acomp_tfm(tfm));
> > +}
>
> Sorry, my mistake. I shouldn't have suggested copying skcipher
> since that gets the tfm flags as opposed to the alg flags which
> you've found out.
no worries, Herbert :-)
>
> I think you should just go with your original function acomp_is_async
> but do it like this:
>
> static inline bool acomp_is_async(struct crypto_acomp *tfm)
> {
> return crypto_comp_alg_common(tfm)->base.cra_flags &
> CRYPTO_ALG_ASYNC;
> }
ok, I will do that. Besides, i'm also curious if we can open a
discussion, for example, letting offload drivers be able to work
in both sync mode and async. A scenario I can imagine is as
below,
for zswap, and page size is configured as 4KiB. in this
case, offload hardware might be able to compress/decompress
much faster than CPU, but the event-wait, wake-up, schedule
latency might add the total time for a compression and
decompression. thus offload might work worse than CPU
though hardware-accelerator is much faster than CPU.
In this case, it seems good to let offload drivers poll
the completion of compression and decompression
compared with sleep and wait.
On the other hand, when PAGE SIZE is big, for example,
ARM64's PAGE_SIZE could be 64KiB. in the future,
mTHP/large folios project might also ask for larger data
to be swapped as a whole. we will only need a schedule/
wake-up for 64KiB or larger data, the compression/
decompression time is much longer, in this case, async
mode might help to decrease CPU usage while providing
lower comp/decomp latency.
So it could be something like:
if data is short, acomp driver works by polling; if data is
long, acomp driver works by sleeping and waiting.
it would be perfect if Zhou or Yang Shen can help collect some
data to back up the discussion.
>
> Thanks,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Thanks
Barry
On Tue, Feb 20, 2024 at 6:09 PM Herbert Xu <[email protected]> wrote:
>
> On Tue, Feb 20, 2024 at 06:05:16PM +1300, Barry Song wrote:
> >
> > So it could be something like:
> > if data is short, acomp driver works by polling; if data is
> > long, acomp driver works by sleeping and waiting.
>
> This sort of logic is specific to each piece of hardware and
> should go into the driver.
>
> There is no reason why an async driver cannot return synchronously.
> The API fully supports this mode of operation.
Nice to know! Thanks for clarification.
>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Thanks
Barry