2015-10-09 10:29:19

by Russell King - ARM Linux

[permalink] [raw]
Subject: [PATCH 0/3] crypto: fixes for Marvell hash

This small series of patches addresses oopses seen when trying to use
the AF_ALG interface via openssl with openssh. This series does not
address all problems, but merely stops the kernel from smashing its
kernel stack and oopsing.

With these fixes in place, the kernel no longer oopses. However, with
the digests enabled in openssl, openssh refuses to work, producing the
following when attempting to connect to the target system:

Corrupted MAC on input.
Disconnecting: Packet corrupt

It's been hard enough to get this far; the crypto code is not the easiest
code to debug for a new-comer due to the amount of state needed to be
retained to understand the code (all the inline functions masking
multiple levels of containerisation and pointer dereference does not
make it easy to track what is stored where, and once I've been through
one bit of code, I find I'm having to revisit the same piece of code a
bit later to re-understand what it's doing.)

It's been difficult enough to find the engine plugin for openssl - the
original git repo which hosted it is now dead
(http://src.carnivore.it/users/common/af_alg/). All that seems to be
left is someone's modified version on github, which seems to get some
maintanence. Debian doesn't seem to carry AF_ALG openssl support, and
seems to only carry one package (strongswan) which supports this
interface.

Hence, I'm leaving further debugging to other parties, especially as
the userspace tooling for the AF_ALG seems rather lacking. (Are there
any test programs, if so, can their location be documented and placed
in Documentation/crypto please?)

I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
picked Thomas. It would be nice if there was an entry in MAINTAINERS
for this driver.

The first patch in this series avoids kernel stack smashing if a crypto
driver forgets to set the 'statesize' member, but writes to what seems
to be a valid pointer passed to its export function. Of course, this
won't completely stop stack smashing if the statesize member is
smaller than the data which the export function writes. This patch is
optional.

The second patch adds the necessary statesize members to the Marvell
code which were previously missing. Fixing this uncovered a further
problem, which the third patch addresses.

crypto/algif_hash.c | 6 +++++-
drivers/crypto/marvell/hash.c | 9 +++++++++
2 files changed, 14 insertions(+), 1 deletion(-)

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


2015-10-09 10:29:50

by Russell King

[permalink] [raw]
Subject: [PATCH 1/3] crypto: ensure algif_hash does not pass a zero-sized state

If the algorithm passed a zero statesize, do not pass a valid pointer
into the export/import functions. Passing a valid pointer covers up
bugs in driver code which then go on to smash the kernel stack.
Instead, pass NULL, which will cause any attempt to write to the
pointer to fail.

Signed-off-by: Russell King <[email protected]>
---
crypto/algif_hash.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 1396ad0787fc..f450584cb940 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -177,12 +177,16 @@ static int hash_accept(struct socket *sock, struct socket *newsock, int flags)
struct alg_sock *ask = alg_sk(sk);
struct hash_ctx *ctx = ask->private;
struct ahash_request *req = &ctx->req;
- char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];
+ struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
+ unsigned int state_size = crypto_ahash_statesize(ahash);
+ char state_buf[state_size], *state;
struct sock *sk2;
struct alg_sock *ask2;
struct hash_ctx *ctx2;
int err;

+ state = state_size ? state_buf : NULL;
+
err = crypto_ahash_export(req, state);
if (err)
return err;
--
2.1.0

2015-10-09 10:29:54

by Russell King

[permalink] [raw]
Subject: [PATCH 2/3] crypto: marvell: fix stack smashing in marvell/hash.c

Several of the algorithms in marvell/hash.c have a statesize of zero.
When an AF_ALG accept() on an already-accepted file descriptor to
calls into hash_accept(), this causes:

char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];

to be zero-sized, but we still pass this to:

err = crypto_ahash_export(req, state);

which proceeds to write to 'state' as if it was a "struct md5_state",
"struct sha1_state" etc. Add the necessary initialisers for the
.statesize member.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index e8d0d7128137..a259aced3b42 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -870,6 +870,7 @@ struct ahash_alg mv_md5_alg = {
.import = mv_cesa_md5_import,
.halg = {
.digestsize = MD5_DIGEST_SIZE,
+ .statesize = sizeof(struct md5_state),
.base = {
.cra_name = "md5",
.cra_driver_name = "mv-md5",
@@ -959,6 +960,7 @@ struct ahash_alg mv_sha1_alg = {
.import = mv_cesa_sha1_import,
.halg = {
.digestsize = SHA1_DIGEST_SIZE,
+ .statesize = sizeof(struct sha1_state),
.base = {
.cra_name = "sha1",
.cra_driver_name = "mv-sha1",
@@ -1048,6 +1050,7 @@ struct ahash_alg mv_sha256_alg = {
.import = mv_cesa_sha256_import,
.halg = {
.digestsize = SHA256_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha256",
.cra_driver_name = "mv-sha256",
--
2.1.0

2015-10-09 10:29:59

by Russell King

[permalink] [raw]
Subject: [PATCH 3/3] crypto: marvell: initialise struct mv_cesa_ahash_req

When a AF_ALG fd is accepted a second time (hence hash_accept() is
used), hash_accept_parent() allocates a new private context using
sock_kmalloc(). This context is uninitialised. After use of the new
fd, we eventually end up with the kernel complaining:

marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)

where c0627770 is a random address. Poisoning the memory allocated by
the above sock_kmalloc() produces kernel oopses within the marvell hash
code, particularly the interrupt handling.

The following simplfied call sequence occurs:

hash_accept()
crypto_ahash_export()
marvell hash export function
af_alg_accept()
hash_accept_parent() <== allocates uninitialised struct hash_ctx
crypto_ahash_import()
marvell hash import function

hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
and, as the marvell hash import function only partially initialises
this structure, we end up with a lot of members which are left with
whatever data was in memory prior to sock_kmalloc().

Add zero-initialisation of this structure.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index a259aced3b42..699839816505 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -831,6 +831,8 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->byte_count;
memcpy(creq->state, in_state->hash, digsize);
creq->cache_ptr = 0;
@@ -921,6 +923,8 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
@@ -1022,6 +1026,8 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
--
2.1.0

2015-10-09 10:34:41

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: ensure algif_hash does not pass a zero-sized state

On Fri, Oct 09, 2015 at 11:29:44AM +0100, Russell King wrote:
> If the algorithm passed a zero statesize, do not pass a valid pointer
> into the export/import functions. Passing a valid pointer covers up
> bugs in driver code which then go on to smash the kernel stack.
> Instead, pass NULL, which will cause any attempt to write to the
> pointer to fail.
>
> Signed-off-by: Russell King <[email protected]>

The state size should never be zero for a hash algorithm. Having
a zero state means that the hash output must always be identical.
Such an algorithm would be quite useless.

So how about adding a check upon hash registration to verify that
the state size is greater than zero? The place to do it would be
shash_prepare_alg and ahash_prepare_alg.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-09 10:41:16

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: ensure algif_hash does not pass a zero-sized state

On Fri, Oct 09, 2015 at 06:34:28PM +0800, Herbert Xu wrote:
> On Fri, Oct 09, 2015 at 11:29:44AM +0100, Russell King wrote:
> > If the algorithm passed a zero statesize, do not pass a valid pointer
> > into the export/import functions. Passing a valid pointer covers up
> > bugs in driver code which then go on to smash the kernel stack.
> > Instead, pass NULL, which will cause any attempt to write to the
> > pointer to fail.
> >
> > Signed-off-by: Russell King <[email protected]>
>
> The state size should never be zero for a hash algorithm. Having
> a zero state means that the hash output must always be identical.
> Such an algorithm would be quite useless.
>
> So how about adding a check upon hash registration to verify that
> the state size is greater than zero? The place to do it would be
> shash_prepare_alg and ahash_prepare_alg.

Do you mean something like this? As statesize is an unsigned int, testing
for zero should be sufficient.

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 8acb886032ae..9c1dc8d6106a 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
struct crypto_alg *base = &alg->halg.base;

if (alg->halg.digestsize > PAGE_SIZE / 8 ||
- alg->halg.statesize > PAGE_SIZE / 8)
+ alg->halg.statesize > PAGE_SIZE / 8 ||
+ alg->halg.statesize == 0)
return -EINVAL;

base->cra_type = &crypto_ahash_type;
diff --git a/crypto/shash.c b/crypto/shash.c
index ecb1e3d39bf0..ab3384b38542 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -585,7 +585,8 @@ static int shash_prepare_alg(struct shash_alg *alg)

if (alg->digestsize > PAGE_SIZE / 8 ||
alg->descsize > PAGE_SIZE / 8 ||
- alg->statesize > PAGE_SIZE / 8)
+ alg->statesize > PAGE_SIZE / 8 ||
+ alg->statesize == 0)
return -EINVAL;

base->cra_type = &crypto_shash_type;

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-09 10:42:51

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: ensure algif_hash does not pass a zero-sized state

On Fri, Oct 09, 2015 at 11:41:07AM +0100, Russell King - ARM Linux wrote:
>
> Do you mean something like this? As statesize is an unsigned int, testing
> for zero should be sufficient.

Yes looks great to me.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-09 10:46:44

by Russell King - ARM Linux

[permalink] [raw]
Subject: [PATCH v2 0/3] crypto: fixes for Marvell hash

Version 2, same as version 1 but with a different patch 1, thanks to
Herbert for an alternative approach on that one.

crypto/ahash.c | 3 ++-
crypto/shash.c | 3 ++-
drivers/crypto/marvell/hash.c | 9 +++++++++
3 files changed, 13 insertions(+), 2 deletions(-)

On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote:
> This small series of patches addresses oopses seen when trying to use
> the AF_ALG interface via openssl with openssh. This series does not
> address all problems, but merely stops the kernel from smashing its
> kernel stack and oopsing.
>
> With these fixes in place, the kernel no longer oopses. However, with
> the digests enabled in openssl, openssh refuses to work, producing the
> following when attempting to connect to the target system:
>
> Corrupted MAC on input.
> Disconnecting: Packet corrupt
>
> It's been hard enough to get this far; the crypto code is not the easiest
> code to debug for a new-comer due to the amount of state needed to be
> retained to understand the code (all the inline functions masking
> multiple levels of containerisation and pointer dereference does not
> make it easy to track what is stored where, and once I've been through
> one bit of code, I find I'm having to revisit the same piece of code a
> bit later to re-understand what it's doing.)
>
> It's been difficult enough to find the engine plugin for openssl - the
> original git repo which hosted it is now dead
> (http://src.carnivore.it/users/common/af_alg/). All that seems to be
> left is someone's modified version on github, which seems to get some
> maintanence. Debian doesn't seem to carry AF_ALG openssl support, and
> seems to only carry one package (strongswan) which supports this
> interface.
>
> Hence, I'm leaving further debugging to other parties, especially as
> the userspace tooling for the AF_ALG seems rather lacking. (Are there
> any test programs, if so, can their location be documented and placed
> in Documentation/crypto please?)
>
> I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
> picked Thomas. It would be nice if there was an entry in MAINTAINERS
> for this driver.
>
> The first patch in this series avoids kernel stack smashing if a crypto
> driver forgets to set the 'statesize' member, but writes to what seems
> to be a valid pointer passed to its export function. Of course, this
> won't completely stop stack smashing if the statesize member is
> smaller than the data which the export function writes. This patch is
> optional.
>
> The second patch adds the necessary statesize members to the Marvell
> code which were previously missing. Fixing this uncovered a further
> problem, which the third patch addresses.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-09 10:48:48

by Russell King

[permalink] [raw]
Subject: [PATCH v2 1/3] crypto: ensure algif_hash does not pass a zero-sized state

If the algorithm passed a zero statesize, do not pass a valid pointer
into the export/import functions. Passing a valid pointer covers up
bugs in driver code which then go on to smash the kernel stack.
Instead, pass NULL, which will cause any attempt to write to the
pointer to fail.

Signed-off-by: Russell King <[email protected]>
---
crypto/ahash.c | 3 ++-
crypto/shash.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 8acb886032ae..9c1dc8d6106a 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
struct crypto_alg *base = &alg->halg.base;

if (alg->halg.digestsize > PAGE_SIZE / 8 ||
- alg->halg.statesize > PAGE_SIZE / 8)
+ alg->halg.statesize > PAGE_SIZE / 8 ||
+ alg->halg.statesize == 0)
return -EINVAL;

base->cra_type = &crypto_ahash_type;
diff --git a/crypto/shash.c b/crypto/shash.c
index ecb1e3d39bf0..ab3384b38542 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -585,7 +585,8 @@ static int shash_prepare_alg(struct shash_alg *alg)

if (alg->digestsize > PAGE_SIZE / 8 ||
alg->descsize > PAGE_SIZE / 8 ||
- alg->statesize > PAGE_SIZE / 8)
+ alg->statesize > PAGE_SIZE / 8 ||
+ alg->statesize == 0)
return -EINVAL;

base->cra_type = &crypto_shash_type;
--
2.1.0

2015-10-09 10:48:58

by Russell King

[permalink] [raw]
Subject: [PATCH v2 3/3] crypto: marvell: initialise struct mv_cesa_ahash_req

When a AF_ALG fd is accepted a second time (hence hash_accept() is
used), hash_accept_parent() allocates a new private context using
sock_kmalloc(). This context is uninitialised. After use of the new
fd, we eventually end up with the kernel complaining:

marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)

where c0627770 is a random address. Poisoning the memory allocated by
the above sock_kmalloc() produces kernel oopses within the marvell hash
code, particularly the interrupt handling.

The following simplfied call sequence occurs:

hash_accept()
crypto_ahash_export()
marvell hash export function
af_alg_accept()
hash_accept_parent() <== allocates uninitialised struct hash_ctx
crypto_ahash_import()
marvell hash import function

hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
and, as the marvell hash import function only partially initialises
this structure, we end up with a lot of members which are left with
whatever data was in memory prior to sock_kmalloc().

Add zero-initialisation of this structure.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index a259aced3b42..699839816505 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -831,6 +831,8 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->byte_count;
memcpy(creq->state, in_state->hash, digsize);
creq->cache_ptr = 0;
@@ -921,6 +923,8 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
@@ -1022,6 +1026,8 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
--
2.1.0

2015-10-09 10:48:53

by Russell King

[permalink] [raw]
Subject: [PATCH v2 2/3] crypto: marvell: fix stack smashing in marvell/hash.c

Several of the algorithms in marvell/hash.c have a statesize of zero.
When an AF_ALG accept() on an already-accepted file descriptor to
calls into hash_accept(), this causes:

char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];

to be zero-sized, but we still pass this to:

err = crypto_ahash_export(req, state);

which proceeds to write to 'state' as if it was a "struct md5_state",
"struct sha1_state" etc. Add the necessary initialisers for the
.statesize member.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index e8d0d7128137..a259aced3b42 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -870,6 +870,7 @@ struct ahash_alg mv_md5_alg = {
.import = mv_cesa_md5_import,
.halg = {
.digestsize = MD5_DIGEST_SIZE,
+ .statesize = sizeof(struct md5_state),
.base = {
.cra_name = "md5",
.cra_driver_name = "mv-md5",
@@ -959,6 +960,7 @@ struct ahash_alg mv_sha1_alg = {
.import = mv_cesa_sha1_import,
.halg = {
.digestsize = SHA1_DIGEST_SIZE,
+ .statesize = sizeof(struct sha1_state),
.base = {
.cra_name = "sha1",
.cra_driver_name = "mv-sha1",
@@ -1048,6 +1050,7 @@ struct ahash_alg mv_sha256_alg = {
.import = mv_cesa_sha256_import,
.halg = {
.digestsize = SHA256_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha256",
.cra_driver_name = "mv-sha256",
--
2.1.0

2015-10-09 12:12:45

by Thomas Petazzoni

[permalink] [raw]
Subject: Re: [PATCH 0/3] crypto: fixes for Marvell hash

Hello Russell,

On Fri, 9 Oct 2015 11:29:05 +0100, Russell King - ARM Linux wrote:
> This small series of patches addresses oopses seen when trying to use
> the AF_ALG interface via openssl with openssh. This series does not
> address all problems, but merely stops the kernel from smashing its
> kernel stack and oopsing.

Thanks for debugging this and for the patches. However, could you Cc
the main authors of the driver:

Boris Brezillon <[email protected]>
Arnaud Ebalard <[email protected]>

> I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
> picked Thomas. It would be nice if there was an entry in MAINTAINERS
> for this driver.

Indeed. However get_maintainer.pl gets this right:

$ ./scripts/get_maintainer.pl -f drivers/crypto/marvell/cesa.c
Herbert Xu <[email protected]> (maintainer:CRYPTO API,commit_signer:11/12=92%)
"David S. Miller" <[email protected]> (maintainer:CRYPTO API)
Boris BREZILLON <[email protected]> (commit_signer:11/12=92%,authored:6/12=50%,added_lines:538/558=96%,removed_lines:7/12=58%)
Arnaud Ebalard <[email protected]> (commit_signer:7/12=58%,authored:4/12=33%,removed_lines:1/12=8%)
Vladimir Zapolskiy <[email protected]> (commit_signer:1/12=8%,authored:1/12=8%,removed_lines:3/12=25%)
Krzysztof Kozlowski <[email protected]> (commit_signer:1/12=8%,authored:1/12=8%,removed_lines:1/12=8%)

Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

2015-10-09 12:31:40

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 0/3] crypto: fixes for Marvell hash

On Fri, Oct 09, 2015 at 02:12:43PM +0200, Thomas Petazzoni wrote:
> Hello Russell,
>
> On Fri, 9 Oct 2015 11:29:05 +0100, Russell King - ARM Linux wrote:
> > This small series of patches addresses oopses seen when trying to use
> > the AF_ALG interface via openssl with openssh. This series does not
> > address all problems, but merely stops the kernel from smashing its
> > kernel stack and oopsing.
>
> Thanks for debugging this and for the patches. However, could you Cc
> the main authors of the driver:
>
> Boris Brezillon <[email protected]>
> Arnaud Ebalard <[email protected]>

Hmm. You know, given how broken this is, I've a good mind to say that
this is more hassle than it's worth to get these simple fixes merged,
especially as there's no MAINTAINERS entry for them. I'm not sending
the patch series a third time. I could say if maintainers can't be
bothered to ensure that they're listed in MAINTAINERS then they aren't
maintainers at all.

If this means it won't be merged, so be it, I don't care enough given
that there's very little userspace support for kernel hw crypto.

> > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
> > picked Thomas. It would be nice if there was an entry in MAINTAINERS
> > for this driver.
>
> Indeed. However get_maintainer.pl gets this right:

Try with --no-git, which is the only sane mode for using get_maintainer.pl.
Picking up everyone who's ever touched a driver with as little as a spelling
fix in a git commit is abhorrent.

> $ ./scripts/get_maintainer.pl -f drivers/crypto/marvell/cesa.c
> Herbert Xu <[email protected]> (maintainer:CRYPTO API,commit_signer:11/12=92%)
> "David S. Miller" <[email protected]> (maintainer:CRYPTO API)
> Boris BREZILLON <[email protected]> (commit_signer:11/12=92%,authored:6/12=50%,added_lines:538/558=96%,removed_lines:7/12=58%)
> Arnaud Ebalard <[email protected]> (commit_signer:7/12=58%,authored:4/12=33%,removed_lines:1/12=8%)
> Vladimir Zapolskiy <[email protected]> (commit_signer:1/12=8%,authored:1/12=8%,removed_lines:3/12=25%)
> Krzysztof Kozlowski <[email protected]> (commit_signer:1/12=8%,authored:1/12=8%,removed_lines:1/12=8%)
>
> Thomas
> --
> Thomas Petazzoni, CTO, Free Electrons
> Embedded Linux, Kernel and Android engineering
> http://free-electrons.com

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-09 12:40:58

by Thomas Petazzoni

[permalink] [raw]
Subject: Re: [PATCH 0/3] crypto: fixes for Marvell hash

Hello,

On Fri, 9 Oct 2015 13:31:32 +0100, Russell King - ARM Linux wrote:

> > Thanks for debugging this and for the patches. However, could you Cc
> > the main authors of the driver:
> >
> > Boris Brezillon <[email protected]>
> > Arnaud Ebalard <[email protected]>
>
> Hmm. You know, given how broken this is, I've a good mind to say that
> this is more hassle than it's worth to get these simple fixes merged,
> especially as there's no MAINTAINERS entry for them. I'm not sending
> the patch series a third time. I could say if maintainers can't be
> bothered to ensure that they're listed in MAINTAINERS then they aren't
> maintainers at all.

I agree they should be list in MAINTAINERS. But rather than immediately
becoming nervous, maybe we can simply consider this as an oversight
that Boris and Arnaud will fix by sending a simple patch, and the
problem is done?

> If this means it won't be merged, so be it, I don't care enough given
> that there's very little userspace support for kernel hw crypto.

These fixes will definitely be merged. Boris and Arnaud are not stupid
to the point of rejecting patches simply because the submitter forgot
to Cc: them. The value of such fixes is higher than a silly procedural
discussion about Cc's.

> > Indeed. However get_maintainer.pl gets this right:
>
> Try with --no-git, which is the only sane mode for using get_maintainer.pl.
> Picking up everyone who's ever touched a driver with as little as a spelling
> fix in a git commit is abhorrent.

Agreed. I myself never used get_maintainer.pl for this reason :-)

Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

2015-10-09 12:43:11

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] crypto: fixes for Marvell hash

Thomas says you're the maintainers of the marvell crypto driver. I
picked sending these patches to Thomas because I couldn't work out
who should be receiving them. If you wish to be copied on patches,
please put yourselves in the MAINTAINERS file. In the mean time, you
can find the patches on the crypto list if you wish to review them.

Thanks.

On Fri, Oct 09, 2015 at 11:46:37AM +0100, Russell King - ARM Linux wrote:
> Version 2, same as version 1 but with a different patch 1, thanks to
> Herbert for an alternative approach on that one.
>
> crypto/ahash.c | 3 ++-
> crypto/shash.c | 3 ++-
> drivers/crypto/marvell/hash.c | 9 +++++++++
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote:
> > This small series of patches addresses oopses seen when trying to use
> > the AF_ALG interface via openssl with openssh. This series does not
> > address all problems, but merely stops the kernel from smashing its
> > kernel stack and oopsing.
> >
> > With these fixes in place, the kernel no longer oopses. However, with
> > the digests enabled in openssl, openssh refuses to work, producing the
> > following when attempting to connect to the target system:
> >
> > Corrupted MAC on input.
> > Disconnecting: Packet corrupt
> >
> > It's been hard enough to get this far; the crypto code is not the easiest
> > code to debug for a new-comer due to the amount of state needed to be
> > retained to understand the code (all the inline functions masking
> > multiple levels of containerisation and pointer dereference does not
> > make it easy to track what is stored where, and once I've been through
> > one bit of code, I find I'm having to revisit the same piece of code a
> > bit later to re-understand what it's doing.)
> >
> > It's been difficult enough to find the engine plugin for openssl - the
> > original git repo which hosted it is now dead
> > (http://src.carnivore.it/users/common/af_alg/). All that seems to be
> > left is someone's modified version on github, which seems to get some
> > maintanence. Debian doesn't seem to carry AF_ALG openssl support, and
> > seems to only carry one package (strongswan) which supports this
> > interface.
> >
> > Hence, I'm leaving further debugging to other parties, especially as
> > the userspace tooling for the AF_ALG seems rather lacking. (Are there
> > any test programs, if so, can their location be documented and placed
> > in Documentation/crypto please?)
> >
> > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
> > picked Thomas. It would be nice if there was an entry in MAINTAINERS
> > for this driver.
> >
> > The first patch in this series avoids kernel stack smashing if a crypto
> > driver forgets to set the 'statesize' member, but writes to what seems
> > to be a valid pointer passed to its export function. Of course, this
> > won't completely stop stack smashing if the statesize member is
> > smaller than the data which the export function writes. This patch is
> > optional.
> >
> > The second patch adds the necessary statesize members to the Marvell
> > code which were previously missing. Fixing this uncovered a further
> > problem, which the third patch addresses.
>
> --
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-09 14:35:59

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 0/3] crypto: fixes for Marvell hash

On Fri, Oct 09, 2015 at 01:31:32PM +0100, Russell King - ARM Linux wrote:
>
> Hmm. You know, given how broken this is, I've a good mind to say that
> this is more hassle than it's worth to get these simple fixes merged,
> especially as there's no MAINTAINERS entry for them. I'm not sending
> the patch series a third time. I could say if maintainers can't be
> bothered to ensure that they're listed in MAINTAINERS then they aren't
> maintainers at all.

There is no need to send them again Russell. They look good to me
and I'll apply them.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-09 16:12:19

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] crypto: fixes for Marvell hash

Hi Russel,

On Fri, 9 Oct 2015 11:46:37 +0100
Russell King - ARM Linux <[email protected]> (by way of Thomas
Petazzoni <[email protected]>) wrote:

> Version 2, same as version 1 but with a different patch 1, thanks to
> Herbert for an alternative approach on that one.
>
> crypto/ahash.c | 3 ++-
> crypto/shash.c | 3 ++-
> drivers/crypto/marvell/hash.c | 9 +++++++++
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote:
> > This small series of patches addresses oopses seen when trying to use
> > the AF_ALG interface via openssl with openssh. This series does not
> > address all problems, but merely stops the kernel from smashing its
> > kernel stack and oopsing.
> >
> > With these fixes in place, the kernel no longer oopses. However, with
> > the digests enabled in openssl, openssh refuses to work, producing the
> > following when attempting to connect to the target system:
> >
> > Corrupted MAC on input.
> > Disconnecting: Packet corrupt
> >
> > It's been hard enough to get this far; the crypto code is not the easiest
> > code to debug for a new-comer due to the amount of state needed to be
> > retained to understand the code (all the inline functions masking
> > multiple levels of containerisation and pointer dereference does not
> > make it easy to track what is stored where, and once I've been through
> > one bit of code, I find I'm having to revisit the same piece of code a
> > bit later to re-understand what it's doing.)

As said on IRC, I'm sorry that you had to debug this problem, and spent
so much time digging into the code, which I can't deny, is far from
simple.
I wish I had found a easier solution to support both DMA and non-DMA
operations (which is needed if we want to support all generations of
the CESA engine, but still support DMA access on newer versions).

If you see any solution to simplify the code or make it easily
debuggable I'm all ears.
I guess documenting the code and adding comments in tricky parts would
help a bit.
You suggested to avoid inline functions, which is also doable, though
I'm not sure the compiler won't decide to optimize those calls by
itself.

> >
> > It's been difficult enough to find the engine plugin for openssl - the
> > original git repo which hosted it is now dead
> > (http://src.carnivore.it/users/common/af_alg/). All that seems to be
> > left is someone's modified version on github, which seems to get some
> > maintanence. Debian doesn't seem to carry AF_ALG openssl support, and
> > seems to only carry one package (strongswan) which supports this
> > interface.
> >
> > Hence, I'm leaving further debugging to other parties, especially as
> > the userspace tooling for the AF_ALG seems rather lacking. (Are there
> > any test programs, if so, can their location be documented and placed
> > in Documentation/crypto please?)
> >
> > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
> > picked Thomas. It would be nice if there was an entry in MAINTAINERS
> > for this driver.

Should be addressed soon (I just acked Thomas patch adding me and
Arnaud as maintainers of this driver) ;-).

> >
> > The first patch in this series avoids kernel stack smashing if a crypto
> > driver forgets to set the 'statesize' member, but writes to what seems
> > to be a valid pointer passed to its export function. Of course, this
> > won't completely stop stack smashing if the statesize member is
> > smaller than the data which the export function writes. This patch is
> > optional.
> >
> > The second patch adds the necessary statesize members to the Marvell
> > code which were previously missing. Fixing this uncovered a further
> > problem, which the third patch addresses.
>

Thanks again for spending some of your time debugging the driver and for
providing those patches.

Best Regards,

Boris


--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 16:13:53

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] crypto: marvell: fix stack smashing in marvell/hash.c

On Fri, 09 Oct 2015 11:48:49 +0100
Russell King <[email protected]> (by way of Thomas Petazzoni
<[email protected]>) wrote:

> Several of the algorithms in marvell/hash.c have a statesize of zero.
> When an AF_ALG accept() on an already-accepted file descriptor to
> calls into hash_accept(), this causes:
>
> char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];
>
> to be zero-sized, but we still pass this to:
>
> err = crypto_ahash_export(req, state);
>
> which proceeds to write to 'state' as if it was a "struct md5_state",
> "struct sha1_state" etc. Add the necessary initialisers for the
> .statesize member.
>
> Signed-off-by: Russell King <[email protected]>

Acked-by: Boris Brezillon <[email protected]>

> ---
> drivers/crypto/marvell/hash.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index e8d0d7128137..a259aced3b42 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -870,6 +870,7 @@ struct ahash_alg mv_md5_alg = {
> .import = mv_cesa_md5_import,
> .halg = {
> .digestsize = MD5_DIGEST_SIZE,
> + .statesize = sizeof(struct md5_state),
> .base = {
> .cra_name = "md5",
> .cra_driver_name = "mv-md5",
> @@ -959,6 +960,7 @@ struct ahash_alg mv_sha1_alg = {
> .import = mv_cesa_sha1_import,
> .halg = {
> .digestsize = SHA1_DIGEST_SIZE,
> + .statesize = sizeof(struct sha1_state),
> .base = {
> .cra_name = "sha1",
> .cra_driver_name = "mv-sha1",
> @@ -1048,6 +1050,7 @@ struct ahash_alg mv_sha256_alg = {
> .import = mv_cesa_sha256_import,
> .halg = {
> .digestsize = SHA256_DIGEST_SIZE,
> + .statesize = sizeof(struct sha256_state),
> .base = {
> .cra_name = "sha256",
> .cra_driver_name = "mv-sha256",



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 16:15:43

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] crypto: marvell: initialise struct mv_cesa_ahash_req

On Fri, 09 Oct 2015 11:48:54 +0100
Russell King <[email protected]> (by way of Thomas Petazzoni
<[email protected]>) wrote:

> When a AF_ALG fd is accepted a second time (hence hash_accept() is
> used), hash_accept_parent() allocates a new private context using
> sock_kmalloc(). This context is uninitialised. After use of the new
> fd, we eventually end up with the kernel complaining:
>
> marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)
>
> where c0627770 is a random address. Poisoning the memory allocated by
> the above sock_kmalloc() produces kernel oopses within the marvell hash
> code, particularly the interrupt handling.
>
> The following simplfied call sequence occurs:
>
> hash_accept()
> crypto_ahash_export()
> marvell hash export function
> af_alg_accept()
> hash_accept_parent() <== allocates uninitialised struct hash_ctx
> crypto_ahash_import()
> marvell hash import function
>
> hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
> and, as the marvell hash import function only partially initialises
> this structure, we end up with a lot of members which are left with
> whatever data was in memory prior to sock_kmalloc().
>
> Add zero-initialisation of this structure.
>
> Signed-off-by: Russell King <[email protected]>

Acked-by: Boris Brezillon <[email protected]>

> ---
> drivers/crypto/marvell/hash.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index a259aced3b42..699839816505 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -831,6 +831,8 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> unsigned int cache_ptr;
> int ret;
>
> + memset(creq, 0, sizeof(*creq));
> +
> creq->len = in_state->byte_count;
> memcpy(creq->state, in_state->hash, digsize);
> creq->cache_ptr = 0;
> @@ -921,6 +923,8 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
> unsigned int cache_ptr;
> int ret;
>
> + memset(creq, 0, sizeof(*creq));
> +
> creq->len = in_state->count;
> memcpy(creq->state, in_state->state, digsize);
> creq->cache_ptr = 0;
> @@ -1022,6 +1026,8 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
> unsigned int cache_ptr;
> int ret;
>
> + memset(creq, 0, sizeof(*creq));
> +
> creq->len = in_state->count;
> memcpy(creq->state, in_state->state, digsize);
> creq->cache_ptr = 0;



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 19:43:28

by Russell King - ARM Linux

[permalink] [raw]
Subject: [PATCH v3 0/5] crypto: fixes for Marvell hash

After all, here's a version 3, which fixes the other bug I mentioned
below - we now have correct hash results. A couple of patches have
been added, one to fix that, and a second patch to factor out the
duplication in the import functions. This gets openssh working with
the digests enabled.

Acks so far received have been added. Patch 3 has changed, so I didn't
add the ack for the previous version. Please re-review patch 3.

Thanks.

crypto/ahash.c | 3 +-
crypto/shash.c | 3 +-
drivers/crypto/marvell/hash.c | 80 +++++++++++++++++--------------------------
3 files changed, 36 insertions(+), 50 deletions(-)

On Fri, Oct 09, 2015 at 11:46:37AM +0100, Russell King - ARM Linux wrote:
> Version 2, same as version 1 but with a different patch 1, thanks to
> Herbert for an alternative approach on that one.
>
> crypto/ahash.c | 3 ++-
> crypto/shash.c | 3 ++-
> drivers/crypto/marvell/hash.c | 9 +++++++++
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote:
> > This small series of patches addresses oopses seen when trying to use
> > the AF_ALG interface via openssl with openssh. This series does not
> > address all problems, but merely stops the kernel from smashing its
> > kernel stack and oopsing.
> >
> > With these fixes in place, the kernel no longer oopses. However, with
> > the digests enabled in openssl, openssh refuses to work, producing the
> > following when attempting to connect to the target system:
> >
> > Corrupted MAC on input.
> > Disconnecting: Packet corrupt
> >
> > It's been hard enough to get this far; the crypto code is not the easiest
> > code to debug for a new-comer due to the amount of state needed to be
> > retained to understand the code (all the inline functions masking
> > multiple levels of containerisation and pointer dereference does not
> > make it easy to track what is stored where, and once I've been through
> > one bit of code, I find I'm having to revisit the same piece of code a
> > bit later to re-understand what it's doing.)
> >
> > It's been difficult enough to find the engine plugin for openssl - the
> > original git repo which hosted it is now dead
> > (http://src.carnivore.it/users/common/af_alg/). All that seems to be
> > left is someone's modified version on github, which seems to get some
> > maintanence. Debian doesn't seem to carry AF_ALG openssl support, and
> > seems to only carry one package (strongswan) which supports this
> > interface.
> >
> > Hence, I'm leaving further debugging to other parties, especially as
> > the userspace tooling for the AF_ALG seems rather lacking. (Are there
> > any test programs, if so, can their location be documented and placed
> > in Documentation/crypto please?)
> >
> > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've
> > picked Thomas. It would be nice if there was an entry in MAINTAINERS
> > for this driver.
> >
> > The first patch in this series avoids kernel stack smashing if a crypto
> > driver forgets to set the 'statesize' member, but writes to what seems
> > to be a valid pointer passed to its export function. Of course, this
> > won't completely stop stack smashing if the statesize member is
> > smaller than the data which the export function writes. This patch is
> > optional.
> >
> > The second patch adds the necessary statesize members to the Marvell
> > code which were previously missing. Fixing this uncovered a further
> > problem, which the third patch addresses.
>
> --
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-09 19:43:41

by Russell King

[permalink] [raw]
Subject: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

If the algorithm passed a zero statesize, do not pass a valid pointer
into the export/import functions. Passing a valid pointer covers up
bugs in driver code which then go on to smash the kernel stack.
Instead, pass NULL, which will cause any attempt to write to the
pointer to fail.

Signed-off-by: Russell King <[email protected]>
---
crypto/ahash.c | 3 ++-
crypto/shash.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 8acb886032ae..9c1dc8d6106a 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
struct crypto_alg *base = &alg->halg.base;

if (alg->halg.digestsize > PAGE_SIZE / 8 ||
- alg->halg.statesize > PAGE_SIZE / 8)
+ alg->halg.statesize > PAGE_SIZE / 8 ||
+ alg->halg.statesize == 0)
return -EINVAL;

base->cra_type = &crypto_ahash_type;
diff --git a/crypto/shash.c b/crypto/shash.c
index ecb1e3d39bf0..ab3384b38542 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -585,7 +585,8 @@ static int shash_prepare_alg(struct shash_alg *alg)

if (alg->digestsize > PAGE_SIZE / 8 ||
alg->descsize > PAGE_SIZE / 8 ||
- alg->statesize > PAGE_SIZE / 8)
+ alg->statesize > PAGE_SIZE / 8 ||
+ alg->statesize == 0)
return -EINVAL;

base->cra_type = &crypto_shash_type;
--
2.1.0

2015-10-09 19:43:48

by Russell King

[permalink] [raw]
Subject: [PATCH v3 2/5] crypto: marvell: fix stack smashing in marvell/hash.c

Several of the algorithms in marvell/hash.c have a statesize of zero.
When an AF_ALG accept() on an already-accepted file descriptor to
calls into hash_accept(), this causes:

char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];

to be zero-sized, but we still pass this to:

err = crypto_ahash_export(req, state);

which proceeds to write to 'state' as if it was a "struct md5_state",
"struct sha1_state" etc. Add the necessary initialisers for the
.statesize member.

Acked-by: Boris Brezillon <[email protected]>
Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index e8d0d7128137..a259aced3b42 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -870,6 +870,7 @@ struct ahash_alg mv_md5_alg = {
.import = mv_cesa_md5_import,
.halg = {
.digestsize = MD5_DIGEST_SIZE,
+ .statesize = sizeof(struct md5_state),
.base = {
.cra_name = "md5",
.cra_driver_name = "mv-md5",
@@ -959,6 +960,7 @@ struct ahash_alg mv_sha1_alg = {
.import = mv_cesa_sha1_import,
.halg = {
.digestsize = SHA1_DIGEST_SIZE,
+ .statesize = sizeof(struct sha1_state),
.base = {
.cra_name = "sha1",
.cra_driver_name = "mv-sha1",
@@ -1048,6 +1050,7 @@ struct ahash_alg mv_sha256_alg = {
.import = mv_cesa_sha256_import,
.halg = {
.digestsize = SHA256_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha256",
.cra_driver_name = "mv-sha256",
--
2.1.0

2015-10-09 19:43:59

by Russell King

[permalink] [raw]
Subject: [PATCH v3 4/5] crypto: marvell: fix wrong hash results

Attempting to use the sha1 digest for openssh via openssl reveals that
the result from the hash is wrong: this happens when we export the
state from one socket and import it into another via calling accept().

The reason for this is because the operation is reset to "initial block"
state, whereas we may be past the first fragment of data to be hashed.

Arrange for the operation code to avoid the initialisation of the state,
thereby preserving the imported state.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 458867ce9515..b7c2c05f1a01 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -835,6 +835,11 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

+ if (in_state->byte_count >= sizeof(in_state->block))
+ mv_cesa_update_op_cfg(&creq->op_tmpl,
+ CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
creq->len = in_state->byte_count;
memcpy(creq->state, in_state->hash, digsize);
creq->cache_ptr = 0;
@@ -929,6 +934,11 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

+ if (in_state->count >= SHA1_BLOCK_SIZE)
+ mv_cesa_update_op_cfg(&creq->op_tmpl,
+ CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
@@ -1034,6 +1044,11 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

+ if (in_state->count >= SHA256_BLOCK_SIZE)
+ mv_cesa_update_op_cfg(&creq->op_tmpl,
+ CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
--
2.1.0

2015-10-09 19:43:54

by Russell King

[permalink] [raw]
Subject: [PATCH v3 3/5] crypto: marvell: initialise struct mv_cesa_ahash_req

When a AF_ALG fd is accepted a second time (hence hash_accept() is
used), hash_accept_parent() allocates a new private context using
sock_kmalloc(). This context is uninitialised. After use of the new
fd, we eventually end up with the kernel complaining:

marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)

where c0627770 is a random address. Poisoning the memory allocated by
the above sock_kmalloc() produces kernel oopses within the marvell hash
code, particularly the interrupt handling.

The following simplfied call sequence occurs:

hash_accept()
crypto_ahash_export()
marvell hash export function
af_alg_accept()
hash_accept_parent() <== allocates uninitialised struct hash_ctx
crypto_ahash_import()
marvell hash import function

hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
and, as the marvell hash import function only partially initialises
this structure, we end up with a lot of members which are left with
whatever data was in memory prior to sock_kmalloc().

Add zero-initialisation of this structure.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index a259aced3b42..458867ce9515 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -831,6 +831,10 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ ret = crypto_ahash_init(req);
+ if (ret)
+ return ret;
+
creq->len = in_state->byte_count;
memcpy(creq->state, in_state->hash, digsize);
creq->cache_ptr = 0;
@@ -921,6 +925,10 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ ret = crypto_ahash_init(req);
+ if (ret)
+ return ret;
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
@@ -1022,6 +1030,10 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
unsigned int cache_ptr;
int ret;

+ ret = crypto_ahash_init(req);
+ if (ret)
+ return ret;
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
--
2.1.0

2015-10-09 19:44:04

by Russell King

[permalink] [raw]
Subject: [PATCH v3 5/5] crypto: marvell: factor out common import functions

As all the import functions are virtually identical, factor out their
common parts into a generic mv_cesa_import(). This performs the
actual import, and we pass the data pointers and current length into
this function.

We have to switch a % const operation to do_div() to avoid provoking
gcc to use the expensive 64-bit by 64-bit modulus operation.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 88 +++++++++++--------------------------------
1 file changed, 21 insertions(+), 67 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index b7c2c05f1a01..4f328b2a3a22 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -822,12 +822,13 @@ static int mv_cesa_md5_export(struct ahash_request *req, void *out)
return 0;
}

-static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
+static int mv_cesa_import(struct ahash_request *req, const void *hash, u64 len,
+ const void *cache)
{
- const struct md5_state *in_state = in;
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
unsigned int digsize = crypto_ahash_digestsize(ahash);
+ unsigned int blocksize;
unsigned int cache_ptr;
int ret;

@@ -835,16 +836,17 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

- if (in_state->byte_count >= sizeof(in_state->block))
+ blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
+ if (len >= blocksize)
mv_cesa_update_op_cfg(&creq->op_tmpl,
CESA_SA_DESC_CFG_MID_FRAG,
CESA_SA_DESC_CFG_FRAG_MSK);

- creq->len = in_state->byte_count;
- memcpy(creq->state, in_state->hash, digsize);
+ creq->len = len;
+ memcpy(creq->state, hash, digsize);
creq->cache_ptr = 0;

- cache_ptr = creq->len % sizeof(in_state->block);
+ cache_ptr = do_div(len, blocksize);
if (!cache_ptr)
return 0;

@@ -852,12 +854,20 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

- memcpy(creq->cache, in_state->block, cache_ptr);
+ memcpy(creq->cache, cache, cache_ptr);
creq->cache_ptr = cache_ptr;

return 0;
}

+static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
+{
+ const struct md5_state *in_state = in;
+
+ return mv_cesa_import(req, in_state->hash, in_state->byte_count,
+ in_state->block);
+}
+
static int mv_cesa_md5_digest(struct ahash_request *req)
{
int ret;
@@ -924,37 +934,9 @@ static int mv_cesa_sha1_export(struct ahash_request *req, void *out)
static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
{
const struct sha1_state *in_state = in;
- struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
- struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int digsize = crypto_ahash_digestsize(ahash);
- unsigned int cache_ptr;
- int ret;
-
- ret = crypto_ahash_init(req);
- if (ret)
- return ret;
-
- if (in_state->count >= SHA1_BLOCK_SIZE)
- mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);

- creq->len = in_state->count;
- memcpy(creq->state, in_state->state, digsize);
- creq->cache_ptr = 0;
-
- cache_ptr = creq->len % SHA1_BLOCK_SIZE;
- if (!cache_ptr)
- return 0;
-
- ret = mv_cesa_ahash_alloc_cache(req);
- if (ret)
- return ret;
-
- memcpy(creq->cache, in_state->buffer, cache_ptr);
- creq->cache_ptr = cache_ptr;
-
- return 0;
+ return mv_cesa_import(req, in_state->state, in_state->count,
+ in_state->buffer);
}

static int mv_cesa_sha1_digest(struct ahash_request *req)
@@ -1034,37 +1016,9 @@ static int mv_cesa_sha256_export(struct ahash_request *req, void *out)
static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
{
const struct sha256_state *in_state = in;
- struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
- struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int digsize = crypto_ahash_digestsize(ahash);
- unsigned int cache_ptr;
- int ret;

- ret = crypto_ahash_init(req);
- if (ret)
- return ret;
-
- if (in_state->count >= SHA256_BLOCK_SIZE)
- mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
- creq->len = in_state->count;
- memcpy(creq->state, in_state->state, digsize);
- creq->cache_ptr = 0;
-
- cache_ptr = creq->len % SHA256_BLOCK_SIZE;
- if (!cache_ptr)
- return 0;
-
- ret = mv_cesa_ahash_alloc_cache(req);
- if (ret)
- return ret;
-
- memcpy(creq->cache, in_state->buf, cache_ptr);
- creq->cache_ptr = cache_ptr;
-
- return 0;
+ return mv_cesa_import(req, in_state->state, in_state->count,
+ in_state->buf);
}

struct ahash_alg mv_sha256_alg = {
--
2.1.0

2015-10-09 19:50:23

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3 3/5] crypto: marvell: initialise struct mv_cesa_ahash_req

Hi Russel,

On Fri, 09 Oct 2015 20:43:43 +0100
Russell King <[email protected]> wrote:

> When a AF_ALG fd is accepted a second time (hence hash_accept() is
> used), hash_accept_parent() allocates a new private context using
> sock_kmalloc(). This context is uninitialised. After use of the new
> fd, we eventually end up with the kernel complaining:
>
> marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)
>
> where c0627770 is a random address. Poisoning the memory allocated by
> the above sock_kmalloc() produces kernel oopses within the marvell hash
> code, particularly the interrupt handling.
>
> The following simplfied call sequence occurs:
>
> hash_accept()
> crypto_ahash_export()
> marvell hash export function
> af_alg_accept()
> hash_accept_parent() <== allocates uninitialised struct hash_ctx
> crypto_ahash_import()
> marvell hash import function
>
> hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
> and, as the marvell hash import function only partially initialises
> this structure, we end up with a lot of members which are left with
> whatever data was in memory prior to sock_kmalloc().
>
> Add zero-initialisation of this structure.

Maybe you should also change your commit message since this patch no
longer initializes the req struct to zero, otherwise

Acked-by: Boris Brezillon <[email protected]>

>
> Signed-off-by: Russell King <[email protected]>
> ---
> drivers/crypto/marvell/hash.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index a259aced3b42..458867ce9515 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -831,6 +831,10 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> unsigned int cache_ptr;
> int ret;
>
> + ret = crypto_ahash_init(req);
> + if (ret)
> + return ret;
> +
> creq->len = in_state->byte_count;
> memcpy(creq->state, in_state->hash, digsize);
> creq->cache_ptr = 0;
> @@ -921,6 +925,10 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
> unsigned int cache_ptr;
> int ret;
>
> + ret = crypto_ahash_init(req);
> + if (ret)
> + return ret;
> +
> creq->len = in_state->count;
> memcpy(creq->state, in_state->state, digsize);
> creq->cache_ptr = 0;
> @@ -1022,6 +1030,10 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
> unsigned int cache_ptr;
> int ret;
>
> + ret = crypto_ahash_init(req);
> + if (ret)
> + return ret;
> +
> creq->len = in_state->count;
> memcpy(creq->state, in_state->state, digsize);
> creq->cache_ptr = 0;



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 19:52:01

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3 4/5] crypto: marvell: fix wrong hash results

On Fri, 09 Oct 2015 20:43:48 +0100
Russell King <[email protected]> wrote:

> Attempting to use the sha1 digest for openssh via openssl reveals that
> the result from the hash is wrong: this happens when we export the
> state from one socket and import it into another via calling accept().
>
> The reason for this is because the operation is reset to "initial block"
> state, whereas we may be past the first fragment of data to be hashed.
>
> Arrange for the operation code to avoid the initialisation of the state,
> thereby preserving the imported state.
>
> Signed-off-by: Russell King <[email protected]>

Acked-by: Boris Brezillon <[email protected]>

Thanks!

> ---
> drivers/crypto/marvell/hash.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index 458867ce9515..b7c2c05f1a01 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -835,6 +835,11 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> + if (in_state->byte_count >= sizeof(in_state->block))
> + mv_cesa_update_op_cfg(&creq->op_tmpl,
> + CESA_SA_DESC_CFG_MID_FRAG,
> + CESA_SA_DESC_CFG_FRAG_MSK);
> +
> creq->len = in_state->byte_count;
> memcpy(creq->state, in_state->hash, digsize);
> creq->cache_ptr = 0;
> @@ -929,6 +934,11 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> + if (in_state->count >= SHA1_BLOCK_SIZE)
> + mv_cesa_update_op_cfg(&creq->op_tmpl,
> + CESA_SA_DESC_CFG_MID_FRAG,
> + CESA_SA_DESC_CFG_FRAG_MSK);
> +
> creq->len = in_state->count;
> memcpy(creq->state, in_state->state, digsize);
> creq->cache_ptr = 0;
> @@ -1034,6 +1044,11 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> + if (in_state->count >= SHA256_BLOCK_SIZE)
> + mv_cesa_update_op_cfg(&creq->op_tmpl,
> + CESA_SA_DESC_CFG_MID_FRAG,
> + CESA_SA_DESC_CFG_FRAG_MSK);
> +
> creq->len = in_state->count;
> memcpy(creq->state, in_state->state, digsize);
> creq->cache_ptr = 0;



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 19:52:24

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3 3/5] crypto: marvell: initialise struct mv_cesa_ahash_req

On Fri, Oct 09, 2015 at 09:50:18PM +0200, Boris Brezillon wrote:
> Hi Russel,
>
> On Fri, 09 Oct 2015 20:43:43 +0100
> Russell King <[email protected]> wrote:
>
> > When a AF_ALG fd is accepted a second time (hence hash_accept() is
> > used), hash_accept_parent() allocates a new private context using
> > sock_kmalloc(). This context is uninitialised. After use of the new
> > fd, we eventually end up with the kernel complaining:
> >
> > marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)
> >
> > where c0627770 is a random address. Poisoning the memory allocated by
> > the above sock_kmalloc() produces kernel oopses within the marvell hash
> > code, particularly the interrupt handling.
> >
> > The following simplfied call sequence occurs:
> >
> > hash_accept()
> > crypto_ahash_export()
> > marvell hash export function
> > af_alg_accept()
> > hash_accept_parent() <== allocates uninitialised struct hash_ctx
> > crypto_ahash_import()
> > marvell hash import function
> >
> > hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
> > and, as the marvell hash import function only partially initialises
> > this structure, we end up with a lot of members which are left with
> > whatever data was in memory prior to sock_kmalloc().
> >
> > Add zero-initialisation of this structure.
>
> Maybe you should also change your commit message since this patch no
> longer initializes the req struct to zero, otherwise

Oops. s/zero-// :)

> Acked-by: Boris Brezillon <[email protected]>
>
> >
> > Signed-off-by: Russell King <[email protected]>
> > ---
> > drivers/crypto/marvell/hash.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> > index a259aced3b42..458867ce9515 100644
> > --- a/drivers/crypto/marvell/hash.c
> > +++ b/drivers/crypto/marvell/hash.c
> > @@ -831,6 +831,10 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> > unsigned int cache_ptr;
> > int ret;
> >
> > + ret = crypto_ahash_init(req);
> > + if (ret)
> > + return ret;
> > +
> > creq->len = in_state->byte_count;
> > memcpy(creq->state, in_state->hash, digsize);
> > creq->cache_ptr = 0;
> > @@ -921,6 +925,10 @@ static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
> > unsigned int cache_ptr;
> > int ret;
> >
> > + ret = crypto_ahash_init(req);
> > + if (ret)
> > + return ret;
> > +
> > creq->len = in_state->count;
> > memcpy(creq->state, in_state->state, digsize);
> > creq->cache_ptr = 0;
> > @@ -1022,6 +1030,10 @@ static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
> > unsigned int cache_ptr;
> > int ret;
> >
> > + ret = crypto_ahash_init(req);
> > + if (ret)
> > + return ret;
> > +
> > creq->len = in_state->count;
> > memcpy(creq->state, in_state->state, digsize);
> > creq->cache_ptr = 0;
>
>
>
> --
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-09 19:55:08

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3 5/5] crypto: marvell: factor out common import functions

On Fri, 09 Oct 2015 20:43:54 +0100
Russell King <[email protected]> wrote:

> As all the import functions are virtually identical, factor out their
> common parts into a generic mv_cesa_import(). This performs the
> actual import, and we pass the data pointers and current length into
> this function.

Nice factorization indeed. Actually I think this can be done in other
places of this driver, but that's another story.

>
> We have to switch a % const operation to do_div() to avoid provoking
> gcc to use the expensive 64-bit by 64-bit modulus operation.
>
> Signed-off-by: Russell King <[email protected]>

Acked-by: Boris Brezillon <[email protected]>

Thanks.

> ---
> drivers/crypto/marvell/hash.c | 88 +++++++++++--------------------------------
> 1 file changed, 21 insertions(+), 67 deletions(-)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index b7c2c05f1a01..4f328b2a3a22 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -822,12 +822,13 @@ static int mv_cesa_md5_export(struct ahash_request *req, void *out)
> return 0;
> }
>
> -static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> +static int mv_cesa_import(struct ahash_request *req, const void *hash, u64 len,
> + const void *cache)
> {
> - const struct md5_state *in_state = in;
> struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> unsigned int digsize = crypto_ahash_digestsize(ahash);
> + unsigned int blocksize;
> unsigned int cache_ptr;
> int ret;
>
> @@ -835,16 +836,17 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> - if (in_state->byte_count >= sizeof(in_state->block))
> + blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
> + if (len >= blocksize)
> mv_cesa_update_op_cfg(&creq->op_tmpl,
> CESA_SA_DESC_CFG_MID_FRAG,
> CESA_SA_DESC_CFG_FRAG_MSK);
>
> - creq->len = in_state->byte_count;
> - memcpy(creq->state, in_state->hash, digsize);
> + creq->len = len;
> + memcpy(creq->state, hash, digsize);
> creq->cache_ptr = 0;
>
> - cache_ptr = creq->len % sizeof(in_state->block);
> + cache_ptr = do_div(len, blocksize);
> if (!cache_ptr)
> return 0;
>
> @@ -852,12 +854,20 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> - memcpy(creq->cache, in_state->block, cache_ptr);
> + memcpy(creq->cache, cache, cache_ptr);
> creq->cache_ptr = cache_ptr;
>
> return 0;
> }
>
> +static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> +{
> + const struct md5_state *in_state = in;
> +
> + return mv_cesa_import(req, in_state->hash, in_state->byte_count,
> + in_state->block);
> +}
> +
> static int mv_cesa_md5_digest(struct ahash_request *req)
> {
> int ret;
> @@ -924,37 +934,9 @@ static int mv_cesa_sha1_export(struct ahash_request *req, void *out)
> static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
> {
> const struct sha1_state *in_state = in;
> - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> - struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> - unsigned int digsize = crypto_ahash_digestsize(ahash);
> - unsigned int cache_ptr;
> - int ret;
> -
> - ret = crypto_ahash_init(req);
> - if (ret)
> - return ret;
> -
> - if (in_state->count >= SHA1_BLOCK_SIZE)
> - mv_cesa_update_op_cfg(&creq->op_tmpl,
> - CESA_SA_DESC_CFG_MID_FRAG,
> - CESA_SA_DESC_CFG_FRAG_MSK);
>
> - creq->len = in_state->count;
> - memcpy(creq->state, in_state->state, digsize);
> - creq->cache_ptr = 0;
> -
> - cache_ptr = creq->len % SHA1_BLOCK_SIZE;
> - if (!cache_ptr)
> - return 0;
> -
> - ret = mv_cesa_ahash_alloc_cache(req);
> - if (ret)
> - return ret;
> -
> - memcpy(creq->cache, in_state->buffer, cache_ptr);
> - creq->cache_ptr = cache_ptr;
> -
> - return 0;
> + return mv_cesa_import(req, in_state->state, in_state->count,
> + in_state->buffer);
> }
>
> static int mv_cesa_sha1_digest(struct ahash_request *req)
> @@ -1034,37 +1016,9 @@ static int mv_cesa_sha256_export(struct ahash_request *req, void *out)
> static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
> {
> const struct sha256_state *in_state = in;
> - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> - struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> - unsigned int digsize = crypto_ahash_digestsize(ahash);
> - unsigned int cache_ptr;
> - int ret;
>
> - ret = crypto_ahash_init(req);
> - if (ret)
> - return ret;
> -
> - if (in_state->count >= SHA256_BLOCK_SIZE)
> - mv_cesa_update_op_cfg(&creq->op_tmpl,
> - CESA_SA_DESC_CFG_MID_FRAG,
> - CESA_SA_DESC_CFG_FRAG_MSK);
> -
> - creq->len = in_state->count;
> - memcpy(creq->state, in_state->state, digsize);
> - creq->cache_ptr = 0;
> -
> - cache_ptr = creq->len % SHA256_BLOCK_SIZE;
> - if (!cache_ptr)
> - return 0;
> -
> - ret = mv_cesa_ahash_alloc_cache(req);
> - if (ret)
> - return ret;
> -
> - memcpy(creq->cache, in_state->buf, cache_ptr);
> - creq->cache_ptr = cache_ptr;
> -
> - return 0;
> + return mv_cesa_import(req, in_state->state, in_state->count,
> + in_state->buf);
> }
>
> struct ahash_alg mv_sha256_alg = {



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 19:57:28

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] crypto: fixes for Marvell hash

Russel, Herbert,

On Fri, 9 Oct 2015 20:43:10 +0100
Russell King - ARM Linux <[email protected]> wrote:

> After all, here's a version 3, which fixes the other bug I mentioned
> below - we now have correct hash results. A couple of patches have
> been added, one to fix that, and a second patch to factor out the
> duplication in the import functions. This gets openssh working with
> the digests enabled.
>
> Acks so far received have been added. Patch 3 has changed, so I didn't
> add the ack for the previous version. Please re-review patch 3.

Could we had the CC stable + Fixes tags to patches 1 to 4 so that they
can be applied on 4.2 too?

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 20:14:33

by Russell King

[permalink] [raw]
Subject: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

As all the import functions and export functions are virtually
identical, factor out their common parts into a generic
mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
performs the actual import or export, and we pass the data pointers and
length into these functions.

We have to switch a % const operation to do_div() in the common import
function to avoid provoking gcc to use the expensive 64-bit by 64-bit
modulus operation.

Signed-off-by: Russell King <[email protected]>
---
Replacement to patch 5, going a little further, as requested by Boris.

drivers/crypto/marvell/hash.c | 157 ++++++++++++++----------------------------
1 file changed, 53 insertions(+), 104 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index b7c2c05f1a01..a24ceda7acfe 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -795,39 +795,32 @@ static int mv_cesa_ahash_finup(struct ahash_request *req)
return ret;
}

-static int mv_cesa_md5_init(struct ahash_request *req)
+static int mv_cesa_ahash_export(struct ahash_request *req, void *hash,
+ u64 *len, void *cache)
{
- struct mv_cesa_op_ctx tmpl;
-
- mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
-
- mv_cesa_ahash_init(req, &tmpl);
-
- return 0;
-}
-
-static int mv_cesa_md5_export(struct ahash_request *req, void *out)
-{
- struct md5_state *out_state = out;
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
unsigned int digsize = crypto_ahash_digestsize(ahash);
+ unsigned int blocksize;
+
+ blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));

- out_state->byte_count = creq->len;
- memcpy(out_state->hash, creq->state, digsize);
- memset(out_state->block, 0, sizeof(out_state->block));
+ *len = creq->len;
+ memcpy(hash, creq->state, digsize);
+ memset(cache, 0, blocksize);
if (creq->cache)
- memcpy(out_state->block, creq->cache, creq->cache_ptr);
+ memcpy(cache, creq->cache, creq->cache_ptr);

return 0;
}

-static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
+static int mv_cesa_ahash_import(struct ahash_request *req, const void *hash,
+ u64 len, const void *cache)
{
- const struct md5_state *in_state = in;
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
unsigned int digsize = crypto_ahash_digestsize(ahash);
+ unsigned int blocksize;
unsigned int cache_ptr;
int ret;

@@ -835,16 +828,17 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

- if (in_state->byte_count >= sizeof(in_state->block))
+ blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
+ if (len >= blocksize)
mv_cesa_update_op_cfg(&creq->op_tmpl,
CESA_SA_DESC_CFG_MID_FRAG,
CESA_SA_DESC_CFG_FRAG_MSK);

- creq->len = in_state->byte_count;
- memcpy(creq->state, in_state->hash, digsize);
+ creq->len = len;
+ memcpy(creq->state, hash, digsize);
creq->cache_ptr = 0;

- cache_ptr = creq->len % sizeof(in_state->block);
+ cache_ptr = do_div(len, blocksize);
if (!cache_ptr)
return 0;

@@ -852,12 +846,39 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
if (ret)
return ret;

- memcpy(creq->cache, in_state->block, cache_ptr);
+ memcpy(creq->cache, cache, cache_ptr);
creq->cache_ptr = cache_ptr;

return 0;
}

+static int mv_cesa_md5_init(struct ahash_request *req)
+{
+ struct mv_cesa_op_ctx tmpl;
+
+ mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
+
+ mv_cesa_ahash_init(req, &tmpl);
+
+ return 0;
+}
+
+static int mv_cesa_md5_export(struct ahash_request *req, void *out)
+{
+ struct md5_state *out_state = out;
+
+ return mv_cesa_ahash_export(req, out_state->hash,
+ &out_state->byte_count, out_state->block);
+}
+
+static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
+{
+ const struct md5_state *in_state = in;
+
+ return mv_cesa_ahash_import(req, in_state->hash, in_state->byte_count,
+ in_state->block);
+}
+
static int mv_cesa_md5_digest(struct ahash_request *req)
{
int ret;
@@ -908,53 +929,17 @@ static int mv_cesa_sha1_init(struct ahash_request *req)
static int mv_cesa_sha1_export(struct ahash_request *req, void *out)
{
struct sha1_state *out_state = out;
- struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
- struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int digsize = crypto_ahash_digestsize(ahash);
-
- out_state->count = creq->len;
- memcpy(out_state->state, creq->state, digsize);
- memset(out_state->buffer, 0, sizeof(out_state->buffer));
- if (creq->cache)
- memcpy(out_state->buffer, creq->cache, creq->cache_ptr);

- return 0;
+ return mv_cesa_ahash_export(req, out_state->state, &out_state->count,
+ out_state->buffer);
}

static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
{
const struct sha1_state *in_state = in;
- struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
- struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int digsize = crypto_ahash_digestsize(ahash);
- unsigned int cache_ptr;
- int ret;
-
- ret = crypto_ahash_init(req);
- if (ret)
- return ret;
-
- if (in_state->count >= SHA1_BLOCK_SIZE)
- mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
- creq->len = in_state->count;
- memcpy(creq->state, in_state->state, digsize);
- creq->cache_ptr = 0;
-
- cache_ptr = creq->len % SHA1_BLOCK_SIZE;
- if (!cache_ptr)
- return 0;
-
- ret = mv_cesa_ahash_alloc_cache(req);
- if (ret)
- return ret;
-
- memcpy(creq->cache, in_state->buffer, cache_ptr);
- creq->cache_ptr = cache_ptr;

- return 0;
+ return mv_cesa_ahash_import(req, in_state->state, in_state->count,
+ in_state->buffer);
}

static int mv_cesa_sha1_digest(struct ahash_request *req)
@@ -1018,53 +1003,17 @@ static int mv_cesa_sha256_digest(struct ahash_request *req)
static int mv_cesa_sha256_export(struct ahash_request *req, void *out)
{
struct sha256_state *out_state = out;
- struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
- struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int ds = crypto_ahash_digestsize(ahash);

- out_state->count = creq->len;
- memcpy(out_state->state, creq->state, ds);
- memset(out_state->buf, 0, sizeof(out_state->buf));
- if (creq->cache)
- memcpy(out_state->buf, creq->cache, creq->cache_ptr);
-
- return 0;
+ return mv_cesa_ahash_export(req, out_state->state, &out_state->count,
+ out_state->buf);
}

static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
{
const struct sha256_state *in_state = in;
- struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
- struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int digsize = crypto_ahash_digestsize(ahash);
- unsigned int cache_ptr;
- int ret;
-
- ret = crypto_ahash_init(req);
- if (ret)
- return ret;
-
- if (in_state->count >= SHA256_BLOCK_SIZE)
- mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);

- creq->len = in_state->count;
- memcpy(creq->state, in_state->state, digsize);
- creq->cache_ptr = 0;
-
- cache_ptr = creq->len % SHA256_BLOCK_SIZE;
- if (!cache_ptr)
- return 0;
-
- ret = mv_cesa_ahash_alloc_cache(req);
- if (ret)
- return ret;
-
- memcpy(creq->cache, in_state->buf, cache_ptr);
- creq->cache_ptr = cache_ptr;
-
- return 0;
+ return mv_cesa_ahash_import(req, in_state->state, in_state->count,
+ in_state->buf);
}

struct ahash_alg mv_sha256_alg = {
--
2.1.0

2015-10-09 20:19:50

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Fri, 09 Oct 2015 21:14:22 +0100
Russell King <[email protected]> wrote:

> As all the import functions and export functions are virtually
> identical, factor out their common parts into a generic
> mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
> performs the actual import or export, and we pass the data pointers and
> length into these functions.
>
> We have to switch a % const operation to do_div() in the common import
> function to avoid provoking gcc to use the expensive 64-bit by 64-bit
> modulus operation.
>
> Signed-off-by: Russell King <[email protected]>

Looks good to me.

Acked-by: Boris Brezillon <[email protected]>

Thanks again for taking care of that.

Regards,

Boris

> ---
> Replacement to patch 5, going a little further, as requested by Boris.
>
> drivers/crypto/marvell/hash.c | 157 ++++++++++++++----------------------------
> 1 file changed, 53 insertions(+), 104 deletions(-)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index b7c2c05f1a01..a24ceda7acfe 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -795,39 +795,32 @@ static int mv_cesa_ahash_finup(struct ahash_request *req)
> return ret;
> }
>
> -static int mv_cesa_md5_init(struct ahash_request *req)
> +static int mv_cesa_ahash_export(struct ahash_request *req, void *hash,
> + u64 *len, void *cache)
> {
> - struct mv_cesa_op_ctx tmpl;
> -
> - mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
> -
> - mv_cesa_ahash_init(req, &tmpl);
> -
> - return 0;
> -}
> -
> -static int mv_cesa_md5_export(struct ahash_request *req, void *out)
> -{
> - struct md5_state *out_state = out;
> struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> unsigned int digsize = crypto_ahash_digestsize(ahash);
> + unsigned int blocksize;
> +
> + blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
>
> - out_state->byte_count = creq->len;
> - memcpy(out_state->hash, creq->state, digsize);
> - memset(out_state->block, 0, sizeof(out_state->block));
> + *len = creq->len;
> + memcpy(hash, creq->state, digsize);
> + memset(cache, 0, blocksize);
> if (creq->cache)
> - memcpy(out_state->block, creq->cache, creq->cache_ptr);
> + memcpy(cache, creq->cache, creq->cache_ptr);
>
> return 0;
> }
>
> -static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> +static int mv_cesa_ahash_import(struct ahash_request *req, const void *hash,
> + u64 len, const void *cache)
> {
> - const struct md5_state *in_state = in;
> struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> unsigned int digsize = crypto_ahash_digestsize(ahash);
> + unsigned int blocksize;
> unsigned int cache_ptr;
> int ret;
>
> @@ -835,16 +828,17 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> - if (in_state->byte_count >= sizeof(in_state->block))
> + blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
> + if (len >= blocksize)
> mv_cesa_update_op_cfg(&creq->op_tmpl,
> CESA_SA_DESC_CFG_MID_FRAG,
> CESA_SA_DESC_CFG_FRAG_MSK);
>
> - creq->len = in_state->byte_count;
> - memcpy(creq->state, in_state->hash, digsize);
> + creq->len = len;
> + memcpy(creq->state, hash, digsize);
> creq->cache_ptr = 0;
>
> - cache_ptr = creq->len % sizeof(in_state->block);
> + cache_ptr = do_div(len, blocksize);
> if (!cache_ptr)
> return 0;
>
> @@ -852,12 +846,39 @@ static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> if (ret)
> return ret;
>
> - memcpy(creq->cache, in_state->block, cache_ptr);
> + memcpy(creq->cache, cache, cache_ptr);
> creq->cache_ptr = cache_ptr;
>
> return 0;
> }
>
> +static int mv_cesa_md5_init(struct ahash_request *req)
> +{
> + struct mv_cesa_op_ctx tmpl;
> +
> + mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
> +
> + mv_cesa_ahash_init(req, &tmpl);
> +
> + return 0;
> +}
> +
> +static int mv_cesa_md5_export(struct ahash_request *req, void *out)
> +{
> + struct md5_state *out_state = out;
> +
> + return mv_cesa_ahash_export(req, out_state->hash,
> + &out_state->byte_count, out_state->block);
> +}
> +
> +static int mv_cesa_md5_import(struct ahash_request *req, const void *in)
> +{
> + const struct md5_state *in_state = in;
> +
> + return mv_cesa_ahash_import(req, in_state->hash, in_state->byte_count,
> + in_state->block);
> +}
> +
> static int mv_cesa_md5_digest(struct ahash_request *req)
> {
> int ret;
> @@ -908,53 +929,17 @@ static int mv_cesa_sha1_init(struct ahash_request *req)
> static int mv_cesa_sha1_export(struct ahash_request *req, void *out)
> {
> struct sha1_state *out_state = out;
> - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> - struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> - unsigned int digsize = crypto_ahash_digestsize(ahash);
> -
> - out_state->count = creq->len;
> - memcpy(out_state->state, creq->state, digsize);
> - memset(out_state->buffer, 0, sizeof(out_state->buffer));
> - if (creq->cache)
> - memcpy(out_state->buffer, creq->cache, creq->cache_ptr);
>
> - return 0;
> + return mv_cesa_ahash_export(req, out_state->state, &out_state->count,
> + out_state->buffer);
> }
>
> static int mv_cesa_sha1_import(struct ahash_request *req, const void *in)
> {
> const struct sha1_state *in_state = in;
> - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> - struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> - unsigned int digsize = crypto_ahash_digestsize(ahash);
> - unsigned int cache_ptr;
> - int ret;
> -
> - ret = crypto_ahash_init(req);
> - if (ret)
> - return ret;
> -
> - if (in_state->count >= SHA1_BLOCK_SIZE)
> - mv_cesa_update_op_cfg(&creq->op_tmpl,
> - CESA_SA_DESC_CFG_MID_FRAG,
> - CESA_SA_DESC_CFG_FRAG_MSK);
> -
> - creq->len = in_state->count;
> - memcpy(creq->state, in_state->state, digsize);
> - creq->cache_ptr = 0;
> -
> - cache_ptr = creq->len % SHA1_BLOCK_SIZE;
> - if (!cache_ptr)
> - return 0;
> -
> - ret = mv_cesa_ahash_alloc_cache(req);
> - if (ret)
> - return ret;
> -
> - memcpy(creq->cache, in_state->buffer, cache_ptr);
> - creq->cache_ptr = cache_ptr;
>
> - return 0;
> + return mv_cesa_ahash_import(req, in_state->state, in_state->count,
> + in_state->buffer);
> }
>
> static int mv_cesa_sha1_digest(struct ahash_request *req)
> @@ -1018,53 +1003,17 @@ static int mv_cesa_sha256_digest(struct ahash_request *req)
> static int mv_cesa_sha256_export(struct ahash_request *req, void *out)
> {
> struct sha256_state *out_state = out;
> - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> - struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> - unsigned int ds = crypto_ahash_digestsize(ahash);
>
> - out_state->count = creq->len;
> - memcpy(out_state->state, creq->state, ds);
> - memset(out_state->buf, 0, sizeof(out_state->buf));
> - if (creq->cache)
> - memcpy(out_state->buf, creq->cache, creq->cache_ptr);
> -
> - return 0;
> + return mv_cesa_ahash_export(req, out_state->state, &out_state->count,
> + out_state->buf);
> }
>
> static int mv_cesa_sha256_import(struct ahash_request *req, const void *in)
> {
> const struct sha256_state *in_state = in;
> - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
> - struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
> - unsigned int digsize = crypto_ahash_digestsize(ahash);
> - unsigned int cache_ptr;
> - int ret;
> -
> - ret = crypto_ahash_init(req);
> - if (ret)
> - return ret;
> -
> - if (in_state->count >= SHA256_BLOCK_SIZE)
> - mv_cesa_update_op_cfg(&creq->op_tmpl,
> - CESA_SA_DESC_CFG_MID_FRAG,
> - CESA_SA_DESC_CFG_FRAG_MSK);
>
> - creq->len = in_state->count;
> - memcpy(creq->state, in_state->state, digsize);
> - creq->cache_ptr = 0;
> -
> - cache_ptr = creq->len % SHA256_BLOCK_SIZE;
> - if (!cache_ptr)
> - return 0;
> -
> - ret = mv_cesa_ahash_alloc_cache(req);
> - if (ret)
> - return ret;
> -
> - memcpy(creq->cache, in_state->buf, cache_ptr);
> - creq->cache_ptr = cache_ptr;
> -
> - return 0;
> + return mv_cesa_ahash_import(req, in_state->state, in_state->count,
> + in_state->buf);
> }
>
> struct ahash_alg mv_sha256_alg = {



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-09 22:37:50

by arno

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

Hi Russel,

Russell King <[email protected]> writes:

> As all the import functions and export functions are virtually
> identical, factor out their common parts into a generic
> mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
> performs the actual import or export, and we pass the data pointers and
> length into these functions.
>
> We have to switch a % const operation to do_div() in the common import
> function to avoid provoking gcc to use the expensive 64-bit by 64-bit
> modulus operation.
>
> Signed-off-by: Russell King <[email protected]>

Thanks for the refactoring and for the fixes. All patches look good to
me. Out of curiosity, can I ask what perf you get w/ openssh or openssl
using AF_ALG and the CESA?

Cheers,

a+

2015-10-09 23:51:47

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Sat, Oct 10, 2015 at 12:37:33AM +0200, Arnaud Ebalard wrote:
> Hi Russel,
>
> Russell King <[email protected]> writes:
>
> > As all the import functions and export functions are virtually
> > identical, factor out their common parts into a generic
> > mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
> > performs the actual import or export, and we pass the data pointers and
> > length into these functions.
> >
> > We have to switch a % const operation to do_div() in the common import
> > function to avoid provoking gcc to use the expensive 64-bit by 64-bit
> > modulus operation.
> >
> > Signed-off-by: Russell King <[email protected]>
>
> Thanks for the refactoring and for the fixes. All patches look good to
> me. Out of curiosity, can I ask what perf you get w/ openssh or openssl
> using AF_ALG and the CESA?

I would do, but it seems this AF_ALG plugin for openssl isn't
actually using it for encryption. When I try:

openssl speed -engine af_alg aes-128-cbc

I get results for using openssl's software implementation. If I do:

openssl speed -engine af_alg md5

then I get results from using the kernel's MD5. Hence, I think the
only thing that I think openssh is using it for is the digest stuff,
not the crypto itself. I can't be certain about that though.

I've tried debugging the af_alg engine plugin, but I'm not getting
very far (I'm not an openssl hacker!) I see it registering the
function to get the ciphers (via ENGINE_set_ciphers), and I see this
called several times, returning a list of NID_xxx values describing
the methods it supports, which includes aes-128-cbc. However,
unlike the equivalent digest function, I never see it called
requesting any of the ciphers. Maybe it's an openssl bug, or a
"feature" preventing hardware crypto? Maybe something is missing
from its initialisation? I've no idea yet. It seems I'm not alone
in this - this report from April 2015 is exactly what I'm seeing:

https://mta.openssl.org/pipermail/openssl-users/2015-April/001124.html

However, I'm coming to the conclusion that AF_ALG with openssl is a
dead project, and the only interface that everyone is using for that
is cryptodev - probably contary to Herbert and/or DaveM's wishes. For
example, the openwrt guys seem to only support cryptodev, according to
their wiki page on the subject of hardware crypto:

http://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators

Here's the references to code for AF_ALG with openssl I've found so far:

Original af_alg plugin (dead):

http://src.carnivore.it/users/common/af_alg/

3rd party "maintained" af_alg openssl plugin, derived from commit
1851bbb010c38878c83729be844f168192059189 in the above repo but with
no history:

https://github.com/RidgeRun/af-alg-rr

and that doesn't contain any changes to the C code originally committed.
Whether this C code contains changes or not is anyone's guess: there's
no way to refer back to the original repository.

Anyway, here's the digest results:

Software:
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 13948.89k 42477.61k 104619.41k 165140.82k 199273.13k
sha1 13091.91k 36463.89k 75393.88k 103893.33k 117104.50k
sha256 13573.92k 30492.25k 52700.33k 64247.81k 68722.69k

Hardware:
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 3964.55k 13782.11k 43181.71k 180263.38k 1446616.18k
sha1 4609.16k 8922.35k 35422.87k 333575.31k 2122547.20k
sha256 13519.62k 30484.10k 52547.47k 64285.21k 68530.60k

There's actually something suspicious while running these tests:

Doing md5 for 3s on 16 size blocks: 32212 md5's in 0.13s
Doing md5 for 3s on 64 size blocks: 23688 md5's in 0.11s
Doing md5 for 3s on 256 size blocks: 23615 md5's in 0.14s
Doing md5 for 3s on 1024 size blocks: 22885 md5's in 0.13s
Doing md5 for 3s on 8192 size blocks: 15893 md5's in 0.09s
Doing sha1 for 3s on 16 size blocks: 31688 sha1's in 0.11s
Doing sha1 for 3s on 64 size blocks: 23700 sha1's in 0.17s
Doing sha1 for 3s on 256 size blocks: 23523 sha1's in 0.17s
Doing sha1 for 3s on 1024 size blocks: 22803 sha1's in 0.07s
Doing sha1 for 3s on 8192 size blocks: 15546 sha1's in 0.06s
Doing sha256 for 3s on 16 size blocks: 2518030 sha256's in 2.98s
Doing sha256 for 3s on 64 size blocks: 1419416 sha256's in 2.98s
Doing sha256 for 3s on 256 size blocks: 613738 sha256's in 2.99s
Doing sha256 for 3s on 1024 size blocks: 187080 sha256's in 2.98s
Doing sha256 for 3s on 8192 size blocks: 25013 sha256's in 2.99s

from the hardware - note the "in" figures are rediculously low, yet
they do wait 3s for each test. Also, the sha256 results are close
enough to being the software version.

No ideas on any of this yet... but I'm not about to start digging in
the openssl code to try and work out what it's up to. As I say, I
think this is AF_ALG with openssl is a dead project.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-10 10:32:06

by arno

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

Hi Russel,

Russell King - ARM Linux <[email protected]> writes:

> On Sat, Oct 10, 2015 at 12:37:33AM +0200, Arnaud Ebalard wrote:
>> Hi Russel,
>>
>> Russell King <[email protected]> writes:
>>
>> > As all the import functions and export functions are virtually
>> > identical, factor out their common parts into a generic
>> > mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
>> > performs the actual import or export, and we pass the data pointers and
>> > length into these functions.
>> >
>> > We have to switch a % const operation to do_div() in the common import
>> > function to avoid provoking gcc to use the expensive 64-bit by 64-bit
>> > modulus operation.
>> >
>> > Signed-off-by: Russell King <[email protected]>
>>
>> Thanks for the refactoring and for the fixes. All patches look good to
>> me. Out of curiosity, can I ask what perf you get w/ openssh or openssl
>> using AF_ALG and the CESA?
>
> I would do, but it seems this AF_ALG plugin for openssl isn't
> actually using it for encryption. When I try:
>
> openssl speed -engine af_alg aes-128-cbc
>
> I get results for using openssl's software implementation. If I do:
>
> openssl speed -engine af_alg md5
>
> then I get results from using the kernel's MD5. Hence, I think the
> only thing that I think openssh is using it for is the digest stuff,
> not the crypto itself. I can't be certain about that though.
>
> I've tried debugging the af_alg engine plugin, but I'm not getting
> very far (I'm not an openssl hacker!) I see it registering the
> function to get the ciphers (via ENGINE_set_ciphers), and I see this
> called several times, returning a list of NID_xxx values describing
> the methods it supports, which includes aes-128-cbc. However,
> unlike the equivalent digest function, I never see it called
> requesting any of the ciphers. Maybe it's an openssl bug, or a
> "feature" preventing hardware crypto? Maybe something is missing
> from its initialisation? I've no idea yet. It seems I'm not alone
> in this - this report from April 2015 is exactly what I'm seeing:
>
> https://mta.openssl.org/pipermail/openssl-users/2015-April/001124.html
>
> However, I'm coming to the conclusion that AF_ALG with openssl is a
> dead project, and the only interface that everyone is using for that
> is cryptodev - probably contary to Herbert and/or DaveM's wishes. For
> example, the openwrt guys seem to only support cryptodev, according to
> their wiki page on the subject of hardware crypto:
>
> http://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators
>
> Here's the references to code for AF_ALG with openssl I've found so far:
>
> Original af_alg plugin (dead):
>
> http://src.carnivore.it/users/common/af_alg/
>
> 3rd party "maintained" af_alg openssl plugin, derived from commit
> 1851bbb010c38878c83729be844f168192059189 in the above repo but with
> no history:
>
> https://github.com/RidgeRun/af-alg-rr
>
> and that doesn't contain any changes to the C code originally committed.
> Whether this C code contains changes or not is anyone's guess: there's
> no way to refer back to the original repository.
>
> Anyway, here's the digest results:
>
> Software:
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> md5 13948.89k 42477.61k 104619.41k 165140.82k 199273.13k
> sha1 13091.91k 36463.89k 75393.88k 103893.33k 117104.50k
> sha256 13573.92k 30492.25k 52700.33k 64247.81k 68722.69k
>
> Hardware:
> The 'numbers' are in 1000s of bytes per second processed.
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> md5 3964.55k 13782.11k 43181.71k 180263.38k 1446616.18k
> sha1 4609.16k 8922.35k 35422.87k 333575.31k 2122547.20k
> sha256 13519.62k 30484.10k 52547.47k 64285.21k 68530.60k
>
> There's actually something suspicious while running these tests:
>
> Doing md5 for 3s on 16 size blocks: 32212 md5's in 0.13s
> Doing md5 for 3s on 64 size blocks: 23688 md5's in 0.11s
> Doing md5 for 3s on 256 size blocks: 23615 md5's in 0.14s
> Doing md5 for 3s on 1024 size blocks: 22885 md5's in 0.13s
> Doing md5 for 3s on 8192 size blocks: 15893 md5's in 0.09s
> Doing sha1 for 3s on 16 size blocks: 31688 sha1's in 0.11s
> Doing sha1 for 3s on 64 size blocks: 23700 sha1's in 0.17s
> Doing sha1 for 3s on 256 size blocks: 23523 sha1's in 0.17s
> Doing sha1 for 3s on 1024 size blocks: 22803 sha1's in 0.07s
> Doing sha1 for 3s on 8192 size blocks: 15546 sha1's in 0.06s
> Doing sha256 for 3s on 16 size blocks: 2518030 sha256's in 2.98s
> Doing sha256 for 3s on 64 size blocks: 1419416 sha256's in 2.98s
> Doing sha256 for 3s on 256 size blocks: 613738 sha256's in 2.99s
> Doing sha256 for 3s on 1024 size blocks: 187080 sha256's in 2.98s
> Doing sha256 for 3s on 8192 size blocks: 25013 sha256's in 2.99s
>
> from the hardware - note the "in" figures are rediculously low, yet
> they do wait 3s for each test. Also, the sha256 results are close
> enough to being the software version.
>
> No ideas on any of this yet... but I'm not about to start digging in
> the openssl code to try and work out what it's up to. As I say, I
> think this is AF_ALG with openssl is a dead project.

Thanks for the time you took to assemble the information in previous
email. Yesterday, when reading your patches, I ended up on [1], where
Marek (added him to Cc: list) basically has the same kind of conclusion
as yours, i.e. openssl w/ cryptodev is what currently works better even
if AF_ALG is the expected target for kernel to provide access to
hardware engines to userland apps.

I had a lot of performance results at various levels (tcrypt module on
variations of the drivers (tasklet, threaded irq, full polling, etc),
IPsec tunnel and transport mode through to see how it behaves w/ two
mvneta instances also eating CPU cycles for incoming/outgoing packets)
but those where done on an encryption use case. Some are provided
in [2]. In an early (read dirty) polling-based version of the driver,
the CESA on an Armada 370 (mirabox) was verified to be capable of near
100MB/s on buffers of 1500+ bytes for AES CBC encryption. Current
version of the driver is not as good (say half that value) but it
behaves better. A Mirabox can easily route 1500 bytes packets at 100MB/s
between its two interfaces but when you mix both using IPsec in tunnel
mode on one side, you end up w/ perfs between 10 to 15MB/s, IIRC. I
think it's interesting to see where it ends up w/ the engine exposed to
userland consumers (e.g. sth like SSH).

I cannot promise a huge amount of time but I'll try and find some to
play w/ AF_ALG using openssl and CESA in the coming weeks.

Cheers,

a+

[1]: http://events.linuxfoundation.org/sites/events/files/slides/lcj-2014-crypto-user.pdf
[2]: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/336599.html

2015-10-10 11:29:45

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote:
> Hi Russel,
^

> Russell King - ARM Linux <[email protected]> writes:
> > Software:
> > The 'numbers' are in 1000s of bytes per second processed.
> > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> > md5 13948.89k 42477.61k 104619.41k 165140.82k 199273.13k
> > sha1 13091.91k 36463.89k 75393.88k 103893.33k 117104.50k
> > sha256 13573.92k 30492.25k 52700.33k 64247.81k 68722.69k
> >
> > Hardware:
> > The 'numbers' are in 1000s of bytes per second processed.
> > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> > md5 3964.55k 13782.11k 43181.71k 180263.38k 1446616.18k
> > sha1 4609.16k 8922.35k 35422.87k 333575.31k 2122547.20k
> > sha256 13519.62k 30484.10k 52547.47k 64285.21k 68530.60k

Okay, the reason for the difference in SHA256 speed is because the
"openssl speed" code *totally* *bypasses* the engine support, whereas
the md5 and sha1 do not. It even bypasses the normal method used to
get hold of the sha256 implementation (EVP_sha256), and goes straight
to using SHA256() directly in openssl/crypto/sha/sha256.c. It looks
like the same goes for the AES tests too.

> I had a lot of performance results at various levels (tcrypt module on
> variations of the drivers (tasklet, threaded irq, full polling, etc),
> IPsec tunnel and transport mode through to see how it behaves w/ two
> mvneta instances also eating CPU cycles for incoming/outgoing packets)
> but those where done on an encryption use case. Some are provided
> in [2]. In an early (read dirty) polling-based version of the driver,
> the CESA on an Armada 370 (mirabox) was verified to be capable of near
> 100MB/s on buffers of 1500+ bytes for AES CBC encryption. Current
> version of the driver is not as good (say half that value) but it
> behaves better. A Mirabox can easily route 1500 bytes packets at 100MB/s
> between its two interfaces but when you mix both using IPsec in tunnel
> mode on one side, you end up w/ perfs between 10 to 15MB/s, IIRC. I
> think it's interesting to see where it ends up w/ the engine exposed to
> userland consumers (e.g. sth like SSH).
>
> I cannot promise a huge amount of time but I'll try and find some to
> play w/ AF_ALG using openssl and CESA in the coming weeks.

I think what we draw from my investigation is that "openssl speed" is
utterly crap - you don't actually know what's being tested there. Some
things test the engine, others bypass the engine infrastructure totally
and test the openssl software implementation instead.

So, if you think "openssl speed" is a good way to measure the speed of
digests and ciphers that openssl supplies to applications, *think again*.
It doesn't.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-10 16:18:10

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Sat, Oct 10, 2015 at 12:29:25PM +0100, Russell King - ARM Linux wrote:
> On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote:
> > Russell King - ARM Linux <[email protected]> writes:
> > > Software:
> > > The 'numbers' are in 1000s of bytes per second processed.
> > > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> > > md5 13948.89k 42477.61k 104619.41k 165140.82k 199273.13k
> > > sha1 13091.91k 36463.89k 75393.88k 103893.33k 117104.50k
> > > sha256 13573.92k 30492.25k 52700.33k 64247.81k 68722.69k
> > >
> > > Hardware:
> > > The 'numbers' are in 1000s of bytes per second processed.
> > > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> > > md5 3964.55k 13782.11k 43181.71k 180263.38k 1446616.18k
> > > sha1 4609.16k 8922.35k 35422.87k 333575.31k 2122547.20k
> > > sha256 13519.62k 30484.10k 52547.47k 64285.21k 68530.60k
>
> Okay, the reason for the difference in SHA256 speed is because the
> "openssl speed" code *totally* *bypasses* the engine support, whereas
> the md5 and sha1 do not. It even bypasses the normal method used to
> get hold of the sha256 implementation (EVP_sha256), and goes straight
> to using SHA256() directly in openssl/crypto/sha/sha256.c. It looks
> like the same goes for the AES tests too.

Okay, what's required is openssl speed -evp <cipher|digest> and it can
only do one at a time. That's really confusing and non-intuitive, given
that some of the non-evp options end up testing the EVP methods.

So, running:

$ openssl speed -evp aes-128-cbc

I get the kernel spitting out a number of these warnings during its run:

DRBG: could not allocate digest TFM handle: hmac(sha512)

I also notice is that 50% of CPU time is spent in the kernel, presumably
because the lack of hardware sha512 means that we end up having to do
sha512 in software in the kernel.

I _also_ notice that despite having the ARM assembly crypto functions
enabled and built-in to the kernel, they don't appear in /proc/crypto -
and this is because of patch 1 in this series, which blocks out any
crypto driver which has a zero statesize (as the ARM crypto functions
appear to have.) I found this by rebuilding the ARM crypto stuff as
modules, and then trying to insert them:

# modprobe sha512-arm
modprobe: ERROR: could not insert 'sha512_arm': Invalid argument

So, I think it's best if this patch series is *NOT* merged until someone
who knows the kernel crypto code gets to grips with what's supposed to be
done in various parts of the code. Yes, I know that Herbert suggested
the approach in patch 1, but that _will_ cause regressions like the above
when if it's merged.

Either the ARM crypto code in arch/arm/crypto is buggy, or the approach
in patch 1 is buggy.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-10 16:46:22

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Fri, 09 Oct 2015 20:43:33 +0100
Russell King <[email protected]> wrote:

> If the algorithm passed a zero statesize, do not pass a valid pointer
> into the export/import functions. Passing a valid pointer covers up
> bugs in driver code which then go on to smash the kernel stack.
> Instead, pass NULL, which will cause any attempt to write to the
> pointer to fail.
>
> Signed-off-by: Russell King <[email protected]>
> ---
> crypto/ahash.c | 3 ++-
> crypto/shash.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/crypto/ahash.c b/crypto/ahash.c
> index 8acb886032ae..9c1dc8d6106a 100644
> --- a/crypto/ahash.c
> +++ b/crypto/ahash.c
> @@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
> struct crypto_alg *base = &alg->halg.base;
>
> if (alg->halg.digestsize > PAGE_SIZE / 8 ||
> - alg->halg.statesize > PAGE_SIZE / 8)
> + alg->halg.statesize > PAGE_SIZE / 8 ||
> + alg->halg.statesize == 0)

Just read Russel's answer to the cover letter, and I wonder if the
following test wouldn't fix the problem:

(alg->halg.statesize == 0 && (alg->import || alg->export))

I mean, the only valid case where statesize can be zero is when you
don't have any state associated to the crypto algorithm, and if that's
the case, ->import() and ->export() functions are useless, isn't ?

Best Regards,

Boris

> return -EINVAL;
>
> base->cra_type = &crypto_ahash_type;
> diff --git a/crypto/shash.c b/crypto/shash.c
> index ecb1e3d39bf0..ab3384b38542 100644
> --- a/crypto/shash.c
> +++ b/crypto/shash.c
> @@ -585,7 +585,8 @@ static int shash_prepare_alg(struct shash_alg *alg)
>
> if (alg->digestsize > PAGE_SIZE / 8 ||
> alg->descsize > PAGE_SIZE / 8 ||
> - alg->statesize > PAGE_SIZE / 8)
> + alg->statesize > PAGE_SIZE / 8 ||
> + alg->statesize == 0)
> return -EINVAL;
>
> base->cra_type = &crypto_shash_type;



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-10 16:52:40

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Sat, Oct 10, 2015 at 06:46:07PM +0200, Boris Brezillon wrote:
> On Fri, 09 Oct 2015 20:43:33 +0100
> Russell King <[email protected]> wrote:
>
> > If the algorithm passed a zero statesize, do not pass a valid pointer
> > into the export/import functions. Passing a valid pointer covers up
> > bugs in driver code which then go on to smash the kernel stack.
> > Instead, pass NULL, which will cause any attempt to write to the
> > pointer to fail.
> >
> > Signed-off-by: Russell King <[email protected]>
> > ---
> > crypto/ahash.c | 3 ++-
> > crypto/shash.c | 3 ++-
> > 2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/crypto/ahash.c b/crypto/ahash.c
> > index 8acb886032ae..9c1dc8d6106a 100644
> > --- a/crypto/ahash.c
> > +++ b/crypto/ahash.c
> > @@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
> > struct crypto_alg *base = &alg->halg.base;
> >
> > if (alg->halg.digestsize > PAGE_SIZE / 8 ||
> > - alg->halg.statesize > PAGE_SIZE / 8)
> > + alg->halg.statesize > PAGE_SIZE / 8 ||
> > + alg->halg.statesize == 0)
>
> Just read Russel's answer to the cover letter, and I wonder if the
> following test wouldn't fix the problem:
>
> (alg->halg.statesize == 0 && (alg->import || alg->export))
>
> I mean, the only valid case where statesize can be zero is when you
> don't have any state associated to the crypto algorithm, and if that's
> the case, ->import() and ->export() functions are useless, isn't ?

However, that brings up a question.

If you're using AF_ALG, and you attach to (say) the ARM Neon SHA512
implementation through it, and then use accept() to duplicate it's
state, what prevents the kernel from oopsing when hash_accept() calls
crypto_ahash_export(), which then dereferences the NULL alg->export
function pointer?

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-10 18:20:58

by Marek Vasut

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Saturday, October 10, 2015 at 01:29:25 PM, Russell King - ARM Linux wrote:
> On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote:
> > Hi Russel,
>
> ^
>
> > Russell King - ARM Linux <[email protected]> writes:
> > > Software:
> > > The 'numbers' are in 1000s of bytes per second processed.
> > > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192
> > > bytes md5 13948.89k 42477.61k 104619.41k
> > > 165140.82k 199273.13k sha1 13091.91k 36463.89k
> > > 75393.88k 103893.33k 117104.50k sha256 13573.92k
> > > 30492.25k 52700.33k 64247.81k 68722.69k
> > >
> > > Hardware:
> > > The 'numbers' are in 1000s of bytes per second processed.
> > > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192
> > > bytes md5 3964.55k 13782.11k 43181.71k
> > > 180263.38k 1446616.18k sha1 4609.16k 8922.35k
> > > 35422.87k 333575.31k 2122547.20k sha256 13519.62k
> > > 30484.10k 52547.47k 64285.21k 68530.60k
>
> Okay, the reason for the difference in SHA256 speed is because the
> "openssl speed" code *totally* *bypasses* the engine support, whereas
> the md5 and sha1 do not. It even bypasses the normal method used to
> get hold of the sha256 implementation (EVP_sha256), and goes straight
> to using SHA256() directly in openssl/crypto/sha/sha256.c. It looks
> like the same goes for the AES tests too.
>
> > I had a lot of performance results at various levels (tcrypt module on
> > variations of the drivers (tasklet, threaded irq, full polling, etc),
> > IPsec tunnel and transport mode through to see how it behaves w/ two
> > mvneta instances also eating CPU cycles for incoming/outgoing packets)
> > but those where done on an encryption use case. Some are provided
> > in [2]. In an early (read dirty) polling-based version of the driver,
> > the CESA on an Armada 370 (mirabox) was verified to be capable of near
> > 100MB/s on buffers of 1500+ bytes for AES CBC encryption. Current
> > version of the driver is not as good (say half that value) but it
> > behaves better. A Mirabox can easily route 1500 bytes packets at 100MB/s
> > between its two interfaces but when you mix both using IPsec in tunnel
> > mode on one side, you end up w/ perfs between 10 to 15MB/s, IIRC. I
> > think it's interesting to see where it ends up w/ the engine exposed to
> > userland consumers (e.g. sth like SSH).
> >
> > I cannot promise a huge amount of time but I'll try and find some to
> > play w/ AF_ALG using openssl and CESA in the coming weeks.
>
> I think what we draw from my investigation is that "openssl speed" is
> utterly crap - you don't actually know what's being tested there. Some
> things test the engine, others bypass the engine infrastructure totally
> and test the openssl software implementation instead.
>
> So, if you think "openssl speed" is a good way to measure the speed of
> digests and ciphers that openssl supplies to applications, *think again*.
> It doesn't.

Add to that the fact that openssl speed does NOT verify the results of the
transformations, so it's not usable for detecting errors during high load.
It's utter crap, just like you said.

Best regards,
Marek Vasut

2015-10-11 06:55:47

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Sat, Oct 10, 2015 at 05:17:54PM +0100, Russell King - ARM Linux wrote:
>
> I _also_ notice that despite having the ARM assembly crypto functions
> enabled and built-in to the kernel, they don't appear in /proc/crypto -
> and this is because of patch 1 in this series, which blocks out any
> crypto driver which has a zero statesize (as the ARM crypto functions
> appear to have.) I found this by rebuilding the ARM crypto stuff as
> modules, and then trying to insert them:

They're buggy and unfortunately this wasn't caught during the
review process. The import/export functions are required and
certainly not optional.

> # modprobe sha512-arm
> modprobe: ERROR: could not insert 'sha512_arm': Invalid argument

This is the correct response and what will happen is that the
software implementation of sha512 will be used rather than the
accelerated sha512_arm.

Once the sha512_arm driver has been fixed then it can be loaded
again.

> So, I think it's best if this patch series is *NOT* merged until someone
> who knows the kernel crypto code gets to grips with what's supposed to be
> done in various parts of the code. Yes, I know that Herbert suggested
> the approach in patch 1, but that _will_ cause regressions like the above
> when if it's merged.

Considering that the current driver can be used to trigger a kernel
oops from user-space I think preventing the buggy driver from loading
is the right thing to do.

We can then fix them one-by-one.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-11 06:57:58

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Sat, Oct 10, 2015 at 06:46:07PM +0200, Boris Brezillon wrote:
>
> I mean, the only valid case where statesize can be zero is when you
> don't have any state associated to the crypto algorithm, and if that's
> the case, ->import() and ->export() functions are useless, isn't ?

The import/export functions are used by algif_hash and must be
implemented by the driver.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-11 06:59:53

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Sat, Oct 10, 2015 at 05:52:29PM +0100, Russell King - ARM Linux wrote:
>
> If you're using AF_ALG, and you attach to (say) the ARM Neon SHA512
> implementation through it, and then use accept() to duplicate it's
> state, what prevents the kernel from oopsing when hash_accept() calls
> crypto_ahash_export(), which then dereferences the NULL alg->export
> function pointer?

After reading the code I don't think you can actually trigger a NULL
dereference since the crypto API will provide a default import
and export function that just returns ENOSYS. Having said that,
not having an import/export function means that algif_hash may
not work correctly so they should be provided by the driver.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-13 13:01:12

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Sun, Oct 11, 2015 at 02:55:05PM +0800, Herbert Xu wrote:
> On Sat, Oct 10, 2015 at 05:17:54PM +0100, Russell King - ARM Linux wrote:
> >
> > I _also_ notice that despite having the ARM assembly crypto functions
> > enabled and built-in to the kernel, they don't appear in /proc/crypto -
> > and this is because of patch 1 in this series, which blocks out any
> > crypto driver which has a zero statesize (as the ARM crypto functions
> > appear to have.) I found this by rebuilding the ARM crypto stuff as
> > modules, and then trying to insert them:
>
> They're buggy and unfortunately this wasn't caught during the
> review process. The import/export functions are required and
> certainly not optional.

OK it looks like I was confused. Yes the import and export functions
are required but for shash algorithms we provide a simple default.
This means that implementations can simply leave it NULL and they
will then use the default import/export functions which exports
the shash_desc as the state.

So Russell what I'll do is apply your patch without the hunk
for shash_prepare_alg. IOW we will block any ahash algorithms
that leave state as zero since ahash does not provide a default
import/export function (it can't because it may involve hardware
state). shash algorithms on the other hand won't be affected.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-13 13:55:33

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Tue, Oct 13, 2015 at 09:00:48PM +0800, Herbert Xu wrote:
> On Sun, Oct 11, 2015 at 02:55:05PM +0800, Herbert Xu wrote:
> > On Sat, Oct 10, 2015 at 05:17:54PM +0100, Russell King - ARM Linux wrote:
> > >
> > > I _also_ notice that despite having the ARM assembly crypto functions
> > > enabled and built-in to the kernel, they don't appear in /proc/crypto -
> > > and this is because of patch 1 in this series, which blocks out any
> > > crypto driver which has a zero statesize (as the ARM crypto functions
> > > appear to have.) I found this by rebuilding the ARM crypto stuff as
> > > modules, and then trying to insert them:
> >
> > They're buggy and unfortunately this wasn't caught during the
> > review process. The import/export functions are required and
> > certainly not optional.
>
> OK it looks like I was confused. Yes the import and export functions
> are required but for shash algorithms we provide a simple default.
> This means that implementations can simply leave it NULL and they
> will then use the default import/export functions which exports
> the shash_desc as the state.
>
> So Russell what I'll do is apply your patch without the hunk
> for shash_prepare_alg. IOW we will block any ahash algorithms
> that leave state as zero since ahash does not provide a default
> import/export function (it can't because it may involve hardware
> state). shash algorithms on the other hand won't be affected.

Okay.

I've a larger queue of patches building at the moment for the Marvell
driver - I've found that it's touch-and-go whether it produces the
correct hash result, and previous hashes can affect subsequent hashes
in some circumstances. I'm currently at another 7 patches, plus a
"big" as-yet uncommitted patch which is virtually a rewrite of the
DMA preparation - diffstat wise, in total it's looking like:

drivers/crypto/marvell/cesa.h | 8 +-
drivers/crypto/marvell/hash.c | 258 ++++++++++++++++++++++--------------------
drivers/crypto/marvell/tdma.c | 7 ++
3 files changed, 151 insertions(+), 122 deletions(-)

and there's still a few more cases that need solving: zero bytes of
input should not give a zero hash, and an input which is the multiple
of the blocksize causes the hardware to hang - though I'm not yet
sure whether that's a result of my above changes. At least with my
above changes, I now get stable and correct hash values for sizes
which do not hit any of those remaining bugs.

Given it's size so far, I'm not sure it makes much sense to backport
any of these fixes to stable; I think maybe we should just say "okay,
before these fixes, it's just broken."

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-13 13:57:36

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Tue, Oct 13, 2015 at 02:55:18PM +0100, Russell King - ARM Linux wrote:
>
> Given it's size so far, I'm not sure it makes much sense to backport
> any of these fixes to stable; I think maybe we should just say "okay,
> before these fixes, it's just broken."

OK as long as your first patch goes into stable it should be fine
as it'll essentially disable the marvell driver.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-13 13:59:55

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Tue, Oct 13, 2015 at 09:57:22PM +0800, Herbert Xu wrote:
> On Tue, Oct 13, 2015 at 02:55:18PM +0100, Russell King - ARM Linux wrote:
> >
> > Given it's size so far, I'm not sure it makes much sense to backport
> > any of these fixes to stable; I think maybe we should just say "okay,
> > before these fixes, it's just broken."
>
> OK as long as your first patch goes into stable it should be fine
> as it'll essentially disable the marvell driver.

It will disable the Marvell driver, along with the SW implementations in
arch/arm/crypto/ which don't set .statesize, .import and .export. I
would not be surprised if it also affects others too.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-13 14:01:34

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

On Tue, Oct 13, 2015 at 02:59:47PM +0100, Russell King - ARM Linux wrote:
>
> It will disable the Marvell driver, along with the SW implementations in
> arch/arm/crypto/ which don't set .statesize, .import and .export. I
> would not be surprised if it also affects others too.

As I said earlier I'll make sure to delete the hunk for
shash_prepare_alg when I apply your patch so only ahash drivers
will be affected.

The ARM stuff is all shash so they are fine and will use the
default import/export functions.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-13 14:33:26

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Fri, Oct 09, 2015 at 08:43:33PM +0100, Russell King wrote:
> If the algorithm passed a zero statesize, do not pass a valid pointer
> into the export/import functions. Passing a valid pointer covers up
> bugs in driver code which then go on to smash the kernel stack.
> Instead, pass NULL, which will cause any attempt to write to the
> pointer to fail.
>
> Signed-off-by: Russell King <[email protected]>

Patch applied without the shash hunk. I also replaced your commit
message as it no longer makes any sense:

crypto: ahash - ensure statesize is non-zero

Unlike shash algorithms, ahash drivers must implement export
and import as their descriptors may contain hardware state and
cannot be exported as is. Unfortunately some ahash drivers did
not provide them and end up causing crashes with algif_hash.

This patch adds a check to prevent these drivers from registering
ahash algorithms until they are fixed.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-15 09:39:45

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Tue, Oct 13, 2015 at 10:33:12PM +0800, Herbert Xu wrote:
> On Fri, Oct 09, 2015 at 08:43:33PM +0100, Russell King wrote:
> > If the algorithm passed a zero statesize, do not pass a valid pointer
> > into the export/import functions. Passing a valid pointer covers up
> > bugs in driver code which then go on to smash the kernel stack.
> > Instead, pass NULL, which will cause any attempt to write to the
> > pointer to fail.
> >
> > Signed-off-by: Russell King <[email protected]>
>
> Patch applied without the shash hunk. I also replaced your commit
> message as it no longer makes any sense:
>
> crypto: ahash - ensure statesize is non-zero
>
> Unlike shash algorithms, ahash drivers must implement export
> and import as their descriptors may contain hardware state and
> cannot be exported as is. Unfortunately some ahash drivers did
> not provide them and end up causing crashes with algif_hash.
>
> This patch adds a check to prevent these drivers from registering
> ahash algorithms until they are fixed.
>
> Thanks,

There will be fallout from this.

The CAAM driver is similarly buggy - it has export/import functions in
its ahash drivers, but zero statesize.

User exploitable kernel stack smashing... I'd suggest putting this patch
into stable kernels as high priority, as I'm pretty sure this could be
used to gain privileges via carefully crafted md5 hashes. I've not
proven it, but given that the md5 hash and state data get copied over
the kernel stack, it's highly likely that it _is_ exploitable from any
user that can create an AF_ALG socket.

Yes, it means regressions in the form of various hw crypto no longer
being loadable, but I think that's preferable to the security hole here.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-15 09:42:05

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Thu, Oct 15, 2015 at 10:39:30AM +0100, Russell King - ARM Linux wrote:
>
> The CAAM driver is similarly buggy - it has export/import functions in
> its ahash drivers, but zero statesize.
>
> User exploitable kernel stack smashing... I'd suggest putting this patch
> into stable kernels as high priority, as I'm pretty sure this could be

I agree. It should already be on its way to stable as Linus has
pulled it into his tree and it carries a stable cc.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-15 13:00:04

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Thu, Oct 15, 2015 at 05:41:47PM +0800, Herbert Xu wrote:
> On Thu, Oct 15, 2015 at 10:39:30AM +0100, Russell King - ARM Linux wrote:
> >
> > The CAAM driver is similarly buggy - it has export/import functions in
> > its ahash drivers, but zero statesize.
> >
> > User exploitable kernel stack smashing... I'd suggest putting this patch
> > into stable kernels as high priority, as I'm pretty sure this could be
>
> I agree. It should already be on its way to stable as Linus has
> pulled it into his tree and it carries a stable cc.

Thanks.

I think the CAAM driver is pretty unfixable from a trivial point of
view. This driver exports a huge amount of state - it contains both a
struct caam_hash_ctx and a struct caam_hash_state, which totals up to
1600 bytes. This fails the:

alg->halg.statesize > PAGE_SIZE / 8

in ahash_prepare_alg() if we set .statesize. For ARM, this places a
limit of 512 bytes on the state size.

The CAAM authors need to come up with a better solution (and quickly,
as caamhash is going to fail in all kernels soon), or we need to
support larger exported states.

BTW, I can't find a MAINTAINERS entry for CAAM, so I've just grabbed
a couple of addresses from recent git history in the hope they'll know
who's responsible.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-15 13:13:59

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Thu, Oct 15, 2015 at 01:59:44PM +0100, Russell King - ARM Linux wrote:
>
> I think the CAAM driver is pretty unfixable from a trivial point of
> view. This driver exports a huge amount of state - it contains both a
> struct caam_hash_ctx and a struct caam_hash_state, which totals up to
> 1600 bytes. This fails the:

Right just dumping the state out as is not going to work. This
is not supposed to be how export works anyway. But it doesn't
look too bad as most of that 1600 is consumed by the hardware
program descriptor which can easily be regenerated upon import.

The only things that need to be exported AFAICS are key and buf_X.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-16 23:25:07

by Victoria Milhoan

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Thu, 15 Oct 2015 21:13:38 +0800
Herbert Xu <[email protected]> wrote:

> On Thu, Oct 15, 2015 at 01:59:44PM +0100, Russell King - ARM Linux wrote:
> >
> > I think the CAAM driver is pretty unfixable from a trivial point of
> > view. This driver exports a huge amount of state - it contains both a
> > struct caam_hash_ctx and a struct caam_hash_state, which totals up to
> > 1600 bytes. This fails the:
>
> Right just dumping the state out as is not going to work. This
> is not supposed to be how export works anyway. But it doesn't
> look too bad as most of that 1600 is consumed by the hardware
> program descriptor which can easily be regenerated upon import.
>
> The only things that need to be exported AFAICS are key and buf_X.

I just pushed out a patch for export/import functions in the CAAM driver. The
patch has been through testing with OpenSSL and the AF_ALG plugin on the MX6.

>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Victoria Milhoan <[email protected]>

2015-10-17 07:56:34

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

On Fri, Oct 16, 2015 at 04:24:54PM -0700, Victoria Milhoan wrote:
> On Thu, 15 Oct 2015 21:13:38 +0800
> Herbert Xu <[email protected]> wrote:
>
> > On Thu, Oct 15, 2015 at 01:59:44PM +0100, Russell King - ARM Linux wrote:
> > >
> > > I think the CAAM driver is pretty unfixable from a trivial point of
> > > view. This driver exports a huge amount of state - it contains both a
> > > struct caam_hash_ctx and a struct caam_hash_state, which totals up to
> > > 1600 bytes. This fails the:
> >
> > Right just dumping the state out as is not going to work. This
> > is not supposed to be how export works anyway. But it doesn't
> > look too bad as most of that 1600 is consumed by the hardware
> > program descriptor which can easily be regenerated upon import.
> >
> > The only things that need to be exported AFAICS are key and buf_X.
>
> I just pushed out a patch for export/import functions in the CAAM driver. The
> patch has been through testing with OpenSSL and the AF_ALG plugin on the MX6.

Be careful with that. There's two ways to test:

1. Checking hash output.

Preparation - copy openssl.cnf and add this to openssl.cnf:

openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
af_alg = af_alg_engine

[af_alg_engine]
CIPHERS=aes-128-cbc aes-192-cbc aes-256-cbc des-cbc des-ede3-cbc

DIGESTS=md5 sha1 sha256 sha512

# Putting this last means we register the above as the default algorithms
default_algorithms = ALL

Then:

#!/bin/sh

for type in md5 sha1 sha256 sha512; do
echo -n "Checking $type hash:"
for file in /bin/*; do
echo -n " $file"
if ! OPENSSL_CONF=./openssl.cnf openssl dgst -$type < $file | sed "s,(stdin)= ,,;s,\$,\t$file," | ${type}sum -c > /dev/null; then
echo " ... failed"
echo -n "Openssl says: " >&2
OPENSSL_CONF=./openssl.cnf openssl dgst -$type < $file | sed "s,(stdin)= ,,;s,\$,\t$file," >&2
echo -n "${type}sum says: " >&2
${type}sum $file >&2
exit 1
fi
done
echo " ... ok"
done
echo "All hashes passed"

The above will verify that the hashes are producing the correct answers
for a range of files. This does _not_ test the export/import paths.

2. Backup the existing openssl.cnf and replace it with the above modified
version. Then try to ssh into the platform. This will verify the
export/import side of things. If ssh fails to connect to the target,
you know that the crypto drivers for SHA1 are broken, probably due to
export/import.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-18 16:17:11

by Russell King - ARM Linux

[permalink] [raw]
Subject: [PATCH 00/18] crypto: further fixes for Marvell CESA hash

Following on from the previous series, this series addresses further
problems with the Marvell CESA hash driver found while testing it my
openssl/ssh scenarios.

The first patch improves one from the previous series: we can get the
transform more directly using req->base.tfm rather than going round
the houses.

The next few patches rework the algorithm endianness conversions.
There are two things which depend on the algorithm endianness - the
format of the result, and the format of the bit length in the last
block. We introduce a flag to convey this information, and keep
the creq->state format in CPU endian mode for consistency.

Some of the inconsistent hash results are down to the template
operation not being properly initialised - so we zero initialise all
template operations.

The following patches (from "factor out first fragment decisions to
helper") rework the digest handling to ensure that we always provide
the hardware with a complete block of data to hash, otherwise it can
be left mid-calculation, which then causes state to leak to
subsequent operations. This requires a re-structure of the way we
put together the DMA entries, so it's done in relatively small steps.

This results in the CESA driver passing all tests I can throw at it
via the AF_ALG openssl plugin with the exception of asking for the
hash of /dev/null. This returns an all zero result, rather than the
correct hash value. This bug is pending further diagnosis, but it
is believed not to be a driver specific bug as iMX6 CAAM also behaves
the same.

Unfortunately, this is a large series, but the driver unfortunately
needs this level of bug fixing to work properly.

drivers/crypto/marvell/cesa.h | 11 +-
drivers/crypto/marvell/hash.c | 307 +++++++++++++++++++++---------------------
2 files changed, 164 insertions(+), 154 deletions(-)

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-18 16:23:38

by Russell King

[permalink] [raw]
Subject: [PATCH 01/18] crypto: marvell: easier way to get the transform

There's an easier way to get at the hash transform - rather than
using crypto_ahash_tfm(ahash), we can get it directly from
req->base.tfm.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index a24ceda7acfe..d9cc49e9c43b 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -803,7 +803,7 @@ static int mv_cesa_ahash_export(struct ahash_request *req, void *hash,
unsigned int digsize = crypto_ahash_digestsize(ahash);
unsigned int blocksize;

- blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
+ blocksize = crypto_tfm_alg_blocksize(req->base.tfm);

*len = creq->len;
memcpy(hash, creq->state, digsize);
@@ -828,7 +828,7 @@ static int mv_cesa_ahash_import(struct ahash_request *req, const void *hash,
if (ret)
return ret;

- blocksize = crypto_tfm_alg_blocksize(crypto_ahash_tfm(ahash));
+ blocksize = crypto_tfm_alg_blocksize(req->base.tfm);
if (len >= blocksize)
mv_cesa_update_op_cfg(&creq->op_tmpl,
CESA_SA_DESC_CFG_MID_FRAG,
--
2.1.0

2015-10-18 16:23:46

by Russell King

[permalink] [raw]
Subject: [PATCH 02/18] crypto: marvell: keep creq->state in CPU endian format at all times

Currently, we read/write the state in CPU endian, but on the final
request, we convert its endian according to the requested algorithm.
(md5 is little endian, SHA are big endian.)

Always keep creq->state in CPU native endian format, and perform the
necessary conversion when copying the hash to the result.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 2 +-
drivers/crypto/marvell/hash.c | 25 ++++++++++++++-----------
2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index bc2a55bc35e4..e19302c9dec9 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -612,7 +612,7 @@ struct mv_cesa_ahash_req {
u64 len;
int src_nents;
bool last_req;
- __be32 state[8];
+ u32 state[8];
};

/* CESA functions */
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index d9cc49e9c43b..f59faabcd34f 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -347,18 +347,21 @@ static int mv_cesa_ahash_process(struct crypto_async_request *req, u32 status)
ahashreq->nbytes - creq->cache_ptr);

if (creq->last_req) {
- for (i = 0; i < digsize / 4; i++) {
- /*
- * Hardware provides MD5 digest in a different
- * endianness than SHA-1 and SHA-256 ones.
- */
- if (digsize == MD5_DIGEST_SIZE)
- creq->state[i] = cpu_to_le32(creq->state[i]);
- else
- creq->state[i] = cpu_to_be32(creq->state[i]);
- }
+ /*
+ * Hardware's MD5 digest is in little endian format, but
+ * SHA in big endian format
+ */
+ if (digsize == MD5_DIGEST_SIZE) {
+ __le32 *result = (void *)ahashreq->result;
+
+ for (i = 0; i < digsize / 4; i++)
+ result[i] = cpu_to_le32(creq->state[i]);
+ } else {
+ __be32 *result = (void *)ahashreq->result;

- memcpy(ahashreq->result, creq->state, digsize);
+ for (i = 0; i < digsize / 4; i++)
+ result[i] = cpu_to_be32(creq->state[i]);
+ }
}

return ret;
--
2.1.0

2015-10-18 16:23:51

by Russell King

[permalink] [raw]
Subject: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

Rather than determining whether we're using a MD5 hash by looking at
the digest size, switch to a cleaner solution using a per-request flag
initialised by the method type.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 1 +
drivers/crypto/marvell/hash.c | 17 +++++++++--------
2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e19302c9dec9..5d5b66ea2ceb 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -612,6 +612,7 @@ struct mv_cesa_ahash_req {
u64 len;
int src_nents;
bool last_req;
+ bool algo_le;
u32 state[8];
};

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index f59faabcd34f..aa12274608ab 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -351,7 +351,7 @@ static int mv_cesa_ahash_process(struct crypto_async_request *req, u32 status)
* Hardware's MD5 digest is in little endian format, but
* SHA in big endian format
*/
- if (digsize == MD5_DIGEST_SIZE) {
+ if (creq->algo_le) {
__le32 *result = (void *)ahashreq->result;

for (i = 0; i < digsize / 4; i++)
@@ -407,7 +407,7 @@ static const struct mv_cesa_req_ops mv_cesa_ahash_req_ops = {
};

static int mv_cesa_ahash_init(struct ahash_request *req,
- struct mv_cesa_op_ctx *tmpl)
+ struct mv_cesa_op_ctx *tmpl, bool algo_le)
{
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);

@@ -421,6 +421,7 @@ static int mv_cesa_ahash_init(struct ahash_request *req,
mv_cesa_set_mac_op_frag_len(tmpl, 0);
creq->op_tmpl = *tmpl;
creq->len = 0;
+ creq->algo_le = algo_le;

return 0;
}
@@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);

- mv_cesa_ahash_init(req, &tmpl);
+ mv_cesa_ahash_init(req, &tmpl, true);

return 0;
}
@@ -924,7 +925,7 @@ static int mv_cesa_sha1_init(struct ahash_request *req)

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA1);

- mv_cesa_ahash_init(req, &tmpl);
+ mv_cesa_ahash_init(req, &tmpl, false);

return 0;
}
@@ -987,7 +988,7 @@ static int mv_cesa_sha256_init(struct ahash_request *req)

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA256);

- mv_cesa_ahash_init(req, &tmpl);
+ mv_cesa_ahash_init(req, &tmpl, false);

return 0;
}
@@ -1218,7 +1219,7 @@ static int mv_cesa_ahmac_md5_init(struct ahash_request *req)
mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_MD5);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));

- mv_cesa_ahash_init(req, &tmpl);
+ mv_cesa_ahash_init(req, &tmpl, true);

return 0;
}
@@ -1288,7 +1289,7 @@ static int mv_cesa_ahmac_sha1_init(struct ahash_request *req)
mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA1);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));

- mv_cesa_ahash_init(req, &tmpl);
+ mv_cesa_ahash_init(req, &tmpl, false);

return 0;
}
@@ -1378,7 +1379,7 @@ static int mv_cesa_ahmac_sha256_init(struct ahash_request *req)
mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA256);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));

- mv_cesa_ahash_init(req, &tmpl);
+ mv_cesa_ahash_init(req, &tmpl, false);

return 0;
}
--
2.1.0

2015-10-18 16:23:56

by Russell King

[permalink] [raw]
Subject: [PATCH 04/18] crypto: marvell: fix the bit length endianness

The endianness of the bit length used in the final stage depends on the
endianness of the algorithm - md5 hashes need it to be in little endian
format, whereas SHA hashes need it in big endian format. Use the
previously added algorithm endianness flag to control this.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index aa12274608ab..dc2c65b52d7a 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -179,7 +179,6 @@ static int mv_cesa_ahash_pad_len(struct mv_cesa_ahash_req *creq)

static int mv_cesa_ahash_pad_req(struct mv_cesa_ahash_req *creq, u8 *buf)
{
- __be64 bits = cpu_to_be64(creq->len << 3);
unsigned int index, padlen;

buf[0] = 0x80;
@@ -187,7 +186,14 @@ static int mv_cesa_ahash_pad_req(struct mv_cesa_ahash_req *creq, u8 *buf)
index = creq->len & CESA_HASH_BLOCK_SIZE_MSK;
padlen = mv_cesa_ahash_pad_len(creq);
memset(buf + 1, 0, padlen - 1);
- memcpy(buf + padlen, &bits, sizeof(bits));
+
+ if (creq->algo_le) {
+ __le64 bits = cpu_to_le64(creq->len << 3);
+ memcpy(buf + padlen, &bits, sizeof(bits));
+ } else {
+ __be64 bits = cpu_to_be64(creq->len << 3);
+ memcpy(buf + padlen, &bits, sizeof(bits));
+ }

return padlen + 8;
}
--
2.1.0

2015-10-18 16:24:02

by Russell King

[permalink] [raw]
Subject: [PATCH 05/18] crypto: marvell: ensure template operation is initialised

Ensure that the template operation is fully initialised, otherwise we
end up loading data from the kernel stack into the engines, which can
upset the hash results.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index dc2c65b52d7a..82d9e3d09331 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -864,7 +864,7 @@ static int mv_cesa_ahash_import(struct ahash_request *req, const void *hash,

static int mv_cesa_md5_init(struct ahash_request *req)
{
- struct mv_cesa_op_ctx tmpl;
+ struct mv_cesa_op_ctx tmpl = { };

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);

@@ -927,7 +927,7 @@ struct ahash_alg mv_md5_alg = {

static int mv_cesa_sha1_init(struct ahash_request *req)
{
- struct mv_cesa_op_ctx tmpl;
+ struct mv_cesa_op_ctx tmpl = { };

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA1);

@@ -990,7 +990,7 @@ struct ahash_alg mv_sha1_alg = {

static int mv_cesa_sha256_init(struct ahash_request *req)
{
- struct mv_cesa_op_ctx tmpl;
+ struct mv_cesa_op_ctx tmpl = { };

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA256);

@@ -1220,7 +1220,7 @@ static int mv_cesa_ahmac_cra_init(struct crypto_tfm *tfm)
static int mv_cesa_ahmac_md5_init(struct ahash_request *req)
{
struct mv_cesa_hmac_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
- struct mv_cesa_op_ctx tmpl;
+ struct mv_cesa_op_ctx tmpl = { };

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_MD5);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
@@ -1290,7 +1290,7 @@ struct ahash_alg mv_ahmac_md5_alg = {
static int mv_cesa_ahmac_sha1_init(struct ahash_request *req)
{
struct mv_cesa_hmac_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
- struct mv_cesa_op_ctx tmpl;
+ struct mv_cesa_op_ctx tmpl = { };

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA1);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
@@ -1380,7 +1380,7 @@ static int mv_cesa_ahmac_sha256_setkey(struct crypto_ahash *tfm, const u8 *key,
static int mv_cesa_ahmac_sha256_init(struct ahash_request *req)
{
struct mv_cesa_hmac_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
- struct mv_cesa_op_ctx tmpl;
+ struct mv_cesa_op_ctx tmpl = { };

mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA256);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
--
2.1.0

2015-10-18 16:24:07

by Russell King

[permalink] [raw]
Subject: [PATCH 06/18] crypto: marvell: const-ify argument to mv_cesa_get_op_cfg()

mv_cesa_get_op_cfg() does not write to its argument, it only reads.
So, let's make it const.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index 5d5b66ea2ceb..cd646d7c14e2 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -627,7 +627,7 @@ static inline void mv_cesa_update_op_cfg(struct mv_cesa_op_ctx *op,
op->desc.config |= cpu_to_le32(cfg);
}

-static inline u32 mv_cesa_get_op_cfg(struct mv_cesa_op_ctx *op)
+static inline u32 mv_cesa_get_op_cfg(const struct mv_cesa_op_ctx *op)
{
return le32_to_cpu(op->desc.config);
}
--
2.1.0

2015-10-18 16:24:15

by Russell King

[permalink] [raw]
Subject: [PATCH 07/18] crypto: marvell: factor out first fragment decisions to helper

Multiple locations in the driver test the operation context fragment
type, checking whether it is a first fragment or not. Introduce a
mv_cesa_mac_op_is_first_frag() helper, which returns true if the
fragment operation is for a first fragment.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 6 ++++++
drivers/crypto/marvell/hash.c | 9 +++------
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index cd646d7c14e2..e9f732138ba3 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -686,6 +686,12 @@ static inline u32 mv_cesa_get_int_mask(struct mv_cesa_engine *engine)
return engine->int_mask;
}

+static inline bool mv_cesa_mac_op_is_first_frag(const struct mv_cesa_op_ctx *op)
+{
+ return (mv_cesa_get_op_cfg(op) & CESA_SA_DESC_CFG_FRAG_MSK) ==
+ CESA_SA_DESC_CFG_FIRST_FRAG;
+}
+
int mv_cesa_queue_req(struct crypto_async_request *req);

/*
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 82d9e3d09331..938ecfeb8ffe 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -524,8 +524,7 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,

mv_cesa_set_mac_op_frag_len(op, dma_iter->base.op_len);

- if ((mv_cesa_get_op_cfg(&creq->op_tmpl) & CESA_SA_DESC_CFG_FRAG_MSK) ==
- CESA_SA_DESC_CFG_FIRST_FRAG)
+ if (mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
CESA_SA_DESC_CFG_MID_FRAG,
CESA_SA_DESC_CFG_FRAG_MSK);
@@ -561,8 +560,7 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (op && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX) {
u32 frag = CESA_SA_DESC_CFG_NOT_FRAG;

- if ((mv_cesa_get_op_cfg(op) & CESA_SA_DESC_CFG_FRAG_MSK) !=
- CESA_SA_DESC_CFG_FIRST_FRAG)
+ if (!mv_cesa_mac_op_is_first_frag(op))
frag = CESA_SA_DESC_CFG_LAST_FRAG;

mv_cesa_update_op_cfg(op, frag, CESA_SA_DESC_CFG_FRAG_MSK);
@@ -600,8 +598,7 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (padoff >= trailerlen)
return op;

- if ((mv_cesa_get_op_cfg(&creq->op_tmpl) & CESA_SA_DESC_CFG_FRAG_MSK) !=
- CESA_SA_DESC_CFG_FIRST_FRAG)
+ if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
CESA_SA_DESC_CFG_MID_FRAG,
CESA_SA_DESC_CFG_FRAG_MSK);
--
2.1.0

2015-10-18 16:24:25

by Russell King

[permalink] [raw]
Subject: [PATCH 09/18] crypto: marvell: always ensure mid-fragments after first-fragment

If we add a template first-fragment operation, always update the
template to be a mid-fragment. This ensures that mid-fragments
always follow on from a first fragment in every case.

This means we can move the first to mid-fragment update code out of
mv_cesa_ahash_dma_add_data().

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 8111e73ca848..f567243da005 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -491,6 +491,11 @@ mv_cesa_dma_add_frag(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);

+ if (mv_cesa_mac_op_is_first_frag(tmpl))
+ mv_cesa_update_op_cfg(tmpl,
+ CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
return op;
}

@@ -529,7 +534,6 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_ahash_req *creq,
gfp_t flags)
{
- struct mv_cesa_op_ctx *op;
int ret;

/* Add input transfers */
@@ -538,17 +542,8 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);

- op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, dma_iter->base.op_len,
- flags);
- if (IS_ERR(op))
- return op;
-
- if (mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
- mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
- return op;
+ return mv_cesa_dma_add_frag(chain, &creq->op_tmpl, dma_iter->base.op_len,
+ flags);
}

static struct mv_cesa_op_ctx *
--
2.1.0

2015-10-18 16:24:21

by Russell King

[permalink] [raw]
Subject: [PATCH 08/18] crypto: marvell: factor out adding an operation and launching it

Add a helper to add the fragment operation block followed by the DMA
entry to launch the operation.

Although at the moment this pattern only strictly appears at one site,
two other sites can be factored as well by slightly changing the order
in which the DMA operations are performed. This should be harmless as
the only thing which matters is to have all the data loaded into SRAM
prior to launching the operation.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 74 +++++++++++++++++++++----------------------
1 file changed, 36 insertions(+), 38 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 938ecfeb8ffe..8111e73ca848 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -472,6 +472,29 @@ static int mv_cesa_ahash_cache_req(struct ahash_request *req, bool *cached)
}

static struct mv_cesa_op_ctx *
+mv_cesa_dma_add_frag(struct mv_cesa_tdma_chain *chain,
+ struct mv_cesa_op_ctx *tmpl, unsigned int frag_len,
+ gfp_t flags)
+{
+ struct mv_cesa_op_ctx *op;
+ int ret;
+
+ op = mv_cesa_dma_add_op(chain, tmpl, false, flags);
+ if (IS_ERR(op))
+ return op;
+
+ /* Set the operation block fragment length. */
+ mv_cesa_set_mac_op_frag_len(op, frag_len);
+
+ /* Append dummy desc to launch operation */
+ ret = mv_cesa_dma_add_dummy_launch(chain, flags);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return op;
+}
+
+static struct mv_cesa_op_ctx *
mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_ahash_dma_iter *dma_iter,
struct mv_cesa_ahash_req *creq,
@@ -493,18 +516,9 @@ mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);

- if (!dma_iter->base.op_len) {
- op = mv_cesa_dma_add_op(chain, &creq->op_tmpl, false, flags);
- if (IS_ERR(op))
- return op;
-
- mv_cesa_set_mac_op_frag_len(op, creq->cache_ptr);
-
- /* Add dummy desc to launch crypto operation */
- ret = mv_cesa_dma_add_dummy_launch(chain, flags);
- if (ret)
- return ERR_PTR(ret);
- }
+ if (!dma_iter->base.op_len)
+ op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl,
+ creq->cache_ptr, flags);

return op;
}
@@ -518,28 +532,22 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_op_ctx *op;
int ret;

- op = mv_cesa_dma_add_op(chain, &creq->op_tmpl, false, flags);
+ /* Add input transfers */
+ ret = mv_cesa_dma_add_op_transfers(chain, &dma_iter->base,
+ &dma_iter->src, flags);
+ if (ret)
+ return ERR_PTR(ret);
+
+ op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, dma_iter->base.op_len,
+ flags);
if (IS_ERR(op))
return op;

- mv_cesa_set_mac_op_frag_len(op, dma_iter->base.op_len);
-
if (mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
CESA_SA_DESC_CFG_MID_FRAG,
CESA_SA_DESC_CFG_FRAG_MSK);

- /* Add input transfers */
- ret = mv_cesa_dma_add_op_transfers(chain, &dma_iter->base,
- &dma_iter->src, flags);
- if (ret)
- return ERR_PTR(ret);
-
- /* Add dummy desc to launch crypto operation */
- ret = mv_cesa_dma_add_dummy_launch(chain, flags);
- if (ret)
- return ERR_PTR(ret);
-
return op;
}

@@ -603,12 +611,6 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
CESA_SA_DESC_CFG_MID_FRAG,
CESA_SA_DESC_CFG_FRAG_MSK);

- op = mv_cesa_dma_add_op(chain, &creq->op_tmpl, false, flags);
- if (IS_ERR(op))
- return op;
-
- mv_cesa_set_mac_op_frag_len(op, trailerlen - padoff);
-
ret = mv_cesa_dma_add_data_transfer(chain,
CESA_SA_DATA_SRAM_OFFSET,
ahashdreq->padding_dma +
@@ -619,12 +621,8 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);

- /* Add dummy desc to launch crypto operation */
- ret = mv_cesa_dma_add_dummy_launch(chain, flags);
- if (ret)
- return ERR_PTR(ret);
-
- return op;
+ return mv_cesa_dma_add_frag(chain, &creq->op_tmpl, trailerlen - padoff,
+ flags);
}

static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
--
2.1.0

2015-10-18 16:24:30

by Russell King

[permalink] [raw]
Subject: [PATCH 10/18] crypto: marvell: move mv_cesa_dma_add_frag() calls

Move the calls to mv_cesa_dma_add_frag() into the parent function,
mv_cesa_ahash_dma_req_init(). This is in preparation to changing
when we generate the operation blocks, as we need to avoid generating
a block for a partial hash block at the end of the user data.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 71 ++++++++++++++++++-------------------------
1 file changed, 29 insertions(+), 42 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index f567243da005..787cec1715ed 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -499,51 +499,23 @@ mv_cesa_dma_add_frag(struct mv_cesa_tdma_chain *chain,
return op;
}

-static struct mv_cesa_op_ctx *
+static int
mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_ahash_dma_iter *dma_iter,
struct mv_cesa_ahash_req *creq,
gfp_t flags)
{
struct mv_cesa_ahash_dma_req *ahashdreq = &creq->req.dma;
- struct mv_cesa_op_ctx *op = NULL;
- int ret;

if (!creq->cache_ptr)
- return NULL;
-
- ret = mv_cesa_dma_add_data_transfer(chain,
- CESA_SA_DATA_SRAM_OFFSET,
- ahashdreq->cache_dma,
- creq->cache_ptr,
- CESA_TDMA_DST_IN_SRAM,
- flags);
- if (ret)
- return ERR_PTR(ret);
-
- if (!dma_iter->base.op_len)
- op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl,
- creq->cache_ptr, flags);
-
- return op;
-}
-
-static struct mv_cesa_op_ctx *
-mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
- struct mv_cesa_ahash_dma_iter *dma_iter,
- struct mv_cesa_ahash_req *creq,
- gfp_t flags)
-{
- int ret;
-
- /* Add input transfers */
- ret = mv_cesa_dma_add_op_transfers(chain, &dma_iter->base,
- &dma_iter->src, flags);
- if (ret)
- return ERR_PTR(ret);
+ return 0;

- return mv_cesa_dma_add_frag(chain, &creq->op_tmpl, dma_iter->base.op_len,
- flags);
+ return mv_cesa_dma_add_data_transfer(chain,
+ CESA_SA_DATA_SRAM_OFFSET,
+ ahashdreq->cache_dma,
+ creq->cache_ptr,
+ CESA_TDMA_DST_IN_SRAM,
+ flags);
}

static struct mv_cesa_op_ctx *
@@ -647,19 +619,34 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
mv_cesa_tdma_desc_iter_init(&chain);
mv_cesa_ahash_req_iter_init(&iter, req);

- op = mv_cesa_ahash_dma_add_cache(&chain, &iter,
- creq, flags);
- if (IS_ERR(op)) {
- ret = PTR_ERR(op);
+ /*
+ * Add the cache (left-over data from a previous block) first.
+ * This will never overflow the SRAM size.
+ */
+ ret = mv_cesa_ahash_dma_add_cache(&chain, &iter, creq, flags);
+ if (ret)
goto err_free_tdma;
+
+ if (creq->cache_ptr && !iter.base.op_len) {
+ op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ creq->cache_ptr, flags);
+ if (IS_ERR(op)) {
+ ret = PTR_ERR(op);
+ goto err_free_tdma;
+ }
}

do {
if (!iter.base.op_len)
break;

- op = mv_cesa_ahash_dma_add_data(&chain, &iter,
- creq, flags);
+ ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
+ &iter.src, flags);
+ if (ret)
+ goto err_free_tdma;
+
+ op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ iter.base.op_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
--
2.1.0

2015-10-18 16:24:35

by Russell King

[permalink] [raw]
Subject: [PATCH 11/18] crypto: marvell: use presence of scatterlist to determine data load

Use the presence of the scatterlist to determine whether we should load
any new user data to the engine. The following shall always be true at
this point:

iter.base.op_len == 0 === iter.src.sg

In doing so, we can:

1. eliminate the test for iter.base.op_len inside the loop, which
makes the loop operation more obvious and understandable.

2. move the operation generation for the cache-only case.

This prepares the code for the next step in its transformation, and also
uncovers a bug that will be fixed in the next patch.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 787cec1715ed..c4693321fdc8 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -627,7 +627,27 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
if (ret)
goto err_free_tdma;

- if (creq->cache_ptr && !iter.base.op_len) {
+ if (iter.src.sg) {
+ /*
+ * Add all the new data, inserting an operation block and
+ * launch command between each full SRAM block-worth of
+ * data.
+ */
+ do {
+ ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
+ &iter.src, flags);
+ if (ret)
+ goto err_free_tdma;
+
+ op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ iter.base.op_len, flags);
+ if (IS_ERR(op)) {
+ ret = PTR_ERR(op);
+ goto err_free_tdma;
+ }
+ } while (mv_cesa_ahash_req_iter_next_op(&iter));
+ } else if (creq->cache_ptr) {
+ /* Account for the data that was in the cache. */
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
creq->cache_ptr, flags);
if (IS_ERR(op)) {
@@ -636,23 +656,6 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
}
}

- do {
- if (!iter.base.op_len)
- break;
-
- ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
- &iter.src, flags);
- if (ret)
- goto err_free_tdma;
-
- op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- iter.base.op_len, flags);
- if (IS_ERR(op)) {
- ret = PTR_ERR(op);
- goto err_free_tdma;
- }
- } while (mv_cesa_ahash_req_iter_next_op(&iter));
-
op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, op, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
--
2.1.0

2015-10-18 16:24:41

by Russell King

[permalink] [raw]
Subject: [PATCH 12/18] crypto: marvell: ensure iter.base.op_len is the full op length

When we process the last request of data, and the request contains user
data, the loop in mv_cesa_ahash_dma_req_init() marks the first data size
as being iter.base.op_len which does not include the size of the cache
data. This means we end up hashing an insufficient amount of data.

Fix this by always including the cache size in the first operation
length of any request.

This has the effect that for a request containing no user data,

iter.base.op_len === iter.src.op_offset === creq->cache_ptr

As a result, we include one further change to use iter.base.op_len in
the cache-but-no-user-data case to make the next change clearer.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index c4693321fdc8..b4da73d294af 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -27,10 +27,10 @@ mv_cesa_ahash_req_iter_init(struct mv_cesa_ahash_dma_iter *iter,
struct ahash_request *req)
{
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
- unsigned int len = req->nbytes;
+ unsigned int len = req->nbytes + creq->cache_ptr;

if (!creq->last_req)
- len = (len + creq->cache_ptr) & ~CESA_HASH_BLOCK_SIZE_MSK;
+ len &= ~CESA_HASH_BLOCK_SIZE_MSK;

mv_cesa_req_dma_iter_init(&iter->base, len);
mv_cesa_sg_dma_iter_init(&iter->src, req->src, DMA_TO_DEVICE);
@@ -646,10 +646,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
goto err_free_tdma;
}
} while (mv_cesa_ahash_req_iter_next_op(&iter));
- } else if (creq->cache_ptr) {
+ } else if (iter.base.op_len) {
/* Account for the data that was in the cache. */
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- creq->cache_ptr, flags);
+ iter.base.op_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
--
2.1.0

2015-10-18 16:24:46

by Russell King

[permalink] [raw]
Subject: [PATCH 13/18] crypto: marvell: avoid adding final operation within loop

Avoid adding the final operation within the loop, but instead add it
outside. We combine this with the handling for the no-data case.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index b4da73d294af..dc8ab343ad9f 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -602,6 +602,7 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
struct mv_cesa_tdma_chain chain;
struct mv_cesa_ahash_dma_iter iter;
struct mv_cesa_op_ctx *op = NULL;
+ unsigned int frag_len;
int ret;

dreq->chain.first = NULL;
@@ -631,25 +632,34 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
/*
* Add all the new data, inserting an operation block and
* launch command between each full SRAM block-worth of
- * data.
+ * data. We intentionally do not add the final op block.
*/
- do {
+ while (true) {
ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
&iter.src, flags);
if (ret)
goto err_free_tdma;

+ frag_len = iter.base.op_len;
+
+ if (!mv_cesa_ahash_req_iter_next_op(&iter))
+ break;
+
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- iter.base.op_len, flags);
+ frag_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
}
- } while (mv_cesa_ahash_req_iter_next_op(&iter));
- } else if (iter.base.op_len) {
+ }
+ } else {
/* Account for the data that was in the cache. */
- op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- iter.base.op_len, flags);
+ frag_len = iter.base.op_len;
+ }
+
+ if (frag_len) {
+ op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl, frag_len,
+ flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
--
2.1.0

2015-10-18 16:24:56

by Russell King

[permalink] [raw]
Subject: [PATCH 15/18] crypto: marvell: rearrange handling for hw finished hashes

Rearrange the last request handling for hardware finished hashes
by moving the generation of the fragment operation into this path.
This results in a simplified sequence to handle this case, and
allows us to move the software padded case further down into the
function. Add comments describing these parts.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 35 ++++++++++++++++++++++++-----------
1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index b8ed0478031a..d2265beaaa6b 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -529,32 +529,45 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_op_ctx *op;
int ret;

- if (frag_len) {
+ /*
+ * If the transfer is smaller than our maximum length, and we have
+ * some data outstanding, we can ask the engine to finish the hash.
+ */
+ if (creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX && frag_len) {
op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
flags);
if (IS_ERR(op))
return op;
- } else {
- op = NULL;
- }
-
- if (op && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX) {
- u32 frag = CESA_SA_DESC_CFG_NOT_FRAG;
-
- if (!mv_cesa_mac_op_is_first_frag(op))
- frag = CESA_SA_DESC_CFG_LAST_FRAG;

- mv_cesa_update_op_cfg(op, frag, CESA_SA_DESC_CFG_FRAG_MSK);
+ mv_cesa_set_mac_op_total_len(op, creq->len);
+ mv_cesa_update_op_cfg(op, mv_cesa_mac_op_is_first_frag(op) ?
+ CESA_SA_DESC_CFG_NOT_FRAG :
+ CESA_SA_DESC_CFG_LAST_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);

return op;
}

+ /*
+ * The request is longer than the engine can handle, or we have
+ * no data outstanding. Manually generate the padding, adding it
+ * as a "mid" fragment.
+ */
ret = mv_cesa_ahash_dma_alloc_padding(ahashdreq, flags);
if (ret)
return ERR_PTR(ret);

trailerlen = mv_cesa_ahash_pad_req(creq, ahashdreq->padding);

+ if (frag_len) {
+ op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
+ flags);
+ if (IS_ERR(op))
+ return op;
+ } else {
+ op = NULL;
+ }
+
if (op) {
len = min(CESA_SA_SRAM_PAYLOAD_SIZE - dma_iter->base.op_len,
trailerlen);
--
2.1.0

2015-10-18 16:24:51

by Russell King

[permalink] [raw]
Subject: [PATCH 14/18] crypto: marvell: rearrange last request handling

Move the test for the last request out of mv_cesa_ahash_dma_last_req()
to its caller, and move the mv_cesa_dma_add_frag() down into this
function.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index dc8ab343ad9f..b8ed0478031a 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -522,15 +522,21 @@ static struct mv_cesa_op_ctx *
mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_ahash_dma_iter *dma_iter,
struct mv_cesa_ahash_req *creq,
- struct mv_cesa_op_ctx *op,
- gfp_t flags)
+ unsigned int frag_len, gfp_t flags)
{
struct mv_cesa_ahash_dma_req *ahashdreq = &creq->req.dma;
unsigned int len, trailerlen, padoff = 0;
+ struct mv_cesa_op_ctx *op;
int ret;

- if (!creq->last_req)
- return op;
+ if (frag_len) {
+ op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
+ flags);
+ if (IS_ERR(op))
+ return op;
+ } else {
+ op = NULL;
+ }

if (op && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX) {
u32 frag = CESA_SA_DESC_CFG_NOT_FRAG;
@@ -657,16 +663,18 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
frag_len = iter.base.op_len;
}

- if (frag_len) {
+ /*
+ * At this point, frag_len indicates whether we have any data
+ * outstanding which needs an operation. Queue up the final
+ * operation, which depends whether this is the final request.
+ */
+ if (creq->last_req)
+ op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, frag_len,
+ flags);
+ else if (frag_len)
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl, frag_len,
flags);
- if (IS_ERR(op)) {
- ret = PTR_ERR(op);
- goto err_free_tdma;
- }
- }

- op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, op, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
--
2.1.0

2015-10-18 16:25:01

by Russell King

[permalink] [raw]
Subject: [PATCH 16/18] crypto: marvell: rearrange handling for sw padded hashes

Rearrange the last request handling for hashes which require software
padding.

We prepare the padding to be appended, and then append as much of the
padding to any existing data that's already queued up, adding an
operation block and launching the operation.

Any remainder is then appended as a separate operation.

This ensures that the hardware only ever sees multiples of the hash
block size to be operated on for software padded hashes, thus ensuring
that the engine always indicates that it has finished the calculation.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 44 ++++++++++++++++++-------------------------
1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index d2265beaaa6b..da541e59cc1d 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -559,38 +559,30 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,

trailerlen = mv_cesa_ahash_pad_req(creq, ahashdreq->padding);

- if (frag_len) {
- op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
- flags);
- if (IS_ERR(op))
- return op;
- } else {
- op = NULL;
- }
-
- if (op) {
- len = min(CESA_SA_SRAM_PAYLOAD_SIZE - dma_iter->base.op_len,
- trailerlen);
- if (len) {
- ret = mv_cesa_dma_add_data_transfer(chain,
+ len = min(CESA_SA_SRAM_PAYLOAD_SIZE - frag_len, trailerlen);
+ if (len) {
+ ret = mv_cesa_dma_add_data_transfer(chain,
CESA_SA_DATA_SRAM_OFFSET +
- dma_iter->base.op_len,
+ frag_len,
ahashdreq->padding_dma,
len, CESA_TDMA_DST_IN_SRAM,
flags);
- if (ret)
- return ERR_PTR(ret);
+ if (ret)
+ return ERR_PTR(ret);

- mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
- mv_cesa_set_mac_op_frag_len(op,
- dma_iter->base.op_len + len);
- padoff += len;
- }
- }
+ op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len + len,
+ flags);
+ if (IS_ERR(op))
+ return op;

- if (padoff >= trailerlen)
- return op;
+ mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
+ if (len == trailerlen)
+ return op;
+
+ padoff += len;
+ }

if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
--
2.1.0

2015-10-18 16:25:11

by Russell King

[permalink] [raw]
Subject: [PATCH 18/18] crypto: marvell/cesa: fix memory leak

From: Boris Brezillon <[email protected]>
To: Boris Brezillon <[email protected]>,Arnaud Ebalard <[email protected]>,Thomas Petazzoni <[email protected]>,Jason Cooper <[email protected]>

The local chain variable is not cleaned up if an error occurs in the middle
of DMA chain creation. Fix that by dropping the local chain variable and
using the dreq->chain field which will be cleaned up by
mv_cesa_dma_cleanup() in case of errors.

Signed-off-by: Boris Brezillon <[email protected]>
Reported-by: Thomas Petazzoni <[email protected]>
Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 34271e9eb3a5..7cd0f0decf6c 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -602,7 +602,6 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
GFP_KERNEL : GFP_ATOMIC;
struct mv_cesa_ahash_dma_req *ahashdreq = &creq->req.dma;
struct mv_cesa_tdma_req *dreq = &ahashdreq->base;
- struct mv_cesa_tdma_chain chain;
struct mv_cesa_ahash_dma_iter iter;
struct mv_cesa_op_ctx *op = NULL;
unsigned int frag_len;
@@ -620,14 +619,14 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
}
}

- mv_cesa_tdma_desc_iter_init(&chain);
+ mv_cesa_tdma_desc_iter_init(&dreq->chain);
mv_cesa_ahash_req_iter_init(&iter, req);

/*
* Add the cache (left-over data from a previous block) first.
* This will never overflow the SRAM size.
*/
- ret = mv_cesa_ahash_dma_add_cache(&chain, &iter, creq, flags);
+ ret = mv_cesa_ahash_dma_add_cache(&dreq->chain, &iter, creq, flags);
if (ret)
goto err_free_tdma;

@@ -638,7 +637,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
* data. We intentionally do not add the final op block.
*/
while (true) {
- ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
+ ret = mv_cesa_dma_add_op_transfers(&dreq->chain,
+ &iter.base,
&iter.src, flags);
if (ret)
goto err_free_tdma;
@@ -648,7 +648,7 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
if (!mv_cesa_ahash_req_iter_next_op(&iter))
break;

- op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ op = mv_cesa_dma_add_frag(&dreq->chain, &creq->op_tmpl,
frag_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
@@ -666,11 +666,11 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
* operation, which depends whether this is the final request.
*/
if (creq->last_req)
- op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, frag_len,
- flags);
+ op = mv_cesa_ahash_dma_last_req(&dreq->chain, &iter, creq,
+ frag_len, flags);
else if (frag_len)
- op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl, frag_len,
- flags);
+ op = mv_cesa_dma_add_frag(&dreq->chain, &creq->op_tmpl,
+ frag_len, flags);

if (IS_ERR(op)) {
ret = PTR_ERR(op);
@@ -679,7 +679,7 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)

if (op) {
/* Add dummy desc to wait for crypto operation end */
- ret = mv_cesa_dma_add_dummy_end(&chain, flags);
+ ret = mv_cesa_dma_add_dummy_end(&dreq->chain, flags);
if (ret)
goto err_free_tdma;
}
@@ -690,8 +690,6 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
else
creq->cache_ptr = 0;

- dreq->chain = chain;
-
return 0;

err_free_tdma:
--
2.1.0

2015-10-18 16:25:06

by Russell King

[permalink] [raw]
Subject: [PATCH 17/18] crypto: marvell: fix first-fragment handling in mv_cesa_ahash_dma_last_req()

When adding the software padding, this must be done using the first/mid
fragment mode, and any subsequent operation needs to be a mid-fragment.
Fix this.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/hash.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index da541e59cc1d..34271e9eb3a5 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -575,20 +575,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (IS_ERR(op))
return op;

- mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
if (len == trailerlen)
return op;

padoff += len;
}

- if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
- mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
ret = mv_cesa_dma_add_data_transfer(chain,
CESA_SA_DATA_SRAM_OFFSET,
ahashdreq->padding_dma +
--
2.1.0

2015-10-18 17:18:51

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH 00/18] crypto: further fixes for Marvell CESA hash

Hi Russell

On Sun, 18 Oct 2015 17:16:49 +0100
Russell King - ARM Linux <[email protected]> wrote:

> Following on from the previous series, this series addresses further
> problems with the Marvell CESA hash driver found while testing it my
> openssl/ssh scenarios.
>
> The first patch improves one from the previous series: we can get the
> transform more directly using req->base.tfm rather than going round
> the houses.
>
> The next few patches rework the algorithm endianness conversions.
> There are two things which depend on the algorithm endianness - the
> format of the result, and the format of the bit length in the last
> block. We introduce a flag to convey this information, and keep
> the creq->state format in CPU endian mode for consistency.
>
> Some of the inconsistent hash results are down to the template
> operation not being properly initialised - so we zero initialise all
> template operations.
>
> The following patches (from "factor out first fragment decisions to
> helper") rework the digest handling to ensure that we always provide
> the hardware with a complete block of data to hash, otherwise it can
> be left mid-calculation, which then causes state to leak to
> subsequent operations. This requires a re-structure of the way we
> put together the DMA entries, so it's done in relatively small steps.
>
> This results in the CESA driver passing all tests I can throw at it
> via the AF_ALG openssl plugin with the exception of asking for the
> hash of /dev/null. This returns an all zero result, rather than the
> correct hash value. This bug is pending further diagnosis, but it
> is believed not to be a driver specific bug as iMX6 CAAM also behaves
> the same.
>
> Unfortunately, this is a large series, but the driver unfortunately
> needs this level of bug fixing to work properly.

Thanks for spending some time fixing those bugs and
simplifying/factorizing/documenting the hash logic.

To the whole series:

Acked-by: Boris Brezillon <[email protected]>

Note that my ack might seem quite quick compared to the amount of
changes, but Russell actually sent me a preliminary version last
Wednesday.

Arnaud, if you have some time, could you also have a look at those
patches?

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-18 17:30:52

by Russell King - ARM Linux

[permalink] [raw]
Subject: [PATCH 0/6] Sparse related fixes

Continuing on from the previous set of 18 patches, I also fixed a
number of sparse problems and other cleanups. I don't deem these
suitable for -rc merging, especially now that we're basically at
-rc6.

The first patch switches the driver over to appropriately using
the relaxed IO accessors - this avoids calling out to the heavy
barrier on every read and write operation, but only calling out on
those which really matter.

We switch to using dma_addr_t for DMA addresses which are not accessed
by hardware, and using gfp_t for the get_free_page flags. String-based
MMIO accesses are used instead of plain memcpy()/memset() which prevents
us potentially stumbling over GCC optimisations that it thinks it may
make with these functions.

We convert as much of the hardware state to __le32 endian markings,
and use cpu_to_le32() as appropriate. A number of places are left
unfixed, as we temporarily store CPU native endian values at these
locations; these warnings should not be fixed (basically, only
appropriate sparse warnings should be fixed without penalising code.)

drivers/crypto/marvell/cesa.h | 44 ++++++++++++++++++++---------------------
drivers/crypto/marvell/cipher.c | 13 ++++++------
drivers/crypto/marvell/hash.c | 23 +++++++++++----------
drivers/crypto/marvell/tdma.c | 42 ++++++++++++++++++++-------------------
4 files changed, 62 insertions(+), 60 deletions(-)

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-18 17:31:09

by Russell King

[permalink] [raw]
Subject: [PATCH 1/6] crypto: marvell: use readl_relaxed()/writel_relaxed()

Use relaxed IO accessors where appropriate.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 2 +-
drivers/crypto/marvell/cipher.c | 2 +-
drivers/crypto/marvell/hash.c | 7 +++----
drivers/crypto/marvell/tdma.c | 20 ++++++++++----------
4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e9f732138ba3..e19877402ec9 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -677,7 +677,7 @@ static inline void mv_cesa_set_int_mask(struct mv_cesa_engine *engine,
if (int_mask == engine->int_mask)
return;

- writel(int_mask, engine->regs + CESA_SA_INT_MSK);
+ writel_relaxed(int_mask, engine->regs + CESA_SA_INT_MSK);
engine->int_mask = int_mask;
}

diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index 3df2f4e7adb2..4db2d632204f 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -105,7 +105,7 @@ static void mv_cesa_ablkcipher_std_step(struct ablkcipher_request *req)
}

mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
- writel(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
+ writel_relaxed(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
writel(CESA_SA_CMD_EN_CESA_SA_ACCL0, engine->regs + CESA_SA_CMD);
}

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 7cd0f0decf6c..84ddc4cbfa9d 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -281,7 +281,7 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
creq->cache_ptr = new_cache_ptr;

mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
- writel(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
+ writel_relaxed(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
writel(CESA_SA_CMD_EN_CESA_SA_ACCL0, engine->regs + CESA_SA_CMD);
}

@@ -344,7 +344,7 @@ static int mv_cesa_ahash_process(struct crypto_async_request *req, u32 status)

digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
for (i = 0; i < digsize / 4; i++)
- creq->state[i] = readl(engine->regs + CESA_IVDIG(i));
+ creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i));

if (creq->cache_ptr)
sg_pcopy_to_buffer(ahashreq->src, creq->src_nents,
@@ -390,8 +390,7 @@ static void mv_cesa_ahash_prepare(struct crypto_async_request *req,

digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
for (i = 0; i < digsize / 4; i++)
- writel(creq->state[i],
- engine->regs + CESA_IVDIG(i));
+ writel_relaxed(creq->state[i], engine->regs + CESA_IVDIG(i));
}

static void mv_cesa_ahash_req_cleanup(struct crypto_async_request *req)
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 64a366c50174..e8e8a7f7659b 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -41,18 +41,18 @@ void mv_cesa_dma_step(struct mv_cesa_tdma_req *dreq)
{
struct mv_cesa_engine *engine = dreq->base.engine;

- writel(0, engine->regs + CESA_SA_CFG);
+ writel_relaxed(0, engine->regs + CESA_SA_CFG);

mv_cesa_set_int_mask(engine, CESA_SA_INT_ACC0_IDMA_DONE);
- writel(CESA_TDMA_DST_BURST_128B | CESA_TDMA_SRC_BURST_128B |
- CESA_TDMA_NO_BYTE_SWAP | CESA_TDMA_EN,
- engine->regs + CESA_TDMA_CONTROL);
-
- writel(CESA_SA_CFG_ACT_CH0_IDMA | CESA_SA_CFG_MULTI_PKT |
- CESA_SA_CFG_CH0_W_IDMA | CESA_SA_CFG_PARA_DIS,
- engine->regs + CESA_SA_CFG);
- writel(dreq->chain.first->cur_dma,
- engine->regs + CESA_TDMA_NEXT_ADDR);
+ writel_relaxed(CESA_TDMA_DST_BURST_128B | CESA_TDMA_SRC_BURST_128B |
+ CESA_TDMA_NO_BYTE_SWAP | CESA_TDMA_EN,
+ engine->regs + CESA_TDMA_CONTROL);
+
+ writel_relaxed(CESA_SA_CFG_ACT_CH0_IDMA | CESA_SA_CFG_MULTI_PKT |
+ CESA_SA_CFG_CH0_W_IDMA | CESA_SA_CFG_PARA_DIS,
+ engine->regs + CESA_SA_CFG);
+ writel_relaxed(dreq->chain.first->cur_dma,
+ engine->regs + CESA_TDMA_NEXT_ADDR);
writel(CESA_SA_CMD_EN_CESA_SA_ACCL0, engine->regs + CESA_SA_CMD);
}

--
2.1.0

2015-10-18 17:31:16

by Russell King

[permalink] [raw]
Subject: [PATCH 2/6] crypto: marvell: use dma_addr_t for cur_dma

cur_dma is part of the software state, not read by the hardware.
Storing it in LE32 format is wrong, use dma_addr_t for this.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 4 +++-
drivers/crypto/marvell/tdma.c | 6 +++---
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e19877402ec9..dbe4970d6c64 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -297,7 +297,9 @@ struct mv_cesa_tdma_desc {
u32 src;
u32 dst;
u32 next_dma;
- u32 cur_dma;
+
+ /* Software state */
+ dma_addr_t cur_dma;
struct mv_cesa_tdma_desc *next;
union {
struct mv_cesa_op_ctx *op;
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index e8e8a7f7659b..c8256f5916b0 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -69,7 +69,7 @@ void mv_cesa_dma_cleanup(struct mv_cesa_tdma_req *dreq)

tdma = tdma->next;
dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma,
- le32_to_cpu(old_tdma->cur_dma));
+ old_tdma->cur_dma);
}

dreq->chain.first = NULL;
@@ -105,9 +105,9 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags)
return ERR_PTR(-ENOMEM);

memset(new_tdma, 0, sizeof(*new_tdma));
- new_tdma->cur_dma = cpu_to_le32(dma_handle);
+ new_tdma->cur_dma = dma_handle;
if (chain->last) {
- chain->last->next_dma = new_tdma->cur_dma;
+ chain->last->next_dma = cpu_to_le32(dma_handle);
chain->last->next = new_tdma;
} else {
chain->first = new_tdma;
--
2.1.0

2015-10-18 17:31:21

by Russell King

[permalink] [raw]
Subject: [PATCH 3/6] crypto: marvell: use gfp_t for gfp flags

Use gfp_t not u32 for the GFP flags.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 6 ++----
drivers/crypto/marvell/tdma.c | 5 ++---
2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index dbe4970d6c64..752131ae1ef2 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -798,10 +798,8 @@ int mv_cesa_dma_add_data_transfer(struct mv_cesa_tdma_chain *chain,
dma_addr_t dst, dma_addr_t src, u32 size,
u32 flags, gfp_t gfp_flags);

-int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain,
- u32 flags);
-
-int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, u32 flags);
+int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain, gfp_t flags);
+int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, gfp_t flags);

int mv_cesa_dma_add_op_transfers(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_dma_iter *dma_iter,
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index c8256f5916b0..2738e24bcc59 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -166,8 +166,7 @@ int mv_cesa_dma_add_data_transfer(struct mv_cesa_tdma_chain *chain,
return 0;
}

-int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain,
- u32 flags)
+int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain, gfp_t flags)
{
struct mv_cesa_tdma_desc *tdma;

@@ -178,7 +177,7 @@ int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain,
return 0;
}

-int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, u32 flags)
+int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, gfp_t flags)
{
struct mv_cesa_tdma_desc *tdma;

--
2.1.0

2015-10-18 17:31:26

by Russell King

[permalink] [raw]
Subject: [PATCH 4/6] crypto: marvell: use memcpy_fromio()/memcpy_toio()

Use the IO memcpy() functions when copying from/to MMIO memory.
These locations were found via sparse.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cipher.c | 11 ++++++-----
drivers/crypto/marvell/hash.c | 16 ++++++++--------
2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index 4db2d632204f..6edae64bb387 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -98,10 +98,10 @@ static void mv_cesa_ablkcipher_std_step(struct ablkcipher_request *req)

/* FIXME: only update enc_len field */
if (!sreq->skip_ctx) {
- memcpy(engine->sram, &sreq->op, sizeof(sreq->op));
+ memcpy_toio(engine->sram, &sreq->op, sizeof(sreq->op));
sreq->skip_ctx = true;
} else {
- memcpy(engine->sram, &sreq->op, sizeof(sreq->op.desc));
+ memcpy_toio(engine->sram, &sreq->op, sizeof(sreq->op.desc));
}

mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
@@ -145,8 +145,9 @@ static int mv_cesa_ablkcipher_process(struct crypto_async_request *req,
if (ret)
return ret;

- memcpy(ablkreq->info, engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
- crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(ablkreq)));
+ memcpy_fromio(ablkreq->info,
+ engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
+ crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(ablkreq)));

return 0;
}
@@ -181,7 +182,7 @@ mv_cesa_ablkcipher_std_prepare(struct ablkcipher_request *req)
sreq->size = 0;
sreq->offset = 0;
mv_cesa_adjust_op(engine, &sreq->op);
- memcpy(engine->sram, &sreq->op, sizeof(sreq->op));
+ memcpy_toio(engine->sram, &sreq->op, sizeof(sreq->op));
}

static inline void mv_cesa_ablkcipher_prepare(struct crypto_async_request *req,
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 84ddc4cbfa9d..8330815d72fc 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -209,8 +209,8 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
size_t len;

if (creq->cache_ptr)
- memcpy(engine->sram + CESA_SA_DATA_SRAM_OFFSET, creq->cache,
- creq->cache_ptr);
+ memcpy_toio(engine->sram + CESA_SA_DATA_SRAM_OFFSET,
+ creq->cache, creq->cache_ptr);

len = min_t(size_t, req->nbytes + creq->cache_ptr - sreq->offset,
CESA_SA_SRAM_PAYLOAD_SIZE);
@@ -251,10 +251,10 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
if (len + trailerlen > CESA_SA_SRAM_PAYLOAD_SIZE) {
len &= CESA_HASH_BLOCK_SIZE_MSK;
new_cache_ptr = 64 - trailerlen;
- memcpy(creq->cache,
- engine->sram +
- CESA_SA_DATA_SRAM_OFFSET + len,
- new_cache_ptr);
+ memcpy_fromio(creq->cache,
+ engine->sram +
+ CESA_SA_DATA_SRAM_OFFSET + len,
+ new_cache_ptr);
} else {
len += mv_cesa_ahash_pad_req(creq,
engine->sram + len +
@@ -272,7 +272,7 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
mv_cesa_update_op_cfg(op, frag_mode, CESA_SA_DESC_CFG_FRAG_MSK);

/* FIXME: only update enc_len field */
- memcpy(engine->sram, op, sizeof(*op));
+ memcpy_toio(engine->sram, op, sizeof(*op));

if (frag_mode == CESA_SA_DESC_CFG_FIRST_FRAG)
mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
@@ -312,7 +312,7 @@ static void mv_cesa_ahash_std_prepare(struct ahash_request *req)

sreq->offset = 0;
mv_cesa_adjust_op(engine, &creq->op_tmpl);
- memcpy(engine->sram, &creq->op_tmpl, sizeof(creq->op_tmpl));
+ memcpy_toio(engine->sram, &creq->op_tmpl, sizeof(creq->op_tmpl));
}

static void mv_cesa_ahash_step(struct crypto_async_request *req)
--
2.1.0

2015-10-18 17:31:31

by Russell King

[permalink] [raw]
Subject: [PATCH 5/6] crypto: marvell: fix missing cpu_to_le32() in mv_cesa_dma_add_op()

When tdma->src is freed in mv_cesa_dma_cleanup(), we convert the DMA
address from a little-endian value prior to calling dma_pool_free().
However, mv_cesa_dma_add_op() assigns tdma->src without first converting
the DMA address to little endian. Fix this.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/tdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 2738e24bcc59..06bdeac27e3d 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -140,7 +140,7 @@ struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
tdma = chain->last;
tdma->op = op;
tdma->byte_cnt = (skip_ctx ? sizeof(op->desc) : sizeof(*op)) | BIT(31);
- tdma->src = dma_handle;
+ tdma->src = cpu_to_le32(dma_handle);
tdma->flags = CESA_TDMA_DST_IN_SRAM | CESA_TDMA_OP;

return op;
--
2.1.0

2015-10-18 17:31:35

by Russell King

[permalink] [raw]
Subject: [PATCH 6/6] crypto: marvell: use __le32 for hardware descriptors

Much of the driver uses cpu_to_le32() to convert values for descriptors
to little endian before writing. Use __le32 to define the hardware-
accessed parts of the descriptors, and ensure most places where it's
reasonable to do so use cpu_to_le32() when assigning to these.

Signed-off-by: Russell King <[email protected]>
---
drivers/crypto/marvell/cesa.h | 32 ++++++++++++++++----------------
drivers/crypto/marvell/tdma.c | 9 ++++++---
2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index 752131ae1ef2..bd985e72520b 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -174,19 +174,19 @@

#define CESA_SA_DESC_MAC_DATA(offset) \
cpu_to_le32(CESA_SA_DATA_SRAM_OFFSET + (offset))
-#define CESA_SA_DESC_MAC_DATA_MSK GENMASK(15, 0)
+#define CESA_SA_DESC_MAC_DATA_MSK cpu_to_le32(GENMASK(15, 0))

#define CESA_SA_DESC_MAC_TOTAL_LEN(total_len) cpu_to_le32((total_len) << 16)
-#define CESA_SA_DESC_MAC_TOTAL_LEN_MSK GENMASK(31, 16)
+#define CESA_SA_DESC_MAC_TOTAL_LEN_MSK cpu_to_le32(GENMASK(31, 16))

#define CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX 0xffff

#define CESA_SA_DESC_MAC_DIGEST(offset) \
cpu_to_le32(CESA_SA_MAC_DIG_SRAM_OFFSET + (offset))
-#define CESA_SA_DESC_MAC_DIGEST_MSK GENMASK(15, 0)
+#define CESA_SA_DESC_MAC_DIGEST_MSK cpu_to_le32(GENMASK(15, 0))

#define CESA_SA_DESC_MAC_FRAG_LEN(frag_len) cpu_to_le32((frag_len) << 16)
-#define CESA_SA_DESC_MAC_FRAG_LEN_MSK GENMASK(31, 16)
+#define CESA_SA_DESC_MAC_FRAG_LEN_MSK cpu_to_le32(GENMASK(31, 16))

#define CESA_SA_DESC_MAC_IV(offset) \
cpu_to_le32((CESA_SA_MAC_IIV_SRAM_OFFSET + (offset)) | \
@@ -219,14 +219,14 @@
* to be executed.
*/
struct mv_cesa_sec_accel_desc {
- u32 config;
- u32 enc_p;
- u32 enc_len;
- u32 enc_key_p;
- u32 enc_iv;
- u32 mac_src_p;
- u32 mac_digest;
- u32 mac_iv;
+ __le32 config;
+ __le32 enc_p;
+ __le32 enc_len;
+ __le32 enc_key_p;
+ __le32 enc_iv;
+ __le32 mac_src_p;
+ __le32 mac_digest;
+ __le32 mac_iv;
};

/**
@@ -293,10 +293,10 @@ struct mv_cesa_op_ctx {
* operation.
*/
struct mv_cesa_tdma_desc {
- u32 byte_cnt;
- u32 src;
- u32 dst;
- u32 next_dma;
+ __le32 byte_cnt;
+ __le32 src;
+ __le32 dst;
+ __le32 next_dma;

/* Software state */
dma_addr_t cur_dma;
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 06bdeac27e3d..76427981275b 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -126,6 +126,7 @@ struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_tdma_desc *tdma;
struct mv_cesa_op_ctx *op;
dma_addr_t dma_handle;
+ unsigned int size;

tdma = mv_cesa_dma_add_desc(chain, flags);
if (IS_ERR(tdma))
@@ -137,9 +138,11 @@ struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,

*op = *op_templ;

+ size = skip_ctx ? sizeof(op->desc) : sizeof(*op);
+
tdma = chain->last;
tdma->op = op;
- tdma->byte_cnt = (skip_ctx ? sizeof(op->desc) : sizeof(*op)) | BIT(31);
+ tdma->byte_cnt = cpu_to_le32(size | BIT(31));
tdma->src = cpu_to_le32(dma_handle);
tdma->flags = CESA_TDMA_DST_IN_SRAM | CESA_TDMA_OP;

@@ -156,7 +159,7 @@ int mv_cesa_dma_add_data_transfer(struct mv_cesa_tdma_chain *chain,
if (IS_ERR(tdma))
return PTR_ERR(tdma);

- tdma->byte_cnt = size | BIT(31);
+ tdma->byte_cnt = cpu_to_le32(size | BIT(31));
tdma->src = src;
tdma->dst = dst;

@@ -185,7 +188,7 @@ int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, gfp_t flags)
if (IS_ERR(tdma))
return PTR_ERR(tdma);

- tdma->byte_cnt = BIT(31);
+ tdma->byte_cnt = cpu_to_le32(BIT(31));

return 0;
}
--
2.1.0

2015-10-18 17:49:56

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH 0/6] Sparse related fixes

On Sun, 18 Oct 2015 18:30:39 +0100
Russell King - ARM Linux <[email protected]> wrote:

> Continuing on from the previous set of 18 patches, I also fixed a
> number of sparse problems and other cleanups. I don't deem these
> suitable for -rc merging, especially now that we're basically at
> -rc6.
>
> The first patch switches the driver over to appropriately using
> the relaxed IO accessors - this avoids calling out to the heavy
> barrier on every read and write operation, but only calling out on
> those which really matter.
>
> We switch to using dma_addr_t for DMA addresses which are not accessed
> by hardware, and using gfp_t for the get_free_page flags. String-based
> MMIO accesses are used instead of plain memcpy()/memset() which prevents
> us potentially stumbling over GCC optimisations that it thinks it may
> make with these functions.
>
> We convert as much of the hardware state to __le32 endian markings,
> and use cpu_to_le32() as appropriate. A number of places are left
> unfixed, as we temporarily store CPU native endian values at these
> locations; these warnings should not be fixed (basically, only
> appropriate sparse warnings should be fixed without penalising code.)

To the whole series:

Acked-by: Boris Brezillon <[email protected]>

>
> drivers/crypto/marvell/cesa.h | 44 ++++++++++++++++++++---------------------
> drivers/crypto/marvell/cipher.c | 13 ++++++------
> drivers/crypto/marvell/hash.c | 23 +++++++++++----------
> drivers/crypto/marvell/tdma.c | 42 ++++++++++++++++++++-------------------
> 4 files changed, 62 insertions(+), 60 deletions(-)
>



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-10-18 23:58:08

by arno

[permalink] [raw]
Subject: Re: [PATCH 00/18] crypto: further fixes for Marvell CESA hash

Hi,

Boris Brezillon <[email protected]> writes:

> On Sun, 18 Oct 2015 17:16:49 +0100
> Russell King - ARM Linux <[email protected]> wrote:
>
>> Following on from the previous series, this series addresses further
>> problems with the Marvell CESA hash driver found while testing it my
>> openssl/ssh scenarios.
>>
>> The first patch improves one from the previous series: we can get the
>> transform more directly using req->base.tfm rather than going round
>> the houses.
>>
>> The next few patches rework the algorithm endianness conversions.
>> There are two things which depend on the algorithm endianness - the
>> format of the result, and the format of the bit length in the last
>> block. We introduce a flag to convey this information, and keep
>> the creq->state format in CPU endian mode for consistency.
>>
>> Some of the inconsistent hash results are down to the template
>> operation not being properly initialised - so we zero initialise all
>> template operations.
>>
>> The following patches (from "factor out first fragment decisions to
>> helper") rework the digest handling to ensure that we always provide
>> the hardware with a complete block of data to hash, otherwise it can
>> be left mid-calculation, which then causes state to leak to
>> subsequent operations. This requires a re-structure of the way we
>> put together the DMA entries, so it's done in relatively small steps.
>>
>> This results in the CESA driver passing all tests I can throw at it
>> via the AF_ALG openssl plugin with the exception of asking for the
>> hash of /dev/null. This returns an all zero result, rather than the
>> correct hash value. This bug is pending further diagnosis, but it
>> is believed not to be a driver specific bug as iMX6 CAAM also behaves
>> the same.
>>
>> Unfortunately, this is a large series, but the driver unfortunately
>> needs this level of bug fixing to work properly.
>
> Thanks for spending some time fixing those bugs and
> simplifying/factorizing/documenting the hash logic.
>
> To the whole series:
>
> Acked-by: Boris Brezillon <[email protected]>
>
> Note that my ack might seem quite quick compared to the amount of
> changes, but Russell actually sent me a preliminary version last
> Wednesday.
>
> Arnaud, if you have some time, could you also have a look at those
> patches?

I have tested the whole set and it works fine on my mirabox (crypto
manager tests are ok). I have started reading the patches but I am only
at patch 08/18 right now; everything looks good for now. I'll continue
tomorrow.

Cheers,

a+

2015-10-19 01:37:59

by Herbert Xu

[permalink] [raw]
Subject: crypto: ahash - Add crypto_ahash_blocksize

On Sun, Oct 18, 2015 at 05:23:30PM +0100, Russell King wrote:
> There's an easier way to get at the hash transform - rather than
> using crypto_ahash_tfm(ahash), we can get it directly from
> req->base.tfm.
>
> Signed-off-by: Russell King <[email protected]>

This should be converted to crypto_ahash_blocksize which doesn't
exist for some reason. So I'm going to add it with the following
patch and then I'll change your patch to use it.

Thanks,

---8<---
This patch adds the missing helper crypto_ahash_blocksize which
returns the block size of an ahash algorithm.

Signed-off-by: Herbert Xu <[email protected]>

diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 8e920b4..3d69c93 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -264,6 +264,20 @@ static inline unsigned int crypto_ahash_alignmask(
return crypto_tfm_alg_alignmask(crypto_ahash_tfm(tfm));
}

+/**
+ * crypto_ahash_blocksize() - obtain block size for cipher
+ * @tfm: cipher handle
+ *
+ * The block size for the message digest cipher referenced with the cipher
+ * handle is returned.
+ *
+ * Return: block size of cipher
+ */
+static inline unsigned int crypto_ahash_blocksize(struct crypto_ahash *tfm)
+{
+ return crypto_tfm_alg_blocksize(crypto_ahash_tfm(tfm));
+}
+
static inline struct hash_alg_common *__crypto_hash_alg_common(
struct crypto_alg *alg)
{
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-19 15:20:57

by Jason Cooper

[permalink] [raw]
Subject: Re: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

Hey Russell,

On Sun, Oct 18, 2015 at 05:23:40PM +0100, Russell King wrote:
> Rather than determining whether we're using a MD5 hash by looking at
> the digest size, switch to a cleaner solution using a per-request flag
> initialised by the method type.
>
> Signed-off-by: Russell King <[email protected]>
> ---
> drivers/crypto/marvell/cesa.h | 1 +
> drivers/crypto/marvell/hash.c | 17 +++++++++--------
> 2 files changed, 10 insertions(+), 8 deletions(-)
>
...
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index f59faabcd34f..aa12274608ab 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -351,7 +351,7 @@ static int mv_cesa_ahash_process(struct crypto_async_request *req, u32 status)
> * Hardware's MD5 digest is in little endian format, but
> * SHA in big endian format
> */
> - if (digsize == MD5_DIGEST_SIZE) {
> + if (creq->algo_le) {
> __le32 *result = (void *)ahashreq->result;
>
> for (i = 0; i < digsize / 4; i++)
> @@ -407,7 +407,7 @@ static const struct mv_cesa_req_ops mv_cesa_ahash_req_ops = {
> };
>
> static int mv_cesa_ahash_init(struct ahash_request *req,
> - struct mv_cesa_op_ctx *tmpl)
> + struct mv_cesa_op_ctx *tmpl, bool algo_le)

nit: Would it make more sense the do bool algo_endian, and then ...

> @@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)
>
> mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
>
> - mv_cesa_ahash_init(req, &tmpl);
> + mv_cesa_ahash_init(req, &tmpl, true);

mv_cesa_ahash_init(req, &tmpl, ALGO_ENDIAN_LITTLE);

'true' doesn't seem as obvious. But again, nit-picky.

thx,

Jason.

2015-10-19 15:25:21

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

On Mon, Oct 19, 2015 at 03:04:51PM +0000, Jason Cooper wrote:
> Hey Russell,
>
> On Sun, Oct 18, 2015 at 05:23:40PM +0100, Russell King wrote:
> > static int mv_cesa_ahash_init(struct ahash_request *req,
> > - struct mv_cesa_op_ctx *tmpl)
> > + struct mv_cesa_op_ctx *tmpl, bool algo_le)
>
> nit: Would it make more sense the do bool algo_endian, and then ...

I don't like "bool"s that don't self-document what they mean.
What does "if (algo_endian)" mean? It's pretty clear what
"if (algo_le)" means in the context of endianness, though I'll
give you that "if (algo_little_endian)" would be a lot better.

>
> > @@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)
> >
> > mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
> >
> > - mv_cesa_ahash_init(req, &tmpl);
> > + mv_cesa_ahash_init(req, &tmpl, true);
>
> mv_cesa_ahash_init(req, &tmpl, ALGO_ENDIAN_LITTLE);
>
> 'true' doesn't seem as obvious. But again, nit-picky.

I did think about:

enum {
ALGO_LITTLE_ENDIAN,
ALGO_BIG_ENDIAN,
};

and passing an int algo_endian around, but that seemed to be like too
much bloat... though if you want to insist, I could make that change.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-19 16:19:06

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

On Mon, Oct 19, 2015 at 04:25:07PM +0100, Russell King - ARM Linux wrote:
>
> > > @@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)
> > >
> > > mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
> > >
> > > - mv_cesa_ahash_init(req, &tmpl);
> > > + mv_cesa_ahash_init(req, &tmpl, true);
> >
> > mv_cesa_ahash_init(req, &tmpl, ALGO_ENDIAN_LITTLE);
> >
> > 'true' doesn't seem as obvious. But again, nit-picky.
>
> I did think about:
>
> enum {
> ALGO_LITTLE_ENDIAN,
> ALGO_BIG_ENDIAN,
> };
>
> and passing an int algo_endian around, but that seemed to be like too
> much bloat... though if you want to insist, I could make that change.

I think the patch is fine as it is.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-19 16:31:14

by Jason Cooper

[permalink] [raw]
Subject: Re: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

On Mon, Oct 19, 2015 at 04:25:07PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 19, 2015 at 03:04:51PM +0000, Jason Cooper wrote:
> > Hey Russell,
> >
> > On Sun, Oct 18, 2015 at 05:23:40PM +0100, Russell King wrote:
> > > static int mv_cesa_ahash_init(struct ahash_request *req,
> > > - struct mv_cesa_op_ctx *tmpl)
> > > + struct mv_cesa_op_ctx *tmpl, bool algo_le)
> >
> > nit: Would it make more sense the do bool algo_endian, and then ...
>
> I don't like "bool"s that don't self-document what they mean.
> What does "if (algo_endian)" mean? It's pretty clear what

That's where I would go with an enum. "if (algo_endian ==
ALGO_ENDIAN_LITTLE) ..."

> "if (algo_le)" means in the context of endianness, though I'll
> give you that "if (algo_little_endian)" would be a lot better.

Right, it's really a question of where do we want readability? I was
focusing on the call site, since the context isn't there for newcomers to
easily grok 'true'.

> > > @@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)
> > >
> > > mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
> > >
> > > - mv_cesa_ahash_init(req, &tmpl);
> > > + mv_cesa_ahash_init(req, &tmpl, true);
> >
> > mv_cesa_ahash_init(req, &tmpl, ALGO_ENDIAN_LITTLE);
> >
> > 'true' doesn't seem as obvious. But again, nit-picky.
>
> I did think about:
>
> enum {
> ALGO_LITTLE_ENDIAN,
> ALGO_BIG_ENDIAN,
> };
>
> and passing an int algo_endian around, but that seemed to be like too
> much bloat... though if you want to insist, I could make that change.

Like I said, it's a nit. Either a self-documenting bool, or an enum
will work.


thx,

Jason.

2015-10-19 22:53:55

by arno

[permalink] [raw]
Subject: Re: [PATCH 17/18] crypto: marvell: fix first-fragment handling in mv_cesa_ahash_dma_last_req()

Hi Russell,

Russell King <[email protected]> writes:

> When adding the software padding, this must be done using the first/mid
> fragment mode, and any subsequent operation needs to be a mid-fragment.
> Fix this.
>
> Signed-off-by: Russell King <[email protected]>
> ---
> drivers/crypto/marvell/hash.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index da541e59cc1d..34271e9eb3a5 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -575,20 +575,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
> if (IS_ERR(op))
> return op;
>
> - mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
> - CESA_SA_DESC_CFG_FRAG_MSK);
> -
> if (len == trailerlen)
> return op;
>
> padoff += len;
> }
>
> - if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
> - mv_cesa_update_op_cfg(&creq->op_tmpl,
> - CESA_SA_DESC_CFG_MID_FRAG,
> - CESA_SA_DESC_CFG_FRAG_MSK);
> -
> ret = mv_cesa_dma_add_data_transfer(chain,
> CESA_SA_DATA_SRAM_OFFSET,
> ahashdreq->padding_dma +

I'll consider it's just late here and I need some sleep but I fail to
align the commit message w/ the content of the patch, i.e. it's
unclear to me.

Cheers,

a+

2015-10-19 22:57:44

by arno

[permalink] [raw]
Subject: Re: [PATCH 00/18] crypto: further fixes for Marvell CESA hash

Hi Russell,

Boris Brezillon <[email protected]> writes:

> On Sun, 18 Oct 2015 17:16:49 +0100
> Russell King - ARM Linux <[email protected]> wrote:
>
>> Following on from the previous series, this series addresses further
>> problems with the Marvell CESA hash driver found while testing it my
>> openssl/ssh scenarios.
>>
>> The first patch improves one from the previous series: we can get the
>> transform more directly using req->base.tfm rather than going round
>> the houses.
>>
>> The next few patches rework the algorithm endianness conversions.
>> There are two things which depend on the algorithm endianness - the
>> format of the result, and the format of the bit length in the last
>> block. We introduce a flag to convey this information, and keep
>> the creq->state format in CPU endian mode for consistency.
>>
>> Some of the inconsistent hash results are down to the template
>> operation not being properly initialised - so we zero initialise all
>> template operations.
>>
>> The following patches (from "factor out first fragment decisions to
>> helper") rework the digest handling to ensure that we always provide
>> the hardware with a complete block of data to hash, otherwise it can
>> be left mid-calculation, which then causes state to leak to
>> subsequent operations. This requires a re-structure of the way we
>> put together the DMA entries, so it's done in relatively small steps.
>>
>> This results in the CESA driver passing all tests I can throw at it
>> via the AF_ALG openssl plugin with the exception of asking for the
>> hash of /dev/null. This returns an all zero result, rather than the
>> correct hash value. This bug is pending further diagnosis, but it
>> is believed not to be a driver specific bug as iMX6 CAAM also behaves
>> the same.
>>
>> Unfortunately, this is a large series, but the driver unfortunately
>> needs this level of bug fixing to work properly.
>
> Thanks for spending some time fixing those bugs and
> simplifying/factorizing/documenting the hash logic.

+1

> To the whole series:
>
> Acked-by: Boris Brezillon <[email protected]>
>
> Note that my ack might seem quite quick compared to the amount of
> changes, but Russell actually sent me a preliminary version last
> Wednesday.
>
> Arnaud, if you have some time, could you also have a look at those
> patches?

Acked-by: Arnaud Ebalard <[email protected]>

Cheers,

a+

2015-10-19 23:27:14

by arno

[permalink] [raw]
Subject: Re: [PATCH 4/6] crypto: marvell: use memcpy_fromio()/memcpy_toio()

Hi Russell,

Russell King <[email protected]> writes:

> Use the IO memcpy() functions when copying from/to MMIO memory.
> These locations were found via sparse.

On recent MVEBU hardware, *_std_* function are not expected to be used
because we will instead use the TDMA-based versions. So the only
possible penalty we could get (if any) from this change on recent
devices is for the copy of the IV in mv_cesa_ablkcipher_process(). Out
of curiosity, I took a quick look at what is generated and it seems this
results in a call to mmiocpy():

00000340 <mv_cesa_ablkcipher_process>:
340: e1a0c00d mov ip, sp
344: e92dd830 push {r4, r5, fp, ip, lr, pc}
348: e24cb004 sub fp, ip, #4
34c: e24dd008 sub sp, sp, #8

....

3a4: e5943010 ldr r3, [r4, #16]
3a8: e5951008 ldr r1, [r5, #8]
3ac: e594001c ldr r0, [r4, #28]
3b0: e2811040 add r1, r1, #64 ; 0x40
3b4: e593201c ldr r2, [r3, #28]
3b8: ebfffffe bl 0 <mmiocpy>

which if I am not mistaken is provided by arch/arm/lib/memcpy.S via:

ENTRY(mmiocpy)
ENTRY(memcpy)

#include "copy_template.S"

ENDPROC(memcpy)
ENDPROC(mmiocpy)

Am I right in concluding this will end up being the code of a usual
memcpy(), i.e. the change is a noop in the final code compared to
previous version?

Cheers,

a+


>
> Signed-off-by: Russell King <[email protected]>
> ---
> drivers/crypto/marvell/cipher.c | 11 ++++++-----
> drivers/crypto/marvell/hash.c | 16 ++++++++--------
> 2 files changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
> index 4db2d632204f..6edae64bb387 100644
> --- a/drivers/crypto/marvell/cipher.c
> +++ b/drivers/crypto/marvell/cipher.c
> @@ -98,10 +98,10 @@ static void mv_cesa_ablkcipher_std_step(struct ablkcipher_request *req)
>
> /* FIXME: only update enc_len field */
> if (!sreq->skip_ctx) {
> - memcpy(engine->sram, &sreq->op, sizeof(sreq->op));
> + memcpy_toio(engine->sram, &sreq->op, sizeof(sreq->op));
> sreq->skip_ctx = true;
> } else {
> - memcpy(engine->sram, &sreq->op, sizeof(sreq->op.desc));
> + memcpy_toio(engine->sram, &sreq->op, sizeof(sreq->op.desc));
> }
>
> mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
> @@ -145,8 +145,9 @@ static int mv_cesa_ablkcipher_process(struct crypto_async_request *req,
> if (ret)
> return ret;
>
> - memcpy(ablkreq->info, engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
> - crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(ablkreq)));
> + memcpy_fromio(ablkreq->info,
> + engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
> + crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(ablkreq)));
>
> return 0;
> }
> @@ -181,7 +182,7 @@ mv_cesa_ablkcipher_std_prepare(struct ablkcipher_request *req)
> sreq->size = 0;
> sreq->offset = 0;
> mv_cesa_adjust_op(engine, &sreq->op);
> - memcpy(engine->sram, &sreq->op, sizeof(sreq->op));
> + memcpy_toio(engine->sram, &sreq->op, sizeof(sreq->op));
> }
>
> static inline void mv_cesa_ablkcipher_prepare(struct crypto_async_request *req,
> diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
> index 84ddc4cbfa9d..8330815d72fc 100644
> --- a/drivers/crypto/marvell/hash.c
> +++ b/drivers/crypto/marvell/hash.c
> @@ -209,8 +209,8 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
> size_t len;
>
> if (creq->cache_ptr)
> - memcpy(engine->sram + CESA_SA_DATA_SRAM_OFFSET, creq->cache,
> - creq->cache_ptr);
> + memcpy_toio(engine->sram + CESA_SA_DATA_SRAM_OFFSET,
> + creq->cache, creq->cache_ptr);
>
> len = min_t(size_t, req->nbytes + creq->cache_ptr - sreq->offset,
> CESA_SA_SRAM_PAYLOAD_SIZE);
> @@ -251,10 +251,10 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
> if (len + trailerlen > CESA_SA_SRAM_PAYLOAD_SIZE) {
> len &= CESA_HASH_BLOCK_SIZE_MSK;
> new_cache_ptr = 64 - trailerlen;
> - memcpy(creq->cache,
> - engine->sram +
> - CESA_SA_DATA_SRAM_OFFSET + len,
> - new_cache_ptr);
> + memcpy_fromio(creq->cache,
> + engine->sram +
> + CESA_SA_DATA_SRAM_OFFSET + len,
> + new_cache_ptr);
> } else {
> len += mv_cesa_ahash_pad_req(creq,
> engine->sram + len +
> @@ -272,7 +272,7 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
> mv_cesa_update_op_cfg(op, frag_mode, CESA_SA_DESC_CFG_FRAG_MSK);
>
> /* FIXME: only update enc_len field */
> - memcpy(engine->sram, op, sizeof(*op));
> + memcpy_toio(engine->sram, op, sizeof(*op));
>
> if (frag_mode == CESA_SA_DESC_CFG_FIRST_FRAG)
> mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
> @@ -312,7 +312,7 @@ static void mv_cesa_ahash_std_prepare(struct ahash_request *req)
>
> sreq->offset = 0;
> mv_cesa_adjust_op(engine, &creq->op_tmpl);
> - memcpy(engine->sram, &creq->op_tmpl, sizeof(creq->op_tmpl));
> + memcpy_toio(engine->sram, &creq->op_tmpl, sizeof(creq->op_tmpl));
> }
>
> static void mv_cesa_ahash_step(struct crypto_async_request *req)

2015-10-19 23:30:17

by arno

[permalink] [raw]
Subject: Re: [PATCH 0/6] Sparse related fixes

Hi,

Boris Brezillon <[email protected]> writes:

> On Sun, 18 Oct 2015 18:30:39 +0100
> Russell King - ARM Linux <[email protected]> wrote:
>
>> Continuing on from the previous set of 18 patches, I also fixed a
>> number of sparse problems and other cleanups. I don't deem these
>> suitable for -rc merging, especially now that we're basically at
>> -rc6.
>>
>> The first patch switches the driver over to appropriately using
>> the relaxed IO accessors - this avoids calling out to the heavy
>> barrier on every read and write operation, but only calling out on
>> those which really matter.
>>
>> We switch to using dma_addr_t for DMA addresses which are not accessed
>> by hardware, and using gfp_t for the get_free_page flags. String-based
>> MMIO accesses are used instead of plain memcpy()/memset() which prevents
>> us potentially stumbling over GCC optimisations that it thinks it may
>> make with these functions.
>>
>> We convert as much of the hardware state to __le32 endian markings,
>> and use cpu_to_le32() as appropriate. A number of places are left
>> unfixed, as we temporarily store CPU native endian values at these
>> locations; these warnings should not be fixed (basically, only
>> appropriate sparse warnings should be fixed without penalising code.)
>
> To the whole series:
>
> Acked-by: Boris Brezillon <[email protected]>

Same here:

Acked-by: Arnaud Ebalard <[email protected]>

2015-10-20 07:59:13

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 4/6] crypto: marvell: use memcpy_fromio()/memcpy_toio()

On Tue, Oct 20, 2015 at 01:26:55AM +0200, Arnaud Ebalard wrote:
> Hi Russell,
>
> Russell King <[email protected]> writes:
>
> > Use the IO memcpy() functions when copying from/to MMIO memory.
> > These locations were found via sparse.
>
> On recent MVEBU hardware, *_std_* function are not expected to be used
> because we will instead use the TDMA-based versions. So the only
> possible penalty we could get (if any) from this change on recent
> devices is for the copy of the IV in mv_cesa_ablkcipher_process(). Out
> of curiosity, I took a quick look at what is generated and it seems this
> results in a call to mmiocpy():
>
> 00000340 <mv_cesa_ablkcipher_process>:
> 340: e1a0c00d mov ip, sp
> 344: e92dd830 push {r4, r5, fp, ip, lr, pc}
> 348: e24cb004 sub fp, ip, #4
> 34c: e24dd008 sub sp, sp, #8
>
> ....
>
> 3a4: e5943010 ldr r3, [r4, #16]
> 3a8: e5951008 ldr r1, [r5, #8]
> 3ac: e594001c ldr r0, [r4, #28]
> 3b0: e2811040 add r1, r1, #64 ; 0x40
> 3b4: e593201c ldr r2, [r3, #28]
> 3b8: ebfffffe bl 0 <mmiocpy>
>
> which if I am not mistaken is provided by arch/arm/lib/memcpy.S via:
>
> ENTRY(mmiocpy)
> ENTRY(memcpy)
>
> #include "copy_template.S"
>
> ENDPROC(memcpy)
> ENDPROC(mmiocpy)
>
> Am I right in concluding this will end up being the code of a usual
> memcpy(), i.e. the change is a noop in the final code compared to
> previous version?

Almost - what's different is the compiler is not allowed to "optimise"
this memcpy call. See 1bd46782d08b ("ARM: avoid unwanted GCC
memset()/memcpy() optimisations for IO variants")

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-10-20 14:20:59

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 00/18] crypto: further fixes for Marvell CESA hash

On Sun, Oct 18, 2015 at 05:16:49PM +0100, Russell King - ARM Linux wrote:
> Following on from the previous series, this series addresses further
> problems with the Marvell CESA hash driver found while testing it my
> openssl/ssh scenarios.
>
> The first patch improves one from the previous series: we can get the
> transform more directly using req->base.tfm rather than going round
> the houses.
>
> The next few patches rework the algorithm endianness conversions.
> There are two things which depend on the algorithm endianness - the
> format of the result, and the format of the bit length in the last
> block. We introduce a flag to convey this information, and keep
> the creq->state format in CPU endian mode for consistency.
>
> Some of the inconsistent hash results are down to the template
> operation not being properly initialised - so we zero initialise all
> template operations.
>
> The following patches (from "factor out first fragment decisions to
> helper") rework the digest handling to ensure that we always provide
> the hardware with a complete block of data to hash, otherwise it can
> be left mid-calculation, which then causes state to leak to
> subsequent operations. This requires a re-structure of the way we
> put together the DMA entries, so it's done in relatively small steps.
>
> This results in the CESA driver passing all tests I can throw at it
> via the AF_ALG openssl plugin with the exception of asking for the
> hash of /dev/null. This returns an all zero result, rather than the
> correct hash value. This bug is pending further diagnosis, but it
> is believed not to be a driver specific bug as iMX6 CAAM also behaves
> the same.
>
> Unfortunately, this is a large series, but the driver unfortunately
> needs this level of bug fixing to work properly.

All applied. Thanks Russell!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-10-20 14:21:26

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 0/6] Sparse related fixes

On Sun, Oct 18, 2015 at 06:30:39PM +0100, Russell King - ARM Linux wrote:
> Continuing on from the previous set of 18 patches, I also fixed a
> number of sparse problems and other cleanups. I don't deem these
> suitable for -rc merging, especially now that we're basically at
> -rc6.
>
> The first patch switches the driver over to appropriately using
> the relaxed IO accessors - this avoids calling out to the heavy
> barrier on every read and write operation, but only calling out on
> those which really matter.
>
> We switch to using dma_addr_t for DMA addresses which are not accessed
> by hardware, and using gfp_t for the get_free_page flags. String-based
> MMIO accesses are used instead of plain memcpy()/memset() which prevents
> us potentially stumbling over GCC optimisations that it thinks it may
> make with these functions.
>
> We convert as much of the hardware state to __le32 endian markings,
> and use cpu_to_le32() as appropriate. A number of places are left
> unfixed, as we temporarily store CPU native endian values at these
> locations; these warnings should not be fixed (basically, only
> appropriate sparse warnings should be fixed without penalising code.)

All applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt