2010-11-04 18:00:55

by Phil Sutter

[permalink] [raw]
Subject: [PATCH] crypto: padlock: fix for non-64byte aligned data

Using cryptodev-linux with it's zero-copy functionality, one is mighty
enough to pass various misaligned/mis-sized buffers unmodified to the
engine's driver, which occured to be an easy way to break padlock
equipped machines:

On one hand, the original code is broken in padlock_xcrypt_cbc(): when
passing the "initial" bytes to xcryptcbc, 'count' is incorrectly used as
length. This may trigger prefetch-related issues, but will definitely
lead to data corruption as xcryptcbc is called again afterwards,
altering (count - initial) * AES_BLOCK_SIZE bytes after the end of
'output' in memory.

Another problem occurs when passing non-64byte aligned buffers, which
leads to memory corruption in userspace (running applications crash
randomly). This problem is too subtile for me to have more than vague
assumptions about it's origin. Anyways, this patch fixes them:

Instead of handling the "odd" bytes (i.e., the remainder when dividing
into prefetch blocks of 64bytes) at the beginning, go for them in the
end, copying the data out if prefetching would run beyond the page
boundary.

Signed-off-by: Phil Sutter <[email protected]>
Signed-off-by: Nico Erfurth <[email protected]>
---
drivers/crypto/padlock-aes.c | 29 +++++++++++------------------
1 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/padlock-aes.c b/drivers/crypto/padlock-aes.c
index 2e992bc..b3f3382 100644
--- a/drivers/crypto/padlock-aes.c
+++ b/drivers/crypto/padlock-aes.c
@@ -260,19 +260,14 @@ static inline void padlock_xcrypt_ecb(const u8 *input, u8 *output, void *key,
{
u32 initial = count & (ecb_fetch_blocks - 1);

- if (count < ecb_fetch_blocks) {
- ecb_crypt(input, output, key, control_word, count);
- return;
- }
-
- if (initial)
+ if (count > initial) {
asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */
: "+S"(input), "+D"(output)
- : "d"(control_word), "b"(key), "c"(initial));
+ : "d"(control_word), "b"(key), "c"(count - initial));
+ }

- asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */
- : "+S"(input), "+D"(output)
- : "d"(control_word), "b"(key), "c"(count - initial));
+ if (initial)
+ ecb_crypt(input, output, key, control_word, initial);
}

static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key,
@@ -280,17 +275,15 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key,
{
u32 initial = count & (cbc_fetch_blocks - 1);

- if (count < cbc_fetch_blocks)
- return cbc_crypt(input, output, key, iv, control_word, count);
-
- if (initial)
+ if (count > initial) {
asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */
: "+S" (input), "+D" (output), "+a" (iv)
- : "d" (control_word), "b" (key), "c" (count));
+ : "d" (control_word), "b" (key), "c" (count - initial));
+ }
+
+ if (initial)
+ return cbc_crypt(input, output, key, iv, control_word, initial);

- asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */
- : "+S" (input), "+D" (output), "+a" (iv)
- : "d" (control_word), "b" (key), "c" (count-initial));
return iv;
}

--
1.7.3.2


2010-11-04 18:46:14

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

On Thu, Nov 04, 2010 at 05:50:36PM +0000, Phil Sutter wrote:
> Using cryptodev-linux with it's zero-copy functionality, one is mighty
> enough to pass various misaligned/mis-sized buffers unmodified to the
> engine's driver, which occured to be an easy way to break padlock
> equipped machines:

OK.

> On one hand, the original code is broken in padlock_xcrypt_cbc(): when
> passing the "initial" bytes to xcryptcbc, 'count' is incorrectly used as
> length. This may trigger prefetch-related issues, but will definitely
> lead to data corruption as xcryptcbc is called again afterwards,
> altering (count - initial) * AES_BLOCK_SIZE bytes after the end of
> 'output' in memory.

Ouch, does the attached patch fix this problem for you?

> Another problem occurs when passing non-64byte aligned buffers, which
> leads to memory corruption in userspace (running applications crash
> randomly). This problem is too subtile for me to have more than vague
> assumptions about it's origin. Anyways, this patch fixes them:

I'd like to determine whether this is due to the previous bug.
If it still crashes randomly even with my one-line patch please
let me know.

> Instead of handling the "odd" bytes (i.e., the remainder when dividing
> into prefetch blocks of 64bytes) at the beginning, go for them in the
> end, copying the data out if prefetching would run beyond the page
> boundary.

I'd like to avoid this copying unless the hardware really needs
it.

Can you provide some information on the CPU where you're seeing
this?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
commit c054a076a1bd4731820a9c4d638b13d5c9bf5935
Author: Herbert Xu <[email protected]>
Date: Thu Nov 4 14:38:39 2010 -0400

crypto: padlock - Fix AES-CBC handling on odd-block-sized input

On certain VIA chipsets AES-CBC requires the input/output to be
a multiple of 64 bytes. We had a workaround for this but it was
buggy as it sent the whole input for processing when it is meant
to only send the initial number of blocks which makes the rest
a multiple of 64 bytes.

As expected this causes memory corruption whenever the workaround
kicks in.

Reported-by: Phil Sutter <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

diff --git a/drivers/crypto/padlock-aes.c b/drivers/crypto/padlock-aes.c
index 2e992bc..8a515ba 100644
--- a/drivers/crypto/padlock-aes.c
+++ b/drivers/crypto/padlock-aes.c
@@ -286,7 +286,7 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key,
if (initial)
asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */
: "+S" (input), "+D" (output), "+a" (iv)
- : "d" (control_word), "b" (key), "c" (count));
+ : "d" (control_word), "b" (key), "c" (initial));

asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */
: "+S" (input), "+D" (output), "+a" (iv)

2010-11-05 14:12:39

by Phil Sutter

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

Herbert,

On Thu, Nov 04, 2010 at 01:46:06PM -0500, Herbert Xu wrote:
> > On one hand, the original code is broken in padlock_xcrypt_cbc(): when
> > passing the "initial" bytes to xcryptcbc, 'count' is incorrectly used as
> > length. This may trigger prefetch-related issues, but will definitely
> > lead to data corruption as xcryptcbc is called again afterwards,
> > altering (count - initial) * AES_BLOCK_SIZE bytes after the end of
> > 'output' in memory.
>
> Ouch, does the attached patch fix this problem for you?

Yes, kind of. With that trivial fix applied, the driver is stable most
of the time.

> > Another problem occurs when passing non-64byte aligned buffers, which
> > leads to memory corruption in userspace (running applications crash
> > randomly). This problem is too subtile for me to have more than vague
> > assumptions about it's origin. Anyways, this patch fixes them:
>
> I'd like to determine whether this is due to the previous bug.
> If it still crashes randomly even with my one-line patch please
> let me know.

Yes, it does, but triggering the bug is not really trivial. I've had
best results with a speed testing tool using the asynchronous interface,
memory corruption occured in each run. The same tool operating
synchronously doesn't crash as soon, but having three or more instances
running in parallel yields the same result.

This problem is so racey, a simple printk statement at the beginning of
padlock_xcrypt_ecb() fixes it. Enclosing the same function's content in
lock_kernel()/unlock_kernel() statements helps as well.

> > Instead of handling the "odd" bytes (i.e., the remainder when dividing
> > into prefetch blocks of 64bytes) at the beginning, go for them in the
> > end, copying the data out if prefetching would run beyond the page
> > boundary.
>
> I'd like to avoid this copying unless the hardware really needs
> it.

As stated initially, I'm not sure why the proposed change fixes
anything. AFAICT, both algorithms are correct in theory. I can't find a
case that breaks the original one reproducably. So my confidence
regarding the change's validity is based on trial and error. Maybe
someone with more knowledge about the various Via erratae can provide
some insights here.

> Can you provide some information on the CPU where you're seeing
> this?

This is the faulty one:
| -bash-4.0# cat /proc/cpuinfo
| processor : 0
| vendor_id : CentaurHauls
| cpu family : 6
| model : 15
| model name : VIA Nano processor [email protected]
| stepping : 2
| cpu MHz : 1599.696
| cache size : 1024 KB
| fdiv_bug : no
| hlt_bug : no
| f00f_bug : no
| coma_bug : no
| fpu : yes
| fpu_exception : yes
| cpuid level : 10
| wp : yes
| flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush acpi mmx fxsr sse sse2 ss tm syscall nx fxsr_opt rdtscp lm constant_tsc rep_good pni monitor est tm2 ssse3 cx16 xtpr rng rng_en ace ace_en ace2 phe phe_en lahf_lm
| bogomips : 3199.39
| clflush size : 64
| cache_alignment : 128
| address sizes : 36 bits physical, 48 bits virtual
| power management:

I have a C7 for comparison:
| -bash-4.0# cat /proc/cpuinfo
| processor : 0
| vendor_id : CentaurHauls
| cpu family : 6
| model : 13
| model name : VIA C7 Processor 1500MHz
| stepping : 0
| cpu MHz : 1500.100
| cache size : 128 KB
| fdiv_bug : no
| hlt_bug : no
| f00f_bug : no
| coma_bug : no
| fpu : yes
| fpu_exception : yes
| cpuid level : 1
| wp : yes
| flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx pni est tm2 xtpr rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
| bogomips : 3000.20
| clflush size : 64
| cache_alignment : 64
| address sizes : 36 bits physical, 32 bits virtual
| power management:

The C7 is definitely not affected by this bug, so your one-liner fixes all
issues for it.

Greetings, Phil

2010-12-02 06:14:51

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

On Fri, Nov 05, 2010 at 03:12:38PM +0100, Phil Sutter wrote:
>
> Yes, kind of. With that trivial fix applied, the driver is stable most
> of the time.

Great.

> Yes, it does, but triggering the bug is not really trivial. I've had
> best results with a speed testing tool using the asynchronous interface,
> memory corruption occured in each run. The same tool operating
> synchronously doesn't crash as soon, but having three or more instances
> running in parallel yields the same result.
>
> This problem is so racey, a simple printk statement at the beginning of
> padlock_xcrypt_ecb() fixes it. Enclosing the same function's content in
> lock_kernel()/unlock_kernel() statements helps as well.

Interesting. Do you have preemption enabled?

Can you share the test program with us?

> > Can you provide some information on the CPU where you're seeing
> > this?
>
> This is the faulty one:
> | -bash-4.0# cat /proc/cpuinfo
> | processor : 0
> | vendor_id : CentaurHauls
> | cpu family : 6
> | model : 15
> | model name : VIA Nano processor [email protected]
> | stepping : 2

Is there anyone else on the list who has this hardware?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2010-12-07 10:41:42

by Phil Sutter

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

Hi,

On Thu, Dec 02, 2010 at 02:14:41PM +0800, Herbert Xu wrote:
> > Yes, it does, but triggering the bug is not really trivial. I've had
> > best results with a speed testing tool using the asynchronous interface,
> > memory corruption occured in each run. The same tool operating
> > synchronously doesn't crash as soon, but having three or more instances
> > running in parallel yields the same result.
> >
> > This problem is so racey, a simple printk statement at the beginning of
> > padlock_xcrypt_ecb() fixes it. Enclosing the same function's content in
> > lock_kernel()/unlock_kernel() statements helps as well.
>
> Interesting. Do you have preemption enabled?

Yes, CONFIG_PREEMPT is active in my test system's kernel.

> Can you share the test program with us?

You can get it along with the actual cryptodev-implementation at
http://repo.or.cz/w/cryptodev-linux.git. I've been testing with speed.c
and async_speed.c, both can be found inside the examples directory.

Greetings, Phil

2010-12-07 11:40:04

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

On Tue, Dec 07, 2010 at 11:41:41AM +0100, Phil Sutter wrote:
>
> Yes, CONFIG_PREEMPT is active in my test system's kernel.

OK, can you see if the problem is still reproducible without
preemption?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2010-12-07 13:40:58

by Phil Sutter

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

Hi,

On Tue, Dec 07, 2010 at 07:39:56PM +0800, Herbert Xu wrote:
> On Tue, Dec 07, 2010 at 11:41:41AM +0100, Phil Sutter wrote:
> >
> > Yes, CONFIG_PREEMPT is active in my test system's kernel.
>
> OK, can you see if the problem is still reproducible without
> preemption?

Yes, it is. I just redid the test with a CONFIG_PREEMPT_NONE kernel,
memory corruption occurs every time.

Greetings, Phil

2010-12-08 00:08:43

by Harald Welte

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

Sorry for jumping into this thread so late.

I am almost constantly on business travel (without any VIA system with me), so
I cannot try to reproduce the problem. It probably has never shown up before
since my typical test case is dm-crypt and there you will of course never see
any unaligned buffers for encryption.

I'm returning home on December 11, and hopfeully can do some testing at
that time.

Is there a list of machines (/proc/cpuinfo) on which the problem has shown up
so far?

Regards,
Harald
--
- Harald Welte <[email protected]> http://linux.via.com.tw/

2010-12-08 01:52:59

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: padlock: fix for non-64byte aligned data

On Wed, Dec 08, 2010 at 08:08:13AM +0800, Harald Welte wrote:
>
> Is there a list of machines (/proc/cpuinfo) on which the problem has shown up
> so far?

Yes, this is the one that's show the problem:

| -bash-4.0# cat /proc/cpuinfo
| processor : 0
| vendor_id : CentaurHauls
| cpu family : 6
| model : 15
| model name : VIA Nano processor [email protected]
| stepping : 2
| cpu MHz : 1599.696
| cache size : 1024 KB
| fdiv_bug : no
| hlt_bug : no
| f00f_bug : no
| coma_bug : no
| fpu : yes
| fpu_exception : yes
| cpuid level : 10
| wp : yes
| flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush acpi mmx fxsr sse sse2 ss tm syscall nx fxsr_opt rdtscp lm constant_tsc rep_good pni monitor est tm2 ssse3 cx16 xtpr rng rng_en ace ace_en ace2 phe phe_en lahf_lm
| bogomips : 3199.39
| clflush size : 64
| cache_alignment : 128
| address sizes : 36 bits physical, 48 bits virtual
| power management:

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt