2011-11-21 04:08:39

by Nick Kossifidis

Subject: A weird hw crypto bug...

Some time ago we had a few reports of AR2413 cards being unable to
encrypt packets of specific lengths.

From https://bugs.launchpad.net/ubuntu/+source/linux/+bug/568090, user
Musaraigne did some further investigation/debugging on this:

"The cause of the high and unpredictable latencies is that packets are
dropped depending on their size. According to my tests, packet sizes
of the form
size = 128*k + 81 + m
or
size = 128*k + 105 + m
for any k>=2 and 0<=m<=7

are dropped randomly in 90-95% of the cases. Conversely, all other
packet sizes work fine.

You can see for yourself if this is true on your system:
ping -M do -s 596 www.google.com
should result in 90% packet loss (because 624 = 128*4 + 105 + 7; the
28 byte difference comes from network headers added by ping)
while
ping -M do -s 597 www.google.com
should result in negligible packet loss."
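
The size rule quoted above can be written as a small check. This is only a sketch of Musaraigne's formula, not anything taken from ath5k; the function names are made up for illustration, and the 28-byte overhead is assumed to be the 20-byte IPv4 header plus the 8-byte ICMP header.

```python
# Sketch of the reported rule: size = 128*k + 81 + m or 128*k + 105 + m,
# for k >= 2 and 0 <= m <= 7. Names below are illustrative only.
WINDOW_OFFSETS = (81, 105)  # start of each bad window inside a 128-byte block
WINDOW_WIDTH = 8            # m ranges over 0..7
ICMP_OVERHEAD = 28          # assumed: 20-byte IPv4 header + 8-byte ICMP header


def is_affected(size: int) -> bool:
    """Return True if a packet size matches the reported drop pattern."""
    for offset in WINDOW_OFFSETS:
        k, m = divmod(size - offset, 128)
        if k >= 2 and m < WINDOW_WIDTH:
            return True
    return False


def ping_payload_affected(payload: int) -> bool:
    """Same check for a `ping -s` payload size (headers added on top)."""
    return is_affected(payload + ICMP_OVERHEAD)


if __name__ == "__main__":
    for payload in (596, 597):
        print(payload, ping_payload_affected(payload))
```

With the two ping examples from the report, a payload of 596 falls inside the second window (624 = 128*4 + 105 + 7) and 597 falls just outside it.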

So far I couldn't reproduce this bug with my AR2413, and now, after a
few user reports, I'm almost sure that it's only present on cards
made by Askey. I verified it again recently based on a bug report
from an OpenSUSE user here...

https://bugzilla.novell.com/show_bug.cgi?id=731576

What seems weird is that some users reported that MadWiFi worked fine
on these cards (I have no idea what happens with the Windows driver,
maybe it disables hw crypto)! Since for hw crypto we use code from
the common ath module (and I think this is the same as in the HAL), I
don't see how that can happen.

Anyway my question is how to handle this:

a) Just disable hw crypto for all AR2413 cards made by Askey
b) If we detect such a card, do some padding based on Musaraigne's findings

I'd like to go with b, but I don't know how to do it correctly without
corrupting anything. Any ideas?
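
As a rough illustration of option b, the sketch below computes how many bytes would have to be added to move a length out of the reported windows. It assumes the formula applies to whatever length is passed in; the thread does not establish whether that is the IP packet size or the 802.11 frame length including crypto overhead, and it says nothing about where padding could be placed safely. The function names are hypothetical.

```python
def is_affected(size: int) -> bool:
    """Reported pattern: 128*k + 81 + m or 128*k + 105 + m, k >= 2, 0 <= m <= 7."""
    for offset in (81, 105):
        k, m = divmod(size - offset, 128)
        if k >= 2 and m < 8:
            return True
    return False


def pad_needed(size: int) -> int:
    """Smallest number of bytes to add so the length leaves the bad windows.

    The windows are 8 bytes wide and do not touch each other,
    so the result is always between 0 and 8.
    """
    pad = 0
    while is_affected(size + pad):
        pad += 1
    return pad


if __name__ == "__main__":
    for size in (601, 617, 624):
        print(size, pad_needed(size))
```

Under these assumptions the worst case costs 8 extra bytes, for a length sitting at the start of a window.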


--
GPG ID: 0xEE878588
As you read this post global entropy rises. Have Fun ;-)
Nick


2011-11-21 13:22:17

by Adrian Chadd

Subject: Re: A weird hw crypto bug...

On 21 November 2011 19:16, Albert Gall <[email protected]> wrote:
> The same problem described in this thread appears on AR2414 hardware. Loading
> ath5k with nohwcrypt=1, everything works fine. Attached is information and
> evidence that I hope will be useful.
> If you need more, please just ask.

Well, what would be useful is figuring out why the frames are being
dropped in the first place.

Nick, did you get verification whether this is a bug with crypto TX, RX or both?


Adrian

2011-11-21 21:45:06

by Adrian Chadd

Subject: Re: A weird hw crypto bug...

2011/11/22 Gábor Stefanik <[email protected]>:

> My guess is that they are using some kind of ES/pre-production silicon
> that should have been destroyed, but was instead dumped on the
> black/grey-market.

That should be pretty clear from the MAC major/minor revision.
It would help if someone could write down the exact part number from the MAC itself.


Adrian

2011-11-21 08:01:58

by Adrian Chadd

Subject: Re: A weird hw crypto bug...

Well, unless that vendor spun their own silicon, or there's something
funky on the board that'd change crypto behaviour, the only thing I
can really think of are EEPROM settings.
Or maybe the MAC revision is slightly different, I dunno.



Adrian

2011-11-21 06:19:55

by Adrian Chadd

Subject: Re: A weird hw crypto bug...

On 21 November 2011 12:08, Nick Kossifidis <[email protected]> wrote:
> Some time ago we had a few reports of AR2413 cards being unable to
> encrypt packets of specific lengths.
>
> From https://bugs.launchpad.net/ubuntu/+source/linux/+bug/568090, user
> Musaraigne did some further investigation/debugging on this:
>
> "The cause of the high and unpredictable latencies is that packets are
> dropped depending on their size. According to my tests, packet sizes
> of the form
>  size = 128*k + 81 + m
> or
>  size = 128*k + 105 + m
> for any k>=2 and 0<=m<=7
>
> are dropped randomly in 90-95% of the cases. Conversely, all other
> packet sizes work fine.

That seems a bit odd. Erm, how do they fail?

* Does the hardware just plain fail at encrypting the frame?
* Does the hardware just plain fail to _TX_ the frame of that size?
(ie, if you disable crypto and take padding into account, does it
fail?)
* What about other encryption types? AES? TKIP? WEP? None? :-)
* This is a PCIe NIC, right? Are there some kind of weird bus bugs
that you're seeing with this particular NIC and this particular bus
glue?
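
One way to approach these questions is a size sweep that is repeated with hw crypto on and off and with each cipher. The sketch below only builds the ping command lines and marks which payloads the reported formula predicts will fail; it does not run anything. The host 192.0.2.1 is a placeholder, and the 28-byte header overhead and all function names are assumptions.

```python
def is_affected(size: int) -> bool:
    """Reported pattern: 128*k + 81 + m or 128*k + 105 + m, k >= 2, 0 <= m <= 7."""
    for offset in (81, 105):
        k, m = divmod(size - offset, 128)
        if k >= 2 and m < 8:
            return True
    return False


def sweep_commands(host: str, first: int, last: int, count: int = 20):
    """Yield (payload, predicted_bad, command) for payload sizes first..last."""
    for payload in range(first, last + 1):
        # 28 = assumed IPv4 (20) + ICMP (8) header bytes added by ping
        predicted_bad = is_affected(payload + 28)
        command = f"ping -M do -c {count} -s {payload} {host}"
        yield payload, predicted_bad, command


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address; replace it with the AP or gateway.
    for payload, bad, command in sweep_commands("192.0.2.1", 590, 600):
        print("BAD " if bad else "ok  ", command)
```

Comparing the measured loss per size against the predicted column for each configuration would show whether the pattern follows the cipher, the TX path, or neither.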

Since Madwifi works, I wonder if FreeBSD also works. If so, there's
only a few places I'd bet you'll find weird stuff:

* how the bus is setup during attach (eg overriding PCI values);
* which crypto key slots are allocated;
* are you still doing split keys on that hardware? or are they
combined like the ar5416+ chips are?
* are you perhaps doing multi-descriptor TX frames and you've not set
the encryption bits correctly? IIRC, the enctype field needs to match
on all descriptors of a frame (eg, if your packet is a chain rather
than a single skb, you need to make sure you're copying the descriptor
fields right.)
* what about the frame and header length fields? and encryption
padding? are they all setup the same with madwifi versus ath5k?

I bet a bit of methodical poking is going to reveal this bug. I find
it rather annoying that ath5k/ath9k have lots of encryption issues but
I've not had any reports of encryption on FreeBSD failing (save where
the AP is totally lying about which keyidx a frame is encrypted with.)
It's possible that you're seeing these errors because more people are
using ath5k than FreeBSD, but still:

* does freebsd-9 (wifi) work on the same hardware?

Nick, can you find out if someone can send us one of these weird NICs?
I'll throw it into my pile-o-old-stuff-to-test and do some regression
testing with it.


Adrian

2011-11-21 21:08:49

by Gábor Stefanik

Subject: Re: A weird hw crypto bug...

On Mon, Nov 21, 2011 at 9:01 AM, Adrian Chadd <[email protected]> wrote:
> Well, unless that vendor spun their own silicon, or there's something
> funky on the board that'd change crypto behaviour, the only thing I
> can really think of are EEPROM settings.
> Or maybe the MAC revision is slightly different, I dunno.
>
>
>
> Adrian

My guess is that they are using some kind of ES/pre-production silicon
that should have been destroyed, but was instead dumped on the
black/grey-market.

--
Vista: [V]iruses, [I]ntruders, [S]pyware, [T]rojans and [A]dware. :-)

2011-11-21 07:52:40

by Nick Kossifidis

Subject: Re: A weird hw crypto bug...

2011/11/21 Adrian Chadd <[email protected]>:
> On 21 November 2011 12:08, Nick Kossifidis <[email protected]> wrote:
>> Some time ago we had a few reports of AR2413 cards being unable to
>> encrypt packets of specific lengths.
>>
>> From https://bugs.launchpad.net/ubuntu/+source/linux/+bug/568090, user
>> Musaraigne did some further investigation/debugging on this:
>>
>> "The cause of the high and unpredictable latencies is that packets are
>> dropped depending on their size. According to my tests, packet sizes
>> of the form
>>  size = 128*k + 81 + m
>> or
>>  size = 128*k + 105 + m
>> for any k>=2 and 0<=m<=7
>>
>> are dropped randomly in 90-95% of the cases. Conversely, all other
>> packet sizes work fine.
>
> That seems a bit odd. Erm, how do they fail?
>
> * Does the hardware just plain fail at encrypting the frame?
> * Does the hardware just plain fail to _TX_ the frame of that size?
> (ie, if you disable crypto and take padding into account, does it
> fail?)
> * What about other encryption types? AES? TKIP? WEP? None? :-)
> * This is a PCIe NIC, right? Are there some kind of weird bus bugs
> that you're seeing with this particular NIC and this particular bus
> glue?

This is about specific AR2413 cards made by Askey; AR2413 cards in
general work fine, I have one myself. The problem is that when hw
encryption is enabled (not sure about WEP, I think I've seen a report
on WEP too), packets of specific lengths are dropped. They work O.K.
without encryption or with sw encryption. I haven't seen any report
with dumps etc., so I don't know if packets are corrupted or never
transmitted. It's just weird that it happens only on specific AR2413
cards, not on any AR2413 card (that means in general we handle hw
encryption correctly, as it works fine on most cards; in fact right now
I think that's the only active bug we have on encryption). The way I
see it, it's a hw issue found on these implementations that we don't
handle. Btw AR2413 is a plain PCI card; only AR5424/2424/2425 are PCI-E
in the ar5k series (AR5418 is PCI-E too, but it's handled by ath9k).

> Since Madwifi works, I wonder if FreeBSD also works. If so, there's
> only a few places I'd bet you'll find weird stuff:
>
> * how the bus is setup during attach (eg overriding PCI values);
> * which crypto key slots are allocated;
> * are you still doing split keys on that hardware? or are they
> combined like the ar5416+ chips are?

When we had crypto code inside ath5k we did what the old HAL did (via
rev. engineering); now we use the common code from the ath module, so
we do whatever ath9k does (and I think MadWiFi too). Again, hw crypto
works fine, even on AR2413 cards; it's this vendor that has done
something weird that results in tx failures with hw encryption. Plus I
don't know what version of MadWiFi they used or if they used hw crypto
on MadWiFi at all.

> * are you perhaps doing multi-descriptor TX frames and you've not set
> the encryption bits correctly? IIRC, the enctype field needs to match
> on all descriptors of a frame (eg, if your packet is a chain rather
> than a single skb, you need to make sure you're copying the descriptor
> fields right.)

No fast frames here or multi-descriptor stuff.

> * what about the frame and header length fields? and encryption
> padding? are they all setup the same with madwifi versus ath5k?

That, like most of the things you mention above, would show up on any
card since it's common code, or at least on a specific chip
version/revision. Here we have this issue with a specific implementation
and everything works fine on other implementations with the same chip.

> I bet a bit of methodical poking is going to reveal this bug. I find
> it rather annoying that ath5k/ath9k have lots of encryption issues but
> I've not had any reports of encryption on FreeBSD failing (save where
> the AP is totally lying about which keyidx a frame is encrypted with.)
> It's possible that you're seeing these errors because more people are
> using ath5k than FreeBSD, but still:
>
> * does freebsd-9 (wifi) work on the same hardware?

No idea :-(

> Nick, can you find out if someone can send us one of these weird NICs?
> I'll throw it into my pile-o-old-stuff-to-test and do some regression
> testing with it.
>

Haven't found one so far, but I think it's clearly a hw issue (I've done
some testing on my AR2413 and it seems fine), and since we know which
packet lengths fail, maybe we can create a workaround with padding etc.



--
GPG ID: 0xEE878588
As you read this post global entropy rises. Have Fun ;-)
Nick