2012-02-10 18:09:35

by Tomas Janousek

Subject: Re: iwlagn: memory corruption with WPA enterprise

Hi guys,

On Sun, Nov 20, 2011 at 09:40:07PM +0100, Tomáš Janoušek wrote:
> > Yes, I will try your configuration when I get back to the office Monday

Did you have any luck? I just found out something which is almost completely
insane.

For the last few months, I've happily used a 64-bit kernel and have had no
problems whatsoever. About a week ago, I started using virtual machines in
KVM. And today I found that I have exactly the same problem, but only _inside_
the virtual machine. I can't reliably scp a file from the internet to my
virtual machine. It works fine when I scp to the host, it works fine when I'm
on a WPA-PSK network. And it happens even if I tell kvm to emulate e1000, not
only with virtio-net. How strange is that?
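
One way to make "can't reliably scp" measurable is to compare a digest of the
received file with a digest of the original. What follows is only a minimal
sketch, assuming Python is available on both ends; `file_sha256` and
`same_content` are hypothetical helper names, not something used in this
thread. On an affected guest the digest computation itself might be
unreliable, so it is probably worth running it more than once.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Comparing the hex string printed on the sending machine with the one computed
inside the virtual machine would show whether a given transfer arrived intact.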

And while this is happening, the host is running just fine. The host has a
64-bit kernel with a 32-bit userspace, so if something was wrong with the
32-bit mode of my processor, it would've appeared on the host as well, no?

It's also worth mentioning that if I build openssl with "no-asm 386", scp
works just fine. So it doesn't look like a memory corruption after all. It
seems as if certain CPU instructions didn't work properly if running on a
32-bit kernel with a WiFi adapter doing something. But how can it be
that those same CPU instructions work on a 64-bit host with 32-bit userspace?
At the same time! That's just completely insane, and I can't think of an
explanation. Shall I get a new CPU perhaps? :-)

Please, give me any ideas that you might have.

Regards,
--
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/


2012-02-13 09:25:52

by Stanislaw Gruszka

Subject: Re: iwlagn: memory corruption with WPA enterprise

On Fri, Feb 10, 2012 at 07:09:29PM +0100, Tomáš Janoušek wrote:
> On Sun, Nov 20, 2011 at 09:40:07PM +0100, Tomáš Janoušek wrote:
> > > Yes, I will try your configuration when I get back to the office Monday
>
> Did you have any luck?
I think I tried to reproduce that problem and failed, but honestly I do
not remember right now ...

> I just found out something which is almost completely
> insane.
>
> For the last few months, I've happily used a 64-bit kernel and have had no
> problems whatsoever. About a week ago, I started using virtual machines in
> KVM. And today I found that I have exactly the same problem, but only _inside_
> the virtual machine. I can't reliably scp a file from the internet to my
> virtual machine. It works fine when I scp to the host, it works fine when I'm
> on a WPA-PSK network. And it happens even if I tell kvm to emulate e1000, not
> only with virtio-net. How strange is that?
>
> And while this is happening, the host is running just fine. The host has a
> 64-bit kernel with a 32-bit userspace, so if something was wrong with the
> 32-bit mode of my processor, it would've appeared on the host as well, no?
>
> It's also worth mentioning that if I build openssl with "no-asm 386", scp
> works just fine. So it doesn't look like a memory corruption after all. It
> seems as if certain CPU instructions didn't work properly if running on a
> 32-bit kernel with a WiFi adapter doing something. But how can it be
> that those same CPU instructions work on a 64-bit host with 32-bit userspace?
> At the same time! That's just completely insane, and I can't think of an
> explanation. Shall I get a new CPU perhaps? :-)

Currently there is a discussion about compiler problems that
can result in corruption
http://lwn.net/Articles/478657/
Perhaps this problem is something similar.

Also, if you look at lspci -vt, does it show that the corruption happens
only when a PCI bridge is used? (However, that would not explain why it
only happens with WPA enterprise.)

Stanislaw

2012-02-13 13:09:13

by Stanislaw Gruszka

Subject: Re: iwlagn: memory corruption with WPA enterprise

On Mon, Feb 13, 2012 at 10:25:39AM +0100, Stanislaw Gruszka wrote:
> On Fri, Feb 10, 2012 at 07:09:29PM +0100, Tomáš Janoušek wrote:
> > On Sun, Nov 20, 2011 at 09:40:07PM +0100, Tomáš Janoušek wrote:
> > > > Yes, I will try your configuration when I get back to the office Monday
> >
> > Did you have any luck?
> I think I tried to reproduce that problem and failed, but honestly I do
> not remember right now ...
>
> > I just found out something which is almost completely
> > insane.
> >
> > For the last few months, I've happily used a 64-bit kernel and have had no
> > problems whatsoever. About a week ago, I started using virtual machines in
> > KVM. And today I found that I have exactly the same problem, but only _inside_
> > the virtual machine. I can't reliably scp a file from the internet to my
> > virtual machine. It works fine when I scp to the host, it works fine when I'm
> > on a WPA-PSK network. And it happens even if I tell kvm to emulate e1000, not
> > only with virtio-net. How strange is that?
> >
> > And while this is happening, the host is running just fine. The host has a
> > 64-bit kernel with a 32-bit userspace, so if something was wrong with the
> > 32-bit mode of my processor, it would've appeared on the host as well, no?
> >
> > It's also worth mentioning that if I build openssl with "no-asm 386", scp
> > works just fine. So it doesn't look like a memory corruption after all. It
> > seems as if certain CPU instructions didn't work properly if running on a
> > 32-bit kernel with a WiFi adapter doing something. But how can it be
> > that those same CPU instructions work on a 64-bit host with 32-bit userspace?
> > At the same time! That's just completely insane, and I can't think of an
> > explanation. Shall I get a new CPU perhaps? :-)
>
> Currently there is a discussion about compiler problems that
> can result in corruption
> http://lwn.net/Articles/478657/
> Perhaps this problem is something similar.
>
> Also, if you look at lspci -vt, does it show that the corruption happens
> only when a PCI bridge is used? (However, that would not explain why it
> only happens with WPA enterprise.)

I also found this bug report
https://bugzilla.kernel.org/show_bug.cgi?id=37742
where one user reports iwlwifi corruption caught by the IOMMU.

Tomáš, I do not remember: do you have the same problems on
older kernels, i.e. < 3.0?

Stanislaw

2012-02-13 13:29:42

by Tomas Janousek

Subject: Re: iwlagn: memory corruption with WPA enterprise

Hi,

On Mon, Feb 13, 2012 at 02:09:04PM +0100, Stanislaw Gruszka wrote:
> I also found this bug report
> https://bugzilla.kernel.org/show_bug.cgi?id=37742
> where one user reports iwlwifi corruption caught by the IOMMU.

I wasn't able to catch anything using IOMMU, and I also wasn't able to
reproduce the issue using any userspace memory checking tool. Hence I tend to
believe we're not dealing with a memory corruption at all, perhaps something
like certain CPU flags/registers not being correctly saved/restored during
wlan interrupts or something. When I have a sufficient amount of free time,
I'll try to check this hypothesis and perhaps pinpoint the instruction the
result of which is corrupted during wlan operation.
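
A minimal sketch of how such a hypothesis could be probed from userspace,
assuming that a corrupted instruction result would show up as a changed digest
of a fixed buffer while the wlan is busy. `find_mismatches` is a hypothetical
helper, and Python's hashlib may or may not end up in the same OpenSSL
assembly routines that scp uses, so a clean run would prove little. The
reference digest from the first computation could itself be wrong, which is
why only disagreement between rounds is reported.

```python
import hashlib


def find_mismatches(data, rounds, algorithm="sha256"):
    """Hash `data` repeatedly and return the indices of rounds whose digest
    differs from the first digest computed.

    On a healthy machine the result is always an empty list, because hashing
    the same bytes is deterministic.
    """
    reference = hashlib.new(algorithm, data).digest()
    mismatches = []
    for i in range(rounds):
        if hashlib.new(algorithm, data).digest() != reference:
            mismatches.append(i)
    return mismatches
```

Running something like `find_mismatches(os.urandom(1 << 20), 10000)` inside the
guest during a large transfer over the WPA enterprise network, and again over
WPA-PSK, would be one way to compare the two situations.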

> Tomáš, I do not remember: do you have the same problems on
> older kernels, i.e. < 3.0?

Yeah. I was able to reproduce it with 2.6.38.8 and 2.6.39.4 at least.

Regards,
--
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/

2012-02-14 09:21:05

by Stanislaw Gruszka

Subject: Re: iwlagn: memory corruption with WPA enterprise

On Fri, Feb 10, 2012 at 07:09:29PM +0100, Tomáš Janoušek wrote:
> For the last few months, I've happily used a 64-bit kernel and have had no
> problems whatsoever. About a week ago, I started using virtual machines in
> KVM. And today I found that I have exactly the same problem, but only _inside_
> the virtual machine. I can't reliably scp a file from the internet to my
> virtual machine. It works fine when I scp to the host, it works fine when I'm
> on a WPA-PSK network. And it happens even if I tell kvm to emulate e1000, not
> only with virtio-net. How strange is that?
>
> And while this is happening, the host is running just fine. The host has a
> 64-bit kernel with a 32-bit userspace, so if something was wrong with the
> 32-bit mode of my processor, it would've appeared on the host as well, no?
>
> It's also worth mentioning that if I build openssl with "no-asm 386", scp
> works just fine.
Good hint.

> So it doesn't look like a memory corruption after all. It
> seems as if certain CPU instructions didn't work properly if running on a
> 32-bit kernel with a WiFi adapter doing something. But how can it be
> that those same CPU instructions work on a 64-bit host with 32-bit userspace?
> At the same time! That's just completely insane, and I can't think of an
> explanation. Shall I get a new CPU perhaps? :-)
>
>
> Please, give me any ideas that you might have.

That makes sense! Your "CPU instructions break things" theory sounds crazy,
but I think it's logical. WPA enterprise differs from WPA-PSK (pre-shared
key) in that the key is changed periodically, and SSL is used when keys are
changed (via wpa_supplicant). So it looks like 32-bit openssl generates object
code that triggers a bug in the CPU, which crashes other processes.

Please forward details about this issue to [email protected] and to the
appropriate vendor engineer in a non-public manner, as this hw bug could
possibly be exploitable (a hardware bug cannot be fixed, but the kernel could
disable the affected functionality or use some other workaround).

Thanks
Stanislaw