2008-09-11 08:04:54

by Marcel Holtmann

Subject: Crash with 2.6.27-rc6 with iwlwifi



Attachments:
IMG_0124.JPG (540.64 kB)

2008-09-25 04:29:43

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Thu, 2008-09-25 at 01:00 +0200, Ian Schram wrote:

> > Incidentally, if this really is the same bug I have, you might want to
> > remove the iwlagn module completely; even loading it once is a bad
> > thing. It has started to corrupt things here entirely by making DMA go
> > to random locations.
>
> With the release of -rc7 I was starting to wonder what had happened to this bug.
> I'm not affected by it (other hardware), but from what I gather it's a serious problem?
>
> I have always wondered what people affected by this would find if they reverted the
> "memory allocation optimization" patch:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=da99c4b6c25964b90c79f19beccda208df1a865a
>
> Apologies if this is too random a stab in the dark; it just sounds like we might want to fix this
> before .27. And since there seem to be no git bisects that pinpoint the problem,
> I figured I might as well send this.

I haven't really pondered bisecting since it happens for me with iwl5000
hardware which wasn't supported before... Because this bug was affecting
my machine quite a bit I've now removed the wireless card from it for
the time being. I still think it should be easy to reproduce by using a
machine with a good amount of memory (I have 6.5G) and enabling
CONFIG_SLUB_DEBUG, which seems to have at least some part in it.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-09-11 15:46:44

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Thu, 2008-09-11 at 17:03 +0200, Marcel Holtmann wrote:

> > That looks like the use-after-free BUG_ON I've been getting with a 5k
> > card, do you have sl*b debugging enabled?
>
> I have CONFIG_SLUB_DEBUG=y in this kernel, along with all sorts of other
> kernel debugging features. Is there anything else I would need to give
> better feedback?

No, I have no idea, but maybe Tomas doesn't have them enabled? The
message right before the BUG_ON says something about 0x6b... I have no
idea what's causing this, just saying that it's most likely the same
thing and I was able to capture the full message.
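Background on the 0x6b hint: when SLUB debugging is active for a cache, freed objects are overwritten with the byte 0x6b (POISON_FREE in include/linux/poison.h), so a pointer loaded from freed memory reads as 0x6b6b6b6b6b6b6b6b on a 64-bit kernel. The following is only a userspace sketch of that mechanism, not SLUB code; `struct cmd`, `poison_on_free()` and `looks_poisoned()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Byte pattern written over freed objects; same value as POISON_FREE
 * in the kernel's include/linux/poison.h. */
#define POISON_FREE 0x6b

/* Hypothetical object holding a pointer, e.g. a queued command. */
struct cmd {
	void *skb;
	uint32_t flags;
};

/* "Free" with poisoning: overwrite the object with POISON_FREE so any
 * stale reader sees a recognisable value. The memory is deliberately not
 * released here so that the stale read can be demonstrated safely. */
static void poison_on_free(struct cmd *c)
{
	assert(c != NULL);
	memset(c, POISON_FREE, sizeof(*c));
}

/* Nonzero if a pointer value consists entirely of POISON_FREE bytes,
 * i.e. it was most likely loaded from a freed, poisoned object. */
static int looks_poisoned(const void *p)
{
	uintptr_t v = (uintptr_t)p;
	uintptr_t pattern;

	memset(&pattern, POISON_FREE, sizeof(pattern));
	return v == pattern;
}
```

Code that keeps using such an object after it was freed ends up dereferencing that poisoned pointer, which is why a use-after-free under slub debugging tends to oops with 0x6b bytes in the faulting address or registers.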

johannes



2008-09-11 23:47:11

by Marcel Holtmann

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Yi,

> It should be triggered by the BUG_ON() in the function. Are you able to
> confirm it with netconsole or maybe even a frame buffer enabled console
> (it just shows more lines)? If this is the culprit, Tomas has already
> been working on it. But your BT coexistence finding should help us
> reproduce the bug.
> http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703

so I was looking through the bug and switched some options around based
on wild assumptions made in that bug. The current settings are this:

CONFIG_IWLWIFI=m
CONFIG_IWLCORE=m
# CONFIG_IWLWIFI_LEDS is not set
CONFIG_IWLWIFI_RFKILL=y
# CONFIG_IWLWIFI_DEBUG is not set
CONFIG_IWLAGN=m
# CONFIG_IWLAGN_SPECTRUM_MEASUREMENT is not set
# CONFIG_IWLAGN_LEDS is not set
CONFIG_IWL4965=y
# CONFIG_IWL5000 is not set
# CONFIG_IWL3945 is not set

these seem to be stable and have no problems. I tried to crash the
kernel, but didn't manage it.

Previously I had IWLWIFI_RFKILL off (which was an oversight) and
IWLAGN_LEDS on.

So my guess is that either disabling RFKILL or enabling LEDS makes the
difference. This is another reason not to have all these config options,
because nobody finds any bugs :(

Regards

Marcel



2008-09-12 02:10:02

by Marcel Holtmann

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Yi,

> > It should be triggered by the BUG_ON() in the function. Are you able to
> > confirm it with netconsole or maybe even a frame buffer enabled console
> > (it just shows more lines)? If this is the culprit, Tomas has already
> > been working on it. But your BT coexistence finding should help us
> > reproduce the bug.
> > http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703
>
> so I was looking through the bug and switched some options around based
> on wild assumptions made in that bug. The current settings are this:
>
> CONFIG_IWLWIFI=m
> CONFIG_IWLCORE=m
> # CONFIG_IWLWIFI_LEDS is not set
> CONFIG_IWLWIFI_RFKILL=y
> # CONFIG_IWLWIFI_DEBUG is not set
> CONFIG_IWLAGN=m
> # CONFIG_IWLAGN_SPECTRUM_MEASUREMENT is not set
> # CONFIG_IWLAGN_LEDS is not set
> CONFIG_IWL4965=y
> # CONFIG_IWL5000 is not set
> # CONFIG_IWL3945 is not set
>
> these seem to be stable and have no problems. I tried to crash the
> kernel, but didn't manage it.
>
> Previously I had IWLWIFI_RFKILL off (which was an oversight) and
> IWLAGN_LEDS on.
>
> So my guess is that either disabling RFKILL or enabling LEDS makes the
> difference. This is another reason not to have all these config options,
> because nobody finds any bugs :(

I take this back. It took a couple of hours, but it crashed again.

Regards

Marcel



2008-09-12 07:50:27

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Fri, 2008-09-12 at 04:10 +0200, Marcel Holtmann wrote:

> I take this back. It took a couple of hours, but it crashed again.

That's kinda weird, for me it crashes instantly on the first packet. The
oops still looks similar though.

johannes



2008-09-11 09:02:55

by Zhu Yi

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Marcel!

On Thu, 2008-09-11 at 10:04 +0200, Marcel Holtmann wrote:
> However I tried to reproduce it without Bluetooth enabled, it didn't
> show or I didn't wait long enough. Once I started bluetoothd it showed
> within a few minutes.
>
> So the best I got was the attached screenshot of an oops. It is
> clearly the iwlwifi driver crashing here and killing the machine, but
> I have no idea why.

From the screenshot line:

iwl_tx_cmd_complete+0x43/0x245

It should be triggered by the BUG_ON() in the function. Are you able to
confirm it with netconsole or maybe even a frame buffer enabled console
(it just shows more lines)? If this is the culprit, Tomas has already
been working on it. But your BT coexistence finding should help us
reproduce the bug.
http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703

Thanks,
-yi


2008-09-24 23:00:20

by Ian Schram

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi



Johannes Berg wrote:
> On Thu, 2008-09-11 at 10:04 +0200, Marcel Holtmann wrote:
>
>> So the best I got was the attached screenshot of an oops. It is
>> clearly the iwlwifi driver crashing here and killing the machine, but
>> I have no idea why.
>
> Incidentally, if this really is the same bug I have, you might want to
> remove the iwlagn module completely; even loading it once is a bad
> thing. It has started to corrupt things here entirely by making DMA go
> to random locations.

With the release of -rc7 I was starting to wonder what had happened to this bug.
I'm not affected by it (other hardware), but from what I gather it's a serious problem?

I have always wondered what people affected by this would find if they reverted the
"memory allocation optimization" patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=da99c4b6c25964b90c79f19beccda208df1a865a

Apologies if this is too random a stab in the dark; it just sounds like we might want to fix this
before .27. And since there seem to be no git bisects that pinpoint the problem,
I figured I might as well send this.

ian

>
> johannes

2008-09-11 10:37:03

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Thu, 2008-09-11 at 10:04 +0200, Marcel Holtmann wrote:
> Hi guys,
>
> so for the last few 2.6.27 kernels I have had an issue when using my 4965
> wireless card. So the machine in question is an X61 and when enabling
> WiFi and Bluetooth at the same time it will freeze hard after a few
> minutes. It seems activating Bluetooth has something to do with it,
> but I have been using the machine for Bluetooth qualification testing
> (with WiFi disabled) and everything works fine.
>
> However I tried to reproduce it without Bluetooth enabled, it didn't
> show or I didn't wait long enough. Once I started bluetoothd it showed
> within a few minutes.
>
> So the best I got was the attached screenshot of an oops. It is
> clearly the iwlwifi driver crashing here and killing the machine, but
> I have no idea why.

That looks like the use-after-free BUG_ON I've been getting with a 5k
card, do you have sl*b debugging enabled?

johannes



2008-09-24 20:15:31

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Thu, 2008-09-11 at 10:04 +0200, Marcel Holtmann wrote:

> So the best I got was the attached screenshot of an oops. It is
> clearly the iwlwifi driver crashing here and killing the machine, but
> I have no idea why.

Incidentally, if this really is the same bug I have, you might want to
remove the iwlagn module completely; even loading it once is a bad
thing. It has started to corrupt things here entirely by making DMA go
to random locations.

johannes



2008-09-11 15:06:12

by Marcel Holtmann

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Yi,

> > However I tried to reproduce it without Bluetooth enabled, it didn't
> > show or I didn't wait long enough. Once I started bluetoothd it showed
> > within a few minutes.
> >
> > So the best I got was the attached screenshot of an oops. It is
> > clearly the iwlwifi driver crashing here and killing the machine, but
> > I have no idea why.
>
> From the screenshot line:
>
> iwl_tx_cmd_complete+0x43/0x245
>
> It should be triggered by the BUG_ON() in the function. Are you able to
> confirm it with netconsole or maybe even a frame buffer enabled console
> (it just shows more lines)? If this is the culprit, Tomas has already
> been working on it. But your BT coexistence finding should help us
> reproduce the bug.
> http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703

right now the screenshot is the best I can do. This is an X61, so I am
limited in what I can give you. Getting the screenshot was tricky enough,
since most of the time it just crashes in the background, and getting
this reproduced on the console is not as simple as I thought. When
working inside GNOME, it crashes after anywhere between 30 minutes and
2 hours without any warning.

I am not sure if I am the only one with this issue, but it looks like a
serious regression that needs to be fixed before 2.6.27 goes out.

Regards

Marcel



2008-09-11 15:02:34

by Marcel Holtmann

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Johannes,

> > so for the last few 2.6.27 kernels I have had an issue when using my 4965
> > wireless card. So the machine in question is an X61 and when enabling
> > WiFi and Bluetooth at the same time it will freeze hard after a few
> > minutes. It seems activating Bluetooth has something to do with it,
> > but I have been using the machine for Bluetooth qualification testing
> > (with WiFi disabled) and everything works fine.
> >
> > However I tried to reproduce it without Bluetooth enabled, it didn't
> > show or I didn't wait long enough. Once I started bluetoothd it showed
> > within a few minutes.
> >
> > So the best I got was the attached screenshot of an oops. It is
> > clearly the iwlwifi driver crashing here and killing the machine, but
> > I have no idea why.
>
> That looks like the use-after-free BUG_ON I've been getting with a 5k
> card, do you have sl*b debugging enabled?

I have CONFIG_SLUB_DEBUG=y in this kernel, along with all sorts of other
kernel debugging features. Is there anything else I would need to give
better feedback?

Regards

Marcel



2008-10-06 16:52:34

by Luis R. Rodriguez

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, Oct 6, 2008 at 5:39 AM, Johannes Berg <[email protected]> wrote:
> On Mon, 2008-10-06 at 14:36 +0200, Marcel Holtmann wrote:
>> Hi Johannes,
>>
>> > > I take this back. It took a couple of hours, but it crashed again.
>> >
>> > Do you have 64k pages enabled? My issue goes away entirely when I
>> > disable 64k pages.
>>
>> so the initial issue was a BUG_ON when the ucode reports back an
>> unsupported event. That should have been fixed by now since we just
>> can't crash the whole kernel, because some piece of firmware acts bogus
>> now and then.
>
> I have a different interpretation of the original issue, since afaik the
> value it was doing a BUG_ON on is a value that is only passed in by the
> driver and copied by the ucode to the response, hence it shouldn't
> actually crash unless the driver is passing bogus stuff in or, as it
> seems to be the case here, the DMA programming is completely fucked and
> the ucode just goes to write to random memory locations.

Seems like a good enough argument to call for it to be opened up.

Luis

2008-10-06 13:05:10

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 15:01 +0200, Marcel Holtmann wrote:

> > I have a different interpretation of the original issue, since afaik the
> > value it was doing a BUG_ON on is a value that is only passed in by the
> > driver and copied by the ucode to the response, hence it shouldn't
> > actually crash unless the driver is passing bogus stuff in or, as it
> > seems to be the case here, the DMA programming is completely fucked and
> > the ucode just goes to write to random memory locations.
>
> that patch wasn't enough, but nevertheless we shouldn't crash the whole
> kernel based on this.

Well, I am still undecided on that; it appears that when the issue happens
the ucode may have written to arbitrary memory locations, if you don't
have an iommu catching it then it may have corrupted anything in your
kernel, so just crashing may be the only way to reasonably recover... I
actually wrote the patch though to make it not crash, so I can hardly
claim I didn't think it was a good idea to change that. But I think I've
changed my mind based on further debugging of the issue.

> In addition they were working on some IOMMU stuff and that explains what
> you are seeing. I never got that far at all. My machine just dies on me.

Well, yes, without the patch my machine dies when loading iwlagn.

> > > However there seem to be some other issues with me
> > > running a 64-bit OS and connecting to an N-capable access point.
> >
> > Strange. My test AP isn't N capable though.
>
> We had some discussion about the possibility that different firmware versions
> might be causing this. Don't have my X61 handy at the moment.

I don't even know which one I'm using, I think it's some firmware Tomas gave me.

johannes



2008-10-06 12:40:35

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 14:36 +0200, Marcel Holtmann wrote:
> Hi Johannes,
>
> > > I take this back. It took a couple of hours, but it crashed again.
> >
> > Do you have 64k pages enabled? My issue goes away entirely when I
> > disable 64k pages.
>
> so the initial issue was a BUG_ON when the ucode reports back an
> unsupported event. That should have been fixed by now since we just
> can't crash the whole kernel, because some piece of firmware acts bogus
> now and then.

I have a different interpretation of the original issue, since afaik the
value it was doing a BUG_ON on is a value that is only passed in by the
driver and copied by the ucode to the response, hence it shouldn't
actually crash unless the driver is passing bogus stuff in or, as it
seems to be the case here, the DMA programming is completely fucked and
the ucode just goes to write to random memory locations.
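A minimal userspace sketch of the invariant described above, under stated assumptions: the driver stamps each host command with a value encoding its queue and slot, the firmware copies it unchanged into the response, and the completion handler checks it. The bit layout, the `CMD_QUEUE_NUM` value and the function names are hypothetical and are not the actual iwlwifi definitions. This version reports a mismatch and returns an error; the BUG_ON approach would crash at that point instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: low 8 bits = slot index, next 5 bits = queue id. */
#define SEQ_INDEX_MASK  0x00ffu
#define SEQ_QUEUE_MASK  0x1f00u
#define SEQ_QUEUE_SHIFT 8
#define CMD_QUEUE_NUM   4    /* hypothetical command queue number */
#define QUEUE_SLOTS     256

/* Driver side: build the value handed to the firmware with a command. */
static uint16_t make_seq(unsigned queue, unsigned index)
{
	assert(queue < 32 && index < QUEUE_SLOTS);
	return (uint16_t)((queue << SEQ_QUEUE_SHIFT) | index);
}

/* Completion side: validate the value echoed back by the firmware.
 * Returns the slot index, or -1 if the value cannot have come from this
 * driver, e.g. because the response landed in memory the driver never
 * set up for it. */
static int check_cmd_complete(uint16_t seq)
{
	unsigned queue = (seq & SEQ_QUEUE_MASK) >> SEQ_QUEUE_SHIFT;
	unsigned index = seq & SEQ_INDEX_MASK;

	if (queue != CMD_QUEUE_NUM) {
		fprintf(stderr, "wrong command queue %u (seq 0x%04x)\n",
			queue, (unsigned)seq);
		return -1;
	}
	return (int)index;
}
```

A value such as 0x6b6b (poisoned memory) or one written by misdirected DMA fails the check. Whether to drop such a response or stop the machine before corrupted memory does more damage is exactly the point under debate here.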

> However there seem to be some other issues with me
> running a 64-bit OS and connecting to an N-capable access point.

Strange. My test AP isn't N capable though.

johannes



2008-10-06 12:33:12

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 14:30 +0200, Johannes Berg wrote:
> On Fri, 2008-09-12 at 04:10 +0200, Marcel Holtmann wrote:
>
> > I take this back. It took a couple of hours, but it crashed again.
>
> Do you have 64k pages enabled? My issue goes away entirely when I
> disable 64k pages.

Well, clearly not entirely. But it's holding up much better now.

[ 325.379292] iommu_free: invalid entry
[ 325.379309] entry = 0x848c0
[ 325.379315] dma_addr = 0x848c0100
[ 325.379321] Table = 0xc00000000083f348
[ 325.379328] bus# = 0x0
[ 325.379333] size = 0x80000
[ 325.379339] startOff = 0x0
[ 325.379345] index = 0x0
[ 325.379356] ------------[ cut here ]------------
[ 325.379364] Badness at arch/powerpc/kernel/iommu.c:258
[ 325.379372] NIP: c000000000021b58 LR: c000000000021b54 CTR: c0000000000473d0
[ 325.379382] REGS: c00000021612ef60 TRAP: 0700 Tainted: G W (2.6.27-rc6-wl-01382-g0bea1f7-dirty)
[ 325.379390] MSR: 9000000000029032 <EE,ME,IR,DR> CR: 48ff4f84 XER: 000fffff
[ 325.379427] TASK = c0000002161049a0[0] 'swapper' THREAD: c00000021612c000 CPU: 1
[ 325.379439] GPR00: c000000000021b54 c00000021612f1e0 c00000000080bb60 0000000000000023
[ 325.379463] GPR04: 0000000000000001 c00000000003e18c 0000000000000000 0000000000000002
[ 325.379488] GPR08: 0000000000000000 c00000021612c000 0000000000000001 0000000000000001
[ 325.379512] GPR12: 00000000000186a0 c00000000083c500 c000000000011fe4 c00000021612c080
[ 325.379536] GPR16: 0000000000000000 c00000020cb85ca0 c00000021612c000 c00000020cb85cb8
[ 325.379561] GPR20: 0000000000000001 c00000020cb91d80 0000000000000019 0000000000000008
[ 325.379585] GPR24: c00000020cb91d80 c00000020cb93ff8 00000000848c0100 0000000000000001
[ 325.379609] GPR28: 00000000000848c0 c00000000083f348 c00000000079ca00 00000000000848c0
[ 325.379647] NIP [c000000000021b58] .__iommu_free+0xe4/0x150
[ 325.379656] LR [c000000000021b54] .__iommu_free+0xe0/0x150
[ 325.379663] Call Trace:
[ 325.379672] [c00000021612f1e0] [c000000000021b54] .__iommu_free+0xe0/0x150 (unreliable)
[ 325.379690] [c00000021612f280] [c000000000021c1c] .iommu_free+0x58/0xc0
[ 325.379705] [c00000021612f320] [c000000000021978] .dma_iommu_unmap_single+0x14/0x28
[ 325.379745] [c00000021612f390] [d00000000035cc3c] .iwl_tx_cmd_complete+0x364/0x410 [iwlcore]
[ 325.379779] [c00000021612f440] [d0000000003218c8] .iwl_rx_handle+0x32c/0x4a0 [iwlagn]
[ 325.379805] [c00000021612f540] [d0000000003223b8] .iwl4965_irq_tasklet+0x97c/0xccc [iwlagn]
[ 325.379820] [c00000021612f600] [c000000000055240] .tasklet_action+0x14c/0x244
[ 325.379833] [c00000021612f6b0] [c000000000056130] .__do_softirq+0xd8/0x1c4
[ 325.379848] [c00000021612f760] [c00000000000c2b8] .do_softirq+0x5c/0xb8
[ 325.379862] [c00000021612f7e0] [c000000000055b6c] .irq_exit+0x74/0xe0
[ 325.379876] [c00000021612f860] [c00000000000c41c] .do_IRQ+0x108/0x14c
[ 325.379889] [c00000021612f8f0] [c000000000004794] hardware_interrupt_entry+0x1c/0x20
[ 325.379909] --- Exception: 501 at .raw_local_irq_restore+0x3c/0x40
[ 325.379911] LR = ._spin_unlock_irq+0x44/0x80
[ 325.379922] [c00000021612fbe0] [c0000000003ed864] ._spin_unlock_irq+0x38/0x80 (unreliable)
[ 325.379941] [c00000021612fc70] [c000000000045fcc] .finish_task_switch+0x70/0x14c
[ 325.379984] [c00000021612fd10] [c0000000003ea008] .schedule+0x94c/0xa28
[ 325.380000] [c00000021612fe30] [c000000000011fe4] .cpu_idle+0x1ec/0x200
[ 325.380014] [c00000021612fed0] [c0000000003f4b78] .start_secondary+0x36c/0x3a8
[ 325.380030] [c00000021612ff90] [c0000000000073c0] .start_secondary_prolog+0xc/0x10
[ 325.380045] Instruction dump:
[ 325.380057] e89d0008 e87e8050 483d2695 60000000 e89d0010 e87e8058 483d2685 60000000
[ 325.380108] e87e8060 e89d0020 483d2675 60000000 <0fe00000> 48000040 e93e8068 7fe4fb78
[ 370.135933] iommu_free: invalid entry
[ 370.136005] entry = 0xcb640
[ 370.136011] dma_addr = 0xcb640100
[ 370.136017] Table = 0xc00000000083f348
[ 370.136023] bus# = 0x0
[ 370.136029] size = 0x80000
[ 370.136035] startOff = 0x0
[ 370.136041] index = 0x0
[ 370.136052] ------------[ cut here ]------------
[ 370.136060] Badness at arch/powerpc/kernel/iommu.c:258
[ 370.136068] NIP: c000000000021b58 LR: c000000000021b54 CTR: c0000000000473d0
[ 370.136078] REGS: c000000211276da0 TRAP: 0700 Tainted: G W (2.6.27-rc6-wl-01382-g0bea1f7-dirty)
[ 370.136086] MSR: 9000000000029032 <EE,ME,IR,DR> CR: 28004884 XER: 000fffff
[ 370.136122] TASK = c0000002112424d0[3560] 'nmbd' THREAD: c000000211274000 CPU: 1
[ 370.136133] GPR00: c000000000021b54 c000000211277020 c00000000080bb60 0000000000000023
[ 370.136157] GPR04: 0000000000000001 c00000000003e18c 0000000000000000 0000000000000002
[ 370.136181] GPR08: 0000000000000000 c000000211274000 0000000000000001 0000000000000000
[ 370.136205] GPR12: 00000000000186a0 c00000000083c500 000000000fc9e1c4 000000000000000e
[ 370.136229] GPR16: 0000000000000000 c00000020cb85ca0 c000000211274000 c00000020cb85cb8
[ 370.136252] GPR20: 0000000000000001 c00000020cb91d80 00000000000000ad 0000000000000008
[ 370.136276] GPR24: c00000020cb91d80 c00000020cb93ff8 00000000cb640100 0000000000000001
[ 370.136300] GPR28: 00000000000cb640 c00000000083f348 c00000000079ca00 00000000000cb640
[ 370.136335] NIP [c000000000021b58] .__iommu_free+0xe4/0x150
[ 370.136345] LR [c000000000021b54] .__iommu_free+0xe0/0x150
[ 370.136352] Call Trace:
[ 370.136361] [c000000211277020] [c000000000021b54] .__iommu_free+0xe0/0x150 (unreliable)
[ 370.136378] [c0000002112770c0] [c000000000021c1c] .iommu_free+0x58/0xc0
[ 370.136392] [c000000211277160] [c000000000021978] .dma_iommu_unmap_single+0x14/0x28
[ 370.136432] [c0000002112771d0] [d00000000035cc3c] .iwl_tx_cmd_complete+0x364/0x410 [iwlcore]
[ 370.136465] [c000000211277280] [d0000000003218c8] .iwl_rx_handle+0x32c/0x4a0 [iwlagn]
[ 370.136491] [c000000211277380] [d0000000003223b8] .iwl4965_irq_tasklet+0x97c/0xccc [iwlagn]
[ 370.136506] [c000000211277440] [c000000000055240] .tasklet_action+0x14c/0x244
[ 370.136519] [c0000002112774f0] [c000000000056130] .__do_softirq+0xd8/0x1c4
[ 370.136533] [c0000002112775a0] [c00000000000c2b8] .do_softirq+0x5c/0xb8
[ 370.136546] [c000000211277620] [c000000000055b6c] .irq_exit+0x74/0xe0
[ 370.136560] [c0000002112776a0] [c00000000001e1d4] .timer_interrupt+0xe4/0x12c
[ 370.136573] [c000000211277740] [c000000000003614] decrementer_common+0x114/0x180
[ 370.136592] --- Exception: 901 at .kmem_cache_free+0x148/0x178
[ 370.136595] LR = .kmem_cache_free+0x148/0x178
[ 370.136605] [c000000211277a30] [c0000000000ecae8] .kmem_cache_free+0x13c/0x178 (unreliable)
[ 370.136624] [c000000211277ae0] [c0000000000fb720] .putname+0x40/0x64
[ 370.136638] [c000000211277b60] [c0000000000fdddc] .user_path_at+0x70/0xb0
[ 370.136651] [c000000211277c90] [c0000000000f42b4] .vfs_stat_fd+0x2c/0x78
[ 370.136664] [c000000211277d30] [c0000000000f4474] .sys_stat64+0x24/0x54
[ 370.136677] [c000000211277e30] [c0000000000076d4] syscall_exit+0x0/0x40
[ 370.136688] Instruction dump:
[ 370.136725] e89d0008 e87e8050 483d2695 60000000 e89d0010 e87e8058 483d2685 60000000
[ 370.136764] e87e8060 e89d0020 483d2675 60000000 <0fe00000> 48000040 e93e8068 7fe4fb78
[ 429.724965] iommu_free: invalid entry
[ 429.725026] entry = 0xc00c0
[ 429.725032] dma_addr = 0xc00c0100
[ 429.725038] Table = 0xc00000000083f348
[ 429.725045] bus# = 0x0
[ 429.725051] size = 0x80000
[ 429.725057] startOff = 0x0
[ 429.725063] index = 0x0
[ 429.725073] ------------[ cut here ]------------
[ 429.725081] Badness at arch/powerpc/kernel/iommu.c:258
[ 429.725089] NIP: c000000000021b58 LR: c000000000021b54 CTR: c0000000000473d0
[ 429.725100] REGS: c00000021612f1b0 TRAP: 0700 Tainted: G W (2.6.27-rc6-wl-01382-g0bea1f7-dirty)
[ 429.725108] MSR: 9000000000029032 <EE,ME,IR,DR> CR: 48ff4f84 XER: 000fffff
[ 429.725145] TASK = c0000002161049a0[0] 'swapper' THREAD: c00000021612c000 CPU: 1
[ 429.725157] GPR00: c000000000021b54 c00000021612f430 c00000000080bb60 0000000000000023
[ 429.725182] GPR04: 0000000000000001 c00000000003e18c 0000000000000000 0000000000000002
[ 429.725206] GPR08: 0000000000000000 c00000021612c000 0000000000000001 0000000000000001
[ 429.725231] GPR12: 00000000000186a0 c00000000083c500 00000000f8000000 0000000000000000
[ 429.725255] GPR16: 0000000000000000 c00000020cb85ca0 c00000021612c000 c00000020cb85cb8
[ 429.725280] GPR20: 0000000000000001 c00000020cb91d80 000000000000000d 0000000000000008
[ 429.725304] GPR24: c00000020cb91d80 c00000020cb93ff8 00000000c00c0100 0000000000000001
[ 429.725329] GPR28: 00000000000c00c0 c00000000083f348 c00000000079ca00 00000000000c00c0
[ 429.725365] NIP [c000000000021b58] .__iommu_free+0xe4/0x150
[ 429.725375] LR [c000000000021b54] .__iommu_free+0xe0/0x150
[ 429.725382] Call Trace:
[ 429.725391] [c00000021612f430] [c000000000021b54] .__iommu_free+0xe0/0x150 (unreliable)
[ 429.725409] [c00000021612f4d0] [c000000000021c1c] .iommu_free+0x58/0xc0
[ 429.725424] [c00000021612f570] [c000000000021978] .dma_iommu_unmap_single+0x14/0x28
[ 429.725464] [c00000021612f5e0] [d00000000035cc3c] .iwl_tx_cmd_complete+0x364/0x410 [iwlcore]
[ 429.725497] [c00000021612f690] [d0000000003218c8] .iwl_rx_handle+0x32c/0x4a0 [iwlagn]
[ 429.725523] [c00000021612f790] [d0000000003223b8] .iwl4965_irq_tasklet+0x97c/0xccc [iwlagn]
[ 429.725539] [c00000021612f850] [c000000000055240] .tasklet_action+0x14c/0x244
[ 429.725552] [c00000021612f900] [c000000000056130] .__do_softirq+0xd8/0x1c4
[ 429.725567] [c00000021612f9b0] [c00000000000c2b8] .do_softirq+0x5c/0xb8
[ 429.725580] [c00000021612fa30] [c000000000055b6c] .irq_exit+0x74/0xe0
[ 429.725594] [c00000021612fab0] [c00000000000c41c] .do_IRQ+0x108/0x14c
[ 429.725608] [c00000021612fb40] [c000000000004794] hardware_interrupt_entry+0x1c/0x20
[ 429.725625] --- Exception: 501 at .cpu_idle+0x118/0x200
[ 429.725627] LR = .cpu_idle+0x118/0x200
[ 429.725638] [c00000021612fe30] [c000000000011ec8] .cpu_idle+0xd0/0x200 (unreliable)
[ 429.725658] [c00000021612fed0] [c0000000003f4b78] .start_secondary+0x36c/0x3a8
[ 429.725671] [c00000021612ff90] [c0000000000073c0] .start_secondary_prolog+0xc/0x10
[ 429.725682] Instruction dump:
[ 429.725693] e89d0008 e87e8050 483d2695 60000000 e89d0010 e87e8058 483d2685 60000000
[ 429.725761] e87e8060 e89d0020 483d2675 60000000 <0fe00000> 48000040 e93e8068 7fe4fb78



2008-10-06 16:56:28

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 09:52 -0700, Luis R. Rodriguez wrote:

> > I have a different interpretation of the original issue, since afaik the
> > value it was doing a BUG_ON on is a value that is only passed in by the
> > driver and copied by the ucode to the response, hence it shouldn't
> > actually crash unless the driver is passing bogus stuff in or, as it
> > seems to be the case here, the DMA programming is completely fucked and
> > the ucode just goes to write to random memory locations.
>
> Seems like a good enough argument to call for it to be opened up.

But there's a chance that the DMA programming being fucked up will make
it scribble over your filesystem buffers. So it's surely safer to
BUG_ON.

johannes



2008-10-06 12:37:01

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 14:32 +0200, Johannes Berg wrote:
> On Mon, 2008-10-06 at 14:30 +0200, Johannes Berg wrote:
> > On Fri, 2008-09-12 at 04:10 +0200, Marcel Holtmann wrote:
> >
> > > I take this back. It took a couple of hours, but it crashed again.
> >
> > Do you have 64k pages enabled? My issue goes away entirely when I
> > disable 64k pages.
>
> Well, clearly not entirely. But it's holding up much better now.
>
> [ 325.379292] iommu_free: invalid entry
> [ 325.379309] entry = 0x848c0
> [ 325.379315] dma_addr = 0x848c0100
> [ 325.379321] Table = 0xc00000000083f348
> [ 325.379328] bus# = 0x0
> [ 325.379333] size = 0x80000
> [ 325.379339] startOff = 0x0
> [ 325.379345] index = 0x0

and that keeps happening while I use wireless.

johannes



2008-10-06 12:36:13

by Marcel Holtmann

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Johannes,

> > I take this back. It took a couple of hours, but it crashed again.
>
> Do you have 64k pages enabled? My issue goes away entirely when I
> disable 64k pages.

so the initial issue was a BUG_ON when the ucode reports back an
unsupported event. That should have been fixed by now since we just
can't crash the whole kernel, because some piece of firmware acts bogus
now and then. However there seem to be some other issues with me
running a 64-bit OS and connecting to an N-capable access point.

Regards

Marcel



2008-10-06 13:08:06

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 14:32 +0200, Johannes Berg wrote:
> On Mon, 2008-10-06 at 14:30 +0200, Johannes Berg wrote:
> > On Fri, 2008-09-12 at 04:10 +0200, Marcel Holtmann wrote:
> >
> > > I take this back. It took a couple of hours, but it crashed again.
> >
> > Do you have 64k pages enabled? My issue goes away entirely when I
> > disable 64k pages.
>
> Well, clearly not entirely. But it's holding up much better now.
>
> [ 325.379292] iommu_free: invalid entry

I added two printks to make it easier to look at:

[ 736.879335] iommu_free: invalid entry
[ 736.890905] free_entry= 0xd6ec0
[ 736.902494] npages = 0x1
[ 736.913864] entry = 0xd6ec0
[ 736.925100] dma_addr = 0xd6ec0100
[ 736.936116] Table = 0xc00000000083f348
[ 736.947078] bus# = 0x0
[ 736.957934] size = 0x80000
[ 736.968775] startOff = 0x0
[ 736.979574] index = 0x0
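Decoding the dump, assuming 4 KiB IOMMU pages (a shift of 12): `entry` is `dma_addr >> 12`, so 0xd6ec0100 gives 0xd6ec0 (880320), while the table has `size` = 0x80000 (524288) entries starting at `startOff` = 0, i.e. a 2 GiB DMA window. The address being freed therefore lies outside the table altogether. The sketch below expresses that range check; the struct and function names are hypothetical, modelled on the printed fields rather than copied from arch/powerpc/kernel/iommu.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOMMU_PAGE_SHIFT 12  /* assumption: 4 KiB IOMMU pages */

/* Hypothetical table description, fields named after the dump. */
struct iommu_table_info {
	uint64_t size;       /* number of entries ("size") */
	uint64_t start_off;  /* first entry ("startOff") */
};

/* Translate a bus address into an IOMMU table entry number. */
static uint64_t dma_to_entry(uint64_t dma_addr)
{
	return dma_addr >> IOMMU_PAGE_SHIFT;
}

/* True if freeing npages starting at dma_addr stays inside the table. */
static bool free_in_range(const struct iommu_table_info *tbl,
			  uint64_t dma_addr, uint64_t npages)
{
	uint64_t entry = dma_to_entry(dma_addr);

	assert(npages > 0);
	if (entry < tbl->start_off)
		return false;
	return entry - tbl->start_off + npages <= tbl->size;
}
```

All of the bad addresses in this thread (0x848c0100, 0xcb640100, 0xc00c0100, 0x991c0100) decode to entries above 0x80000, whereas the addresses returned by iommu_map_single in the later trace (e.g. 0x10ef2000, entry 0x10ef2) fall well inside the table.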

johannes



2008-10-06 13:21:53

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Mon, 2008-10-06 at 15:07 +0200, Johannes Berg wrote:
> On Mon, 2008-10-06 at 14:32 +0200, Johannes Berg wrote:
> > On Mon, 2008-10-06 at 14:30 +0200, Johannes Berg wrote:
> > > On Fri, 2008-09-12 at 04:10 +0200, Marcel Holtmann wrote:
> > >
> > > > I take this back. It took a couple of hours, but it crashed again.
> > >
> > > Do you have 64k pages enabled? My issue goes away entirely when I
> > > disable 64k pages.
> >
> > Well, clearly not entirely. But it's holding up much better now.
> >
> > [ 325.379292] iommu_free: invalid entry
>
> I added two printks to make it easier to look at:

adding printk to iommu_map_single, we see that the address was never
even mapped!

[...]
[ 297.544692] iommu_map_single = 10ef2000
[ 297.544711] iommu_map_single = 10ef4000
[ 297.646975] iommu_map_single = 10ef6920
[ 297.748983] iommu_map_single = 10ef7b68
[ 297.953110] iommu_map_single = 10ef8db0
[ 297.953229] iommu_free: invalid entry
[ 297.953245] free_entry= 0x991c0
[ 297.953256] npages = 0x1
[ 297.953268] entry = 0x991c0
[ 297.953279] dma_addr = 0x991c0100
[ 297.953290] Table = 0xc00000000083f348
[ 297.953302] bus# = 0x0
[ 297.953313] size = 0x80000
[ 297.953324] startOff = 0x0
[ 297.953333] index = 0x0
[ 297.953349] ------------[ cut here ]------------
[ 297.953361] Badness at arch/powerpc/kernel/iommu.c:260

clearly, that can't be right since iommu_map_single has been returning
increasing numbers up to this point, and a dma_addr of 0x991c0100 wasn't
among them.
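The check done by hand here, logging every address returned by iommu_map_single and then noticing that the freed address was never among them, can be automated with a small table of live mappings. This is a hypothetical userspace sketch (fixed-size array, linear search, made-up `track_map()`/`track_unmap()` names), not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIVE 64  /* hypothetical cap on outstanding mappings */

struct map_tracker {
	uint64_t addr[MAX_LIVE];
	size_t count;
};

/* Record an address returned by the mapping call. */
static bool track_map(struct map_tracker *t, uint64_t dma_addr)
{
	assert(t != NULL);
	if (t->count == MAX_LIVE)
		return false;
	t->addr[t->count++] = dma_addr;
	return true;
}

/* Forget an address on unmap. Returns false if it was never mapped (or
 * was already unmapped), which is the situation shown in the log. */
static bool track_unmap(struct map_tracker *t, uint64_t dma_addr)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++) {
		if (t->addr[i] == dma_addr) {
			/* swap-remove: order of live entries does not matter */
			t->addr[i] = t->addr[--t->count];
			return true;
		}
	}
	return false;
}
```

Later kernels gained a generic version of this bookkeeping (CONFIG_DMA_API_DEBUG), which warns when a driver unmaps DMA memory it never mapped.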

johannes



2008-10-06 12:30:54

by Johannes Berg

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

On Fri, 2008-09-12 at 04:10 +0200, Marcel Holtmann wrote:

> I take this back. It took a couple of hours, but it crashed again.

Do you have 64k pages enabled? My issue goes away entirely when I
disable 64k pages.

johannes



2008-10-06 13:01:13

by Marcel Holtmann

Subject: Re: Crash with 2.6.27-rc6 with iwlwifi

Hi Johannes,

> > > > I take this back. It took a couple of hours, but it crashed again.
> > >
> > > Do you have 64k pages enabled? My issue goes away entirely when I
> > > disable 64k pages.
> >
> > so the initial issue was a BUG_ON when the ucode reports back an
> > unsupported event. That should have been fixed by now since we just
> > can't crash the whole kernel, because some piece of firmware acts bogus
> > now and then.
>
> I have a different interpretation of the original issue, since afaik the
> value it was doing a BUG_ON on is a value that is only passed in by the
> driver and copied by the ucode to the response, hence it shouldn't
> actually crash unless the driver is passing bogus stuff in or, as it
> seems to be the case here, the DMA programming is completely fucked and
> the ucode just goes to write to random memory locations.

that patch wasn't enough, but nevertheless we shouldn't crash the whole
kernel based on this.

In addition they were working on some IOMMU stuff and that explains what
you are seeing. I never got that far at all. My machine just dies on me.

> > However there seem to be some other issues with me
> > running a 64-bit OS and connecting to an N-capable access point.
>
> Strange. My test AP isn't N capable though.

We had some discussion about the possibility that different firmware versions
might be causing this. Don't have my X61 handy at the moment.

Regards

Marcel