2010-11-01 13:22:50

by Josh Boyer

Subject: Re: All Applied micro boards are failing with current mainline kernel

On Fri, Oct 29, 2010 at 11:06 PM, Rupjyoti Sarmah <[email protected]> wrote:
> Hi ,
>
> APM boards Canyonlands/Kilauea/Glacier/Katmai/Sequoia are all failing
> during booting.

What kernel version? What config? Have you tried a git bisect to see
when it broke? Etc, etc.

Also, CC'ing linuxppc-dev would have been a good idea. Not many on
lkml are even going to know what you're talking about with such sparse
details.

josh

>
> Call trace is same for all
>
>
> Call Trace:
>
> [df835d70] [c02d27cc] emac_probe+0xf28/0x12a8 (unreliable)
>
> [df835e50] [c023c7cc] platform_driver_probe_shim+0x40/0x54
>
> [df835e60] [c01bf354] platform_drv_probe+0x20/0x30
>
> [df835e70] [c01bde68] driver_probe_device+0x148/0x1ac
>
> [df835e90] [c01be17c] __driver_attach+0xa4/0xa8
>
> [df835eb0] [c01bcfa4] bus_for_each_dev+0x60/0x9c
>
> [df835ee0] [c01bdbbc] driver_attach+0x24/0x34
>
> [df835ef0] [c01bd94c] bus_add_driver+0x1b8/0x274
>
> [df835f20] [c01be3d8] driver_register+0x6c/0x160
>
> [df835f40] [c01bf6c4] platform_driver_register+0x68/0x78
>
> [df835f50] [c023c998] of_register_platform_driver+0xa8/0xc4
>
> [df835f60] [c0393e88] emac_init+0x1ac/0x1dc
>
> [df835fa0] [c0001574] do_one_initcall+0x160/0x1a8
>
> [df835fd0] [c037a1e8] kernel_init+0xcc/0x174
>
> [df835ff0] [c000c5b0] kernel_thread+0x4c/0x68
>
> Instruction dump:
>
> 419e016c 2f800007 419e0164 38130774 901a00d0 381306b0 901a00d4 7f43d378
>
> 4bf91d89 817a01a0 39200001 380b0008 <7d400028> 7d4a4b78 7d40012d 40a2fff4
>
> ---[ end trace dac0cf4779f83901 ]---
>
>
> Regards,
> Rup
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


2010-11-01 15:05:56

by Josh Boyer

Subject: Re: All Applied micro boards are failing with current mainline kernel

On Mon, Nov 1, 2010 at 9:22 AM, Josh Boyer <[email protected]> wrote:
> On Fri, Oct 29, 2010 at 11:06 PM, Rupjyoti Sarmah <[email protected]> wrote:
>> Hi ,
>>
>> APM boards Canyonlands/Kilauea/Glacier/Katmai/Sequoia are all failing
>> during booting.
>
> What kernel version? What config? Have you tried a git bisect to see
> when it broke? Etc, etc.
>>
>> Call trace is same for all
>>
>>
>> Call Trace:
>>
>> [df835d70] [c02d27cc] emac_probe+0xf28/0x12a8 (unreliable)
>>
>> [df835e50] [c023c7cc] platform_driver_probe_shim+0x40/0x54
>>
>> [df835e60] [c01bf354] platform_drv_probe+0x20/0x30
>>
>> [df835e70] [c01bde68] driver_probe_device+0x148/0x1ac
>>
>> [df835e90] [c01be17c] __driver_attach+0xa4/0xa8
>>
>> [df835eb0] [c01bcfa4] bus_for_each_dev+0x60/0x9c
>>
>> [df835ee0] [c01bdbbc] driver_attach+0x24/0x34
>>
>> [df835ef0] [c01bd94c] bus_add_driver+0x1b8/0x274
>>
>> [df835f20] [c01be3d8] driver_register+0x6c/0x160
>>
>> [df835f40] [c01bf6c4] platform_driver_register+0x68/0x78
>>
>> [df835f50] [c023c998] of_register_platform_driver+0xa8/0xc4
>>
>> [df835f60] [c0393e88] emac_init+0x1ac/0x1dc
>>
>> [df835fa0] [c0001574] do_one_initcall+0x160/0x1a8
>>
>> [df835fd0] [c037a1e8] kernel_init+0xcc/0x174
>>
>> [df835ff0] [c000c5b0] kernel_thread+0x4c/0x68
>>
>> Instruction dump:
>>
>> 419e016c 2f800007 419e0164 38130774 901a00d0 381306b0 901a00d4 7f43d378
>>
>> 4bf91d89 817a01a0 39200001 380b0008 <7d400028> 7d4a4b78 7d40012d 40a2fff4
>>
>> ---[ end trace dac0cf4779f83901 ]---

A git bisect between 2.6.36 (working) and Linus tip (traceback) points to:

e6484930d7c73d324bccda7d43d131088da697b9 net: allocate tx queues in
register_netdevice

as causing this. I'm not entirely sure why yet, but the commit
message seems slightly off to me. It claims to make TX queue
allocation identical to RX, but from what I can tell, most of the RX
queue logic is hidden behind CONFIG_RPS, which is not set in my config
at all (and can't be due to a dep on CONFIG_SMP). This change doesn't
guard anything behind that.

A few hints would be appreciated.

josh

2010-11-01 15:37:07

by Stephen Rothwell

Subject: Re: All Applied micro boards are failing with current mainline kernel

Hi Josh,

On Mon, 1 Nov 2010 11:05:53 -0400 Josh Boyer <[email protected]> wrote:
>
> A few hints would be appreciated.

Remove the call to netif_stop_queue() from emac_probe(). Apparently,
calling this before register_netdev() is now wrong (maybe always was).

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/



2010-11-01 15:39:50

by Josh Boyer

Subject: Re: All Applied micro boards are failing with current mainline kernel

On Mon, Nov 1, 2010 at 11:36 AM, Stephen Rothwell <[email protected]> wrote:
> Hi Josh,
>
> On Mon, 1 Nov 2010 11:05:53 -0400 Josh Boyer <[email protected]> wrote:
>>
>> A few hints would be appreciated.
>
> Remove the call to netif_stop_queue() from emac_probe(). Apparently,
> calling this before register_netdev() is now wrong (maybe always was).

Yeah, I just discovered that myself. I'm wondering

1) why we do that in that function?
2) If it needs to be removed entirely, or moved to after the
register_netdev call
3) If the call to netif_carrier_off also needs similar attention.

I can whip up a patch to remove those calls or move them after the
register, but I don't want to do that without knowing which one is
"right".

josh

2010-11-01 15:50:09

by David Miller

Subject: Re: All Applied micro boards are failing with current mainline kernel

From: Stephen Rothwell <[email protected]>
Date: Tue, 2 Nov 2010 02:36:50 +1100

> Hi Josh,
>
> On Mon, 1 Nov 2010 11:05:53 -0400 Josh Boyer <[email protected]> wrote:
>>
>> A few hints would be appreciated.
>
> Remove the call to netif_stop_queue() from emac_probe(). Apparently,
> calling this before register_netdev() is now wrong (maybe always was).

Right.

I'll add this to net-2.6

--------------------
ibm_newemac: Remove netif_stop_queue() in emac_probe().

Touching the queue state before register_netdev is not
allowed, and besides the queue state before ->open()
is "don't care"

Reported-by: Josh Boyer <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
drivers/net/ibm_newemac/core.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 385dc32..06bb9b7 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -2871,7 +2871,6 @@ static int __devinit emac_probe(struct platform_device *ofdev,
 	SET_ETHTOOL_OPS(ndev, &emac_ethtool_ops);

 	netif_carrier_off(ndev);
-	netif_stop_queue(ndev);

 	err = register_netdev(ndev);
 	if (err) {
--
1.7.3.2

2010-11-01 15:51:04

by David Miller

Subject: Re: All Applied micro boards are failing with current mainline kernel

From: Josh Boyer <[email protected]>
Date: Mon, 1 Nov 2010 11:39:47 -0400

> On Mon, Nov 1, 2010 at 11:36 AM, Stephen Rothwell <[email protected]> wrote:
>> Hi Josh,
>>
>> On Mon, 1 Nov 2010 11:05:53 -0400 Josh Boyer <[email protected]> wrote:
>>>
>>> A few hints would be appreciated.
>>
>> Remove the call to netif_stop_queue() from emac_probe(). Apparently,
>> calling this before register_netdev() is now wrong (maybe always was).
>
> Yeah, I just discovered that myself. I'm wondering
>
> 1) why we do that in that function?

Because likely it was blindly copied from some other driver.

> 2) If it needs to be removed entirely, or moved to after the
> register_netdev call

Removed entirely.

> 3) If the call to netif_carrier_off also needs similar attention.

Not really.

> I can whip up a patch to remove those calls or move them after the
> register, but I don't want to do that without knowing which one is
> "right".

I've already taken care of this.

2010-11-01 16:14:59

by Josh Boyer

Subject: Re: All Applied micro boards are failing with current mainline kernel

On Mon, Nov 1, 2010 at 11:51 AM, David Miller <[email protected]> wrote:
>> I can whip up a patch to remove those calls or move them after the
>> register, but I don't want to do that without knowing which one is
>> "right".
>
> I've already taken care of this.

Thanks!

josh
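
Editorial note: the ordering problem discussed in this thread can be illustrated with a small user-space model. This is only a sketch, not kernel code. It assumes, as the title of the bisected commit suggests, that the TX queue array does not exist until the device is registered, so touching queue state earlier means touching a NULL pointer. Every name below (toy_netdev, toy_register, toy_stop_queue, and so on) is hypothetical and stands in for the real structures and calls only loosely; the model returns -1 where the real driver would oops.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a per-queue state word. */
struct toy_txq {
    unsigned long state;
};

/* Hypothetical stand-in for a network device. */
struct toy_netdev {
    struct toy_txq *tx;     /* NULL until toy_register() allocates it */
    unsigned int num_tx;    /* number of TX queues to allocate */
};

#define TOY_QUEUE_STOPPED 1UL

/*
 * Models the assumed post-commit behaviour: the TX queue array is
 * allocated as part of registration, not at device allocation time.
 * Returns 0 on success, -1 on allocation failure.
 */
static int toy_register(struct toy_netdev *dev)
{
    dev->tx = calloc(dev->num_tx, sizeof(*dev->tx));
    return dev->tx ? 0 : -1;
}

/*
 * Models stopping queue 0. In this sketch an unregistered device is
 * reported with -1; the real code has no such check and would
 * dereference the missing queue array instead.
 */
static int toy_stop_queue(struct toy_netdev *dev)
{
    if (!dev->tx)
        return -1;
    dev->tx[0].state |= TOY_QUEUE_STOPPED;
    return 0;
}

/* Releases the queue array allocated by toy_register(). */
static void toy_free(struct toy_netdev *dev)
{
    free(dev->tx);
    dev->tx = NULL;
}
```

With a device initialised as { .tx = NULL, .num_tx = 1 }, calling toy_stop_queue() before toy_register() fails, and the same call after a successful toy_register() succeeds. That mirrors why the patch above simply drops the early call rather than guarding it.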