2012-06-26 07:32:47

by Ren, Cloud

[permalink] [raw]
Subject: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

From: xiong <[email protected]>

some people report atl1c could cause system hang with following
kernel trace info:
---------------------------------------
WARNING: at.../net/sched/sch_generic.c:258
dev_watchdog+0x1db/0x1d0()
...
NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
...
---------------------------------------
This is caused by netif_stop_queue calling when cable Link is down
but netif_wake_queue isn't called when cable Link is resume.

Signed-off-by: xiong <[email protected]>
Signed-off-by: Cloud Ren <[email protected]>
---
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 85717cb..c2736c4 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct *work)
atl1c_irq_disable(adapter);
atl1c_check_link_status(adapter);
atl1c_irq_enable(adapter);
+ if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
+ netif_wake_queue(netdev);
}
}

--
1.7.7


2012-06-26 18:03:17

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

On Tue, Jun 26, 2012 at 12:33:06PM -0300, Ren, Cloud wrote:
> From: xiong <[email protected]>
>
> some people report atl1c could cause system hang with following
> kernel trace info:
> ---------------------------------------
> WARNING: at.../net/sched/sch_generic.c:258
> dev_watchdog+0x1db/0x1d0()
> ...
> NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> ...
> ---------------------------------------
> This is caused by netif_stop_queue calling when cable Link is down
> but netif_wake_queue isn't called when cable Link is resume.
>
> Signed-off-by: xiong <[email protected]>
> Signed-off-by: Cloud Ren <[email protected]>

If this fixes a system hang then this could be a stable
fix -- that is, this should be propagated to older stable
kernels, no?

If so then please add to the commit log a line like this:

Cc: [email protected] [3.4]

Note: this is for the commit log! Right above the line below
that has "---"

Luis

2012-06-26 20:23:54

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

On Tue, 2012-06-26 at 12:33 -0300, Ren, Cloud wrote:
> From: xiong <[email protected]>
>
> some people report atl1c could cause system hang with following
> kernel trace info:
> ---------------------------------------
> WARNING: at.../net/sched/sch_generic.c:258
> dev_watchdog+0x1db/0x1d0()
> ...
> NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> ...
> ---------------------------------------
> This is caused by netif_stop_queue calling when cable Link is down
> but netif_wake_queue isn't called when cable Link is resume.
>
> Signed-off-by: xiong <[email protected]>
> Signed-off-by: Cloud Ren <[email protected]>
> ---
> drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 85717cb..c2736c4 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct *work)
> atl1c_irq_disable(adapter);
> atl1c_check_link_status(adapter);
> atl1c_irq_enable(adapter);
> + if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
> + netif_wake_queue(netdev);
> }
> }
>

Why explicitly stop/start the queue when the link changes? That's what
link_watch is for.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2012-06-26 20:25:27

by Huang, Xiong

[permalink] [raw]
Subject: RE: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

Yes, another fix to remove netif_stop_queue when cable link is down.

-Xiong

> -----Original Message-----
> From: Ben Hutchings [mailto:[email protected]]
> Sent: Wednesday, June 27, 2012 4:24
> To: Ren, Cloud
> Cc: [email protected]; [email protected]; linux-
> [email protected]; qca-linux-team; nic-devel; Huang, Xiong
> Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
>
> On Tue, 2012-06-26 at 12:33 -0300, Ren, Cloud wrote:
> > From: xiong <[email protected]>
> >
> > some people report atl1c could cause system hang with following kernel
> > trace info:
> > ---------------------------------------
> > WARNING: at.../net/sched/sch_generic.c:258
> > dev_watchdog+0x1db/0x1d0()
> > ...
> > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ...
> > ---------------------------------------
> > This is caused by netif_stop_queue calling when cable Link is down but
> > netif_wake_queue isn't called when cable Link is resume.
> >
> > Signed-off-by: xiong <[email protected]>
> > Signed-off-by: Cloud Ren <[email protected]>
> > ---
> > drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > index 85717cb..c2736c4 100644
> > --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > @@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct
> *work)
> > atl1c_irq_disable(adapter);
> > atl1c_check_link_status(adapter);
> > atl1c_irq_enable(adapter);
> > + if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
> > + netif_wake_queue(netdev);
> > }
> > }
> >
>
> Why explicitly stop/start the queue when the link changes? That's what
> link_watch is for.
>
> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's
> the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-06-26 20:26:40

by Huang, Xiong

[permalink] [raw]
Subject: RE: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

Sorry, my mean , another fix is to remove netif_stop_queue when cable link is down.

Thanks
Xiong
> -----Original Message-----
> From: Huang, Xiong
> Sent: Wednesday, June 27, 2012 4:25
> To: Ben Hutchings; Ren, Cloud
> Cc: [email protected]; [email protected]; linux-
> [email protected]; qca-linux-team; nic-devel
> Subject: RE: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
>
> Yes, another fix to remove netif_stop_queue when cable link is down.
>
> -Xiong
>
> > -----Original Message-----
> > From: Ben Hutchings [mailto:[email protected]]
> > Sent: Wednesday, June 27, 2012 4:24
> > To: Ren, Cloud
> > Cc: [email protected]; [email protected]; linux-
> > [email protected]; qca-linux-team; nic-devel; Huang, Xiong
> > Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed
> > out
> >
> > On Tue, 2012-06-26 at 12:33 -0300, Ren, Cloud wrote:
> > > From: xiong <[email protected]>
> > >
> > > some people report atl1c could cause system hang with following
> > > kernel trace info:
> > > ---------------------------------------
> > > WARNING: at.../net/sched/sch_generic.c:258
> > > dev_watchdog+0x1db/0x1d0()
> > > ...
> > > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ...
> > > ---------------------------------------
> > > This is caused by netif_stop_queue calling when cable Link is down
> > > but netif_wake_queue isn't called when cable Link is resume.
> > >
> > > Signed-off-by: xiong <[email protected]>
> > > Signed-off-by: Cloud Ren <[email protected]>
> > > ---
> > > drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 ++
> > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > > b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > > index 85717cb..c2736c4 100644
> > > --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > > +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > > @@ -351,6 +351,8 @@ static void atl1c_common_task(struct work_struct
> > *work)
> > > atl1c_irq_disable(adapter);
> > > atl1c_check_link_status(adapter);
> > > atl1c_irq_enable(adapter);
> > > + if (netif_queue_stopped(netdev) && netif_carrier_ok(netdev))
> > > + netif_wake_queue(netdev);
> > > }
> > > }
> > >
> >
> > Why explicitly stop/start the queue when the link changes? That's
> > what link_watch is for.
> >
> > Ben.
> >
> > --
> > Ben Hutchings, Staff Engineer, Solarflare Not speaking for my
> > employer; that's the marketing department's job.
> > They asked us to note that Solarflare product names are trademarked.

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-06-26 20:41:14

by Huang, Xiong

[permalink] [raw]
Subject: RE: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

Luis
It should be a stable fix, but as Ben Hutchings mentioned in another mail,
Maybe, removing netif_stop_queue when cable link down is a better choice.

Do you mean we need add 'cc:[email protected]' just before 'some people report ...' ?

Thanks
Xiong

> -----Original Message-----
> From: Rodriguez, Luis
> Sent: Wednesday, June 27, 2012 2:03
> To: Ren, Cloud
> Cc: [email protected]; [email protected]; linux-
> [email protected]; qca-linux-team; nic-devel; Huang, Xiong
> Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
>
> On Tue, Jun 26, 2012 at 12:33:06PM -0300, Ren, Cloud wrote:
> > From: xiong <[email protected]>
> >
> > some people report atl1c could cause system hang with following kernel
> > trace info:
> > ---------------------------------------
> > WARNING: at.../net/sched/sch_generic.c:258
> > dev_watchdog+0x1db/0x1d0()
> > ...
> > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ...
> > ---------------------------------------
> > This is caused by netif_stop_queue calling when cable Link is down but
> > netif_wake_queue isn't called when cable Link is resume.
> >
> > Signed-off-by: xiong <[email protected]>
> > Signed-off-by: Cloud Ren <[email protected]>
>
> If this fixes a system hang then this could be a stable fix -- that is, this should
> be propagated to older stable kernels, no?
>
> If so then please add to the commit log a line like this:
>
> Cc: [email protected] [3.4]
>
> Note: this is for the commit log! Right above the line below that has "---"
>
> Luis

2012-06-26 20:54:41

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

On Tue, Jun 26, 2012 at 01:41:11PM -0700, Huang, Xiong wrote:
> Luis
> It should be a stable fix, but as Ben Hutchings mentioned in another mail,
> Maybe, removing netif_stop_queue when cable link down is a better choice.
>
> Do you mean we need add 'cc:[email protected]' just before 'some people report ...' ?

Nope, see commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
as an example of how to do this. This is a random commit
that has been marked as stable.

commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
Author: Ricardo Martins <[email protected]>
Date: Tue May 22 18:02:03 2012 +0100

USB: fix PS3 EHCI systems

After commit aaa0ef289afe9186f81e2340114ea413eef0492a "PS3 EHCI QH
read work-around", Terratec Grabby (em28xx) stopped working with AMD
Geode LX 800 (USB controller AMD CS5536). Since this is a PS3 only
fix, the following patch adds a conditional block around it.

Signed-off-by: Ricardo Martins <[email protected]>
Acked-by: Alan Stern <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Sometimes it helps if you specify the oldest stable kernel
to apply patches to, so for example:

commit 80b08a8d8829a58b5db14b1417151094cc28face
Author: Felix Fietkau <[email protected]>
Date: Fri Jun 15 03:04:53 2012 +0200

ath9k: fix invalid pointer access in the tx path

After setup_frame_info has been called, only info->control.rates is still
valid, other control fields have been overwritten by the ath_frame_info
data. Move the access to info->control.vif for checking short preamble
to setup_frame_info before it gets overwritten.

This regression was introduced in commit d47a61aa
"ath9k: Fix multi-VIF BSS handling"

Signed-off-by: Felix Fietkau <[email protected]>
Reported-by: Thomas H?hn <[email protected]>
Acked-by: Sujith Manoharan <[email protected]>
Cc: [email protected] [3.4]
Signed-off-by: John W. Linville <[email protected]>

To be clear, this is not a Cc: in the e-mail but instead a
Cc line in the commit log entry.

Luis

2012-06-26 20:55:59

by Huang, Xiong

[permalink] [raw]
Subject: RE: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

Understand, thank you !

> -----Original Message-----
> From: Rodriguez, Luis
> Sent: Wednesday, June 27, 2012 4:55
> To: Huang, Xiong
> Cc: Ren, Cloud; [email protected]; [email protected]; linux-
> [email protected]; qca-linux-team; nic-devel
> Subject: Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
>
> On Tue, Jun 26, 2012 at 01:41:11PM -0700, Huang, Xiong wrote:
> > Luis
> > It should be a stable fix, but as Ben Hutchings mentioned in
> > another mail, Maybe, removing netif_stop_queue when cable link down is a
> better choice.
> >
> > Do you mean we need add 'cc:[email protected]' just before 'some
> people report ...' ?
>
> Nope, see commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
> as an example of how to do this. This is a random commit that has been
> marked as stable.
>
> commit 4f7a67e2dd49fbfba002c453bc24bf00e701cc71
> Author: Ricardo Martins <[email protected]>
> Date: Tue May 22 18:02:03 2012 +0100
>
> USB: fix PS3 EHCI systems
>
> After commit aaa0ef289afe9186f81e2340114ea413eef0492a "PS3 EHCI
> QH
> read work-around", Terratec Grabby (em28xx) stopped working with AMD
> Geode LX 800 (USB controller AMD CS5536). Since this is a PS3 only
> fix, the following patch adds a conditional block around it.
>
> Signed-off-by: Ricardo Martins <[email protected]>
> Acked-by: Alan Stern <[email protected]>
> Cc: stable <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> Sometimes it helps if you specify the oldest stable kernel to apply patches to,
> so for example:
>
> commit 80b08a8d8829a58b5db14b1417151094cc28face
> Author: Felix Fietkau <[email protected]>
> Date: Fri Jun 15 03:04:53 2012 +0200
>
> ath9k: fix invalid pointer access in the tx path
>
> After setup_frame_info has been called, only info->control.rates is still
> valid, other control fields have been overwritten by the ath_frame_info
> data. Move the access to info->control.vif for checking short preamble
> to setup_frame_info before it gets overwritten.
>
> This regression was introduced in commit d47a61aa
> "ath9k: Fix multi-VIF BSS handling"
>
> Signed-off-by: Felix Fietkau <[email protected]>
> Reported-by: Thomas H?hn <[email protected]>
> Acked-by: Sujith Manoharan <[email protected]>
> Cc: [email protected] [3.4]
> Signed-off-by: John W. Linville <[email protected]>
>
> To be clear, this is not a Cc: in the e-mail but instead a Cc line in the commit
> log entry.
>
> Luis