2008-08-19 00:38:41

by Jay Cliburn

[permalink] [raw]
Subject: [PATCH] atl1: disable TSO by default

The atl1 driver is causing stalled connections and file corruption
whenever TSO is enabled. Two examples are here:

http://lkml.org/lkml/2008/7/15/325
http://lkml.org/lkml/2008/8/18/543

Disable TSO by default until we can determine the source of the
problem.

Signed-off-by: Jay Cliburn <[email protected]>
cc: [email protected]
---

Jeff, I've been trying to find the source of this problem in my
spare time for a few weeks now, but haven't been successful. We
turned on TSO by default in 2.6.26 after fixing a nasty performance
bug in it, but there's apparently another bug lurking.

This patch needs to be also applied to 2.6.26, hence the cc to
stable.

drivers/net/atlx/atl1.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index e6a7bb7..e23ce77 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -3022,7 +3022,6 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
netdev->features = NETIF_F_HW_CSUM;
netdev->features |= NETIF_F_SG;
netdev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
- netdev->features |= NETIF_F_TSO;
netdev->features |= NETIF_F_LLTX;

/*
--
1.5.5.1


2008-08-19 13:12:21

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] atl1: disable TSO by default

Jay Cliburn <[email protected]> wrote:
>
> diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
> index e6a7bb7..e23ce77 100644
> --- a/drivers/net/atlx/atl1.c
> +++ b/drivers/net/atlx/atl1.c
> @@ -3022,7 +3022,6 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
> netdev->features = NETIF_F_HW_CSUM;
> netdev->features |= NETIF_F_SG;
> netdev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
> - netdev->features |= NETIF_F_TSO;
> netdev->features |= NETIF_F_LLTX;

Another new driver using LLTX, this is not good when we're trying
to get rid of it.

Perhaps we could just kill it by ignoring the LLTX flag and always
grabbing the xmit lock. That should be safe as long as none of these
drivers grab the xmit lock within their private locks.

LLTX drivers will be a tad slower, but that should encourage them
to stop using it :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-08-19 22:56:34

by Jay Cliburn

[permalink] [raw]
Subject: Re: [PATCH] atl1: disable TSO by default

On Tue, 19 Aug 2008 23:10:37 +1000
Herbert Xu <[email protected]> wrote:

> Jay Cliburn <[email protected]> wrote:

> > netdev->features |= NETIF_F_LLTX;
>
> Another new driver using LLTX, this is not good when we're trying
> to get rid of it.

The atl1 driver was merged in the spring of 2007, so I'm not sure I
consider it new (but your kernel development time horizon is undoubtedly
*way* longer than mine, so you may indeed consider it new). It was
basically a vendor driver that we modified to conform to kernel coding
standards. It started life, we believe, as pretty much a clone of the
e1000 driver circa 2005, so that's likely where it's use of LLTX
came from.

>
> Perhaps we could just kill it by ignoring the LLTX flag and always
> grabbing the xmit lock. That should be safe as long as none of these
> drivers grab the xmit lock within their private locks.

I'd be happy to gin up a patch if you could point me to a driver that
implements properly what you're asking.

Thanks,
Jay

2008-08-19 22:59:32

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] atl1: disable TSO by default

From: Jay Cliburn <[email protected]>
Date: Tue, 19 Aug 2008 17:53:34 -0500

> On Tue, 19 Aug 2008 23:10:37 +1000
> Herbert Xu <[email protected]> wrote:
>
> > Jay Cliburn <[email protected]> wrote:
>
> > > netdev->features |= NETIF_F_LLTX;
> >
> > Another new driver using LLTX, this is not good when we're trying
> > to get rid of it.
>
> The atl1 driver was merged in the spring of 2007, so I'm not sure I
> consider it new (but your kernel development time horizon is undoubtedly
> *way* longer than mine, so you may indeed consider it new). It was
> basically a vendor driver that we modified to conform to kernel coding
> standards. It started life, we believe, as pretty much a clone of the
> e1000 driver circa 2005, so that's likely where it's use of LLTX
> came from.

You'll have to forgive us, as we often have a knee jerk reaction
to seeing LLTX anywhere :-)

> > Perhaps we could just kill it by ignoring the LLTX flag and always
> > grabbing the xmit lock. That should be safe as long as none of these
> > drivers grab the xmit lock within their private locks.
>
> I'd be happy to gin up a patch if you could point me to a driver that
> implements properly what you're asking.

TG3 is a good example, but that's just my heavily slanted opinion.

What your work should amount to is:

1) Eliminate local driver TX spinlock.

2) Stop taking #1 in ->hard_start_xmit()

3) Where you take #1 elsewhere, replace with netif_tx_lock()
and friends.

4) Stop setting NETIF_F_LLTX.

That should do it, but of course there are usually other details.

2008-08-19 23:17:39

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] atl1: disable TSO by default

On Tue, Aug 19, 2008 at 03:58:45PM -0700, David Miller wrote:
>
> > The atl1 driver was merged in the spring of 2007, so I'm not sure I
> > consider it new (but your kernel development time horizon is undoubtedly
> > *way* longer than mine, so you may indeed consider it new). It was
> > basically a vendor driver that we modified to conform to kernel coding
> > standards. It started life, we believe, as pretty much a clone of the
> > e1000 driver circa 2005, so that's likely where it's use of LLTX
> > came from.
>
> You'll have to forgive us, as we often have a knee jerk reaction
> to seeing LLTX anywhere :-)

Heh, it's like seeing a fly in your soup :)

However, this does seem to be proliferating to some extent although
I picked on the wrong driver. The one I saw that was new is actually
drivers/net/atl1e which seems to have copied the same code across.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-08-27 09:37:53

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] atl1: disable TSO by default

Jay Cliburn wrote:
> The atl1 driver is causing stalled connections and file corruption
> whenever TSO is enabled. Two examples are here:
>
> http://lkml.org/lkml/2008/7/15/325
> http://lkml.org/lkml/2008/8/18/543
>
> Disable TSO by default until we can determine the source of the
> problem.
>
> Signed-off-by: Jay Cliburn <[email protected]>
> cc: [email protected]
> ---
>
> Jeff, I've been trying to find the source of this problem in my
> spare time for a few weeks now, but haven't been successful. We
> turned on TSO by default in 2.6.26 after fixing a nasty performance
> bug in it, but there's apparently another bug lurking.
>
> This patch needs to be also applied to 2.6.26, hence the cc to
> stable.
>
> drivers/net/atlx/atl1.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)

applied