2008-07-15 17:00:35

by Ian Jeffray

Subject: sendfile() broken with 2.6.26 + Apache 2 ?

All,

I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
large files sent via Apache2 are partially corrupt.

This appears to be linked to sendfile() -- disabling the use of
sendfile in the apache config (EnableSendfile Off) allows it to
function as normal.

My system is a simple Core2Duo running Debian lenny/sid; nothing
special, and I have never observed problems like this before.

The problem feels certainly related to sendfile() since the data
reads correctly from disc in other programs, and via CIFS etc.

The corruption happens part-way in to the file... I've no exact
figure but it would seem like maybe 32KB -- I'm seeing broken
PNGs served from Apache, where the top few dozen lines decode
correctly, and the rest is garbage.
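
One possible way to pin down where the corruption starts is to compare the file on disk with a copy fetched through Apache. The following is a minimal Python sketch, not something posted in the thread; the function names `first_mismatch` and `first_mismatch_in_files` and the 4096-byte comparison block are assumptions chosen for illustration.

```python
from typing import Optional


def first_mismatch(original: bytes, received: bytes, block: int = 4096) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    limit = min(len(original), len(received))
    for start in range(0, limit, block):
        a = original[start:start + block]
        b = received[start:start + block]
        if a != b:
            # Narrow the search down to the exact byte inside this block.
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    return start + i
    # Common prefix matches; a length difference counts as a mismatch
    # at the end of the shorter buffer.
    if len(original) != len(received):
        return limit
    return None


def first_mismatch_in_files(path_a: str, path_b: str) -> Optional[int]:
    """Compare two files (hypothetical paths) and report the first differing offset."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_mismatch(fa.read(), fb.read())
```

Calling `first_mismatch_in_files` on the original PNG and the downloaded copy would show whether the damage really begins near 32 KB.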

I've made basically no configuration changes between 2.6.25.4 and
2.6.26 and have explicitly tried both enabling and disabling the
new PAT support to no effect.

This is completely repeatable and reproducible.

Is anyone else seeing this broken behaviour?


Ian.


2008-07-16 05:43:59

by Eric Dumazet

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

CC to netdev where this report might find better answers

Ian Jeffray wrote:
> All,
>
> I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
> large files sent via Apache2 are partially corrupt.
>
> This appears to be linked to sendfile() -- disabling the use of
> sendfile in the apache config (EnableSendfile Off) allows it to
> function as normal.
>
> My system is a simple Core2Duo running Debian lenny/sid; nothing
> special, and I have never observed problems like this before.
>
> The problem feels certainly related to sendfile() since the data
> reads correctly from disc in other programs, and via CIFS etc.
>
> The corruption happens part-way in to the file... I've no exact
> figure but it would seem like maybe 32KB -- I'm seeing broken
> PNGs served from Apache, where the top few dozen lines decode
> correctly, and the rest is garbage.
>
> I've made basically no configuration changes between 2.6.25.4 and
> 2.6.26 and have explicitly tried both enabling and disabling the
> new PAT support to no effect.
>
> This is completely repeatable and reproducible.
>
> Is anyone else seeing this broken behaviour?
>


What kind of network adapter are you using? (lspci | grep -i ether)

If you disable TCP segmentation offload on this NIC (ethtool -K eth0 tso off), is the problem still present?



2008-07-16 07:30:34

by Ian Jeffray

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

Hi Eric,

Thanks for directing me to a better list.

Further responses below:

Eric Dumazet wrote:
> CC to netdev where this report might find better answers
>
> Ian Jeffray wrote:
>> All,
>>
>> I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
>> large files sent via Apache2 are partially corrupt.
>>
>> This appears to be linked to sendfile() -- disabling the use of
>> sendfile in the apache config (EnableSendfile Off) allows it to
>> function as normal.
>>
>> My system is a simple Core2Duo running Debian lenny/sid; nothing
>> special, and I have never observed problems like this before.
>>
>> The problem feels certainly related to sendfile() since the data
>> reads correctly from disc in other programs, and via CIFS etc.
>>
>> The corruption happens part-way in to the file... I've no exact
>> figure but it would seem like maybe 32KB -- I'm seeing broken
>> PNGs served from Apache, where the top few dozen lines decode
>> correctly, and the rest is garbage.
>>
>> I've made basically no configuration changes between 2.6.25.4 and
>> 2.6.26 and have explicitly tried both enabling and disabling the
>> new PAT support to no effect.
>>
>> This is completely repeatable and reproducible.
>>
>> Is anyone else seeing this broken behaviour?
>>
>
>
> What kind of network adapter are you using ? (lspci | grep -i ether)

02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit
Ethernet Adapter (rev b0)


> If you disable TCP segmentation offload on this NIC (ethtool -K eth0 tso
> off) , is this problem still present ?

Wow. That 'solves' the problem! Great.

Does this therefore point to an attansic driver issue?


Ian.

2008-07-16 09:08:31

by Eric Dumazet

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

Ian Jeffray wrote:
> Hi Eric,
>
> Thanks for directing me to a better list.
>
> Further responses below:
>
> Eric Dumazet wrote:
>> CC to netdev where this report might find better answers
>>
>> Ian Jeffray wrote:
>>> All,
>>>
>>> I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
>>> large files sent via Apache2 are partially corrupt.
>>>
>>> This appears to be linked to sendfile() -- disabling the use of
>>> sendfile in the apache config (EnableSendfile Off) allows it to
>>> function as normal.
>>>
>>> My system is a simple Core2Duo running Debian lenny/sid; nothing
>>> special, and I have never observed problems like this before.
>>>
>>> The problem feels certainly related to sendfile() since the data
>>> reads correctly from disc in other programs, and via CIFS etc.
>>>
>>> The corruption happens part-way in to the file... I've no exact
>>> figure but it would seem like maybe 32KB -- I'm seeing broken
>>> PNGs served from Apache, where the top few dozen lines decode
>>> correctly, and the rest is garbage.
>>>
>>> I've made basically no configuration changes between 2.6.25.4 and
>>> 2.6.26 and have explicitly tried both enabling and disabling the
>>> new PAT support to no effect.
>>>
>>> This is completely repeatable and reproducible.
>>>
>>> Is anyone else seeing this broken behaviour?
>>>
>>
>>
>> What kind of network adapter are you using ? (lspci | grep -i ether)
>
> 02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit
> Ethernet Adapter (rev b0)
>
>
>> If you disable TCP segmentation offload on this NIC (ethtool -K eth0
>> tso off) , is this problem still present ?
>
> Wow. That 'solves' the problem! Great.
>
> Does this therefore point to an attansic driver issue?
>


Yes, it may be related to commit 9d90fb1ac9d97da86e24d9ea947bf2a2f333829a.

In that patch, Jay Cliburn enabled TSO by default for the atl1 driver.

This might be a driver problem or a generic sendfile() problem; I don't know...
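
To help separate a driver problem from a generic sendfile() problem, the transfer can be exercised without Apache. The sketch below is an illustration only, not code from the thread: `send_with_sendfile` and `sha256_hex` are hypothetical helper names, and the loopback address is an assumption. Traffic over 127.0.0.1 never reaches the atl1 NIC, so a clean result here only exercises the generic sendfile() path; to test the driver and TSO, the receiving side would have to run on another machine.

```python
import hashlib
import socket
import threading


def sha256_hex(data: bytes) -> str:
    """Hex digest used to compare what was sent with what arrived."""
    return hashlib.sha256(data).hexdigest()


def send_with_sendfile(path: str) -> bytes:
    """Send `path` over a loopback TCP connection via sendfile and return the received bytes."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()
        with conn, open(path, "rb") as f:
            # socket.sendfile() uses os.sendfile() where the platform supports it
            # and falls back to plain send() otherwise.
            conn.sendfile(f)

    sender = threading.Thread(target=serve)
    sender.start()

    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            data = client.recv(65536)
            if not data:  # peer closed the connection: transfer finished
                break
            chunks.append(data)

    sender.join()
    server.close()
    return b"".join(chunks)
```

Comparing `sha256_hex` of the received bytes with the digest of the file on disk indicates whether the data survived the transfer intact.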



2008-07-16 13:39:18

by J. K. Cliburn

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

Eric Dumazet wrote:
> Ian Jeffray wrote:
>> Hi Eric,
>>
>> Thanks for directing me to a better list.
>>
>> Further responses below:
>>
>> Eric Dumazet wrote:
>>> CC to netdev where this report might find better answers
>>>
>>> Ian Jeffray wrote:
>>>> All,
>>>>
>>>> I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
>>>> large files sent via Apache2 are partially corrupt.
>>>>
>>>> This appears to be linked to sendfile() -- disabling the use of
>>>> sendfile in the apache config (EnableSendfile Off) allows it to
>>>> function as normal.

>>>
>>>
>>> What kind of network adapter are you using ? (lspci | grep -i ether)
>>
>> 02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit
>> Ethernet Adapter (rev b0)
>>
>>
>>> If you disable TCP segmentation offload on this NIC (ethtool -K eth0
>>> tso off) , is this problem still present ?
>>
>> Wow. That 'solves' the problem! Great.
>>
>> Does this therefore point to an attansic driver issue?
>>
>
>
> Yes, maybe related to commit 9d90fb1ac9d97da86e24d9ea947bf2a2f333829a
> In this patch, Jay Cliburn enabled TSO by default for atl1 driver.
>
> This might be a driver problem, or a generic sendfile() problem, I don't
> know...

I'm currently traveling and unable to delve into this issue and its
relation to the atl1 driver. I should be able to look at it this
weekend when I get back home.

Jay

2008-07-16 13:41:45

by Pekka Enberg

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

On Wed, Jul 16, 2008 at 4:38 PM, J. K. Cliburn <[email protected]> wrote:
>> Yes, maybe related to commit 9d90fb1ac9d97da86e24d9ea947bf2a2f333829a
>> In this patch, Jay Cliburn enabled TSO by default for atl1 driver.
>>
>> This might be a driver problem, or a generic sendfile() problem, I don't
>> know...
>
> I'm currently traveling and unable to delve into this issue and its relation
> to the atl1 driver. I should be able to look at it this weekend when I get
> back home.

Ehh... shouldn't we disable atl1 TSO by default for -stable, then?

2008-07-16 14:00:31

by J. K. Cliburn

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

Pekka Enberg wrote:
> On Wed, Jul 16, 2008 at 4:38 PM, J. K. Cliburn <[email protected]> wrote:
>>> Yes, maybe related to commit 9d90fb1ac9d97da86e24d9ea947bf2a2f333829a
>>> In this patch, Jay Cliburn enabled TSO by default for atl1 driver.
>>>
>>> This might be a driver problem, or a generic sendfile() problem, I don't
>>> know...
>> I'm currently traveling and unable to delve into this issue and its relation
>> to the atl1 driver. I should be able to look at it this weekend when I get
>> back home.
>
> Ehh... shouldn't we disable atl1 TSO by default for -stable, then?

Based upon my inability to look at the problem for a couple more days?


2008-07-16 15:00:21

by Holger Hoffstaette

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

On Wed, 16 Jul 2008 11:08:04 +0200, Eric Dumazet wrote:

> Ian Jeffray wrote:
>> Hi Eric,
>>
>> Thanks for directing me to a better list.
>>
>> Further responses below:
>>
>> Eric Dumazet wrote:
>>> CC to netdev where this report might find better answers
>>>
>>> Ian Jeffray wrote:
>>>> All,
>>>>
>>>> I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
>>>> large files sent via Apache2 are partially corrupt.
>>>>
>>>> This appears to be linked to sendfile() -- disabling the use of
>>>> sendfile in the apache config (EnableSendfile Off) allows it to
>>>> function as normal.
>>>>
>>>> My system is a simple Core2Duo running Debian lenny/sid; nothing
>>>> special, and I have never observed problems like this before.
>>>>
>>>> The problem feels certainly related to sendfile() since the data reads
>>>> correctly from disc in other programs, and via CIFS etc.
>>>>
>>>> The corruption happens part-way in to the file... I've no exact figure
>>>> but it would seem like maybe 32KB -- I'm seeing broken PNGs served
>>>> from Apache, where the top few dozen lines decode correctly, and the
>>>> rest is garbage.
>>>>
>>>> I've made basically no configuration changes between 2.6.25.4 and
>>>> 2.6.26 and have explicitly tried both enabling and disabling the new
>>>> PAT support to no effect.
>>>>
>>>> This is completely repeatable and reproducible.
>>>>
>>>> Is anyone else seeing this broken behaviour?
>>>>
>>>>
>>>
>>> What kind of network adapter are you using ? (lspci | grep -i ether)
>>
>> 02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit
>> Ethernet Adapter (rev b0)
>>
>>
>>> If you disable TCP segmentation offload on this NIC (ethtool -K eth0
>>> tso off) , is this problem still present ?
>>
>> Wow. That 'solves' the problem! Great.
>>
>> Does this therefore point to an attansic driver issue?
>
> Yes, maybe related to commit 9d90fb1ac9d97da86e24d9ea947bf2a2f333829a
>
> In this patch, Jay Cliburn enabled TSO by default for atl1 driver.
>
> This might be a driver problem, or a generic sendfile() problem, I don't
> know...

Maybe related to http://lkml.org/lkml/2007/12/6/229 ?
In the meantime I switched to a different server with an e1000 NIC, so I
cannot test whether this is still a problem in 2.6.26, but apparently it
is. For me, the combination of e1000/e1000e and sendfile works reliably.

Holger

2008-07-19 12:32:16

by J. K. Cliburn

Subject: Re: sendfile() broken with 2.6.26 + Apache 2 ?

Ian Jeffray wrote:
> All,
>
> I moved from kernel 2.6.25.4 to 2.6.26 yesterday and observed that
> large files sent via Apache2 are partially corrupt.
>
> This appears to be linked to sendfile() -- disabling the use of
> sendfile in the apache config (EnableSendfile Off) allows it to
> function as normal.
>
> My system is a simple Core2Duo running Debian lenny/sid; nothing
> special, and I have never observed problems like this before.
>
> The problem feels certainly related to sendfile() since the data
> reads correctly from disc in other programs, and via CIFS etc.
>
> The corruption happens part-way in to the file... I've no exact
> figure but it would seem like maybe 32KB -- I'm seeing broken
> PNGs served from Apache, where the top few dozen lines decode
> correctly, and the rest is garbage.
>
> I've made basically no configuration changes between 2.6.25.4 and
> 2.6.26 and have explicitly tried both enabling and disabling the
> new PAT support to no effect.
>
> This is completely repeatable and reproducible.
>
> Is anyone else seeing this broken behaviour?

Can you please enable verbose logging

echo 8 > /proc/sys/kernel/printk
ethtool -s eth0 msglvl 0xffff

and see if the driver issues any error messages as the file is transferred?