[ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
On Tue, Jan 19, 2010 at 05:10:13PM -0800, Stephen Hemminger wrote:
> On Tue, 19 Jan 2010 20:01:10 -0500
> Michael Breuer <[email protected]> wrote:
>
> > On 1/19/2010 5:45 PM, Jarek Poplawski wrote:
> > > On Tue, Jan 19, 2010 at 03:06:01PM -0500, Michael Breuer wrote:
> > >
> > >> On 1/19/2010 2:59 PM, Jarek Poplawski wrote:
> > >>
> > >>> On Tue, Jan 19, 2010 at 10:47:27AM -0500, Michael Breuer wrote:
> > >>> ...
> > >>>
> > >>>> Still get the warning... but now 60 bytes.
> > >>>> Jan 19 10:43:50 mail kernel: ------------[ cut here ]------------
> > >>>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
...
> > That not only compiled, but it cleared the error as well. Additionally,
> > I used to see a bit of a delay receiving the login prompt when first
> > connecting to the box by ssh. That delay is gone with this patch. I'd
> > guess that the warning wasn't quite as innocuous as I thought.
> > Note: tested on 2.6.32.4. I'll leave this up for a bit before
> > attempting to move back to head.
>
> Seems like an underlying bug in the DMA api. Maybe it just can't
> handle operations on partial mapping.
>
> Other drivers with same problem:
> bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
It seems using the same length (even without pci_unmap_len()) is
crucial here, but I hope maintainers (added to CC) will take care.
Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.
Thanks,
Jarek P.
----------------------->
Using pci_unmap_len(), with the same length as pci_map_single(), with
pci_dma_sync_single_for_cpu()/_device() fixes this warning (2.6.32.4):
> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
> check_sync+0xc1/0x43f()
> Jan 19 10:43:50 mail kernel: Hardware name: System Product Name
> Jan 19 10:43:50 mail kernel: sky2 0000:04:00.0: DMA-API: device driver
> tries to sync DMA memory it has not allocated [device
> address=0x0000000320a0b022] [size=60 bytes]
Reported-by: Michael Breuer <[email protected]>
Tested-by: Michael Breuer <[email protected]>
Signed-off-by: Jarek Poplawski <[email protected]>
---
drivers/net/sky2.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 7650f73..cdebdd3 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
if (likely(skb)) {
pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
- length, PCI_DMA_FROMDEVICE);
+ pci_unmap_len(re, data_size),
+ PCI_DMA_FROMDEVICE);
skb_copy_from_linear_data(re->skb, skb->data, length);
skb->ip_summed = re->skb->ip_summed;
skb->csum = re->skb->csum;
pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
- length, PCI_DMA_FROMDEVICE);
+ pci_unmap_len(re, data_size),
+ PCI_DMA_FROMDEVICE);
re->skb->ip_summed = CHECKSUM_NONE;
skb_put(skb, length);
}
On Wed, 20 Jan 2010 09:41:03 +0000
Jarek Poplawski <[email protected]> wrote:
> [ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
> On Tue, Jan 19, 2010 at 05:10:13PM -0800, Stephen Hemminger wrote:
> > On Tue, 19 Jan 2010 20:01:10 -0500
> > Michael Breuer <[email protected]> wrote:
> >
> > > On 1/19/2010 5:45 PM, Jarek Poplawski wrote:
> > > > On Tue, Jan 19, 2010 at 03:06:01PM -0500, Michael Breuer wrote:
> > > >
> > > >> On 1/19/2010 2:59 PM, Jarek Poplawski wrote:
> > > >>
> > > >>> On Tue, Jan 19, 2010 at 10:47:27AM -0500, Michael Breuer wrote:
> > > >>> ...
> > > >>>
> > > >>>> Still get the warning... but now 60 bytes.
> > > >>>> Jan 19 10:43:50 mail kernel: ------------[ cut here ]------------
> > > >>>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
> ...
> > > That not only compiled, but it cleared the error as well. Additionally,
> > > I used to see a bit of a delay receiving the login prompt when first
> > > connecting to the box by ssh. That delay is gone with this patch. I'd
> > > guess that the warning wasn't quite as innocuous as I thought.
> > > Note: tested on 2.6.32.4. I'll leave this up for a bit before
> > > attempting to move back to head.
> >
> > Seems like an underlying bug in the DMA api. Maybe it just can't
> > handle operations on partial mapping.
> >
> > Other drivers with same problem:
> > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
>
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
>
> Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.
The list of places to inspect is:
$ git grep -l sync_single_for_cpu $(git grep -l skb_copy_from_linear_data)
drivers/infiniband/ulp/ipoib/ipoib_cm.c
drivers/message/fusion/mptlan.c
drivers/net/b44.c
drivers/net/bnx2.c
drivers/net/cassini.c
drivers/net/chelsio/sge.c
drivers/net/cxgb3/sge.c
drivers/net/irda/vlsi_ir.c
drivers/net/myri_sbus.c
drivers/net/r8169.c
drivers/net/rrunner.c
drivers/net/skge.c
drivers/net/sky2.c
drivers/net/sungem.c
drivers/net/sunhme.c
drivers/net/tg3.c
drivers/net/tokenring/3c359.c
drivers/net/tokenring/olympic.c
drivers/net/tulip/de2104x.c
drivers/net/via-velocity.c
drivers/net/wireless/ipw2x00/ipw2100.c
drivers/net/wireless/ipw2x00/ipw2200.c
On Wed, 20 Jan 2010 09:41:03 +0000
Jarek Poplawski <[email protected]> wrote:
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
>
> Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.
>
> Thanks,
> Jarek P.
Update Documentation/PCI-DMA-mapping as well please.
--
On Wed, 2010-01-20 at 10:03 -0800, Stephen Hemminger wrote:
> On Wed, 20 Jan 2010 09:41:03 +0000
> Jarek Poplawski <[email protected]> wrote:
> > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > handle operations on partial mapping.
> > >
> > > Other drivers with same problem:
> > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> >
> > It seems using the same length (even without pci_unmap_len()) is
> > crucial here, but I hope maintainers (added to CC) will take care.
> >
I'm still unsure how to do dma_sync properly in bnx2. In the current
code, we always dma_sync_for_cpu a small portion of the SKB because the
rx descriptor is at the beginning of the SKB. We get the packet length,
for example, from the rx descriptor.
If it's a big packet, we'll simply unmap the entire SKB buffer (with the
beginning portion already dma_sync'ed). If the packet is smaller than
what we dma_sync'ed, we'll just copy the data to a new SKB. We'll then
dma_sync_for_device the portion of the original buffer and recycle the
whole buffer back to the device for new packets.
So, is it correct to just change the dma_sync length to the full length
of the buffer? It doesn't sound right to me.
Thanks.
On Wed, 20 Jan 2010 12:11:52 -0800
"Michael Chan" <[email protected]> wrote:
>
> On Wed, 2010-01-20 at 10:03 -0800, Stephen Hemminger wrote:
> > On Wed, 20 Jan 2010 09:41:03 +0000
> > Jarek Poplawski <[email protected]> wrote:
> > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > handle operations on partial mapping.
> > > >
> > > > Other drivers with same problem:
> > > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> > >
> > > It seems using the same length (even without pci_unmap_len()) is
> > > crucial here, but I hope maintainers (added to CC) will take care.
> > >
>
> I'm still unsure how to do dma_sync properly in bnx2. In the current
> code, we always dma_sync_for_cpu a small portion of the SKB because the
> rx descriptor is at the beginning of the SKB. We get the packet length,
> for example, from the rx descriptor.
>
> If it's a big packet, we'll simply unmap the entire SKB buffer (with the
> beginning portion already dma_sync'ed). If the packet is smaller than
> what we dma_sync'ed, we'll just copy the data to a new SKB. We'll then
> dma_sync_for_device the portion of the original buffer and recycle the
> whole buffer back to the device for new packets.
>
> So, is it correct to just change the dma_sync length to the full length
> of the buffer? It doesn't sound right to me.
It looks like the size passed to sync_single has to match size of original
mapping.
On Wed, Jan 20, 2010 at 12:30:33PM -0800, Stephen Hemminger wrote:
> On Wed, 20 Jan 2010 12:11:52 -0800
> "Michael Chan" <[email protected]> wrote:
>
> >
> > On Wed, 2010-01-20 at 10:03 -0800, Stephen Hemminger wrote:
> > > On Wed, 20 Jan 2010 09:41:03 +0000
> > > Jarek Poplawski <[email protected]> wrote:
> > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > handle operations on partial mapping.
> > > > >
> > > > > Other drivers with same problem:
> > > > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> > > >
> > > > It seems using the same length (even without pci_unmap_len()) is
> > > > crucial here, but I hope maintainers (added to CC) will take care.
> > > >
> >
> > I'm still unsure how to do dma_sync properly in bnx2. In the current
> > code, we always dma_sync_for_cpu a small portion of the SKB because the
> > rx descriptor is at the beginning of the SKB. We get the packet length,
> > for example, from the rx descriptor.
> >
> > If it's a big packet, we'll simply unmap the entire SKB buffer (with the
> > beginning portion already dma_sync'ed). If the packet is smaller than
> > what we dma_sync'ed, we'll just copy the data to a new SKB. We'll then
> > dma_sync_for_device the portion of the original buffer and recycle the
> > whole buffer back to the device for new packets.
> >
> > So, is it correct to just change the dma_sync length to the full length
> > of the buffer? It doesn't sound right to me.
>
> It looks like the size passed to sync_single has to match size of original
> mapping.
Yes, and it's mainly for lib/dma-debug (until it's not verified dmar
errors reported by Michael Breuer could be connected). So, I'm not
sure for now how serious this warning could be. On the other hand,
Documentation/(PCI-?)DMA-mapping, mentioned by Stephen, doesn't seem
to allow or use in examples the "size" different than mapped.
Jarek P.
> > Seems like an underlying bug in the DMA api. Maybe it just can't
> > handle operations on partial mapping.
> >
> > Other drivers with same problem:
> > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
>
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
The API needs fixing - if you've got a large mapping and you want to sync
part of it then we need to support that. Now it might well be that the
implementation on some braindead platform has to sync the entire thing,
and some implementations entire pages or cache lines.
You can't fix this in the drivers, they requested a service and they
don't have enough information nor is it their job to know about all the
platform specific rules.
Alan
From: Stephen Hemminger <[email protected]>
Date: Wed, 20 Jan 2010 12:30:33 -0800
> It looks like the size passed to sync_single has to match size of original
> mapping.
I think that's rediculious, and unreasonable.
All implementations of these APIs where the sync matters have the
ability to handle partial syncs correctly.
From: Jarek Poplawski <[email protected]>
Date: Wed, 20 Jan 2010 21:58:42 +0100
> Yes, and it's mainly for lib/dma-debug (until it's not verified dmar
> errors reported by Michael Breuer could be connected). So, I'm not
> sure for now how serious this warning could be. On the other hand,
> Documentation/(PCI-?)DMA-mapping, mentioned by Stephen, doesn't seem
> to allow or use in examples the "size" different than mapped.
Someone find me even one actual implemenetation of the DMA APIs, not
the debugging code, which can't handle a partial sync correctly, then
we can talk.
As the person who originally created the PCI DMA APIs I can tell
you the intention was definitely to allow arbitrary SYNC lengths
and there was no requirement that the length needed to be the
same as that used in the mapping call.
Nothing else makes sense. People sync when they have a partially used
buffer and want to copy the data out then recycle the big original
buffer back to the chip. I knew this when creating the PCI DMA APIs
because I first tried to use them in network drivers and that's what
so many of them do.
Currently all we have is an overly anal DMA debugging layer check, and
some speculation that it might or might not fix some real bug.
From: Alan Cox <[email protected]>
Date: Wed, 20 Jan 2010 22:24:14 +0000
>> > Seems like an underlying bug in the DMA api. Maybe it just can't
>> > handle operations on partial mapping.
>> >
>> > Other drivers with same problem:
>> > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
>>
>> It seems using the same length (even without pci_unmap_len()) is
>> crucial here, but I hope maintainers (added to CC) will take care.
>
> The API needs fixing - if you've got a large mapping and you want to sync
> part of it then we need to support that. Now it might well be that the
> implementation on some braindead platform has to sync the entire thing,
> and some implementations entire pages or cache lines.
>
> You can't fix this in the drivers, they requested a service and they
> don't have enough information nor is it their job to know about all the
> platform specific rules.
Absolutely and %100 agreed, the DMA debugging layer and documentation
are both buggy.
The intention from the beginning was always to allow partial SYNCs
exactly for the reasons Alan and myself are mentioning here.
On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > handle operations on partial mapping.
> > >
> > > Other drivers with same problem:
> > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> >
> > It seems using the same length (even without pci_unmap_len()) is
> > crucial here, but I hope maintainers (added to CC) will take care.
>
> The API needs fixing - if you've got a large mapping and you want to sync
> part of it then we need to support that. Now it might well be that the
> implementation on some braindead platform has to sync the entire thing,
> and some implementations entire pages or cache lines.
>
> You can't fix this in the drivers, they requested a service and they
> don't have enough information nor is it their job to know about all the
> platform specific rules.
Yes, the need to repeat some other values if there is a dedicated
structure/pointer could be misleading. Btw, it seems to be a trivial
overlooking since there is dma_sync_single_range() ready to use.
Jarek P.
On Wed, 20 Jan 2010 23:53:22 +0100
Jarek Poplawski <[email protected]> wrote:
> On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > handle operations on partial mapping.
> > > >
> > > > Other drivers with same problem:
> > > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> > >
> > > It seems using the same length (even without pci_unmap_len()) is
> > > crucial here, but I hope maintainers (added to CC) will take care.
> >
> > The API needs fixing - if you've got a large mapping and you want to sync
> > part of it then we need to support that. Now it might well be that the
> > implementation on some braindead platform has to sync the entire thing,
> > and some implementations entire pages or cache lines.
> >
> > You can't fix this in the drivers, they requested a service and they
> > don't have enough information nor is it their job to know about all the
> > platform specific rules.
>
> Yes, the need to repeat some other values if there is a dedicated
> structure/pointer could be misleading. Btw, it seems to be a trivial
> overlooking since there is dma_sync_single_range() ready to use.
Yeah, dma_sync_single_range() enables you to do a partial sync. But
you must be really careful with a partial sync (as DMA-API.txt says).
On Fri, Jan 22, 2010 at 12:22:10AM +0900, FUJITA Tomonori wrote:
> On Wed, 20 Jan 2010 23:53:22 +0100
> Jarek Poplawski <[email protected]> wrote:
>
> > On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > handle operations on partial mapping.
> > > > >
> > > > > Other drivers with same problem:
> > > > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> > > >
> > > > It seems using the same length (even without pci_unmap_len()) is
> > > > crucial here, but I hope maintainers (added to CC) will take care.
> > >
> > > The API needs fixing - if you've got a large mapping and you want to sync
> > > part of it then we need to support that. Now it might well be that the
> > > implementation on some braindead platform has to sync the entire thing,
> > > and some implementations entire pages or cache lines.
> > >
> > > You can't fix this in the drivers, they requested a service and they
> > > don't have enough information nor is it their job to know about all the
> > > platform specific rules.
> >
> > Yes, the need to repeat some other values if there is a dedicated
> > structure/pointer could be misleading. Btw, it seems to be a trivial
> > overlooking since there is dma_sync_single_range() ready to use.
>
> Yeah, dma_sync_single_range() enables you to do a partial sync. But
> you must be really careful with a partial sync (as DMA-API.txt says).
Actually, we are trying to establish here (and a few more netdev@
threads) what exactly the author was worried about. After looking at
some implementations it seems to me this carefullness in observing
the cache alignment and width is needed only wrt. the 'offset'. But
then, the way the 'size' is used (or rather not used for anything
crucial) suggests dma_sync_single_range() with zero offset seems
completely safe. But then it's equivalent to dma_sync_single() with
the 'size' possibly less than 'passed into the single mapping'. Which
according to the other comment seems wrong...
Jarek P.
On 1/20/2010 4:41 AM, Jarek Poplawski wrote:
> [ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
> On Tue, Jan 19, 2010 at 05:10:13PM -0800, Stephen Hemminger wrote:
>
>> On Tue, 19 Jan 2010 20:01:10 -0500
>> Michael Breuer<[email protected]> wrote:
>>
>>
>>> On 1/19/2010 5:45 PM, Jarek Poplawski wrote:
>>>
>>>> On Tue, Jan 19, 2010 at 03:06:01PM -0500, Michael Breuer wrote:
>>>>
>>>>
>>>>> On 1/19/2010 2:59 PM, Jarek Poplawski wrote:
>>>>>
>>>>>
>>>>>> On Tue, Jan 19, 2010 at 10:47:27AM -0500, Michael Breuer wrote:
>>>>>> ...
>>>>>>
>>>>>>
>>>>>>> Still get the warning... but now 60 bytes.
>>>>>>> Jan 19 10:43:50 mail kernel: ------------[ cut here ]------------
>>>>>>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
>>>>>>>
> ...
>
>>> That not only compiled, but it cleared the error as well. Additionally,
>>> I used to see a bit of a delay receiving the login prompt when first
>>> connecting to the box by ssh. That delay is gone with this patch. I'd
>>> guess that the warning wasn't quite as innocuous as I thought.
>>> Note: tested on 2.6.32.4. I'll leave this up for a bit before
>>> attempting to move back to head.
>>>
>> Seems like an underlying bug in the DMA api. Maybe it just can't
>> handle operations on partial mapping.
>>
>> Other drivers with same problem:
>> bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
>>
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
>
> Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.
>
> Thanks,
> Jarek P.
> ----------------------->
>
> Using pci_unmap_len(), with the same length as pci_map_single(), with
> pci_dma_sync_single_for_cpu()/_device() fixes this warning (2.6.32.4):
>
>
>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
>> check_sync+0xc1/0x43f()
>> Jan 19 10:43:50 mail kernel: Hardware name: System Product Name
>> Jan 19 10:43:50 mail kernel: sky2 0000:04:00.0: DMA-API: device driver
>> tries to sync DMA memory it has not allocated [device
>> address=0x0000000320a0b022] [size=60 bytes]
>>
> Reported-by: Michael Breuer<[email protected]>
> Tested-by: Michael Breuer<[email protected]>
> Signed-off-by: Jarek Poplawski<[email protected]>
> ---
>
> drivers/net/sky2.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 7650f73..cdebdd3 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
> skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> if (likely(skb)) {
> pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> - length, PCI_DMA_FROMDEVICE);
> + pci_unmap_len(re, data_size),
> + PCI_DMA_FROMDEVICE);
> skb_copy_from_linear_data(re->skb, skb->data, length);
> skb->ip_summed = re->skb->ip_summed;
> skb->csum = re->skb->csum;
> pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> - length, PCI_DMA_FROMDEVICE);
> + pci_unmap_len(re, data_size),
> + PCI_DMA_FROMDEVICE);
> re->skb->ip_summed = CHECKSUM_NONE;
> skb_put(skb, length);
> }
>
Just a testing update: I went back to CONFIG_DMAR=Y yesterday and have
not (yet) encountered the sky2 crash I'd been having prior to this fix.
I've been pumping traffic through, and based on pre-patch experience, it
would likely have crashed by now.
Will keep the system up for the next couple of days w/o reboot to
confirm that the sky2 lockup I'd been seeing has stopped happening with
this patch.
Test notes:
1) Warning previously apparent on start (dma_debug check_sync) with
CONFIG_DMAR=n is gone.
2) W/o the above patch, I was getting sky2 DMAR errors and subsequent TX
hangs requiring reboot to clear. The hangs happened after at least 12
hours of uptime, and under RX load at the time of the hang.
3) With the above patch (and no other changes) I have not been able to
recreate the crash - the system is stable.
I have been following the discussion about the DMA api would suggest
that the length issue when DMAR is enabled is less innocuous than
previously believed.
On Thu, Jan 21, 2010 at 02:59:19PM -0500, Michael Breuer wrote:
> On 1/20/2010 4:41 AM, Jarek Poplawski wrote:
> >[ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
> >diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> >index 7650f73..cdebdd3 100644
> >--- a/drivers/net/sky2.c
> >+++ b/drivers/net/sky2.c
> >@@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
> > skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> > if (likely(skb)) {
> > pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> >- length, PCI_DMA_FROMDEVICE);
> >+ pci_unmap_len(re, data_size),
> >+ PCI_DMA_FROMDEVICE);
> > skb_copy_from_linear_data(re->skb, skb->data, length);
> > skb->ip_summed = re->skb->ip_summed;
> > skb->csum = re->skb->csum;
> > pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> >- length, PCI_DMA_FROMDEVICE);
> >+ pci_unmap_len(re, data_size),
> >+ PCI_DMA_FROMDEVICE);
> > re->skb->ip_summed = CHECKSUM_NONE;
> > skb_put(skb, length);
> > }
> Just a testing update: I went back to CONFIG_DMAR=Y yesterday and
> have not (yet) encountered the sky2 crash I'd been having prior to
> this fix. I've been pumping traffic through, and based on pre-patch
> experience, it would likely have crashed by now.
>
> Will keep the system up for the next couple of days w/o reboot to
> confirm that the sky2 lockup I'd been seeing has stopped happening
> with this patch.
>
> Test notes:
>
> 1) Warning previously apparent on start (dma_debug check_sync) with
> CONFIG_DMAR=n is gone.
> 2) W/o the above patch, I was getting sky2 DMAR errors and
> subsequent TX hangs requiring reboot to clear. The hangs happened
> after at least 12 hours of uptime, and under RX load at the time of
> the hang.
> 3) With the above patch (and no other changes) I have not been able
> to recreate the crash - the system is stable.
Btw, could you remind us if during last dmar bugs jumbo frames might
have been used or maybe mtu was changed, and the current test setting?
>
> I have been following the discussion about the DMA api would suggest
> that the length issue when DMAR is enabled is less innocuous than
> previously believed.
Actually, the last conclusions are - it's more innocuous than ever
believed, and I completely agree with this (at least until the next
week ;-).
Thanks,
Jarek P.
On 1/21/2010 3:41 PM, Jarek Poplawski wrote:
> On Thu, Jan 21, 2010 at 02:59:19PM -0500, Michael Breuer wrote:
>
>> On 1/20/2010 4:41 AM, Jarek Poplawski wrote:
>>
>>> [ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
>>> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
>>> index 7650f73..cdebdd3 100644
>>> --- a/drivers/net/sky2.c
>>> +++ b/drivers/net/sky2.c
>>> @@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
>>> skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>>> if (likely(skb)) {
>>> pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
>>> - length, PCI_DMA_FROMDEVICE);
>>> + pci_unmap_len(re, data_size),
>>> + PCI_DMA_FROMDEVICE);
>>> skb_copy_from_linear_data(re->skb, skb->data, length);
>>> skb->ip_summed = re->skb->ip_summed;
>>> skb->csum = re->skb->csum;
>>> pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
>>> - length, PCI_DMA_FROMDEVICE);
>>> + pci_unmap_len(re, data_size),
>>> + PCI_DMA_FROMDEVICE);
>>> re->skb->ip_summed = CHECKSUM_NONE;
>>> skb_put(skb, length);
>>> }
>>>
>> Just a testing update: I went back to CONFIG_DMAR=Y yesterday and
>> have not (yet) encountered the sky2 crash I'd been having prior to
>> this fix. I've been pumping traffic through, and based on pre-patch
>> experience, it would likely have crashed by now.
>>
>> Will keep the system up for the next couple of days w/o reboot to
>> confirm that the sky2 lockup I'd been seeing has stopped happening
>> with this patch.
>>
>> Test notes:
>>
>> 1) Warning previously apparent on start (dma_debug check_sync) with
>> CONFIG_DMAR=n is gone.
>> 2) W/o the above patch, I was getting sky2 DMAR errors and
>> subsequent TX hangs requiring reboot to clear. The hangs happened
>> after at least 12 hours of uptime, and under RX load at the time of
>> the hang.
>> 3) With the above patch (and no other changes) I have not been able
>> to recreate the crash - the system is stable.
>>
> Btw, could you remind us if during last dmar bugs jumbo frames might
> have been used or maybe mtu was changed, and the current test setting?
>
>
I've hit this with and without Jumbo frames enabled. Last couple of
recreations were with mtu = 1500, which is how I'm running now.
>> I have been following the discussion about the DMA api would suggest
>> that the length issue when DMAR is enabled is less innocuous than
>> previously believed.
>>
> Actually, the last conclusions are - it's more innocuous than ever
> believed, and I completely agree with this (at least until the next
> week ;-).
>
I stand grammatically corrected.
> Thanks,
> Jarek P.
>
On Thu, Jan 21, 2010 at 03:46:50PM -0500, Michael Breuer wrote:
> On 1/21/2010 3:41 PM, Jarek Poplawski wrote:
> >On Thu, Jan 21, 2010 at 02:59:19PM -0500, Michael Breuer wrote:
> >>Test notes:
> >>
> >>1) Warning previously apparent on start (dma_debug check_sync) with
> >>CONFIG_DMAR=n is gone.
> >>2) W/o the above patch, I was getting sky2 DMAR errors and
> >>subsequent TX hangs requiring reboot to clear. The hangs happened
> >>after at least 12 hours of uptime, and under RX load at the time of
> >>the hang.
> >>3) With the above patch (and no other changes) I have not been able
> >>to recreate the crash - the system is stable.
> >Btw, could you remind us if during last dmar bugs jumbo frames might
> >have been used or maybe mtu was changed, and the current test setting?
> >
> I've hit this with and without Jumbo frames enabled. Last couple of
> recreations were with mtu = 1500, which is how I'm running now.
> >>I have been following the discussion about the DMA api would suggest
> >>that the length issue when DMAR is enabled is less innocuous than
> >>previously believed.
> >Actually, the last conclusions are - it's more innocuous than ever
> >believed, and I completely agree with this (at least until the next
> >week ;-).
> I stand grammatically corrected.
I didn't mean anything grammatical, sorry! (Except, it's equally
complex ;-)
Jarek P.
On Thu, 21 Jan 2010 19:41:58 +0100
Jarek Poplawski <[email protected]> wrote:
> On Fri, Jan 22, 2010 at 12:22:10AM +0900, FUJITA Tomonori wrote:
> > On Wed, 20 Jan 2010 23:53:22 +0100
> > Jarek Poplawski <[email protected]> wrote:
> >
> > > On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > > handle operations on partial mapping.
> > > > > >
> > > > > > Other drivers with same problem:
> > > > > > bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
> > > > >
> > > > > It seems using the same length (even without pci_unmap_len()) is
> > > > > crucial here, but I hope maintainers (added to CC) will take care.
> > > >
> > > > The API needs fixing - if you've got a large mapping and you want to sync
> > > > part of it then we need to support that. Now it might well be that the
> > > > implementation on some braindead platform has to sync the entire thing,
> > > > and some implementations entire pages or cache lines.
> > > >
> > > > You can't fix this in the drivers, they requested a service and they
> > > > don't have enough information nor is it their job to know about all the
> > > > platform specific rules.
> > >
> > > Yes, the need to repeat some other values if there is a dedicated
> > > structure/pointer could be misleading. Btw, it seems to be a trivial
> > > overlooking since there is dma_sync_single_range() ready to use.
> >
> > Yeah, dma_sync_single_range() enables you to do a partial sync. But
> > you must be really careful with a partial sync (as DMA-API.txt says).
>
> Actually, we are trying to establish here (and a few more netdev@
> threads) what exactly the author was worried about. After looking at
James added to Cc,
> some implementations it seems to me this carefullness in observing
> the cache alignment and width is needed only wrt. the 'offset'. But
> then, the way the 'size' is used (or rather not used for anything
> crucial) suggests dma_sync_single_range() with zero offset seems
> completely safe. But then it's equivalent to dma_sync_single() with
Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
not a multiple of the cache line size, it's possible that driver
writers who aren't familiar with cache would be surprised (it depends
on the way their drivers use buffers though).
The easiest way for 'completely safe sync for any driver writers' is
asking for all the sync parameters must be the same as those passed
into the single mapping API. If writes knows what they do, they can do
a partial sync with sync_range API. That's the author intention, I
guess.
From: FUJITA Tomonori <[email protected]>
Date: Fri, 22 Jan 2010 14:11:29 +0900
> Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
> not a multiple of the cache line size, it's possible that driver
> writers who aren't familiar with cache would be surprised (it depends
> on the way their drivers use buffers though).
>
> The easiest way for 'completely safe sync for any driver writers' is
> asking for all the sync parameters must be the same as those passed
> into the single mapping API. If writes knows what they do, they can do
> a partial sync with sync_range API. That's the author intention, I
> guess.
This is not reasonable.
You have to think about how people actually use these
interfaces.
They have a large buffer, and if they receive a small request they
want to allocate a smaller buffer, copy into that smaller buffer, and
give the larger buffer back to the hardware.
It's an optimization, it performs better this way.
If you make it so that the DMA sync has to cover the entire large
buffer, the whole point of the optimization is taken away.
That makes no sense at all.
I know that when I designed and wrote the first implementation of the
PCI DMA interfaces, I sure as hell meant to allow partial DMA sync
operations.
I know this as a fact, because the first drivers ported over
to these interfaces were network drivers. And I definitely
knew about the copy-break mechanism I describe above and how
networking drivers use such a scheme pretty much across the
board.
The DMA API documentation is wrong, it must be fixed to allow partial
syncs of arbitrary offsets and sizes.
The issue of cache line boundaries and such are the domain of the DMA
API implementation, and has absolutely no business in the definition
of these interfaces. Nor should it be something driver authors have
to be knowledgable about, that would be completely unreasonable.
Kernel 2.6.32.4 (git) with the following patches applied:
af_packet.c (tpacket_snd version 3)
sky2.c pskb_may_pull
sky2 fix WARNING at lib/dma-debug.c check_sync
Running with CONFIG_DMAR=n, system is stable.
Running with the exact same source but CONFIG_DMAR=y I get the WARNING
(see below) after about 36 hours of uptime (has varied from about 24 to
about 48):
Smolt profile:
http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
I'm also attaching dmesg.old (dmesg from the crash).
Subsequent to this the system watchdog reboots the system (it's hung).
Of interest: each and every time this has happened the system was under
heavy RX load (win7 backup to a cifs share hosted on this server). Also,
there is always a dhcp exchange of some sort preceding the event.
It is possible that the event is re creatable without DMAR enabled, but
I have been unsuccessful in doing so.
Jan 22 05:38:36 mail dhcpd: DHCPREQUEST for 10.0.0.54 from
00:1b:78:c8:2b:8e (HPC82B8D) via eth0
Jan 22 05:38:36 mail dhcpd: DHCPACK on 10.0.0.54 to 00:1b:78:c8:2b:8e
(HPC82B8D) via eth0
Jan 22 05:38:41 mail kernel: DRHD: handling fault status reg 2
Jan 22 05:38:41 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
fault addr ffdfdd9fe000
Jan 22 05:38:41 mail kernel: DMAR:[fault reason 06] PTE Read access is
not set
Jan 22 05:38:41 mail kernel: sky2 0000:06:00.0: error interrupt
status=0xc0000000
Jan 22 05:38:41 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 22 05:39:18 mail kernel: ------------[ cut here ]------------
Jan 22 05:39:18 mail kernel: WARNING: at net/sched/sch_generic.c:261
dev_watchdog+0xf3/0x164()
Jan 22 05:39:18 mail kernel: Hardware name: System Product Name
Jan 22 05:39:18 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
queue 0 timed out
Jan 22 05:39:18 mail kernel: Modules linked in: cpufreq_stats
ip6table_mangle ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat
nf_nat iptable_mangle iptable_raw appletalk psnap llc nfsd lockd nfs_acl
auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit tunnel4
ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp nf_conntrack_ipv6
xt_multiport xt_DSCP xt_dscp xt_MARK ipv6 dm_multipath kvm_intel kvm
snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_hda_intel
snd_ac97_codec snd_hda_codec snd_hwdep ac97_bus snd_seq snd_seq_device
firewire_ohci snd_pcm firewire_core crc_itu_t snd_timer snd
gspca_spca505 gspca_main i2c_i801 videodev v4l1_compat sky2 soundcore
v4l2_compat_ioctl32 wmi snd_page_alloc asus_atk0110 hwmon pcspkr
iTCO_wdt iTCO_vendor_support fbcon tileblit font bitblit softcursor
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid1 ata_generic pata_acpi pata_marvell nouveau ttm
drm_kms_helper drm agpgart fb i2c_algo_bit cfbcopyarea i2c_core
cfbimgblt cfb
Jan 22 05:39:18 mail kernel: fillrect [last unloaded: microcode]
Jan 22 05:39:18 mail kernel: Pid: 0, comm: swapper Tainted: G W
2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 22 05:39:18 mail kernel: Call Trace:
Jan 22 05:39:18 mail kernel: <IRQ> [<ffffffff810536ee>]
warn_slowpath_common+0x7c/0x94
Jan 22 05:39:18 mail kernel: [<ffffffff8105375d>]
warn_slowpath_fmt+0x41/0x43
Jan 22 05:39:18 mail kernel: [<ffffffff813e3b6b>] ? netif_tx_lock+0x44/0x6c
Jan 22 05:39:18 mail kernel: [<ffffffff813e3cd3>] dev_watchdog+0xf3/0x164
Jan 22 05:39:18 mail kernel: [<ffffffff8105f3f4>] ? cascade+0x6a/0x84
Jan 22 05:39:18 mail kernel: [<ffffffff8106323f>]
run_timer_softirq+0x1c8/0x270
Jan 22 05:39:18 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 22 05:39:18 mail kernel: [<ffffffff8107f0ab>] ?
tick_program_event+0x2a/0x2c
Jan 22 05:39:18 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 22 05:39:18 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 22 05:39:18 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 22 05:39:18 mail kernel: [<ffffffff81470612>]
smp_apic_timer_interrupt+0x86/0x94
Jan 22 05:39:18 mail kernel: [<ffffffff810127e3>]
apic_timer_interrupt+0x13/0x20
Jan 22 05:39:18 mail kernel: <EOI> [<ffffffff812c729a>] ?
acpi_idle_enter_bm+0x256/0x28a
Jan 22 05:39:18 mail kernel: [<ffffffff812c7293>] ?
acpi_idle_enter_bm+0x24f/0x28a
Jan 22 05:39:18 mail kernel: [<ffffffff813a6c3c>] ?
cpuidle_idle_call+0x9e/0xfa
Jan 22 05:39:18 mail kernel: [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
Jan 22 05:39:18 mail kernel: [<ffffffff81465ba5>] ?
start_secondary+0x201/0x242
Jan 22 05:39:18 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 22 05:39:18 mail kernel: sky2 eth0: tx timeout
Jan 22 05:39:18 mail kernel: sky2 eth0: transmit ring 76 .. 35 report=76
done=76
Jan 22 05:39:18 mail kernel: sky2 eth0: disabling interface
Jan 22 05:39:18 mail kernel: sky2 eth0: enabling interface
Jan 22 05:39:21 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
duplex, flow control both
Jan 22 05:40:06 mail kernel: sky2 eth0: tx timeout
Jan 22 05:40:06 mail kernel: sky2 eth0: transmit ring 3 .. 90 report=3
done=3
Jan 22 05:40:06 mail kernel: sky2 eth0: disabling interface
Jan 22 05:40:06 mail kernel: sky2 eth0: enabling interface
Jan 22 05:40:09 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
duplex, flow control both
Jan 22 05:40:30 mail abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Jan 22 05:40:30 mail abrtd: Directory 'kerneloops-1264156830-1' creation
detected
Jan 22 05:40:30 mail abrtd: Can't open file
'/var/cache/abrt/kerneloops-1264156830-1/cmdline'
Jan 22 05:40:30 mail abrtd: Corrupted or bad crash, deleting
Jan 22 05:40:33 mail dhcpd: DHCPINFORM from 10.0.0.11 via eth0
Jan 22 05:40:33 mail dhcpd: DHCPACK to 10.0.0.11 (00:1a:92:8d:30:81) via
eth0
Jan 22 05:40:36 mail dhcpd: DHCPINFORM from 10.0.0.11 via eth0
Jan 22 05:40:36 mail dhcpd: DHCPACK to 10.0.0.11 (00:1a:92:8d:30:81) via
eth0
Jan 22 05:40:50 mail named[11245]: error (connection refused) resolving
'173-212-207-86.hostnoc.net/A/IN': 2607:f878:0:3::12#53
Jan 22 05:40:50 mail named[11245]: error (connection refused) resolving
'NS1.HOSTNOC.NET/AAAA/IN': 2607:f878:0:3::10#53
<watchdog reboot>
On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
> Kernel 2.6.32.4 (git) with the following patches applied:
>
> af_packet.c (tpacket_snd version 3)
> sky2.c pskb_may_pull
> sky2 fix WARNING at lib/dma-debug.c check_sync
I guess, you meant the "sky2.c receive_copy" patch which you tested
earlier, or at least you managed to crash DMAR with that patch
before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
right?
> Running with CONFIG_DMAR=n, system is stable.
> Running with the exact same source but CONFIG_DMAR=y I get the
> WARNING (see below) after about 36 hours of uptime (has varied from
> about 24 to about 48):
> Smolt profile: http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
> I'm also attaching dmesg.old (dmesg from the crash).
>
> Subsequent to this the system watchdog reboots the system (it's hung).
>
> Of interest: each and every time this has happened the system was
> under heavy RX load (win7 backup to a cifs share hosted on this
> server). Also, there is always a dhcp exchange of some sort
> preceding the event.
>
> It is possible that the event is re creatable without DMAR enabled,
> but I have been unsuccessful in doing so.
It would be nice to check now if it's re-creatable without the dhcp
exchange yet, or at least dhcp through the switch and the router,
because I suspect there might be something more than a simple drop
on the switch that affects sky2 stability.
Jarek P.
On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>
>> Kernel 2.6.32.4 (git) with the following patches applied:
>>
>> af_packet.c (tpacket_snd version 3)
>> sky2.c pskb_may_pull
>> sky2 fix WARNING at lib/dma-debug.c check_sync
>>
> I guess, you meant the "sky2.c receive_copy" patch which you tested
> earlier, or at least you managed to crash DMAR with that patch
> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
> right?
>
>
Yes - sorry, correct - all three patches were in the last run.
Previously, I've encountered the crash without these patches.
>> Running with CONFIG_DMAR=n, system is stable.
>> Running with the exact same source but CONFIG_DMAR=y I get the
>> WARNING (see below) after about 36 hours of uptime (has varied from
>> about 24 to about 48):
>> Smolt profile: http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
>> I'm also attaching dmesg.old (dmesg from the crash).
>>
>> Subsequent to this the system watchdog reboots the system (it's hung).
>>
>> Of interest: each and every time this has happened the system was
>> under heavy RX load (win7 backup to a cifs share hosted on this
>> server). Also, there is always a dhcp exchange of some sort
>> preceding the event.
>>
>> It is possible that the event is re creatable without DMAR enabled,
>> but I have been unsuccessful in doing so.
>>
> It would be nice to check now if it's re-creatable without the dhcp
> exchange yet, or at least dhcp through the switch and the router,
> because I suspect there might be something more than a simple drop
> on the switch that affects sky2 stability.
>
> Jarek P.
>
Not sure I can do that. Note that based on the log messages, there were
no errors/dropped packets involving dhcp. Moving the dhcp server off of
the affected machine is not trivial. The dhcp correlation is based on
logged messages preceding each crash. I cannot confirm that they're
related, however it's really suspicious. If it helps, HP replaced my
unmanaged switch with a managed one so I can see whether there were any
switch events logged the next time I have a crash.
At this point, it seems the following is required to trigger the crash:
1) Uptime of 24-36 hours
2) High RX load on server (cifs traffic is what I've triggered it with).
3) Normal DHCP traffic.
Looks like based on the events I've seen that the high RX load has to be
sustained for about 15-20 minutes prior to the dhcp traffic. Crash
follows about a minute later.
On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
> On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
> >On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
> >>Kernel 2.6.32.4 (git) with the following patches applied:
> >>
> >>af_packet.c (tpacket_snd version 3)
> >>sky2.c pskb_may_pull
> >>sky2 fix WARNING at lib/dma-debug.c check_sync
> >I guess, you meant the "sky2.c receive_copy" patch which you tested
> >earlier, or at least you managed to crash DMAR with that patch
> >before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
> >right?
> >
> Yes - sorry, correct - all three patches were in the last run.
> Previously, I've encountered the crash without these patches.
OK, thanks for testing - it's really very helpful, and supports
David's opinion that dmar is a different problem.
...
> Not sure I can do that. Note that based on the log messages, there
> were no errors/dropped packets involving dhcp. Moving the dhcp
> server off of the affected machine is not trivial. The dhcp
> correlation is based on logged messages preceding each crash. I
> cannot confirm that they're related, however it's really suspicious.
> If it helps, HP replaced my unmanaged switch with a managed one so I
> can see whether there were any switch events logged the next time I
> have a crash.
>
> At this point, it seems the following is required to trigger the crash:
> 1) Uptime of 24-36 hours
> 2) High RX load on server (cifs traffic is what I've triggered it with).
> 3) Normal DHCP traffic.
Do you mean you got these crashes with the new switch too, and this
switch doesn't drop DHCP at all? (Otherwise, let's try this switch
first.)
Jarek P.
On 1/22/2010 6:06 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
>
>> On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
>>
>>> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>>>
>>>> Kernel 2.6.32.4 (git) with the following patches applied:
>>>>
>>>> af_packet.c (tpacket_snd version 3)
>>>> sky2.c pskb_may_pull
>>>> sky2 fix WARNING at lib/dma-debug.c check_sync
>>>>
>>> I guess, you meant the "sky2.c receive_copy" patch which you tested
>>> earlier, or at least you managed to crash DMAR with that patch
>>> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
>>> right?
>>>
>>>
>> Yes - sorry, correct - all three patches were in the last run.
>> Previously, I've encountered the crash without these patches.
>>
> OK, thanks for testing - it's really very helpful, and supports
> David's opinion that dmar is a different problem.
> ...
>
>> Not sure I can do that. Note that based on the log messages, there
>> were no errors/dropped packets involving dhcp. Moving the dhcp
>> server off of the affected machine is not trivial. The dhcp
>> correlation is based on logged messages preceding each crash. I
>> cannot confirm that they're related, however it's really suspicious.
>> If it helps, HP replaced my unmanaged switch with a managed one so I
>> can see whether there were any switch events logged the next time I
>> have a crash.
>>
>> At this point, it seems the following is required to trigger the crash:
>> 1) Uptime of 24-36 hours
>> 2) High RX load on server (cifs traffic is what I've triggered it with).
>> 3) Normal DHCP traffic.
>>
> Do you mean you got these crashes with the new switch too, and this
> switch doesn't drop DHCP at all? (Otherwise, let's try this switch
> first.)
>
> Jarek P.
>
Nope - just got the new switch. Crash was old switch. That said, I don't
think (based on the log messages) that the dhcpoffer packet drop was
happening prior to the crash. I also can't fathom why a DHCPOFFER packet
dropped after leaving the server would have any bearing on the issue.
On Fri, Jan 22, 2010 at 06:25:12PM -0500, Michael Breuer wrote:
> On 1/22/2010 6:06 PM, Jarek Poplawski wrote:
> >On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
> >>Not sure I can do that. Note that based on the log messages, there
> >>were no errors/dropped packets involving dhcp. Moving the dhcp
> >>server off of the affected machine is not trivial. The dhcp
> >>correlation is based on logged messages preceding each crash. I
> >>cannot confirm that they're related, however it's really suspicious.
> >>If it helps, HP replaced my unmanaged switch with a managed one so I
> >>can see whether there were any switch events logged the next time I
> >>have a crash.
> >>
> >>At this point, it seems the following is required to trigger the crash:
> >>1) Uptime of 24-36 hours
> >>2) High RX load on server (cifs traffic is what I've triggered it with).
> >>3) Normal DHCP traffic.
> >Do you mean you got these crashes with the new switch too, and this
> >switch doesn't drop DHCP at all? (Otherwise, let's try this switch
> >first.)
> >
> >Jarek P.
> Nope - just got the new switch. Crash was old switch. That said, I
> don't think (based on the log messages) that the dhcpoffer packet
> drop was happening prior to the crash. I also can't fathom why a
> DHCPOFFER packet dropped after leaving the server would have any
> bearing on the issue.
You wrote earlier:
> [...] Also, there is always a dhcp exchange of some sort
> preceding the event.
So, I'm not sure there was "3) Normal DHCP traffic." if the switch
could drop DHCP packets in some buggy conditions. Anyway, let's try
the new one with really "3) Normal DHCP traffic.", I hope.
Jarek P.
On 1/22/2010 6:46 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 06:25:12PM -0500, Michael Breuer wrote:
>
>
> You wrote earlier:
>
>> [...] Also, there is always a dhcp exchange of some sort
>> preceding the event.
>>
> So, I'm not sure there was "3) Normal DHCP traffic." if the switch
> could drop DHCP packets in some buggy conditions. Anyway, let's try
> the new one with really "3) Normal DHCP traffic.", I hope.
>
> Jarek P.
>
When the packets were dropped, there was a different sequence in the log
- DISCOVER/OFFER repeated. The "normal" is that the sequence appeared
correct and complete - DISCOVER/OFFER/REQUEST/ACK - or INFORM/ACK (vs.
INFORM repeatedly sans ACK) as the case may be.
On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> When the packets were dropped, there was a different sequence in the
> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
Anyway, I'd be intersted if the switch matters here.
Plus one more test: could you try to load sky2 with the parameter:
"copybreak=1" (the rest as in any recent test, which gave you dmar
errors; any switch).
Thanks,
Jarek P.
On 1/23/2010 6:21 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>
>> When the packets were dropped, there was a different sequence in the
>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>
> Anyway, I'd be intersted if the switch matters here.
>
> Plus one more test: could you try to load sky2 with the parameter:
> "copybreak=1" (the rest as in any recent test, which gave you dmar
> errors; any switch).
>
> Thanks,
> Jarek P.
>
Ok - will try with and without the copybreak=1 after I'm done bisecting
2.6.33 rc5 for something unrelated (it'll take a couple of days for each
unless a crash occurs in less that 48 hours).
On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>
>> When the packets were dropped, there was a different sequence in the
>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>
> Anyway, I'd be intersted if the switch matters here.
>
> Plus one more test: could you try to load sky2 with the parameter:
> "copybreak=1" (the rest as in any recent test, which gave you dmar
> errors; any switch).
>
> Thanks,
> Jarek P.
>
Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
to confirm that I haven't inadvertently fixed something. However, given
that it might be copybreak-related, I looked at sky2.c again and I'm
wondering about the copybreak max size in sky2_rx_start:
size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
/* Stopping point for hardware truncation */
thresh = (size - 8) / sizeof(u32);
sky2->rx_nfrags = size >> PAGE_SHIFT;
BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
/* Compute residue after pages */
size -= sky2->rx_nfrags << PAGE_SHIFT;
/* Optimize to handle small packets and headers */
if (size < copybreak)
size = copybreak;
if (size < ETH_HLEN)
size = ETH_HLEN;
Why would increasing size to copybreak be valid here?
Guessing a bit as I'm not sure about rx_nfrags, but if I read this
correctly, if size is ever less than copybreak it's because there isn't
enough space left for anything larger. If so, wouldn't increasing size
potentially corrupt something? I'd further guess that the resulting
condition manifests sooner (or at least with a more visible effect) when
using DMAR.
In any event, why "copybreak" as the minimum buffer size? I'd suggest
that if it isn't possible to allocate at least MTU + overhead that
sky2_rx_start ought to be delayed until there is room.
On Wed, 27 Jan 2010 10:34:51 -0500
Michael Breuer <[email protected]> wrote:
> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> > On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >
> >> When the packets were dropped, there was a different sequence in the
> >> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>
> > Anyway, I'd be intersted if the switch matters here.
> >
> > Plus one more test: could you try to load sky2 with the parameter:
> > "copybreak=1" (the rest as in any recent test, which gave you dmar
> > errors; any switch).
> >
> > Thanks,
> > Jarek P.
> >
> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> to confirm that I haven't inadvertently fixed something. However, given
> that it might be copybreak-related, I looked at sky2.c again and I'm
> wondering about the copybreak max size in sky2_rx_start:
>
> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>
> /* Stopping point for hardware truncation */
> thresh = (size - 8) / sizeof(u32);
>
> sky2->rx_nfrags = size >> PAGE_SHIFT;
> BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
>
> /* Compute residue after pages */
> size -= sky2->rx_nfrags << PAGE_SHIFT;
>
> /* Optimize to handle small packets and headers */
> if (size < copybreak)
> size = copybreak;
> if (size < ETH_HLEN)
> size = ETH_HLEN;
>
>
> Why would increasing size to copybreak be valid here?
>
> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> correctly, if size is ever less than copybreak it's because there isn't
> enough space left for anything larger. If so, wouldn't increasing size
> potentially corrupt something? I'd further guess that the resulting
> condition manifests sooner (or at least with a more visible effect) when
> using DMAR.
>
> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> that if it isn't possible to allocate at least MTU + overhead that
> sky2_rx_start ought to be delayed until there is room.
This code is where driver decides how much data will be received in skb
data area and the remaining data spills over into skb frags.
Copybreak is the threshold so that packets less than size are copied
to a new skb. The code doing the copying there assumes the data is
totally contained in the skb (not in frags). The size increase there
is to make sure that assumption is always true. I suppose you
could do something perverse like setting copybreak really huge
and confuse driver, but that is a user error.
On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 10:34:51 -0500
> Michael Breuer<[email protected]> wrote:
>
>
>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>
>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>
>>>
>>>> When the packets were dropped, there was a different sequence in the
>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>
>>>>
>>> Anyway, I'd be intersted if the switch matters here.
>>>
>>> Plus one more test: could you try to load sky2 with the parameter:
>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>> errors; any switch).
>>>
>>> Thanks,
>>> Jarek P.
>>>
>>>
>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>> to confirm that I haven't inadvertently fixed something. However, given
>> that it might be copybreak-related, I looked at sky2.c again and I'm
>> wondering about the copybreak max size in sky2_rx_start:
>>
>> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>
>> /* Stopping point for hardware truncation */
>> thresh = (size - 8) / sizeof(u32);
>>
>> sky2->rx_nfrags = size>> PAGE_SHIFT;
>> BUG_ON(sky2->rx_nfrags> ARRAY_SIZE(re->frag_addr));
>>
>> /* Compute residue after pages */
>> size -= sky2->rx_nfrags<< PAGE_SHIFT;
>>
>> /* Optimize to handle small packets and headers */
>> if (size< copybreak)
>> size = copybreak;
>> if (size< ETH_HLEN)
>> size = ETH_HLEN;
>>
>>
>> Why would increasing size to copybreak be valid here?
>>
>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>> correctly, if size is ever less than copybreak it's because there isn't
>> enough space left for anything larger. If so, wouldn't increasing size
>> potentially corrupt something? I'd further guess that the resulting
>> condition manifests sooner (or at least with a more visible effect) when
>> using DMAR.
>>
>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>> that if it isn't possible to allocate at least MTU + overhead that
>> sky2_rx_start ought to be delayed until there is room.
>>
> This code is where driver decides how much data will be received in skb
> data area and the remaining data spills over into skb frags.
> Copybreak is the threshold so that packets less than size are copied
> to a new skb. The code doing the copying there assumes the data is
> totally contained in the skb (not in frags). The size increase there
> is to make sure that assumption is always true. I suppose you
> could do something perverse like setting copybreak really huge
> and confuse driver, but that is a user error.
>
>
Ok - but I'm wondering under what circumstances size would be <
copybreak in the first place after computing the residue. If size ends
up being unreasonably small, is simply increasing the number to whatever
copybreak is correct? Assuming my testing is correct, then the crash
I've been experiencing when using dmar (only) seems related to the value
of copybreak. I don't think the other use (skb reuse) is the issue (but
hey, I could have missed something). The crash occurs when copybreak is
the default of 128, didn't happen when I set copybreak to 1.
On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <[email protected]> wrote:
> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> > On Wed, 27 Jan 2010 10:34:51 -0500
> > Michael Breuer<[email protected]> wrote:
> >
> >
> >> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> >>
> >>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >>>
> >>>
> >>>> When the packets were dropped, there was a different sequence in the
> >>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>>>
> >>>>
> >>> Anyway, I'd be intersted if the switch matters here.
> >>>
> >>> Plus one more test: could you try to load sky2 with the parameter:
> >>> "copybreak=1" (the rest as in any recent test, which gave you dmar
> >>> errors; any switch).
> >>>
> >>> Thanks,
> >>> Jarek P.
> >>>
> >>>
> >> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> >> to confirm that I haven't inadvertently fixed something. However, given
> >> that it might be copybreak-related, I looked at sky2.c again and I'm
> >> wondering about the copybreak max size in sky2_rx_start:
> >>
> >> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> >>
> >> /* Stopping point for hardware truncation */
> >> thresh = (size - 8) / sizeof(u32);
> >>
> >> sky2->rx_nfrags = size>> PAGE_SHIFT;
> >> BUG_ON(sky2->rx_nfrags> ARRAY_SIZE(re->frag_addr));
> >>
> >> /* Compute residue after pages */
> >> size -= sky2->rx_nfrags<< PAGE_SHIFT;
> >>
> >> /* Optimize to handle small packets and headers */
> >> if (size< copybreak)
> >> size = copybreak;
> >> if (size< ETH_HLEN)
> >> size = ETH_HLEN;
> >>
> >>
> >> Why would increasing size to copybreak be valid here?
> >>
> >> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> >> correctly, if size is ever less than copybreak it's because there isn't
> >> enough space left for anything larger. If so, wouldn't increasing size
> >> potentially corrupt something? I'd further guess that the resulting
> >> condition manifests sooner (or at least with a more visible effect) when
> >> using DMAR.
> >>
> >> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> >> that if it isn't possible to allocate at least MTU + overhead that
> >> sky2_rx_start ought to be delayed until there is room.
> >>
> > This code is where driver decides how much data will be received in skb
> > data area and the remaining data spills over into skb frags.
> > Copybreak is the threshold so that packets less than size are copied
> > to a new skb. The code doing the copying there assumes the data is
> > totally contained in the skb (not in frags). The size increase there
> > is to make sure that assumption is always true. I suppose you
> > could do something perverse like setting copybreak really huge
> > and confuse driver, but that is a user error.
> >
> >
> Ok - but I'm wondering under what circumstances size would be <
> copybreak in the first place after computing the residue. If size ends
> up being unreasonably small, is simply increasing the number to whatever
> copybreak is correct? Assuming my testing is correct, then the crash
> I've been experiencing when using dmar (only) seems related to the value
> of copybreak. I don't think the other use (skb reuse) is the issue (but
> hey, I could have missed something). The crash occurs when copybreak is
> the default of 128, didn't happen when I set copybreak to 1.
>
Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
path. Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
properly, or the value passed to sync_single_for_cpu doesn't account for
all the overhead of padding and/or ether header.
--
On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <[email protected]> wrote:
> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> > On Wed, 27 Jan 2010 10:34:51 -0500
> > Michael Breuer<[email protected]> wrote:
> >
> >
> >> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> >>
> >>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >>>
> >>>
> >>>> When the packets were dropped, there was a different sequence in the
> >>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>>>
> >>>>
> >>> Anyway, I'd be intersted if the switch matters here.
> >>>
> >>> Plus one more test: could you try to load sky2 with the parameter:
> >>> "copybreak=1" (the rest as in any recent test, which gave you dmar
> >>> errors; any switch).
> >>>
> >>> Thanks,
> >>> Jarek P.
> >>>
> >>>
> >> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> >> to confirm that I haven't inadvertently fixed something. However, given
> >> that it might be copybreak-related, I looked at sky2.c again and I'm
> >> wondering about the copybreak max size in sky2_rx_start:
> >>
> >> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> >>
> >> /* Stopping point for hardware truncation */
> >> thresh = (size - 8) / sizeof(u32);
> >>
> >> sky2->rx_nfrags = size>> PAGE_SHIFT;
> >> BUG_ON(sky2->rx_nfrags> ARRAY_SIZE(re->frag_addr));
> >>
> >> /* Compute residue after pages */
> >> size -= sky2->rx_nfrags<< PAGE_SHIFT;
> >>
> >> /* Optimize to handle small packets and headers */
> >> if (size< copybreak)
> >> size = copybreak;
> >> if (size< ETH_HLEN)
> >> size = ETH_HLEN;
> >>
> >>
> >> Why would increasing size to copybreak be valid here?
> >>
> >> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> >> correctly, if size is ever less than copybreak it's because there isn't
> >> enough space left for anything larger. If so, wouldn't increasing size
> >> potentially corrupt something? I'd further guess that the resulting
> >> condition manifests sooner (or at least with a more visible effect) when
> >> using DMAR.
> >>
> >> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> >> that if it isn't possible to allocate at least MTU + overhead that
> >> sky2_rx_start ought to be delayed until there is room.
> >>
> > This code is where driver decides how much data will be received in skb
> > data area and the remaining data spills over into skb frags.
> > Copybreak is the threshold so that packets less than size are copied
> > to a new skb. The code doing the copying there assumes the data is
> > totally contained in the skb (not in frags). The size increase there
> > is to make sure that assumption is always true. I suppose you
> > could do something perverse like setting copybreak really huge
> > and confuse driver, but that is a user error.
> >
> >
> Ok - but I'm wondering under what circumstances size would be <
> copybreak in the first place after computing the residue. If size ends
> up being unreasonably small, is simply increasing the number to whatever
> copybreak is correct? Assuming my testing is correct, then the crash
> I've been experiencing when using dmar (only) seems related to the value
> of copybreak. I don't think the other use (skb reuse) is the issue (but
> hey, I could have missed something). The crash occurs when copybreak is
> the default of 128, didn't happen when I set copybreak to 1.
Does this change it? If so the dma code is (not sky2) is buggy and not
rounding up properly.
--- a/drivers/net/sky2.c 2010-01-27 09:46:10.940005248 -0800
+++ b/drivers/net/sky2.c 2010-01-27 09:53:47.141267850 -0800
@@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
if (likely(skb)) {
+ unsigned dma_align = dma_get_cache_alignment();
+ unsigned dma_size = ALIGN(length+1, dma_align);
+
pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
- length, PCI_DMA_FROMDEVICE);
+ dma_size, PCI_DMA_FROMDEVICE);
skb_copy_from_linear_data(re->skb, skb->data, length);
skb->ip_summed = re->skb->ip_summed;
skb->csum = re->skb->csum;
pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
- length, PCI_DMA_FROMDEVICE);
+ dma_size, PCI_DMA_FROMDEVICE);
re->skb->ip_summed = CHECKSUM_NONE;
skb_put(skb, length);
}
On 1/27/2010 12:45 PM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 11:57:35 -0500
> Michael Breuer<[email protected]> wrote:
>
>
>> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
>>
>>> On Wed, 27 Jan 2010 10:34:51 -0500
>>> Michael Breuer<[email protected]> wrote:
>>>
>>>
>>>
>>>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>>>
>>>>
>>>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>>>
>>>>>
>>>>>
>>>>>> When the packets were dropped, there was a different sequence in the
>>>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>>>
>>>>>>
>>>>>>
>>>>> Anyway, I'd be intersted if the switch matters here.
>>>>>
>>>>> Plus one more test: could you try to load sky2 with the parameter:
>>>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>>>> errors; any switch).
>>>>>
>>>>> Thanks,
>>>>> Jarek P.
>>>>>
>>>>>
>>>>>
>>>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>>>> to confirm that I haven't inadvertently fixed something. However, given
>>>> that it might be copybreak-related, I looked at sky2.c again and I'm
>>>> wondering about the copybreak max size in sky2_rx_start:
>>>>
>>>> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>>>
>>>> /* Stopping point for hardware truncation */
>>>> thresh = (size - 8) / sizeof(u32);
>>>>
>>>> sky2->rx_nfrags = size>> PAGE_SHIFT;
>>>> BUG_ON(sky2->rx_nfrags> ARRAY_SIZE(re->frag_addr));
>>>>
>>>> /* Compute residue after pages */
>>>> size -= sky2->rx_nfrags<< PAGE_SHIFT;
>>>>
>>>> /* Optimize to handle small packets and headers */
>>>> if (size< copybreak)
>>>> size = copybreak;
>>>> if (size< ETH_HLEN)
>>>> size = ETH_HLEN;
>>>>
>>>>
>>>> Why would increasing size to copybreak be valid here?
>>>>
>>>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>>>> correctly, if size is ever less than copybreak it's because there isn't
>>>> enough space left for anything larger. If so, wouldn't increasing size
>>>> potentially corrupt something? I'd further guess that the resulting
>>>> condition manifests sooner (or at least with a more visible effect) when
>>>> using DMAR.
>>>>
>>>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>>>> that if it isn't possible to allocate at least MTU + overhead that
>>>> sky2_rx_start ought to be delayed until there is room.
>>>>
>>>>
>>> This code is where driver decides how much data will be received in skb
>>> data area and the remaining data spills over into skb frags.
>>> Copybreak is the threshold so that packets less than size are copied
>>> to a new skb. The code doing the copying there assumes the data is
>>> totally contained in the skb (not in frags). The size increase there
>>> is to make sure that assumption is always true. I suppose you
>>> could do something perverse like setting copybreak really huge
>>> and confuse driver, but that is a user error.
>>>
>>>
>>>
>> Ok - but I'm wondering under what circumstances size would be<
>> copybreak in the first place after computing the residue. If size ends
>> up being unreasonably small, is simply increasing the number to whatever
>> copybreak is correct? Assuming my testing is correct, then the crash
>> I've been experiencing when using dmar (only) seems related to the value
>> of copybreak. I don't think the other use (skb reuse) is the issue (but
>> hey, I could have missed something). The crash occurs when copybreak is
>> the default of 128, didn't happen when I set copybreak to 1.
>>
>>
> Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
> path. Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
> properly, or the value passed to sync_single_for_cpu doesn't account for
> all the overhead of padding and/or ether header.
>
>
Ah - ok... will poke around there... if you have any suggestions,
diagnostics, whatever, let me know. Also, just an FYI - before rebooting
with copybreak back to defaults, I tried mtu=9000 again. That hung the
server immediately - no diagnostic output - system froze until watchdog
rebooted. Don't know right now if the copybreak had anything to do with
this, but when I've tried in the past I've had errors on sky2, but never
crashed the system like this. Only two things different were copybreak
and the length of time the system had been up. I'll try later with
copybreak default and copybreak=1 to see if that affects mtu behavior.
On 1/27/2010 12:56 PM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 11:57:35 -0500
> Michael Breuer<[email protected]> wrote:
>
>
>> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
>>
>>> On Wed, 27 Jan 2010 10:34:51 -0500
>>> Michael Breuer<[email protected]> wrote:
>>>
>>>
>>>
>>>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>>>
>>>>
>>>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>>>
>>>>>
>>>>>
>>>>>> When the packets were dropped, there was a different sequence in the
>>>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>>>
>>>>>>
>>>>>>
>>>>> Anyway, I'd be intersted if the switch matters here.
>>>>>
>>>>> Plus one more test: could you try to load sky2 with the parameter:
>>>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>>>> errors; any switch).
>>>>>
>>>>> Thanks,
>>>>> Jarek P.
>>>>>
>>>>>
>>>>>
>>>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>>>> to confirm that I haven't inadvertently fixed something. However, given
>>>> that it might be copybreak-related, I looked at sky2.c again and I'm
>>>> wondering about the copybreak max size in sky2_rx_start:
>>>>
>>>> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>>>
>>>> /* Stopping point for hardware truncation */
>>>> thresh = (size - 8) / sizeof(u32);
>>>>
>>>> sky2->rx_nfrags = size>> PAGE_SHIFT;
>>>> BUG_ON(sky2->rx_nfrags> ARRAY_SIZE(re->frag_addr));
>>>>
>>>> /* Compute residue after pages */
>>>> size -= sky2->rx_nfrags<< PAGE_SHIFT;
>>>>
>>>> /* Optimize to handle small packets and headers */
>>>> if (size< copybreak)
>>>> size = copybreak;
>>>> if (size< ETH_HLEN)
>>>> size = ETH_HLEN;
>>>>
>>>>
>>>> Why would increasing size to copybreak be valid here?
>>>>
>>>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>>>> correctly, if size is ever less than copybreak it's because there isn't
>>>> enough space left for anything larger. If so, wouldn't increasing size
>>>> potentially corrupt something? I'd further guess that the resulting
>>>> condition manifests sooner (or at least with a more visible effect) when
>>>> using DMAR.
>>>>
>>>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>>>> that if it isn't possible to allocate at least MTU + overhead that
>>>> sky2_rx_start ought to be delayed until there is room.
>>>>
>>>>
>>> This code is where driver decides how much data will be received in skb
>>> data area and the remaining data spills over into skb frags.
>>> Copybreak is the threshold so that packets less than size are copied
>>> to a new skb. The code doing the copying there assumes the data is
>>> totally contained in the skb (not in frags). The size increase there
>>> is to make sure that assumption is always true. I suppose you
>>> could do something perverse like setting copybreak really huge
>>> and confuse driver, but that is a user error.
>>>
>>>
>>>
>> Ok - but I'm wondering under what circumstances size would be<
>> copybreak in the first place after computing the residue. If size ends
>> up being unreasonably small, is simply increasing the number to whatever
>> copybreak is correct? Assuming my testing is correct, then the crash
>> I've been experiencing when using dmar (only) seems related to the value
>> of copybreak. I don't think the other use (skb reuse) is the issue (but
>> hey, I could have missed something). The crash occurs when copybreak is
>> the default of 128, didn't happen when I set copybreak to 1.
>>
> Does this change it? If so the dma code is (not sky2) is buggy and not
> rounding up properly.
>
> --- a/drivers/net/sky2.c 2010-01-27 09:46:10.940005248 -0800
> +++ b/drivers/net/sky2.c 2010-01-27 09:53:47.141267850 -0800
> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>
> skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> if (likely(skb)) {
> + unsigned dma_align = dma_get_cache_alignment();
> + unsigned dma_size = ALIGN(length+1, dma_align);
> +
> pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> - length, PCI_DMA_FROMDEVICE);
> + dma_size, PCI_DMA_FROMDEVICE);
> skb_copy_from_linear_data(re->skb, skb->data, length);
> skb->ip_summed = re->skb->ip_summed;
> skb->csum = re->skb->csum;
> pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> - length, PCI_DMA_FROMDEVICE);
> + dma_size, PCI_DMA_FROMDEVICE);
> re->skb->ip_summed = CHECKSUM_NONE;
> skb_put(skb, length);
> }
>
Ok - will queue this - want to reconfirm that the system still crashes
w/o this (or copybreak). That should take a few days.
On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
> --- a/drivers/net/sky2.c 2010-01-27 09:46:10.940005248 -0800
> +++ b/drivers/net/sky2.c 2010-01-27 09:53:47.141267850 -0800
> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>
> skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> if (likely(skb)) {
> + unsigned dma_align = dma_get_cache_alignment();
> + unsigned dma_size = ALIGN(length+1, dma_align);
> +
> pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> - length, PCI_DMA_FROMDEVICE);
> + dma_size, PCI_DMA_FROMDEVICE);
> skb_copy_from_linear_data(re->skb, skb->data, length);
> skb->ip_summed = re->skb->ip_summed;
> skb->csum = re->skb->csum;
> pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> - length, PCI_DMA_FROMDEVICE);
> + dma_size, PCI_DMA_FROMDEVICE);
> re->skb->ip_summed = CHECKSUM_NONE;
> skb_put(skb, length);
> }
>
This doesn't apply - I'm missing some intermediate patch.
I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, data_size)
vs., "length." I assume that I can just replace the pci_unmap_len with
dma_size... but perhaps the intermediate change may have affected this
as well?
On 01/27/2010 12:57 PM, Michael Breuer wrote:
> On 1/27/2010 12:45 PM, Stephen Hemminger wrote:
>> On Wed, 27 Jan 2010 11:57:35 -0500
>> Michael Breuer<[email protected]> wrote:
>>
>>
> Ah - ok... will poke around there... if you have any suggestions,
> diagnostics, whatever, let me know. Also, just an FYI - before
> rebooting with copybreak back to defaults, I tried mtu=9000 again.
> That hung the server immediately - no diagnostic output - system froze
> until watchdog rebooted. Don't know right now if the copybreak had
> anything to do with this, but when I've tried in the past I've had
> errors on sky2, but never crashed the system like this. Only two
> things different were copybreak and the length of time the system had
> been up. I'll try later with copybreak default and copybreak=1 to see
> if that affects mtu behavior.
>
FYI - just redid this a few times. Looks like it's how long the system
was up, not copybreak wrt crash on resetting MTU.
That said, while the system seems OK after resetting the MTU, I do get a
WARNING from netdev watchdog - same warning regardless of copybreak.
Setting the mtu back to 1500 generates rx errors after which things
work. Going back to 9000 again does not generate new errors.
Jan 27 13:21:54 mail kernel: ------------[ cut here ]------------
Jan 27 13:21:54 mail kernel: WARNING: at net/sched/sch_generic.c:261
dev_watchdog+0xf3/0x164()
Jan 27 13:21:54 mail kernel: Hardware name: System Product Name
Jan 27 13:21:54 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
queue 0 timed out
Jan 27 13:21:54 mail kernel: Modules linked in: microcode(+)
ip6table_mangle ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat
nf_nat iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd
lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
nf_conntrack_ftp nf_conntrack_ipv6 xt_multiport xt_DSCP xt_dscp xt_MARK
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_ens1371
gameport snd_rawmidi snd_ac97_codec snd_hda_intel snd_hda_codec
snd_hwdep ac97_bus snd_seq gspca_spca505 gspca_main videodev
snd_seq_device asus_atk0110 v4l1_compat snd_pcm hwmon
v4l2_compat_ioctl32 pcspkr i2c_i801 firewire_ohci firewire_core
crc_itu_t snd_timer snd soundcore wmi snd_page_alloc sky2 iTCO_wdt
iTCO_vendor_support fbcon tileblit font bitblit softcursor raid456
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm
agpgart fb i2c_algo_bit cfbcopyarea i2c_core cfb
Jan 27 13:21:54 mail kernel: imgblt cfbfillrect [last unloaded: ip6_tables]
Jan 27 13:21:54 mail kernel: Pid: 0, comm: swapper Tainted: G W
2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 27 13:21:54 mail kernel: Call Trace:
Jan 27 13:21:54 mail kernel: <IRQ> [<ffffffff810536ee>]
warn_slowpath_common+0x7c/0x94
Jan 27 13:21:54 mail kernel: [<ffffffff8105375d>]
warn_slowpath_fmt+0x41/0x43
Jan 27 13:21:54 mail kernel: [<ffffffff813e3b6b>] ? netif_tx_lock+0x44/0x6c
Jan 27 13:21:54 mail kernel: [<ffffffff813e3cd3>] dev_watchdog+0xf3/0x164
Jan 27 13:21:54 mail kernel: [<ffffffff8106e990>] ? __queue_work+0x3a/0x42
Jan 27 13:21:54 mail kernel: [<ffffffff8106323f>]
run_timer_softirq+0x1c8/0x270
Jan 27 13:21:54 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 27 13:21:54 mail kernel: [<ffffffff8107f0ab>] ?
tick_program_event+0x2a/0x2c
Jan 27 13:21:54 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 27 13:21:54 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 27 13:21:54 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 27 13:21:54 mail kernel: [<ffffffff81470612>]
smp_apic_timer_interrupt+0x86/0x94
Jan 27 13:21:54 mail kernel: [<ffffffff810127e3>]
apic_timer_interrupt+0x13/0x20
Jan 27 13:21:54 mail kernel: <EOI> [<ffffffff812c729a>] ?
acpi_idle_enter_bm+0x256/0x28a
Jan 27 13:21:54 mail kernel: [<ffffffff812c7293>] ?
acpi_idle_enter_bm+0x24f/0x28a
Jan 27 13:21:54 mail kernel: [<ffffffff813a6c3c>] ?
cpuidle_idle_call+0x9e/0xfa
Jan 27 13:21:54 mail kernel: [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
Jan 27 13:21:54 mail kernel: [<ffffffff81465ba5>] ?
start_secondary+0x201/0x242
Jan 27 13:21:54 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 27 13:21:54 mail kernel: sky2 eth0: tx timeout
Jan 27 13:21:54 mail kernel: sky2 eth0: transmit ring 51 .. 10 report=51
done=51
Jan 27 13:21:54 mail kernel: sky2 eth0: disabling interface
Jan 27 13:21:54 mail kernel: sky2 eth0: enabling interface
On 1/27/2010 1:08 PM, Michael Breuer wrote:
> On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
>> --- a/drivers/net/sky2.c 2010-01-27 09:46:10.940005248 -0800
>> +++ b/drivers/net/sky2.c 2010-01-27 09:53:47.141267850 -0800
>> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>>
>> skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>> if (likely(skb)) {
>> + unsigned dma_align = dma_get_cache_alignment();
>> + unsigned dma_size = ALIGN(length+1, dma_align);
>> +
>> pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
>> - length, PCI_DMA_FROMDEVICE);
>> + dma_size, PCI_DMA_FROMDEVICE);
>> skb_copy_from_linear_data(re->skb, skb->data, length);
>> skb->ip_summed = re->skb->ip_summed;
>> skb->csum = re->skb->csum;
>> pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
>> - length, PCI_DMA_FROMDEVICE);
>> + dma_size, PCI_DMA_FROMDEVICE);
>> re->skb->ip_summed = CHECKSUM_NONE;
>> skb_put(skb, length);
>> }
> This doesn't apply - I'm missing some intermediate patch.
>
> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re,
> data_size) vs., "length." I assume that I can just replace the
> pci_unmap_len with dma_size... but perhaps the intermediate change may
> have affected this as well?
>
Never mind - that was from one of the earlier patches I had been trying
out. will try the above patch after reestablishing that the system still
crashes without copybreak=1.
On Wed, Jan 27, 2010 at 01:45:27PM -0500, Michael Breuer wrote:
> On 1/27/2010 1:08 PM, Michael Breuer wrote:
> >On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
> >>--- a/drivers/net/sky2.c 2010-01-27 09:46:10.940005248 -0800
> >>+++ b/drivers/net/sky2.c 2010-01-27 09:53:47.141267850 -0800
> >>@@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
> >>
> >> skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> >> if (likely(skb)) {
> >>+ unsigned dma_align = dma_get_cache_alignment();
> >>+ unsigned dma_size = ALIGN(length+1, dma_align);
> >>+
> >> pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> >>- length, PCI_DMA_FROMDEVICE);
> >>+ dma_size, PCI_DMA_FROMDEVICE);
> >> skb_copy_from_linear_data(re->skb, skb->data, length);
> >> skb->ip_summed = re->skb->ip_summed;
> >> skb->csum = re->skb->csum;
> >> pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> >>- length, PCI_DMA_FROMDEVICE);
> >>+ dma_size, PCI_DMA_FROMDEVICE);
> >> re->skb->ip_summed = CHECKSUM_NONE;
> >> skb_put(skb, length);
> >> }
> >This doesn't apply - I'm missing some intermediate patch.
> >
> >I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re,
> >data_size) vs., "length." I assume that I can just replace the
> >pci_unmap_len with dma_size... but perhaps the intermediate change
> >may have affected this as well?
> >
> Never mind - that was from one of the earlier patches I had been
> trying out. will try the above patch after reestablishing that the
> system still crashes without copybreak=1.
>
Stephen, I'm not sure this patch can show much after the patch with
"legal" dma_size == re->data_addr didn't help. It looks like David
was right: dma_sync can't affect dmar, because it doesn't use it at
all.
Then I'd rather suggest to test if using copybreak more often, e.g.
with copybreak=1000 or even more can trigger these errors faster.
Jarek P.
On Wed, Jan 27, 2010 at 08:23:12PM +0100, Jarek Poplawski wrote:
> Stephen, I'm not sure this patch can show much after the patch with
> "legal" dma_size == re->data_addr didn't help. It looks like David
Of course: "dma_size == re->data_size".
Jarek P.
From: Stephen Hemminger <[email protected]>
Date: Wed, 27 Jan 2010 09:45:31 -0800
> Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
> path. Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
> properly, or the value passed to sync_single_for_cpu doesn't account for
> all the overhead of padding and/or ether header.
For the millionth time...
DMAR does not implement any of the DMA SYNC operations, they are set
to NULL in the dma ops vector, and thus are NOPs.
On 01/27/2010 01:45 PM, Michael Breuer wrote:
> On 1/27/2010 1:08 PM, Michael Breuer wrote:
>>
>>
>> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re,
>> data_size) vs., "length." I assume that I can just replace the
>> pci_unmap_len with dma_size... but perhaps the intermediate change
>> may have affected this as well?
>>
> Never mind - that was from one of the earlier patches I had been
> trying out. will try the above patch after reestablishing that the
> system still crashes without copybreak=1.
Just FYI - still crashes with default copybreak. Didn't get the netdev
watchdog this time - just DMAR and then HW watchdog reboot (see below).
So what's known to be required to cause this crash:
1) sky2 @ 1Gb
2) High sustained RX load (> 40MBps)
3) Uptime (I can't cause this to happen just after boot).
4) DMAR enabled (doesn't crash w/o DMAR).
5) copybreak != 1
What might be required but is unproven:
1) cifs traffic (I've only seen this when the high traffic was due to a
Win7 box doing backup). I've tried but have been unable to recreate by
just copying large files. Backups done from a Mac OS laptop don't
trigger the issue even though that machine is also connecting with CIFS
(TimeMachine works better that way).
2) DHCP traffic. There has always been some sort of DHCP exchange in the
log before the first indication of a problem (DMAR).
3) Total throughput since boot. DK about this - however the uptime
component before the latest crash was the shortest yet. In preparation I
moved a bunch of large files around on the Windows box to ensure a
larger than normal backup run. I also ran manually before going to bed
(then moved the files around again). Didn't crash when I was watching -
but did overnight. Total uptime before this crash was only about 6
hours. Previously (with less backup data) the system didn't crash until
24-36 hours.
Observations:
Copybreak: I did play for an hour or so yesterday with copybreak=1000.
Ran traffic, etc. No crash, but throughput was lower and the system was
clearly working way harder than normal. Given the whine of the fans I'm
not keen on leaving the system in that state for any extended period of
time.
MTU: Increasing the MTU to 9000 yesterday after the system had been up
for some time (copybreak=1) crashed the system immediately. Subsequently
I have been able to change the mtu without crashes (although the driver
does end up in some sort of state that requires a restart after lowering
the mtu). I suspect that over time something is being corrupted
resulting in the crash when changing mtu. Whatever it becoming corrupted
is probably related to the other crash as well. That suggests to me that
copybreak=1 is preventing or delaying the manifestation of the
underlying issue but is unrelated to the source of corruption.
[no messages in the prior three minutes - there was a dhcp exchange
(request/ack) at 06:02:27]
Jan 28 06:05:58 mail kernel: DRHD: handling fault status reg 2
Jan 28 06:05:58 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
fault addr ffdd06bfe000
Jan 28 06:05:58 mail kernel: DMAR:[fault reason 06] PTE Read access is
not set
Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: error interrupt
status=0x80000000
Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
[No further messages until restart at 06:09:46.]
On 01/28/2010 10:32 AM, Michael Breuer wrote:
> On 01/27/2010 01:45 PM, Michael Breuer wrote:
>> On 1/27/2010 1:08 PM, Michael Breuer wrote:
>>>
>>>
>>> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re,
>>> data_size) vs., "length." I assume that I can just replace the
>>> pci_unmap_len with dma_size... but perhaps the intermediate change
>>> may have affected this as well?
>>>
>> Never mind - that was from one of the earlier patches I had been
>> trying out. will try the above patch after reestablishing that the
>> system still crashes without copybreak=1.
> Just FYI - still crashes with default copybreak. Didn't get the
> netdev watchdog this time - just DMAR and then HW watchdog reboot (see
> below).
>
> So what's known to be required to cause this crash:
>
> 1) sky2 @ 1Gb
> 2) High sustained RX load (> 40MBps)
> 3) Uptime (I can't cause this to happen just after boot).
> 4) DMAR enabled (doesn't crash w/o DMAR).
> 5) copybreak != 1
>
> What might be required but is unproven:
> 1) cifs traffic (I've only seen this when the high traffic was due to
> a Win7 box doing backup). I've tried but have been unable to recreate
> by just copying large files. Backups done from a Mac OS laptop don't
> trigger the issue even though that machine is also connecting with
> CIFS (TimeMachine works better that way).
> 2) DHCP traffic. There has always been some sort of DHCP exchange in
> the log before the first indication of a problem (DMAR).
> 3) Total throughput since boot. DK about this - however the uptime
> component before the latest crash was the shortest yet. In preparation
> I moved a bunch of large files around on the Windows box to ensure a
> larger than normal backup run. I also ran manually before going to bed
> (then moved the files around again). Didn't crash when I was watching
> - but did overnight. Total uptime before this crash was only about 6
> hours. Previously (with less backup data) the system didn't crash
> until 24-36 hours.
>
> Observations:
>
> Copybreak: I did play for an hour or so yesterday with copybreak=1000.
> Ran traffic, etc. No crash, but throughput was lower and the system
> was clearly working way harder than normal. Given the whine of the
> fans I'm not keen on leaving the system in that state for any extended
> period of time.
>
> MTU: Increasing the MTU to 9000 yesterday after the system had been up
> for some time (copybreak=1) crashed the system immediately.
> Subsequently I have been able to change the mtu without crashes
> (although the driver does end up in some sort of state that requires a
> restart after lowering the mtu). I suspect that over time something is
> being corrupted resulting in the crash when changing mtu. Whatever it
> becoming corrupted is probably related to the other crash as well.
> That suggests to me that copybreak=1 is preventing or delaying the
> manifestation of the underlying issue but is unrelated to the source
> of corruption.
>
> [no messages in the prior three minutes - there was a dhcp exchange
> (request/ack) at 06:02:27]
> Jan 28 06:05:58 mail kernel: DRHD: handling fault status reg 2
> Jan 28 06:05:58 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
> fault addr ffdd06bfe000
> Jan 28 06:05:58 mail kernel: DMAR:[fault reason 06] PTE Read access is
> not set
> Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x80000000
> Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: PCI hardware error
> (0x2010)
> [No further messages until restart at 06:09:46.]
>
Update: I played with dma-debug. Was being disabled due to lack of
memory. I forced it back on while pumping traffic through and got this:
Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902
check_sync+0xc1/0x43f()
Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver
tries to sync DMA memory it has not allocated [device
address=0x0000ffff4fe37022] [size=1520 bytes]
Jan 28 11:39:30 mail kernel: Modules linked in: microcode(+)
ip6table_filter ip6table_mangle ip6_tables iptable_raw iptable_mangle
ipt_MASQUERADE iptable_nat nf_nat bridge stp appletalk psnap llc nfsd
lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_ens1371
gameport snd_rawmidi gspca_spca505 snd_hda_intel snd_ac97_codec
gspca_main snd_hda_codec videodev snd_hwdep snd_seq v4l1_compat i2c_i801
pcspkr ac97_bus v4l2_compat_ioctl32 snd_seq_device asus_atk0110 hwmon
snd_pcm firewire_ohci firewire_core crc_itu_t sky2 snd_timer snd
iTCO_wdt iTCO_vendor_support wmi soundcore snd_page_alloc fbcon tileblit
font bitblit softcursor raid456 async_raid6_recov async_pq raid6_pq
async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit
cfbcopyarea i2c_core cfb
Jan 28 11:39:30 mail kernel: imgblt cfbfillrect [last unloaded: ip6_tables]
Jan 28 11:39:30 mail kernel: Pid: 5327, comm: bash Tainted: G W
2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 28 11:39:30 mail kernel: Call Trace:
Jan 28 11:39:30 mail kernel: <IRQ> [<ffffffff810536ee>]
warn_slowpath_common+0x7c/0x94
Jan 28 11:39:30 mail kernel: [<ffffffff8105375d>]
warn_slowpath_fmt+0x41/0x43
Jan 28 11:39:30 mail kernel: [<ffffffff8127b891>] check_sync+0xc1/0x43f
Jan 28 11:39:30 mail kernel: [<ffffffff8146c51a>] ?
_spin_unlock_irqrestore+0x29/0x41
Jan 28 11:39:30 mail kernel: [<ffffffff813cac10>] ?
__netdev_alloc_skb+0x34/0x50
Jan 28 11:39:30 mail kernel: [<ffffffff8127bf62>]
debug_dma_sync_single_for_cpu+0x42/0x44
Jan 28 11:39:30 mail kernel: [<ffffffff813cac10>] ?
__netdev_alloc_skb+0x34/0x50
Jan 28 11:39:30 mail kernel: [<ffffffffa019aee8>] sky2_poll+0x4d5/0xb06
[sky2]
Jan 28 11:39:30 mail kernel: [<ffffffff81044840>] ?
enqueue_entity+0x26c/0x279
Jan 28 11:39:30 mail kernel: [<ffffffff8107decf>] ?
clockevents_program_event+0x7a/0x83
Jan 28 11:39:30 mail kernel: [<ffffffff813d18ae>] net_rx_action+0xb5/0x1f3
Jan 28 11:39:30 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 28 11:39:30 mail kernel: [<ffffffff810a3006>] ?
handle_IRQ_event+0x119/0x12b
Jan 28 11:39:30 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 28 11:39:30 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 28 11:39:30 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 28 11:39:30 mail kernel: [<ffffffff81470575>] do_IRQ+0xa5/0xbc
Jan 28 11:39:30 mail kernel: [<ffffffff81012613>] ret_from_intr+0x0/0x16
Jan 28 11:39:30 mail kernel: <EOI>
Jan 28 11:39:30 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 28 11:39:30 mail kernel: DMA-API: debugging out of memory - disabling
On Thu, 28 Jan 2010 11:43:16 -0500
Michael Breuer <[email protected]> wrote:
> Update: I played with dma-debug. Was being disabled due to lack of
> memory. I forced it back on while pumping traffic through and got this:
> Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
> Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902
> check_sync+0xc1/0x43f()
> Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
> Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver
> tries to sync DMA memory it has not allocated [device
This test in dma-debug is bogus. Because the debug code matches
dma based on address and size; and is perfectly valid to sync a value
less than size. This is the patch I sent earlier, it isn't 100%
correct but it will let you keep testing
....................................
This should fix the dma-debug API code (and documentation), to
avoid false positives when sync is done on a partial map.
Signed-off-by: Stephen Hemminger <[email protected]>
--- a/Documentation/DMA-API.txt 2010-01-20 15:17:01.390143729 -0800
+++ b/Documentation/DMA-API.txt 2010-01-20 15:18:48.967875255 -0800
@@ -377,9 +377,10 @@ void
pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
int nelems, int direction)
-Synchronise a single contiguous or scatter/gather mapping. All the
-parameters must be the same as those passed into the single mapping
-API.
+Synchronise a single contiguous or scatter/gather mapping. The
+device and handle must be the same as those passed into the single mapping
+API. The size can be less than the original mapping if only part
+of the mapping needs to be accessed.
Notes: You must do this:
--- a/lib/dma-debug.c 2010-01-20 15:22:55.919519883 -0800
+++ b/lib/dma-debug.c 2010-01-20 15:26:31.648895638 -0800
@@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
}
/*
- * If we have multiple matches but no perfect-fit, just return
- * NULL.
+ * If we have multiple matches but no perfect-fit
+ * return best value and let caller deal with it.
*/
- ret = (matches == 1) ? ret : NULL;
-
return ret;
}
On 1/28/2010 12:08 PM, Stephen Hemminger wrote:
> On Thu, 28 Jan 2010 11:43:16 -0500
> Michael Breuer<[email protected]> wrote:
>
>
>> Update: I played with dma-debug. Was being disabled due to lack of
>> memory. I forced it back on while pumping traffic through and got this:
>> Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
>> Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902
>> check_sync+0xc1/0x43f()
>> Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
>> Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver
>> tries to sync DMA memory it has not allocated [device
>>
> This test in dma-debug is bogus. Because the debug code matches
> dma based on address and size; and is perfectly valid to sync a value
> less than size. This is the patch I sent earlier, it isn't 100%
> correct but it will let you keep testing
> ....................................
>
>
> This should fix the dma-debug API code (and documentation), to
> avoid false positives when sync is done on a partial map.
>
> Signed-off-by: Stephen Hemminger<[email protected]>
>
> --- a/Documentation/DMA-API.txt 2010-01-20 15:17:01.390143729 -0800
> +++ b/Documentation/DMA-API.txt 2010-01-20 15:18:48.967875255 -0800
> @@ -377,9 +377,10 @@ void
> pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
> int nelems, int direction)
>
> -Synchronise a single contiguous or scatter/gather mapping. All the
> -parameters must be the same as those passed into the single mapping
> -API.
> +Synchronise a single contiguous or scatter/gather mapping. The
> +device and handle must be the same as those passed into the single mapping
> +API. The size can be less than the original mapping if only part
> +of the mapping needs to be accessed.
>
> Notes: You must do this:
>
> --- a/lib/dma-debug.c 2010-01-20 15:22:55.919519883 -0800
> +++ b/lib/dma-debug.c 2010-01-20 15:26:31.648895638 -0800
> @@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
> }
>
> /*
> - * If we have multiple matches but no perfect-fit, just return
> - * NULL.
> + * If we have multiple matches but no perfect-fit
> + * return best value and let caller deal with it.
> */
> - ret = (matches == 1) ? ret : NULL;
> -
> return ret;
> }
>
>
Ok - applied. Noise gone... however I'm not sure whether I'll be able to
keep dma-debug going long enough to catch anything. num_free_entries
keeps dropping... looks like entries are not freed. I'm running with a
huge number for now & sky2 as the driver filter. Is there a reason that
entries wouldn't be unmapped, or is dma-debug.c just not processing the
unmap correctly?
On Thu, Jan 28, 2010 at 01:46:17PM -0500, Michael Breuer wrote:
> On 1/28/2010 12:08 PM, Stephen Hemminger wrote:
> >--- a/lib/dma-debug.c 2010-01-20 15:22:55.919519883 -0800
> >+++ b/lib/dma-debug.c 2010-01-20 15:26:31.648895638 -0800
> >@@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
> > }
> >
> > /*
> >- * If we have multiple matches but no perfect-fit, just return
> >- * NULL.
> >+ * If we have multiple matches but no perfect-fit
> >+ * return best value and let caller deal with it.
> > */
> >- ret = (matches == 1) ? ret : NULL;
> >-
> > return ret;
> > }
> >
> Ok - applied. Noise gone... however I'm not sure whether I'll be
> able to keep dma-debug going long enough to catch anything.
> num_free_entries keeps dropping... looks like entries are not freed.
> I'm running with a huge number for now & sky2 as the driver filter.
> Is there a reason that entries wouldn't be unmapped, or is
> dma-debug.c just not processing the unmap correctly?
Do you mean it's after this patch or earlier too? I think you might
use my sky2/receive_copy/pci_unmap_len patch instead to get rid of
this warning.
Btw, since 1000 was too much, maybe you could try copybreak=256 yet,
plus additional ping or some other source of shorter packets. And how
about trying this new switch?
Jarek P.
On 1/28/2010 5:34 PM, Jarek Poplawski wrote:
> On Thu, Jan 28, 2010 at 01:46:17PM -0500, Michael Breuer wrote:
>
>> On 1/28/2010 12:08 PM, Stephen Hemminger wrote:
>>
>>> --- a/lib/dma-debug.c 2010-01-20 15:22:55.919519883 -0800
>>> +++ b/lib/dma-debug.c 2010-01-20 15:26:31.648895638 -0800
>>> @@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
>>> }
>>>
>>> /*
>>> - * If we have multiple matches but no perfect-fit, just return
>>> - * NULL.
>>> + * If we have multiple matches but no perfect-fit
>>> + * return best value and let caller deal with it.
>>> */
>>> - ret = (matches == 1) ? ret : NULL;
>>> -
>>> return ret;
>>> }
>>>
>>>
>> Ok - applied. Noise gone... however I'm not sure whether I'll be
>> able to keep dma-debug going long enough to catch anything.
>> num_free_entries keeps dropping... looks like entries are not freed.
>> I'm running with a huge number for now& sky2 as the driver filter.
>> Is there a reason that entries wouldn't be unmapped, or is
>> dma-debug.c just not processing the unmap correctly?
>>
> Do you mean it's after this patch or earlier too? I think you might
> use my sky2/receive_copy/pci_unmap_len patch instead to get rid of
> this warning.
>
> Btw, since 1000 was too much, maybe you could try copybreak=256 yet,
> plus additional ping or some other source of shorter packets. And how
> about trying this new switch?
>
> Jarek P.
>
This is with the pci_unmap_len patch as well as the dma-debug patch. I'm
not getting any warnings - but dma-debug num-free-entries drops until
zero and then debugging is disabled. I started with 8,000,000 about
three hours ago - without load I'm already down to less than half that.
Again - only looking at sky2. Looks like the dma-debug hash table is
reducing one entry for every packet, but never increasing the
num_entries (no unmap perhaps).
On Thu, Jan 28, 2010 at 05:43:34PM -0500, Michael Breuer wrote:
> This is with the pci_unmap_len patch as well as the dma-debug patch.
> I'm not getting any warnings - but dma-debug num-free-entries drops
> until zero and then debugging is disabled. I started with 8,000,000
> about three hours ago - without load I'm already down to less than
> half that. Again - only looking at sky2. Looks like the dma-debug
> hash table is reducing one entry for every packet, but never
> increasing the num_entries (no unmap perhaps).
OK, then, until there is some new fix you better turn it off.
Jarek P.
On 1/28/2010 5:56 PM, Jarek Poplawski wrote:
> On Thu, Jan 28, 2010 at 05:43:34PM -0500, Michael Breuer wrote:
>
>> This is with the pci_unmap_len patch as well as the dma-debug patch.
>> I'm not getting any warnings - but dma-debug num-free-entries drops
>> until zero and then debugging is disabled. I started with 8,000,000
>> about three hours ago - without load I'm already down to less than
>> half that. Again - only looking at sky2. Looks like the dma-debug
>> hash table is reducing one entry for every packet, but never
>> increasing the num_entries (no unmap perhaps).
>>
> OK, then, until there is some new fix you better turn it off.
>
> Jarek P.
>
:( any further suggestions for helping track this down?
Please try this patch (and only this patch), on 2.6.33-rc5[*];
none of the other patches that did not make it upstream because that
confuses things too much.
The code that checks for DMA mapping errors on receive buffers would
not handle errors correctly. I doubt you have these errors, but if you
did then it would explain the problems. The code has to be a little
tricky and build mapping for new rx buffer before releasing old one,
that way if new mapping fails, the old one can be reused.
If it works for you, I will resubmit with signed-off.
---
* If you want to use DMA debugging, then you will also need the match patch.
--- a/drivers/net/sky2.c 2010-01-28 15:09:11.387649670 -0800
+++ b/drivers/net/sky2.c 2010-01-28 15:31:26.969181566 -0800
@@ -1106,18 +1106,39 @@ static int sky2_rx_map_skb(struct pci_de
int i;
re->data_addr = pci_map_single(pdev, skb->data, size, PCI_DMA_FROMDEVICE);
- if (unlikely(pci_dma_mapping_error(pdev, re->data_addr)))
- return -EIO;
+ if (pci_dma_mapping_error(pdev, re->data_addr))
+ goto mapping_error;
pci_unmap_len_set(re, data_size, size);
- for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
- re->frag_addr[i] = pci_map_page(pdev,
- skb_shinfo(skb)->frags[i].page,
- skb_shinfo(skb)->frags[i].page_offset,
- skb_shinfo(skb)->frags[i].size,
+ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+ re->frag_addr[i] = pci_map_page(pdev, frag->page,
+ frag->page_offset,
+ frag->size,
PCI_DMA_FROMDEVICE);
+
+ if (pci_dma_mapping_error(pdev, re->frag_addr[i]))
+ goto map_page_error;
+ }
return 0;
+
+map_page_error:
+ while (--i >= 0) {
+ pci_unmap_page(pdev, re->frag_addr[i],
+ skb_shinfo(skb)->frags[i].size,
+ PCI_DMA_FROMDEVICE);
+ }
+
+ pci_unmap_single(pdev, re->data_addr, pci_unmap_len(re, data_size),
+ PCI_DMA_FROMDEVICE);
+
+mapping_error:
+ if (net_ratelimit())
+ dev_warn(&pdev->dev, "%s: rx mapping error\n",
+ skb->dev->name);
+ return -EIO;
}
static void sky2_rx_unmap_skb(struct pci_dev *pdev, struct rx_ring_info *re)
@@ -2308,30 +2329,32 @@ static struct sk_buff *receive_new(struc
struct rx_ring_info *re,
unsigned int length)
{
- struct sk_buff *skb, *nskb;
+ struct sk_buff *skb;
+ struct rx_ring_info nre;
unsigned hdr_space = sky2->rx_data_size;
- /* Don't be tricky about reusing pages (yet) */
- nskb = sky2_rx_alloc(sky2);
- if (unlikely(!nskb))
- return NULL;
+ nre.skb = sky2_rx_alloc(sky2);
+ if (unlikely(!nre.skb))
+ goto nobuf;
+
+ if (sky2_rx_map_skb(sky2->hw->pdev, &nre, hdr_space))
+ goto nomap;
skb = re->skb;
sky2_rx_unmap_skb(sky2->hw->pdev, re);
-
prefetch(skb->data);
- re->skb = nskb;
- if (sky2_rx_map_skb(sky2->hw->pdev, re, hdr_space)) {
- dev_kfree_skb(nskb);
- re->skb = skb;
- return NULL;
- }
+ *re = nre;
if (skb_shinfo(skb)->nr_frags)
skb_put_frags(skb, hdr_space, length);
else
skb_put(skb, length);
return skb;
+
+nomap:
+ dev_kfree_skb(nre.skb);
+nobuf:
+ return NULL;
}
/*
On 1/28/2010 6:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch), on 2.6.33-rc5[*];
> none of the other patches that did not make it upstream because that
> confuses things too much.
>
> The code that checks for DMA mapping errors on receive buffers would
> not handle errors correctly. I doubt you have these errors, but if you
> did then it would explain the problems. The code has to be a little
> tricky and build mapping for new rx buffer before releasing old one,
> that way if new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with signed-off.
>
> ---
> * If you want to use DMA debugging, then you will also need the match patch.
>
Ok - I'll also be running with the recent sched fork vs. hotplug vs.
cpuset namespaces patch (commit
fabf318e5e4bda0aca2b0d617b191884fda62703) from tip. Without that I get
an rcu hang.
My plan then is to run with your patch, the rcu patch & the dma debug
patch, but disable dma debug for now and see of the problem recurs. If
it works, I'll know in a couple of days. If not, perhaps sooner :(.
On 1/28/2010 6:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch), on 2.6.33-rc5[*];
> none of the other patches that did not make it upstream because that
> confuses things too much.
>
> The code that checks for DMA mapping errors on receive buffers would
> not handle errors correctly. I doubt you have these errors, but if you
> did then it would explain the problems. The code has to be a little
> tricky and build mapping for new rx buffer before releasing old one,
> that way if new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with signed-off.
>
Nope - tx crash again. This time the system stayed up (but hosed) for a
few hours. When I tried to recover eth0, the system crashed.
Brief summary of events (log extract below):
System start Jan 28 19:29
Everything seemed good (load and all) until 17:13:11 the following day
when I got rx errors:
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x8180010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010
length 1518
Jan 29 17:13:14 mail kernel: sky2 eth0: rx error, status 0x5f60010
length 1518
The system continued running normally after this until this morning (Jan
30) at 0:44:55:
Jan 30 05:44:55 mail kernel: DRHD: handling fault status reg 2
Jan 30 05:44:55 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
fault addr ffc4331ff000
Jan 30 05:44:55 mail kernel: DMAR:[fault reason 06] PTE Read access is
not set
Jan 30 05:44:55 mail kernel: net_ratelimit: 2 callbacks suppressed
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: error interrupt
status=0xc0000000
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 30 05:45:01 mail kernel: ------------[ cut here ]------------
Jan 30 05:45:01 mail kernel: WARNING: at net/sched/sch_generic.c:255
dev_watchdog+0xf3/0x161()
Jan 30 05:45:01 mail kernel: Hardware name: System Product Name
Jan 30 05:45:01 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
queue 0 timed out
Jan 30 05:45:01 mail kernel: Modules linked in: iptable_raw
iptable_mangle ipt_MASQUERADE iptable_nat nf_nat cpufreq_stats
ip6table_filter ip6table_mangle ip6_tables bridge stp appletalk psnap
llc nfsd lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_hda_intel
snd_ens1371 gameport snd_hda_codec snd_rawmidi snd_ac97_codec
gspca_spca505 ac97_bus gspca_main snd_hwdep videodev snd_seq
snd_seq_device v4l1_compat snd_pcm v4l2_compat_ioctl32 snd_timer snd
soundcore snd_page_alloc firewire_ohci pcspkr i2c_i801 firewire_core wmi
asus_atk0110 crc_itu_t sky2 hwmon iTCO_wdt iTCO_vendor_support fbcon
tileblit font bitblit softcursor raid456 async_raid6_recov async_pq
raid6_pq async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit
cfbcopyarea i2c_core cf
Jan 30 05:45:01 mail kernel: bimgblt cfbfillrect [last unloaded: nf_nat]
Jan 30 05:45:01 mail kernel: Pid: 0, comm: swapper Tainted: G W
2.6.33-rc5WITHMMAPNODMARFORKTIPSKY2DMAMAP-00283-gd4d37bd-dirty #1
Jan 30 05:45:01 mail kernel: Call Trace:
Jan 30 05:45:01 mail kernel: <IRQ> [<ffffffff8104a03d>]
warn_slowpath_common+0x7c/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8104a0ac>]
warn_slowpath_fmt+0x41/0x43
Jan 30 05:45:01 mail kernel: [<ffffffff813d2f43>] ? netif_tx_lock+0x44/0x6c
Jan 30 05:45:01 mail kernel: [<ffffffff813d30ab>] dev_watchdog+0xf3/0x161
Jan 30 05:45:01 mail kernel: [<ffffffff8106a31f>] ?
sched_clock_cpu+0x44/0xce
Jan 30 05:45:01 mail kernel: [<ffffffff8105761a>]
run_timer_softirq+0x1c3/0x26b
Jan 30 05:45:01 mail kernel: [<ffffffff8105060c>] __do_softirq+0xf8/0x1cd
Jan 30 05:45:01 mail kernel: [<ffffffff8107192b>] ?
tick_program_event+0x2a/0x2c
Jan 30 05:45:01 mail kernel: [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
Jan 30 05:45:01 mail kernel: [<ffffffff8100c2b3>] do_softirq+0x4b/0xa3
Jan 30 05:45:01 mail kernel: [<ffffffff810501f8>] irq_exit+0x4a/0x8c
Jan 30 05:45:01 mail kernel: [<ffffffff81461859>]
smp_apic_timer_interrupt+0x86/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8100a5d3>]
apic_timer_interrupt+0x13/0x20
Jan 30 05:45:01 mail kernel: <EOI> [<ffffffff812afbd4>] ?
acpi_idle_enter_bm+0x256/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff812afbcd>] ?
acpi_idle_enter_bm+0x24f/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff8139574c>]
cpuidle_idle_call+0x9e/0xfa
Jan 30 05:45:01 mail kernel: [<ffffffff81008c05>] cpu_idle+0xb4/0xf6
Jan 30 05:45:01 mail kernel: [<ffffffff81455d48>]
start_secondary+0x201/0x242
Jan 30 05:45:01 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 30 05:45:01 mail kernel: sky2 eth0: tx timeout
Jan 30 05:45:01 mail kernel: sky2 eth0: transmit ring 14 .. 102
report=14 done=14
Jan 30 05:45:01 mail kernel: sky2 eth0: disabling interface
Jan 30 05:45:01 mail kernel: sky2 eth0: enabling interface
This down/up continued for several hours until I intervened at about 10:05.
I saw that there was no eth0 connectivity, eth1 was ok. It appeard that
eth0 was receiving traffic but unable to send. arpwatch was reporting
bogons, DHCP showed many DISCOVER/OFFER pairs, no REQUEST/ACK. Pings to
any system failed; arp showed incomplete for anything hanging off of
eth0. arping also failed.
I manually stopped and started eth0 (ifconfig) and reset iptables
(although eth0 has no filters).
As I started looking at logs, the system hung and rebooted. I'm up now
with dma debug enabled, however as with 2.6.32.4 num_entries is dropping
and I don't think that dma debug will remain enabled long enough to
catch a crash.
So, as I see things, there are two issues here: 1) the TX hang post DMAR
error and 2) the inability to recover the interface and subsequent
system instability.
On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch), on 2.6.33-rc5[*];
> none of the other patches that did not make it upstream because that
> confuses things too much.
>
> The code that checks for DMA mapping errors on receive buffers would
> not handle errors correctly. I doubt you have these errors, but if you
> did then it would explain the problems. The code has to be a little
> tricky and build mapping for new rx buffer before releasing old one,
> that way if new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with signed-off.
>
> -
>
Nope - tx crash again. This time the system stayed up (but hosed) for a
few hours. When I tried to recover eth0 the system then crashed.
Brief summary of events (log extract below):
System start Jan 28 19:29
Everything seemed good (load and all) until 17:13:11 the following day
when I got rx errors:
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x8180010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010
length 1518
Jan 29 17:13:14 mail kernel: sky2 eth0: rx error, status 0x5f60010
length 1518
The system continued running normally after this until this morning (Jan
30) at 0:44:55:
Jan 30 05:44:55 mail kernel: DRHD: handling fault status reg 2
Jan 30 05:44:55 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
fault addr ffc4331ff000
Jan 30 05:44:55 mail kernel: DMAR:[fault reason 06] PTE Read access is
not set
Jan 30 05:44:55 mail kernel: net_ratelimit: 2 callbacks suppressed
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: error interrupt
status=0xc0000000
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 30 05:45:01 mail kernel: ------------[ cut here ]------------
Jan 30 05:45:01 mail kernel: WARNING: at net/sched/sch_generic.c:255
dev_watchdog+0xf3/0x161()
Jan 30 05:45:01 mail kernel: Hardware name: System Product Name
Jan 30 05:45:01 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
queue 0 timed out
Jan 30 05:45:01 mail kernel: Modules linked in: iptable_raw
iptable_mangle ipt_MASQUERADE iptable_nat nf_nat cpufreq_stats
ip6table_filter ip6table_mangle ip6_tables bridge stp appletalk psnap
llc nfsd lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_hda_intel
snd_ens1371 gameport snd_hda_codec snd_rawmidi snd_ac97_codec
gspca_spca505 ac97_bus gspca_main snd_hwdep videodev snd_seq
snd_seq_device v4l1_compat snd_pcm v4l2_compat_ioctl32 snd_timer snd
soundcore snd_page_alloc firewire_ohci pcspkr i2c_i801 firewire_core wmi
asus_atk0110 crc_itu_t sky2 hwmon iTCO_wdt iTCO_vendor_support fbcon
tileblit font bitblit softcursor raid456 async_raid6_recov async_pq
raid6_pq async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit
cfbcopyarea i2c_core cf
Jan 30 05:45:01 mail kernel: bimgblt cfbfillrect [last unloaded: nf_nat]
Jan 30 05:45:01 mail kernel: Pid: 0, comm: swapper Tainted: G W
2.6.33-rc5WITHMMAPNODMARFORKTIPSKY2DMAMAP-00283-gd4d37bd-dirty #1
Jan 30 05:45:01 mail kernel: Call Trace:
Jan 30 05:45:01 mail kernel: <IRQ> [<ffffffff8104a03d>]
warn_slowpath_common+0x7c/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8104a0ac>]
warn_slowpath_fmt+0x41/0x43
Jan 30 05:45:01 mail kernel: [<ffffffff813d2f43>] ? netif_tx_lock+0x44/0x6c
Jan 30 05:45:01 mail kernel: [<ffffffff813d30ab>] dev_watchdog+0xf3/0x161
Jan 30 05:45:01 mail kernel: [<ffffffff8106a31f>] ?
sched_clock_cpu+0x44/0xce
Jan 30 05:45:01 mail kernel: [<ffffffff8105761a>]
run_timer_softirq+0x1c3/0x26b
Jan 30 05:45:01 mail kernel: [<ffffffff8105060c>] __do_softirq+0xf8/0x1cd
Jan 30 05:45:01 mail kernel: [<ffffffff8107192b>] ?
tick_program_event+0x2a/0x2c
Jan 30 05:45:01 mail kernel: [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
Jan 30 05:45:01 mail kernel: [<ffffffff8100c2b3>] do_softirq+0x4b/0xa3
Jan 30 05:45:01 mail kernel: [<ffffffff810501f8>] irq_exit+0x4a/0x8c
Jan 30 05:45:01 mail kernel: [<ffffffff81461859>]
smp_apic_timer_interrupt+0x86/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8100a5d3>]
apic_timer_interrupt+0x13/0x20
Jan 30 05:45:01 mail kernel: <EOI> [<ffffffff812afbd4>] ?
acpi_idle_enter_bm+0x256/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff812afbcd>] ?
acpi_idle_enter_bm+0x24f/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff8139574c>]
cpuidle_idle_call+0x9e/0xfa
Jan 30 05:45:01 mail kernel: [<ffffffff81008c05>] cpu_idle+0xb4/0xf6
Jan 30 05:45:01 mail kernel: [<ffffffff81455d48>]
start_secondary+0x201/0x242
Jan 30 05:45:01 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 30 05:45:01 mail kernel: sky2 eth0: tx timeout
Jan 30 05:45:01 mail kernel: sky2 eth0: transmit ring 14 .. 102
report=14 done=14
Jan 30 05:45:01 mail kernel: sky2 eth0: disabling interface
Jan 30 05:45:01 mail kernel: sky2 eth0: enabling interface
This down/up continued for several hours until I intervened at about 10:05.
I saw that there was no eth0 connectivity, eth1 was ok. It appeard that
eth0 was receiving traffic but unable to send. arpwatch was reporting
bogons, DHCP showed many DISCOVER/OFFER pairs, no REQUEST/ACK. Pings to
any system failed; arp showed incomplete for anything hanging off of
eth0. arping also failed.
I manually stopped and started eth0 (ifconfig) and reset iptables
(although eth0 has no filters).
As I started looking at logs, the system hung and rebooted. I'm up now
with dma debug enabled, however as with 2.6.32.4 num_entries is dropping
and I don't think that dma debug will remain enabled long enough to
catch a crash.
So, as I see things, there are two issues here: 1) the TX hang post DMAR
error and 2) the inability to recover the interface and subsequent
system instability.
On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
> On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
> >Please try this patch (and only this patch), on 2.6.33-rc5[*];
> >none of the other patches that did not make it upstream because that
> >confuses things too much.
> >
> >The code that checks for DMA mapping errors on receive buffers would
> >not handle errors correctly. I doubt you have these errors, but if you
> >did then it would explain the problems. The code has to be a little
> >tricky and build mapping for new rx buffer before releasing old one,
> >that way if new mapping fails, the old one can be reused.
> >
> >If it works for you, I will resubmit with signed-off.
> >
> >-
> >
> Nope - tx crash again. This time the system stayed up (but hosed)
> for a few hours. When I tried to recover eth0 the system then
> crashed.
>
> Brief summary of events (log extract below):
>
> System start Jan 28 19:29
> Everything seemed good (load and all) until 17:13:11 the following
> day when I got rx errors:
>
> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
> length 1518
> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
> length 1518
These are length errors, but status shows more than 1518, e.g. 2036
here, unless I miss something. Please, don't use jumbo frames in your
network until we fully debug it for regular frames (Stephen admitted
sky2 jumbo might be broken).
...
> As I started looking at logs, the system hung and rebooted. I'm up
> now with dma debug enabled, however as with 2.6.32.4 num_entries is
> dropping and I don't think that dma debug will remain enabled long
> enough to catch a crash.
Could you try the patch below to show maybe some other users of
dma-debug entries?
Jarek P.
---
lib/dma-debug.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 51 insertions(+), 1 deletions(-)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 7d2f0b3..e2dcc9c 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -310,6 +310,53 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
list_del(&entry->list);
}
+struct dma_debug_dev {
+ struct device *dev;
+ unsigned int cnt;
+};
+
+#define DMA_DEBUG_DEVS 100
+static struct dma_debug_dev dma_debug_devs[DMA_DEBUG_DEVS];
+
+static void debug_dma_dump_devs(void)
+{
+ int idx, i;
+
+ memset(dma_debug_devs, 0, sizeof(struct dma_debug_dev) * DMA_DEBUG_DEVS);
+
+ for (idx = 0; idx < HASH_SIZE; idx++) {
+ struct hash_bucket *bucket = &dma_entry_hash[idx];
+ struct dma_debug_entry *entry;
+ unsigned long flags;
+
+ spin_lock_irqsave(&bucket->lock, flags);
+
+ list_for_each_entry(entry, &bucket->list, list) {
+ for (i = 0; i < DMA_DEBUG_DEVS; i++) {
+ struct device *dev = dma_debug_devs[i].dev;
+
+ if (!dev || dev == entry->dev) {
+ dma_debug_devs[i].dev = entry->dev;
+ dma_debug_devs[i].cnt++;
+ break;
+ }
+ }
+ }
+
+ spin_unlock_irqrestore(&bucket->lock, flags);
+ }
+
+ for (i = 0; i < DMA_DEBUG_DEVS; i++) {
+ struct device *dev = dma_debug_devs[i].dev;
+
+ if (!dev)
+ break;
+
+ pr_info("DMA-API: %s: entries: %d\n", dev_name(dev),
+ dma_debug_devs[i].cnt);
+ }
+}
+
/*
* Dump mapping entries for debugging purposes
*/
@@ -363,8 +410,11 @@ static struct dma_debug_entry *__dma_entry_alloc(void)
memset(entry, 0, sizeof(*entry));
num_free_entries -= 1;
- if (num_free_entries < min_free_entries)
+ if (num_free_entries < min_free_entries) {
min_free_entries = num_free_entries;
+ if ((min_free_entries & 0xffff) == 0)
+ debug_dma_dump_devs();
+ }
return entry;
}
On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
> On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
>
>> On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
>>
>>> Please try this patch (and only this patch), on 2.6.33-rc5[*];
>>> none of the other patches that did not make it upstream because that
>>> confuses things too much.
>>>
>>> The code that checks for DMA mapping errors on receive buffers would
>>> not handle errors correctly. I doubt you have these errors, but if you
>>> did then it would explain the problems. The code has to be a little
>>> tricky and build mapping for new rx buffer before releasing old one,
>>> that way if new mapping fails, the old one can be reused.
>>>
>>> If it works for you, I will resubmit with signed-off.
>>>
>>> -
>>>
>>>
>> Nope - tx crash again. This time the system stayed up (but hosed)
>> for a few hours. When I tried to recover eth0 the system then
>> crashed.
>>
>> Brief summary of events (log extract below):
>>
>> System start Jan 28 19:29
>> Everything seemed good (load and all) until 17:13:11 the following
>> day when I got rx errors:
>>
>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
>> length 1518
>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
>> length 1518
>>
> These are length errors, but status shows more than 1518, e.g. 2036
> here, unless I miss something. Please, don't use jumbo frames in your
> network until we fully debug it for regular frames (Stephen admitted
> sky2 jumbo might be broken).
>
MTU was 1500 - not using jumbo frames as they don't work.
> ...
>
>> As I started looking at logs, the system hung and rebooted. I'm up
>> now with dma debug enabled, however as with 2.6.32.4 num_entries is
>> dropping and I don't think that dma debug will remain enabled long
>> enough to catch a crash.
>>
> Could you try the patch below to show maybe some other users of
> dma-debug entries?
>
> Jarek P.
> ---
>
Will do. Note that I'm running with the dma debug filter set to sky2.
On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>
> Could you try the patch below to show maybe some other users of
> dma-debug entries?
>
> Jarek P.
> ---
>
>
With the default # entries & dma_debug_driver=sky2:
6:00 is eth0 & 4:00 is eth1.
Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1b.0: entries: 15
Jan 30 23:53:14 mail kernel: DMA-API: 0000:04:00.0: entries: 744
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1f.2: entries: 6
Jan 30 23:53:14 mail kernel: DMA-API: 0000:08:01.0: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.1: entries: 6
Jan 30 23:53:14 mail kernel: DMA-API: 0000:08:02.0: entries: 8
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.7: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.7: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.0: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.1: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.2: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.0: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.2: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:02:00.0: entries: 1
Jan 30 23:53:14 mail kernel: DMA-API: 0000:05:00.0: entries: 2
Jan 30 23:53:14 mail kernel: DMA-API: debugging out of memory - disabling
On 1/30/2010 11:55 PM, Michael Breuer wrote:
> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>
>> Could you try the patch below to show maybe some other users of
>> dma-debug entries?
>>
>> Jarek P.
>> ---
>>
> With the default # entries & dma_debug_driver=sky2:
>
> 6:00 is eth0 & 4:00 is eth1.
>
> Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
> ...
>
I put a printk as a third else case in sky2_tx_unmap. Looks like the
issue is that a large number (perhaps all) calls to sky2_tx_unmap have
re->flags set to neither TX_MAP_SINGLE or TX_MAP_PAGE. Thus the elements
are never being unmapped.
I suspect that the system collapses when using DMAR sooner than if not
using DMAR. Probably some hardware limitation on the number of mapped
elements that is less than the software limitation. I don't see at
present how a ring element can ever get to this code without re->flags
being set to one or the other.
On 1/31/2010 1:50 PM, Michael Breuer wrote:
> On 1/30/2010 11:55 PM, Michael Breuer wrote:
>> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>>
>>> Could you try the patch below to show maybe some other users of
>>> dma-debug entries?
>>>
>>> Jarek P.
>>> ---
>>>
>> With the default # entries & dma_debug_driver=sky2:
>>
>> 6:00 is eth0 & 4:00 is eth1.
>>
>> Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
>> ...
>>
> I put a printk as a third else case in sky2_tx_unmap. Looks like the
> issue is that a large number (perhaps all) calls to sky2_tx_unmap have
> re->flags set to neither TX_MAP_SINGLE or TX_MAP_PAGE. Thus the
> elements are never being unmapped.
>
> I suspect that the system collapses when using DMAR sooner than if not
> using DMAR. Probably some hardware limitation on the number of mapped
> elements that is less than the software limitation. I don't see at
> present how a ring element can ever get to this code without re->flags
> being set to one or the other.
>
>
Put some more debugging code in... re->flags is always NULL upon entry
to sky2_tx_unmap.
On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
> On 1/31/2010 1:50 PM, Michael Breuer wrote:
> >On 1/30/2010 11:55 PM, Michael Breuer wrote:
> >>On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
> >>>
> >>>Could you try the patch below to show maybe some other users of
> >>>dma-debug entries?
> >>>
> >>>Jarek P.
> >>>---
> >>>
> >>With the default # entries & dma_debug_driver=sky2:
> >>
> >>6:00 is eth0 & 4:00 is eth1.
> >>
> >>Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
> >>...
> >>
> >I put a printk as a third else case in sky2_tx_unmap. Looks like
> >the issue is that a large number (perhaps all) calls to
> >sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
> >TX_MAP_PAGE. Thus the elements are never being unmapped.
> >
> >I suspect that the system collapses when using DMAR sooner than if
> >not using DMAR. Probably some hardware limitation on the number of
> >mapped elements that is less than the software limitation. I don't
> >see at present how a ring element can ever get to this code
> >without re->flags being set to one or the other.
> >
> >
> Put some more debugging code in... re->flags is always NULL upon
> entry to sky2_tx_unmap.
>
Yes, good point! Could you try if this patch can fix it. (not compiled)
Thanks,
Jarek P.
---
drivers/net/sky2.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..3437917 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
{
struct sky2_tx_le *le = sky2->tx_le + *slot;
- struct tx_ring_info *re = sky2->tx_ring + *slot;
+ struct tx_ring_info *re;
*slot = RING_NEXT(*slot, sky2->tx_ring_size);
+ re = sky2->tx_ring + *slot;
re->flags = 0;
re->skb = NULL;
le->ctrl = 0;
@@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
static void tx_init(struct sky2_port *sky2)
{
- struct sky2_tx_le *le;
+ struct sky2_tx_le *le = sky2->tx_le;
+ struct tx_ring_info *re = sky2->tx_ring;
sky2->tx_prod = sky2->tx_cons = 0;
sky2->tx_tcpsum = 0;
sky2->tx_last_mss = 0;
- le = get_tx_le(sky2, &sky2->tx_prod);
+ re->flags = 0;
+ re->skb = NULL;
+ le->ctrl = 0;
le->addr = 0;
le->opcode = OP_ADDR64 | HW_OWNER;
sky2->tx_last_upper = 0;
On Sat, Jan 30, 2010 at 11:17:41PM -0500, Michael Breuer wrote:
> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
> >On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
> >>Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
> >>length 1518
> >>Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
> >>length 1518
> >These are length errors, but status shows more than 1518, e.g. 2036
> >here, unless I miss something. Please, don't use jumbo frames in your
> >network until we fully debug it for regular frames (Stephen admitted
> >sky2 jumbo might be broken).
> MTU was 1500 - not using jumbo frames as they don't work.
Do you mean no NIC in your network could have sent such frames?
Jarek P.
On 1/31/2010 5:25 PM, Jarek Poplawski wrote:
> On Sat, Jan 30, 2010 at 11:17:41PM -0500, Michael Breuer wrote:
>
>> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>
>>> On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
>>>
>>>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
>>>> length 1518
>>>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
>>>> length 1518
>>>>
>>> These are length errors, but status shows more than 1518, e.g. 2036
>>> here, unless I miss something. Please, don't use jumbo frames in your
>>> network until we fully debug it for regular frames (Stephen admitted
>>> sky2 jumbo might be broken).
>>>
>> MTU was 1500 - not using jumbo frames as they don't work.
>>
> Do you mean no NIC in your network could have sent such frames?
>
> Jarek P.
>
Well... There's only one possible source... and if there were it would
have been a Win7 bug :) Regardless, sky2 shouldn't be sensitive to rogue
external network stuff.
On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
> On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
>
>> On 1/31/2010 1:50 PM, Michael Breuer wrote:
>>
>>> On 1/30/2010 11:55 PM, Michael Breuer wrote:
>>>
>>>> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>>>
>>>>> Could you try the patch below to show maybe some other users of
>>>>> dma-debug entries?
>>>>>
>>>>> Jarek P.
>>>>> ---
>>>>>
>>>>>
>>>> With the default # entries& dma_debug_driver=sky2:
>>>>
>>>> 6:00 is eth0& 4:00 is eth1.
>>>>
>>>> Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
>>>> ...
>>>>
>>>>
>>> I put a printk as a third else case in sky2_tx_unmap. Looks like
>>> the issue is that a large number (perhaps all) calls to
>>> sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
>>> TX_MAP_PAGE. Thus the elements are never being unmapped.
>>>
>>> I suspect that the system collapses when using DMAR sooner than if
>>> not using DMAR. Probably some hardware limitation on the number of
>>> mapped elements that is less than the software limitation. I don't
>>> see at present how a ring element can ever get to this code
>>> without re->flags being set to one or the other.
>>>
>>>
>>>
>> Put some more debugging code in... re->flags is always NULL upon
>> entry to sky2_tx_unmap.
>>
>>
> Yes, good point! Could you try if this patch can fix it. (not compiled)
>
> Thanks,
> Jarek P.
> ---
>
> drivers/net/sky2.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index d760650..3437917 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
> static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
> {
> struct sky2_tx_le *le = sky2->tx_le + *slot;
> - struct tx_ring_info *re = sky2->tx_ring + *slot;
> + struct tx_ring_info *re;
>
> *slot = RING_NEXT(*slot, sky2->tx_ring_size);
> + re = sky2->tx_ring + *slot;
> re->flags = 0;
> re->skb = NULL;
> le->ctrl = 0;
> @@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>
> static void tx_init(struct sky2_port *sky2)
> {
> - struct sky2_tx_le *le;
> + struct sky2_tx_le *le = sky2->tx_le;
> + struct tx_ring_info *re = sky2->tx_ring;
>
> sky2->tx_prod = sky2->tx_cons = 0;
> sky2->tx_tcpsum = 0;
> sky2->tx_last_mss = 0;
>
> - le = get_tx_le(sky2,&sky2->tx_prod);
> + re->flags = 0;
> + re->skb = NULL;
> + le->ctrl = 0;
> le->addr = 0;
> le->opcode = OP_ADDR64 | HW_OWNER;
> sky2->tx_last_upper = 0;
>
Ok- solves the dma-debug issue - i.e., elements are now being unmapped.
Will leave up and hit with traffic unless a crash occurs. If I hit
something unrelated I'll backport to 2.6.32.7 and try that for a while.
I do think it's plausible that the dma errors after (during) load were
due to hardware limitations on the number of mapped entries (haven't
researched what that limit was). I would also assume that the sw map
would also have failed eventually.
I'd suggest that regardless of whether this patch solves my crash that
it ought to be backported as it seems unlikely that any machine would be
able to survive for long without the tx entries being unmapped.
On 1/31/2010 7:19 PM, Michael Breuer wrote:
> On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
> solves the dma-debug issue - i.e., elements are now being unmapped.
>
> Will leave up and hit with traffic unless a crash occurs. If I hit
> something unrelated I'll backport to 2.6.32.7 and try that for a
> while. I do think it's plausible that the dma errors after (during)
> load were due to hardware limitations on the number of mapped entries
> (haven't researched what that limit was). I would also assume that the
> sw map would also have failed eventually.
>
> I'd suggest that regardless of whether this patch solves my crash that
> it ought to be backported as it seems unlikely that any machine would
> be able to survive for long without the tx entries being unmapped.
>
FYI - tried generating lots of extra tx traffic... found a way to
generate the rx status messages on demand:
ping -i .0000001 -s 8000 -t 2 <host> >/dev/null
Yields:
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
length 1518
Jan 31 23:08:12 mail kernel: net_ratelimit: 316 callbacks suppressed
etc.
Looking at the packet trace, it seems that my Windows7 box is under
*some* circumstances not observing the MTU. In this case, the ICMP reply
is going back with the 8000 byte jumbo frame unfragmented. It seems that
the reverse is also true. I don't know why sometimes win7 does this, and
at other times properly fragments.
Oddly, prior to this attempt if I set no fragment on a ping from the
windows box back to the linux box and a size of > mtu (like 8000), the
ping failed. Absent the no-fragment flag, the ping properly fragmented.
I am not sure why Windows now thinks the MTU is > 1500. I'll look into
that when I have some time. It's possible that with 2.6.33-rc5 & the
patches I've got that somehow path mtu discovery is broken as nothing
changed on the windows side.
Understanding that the other side is out of spec, I'd still wonder why
the sky2 driver generates rx errors. Perhaps overruns should be tossed
silently... by the hardware if possible.
On Sun, Jan 31, 2010 at 07:19:46PM -0500, Michael Breuer wrote:
> On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
>> On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
>>
>>> On 1/31/2010 1:50 PM, Michael Breuer wrote:
>>>> I put a printk as a third else case in sky2_tx_unmap. Looks like
>>>> the issue is that a large number (perhaps all) calls to
>>>> sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
>>>> TX_MAP_PAGE. Thus the elements are never being unmapped.
>>>>
>>>> I suspect that the system collapses when using DMAR sooner than if
>>>> not using DMAR. Probably some hardware limitation on the number of
>>>> mapped elements that is less than the software limitation. I don't
>>>> see at present how a ring element can ever get to this code
>>>> without re->flags being set to one or the other.
>>>>
>>>>
>>>>
>>> Put some more debugging code in... re->flags is always NULL upon
>>> entry to sky2_tx_unmap.
>>>
>>>
>> Yes, good point! Could you try if this patch can fix it. (not compiled)
...
> Ok- solves the dma-debug issue - i.e., elements are now being unmapped.
>
> Will leave up and hit with traffic unless a crash occurs. If I hit
> something unrelated I'll backport to 2.6.32.7 and try that for a while.
> I do think it's plausible that the dma errors after (during) load were
> due to hardware limitations on the number of mapped entries (haven't
> researched what that limit was). I would also assume that the sw map
> would also have failed eventually.
>
> I'd suggest that regardless of whether this patch solves my crash that
> it ought to be backported as it seems unlikely that any machine would be
> able to survive for long without the tx entries being unmapped.
Here is a bit improved version (re->flags = 0 in sky2_tx_unmap()) for
merging, or additional testing if David wishes.
Thanks,
Jarek P.
--------------->
Michael Breuer reported that dma-debug entries added by sky2 driver
weren't unmapped, and found out "re->flags is always NULL upon entry
to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.
This patch reorders initializations in get_tx_le() and tx_init(), and
additionally does re->flags zeroing in sky2_tx_unmap() to prevent
possible double unmapping.
With debugging by: Michael Breuer <[email protected]>
Reported-by: Michael Breuer <[email protected]>
Tested-by: Michael Breuer <[email protected]>
Signed-off-by: Jarek Poplawski <[email protected]>
---
drivers/net/sky2.c | 20 +++++++++++++-------
1 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..21bb00a 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
{
struct sky2_tx_le *le = sky2->tx_le + *slot;
- struct tx_ring_info *re = sky2->tx_ring + *slot;
+ struct tx_ring_info *re;
*slot = RING_NEXT(*slot, sky2->tx_ring_size);
+ re = sky2->tx_ring + *slot;
re->flags = 0;
re->skb = NULL;
le->ctrl = 0;
@@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
static void tx_init(struct sky2_port *sky2)
{
- struct sky2_tx_le *le;
+ struct sky2_tx_le *le = sky2->tx_le;
+ struct tx_ring_info *re = sky2->tx_ring;
sky2->tx_prod = sky2->tx_cons = 0;
sky2->tx_tcpsum = 0;
sky2->tx_last_mss = 0;
- le = get_tx_le(sky2, &sky2->tx_prod);
+ re->flags = 0;
+ re->skb = NULL;
+ le->ctrl = 0;
le->addr = 0;
le->opcode = OP_ADDR64 | HW_OWNER;
sky2->tx_last_upper = 0;
@@ -1622,17 +1626,19 @@ static unsigned tx_le_req(const struct sk_buff *skb)
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
- if (re->flags & TX_MAP_SINGLE)
+ if (re->flags & TX_MAP_SINGLE) {
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
- else if (re->flags & TX_MAP_PAGE)
+ re->flags = 0;
+ } else if (re->flags & TX_MAP_PAGE) {
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
+ }
}
/*
On Sun, Jan 31, 2010 at 11:26:03PM -0500, Michael Breuer wrote:
> FYI - tried generating lots of extra tx traffic... found a way to
> generate the rx status messages on demand:
> ping -i .0000001 -s 8000 -t 2 <host> >/dev/null
>
> Yields:
> Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010
> length 1518
...
> Jan 31 23:08:12 mail kernel: net_ratelimit: 316 callbacks suppressed
> etc.
...
> Understanding that the other side is out of spec, I'd still wonder why
> the sky2 driver generates rx errors. Perhaps overruns should be tossed
> silently... by the hardware if possible.
Of course it's a matter of taste, but it seems such errors shouldn't
be tolerated in a local network. I'd rather prefer doing them more
explicit (like e.g. some other kind of length errors).
Jarek P.
On 2/1/2010 4:17 AM, Jarek Poplawski wrote:
> On Sun, Jan 31, 2010 at 07:19:46PM -0500, Michael Breuer wrote:
>
>> On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
>>
>>> On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
>>>
>>>
>>>> On 1/31/2010 1:50 PM, Michael Breuer wrote:
>>>>
>>>>> I put a printk as a third else case in sky2_tx_unmap. Looks like
>>>>> the issue is that a large number (perhaps all) calls to
>>>>> sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
>>>>> TX_MAP_PAGE. Thus the elements are never being unmapped.
>>>>>
>>>>> I suspect that the system collapses when using DMAR sooner than if
>>>>> not using DMAR. Probably some hardware limitation on the number of
>>>>> mapped elements that is less than the software limitation. I don't
>>>>> see at present how a ring element can ever get to this code
>>>>> without re->flags being set to one or the other.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Put some more debugging code in... re->flags is always NULL upon
>>>> entry to sky2_tx_unmap.
>>>>
>>>>
>>>>
>>> Yes, good point! Could you try if this patch can fix it. (not compiled)
>>>
> ...
>
>> Ok- solves the dma-debug issue - i.e., elements are now being unmapped.
>>
>> Will leave up and hit with traffic unless a crash occurs. If I hit
>> something unrelated I'll backport to 2.6.32.7 and try that for a while.
>> I do think it's plausible that the dma errors after (during) load were
>> due to hardware limitations on the number of mapped entries (haven't
>> researched what that limit was). I would also assume that the sw map
>> would also have failed eventually.
>>
>> I'd suggest that regardless of whether this patch solves my crash that
>> it ought to be backported as it seems unlikely that any machine would be
>> able to survive for long without the tx entries being unmapped.
>>
> Here is a bit improved version (re->flags = 0 in sky2_tx_unmap()) for
> merging, or additional testing if David wishes.
>
> Thanks,
> Jarek P.
> --------------->
> Michael Breuer reported that dma-debug entries added by sky2 driver
> weren't unmapped, and found out "re->flags is always NULL upon entry
> to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
> introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.
>
> This patch reorders initializations in get_tx_le() and tx_init(), and
> additionally does re->flags zeroing in sky2_tx_unmap() to prevent
> possible double unmapping.
>
> With debugging by: Michael Breuer<[email protected]>
>
> Reported-by: Michael Breuer<[email protected]>
> Tested-by: Michael Breuer<[email protected]>
> Signed-off-by: Jarek Poplawski<[email protected]>
> ---
>
> drivers/net/sky2.c | 20 +++++++++++++-------
> 1 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index d760650..21bb00a 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
> static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
> {
> struct sky2_tx_le *le = sky2->tx_le + *slot;
> - struct tx_ring_info *re = sky2->tx_ring + *slot;
> + struct tx_ring_info *re;
>
> *slot = RING_NEXT(*slot, sky2->tx_ring_size);
> + re = sky2->tx_ring + *slot;
> re->flags = 0;
> re->skb = NULL;
> le->ctrl = 0;
> @@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>
> static void tx_init(struct sky2_port *sky2)
> {
> - struct sky2_tx_le *le;
> + struct sky2_tx_le *le = sky2->tx_le;
> + struct tx_ring_info *re = sky2->tx_ring;
>
> sky2->tx_prod = sky2->tx_cons = 0;
> sky2->tx_tcpsum = 0;
> sky2->tx_last_mss = 0;
>
> - le = get_tx_le(sky2,&sky2->tx_prod);
> + re->flags = 0;
> + re->skb = NULL;
> + le->ctrl = 0;
> le->addr = 0;
> le->opcode = OP_ADDR64 | HW_OWNER;
> sky2->tx_last_upper = 0;
> @@ -1622,17 +1626,19 @@ static unsigned tx_le_req(const struct sk_buff *skb)
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> - if (re->flags& TX_MAP_SINGLE)
> + if (re->flags& TX_MAP_SINGLE) {
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> - else if (re->flags& TX_MAP_PAGE)
> + re->flags = 0;
> + } else if (re->flags& TX_MAP_PAGE) {
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> + }
> }
>
> /*
>
Running with v2 now. v1 patch significantly improved things. I cannot
state for certain that the original issue is resolved (crash under
load), however so far it appears so. Additionally, some performance
issues (raid rebuild times, etc.) seem drastically improved. I'm
guessing that this might have to do with the excess mapped sky2 tx buffers.
On Sun, 31 Jan 2010 23:18:35 +0100
Jarek Poplawski <[email protected]> wrote:
> @@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
> static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
> {
> struct sky2_tx_le *le = sky2->tx_le + *slot;
> - struct tx_ring_info *re = sky2->tx_ring + *slot;
> + struct tx_ring_info *re;
>
> *slot = RING_NEXT(*slot, sky2->tx_ring_size);
> + re = sky2->tx_ring + *slot;
> re->flags = 0;
Bogus, le and re are 1-to-1, so hardware portion an software
portion should be at same index.
--
This fixes the fact that re->flags is always zero without causing
other confusion.
--- a/drivers/net/sky2.c 2010-02-01 10:07:42.676296236 -0800
+++ b/drivers/net/sky2.c 2010-02-01 10:18:12.575044064 -0800
@@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
{
struct sky2_tx_le *le = sky2->tx_le + *slot;
- struct tx_ring_info *re = sky2->tx_ring + *slot;
*slot = RING_NEXT(*slot, sky2->tx_ring_size);
- re->flags = 0;
- re->skb = NULL;
le->ctrl = 0;
return le;
}
@@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
dev->stats.tx_packets++;
dev->stats.tx_bytes += skb->len;
+ re->skb = NULL;
dev_kfree_skb_any(skb);
sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
On 2/1/2010 1:20 PM, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
>
> --- a/drivers/net/sky2.c 2010-02-01 10:07:42.676296236 -0800
> +++ b/drivers/net/sky2.c 2010-02-01 10:18:12.575044064 -0800
> @@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
> static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
> {
> struct sky2_tx_le *le = sky2->tx_le + *slot;
> - struct tx_ring_info *re = sky2->tx_ring + *slot;
>
> *slot = RING_NEXT(*slot, sky2->tx_ring_size);
> - re->flags = 0;
> - re->skb = NULL;
> le->ctrl = 0;
> return le;
> }
> @@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_packets++;
> dev->stats.tx_bytes += skb->len;
>
> + re->skb = NULL;
> dev_kfree_skb_any(skb);
>
> sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Running with this patch now - dma debug num_free_entries stable - so
far, so good.
On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
Actually, there is a slight confusion: after tx_init() slots #0 are
skipped and waiting for tx_complete. Of course, no big deal, but no
problem with fixing it too, so there is the main difference between
these two patches. (Moving re->flags and re->skb initializations is
an optimization, but I tried to change only the buggy parts.)
Jarek P.
>
> --- a/drivers/net/sky2.c 2010-02-01 10:07:42.676296236 -0800
> +++ b/drivers/net/sky2.c 2010-02-01 10:18:12.575044064 -0800
> @@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
> static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
> {
> struct sky2_tx_le *le = sky2->tx_le + *slot;
> - struct tx_ring_info *re = sky2->tx_ring + *slot;
>
> *slot = RING_NEXT(*slot, sky2->tx_ring_size);
> - re->flags = 0;
> - re->skb = NULL;
> le->ctrl = 0;
> return le;
> }
> @@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags & TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_packets++;
> dev->stats.tx_bytes += skb->len;
>
> + re->skb = NULL;
> dev_kfree_skb_any(skb);
>
> sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
On Mon, Feb 01, 2010 at 09:13:23PM +0100, Jarek Poplawski wrote:
> On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> > This fixes the fact that re->flags is always zero without causing
> > other confusion.
>
> Actually, there is a slight confusion: after tx_init() slots #0 are
> skipped and waiting for tx_complete. Of course, no big deal, but no
> problem with fixing it too, so there is the main difference between
Hmm... On the other hand it could be fixed simpler by moving
sky2->tx_cons. I'll send v3.
Thanks,
Jarek P.
On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
Here it is, with only sky2->tx_cons update in tx_init().
Thanks,
Jarek P.
---------------> (take 3)
Michael Breuer reported that dma-debug entries added by sky2 driver
weren't unmapped, and found out "re->flags is always NULL upon entry
to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.
This patch moves re->flags and re->skb initializations from
get_tx_le() to sky2_tx_unmap() and sky2_tx_complete() respectively.
Additionally sky2->tx_cons is updated to sky2->tx_prod in tx_init()
to remove one needless tx completion.
With debugging by: Michael Breuer <[email protected]>
Improved by: Stephen Hemminger <[email protected]>
Reported-by: Michael Breuer <[email protected]>
Tested-by: Michael Breuer <[email protected]>
Signed-off-by: Jarek Poplawski <[email protected]>
Cc: Stephen Hemminger <[email protected]>
---
drivers/net/sky2.c | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..08cd65b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
{
struct sky2_tx_le *le = sky2->tx_le + *slot;
- struct tx_ring_info *re = sky2->tx_ring + *slot;
*slot = RING_NEXT(*slot, sky2->tx_ring_size);
- re->flags = 0;
- re->skb = NULL;
le->ctrl = 0;
return le;
}
@@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
{
struct sky2_tx_le *le;
- sky2->tx_prod = sky2->tx_cons = 0;
+ sky2->tx_prod = 0;
sky2->tx_tcpsum = 0;
sky2->tx_last_mss = 0;
le = get_tx_le(sky2, &sky2->tx_prod);
le->addr = 0;
le->opcode = OP_ADDR64 | HW_OWNER;
+ sky2->tx_cons = sky2->tx_prod;
sky2->tx_last_upper = 0;
}
@@ -1622,8 +1620,7 @@ static unsigned tx_le_req(const struct sk_buff *skb)
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1633,6 +1630,7 @@ static void sky2_tx_unmap(struct pci_dev *pdev,
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1839,6 +1837,7 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
dev->stats.tx_packets++;
dev->stats.tx_bytes += skb->len;
+ re->skb = NULL;
dev_kfree_skb_any(skb);
sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
[ Subject was wrong, sorry! ]
On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
Here it is, with only sky2->tx_cons update in tx_init().
Thanks,
Jarek P.
---------------> (take 3b)
Michael Breuer reported that dma-debug entries added by sky2 driver
weren't unmapped, and found out "re->flags is always NULL upon entry
to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.
This patch moves re->flags and re->skb initializations from
get_tx_le() to sky2_tx_unmap() and sky2_tx_complete() respectively.
Additionally sky2->tx_cons is updated to sky2->tx_prod in tx_init()
to remove one needless tx completion.
With debugging by: Michael Breuer <[email protected]>
Improved by: Stephen Hemminger <[email protected]>
Reported-by: Michael Breuer <[email protected]>
Tested-by: Michael Breuer <[email protected]>
Signed-off-by: Jarek Poplawski <[email protected]>
Cc: Stephen Hemminger <[email protected]>
---
drivers/net/sky2.c | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..08cd65b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
{
struct sky2_tx_le *le = sky2->tx_le + *slot;
- struct tx_ring_info *re = sky2->tx_ring + *slot;
*slot = RING_NEXT(*slot, sky2->tx_ring_size);
- re->flags = 0;
- re->skb = NULL;
le->ctrl = 0;
return le;
}
@@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
{
struct sky2_tx_le *le;
- sky2->tx_prod = sky2->tx_cons = 0;
+ sky2->tx_prod = 0;
sky2->tx_tcpsum = 0;
sky2->tx_last_mss = 0;
le = get_tx_le(sky2, &sky2->tx_prod);
le->addr = 0;
le->opcode = OP_ADDR64 | HW_OWNER;
+ sky2->tx_cons = sky2->tx_prod;
sky2->tx_last_upper = 0;
}
@@ -1622,8 +1620,7 @@ static unsigned tx_le_req(const struct sk_buff *skb)
return count;
}
-static void sky2_tx_unmap(struct pci_dev *pdev,
- const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
{
if (re->flags & TX_MAP_SINGLE)
pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1633,6 +1630,7 @@ static void sky2_tx_unmap(struct pci_dev *pdev,
pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
pci_unmap_len(re, maplen),
PCI_DMA_TODEVICE);
+ re->flags = 0;
}
/*
@@ -1839,6 +1837,7 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
dev->stats.tx_packets++;
dev->stats.tx_bytes += skb->len;
+ re->skb = NULL;
dev_kfree_skb_any(skb);
sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
On Mon, 1 Feb 2010 22:27:41 +0100
Jarek Poplawski <[email protected]> wrote:
> @@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
> {
> struct sky2_tx_le *le;
>
> - sky2->tx_prod = sky2->tx_cons = 0;
> + sky2->tx_prod = 0;
> sky2->tx_tcpsum = 0;
> sky2->tx_last_mss = 0;
>
> le = get_tx_le(sky2, &sky2->tx_prod);
> le->addr = 0;
> le->opcode = OP_ADDR64 | HW_OWNER;
> + sky2->tx_cons = sky2->tx_prod;
> sky2->tx_last_upper = 0;
> }
Your change causes the initial element to be skipped. I want
it to goto the hardware. It makes sure the upper bits of the
first request are set (0).
I don't see what was wrong with my fix.
--
On Mon, Feb 01, 2010 at 02:29:42PM -0800, Stephen Hemminger wrote:
> On Mon, 1 Feb 2010 22:27:41 +0100
> Jarek Poplawski <[email protected]> wrote:
>
> > @@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
> > {
> > struct sky2_tx_le *le;
> >
> > - sky2->tx_prod = sky2->tx_cons = 0;
> > + sky2->tx_prod = 0;
> > sky2->tx_tcpsum = 0;
> > sky2->tx_last_mss = 0;
> >
> > le = get_tx_le(sky2, &sky2->tx_prod);
> > le->addr = 0;
> > le->opcode = OP_ADDR64 | HW_OWNER;
> > + sky2->tx_cons = sky2->tx_prod;
> > sky2->tx_last_upper = 0;
> > }
>
> Your change causes the initial element to be skipped. I want
> it to goto the hardware. It makes sure the upper bits of the
> first request are set (0).
I thought "Send high bits if needed" part in sky2_xmit_frame() was
enough. If it's otherwise than my patch was wrong.
>
> I don't see what was wrong with my fix.
>
It's OK. Please, submit it instead of mine.
Thanks,
Jarek P.
On Mon, 1 Feb 2010 23:46:39 +0100
Jarek Poplawski <[email protected]> wrote:
> On Mon, Feb 01, 2010 at 02:29:42PM -0800, Stephen Hemminger wrote:
> > On Mon, 1 Feb 2010 22:27:41 +0100
> > Jarek Poplawski <[email protected]> wrote:
> >
> > > @@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
> > > {
> > > struct sky2_tx_le *le;
> > >
> > > - sky2->tx_prod = sky2->tx_cons = 0;
> > > + sky2->tx_prod = 0;
> > > sky2->tx_tcpsum = 0;
> > > sky2->tx_last_mss = 0;
> > >
> > > le = get_tx_le(sky2, &sky2->tx_prod);
> > > le->addr = 0;
> > > le->opcode = OP_ADDR64 | HW_OWNER;
> > > + sky2->tx_cons = sky2->tx_prod;
> > > sky2->tx_last_upper = 0;
> > > }
> >
> > Your change causes the initial element to be skipped. I want
> > it to goto the hardware. It makes sure the upper bits of the
> > first request are set (0).
>
> I thought "Send high bits if needed" part in sky2_xmit_frame() was
> enough. If it's otherwise than my patch was wrong.
>
The definition is "high bits are different than the last value".
The code init sets the last value to zero, and initializes the hardware
engine as well.
Stephen Hemminger <[email protected]> writes:
> - for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> - re->frag_addr[i] = pci_map_page(pdev,
> +
> +map_page_error:
> + while (--i >= 0) {
> + pci_unmap_page(pdev, re->frag_addr[i],
> + skb_shinfo(skb)->frags[i].size,
> + PCI_DMA_FROMDEVICE);
> + }
> +
> + pci_unmap_single(pdev, re->data_addr, pci_unmap_len(re, data_size),
> + PCI_DMA_FROMDEVICE);
Better add a helper somewhere to do this, doesn't make sense
to duplicate this in all drivers (lots of drivers have similar
problems)
I remember looking at this some time ago but for some reason
the patches never made it out.
-Andi
--
[email protected] -- Speaking for myself only.
Sorry for the late reply,
On Thu, 21 Jan 2010 22:38:41 -0800 (PST)
David Miller <[email protected]> wrote:
> From: FUJITA Tomonori <[email protected]>
> Date: Fri, 22 Jan 2010 14:11:29 +0900
>
> > Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
> > not a multiple of the cache line size, it's possible that driver
> > writers who aren't familiar with cache would be surprised (it depends
> > on the way their drivers use buffers though).
> >
> > The easiest way for 'completely safe sync for any driver writers' is
> > asking for all the sync parameters must be the same as those passed
> > into the single mapping API. If writes knows what they do, they can do
> > a partial sync with sync_range API. That's the author intention, I
> > guess.
>
> This is not reasonable.
>
> You have to think about how people actually use these
> interfaces.
>
> They have a large buffer, and if they receive a small request they
> want to allocate a smaller buffer, copy into that smaller buffer, and
> give the larger buffer back to the hardware.
>
> It's an optimization, it performs better this way.
>
> If you make it so that the DMA sync has to cover the entire large
> buffer, the whole point of the optimization is taken away.
I talked with James. He is ok with changing (or fixing) this API to
enable users to do a partial sync (I'm ok with that too. I just
guessed that he designed the API in such way intentionally not by
mistake).
Can we safely assume that the arch implementations already round
up/down to the safe boundary internally in this API (they should
already)?
As you know, the patch to remove the description of
dma_sync_single/pci_dma_sync_single/dma_sync_sg/pci_dma_sync_pci that
always require a full sync in DMA-API.txt is already -mm so what we
need to do are:
- adding 'a partial sync' description to PCI-DMA-mapping.txt.
- duplicating the similar description to DMA-API.txt.
I don't like two DMA docs. I like to make pci_dma_* API obsolete. We
have the generic DMA API with generic devices so we are always able to
use the API (as you did with sbus_map_*). The majority arch
implementations safely call the bus specific DMA functions via the
generic DMA API. So there are not many things to do. We can just
convert pci_dma_* to dma_* API slowly.
Opinions?
From: FUJITA Tomonori <[email protected]>
Date: Wed, 3 Feb 2010 10:18:39 +0900
> Can we safely assume that the arch implementations already round
> up/down to the safe boundary internally in this API (they should
> already)?
I can only speak for sparc64 and x86 directly and those are fine.
Any such improper implementations would fail with many common
ethernet drivers already.
> I don't like two DMA docs. I like to make pci_dma_* API obsolete. We
> have the generic DMA API with generic devices so we are always able to
> use the API (as you did with sbus_map_*). The majority arch
> implementations safely call the bus specific DMA functions via the
> generic DMA API. So there are not many things to do. We can just
> convert pci_dma_* to dma_* API slowly.
>
> Opinions?
I have no problem with this.
On 2/1/2010 1:20 PM, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
>
> --- a/drivers/net/sky2.c 2010-02-01 10:07:42.676296236 -0800
> +++ b/drivers/net/sky2.c 2010-02-01 10:18:12.575044064 -0800
> @@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
> static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
> {
> struct sky2_tx_le *le = sky2->tx_le + *slot;
> - struct tx_ring_info *re = sky2->tx_ring + *slot;
>
> *slot = RING_NEXT(*slot, sky2->tx_ring_size);
> - re->flags = 0;
> - re->skb = NULL;
> le->ctrl = 0;
> return le;
> }
> @@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
> return count;
> }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> - const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
> {
> if (re->flags& TX_MAP_SINGLE)
> pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
> pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
> pci_unmap_len(re, maplen),
> PCI_DMA_TODEVICE);
> + re->flags = 0;
> }
>
> /*
> @@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
> dev->stats.tx_packets++;
> dev->stats.tx_bytes += skb->len;
>
> + re->skb = NULL;
> dev_kfree_skb_any(skb);
>
> sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Just a brief update - this has been up and stable for about 32 hours -
I've been periodically generating load on the system. No kernel errors
of any sort so far. Actually, in retrospect, I believe the dma issue was
triggering other bad things - including an rcu lockup (patched in tip -
sched.c).
Just as an FYI - (and this should probably be in a new thread) I am
seeing an large number (>9,000,000) of dropped rx packets, however at
this time I see no errors resulting from that (on this or client
machines). As the # of dropped packets hasn't incremented at any time I
was observing things, I can't say what this is about. Probably nothing,
but I'll see if I can track down what is going on. I did see some of
this earlier on while troubleshooting the sky2 issues that now seem
resolved. Quick crosschecking of other machines do not show high error
or retransmission rates. I'm also not seeing any evidence of other
errors (no errors reported by ifconfig, or ethtool, or printk (debug is
enabled).
I'm wondering whether these dropped packets are due mostly to hitting
GMR_FS_RX_OK in sky2_receive. I'm also guessing that the high numbers of
this that I'm seeing is an artifact of being able to pump more traffic
through with the above patch. Given the description of the status code
in sky2.h (receive ok) I'm wondering whether a) this should be reported
as dropped, b) whether resubmit is necessary, c) whether it's possible
that eth1 events coinciding with eth0 events are the cause and d)
whether or not there's another issue entirely.
On 02/02/2010 11:07 PM, Michael Breuer wrote:
> Just a brief update - this has been up and stable for about 32 hours -
> I've been periodically generating load on the system. No kernel errors
> of any sort so far. Actually, in retrospect, I believe the dma issue
> was triggering other bad things - including an rcu lockup (patched in
> tip - sched.c).
>
> Just as an FYI - (and this should probably be in a new thread) I am
> seeing an large number (>9,000,000) of dropped rx packets, however at
> this time I see no errors resulting from that (on this or client
> machines). As the # of dropped packets hasn't incremented at any time
> I was observing things, I can't say what this is about. Probably
> nothing, but I'll see if I can track down what is going on. I did see
> some of this earlier on while troubleshooting the sky2 issues that now
> seem resolved. Quick crosschecking of other machines do not show high
> error or retransmission rates. I'm also not seeing any evidence of
> other errors (no errors reported by ifconfig, or ethtool, or printk
> (debug is enabled).
>
> I'm wondering whether these dropped packets are due mostly to hitting
> GMR_FS_RX_OK in sky2_receive. I'm also guessing that the high numbers
> of this that I'm seeing is an artifact of being able to pump more
> traffic through with the above patch. Given the description of the
> status code in sky2.h (receive ok) I'm wondering whether a) this
> should be reported as dropped, b) whether resubmit is necessary, c)
> whether it's possible that eth1 events coinciding with eth0 events are
> the cause and d) whether or not there's another issue entirely.
>
Tracked this down. The status being returned is 0x3c0080 - good flow
control packets. Nothing is actually being dropped (confirmed by packet
trace on switch compared with packet trace on server).
I whipped up a trivial patch to not count these as dropped packets and
will post to netdev.
I'm not really sure what the driver should be doing in this case, but
resubmit seems to work.
On Wed, 03 Feb 2010 11:47:19 -0500
Michael Breuer <[email protected]> wrote:
> Tracked this down. The status being returned is 0x3c0080 - good flow
> control packets. Nothing is actually being dropped (confirmed by packet
> trace on switch compared with packet trace on server).
>
> I whipped up a trivial patch to not count these as dropped packets and
> will post to netdev.
>
> I'm not really sure what the driver should be doing in this case, but
> resubmit seems to work.
Looks like a flow control negotiation issue. You probably turned off
flow control on the Linux side, but the switch is still doing flow
control.
--
On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
> On Wed, 03 Feb 2010 11:47:19 -0500
> Michael Breuer<[email protected]> wrote:
>
>
>> Tracked this down. The status being returned is 0x3c0080 - good flow
>> control packets. Nothing is actually being dropped (confirmed by packet
>> trace on switch compared with packet trace on server).
>>
>> I whipped up a trivial patch to not count these as dropped packets and
>> will post to netdev.
>>
>> I'm not really sure what the driver should be doing in this case, but
>> resubmit seems to work.
>>
> Looks like a flow control negotiation issue. You probably turned off
> flow control on the Linux side, but the switch is still doing flow
> control.
>
>
According to the driver:
Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
duplex, flow control both
So if the rx flow control packet status is due to flow control being
disabled, then there's a different issue.
On 02/03/10 08:56, Stephen Hemminger wrote:
> On Wed, 03 Feb 2010 11:47:19 -0500
> Michael Breuer<[email protected]> wrote:
>
>> Tracked this down. The status being returned is 0x3c0080 - good flow
>> control packets. Nothing is actually being dropped (confirmed by packet
>> trace on switch compared with packet trace on server).
>>
>> I whipped up a trivial patch to not count these as dropped packets and
>> will post to netdev.
>>
>> I'm not really sure what the driver should be doing in this case, but
>> resubmit seems to work.
>
> Looks like a flow control negotiation issue. You probably turned off
> flow control on the Linux side, but the switch is still doing flow
> control.
>
I noticed this in a hotel last week, I can try
at the home to see if this fires of, if so
I can test any patches you have.
Justin P. Mattock
On 02/03/10 09:07, Michael Breuer wrote:
> On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
>> On Wed, 03 Feb 2010 11:47:19 -0500
>> Michael Breuer<[email protected]> wrote:
>>
>>> Tracked this down. The status being returned is 0x3c0080 - good flow
>>> control packets. Nothing is actually being dropped (confirmed by packet
>>> trace on switch compared with packet trace on server).
>>>
>>> I whipped up a trivial patch to not count these as dropped packets and
>>> will post to netdev.
>>>
>>> I'm not really sure what the driver should be doing in this case, but
>>> resubmit seems to work.
>> Looks like a flow control negotiation issue. You probably turned off
>> flow control on the Linux side, but the switch is still doing flow
>> control.
>>
> According to the driver:
> Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
> duplex, flow control both
>
> So if the rx flow control packet status is due to flow control being
> disabled, then there's a different issue.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
hmm.. after an hour or so I'm not seeing anything.
from what I remember I turned the machine on in the
hotel, then left the system there as I went out for a few hours
(so maybe I need to wait).
Anyways I did keep dmesg of when this occurred, basically
the log was spammed with these:
[ 863.294057] sky2 eth0: rx error, status 0x580002 length 88
[ 865.646645] sky2 eth0: rx error, status 0x600002 length 96
[ 1286.420471] sky2 eth0: rx error, status 0x600002 length 96
[ 1286.499459] sky2 eth0: rx error, status 0x600002 length 96
[ 1746.903826] sky2 eth0: rx error, status 0x600002 length 96
[ 1754.263692] sky2 eth0: rx error, status 0x600002 length 96
[ 1755.309360] sky2 eth0: rx error, status 0x680002 length 104
[ 2213.256294] sky2 eth0: rx error, status 0x600002 length 96
[ 2219.653342] sky2 eth0: rx error, status 0x580002 length 88
[ 2221.673601] sky2 eth0: rx error, status 0x600002 length 96
[ 2679.654655] sky2 eth0: rx error, status 0x680002 length 104
[ 2692.315058] sky2 eth0: rx error, status 0x500002 length 80
[ 2694.349612] sky2 eth0: rx error, status 0x580002 length 88
[ 2703.676717] sky2 eth0: rx error, status 0x700002 length 112
[ 2703.826375] sky2 eth0: rx error, status 0x600002 length 96
[ 3187.504843] sky2 eth0: rx error, status 0x600002 length 96
[ 3189.560744] sky2 eth0: rx error, status 0x600002 length 96
[ 3672.475719] sky2 eth0: rx error, status 0x680002 length 104
[ 3676.696959] sky2 eth0: rx error, status 0x680002 length 104
but while using the system with this, I didn't notice anything
out of the ordinary.
(if this fires off I can try a bisect for you guys, but right now
since I'm not seeing anything, might be a different story);
Justin P. Mattock
On Wed, 03 Feb 2010 10:23:01 -0800
"Justin P. Mattock" <[email protected]> wrote:
> On 02/03/10 09:07, Michael Breuer wrote:
> > On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
> >> On Wed, 03 Feb 2010 11:47:19 -0500
> >> Michael Breuer<[email protected]> wrote:
> >>
> >>> Tracked this down. The status being returned is 0x3c0080 - good flow
> >>> control packets. Nothing is actually being dropped (confirmed by packet
> >>> trace on switch compared with packet trace on server).
> >>>
> >>> I whipped up a trivial patch to not count these as dropped packets and
> >>> will post to netdev.
> >>>
> >>> I'm not really sure what the driver should be doing in this case, but
> >>> resubmit seems to work.
> >> Looks like a flow control negotiation issue. You probably turned off
> >> flow control on the Linux side, but the switch is still doing flow
> >> control.
> >>
> > According to the driver:
> > Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
> > duplex, flow control both
> >
> > So if the rx flow control packet status is due to flow control being
> > disabled, then there's a different issue.
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
> hmm.. after an hour or so I'm not seeing anything.
> from what I remember I turned the machine on in the
> hotel, then left the system there as I went out for a few hours
> (so maybe I need to wait).
>
> Anyways I did keep dmesg of when this occurred, basically
> the log was spammed with these:
>
> [ 863.294057] sky2 eth0: rx error, status 0x580002 length 88
> [ 865.646645] sky2 eth0: rx error, status 0x600002 length 96
> [ 1286.420471] sky2 eth0: rx error, status 0x600002 length 96
> [ 1286.499459] sky2 eth0: rx error, status 0x600002 length 96
> [ 1746.903826] sky2 eth0: rx error, status 0x600002 length 96
> [ 1754.263692] sky2 eth0: rx error, status 0x600002 length 96
> [ 1755.309360] sky2 eth0: rx error, status 0x680002 length 104
> [ 2213.256294] sky2 eth0: rx error, status 0x600002 length 96
> [ 2219.653342] sky2 eth0: rx error, status 0x580002 length 88
> [ 2221.673601] sky2 eth0: rx error, status 0x600002 length 96
> [ 2679.654655] sky2 eth0: rx error, status 0x680002 length 104
> [ 2692.315058] sky2 eth0: rx error, status 0x500002 length 80
> [ 2694.349612] sky2 eth0: rx error, status 0x580002 length 88
> [ 2703.676717] sky2 eth0: rx error, status 0x700002 length 112
> [ 2703.826375] sky2 eth0: rx error, status 0x600002 length 96
> [ 3187.504843] sky2 eth0: rx error, status 0x600002 length 96
> [ 3189.560744] sky2 eth0: rx error, status 0x600002 length 96
> [ 3672.475719] sky2 eth0: rx error, status 0x680002 length 104
> [ 3676.696959] sky2 eth0: rx error, status 0x680002 length 104
>
> but while using the system with this, I didn't notice anything
> out of the ordinary.
> (if this fires off I can try a bisect for you guys, but right now
> since I'm not seeing anything, might be a different story);
>
You were on a crappy hotel switch. Those are all CRC errors.
--
On 02/03/10 10:25, Stephen Hemminger wrote:
> On Wed, 03 Feb 2010 10:23:01 -0800
> "Justin P. Mattock"<[email protected]> wrote:
>
>> On 02/03/10 09:07, Michael Breuer wrote:
>>> On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
>>>> On Wed, 03 Feb 2010 11:47:19 -0500
>>>> Michael Breuer<[email protected]> wrote:
>>>>
>>>>> Tracked this down. The status being returned is 0x3c0080 - good flow
>>>>> control packets. Nothing is actually being dropped (confirmed by packet
>>>>> trace on switch compared with packet trace on server).
>>>>>
>>>>> I whipped up a trivial patch to not count these as dropped packets and
>>>>> will post to netdev.
>>>>>
>>>>> I'm not really sure what the driver should be doing in this case, but
>>>>> resubmit seems to work.
>>>> Looks like a flow control negotiation issue. You probably turned off
>>>> flow control on the Linux side, but the switch is still doing flow
>>>> control.
>>>>
>>> According to the driver:
>>> Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
>>> duplex, flow control both
>>>
>>> So if the rx flow control packet status is due to flow control being
>>> disabled, then there's a different issue.
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at http://www.tux.org/lkml/
>>>
>>
>>
>> hmm.. after an hour or so I'm not seeing anything.
>> from what I remember I turned the machine on in the
>> hotel, then left the system there as I went out for a few hours
>> (so maybe I need to wait).
>>
>> Anyways I did keep dmesg of when this occurred, basically
>> the log was spammed with these:
>>
>> [ 863.294057] sky2 eth0: rx error, status 0x580002 length 88
>> [ 865.646645] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1286.420471] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1286.499459] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1746.903826] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1754.263692] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1755.309360] sky2 eth0: rx error, status 0x680002 length 104
>> [ 2213.256294] sky2 eth0: rx error, status 0x600002 length 96
>> [ 2219.653342] sky2 eth0: rx error, status 0x580002 length 88
>> [ 2221.673601] sky2 eth0: rx error, status 0x600002 length 96
>> [ 2679.654655] sky2 eth0: rx error, status 0x680002 length 104
>> [ 2692.315058] sky2 eth0: rx error, status 0x500002 length 80
>> [ 2694.349612] sky2 eth0: rx error, status 0x580002 length 88
>> [ 2703.676717] sky2 eth0: rx error, status 0x700002 length 112
>> [ 2703.826375] sky2 eth0: rx error, status 0x600002 length 96
>> [ 3187.504843] sky2 eth0: rx error, status 0x600002 length 96
>> [ 3189.560744] sky2 eth0: rx error, status 0x600002 length 96
>> [ 3672.475719] sky2 eth0: rx error, status 0x680002 length 104
>> [ 3676.696959] sky2 eth0: rx error, status 0x680002 length 104
>>
>> but while using the system with this, I didn't notice anything
>> out of the ordinary.
>> (if this fires off I can try a bisect for you guys, but right now
>> since I'm not seeing anything, might be a different story);
>>
>
> You were on a crappy hotel switch. Those are all CRC errors.
>
>
>
alright.. makes sense cause I'm not getting
any of this now at home.
Justin P. Mattock