Hi,
Using kernel 2.6.6 on a motherboard with a built-in sis900 network adapter, I get
a soft hang during HEAVY network traffic (i.e. 7.5Mb/s). I tried to
enable sysrq, to no avail. The keyboard is still active, but any attempt to
run a command hangs.
Disabling hyperthreading in the BIOS seems to solve the problem. I think
this is SMP related.
With 2.4.x all is OK with hyperthreading enabled.
Here is some info with hyperthreading disabled. If you need any more
info, please don't hesitate to mail me directly.
[[email protected] root]# lspci -v
00:00.0 Host bridge: Silicon Integrated Systems [SiS]: Unknown device
0648 (rev 50)
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 32
Memory at c0000000 (32-bit, non-prefetchable) [size=256M]
Capabilities: [c0] AGP version 3.5
00:01.0 PCI bridge: Silicon Integrated Systems [SiS]: Unknown device
0003 (prog-if 00 [Normal decode])
Flags: bus master, 66Mhz, fast devsel, latency 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
Memory behind bridge: d8000000-d9ffffff
Prefetchable memory behind bridge: d0000000-d7ffffff
00:02.0 ISA bridge: Silicon Integrated Systems [SiS]: Unknown device
0963 (rev 25)
Flags: bus master, medium devsel, latency 0
00:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
Flags: medium devsel, IRQ 169
I/O ports at 1080 [size=32]
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
(prog-if 80 [Master])
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 128, IRQ 193
I/O ports at 4000 [size=16]
Capabilities: [58] Power Management version 2
00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS]
SiS7012 PCI Audio Accelerator (rev a0)
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 32, IRQ 177
I/O ports at dc00 [size=256]
I/O ports at e000 [size=128]
Capabilities: [48] Power Management version 2
00:03.0 USB Controller: Silicon Integrated Systems [SiS] 7001 (rev 0f)
(prog-if 10 [OHCI])
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 32, IRQ 201
Memory at da023000 (32-bit, non-prefetchable) [size=4K]
00:03.1 USB Controller: Silicon Integrated Systems [SiS] 7001 (rev 0f)
(prog-if 10 [OHCI])
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 32, IRQ 209
Memory at da020000 (32-bit, non-prefetchable) [size=4K]
00:03.3 USB Controller: Silicon Integrated Systems [SiS]: Unknown device
7002 (prog-if 20 [EHCI])
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 32, IRQ 225
Memory at da021000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [50] Power Management version 2
00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900
10/100 Ethernet (rev 91)
Subsystem: Elitegroup Computer Systems: Unknown device 1803
Flags: bus master, medium devsel, latency 32, IRQ 185
I/O ports at e400 [size=256]
Memory at da022000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [40] Power Management version 2
00:0b.0 FireWire (IEEE 1394): Lucent Microelectronics FW323 (rev 61)
(prog-if 10 [OHCI])
Subsystem: Lucent Microelectronics FW323
Flags: bus master, medium devsel, latency 32, IRQ 169
Memory at da024000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 2
01:00.0 VGA compatible controller: nVidia Corporation: Unknown device
0281 (rev a1) (prog-if 00 [VGA])
Subsystem: Micro-star International Co Ltd: Unknown device 8943
Flags: bus master, 66Mhz, medium devsel, latency 248, IRQ 193
Memory at d8000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (32-bit, prefetchable) [size=128M]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [60] Power Management version 2
Capabilities: [44] AGP version 3.0
[[email protected] root]# cat /proc/interrupts
CPU0
0: 1146943 IO-APIC-edge timer
4: 29 IO-APIC-edge serial
7: 2 IO-APIC-edge parport0
9: 171 IO-APIC-level acpi
14: 134449 IO-APIC-edge ide0
15: 24 IO-APIC-edge ide1
169: 171 IO-APIC-level ohci1394
185: 684341 IO-APIC-level eth0
193: 55506 IO-APIC-level nvidia
201: 12379 IO-APIC-level ohci_hcd
209: 4069 IO-APIC-level ohci_hcd
225: 0 IO-APIC-level ehci_hcd
NMI: 0
LOC: 1146994
ERR: 0
MIS: 0
[[email protected] root]# cat /proc/iomem
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000ce3ff : Video ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
00100000-004eb0ed : Kernel code
004eb0ee-0071faff : Kernel data
1fff0000-1fff2fff : ACPI Non-volatile Storage
1fff3000-1fffffff : ACPI Tables
c0000000-cfffffff : 0000:00:00.0
d0000000-d7ffffff : PCI Bus #01
d0000000-d7ffffff : 0000:01:00.0
d8000000-d9ffffff : PCI Bus #01
d8000000-d8ffffff : 0000:01:00.0
da020000-da020fff : 0000:00:03.1
da020000-da020fff : ohci_hcd
da021000-da021fff : 0000:00:03.3
da021000-da021fff : ehci_hcd
da022000-da022fff : 0000:00:04.0
da022000-da022fff : sis900
da023000-da023fff : 0000:00:03.0
da023000-da023fff : ohci_hcd
da024000-da024fff : 0000:00:0b.0
da024000-da0247ff : ohci1394
fec00000-ffffffff : reserved
[[email protected] root]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2806.126
cache size : 512 KB
physical id : 0
siblings : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5570.56
On Fri, Jun 04, 2004 at 09:02:52PM +0000, Nigel Kukard wrote:
> Hi,
>
> Using kernel 2.6.6 on a motherboard with a built-in sis900 network adapter, I get
> a soft hang during HEAVY network traffic (i.e. 7.5Mb/s). I tried to
> enable sysrq, to no avail. The keyboard is still active, but any attempt to
> run a command hangs.
>
> Disabling hyperthreading in the bios seems to solve the problem. I think
> this is smp related.
Does disabling SMP support in the kernel at compile time make any difference?
Unfortunately I don't have any SMP hardware to try to reproduce this
problem.
> When using 2.4.x all is ok with hyperthreading enabled.
This is important. The driver has some differences between the two
versions, but none of them is related to SMP. I'll check again, but if
someone with some more smp-karma than me wants to join, he is most
welcome...
So I'm leaning towards blaming a problem outside the sis900 driver
itself.
Thanks.
--
-----------------------------
Daniele Venzano
Web: http://teg.homeunix.org
Daniele Venzano <[email protected]> :
[...]
> > When using 2.4.x all is ok with hyperthreading enabled.
> This is important. The driver has some differences between the two
> versions, but none of them is related to SMP. I'll check again, but if
> someone with some more smp-karma than me wants to join, he is most
> welcome...
I have not checked the latest version of the driver but 2.6.7-rc2 seems
to give a Rx ring descriptor to the asic just before the Rx buffer
address is set. One would expect a different crash though.
--
Ueimor
On Sat, Jun 05, 2004 at 12:50:33AM +0200, Francois Romieu wrote:
> Daniele Venzano <[email protected]> :
> [...]
> > > When using 2.4.x all is ok with hyperthreading enabled.
> > This is important. The driver has some differences between the two
> > versions, but none of them is related to SMP. I'll check again, but if
> > someone with some more smp-karma than me wants to join, he is most
> > welcome...
>
> I have not checked the latest version of the driver but 2.6.7-rc2 seems
> to give a Rx ring descriptor to the asic just before the Rx buffer
> address is set. One would expect a different crash though.
>
> --
> Ueimor
Any quick fix I can hack?
-Nigel
Nigel Kukard <[email protected]> :
[...]
> Any quick fix i can hack?
Instant hack below. I do not expect it to make a difference but it _could_
make one.
You probably want to increase NUM_RX_DESC in sis900.h as well and see if
it changes things: at 7.5Mb/s, it takes 3ms of interrupt processing latency
before the network adapter exhausts the Rx ring (this should appear on the
output of 'ifconfig' btw). So if anything keeps the irq masked for that long,
you experience the usually very well tested error/uncommon paths of the
drivers :o)
NUM_RX_DESC at 64 or 256 should not hurt but I do not know if the datasheet
limits the number of Rx descriptors. Fiddling with NUM_RX_DESC could change
the behavior from "computer hangs" to "computer takes noticeably more time
to hang".
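For reference, the 3ms figure can be reproduced with a little arithmetic. The
sketch below assumes the stock ring has 16 Rx descriptors, that every frame is
a full-size 1518-byte Ethernet frame, and that "7.5Mb/s" means 7.5 megabytes
per second. None of those numbers is stated in this thread, and ring_fill_ms()
and print_budget() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, not taken from the thread. */
#define NUM_RX_DESC 16      /* assumed stock Rx ring size in sis900.h */
#define FRAME_BYTES 1518.0  /* full-size Ethernet frame, including CRC */

/*
 * Time in milliseconds for incoming traffic to fill `ndesc` Rx descriptors,
 * one full-size frame per descriptor, at `bytes_per_sec`.
 */
double ring_fill_ms(unsigned int ndesc, double bytes_per_sec)
{
	assert(bytes_per_sec > 0.0);
	return ndesc * FRAME_BYTES / bytes_per_sec * 1000.0;
}

/* Print the interrupt latency budget for a few candidate ring sizes. */
void print_budget(double bytes_per_sec)
{
	static const unsigned int sizes[] = { NUM_RX_DESC, 64, 256 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%3u descriptors: %.1f ms\n",
		       sizes[i], ring_fill_ms(sizes[i], bytes_per_sec));
}
```

Calling print_budget(7.5e6) from a main() lists the budget for 16, 64 and 256
descriptors. Under these assumptions 16 descriptors give a bit over 3 ms, and
the budget grows linearly with the ring size.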
--- sis900.c.orig 2004-06-05 11:47:27.000000000 +0200
+++ sis900.c 2004-06-05 12:43:48.000000000 +0200
@@ -1626,7 +1626,7 @@ static int sis900_rx(struct net_device *
"status:0x%8.8x\n",
sis_priv->cur_rx, sis_priv->dirty_rx, rx_status);
- while (rx_status & OWN) {
+ while ((rx_status & OWN) && sis_priv->rx_skbuff[entry]) {
unsigned int rx_size;
rx_size = (rx_status & DSIZE) - CRC_SIZE;
@@ -1651,16 +1651,6 @@ static int sis900_rx(struct net_device *
} else {
struct sk_buff * skb;
- /* This situation should never happen, but due to
- some unknow bugs, it is possible that
- we are working on NULL sk_buff :-( */
- if (sis_priv->rx_skbuff[entry] == NULL) {
- printk(KERN_INFO "%s: NULL pointer "
- "encountered in Rx ring, skipping\n",
- net_dev->name);
- break;
- }
-
pci_unmap_single(sis_priv->pci_dev,
sis_priv->rx_ring[entry].bufptr, RX_BUF_SIZE,
PCI_DMA_FROMDEVICE);
@@ -1688,18 +1678,21 @@ static int sis900_rx(struct net_device *
"deferring packet.\n",
net_dev->name);
sis_priv->rx_skbuff[entry] = NULL;
- /* reset buffer descriptor state */
- sis_priv->rx_ring[entry].cmdsts = 0;
+ /*
+ * reset buffer descriptor state and keep it
+ * under host control
+ */
+ sis_priv->rx_ring[entry].cmdsts = OWN;
sis_priv->rx_ring[entry].bufptr = 0;
- sis_priv->stats.rx_dropped++;
- break;
+ continue;
}
skb->dev = net_dev;
sis_priv->rx_skbuff[entry] = skb;
- sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
sis_priv->rx_ring[entry].bufptr =
pci_map_single(sis_priv->pci_dev, skb->tail,
RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+ wmb();
+ sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
sis_priv->dirty_rx++;
}
sis_priv->cur_rx++;
@@ -1728,10 +1721,11 @@ static int sis900_rx(struct net_device *
}
skb->dev = net_dev;
sis_priv->rx_skbuff[entry] = skb;
- sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
sis_priv->rx_ring[entry].bufptr =
pci_map_single(sis_priv->pci_dev, skb->tail,
RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+ wmb();
+ sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
}
}
/* re-enable the potentially idle receive state matchine */
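The reordering in the last two hunks boils down to one rule: publish the
buffer address first, and only then the status word that hands the descriptor
to the chip. A minimal user-space sketch of that ordering follows. The struct
layout, the constant values and the name rx_desc_refill() are illustrative
assumptions, and atomic_thread_fence() merely stands in for the kernel's
wmb(); this is not the driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RX_BUF_SIZE 1536u        /* illustrative value, an assumption */
#define OWN         0x80000000u  /* illustrative ownership bit */

/* Simplified stand-in for one Rx ring entry, not the driver's real layout. */
struct rx_desc {
	uint32_t cmdsts;  /* status/ownership word polled by the device */
	uint32_t bufptr;  /* bus address of the receive buffer */
};

/*
 * Hand a refilled descriptor back to the device: store the buffer address,
 * issue a release fence (user-space stand-in for wmb()), and only then
 * store the status word that marks the entry as usable.
 */
void rx_desc_refill(volatile struct rx_desc *d, uint32_t dma_addr)
{
	assert(dma_addr != 0);
	d->bufptr = dma_addr;
	atomic_thread_fence(memory_order_release);
	d->cmdsts = RX_BUF_SIZE;
}
```

With the stores in the opposite order, the device could in principle see a
descriptor it owns whose bufptr is still stale or zero, which is the window
the patch tries to close.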
> Instant hack below. I do not expect it to make a difference but it _could_
> make one.
Much appreciated, man!
Will test the next time I take the box down :), which will be before the end
of Monday.
-Nigel
No luck.
The funny thing is that there is nothing in dmesg or even /var/log/messages,
for that matter, nor does it kick into the sysrq console.
If I ping flood a local IP on the network from the box giving the
problem, it does not hang.
If I copy something from the box giving the problem onto another PC,
BANG, at around 30Mb (varying up to 60Mb) it bombs.
The hang is such that nothing can be run. Numlock etc. work fine, but I can't
execute any programs or anything.
Very weird.
I tried the patch to no avail. If I enable debugging, the problem does
not occur (so it must be extreme-load related).
As I said, 2.4.x works fine with hyperthreading and 2.6.x bombs. Seeing as
there isn't much driver change between them (and seeing as none should
be required), I suspect the problem is deeper.
-Nigel Kukard
On Sat, Jun 05, 2004 at 01:05:26PM +0200, Francois Romieu wrote:
> Nigel Kukard <[email protected]> :
> [...]
> > Any quick fix i can hack?
>
> Instant hack below. I do not expect it to make a difference but it _could_
> make one.
> [...]
Any more ideas? :(
-Nigel
Nigel Kukard <[email protected]> :
> Any more ideas? :(
Tried to increase NUM_{RX/TX}_DESC ?
--
Ueimor
Can't remember if I tried it last time, but I just tried it again now
and increased both to 256. It got 30Mb further than last time in the
2Gb file I was ftp'ing over, but died with the exact same symptoms.
On Mon, Jun 14, 2004 at 08:39:17PM +0200, Francois Romieu wrote:
>
> Tried to increase NUM_{RX/TX}_DESC ?
>
> --
> Ueimor
It must be unrelated to the sis900 driver. Is there a way I can get more
debugging info?
The error doesn't occur if I enable sis900 debugging, maybe because the
hardware isn't operating at full speed.
-Nigel
On Mon, Jun 14, 2004 at 08:39:17PM +0200, Francois Romieu wrote:
> Nigel Kukard <[email protected]> :
> > Any more ideas? :(
>
> Tried to increase NUM_{RX/TX}_DESC ?
>
Nigel Kukard <[email protected]> wrote:
>
> It must be non-sis900 driver related, is there a way i can get more
> debugging info?
Please try booting with noapic and/or acpi=off. APIC is known to
cause problems on many SIS chipsets.
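After rebooting it is worth confirming that the option really reached the
kernel, since a typo in the boot loader entry is easy to miss. The helper
below is only a sketch: cmdline_has() is a hypothetical name, and it expects
the caller to pass in the text read from /proc/cmdline.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `param` appears as a whole space-separated word in
 * `cmdline` (the contents of /proc/cmdline, possibly newline-terminated).
 */
bool cmdline_has(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p;

	assert(len > 0);
	for (p = cmdline; (p = strstr(p, param)) != NULL; p += len) {
		bool starts = (p == cmdline) || p[-1] == ' ';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts && ends)
			return true;
	}
	return false;
}
```

For example, cmdline_has(buf, "noapic") and cmdline_has(buf, "acpi=off") can
be checked separately. With noapic in effect, /proc/interrupts should also
stop listing the IO-APIC-edge and IO-APIC-level entries shown earlier in the
thread.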
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt