2000-12-10 09:27:08

by Petru Paler

Subject: sparc64 network-related problems

Let me know if you need additional info or testing done.

Bug report (in standard format):

[1.] One line summary of the problem:

Repeated kernel oopses after roughly a day of running under heavy load.

[2.] Full description of the problem/report:

We run four E450 clones as DNS and mail servers. They are always
under heavy load. After a while (usually about a day) they start
oopsing, and after a couple more days they lock up.

[3.] Keywords (i.e., modules, networking, kernel):

kernel, sparc64, networking

[4.] Kernel version (from /proc/version):

Linux version 2.4.0-test12 (root@grey) (gcc version egcs-2.92.11 19980921 (gcc2 ss-980609 experimental)) #2 SMP Tue Dec 5 11:27:36 EST 2000

It's actually 2.4.0-test12-pre5 with one minor patch to drivers/pci/pci.c:
I added a missing declaration for "tmp" in pci_read_bases(), without which
it did not compile.

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)

This is only one of the repeated oopses, if you need all of them I will
make the logs available.

skput:over: 000000000053ed64:524 put:-428 dev:eth0 \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
smtp(29923): Kernel bad trap
CPU[2]: local_irq_count[0] irqs_running[0]
TSTATE: 0000004411009601 TPC: 0000000000528b50 TNPC: 0000000000528b54 Y: 15e00000
g0: 0000000000000020 g1: 000020fa29bf28c5 g2: 0000000000410000 g3: 0000000000628000
g4: fffff80000000000 g5: 0000000000000001 g6: fffff800030e8000 g7: 0000000000000000
o0: 0000000000000032 o1: 0000000000629eae o2: 0000000000000032 o3: 0000000000000000
o4: 0000000000629e7b o5: 0000000000629ead sp: fffff800030eb1c1 ret_pc: 0000000000528b48
l0: 000000000064ec00 l1: 7ffffffffffffff8 l2: 8000000000000000 l3: 0800000000000000
l4: 0000000000000077 l5: 0000000000000002 l6: 0000000000000000 l7: 000000000062a278
i0: fffff80020f59b00 i1: fffffffffffffe54 i2: 000000000053ed64 i3: 00000000fffffe54
i4: 00000000000003b8 i5: 0000000000000000 i6: fffff800030eb281 i7: 000000000053ed68
Caller[000000000053ed68]
Caller[000000000055e4e0]
Caller[00000000005255b4]
Caller[0000000000525818]
Caller[000000000045e894]
Caller[000000000040fc34]
Caller[00000000000228fc]
Instruction DUMP: 981223a8 7ffc5ee6 9010000d <91d02005> 30680003 01000000 01000000 9de3bf40 1100167b
CPU[0]: local_irq_count[0] irqs_running[0]
TSTATE: 0000000011f09602 TPC: 0000000000448f68 TNPC: 0000000000448f6c Y: 00000000
g0: 0000000000691800 g1: 0000000000694800 g2: 00000000003fffff g3: 000000000000738a
g4: fffff80000000000 g5: 0000000000000000 g6: fffff8003ec0c000 g7: 0000000000000000
o0: 00000000000000b9 o1: 0000000001148e1b o2: 00000000005a9400 o3: 000000000000001a
o4: 00000000004de180 o5: 0000000000000000 sp: fffff8003ec0f061 ret_pc: 0000000000448e80
l0: 0000000000000001 l1: 00000000005a9798 l2: 000000000062f400 l3: ffffffffffffffff
l4: fffff8003c2f16a0 l5: 0000000000000002 l6: 0000000000630400 l7: 0000000000585d30
i0: 000000000062e500 i1: 0000000000694800 i2: 00000000005a9790 i3: 0000000000000001
i4: 0000000000000000 i5: 000000000000000f i6: fffff8003ec0f121 i7: 0000000000445510

After running through ksymoops:

ksymoops 2.3.4 on sparc64 2.4.0-test12. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.0-test12/ (default)
-m /boot/System.map-2.4.0-test12 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Reading Oops report from the terminal
skput:over: 000000000053ed64:524 put:-428 dev:eth0 \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
smtp(29923): Kernel bad trap
CPU[2]: local_irq_count[0] irqs_running[0]
TSTATE: 0000004411009601 TPC: 0000000000528b50 TNPC: 0000000000528b54 Y: 15e00000
Using defaults from ksymoops -t elf32-sparc -a sparc
g0: 0000000000000020 g1: 000020fa29bf28c5 g2: 0000000000410000 g3: 0000000000628000
g4: fffff80000000000 g5: 0000000000000001 g6: fffff800030e8000 g7: 0000000000000000
o0: 0000000000000032 o1: 0000000000629eae o2: 0000000000000032 o3: 0000000000000000
o4: 0000000000629e7b o5: 0000000000629ead sp: fffff800030eb1c1 ret_pc: 0000000000528b48
l0: 000000000064ec00 l1: 7ffffffffffffff8 l2: 8000000000000000 l3: 0800000000000000
l4: 0000000000000077 l5: 0000000000000002 l6: 0000000000000000 l7: 000000000062a278
i0: fffff80020f59b00 i1: fffffffffffffe54 i2: 000000000053ed64 i3: 00000000fffffe54
i4: 00000000000003b8 i5: 0000000000000000 i6: fffff800030eb281 i7: 000000000053ed68
Caller[000000000053ed68]
Caller[000000000055e4e0]
Caller[00000000005255b4]
Caller[0000000000525818]
Caller[000000000045e894]
Caller[000000000040fc34]
Caller[00000000000228fc]
Instruction DUMP: 981223a8 7ffc5ee6 9010000d <91d02005> 30680003 01000000 01000000 9de3bf40 1100167b

>>PC; 00528b50 <skb_over_panic+30/40> <=====
>>O7; 00528b48 <skb_over_panic+28/40>
>>I7; 0053ed68 <tcp_sendmsg+2e8/c60>
Trace; 0053ed68 <tcp_sendmsg+2e8/c60>
Trace; 0055e4e0 <inet_sendmsg+40/60>
Trace; 005255b4 <sock_sendmsg+74/a0>
Trace; 00525818 <sock_write+98/c0>
Trace; 0045e894 <sys_write+b4/100>
Trace; 0040fc34 <linux_sparc_syscall32+34/40>
Trace; 000228fc Before first symbol
Code; 00528b44 <skb_over_panic+24/40>
0000000000000000 <_PC>:
Code; 00528b44 <skb_over_panic+24/40>
0: 98 12 23 a8 or %o0, 0x3a8, %o4
Code; 00528b48 <skb_over_panic+28/40>
4: 7f fc 5e e6 call fffffffffff17b9c <_PC+0xfffffffffff17b9c> 004406e0 <printk+0/240>
Code; 00528b4c <skb_over_panic+2c/40>
8: 90 10 00 0d mov %o5, %o0
Code; 00528b50 <skb_over_panic+30/40> <=====
c: 91 d0 20 05 ta 5 <=====
Code; 00528b54 <skb_over_panic+34/40>
10: 30 68 00 03 unknown
Code; 00528b58 <skb_over_panic+38/40>
14: 01 00 00 00 nop
Code; 00528b5c <skb_over_panic+3c/40>
18: 01 00 00 00 nop
Code; 00528b60 <skb_under_panic+0/40>
1c: 9d e3 bf 40 save %sp, -192, %sp
Code; 00528b64 <skb_under_panic+4/40>
20: 11 00 16 7b sethi %hi(0x59ec00), %o0

CPU[0]: local_irq_count[0] irqs_running[0]
TSTATE: 0000000011f09602 TPC: 0000000000448f68 TNPC: 0000000000448f6c Y: 00000000
g0: 0000000000691800 g1: 0000000000694800 g2: 00000000003fffff g3: 000000000000738a
g4: fffff80000000000 g5: 0000000000000000 g6: fffff8003ec0c000 g7: 0000000000000000
o0: 00000000000000b9 o1: 0000000001148e1b o2: 00000000005a9400 o3: 000000000000001a
o4: 00000000004de180 o5: 0000000000000000 sp: fffff8003ec0f061 ret_pc: 0000000000448e80
l0: 0000000000000001 l1: 00000000005a9798 l2: 000000000062f400 l3: ffffffffffffffff
l4: fffff8003c2f16a0 l5: 0000000000000002 l6: 0000000000630400 l7: 0000000000585d30
i0: 000000000062e500 i1: 0000000000694800 i2: 00000000005a9790 i3: 0000000000000001
i4: 0000000000000000 i5: 000000000000000f i6: fffff8003ec0f121 i7: 0000000000445510
Warning (Oops_read): Code line not seen, dumping what data is available

>>PC; 00448f68 <timer_bh+128/3c0> <=====
>>O7; 00448e80 <timer_bh+40/3c0>
>>I7; 00445510 <bh_action+70/120>


3 warnings issued. Results may not be reliable.
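
A few numbers in the report above can be cross-checked by hand. The sketch below is only a reader's aid, not part of ksymoops: the helper names are made up for illustration, and the idea that register i1 carries the "put" value is an inference from the matching numbers, not something the log states.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned register value as two's-complement signed."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def symbol_base(address: int, offset: int) -> int:
    """Start address of a symbol, given 'symbol+offset' at 'address'."""
    return address - offset


# Values copied from the CPU[2] register dump and the ksymoops trace.
i1 = 0xFFFFFFFFFFFFFE54
i3 = 0x00000000FFFFFE54

print(to_signed(i1, 64))                   # matches "put:-428" in the log line
print(to_signed(i3, 32))                   # same value seen as a 32-bit int
print(hex(symbol_base(0x528B50, 0x30)))    # start of skb_over_panic
print(hex(symbol_base(0x53ED68, 0x2E8)))   # start of tcp_sendmsg
```

The address printed in the "skput:over" line (000000000053ed64) is also the value in i2, and it sits 4 bytes before the tcp_sendmsg+2e8 entry in the trace.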

[6.] A small shell script or example program which triggers the
problem (if possible)

N/A. The problem appears after about one day of heavy load.

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here)

-- Versions installed: (if some fields are empty or look
-- unusual then possibly you have very old versions)
Linux grey 2.4.0-test12 #2 SMP Tue Dec 5 11:27:36 EST 2000 sparc64 unknown
Kernel modules 2.3.11
Gnu C 2.95.2
Gnu Make 3.79.1
Binutils 2.9.5.0.37
Linux C Library 2.1.3
Dynamic linker ldd: version 1.9.11
Procps 2.0.6
Mount 2.10f
Net-tools 2.05
Console-tools 0.2.3
Sh-utils 2.0
Modules Loaded

DNS server: tinydns (from the djbdns 1.02 package)
Mail server: Postfix (Snapshot-20001030)

[7.2.] Processor information (from /proc/cpuinfo):

Two of the servers are:

cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 10
prom : 3.10.7
type : sun4u
ncpus probed : 2
ncpus active : 2
Cpu0Bogo : 398.95
Cpu2Bogo : 399.76
MMU Type : Spitfire
State:
CPU0: online
CPU2: online

The other two are:

cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 10
prom : 3.10.7
type : sun4u
ncpus probed : 2
ncpus active : 2
Cpu0Bogo : 591.46
Cpu2Bogo : 591.46
MMU Type : Spitfire
State:
CPU0: online
CPU2: online

[7.3.] Module information (from /proc/modules):

N/A (no modules loaded)

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

grey:~# cat /proc/ioports
1c802000000-1c80200ffff : PSYCHO1 PBMA
1c802010000-1c80201ffff : PSYCHO1 PBMB
1c802010400-1c8020104ff : Symbios Logic Inc. (formerly NCR) 53c875
1c802010400-1c80201047f : sym53c8xx
1c802010800-1c8020108ff : Symbios Logic Inc. (formerly NCR) 53c875 (#2)
1c802010800-1c80201087f : sym53c8xx
1fe02000000-1fe0200ffff : PSYCHO0 PBMA
1fe02010000-1fe0201ffff : PSYCHO0 PBMB
1fe02010400-1fe020104ff : Emulex Corporation LP7000 Fibre Channel Host Adapter
1fe02010500-1fe020105ff : ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC]
1fe02010800-1fe020108ff : Emulex Corporation LP7000 Fibre Channel Host Adapter (#2)

grey:~# cat /proc/iomem
1c900000000-1c97fffffff : PSYCHO1 PBMA
1c980000000-1c9ffffffff : PSYCHO1 PBMB
1c980002000-1c9800020ff : Symbios Logic Inc. (formerly NCR) 53c875
1c980004000-1c980004fff : Symbios Logic Inc. (formerly NCR) 53c875
1c980006000-1c9800060ff : Symbios Logic Inc. (formerly NCR) 53c875 (#2)
1c980008000-1c980008fff : Symbios Logic Inc. (formerly NCR) 53c875 (#2)
1ff00000000-1ff7fffffff : PSYCHO0 PBMA
1ff80000000-1ffffffffff : PSYCHO0 PBMB
1ff80000000-1ff80000fff : ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC]
1ff80008000-1ff8000ffff : Sun Microsystems Computer Corp. Happy Meal
1ff80020000-1ff8003ffff : ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC]
1ff80040000-1ff8005ffff : Emulex Corporation LP7000 Fibre Channel Host Adapter
1ff80060000-1ff8007ffff : Emulex Corporation LP7000 Fibre Channel Host Adapter (#2)
1ff81000000-1ff81ffffff : ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC]
1ff81000000-1ff81ffffff : atyfb
1ff82000000-1ff82000fff : Emulex Corporation LP7000 Fibre Channel Host Adapter
1ff82002000-1ff820020ff : Emulex Corporation LP7000 Fibre Channel Host Adapter
1ff82004000-1ff82004fff : Emulex Corporation LP7000 Fibre Channel Host Adapter (#2)
1ff82006000-1ff820060ff : Emulex Corporation LP7000 Fibre Channel Host Adapter (#2)
1ff83000000-1ff83ffffff : Sun Microsystems Computer Corp. EBUS
1ff84000000-1ff84ffffff : Sun Microsystems Computer Corp. Happy Meal
1fff0000000-1fff0ffffff : Sun Microsystems Computer Corp. EBUS
1fff0000000-1fff00fffff : flashprom
1fff1000000-1fff17fffff : Sun Microsystems Computer Corp. EBUS
1fff1000000-1fff1001fff : eeprom
1fff130015c-1fff130015d : ecpp
1fff13203f0-1fff13203f7 : fdthree
1fff1340278-1fff1340287 : ecpp
1fff13602f8-1fff13602ff : su_pnp
1fff13803f8-1fff13803ff : su_pnp
1fff1400000-1fff140007f : se
1fff1500000-1fff1500007 : sc
1fff1504000-1fff1504002 : SUNW,pll
1fff1600000-1fff1600003 : i2c
1fff1700000-1fff170000f : ecpp
1fff1706000-1fff170600f : fdthree
1fff1720000-1fff1720003 : fdthree
1fff1724000-1fff1724003 : power
1fff1726000-1fff1726003 : auxio
1fff1728000-1fff1728003 : auxio
1fff172a000-1fff172a003 : auxio
1fff172c000-1fff172c003 : auxio
1fff172f000-1fff172f003 : auxio

[7.5.] PCI information ('lspci -vvv' as root)

grey:~# lspci -vvv
00:00.0 Host bridge: Sun Microsystems Computer Corp. PCI Bus Module
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64 set

00:06.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 14)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 17 min, 64 max, 17 set, cache line size 10
Interrupt: pin A routed to IRQ 6627584
Region 0: I/O ports at 2010400 [size=256]
Region 1: Memory at 000001c980002000 (32-bit, non-prefetchable) [size=256]
Region 2: Memory at 000001c980004000 (32-bit, non-prefetchable) [size=4K]

00:06.1 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 14)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 17 min, 64 max, 17 set, cache line size 10
Interrupt: pin A routed to IRQ 6627584
Region 0: I/O ports at 2010800 [size=256]
Region 1: Memory at 000001c980006000 (32-bit, non-prefetchable) [size=256]
Region 2: Memory at 000001c980008000 (32-bit, non-prefetchable) [size=4K]

01:00.0 Host bridge: Sun Microsystems Computer Corp. PCI Bus Module
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64 set

02:00.0 Host bridge: Sun Microsystems Computer Corp. PCI Bus Module
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64 set

02:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 10 min, 25 max, 10 set, cache line size 10
Region 0: Memory at 000001fff0000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 000001fff1000000 (32-bit, non-prefetchable) [size=8M]
Expansion ROM at 0000000083000000 [disabled] [size=16M]

02:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 10 min, 5 max, 10 set, cache line size 10
Interrupt: pin ? routed to IRQ 6682912
Region 0: Memory at 000001ff80008000 (32-bit, non-prefetchable) [size=32K]
Expansion ROM at 0000000084000000 [disabled] [size=16M]

02:02.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC] (rev 3a) (prog-if 00 [VGA])
Subsystem: ATI Technologies Inc: Unknown device 0088
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 8 min, 8 set, cache line size 10
Interrupt: pin A routed to IRQ 6682368
Region 0: Memory at 000001ff81000000 (32-bit, prefetchable) [size=16M]
Region 1: I/O ports at 2010500 [size=256]
Region 2: [virtual] Memory at 000001ff80000000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at 0000000080020000 [disabled] [size=128K]
Capabilities: [5c] Power Management version 1
Flags: PMEClk- AuxPwr- DSI- D1+ D2+ PME-
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

02:03.0 Fiber Channel: Emulex Corporation LP7000 Fibre Channel Host Adapter (rev 03)
Subsystem: Emulex Corporation: Unknown device f700
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 8 min, 8 set, cache line size 10


[7.6.] SCSI information (from /proc/scsi/scsi)

Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE Model: ST39236LW Rev: 0010
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: SEAGATE Model: ST39103LW Rev: 0002
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 02 Lun: 00
Vendor: SEAGATE Model: ST39102LW Rev: 0006
Type: Direct-Access ANSI SCSI revision: 02

[7.7.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

The servers are hooked into a Cisco switch. The link up line says:

eth0: Link is up using internal transceiver at 100Mb/s, Full Duplex.

The default gateway is an Intel box running FreeBSD, also having a
full-duplex link to the switch (but with an EtherExpress Pro card).

grey:/proc# cat mdstat
Personalities : [raid0]
read_ahead 1024 sectors
md0 : active raid0 sdc1[1] sdb1[0]
17776896 blocks 8k chunks

Other kernel log errors:

(only seen once):
eth0: Happy Meal out of receive descriptors, packet dropped.

(very often):
UDP: short packet: 1/48
or
UDP: short packet: 0/36

and a couple of:
sending pkt_too_big to self

--
Petru Paler, mailto:[email protected]
http://www.ppetru.net - ICQ: 41817235


2000-12-10 11:25:39

by David Miller

Subject: Re: sparc64 network-related problems

> Date: Sun, 10 Dec 2000 10:55:53 +0200
> From: Petru Paler <[email protected]>
>
> [5.] Output of Oops.. message (if applicable) with symbolic information
> resolved (see Documentation/oops-tracing.txt)
>
> This is only one of the repeated oopses, if you need all of them I will
> make the logs available.

Is this always the _first_ OOPS though? That is what is important,
because after the first OOPS all the others are likely just side
effects of the first one.

Anyways, if it is always the first OOPS, the following debugging patch
may help because this case is the only way that OOPS could possibly
happen all by itself.

--- net/ipv4/tcp.c.~1~	Tue Nov 28 08:33:08 2000
+++ net/ipv4/tcp.c	Sun Dec 10 02:36:43 2000
@@ -1014,6 +1014,14 @@
 
 			/* Determine how large of a buffer to allocate. */
 			tmp = MAX_TCP_HEADER + 15 + tp->mss_cache;
+#if 1
+			if (copy > tmp) {
+				printk("TCP: MSS out of sync copy(%d) tmp(%d) "
+				       "mss_now(%d) mss_cache(%d)\n",
+				       copy, tmp, mss_now, tp->mss_cache);
+				copy = tmp - (MAX_TCP_HEADER + 15);
+			}
+#endif
 			if (copy < mss_now && !(flags & MSG_OOB)) {
 				/* What is happening here is that we want to
 				 * tack on later members of the users iovec
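
The arithmetic of the check added by this patch can be tried outside the kernel. The following is a plain Python model, not kernel code: the function name is invented, the sample sizes are made up, and the value used for MAX_TCP_HEADER is a placeholder, since the real constant depends on the kernel configuration.

```python
# Placeholder only: the real MAX_TCP_HEADER depends on the kernel configuration.
MAX_TCP_HEADER = 128


def clamp_copy(copy, mss_now, mss_cache, max_tcp_header=MAX_TCP_HEADER):
    """Model of the added check; returns (copy, tmp, warned)."""
    # Buffer size the kernel is about to allocate.
    tmp = max_tcp_header + 15 + mss_cache
    warned = False
    if copy > tmp:
        # Same message text as the printk in the patch.
        print("TCP: MSS out of sync copy(%d) tmp(%d) mss_now(%d) mss_cache(%d)"
              % (copy, tmp, mss_now, mss_cache))
        copy = tmp - (max_tcp_header + 15)
        warned = True
    return copy, tmp, warned


# A request larger than the buffer is clamped and reported.
print(clamp_copy(copy=4000, mss_now=1448, mss_cache=1448))
# A request that fits passes through unchanged.
print(clamp_copy(copy=500, mss_now=1448, mss_cache=1448))
```

Whatever the header constant is, the clamped value works out to tp->mss_cache, because tmp - (MAX_TCP_HEADER + 15) cancels the header allowance again.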

2000-12-10 11:41:44

by Petru Paler

Subject: Re: sparc64 network-related problems

On Sun, Dec 10, 2000 at 02:38:28AM -0800, David S. Miller wrote:
> Is this always the _first_ OOPS though? That is what is important,
> because after the first OOPS all the others are likely just side
> effects of the first one.

No, it was not the first one. Here's the ksymoops'ed first one:

ksymoops 2.3.4 on sparc64 2.4.0-test12. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.0-test12/ (default)
-m /boot/System.map-2.4.0-test12 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Dec 8 01:40:48 grey kernel: skput:over: 000000000053ed64:524 put:-428 dev:eth0 \|/ ____ \|/
Dec 8 01:40:48 grey kernel: "@'/ .. \`@"
Dec 8 01:40:48 grey kernel: /_| \__/ |_\
Dec 8 01:40:48 grey kernel: \__U_/
Dec 8 01:40:48 grey kernel: smtp(7102): Kernel bad trap
Dec 8 01:40:48 grey kernel: CPU[0]: local_irq_count[0] irqs_running[0]
Dec 8 01:40:48 grey kernel: TSTATE: 0000004411009601 TPC: 0000000000528b50 TNPC: 0000000000528b54 Y: 15e00000
Using defaults from ksymoops -t elf32-sparc -a sparc
Dec 8 01:40:48 grey kernel: g0: 0000000000000020 g1: 0000000000000001 g2: 0000000000000008 g3: 0000000000628000
Dec 8 01:40:48 grey kernel: g4: fffff80000000000 g5: 0000000000000001 g6: fffff800076a8000 g7: 0000000000000000
Dec 8 01:40:48 grey kernel: o0: 0000000000000032 o1: 0000000000629eae o2: 0000000000000032 o3: 0000000000000000
Dec 8 01:40:48 grey kernel: o4: 0000000000629e7b o5: 0000000000629ead sp: fffff800076ab1c1 ret_pc: 0000000000528b48
Dec 8 01:40:48 grey kernel: l0: fffff8003a9ec1a0 l1: 0000000000000008 l2: 0000000000000104 l3: 0000000200000000
Dec 8 01:40:48 grey kernel: l4: 0000000000000062 l5: 0000000000000000 l6: 0000000000000008 l7: 7fffffffffffffff
Dec 8 01:40:48 grey kernel: i0: fffff8003c65aae0 i1: fffffffffffffe54 i2: 000000000053ed64 i3: 00000000fffffe54
Dec 8 01:40:48 grey kernel: i4: 00000000000003b8 i5: 0000000000000000 i6: fffff800076ab281 i7: 000000000053ed68
Dec 8 01:40:48 grey kernel: Caller[000000000053ed68]
Dec 8 01:40:48 grey kernel: Caller[000000000055e4e0]
Dec 8 01:40:48 grey kernel: Caller[00000000005255b4]
Dec 8 01:40:48 grey kernel: Caller[0000000000525818]
Dec 8 01:40:48 grey kernel: Caller[000000000045e894]
Dec 8 01:40:48 grey kernel: Caller[000000000040fc34]
Dec 8 01:40:48 grey kernel: Caller[00000000000228fc]
Dec 8 01:40:48 grey kernel: Instruction DUMP: 981223a8 7ffc5ee6 9010000d <91d02005> 30680003 01000000 01000000 9de3bf40 1100167b

>>PC; 00528b50 <skb_over_panic+30/40> <=====
>>O7; 00528b48 <skb_over_panic+28/40>
>>I7; 0053ed68 <tcp_sendmsg+2e8/c60>
Trace; 0053ed68 <tcp_sendmsg+2e8/c60>
Trace; 0055e4e0 <inet_sendmsg+40/60>
Trace; 005255b4 <sock_sendmsg+74/a0>
Trace; 00525818 <sock_write+98/c0>
Trace; 0045e894 <sys_write+b4/100>
Trace; 0040fc34 <linux_sparc_syscall32+34/40>
Trace; 000228fc Before first symbol
Code; 00528b44 <skb_over_panic+24/40>
0000000000000000 <_PC>:
Code; 00528b44 <skb_over_panic+24/40>
0: 98 12 23 a8 or %o0, 0x3a8, %o4
Code; 00528b48 <skb_over_panic+28/40>
4: 7f fc 5e e6 call fffffffffff17b9c <_PC+0xfffffffffff17b9c> 004406e0 <printk+0/240>
Code; 00528b4c <skb_over_panic+2c/40>
8: 90 10 00 0d mov %o5, %o0
Code; 00528b50 <skb_over_panic+30/40> <=====
c: 91 d0 20 05 ta 5 <=====
Code; 00528b54 <skb_over_panic+34/40>
10: 30 68 00 03 unknown
Code; 00528b58 <skb_over_panic+38/40>
14: 01 00 00 00 nop
Code; 00528b5c <skb_over_panic+3c/40>
18: 01 00 00 00 nop
Code; 00528b60 <skb_under_panic+0/40>
1c: 9d e3 bf 40 save %sp, -192, %sp
Code; 00528b64 <skb_under_panic+4/40>
20: 11 00 16 7b sethi %hi(0x59ec00), %o0

Dec 8 01:40:48 grey kernel: CPU[2]: local_irq_count[0] irqs_running[0]
Dec 8 01:40:48 grey kernel: TSTATE: 0000000011009605 TPC: 0000000000449e94 TNPC: 0000000000449e98 Y: 05000000
Dec 8 01:40:48 grey kernel: g0: 80000000000006b0 g1: 0000000000000000 g2: 0000000000000000 g3: 00000000007fffff
Dec 8 01:40:48 grey kernel: g4: fffff80000000000 g5: 0000000000000003 g6: fffff8003e68c000 g7: 0000000000000003
Dec 8 01:40:48 grey kernel: o0: 000000000223e000 o1: 0000000000000000 o2: fffff8003f110000 o3: 0000000000800000
Dec 8 01:40:48 grey kernel: o4: 000000000001ff55 o5: fffff800002d8030 sp: fffff8003e68f481 ret_pc: 0000000000449f14
Dec 8 01:40:48 grey kernel: l0: 00000000000002c7 l1: fffff8003f1109c0 l2: 0000000086000000 l3: 0000000000000000
Dec 8 01:40:48 grey kernel: l4: 000000008823e000 l5: fffff8003f1a0430 l6: 0000000000000003 l7: 000001ffffffe000
Dec 8 01:40:48 grey kernel: i0: 0000000000800000 i1: 000000007012c000 i2: fffff80000505428 i3: fffff8003ee66000
Dec 8 01:40:48 grey kernel: i4: 0000000000000000 i5: 000000008823e000 i6: fffff8003e68f551 i7: 000000000044cfa8
Warning (Oops_read): Code line not seen, dumping what data is available

>>PC; 00449e94 <zap_page_range+134/280> <=====
>>O7; 00449f14 <zap_page_range+1b4/280>
>>I7; 0044cfa8 <do_munmap+268/300>


3 warnings issued. Results may not be reliable.
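
For comparing this report with the one in the original bug report, it is enough to look at the resolved ">>PC;" and "Trace;" lines. The sketch below is a hypothetical helper, not a ksymoops feature; the two excerpts are abridged copies of the ksymoops output in this thread.

```python
import re

# Matches lines such as ">>PC; 00528b50 <skb_over_panic+30/40>   <====="
LINE_RE = re.compile(r"^(>>PC|Trace);\s+([0-9a-f]+)\s+<([^>]+)>")


def fault_signature(report: str):
    """Return the (kind, symbol+offset) pairs that locate the fault."""
    signature = []
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            signature.append((match.group(1), match.group(3)))
    return signature


FIRST_OOPS = """
>>PC; 00528b50 <skb_over_panic+30/40> <=====
Trace; 0053ed68 <tcp_sendmsg+2e8/c60>
Trace; 0055e4e0 <inet_sendmsg+40/60>
Trace; 005255b4 <sock_sendmsg+74/a0>
Trace; 000228fc Before first symbol
"""

LATER_OOPS = """
>>PC; 00528b50 <skb_over_panic+30/40> <=====
Trace; 0053ed68 <tcp_sendmsg+2e8/c60>
Trace; 0055e4e0 <inet_sendmsg+40/60>
Trace; 005255b4 <sock_sendmsg+74/a0>
Trace; 000228fc Before first symbol
"""

print(fault_signature(FIRST_OOPS) == fault_signature(LATER_OOPS))
```

Lines without a resolved symbol, such as "Before first symbol", are skipped on purpose, so only resolved locations are compared.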

So should I apply your patch ?

--
Petru Paler, mailto:[email protected]
http://www.ppetru.net - ICQ: 41817235

2000-12-10 11:44:23

by David Miller

Subject: Re: sparc64 network-related problems

> Date: Sun, 10 Dec 2000 13:10:33 +0200
> From: Petru Paler <[email protected]>
>
> So should I apply your patch ?

Yes, this new OOPS you've sent me is in the same place.

Later,
David S. Miller
[email protected]

2000-12-10 11:50:24

by Petru Paler

Subject: Re: sparc64 network-related problems

On Sun, Dec 10, 2000 at 02:57:21AM -0800, David S. Miller wrote:
> Date: Sun, 10 Dec 2000 13:10:33 +0200
> From: Petru Paler <[email protected]>
>
> So should I apply your patch ?
>
> Yes, this new OOPS you've sent me is in the same place.

Ok, applied. Will email again when/if something shows up in the logs.

Thanks,

--
Petru Paler, mailto:[email protected]
http://www.ppetru.net - ICQ: 41817235

2000-12-24 07:58:32

by Petru Paler

Subject: Re: sparc64 network-related problems

Follow-up: in the meantime I upgraded to test13-pre3. Things look fine so
far, but I got this in the kernel log:

TCP: peer 203.65.190.178:25/57885 shrinks window 2375104836:0:2375106284. Bad, what else can I say?

Should I be worried about it, or is it OK?
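
The numbers in that log line can be pulled apart with a short script. This is a sketch under assumptions: the helper is invented, the port labels (peer port, then local port) and the reading of the three numbers as send-unacknowledged, send window and send-next are guesses about the printk format, not something confirmed in this thread.

```python
import re

SHRINK_RE = re.compile(
    r"TCP: peer (\d+\.\d+\.\d+\.\d+):(\d+)/(\d+) "
    r"shrinks window (\d+):(\d+):(\d+)\."
)


def parse_shrink(line):
    """Split a 'shrinks window' kernel message into its numeric fields."""
    match = SHRINK_RE.search(line)
    if match is None:
        return None
    first, second, third = (int(match.group(i)) for i in (4, 5, 6))
    return {
        "peer": match.group(1),
        "peer_port": int(match.group(2)),    # assumed: remote port
        "local_port": int(match.group(3)),   # assumed: local port
        "fields": (first, second, third),
        # TCP sequence numbers are 32-bit, so take the difference mod 2**32.
        "span": (third - first) % 2**32,
    }


LINE = ("TCP: peer 203.65.190.178:25/57885 shrinks window "
        "2375104836:0:2375106284. Bad, what else can I say?")

info = parse_shrink(LINE)
print(info["fields"], info["span"])
```

If the guess about the fields is right, the first and third numbers are 1448 bytes apart while the middle value is 0.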

On Sun, Dec 10, 2000 at 02:57:21AM -0800, David S. Miller wrote:
> Date: Sun, 10 Dec 2000 13:10:33 +0200
> From: Petru Paler <[email protected]>
>
> So should I apply your patch ?
>
> Yes, this new OOPS you've sent me is in the same place.

--
Petru Paler, mailto:[email protected]
http://www.ppetru.net - ICQ: 41817235