2001-12-22 17:02:09

by Stefan Frank

Subject: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?


Hi,

For some time now, running the ht://dig indexing cron job has locked my
machine hard (SMP, 2x866 PIII with 1G memory, Highmem(4G) enabled);
only SysRq was still working. Yesterday I enabled it again,
and, surprise surprise, it survived the night.

But this message appears in the logs:

Dec 22 00:01:25 obelix kernel: KERNEL: assertion (tp->copied_seq ==
tp->rcv_nxt || (flags&(MSG_PEEK|MSG_TRUNC))) failed at
tcp.c(1520):tcp_recvmsg

Dec 22 00:01:53 obelix last message repeated 14 times

Note that this cronjob is started at midnight.

I'm running kernel 2.4.17rc1 on a Debian Sid/Unstable box.
The web server (running on the same machine!) is Debian's apache 1.3.22-2 package.

Here's the output of lspci

00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo
PRO133x] (rev c4)
Subsystem: Asustek Computer, Inc.: Unknown device 8038
Flags: bus master, medium devsel, latency 0
Memory at fd000000 (32-bit, prefetchable) [size=16M]
Capabilities: [a0] AGP version 2.0
Capabilities: [c0] Power Management version 2

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo
MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
Flags: bus master, 66Mhz, medium devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Capabilities: [80] Power Management version 2

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South]
(rev 40)
Subsystem: Asustek Computer, Inc.: Unknown device 8038
Flags: bus master, stepping, medium devsel, latency 0
Capabilities: [c0] Power Management version 2

00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
(prog-if 8a [Master SecP PriP])
Flags: medium devsel
I/O ports at d800 [size=16]
Capabilities: [c0] Power Management version 2

00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
(prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Flags: bus master, medium devsel, latency 32, IRQ 12
I/O ports at d400 [size=32]
Capabilities: [80] Power Management version 2

00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
(prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Flags: bus master, medium devsel, latency 32, IRQ 12
I/O ports at d000 [size=32]
Capabilities: [80] Power Management version 2

00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
(rev 40)
Flags: medium devsel
Capabilities: [68] Power Management version 2

00:0b.0 SCSI storage controller: LSI Logic / Symbios Logic (formerly
NCR) 53c895 (rev 01)
Subsystem: Tekram Technology Co.,Ltd.: Unknown device 3907
Flags: bus master, medium devsel, latency 32, IRQ 10
I/O ports at b800 [size=256]
Memory at fc000000 (32-bit, non-prefetchable) [size=256]
Memory at fb800000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=64K]

00:0c.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+]
(rev 40) (prog-if 00 [VGA])
Flags: medium devsel, IRQ 11
Memory at f4000000 (32-bit, non-prefetchable) [size=64M]
Expansion ROM at 000c0000 [disabled] [size=64K]

00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139
(rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Flags: bus master, medium devsel, latency 32, IRQ 15
I/O ports at b400 [size=256]
Memory at f3800000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

I'm using the 8139too driver built into the kernel for the Realtek card.

These modules are usually loaded:

obelix:~# lsmod
Module                  Size  Used by    Tainted: P
ide-probe-mod           8224   0  (autoclean)
ide-mod                58596   0  (autoclean) [ide-probe-mod]
ipt_TOS                 1344  13  (autoclean)
iptable_mangle          2016   0  (autoclean) (unused)
ipt_REDIRECT            1088   1  (autoclean)
ipt_MASQUERADE          1664   1  (autoclean)
iptable_nat            16756   0  (autoclean) [ipt_REDIRECT ipt_MASQUERADE]
ipt_state                928   2  (autoclean)
ip_conntrack           16652   2  (autoclean) [ipt_REDIRECT ipt_MASQUERADE iptable_nat ipt_state]
ipt_LOG                 3584   5  (autoclean)
ipt_limit               1344  12  (autoclean)
iptable_filter          2048   0  (autoclean) (unused)
ip_tables              11328  11  [ipt_TOS iptable_mangle ipt_REDIRECT ipt_MASQUERADE iptable_nat ipt_state ipt_LOG ipt_limit iptable_filter]
ospm_processor          5984   0  (unused)
ospm_button             3264   0  (unused)
ospm_system             6028   0  (unused)
ospm_busmgr            11904   0  [ospm_processor ospm_button ospm_system]
rtc                     6456   0  (autoclean)

I will try the official 2.4.17 kernel and see how it goes.

Happy Christmas to all of you !

Stefan

--
Misery loves company, but company does not reciprocate.


2001-12-22 23:58:32

by David Miller

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?


What compiler are you using to build these kernels? To be honest
the assertion you have triggered ought to be impossible and this is
the first report I've ever seen of it triggering.
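
To make the logged condition easier to read, here is a minimal sketch of it in Python. It is not the kernel code: the function name is hypothetical, the two sequence numbers stand in for the tp->copied_seq and tp->rcv_nxt fields named in the log message, and the flag values are the usual Linux definitions, assumed here for illustration.

```python
# Flag values as commonly defined on Linux (an assumption for this sketch).
MSG_PEEK = 0x02
MSG_TRUNC = 0x20


def recvmsg_condition_holds(copied_seq, rcv_nxt, flags):
    """Mirror of the logged check:
    tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))
    """
    return copied_seq == rcv_nxt or bool(flags & (MSG_PEEK | MSG_TRUNC))


# The message in the log is printed when this returns False, i.e. the two
# sequence numbers differ and neither MSG_PEEK nor MSG_TRUNC is set.
print(recvmsg_condition_holds(100, 100, 0))         # True
print(recvmsg_condition_holds(100, 101, 0))         # False
print(recvmsg_condition_holds(100, 101, MSG_PEEK))  # True
```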

2001-12-23 02:44:59

by Arnaldo Carvalho de Melo

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?

On Sat, Dec 22, 2001 at 03:57:13PM -0800, David S. Miller wrote:
> What compiler are you using to build these kernels? To be honest
> the assertion you have triggered ought to be impossible and this is
> the first report I've ever seen of it triggering.

IIRC he said he (or another guy with the same problem) was using gcc
3.0.something available in Red Hat rawhide.

- Arnaldo

2001-12-23 02:52:59

by Bill Nottingham

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?

Arnaldo Carvalho de Melo ([email protected]) said:
> On Sat, Dec 22, 2001 at 03:57:13PM -0800, David S. Miller wrote:
> > What compiler are you using to build these kernels? To be honest
> > the assertion you have triggered ought to be impossible and this is
> > the first report I've ever seen of it triggering.
>
> IIRC he said he (or another guy with the same problem) was using gcc
> 3.0.something available in Red Hat rawhide.

If it's the one in rawhide, it's 3.1-0.10, off gcc HEAD. I have yet
to see a kernel compiled by this compiler boot successfully, but
YMMV. :)

Bill

2001-12-23 12:36:19

by Stefan Frank

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?


Hi,

OK, here's a follow-up to my own mail. I just replaced
all the memory with the modules from my workstation (256MB + 128MB),
and htdig's cron job still locks up the machine. So although
the 512MB module might have a bit error, it's NOT the cause of the
problem here. My workstation has been running for about a year with
this memory and has NEVER locked up like this.

So IMHO the problem lies somewhere else.

Any suggestions? I'm happy to provide more information if it helps.

TIA for your effort!

Bye, Stefan

PS: the filesystems are all ext3, in case it matters

2001-12-23 12:36:09

by Stefan Frank

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?

Hi David!

On Sat, 22 Dec 2001, David S. Miller wrote:

>
> What compiler are you using to build these kernels? To be honest
> the assertion you have triggered ought to be impossible and this is
> the first report I've ever seen of it triggering.


OK, I switched to kernel 2.4.17 and it happened again tonight.

Here's the output of ver_linux:


asterix:/usr/src/linux/scripts# ./ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux asterix 2.4.17-a1 #1 Sam Dez 22 13:39:51 CET 2001 i686 unknown

Gnu C 2.95.4
Gnu make 3.79.1
util-linux 2.11n
mount 2.11n
modutils 2.4.11
e2fsprogs 1.25
PPP 2.4.1
Linux C Library 2.2.4
Dynamic linker (ldd) 2.2.4
Procps 2.0.7
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 2.0.11
Modules Loaded nfs lockd sunrpc parport_pc lp parport
matroxfb_maven matroxfb_crtc2 cs46xx ac97_codec soundcore i2c-matroxfb
i2c-algo-bit i2c-core apm ide-scsi rtc

asterix:/usr/src/linux/scripts# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011006 (Debian prerelease)


Note that the kernel was compiled on asterix (another machine).

I also installed memtest86 and ran it a few days ago.

I still need to read its documentation, but here is one error that came up
after HOURS of running memtest86.

Address: 190fe988; 400.90MiB
Good: 00100000
Bad: 00000000
Error-Bits: 00100000

Does this mean there's a one-bit error in the first 512MB module
(since 400.90 < 512)?
(This machine has 1GB of memory in 2x512MB modules.)
If so, I will remove this module and try again with only 512 MiB
memory.
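
The memtest86 line can be decoded with a little arithmetic. The sketch below is only illustrative: the three hex values are taken from the report above, the variable names are made up, and the final check assumes that physical addresses below 512 MiB map to the first module, which depends on the chipset and on how the modules are populated.

```python
good = 0x00100000   # pattern memtest86 wrote
bad = 0x00000000    # pattern it read back
addr = 0x190FE988   # failing physical address

# XOR of the written and read-back patterns marks the bits that differ.
error_bits = good ^ bad
flipped = [bit for bit in range(32) if (error_bits >> bit) & 1]

# Whole MiB below the failing address.
addr_mib = addr // (1024 * 1024)

# Assumption: the first 512 MiB of address space belong to the first module.
in_first_512_mib = addr < 512 * 1024 * 1024

print(f"{error_bits:08x}")   # 00100000
print(flipped)               # [20]
print(addr_mib)              # 400
print(in_first_512_mib)      # True
```

A single entry in the list of differing bits means exactly one bit (bit 20) was read back wrong, which matches the Error-Bits field in the report.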


Bye, Stefan

2001-12-23 13:43:05

by David Miller

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?

From: Stefan Frank <[email protected]>
Date: Sun, 23 Dec 2001 13:33:20 +0100

Any suggestions? I'm happy to provide more information if it helps.

Try a different compiler; the one you are using is known to generate
bogus kernels.

2001-12-23 13:42:45

by David Miller

Subject: Re: 2.4.17rc1: KERNEL: assertion failed at tcp.c(1520):tcp_recvmsg ?


Try with a different compiler; as others in this thread have noted, the
compiler you are using is flaky at best.