Message-ID: <20020213060529.91301.qmail@web12305.mail.yahoo.com>
Date: Tue, 12 Feb 2002 22:05:29 -0800 (PST)
From: Raghu Angadi
Subject: memory corruption in tcp bind hash buckets on SMP?
To: linux-kernel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

We are seeing kernel crashes that appear to be caused by some members of a
tcp_bind_bucket object getting corrupted. The crashes happen unpredictably
under network activity (the load need not be heavy).

Kernel: 2.4.7-10 SMP (Red Hat 7.2) on dual PIIIs.

We more or less have to work with Red Hat kernels. I also searched the list
and did not find anything suggesting that this bug has been observed or fixed
in later kernels.

Any insights or suggestions on how to fix it are most welcome.

We have seen crashes at two places:

oops 1 (added below) happens here:
---------
net/ipv4/tcp_minisocks.c:__tcp_tw_hashdance():

        bhead = &tcp_bhash[tcp_bhashfn(sk->num)];
        spin_lock(&bhead->lock);
        tw->tb = (struct tcp_bind_bucket *)sk->prev;
        BUG_TRAP(sk->prev!=NULL);
        if ((tw->bind_next = tw->tb->owners) != NULL)
===>        tw->tb->owners->bind_pprev = &tw->bind_next;
                    ^^^^^^ (owners is NULL _after_ the above if() succeeds)
        tw->tb->owners = (struct sock*)tw;
        tw->bind_pprev = &tw->tb->owners;
        spin_unlock(&bhead->lock);

oops 2 (added below) happens here:
----------
net/ipv4/tcp_ipv4.c:tcp_v4_get_port():

        if (tb != NULL && tb->owners != NULL) {
            if (tb->fastreuse != 0 && sk->reuse != 0 && sk->state != TCP_LISTEN) {
                goto success;
            } else {
                struct sock *sk2 = tb->owners;
                int sk_reuse = sk->reuse;

                for( ; sk2 != NULL; sk2 = sk2->bind_next) {
                    if (sk != sk2 &&
======>                 sk->bound_dev_if == sk2->bound_dev_if) {
                                            ^^^ sk2 is 0x1
                        if (!sk_reuse || !sk2->reuse || sk2->state == TCP_LISTEN) {

OOPS 1:
-------
pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[]
EFLAGS: 00010282
eax: f01edc38   ebx: f01edc20   ecx: f6ebf0d8   edx: 00000000
esi: e8b86a40   edi: f7271ce0   ebp: f6ebf0dc   esp: c5339d1c
ds: 0018   es: 0018   ss: 0018
Process Swapper (pid: 0, stackpage = c5339000)
Stack: f01edc20 e8b86b78 e8b86a40 00000066 c01f7967 e8b86a40 f01edc20 00000000
       e8b86b78 00000000 00000102 00000001 c01eb938 e8b86a40 00000001 9caca04c
       e2fe1ed7 e8b86b78 e8b86a40 ed86f042 c01eef9d e8b86a40 00000005 00001770
Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] []
Code: 89 42 1c 8b 43 4c 89 58 08 8b 43 4c 83 c0 08 89 43 1c c6 07
<0> Kernel panic: Aiee, Killing interrupt handler!
In interrupt handler - not syncing

>>EIP; c01f77db <__tcp_tw_hashdance+eb/110>   <=====
Trace; c01f7967
Trace; c01eb938
Trace; c01eef9d
Trace; c01c9c84
Trace; c01f5a01
Trace; c01f584b
Trace; c01f5e3d
Trace; c01ddfd0
Trace; c01dd9db
Trace; c01de1a4
Trace; c01ddfd0
Trace; c01d3363
Trace; c01ddfd0
Trace; c01d339a
Trace; c01dde16
Trace; c01ddfd0
Trace; c01cd0eb
Trace; c011efbb
Trace; c0108c1d
Trace; c0105410
Trace; c0105410
Trace; c02285d8
Trace; c0105410
Trace; c0105410
Trace; c010543d
Trace; c01054c2
Trace; c011ad76 <__call_console_drivers+46/60>
Trace; c011aeeb
Code;  c01f77db <__tcp_tw_hashdance+eb/110>
00000000 <_EIP>:
Code;  c01f77db <__tcp_tw_hashdance+eb/110>   <=====
   0:   89 42 1c          mov    %eax,0x1c(%edx)   <=====
Code;  c01f77de <__tcp_tw_hashdance+ee/110>
   3:   8b 43 4c          mov    0x4c(%ebx),%eax
Code;  c01f77e1 <__tcp_tw_hashdance+f1/110>
   6:   89 58 08          mov    %ebx,0x8(%eax)
Code;  c01f77e4 <__tcp_tw_hashdance+f4/110>
   9:   8b 43 4c          mov    0x4c(%ebx),%eax
Code;  c01f77e7 <__tcp_tw_hashdance+f7/110>
   c:   83 c0 08          add    $0x8,%eax
Code;  c01f77ea <__tcp_tw_hashdance+fa/110>
   f:   89 43 1c          mov    %eax,0x1c(%ebx)
Code;  c01f77ed <__tcp_tw_hashdance+fd/110>
  12:   c6 07 00          movb   $0x0,(%edi)

<0> Kernel panic: Aiee, Killing interrupt handler!

11 warnings and 5 errors issued.  Results may not be reliable.

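For reference, the pointer updates in the __tcp_tw_hashdance() excerpt can be modelled outside the kernel. In the dump above, edx is 00000000 and the faulting instruction stores to 0x1c(%edx), which would be consistent with the write to owners->bind_pprev going through a NULL owners pointer. The following is only a minimal userspace sketch under stated assumptions: struct node, struct bucket, bucket_add() and bucket_is_consistent() are hypothetical names, not kernel code, and only the fields named in the excerpt are modelled. It shows the head insertion and the invariant that a debugging check could verify on a bucket's owners list; it does not address the root cause of the corruption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the socket objects: only the
 * two list fields used in the quoted excerpt are modelled. */
struct node {
    struct node *bind_next;    /* next owner in the bucket */
    struct node **bind_pprev;  /* address of the pointer that points at us */
};

/* Hypothetical stand-in for tcp_bind_bucket: just the list head. */
struct bucket {
    struct node *owners;
};

/* Head insertion with the same four pointer updates as the excerpt.
 * The head is read once into a local, so the NULL test and the
 * dereference use the same value. */
static void bucket_add(struct bucket *tb, struct node *n)
{
    struct node *first = tb->owners;

    n->bind_next = first;
    if (first != NULL)
        first->bind_pprev = &n->bind_next;
    tb->owners = n;
    n->bind_pprev = &tb->owners;
}

/* Returns 1 if every node's bind_pprev points at the link that leads
 * to it, 0 otherwise. A list that was modified without the lock, or
 * overwritten, would typically fail this walk. */
static int bucket_is_consistent(struct bucket *tb)
{
    struct node **link = &tb->owners;
    struct node *n;

    for (n = tb->owners; n != NULL; n = n->bind_next) {
        if (n->bind_pprev != link)
            return 0;
        link = &n->bind_next;
    }
    return 1;
}
```

A check like bucket_is_consistent() would have to run with the bucket lock held to be meaningful; in this model it simply walks the list and compares each back-pointer with the link that was followed to reach the node.
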
OOPS 2:
-------
ts-lnx16 login: Unable to handle Kernel Null pointer dereference at virtual address 0000000d
*pde = 00000000
Oops: 0
CPU:    0
EIP:    0010: []
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010292
eax: 00000004   ebx: cff945c0   ecx: db635580   edx: 00000001
esi: 00001f90   edi: 00000001   ebp: db635580   esp: d12afeb4
ds: 0018   es: 0018   ss: 0018
Process traffic_manager (pid: 25128, stackpage=d12af000)
Stack: 00000000 c18cfc80 db635580 d12ae000 00000000 00001f90 c01ffc9d db635580
       00001f90 ffffffea d0c41a64 d12aff00 00000010 bfffe550 c01c617f d0c41a64
       d12aaf00 00000010 00000000 901f0002 00000000 00000000 00000000 00000000
Call trace: [][][][][][][][][][][]
Code: 8b 42 0c 8b 4c 24 1c 39 41 0c 75 e7 85 ff 74 0e 80 7a 26 00

>>EIP; c01f343d   <=====
Trace; c01ffc9d
Trace; c01c617f
Trace; c01ff2f0
Trace; c01ffaaa
Trace; c01c5f63
Trace; c0150905
Trace; c01c66f5
Trace; c01c6715
Trace; c01c6c6c
Trace; c014c0cd
Trace; c010716d
Code;  c01f343d
00000000 <_EIP>:
Code;  c01f343d   <=====
   0:   8b 42 0c          mov    0xc(%edx),%eax   <=====
Code;  c01f3440
   3:   8b 4c 24 1c       mov    0x1c(%esp,1),%ecx
Code;  c01f3444
   7:   39 41 0c          cmp    %eax,0xc(%ecx)
Code;  c01f3447
   a:   75 e7             jne    fffffff3 <_EIP+0xfffffff3> c01f3430
Code;  c01f3449
   c:   85 ff             test   %edi,%edi
Code;  c01f344b
   e:   74 0e             je     1e <_EIP+0x1e> c01f345b
Code;  c01f344d
  10:   80 7a 26 00       cmpb   $0x0,0x26(%edx)

<0>Kernel Panic: Aiee, killing interrupt handler!

Thanks,
Raghu