Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 5 Sep 2002 05:12:49 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 5 Sep 2002 05:12:49 -0400 Received: from holomorphy.com ([66.224.33.161]:7082 "EHLO holomorphy") by vger.kernel.org with ESMTP id ; Thu, 5 Sep 2002 05:12:46 -0400 Date: Thu, 5 Sep 2002 02:08:30 -0700 From: William Lee Irwin III To: Andrew Morton , linux-kernel@vger.kernel.org Subject: Re: [BUG] __write_lock_failed() oops Message-ID: <20020905090830.GF888@holomorphy.com> Mail-Followup-To: William Lee Irwin III , Andrew Morton , linux-kernel@vger.kernel.org References: <20020905080310.GF18800@holomorphy.com> <3D77190C.F4562547@zip.com.au> <20020905083240.GC888@holomorphy.com> <20020905084502.GD888@holomorphy.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Description: brief message Content-Disposition: inline In-Reply-To: <20020905084502.GD888@holomorphy.com> User-Agent: Mutt/1.3.25i Organization: The Domain of Holomorphy Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 13020 Lines: 281 On Thu, Sep 05, 2002 at 01:42:52AM -0700, Andrew Morton wrote: >>> That's all the assembly hacks in the rwlock code not having proper >>> stack frames. You may have to ksymoops it. >> At a guess: use-after-free bug against an address_space. You may >>> be able to catch it with slab poisoning. > On Thu, Sep 05, 2002 at 01:32:40AM -0700, William Lee Irwin III wrote: >> (gdb) p/x $eax >> $25 = 0xc0331ca0 >> (gdb) p &tasklist_lock >> $27 = (rwlock_t *) 0xc0331ca0 On Thu, Sep 05, 2002 at 01:45:02AM -0700, William Lee Irwin III wrote: > The NMI oopser is going here as well (nmi_watchdog=2 for extra safety) > so I suspect the tasklist_lock semantics are behaving badly. But it's > not easily reproducible enough to test a quick attempt at a fix if it > can't be recognized a priori. > This is literally so difficult to reproduce it hasn't been seen in 2 > releases. kgdb is still going and dhowells is helping me fish stuff > off the stack. Here is the IRC log of the analysis. I'm leaving the machine untouched so kgdb may be used later on if facts are called into question. Cheers, Bill $3 = 0xc0331ca0 (gdb) p/x *(unsigned long *)$eax $4 = 0xffffff unless the code itself is corrupted it looks reasonable what else should I look for? Could the NMI oopser possibly oops it in this situation? what's the oops say at the beginning? ad which insn did it oops on? s/ad/and/ dhowells: I was running kgdb: Program received signal SIGSEGV, Segmentation fault. --> tkks (~kes@rrcs-se-24-129-177-178.biz.rr.com) has joined #kernel 0xc0106693 in __write_lock_failed () at semaphore.c:176 176 semaphore.c: No such file or directory. in semaphore.c 0xc0106693 <__write_lock_failed+15>: jne 0xc010668b <__write_lock_failed+7> it oopsed on _jne_? dhowells: NFI -- this is all I see. dhowells: I've stared at this several times in the past. load the vmlinux file into gdb and do "disas 0xc0106693" dhowells: it matches --> benh (~benh@m86.net195-132-236.noos.fr) has joined #kernel do you still have kgdb attached to the dying box? <-- daniel has quit (Ping timeout: 604 seconds) if so, dump the bytes at that address, and make sure they're the same as in the vmlinux file dhowells: that's where it got it from in the first place -- it's preloaded into gdb (gdb) p/x (unsigned char [4])0xc0106693 $10 = {0x93, 0x66, 0x10, 0xc0} dhowells: yes that didn't work those four bytes are the address (gdb) p/x (unsigned char [4])0xc0106693 $1 = {0x93, 0x66, 0x10, 0xc0} p/x (unsigned char [4])*(char*)0xc0106693 or just p/x *(int*)0xc0106693 kgdb: (gdb) p/x (unsigned char [4])*(char*)0xc0106693 $11 = {0x75, 0xf6, 0xf0, 0x81} vmlinux: (gdb) p/x (unsigned char [4])*(char*)0xc0106693 $2 = {0x75, 0xf6, 0xf0, 0x81} guess so then hmmm... on mine, I see: (gdb) p/x (char[4])*(char*)0xc01067c8 $2 = {0xf, 0x85, 0xe2, 0xff} which should also be a jne:-/ this is for the jne is __write_lock_failed()? yes What does mine reassemble to? * dhowells fires up his I386 insn pdf yours is right * viro looks at axboe ugh dhowells: what are the odds it's an NMI oops? and so is mine can you access the printk buffer? dhowells: (gdb) p log_buf $3 = '\0' the difference between your disas and mine are JNE rel8 vs JNE rel32 --- dwmw2_gone is now known as dwmw2 wli: p (char*)&log_buf is that what you did? (gdb) p (char*)&log_buf $4 = 0xc034cb40 "" (no What are the odds it's the NMI oopser? try p (char[256])log_buf it might be an NMI, I'm just wondering how to tell wli: what you need to do is to examine the irq_stat[] array (gdb) p log_buf p/x log_buf $12 = "01 (tiotest).\n<3>Out of Memory: Killed process 8902 (tiotest).\n<3>Out of Memory: Killed process 8916 (tiotest).\n<3>Out of Memory: Killed process 8917 (tiotest).\n<3>Out of Memory: Killed process 8918 ("... or that might be the problem p/x (unsigned[5])irq_stat (gdb) p/x irq_stat[((struct thread_info *)((unsigned long)$esp & ~8191UL))->cpu] $19 = {__softirq_pending = 0x0, __syscall_count = 0x0, __ksoftirqd_task = 0xcd1adaa0, idle_timestamp = 0xe1ef9f, __nmi_count = 0xacd8c3} did you compile with -g? yes kgdb implies -g and frame pointers what about p irq_stat[0] (gdb) p irq_stat[0] $20 = {__softirq_pending = 0, __syscall_count = 0, __ksoftirqd_task = 0xcd1df460, idle_timestamp = 14807014, __nmi_count = 12196773} and again? again for what? do the command again (and watch __nmi_count) (gdb) $21 = {__softirq_pending = 0, __syscall_count = 0, __ksoftirqd_task = 0xcd1df460, idle_timestamp = 14807014, __nmi_count = 12196773} how many CPUs do you have? dhowells: it oopsed on cpu 7 dhowells: 16 * dhowells hates wli dhowells: 32 doesn't boot yet, the ioredtbl's are FITH dhowells: 48 is beyond my power by several orders of magnitude. --> kai_ (~kai@pppoe79.swhBachemerstr.Uni-Koeln.DE) has joined #kernel dhowells: 64 and I have to borrow quads from another group and figure out the APIC ID remapping trick. the value in __nmi_count looks weird * dhowells really hates wli dhowells: I will be your testmonkey if you care to debug. =) * dhowells grins what's the __nmi_count on cpu7? dhowells: 32 can be brought up in 15 minutes or so, maybe 30, depending on how slow the console is. (gdb) p irq_stat[7] $24 = {__softirq_pending = 0, __syscall_count = 0, __ksoftirqd_task = 0xcd1adaa0, idle_timestamp = 14806943, __nmi_count = 11327683} what's p log_end-(u_long)&log_bug (gdb) p/x $eax $25 = 0xc0331ca0 (gdb) p &tasklist_lock $27 = (rwlock_t *) 0xc0331ca0 you could try p (char[256])*(char*)(log_end-256) (gdb) p (char[256])*(char*)(log_end-256) Cannot access memory at address 0x284d9 try p log_end #kernel> dhowells: 32 doesn't boot yet, the ioredtbl's are FITH #kernel> dhowells: 48 is beyond my power by several orders of magnitude. *** kai_ (~kai@pppoe79.swhBachemerstr.Uni-Koeln.DE) has joined channel #kernel #kernel> dhowells: 64 and I have to borrow quads from another group and figure +out the APIC ID remapping trick. the value in __nmi_count looks weird * dhowells:#kernel really hates wli #kernel> dhowells: I will be your testmonkey if you care to debug. =) * dhowells:#kernel grins what's the __nmi_count on cpu7? argh try p (char[256])*(char*)(log_buf+log_end-256) sorry (gdb) p log_end $28 = 165337 (gdb) p (char[256])*(char*)(log_buf+log_end-256) $29 = '\0' try p (char[256])*(char*)(log_buf+(log_end&65535)-256) log_end needs masking (gdb) p (char[256])*(char*)(log_buf+(log_end&65535)-256) $30 = "(tiotest).\n<3>Out of Memory: Killed process 9741 (tiotest).\n<3>Out of Memory: Killed process 9743 (tiotest).\n<3>Out of Memory: Killed process 9744 (tiotest).\n<3>Out of Memory: Killed process 9745 (tio"... hmmm no obvious oops... but it may have been overwritten :-/ tasklist_lock smells of NMI oopsing. tasklist_lock hold/wait times are beyond ridiculous and into the realm of flat-out bugginess. yeah... I think I have to agree The tasklist_lock is basically a bug. can you use some sort of serial console or net console? I am using a serial console. well, if there was an oops, you should've seen it fly past on that dhowells: kgdb traps the oops before it comes out there -- this is very difficult to reproduce, so... it's more or less "take the one shot we've got or wait a month before it happens again" so the oops is still pending? dhowells: it's been pending since NUMA-Q starting booting again around 2.5.20. I mean in your current kgdb session yes **** then you should be able to track back up the stack and locate it dhowells: I' do_nmi+N ought to be on the stack dhowells: I'm enough sheets to the wind it might take me a few tries to get it right. Can you give me a gdb command to dump out? do you know what ESP is on the dead task/CPU? dhowells: $esp produces what appears to be a proper result. p $ESP (gdb) p $esp $38 = (void *) 0xe3dfbf54 (gdb) p/x (0xe3dfbf54 & 0xffffe000) + 0x2000 - 0xe3dfbf54 $4 = 0xac that's how much stack space is used Is that good or bad? --> andre (~andre@astound-64-85-224-253.ca.astound.net) has joined #kernel wait p (void*[0xac/4])*(void**)$esp Got a stack-dumping command for me? (gdb) p (void*[0xac/4])*(void**)$esp A parse error in expression, near `4])*(void**)$esp'. p (void*[43])*(void**)$esp (gdb) p (void*[43])*(void**)$esp $39 = {0xc0119011, 0xb89ffbf8, 0xf21, 0xb89ffd24, 0xfffffff2, 0xe3dfbfa0, 0xc0118e95, 0xf21, 0xb89ffbf8, 0xe3dfbfc4, 0x0, 0xb89ffc00, 0xe3dfa000, 0x0, 0x100070, 0xe3dfa000, 0xb8800000, 0x200000, 0xe3dfa000, 0xe3dfbfbc, 0xc0105e39, 0xf21, 0xb89ffbf8, 0xe3dfbfc4, 0x0, 0xb89ffc00, 0x805e2ac, 0xc010788f, 0xf21, 0xb89ffbf8, 0xb89ffc00, 0x0, 0xb89ffd24, 0x805e2ac, 0x78, 0xc010002b, 0x2b, 0x78, 0x400fa8de, 0x23, 0x292, 0x805e25c, 0x2b} p do_nmi (gdb) p do_nmi $40 = {void (struct pt_regs *, long int)} 0xc01090d0 i sym 0xc0119011 i sym 0xc0118e95 i sym 0xc0105e39 (gdb) i sym 0xc0118e95 do_fork + 33 in section .text morning i sym 0xc010788f (gdb) i sym 0xc0119011 Letext + 191 in section .text i sym 0xc010002b hch: morning (gdb) i sym 0xc0105e39 sys_clone + 37 in section .text (gdb) i sym 0xc010788f syscall_call + 7 in section .text wli: okay... the exception hasn't appeared on the stack there... it's the other side of the $esp value (gdb) i sym 0xc010002b startup_32 + 43 in section .text dhowells: anything you can tell me is good p (void*[43])*(void**)($esp-0xac) (gdb) p (void*[43])*(void**)($esp-0xac) $41 = {0xe3dfbf54, 0xe3dfbf68, 0xf21, 0xb89ffd24, 0xc0106693, 0x87, 0x60, 0x68, 0xc0110068, 0x68, 0xffff, 0xffff, 0xe3dfbefc, 0xc0111cb0, 0x2, 0xb, 0x0, 0xe3dfbf20, 0xc0d06425, 0xe3dfbf20, 0xb89ffd24, 0xe3dfbf10, 0xc010911b, 0xe3dfbf20, 0xc0d06400, 0xf21, 0xe3dfbf68, 0xc0108356, 0xe3dfbf20, 0x0, 0xc0d06400, 0x0, 0xe3dfa000, 0xf21, 0xb89ffd24, 0xe3dfbf68, 0xc0331ca0, 0xc0110068, 0x68, 0xc0331ca0, 0xc0106693, 0x60, 0x87} i sym 0xc010911b (gdb) i sym 0xc010911b do_nmi + 75 in section .text * dhowells grins disas 0xc010911b what's that instruction? a call somewhere? 0xc0109116 : call 0xc0111bc4 0xc010911b : jmp 0xc01091b2 0xc0109120 : push %esi 0xc0109121 : movzbl %bl,%eax 0xc0109124 : push %eax does that answer your questions? 0xc01091b0 : in $0x71,%al 0xc01091b2 : lea 0xfffffff8(%ebp),%esp 0xc01091b5 : pop %ebx It's an NMI oops. NMI went off whilst do_fork was spinning on the tasklist_lock wli: you might also want to note the technique I used for finding it dhowells: noted wli: have fun dhowells: I will FTSO the tasklist_lock and the bloody fscking tasklist to boot. * dhowells grins hmm, I should have stayed up longer yesterday dhowells: Have you logged it? and sent linus the patch I thought I'd leave that to you why, do you want me to? dhowells: I'm too many sheets to the wind, can you mail me the log? the log of the xchat session? dhowells: I'll settle for that sure. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/