Date: Thu, 12 Nov 2009 00:26:44 -0500
From: Glenn Maynard
To: Jeff Garzik, linux-kernel@vger.kernel.org, linux-scsi, Jens Axboe
Subject: Re: Crash during SATA reads
In-Reply-To: <878weco4r9.fsf@openvz.org>
References: <4AFA835B.9000904@garzik.org> <4AFB2F2A.7080900@garzik.org> <878weco4r9.fsf@openvz.org>

On Wed, Nov 11, 2009 at 9:28 PM, Dmitry Monakhov wrote:
> Seems what you have use-after-free here.
> You probably have to add some debug info in to bio->end_io method
> May be something like this.

That's what it looks like, but won't these checks trigger on the use
(and give the same trace), when what we need to know is where the free
is happening?

The problem is manifesting in several places: a bogus or NULL
bh->b_end_io in end_bio_bh_io_sync(), and bh->b_this_page == NULL in
block_invalidatepage() (I think; I haven't reproduced that one again to
confirm).

I'm not sure how to figure out where the free might be happening; it's
tricky enough in userspace with Valgrind available. I tried logging
alloc_buffer_head and free_buffer_head, but it was too much output (if
it's not masking the problem entirely by changing the timing too much,
it would take ages to repro).

I just repro'd it (it took several hours this time), on the
BUG_ON(!bh->b_page) assertion.
This one happened while doing a partition copy (/dev/sdb2) rather than
/dev/sdb. Trace follows (though I doubt it offers any new information).
Hopefully somebody has an idea of where to look next...

kernel BUG at fs/buffer.c:2934!
invalid opcode: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/model
Modules linked in: netconsole atl1c rtc

Pid: 0, comm: swapper Not tainted (2.6.31.6 #16) G31M-ES2L
EIP: 0060:[] EFLAGS: 00010282 CPU: 0
EIP is at end_bio_bh_io_sync+0x20/0x63
EAX: c1ae78c0 EBX: c107cce3 ECX: c1ae78c0 EDX: 00000000
ESI: c1ae78c0 EDI: d9e5c450 EBP: c1351eac ESP: c1351e74
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c1350000 task=c1356260 task.ti=c1350000)
Stack:
 c107e5b5 00000400 c1158828 00000000 df8a62c0 df8ef5a4 00000000 00000000
<0> 00012c00 0000d400 00000000 d9e5c450 00000000 d9e5c450 df8ef168 c11589a2
<0> df8a6500 00000000 c1158a59 00000000 df8a6500 00000000 d9e5c450 c1158ac8
Call Trace:
 [] ? bio_endio+0x24/0x26
 [] ? blk_update_request+0xdf/0x24e
 [] ? blk_update_bidi_request+0xb/0x41
 [] ? blk_end_bidi_request+0x10/0x4f
 [] ? blk_end_request+0x7/0xc
 [] ? scsi_end_request+0x17/0x69
 [] ? scsi_io_completion+0x173/0x335
 [] ? scsi_finish_command+0x70/0x86
 [] ? scsi_softirq_done+0xd7/0xdc
 [] ? blk_done_softirq+0x51/0x5d
 [] ? __do_softirq+0x5f/0xc8
 [] ? do_softirq+0x22/0x26
 [] ? irq_exit+0x29/0x34
 [] ? do_IRQ+0x53/0x63
 [] ? common_interrupt+0x29/0x30
 [] ? mwait_idle+0x3c/0x44
 [] ? cpu_idle+0x19/0x3a
 [] ? start_kernel+0x1a4/0x1a6
Code: 54 24 18 83 c4 48 5b 5e 5f 5d c3 56 53 89 c3 8b 48 40 8b 40 34 85 c0 75 04 0f 0b eb fe 85 c9 75 04 0f 0b eb fe 83 79 08 00 75 04 <0f> 0b eb fe 83 79 04 00 75 04 0f 0b eb fe 83 fa a1 75 08 80 4b
EIP: [] end_bio_bh_io_sync+0x20/0x63 SS:ESP 0068:c1351e74

-- 
Glenn Maynard
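
One possible way around the logging volume, as a rough userspace sketch rather than a kernel patch: instead of printing every alloc and free, stamp each object with the call site of its last free and poison its contents, so the check that fires on the use can also report where the free happened. Everything below is hypothetical and for illustration only (struct fake_bh, struct tracked_bh, track_alloc, track_free, check_use, and the static pool are not kernel interfaces); only the 0x6b poison byte matches the kernel's POISON_FREE value. The kernel's own slab debugging options may already provide something similar, but whether they apply to this configuration is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define POISON_FREE 0x6b   /* same byte value the kernel uses for freed slab memory */
#define POOL_SIZE   4      /* hypothetical: tiny static pool, so "freed" memory stays readable */

/* Hypothetical stand-in for the fields of struct buffer_head that the BUG_ONs check. */
struct fake_bh {
    void *b_page;
    void *b_this_page;
};

/* The tracking record lives outside the poisoned region so it survives the free. */
struct tracked_bh {
    struct fake_bh bh;       /* must stay the first member: we cast between the two */
    const char *freed_file;  /* NULL while the object is live */
    int freed_line;
};

static struct tracked_bh pool[POOL_SIZE];

/* Hand out a zeroed object from the pool and mark it live. */
static struct fake_bh *track_alloc(int slot)
{
    assert(slot >= 0 && slot < POOL_SIZE);
    struct tracked_bh *t = &pool[slot];
    memset(&t->bh, 0, sizeof(t->bh));
    t->freed_file = NULL;
    t->freed_line = 0;
    return &t->bh;
}

/* Poison the object and remember who freed it; nothing is printed here. */
static void track_free_at(struct fake_bh *bh, const char *file, int line)
{
    struct tracked_bh *t = (struct tracked_bh *)bh;
    memset(&t->bh, POISON_FREE, sizeof(t->bh));
    t->freed_file = file;
    t->freed_line = line;
}
#define track_free(bh) track_free_at((bh), __FILE__, __LINE__)

/* Called at the point of use: returns 1 and names the free site if the object is dead. */
static int check_use(const struct fake_bh *bh)
{
    const struct tracked_bh *t = (const struct tracked_bh *)bh;
    if (t->freed_file) {
        fprintf(stderr, "use after free: last freed at %s:%d\n",
                t->freed_file, t->freed_line);
        return 1;
    }
    return 0;
}
```

The point of the design is that the free path only does two stores and a memset, which should disturb timing far less than a printk per buffer head, and output appears only when the use-side check actually trips.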