Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751845AbZAFA3Z (ORCPT ); Mon, 5 Jan 2009 19:29:25 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750812AbZAFA3Q (ORCPT ); Mon, 5 Jan 2009 19:29:16 -0500 Received: from e1.ny.us.ibm.com ([32.97.182.141]:45772 "EHLO e1.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750774AbZAFA3O (ORCPT ); Mon, 5 Jan 2009 19:29:14 -0500 Date: Mon, 5 Jan 2009 16:29:12 -0800 From: "Paul E. McKenney" To: Eric Sesterhenn Cc: Kamalesh Babulal , linux-kernel@vger.kernel.org, josh@freedesktop.org, dipankar@in.ibm.com Subject: Re: [BUG] NULL pointer deref with rcutorture Message-ID: <20090106002912.GA21071@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20090104233855.GA17021@alice> <20090105022827.GA8080@linux.vnet.ibm.com> <20090105121409.GA5783@alice> <20090105180037.GH6959@linux.vnet.ibm.com> <20090105185655.GA11244@alice> <20090105193602.GL6959@linux.vnet.ibm.com> <20090105200012.GB11244@alice> <20090105201631.GO6959@linux.vnet.ibm.com> <20090105203153.GC11244@alice> <20090105221836.GT6959@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090105221836.GT6959@linux.vnet.ibm.com> User-Agent: Mutt/1.5.15+20070412 (2007-04-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 14070 Lines: 298 On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote: > On Mon, Jan 05, 2009 at 09:31:53PM +0100, Eric Sesterhenn wrote: > > * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote: > > > On Mon, Jan 05, 2009 at 09:01:45PM +0100, Eric Sesterhenn wrote: > > > > hi, > > > > > > > > * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote: > > > > > On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote: > > > > > > > > > > Wow!!! Am I reading this correctly? Does the above "call" instruction > > > > > -really- call one byte into itself? That is what the hex for the x86 > > > > > instruction -looks- like it is doing, but I cannot see what would have > > > > > possessed the compiler to generate this code. > > > > > > > > Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3) > > > > > > I am using 4.1.3, for whatever it is worth. (Ancient, I know!) > > > > > > > > When I compile on a 32-bit x86 machine, I don't see the above "call" > > > > > instruction. Other than that, the code I see looks consistent. > > > > > > > > > > > 9f0: eb 1d jmp a0f > > > > > > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0 > > > > > > 9f9: b8 01 00 00 00 mov $0x1,%eax > > > > > > 9fe: 75 0a jne a0a > > > > > > a00: b8 e8 03 00 00 mov $0x3e8,%eax > > > > > > a05: e8 fc ff ff ff call a06 > > > > > > a0a: e8 fc ff ff ff call a0b > > > > > > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c > > > > > > ^---------- this line > > > > > > > > > > This looks like the first test in the "while" loop. > > > > > > > > > > > a16: 75 09 jne a21 > > > > > > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0 > > > > > > a1f: 75 09 jne a2a > > > > > > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50 > > > > > > a28: 74 c8 je 9f2 > > > > > > a2a: 5d pop %ebp > > > > > > a2b: c3 ret > > > > > > > > > > The corresponding C code is as follows: > > > > > > > > > > static void > > > > > rcu_stutter_wait(void) > > > > > { > > > > > while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) { > > > > > if (rcutorture_runnable) > > > > > schedule_timeout_interruptible(1); > > > > > else > > > > > schedule_timeout_interruptible(round_jiffies_relative(HZ)); > > > > > } > > > > > } > > > > > > > > > > I don't see much opportunity for a page fault here... This is the > > > > > binary I get when I compile it, though not as a module: > > > > > > > > > > 0000085a : > > > > > 85a: 55 push %ebp > > > > > 85b: 89 e5 mov %esp,%ebp > > > > > 85d: eb 1d jmp 87c > > > > > 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0 > > > > > 866: b8 01 00 00 00 mov $0x1,%eax > > > > > 86b: 75 0a jne 877 > > > > > 86d: b8 e8 03 00 00 mov $0x3e8,%eax > > > > > 872: e8 fc ff ff ff call 873 > > > > > 877: e8 fc ff ff ff call 878 > > > > > 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14 > > > > > 883: 75 09 jne 88e > > > > > 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0 > > > > > 88c: 75 09 jne 897 > > > > > 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08 > > > > > 895: 74 c8 je 85f > > > > > 897: 5d pop %ebp > > > > > 898: c3 ret > > > > > > > > > > I confess, I am confused!!! > > > > > > > > on the other box with a different gcc version > > > > > > > > gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11) > > > > > > > > d1902e90 is the start of rcu_stutter_wait > > > > > > > > [ 533.391719] d087e000 d1902e90 > > > > [ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 > > > > [ 541.000139] BUG: unable to handle kernel paging request at d1902efd > > > > [ 541.000423] IP: [] 0xd1902efd > > > > [ 541.000660] *pde = 0f08f067 *pte = 00000000 > > > > [ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC > > > > [ 541.001126] last sysfs file: /sys/block/sda/size > > > > [ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture] > > > > [ 541.002235] > > > > [ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84) > > > > [ 541.002470] EIP: 0060:[] EFLAGS: 00010296 CPU: 0 > > > > [ 541.002598] EIP is at 0xd1902efd > > > > [ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000 > > > > [ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8 > > > > [ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 > > > > [ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000) > > > > [ 541.003400] Stack: > > > > [ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a > > > > [ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000 > > > > [ 541.004022] 00000201 000004b4 > > > > [ 541.004022] Call Trace: > > > > [ 541.004022] [] ? kthread+0x3a/0x70 > > > > [ 541.004022] [] ? kthread+0x0/0x70 > > > > [ 541.004022] [] ? kernel_thread_helper+0x7/0x14 > > > > [ 541.004022] Code: Bad EIP value. > > > > [ 541.004022] EIP: [] 0xd1902efd SS:ESP 0068:c7b63fb8 > > > > [ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]--- > > > > > > > > > > > > 00000e90 : > > > > e90: 55 push %ebp > > > > e91: 89 e5 mov %esp,%ebp > > > > e93: 90 nop > > > > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi > > > > e98: a1 98 00 00 00 mov 0x98,%eax > > > > e9d: 85 c0 test %eax,%eax > > > > e9f: 75 09 jne eaa > > > > ea1: a1 00 00 00 00 mov 0x0,%eax > > > > ea6: 85 c0 test %eax,%eax > > > > ea8: 75 36 jne ee0 > > > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax > > > > eaf: 85 c0 test %eax,%eax > > > > eb1: 75 2d jne ee0 > > > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx > > > > eb9: 85 d2 test %edx,%edx > > > > ebb: 74 2b je ee8 > > > > ebd: b8 01 00 00 00 mov $0x1,%eax > > > > ec2: e8 fc ff ff ff call ec3 > > > > ec7: a1 98 00 00 00 mov 0x98,%eax > > > > ecc: 85 c0 test %eax,%eax > > > > ece: 74 d1 je ea1 > > > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax > > > > ed5: 85 c0 test %eax,%eax > > > > ed7: 74 da je eb3 > > > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi > > > > ee0: 5d pop %ebp > > > > ee1: c3 ret > > > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi > > > > ee8: b8 fa 00 00 00 mov $0xfa,%eax > > > > eed: e8 fc ff ff ff call eee > > > > > > Here we are again calling one byte into the current instruction!!! > > > > > > Or am I misinterpreting this code? > > > > > > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi > > > > ef8: e8 fc ff ff ff call ef9 > > > > efd: 8d 76 00 lea 0x0(%esi),%esi > > > > ^------------- here > > > > > > > > This one looks more like it can explain a page fault > > > > > > I don't understand why there are indirections in the assembly given the > > > C code for rcu_stutter_wait(). > > > > > > > f00: eb 96 jmp e98 > > > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi > > > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi > > > > ok, after trying to find out if the ubuntu gccs are broken, i stumbled > > upon this: > > http://forum.soft32.com/linux/Strange-problem-disassembling-shared-lib-ftopict439936.html > > > > Seems the difference is that you dont compile it as a module and the > > jump is perfectly normal, it gets overwritten when the stuff is loaded > > objdump -dr gives me > > > > 00000e90 : > > e90: 55 push %ebp > > e91: 89 e5 mov %esp,%ebp > > e93: 90 nop > > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi > > e98: a1 98 00 00 00 mov 0x98,%eax > > e99: R_386_32 .bss > > e9d: 85 c0 test %eax,%eax > > e9f: 75 09 jne eaa > > ea1: a1 00 00 00 00 mov 0x0,%eax > > ea2: R_386_32 rcutorture_runnable > > ea6: 85 c0 test %eax,%eax > > ea8: 75 36 jne ee0 > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax > > eab: R_386_32 .bss > > eaf: 85 c0 test %eax,%eax > > eb1: 75 2d jne ee0 > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx > > eb5: R_386_32 rcutorture_runnable > > eb9: 85 d2 test %edx,%edx > > ebb: 74 2b je ee8 > > ebd: b8 01 00 00 00 mov $0x1,%eax > > ec2: e8 fc ff ff ff call ec3 > > ec3: R_386_PC32 schedule_timeout_interruptible > > ec7: a1 98 00 00 00 mov 0x98,%eax > > ec8: R_386_32 .bss > > ecc: 85 c0 test %eax,%eax > > ece: 74 d1 je ea1 > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax > > ed1: R_386_32 .bss > > ed5: 85 c0 test %eax,%eax > > ed7: 74 da je eb3 > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi > > ee0: 5d pop %ebp > > ee1: c3 ret > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi > > ee8: b8 fa 00 00 00 mov $0xfa,%eax > > eed: e8 fc ff ff ff call eee > > eee: R_386_PC32 round_jiffies_relative > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi > > ef8: e8 fc ff ff ff call ef9 > > ef9: R_386_PC32 schedule_timeout_interruptible > > efd: 8d 76 00 lea 0x0(%esi),%esi > > > > here is the deref ------------------------^ > > Ah!!! We are getting a page fault while cleaning up the stack frame? > > Ouch! > > > f00: eb 96 jmp e98 > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi And now I can run this. gcc v3.4.4 gives yet a different error: divide error: 0000 [#1] PREEMPT SMP last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type CPU 1 Modules linked in: [last unloaded: rcutorture] Pid: 3369, comm: rcu_torture_rea Not tainted 2.6.28-autokern1 #1 RIP: 0010:[] [] 0xffffffffa0000142 RSP: 0000:ffff88007f0afeb0 EFLAGS: 00010246 RAX: 00000000def36d6a RBX: ffffffffa0006850 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88007ecf1600 RDI: ffff88007f0afef0 RBP: 0000000000000dd4 R08: ffff88007f0ae000 R09: ffffffff8037c7d9 R10: 0000000000000000 R11: ffffffff8037c7d9 R12: 0000000000000000 R13: 000000000000000a R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8800e3801c00(0000) knlGS:00000000f7f3ab80 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000080dc87c CR3: 0000000000201000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rcu_torture_rea (pid: 3369, threadinfo ffff88007f0ae000, task ffff88007ecf5d80) Stack: ffffffff8037c7d9 ffffffffa000090d ffff88007ecfdec0 ffff88007ec17eb0 0000000000000001 ffffffffa00006e6 0000000000000000 ffff8800e3488000 c744ca136d6adef3 0000000000000256 0000000000000000 ffffffffa0000807 Call Trace: [] ? delay_tsc+0x0/0x99 [] ? kthread+0x3d/0x63 [] ? child_rip+0xa/0x20 [] ? kthread+0x0/0x63 [] ? child_rip+0x0/0x20 Code: 58 c3 56 bf 01 00 00 00 e8 88 bd 22 e0 59 31 c0 c3 41 51 e8 69 ff ff ff 48 63 15 0a 58 00 00 48 69 d2 90 01 00 00 48 89 d1 31 d2 <48> f7 f1 48 85 d2 75 0c 41 58 bf 78 1b 0d 00 e9 32 c7 37 e0 5f RIP [] 0xffffffffa0000142 RSP I will see what I can find from this. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/