Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756722Ab1EYVjO (ORCPT ); Wed, 25 May 2011 17:39:14 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:49124 "EHLO rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756536Ab1EYVi5 (ORCPT ); Wed, 25 May 2011 17:38:57 -0400 From: Konrad@dumpdata.com, Rzeszutek@dumpdata.com, Wilk@dumpdata.com To: stable@kernel.org, linux-kernel@vger.kernel.org Cc: "Tian, Kevin" , Konrad Rzeszutek Wilk Subject: [PATCH] xen mmu: fix a race window causing leave_mm BUG() Date: Wed, 25 May 2011 17:37:33 -0400 Message-Id: <1306359456-3627-2-git-send-email-konrad.wilk@oracle.com> X-Mailer: git-send-email 1.7.4.1 In-Reply-To: <1306359456-3627-1-git-send-email-konrad.wilk@oracle.com> References: <1306359456-3627-1-git-send-email-konrad.wilk@oracle.com> X-Source-IP: acsinet21.oracle.com [141.146.126.237] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090207.4DDD76BA.00CD,ss=1,fgs=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5402 Lines: 102 From: Tian, Kevin There's a race window in xen_drop_mm_ref, where remote cpu may exit dirty bitmap between the check on this cpu and the point where remote cpu handles drop request. So in drop_other_mm_ref we need check whether TLB state is still lazy before calling into leave_mm. This bug is rarely observed in earlier kernel, but exaggerated by the commit 831d52bc153971b70e64eccfbed2b232394f22f8 ("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm") which clears bitmap after changing the TLB state. the call trace is as below: --------------------------------- kernel BUG at arch/x86/mm/tlb.c:61! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb CPU 1 Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table] Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285 RIP: e030:[] [] leave_mm+0x15/0x46 RSP: e02b:ffff88002805be48 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0 RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200 R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880 R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000 FS: 00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000 00 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40) Stack: ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88 <0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78 <0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108 Call Trace: [] drop_other_mm_ref+0x2a/0x53 [] generic_smp_call_function_single_interrupt+0xd8/0xfc [] xen_call_function_single_interrupt+0x13/0x28 [] handle_IRQ_event+0x66/0x120 [] handle_percpu_irq+0x41/0x6e [] __xen_evtchn_do_upcall+0x1ab/0x27d [] xen_evtchn_do_upcall+0x33/0x46 [] xen_do_hyper visor_callback+0x1e/0x30 [] ? _spin_unlock_irqrestore+0x15/0x17 [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? flush_old_exec+0x3ac/0x500 [] ? load_elf_binary+0x0/0x17ef [] ? load_elf_binary+0x0/0x17ef [] ? load_elf_binary+0x398/0x17ef [] ? need_resched+0x23/0x2d [] ? process_measurement+0xc0/0xd7 [] ? load_elf_binary+0x0/0x17ef [] ? search_binary_handler+0xc8/0x255 [] ? do_execve+0x1c3/0x29e [] ? sys_execve+0x43/0x5d [] ? __call_usermodehelper+0x0/0x6f [] ? kernel_execve+0x68/0xd0 [] ? __call_usermodehelper+0x0/0x6f [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? ____call_usermodehelper+0x113/0x11e [] ? child_rip+0xa/0x20 [] ? __call_usermodehelper+0x0/0x6f [] ? int_ret_from_sys_call+0x7/0x1b [] ? retint_restore_args+0x5/0x6 [] ? child_rip+0x0/0x20 Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8 RIP [] leave_mm+0x15/0x46 RSP ---[ end trace ce9cee6832a9c503 ]--- Tested-by: Maoxiaoyun Signed-off-by: Kevin Tian [v1: Fleshed out the git description a bit] Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/mmu.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 5e92b61..4fd7387 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1140,7 +1140,7 @@ static void drop_other_mm_ref(void *info) active_mm = percpu_read(cpu_tlbstate.active_mm); - if (active_mm == mm) + if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK) leave_mm(smp_processor_id()); /* If this cpu still has a stale cr3 reference, then make sure -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/