Date: Mon, 20 Dec 2010 02:57:15 -0500
Subject: BUG after md/raid10:md0: not enough operational mirrors.
From: Ilia Mirkin
To: dm-devel@redhat.com, linux-kernel@vger.kernel.org

Hello,

I've just upgraded to linux-2.6.36.2 on this machine. Right after the upgrade, I got an oops on boot which I was unable to capture. I'm guessing that it left the md state somewhat undefined, although I don't know what caused the initial oops. Anyway, on second boot:

[ 17.336794] md: Scanned 11 and added 11 devices.
[ 17.337050] md: autorun ...
[ 17.337298] md: considering sdj1 ...
[ 17.337552] md: adding sdj1 ...
[ 17.337800] md: adding sdg1 ...
[ 17.338047] md: adding sdi1 ...
[ 17.338295] md: adding sdh1 ...
[ 17.338548] md: adding sdf1 ...
[ 17.338797] md: adding sde1 ...
[ 17.339046] md: adding sdk1 ...
[ 17.339295] md: adding sdd1 ...
[ 17.339556] md: adding sda1 ...
[ 17.339808] md: adding sdb1 ...
[ 17.340058] md: adding sdc1 ...
[ 17.340305] md: created md0
[ 17.340547] md: bind
[ 17.340793] md: bind
[ 17.341037] md: bind
[ 17.341287] md: bind
[ 17.341543] md: bind
[ 17.341790] md: bind
[ 17.342036] md: bind
[ 17.342284] md: bind
[ 17.342534] md: bind
[ 17.342783] md: bind
[ 17.343031] md: bind
[ 17.343281] md: running:
[ 17.344151] md: kicking non-fresh sdj1 from array!
[ 17.344406] md: unbind
[ 17.348365] md: export_rdev(sdj1)
[ 17.348613] md: kicking non-fresh sdg1 from array!
[ 17.348852] md: unbind
[ 17.356343] md: export_rdev(sdg1)
[ 17.356589] md: kicking non-fresh sdi1 from array!
[ 17.356827] md: unbind
[ 17.364325] md: export_rdev(sdi1)
[ 17.364582] md: kicking non-fresh sdh1 from array!
[ 17.364831] md: unbind
[ 17.372308] md: export_rdev(sdh1)
[ 17.372563] md: kicking non-fresh sdf1 from array!
[ 17.372812] md: unbind
[ 17.380291] md: export_rdev(sdf1)
[ 17.380551] md: kicking non-fresh sde1 from array!
[ 17.380801] md: unbind
[ 17.388274] md: export_rdev(sde1)
[ 17.388522] md: kicking non-fresh sdk1 from array!
[ 17.388763] md: unbind
[ 17.396256] md: export_rdev(sdk1)
[ 17.397013] md/raid10:md0: not enough operational mirrors.
[ 17.397364] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[ 17.397882] IP: [] _raw_spin_lock_irq+0xa/0x1b
[ 17.398162] PGD 0
[ 17.398450] Oops: 0002 [#1] SMP
[ 17.398749] last sysfs file:
[ 17.398986] CPU 13
[ 17.399022] Modules linked in:
[ 17.399538]
[ 17.399771] Pid: 1519, comm: md0_raid10 Not tainted 2.6.36.2 #2 X8DT3/X8DT3
[ 17.400013] RIP: 0010:[] [] _raw_spin_lock_irq+0xa/0x1b
[ 17.400510] RSP: 0018:ffff88033d151cc0 EFLAGS: 00010082
[ 17.400750] RAX: 0000000000000100 RBX: 0000000000000000 RCX: 0000000000000000
[ 17.400995] RDX: ffff88033e381650 RSI: 0000000000000000 RDI: 0000000000000014
[ 17.401288] RBP: ffff88033d151cc0 R08: ffff88033d150000 R09: 0000000000000000
[ 17.401531] R10: ffffffff81a7d7f0 R11: ffff88033e355dc8 R12: ffff88033d700d80
[ 17.401774] R13: 0000000000000014 R14: 0000000000000000 R15: ffff88033d151e80
[ 17.402018] FS: 0000000000000000(0000) GS:ffff88034e340000(0000) knlGS:0000000000000000
[ 17.402478] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 17.402718] CR2: 0000000000000014 CR3: 0000000001a05000 CR4: 00000000000006e0
[ 17.402961] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 17.403252] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 17.403496] Process md0_raid10 (pid: 1519, threadinfo ffff88033d150000, task ffff88033e381650)
[ 17.403940] Stack:
[ 17.404195] ffff88033d151cf0 ffffffff81380efb ffff88033d151d10 0000000000000014
[ 17.404536] <0> ffff88033d700d80 ffff88033e381650 ffff88033d151e50 ffffffff81381900
[ 17.405141] <0> ffffffff81a1cd80 ffff88033e381650 ffff88033e381650 ffffffff814cb876
[ 17.406019] Call Trace:
[ 17.406279] [] flush_pending_writes+0x1c/0x8a
[ 17.406524] [] raid10d+0x69/0xe06
[ 17.406765] [] ? schedule+0x61e/0x66b
[ 17.407008] [] ? schedule_timeout+0x22/0xbf
[ 17.407299] [] ? finish_task_switch+0x3d/0xb0
[ 17.407547] [] md_thread+0xf8/0x116
[ 17.407791] [] ? autoremove_wake_function+0x0/0x38
[ 17.408033] [] ? md_thread+0x0/0x116
[ 17.408291] [] kthread+0x81/0x89
[ 17.408534] [] kernel_thread_helper+0x4/0x10
[ 17.408777] [] ? kthread+0x0/0x89
[ 17.409017] [] ? kernel_thread_helper+0x0/0x10
[ 17.409301] Code: eb f6 c9 c3 55 48 89 e5 9c 58 fa ba 00 01 00 00 f0 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c9 c3 55 48 89 e5 fa b8 00 01 00 00 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c9 c3 55 48 89 e5
[ 17.412131] RIP [] _raw_spin_lock_irq+0xa/0x1b
[ 17.417553] RSP
[ 17.417789] CR2: 0000000000000014
[ 17.418026] ---[ end trace 1dc7eeca43b701f8 ]---
[ 17.418302] md0_raid10 used greatest stack depth: 4632 bytes left
[ 17.418338] md: pers->run() failed ...
[ 17.418342] md: do_md_run() returned -5
[ 17.418344] md: md0 still in use.
[ 17.418346] md: ... autorun DONE.

Shortly followed by:

[ 18.572342] udev: starting version 149
[ 18.572396] udevd (1612): /proc/1612/oom_adj is deprecated, please use /proc/1612/oom_score_adj instead.
[ 18.615329] BUG: unable to handle kernel paging request at 00000000000ffeb6
[ 18.615645] IP: [] __wake_up_common+0x29/0x76
[ 18.615923] PGD 73cd4a067 PUD 73cd4b067 PMD 0
[ 18.616264] Oops: 0000 [#2] SMP
[ 18.616570] last sysfs file: /sys/devices/pci0000:00/0000:00:1a.2/usb5/5-2/5-2:1.0/input/input2/name
[ 18.617020] CPU 2
[ 18.617057] Modules linked in:
[ 18.617558]
[ 18.617796] Pid: 1635, comm: udevd Tainted: G D 2.6.36.2 #2 X8DT3/X8DT3
[ 18.618242] RIP: 0010:[] [] __wake_up_common+0x29/0x76
[ 18.618719] RSP: 0018:ffff88073cd69de8 EFLAGS: 00010096
[ 18.618958] RAX: 00000000000ffeb6 RBX: ffff88033d700d90 RCX: 0000000000000000
[ 18.619202] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88033d700d90
[ 18.619447] RBP: ffff88073cd69e18 R08: 00000000000ffe9e R09: 000000000000000a
[ 18.619691] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 18.619934] R13: 0000000000000001 R14: ffff88033d700d98 R15: 0000000000000000
[ 18.620177] FS: 00007fcae5aba6f0(0000) GS:ffff880001c80000(0000) knlGS:0000000000000000
[ 18.620625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.620886] CR2: 00000000000ffeb6 CR3: 000000073cd49000 CR4: 00000000000006e0
[ 18.621146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.621410] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 18.621761] Process udevd (pid: 1635, threadinfo ffff88073cd68000, task ffff88073cd30770)
[ 18.622238] Stack:
[ 18.622487] 0000000300000000 ffff88033d700d90 0000000000000000 0000000000000001
[ 18.622928] <0> 0000000000000296 0000000000000003 ffff88073cd69e58 ffffffff8102d080
[ 18.623596] <0> ffff88073cd69ee8 ffff88033d10b400 0000000000000000 ffffffff81a44410
[ 18.624514] Call Trace:
[ 18.624816] [] __wake_up+0x38/0x50
[ 18.625075] [] md_wakeup_thread+0x27/0x29
[ 18.625319] [] mddev_unlock+0xa6/0xab
[ 18.625605] [] md_attr_show+0x4c/0x58
[ 18.625867] [] sysfs_read_file+0xb2/0x131
[ 18.626121] [] vfs_read+0xa8/0x100
[ 18.626371] [] sys_read+0x47/0x70
[ 18.626666] [] system_call_fastpath+0x16/0x1b
[ 18.626928] Code: c9 c3 55 48 89 e5 41 57 4d 89 c7 41 56 4c 8d 77 08 41 55 41 54 41 89 d4 53 48 83 ec 08 89 75 d4 89 4d d0 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 28 4c 89 f9 8b 55 d0 8b 75
[ 18.629848] RIP [] __wake_up_common+0x29/0x76
[ 18.630139] RSP
[ 18.630386] CR2: 00000000000ffeb6
[ 18.630680] ---[ end trace 1dc7eeca43b701f9 ]---

And then a watchdog:

[ 230.799819] ------------[ cut here ]------------
[ 230.800081] WARNING: at kernel/watchdog.c:240 watchdog_overflow_callback+0xa9/0xbb()
[ 230.805856] Hardware name: X8DT3
[ 230.806106] Watchdog detected hard LOCKUP on cpu 1
[ 230.806147] Modules linked in: cifs kvm_intel kvm iTCO_wdt iTCO_vendor_support i2c_i801
[ 230.807114] Pid: 2594, comm: udevd Tainted: G D 2.6.36.2 #2
[ 230.807358] Call Trace:
[ 230.807636] [] ? watchdog_overflow_callback+0xa9/0xbb
[ 230.808134] [] warn_slowpath_common+0x80/0x99
[ 230.808382] [] warn_slowpath_fmt+0x69/0x6b
[ 230.808679] [] watchdog_overflow_callback+0xa9/0xbb
[ 230.808938] [] __perf_event_overflow+0x189/0x1fc
[ 230.809195] [] perf_event_overflow+0x14/0x16
[ 230.809448] [] intel_pmu_handle_irq+0x385/0x3ee
[ 230.809744] [] perf_event_nmi_handler+0x6f/0xcf
[ 230.810001] [] notifier_call_chain+0x33/0x5b
[ 230.810249] [] atomic_notifier_call_chain+0x13/0x15
[ 230.810499] [] notify_die+0x2e/0x30
[ 230.810789] [] do_nmi+0x91/0x261
[ 230.811042] [] nmi+0x1a/0x20
[ 230.811290] [] ? _raw_spin_lock_irqsave+0x17/0x1d
[ 230.811576] <> [] __wake_up+0x22/0x50
[ 230.811875] [] md_wakeup_thread+0x27/0x29
[ 230.812125] [] mddev_unlock+0xa6/0xab
[ 230.812373] [] md_attr_show+0x4c/0x58
[ 230.812663] [] sysfs_read_file+0xb2/0x131
[ 230.812919] [] vfs_read+0xa8/0x100
[ 230.813168] [] sys_read+0x47/0x70
[ 230.813415] [] system_call_fastpath+0x16/0x1b
[ 230.813706] ---[ end trace 1dc7eeca43b701fa ]---

I will be throwing out and rebuilding this RAID shortly (it's used for swap, so there's no real data on it anyway), but I thought it would be good to report this. Let me know if I can provide any further details about this system.

-- 
Ilia Mirkin
imirkin@alum.mit.edu