Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753021AbaGHBxK (ORCPT ); Mon, 7 Jul 2014 21:53:10 -0400
Received: from p02c12o142.mxlogic.net ([208.65.145.75]:35921 "EHLO p02c12o142.mxlogic.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751656AbaGHBxH (ORCPT ); Mon, 7 Jul 2014 21:53:07 -0400
Date: Mon, 7 Jul 2014 21:52:54 -0400
From: Joe Lawrence
X-X-Sender: jlaw@jlaw-desktop.mno.stratus.com
To: Joe Julian
CC: Joe Lawrence ,
Subject: Re: mpt2sas stuck installing
In-Reply-To: <53B649A6.60501@julianfamily.org>
References: <20140704013250.088477de@jlaw-desktop.mno.stratus.com> <53B649A6.60501@julianfamily.org>
User-Agent: Alpine 2.02 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
X-Originating-IP: [134.111.199.152]
X-SOURCE-IP: [134.111.1.18]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 4 Jul 2014, Joe Julian wrote:

> On 07/03/2014 10:32 PM, Joe Lawrence wrote:
> > On Thu, Jul 3 2014 Joe Julian wrote:
> > > I have a knox enclosure with an unresponsive drive. When the mpt2sas
> > > module is loaded the module loading process hangs. modprobe/insmod is
> > > stuck and any further attempts to load modules also hang. By
> > > blacklisting the module and loading it last, I can get the computer to
> > > boot, but attempting to manually load the module will still hang. When I
> > > shut down, I get the following:
> > >
> > > [55473.508343] mpt2sas1: _config_request: timeout
> > > [55474.510395] BUG: unable to handle kernel paging request at
> > > ffffc90020ae0000
> > > [55474.513048] IP: []
> > > mpt2sas_base_get_iocstate+0x10/0x30 [mpt2sas]
> > > [55474.525196] PGD 103f80c067 PUD 203f003067 PMD 1026dca067 PTE 0
> > > [55474.526115] Oops: 0000 [#1] SMP
> > > [55474.527837] Modules linked in: raid456 async_pq async_xor xor
> > > async_memcpy async_raid6_recov raid6_pq async_tx ses enclosure mpt2sas
> > > raid_class rdma_ucm ib_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_uverbs
> > > ib_umad mlx4_en mlx4_ib(-) ib_sa ib_mad ib_core coretemp mlx4_core
> > > kvm_intel kvm 8021q garp stp ghash_clmulni_intel llc aesni_intel
> > > ablk_helper cryptd nfsd lrw aes_x86_64 xts psmouse gf128mul sb_edac
> > > nfs_acl auth_rpcgss edac_core mei microcode serio_raw mac_hid lp lpc_ich
> > > nfs fscache parport lockd sunrpc ext2 isci libsas ahci libahci e1000e
> > > scsi_transport_sas [last unloaded: mlx4_core]
> > > [55474.538831] CPU 2
> > > [55474.539218] Pid: 3516, comm: scsi_eh_10 Not tainted 3.8.0-38-generic
> > > #56~precise1-Ubuntu Quanta F03R /Winterfell
> > > [55474.541004] RIP: 0010:[] []
> > > mpt2sas_base_get_iocstate+0x10/0x30 [mpt2sas]
> > > [55474.542772] RSP: 0018:ffff881019a39ae8 EFLAGS: 00010246
> > > [55474.543590] RAX: ffffc90020ae0000 RBX: ffff88100fd1e6b0 RCX:
> > > 0000000000000000
> > > [55474.546285] RDX: ffff881019a39fd8 RSI: 0000000000000001 RDI:
> > > ffff88100fd1e6b0
> > > [55474.548451] RBP: ffff881019a39ae8 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [55474.549585] R10: 00000000000007db R11: 00000000000007da R12:
> > > 0000000000000001
> > > [55474.550689] R13: ffff881019a39bbc R14: 000000000000ffff R15:
> > > ffff881019a39c80
> > > [55474.551791] FS: 0000000000000000(0000) GS:ffff88103fc40000(0000)
> > > knlGS:0000000000000000
> > > [55474.553044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [55474.553928] CR2: ffffc90020ae0000 CR3: 0000000001c0d000 CR4:
> > > 00000000000407e0
> > > [55474.555187] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [55474.557030] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > 0000000000000400
> > > [55474.558139] Process scsi_eh_10 (pid: 3516, threadinfo ffff881019a38000,
> > > task ffff8810257b5d00)
> > > [55474.560748] Stack:
> > > [55474.561069] ffff881019a39b98 ffffffffa03c183a 0000000000000006
> > > 0000000000000006
> > > [55474.562218] ffff88100fd1eb00 ffff88100fd1eaf8 ffff88100fd1e6d0
> > > 0000000000000000
> > > [55474.563343] ffff881019a30000 ffffffff8105b3ea ffff88100fd1ead8
> > > 0000000000000000
> > > [55474.564595] Call Trace:
> > > [55474.565003] []
> > > _config_request.constprop.5+0x15a/0x590 [mpt2sas]
> > > [55474.568223] [] ? console_unlock+0x1a/0x30
> > > [55474.569896] []
> > > mpt2sas_config_get_expander_pg0+0x8a/0xf0 [mpt2sas]
> > > [55474.571322] []
> > > _scsih_search_responding_expanders+0x5c/0xe0 [mpt2sas]
> > > [55474.572582] [] ?
> > > _scsih_search_responding_sas_devices+0xa9/0xc0 [mpt2sas]
> > > [55474.573912] []
> > > mpt2sas_scsih_reset_handler+0xbe/0x1a0 [mpt2sas]
> > > [55474.575191] [] _base_reset_handler+0x1f/0x40
> > > [mpt2sas]
> > > [55474.576250] []
> > > mpt2sas_base_hard_reset_handler+0x1ae/0x1e0 [mpt2sas]
> > > [55474.577500] [] _scsih_host_reset+0x5c/0xb0 [mpt2sas]
> > > [55474.578554] [] scsi_try_host_reset+0x53/0x110
> > > [55474.579729] [] scsi_eh_host_reset+0x4c/0x170
> > > [55474.580764] [] scsi_eh_ready_devs+0x82/0xa0
> > > [55474.581866] [] scsi_unjam_host+0xed/0x1d0
> > > [55474.584848] [] scsi_error_handler+0x165/0x1c0
> > > [55474.585984] [] ? scsi_unjam_host+0x1d0/0x1d0
> > > [55474.592375] [] kthread+0xc0/0xd0
> > > [55474.594325] [] ? flush_kthread_worker+0xb0/0xb0
> > > [55474.595654] [] ret_from_fork+0x7c/0xb0
> > > [55474.598729] [] ? flush_kthread_worker+0xb0/0xb0
> > > [55474.607501] Code: c7 c2 f8 7d 3d a0 48 c7 c7 52 ac 3d a0 31 c0 e8 f1 de
> > > 31 e1 e9 f6 fe ff ff 66 90 66 66 66 66 90 55 48 8b 87 88 00 00 00 48 89 e5
> > > <8b> 00 89 c2 81 e2 00 00 00 f0 85 f6 0f 45 c2 5d c3 66 66 66 66
> > > [55474.611823] RIP []
> > > mpt2sas_base_get_iocstate+0x10/0x30 [mpt2sas]
> > > [55474.613007] RSP
> > > [55474.613548] CR2: ffffc90020ae0000
> > > [55474.614183] ---[ end trace a817d8e30eb9f07c ]---
> > Hi Joe,
> >
> > I was investigating a crash inside mpt2sas_base_get_iocstate just
> > earlier today. In my case, it appeared that ioc->chip had been cleared
> > when mpt2sas_base_get_iocstate tried to reference through it. This was
> > with a newer kernel on RHEL7, but it also occurred early in
> > mpt2sas_base_get_iocstate and EAX held the bogo address.
> >
> > A few follow up questions:
> >
> > Do you happen to have kdump enabled?
> > Were there any other interesting log messages after loading the driver?
> > Is this crash easily reproducible?
> >
> > Regards,
> >
> > -- Joe
> This was a production server, so no kdump enabled. There's no relevant log
> messages.
>
> I do have a staging environment I can test in, and yes, I think I can easily
> repro this.

Any chance of posting the rest of the kernel log? (Or at least those lines
containing "mpt2sas"?)

Thanks,

-- Joe