Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753820Ab2E3Qup (ORCPT ); Wed, 30 May 2012 12:50:45 -0400
Received: from s15943758.onlinehome-server.info ([217.160.130.188]:37094 "EHLO
	mail.x86-64.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753375Ab2E3Qul (ORCPT ); Wed, 30 May 2012 12:50:41 -0400
Date: Wed, 30 May 2012 18:51:09 +0200
From: Borislav Petkov
To: Dan Williams
Cc: Alan Stern, James Bottomley, LKML, linux-scsi@vger.kernel.org
Subject: "[SCSI] sd: limit the scope of the async probe domain" breaks booting here
Message-ID: <20120530165109.GF4771@aftab.osrc.amd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 10752
Lines: 190

Dudes,

so I've been testing latest linus
(731a7378b81c2f5fa88ca1ae20b83d548d5613dc) here and my box fails booting
because it can't find the root partition, see message below. I did a
bisect run (also below) and it pointed me to the first bad commit (see
below too). Reverting the commit in question fixes booting.

Let me know what other info you'd need.

Thanks.

* bisect
========

git bisect start
# bad: [731a7378b81c2f5fa88ca1ae20b83d548d5613dc] Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 731a7378b81c2f5fa88ca1ae20b83d548d5613dc
# good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
# bad: [fb09bafda67041b74a668dc9d77735e36bd33d3b] Merge tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad fb09bafda67041b74a668dc9d77735e36bd33d3b
# bad: [da4f58ffa08a7b7012fab9c205fa0f6ba40fec42] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad da4f58ffa08a7b7012fab9c205fa0f6ba40fec42
# good: [9a00be04e66cc025ab4558d34620615d5c4de5b6] iwlwifi: add BT reduced tx power flag
git bisect good 9a00be04e66cc025ab4558d34620615d5c4de5b6
# good: [ff8ce5f67ddca709fe59e6173f89260f0fdc2b22] Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm
git bisect good ff8ce5f67ddca709fe59e6173f89260f0fdc2b22
# good: [ac1806572df55b6125ad9d117906820dacfa3145] Merge tag 'regulator-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
git bisect good ac1806572df55b6125ad9d117906820dacfa3145
# bad: [76b311fdbdd2e16e5d39cd496a67aa1a1b948914] [SCSI] lpfc 8.3.31: Update lpfc to version 8.3.31
git bisect bad 76b311fdbdd2e16e5d39cd496a67aa1a1b948914
# good: [949e71f17d9a5c59fa7b02cce3b548384bff1c92] [SCSI] fcoe: Don't hold rtnl_mutex in fcoe_update_src_mac
git bisect good 949e71f17d9a5c59fa7b02cce3b548384bff1c92
# bad: [794c10fa0fa4d1781c5651c31e3d4d0b71629128] [SCSI] sg: remove while (1) non-loop
git bisect bad 794c10fa0fa4d1781c5651c31e3d4d0b71629128
# good: [852af20aa64ef34ab07de978c676e1e8860dca2e] [SCSI] hpsa: retry driver initiated commands on busy status
git bisect good 852af20aa64ef34ab07de978c676e1e8860dca2e
# good: [e16a33adc0e59aa96a483fd2923d77e674f013c1] [SCSI] hpsa: refine interrupt handler locking for greater concurrency
git bisect good e16a33adc0e59aa96a483fd2923d77e674f013c1
# good: [21334ea9086c31db38e76152a1e31001a0ed288a] [SCSI] hpsa: removed unused member maxQsinceinit
git bisect good 21334ea9086c31db38e76152a1e31001a0ed288a
# bad: [a7a20d103994fd760766e6c9d494daa569cbfe06] [SCSI] sd: limit the scope of the async probe domain
git bisect bad a7a20d103994fd760766e6c9d494daa569cbfe06
# good: [e85c59746957fd6e3595d02cf614370056b5816e] [SCSI] hpsa: dial down lockup detection during firmware flash
git bisect good e85c59746957fd6e3595d02cf614370056b5816e

* first bad commit
==================

commit a7a20d103994fd760766e6c9d494daa569cbfe06
Author: Dan Williams
Date:   Thu Mar 22 17:05:11 2012 -0700

    [SCSI] sd: limit the scope of the async probe domain

    sd injects and synchronizes probe work on the global kernel-wide domain.
    This runs into conflict with PM that wants to perform resume actions in
    async context:

    [ 494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
    [ 494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 494.360809] kworker/u:3 D 0000000000000000 0 554 2 0x00000000
    [ 494.420739] ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
    [ 494.484392] ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
    [ 494.548038] ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
    [ 494.611685] Call Trace:
    [ 494.632649] [] schedule+0x5a/0x5c
    [ 494.674687] [] async_synchronize_cookie_domain+0xb6/0x112
    [ 494.734177] [] ? __init_waitqueue_head+0x50/0x50
    [ 494.787134] [] ? scsi_remove_target+0x48/0x48
    [ 494.837900] [] async_synchronize_cookie+0x15/0x17
    [ 494.891567] [] async_synchronize_full+0x54/0x70 <-- here we wait for async contexts to complete
    [ 494.943783] [] ? async_synchronize_full_domain+0x1a/0x1a
    [ 495.002547] [] sd_remove+0x2c/0xa2 [sd_mod]
    [ 495.051861] [] __device_release_driver+0x86/0xcf
    [ 495.104807] [] device_release_driver+0x25/0x32 <-- here we take device_lock()

    [ 853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
    [ 853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 853.635119] kworker/u:4 D ffff88013097b5d0 0 549 2 0x00000000
    [ 853.695129] ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
    [ 853.758990] ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
    [ 853.822796] 0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
    [ 853.886633] Call Trace:
    [ 853.907631] [] schedule+0x5a/0x5c
    [ 853.949670] [] __mutex_lock_common+0x220/0x351
    [ 854.001225] [] ? device_resume+0x58/0x1c4
    [ 854.049082] [] ? device_resume+0x58/0x1c4
    [ 854.097011] [] mutex_lock_nested+0x2f/0x36 <-- here we wait for device_lock()
    [ 854.145591] [] device_resume+0x58/0x1c4
    [ 854.192066] [] async_resume+0x1e/0x45
    [ 854.237019] [] async_run_entry_fn+0xc6/0x173 <-- ...while running in async context

    Provide a 'scsi_sd_probe_domain' so that async probe actions actions can be
    flushed without regard for the state of PM, and allow for the resume path
    to handle devices that have transitioned from SDEV_QUIESCE to SDEV_DEL
    prior to resume.

    Acked-by: Alan Stern
    [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
    Signed-off-by: Dan Williams
    [jejb: remove unneeded config guards in include file]
    Signed-off-by: James Bottomley

* Error msg:
============

[ 4.582698] ata4.00: configured for UDMA/133
[ 4.587609] scsi 3:0:0:0: Direct-Access ATA WDC WD5001AALS-0 01.0 PQ: 0 ANSI: 5
[ 4.597471] sd 3:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 4.597750] sd 3:0:0:0: Attached scsi generic sg0 type 0
[ 4.599666] scsi 4:0:1:0: CD-ROM Optiarc DVD RW AD-7240S 1.01 PQ: 0 ANSI: 5
[ 4.602711] sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
[ 4.602714] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 4.603366] sr 4:0:1:0: Attached scsi CD-ROM sr0
[ 4.603922] sr 4:0:1:0: Attached scsi generic sg1 type 5
[ 4.604481] VFS: Cannot open root device "sda2" or unknown-block(0,0): error -6
[ 4.604484] Please append a correct "root=" boot option; here are the available partitions:
[ 4.604501] 0b00 1048575 sr0 driver: sr
[ 4.604506] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 4.604511] Pid: 1, comm: swapper/0 Tainted: G W 3.4.0+ #2
[ 4.604512] Call Trace:
[ 4.604526] [] panic+0xbd/0x1c4
[ 4.604533] [] ? printk+0x4d/0x4f
[ 4.604540] [] mount_block_root+0x251/0x26f
[ 4.604545] [] mount_root+0x56/0x5a
[ 4.604550] [] prepare_namespace+0x160/0x18d
[ 4.604554] [] kernel_init+0x1eb/0x1fd
[ 4.604560] [] ? loglevel+0x31/0x31
[ 4.604567] [] kernel_thread_helper+0x4/0x10
[ 4.604573] [] ? retint_restore_args+0xe/0xe
[ 4.604577] [] ? start_kernel+0x2ee/0x2ee
[ 4.604582] [] ? gs_change+0xb/0xb
[ 4.605421] ------------[ cut here ]------------
[ 4.605428] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x2a/0x56()
[ 4.605430] Hardware name: Dinar
[ 4.605432] Modules linked in:
[ 4.605436] Pid: 1, comm: swapper/0 Tainted: G W 3.4.0+ #2
[ 4.605437] Call Trace:
[ 4.605446] [] warn_slowpath_common+0x85/0x9d
[ 4.605451] [] warn_slowpath_null+0x1a/0x1c
[ 4.605456] [] native_smp_send_reschedule+0x2a/0x56
[ 4.605463] [] trigger_load_balance+0x1ed/0x21a
[ 4.605467] [] scheduler_tick+0xe9/0xf2
[ 4.605472] [] update_process_times+0x67/0x77
[ 4.605477] [] tick_sched_timer+0x72/0x91
[ 4.605481] [] __run_hrtimer+0xc3/0x17f
[ 4.605486] [] ? tick_nohz_handler+0xd1/0xd1
[ 4.605490] [] hrtimer_interrupt+0xd4/0x197
[ 4.605497] [] smp_apic_timer_interrupt+0x86/0x99
[ 4.605501] [] apic_timer_interrupt+0x6c/0x80
[ 4.605510] [] ? delay_tsc+0x23/0x50
[ 4.605515] [] __delay+0xf/0x11
[ 4.605520] [] __const_udelay+0x29/0x2b
[ 4.605525] [] native_stop_other_cpus+0x78/0x13d
[ 4.605530] [] panic+0xcc/0x1c4
[ 4.605535] [] ? printk+0x4d/0x4f
[ 4.605540] [] mount_block_root+0x251/0x26f
[ 4.605544] [] mount_root+0x56/0x5a
[ 4.605548] [] prepare_namespace+0x160/0x18d
[ 4.605552] [] kernel_init+0x1eb/0x1fd
[ 4.605557] [] ? loglevel+0x31/0x31
[ 4.605562] [] kernel_thread_helper+0x4/0x10
[ 4.605566] [] ? retint_restore_args+0xe/0xe
[ 4.605570] [] ? start_kernel+0x2ee/0x2ee
[ 4.605574] [] ? gs_change+0xb/0xb
[ 4.605577] ---[ end trace 4eaa2a86a8e2da24 ]---

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632
WEEE Registernr: 129 19551
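
Note on the quoted commit message: the lock/wait interaction it describes can be sketched with a small toy model. This is an illustration in Python, not kernel code. The class AsyncModel, its methods, and the "global" domain label are hypothetical names made up for the sketch; device_lock, scsi_sd_probe_domain, sd_remove and async_resume are taken from the commit message, and "sd_probe_async" is used here only as a label for the sd probe work. The sketch covers only the hang the commit set out to fix, and assumes a waiter hangs whenever some work it waits for needs a lock the waiter already holds.

```python
from collections import defaultdict


class AsyncModel:
    """Toy bookkeeping of pending async work per domain (hypothetical, not kernel code)."""

    def __init__(self):
        self.pending = defaultdict(set)  # domain name -> names of pending work items
        self.blocked_on = {}             # work item name -> lock it waits for, or None

    def schedule(self, work, domain, blocked_on=None):
        """Record a pending work item in a domain, optionally blocked on a lock."""
        self.pending[domain].add(work)
        self.blocked_on[work] = blocked_on

    def would_hang(self, held_lock, domains):
        """True if a waiter holding `held_lock` waits for work that needs that same lock."""
        return any(
            self.blocked_on[work] == held_lock
            for domain in domains
            for work in self.pending[domain]
        )


# Before the commit: sd probe work and PM resume work share one global domain.
before = AsyncModel()
before.schedule("sd_probe_async", "global")
before.schedule("async_resume", "global", blocked_on="device_lock")
# sd_remove runs with device_lock held and waits for the whole global domain.
hang_before = before.would_hang("device_lock", ["global"])

# After the commit: sd probe work is queued on its own domain.
after = AsyncModel()
after.schedule("sd_probe_async", "scsi_sd_probe_domain")
after.schedule("async_resume", "global", blocked_on="device_lock")
# sd_remove only flushes the sd probe domain, so the resume work is not waited on.
hang_after = after.would_hang("device_lock", ["scsi_sd_probe_domain"])

print(hang_before, hang_after)  # prints: True False
```

In this model the first case corresponds to the two hung-task traces in the commit message (one task waiting for all async work, the other waiting for device_lock), and the second case to flushing only scsi_sd_probe_domain.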