Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1760722Ab2EWSFW (ORCPT );
        Wed, 23 May 2012 14:05:22 -0400
Received: from shards.monkeyblade.net ([198.137.202.13]:60642 "EHLO
        shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752483Ab2EWSFU (ORCPT );
        Wed, 23 May 2012 14:05:20 -0400
Date: Wed, 23 May 2012 14:04:51 -0400 (EDT)
Message-Id: <20120523.140451.386112705611304887.davem@davemloft.net>
To: mroos@linux.ee
Cc: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
        dan.j.williams@intel.com, stern@rowland.harvard.edu,
        JBottomley@Parallels.com
Subject: Re: 3.4.0-02580-g72c04af regression on sparc64 - partitions not recognized
From: David Miller
In-Reply-To:
References: <20120522.151217.278388169416093561.davem@davemloft.net>
X-Mailer: Mew version 6.4 on Emacs 23.3 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
        (shards.monkeyblade.net [198.137.202.13]);
        Wed, 23 May 2012 11:04:54 -0700 (PDT)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4832
Lines: 83

From: Meelis Roos
Date: Wed, 23 May 2012 19:46:46 +0300 (EEST)

CC:'ing interested parties.

>> > Just tested 3.4.0-02580-g72c04af on about 10 machines. While most of
>> > them work (including 3 different sparc64 machines with real scsi disks),
>> > Sun Netra X1 with pata_ali and IDE disk consistently fails to boot. sda
>> > is recognized but no partitions. 3.3.0 works fine, as did something
>> > around 3.4-rc7 (plain 3.4 not tested yet). No other IDE machines tested
>> > yet since I have none with remote console at the moment.
>>
>> If 3.4.0-final is OK, start bisecting from v3.4.0 until 72c04af. One
>> possibility could be the sparc64 NOBOOTMEM conversion that went into
>> the merge window.
>
> Bisecting leads to this commit:
>
> a7a20d103994fd760766e6c9d494daa569cbfe06 is the first bad commit
> commit a7a20d103994fd760766e6c9d494daa569cbfe06
> Author: Dan Williams
> Date: Thu Mar 22 17:05:11 2012 -0700
>
>     [SCSI] sd: limit the scope of the async probe domain
>
>     sd injects and synchronizes probe work on the global kernel-wide domain.
>     This runs into conflict with PM that wants to perform resume actions in
>     async context:
>
>     [ 494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
>     [ 494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     [ 494.360809] kworker/u:3 D 0000000000000000 0 554 2 0x00000000
>     [ 494.420739] ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
>     [ 494.484392] ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
>     [ 494.548038] ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
>     [ 494.611685] Call Trace:
>     [ 494.632649] [] schedule+0x5a/0x5c
>     [ 494.674687] [] async_synchronize_cookie_domain+0xb6/0x112
>     [ 494.734177] [] ? __init_waitqueue_head+0x50/0x50
>     [ 494.787134] [] ? scsi_remove_target+0x48/0x48
>     [ 494.837900] [] async_synchronize_cookie+0x15/0x17
>     [ 494.891567] [] async_synchronize_full+0x54/0x70 <-- here we wait for async contexts to complete
>     [ 494.943783] [] ? async_synchronize_full_domain+0x1a/0x1a
>     [ 495.002547] [] sd_remove+0x2c/0xa2 [sd_mod]
>     [ 495.051861] [] __device_release_driver+0x86/0xcf
>     [ 495.104807] [] device_release_driver+0x25/0x32 <-- here we take device_lock()
>
>     [ 853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
>     [ 853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     [ 853.635119] kworker/u:4 D ffff88013097b5d0 0 549 2 0x00000000
>     [ 853.695129] ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
>     [ 853.758990] ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
>     [ 853.822796] 0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
>     [ 853.886633] Call Trace:
>     [ 853.907631] [] schedule+0x5a/0x5c
>     [ 853.949670] [] __mutex_lock_common+0x220/0x351
>     [ 854.001225] [] ? device_resume+0x58/0x1c4
>     [ 854.049082] [] ? device_resume+0x58/0x1c4
>     [ 854.097011] [] mutex_lock_nested+0x2f/0x36 <-- here we wait for device_lock()
>     [ 854.145591] [] device_resume+0x58/0x1c4
>     [ 854.192066] [] async_resume+0x1e/0x45
>     [ 854.237019] [] async_run_entry_fn+0xc6/0x173 <-- ...while running in async context
>
>     Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
>     be flushed without regard for the state of PM, and allow for the resume
>     path to handle devices that have transitioned from SDEV_QUIESCE to
>     SDEV_DEL prior to resume.
>
>     Acked-by: Alan Stern
>     [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
>     Signed-off-by: Dan Williams
>     [jejb: remove unneeded config guards in include file]
>     Signed-off-by: James Bottomley
>
> :040000 040000 4e59ccb852f261f97701a245e637a690dfce9d20 fc73ca0da1288a7f30b81a8593ddad2146d7bfb5 M drivers
>
> --
> Meelis Roos (mroos@linux.ee)
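
The lock ordering problem described in the quoted commit message can be modelled in user space. What follows is a toy Python sketch, not kernel code: `AsyncDomain`, `simulate`, and the thread pool are hypothetical stand-ins for the kernel's async domains, the sd_remove() flush, and the async workers. It assumes probe work finishes immediately and uses a timeout in place of a real hang.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncDomain:
    """Toy stand-in for an async domain: counts work items still pending."""

    def __init__(self, name):
        self.name = name
        self._pending = 0
        self._cond = threading.Condition()

    def schedule(self, pool, fn):
        """Queue fn on the pool and account for it in this domain."""
        with self._cond:
            self._pending += 1

        def run():
            try:
                fn()
            finally:
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return pool.submit(run)

    def synchronize(self, timeout):
        """Wait until the domain is empty; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


def simulate(probe_domain, resume_domain, timeout=0.5):
    """Return True if the remove path can flush probe work while holding the lock."""
    device_lock = threading.Lock()
    resume_started = threading.Event()

    def probe():
        pass  # assumption: probe work completes at once

    def resume():
        resume_started.set()
        with device_lock:  # models device_resume() waiting for device_lock()
            pass

    with ThreadPoolExecutor(max_workers=4) as pool:
        with device_lock:  # models device_release_driver() holding device_lock()
            probe_domain.schedule(pool, probe)
            resume_domain.schedule(pool, resume)
            resume_started.wait()
            # models sd_remove() flushing async probe work
            flushed = probe_domain.synchronize(timeout)
    return flushed


if __name__ == "__main__":
    shared = AsyncDomain("global")
    print("shared domain flushed:", simulate(shared, shared))
    print("private domain flushed:",
          simulate(AsyncDomain("scsi_sd_probe_domain"), AsyncDomain("global")))
```

With a single shared domain the flush times out, because the pending resume work is blocked on the lock held by the flusher. With a dedicated probe domain the flush waits only for probe work and completes, which is the effect the commit message attributes to 'scsi_sd_probe_domain'.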