From: Bart Van Assche
To: "linuxppc-dev@lists.ozlabs.org", "abdhalee@linux.vnet.ibm.com", "brking@linux.vnet.ibm.com"
CC: "linux-kernel@vger.kernel.org", "hch@lst.de", "linux-scsi@vger.kernel.org", "sfr@canb.auug.org.au", "linux-next@vger.kernel.org", "hare@suse.com", "sachinp@linux.vnet.ibm.com", "mpe@ellerman.id.au"
Subject: Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
Date: Thu, 17 Aug 2017 15:52:45 +0000
Message-ID: <1502985161.2615.8.camel@wdc.com>
References: <1502902815.3305.22.camel@abdul.in.ibm.com> <1502904072.2421.3.camel@wdc.com> <2f686064-3e32-df8d-134f-962b5181da9d@linux.vnet.ibm.com>
In-Reply-To: <2f686064-3e32-df8d-134f-962b5181da9d@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
> > On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
> > > As of next-20170809, linux-next on powerpc boot hung with below trace
> > > message.
> > >
> > > [ ... ]
> > >
> > > A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
> > > Always unprepare ...) in the merge branch 'scsi/for-next'
> > >
> > > System booted fine when the below commit is reverted:
> > >
> > > commit 270065e92c317845d69095ec8e3d18616b5b39d5
> > > Author: Bart Van Assche
> > > Date:   Thu Aug 3 14:40:14 2017 -0700
> > >
> > >     scsi: scsi-mq: Always unprepare before requeuing a request
> >
> > Hello Brian and Michael,
> >
> > Do you agree that this probably indicates a bug in the PowerPC block driver
> > that is used to access the boot disk? Anyway, since a solution is not yet
> > available, I will submit a revert for this patch.
>
> I've been looking at this a bit, and can recreate the issue, but haven't
> got to root cause of the issue as of yet. If I do a sysrq-w while the system is hung
> during boot I see this:
>
> [   25.561523] Workqueue: events_unbound async_run_entry_fn
> [   25.561527] Call Trace:
> [   25.561529] [c0000001697873f0] [c000000169701600] 0xc000000169701600 (unreliable)
> [   25.561534] [c0000001697875c0] [c00000000001ab78] __switch_to+0x2e8/0x430
> [   25.561539] [c000000169787620] [c00000000091ccb0] __schedule+0x310/0xa00
> [   25.561543] [c0000001697876f0] [c00000000091d3e0] schedule+0x40/0xb0
> [   25.561548] [c000000169787720] [c000000000921e40] schedule_timeout+0x200/0x430
> [   25.561553] [c000000169787810] [c00000000091db10] io_schedule_timeout+0x30/0x70
> [   25.561558] [c000000169787840] [c00000000091e978] wait_for_common_io.constprop.3+0x178/0x280
> [   25.561563] [c0000001697878c0] [c00000000047f7ec] blk_execute_rq+0x7c/0xd0
> [   25.561567] [c000000169787910] [c000000000614cd0] scsi_execute+0x100/0x230
> [   25.561572] [c000000169787990] [c00000000060d29c] scsi_report_opcode+0xbc/0x170
> [   25.561577] [c000000169787a50] [d000000004fe6404] sd_revalidate_disk+0xe04/0x1620 [sd_mod]
> [   25.561583] [c000000169787b80] [d000000004fe6d84] sd_probe_async+0xb4/0x230 [sd_mod]
> [   25.561588] [c000000169787c00] [c00000000010fc44] async_run_entry_fn+0x74/0x210
> [   25.561593] [c000000169787c90] [c000000000102f48] process_one_work+0x198/0x480
> [   25.561598] [c000000169787d30] [c0000000001032b8] worker_thread+0x88/0x510
> [   25.561603] [c000000169787dc0] [c00000000010b030] kthread+0x160/0x1a0
> [   25.561608] [c000000169787e30] [c00000000000b3a4] ret_from_kernel_thread+0x5c/0xb8
>
> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID arrays don't support
> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting sdev->no_report_opcodes = 1
> in ipr's slave configure. This seems to eliminate the boot hang for me, but is only working around
> the issue. Since this command is not supported by ipr, it should return with an illegal request.
> When I'm hung at this point, there is nothing outstanding to the adapter / driver. I'll continue
> debugging...

(+linux-scsi)

Hello Brian,

Is kernel debugging enabled on your test system? Is lockdep enabled? Anyway,
stack traces like the above usually mean that a request got stuck in a block
or scsi driver (ipr in this case). Information about pending requests,
including the SCSI CDB, is available under /sys/kernel/debug/block (see also
commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

Bart.
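
For anyone following the /sys/kernel/debug/block hint above, here is a minimal
sketch of one way to collect whatever pending-request entries debugfs exposes.
It is not part of the original message. The helper name pending_requests and
the set of file names it looks for (dispatch, busy, rq_list, requeue_list) are
assumptions; the actual debugfs layout depends on the kernel version, so adjust
the set to what your kernel provides. Reading debugfs normally requires root
and a mounted debugfs.

```python
import os

# Assumption: names of blk-mq debugfs files that list requests.
# These vary between kernel versions; edit the set to match your kernel.
REQUEST_FILES = {"dispatch", "busy", "rq_list", "requeue_list"}


def pending_requests(root="/sys/kernel/debug/block"):
    """Return {path: [lines]} for each non-empty request-list file under root.

    pending_requests is a hypothetical helper for illustration only.
    """
    found = {}
    # os.walk silently skips directories it cannot read, so a missing or
    # unmounted debugfs simply yields an empty result.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name not in REQUEST_FILES:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as f:
                    lines = [ln.rstrip("\n") for ln in f if ln.strip()]
            except OSError:
                continue  # unreadable entry (permissions, device went away)
            if lines:
                found[path] = lines
    return found
```

Calling pending_requests() while the boot is hung and printing each path with
its lines should show which queue, if any, still holds a request; an empty
result would be consistent with Brian's observation that nothing is
outstanding to the adapter.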