Subject: Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
From: Brian King
Date: Wed, 16 Aug 2017 18:18:25 -0500
To: Bart Van Assche, "linuxppc-dev@lists.ozlabs.org", "abdhalee@linux.vnet.ibm.com"
Cc: "linux-kernel@vger.kernel.org", "hch@lst.de", "sfr@canb.auug.org.au", "sachinp@linux.vnet.ibm.com", "linux-next@vger.kernel.org", "hare@suse.com", "mpe@ellerman.id.au"
In-Reply-To: <1502904072.2421.3.camel@wdc.com>
Message-Id: <2f686064-3e32-df8d-134f-962b5181da9d@linux.vnet.ibm.com>
On 08/16/2017 12:21 PM, Bart Van Assche wrote:
> On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
>> As of next-20170809, linux-next on powerpc boot hung with below trace
>> message.
>>
>> [ ... ]
>>
>> A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
>> Always unprepare ...) in the merge branch 'scsi/for-next'
>>
>> System booted fine when the below commit is reverted:
>>
>> commit 270065e92c317845d69095ec8e3d18616b5b39d5
>> Author: Bart Van Assche
>> Date: Thu Aug 3 14:40:14 2017 -0700
>>
>> scsi: scsi-mq: Always unprepare before requeuing a request
>
> Hello Brian and Michael,
>
> Do you agree that this probably indicates a bug in the PowerPC block driver
> that is used to access the boot disk? Anyway, since a solution is not yet
> available, I will submit a revert for this patch.

I've been looking at this a bit and can recreate the issue, but I haven't
gotten to the root cause yet.
If I do a sysrq-w while the system is hung during boot, I see this:

[   25.561523] Workqueue: events_unbound async_run_entry_fn
[   25.561527] Call Trace:
[   25.561529] [c0000001697873f0] [c000000169701600] 0xc000000169701600 (unreliable)
[   25.561534] [c0000001697875c0] [c00000000001ab78] __switch_to+0x2e8/0x430
[   25.561539] [c000000169787620] [c00000000091ccb0] __schedule+0x310/0xa00
[   25.561543] [c0000001697876f0] [c00000000091d3e0] schedule+0x40/0xb0
[   25.561548] [c000000169787720] [c000000000921e40] schedule_timeout+0x200/0x430
[   25.561553] [c000000169787810] [c00000000091db10] io_schedule_timeout+0x30/0x70
[   25.561558] [c000000169787840] [c00000000091e978] wait_for_common_io.constprop.3+0x178/0x280
[   25.561563] [c0000001697878c0] [c00000000047f7ec] blk_execute_rq+0x7c/0xd0
[   25.561567] [c000000169787910] [c000000000614cd0] scsi_execute+0x100/0x230
[   25.561572] [c000000169787990] [c00000000060d29c] scsi_report_opcode+0xbc/0x170
[   25.561577] [c000000169787a50] [d000000004fe6404] sd_revalidate_disk+0xe04/0x1620 [sd_mod]
[   25.561583] [c000000169787b80] [d000000004fe6d84] sd_probe_async+0xb4/0x230 [sd_mod]
[   25.561588] [c000000169787c00] [c00000000010fc44] async_run_entry_fn+0x74/0x210
[   25.561593] [c000000169787c90] [c000000000102f48] process_one_work+0x198/0x480
[   25.561598] [c000000169787d30] [c0000000001032b8] worker_thread+0x88/0x510
[   25.561603] [c000000169787dc0] [c00000000010b030] kthread+0x160/0x1a0
[   25.561608] [c000000169787e30] [c00000000000b3a4] ret_from_kernel_thread+0x5c/0xb8

I noticed that the hang commonly occurs in scsi_report_opcode(). Since ipr
RAID arrays don't support the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
command, I tried setting sdev->no_report_opcodes = 1 in ipr's slave_configure
routine. This seems to eliminate the boot hang for me, but it only works around
the issue. Since ipr does not support this command, it should complete with an
ILLEGAL REQUEST. When the system is hung at this point, nothing is outstanding
to the adapter/driver.
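To make the command involved more concrete, here is a small user-space C sketch. It builds a REPORT SUPPORTED OPERATION CODES CDB (MAINTENANCE IN, opcode 0xa3, service action 0x0c) in the one-command reporting format, modeled on what the kernel's scsi_report_opcode() sends; it models the early exit that no_report_opcodes provides; and it checks fixed-format sense data for the ILLEGAL REQUEST response that an unsupported command would be expected to produce (ASC 0x20, INVALID COMMAND OPERATION CODE, or ASC 0x24, INVALID FIELD IN CDB, when only the service action is unsupported). struct model_sdev, should_send_report_opcodes() and is_unsupported_command_sense() are hypothetical names for illustration only, not kernel APIs, and nothing here reproduces the hang itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Opcode / service action values for MAINTENANCE IN, REPORT SUPPORTED OPERATION CODES. */
#define MAINTENANCE_IN                      0xa3
#define MI_REPORT_SUPPORTED_OPERATION_CODES 0x0c

/* SPC sense key and additional sense codes for an unsupported command. */
#define SENSE_KEY_ILLEGAL_REQUEST           0x05
#define ASC_INVALID_COMMAND_OPCODE          0x20
#define ASC_INVALID_FIELD_IN_CDB            0x24

/* Hypothetical stand-in for the one struct scsi_device field that matters here. */
struct model_sdev {
	bool no_report_opcodes;
};

/* Fill a 16-byte CDB: one-command reporting format, requested opcode in
 * byte 3, big-endian allocation length in bytes 6..9. */
static void build_report_opcode_cdb(uint8_t cdb[16], uint8_t opcode, uint32_t alloc_len)
{
	assert(cdb != NULL);
	memset(cdb, 0, 16);
	cdb[0] = MAINTENANCE_IN;
	cdb[1] = MI_REPORT_SUPPORTED_OPERATION_CODES;
	cdb[2] = 1;                        /* reporting options: one command */
	cdb[3] = opcode;                   /* opcode being asked about */
	cdb[6] = (uint8_t)(alloc_len >> 24);
	cdb[7] = (uint8_t)(alloc_len >> 16);
	cdb[8] = (uint8_t)(alloc_len >> 8);
	cdb[9] = (uint8_t)alloc_len;
}

/* Hypothetical model of the workaround: with no_report_opcodes set, the
 * command is never built or queued, so there is nothing to wait on. */
static bool should_send_report_opcodes(const struct model_sdev *sdev)
{
	return !sdev->no_report_opcodes;
}

/* Return true if fixed-format sense data (response code 0x70 or 0x71)
 * reports ILLEGAL REQUEST with an "unsupported command" ASC. */
static bool is_unsupported_command_sense(const uint8_t *sense, size_t len)
{
	if (len < 14)                      /* need bytes up to ASCQ (byte 13) */
		return false;
	uint8_t resp = sense[0] & 0x7f;    /* bit 7 is the VALID bit */
	if (resp != 0x70 && resp != 0x71)
		return false;
	uint8_t key = sense[2] & 0x0f;     /* sense key in the low nibble */
	uint8_t asc = sense[12];
	return key == SENSE_KEY_ILLEGAL_REQUEST &&
	       (asc == ASC_INVALID_COMMAND_OPCODE || asc == ASC_INVALID_FIELD_IN_CDB);
}
```

With no_report_opcodes set, as in the workaround above, the model never builds the command at all; without it, an adapter that does not support the command would be expected to fail it with sense data that is_unsupported_command_sense() accepts, rather than leave the submitter waiting in blk_execute_rq().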
I'll continue debugging...

-Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center