Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754714AbcCCHXQ (ORCPT ); Thu, 3 Mar 2016 02:23:16 -0500 Received: from mx2.suse.de ([195.135.220.15]:36644 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751321AbcCCHXO (ORCPT ); Thu, 3 Mar 2016 02:23:14 -0500 Subject: Re: [PATCH v5 03/15] scsi: ufs: implement scsi host timeout handler To: ygardi@codeaurora.org References: <1456666367-11418-1-git-send-email-ygardi@codeaurora.org> <1456666367-11418-4-git-send-email-ygardi@codeaurora.org> <56D544E6.8040005@suse.de> <2b8282aad0b3edfaf873628edf03513d.squirrel@us.codeaurora.org> Cc: james.bottomley@hansenpartnership.com, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-arm-msm@vger.kernel.org, santoshsy@gmail.com, linux-scsi-owner@vger.kernel.org, Gilad Broner , Vinayak Holikatti , "James E.J. Bottomley" , "Martin K. Petersen" From: Hannes Reinecke X-Enigmail-Draft-Status: N1110 Message-ID: <56D7E652.90401@suse.de> Date: Thu, 3 Mar 2016 15:22:58 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <2b8282aad0b3edfaf873628edf03513d.squirrel@us.codeaurora.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3242 Lines: 83 On 03/01/2016 09:25 PM, ygardi@codeaurora.org wrote: >> On 02/28/2016 09:32 PM, Yaniv Gardi wrote: >>> A race condition exists between request requeueing and scsi layer >>> error handling: >>> When UFS driver queuecommand returns a busy status for a request, >>> it will be requeued and its tag will be freed and set to -1. >>> At the same time it is possible that the request will timeout and >>> scsi layer will start error handling for it. The scsi layer reuses >>> the request and its tag to send error related commands to the device, >>> however its tag is no longer valid. >> Hmm. How can the host return a 'busy' status for a request? >> From my understanding we have three possibilities: >> >> 1) queuecommand returns busy; however, that means that the command has >> never been send and this issue shouldn't occur >> 2) The command returns with BUSY status. But in this case it has already >> been returned, so there cannot be any timeout coming in. >> 3) The host receives a command with a tag which is already in-use. >> However, that should have been prevented by the block-layer, which >> really should ensure that this situation never happens. >> >> So either way I look at it, it really looks like a bug and adding a >> timeout handler will just paper over it. >> (Not that a timeout handler is a bad idea, in fact I'm convinced that >> you need one. Just not for this purpose.) >> >> So can you elaborate how this 'busy' status comes about? >> Is the command sent to the device? >> >> Cheers, >> >> Hannes > > > Hi Hannes, > > it's going to be a bit long :) > I think you are missing the point. > I will describe a race condition happened to us a while ago, that was > quite difficult to understand and fix. > So, this patch is not about the "busy" returning to the scsi dispatch > routine. it's about the abort triggered after 30 seconds. > > imagine a request being queued and sent to the scsi, and then to the ufs. > a timer, initialized to 30 seconds start ticking. > but the request is never sent to the ufs device, as queuecommand() returns > with "SCSI_MLQUEUE_HOST_BUSY" > by looking at the code, this could happen, for example: > err = ufshcd_hold(hba, true); > if (err) { > err = SCSI_MLQUEUE_HOST_BUSY; > goto out; > } > Uuhhh. You probably should not have pointed me to that piece of code ... open-coding loops in ufshcd_hold() ... shudder. (Did I ever review that one? Must've ...) _Anyway_: sleeping in queuecommand is always a bad idea, as then precisely those issues you've just described will happen. Couldn't you just call ufshcd_hold(hba, false) instead of ufshcd_hold(hba, true) ? The request will be requeued more-or-less immediately, avoiding the issue with timeout handler kicking in. And the queue will remain blocked until the ungate work item returns, at which point I/O submission will continue. As the request will be requeued to the head of the queue there won't be other I/O competing with tags, so it shouldn't have any adverse effects. Wouldn't that work? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N?rnberg GF: J. Hawn, J. Guild, F. Imend?rffer, HRB 16746 (AG N?rnberg)