From: Can Guo <cang@codeaurora.org>
To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org,
	rnayak@codeaurora.org, linux-scsi@vger.kernel.org,
	kernel-team@android.com, saravanak@google.com, salyzyn@google.com,
	cang@codeaurora.org
Cc: Alim Akhtar, Avri Altman, "James E.J. Bottomley", "Martin K. Petersen",
	Stanley Chu, Bean Huo, Bart Van Assche,
	linux-kernel@vger.kernel.org (open list)
Subject: [PATCH v9 8/9] scsi: ufs: Fix a racing problem btw error handler and runtime PM ops
Date: Mon, 3 Aug 2020 02:04:43 -0700
Message-Id: <1596445485-19834-9-git-send-email-cang@codeaurora.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1596445485-19834-1-git-send-email-cang@codeaurora.org>
References: <1596445485-19834-1-git-send-email-cang@codeaurora.org>

The current IRQ handler blocks SCSI requests before scheduling eh_work.
If ufshcd_suspend/resume has sent a SCSI cmd (most likely the SSU cmd)
when the error handler calls pm_runtime_get_sync(), that call never
returns: the SCSI requests are blocked, so ufshcd_suspend/resume is
stuck waiting for its cmd, and the error handler is stuck waiting for
ufshcd_suspend/resume (see the call-flow sketch below).

Resolve this with the following changes and code re-arrangement:

o In the queuecommand path, keep the hba->ufshcd_state check and
  ufshcd_send_command() under the same spin lock, so that no more
  commands can leak into the doorbell after hba->ufshcd_state is
  changed.

o Don't block SCSI requests before scheduling eh_work; let the error
  handler block them once it is ready to start error recovery.

o Don't let the SCSI layer keep requeuing the SCSI cmds sent from hba
  runtime PM ops; either let them pass or fail them. Let them pass if
  eh_work is scheduled due to non-fatal errors. Fail them if eh_work is
  scheduled due to fatal errors, otherwise the cmds may eventually time
  out (UFS is in a bad state), which keeps the error handler blocked
  for too long. Failing the SCSI cmds sent from hba runtime PM ops also
  fails the runtime PM ops, but that does not hurt, since the error
  handler can recover hba runtime PM errors.
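
A rough call-flow sketch of the deadlock on the current code (abridged;
ufshcd_intr(), ufshcd_check_errors(), ufshcd_suspend() and
ufshcd_set_dev_pwr_mode() are named here only to illustrate one
possible path, the same deadlock can be reached through other
interleavings):

  IRQ context:
    ufshcd_intr()
      ufshcd_check_errors()
        ufshcd_schedule_eh_work()
          ufshcd_scsi_block_requests()       <-- SCSI requests blocked
          queue_work(hba->eh_wq, &hba->eh_work)

  Runtime PM context:
    ufshcd_suspend()
      ufshcd_set_dev_pwr_mode()
        sends the SSU cmd, which is held back by the blocked SCSI
        queue, so ufshcd_suspend() cannot complete

  eh_work context:
    ufshcd_err_handler()
      ufshcd_err_handling_prepare()
        pm_runtime_get_sync()                <-- waits for ufshcd_suspend()
                                                 to complete ==> deadlock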
Signed-off-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Bean Huo
---
 drivers/scsi/ufs/ufshcd.c | 84 ++++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a79fbbd..d7d2758 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -126,7 +126,8 @@ enum {
 	UFSHCD_STATE_RESET,
 	UFSHCD_STATE_ERROR,
 	UFSHCD_STATE_OPERATIONAL,
-	UFSHCD_STATE_EH_SCHEDULED,
+	UFSHCD_STATE_EH_SCHEDULED_FATAL,
+	UFSHCD_STATE_EH_SCHEDULED_NON_FATAL,
 };
 
 /* UFSHCD error handling flags */
@@ -2515,34 +2516,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	if (!down_read_trylock(&hba->clk_scaling_lock))
 		return SCSI_MLQUEUE_HOST_BUSY;
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	switch (hba->ufshcd_state) {
-	case UFSHCD_STATE_OPERATIONAL:
-		break;
-	case UFSHCD_STATE_EH_SCHEDULED:
-	case UFSHCD_STATE_RESET:
-		err = SCSI_MLQUEUE_HOST_BUSY;
-		goto out_unlock;
-	case UFSHCD_STATE_ERROR:
-		set_host_byte(cmd, DID_ERROR);
-		cmd->scsi_done(cmd);
-		goto out_unlock;
-	default:
-		dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
-				__func__, hba->ufshcd_state);
-		set_host_byte(cmd, DID_BAD_TARGET);
-		cmd->scsi_done(cmd);
-		goto out_unlock;
-	}
-
-	/* if error handling is in progress, don't issue commands */
-	if (ufshcd_eh_in_progress(hba)) {
-		set_host_byte(cmd, DID_ERROR);
-		cmd->scsi_done(cmd);
-		goto out_unlock;
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-
 	hba->req_abort_count = 0;
 
 	err = ufshcd_hold(hba, true);
@@ -2578,11 +2551,50 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	/* Make sure descriptors are ready before ringing the doorbell */
 	wmb();
 
-	/* issue command to the controller */
 	spin_lock_irqsave(hba->host->host_lock, flags);
+	switch (hba->ufshcd_state) {
+	case UFSHCD_STATE_OPERATIONAL:
+	case UFSHCD_STATE_EH_SCHEDULED_NON_FATAL:
+		break;
+	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
+		/*
+		 * pm_runtime_get_sync() is used at error handling preparation
+		 * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
+		 * PM ops, it can never be finished if we let SCSI layer keep
+		 * retrying it, which gets err handler stuck forever. Neither
+		 * can we let the scsi cmd pass through, because UFS is in bad
+		 * state, the scsi cmd may eventually time out, which will get
+		 * err handler blocked for too long. So, just fail the scsi cmd
+		 * sent from PM ops, err handler can recover PM error anyways.
+		 */
+		if (hba->pm_op_in_progress) {
+			hba->force_reset = true;
+			set_host_byte(cmd, DID_BAD_TARGET);
+			goto out_compl_cmd;
+		}
+	case UFSHCD_STATE_RESET:
+		err = SCSI_MLQUEUE_HOST_BUSY;
+		goto out_compl_cmd;
+	case UFSHCD_STATE_ERROR:
+		set_host_byte(cmd, DID_ERROR);
+		goto out_compl_cmd;
+	default:
+		dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
+				__func__, hba->ufshcd_state);
+		set_host_byte(cmd, DID_BAD_TARGET);
+		goto out_compl_cmd;
+	}
 	ufshcd_send_command(hba, tag);
-out_unlock:
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	goto out;
+
+out_compl_cmd:
+	scsi_dma_unmap(lrbp->cmd);
+	lrbp->cmd = NULL;
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	ufshcd_release(hba);
+	if (!err)
+		cmd->scsi_done(cmd);
 out:
 	up_read(&hba->clk_scaling_lock);
 	return err;
@@ -5552,9 +5564,12 @@ static inline void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 {
 	/* handle fatal errors only when link is not in error state */
 	if (hba->ufshcd_state != UFSHCD_STATE_ERROR) {
-		hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED;
-		if (queue_work(hba->eh_wq, &hba->eh_work))
-			ufshcd_scsi_block_requests(hba);
+		if (hba->force_reset || ufshcd_is_link_broken(hba) ||
+		    ufshcd_is_saved_err_fatal(hba))
+			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_FATAL;
+		else
+			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_NON_FATAL;
+		queue_work(hba->eh_wq, &hba->eh_work);
 	}
 }
 
@@ -5664,6 +5679,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	ufshcd_err_handling_prepare(hba);
 	spin_lock_irqsave(hba->host->host_lock, flags);
+	ufshcd_scsi_block_requests(hba);
 	/*
 	 * A full reset and restore might have happened after preparation
 	 * is finished, double check whether we should stop.
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project