Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume
To: Can Guo
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, ziqichen@codeaurora.org, linux-scsi@vger.kernel.org, kernel-team@android.com, Alim Akhtar, Avri Altman, "James E.J. Bottomley", "Martin K.
Petersen", Stanley Chu, Bean Huo, Jaegeuk Kim, open list
References: <1624433711-9339-1-git-send-email-cang@codeaurora.org> <1624433711-9339-8-git-send-email-cang@codeaurora.org> <9105f328ee6ce916a7f01027b0d28332@codeaurora.org> <1b351766a6e40d0df90b3adec964eb33@codeaurora.org>
From: Adrian Hunter
Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki, Business Identity Code: 0357606 - 4, Domiciled in Helsinki
Date: Thu, 24 Jun 2021 09:23:52 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <1b351766a6e40d0df90b3adec964eb33@codeaurora.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 24/06/21 9:12 am, Can Guo wrote:
> On 2021-06-24 13:52, Adrian Hunter wrote:
>> On 24/06/21 5:16 am, Can Guo wrote:
>>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>>> To protect system suspend/resume from being disturbed by error handling,
>>>>> instead of using host_sem, let error handler call lock_system_sleep() and
>>>>> unlock_system_sleep() which achieve the same purpose. Remove the host_sem
>>>>> used in suspend/resume paths to make the code more readable.
>>>>>
>>>>> Suggested-by: Bart Van Assche
>>>>> Signed-off-by: Can Guo
>>>>> ---
>>>>>  drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>>> index 3695dd2..a09e4a2 100644
>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>> @@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>>>>
>>>>>  static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>>  {
>>>>> +    /*
>>>>> +     * It is not safe to perform error handling while suspend or resume is
>>>>> +     * in progress. Hence the lock_system_sleep() call.
>>>>> +     */
>>>>> +    lock_system_sleep();
>>>>
>>>> It looks to me like the system takes this lock quite early, even before
>>>> freezing tasks, so if anything needs the error handler to run it will
>>>> deadlock.
>>>
>>> Hi Adrian,
>>>
>>> UFS/hba system suspend/resume does not invoke or call error handling in a
>>> synchronous way. So, whatever UFS errors (which schedules the error handler)
>>> happens during suspend/resume, error handler will just wait here till system
>>> suspend/resume release the lock. Hence no worries of deadlock here.
>>
>> It looks to me like the state can change to UFSHCD_STATE_EH_SCHEDULED_FATAL
>> and since user processes are not frozen, nor file systems sync'ed, everything
>> is going to deadlock.
>> i.e.
>> I/O is blocked waiting on error handling
>> error handling is blocked waiting on lock_system_sleep()
>> suspend is blocked waiting on I/O
>>
>
> Hi Adrian,
>
> First of all, enter_state(suspend_state_t state) uses mutex_trylock(&system_transition_mutex).

Yes, in the case I am outlining it gets the mutex.

> Second, even that happens, in ufshcd_queuecommand(), below logic will break the cycle, by
> fast failing the PM request (below codes are from the code tip with this whole series applied).
It won't get that far because the suspend will be waiting to sync filesystems.
Filesystems will be waiting on I/O.
I/O will be waiting on the error handler.
The error handler will be waiting on system_transition_mutex.
But system_transition_mutex is already held by PM core.

>
>         case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>                 /*
>                  * ufshcd_rpm_get_sync() is used at error handling preparation
>                  * stage. If a scsi cmd, e.g., the SSU cmd, is sent from the
>                  * PM ops, it can never be finished if we let SCSI layer keep
>                  * retrying it, which gets err handler stuck forever. Neither
>                  * can we let the scsi cmd pass through, because UFS is in bad
>                  * state, the scsi cmd may eventually time out, which will get
>                  * err handler blocked for too long. So, just fail the scsi cmd
>                  * sent from PM ops, err handler can recover PM error anyways.
>                  */
>                 if (cmd->request->rq_flags & RQF_PM) {
>                         hba->force_reset = true;
>                         set_host_byte(cmd, DID_BAD_TARGET);
>                         cmd->scsi_done(cmd);
>                         goto out;
>                 }
>                 fallthrough;
>         case UFSHCD_STATE_RESET:
>
> Thanks,
>
> Can Guo.
>
>>>
>>> Thanks,
>>>
>>> Can Guo.
>>>
>>>>
>>>>>      ufshcd_rpm_get_sync(hba);
>>>>>      if (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>>>>>          hba->is_wlu_sys_suspended) {
>>>>> @@ -5951,6 +5956,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>>          ufshcd_clk_scaling_suspend(hba, false);
>>>>>      ufshcd_clear_ua_wluns(hba);
>>>>>      ufshcd_rpm_put(hba);
>>>>> +    unlock_system_sleep();
>>>>>  }
>>>>>
>>>>>  static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
>>>>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct device *dev)
>>>>>      ktime_t start = ktime_get();
>>>>>
>>>>>      hba = shost_priv(sdev->host);
>>>>> -    down(&hba->host_sem);
>>>>>
>>>>>      if (pm_runtime_suspended(dev))
>>>>>          goto out;
>>>>>
>>>>>      ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
>>>>> -    if (ret) {
>>>>> +    if (ret)
>>>>>          dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__,  ret);
>>>>> -        up(&hba->host_sem);
>>>>> -    }
>>>>>
>>>>>  out:
>>>>>      if (!ret)
>>>>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device *dev)
>>>>>          hba->curr_dev_pwr_mode, hba->uic_link_state);
>>>>>      if (!ret)
>>>>>          hba->is_wlu_sys_suspended = false;
>>>>> -    up(&hba->host_sem);
>>>>>      return ret;
>>>>>  }
>>>>>  #endif
>>>>>
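
For reference, the wait chain described in the reply above (suspend waits on filesystem sync, filesystems wait on I/O, I/O waits on the error handler, the error handler waits on system_transition_mutex held by the PM core) can be written down as a small wait-for graph and checked for a cycle. This is only a userspace sketch, not kernel code: the enum labels, waits_in_cycle() and the two scenario helpers are hypothetical names made up for illustration, and the second scenario simply assumes an error handler that does not block on the PM core's mutex.

```c
#include <assert.h>

/* Hypothetical labels for the parties named in the thread. */
enum waiter {
	NOBODY = -1,
	PM_CORE,	/* holds system_transition_mutex, waits for fs sync */
	FS_SYNC,	/* waits for outstanding I/O */
	BLOCK_IO,	/* waits for the UFS error handler */
	ERROR_HANDLER,	/* waits in lock_system_sleep() */
	NR_WAITERS
};

/*
 * Follow the "waits on" edges starting at 'start'.
 * Returns 1 if the chain leads back to 'start' (a deadlock cycle),
 * 0 if it ends at NOBODY or never returns to 'start' within n steps.
 */
int waits_in_cycle(const int *waits_on, int n, int start)
{
	int cur = start;
	int steps;

	assert(start >= 0 && start < n);
	for (steps = 0; steps < n; steps++) {
		cur = waits_on[cur];
		if (cur == NOBODY)
			return 0;
		assert(cur >= 0 && cur < n);
		if (cur == start)
			return 1;
	}
	return 0;
}

/* Chain as outlined in the reply: the error handler waits on the PM core. */
int patched_scenario_deadlocks(void)
{
	const int waits_on[NR_WAITERS] = {
		[PM_CORE]	= FS_SYNC,
		[FS_SYNC]	= BLOCK_IO,
		[BLOCK_IO]	= ERROR_HANDLER,
		[ERROR_HANDLER]	= PM_CORE,
	};

	return waits_in_cycle(waits_on, NR_WAITERS, PM_CORE);
}

/* Assumed alternative: the error handler does not wait on the PM core. */
int unblocked_scenario_deadlocks(void)
{
	const int waits_on[NR_WAITERS] = {
		[PM_CORE]	= FS_SYNC,
		[FS_SYNC]	= BLOCK_IO,
		[BLOCK_IO]	= ERROR_HANDLER,
		[ERROR_HANDLER]	= NOBODY,
	};

	return waits_in_cycle(waits_on, NR_WAITERS, PM_CORE);
}
```

Calling patched_scenario_deadlocks() from a small main() reports the cycle, while unblocked_scenario_deadlocks() does not. The model says nothing about whether the RQF_PM fast-fail path in ufshcd_queuecommand() is ever reached; it only restates the dependency chain from the discussion.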