Date: Thu, 24 Jun 2021 14:12:31 +0800
From: Can Guo
To: Adrian Hunter
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org,
    ziqichen@codeaurora.org, linux-scsi@vger.kernel.org, kernel-team@android.com,
    Alim Akhtar, Avri Altman, "James E.J. Bottomley", "Martin K. Petersen",
    Stanley Chu, Bean Huo, Jaegeuk Kim, open list
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume
References: <1624433711-9339-1-git-send-email-cang@codeaurora.org>
    <1624433711-9339-8-git-send-email-cang@codeaurora.org>
    <9105f328ee6ce916a7f01027b0d28332@codeaurora.org>
Message-ID: <1b351766a6e40d0df90b3adec964eb33@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021-06-24 13:52, Adrian Hunter wrote:
> On 24/06/21 5:16 am, Can Guo wrote:
>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>> To protect system suspend/resume from being disturbed by error
>>>> handling,
>>>> instead of using host_sem, let error handler call
>>>> lock_system_sleep() and
>>>> unlock_system_sleep() which achieve the same purpose. Remove the
>>>> host_sem
>>>> used in suspend/resume paths to make the code more readable.
>>>>
>>>> Suggested-by: Bart Van Assche
>>>> Signed-off-by: Can Guo
>>>> ---
>>>>  drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>> index 3695dd2..a09e4a2 100644
>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>> @@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct
>>>> ufs_hba *hba, bool suspend)
>>>>
>>>>  static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>  {
>>>> +    /*
>>>> +     * It is not safe to perform error handling while suspend or
>>>> resume is
>>>> +     * in progress. Hence the lock_system_sleep() call.
>>>> +     */
>>>> +    lock_system_sleep();
>>>
>>> It looks to me like the system takes this lock quite early, even
>>> before
>>> freezing tasks, so if anything needs the error handler to run it will
>>> deadlock.
>>
>> Hi Adrian,
>>
>> UFS/hba system suspend/resume does not invoke or call error handling
>> in a
>> synchronous way. So, whatever UFS errors (which schedules the error
>> handler)
>> happens during suspend/resume, error handler will just wait here till
>> system
>> suspend/resume release the lock. Hence no worries of deadlock here.
>
> It looks to me like the state can change to
> UFSHCD_STATE_EH_SCHEDULED_FATAL
> and since user processes are not frozen, nor file systems sync'ed,
> everything
> is going to deadlock.
> i.e.
> I/O is blocked waiting on error handling
> error handling is blocked waiting on lock_system_sleep()
> suspend is blocked waiting on I/O
>

Hi Adrian,

First of all, enter_state(suspend_state_t state) uses
mutex_trylock(&system_transition_mutex).

Second, even if that happens, the logic below in ufshcd_queuecommand()
breaks the cycle by fast-failing the PM request (the code below is from
the code tip with this whole series applied).

	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
		/*
		 * ufshcd_rpm_get_sync() is used at error handling preparation
		 * stage. If a scsi cmd, e.g., the SSU cmd, is sent from the
		 * PM ops, it can never be finished if we let SCSI layer keep
		 * retrying it, which gets err handler stuck forever. Neither
		 * can we let the scsi cmd pass through, because UFS is in bad
		 * state, the scsi cmd may eventually time out, which will get
		 * err handler blocked for too long. So, just fail the scsi cmd
		 * sent from PM ops, err handler can recover PM error anyways.
		 */
		if (cmd->request->rq_flags & RQF_PM) {
			hba->force_reset = true;
			set_host_byte(cmd, DID_BAD_TARGET);
			cmd->scsi_done(cmd);
			goto out;
		}
		fallthrough;
	case UFSHCD_STATE_RESET:

Thanks,

Can Guo.

>>
>> Thanks,
>>
>> Can Guo.
>>
>>>
>>>>      ufshcd_rpm_get_sync(hba);
>>>>      if
>>>> (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>>>>          hba->is_wlu_sys_suspended) {
>>>> @@ -5951,6 +5956,7 @@ static void
>>>> ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>          ufshcd_clk_scaling_suspend(hba, false);
>>>>      ufshcd_clear_ua_wluns(hba);
>>>>      ufshcd_rpm_put(hba);
>>>> +    unlock_system_sleep();
>>>>  }
>>>>
>>>>  static inline bool ufshcd_err_handling_should_stop(struct ufs_hba
>>>> *hba)
>>>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct device
>>>> *dev)
>>>>      ktime_t start = ktime_get();
>>>>
>>>>      hba = shost_priv(sdev->host);
>>>> -    down(&hba->host_sem);
>>>>
>>>>      if (pm_runtime_suspended(dev))
>>>>          goto out;
>>>>
>>>>      ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
>>>> -    if (ret) {
>>>> +    if (ret)
>>>>          dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__,
>>>> ret);
>>>> -        up(&hba->host_sem);
>>>> -    }
>>>>
>>>>  out:
>>>>      if (!ret)
>>>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device
>>>> *dev)
>>>>          hba->curr_dev_pwr_mode, hba->uic_link_state);
>>>>      if (!ret)
>>>>          hba->is_wlu_sys_suspended = false;
>>>> -    up(&hba->host_sem);
>>>>      return ret;
>>>>  }
>>>>  #endif
>>>>