Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp4615438ybl; Mon, 3 Feb 2020 22:30:45 -0800 (PST) X-Google-Smtp-Source: APXvYqxZdUPxjaSNIAklabe5HcLAo++PffC3F4CzHeleywYipdXdJu4+B1I3Va3+pWEVO/S2cSWC X-Received: by 2002:a05:6830:1db3:: with SMTP id z19mr21468956oti.292.1580797844855; Mon, 03 Feb 2020 22:30:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1580797844; cv=none; d=google.com; s=arc-20160816; b=Ey1uPCEbDlrA6BK0ViRdDkxgDuChM0XF2tOZle6GaTAaWj67qOxekvVkCTBKztM2iR qyo8AMHniBr8I2co5+wSJvNWfr0CLfDW+4IWy97ZK3quWPkypXq1RCVu0Pq5HQq382kv I/kodtnfXJkMipPjhQnC3PByiBGPupmIze3uT/bSOPzkwWooqfneoVwz3eI3BnoZ3MJu lxOsmKjbmvJyxs7TEYXB49rmTHgZd+h9yUZvruk9ZCQYetWRVM8Q9EAv0Mzo3DROI4A2 VerAvmTIAxsQ+frsJ1sSMjKVBAvowhPfyMuy+wbodakkkUDA9ZlHSKEZylCIL/Sb7DF8 e6+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:message-id:references :in-reply-to:subject:cc:to:from:date:content-transfer-encoding :mime-version:dkim-signature; bh=+hoX44i3CFQrj6wb5kYegFGpcc1xGahWSjifEiJeY60=; b=Rt5G7M6riHX8X6J4ddcI7XH5di3RqGbzjCFRta0m6WEofZoJkiUoiP/We9bgVl0T5m s97HOEGhS5yI7ochlRl7S9++57++BEldZ5ctIro75sGotFW4Q233ma9uYb6Ze9zAOiDL 2MZ0uQgyKGnSRzldbbEVo5sKUjLnuA/Ktpbt5nzjLRAb6GE0HGtva1Kipt1BBe8GWAVz ss9oWfexqinpCiVKR/kdLU/E0arR/ZvwQNZqgn3oJuvWvJdWEe1BDGj+Sx7P4Esj11ED +L5rRDeseByKFGfUtaVjON5CTqP7vmMfVOR/w3JSZKNY5y1svJ5PzdHSfW7mkMrYtGCT pSHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@mg.codeaurora.org header.s=smtp header.b=sGNPlcdI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m26si10405742otr.159.2020.02.03.22.30.32; Mon, 03 Feb 2020 22:30:44 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@mg.codeaurora.org header.s=smtp header.b=sGNPlcdI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726552AbgBDG2b (ORCPT + 99 others); Tue, 4 Feb 2020 01:28:31 -0500 Received: from mail25.static.mailgun.info ([104.130.122.25]:44691 "EHLO mail25.static.mailgun.info" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725834AbgBDG2b (ORCPT ); Tue, 4 Feb 2020 01:28:31 -0500 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1580797709; h=Message-ID: References: In-Reply-To: Subject: Cc: To: From: Date: Content-Transfer-Encoding: Content-Type: MIME-Version: Sender; bh=+hoX44i3CFQrj6wb5kYegFGpcc1xGahWSjifEiJeY60=; b=sGNPlcdI2Ed4QfgoD7C1/kUDWz6PvZQQAcaIAw67g11ne/3ldqSNiDlVYN/Sce9Xi5r+TS+5 iCaUGWXEjR06cb/go+EsdmckZJialqBC/oIlRKIDyFZg9LvKHdQfWjbRgIcXUvkq6p8ZecXs eS3h6a2ph8bxj+WyhDtg4R4anq4= X-Mailgun-Sending-Ip: 104.130.122.25 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by mxa.mailgun.org with ESMTP id 5e390f0c.7f99b9c4b768-smtp-out-n03; Tue, 04 Feb 2020 06:28:28 -0000 (UTC) Received: by smtp.codeaurora.org (Postfix, from userid 1001) id 1321FC447A5; Tue, 4 Feb 2020 06:28:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=2.0 tests=ALL_TRUSTED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: cang) by smtp.codeaurora.org (Postfix) with ESMTPSA id ED653C433CB; Tue, 4 Feb 2020 06:28:24 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Tue, 04 Feb 2020 14:28:24 +0800 From: Can Guo To: Bart Van Assche Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, Sayali Lokhande , Alim Akhtar , Avri Altman , Pedro Sousa , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Venkat Gopalakrishnan , Tomas Winkler , open list Subject: Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend In-Reply-To: <12716695-d9a3-a40c-e563-fa0365183b0e@acm.org> References: <1579764349-15578-1-git-send-email-cang@codeaurora.org> <1579764349-15578-2-git-send-email-cang@codeaurora.org> <525e4f67-f471-54a6-aaea-b3772a550af1@acm.org> <82723efc44714e8677505cb7999d3fd5@codeaurora.org> <12716695-d9a3-a40c-e563-fa0365183b0e@acm.org> Message-ID: <4f9017b412139762fdda8c8d1741ae7b@codeaurora.org> X-Sender: cang@codeaurora.org User-Agent: Roundcube Webmail/1.3.9 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020-02-04 11:12, Bart Van Assche wrote: > On 2020-02-02 22:23, Can Guo wrote: >> On 2020-01-26 11:29, Bart Van Assche wrote: >>> On 2020-01-22 23:25, Can Guo wrote: >>>>              break; >>>>          case UPIU_TRANSACTION_REJECT_UPIU: >>>>              /* TODO: handle Reject UPIU Response */ >>>> @@ -5215,7 +5222,14 @@ static void >>>> ufshcd_exception_event_handler(struct work_struct *work) >>>> >>>>  out: >>>>      scsi_unblock_requests(hba->host); >>>> -    pm_runtime_put_sync(hba->dev); >>>> +    /* >>>> +     * pm_runtime_get_noresume is called while scheduling >>>> +     * eeh_work to avoid suspend racing with exception work. >>>> +     * Hence decrement usage counter using pm_runtime_put_noidle >>>> +     * to allow suspend on completion of exception event handler. >>>> +     */ >>>> +    pm_runtime_put_noidle(hba->dev); >>>> +    pm_runtime_put(hba->dev); >>>>      return; >>>>  } >>>> >>>> @@ -7901,6 +7915,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, >>>> enum ufs_pm_op pm_op) >>>>              goto enable_gating; >>>>      } >>>> >>>> +    flush_work(&hba->eeh_work); >>>>      ret = ufshcd_link_state_transition(hba, req_link_state, 1); >>>>      if (ret) >>>>          goto set_dev_active; >>> >>> I think this patch introduces a new race condition, namely the >>> following: >>> - ufshcd_slave_destroy() tests pm_op_in_progress and reads the value >>>   zero from that variable. >>> - ufshcd_suspend() sets hba->pm_op_in_progress to one. >>> - ufshcd_slave_destroy() calls schedule_work(). >>> >>> How about fixing this race condition by calling >>> pm_runtime_get_noresume() before checking pm_op_in_progress and by >>> reallowing resume if no work is scheduled? >> >> If you apply this patch, you will find the change is not in >> ufshcd_slave_destroy(), but in ufshcd_transfer_rsp_status(). >> So the racing you mentioned above does not exist. > > Hi Can, > > Apparently I got a function name wrong. Can the following race > condition > happen: > - ufshcd_transfer_rsp_status() tests pm_op_in_progress and reads the > value zero from that variable. > - ufshcd_suspend() sets hba->pm_op_in_progress to one. > - ufshcd_suspend() calls flush_work(&hba->eeh_work). > - ufshcd_transfer_rsp_status() calls schedule_work(&hba->eeh_work). > > Thanks, > > Bart. Hi Bart, The sequence you mentioned is not possible. In normal cases, before ufshcd_transfer_rsp_status() returns, ufshcd_suspend() would not be called (unless you intentionally call ufshcd_suspend() to screw it). Because ufshcd_transfer_rsp_status() is called from __ufshcd_transfer_req_compl(), which is being used by either UFS IRQ handler or err handler. Meanwhile, in __ufshcd_transfer_req_compl(), scsi_done() is called only after ufshcd_transfer_rsp_status() returns. When we are here, it means UFS driver is still handling requests/tasks, so suspend would not kick start at this moment, either runtime suspend or system suspend. And this is why below lines work, calling pm_runtime_get_noresume() within ufshcd_transfer_rsp_status() can prevent runtime suspend from happening after we leave ufshcd_transfer_rsp_status(). + if (schedule_work(&hba->eeh_work)) + pm_runtime_get_noresume(hba->dev); Thanks, Can Guo.