Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp732555rwb; Fri, 18 Nov 2022 07:34:04 -0800 (PST) X-Google-Smtp-Source: AA0mqf5zLtm8ocsRq01JIBCh+bCrilppek2J7vSC4ytafiae9F+JCCrC9Ba22yEozx+OQgqSC+A/ X-Received: by 2002:a17:902:eac2:b0:186:b91b:17ec with SMTP id p2-20020a170902eac200b00186b91b17ecmr155382pld.10.1668785644719; Fri, 18 Nov 2022 07:34:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668785644; cv=none; d=google.com; s=arc-20160816; b=0xY2YEScpQ2W4qW1bIFKD39oelRtDPT4I1ZFViRsmxYUWoVZLhzeG8HpTocvcfCFHd x+4FuTZjW/aSsSl5NeXepf6QQt7mRb1sMOVRuD9VKmEArzE44OdfIEMYmk7U88jvjbj8 oAqmYXJgqDWMoPyWtfeiaKluWYWjZ8NRfEbJN5DPeKnbe1D1CpXsYSXbbXz1zMjJEus3 MA8m13UCyM2dwV7pkYFWcw+t7Y2z6jz6o/972VJgxF/vzuta0h4QOm4sdTv2xCdeT//X EMxyHQU2cBCyQ/FI5Z65RoAhbU2YlA0kbcivR0L9bMgDQyRu3rMMVnPugGqegCq1aa83 P5kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :organization:from:references:cc:to:content-language:subject :user-agent:mime-version:date:message-id:dkim-signature; bh=jloyVuDJC8K1YHz6W+GjwGv6JZ9PiN4CEftYlQQwWCk=; b=XE7iHCH/u/kFL2qnhOvAO+gFts2dqe+vZjn2ko1qYBBmRi1m+Fsj4JOUqrTSeZuan+ kAGnqXDNQbAFEWPe686h5khg6QtvHacukvgQbYWXKhFFdaGD08BOim+6Jtf7tHxI6YRQ fOZjBI0egoQA01uTJj5wZmdwSzdeMnxI4AQLlrhVAXZEfGDQbtoEP90NLfowgpCEE6Wo FmdcNZxYvOh5opPxjl+0SfGoiemnJNODtGF3/dCY7kZu/hQaZn9m4Nib6TwNl2GNtrBL WUNdv05JUE2SdVi2z3RXvZUeiO3VKV67zmKF5RigymNhjypJ8nQm5x3urWSaf8kbPP9B 4Ssg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RIlc8WCC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id cp7-20020a170902e78700b00176c59140c5si3838366plb.138.2022.11.18.07.33.51; Fri, 18 Nov 2022 07:34:04 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RIlc8WCC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242353AbiKRO6f (ORCPT + 92 others); Fri, 18 Nov 2022 09:58:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242101AbiKRO5j (ORCPT ); Fri, 18 Nov 2022 09:57:39 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79B0A93732; Fri, 18 Nov 2022 06:56:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668783364; x=1700319364; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=0OtRaUHayEMkGM0LZpQdloZpzT7G+P2JPi1mC2cyIyg=; b=RIlc8WCCUfEX1Rj3CNuGIT7/m4bg3rZ5vuiVVbU5exbh5hhAAbCVUYsX 89J1BuALE+hwlrYzkS6TP3nX25DUJJ3waFzAyFPzLUcFnyHtjI/5pHzio oKaQ4KTKMPvYvAGJdo6T3QSm/yWy899eYjPPEub4Le7j+2jJVa2SLwh6R hGSU+JqNMrCwpu5JpqLmI0TLv7mXtXBTBqYh9vIuKrAHngEEj9dgWmJWc 3bietnlDki07eXtfpYv4lQ1DRtJCwnIPZ1YA1ii4PmM3YGWB2vJ5L1lN/ oNwwVKnED5wZvpylPdTnX98kMh+c9f35bdAewVp1feer8gqd9eVA6zo+H Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10535"; a="293539780" X-IronPort-AV: E=Sophos;i="5.96,174,1665471600"; d="scan'208";a="293539780" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Nov 2022 06:56:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10535"; a="746028573" X-IronPort-AV: E=Sophos;i="5.96,174,1665471600"; d="scan'208";a="746028573" Received: from ahunter6-mobl1.ger.corp.intel.com (HELO [10.0.2.15]) ([10.252.61.138]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Nov 2022 06:56:01 -0800 Message-ID: Date: Fri, 18 Nov 2022 16:55:55 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.5.0 Subject: Re: PM-runtime: supplier looses track of consumer during probe Content-Language: en-US To: Tushar Nimkar , linux-pm@vger.kernel.org, linux-scsi@vger.kernel.org, "Rafael J. Wysocki" Cc: linux-arm-msm@vger.kernel.org, LKML , bjorn.andersson@kernel.org, Nitin Rawat , quic_mkshah@quicinc.com, quic_lsrao@quicinc.com, bvanassche@acm.org, Peter Wang References: <36aed941-a73e-d937-2721-4f0decd61ce0@quicinc.com> <8c0a715a-d626-aa70-15f1-79f1e23fbc67@quicinc.com> From: Adrian Hunter Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki, Business Identity Code: 0357606 - 4, Domiciled in Helsinki In-Reply-To: <8c0a715a-d626-aa70-15f1-79f1e23fbc67@quicinc.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/11/22 11:19, Tushar Nimkar wrote: > Hi linux-pm/linux-scsi, > > Gentle reminder! > > Can you please provide your suggestions on below race? > > Thanks, Tushar Nimkar > > On 10/14/2022 4:20 PM, Tushar Nimkar wrote: >> Hi linux-pm/linux-scsi, >> >> We have included fix [1] but continuing to observe supplier loosing track of consumer. >> >> Below is trace snippet with additional logging added. >> Here consumer is 0:0:0:0 and supplier is 0:0:0:49488. In Last three lines consumer resume is completed but supplier is put down. >> >>     kworker/u16:0-7     0.880014: rpm_idle:             0:0:0:0 flags-4 cnt-0  dep-0  auto-1 p-0 irq-0 child-0 >>     kworker/u16:0-7     0.880017: bprint: pm_runtime_mark_last_busy.46700: :#205 dev_name:0:0:0:0 ktime_get_mono_fast_ns():852365364 >>     kworker/u16:0-7     0.880019: rpm_suspend:          0:0:0:0 flags-8 cnt-0  dep-0  auto-1 p-0 irq-0 child-0 >>     kworker/u16:0-7     0.880022: bprint: pm_runtime_put_noidle.44083: pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:5 decremented usage count >>     kworker/u16:0-7     0.880023: bprint: pm_runtime_put_noidle.44083: pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 decremented usage count >>     kworker/u16:2-142   0.880024: rpm_resume:           0:0:0:0 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0 >>     kworker/u16:0-7     0.880025: bprint: __rpm_put_suppliers: __rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:4 >>     kworker/u16:0-7     0.880061: rpm_idle:             0:0:0:49488 flags-1 cnt-4  dep-0  auto-1 p-0 irq-0 child-0 >>     kworker/u16:0-7     0.880062: rpm_return_int: rpm_idle+0x16c:0:0:0:49488 ret=-11 >>     kworker/u16:2-142   0.880062: bprint: __pm_runtime_resume: __pm_runtime_resume: #1147 dev_name:0:0:0:49488 dev usage_count:5 incremented usage count >>     kworker/u16:2-142   0.880063: rpm_resume:           0:0:0:49488 flags-4 cnt-5  dep-0  auto-1 p-0 irq-0 child-0 >>     kworker/u16:2-142   0.880063: rpm_return_int: rpm_resume+0x690:0:0:0:49488 ret=1 >>     kworker/u16:0-7     0.880063: rpm_return_int: rpm_suspend+0x68:0:0:0:0 ret=0 >>     kworker/u16:2-142   0.880063: bprint: rpm_get_suppliers: rpm_get_suppliers: #300 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:5 >>     kworker/u16:0-7     0.880065: bprint: pm_runtime_put_noidle.44083: pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 decremented usage count >>     kworker/u16:2-142   0.880065: bprint: pm_runtime_mark_last_busy.44088: :#205 dev_name:0:0:0:0 ktime_get_mono_fast_ns():852413749 >>     kworker/u16:2-142   0.880065: rpm_idle:             0:0:0:0 flags-1 cnt-1  dep-0  auto-1 p-0 irq-0 child-0 >>     kworker/u16:2-142   0.880065: rpm_return_int: rpm_idle+0x16c:0:0:0:0 ret=-11 >>     kworker/u16:0-7     0.880066: bprint: __rpm_put_suppliers: __rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:4 >>     kworker/u16:0-7     0.880067: rpm_return_int: rpm_idle+0x16c:0:0:0:0 ret=-16 >>     kworker/u16:2-142   0.880067: rpm_return_int: rpm_resume+0x690:0:0:0:0 ret=0 >> >> Upon looking into this further the race looks to be in below two processes running in parallel and process-1 is putting down supplier at [C] because process-2 is setting runtime_status as resuming at [D]. >> >> Also as per runtime PM documentation >> In order to use autosuspend, subsystems or drivers must call >> pm_runtime_use_autosuspend(), and thereafter they should use the various `*_autosuspend()` helper functions... >> >> It was also observed that *_autosuspend() API at point [A] was invoked without first invoking pm_runtime_use_autosuspend() which return expiration as zero at point [B] and proceeds ahead for immediate runtime suspend of device which seems lead to this race condition. >> >> Process -1 >> ufshcd_async_scan context (process 1) >> scsi_autopm_put_device() //0:0:0:0 I am having trouble following your description. What function is calling scsi_autopm_put_device() here? >> pm_runtime_put_sync() >> __pm_runtime_idle() >> rpm_idle() -- RPM_GET_PUT(4) >>      __rpm_callback >>          scsi_runtime_idle() >>              pm_runtime_mark_last_busy() >>              pm_runtime_autosuspend()  --[A] >>                  rpm_suspend() -- RPM_AUTO(8) >>                      pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B] >>                          __update_runtime_status to RPM_SUSPENDING >>                      __rpm_callback() >>                          __rpm_put_suppliers(dev, false) >>                      __update_runtime_status to RPM_SUSPENDED >>                  rpm_suspend_suppliers() >>                      rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier] >>                  rpm_suspend() – END with return=0 >>          scsi_runtime_idle() END return (-EBUSY) always. Not following here either. Which device is EBUSY and why? >>       /* Do that if resume fails too.*/ >>      (dev->power.runtime_status == RPM_RESUMING && retval)))  return -EBUSY >>          __rpm_put_suppliers(dev, false)  -- [C] >> rpm_idle() END return -EBUSY >> >> Process -2 >> sd_probe context (Process 2) >> scsi_autopm_get_device() //0:0:0:0 >> __pm_runtime_resume(RPM_GET_PUT) >> rpm_resume() -- RPM_GET_PUT(4) >>      __update_runtime_status to RPM_RESUMING --[D] >>      __rpm_callback() >>          rpm_get_suppliers() >>              __pm_runtime_resume() - RPM_GET_PUT(4) – supplier >>                  rpm_resume() for supplier. >>      __update_runtime_status to RPM_ACTIVE >>      pm_runtime_mark_last_busy () >> rpm_resume() END return 0 >> >> Can you please provide your suggestions on addressing above race condition? >> >> This is also reported at [2]. >> >> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/ >> [2]: https://lkml.org/lkml/2022/10/12/259 >> >> Thanks, >> Tushar Nimkar