Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760112Ab1FWWek (ORCPT ); Thu, 23 Jun 2011 18:34:40 -0400 Received: from ogre.sisk.pl ([217.79.144.158]:54443 "EHLO ogre.sisk.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753179Ab1FWWei (ORCPT ); Thu, 23 Jun 2011 18:34:38 -0400 From: "Rafael J. Wysocki" To: Alan Stern Subject: Re: [PATCH] PCI / PM: Block races between runtime PM and system sleep Date: Fri, 24 Jun 2011 00:35:14 +0200 User-Agent: KMail/1.13.6 (Linux/3.0.0-rc4+; KDE/4.6.0; x86_64; ; ) Cc: Linux PM mailing list , LKML , Jesse Barnes , "linux-pci@vger.kernel.org" , Tejun Heo References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201106240035.14913.rjw@sisk.pl> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3835 Lines: 77 On Thursday, June 23, 2011, Alan Stern wrote: > On Thu, 23 Jun 2011, Rafael J. Wysocki wrote: > > > > Then maybe this disable_depth > 0 case should return something other > > > than 0. Something new, like -EACCES. That way the caller would > > > realize something strange was going on but wouldn't have to treat the > > > situation as an error. > > > > I would be fine with that, but then we'd need to reserve that error code, > > so that it's not returned by subsystem callbacks (or even we should convert > > it to a different error code if it is returned by the subsystem callback in > > rpm_resume()). > > > > > After all, the return value from pm_runtime_get_sync() is documented to > > > be the error code for the underlying pm_runtime_resume(). It doesn't > > > refer to the increment operation -- that always succeeds. > > > > That means we should change the caller, which is the SCSI subsystem in this > > particular case, to check the error code. The problem with this approach > > is that the same error code may be returned in a different situation, so > > we should prevent that from happening in the first place. Still, suppose > > that we do that and that the caller checks the error code. What is it > > supposed to do in that situation? The only reasonable action for the > > caller is to ignore the error code if it means disable_depth > 0 and go > > on with whatever it has to do, but that's what it will do if the > > pm_runtime_get_sync() returns 0 in that situation. > > > > So, in my opinion it simply may be best to update the documentation of > > pm_runtime_get_sync() along with the code changes. :-) > > The only reason you're doing this is for the SCSI error-handler > routine? Yes, it is. > I think it would be easier to change that routine instead of the PM > core. It should be smart enough to know that a runtime PM call isn't > needed during a system sleep transition, i.e., between the scsi_host's > suspend and resume callbacks. Maybe check the new is_suspended flag. > You'd also have to make sure the scsi_host wasn't runtime suspended > when the sleep begins, rather like PCI. Well, I think the problem is still there even at run time if runtime PM happens to be disabled for shost_gendev (this probably never happens in practice, though, except when runtime PM is disabled during system suspend, which also wasn't done before). > I'm still not clear on why the error handler needs to run at this time. Because SATA ports are suspended with the help of the SCSI error handling mechanism (which Tejun claims is the best way to do that). But the thing done by scsi_autopm_get_host() is entirely reasonable (maybe except that calling pm_runtime_put_sync() after failing pm_runtime_get_sync() is rather pointless, pm_runtime_put_noidle() would be sufficient), but it doesn't work for disabled runtime PM. So, the question is whether or not what scsi_autopm_get_host() does should work when runtime PM is disabled and I think it should. Hence the patch. Now, I agree that the proposed change is a little ugly, because it makes "resume with taking reference" behave differently from "resume". The solution to that would be to return a special error code in the "disabled runtime PM" case. However, in that case we'd need to change the runtime PM code in general to return this particular error code in the "disabled runtime PM" case and to prevent this error code from being returned in other circumstances. In addition to that, we'de need to change the SCSI code, of course. Do you think that would be better? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/