Hi Greg,
While playing around with my slot symlink stuff, I noticed that
the following sequence is problematic:
1. clean boot
2. modprobe acpiphp
3. echo 0 > /sys/bus/pci/slots/N/power
4. ???
After step 3, we *should* be seeing pci_release_dev() getting
called, but we never do because the refcount on the device is
still quite high (5 or 6, on my ia64 system).
I'm still trying to track this down, but I did notice, via code
inspection, at least one suspicious area:
#define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
That eventually calls pci_get_dev_by_id(), which increases the
refcount on the device, but never decrements it.
Looks like that change in behavior happened here:
PCI: clean up search.c a lot
95247b57ed844511a212265b45cf9a919753aea1
pci_get_device() used to decrement the refcount, but no longer
does.
Thanks to Matthew Wilcox for helping me get this far...
Like I said, I'm still trying to track down my particular issue,
but I'd like to get your opinion on this.
Thanks!
/ac
On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
>
> That eventually calls pci_get_dev_by_id(), which increases the
> refcount on the device, but never decrements it.
>
> Looks like that change in behavior happened here:
>
> PCI: clean up search.c a lot
> 95247b57ed844511a212265b45cf9a919753aea1
>
> pci_get_device() used to decrement the refcount, but no longer
> does.
>
> Thanks to Matthew Wilcox for helping me get this far...
>
> Like I said, I'm still trying to track down my particular issue,
> but I'd like to get your opinion on this.
In particular, I'd like to know whether this should be fixed by
pci_get_dev_by_id() decrementing the refcount of from/dev_start,
pci_get_subsys() decrementing 'from', or by bus_find_device()
decrementing 'start'. It looks like bus_find_device() is the place
where this should logically happen, but the kerneldoc doesn't document
the intended behaviour.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> Hi Greg,
>
> While playing around with my slot symlink stuff, I noticed that
> the following sequence is problematic:
>
> 1. clean boot
> 2. modprobe acpiphp
> 3. echo 0 > /sys/bus/pci/slots/N/power
> 4. ???
>
> After step 3, we *should* be seeing pci_release_dev() getting
> called, but we never do because the refcount on the device is
> still quite high (5 or 6, on my ia64 system).
>
> I'm still trying to track this down, but I did notice, via code
> inspection, at least one suspicious area:
>
> #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
>
> That eventually calls pci_get_dev_by_id(), which increases the
> refcount on the device, but never decrements it.
>
> Looks like that change in behavior happened here:
>
> PCI: clean up search.c a lot
> 95247b57ed844511a212265b45cf9a919753aea1
>
> pci_get_device() used to decrement the refcount, but no longer
> does.
No, pci_get_device() never decremented the refcount, and that didn't
change in the above git commit.
The description of pci_get_device() says that a reference is grabbed:
Iterates through the list of known PCI devices. If a PCI device
is found with a matching @vendor and @device, the reference
count to the device is incremented and a pointer to its device
structure is returned. Otherwise, %NULL is returned. A new
search is initiated by passing %NULL as the @from argument.
Otherwise if @from is not %NULL, searches continue from next
device on the global list. The reference count for @from is
always decremented if it is not %NULL.
All of the pci_find* functions should not have grabbed a reference to
the device, as that was the "old" behavior. All of the pci_get*
functions do grab a reference.
Did I somehow mess up and one of the pci_find* functions now improperly
increment a reference? Hopefully we shouldn't be using those functions
anymore as they aren't hotplug safe...
thanks,
greg k-h
On Thu, Aug 21, 2008 at 02:25:04PM -0600, Matthew Wilcox wrote:
> On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> > #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
> >
> > That eventually calls pci_get_dev_by_id(), which increases the
> > refcount on the device, but never decrements it.
> >
> > Looks like that change in behavior happened here:
> >
> > PCI: clean up search.c a lot
> > 95247b57ed844511a212265b45cf9a919753aea1
> >
> > pci_get_device() used to decrement the refcount, but no longer
> > does.
> >
> > Thanks to Matthew Wilcox for helping me get this far...
> >
> > Like I said, I'm still trying to track down my particular issue,
> > but I'd like to get your opinion on this.
>
> In particular, I'd like to know whether this should be fixed by
> pci_get_dev_by_id() decrementing the refcount of from/dev_start,
> pci_get_subsys() decrementing 'from', or by bus_find_device()
> decrementing 'start'. It looks like bus_find_device() is the place
> where this should logically happen, but the kerneldoc doesn't document
> the intended behaviour.
Ah, no the driver core isn't supposed to do this, it's something the pci
functions do out of "niceness" as that's how we can use them in an
iterator properly.
Does the following (untested) patch fix the issue for you all?
thanks,
greg k-h
--------------
Subject: PCI: fix reference leak in pci_get_dev_by_id()
From: Greg Kroah-Hartman <[email protected]>
Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
not properly decrement the reference on the from pointer if it is
present, like the documentation for the function states it will.
Cc: Matthew Wilcox <[email protected]>
Cc: Alex Chiang <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index 217814f..3b3b5f1 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct pci_device_id *id,
match_pci_dev_by_id);
if (dev)
pdev = to_pci_dev(dev);
+ if (from)
+ pci_dev_put(from);
return pdev;
}
* Greg KH <[email protected]>:
> On Thu, Aug 21, 2008 at 02:25:04PM -0600, Matthew Wilcox wrote:
> > On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> > > #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
> > >
> > > That eventually calls pci_get_dev_by_id(), which increases the
> > > refcount on the device, but never decrements it.
> > >
> > > Looks like that change in behavior happened here:
> > >
> > > PCI: clean up search.c a lot
> > > 95247b57ed844511a212265b45cf9a919753aea1
> > >
> > > pci_get_device() used to decrement the refcount, but no longer
> > > does.
> > >
> > > Thanks to Matthew Wilcox for helping me get this far...
> > >
> > > Like I said, I'm still trying to track down my particular issue,
> > > but I'd like to get your opinion on this.
> >
> > In particular, I'd like to know whether this should be fixed by
> > pci_get_dev_by_id() decrementing the refcount of from/dev_start,
> > pci_get_subsys() decrementing 'from', or by bus_find_device()
> > decrementing 'start'. It looks like bus_find_device() is the place
> > where this should logically happen, but the kerneldoc doesn't document
> > the intended behaviour.
>
> Ah, no the driver core isn't supposed to do this, it's something the pci
> functions do out of "niceness" as that's how we can use them in an
> iterator properly.
>
> Does the following (untested) patch fix the issue for you all?
Perfect, yes.
I applied it, and observed the refcount on the device I was using
to debug this problem.
I was able to modprobe acpiphp successfully, and echo 0 into the
device's 'power' file.
I then watched the rest of the hotplug core do its thing,
decrementing the refcount properly along the way, and at the end,
we did call pci_release_dev(), as I originally expected to. :)
Just to verify, I toggled the device's power a few times (echoing
a 1 and then a 0, etc.) and watched the refcounts. After each
offline, we properly called pci_release_dev().
Thanks for fixing this. It fixes a pretty bad leak in the hotplug
core (we were leaking an entire struct pci_dev * # of functions
for each offlined card, the first time around; subsequent
onlines/offlines were ok).
Tested-by: Alex Chiang <[email protected]>
/ac
>
> thanks,
>
> greg k-h
>
> --------------
> Subject: PCI: fix reference leak in pci_get_dev_by_id()
>
> From: Greg Kroah-Hartman <[email protected]>
>
> Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
> not properly decrement the reference on the from pointer if it is
> present, like the documentation for the function states it will.
>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Alex Chiang <[email protected]>
> Cc: stable <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
>
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 217814f..3b3b5f1 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct pci_device_id *id,
> match_pci_dev_by_id);
> if (dev)
> pdev = to_pci_dev(dev);
> + if (from)
> + pci_dev_put(from);
> return pdev;
> }
>
>
On Thursday, August 21, 2008 1:47 pm Greg KH wrote:
> Subject: PCI: fix reference leak in pci_get_dev_by_id()
>
> From: Greg Kroah-Hartman <[email protected]>
>
> Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
> not properly decrement the reference on the from pointer if it is
> present, like the documentation for the function states it will.
>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Alex Chiang <[email protected]>
> Cc: stable <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
>
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 217814f..3b3b5f1 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct
> pci_device_id *id, match_pci_dev_by_id);
> if (dev)
> pdev = to_pci_dev(dev);
> + if (from)
> + pci_dev_put(from);
> return pdev;
> }
Wow, thanks Greg, that was fast. I applied this to my for-linus branch; I'll
ask Linus to pull it for 2.6.27.
Thanks,
--
Jesse Barnes, Intel Open Source Technology Center
On Thu, 21 Aug 2008, Jesse Barnes wrote:
> Wow, thanks Greg, that was fast. I applied this to my for-linus branch; I'll
> ask Linus to pull it for 2.6.27.
Does 2.6.26 need it as well?
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Thu, 21 Aug 2008, Henrique de Moraes Holschuh wrote:
> On Thu, 21 Aug 2008, Jesse Barnes wrote:
> > Wow, thanks Greg, that was fast. I applied this to my for-linus branch; I'll
> > ask Linus to pull it for 2.6.27.
>
> Does 2.6.26 need it as well?
Never mind, I just read Greg's reply. It arrived a few seconds after I sent
this. Sorry about it.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
And the pci_get_dev_by_id() is not safe again the PCI device removal. It might fire a warning in bus_find_device() when reference count of the knode_bus is decreased to 0 by pci_remove_bus().
Need to document this or a fix?
Thanks,
Yu
On Friday, August 22, 2008 6:15 AM, Alex Chiang wrote:
>* Greg KH <[email protected]>:
>> On Thu, Aug 21, 2008 at 02:25:04PM -0600, Matthew Wilcox wrote:
>> > On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
>> > > #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID,
>PCI_ANY_ID, d)) != NULL)
>> > >
>> > > That eventually calls pci_get_dev_by_id(), which increases the
>> > > refcount on the device, but never decrements it.
>> > >
>> > > Looks like that change in behavior happened here:
>> > >
>> > > PCI: clean up search.c a lot
>> > > 95247b57ed844511a212265b45cf9a919753aea1
>> > >
>> > > pci_get_device() used to decrement the refcount, but no longer
>> > > does.
>> > >
>> > > Thanks to Matthew Wilcox for helping me get this far...
>> > >
>> > > Like I said, I'm still trying to track down my particular issue,
>> > > but I'd like to get your opinion on this.
>> >
>> > In particular, I'd like to know whether this should be fixed by
>> > pci_get_dev_by_id() decrementing the refcount of from/dev_start,
>> > pci_get_subsys() decrementing 'from', or by bus_find_device()
>> > decrementing 'start'. It looks like bus_find_device() is the place
>> > where this should logically happen, but the kerneldoc doesn't document
>> > the intended behaviour.
>>
>> Ah, no the driver core isn't supposed to do this, it's something the pci
>> functions do out of "niceness" as that's how we can use them in an
>> iterator properly.
>>
>> Does the following (untested) patch fix the issue for you all?
>
>Perfect, yes.
>
>I applied it, and observed the refcount on the device I was using
>to debug this problem.
>
>I was able to modprobe acpiphp successfully, and echo 0 into the
>device's 'power' file.
>
>I then watched the rest of the hotplug core do its thing,
>decrementing the refcount properly along the way, and at the end,
>we did call pci_release_dev(), as I originally expected to. :)
>
>Just to verify, I toggled the device's power a few times (echoing
>a 1 and then a 0, etc.) and watched the refcounts. After each
>offline, we properly called pci_release_dev().
>
>Thanks for fixing this. It fixes a pretty bad leak in the hotplug
>core (we were leaking an entire struct pci_dev * # of functions
>for each offlined card, the first time around; subsequent
>onlines/offlines were ok).
>
>Tested-by: Alex Chiang <[email protected]>
>
>/ac
>
>>
>> thanks,
>>
>> greg k-h
>>
>> --------------
>> Subject: PCI: fix reference leak in pci_get_dev_by_id()
>>
>> From: Greg Kroah-Hartman <[email protected]>
>>
>> Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
>> not properly decrement the reference on the from pointer if it is
>> present, like the documentation for the function states it will.
>>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Alex Chiang <[email protected]>
>> Cc: stable <[email protected]>
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>>
>>
>> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> index 217814f..3b3b5f1 100644
>> --- a/drivers/pci/search.c
>> +++ b/drivers/pci/search.c
>> @@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct
>pci_device_id *id,
>> match_pci_dev_by_id);
>> if (dev)
>> pdev = to_pci_dev(dev);
>> + if (from)
>> + pci_dev_put(from);
>> return pdev;
>> }
>>
>>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 30, 2008 at 12:23:20PM +0800, Zhao, Yu wrote:
> And the pci_get_dev_by_id() is not safe again the PCI device removal.
> It might fire a warning in bus_find_device() when reference count of
> the knode_bus is decreased to 0 by pci_remove_bus().
Is this something new? Hasn't this always been that way? Why would you
be wanting to call this function anyway?
thanks,
greg k-h
It's been so for a while, guess since 2.5. I meant to say the pci_remove_bus_device(), not pci_remove_bus(), is used by pci hotplug code. If a device is being plugged off, and some drivers are calling pci_get_dev_by_id() to search something, then the warning might be fired through bus_find_device -> klist_iter_init_node ->kref_get.
Thanks,
Yu
>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Greg KH
>Sent: Saturday, August 30, 2008 1:38 PM
>To: Zhao, Yu
>Cc: Alex Chiang; Matthew Wilcox; [email protected];
>[email protected]
>Subject: Re: refcount leak in pci_get_device()?
>
>On Sat, Aug 30, 2008 at 12:23:20PM +0800, Zhao, Yu wrote:
>> And the pci_get_dev_by_id() is not safe again the PCI device removal.
>> It might fire a warning in bus_find_device() when reference count of
>> the knode_bus is decreased to 0 by pci_remove_bus().
>
>Is this something new? Hasn't this always been that way? Why would you
>be wanting to call this function anyway?
>
>thanks,
>
>greg k-h
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
Just happened to see a change in fakephp when searching usage of the pci_remove_bus_device(). I couldn't find the original patches, but this http://git.kernel.org/?p=linux/kernel/git/jbarnes/pci-2.6.git;a=commitdiff;h=fe99740cac117f208707488c03f3789cf4904957 shows, the pci_remove_bus_device() was removed while the remove_slot() was added in disable_slot(), which means: 1) fakephp can't remove pci_dev anymore; 2) deadlock happens because remove_slot() tries to remove the sysfs entry calling itself.
On Saturday, August 30, 2008 2:21 PM, Zhao, Yu wrote:
>It's been so for a while, guess since 2.5. I meant to say the
>pci_remove_bus_device(), not pci_remove_bus(), is used by pci hotplug code. If
>a device is being plugged off, and some drivers are calling pci_get_dev_by_id()
>to search something, then the warning might be fired through bus_find_device
>-> klist_iter_init_node ->kref_get.
>
>Thanks,
>Yu
>
>>-----Original Message-----
>>From: [email protected]
>>[mailto:[email protected]] On Behalf Of Greg KH
>>Sent: Saturday, August 30, 2008 1:38 PM
>>To: Zhao, Yu
>>Cc: Alex Chiang; Matthew Wilcox; [email protected];
>>[email protected]
>>Subject: Re: refcount leak in pci_get_device()?
>>
>>On Sat, Aug 30, 2008 at 12:23:20PM +0800, Zhao, Yu wrote:
>>> And the pci_get_dev_by_id() is not safe again the PCI device removal.
>>> It might fire a warning in bus_find_device() when reference count of
>>> the knode_bus is decreased to 0 by pci_remove_bus().
>>
>>Is this something new? Hasn't this always been that way? Why would you
>>be wanting to call this function anyway?
>>
>>thanks,
>>
>>greg k-h
>>--
>>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>the body of a message to [email protected]
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html