Hi all,
for_each_child_of_node() and similar functions increase the refcount
on each returned node and expect the caller to release the node by
calling of_node_put() when done.
Looking through the kernel code, it appears this is hardly ever done,
if at all. Some code even calls of_node_get() on returned nodes again.
I guess this doesn't matter in cases where devicetree is a static entity.
However, this is not (or no longer) the case with devicetree overlays,
or more generically in cases where devicetree nodes are added and
removed dynamically.
Fundamental question: Would patches to fix this problem be accepted upstream ?
Or, of course, stepping a bit back: Am I missing something essential ?
Thanks,
Guenter
On Sat, Oct 12, 2013 at 3:54 PM, Guenter Roeck <[email protected]> wrote:
> Fundamental question: Would patches to fix this problem be accepted upstream ?
Certainly.
> Or, of course, stepping a bit back: Am I missing something essential ?
No. I think this is frequently wrong since it typically doesn't matter
for static entries as you mention.
Rob
On Sat, Oct 12, 2013 at 10:15:03PM -0500, Rob Herring wrote:
> On Sat, Oct 12, 2013 at 3:54 PM, Guenter Roeck <[email protected]> wrote:
> > Or, of course, stepping a bit back: Am I missing something essential ?
>
> No. I think this is frequently wrong since it typically doesn't matter
> for static entries as you mention.
Actually, I think it happens to be correct most of the time. The reason
is that for_each_child_of_node() internally calls of_get_next_child()
to iterate over all children, and that function already calls
of_node_put() on the "previous" node. So if all the code does is iterate
over all nodes to query them, then all should be fine.
The only case where you actually need to drop the reference on a node is
if you break out of the loop (so that of_get_next_child() will not be
called). But that's usually the case when you need to perform some
operation on the node, in which case it is the right thing to hold on to
a reference until you're done with the node.
Thierry
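The refcount flow described above can be sketched with a small userspace
mock. This is not kernel code: the struct layout, the plain int refcount,
and the helpers build_tree(), extra_refs(), walk_all() and walk_and_break()
are hypothetical and exist only for illustration. The of_*() functions and
the macro reuse the kernel names but are simplified stand-ins that assume
the behaviour described in this thread.

```c
/* Userspace mock of the OF child iterator. NOT kernel code. */
#include <assert.h>
#include <stddef.h>

struct device_node {
	int refcount;                /* assumption: plain int instead of a kobject */
	struct device_node *child;   /* first child */
	struct device_node *sibling; /* next sibling */
};

static struct device_node *of_node_get(struct device_node *np)
{
	if (np)
		np->refcount++;
	return np;
}

static void of_node_put(struct device_node *np)
{
	if (np)
		np->refcount--;
}

/* Take a reference on the next child and drop the one held on prev. */
static struct device_node *of_get_next_child(const struct device_node *node,
					     struct device_node *prev)
{
	struct device_node *next = prev ? prev->sibling : node->child;

	of_node_get(next);
	of_node_put(prev);
	return next;
}

#define for_each_child_of_node(parent, child) \
	for (child = of_get_next_child(parent, NULL); child != NULL; \
	     child = of_get_next_child(parent, child))

/* Hypothetical three-child tree used by the walkers below. */
static struct device_node kids[3], parent;

void build_tree(void)
{
	for (int i = 0; i < 3; i++) {
		kids[i].refcount = 1; /* reference owned by the tree itself */
		kids[i].child = NULL;
		kids[i].sibling = (i < 2) ? &kids[i + 1] : NULL;
	}
	parent.refcount = 1;
	parent.child = &kids[0];
	parent.sibling = NULL;
}

/* Number of references held beyond the tree's own. */
int extra_refs(void)
{
	int extra = 0;

	for (int i = 0; i < 3; i++)
		extra += kids[i].refcount - 1;
	return extra;
}

/* Full iteration: the iterator balances every get with a put. */
int walk_all(void)
{
	struct device_node *np;

	build_tree();
	for_each_child_of_node(&parent, np)
		; /* only query the node, never leave the loop early */
	return extra_refs();
}

/* Early exit: the current child stays referenced unless we put it. */
int walk_and_break(int put_on_exit)
{
	struct device_node *np;

	build_tree();
	for_each_child_of_node(&parent, np) {
		if (np == &kids[1]) {
			if (put_on_exit)
				of_node_put(np);
			break;
		}
	}
	return extra_refs();
}
```

In this mock, walk_all() ends with no extra references, walk_and_break(0)
leaves one reference on the second child, and walk_and_break(1) is balanced
again. Whether a leftover reference is a bug in real code depends on whether
the node is used after the loop, as discussed above.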
On Wed, Oct 23, 2013 at 09:10:07AM +0200, Thierry Reding wrote:
> Actually, I think it happens to be correct most of the time. The reason
> is that for_each_child_of_node() internally calls of_get_next_child()
> to iterate over all children, and that function already calls
> of_node_put() on the "previous" node. So if all the code does is iterate
> over all nodes to query them, then all should be fine.
>
Good, that reduces the scope of the problem significantly.
> The only case where you actually need to drop the reference on a node is
> if you break out of the loop (so that of_get_next_child() will not be
> called). But that's usually the case when you need to perform some
> operation on the node, in which case it is the right thing to hold on to
> a reference until you're done with the node.
>
Unfortunately, there are many cases with code such as
    if (error)
        return; /* or break; */
or even
    if (found node)
        return of_node_get(node);
in the loop.
Guenter
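The two loop-exit patterns mentioned above can be tried out in a userspace
mock. Again, this is only a sketch and not kernel code: the id field, the
int refcount, and the helpers build_tree(), extra_refs(), find_double_get(),
find_single_ref() and scan_until_error() are hypothetical names made up for
this illustration, and the of_*() functions are simplified stand-ins that
assume the iterator behaviour described earlier in the thread.

```c
/* Userspace mock, NOT kernel code: refcount is a plain int. */
#include <assert.h>
#include <stddef.h>

struct device_node {
	int id;       /* hypothetical key used for lookups */
	int refcount;
	struct device_node *child, *sibling;
};

static struct device_node *of_node_get(struct device_node *np)
{
	if (np)
		np->refcount++;
	return np;
}

static void of_node_put(struct device_node *np)
{
	if (np)
		np->refcount--;
}

static struct device_node *of_get_next_child(const struct device_node *node,
					     struct device_node *prev)
{
	struct device_node *next = prev ? prev->sibling : node->child;

	of_node_get(next);
	of_node_put(prev);
	return next;
}

#define for_each_child_of_node(parent, child) \
	for (child = of_get_next_child(parent, NULL); child != NULL; \
	     child = of_get_next_child(parent, child))

static struct device_node kids[3], parent;

void build_tree(void)
{
	for (int i = 0; i < 3; i++) {
		kids[i].id = i;
		kids[i].refcount = 1; /* reference owned by the tree itself */
		kids[i].child = NULL;
		kids[i].sibling = (i < 2) ? &kids[i + 1] : NULL;
	}
	parent.refcount = 1;
	parent.child = &kids[0];
	parent.sibling = NULL;
}

/* Number of references held beyond the tree's own. */
int extra_refs(void)
{
	int extra = 0;

	for (int i = 0; i < 3; i++)
		extra += kids[i].refcount - 1;
	return extra;
}

/* Pattern from the thread: an extra get on top of the iterator's reference. */
struct device_node *find_double_get(int id)
{
	struct device_node *np;

	for_each_child_of_node(&parent, np)
		if (np->id == id)
			return of_node_get(np);
	return NULL;
}

/* Alternative: hand the iterator's reference over to the caller. */
struct device_node *find_single_ref(int id)
{
	struct device_node *np;

	for_each_child_of_node(&parent, np)
		if (np->id == id)
			return np;
	return NULL;
}

/* Error exit that releases the current child before returning. */
int scan_until_error(int bad_id)
{
	struct device_node *np;

	for_each_child_of_node(&parent, np) {
		if (np->id == bad_id) {
			of_node_put(np);
			return -1;
		}
	}
	return 0;
}
```

In the mock, a node returned by find_double_get() carries two extra
references, so a caller that puts it once still leaves one behind, while
find_single_ref() followed by a single of_node_put() is balanced. The error
exit in scan_until_error() is balanced only because of the explicit put
before the return.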
On Wed, Oct 23, 2013 at 09:16:44AM -0700, Guenter Roeck wrote:
> Unfortunately, there are many cases with code such as
>
>     if (error)
>         return; /* or break; */
Well, a break isn't necessarily bad, since you could be using the node
subsequently. I imagine that depending on the exact block following the
if statement the node could also be assigned to some field within a
structure or similar, in which case this might still be valid. So it
really needs to be evaluated on a case by case basis.
If the above is actually verbatim, then yes, that's certainly an error.
> or even
>     if (found node)
>         return of_node_get(node);
>
> in the loop.
Yeah, I think all of those are probably wrong too.
Thierry
On 10/24/2013 12:50 AM, Thierry Reding wrote:
> On Wed, Oct 23, 2013 at 09:16:44AM -0700, Guenter Roeck wrote:
>> Unfortunately, there are many cases with code such as
>>
>>     if (error)
>>         return; /* or break; */
>
> Well, a break isn't necessarily bad, since you could be using the node
> subsequently. I imagine that depending on the exact block following the
Correct, but I meant the error case. Randomly looking through several
drivers, most of them get error return handling wrong. "Winner" so far
is of_regulator_match(), which doesn't release the node on error return,
but does not acquire references for use afterwards either.
Something to do with my non-existing free time ;-).
Guenter
On Thu, Oct 24, 2013 at 06:31:21AM -0700, Guenter Roeck wrote:
> Correct, but I meant the error case. Randomly looking through several
> drivers, most of them get error return handling wrong. "Winner" so far
> is of_regulator_match(), which doesn't release the node on error return,
> but does not acquire references for use afterwards either.
>
> Something to do with my non-existing free time ;-).
Well, that's better than boring, isn't it? =)
Thierry