2016-04-14 07:48:19

by Masahiro Yamada

[permalink] [raw]
Subject: [Question] refcount of DT node

Hi experts.

My understanding of refcount of DT node is poor.
Please help me understand it correctly.

Sorry if I am asking stupid questions.


[1] Does this reference count exist for Overlay?
Is a node freed when its refcount becomes zero?


[2] When of_node_put() should be called,
or should not be called?


Shouldn't of_node_put() be called
when we are still referencing to any of its properties?

For example, cpu_read_enable_method()
in arch/arm64/kernel/cpu_ops.c
returns a pointer to the property value
instead of creating a copy of it.

In this case, of_node_put() should not be called
because we are still referencing the DT property
(in other words, referencing to the DT node indirectly).

Am I right?


[3] Is the following code correct?

np = of_find_compatible_node(NULL, NULL,"foo-node");
of_node_put(np);
ret = of_address_to_resource(np, 0, &res);
if (ret) {
pr_err("failed to get resource\n");
return ret;
}

Actually I wrote the code above, and it was applied.

But, the node is still referenced while of_address_to_resource() is being run.

So the correct code should be as follows?

np = of_find_compatible_node(NULL, NULL,"foo-node");
ret = of_address_to_resource(np, 0, &res);
of_node_put(np);
if (ret) {
pr_err("failed to get resource\n");
return ret;
}



--
Best Regards
Masahiro Yamada


2016-04-14 08:49:06

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
> Hi experts.
>
> My understanding of refcount of DT node is poor.

The message from DT people is... don't worry about DT node refcounting.
Do whatever you want with it, they don't care whether you have correct
refcounting or not.

The background behind that is that I've tried to fix the refcounting,
and even had the coccinelle people generate some stuff to work on this
issue, but DT people's attitude towards it is "don't bother".

So yes, people may get it wrong, but it seems it's something that DT
people want ignored.

--
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2016-04-14 10:00:21

by Mark Rutland

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

On Thu, Apr 14, 2016 at 09:48:49AM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
> > Hi experts.
> >
> > My understanding of refcount of DT node is poor.
>
> The message from DT people is... don't worry about DT node refcounting.
> Do whatever you want with it, they don't care whether you have correct
> refcounting or not.
>
> The background behind that is that I've tried to fix the refcounting,
> and even had the coccinelle people generate some stuff to work on this
> issue, but DT people's attitude towards it is "don't bother".
>
> So yes, people may get it wrong, but it seems it's something that DT
> people want ignored.

I'm not sure that's quite fair; the last discussion I recall about this
ended up concluding that we need a better API, rather than papering over
problems.

That said, there isn't much obvious progress on that front.

Frank, Pantelis, Rob, were there any conclusions on this from ELC, or is
this something that needs someone to propose something?

Mark.

[1] http://permalink.gmane.org/gmane.linux.drivers.devicetree/153777

2016-04-14 10:03:25

by Pantelis Antoniou

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

Hi Mark,

> On Apr 14, 2016, at 12:59 , Mark Rutland <[email protected]> wrote:
>
> On Thu, Apr 14, 2016 at 09:48:49AM +0100, Russell King - ARM Linux wrote:
>> On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
>>> Hi experts.
>>>
>>> My understanding of refcount of DT node is poor.
>>
>> The message from DT people is... don't worry about DT node refcounting.
>> Do whatever you want with it, they don't care whether you have correct
>> refcounting or not.
>>
>> The background behind that is that I've tried to fix the refcounting,
>> and even had the coccinelle people generate some stuff to work on this
>> issue, but DT people's attitude towards it is "don't bother".
>>
>> So yes, people may get it wrong, but it seems it's something that DT
>> people want ignored.
>
> I'm not sure that's quite fair; the last discussion I recall about this
> ended up concluding that we need a better API, rather than papering over
> problems.
>
> That said, there isn't much obvious progress on that front.
>
> Frank, Pantelis, Rob, were there any conclusions on this from ELC, or is
> this something that needs someone to propose something?
>

Frank mentioned that he wants a new API. I have some ideas about it too.

My take is that drivers should never do reference counting, we have to figure
out a way for DT access using copy semantics or locks.

References would still be required for core DT code, but that’s a sane subset.

> Mark.
>
> [1] http://permalink.gmane.org/gmane.linux.drivers.devicetree/153777

Regards

— Pantelis

2016-04-14 10:11:18

by Mark Rutland

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
> Hi experts.
>
> My understanding of refcount of DT node is poor.
> Please help me understand it correctly.
>
> Sorry if I am asking stupid questions.
>
>
> [1] Does this reference count exist for Overlay?
> Is a node freed when its refcount becomes zero?

I'm not familiar with the way that overlays are intended to work, but
generally this is true, and I believe the same applies.

Pantelis, please correct me if I am wrong on that front.

> [2] When of_node_put() should be called,
> or should not be called?
>
>
> Shouldn't of_node_put() be called
> when we are still referencing to any of its properties?
>
> For example, cpu_read_enable_method()
> in arch/arm64/kernel/cpu_ops.c
> returns a pointer to the property value
> instead of creating a copy of it.
>
> In this case, of_node_put() should not be called
> because we are still referencing the DT property
> (in other words, referencing to the DT node indirectly).
>
> Am I right?

Yes, the node should not be freed while its data is referred to.

We are leaking a ref there, though, as we no longer refer to that data
after cpu_read_ops().

Fixing that will require some restructuring. We don't expect a CPU node
to need to disappear, so while it's currently not strictly correct the
code shouldn't lead to any adverse behaviour.

> [3] Is the following code correct?
>
> np = of_find_compatible_node(NULL, NULL,"foo-node");
> of_node_put(np);
> ret = of_address_to_resource(np, 0, &res);
> if (ret) {
> pr_err("failed to get resource\n");
> return ret;
> }
>
> Actually I wrote the code above, and it was applied.
>
> But, the node is still referenced while of_address_to_resource() is being run.
>
> So the correct code should be as follows?
>
> np = of_find_compatible_node(NULL, NULL,"foo-node");
> ret = of_address_to_resource(np, 0, &res);
> of_node_put(np);
> if (ret) {
> pr_err("failed to get resource\n");
> return ret;
> }

It is correctly balanced, yes.

If you don't need to keep the node for future use, this is fine.

Mark.

2016-04-14 10:40:36

by Pantelis Antoniou

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

Hi Mark,

> On Apr 14, 2016, at 13:10 , Mark Rutland <[email protected]> wrote:
>
> On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
>> Hi experts.
>>
>> My understanding of refcount of DT node is poor.
>> Please help me understand it correctly.
>>
>> Sorry if I am asking stupid questions.
>>
>>
>> [1] Does this reference count exist for Overlay?
>> Is a node freed when its refcount becomes zero?
>
> I'm not familiar with the way that overlays are intended to work, but
> generally this is true, and I believe the same applies.
>
> Pantelis, please correct me if I am wrong on that front.
>

Yes, if the refcount goes to zero everything is released.
No that doesn’t always imply freeing memory.

>> [2] When of_node_put() should be called,
>> or should not be called?
>>
>>
>> Shouldn't of_node_put() be called
>> when we are still referencing to any of its properties?
>>
>> For example, cpu_read_enable_method()
>> in arch/arm64/kernel/cpu_ops.c
>> returns a pointer to the property value
>> instead of creating a copy of it.
>>
>> In this case, of_node_put() should not be called
>> because we are still referencing the DT property
>> (in other words, referencing to the DT node indirectly).
>>
>> Am I right?
>
> Yes, the node should not be freed while its data is referred to.
>
> We are leaking a ref there, though, as we no longer refer to that data
> after cpu_read_ops().
>
> Fixing that will require some restructuring. We don't expect a CPU node
> to need to disappear, so while it's currently not strictly correct the
> code shouldn't lead to any adverse behaviour.
>

So let me explain a bit.

Due to historical reasons by default nodes and property contents are not dynamically
allocated, they are merely being pointed at in the binary blob that was passed on
during boot.

This all takes place in __unflatten_device_tree and there is an allocator argument for
allocating the device_node & properties structures. The content pointers of those structures
are directly pointing in the binary blob.

Early in the boot process the allocator used is not the standard kmalloc allocator; this does
not support freeing the structures at all.

Dynamic DT instead kmalloc’s everything; this is marked by using the node flag OF_DYNAMIC.

So when a node is released a check is performed whether OF_DYNAMIC is set.

There is an additional complication with properties. Since DT property methods (like of_get_property)
return a pointer to property value it is generally not safe to free a property. So when a property
is removed (but the node still exists) it is placed on the node’s deadprops list which may reside
until the node is released.

Hope this makes things clearer.

>> [3] Is the following code correct?
>>
>> np = of_find_compatible_node(NULL, NULL,"foo-node");
>> of_node_put(np);
>> ret = of_address_to_resource(np, 0, &res);
>> if (ret) {
>> pr_err("failed to get resource\n");
>> return ret;
>> }
>>
>> Actually I wrote the code above, and it was applied.
>>
>> But, the node is still referenced while of_address_to_resource() is being run.
>>
>> So the correct code should be as follows?
>>
>> np = of_find_compatible_node(NULL, NULL,"foo-node");
>> ret = of_address_to_resource(np, 0, &res);
>> of_node_put(np);
>> if (ret) {
>> pr_err("failed to get resource\n");
>> return ret;
>> }
>
> It is correctly balanced, yes.
>
> If you don't need to keep the node for future use, this is fine.
>

The second code fragment is the correct one.

> Mark.

Regards

— Pantelis

2016-04-14 16:10:50

by Frank Rowand

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

On 4/14/2016 3:02 AM, Pantelis Antoniou wrote:
> Hi Mark,
>
>> On Apr 14, 2016, at 12:59 , Mark Rutland <[email protected]> wrote:
>>
>> On Thu, Apr 14, 2016 at 09:48:49AM +0100, Russell King - ARM Linux wrote:
>>> On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
>>>> Hi experts.
>>>>
>>>> My understanding of refcount of DT node is poor.
>>>
>>> The message from DT people is... don't worry about DT node refcounting.
>>> Do whatever you want with it, they don't care whether you have correct
>>> refcounting or not.
>>>
>>> The background behind that is that I've tried to fix the refcounting,
>>> and even had the coccinelle people generate some stuff to work on this
>>> issue, but DT people's attitude towards it is "don't bother".
>>>
>>> So yes, people may get it wrong, but it seems it's something that DT
>>> people want ignored.
>>
>> I'm not sure that's quite fair; the last discussion I recall about this
>> ended up concluding that we need a better API, rather than papering over
>> problems.
>>
>> That said, there isn't much obvious progress on that front.
>>
>> Frank, Pantelis, Rob, were there any conclusions on this from ELC, or is
>> this something that needs someone to propose something?
>>
>
> Frank mentioned that he wants a new API. I have some ideas about it too.
>
> My take is that drivers should never do reference counting, we have to figure
> out a way for DT access using copy semantics or locks.
>
> References would still be required for core DT code, but that’s a sane subset.

Yes. Nothing concrete about implementation was decided at ELC, but this issue
is on my todo list.

-Frank

>
>> Mark.
>>
>> [1] http://permalink.gmane.org/gmane.linux.drivers.devicetree/153777
>
> Regards
>
> — Pantelis
>
> .
>

2016-04-14 17:03:02

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

On Thu, Apr 14, 2016 at 01:02:56PM +0300, Pantelis Antoniou wrote:
> Hi Mark,
>
> > On Apr 14, 2016, at 12:59 , Mark Rutland <[email protected]> wrote:
> >
> > On Thu, Apr 14, 2016 at 09:48:49AM +0100, Russell King - ARM Linux wrote:
> >> On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
> >>> Hi experts.
> >>>
> >>> My understanding of refcount of DT node is poor.
> >>
> >> The message from DT people is... don't worry about DT node refcounting.
> >> Do whatever you want with it, they don't care whether you have correct
> >> refcounting or not.
> >>
> >> The background behind that is that I've tried to fix the refcounting,
> >> and even had the coccinelle people generate some stuff to work on this
> >> issue, but DT people's attitude towards it is "don't bother".
> >>
> >> So yes, people may get it wrong, but it seems it's something that DT
> >> people want ignored.
> >
> > I'm not sure that's quite fair; the last discussion I recall about this
> > ended up concluding that we need a better API, rather than papering over
> > problems.
> >
> > That said, there isn't much obvious progress on that front.
> >
> > Frank, Pantelis, Rob, were there any conclusions on this from ELC, or is
> > this something that needs someone to propose something?
> >
>
> Frank mentioned that he wants a new API. I have some ideas about it too.
>
> My take is that drivers should never do reference counting, we have to figure
> out a way for DT access using copy semantics or locks.

Generally yes, but I think there may be exceptions. I think the locking
is too fine grained for what we need. For almost all users, I think we
only need locking at the overlay or changeset level. The only other user
I am aware of is PSeries (IIRC) and they only need reference counting
for a few things like memory and cpu. I would handle those cases
explicitly. But that is going to require someone familar with PSeries to
work on. I suppose we could separate overlays from the OF_DYNAMIC
dependency (or just the ref counting part of it) and then OF_DYNAMIC
goes back to PPC only.

Rob

2016-04-14 18:39:13

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

On Thu, Apr 14, 2016 at 10:59:57AM +0100, Mark Rutland wrote:
> On Thu, Apr 14, 2016 at 09:48:49AM +0100, Russell King - ARM Linux wrote:
> > On Thu, Apr 14, 2016 at 04:47:57PM +0900, Masahiro Yamada wrote:
> > > Hi experts.
> > >
> > > My understanding of refcount of DT node is poor.
> >
> > The message from DT people is... don't worry about DT node refcounting.
> > Do whatever you want with it, they don't care whether you have correct
> > refcounting or not.
> >
> > The background behind that is that I've tried to fix the refcounting,
> > and even had the coccinelle people generate some stuff to work on this
> > issue, but DT people's attitude towards it is "don't bother".
> >
> > So yes, people may get it wrong, but it seems it's something that DT
> > people want ignored.
>
> I'm not sure that's quite fair; the last discussion I recall about this
> ended up concluding that we need a better API, rather than papering over
> problems.

Sorry, but I started out trying to get the of_node_put() stuff
correct, and sparked Julia into doing coccinelle patches, and I
was told by Rob that we shouldn't care about of_node_put() being
wrong, and the feeling is as I stated it: DT folk don't care
enough to fix the existing places, even though a great many can
be sorted via the coccinelle approach.

Their stance is not something I agree with - if we have something,
it should be correct, even if it's not what we would ultimately
desire, _or_ it should be removed. This half-way house that we
have today is total madness to me.

--
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2016-04-16 15:03:18

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [Question] refcount of DT node

Hi experts.


2016-04-15 3:38 GMT+09:00 Russell King - ARM Linux <[email protected]>:

>
> Their stance is not something I agree with - if we have something,
> it should be correct, even if it's not what we would ultimately
> desire, _or_ it should be removed. This half-way house that we
> have today is total madness to me.

I agree with Russell here.

I am comfortable to do my best in the given API at this point
(and also happy to adjust my drivers when a new API is supported in the future).

So, the answers from DT people are really helpful for me to write correct code.

Thank you everyone!



--
Best Regards
Masahiro Yamada