2015-11-16 14:46:03

by Liviu Dudau

[permalink] [raw]
Subject: [PATCH 0/2] Improve drm_of_component_probe() and move rockchip to use it

Hello,

When I have introduced the drm_of_component_probe() function I have managed to
break rockchip's DRM driver as the compare_of() function had to match both local
crtc ports and remote encoder ones. As suggested by Russell King, I have now
enhanced the drm_of_component_probe() function to take two comparison functions,
and converted (again) rockchip driver to use it.

I would really like to get some Tested-By this time if possible from IMX, Armada
and Rockchip developers as I lack hardware to do that myself.

The only thing not implemented from Russell's suggestion list is the renaming of
the function into drm_kms_component_probe().

Best regards,
Liviu

Liviu Dudau (2):
drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.
drm/rockchip: Convert the probe function to the generic drm_of_component_probe()

drivers/gpu/drm/armada/armada_drv.c | 3 +-
drivers/gpu/drm/drm_of.c | 23 +++++--
drivers/gpu/drm/imx/imx-drm-core.c | 3 +-
drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 98 ++++++-----------------------
include/drm/drm_of.h | 6 +-
5 files changed, 44 insertions(+), 89 deletions(-)

--
2.6.0


2015-11-16 14:46:35

by Liviu Dudau

[permalink] [raw]
Subject: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

Rockchip DRM driver cannot use the same compare_of() function to match
ports and remote ports (aka encoders) as their OF sub-trees look different.
Add a second compare function to be used when encoders are added to the
component framework and patch the existing users of the function
accordingly.

Signed-off-by: Liviu Dudau <[email protected]>
---
drivers/gpu/drm/armada/armada_drv.c | 3 ++-
drivers/gpu/drm/drm_of.c | 23 ++++++++++++++++++-----
drivers/gpu/drm/imx/imx-drm-core.c | 3 ++-
include/drm/drm_of.h | 6 ++++--
4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c
index 77ab93d..3a2a929 100644
--- a/drivers/gpu/drm/armada/armada_drv.c
+++ b/drivers/gpu/drm/armada/armada_drv.c
@@ -274,7 +274,8 @@ static int armada_drm_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
int ret;

- ret = drm_of_component_probe(dev, compare_dev_name, &armada_master_ops);
+ ret = drm_of_component_probe(dev, compare_dev_name, compare_dev_name,
+ &armada_master_ops);
if (ret != -EINVAL)
return ret;

diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c
index 493c05c..58fafd7 100644
--- a/drivers/gpu/drm/drm_of.c
+++ b/drivers/gpu/drm/drm_of.c
@@ -77,7 +77,8 @@ EXPORT_SYMBOL(drm_of_find_possible_crtcs);
* Returns zero if successful, or one of the standard error codes if it fails.
*/
int drm_of_component_probe(struct device *dev,
- int (*compare_of)(struct device *, void *),
+ int (*compare_port)(struct device *, void *),
+ int (*compare_encoder)(struct device *, void *),
const struct component_master_ops *m_ops)
{
struct device_node *ep, *port, *remote;
@@ -101,8 +102,14 @@ int drm_of_component_probe(struct device *dev,
continue;
}

- component_match_add(dev, &match, compare_of, port);
- of_node_put(port);
+ component_match_add(dev, &match, compare_port, port);
+ /*
+ * component_match_add keeps a reference to the port
+ * variable but does not do proper reference counting,
+ * so we cannot release the reference here. If that
+ * gets fixed, the following line should be uncommented
+ */
+ /* of_node_put(port); */
}

if (i == 0) {
@@ -140,8 +147,14 @@ int drm_of_component_probe(struct device *dev,
continue;
}

- component_match_add(dev, &match, compare_of, remote);
- of_node_put(remote);
+ component_match_add(dev, &match, compare_encoder, remote);
+ /*
+ * component_match_add keeps a reference to the port
+ * variable but does not do proper reference counting,
+ * so we cannot release the reference here. If that
+ * gets fixed, the following line should be uncommented
+ */
+ /* of_node_put(remote); */
}
of_node_put(port);
}
diff --git a/drivers/gpu/drm/imx/imx-drm-core.c b/drivers/gpu/drm/imx/imx-drm-core.c
index 64f16ea..0d36410 100644
--- a/drivers/gpu/drm/imx/imx-drm-core.c
+++ b/drivers/gpu/drm/imx/imx-drm-core.c
@@ -531,7 +531,8 @@ static const struct component_master_ops imx_drm_ops = {

static int imx_drm_platform_probe(struct platform_device *pdev)
{
- int ret = drm_of_component_probe(&pdev->dev, compare_of, &imx_drm_ops);
+ int ret = drm_of_component_probe(&pdev->dev, compare_of, compare_of,
+ &imx_drm_ops);

if (!ret)
ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
diff --git a/include/drm/drm_of.h b/include/drm/drm_of.h
index 8544665..1c29e42 100644
--- a/include/drm/drm_of.h
+++ b/include/drm/drm_of.h
@@ -10,7 +10,8 @@ struct device_node;
extern uint32_t drm_of_find_possible_crtcs(struct drm_device *dev,
struct device_node *port);
extern int drm_of_component_probe(struct device *dev,
- int (*compare_of)(struct device *, void *),
+ int (*compare_port)(struct device *, void *),
+ int (*compare_encoder)(struct device *, void *),
const struct component_master_ops *m_ops);
#else
static inline uint32_t drm_of_find_possible_crtcs(struct drm_device *dev,
@@ -21,7 +22,8 @@ static inline uint32_t drm_of_find_possible_crtcs(struct drm_device *dev,

static inline int
drm_of_component_probe(struct device *dev,
- int (*compare_of)(struct device *, void *),
+ int (*compare_port)(struct device *, void *),
+ int (*compare_encoder)(struct device *, void *),
const struct component_master_ops *m_ops)
{
return -EINVAL;
--
2.6.0

2015-11-16 14:47:33

by Liviu Dudau

[permalink] [raw]
Subject: [PATCH 2/2] drm/rockchip: Convert the probe function to the generic drm_of_component_probe()

Take two: Initial attempt to convert rockchip to drm_of_component_probe()
missed the difference between ports and encoders when using the
compare_of() function. Now that drm_of_component_probe() has been enhanced,
let's try again the conversion.

Signed-off-by: Liviu Dudau <[email protected]>
---
drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 98 ++++++-----------------------
1 file changed, 18 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
index f22e1e1..581e98c 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -19,6 +19,7 @@
#include <drm/drmP.h>
#include <drm/drm_crtc_helper.h>
#include <drm/drm_fb_helper.h>
+#include <drm/drm_of.h>
#include <linux/dma-mapping.h>
#include <linux/pm_runtime.h>
#include <linux/module.h>
@@ -411,36 +412,6 @@ int rockchip_drm_encoder_get_mux_id(struct device_node *node,
}
EXPORT_SYMBOL_GPL(rockchip_drm_encoder_get_mux_id);

-static int compare_of(struct device *dev, void *data)
-{
- struct device_node *np = data;
-
- return dev->of_node == np;
-}
-
-static void rockchip_add_endpoints(struct device *dev,
- struct component_match **match,
- struct device_node *port)
-{
- struct device_node *ep, *remote;
-
- for_each_child_of_node(port, ep) {
- remote = of_graph_get_remote_port_parent(ep);
- if (!remote || !of_device_is_available(remote)) {
- of_node_put(remote);
- continue;
- } else if (!of_device_is_available(remote->parent)) {
- dev_warn(dev, "parent device of %s is not available\n",
- remote->full_name);
- of_node_put(remote);
- continue;
- }
-
- component_match_add(dev, match, compare_of, remote);
- of_node_put(remote);
- }
-}
-
static int rockchip_drm_bind(struct device *dev)
{
struct drm_device *drm;
@@ -481,63 +452,30 @@ static const struct component_master_ops rockchip_drm_ops = {
.unbind = rockchip_drm_unbind,
};

-static int rockchip_drm_platform_probe(struct platform_device *pdev)
+static int compare_port(struct device *dev, void *data)
{
- struct device *dev = &pdev->dev;
- struct component_match *match = NULL;
- struct device_node *np = dev->of_node;
- struct device_node *port;
- int i;
+ struct device_node *np = data;

- if (!np)
- return -ENODEV;
- /*
- * Bind the crtc ports first, so that
- * drm_of_find_possible_crtcs called from encoder .bind callbacks
- * works as expected.
- */
- for (i = 0;; i++) {
- port = of_parse_phandle(np, "ports", i);
- if (!port)
- break;
-
- if (!of_device_is_available(port->parent)) {
- of_node_put(port);
- continue;
- }
+ return dev->parent->of_node == np;
+}

- component_match_add(dev, &match, compare_of, port->parent);
- of_node_put(port);
- }
+static int compare_encoder(struct device *dev, void *data)
+{
+ struct device_node *np = data;

- if (i == 0) {
- dev_err(dev, "missing 'ports' property\n");
- return -ENODEV;
- }
+ return dev->of_node == np;
+}

- if (!match) {
- dev_err(dev, "No available vop found for display-subsystem.\n");
- return -ENODEV;
- }
- /*
- * For each bound crtc, bind the encoders attached to its
- * remote endpoint.
- */
- for (i = 0;; i++) {
- port = of_parse_phandle(np, "ports", i);
- if (!port)
- break;
-
- if (!of_device_is_available(port->parent)) {
- of_node_put(port);
- continue;
- }
+static int rockchip_drm_platform_probe(struct platform_device *pdev)
+{
+ int ret = drm_of_component_probe(&pdev->dev, compare_port,
+ compare_encoder, &rockchip_drm_ops);

- rockchip_add_endpoints(dev, &match, port);
- of_node_put(port);
- }
+ /* keep compatibility with old code that was returning -ENODEV */
+ if (ret == -EINVAL)
+ return -ENODEV;

- return component_master_add_with_match(dev, &rockchip_drm_ops, match);
+ return ret;
}

static int rockchip_drm_platform_remove(struct platform_device *pdev)
--
2.6.0

2015-11-16 16:07:37

by Heiko Stuebner

[permalink] [raw]
Subject: Re: [PATCH 0/2] Improve drm_of_component_probe() and move rockchip to use it

Hi Liviu,

Am Montag, 16. November 2015, 14:44:51 schrieb Liviu Dudau:
> When I have introduced the drm_of_component_probe() function I have managed
> to break rockchip's DRM driver as the compare_of() function had to match
> both local crtc ports and remote encoder ones. As suggested by Russell
> King, I have now enhanced the drm_of_component_probe() function to take two
> comparison functions, and converted (again) rockchip driver to use it.
>
> I would really like to get some Tested-By this time if possible from IMX,
> Armada and Rockchip developers as I lack hardware to do that myself.
>
> The only thing not implemented from Russell's suggestion list is the
> renaming of the function into drm_kms_component_probe().

with these patches applied I loose the display on my rk3288. A bit of dumb
debug-output shows that the compare function does seem to do strange things:

[ 1.020476] [drm] Initialized drm 1.1.0 20060810
[ 1.025943] drm_of_component_probe: adding port /vop@ff940000/port
[ 1.032225] drm_of_component_probe: adding port /vop@ff930000/port
[ 1.038421] drm_of_component_probe: adding encoder /hdmi@ff980000
[ 1.044535] drm_of_component_probe: adding encoder /edp@ff970000
[ 1.050562] drm_of_component_probe: adding encoder /hdmi@ff980000
[ 1.056663] drm_of_component_probe: adding encoder /edp@ff970000

---- Columns: dev->parent / dev comparing dev->parent->of_node with np

[ 1.062683] compare_port: platform/ff980000.hdmi comparing NULL with /vop@ff940000/port
[ 1.071017] compare_port: platform/ff980000.hdmi comparing NULL with /vop@ff940000/port
[ 1.079024] compare_port: platform/ff930000.vop comparing NULL with /vop@ff940000/port
[ 1.087117] compare_port: platform/ff980000.hdmi comparing NULL with /vop@ff940000/port
[ 1.095130] compare_port: platform/ff930000.vop comparing NULL with /vop@ff940000/port
[ 1.103054] compare_port: platform/ff940000.vop comparing NULL with /vop@ff940000/port
[ 1.111553] panel_regulator: supplied by vcc33_sys


I need to dig deeper to find out what's happening there, but maybe you
already have some idea in the meantime :-)


Thanks
Heiko

2015-11-16 16:23:17

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On Mon, Nov 16, 2015 at 02:44:52PM +0000, Liviu Dudau wrote:
> Rockchip DRM driver cannot use the same compare_of() function to match
> ports and remote ports (aka encoders) as their OF sub-trees look different.
> Add a second compare function to be used when encoders are added to the
> component framework and patch the existing users of the function
> accordingly.
>
> Signed-off-by: Liviu Dudau <[email protected]>
> ---
> drivers/gpu/drm/armada/armada_drv.c | 3 ++-
> drivers/gpu/drm/drm_of.c | 23 ++++++++++++++++++-----
> drivers/gpu/drm/imx/imx-drm-core.c | 3 ++-
> include/drm/drm_of.h | 6 ++++--
> 4 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c
> index 77ab93d..3a2a929 100644
> --- a/drivers/gpu/drm/armada/armada_drv.c
> +++ b/drivers/gpu/drm/armada/armada_drv.c
> @@ -274,7 +274,8 @@ static int armada_drm_probe(struct platform_device *pdev)
> struct device *dev = &pdev->dev;
> int ret;
>
> - ret = drm_of_component_probe(dev, compare_dev_name, &armada_master_ops);
> + ret = drm_of_component_probe(dev, compare_dev_name, compare_dev_name,
> + &armada_master_ops);
> if (ret != -EINVAL)
> return ret;
>
> diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c
> index 493c05c..58fafd7 100644
> --- a/drivers/gpu/drm/drm_of.c
> +++ b/drivers/gpu/drm/drm_of.c
> @@ -77,7 +77,8 @@ EXPORT_SYMBOL(drm_of_find_possible_crtcs);
> * Returns zero if successful, or one of the standard error codes if it fails.
> */
> int drm_of_component_probe(struct device *dev,
> - int (*compare_of)(struct device *, void *),
> + int (*compare_port)(struct device *, void *),
> + int (*compare_encoder)(struct device *, void *),
> const struct component_master_ops *m_ops)
> {
> struct device_node *ep, *port, *remote;
> @@ -101,8 +102,14 @@ int drm_of_component_probe(struct device *dev,
> continue;
> }
>
> - component_match_add(dev, &match, compare_of, port);
> - of_node_put(port);
> + component_match_add(dev, &match, compare_port, port);
> + /*
> + * component_match_add keeps a reference to the port
> + * variable but does not do proper reference counting,
> + * so we cannot release the reference here. If that
> + * gets fixed, the following line should be uncommented
> + */
> + /* of_node_put(port); */

Even if it is fixed, this line should _never_ be uncommented. This is
totally the wrong place to drop the reference.

> }
>
> if (i == 0) {
> @@ -140,8 +147,14 @@ int drm_of_component_probe(struct device *dev,
> continue;
> }
>
> - component_match_add(dev, &match, compare_of, remote);
> - of_node_put(remote);
> + component_match_add(dev, &match, compare_encoder, remote);
> + /*
> + * component_match_add keeps a reference to the port
> + * variable but does not do proper reference counting,
> + * so we cannot release the reference here. If that
> + * gets fixed, the following line should be uncommented
> + */
> + /* of_node_put(remote); */

Ditto.

The component helper retains a reference (by pointer value) to the 'port'
or 'remote', which is then subsequently passed into the supplied
'compare_encoder' or 'compare_port' method.

If you drop the reference after adding the match, then that pointer can
go away and be re-used for something else - and that means it's totally
useless to the compare functions, since the memory pointed to by it may
not contain an device_node struct anymore.

So, it _never_ makes sense to drop the reference at this point.

Where it does make sense is when the array of matches is destroyed. For
that, we'd need to add a callback to the master ops struct so that the
master driver can properly release these pointers.

I'd keep most of the big comments though (up to "... varable") to
explain why we don't drop the reference.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-16 16:30:54

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 2/2] drm/rockchip: Convert the probe function to the generic drm_of_component_probe()

I've tweaked your patch to make the above (buggy) change a little clearer.

On Mon, Nov 16, 2015 at 02:44:53PM +0000, Liviu Dudau wrote:
> - for (i = 0;; i++) {
> - port = of_parse_phandle(np, "ports", i);
> - if (!port)
> - break;
> -
> - if (!of_device_is_available(port->parent)) {
> - of_node_put(port);
> - continue;
> - }
>
> - component_match_add(dev, &match, compare_of, port->parent);
> - of_node_put(port);
> - }

> -static int compare_of(struct device *dev, void *data)
> -{
> - struct device_node *np = data;
> -
> - return dev->of_node == np;
> -}

The original above passes port->parent to component_match_add(). This
means 'np' in the above compare_of() function is 'port->parent'.

This means the above comparison is effectively:

dev->of_node == port->parent

The generic code instead does this:

component_match_add(dev, &match, compare_of, port);

So what we get in the comparison function is 'port' rather than
'port->parent':

> +static int compare_port(struct device *dev, void *data)
> {
> + struct device_node *np = data;
> + return dev->parent->of_node == np;
> +}

which means the comparison is:

dev->parent->of_node == port

which is a different comparison from the above.

You instead want this to be:

return dev->of_node == np->parent;

Heiko, please test the above change to compare_port() - I think you'll
find that will fix your issue.

Thanks.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-16 16:49:14

by Liviu Dudau

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On Mon, Nov 16, 2015 at 04:22:41PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 16, 2015 at 02:44:52PM +0000, Liviu Dudau wrote:
> > Rockchip DRM driver cannot use the same compare_of() function to match
> > ports and remote ports (aka encoders) as their OF sub-trees look different.
> > Add a second compare function to be used when encoders are added to the
> > component framework and patch the existing users of the function
> > accordingly.
> >
> > Signed-off-by: Liviu Dudau <[email protected]>
> > ---
> > drivers/gpu/drm/armada/armada_drv.c | 3 ++-
> > drivers/gpu/drm/drm_of.c | 23 ++++++++++++++++++-----
> > drivers/gpu/drm/imx/imx-drm-core.c | 3 ++-
> > include/drm/drm_of.h | 6 ++++--
> > 4 files changed, 26 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c
> > index 77ab93d..3a2a929 100644
> > --- a/drivers/gpu/drm/armada/armada_drv.c
> > +++ b/drivers/gpu/drm/armada/armada_drv.c
> > @@ -274,7 +274,8 @@ static int armada_drm_probe(struct platform_device *pdev)
> > struct device *dev = &pdev->dev;
> > int ret;
> >
> > - ret = drm_of_component_probe(dev, compare_dev_name, &armada_master_ops);
> > + ret = drm_of_component_probe(dev, compare_dev_name, compare_dev_name,
> > + &armada_master_ops);
> > if (ret != -EINVAL)
> > return ret;
> >
> > diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c
> > index 493c05c..58fafd7 100644
> > --- a/drivers/gpu/drm/drm_of.c
> > +++ b/drivers/gpu/drm/drm_of.c
> > @@ -77,7 +77,8 @@ EXPORT_SYMBOL(drm_of_find_possible_crtcs);
> > * Returns zero if successful, or one of the standard error codes if it fails.
> > */
> > int drm_of_component_probe(struct device *dev,
> > - int (*compare_of)(struct device *, void *),
> > + int (*compare_port)(struct device *, void *),
> > + int (*compare_encoder)(struct device *, void *),
> > const struct component_master_ops *m_ops)
> > {
> > struct device_node *ep, *port, *remote;
> > @@ -101,8 +102,14 @@ int drm_of_component_probe(struct device *dev,
> > continue;
> > }
> >
> > - component_match_add(dev, &match, compare_of, port);
> > - of_node_put(port);
> > + component_match_add(dev, &match, compare_port, port);
> > + /*
> > + * component_match_add keeps a reference to the port
> > + * variable but does not do proper reference counting,
> > + * so we cannot release the reference here. If that
> > + * gets fixed, the following line should be uncommented
> > + */
> > + /* of_node_put(port); */
>
> Even if it is fixed, this line should _never_ be uncommented. This is
> totally the wrong place to drop the reference.

What if (as implied by the comment) component_match_add() does some reference counting
of sorts? (I know it doesn't get used only with OF nodes, it is more generic). I feel
that holding onto a reference to a counted resource without incrementing the use count
is not the right way of doing things, or at least it should be clearly documented in
the interface of component_match_add() so that people understand the mess they are
getting into.


>
> > }
> >
> > if (i == 0) {
> > @@ -140,8 +147,14 @@ int drm_of_component_probe(struct device *dev,
> > continue;
> > }
> >
> > - component_match_add(dev, &match, compare_of, remote);
> > - of_node_put(remote);
> > + component_match_add(dev, &match, compare_encoder, remote);
> > + /*
> > + * component_match_add keeps a reference to the port
> > + * variable but does not do proper reference counting,
> > + * so we cannot release the reference here. If that
> > + * gets fixed, the following line should be uncommented
> > + */
> > + /* of_node_put(remote); */
>
> Ditto.
>
> The component helper retains a reference (by pointer value) to the 'port'
> or 'remote', which is then subsequently passed into the supplied
> 'compare_encoder' or 'compare_port' method.
>
> If you drop the reference after adding the match, then that pointer can
> go away and be re-used for something else - and that means it's totally
> useless to the compare functions, since the memory pointed to by it may
> not contain an device_node struct anymore.
>
> So, it _never_ makes sense to drop the reference at this point.

See my comment above. Keeping a reference to a resource and passing it
on to other functions (even if they are callbacks) should clearly flag the
component framework as one of the refcounters.

>
> Where it does make sense is when the array of matches is destroyed. For
> that, we'd need to add a callback to the master ops struct so that the
> master driver can properly release these pointers.
>
> I'd keep most of the big comments though (up to "... varable") to
> explain why we don't drop the reference.

Sorry, if I understand you correctly, you're saying that we should keep
the following comment:

/*
* component_match_add keeps a reference to the port
* variable.
*/

How does that explain why we don't drop the reference? Did you mean the comment
should be truncated somewhere else by chance (like including the fact the
reference counting is not done)?

Best regards,
Liviu

>
> --
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
>

--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯

2015-11-16 16:50:30

by Liviu Dudau

[permalink] [raw]
Subject: Re: [PATCH 0/2] Improve drm_of_component_probe() and move rockchip to use it

On Mon, Nov 16, 2015 at 05:07:17PM +0100, Heiko Stübner wrote:
> Hi Liviu,
>
> Am Montag, 16. November 2015, 14:44:51 schrieb Liviu Dudau:
> > When I have introduced the drm_of_component_probe() function I have managed
> > to break rockchip's DRM driver as the compare_of() function had to match
> > both local crtc ports and remote encoder ones. As suggested by Russell
> > King, I have now enhanced the drm_of_component_probe() function to take two
> > comparison functions, and converted (again) rockchip driver to use it.
> >
> > I would really like to get some Tested-By this time if possible from IMX,
> > Armada and Rockchip developers as I lack hardware to do that myself.
> >
> > The only thing not implemented from Russell's suggestion list is the
> > renaming of the function into drm_kms_component_probe().
>
> with these patches applied I loose the display on my rk3288. A bit of dumb
> debug-output shows that the compare function does seem to do strange things:
>
> [ 1.020476] [drm] Initialized drm 1.1.0 20060810
> [ 1.025943] drm_of_component_probe: adding port /vop@ff940000/port
> [ 1.032225] drm_of_component_probe: adding port /vop@ff930000/port
> [ 1.038421] drm_of_component_probe: adding encoder /hdmi@ff980000
> [ 1.044535] drm_of_component_probe: adding encoder /edp@ff970000
> [ 1.050562] drm_of_component_probe: adding encoder /hdmi@ff980000
> [ 1.056663] drm_of_component_probe: adding encoder /edp@ff970000
>
> ---- Columns: dev->parent / dev comparing dev->parent->of_node with np
>
> [ 1.062683] compare_port: platform/ff980000.hdmi comparing NULL with /vop@ff940000/port
> [ 1.071017] compare_port: platform/ff980000.hdmi comparing NULL with /vop@ff940000/port
> [ 1.079024] compare_port: platform/ff930000.vop comparing NULL with /vop@ff940000/port
> [ 1.087117] compare_port: platform/ff980000.hdmi comparing NULL with /vop@ff940000/port
> [ 1.095130] compare_port: platform/ff930000.vop comparing NULL with /vop@ff940000/port
> [ 1.103054] compare_port: platform/ff940000.vop comparing NULL with /vop@ff940000/port
> [ 1.111553] panel_regulator: supplied by vcc33_sys
>
>
> I need to dig deeper to find out what's happening there, but maybe you
> already have some idea in the meantime :-)

Did I got the content of the compare_{port,encoder}() functions the wrong way around?

Best regards,
Liviu

>
>
> Thanks
> Heiko
>

--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯

2015-11-16 16:52:12

by Liviu Dudau

[permalink] [raw]
Subject: Re: [PATCH 2/2] drm/rockchip: Convert the probe function to the generic drm_of_component_probe()

On Mon, Nov 16, 2015 at 04:30:16PM +0000, Russell King - ARM Linux wrote:
> I've tweaked your patch to make the above (buggy) change a little clearer.
>
> On Mon, Nov 16, 2015 at 02:44:53PM +0000, Liviu Dudau wrote:
> > - for (i = 0;; i++) {
> > - port = of_parse_phandle(np, "ports", i);
> > - if (!port)
> > - break;
> > -
> > - if (!of_device_is_available(port->parent)) {
> > - of_node_put(port);
> > - continue;
> > - }
> >
> > - component_match_add(dev, &match, compare_of, port->parent);
> > - of_node_put(port);
> > - }
>
> > -static int compare_of(struct device *dev, void *data)
> > -{
> > - struct device_node *np = data;
> > -
> > - return dev->of_node == np;
> > -}
>
> The original above passes port->parent to component_match_add(). This
> means 'np' in the above compare_of() function is 'port->parent'.
>
> This means the above comparison is effectively:
>
> dev->of_node == port->parent
>
> The generic code instead does this:
>
> component_match_add(dev, &match, compare_of, port);
>
> So what we get in the comparison function is 'port' rather than
> 'port->parent':
>
> > +static int compare_port(struct device *dev, void *data)
> > {
> > + struct device_node *np = data;
> > + return dev->parent->of_node == np;
> > +}
>
> which means the comparison is:
>
> dev->parent->of_node == port
>
> which is a different comparison from the above.
>
> You instead want this to be:
>
> return dev->of_node == np->parent;
>
> Heiko, please test the above change to compare_port() - I think you'll
> find that will fix your issue.

Sorry, I admit I'm not very good at doing patches without being able
to test them. :(

Thanks for helping on this!

Liviu

>
> Thanks.
>
> --
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
>

--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯

2015-11-16 17:02:10

by Heiko Stuebner

[permalink] [raw]
Subject: Re: [PATCH 2/2] drm/rockchip: Convert the probe function to the generic drm_of_component_probe()

Am Montag, 16. November 2015, 16:52:06 schrieb Liviu Dudau:
> On Mon, Nov 16, 2015 at 04:30:16PM +0000, Russell King - ARM Linux wrote:
> > I've tweaked your patch to make the above (buggy) change a little clearer.
> >
> > On Mon, Nov 16, 2015 at 02:44:53PM +0000, Liviu Dudau wrote:
> > > - for (i = 0;; i++) {
> > > - port = of_parse_phandle(np, "ports", i);
> > > - if (!port)
> > > - break;
> > > -
> > > - if (!of_device_is_available(port->parent)) {
> > > - of_node_put(port);
> > > - continue;
> > > - }
> > >
> > > - component_match_add(dev, &match, compare_of, port->parent);
> > > - of_node_put(port);
> > > - }
> > >
> > > -static int compare_of(struct device *dev, void *data)
> > > -{
> > > - struct device_node *np = data;
> > > -
> > > - return dev->of_node == np;
> > > -}
> >
> > The original above passes port->parent to component_match_add(). This
> > means 'np' in the above compare_of() function is 'port->parent'.
> >
> > This means the above comparison is effectively:
> > dev->of_node == port->parent
> >
> > The generic code instead does this:
> > component_match_add(dev, &match, compare_of, port);
> >
> > So what we get in the comparison function is 'port' rather than
> >
> > 'port->parent':
> > > +static int compare_port(struct device *dev, void *data)
> > >
> > > {
> > >
> > > + struct device_node *np = data;
> > > + return dev->parent->of_node == np;
> > > +}
> >
> > which means the comparison is:
> > dev->parent->of_node == port
> >
> > which is a different comparison from the above.
> >
> > You instead want this to be:
> > return dev->of_node == np->parent;
> >
> > Heiko, please test the above change to compare_port() - I think you'll
> > find that will fix your issue.
>
> Sorry, I admit I'm not very good at doing patches without being able
> to test them. :(
>
> Thanks for helping on this!

Russell's hint was correct. With the compare function changed like he pointed
out, I again get a working display with your patches :-)

So, thanks Russell for spotting this.


Heiko

2015-11-16 17:23:24

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

Please sensibly wrap your messages. Your lines are longer than 80 characters which makes it exceedingly difficult for some people to reply to your very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long lines without first reformatting them manually - and why should they bother to reply if they have that kind of additional work? Thanks.

On Mon, Nov 16, 2015 at 04:49:07PM +0000, Liviu Dudau wrote:
> On Mon, Nov 16, 2015 at 04:22:41PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Nov 16, 2015 at 02:44:52PM +0000, Liviu Dudau wrote:
> > > Rockchip DRM driver cannot use the same compare_of() function to match
> > > ports and remote ports (aka encoders) as their OF sub-trees look different.
> > > Add a second compare function to be used when encoders are added to the
> > > component framework and patch the existing users of the function
> > > accordingly.
> > >
> > > Signed-off-by: Liviu Dudau <[email protected]>
> > > ---
> > > drivers/gpu/drm/armada/armada_drv.c | 3 ++-
> > > drivers/gpu/drm/drm_of.c | 23 ++++++++++++++++++-----
> > > drivers/gpu/drm/imx/imx-drm-core.c | 3 ++-
> > > include/drm/drm_of.h | 6 ++++--
> > > 4 files changed, 26 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c
> > > index 77ab93d..3a2a929 100644
> > > --- a/drivers/gpu/drm/armada/armada_drv.c
> > > +++ b/drivers/gpu/drm/armada/armada_drv.c
> > > @@ -274,7 +274,8 @@ static int armada_drm_probe(struct platform_device *pdev)
> > > struct device *dev = &pdev->dev;
> > > int ret;
> > >
> > > - ret = drm_of_component_probe(dev, compare_dev_name, &armada_master_ops);
> > > + ret = drm_of_component_probe(dev, compare_dev_name, compare_dev_name,
> > > + &armada_master_ops);
> > > if (ret != -EINVAL)
> > > return ret;
> > >
> > > diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c
> > > index 493c05c..58fafd7 100644
> > > --- a/drivers/gpu/drm/drm_of.c
> > > +++ b/drivers/gpu/drm/drm_of.c
> > > @@ -77,7 +77,8 @@ EXPORT_SYMBOL(drm_of_find_possible_crtcs);
> > > * Returns zero if successful, or one of the standard error codes if it fails.
> > > */
> > > int drm_of_component_probe(struct device *dev,
> > > - int (*compare_of)(struct device *, void *),
> > > + int (*compare_port)(struct device *, void *),
> > > + int (*compare_encoder)(struct device *, void *),
> > > const struct component_master_ops *m_ops)
> > > {
> > > struct device_node *ep, *port, *remote;
> > > @@ -101,8 +102,14 @@ int drm_of_component_probe(struct device *dev,
> > > continue;
> > > }
> > >
> > > - component_match_add(dev, &match, compare_of, port);
> > > - of_node_put(port);
> > > + component_match_add(dev, &match, compare_port, port);
> > > + /*
> > > + * component_match_add keeps a reference to the port
> > > + * variable but does not do proper reference counting,
> > > + * so we cannot release the reference here. If that
> > > + * gets fixed, the following line should be uncommented
> > > + */
> > > + /* of_node_put(port); */
> >
> > Even if it is fixed, this line should _never_ be uncommented. This is
> > totally the wrong place to drop the reference.
>
> What if (as implied by the comment) component_match_add() does some
> reference counting of sorts? (I know it doesn't get used only with
> OF nodes, it is more generic).

Put those two sentences together and please think about it. If the
pointer type is unknown to component_match_add(), how could it ever
use of_node_get() on it? What if it's a string? Or a struct device?
Or something else.

> I feel that holding onto a reference to a counted resource without
> incrementing the use count is not the right way of doing things, or
> at least it should be clearly documented in the interface of
> component_match_add() so that people understand the mess they are
> getting into.

That's just crap. You're not thinking before you hit your keyboard.
Why should component_match_add(), which has _zero_ knowledge about
what it's being used with, have documentation attached to it which
would be:

"If you use component_match_add() with X, you need to do X'. If you
use it with Y, you need to do Y'. If you use it with Z, you need to
do Z'."

No, that's utterly insane.

> > Where it does make sense is when the array of matches is destroyed. For
> > that, we'd need to add a callback to the master ops struct so that the
> > master driver can properly release these pointers.
> >
> > I'd keep most of the big comments though (up to "... varable") to
> > explain why we don't drop the reference.
>
> Sorry, if I understand you correctly, you're saying that we should keep
> the following comment:
>
> /*
> * component_match_add keeps a reference to the port
> * variable.
> */

Exactly.

> How does that explain why we don't drop the reference? Did you mean the
> comment should be truncated somewhere else by chance (like including the
> fact the reference counting is not done)?

I'm sorry, I totally fail to understand what's soo difficult for you to
understand about the above comment. I can only conclude that there's
something wrong in your understanding of memory pointers, aka addresses.

Each and every memory pointer is a _reference_ to a piece of data - just
like your home address is a _reference_ to the location where you live.

Some references we explicitly count, other references we implicitly count,
and others we don't bother.

In this case, of_parse_phandle() looks up the reference to the requested
device_node struct - and of_parse_phandle() knows that the reference is
going to be passed to the caller, outside it's control, so it increments
the count of references.

So, the code in drm_of_component_probe() gains a reference _and_ a count
against the device_node, the count being a tool to ensure that the
reference doesn't become invalid.

We then store _our_ reference to this inside the component helper by
means of calling component_add_match(). We don't care about how or
where it's stored, only that it is stored.

Hence, because we want to _store_ that away, it would now be incorrect
to drop the count against the reference. We have stored a reference
to it inside a helper which knows nothing about the refcounting of this
structure.

When the helper gets fixed, it will have a callback to release the stored
data, which becomes the responsibility of the creator to provide. In
this case, the reference to the device node by way of pointer value is
passed back to the creating code, at which point the refcount is dropped.

There is no need what so ever for the component stuff to mess around
with reference counts itself - and adding it will only make things
more messy. I can't see why anyone would even suggest crap like that.

Whenever something takes an opaque object, refcounting has to be done
by the code using the opaque object, which has to ensure that the
lifetime of the references stored in the opaque object are longer than
the opaque object itself. That's exactly what's going on here.

Please bear in mind that a _reference_ and a _refcount_ are two totally
separate things. A reference would be a copy of your home postal
address. A refcounter would be a box kept at your home address which
indicates how many references there are out there.

If you gave me your home postal address on a piece of paper, and I then
photocopied it, stored it in a safe, and then destroyed the original
copy you gave me, why should I have to increment the refcounter and
then immediately decrement it again...

That's exactly what's going on here: we're _temporarily_ transferring
the responsibility for the held refcount to the opaque component
helper. The fact that the component helper has no way to transfer
responsibility for that held refcount back to us is neither here nor
there...

To rephrase using the example above - the reference is stored in the
safe, but the safe has been destroyed without regard for its contents.
That's where the problem lies: the safe should have been opened, the
address obtained, and the refcounter for the address decremented.

If you separate the concept of "reference" from "refcount" and realise
that a reference is merely a pointer to something, then I don't see
what the issue is with understanding what's going on here.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-16 17:32:11

by Daniel Stone

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On 16 November 2015 at 17:22, Russell King - ARM Linux
<[email protected]> wrote:
> Please sensibly wrap your messages. Your lines are longer than 80
> characters which makes it exceedingly difficult for some people to reply
> to your very very very very very very very very very very very very
> very very very very very very very very very very very very very very
> very very very very very very very very very very very very very very
> very very very very very very very very very very very very very very
> very very very very very very very very very very very very very very
> very very very very very very long lines without first reformatting
> them manually - and why should they bother to reply if they have that
> kind of additional work? Thanks.

His lines are wrapped mostly at 80, but sometimes at 86, characters.
You should probably look into fixing your MUA.

Cheers,
Daniel

2015-11-16 17:43:32

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On Mon, Nov 16, 2015 at 05:32:05PM +0000, Daniel Stone wrote:
> On 16 November 2015 at 17:22, Russell King - ARM Linux
> <[email protected]> wrote:
> > Please sensibly wrap your messages. Your lines are longer than 80
> > characters which makes it exceedingly difficult for some people to reply
> > to your very very very very very very very very very very very very
> > very very very very very very very very very very very very very very
> > very very very very very very very very very very very very very very
> > very very very very very very very very very very very very very very
> > very very very very very very very very very very very very very very
> > very very very very very very long lines without first reformatting
> > them manually - and why should they bother to reply if they have that
> > kind of additional work? Thanks.
>
> His lines are wrapped mostly at 80, but sometimes at 86, characters.
> You should probably look into fixing your MUA.

No.

Standard net etiquette is that email should be wrapped around 72
characters to give space for reply indentation to remain readable
on an 80 column screen. Most of my email replies are written on an
80 column screen.

Wrapping at 80+ characters is against proper net etiquette.

Same applies to GIT commits: wrap at around 72 characters to allow
them to be readable on an 80 column display. Linus will hate you
if you don't.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-16 17:48:40

by Daniel Stone

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On 16 November 2015 at 17:43, Russell King - ARM Linux
<[email protected]> wrote:
> On Mon, Nov 16, 2015 at 05:32:05PM +0000, Daniel Stone wrote:
>> On 16 November 2015 at 17:22, Russell King - ARM Linux
>> <[email protected]> wrote:
>> > Please sensibly wrap your messages. Your lines are longer than 80
>> > characters which makes it exceedingly difficult for some people to reply
>> > to your very very very very very very very very very very very very
>> > very very very very very very very very very very very very very very
>> > very very very very very very very very very very very very very very
>> > very very very very very very very very very very very very very very
>> > very very very very very very very very very very very very very very
>> > very very very very very very long lines without first reformatting
>> > them manually - and why should they bother to reply if they have that
>> > kind of additional work? Thanks.
>>
>> His lines are wrapped mostly at 80, but sometimes at 86, characters.
>> You should probably look into fixing your MUA.
>
> No.
>
> Standard net etiquette is that email should be wrapped around 72
> characters to give space for reply indentation to remain readable
> on an 80 column screen. Most of my email replies are written on an
> 80 column screen.

Oh, I just thought there was
a greater reason for your
660-character tantrum
than someone overshooting
by 14 characters on
a couple of lines.

Then again,
having read the
rest of the
pointlessly hostile mail,
it's entirely in keeping
with the rest.

Shame that your
needless aggression
obscures a
fairly valid
point.

Have
a
lovely
even
ing
.

2015-11-16 17:53:59

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On Mon, Nov 16, 2015 at 05:48:32PM +0000, Daniel Stone wrote:
> On 16 November 2015 at 17:43, Russell King - ARM Linux
> <[email protected]> wrote:
> > On Mon, Nov 16, 2015 at 05:32:05PM +0000, Daniel Stone wrote:
> >> On 16 November 2015 at 17:22, Russell King - ARM Linux
> >> <[email protected]> wrote:
> >> > Please sensibly wrap your messages. Your lines are longer than 80
> >> > characters which makes it exceedingly difficult for some people to reply
> >> > to your very very very very very very very very very very very very
> >> > very very very very very very very very very very very very very very
> >> > very very very very very very very very very very very very very very
> >> > very very very very very very very very very very very very very very
> >> > very very very very very very very very very very very very very very
> >> > very very very very very very long lines without first reformatting
> >> > them manually - and why should they bother to reply if they have that
> >> > kind of additional work? Thanks.
> >>
> >> His lines are wrapped mostly at 80, but sometimes at 86, characters.
> >> You should probably look into fixing your MUA.
> >
> > No.
> >
> > Standard net etiquette is that email should be wrapped around 72
> > characters to give space for reply indentation to remain readable
> > on an 80 column screen. Most of my email replies are written on an
> > 80 column screen.
>
> Oh, I just thought there was
> a greater reason for your
> 660-character tantrum

Sorry, I've stopped reading at this point, at the point where you start
slinging crap around. If you have a serious point to make, maybe you
could grow up in the next five minutes and write a proper email, thanks.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-16 21:24:02

by Liviu Dudau

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On Mon, Nov 16, 2015 at 05:22:48PM +0000, Russell King - ARM Linux wrote:
> Please sensibly wrap your messages. Your lines are longer than 80 characters which makes it exceedingly difficult for some people to reply to your very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long lines without first reformatting them manually - and why should they bother to reply if they have that kind of additional work? Thanks.

I usually follow the text above and type in text mode. I am made to suffer
a Microsoft Exchange server in between me and the open world and I have
no control if it decides to reformat the message. It is by no means
my intent to type everything in one line but I also choose sometime to
have longer lines if I believe that it helps the readability of the text.

>
> On Mon, Nov 16, 2015 at 04:49:07PM +0000, Liviu Dudau wrote:
> > On Mon, Nov 16, 2015 at 04:22:41PM +0000, Russell King - ARM Linux wrote:
> > > On Mon, Nov 16, 2015 at 02:44:52PM +0000, Liviu Dudau wrote:
> > > > Rockchip DRM driver cannot use the same compare_of() function to match
> > > > ports and remote ports (aka encoders) as their OF sub-trees look different.
> > > > Add a second compare function to be used when encoders are added to the
> > > > component framework and patch the existing users of the function
> > > > accordingly.
> > > >
> > > > Signed-off-by: Liviu Dudau <[email protected]>
> > > > ---
> > > > drivers/gpu/drm/armada/armada_drv.c | 3 ++-
> > > > drivers/gpu/drm/drm_of.c | 23 ++++++++++++++++++-----
> > > > drivers/gpu/drm/imx/imx-drm-core.c | 3 ++-
> > > > include/drm/drm_of.h | 6 ++++--
> > > > 4 files changed, 26 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c
> > > > index 77ab93d..3a2a929 100644
> > > > --- a/drivers/gpu/drm/armada/armada_drv.c
> > > > +++ b/drivers/gpu/drm/armada/armada_drv.c
> > > > @@ -274,7 +274,8 @@ static int armada_drm_probe(struct platform_device *pdev)
> > > > struct device *dev = &pdev->dev;
> > > > int ret;
> > > >
> > > > - ret = drm_of_component_probe(dev, compare_dev_name, &armada_master_ops);
> > > > + ret = drm_of_component_probe(dev, compare_dev_name, compare_dev_name,
> > > > + &armada_master_ops);
> > > > if (ret != -EINVAL)
> > > > return ret;
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_of.c b/drivers/gpu/drm/drm_of.c
> > > > index 493c05c..58fafd7 100644
> > > > --- a/drivers/gpu/drm/drm_of.c
> > > > +++ b/drivers/gpu/drm/drm_of.c
> > > > @@ -77,7 +77,8 @@ EXPORT_SYMBOL(drm_of_find_possible_crtcs);
> > > > * Returns zero if successful, or one of the standard error codes if it fails.
> > > > */
> > > > int drm_of_component_probe(struct device *dev,
> > > > - int (*compare_of)(struct device *, void *),
> > > > + int (*compare_port)(struct device *, void *),
> > > > + int (*compare_encoder)(struct device *, void *),
> > > > const struct component_master_ops *m_ops)
> > > > {
> > > > struct device_node *ep, *port, *remote;
> > > > @@ -101,8 +102,14 @@ int drm_of_component_probe(struct device *dev,
> > > > continue;
> > > > }
> > > >
> > > > - component_match_add(dev, &match, compare_of, port);
> > > > - of_node_put(port);
> > > > + component_match_add(dev, &match, compare_port, port);
> > > > + /*
> > > > + * component_match_add keeps a reference to the port
> > > > + * variable but does not do proper reference counting,
> > > > + * so we cannot release the reference here. If that
> > > > + * gets fixed, the following line should be uncommented
> > > > + */
> > > > + /* of_node_put(port); */
> > >
> > > Even if it is fixed, this line should _never_ be uncommented. This is
> > > totally the wrong place to drop the reference.
> >
> > What if (as implied by the comment) component_match_add() does some
> > reference counting of sorts? (I know it doesn't get used only with
> > OF nodes, it is more generic).
>
> Put those two sentences together and please think about it. If the
> pointer type is unknown to component_match_add(), how could it ever
> use of_node_get() on it? What if it's a string? Or a struct device?
> Or something else.

I did not say of_node_get() I said "some reference counting of sorts".
One option would be to have (yet) another callback that the user of the
components framework has to provide that does resource management. And for
the record, I did thought about it and not for just a few minutes. I also
came out of the thought process with the conclusion that while the components
framework is doing the job it was coded for, it needs serious improvements
in terms of documentation.

>
> > I feel that holding onto a reference to a counted resource without
> > incrementing the use count is not the right way of doing things, or
> > at least it should be clearly documented in the interface of
> > component_match_add() so that people understand the mess they are
> > getting into.
>
> That's just crap. You're not thinking before you hit your keyboard.

Russell, I have no idea what do I need to do in order to gain your respect
and for you to start treating me like a human being. Maybe *I* need to
lower my expectations and stop hoping to have a civilised discussion with
you.

> Why should component_match_add(), which has _zero_ knowledge about
> what it's being used with, have documentation attached to it which
> would be:
>
> "If you use component_match_add() with X, you need to do X'. If you
> use it with Y, you need to do Y'. If you use it with Z, you need to
> do Z'."
>
> No, that's utterly insane.

No. It is utterly insane not to have documentation when using the framework
or the function leads to memory leaks. You can say "If you use
component_match_add() you need to provide a callback for resource management
if your X or Y or Z needs such operations". The current components framework
is seriously in need of documentation. To borrow your language, look at the
crap one needs to put up with when starting to use the framework: to be a
component one needs to fill the component_ops structure. This is its
definition:

struct component_ops {
int (*bind)(struct device *, struct device *, void *);
void (*unbind)(struct device *, struct device *, void *);
};

Note that there are no names associated with the parameters and no documentation
to allow one to distinguish between the two struct device pointers. You have
to look at component_bind() to learn that first parameter is a component's
dev part and the second is the master's dev part.

You might think that the code is the proper documentation and ultimately that
is right, but even code is wrong sometimes and (as we discovered with the
component_match_add() discussion here) its side effects are not easy to spot
even to the authors of the code.

So, yes, we do need documentation, even if it is insane, because it tells us
how insanely fragile the code is. Even God, through Moses, has tersely
documented his Operating System through the 10 commandments.

Another example: of the 6 commits for drivers/base/component.c, 3 of them
have "fixed ... handling/cleanup/bug". So it is not easy to get it right.

>
> > > Where it does make sense is when the array of matches is destroyed. For
> > > that, we'd need to add a callback to the master ops struct so that the
> > > master driver can properly release these pointers.
> > >
> > > I'd keep most of the big comments though (up to "... varable") to
> > > explain why we don't drop the reference.
> >
> > Sorry, if I understand you correctly, you're saying that we should keep
> > the following comment:
> >
> > /*
> > * component_match_add keeps a reference to the port
> > * variable.
> > */
>
> Exactly.
>
> > How does that explain why we don't drop the reference? Did you mean the
> > comment should be truncated somewhere else by chance (like including the
> > fact the reference counting is not done)?
>
> I'm sorry, I totally fail to understand what's soo difficult for you to
> understand about the above comment. I can only conclude that there's
> something wrong in your understanding of memory pointers, aka addresses.

No, I was operating under the belief that if some framework holds a pointer
to a resource it should also do the management of that resource. Your
choice for the component framework has been different and you have decided
that it is just a conduit for data gathering and binding, without any deep
knowledge of what that data is. But that is not documented anywhere, so I
don't feel too bad about having a different view.

>
> Each and every memory pointer is a _reference_ to a piece of data - just
> like your home address is a _reference_ to the location where you live.
>
> Some references we explicitly count, other references we implicitly count,
> and others we don't bother.
>
> In this case, of_parse_phandle() looks up the reference to the requested
> device_node struct - and of_parse_phandle() knows that the reference is
> going to be passed to the caller, outside it's control, so it increments
> the count of references.
>
> So, the code in drm_of_component_probe() gains a reference _and_ a count
> against the device_node, the count being a tool to ensure that the
> reference doesn't become invalid.
>
> We then store _our_ reference to this inside the component helper by
> means of calling component_add_match(). We don't care about how or
> where it's stored, only that it is stored.
>
> Hence, because we want to _store_ that away, it would now be incorrect
> to drop the count against the reference. We have stored a reference
> to it inside a helper which knows nothing about the refcounting of this
> structure.
>
> When the helper gets fixed, it will have a callback to release the stored
> data, which becomes the responsibility of the creator to provide. In
> this case, the reference to the device node by way of pointer value is
> passed back to the creating code, at which point the refcount is dropped.
>
> There is no need what so ever for the component stuff to mess around
> with reference counts itself - and adding it will only make things
> more messy. I can't see why anyone would even suggest crap like that.
>
> Whenever something takes an opaque object, refcounting has to be done
> by the code using the opaque object, which has to ensure that the
> lifetime of the references stored in the opaque object are longer than
> the opaque object itself. That's exactly what's going on here.
>
> Please bear in mind that a _reference_ and a _refcount_ are two totally
> separate things. A reference would be a copy of your home postal
> address. A refcounter would be a box kept at your home address which
> indicates how many references there are out there.
>
> If you gave me your home postal address on a piece of paper, and I then
> photocopied it, stored it in a safe, and then destroyed the original
> copy you gave me, why should I have to increment the refcounter and
> then immediately decrement it again...
>
> That's exactly what's going on here: we're _temporarily_ transferring
> the responsibility for the held refcount to the opaque component
> helper. The fact that the component helper has no way to transfer
> responsibility for that held refcount back to us is neither here nor
> there...
>
> To rephrase using the example above - the reference is stored in the
> safe, but the safe has been destroyed without regard for its contents.
> That's where the problem lies: the safe should have been opened, the
> address obtained, and the refcounter for the address decremented.
>
> If you separate the concept of "reference" from "refcount" and realise
> that a reference is merely a pointer to something, then I don't see
> what the issue is with understanding what's going on here.

I understand your point. You have also written some excellent piece of
documentation for the behaviour of the component framework that I feel
should be included in the kernel rather than being lost in the sea of emails.

Best regards,
Liviu

>
> --
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
> _______________________________________________
> dri-devel mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

--
-------------------
.oooO
( )
\ ( Oooo.
\_) ( )
) /
(_/

One small step
for me ...

2015-11-16 22:33:52

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm: Improve drm_of_component_probe() to correctly handle ports and remote ports.

On Mon, Nov 16, 2015 at 09:17:35PM +0000, Liviu Dudau wrote:
> On Mon, Nov 16, 2015 at 05:22:48PM +0000, Russell King - ARM Linux wrote:
> > Put those two sentences together and please think about it. If the
> > pointer type is unknown to component_match_add(), how could it ever
> > use of_node_get() on it? What if it's a string? Or a struct device?
> > Or something else.
>
> I did not say of_node_get() I said "some reference counting of sorts".
> One option would be to have (yet) another callback that the user of the
> components framework has to provide that does resource management.

I really do not see the point of anything other than a "release this
void pointer" callback which is exactly what I proposed at the
beginning of this discussion. There's no reason to add a "take a
reference on this void pointer" - in fact, adding that brings with
it a _huge_ amount of complexity, such as:

void component_match_add(struct device *dev, struct component_match **matchptr,
int (*compare)(struct device *, void *), void *compare_data)

would become:

void component_match_add(struct device *dev, struct component_match **matchptr,
int (*take_ref)(void *),
int (*release_ref)(void *),
int (*compare)(struct device *, void *), void *compare_data)
{
struct component_match *match = *matchptr;

if (IS_ERR(match))
return;

if (take_ref) {
int ret = take_ref(compare_data);
if (ret) {
/*
* add code to walk the list of already added
* matches and call their own release_refs
*/
free(match);
*matchptr = ERR_PTR(ret);
return;
}
}

And this is exactly why I believe it's pointless to have this callback:
anything you do in take_ref() you can do before calling this function,
and if you do it before calling this function, you are more efficient.

However, let's continue on:

if (!match || match->num == match->alloc) {
size_t new_size = match ? match->alloc + 16 : 15;

match = component_match_realloc(dev, match, new_size);

... this becomes much more complicated as well -
component_match_realloc() can fail (just like standard
realloc()), but we'd need to release all the references
here as well.

... rest of existing function ...
}

Then there's component_master_add_with_match() itself, which has
several failure points, which also have to do a dance with the
refcounting.

Rather than all this, what I'm suggesting is:

1. Add component_match_init() which allocates an initial match struct,
which contains a head node which stores the error status (so if we
fail to extend the array, we can keep the old array around.)
2. component_match_add() appends to the match array, taking an optional
release function for the void pointer. Any failure adding the current
match results in the release function being called.
3. component_master_add_with_match() does its current job of
validating the list, and creating the final array of matches.
Any failure at this point walks the array and calls their optional
release function.
4. component_master_del() walks the array and calls their optional
release function.

This should result in a relatively simple implementation without multiple
failure cleanup paths.

There is absolutely zero need for some additional complex resource
management framework to be built around this at all.

> And for
> the record, I did thought about it and not for just a few minutes. I also
> came out of the thought process with the conclusion that while the components
> framework is doing the job it was coded for, it needs serious improvements
> in terms of documentation.

Just like the rest of the kernel, I've put it on the same todo list that
other maintainers put that on, which is the one which is always pushed to
the very back of the queue, because there's always more pressing matters
to attend to. I can pull out lots of examples where I've spent a long
time getting to know the code where there are almost zero comments - and
this is one of the reasons why such stuff gets pushed to the back. If
everyone documented their code and kept it up to date, then we wouldn't
need to read any code, and life would be wonderful.

I wouldn't need to spend the time trying to understand kernel code, I
could use that time to write documentation!

> > > I feel that holding onto a reference to a counted resource without
> > > incrementing the use count is not the right way of doing things, or
> > > at least it should be clearly documented in the interface of
> > > component_match_add() so that people understand the mess they are
> > > getting into.
> >
> > That's just crap. You're not thinking before you hit your keyboard.
>
> Russell, I have no idea what do I need to do in order to gain your respect
> and for you to start treating me like a human being. Maybe *I* need to
> lower my expectations and stop hoping to have a civilised discussion with
> you.

Sorry, but I'm direct and blunt in email. That's the way I am in email.
It's not about respect, I do respect others, but I say things directly.

> > Why should component_match_add(), which has _zero_ knowledge about
> > what it's being used with, have documentation attached to it which
> > would be:
> >
> > "If you use component_match_add() with X, you need to do X'. If you
> > use it with Y, you need to do Y'. If you use it with Z, you need to
> > do Z'."
> >
> > No, that's utterly insane.
>
> No. It is utterly insane not to have documentation when using the framework
> or the function leads to memory leaks.

I've pointed out that the lack of "how can the void *data pointer be
freed" is currently something that isn't currently implemented. That's
something on my todo list - along with getting some other patches out
the door for the component helper which I've had knocking around for
the last 18 months:

Author: Russell King <[email protected]>
Date: Fri Apr 18 23:05:53 2014 +0100

component: track components via array rather than list

Author: Russell King <[email protected]>
Date: Wed Apr 23 10:46:11 2014 +0100

component: move check for unbound master into try_to_bring_up_masters()

Author: Russell King <[email protected]>
Date: Fri Apr 18 22:10:32 2014 +0100

component: remove old add_components method

Now, consider this: if it takes more than 18 months for me to get my
/own/ patches into the kernel, what hope do you think there is for me
to get around to writing documentation?

> You might think that the code is the proper documentation and ultimately that
> is right, but even code is wrong sometimes and (as we discovered with the
> component_match_add() discussion here) its side effects are not easy to spot
> even to the authors of the code.

Sorry? It's not new to me, I've known about it for the last year or
more, I just haven't found the time to address it. So far from "not
easy to spot even to the authors of the code", the authors of the
code know it's shortcomings, know it lacks documentation, and know
it needs more attention. Even have outstanding patches against it
for a long time.

> Another example: of the 6 commits for drivers/base/component.c, 3 of them
> have "fixed ... handling/cleanup/bug". So it is not easy to get it right.

That's stretching it. 2 of them.
drivers/base: fix devres handling for master device
component: fix missed cleanup in case of devres failure

The third which I assume you refer to:
component: fix bug with legacy API
is a side effect of modifying the code in:
component: add support for component match array

which would not have been an issue had the above three outstanding
commits had gone in - instead, the issue there would've been "hey,
the legacy API has gone!" due to an addition of a new user in the
same merge window which would've removed the legacy API. Hence,
it's very unfair to say that this is an example of the author not
understanding his own code.

You'll notice that the above three outstanding commits pre-date this
last fix.

> No, I was operating under the belief that if some framework holds a pointer
> to a resource it should also do the management of that resource. Your
> choice for the component framework has been different and you have decided
> that it is just a conduit for data gathering and binding, without any deep
> knowledge of what that data is. But that is not documented anywhere, so I
> don't feel too bad about having a different view.

It's no different from something like qsort(). qsort() does not care
what the underlying data is - it's given a pointer to some memory,
size of elements, the number of elements, and a match function. It
doesn't know what's stored there - it could be numbers of apples,
pointers to addresses, refcounted objects, etc.

This is no different. It's not some "different choice", it's a
standard construct going back years in programming history.

All that's going on here is it's gathering up a list of pointers and
their corresponding comparison function (and if I can get a round tuit,
a release function). There are only two chunks of code which have any
business deferencing those pointers, and they are the comparison
and release functions.

It would be against programming principles to create an API that takes
void pointers and then casts them to something else so that they can
be dereferenced. Sorry, I don't code like that. Where there's a void
pointer, that means "this is an opaque object that's being passed in
that we won't dereference".

There's another example where this applies. request_irq(), free_irq()
and the IRQ handler. These two functions take a void pointer, and
have exactly the same behaviour that the component helper has: it's
an undereferencable cookie as far as they are concerned. In the IRQ
case, the only thing that's allowed to dereference those void pointers
is the IRQ handler, which knows the type of the object. Same for the
component helper: the only thing that's allowed to dereference the
void pointer is the associated compare (and release) functions.

> I understand your point. You have also written some excellent piece of
> documentation for the behaviour of the component framework that I feel
> should be included in the kernel rather than being lost in the sea of emails.

No, it's not documentation for the component framework at all. The
description was largely a description of memory pointers and reference
counting in general, which I believe is pretty standard stuff.

However, what I have done above is documented _your_ code by describing
what _your_ drm_of_component_probe is doing. I find that rather ironic.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.