2023-09-18 17:43:22

by Johan Hovold

[permalink] [raw]
Subject: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

A recent commit reordered probe so that the interrupt line is now
requested before making sure that the device exists.

This breaks machines like the Lenovo ThinkPad X13s which rely on the
HID driver to probe second-source devices and only register the variant
that is actually populated. Specifically, the interrupt line may now
already be (temporarily) claimed when doing asynchronous probing of the
touchpad:

genirq: Flags mismatch irq 191. 00082008 (hid-over-i2c) vs. 00082008 (hid-over-i2c)
i2c_hid_of 21-0015: Could not register for hid-over-i2c interrupt, irq = 191, ret = -16
i2c_hid_of: probe of 21-0015 failed with error -16

Fix this by restoring the old behaviour of first making sure the device
exists before requesting the interrupt line.

Note that something like this should probably be implemented also for
"panel followers", whose actual probe is currently effectively deferred
until the DRM panel is probed (e.g. by powering down the device after
making sure it exists and only then register it as a follower).

Fixes: 675cd877c952 ("HID: i2c-hid: Rearrange probe() to power things up later")
Cc: Douglas Anderson <[email protected]>
Cc: Maxime Ripard <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
---
drivers/hid/i2c-hid/i2c-hid-core.c | 142 ++++++++++++++++-------------
1 file changed, 80 insertions(+), 62 deletions(-)

diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
index 9601c0605fd9..e66c058a4b00 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -998,45 +998,29 @@ static int i2c_hid_core_resume(struct i2c_hid *ihid)
return hid_driver_reset_resume(hid);
}

-/**
- * __do_i2c_hid_core_initial_power_up() - First time power up of the i2c-hid device.
- * @ihid: The ihid object created during probe.
- *
- * This function is called at probe time.
- *
- * The initial power on is where we do some basic validation that the device
- * exists, where we fetch the HID descriptor, and where we create the actual
- * HID devices.
- *
- * Return: 0 or error code.
+/*
+ * Check that the device exists and parse the HID descriptor.
*/
-static int __do_i2c_hid_core_initial_power_up(struct i2c_hid *ihid)
+static int __i2c_hid_core_probe(struct i2c_hid *ihid)
{
struct i2c_client *client = ihid->client;
struct hid_device *hid = ihid->hid;
int ret;

- ret = i2c_hid_core_power_up(ihid);
- if (ret)
- return ret;
-
/* Make sure there is something at this address */
ret = i2c_smbus_read_byte(client);
if (ret < 0) {
i2c_hid_dbg(ihid, "nothing at this address: %d\n", ret);
- ret = -ENXIO;
- goto err;
+ return -ENXIO;
}

ret = i2c_hid_fetch_hid_descriptor(ihid);
if (ret < 0) {
dev_err(&client->dev,
"Failed to fetch the HID Descriptor\n");
- goto err;
+ return ret;
}

- enable_irq(client->irq);
-
hid->version = le16_to_cpu(ihid->hdesc.bcdVersion);
hid->vendor = le16_to_cpu(ihid->hdesc.wVendorID);
hid->product = le16_to_cpu(ihid->hdesc.wProductID);
@@ -1050,17 +1034,49 @@ static int __do_i2c_hid_core_initial_power_up(struct i2c_hid *ihid)

ihid->quirks = i2c_hid_lookup_quirk(hid->vendor, hid->product);

+ return 0;
+}
+
+static int i2c_hid_core_register_hid(struct i2c_hid *ihid)
+{
+ struct i2c_client *client = ihid->client;
+ struct hid_device *hid = ihid->hid;
+ int ret;
+
+ enable_irq(client->irq);
+
ret = hid_add_device(hid);
if (ret) {
if (ret != -ENODEV)
hid_err(client, "can't add hid device: %d\n", ret);
- goto err;
+ disable_irq(client->irq);
+ return ret;
}

return 0;
+}
+
+static int i2c_hid_core_probe_panel_follower(struct i2c_hid *ihid)
+{
+ int ret;

-err:
+ ret = i2c_hid_core_power_up(ihid);
+ if (ret)
+ return ret;
+
+ ret = __i2c_hid_core_probe(ihid);
+ if (ret)
+ goto err_power_down;
+
+ ret = i2c_hid_core_register_hid(ihid);
+ if (ret)
+ goto err_power_down;
+
+ return 0;
+
+err_power_down:
i2c_hid_core_power_down(ihid);
+
return ret;
}

@@ -1077,7 +1093,7 @@ static void ihid_core_panel_prepare_work(struct work_struct *work)
* steps.
*/
if (!hid->version)
- ret = __do_i2c_hid_core_initial_power_up(ihid);
+ ret = i2c_hid_core_probe_panel_follower(ihid);
else
ret = i2c_hid_core_resume(ihid);

@@ -1156,30 +1172,6 @@ static int i2c_hid_core_register_panel_follower(struct i2c_hid *ihid)
return 0;
}

-static int i2c_hid_core_initial_power_up(struct i2c_hid *ihid)
-{
- /*
- * If we're a panel follower, we'll register and do our initial power
- * up when the panel turns on; otherwise we do it right away.
- */
- if (drm_is_panel_follower(&ihid->client->dev))
- return i2c_hid_core_register_panel_follower(ihid);
- else
- return __do_i2c_hid_core_initial_power_up(ihid);
-}
-
-static void i2c_hid_core_final_power_down(struct i2c_hid *ihid)
-{
- /*
- * If we're a follower, the act of unfollowing will cause us to be
- * powered down. Otherwise we need to manually do it.
- */
- if (ihid->is_panel_follower)
- drm_panel_remove_follower(&ihid->panel_follower);
- else
- i2c_hid_core_suspend(ihid, true);
-}
-
int i2c_hid_core_probe(struct i2c_client *client, struct i2chid_ops *ops,
u16 hid_descriptor_address, u32 quirks)
{
@@ -1224,14 +1216,10 @@ int i2c_hid_core_probe(struct i2c_client *client, struct i2chid_ops *ops,
return ret;
device_enable_async_suspend(&client->dev);

- ret = i2c_hid_init_irq(client);
- if (ret < 0)
- goto err_buffers_allocated;
-
hid = hid_allocate_device();
if (IS_ERR(hid)) {
ret = PTR_ERR(hid);
- goto err_irq;
+ goto err_free_buffers;
}

ihid->hid = hid;
@@ -1242,19 +1230,42 @@ int i2c_hid_core_probe(struct i2c_client *client, struct i2chid_ops *ops,
hid->bus = BUS_I2C;
hid->initial_quirks = quirks;

- ret = i2c_hid_core_initial_power_up(ihid);
+ /* Power on and probe unless device is a panel follower. */
+ if (!drm_is_panel_follower(&ihid->client->dev)) {
+ ret = i2c_hid_core_power_up(ihid);
+ if (ret < 0)
+ goto err_destroy_device;
+
+ ret = __i2c_hid_core_probe(ihid);
+ if (ret < 0)
+ goto err_power_down;
+ }
+
+ ret = i2c_hid_init_irq(client);
+ if (ret < 0)
+ goto err_power_down;
+
+ /*
+ * If we're a panel follower, we'll register when the panel turns on;
+ * otherwise we do it right away.
+ */
+ if (drm_is_panel_follower(&ihid->client->dev))
+ ret = i2c_hid_core_register_panel_follower(ihid);
+ else
+ ret = i2c_hid_core_register_hid(ihid);
if (ret)
- goto err_mem_free;
+ goto err_free_irq;

return 0;

-err_mem_free:
- hid_destroy_device(hid);
-
-err_irq:
+err_free_irq:
free_irq(client->irq, ihid);
-
-err_buffers_allocated:
+err_power_down:
+ if (!drm_is_panel_follower(&ihid->client->dev))
+ i2c_hid_core_power_down(ihid);
+err_destroy_device:
+ hid_destroy_device(hid);
+err_free_buffers:
i2c_hid_free_buffers(ihid);

return ret;
@@ -1266,7 +1277,14 @@ void i2c_hid_core_remove(struct i2c_client *client)
struct i2c_hid *ihid = i2c_get_clientdata(client);
struct hid_device *hid;

- i2c_hid_core_final_power_down(ihid);
+ /*
+ * If we're a follower, the act of unfollowing will cause us to be
+ * powered down. Otherwise we need to manually do it.
+ */
+ if (ihid->is_panel_follower)
+ drm_panel_remove_follower(&ihid->panel_follower);
+ else
+ i2c_hid_core_suspend(ihid, true);

hid = ihid->hid;
hid_destroy_device(hid);
--
2.41.0


2023-09-18 19:30:15

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

Hi,

On Mon, Sep 18, 2023 at 6:00 AM Johan Hovold <[email protected]> wrote:
>
> A recent commit reordered probe so that the interrupt line is now
> requested before making sure that the device exists.
>
> This breaks machines like the Lenovo ThinkPad X13s which rely on the
> HID driver to probe second-source devices and only register the variant
> that is actually populated. Specifically, the interrupt line may now
> already be (temporarily) claimed when doing asynchronous probing of the
> touchpad:
>
> genirq: Flags mismatch irq 191. 00082008 (hid-over-i2c) vs. 00082008 (hid-over-i2c)
> i2c_hid_of 21-0015: Could not register for hid-over-i2c interrupt, irq = 191, ret = -16
> i2c_hid_of: probe of 21-0015 failed with error -16
>
> Fix this by restoring the old behaviour of first making sure the device
> exists before requesting the interrupt line.
>
> Note that something like this should probably be implemented also for
> "panel followers", whose actual probe is currently effectively deferred
> until the DRM panel is probed (e.g. by powering down the device after
> making sure it exists and only then register it as a follower).
>
> Fixes: 675cd877c952 ("HID: i2c-hid: Rearrange probe() to power things up later")
> Cc: Douglas Anderson <[email protected]>
> Cc: Maxime Ripard <[email protected]>
> Signed-off-by: Johan Hovold <[email protected]>
> ---
> drivers/hid/i2c-hid/i2c-hid-core.c | 142 ++++++++++++++++-------------
> 1 file changed, 80 insertions(+), 62 deletions(-)

Ugh, sorry for the regression. :( It actually turns out that I've been
digging into this same issue on a different device (see
mt8173-elm-hana). I hadn't realized that it was a regression caused by
my recent change, though.

I haven't yet reviewed your change in detail, but to me it seems like
at most a short term fix. Specifically, I think the way that this has
been working has been partially via hacks and partially via luck. Let
me explain...

Currently, to make this work the `sc8280xp-lenovo-thinkpad-x13s.dts`
file has a hack in it. You can see that the `tpad_default` pinctrl
entry has been moved up to the i2c bus level even though it doesn't
belong there (it should be in each trackpad). This is because,
otherwise, you would have run into similar type problems as the device
core would have failed to claim the pin for one of the devices.

Currently, we're getting a bit lucky with
`sc8280xp-lenovo-thinkpad-x13s.dts` that there are no other shared
resources between the two devices besides the interrupt. Specifically
a number of trackpads / touchscreens also have a "reset" GPIO that
needs to be power sequenced properly in order to talk to the
touchscreen. In this case we'll be stuck again because both instances
would need to grab the "reset" GPIO before being able to confirm if
the device is there.

This is an old problem. The first I remember running into it was back
in 2015 on rk3288-veryron-minnie. We had a downstream hack to make
this work with -EPROBE_DEFER. https://crrev.com/c/266224. By the time
we shipped, though, we decided not to do the 2nd sourcing. After that
I always NAKed HW designs like this, but I guess that didn't help with
Mediatek hardware I wasn't involved with. :( ...and, of course, it
didn't help with devices that aren't Chromebooks like the Thinkpad
X13S.

FWIW: as a short term solution, we ended up forcing synchronous probe
in <https://crrev.com/c/4857566>. This has some pretty serious boot
time implications, but it's also very simple.


I'm actively working on coming up with a better solution here. My
current thought is that that maybe we want to do:

1. Undo the hack in the device tree and have each "2nd source" have
its own pinctrl entry.

2. In core pinctrl / device probing code detect the pinctrl conflict
and only probe one of the devices at a time.

...that sounds like a nice/elegant solution and I'm trying to make it
work, though it does have some downsides. Namely:

a) It requires "dts" changes to work. Namely we've got to undo the
hack that pushed the pinctrl up to the controller level (or, in the
case of mt8173-elm-hana, that just totally skipped the "pinctrl" entry
altogether). Unfortunately those same "dts" changes will actually make
things _worse_ if you don't have the code change. :(

b) It only handles the case where the resources shared by 2nd sourcing
are expressed by pinctrl. In a practical sense this seems to be most
cases, but conceivably you could imagine running into this situation
with a non-pin-related shared resource.

c) To solve this in the core, we have to make sure we properly handle
(without hanging/failing) multiple partially-conflicting devices and
devices that might acquire resources in arbitrary orders.

Though the above solution detecting the pinctrl conflicts sounds
appealing and I'm currently working on prototyping it, I'm still not
100% convinced. I'm worried about the above downsides.


Personally, I feel like we could add information to the device tree
that would help us out. The question is: is this an abuse of device
tree for something that Linux ought to be able to figure out on its
own, or is it OK? To make it concrete, I was thinking about something
like this:

/ {
tp_ex_group: trackpad-exclusion-group {
members = <&tp1>, <&tp2>, <&tp3>;
};
};

&i2c_bus {
tp1: trackpad@10 {
...
mutual-exclusion-group = <&tp_ex_group>;
};
tp2: trackpad@20 {
...
mutual-exclusion-group = <&tp_ex_group>;
};
tp3: trackpad@30 {
...
mutual-exclusion-group = <&tp_ex_group>;
};
};

Then the device core would know not to probe devices in the same
"mutual-exclusion-group" at the same time.

If DT folks are OK with the "mutual-exclusion-group" idea then I'll
probably backburner my attempt to make this work on the pinctrl level
and go with that.

2023-09-19 07:09:47

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

On Mon, Sep 18, 2023 at 08:00:15AM -0700, Doug Anderson wrote:
> On Mon, Sep 18, 2023 at 6:00 AM Johan Hovold <[email protected]> wrote:
> > A recent commit reordered probe so that the interrupt line is now
> > requested before making sure that the device exists.
> >
> > This breaks machines like the Lenovo ThinkPad X13s which rely on the
> > HID driver to probe second-source devices and only register the variant
> > that is actually populated. Specifically, the interrupt line may now
> > already be (temporarily) claimed when doing asynchronous probing of the
> > touchpad:
> >
> > genirq: Flags mismatch irq 191. 00082008 (hid-over-i2c) vs. 00082008 (hid-over-i2c)
> > i2c_hid_of 21-0015: Could not register for hid-over-i2c interrupt, irq = 191, ret = -16
> > i2c_hid_of: probe of 21-0015 failed with error -16
> >
> > Fix this by restoring the old behaviour of first making sure the device
> > exists before requesting the interrupt line.

> Ugh, sorry for the regression. :( It actually turns out that I've been
> digging into this same issue on a different device (see
> mt8173-elm-hana). I hadn't realized that it was a regression caused by
> my recent change, though.
>
> I haven't yet reviewed your change in detail, but to me it seems like
> at most a short term fix. Specifically, I think the way that this has
> been working has been partially via hacks and partially via luck. Let
> me explain...
>
> Currently, to make this work the `sc8280xp-lenovo-thinkpad-x13s.dts`
> file has a hack in it. You can see that the `tpad_default` pinctrl
> entry has been moved up to the i2c bus level even though it doesn't
> belong there (it should be in each trackpad). This is because,
> otherwise, you would have run into similar type problems as the device
> core would have failed to claim the pin for one of the devices.

Ḯ'm well aware of that and it was mentioned in the commit message for
4367d763698c ("arm64: dts: qcom: sc8280xp-x13s: enable alternate
touchpad") as well as discussed briefly with Rob here:

https://lore.kernel.org/all/[email protected]/

> Currently, we're getting a bit lucky with
> `sc8280xp-lenovo-thinkpad-x13s.dts` that there are no other shared
> resources between the two devices besides the interrupt. Specifically
> a number of trackpads / touchscreens also have a "reset" GPIO that
> needs to be power sequenced properly in order to talk to the
> touchscreen. In this case we'll be stuck again because both instances
> would need to grab the "reset" GPIO before being able to confirm if
> the device is there.

Right, this will only work for fairly simple cases, but we do have a few
of those in tree since some years back.

> This is an old problem. The first I remember running into it was back
> in 2015 on rk3288-veryron-minnie. We had a downstream hack to make
> this work with -EPROBE_DEFER. https://crrev.com/c/266224. By the time
> we shipped, though, we decided not to do the 2nd sourcing. After that
> I always NAKed HW designs like this, but I guess that didn't help with
> Mediatek hardware I wasn't involved with. :( ...and, of course, it
> didn't help with devices that aren't Chromebooks like the Thinkpad
> X13S.
>
> FWIW: as a short term solution, we ended up forcing synchronous probe
> in <https://crrev.com/c/4857566>. This has some pretty serious boot
> time implications, but it's also very simple.
>
>
> I'm actively working on coming up with a better solution here. My
> current thought is that that maybe we want to do:
>
> 1. Undo the hack in the device tree and have each "2nd source" have
> its own pinctrl entry.
>
> 2. In core pinctrl / device probing code detect the pinctrl conflict
> and only probe one of the devices at a time.
>
> ...that sounds like a nice/elegant solution and I'm trying to make it
> work, though it does have some downsides. Namely:
>
> a) It requires "dts" changes to work. Namely we've got to undo the
> hack that pushed the pinctrl up to the controller level (or, in the
> case of mt8173-elm-hana, that just totally skipped the "pinctrl" entry
> altogether). Unfortunately those same "dts" changes will actually make
> things _worse_ if you don't have the code change. :(

Right, a proper solution will likely require an updated DT.

> b) It only handles the case where the resources shared by 2nd sourcing
> are expressed by pinctrl. In a practical sense this seems to be most
> cases, but conceivably you could imagine running into this situation
> with a non-pin-related shared resource.

Indeed.

> c) To solve this in the core, we have to make sure we properly handle
> (without hanging/failing) multiple partially-conflicting devices and
> devices that might acquire resources in arbitrary orders.
>
> Though the above solution detecting the pinctrl conflicts sounds
> appealing and I'm currently working on prototyping it, I'm still not
> 100% convinced. I'm worried about the above downsides.

Yes, I agree that we'd need to take a broader look at this and not just
focus on the immediate pinctrl issue.

> Personally, I feel like we could add information to the device tree
> that would help us out. The question is: is this an abuse of device
> tree for something that Linux ought to be able to figure out on its
> own, or is it OK? To make it concrete, I was thinking about something
> like this:
>
> / {
> tp_ex_group: trackpad-exclusion-group {
> members = <&tp1>, <&tp2>, <&tp3>;
> };
> };
>
> &i2c_bus {
> tp1: trackpad@10 {
> ...
> mutual-exclusion-group = <&tp_ex_group>;
> };
> tp2: trackpad@20 {
> ...
> mutual-exclusion-group = <&tp_ex_group>;
> };
> tp3: trackpad@30 {
> ...
> mutual-exclusion-group = <&tp_ex_group>;
> };
> };
>
> Then the device core would know not to probe devices in the same
> "mutual-exclusion-group" at the same time.
>
> If DT folks are OK with the "mutual-exclusion-group" idea then I'll
> probably backburner my attempt to make this work on the pinctrl level
> and go with that.

I expressed something along these lines in the thread above:

It seems we'd need some way to describe the devices as mutually
exclusive...

but given that we had prior art for handling simple cases and due to
lack of time, I left it on the ever-growing todo list.

But regardless of what a long-term proper solution to this may look
like, we need to fix the regression in 6.6-rc1 by restoring the old
behaviour.

Johan

2023-09-19 20:18:55

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

Hi,

On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <[email protected]> wrote:
>
> > c) To solve this in the core, we have to make sure we properly handle
> > (without hanging/failing) multiple partially-conflicting devices and
> > devices that might acquire resources in arbitrary orders.
> >
> > Though the above solution detecting the pinctrl conflicts sounds
> > appealing and I'm currently working on prototyping it, I'm still not
> > 100% convinced. I'm worried about the above downsides.
>
> Yes, I agree that we'd need to take a broader look at this and not just
> focus on the immediate pinctrl issue.

OK. FWIW, I got blocked on trying to solve this in the core
automatically by just using the conflicting "pinctrl" entries. There
are probably some ways to get it solved, but none of them are easy.


> > Personally, I feel like we could add information to the device tree
> > that would help us out. The question is: is this an abuse of device
> > tree for something that Linux ought to be able to figure out on its
> > own, or is it OK? To make it concrete, I was thinking about something
> > like this:
> >
> > / {
> > tp_ex_group: trackpad-exclusion-group {
> > members = <&tp1>, <&tp2>, <&tp3>;
> > };
> > };
> >
> > &i2c_bus {
> > tp1: trackpad@10 {
> > ...
> > mutual-exclusion-group = <&tp_ex_group>;
> > };
> > tp2: trackpad@20 {
> > ...
> > mutual-exclusion-group = <&tp_ex_group>;
> > };
> > tp3: trackpad@30 {
> > ...
> > mutual-exclusion-group = <&tp_ex_group>;
> > };
> > };
> >
> > Then the device core would know not to probe devices in the same
> > "mutual-exclusion-group" at the same time.
> >
> > If DT folks are OK with the "mutual-exclusion-group" idea then I'll
> > probably backburner my attempt to make this work on the pinctrl level
> > and go with that.
>
> I expressed something along these lines in the thread above:

I'm going to try coding up the above to see how it looks. Assuming
nothing comes up, I'll try to have something in the next few days.


> It seems we'd need some way to describe the devices as mutually
> exclusive...
>
> but given that we had prior art for handling simple cases and due to
> lack of time, I left it on the ever-growing todo list.
>
> But regardless of what a long-term proper solution to this may look
> like, we need to fix the regression in 6.6-rc1 by restoring the old
> behaviour.

OK, fair enough. I'll take a look at your patch, though I think the
person that really needs to approve it is Benjamin...

Style-wise, I will say that Benjamin really wanted to keep the "panel
follower" code out of the main probe routine. Some of my initial
patches adding "panel follower" looked more like the results after
your patch but Benjamin really wasn't happy until there were no
special cases for panel-followers in the main probe routine. This is
why the code is structured as it is.

Thinking that way, is there any reason you can't just move the
i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
could replace the call to enable_irq() with it and then remove the
`IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
wanted to use a 2nd source + the panel follower concept? Both devices
would probe, but only one of them would actually grab the interrupt
and only one of them would actually create real HID devices. We might
need to do some work to keep from trying again at every poweron of the
panel, but it would probably be workable? I think this would also be a
smaller change...

-Doug

2023-09-20 07:32:42

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <[email protected]> wrote:

> > But regardless of what a long-term proper solution to this may look
> > like, we need to fix the regression in 6.6-rc1 by restoring the old
> > behaviour.
>
> OK, fair enough. I'll take a look at your patch, though I think the
> person that really needs to approve it is Benjamin...
>
> Style-wise, I will say that Benjamin really wanted to keep the "panel
> follower" code out of the main probe routine. Some of my initial
> patches adding "panel follower" looked more like the results after
> your patch but Benjamin really wasn't happy until there were no
> special cases for panel-followers in the main probe routine. This is
> why the code is structured as it is.

Ok, I prefer not hiding away things like that as it obscures what's
really going on, for example, in this case, that you register a device
without really having probed it.

As I alluded to in the commit message, you probably want to be able to
support second-source touchscreen panel followers as well at some point
and then deferring checking whether device is populated until the panel
is powered on is not going to work.

I skimmed the thread were you added this, but I'm not sure I saw any
reason for why powering on the panel follower temporarily during probe
would not work?

> Thinking that way, is there any reason you can't just move the
> i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> could replace the call to enable_irq() with it and then remove the
> `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> wanted to use a 2nd source + the panel follower concept? Both devices
> would probe, but only one of them would actually grab the interrupt
> and only one of them would actually create real HID devices. We might
> need to do some work to keep from trying again at every poweron of the
> panel, but it would probably be workable? I think this would also be a
> smaller change...

That was my first idea as well, but conceptually it is more correct to
request resources at probe time and not at some later point when you can
no longer fail probe.

You'd also need to handle the fact that the interrupt may never have
been requested when remove() is called, which adds unnecessary
complexity.

Johan

2023-09-20 18:21:01

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

Hi,

On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <[email protected]> wrote:
>
> On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <[email protected]> wrote:
>
> > > But regardless of what a long-term proper solution to this may look
> > > like, we need to fix the regression in 6.6-rc1 by restoring the old
> > > behaviour.
> >
> > OK, fair enough. I'll take a look at your patch, though I think the
> > person that really needs to approve it is Benjamin...
> >
> > Style-wise, I will say that Benjamin really wanted to keep the "panel
> > follower" code out of the main probe routine. Some of my initial
> > patches adding "panel follower" looked more like the results after
> > your patch but Benjamin really wasn't happy until there were no
> > special cases for panel-followers in the main probe routine. This is
> > why the code is structured as it is.
>
> Ok, I prefer not hiding away things like that as it obscures what's
> really going on, for example, in this case, that you register a device
> without really having probed it.

I can see your reasoning and I think that intuition is why the earlier
versions of my patches had explicit "panel follower" logic in probe.
However, Benjamin really liked the logic abstracted out.


> As I alluded to in the commit message, you probably want to be able to
> support second-source touchscreen panel followers as well at some point
> and then deferring checking whether device is populated until the panel
> is powered on is not going to work.

Yeah, I've been pondering this too. I _think_ it would work OK-ish if
both devices probed and then only one of the two would actually make
the sub-HID devices. So you'd actually see both devices succeed at
probing but only one of them would actually be functional. It's a bit
ugly, though. :( Maybe marginally better would be if we could figure
out how to have the device which fails to get its interrupt later
unbind itself, if that's possible...

The only other thought I had would be to have the parent i2c bus
understand that it had children that were panel followers, which it
should be able to do by seeing the "panel" attribute in their device
tree. Then the i2c bus could itself register as a panel follower and
could wait to probe its children until they were powered on. This
could happen in the i2c core so we didn't have to add code like this
to all i2c bus drivers. ...and, if necessary, we could add this to
other busses like SPI. It feels a little awkward but could work.


> I skimmed the thread were you added this, but I'm not sure I saw any
> reason for why powering on the panel follower temporarily during probe
> would not work?

My first instinct says we can't do this, but let's think about it...

In general the "panel follower" API is designed to give all the
decision making about when to power things on and off to the panel
driver, which is controlled by DRM.

The reason for this is from experience I had when dealing with the
Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
on that panel tended to die if you didn't sequence it just right.
Specifically, if you were sending pixels to the panel and then stopped
then you absolutely needed to power the panel off and on again. Folks
I talked to even claimed that the panel was working "to spec" since,
in the "Power Sequencing" section of the eDP spec it clearly shows
that you _must_ turn the panel off and on again after you stop giving
it bits. ...this is despite the fact that no other panel I've worked
with cares. ;-)

On homestar, since we didn't have the "panel follower" API, we ended
up adding cost to the hardware and putting the panel and touchscreens
on different power rails. However, I wanted to make sure that if we
ran into a similar situation in the future (or maybe if we were trying
to make hardware work that we didn't have control over) that we could
solve it.

The other reason for giving full control to the panel driver is just
how userspace usually works. Right now userspace tends to power off
panels if they're not used (like if a lid is closed on a laptop) but
doesn't necessarily power off the touchscreen. Thus if the touchscreen
has the ability to keep things powered on then we'd never get to a low
power state.

The above all explains why panel followers like the touchscreen
shouldn't be able to keep power on. However, you are specifically
suggesting that we just turn the power on temporarily during probe. As
I think about that, it might be possible? I guess you'd have to
temporarily block DRM from changing the state of the panel while the
touchscreen is probing. Then if the panel was off then you'd turn it
on briefly, do your probe, and then turn it off again. If the panel
was on then by blocking DRM you'd ensure that it stayed on. I'm not
sure how palatable that would be or if there are any other tricky
parts I'm not thinking about.


> > Thinking that way, is there any reason you can't just move the
> > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > could replace the call to enable_irq() with it and then remove the
> > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > wanted to use a 2nd source + the panel follower concept? Both devices
> > would probe, but only one of them would actually grab the interrupt
> > and only one of them would actually create real HID devices. We might
> > need to do some work to keep from trying again at every poweron of the
> > panel, but it would probably be workable? I think this would also be a
> > smaller change...
>
> That was my first idea as well, but conceptually it is more correct to
> request resources at probe time and not at some later point when you can
> no longer fail probe.
>
> You'd also need to handle the fact that the interrupt may never have
> been requested when remove() is called, which adds unnecessary
> complexity.

I don't think it's a lot of complexity, is it? Just an extra "if" statement...

-Doug

2023-09-22 09:19:52

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

On Wed, Sep 20, 2023 at 08:41:12AM -0700, Doug Anderson wrote:
> On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <[email protected]> wrote:
> > On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <[email protected]> wrote:

> > As I alluded to in the commit message, you probably want to be able to
> > support second-source touchscreen panel followers as well at some point
> > and then deferring checking whether device is populated until the panel
> > is powered on is not going to work.
>
> Yeah, I've been pondering this too. I _think_ it would work OK-ish if
> both devices probed and then only one of the two would actually make
> the sub-HID devices. So you'd actually see both devices succeed at
> probing but only one of them would actually be functional. It's a bit
> ugly, though. :( Maybe marginally better would be if we could figure
> out how to have the device which fails to get its interrupt later
> unbind itself, if that's possible...
>
> The only other thought I had would be to have the parent i2c bus
> understand that it had children that were panel followers, which it
> should be able to do by seeing the "panel" attribute in their device
> tree. Then the i2c bus could itself register as a panel follower and
> could wait to probe its children until they were powered on. This
> could happen in the i2c core so we didn't have to add code like this
> to all i2c bus drivers. ...and, if necessary, we could add this to
> other busses like SPI. It feels a little awkward but could work.

There may be other device on the bus that have nothing to do with
panels, but I guess you mean that this would only apply to a subset of
the children. In any case, this feels like a hack and layering
violation.

> > I skimmed the thread were you added this, but I'm not sure I saw any
> > reason for why powering on the panel follower temporarily during probe
> > would not work?
>
> My first instinct says we can't do this, but let's think about it...
>
> In general the "panel follower" API is designed to give all the
> decision making about when to power things on and off to the panel
> driver, which is controlled by DRM.
>
> The reason for this is from experience I had when dealing with the
> Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
> on that panel tended to die if you didn't sequence it just right.
> Specifically, if you were sending pixels to the panel and then stopped
> then you absolutely needed to power the panel off and on again. Folks
> I talked to even claimed that the panel was working "to spec" since,
> in the "Power Sequencing" section of the eDP spec it clearly shows
> that you _must_ turn the panel off and on again after you stop giving
> it bits. ...this is despite the fact that no other panel I've worked
> with cares. ;-)
>
> On homestar, since we didn't have the "panel follower" API, we ended
> up adding cost to the hardware and putting the panel and touchscreens
> on different power rails. However, I wanted to make sure that if we
> ran into a similar situation in the future (or maybe if we were trying
> to make hardware work that we didn't have control over) that we could
> solve it.
>
> The other reason for giving full control to the panel driver is just
> how userspace usually works. Right now userspace tends to power off
> panels if they're not used (like if a lid is closed on a laptop) but
> doesn't necessarily power off the touchscreen. Thus if the touchscreen
> has the ability to keep things powered on then we'd never get to a low
> power state.

Don't you need to keep the touchscreen powered to support wakeup events
(e.g. when not closing the lid)?

And if you close the lid with wakeup disabled, you should still be able
to power down the touchscreen as part of suspend, right?

> The above all explains why panel followers like the touchscreen
> shouldn't be able to keep power on. However, you are specifically
> suggesting that we just turn the power on temporarily during probe. As
> I think about that, it might be possible? I guess you'd have to
> temporarily block DRM from changing the state of the panel while the
> touchscreen is probing. Then if the panel was off then you'd turn it
> on briefly, do your probe, and then turn it off again. If the panel
> was on then by blocking DRM you'd ensure that it stayed on. I'm not
> sure how palatable that would be or if there are any other tricky
> parts I'm not thinking about.

As this would allow actually probing the touchscreen during probe(), as
the driver model expects, this seems like a better approach then
deferring probe and registration if it's at all doable.

> > > Thinking that way, is there any reason you can't just move the
> > > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > > could replace the call to enable_irq() with it and then remove the
> > > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > > wanted to use a 2nd source + the panel follower concept? Both devices
> > > would probe, but only one of them would actually grab the interrupt
> > > and only one of them would actually create real HID devices. We might
> > > need to do some work to keep from trying again at every poweron of the
> > > panel, but it would probably be workable? I think this would also be a
> > > smaller change...
> >
> > That was my first idea as well, but conceptually it is more correct to
> > request resources at probe time and not at some later point when you can
> > no longer fail probe.
> >
> > You'd also need to handle the fact that the interrupt may never have
> > been requested when remove() is called, which adds unnecessary
> > complexity.
>
> I don't think it's a lot of complexity, is it? Just an extra "if" statement...

Well you'd need keep track of whether the interrupt has been requested
or not (and manage serialisation) yourself for a start.

But the main reason is still that requesting resources belongs in
probe() and should not be deferred to some later random time where you
cannot inform driver core of failures (e.g. for probe deferral if the
interrupt controller is not yet available).

Johan

2023-09-23 01:26:49

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

Hi,

On Fri, Sep 22, 2023 at 2:08 AM Johan Hovold <[email protected]> wrote:
>
> On Wed, Sep 20, 2023 at 08:41:12AM -0700, Doug Anderson wrote:
> > On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <[email protected]> wrote:
> > > On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > > > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <[email protected]> wrote:
>
> > > As I alluded to in the commit message, you probably want to be able to
> > > support second-source touchscreen panel followers as well at some point
> > > and then deferring checking whether device is populated until the panel
> > > is powered on is not going to work.
> >
> > Yeah, I've been pondering this too. I _think_ it would work OK-ish if
> > both devices probed and then only one of the two would actually make
> > the sub-HID devices. So you'd actually see both devices succeed at
> > probing but only one of them would actually be functional. It's a bit
> > ugly, though. :( Maybe marginally better would be if we could figure
> > out how to have the device which fails to get its interrupt later
> > unbind itself, if that's possible...
> >
> > The only other thought I had would be to have the parent i2c bus
> > understand that it had children that were panel followers, which it
> > should be able to do by seeing the "panel" attribute in their device
> > tree. Then the i2c bus could itself register as a panel follower and
> > could wait to probe its children until they were powered on. This
> > could happen in the i2c core so we didn't have to add code like this
> > to all i2c bus drivers. ...and, if necessary, we could add this to
> > other busses like SPI. It feels a little awkward but could work.
>
> There may be other device on the bus that have nothing to do with
> panels, but I guess you mean that this would only apply to a subset of
> the children. In any case, this feels like a hack and layering
> violation.

Right, the idea would be to only do this for the subset of children
that are panel followers.

It definitely doesn't seem ideal, but it also didn't seem too terrible to me.


> > > I skimmed the thread were you added this, but I'm not sure I saw any
> > > reason for why powering on the panel follower temporarily during probe
> > > would not work?
> >
> > My first instinct says we can't do this, but let's think about it...
> >
> > In general the "panel follower" API is designed to give all the
> > decision making about when to power things on and off to the panel
> > driver, which is controlled by DRM.
> >
> > The reason for this is from experience I had when dealing with the
> > Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
> > on that panel tended to die if you didn't sequence it just right.
> > Specifically, if you were sending pixels to the panel and then stopped
> > then you absolutely needed to power the panel off and on again. Folks
> > I talked to even claimed that the panel was working "to spec" since,
> > in the "Power Sequencing" section of the eDP spec it clearly shows
> > that you _must_ turn the panel off and on again after you stop giving
> > it bits. ...this is despite the fact that no other panel I've worked
> > with cares. ;-)
> >
> > On homestar, since we didn't have the "panel follower" API, we ended
> > up adding cost to the hardware and putting the panel and touchscreens
> > on different power rails. However, I wanted to make sure that if we
> > ran into a similar situation in the future (or maybe if we were trying
> > to make hardware work that we didn't have control over) that we could
> > solve it.
> >
> > The other reason for giving full control to the panel driver is just
> > how userspace usually works. Right now userspace tends to power off
> > panels if they're not used (like if a lid is closed on a laptop) but
> > doesn't necessarily power off the touchscreen. Thus if the touchscreen
> > has the ability to keep things powered on then we'd never get to a low
> > power state.
>
> Don't you need to keep the touchscreen powered to support wakeup events
> (e.g. when not closing the lid)?

No. The only reason you'd use panel follower is if the hardware was
designed such that the touchscreen needed to be power sequenced with
the panel. If the touchscreen can stay powered when the panel is off
then it is, by definition, not a panel follower.

For a laptop I don't think most people expect the touchscreen to stay
powered when the screen is off. I certainly wouldn't expect it. If the
screen was off and I wanted to interact with the device, I would hit a
key on the keyboard or touch the trackpad. When the people designing
sc7180-trogdor chose to have the display and touchscreen share a power
rail they made a conscious choice that they didn't need the
touchscreen active when the screen was off.

For the other hardware I'm aware of that needs panel-follower there is
a single external chip on the board that handles driving the panel and
the touchscreen. The power sequencing requirements for this chip
simply don't allow the touchscreen to be powered on while the display
is off.

One use case where I could intuitively think I might touch a
touchscreen of a screen that was "off" would be a kiosk of some sort.
It would make sense there to have two power rails. ...or, I suppose,
userspace could just choose to turn the backlight off but keep the
screen (and touchscreen) powered.


> And if you close the lid with wakeup disabled, you should still be able
> to power down the touchscreen as part of suspend, right?
>
> > The above all explains why panel followers like the touchscreen
> > shouldn't be able to keep power on. However, you are specifically
> > suggesting that we just turn the power on temporarily during probe. As
> > I think about that, it might be possible? I guess you'd have to
> > temporarily block DRM from changing the state of the panel while the
> > touchscreen is probing. Then if the panel was off then you'd turn it
> > on briefly, do your probe, and then turn it off again. If the panel
> > was on then by blocking DRM you'd ensure that it stayed on. I'm not
> > sure how palatable that would be or if there are any other tricky
> > parts I'm not thinking about.
>
> As this would allow actually probing the touchscreen during probe(), as
> the driver model expects, this seems like a better approach then
> deferring probe and registration if it's at all doable.

Yeah, I don't 100% know if it's doable but it seems possible.
Certainly it's something for future investigation.

Luckily, at the moment anything I'm aware of that truly needs panel
follower also doesn't have multiple sources for a touchscreen.


> > > > Thinking that way, is there any reason you can't just move the
> > > > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > > > could replace the call to enable_irq() with it and then remove the
> > > > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > > > wanted to use a 2nd source + the panel follower concept? Both devices
> > > > would probe, but only one of them would actually grab the interrupt
> > > > and only one of them would actually create real HID devices. We might
> > > > need to do some work to keep from trying again at every poweron of the
> > > > panel, but it would probably be workable? I think this would also be a
> > > > smaller change...
> > >
> > > That was my first idea as well, but conceptually it is more correct to
> > > request resources at probe time and not at some later point when you can
> > > no longer fail probe.
> > >
> > > You'd also need to handle the fact that the interrupt may never have
> > > been requested when remove() is called, which adds unnecessary
> > > complexity.
> >
> > I don't think it's a lot of complexity, is it? Just an extra "if" statement...
>
> Well you'd need keep track of whether the interrupt has been requested
> or not (and manage serialisation) yourself for a start.

Sure. So I guess an "if" test plus a boolean state variable. I still
don't think it's a lot of complexity.


> But the main reason is still that requesting resources belongs in
> probe() and should not be deferred to some later random time where you
> cannot inform driver core of failures (e.g. for probe deferral if the
> interrupt controller is not yet available).

OK, I guess the -EPROBE_DEFER is technically possible though probably
not likely in practice. ...so that's a good reason to make sure we
request the IRQ in probe even in the "panel follower" case. I still
beleive Benjamin would prefer that this was abstracted out and not in
the actual probe() routine, but I guess we can wait to hear from him.

One last idea I had while digging would be to wonder if we could
somehow solve this case with "IRQF_PROBE_SHARED". I guess that doesn't
work well together with "IRQF_NO_AUTOEN", but conceivably we could
have the interrupt handler return "IRQ_NONE" if the initial power up
never happened? I haven't spent much time poking with shared
interrupts though, so I don't know if there are other side effects...


-Doug

2023-10-02 13:36:43

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

On Fri, Sep 22, 2023 at 09:37:43AM -0700, Doug Anderson wrote:
> On Fri, Sep 22, 2023 at 2:08 AM Johan Hovold <[email protected]> wrote:
> > On Wed, Sep 20, 2023 at 08:41:12AM -0700, Doug Anderson wrote:
> > > On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <[email protected]> wrote:
> > > > On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > > > > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <[email protected]> wrote:
> >
> > > > As I alluded to in the commit message, you probably want to be able to
> > > > support second-source touchscreen panel followers as well at some point
> > > > and then deferring checking whether device is populated until the panel
> > > > is powered on is not going to work.

> > > > I skimmed the thread were you added this, but I'm not sure I saw any
> > > > reason for why powering on the panel follower temporarily during probe
> > > > would not work?
> > >
> > > My first instinct says we can't do this, but let's think about it...
> > >
> > > In general the "panel follower" API is designed to give all the
> > > decision making about when to power things on and off to the panel
> > > driver, which is controlled by DRM.
> > >
> > > The reason for this is from experience I had when dealing with the
> > > Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
> > > on that panel tended to die if you didn't sequence it just right.
> > > Specifically, if you were sending pixels to the panel and then stopped
> > > then you absolutely needed to power the panel off and on again. Folks
> > > I talked to even claimed that the panel was working "to spec" since,
> > > in the "Power Sequencing" section of the eDP spec it clearly shows
> > > that you _must_ turn the panel off and on again after you stop giving
> > > it bits. ...this is despite the fact that no other panel I've worked
> > > with cares. ;-)
> > >
> > > On homestar, since we didn't have the "panel follower" API, we ended
> > > up adding cost to the hardware and putting the panel and touchscreens
> > > on different power rails. However, I wanted to make sure that if we
> > > ran into a similar situation in the future (or maybe if we were trying
> > > to make hardware work that we didn't have control over) that we could
> > > solve it.

Out of curiosity, are there any machines that actually need this
"panel-follower" API today, or are saying above that this is just
something that may be needed one day?

> > > The other reason for giving full control to the panel driver is just
> > > how userspace usually works. Right now userspace tends to power off
> > > panels if they're not used (like if a lid is closed on a laptop) but
> > > doesn't necessarily power off the touchscreen. Thus if the touchscreen
> > > has the ability to keep things powered on then we'd never get to a low
> > > power state.
> >
> > Don't you need to keep the touchscreen powered to support wakeup events
> > (e.g. when not closing the lid)?
>
> No. The only reason you'd use panel follower is if the hardware was
> designed such that the touchscreen needed to be power sequenced with
> the panel. If the touchscreen can stay powered when the panel is off
> then it is, by definition, not a panel follower.
>
> For a laptop I don't think most people expect the touchscreen to stay
> powered when the screen is off. I certainly wouldn't expect it. If the
> screen was off and I wanted to interact with the device, I would hit a
> key on the keyboard or touch the trackpad. When the people designing
> sc7180-trogdor chose to have the display and touchscreen share a power
> rail they made a conscious choice that they didn't need the
> touchscreen active when the screen was off.

Sure, but that's a policy decision and not something that should be
hard-coded in our drivers.

> For the other hardware I'm aware of that needs panel-follower there is
> a single external chip on the board that handles driving the panel and
> the touchscreen. The power sequencing requirements for this chip
> simply don't allow the touchscreen to be powered on while the display
> is off.
>
> One use case where I could intuitively think I might touch a
> touchscreen of a screen that was "off" would be a kiosk of some sort.
> It would make sense there to have two power rails. ...or, I suppose,
> userspace could just choose to turn the backlight off but keep the
> screen (and touchscreen) powered.

Right.

> > And if you close the lid with wakeup disabled, you should still be able
> > to power down the touchscreen as part of suspend, right?
> >
> > > The above all explains why panel followers like the touchscreen
> > > shouldn't be able to keep power on. However, you are specifically
> > > suggesting that we just turn the power on temporarily during probe. As
> > > I think about that, it might be possible? I guess you'd have to
> > > temporarily block DRM from changing the state of the panel while the
> > > touchscreen is probing. Then if the panel was off then you'd turn it
> > > on briefly, do your probe, and then turn it off again. If the panel
> > > was on then by blocking DRM you'd ensure that it stayed on. I'm not
> > > sure how palatable that would be or if there are any other tricky
> > > parts I'm not thinking about.
> >
> > As this would allow actually probing the touchscreen during probe(), as
> > the driver model expects, this seems like a better approach then
> > deferring probe and registration if it's at all doable.
>
> Yeah, I don't 100% know if it's doable but it seems possible.
> Certainly it's something for future investigation.
>
> Luckily, at the moment anything I'm aware of that truly needs panel
> follower also doesn't have multiple sources for a touchscreen.

Ok, so with the current panel-follower implementation you essentially
only waste a bit of memory in case of a non-populated touchscreen (e.g.
by keeping the platform and follower devices registered).

> > > > > Thinking that way, is there any reason you can't just move the
> > > > > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > > > > could replace the call to enable_irq() with it and then remove the
> > > > > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > > > > wanted to use a 2nd source + the panel follower concept? Both devices
> > > > > would probe, but only one of them would actually grab the interrupt
> > > > > and only one of them would actually create real HID devices. We might
> > > > > need to do some work to keep from trying again at every poweron of the
> > > > > panel, but it would probably be workable? I think this would also be a
> > > > > smaller change...
> > > >
> > > > That was my first idea as well, but conceptually it is more correct to
> > > > request resources at probe time and not at some later point when you can
> > > > no longer fail probe.
> > > >
> > > > You'd also need to handle the fact that the interrupt may never have
> > > > been requested when remove() is called, which adds unnecessary
> > > > complexity.
> > >
> > > I don't think it's a lot of complexity, is it? Just an extra "if" statement...
> >
> > Well you'd need keep track of whether the interrupt has been requested
> > or not (and manage serialisation) yourself for a start.
>
> Sure. So I guess an "if" test plus a boolean state variable. I still
> don't think it's a lot of complexity.

I never said "a lot", I used the word "unnecessary". But how much it
adds also depends on whether you need additional synchronisation.

But again, the main point is that the "panel-follower" feature should
not complicate and obfuscate the driver's probe implementation. And
looking up resources belongs in probe().

> > But the main reason is still that requesting resources belongs in
> > probe() and should not be deferred to some later random time where you
> > cannot inform driver core of failures (e.g. for probe deferral if the
> > interrupt controller is not yet available).
>
> OK, I guess the -EPROBE_DEFER is technically possible though probably
> not likely in practice. ...so that's a good reason to make sure we
> request the IRQ in probe even in the "panel follower" case. I still
> beleive Benjamin would prefer that this was abstracted out and not in
> the actual probe() routine, but I guess we can wait to hear from him.

I talked to Benjamin at Kernel Recipes last week and I don't think he
has any fundamental objections to the fix I'm proposing.

I prefer it as it makes the code easier to reason about and clearly
marks the code paths that differ in case the device is a "panel
follower". And since you said it also makes the code look more like what
you originally intended, then I guess you should be ok with it too?

Looking at the patch again, I may do a v2 to only look up the "panel"
property once even if that's a really minor optimisation.

> One last idea I had while digging would be to wonder if we could
> somehow solve this case with "IRQF_PROBE_SHARED". I guess that doesn't
> work well together with "IRQF_NO_AUTOEN", but conceivably we could
> have the interrupt handler return "IRQ_NONE" if the initial power up
> never happened? I haven't spent much time poking with shared
> interrupts though, so I don't know if there are other side effects...

Yeah, that doesn't seem right, though. The interrupt line is not really
shared, it's just that we need to check whether the device is populated
before requesting the interrupt.

Johan

2023-10-02 16:06:09

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

On Mon, Oct 02, 2023 at 07:35:06AM -0700, Doug Anderson wrote:
> On Mon, Oct 2, 2023 at 5:09 AM Johan Hovold <[email protected]> wrote:

> > Out of curiosity, are there any machines that actually need this
> > "panel-follower" API today, or are saying above that this is just
> > something that may be needed one day?
>
> Yes. See commit de0874165b83 ("drm/panel: Add a way for other devices
> to follow panel state") where I point to Cong Yang's original patch
> [1]. In that patch Cong was trying to make things work by assuming
> probe ordering and manually taking some of the power sequencing stuff
> out of some of the drivers in order to get things to work.
>
> [1] https://lore.kernel.org/r/20230519032316.3464732-1-yangcong5@huaqin.corp-partner.google.com

Ok, thanks for the pointer.

> > > > Don't you need to keep the touchscreen powered to support wakeup events
> > > > (e.g. when not closing the lid)?
> > >
> > > No. The only reason you'd use panel follower is if the hardware was
> > > designed such that the touchscreen needed to be power sequenced with
> > > the panel. If the touchscreen can stay powered when the panel is off
> > > then it is, by definition, not a panel follower.
> > >
> > > For a laptop I don't think most people expect the touchscreen to stay
> > > powered when the screen is off. I certainly wouldn't expect it. If the
> > > screen was off and I wanted to interact with the device, I would hit a
> > > key on the keyboard or touch the trackpad. When the people designing
> > > sc7180-trogdor chose to have the display and touchscreen share a power
> > > rail they made a conscious choice that they didn't need the
> > > touchscreen active when the screen was off.
> >
> > Sure, but that's a policy decision and not something that should be
> > hard-coded in our drivers.
>
> If the touchscreen and panel can be powered separately then, sure,
> it's a policy decision.
>
> In the cases where the touchscreen and panel need to be powered
> together I'd say it's more than a policy decision. Even if it wasn't,
> you have to make _some_ decision in the kernel. One could also argue
> that if you say that you're going to force the panel to be powered on
> whenever the touchscreen is on then that's just as much of a policy
> decision, isn't it?

I get your point, but with runtime pm suspending the touchpad after a
timeout it seems that would still be the most flexible alternative
which allows deferring the decision whether to support wakeup on
touch events to the user.

> In any case, the fact that there is a shared power rail / shared power
> sequence is because the hardware designer intended them to either be
> both off or both on. Whenever I asked the EEs that designed these
> boards about leaving the touchscreen on while turning the panel power
> off they always looked at me incredulously and asked why I would ever
> do that. Although we can work around the hardware by powering the
> panel in order to allow the touchscreen to be on, it's just not the
> intention.

I hear you, but users sometimes want do things with their hardware which
may not have originally been intended (e.g. your kiosk example).

> > > > But the main reason is still that requesting resources belongs in
> > > > probe() and should not be deferred to some later random time where you
> > > > cannot inform driver core of failures (e.g. for probe deferral if the
> > > > interrupt controller is not yet available).
> > >
> > > OK, I guess the -EPROBE_DEFER is technically possible though probably
> > > not likely in practice. ...so that's a good reason to make sure we
> > > request the IRQ in probe even in the "panel follower" case. I still
> > > beleive Benjamin would prefer that this was abstracted out and not in
> > > the actual probe() routine, but I guess we can wait to hear from him.
> >
> > I talked to Benjamin at Kernel Recipes last week and I don't think he
> > has any fundamental objections to the fix I'm proposing.
>
> Sure. I don't either though I'm hoping that we can come up with a more
> complete solution long term.
>
>
> > I prefer it as it makes the code easier to reason about and clearly
> > marks the code paths that differ in case the device is a "panel
> > follower". And since you said it also makes the code look more like what
> > you originally intended, then I guess you should be ok with it too?
>
> It looks OK to me. The biggest objection I have is just that I dislike
> it when code churns because two people disagree what the nicer style
> is. It just makes for bigger diffs and more work to review things.

Ok, but this isn't just about style as that initial_power_on() function
which does all the magic needs to be broken up to fix the regression
(unless you want to convolute the driver and defer resource lookups
until panel power-on).

I'll respin a v2 with that panel-property lookup change I mentioned and
hopefully we can get this fixed this week.

> > > One last idea I had while digging would be to wonder if we could
> > > somehow solve this case with "IRQF_PROBE_SHARED". I guess that doesn't
> > > work well together with "IRQF_NO_AUTOEN", but conceivably we could
> > > have the interrupt handler return "IRQ_NONE" if the initial power up
> > > never happened? I haven't spent much time poking with shared
> > > interrupts though, so I don't know if there are other side effects...
> >
> > Yeah, that doesn't seem right, though. The interrupt line is not really
> > shared, it's just that we need to check whether the device is populated
> > before requesting the interrupt.
>
> I'm not convinced that marking it as shared is any "less right" than
> extra work to request the interrupt after we've probed the device.
> Fundamentally both are taking into account that another touchscreen
> might be trying to probe with the same interrupt line.

If you need to start to thinking about rewriting your interrupt handler,
I'd say that qualifies as "less right". ;)

Johan

2023-10-02 19:25:56

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

Hi,

On Mon, Oct 2, 2023 at 5:09 AM Johan Hovold <[email protected]> wrote:
>
> > > > > I skimmed the thread were you added this, but I'm not sure I saw any
> > > > > reason for why powering on the panel follower temporarily during probe
> > > > > would not work?
> > > >
> > > > My first instinct says we can't do this, but let's think about it...
> > > >
> > > > In general the "panel follower" API is designed to give all the
> > > > decision making about when to power things on and off to the panel
> > > > driver, which is controlled by DRM.
> > > >
> > > > The reason for this is from experience I had when dealing with the
> > > > Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
> > > > on that panel tended to die if you didn't sequence it just right.
> > > > Specifically, if you were sending pixels to the panel and then stopped
> > > > then you absolutely needed to power the panel off and on again. Folks
> > > > I talked to even claimed that the panel was working "to spec" since,
> > > > in the "Power Sequencing" section of the eDP spec it clearly shows
> > > > that you _must_ turn the panel off and on again after you stop giving
> > > > it bits. ...this is despite the fact that no other panel I've worked
> > > > with cares. ;-)
> > > >
> > > > On homestar, since we didn't have the "panel follower" API, we ended
> > > > up adding cost to the hardware and putting the panel and touchscreens
> > > > on different power rails. However, I wanted to make sure that if we
> > > > ran into a similar situation in the future (or maybe if we were trying
> > > > to make hardware work that we didn't have control over) that we could
> > > > solve it.
>
> Out of curiosity, are there any machines that actually need this
> "panel-follower" API today, or are saying above that this is just
> something that may be needed one day?

Yes. See commit de0874165b83 ("drm/panel: Add a way for other devices
to follow panel state") where I point to Cong Yang's original patch
[1]. In that patch Cong was trying to make things work by assuming
probe ordering and manually taking some of the power sequencing stuff
out of some of the drivers in order to get things to work.

[1] https://lore.kernel.org/r/20230519032316.3464732-1-yangcong5@huaqin.corp-partner.google.com



> > > > The other reason for giving full control to the panel driver is just
> > > > how userspace usually works. Right now userspace tends to power off
> > > > panels if they're not used (like if a lid is closed on a laptop) but
> > > > doesn't necessarily power off the touchscreen. Thus if the touchscreen
> > > > has the ability to keep things powered on then we'd never get to a low
> > > > power state.
> > >
> > > Don't you need to keep the touchscreen powered to support wakeup events
> > > (e.g. when not closing the lid)?
> >
> > No. The only reason you'd use panel follower is if the hardware was
> > designed such that the touchscreen needed to be power sequenced with
> > the panel. If the touchscreen can stay powered when the panel is off
> > then it is, by definition, not a panel follower.
> >
> > For a laptop I don't think most people expect the touchscreen to stay
> > powered when the screen is off. I certainly wouldn't expect it. If the
> > screen was off and I wanted to interact with the device, I would hit a
> > key on the keyboard or touch the trackpad. When the people designing
> > sc7180-trogdor chose to have the display and touchscreen share a power
> > rail they made a conscious choice that they didn't need the
> > touchscreen active when the screen was off.
>
> Sure, but that's a policy decision and not something that should be
> hard-coded in our drivers.

If the touchscreen and panel can be powered separately then, sure,
it's a policy decision.

In the cases where the touchscreen and panel need to be powered
together I'd say it's more than a policy decision. Even if it wasn't,
you have to make _some_ decision in the kernel. One could also argue
that if you say that you're going to force the panel to be powered on
whenever the touchscreen is on then that's just as much of a policy
decision, isn't it?

In any case, the fact that there is a shared power rail / shared power
sequence is because the hardware designer intended them to either be
both off or both on. Whenever I asked the EEs that designed these
boards about leaving the touchscreen on while turning the panel power
off they always looked at me incredulously and asked why I would ever
do that. Although we can work around the hardware by powering the
panel in order to allow the touchscreen to be on, it's just not the
intention.


> > > But the main reason is still that requesting resources belongs in
> > > probe() and should not be deferred to some later random time where you
> > > cannot inform driver core of failures (e.g. for probe deferral if the
> > > interrupt controller is not yet available).
> >
> > OK, I guess the -EPROBE_DEFER is technically possible though probably
> > not likely in practice. ...so that's a good reason to make sure we
> > request the IRQ in probe even in the "panel follower" case. I still
> > beleive Benjamin would prefer that this was abstracted out and not in
> > the actual probe() routine, but I guess we can wait to hear from him.
>
> I talked to Benjamin at Kernel Recipes last week and I don't think he
> has any fundamental objections to the fix I'm proposing.

Sure. I don't either though I'm hoping that we can come up with a more
complete solution long term.


> I prefer it as it makes the code easier to reason about and clearly
> marks the code paths that differ in case the device is a "panel
> follower". And since you said it also makes the code look more like what
> you originally intended, then I guess you should be ok with it too?

It looks OK to me. The biggest objection I have is just that I dislike
it when code churns because two people disagree what the nicer style
is. It just makes for bigger diffs and more work to review things.


> > One last idea I had while digging would be to wonder if we could
> > somehow solve this case with "IRQF_PROBE_SHARED". I guess that doesn't
> > work well together with "IRQF_NO_AUTOEN", but conceivably we could
> > have the interrupt handler return "IRQ_NONE" if the initial power up
> > never happened? I haven't spent much time poking with shared
> > interrupts though, so I don't know if there are other side effects...
>
> Yeah, that doesn't seem right, though. The interrupt line is not really
> shared, it's just that we need to check whether the device is populated
> before requesting the interrupt.

I'm not convinced that marking it as shared is any "less right" than
extra work to request the interrupt after we've probed the device.
Fundamentally both are taking into account that another touchscreen
might be trying to probe with the same interrupt line.

-Doug

2023-10-02 20:10:00

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH] HID: i2c-hid: fix handling of unpopulated devices

Hi,

On Mon, Oct 2, 2023 at 8:48 AM Johan Hovold <[email protected]> wrote:
>
> > In any case, the fact that there is a shared power rail / shared power
> > sequence is because the hardware designer intended them to either be
> > both off or both on. Whenever I asked the EEs that designed these
> > boards about leaving the touchscreen on while turning the panel power
> > off they always looked at me incredulously and asked why I would ever
> > do that. Although we can work around the hardware by powering the
> > panel in order to allow the touchscreen to be on, it's just not the
> > intention.
>
> I hear you, but users sometimes want do things with their hardware which
> may not have originally been intended (e.g. your kiosk example).

...and they can. I don't think it's totally unreasonable for userspace
in this case to take into account that they need to keep the panel
powered on (maybe with the screen black and the backlight off) if they
want the touchscreen kept on. If I was coding up userspace it wouldn't
surprise me at all if the touchscreen stopped working when the panel
was off.

I will further note that there is actually hardware where it's even
more difficult. On the same sc7180-trogdor laptops (and others as
well) the USB webcam is _also_ powered by the same power rail. When
you power the screen off then the USB webcam deenumerates. When you
power the screen on then it shows back up. It would be really weird if
somehow the USB webcam driver needed a link to the panel to try to
keep it powered.


-Doug