2019-10-05 10:56:48

by Hans de Goede

[permalink] [raw]
Subject: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

Commit c394159310d0 ("Input: soc_button_array - add support for newer
surface devices") not only added support for the MSHW0040 ACPI HID,
but for some reason it also makes changes to the error handling of the
soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally
this seamingly unrelated change would have been made in a separate commit,
with a message explaining the what and why of this change.

I guess this change may have been added to deal with -EPROBE_DEFER errors,
but in case of the existing support for PNP0C40 devices, treating
-EPROBE_DEFER as any other error is deliberate, see the comment this
commit adds for why.

The actual returning of -EPROBE_DEFER to the caller of soc_button_probe()
introduced by the new error checking causes a serious regression:

On devices with so called virtual GPIOs soc_button_lookup_gpio() will
always return -EPROBE_DEFER for these fake GPIOs, when this happens
during the second call of soc_button_device_create() we already have
successfully registered our first child. This causes the kernel to think
we are making progress with probing things even though we unregister the
child before again before we return the -EPROBE_DEFER. Since we are making
progress the kernel will retry deferred-probes again immediately ending
up stuck in a loop with the following showing in dmesg:

[ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537
[ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538
[ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539
[ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540
[ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541
[ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542
[ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543
[ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544
[ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545
[ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546
[ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547
<continues on and on and on>

And 1 CPU core being stuck at 100% and udev hanging since it is waiting
for the modprobe of soc_button_array to return.

This patch reverts the soc_button_lookup_gpio() error handling changes,
fixing this regression.

Fixes: c394159310d0 ("Input: soc_button_array - add support for newer surface devices")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205031
Cc: Maximilian Luz <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
---
drivers/input/misc/soc_button_array.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c
index 97e3639e99d0..97761421d6dd 100644
--- a/drivers/input/misc/soc_button_array.c
+++ b/drivers/input/misc/soc_button_array.c
@@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev,
continue;

gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index);
- if (gpio < 0 && gpio != -ENOENT) {
- error = gpio;
- goto err_free_mem;
- } else if (!gpio_is_valid(gpio)) {
- /* Skip GPIO if not present */
+ if (!gpio_is_valid(gpio)) {
+ /*
+ * Skip GPIO if not present. Note we deliberately
+ * ignore -EPROBE_DEFER errors here. On some devices
+ * Intel is using so called virtual GPIOs which are not
+ * GPIOs at all but some way for AML code to check some
+ * random status bits without need a custom opregion.
+ * In some cases the resources table we parse points to
+ * such a virtual GPIO, since these are not real GPIOs
+ * we do not have a driver for these so they will never
+ * show up, therefor we ignore -EPROBE_DEFER.
+ */
continue;
}

--
2.23.0


2019-10-05 12:21:54

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

Hi,

sorry for the inconvenience this change has caused.

On 10/5/19 12:55 PM, Hans de Goede wrote:
> Note ideally this seamingly unrelated change would have been made in a
> separate commit, with a message explaining the what and why of this
> change.

Would I have known the impact, then yes. This change was added due to
some reported instances where it seems that soc_button_array would
occasionally load on MSHW0040 before the GPIO controller was ready,
causing power and volume buttons to not work.

> I guess this change may have been added to deal with -EPROBE_DEFER errors,

Correct. After a comment mentioned that gpiod_get() returning
-EPROBE_DEFER would be the proper way to detect this, I decided on this
change.

Might I suggest the following addition:

Signed-off-by: Maximilian Luz <[email protected]>
---
drivers/input/misc/soc_button_array.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c
index 97e3639e99d0..a0f0c977b790 100644
--- a/drivers/input/misc/soc_button_array.c
+++ b/drivers/input/misc/soc_button_array.c
@@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev,
continue;

gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index);
- if (gpio < 0 && gpio != -ENOENT) {
- error = gpio;
- goto err_free_mem;
- } else if (!gpio_is_valid(gpio)) {
- /* Skip GPIO if not present */
+ if (!gpio_is_valid(gpio)) {
+ /*
+ * Skip GPIO if not present. Note we deliberately
+ * ignore -EPROBE_DEFER errors here. On some devices
+ * Intel is using so called virtual GPIOs which are not
+ * GPIOs at all but some way for AML code to check some
+ * random status bits without need a custom opregion.
+ * In some cases the resources table we parse points to
+ * such a virtual GPIO, since these are not real GPIOs
+ * we do not have a driver for these so they will never
+ * show up, therefor we ignore -EPROBE_DEFER.
+ */
continue;
}

@@ -429,6 +436,14 @@ static int soc_device_check_MSHW0040(struct device *dev)

dev_dbg(dev, "OEM Platform Revision %llu\n", oem_platform_rev);

+ /*
+ * Explicitly check if GPIO controller is ready. This check is done here
+ * to avoid issues with virtual GPIOs on other chips, as elaborated above.
+ * We are at least expecting one GPIO pin for the power button (index 0).
+ */
+ if (soc_button_lookup_gpio(dev, 0) == -EPROBE_DEFER)
+ return -EPROBE_DEFER;
+
return 0;
}

--
2.23.0

2019-10-05 12:27:33

by Yauhen Kharuzhy

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

On Sat, Oct 05, 2019 at 12:55:51PM +0200, Hans de Goede wrote:
> Commit c394159310d0 ("Input: soc_button_array - add support for newer
> surface devices") not only added support for the MSHW0040 ACPI HID,
> but for some reason it also makes changes to the error handling of the
> soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally
> this seamingly unrelated change would have been made in a separate commit,
> with a message explaining the what and why of this change.
>
> I guess this change may have been added to deal with -EPROBE_DEFER errors,
> but in case of the existing support for PNP0C40 devices, treating
> -EPROBE_DEFER as any other error is deliberate, see the comment this
> commit adds for why.
>
> The actual returning of -EPROBE_DEFER to the caller of soc_button_probe()
> introduced by the new error checking causes a serious regression:
>
> On devices with so called virtual GPIOs soc_button_lookup_gpio() will
> always return -EPROBE_DEFER for these fake GPIOs, when this happens
> during the second call of soc_button_device_create() we already have
> successfully registered our first child. This causes the kernel to think
> we are making progress with probing things even though we unregister the
> child before again before we return the -EPROBE_DEFER. Since we are making
> progress the kernel will retry deferred-probes again immediately ending
> up stuck in a loop with the following showing in dmesg:
>
> [ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537
> [ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538
> [ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539
> [ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540
> [ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541
> [ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542
> [ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543
> [ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544
> [ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545
> [ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546
> [ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547
> <continues on and on and on>
>
> And 1 CPU core being stuck at 100% and udev hanging since it is waiting
> for the modprobe of soc_button_array to return.
>
> This patch reverts the soc_button_lookup_gpio() error handling changes,
> fixing this regression.

I confirm this issue and I made similar hotfix while developing of other
things, so it's actual and patch should work. I will test it at my hardware
soon.

--
Yauhen Kharuzhy

2019-10-05 13:21:46

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

Hi,

On 05-10-2019 14:17, Maximilian Luz wrote:
> Hi,
>
> sorry for the inconvenience this change has caused.
>
> On 10/5/19 12:55 PM, Hans de Goede wrote:
>> Note ideally this seamingly unrelated change would have been made in a
>> separate commit, with a message explaining the what and why of this
>> change.
>
> Would I have known the impact, then yes. This change was added due to
> some reported instances where it seems that soc_button_array would
> occasionally load on MSHW0040 before the GPIO controller was ready,
> causing power and volume buttons to not work.
>
>> I guess this change may have been added to deal with -EPROBE_DEFER errors,
>
> Correct. After a comment mentioned that gpiod_get() returning
> -EPROBE_DEFER would be the proper way to detect this, I decided on this
> change.

Ok, on x86 the GPIO drivers really should all be builtin because
various ACPI methods including device D0 / D3 (power-on/off) methods
may depend on them. So normally this should never happen.

If this (-EPROBE_DEFER on surface devices) somehow still is happening
please let me know and we will figure something out.

> Might I suggest the following addition:
>
> Signed-off-by: Maximilian Luz <[email protected]>

S-o-b is only for patches which pass through your hands, e.g. if
you make changes to my patch and submit a v2 of it.

I guess you mean / want one of:

Acked-by: ...

or

Reviewed-by: ...


?

Regards,

Hans



> ---
>  drivers/input/misc/soc_button_array.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c
> index 97e3639e99d0..a0f0c977b790 100644
> --- a/drivers/input/misc/soc_button_array.c
> +++ b/drivers/input/misc/soc_button_array.c
> @@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev,
>              continue;
>
>          gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index);
> -        if (gpio < 0 && gpio != -ENOENT) {
> -            error = gpio;
> -            goto err_free_mem;
> -        } else if (!gpio_is_valid(gpio)) {
> -            /* Skip GPIO if not present */
> +        if (!gpio_is_valid(gpio)) {
> +            /*
> +             * Skip GPIO if not present. Note we deliberately
> +             * ignore -EPROBE_DEFER errors here. On some devices
> +             * Intel is using so called virtual GPIOs which are not
> +             * GPIOs at all but some way for AML code to check some
> +             * random status bits without need a custom opregion.
> +             * In some cases the resources table we parse points to
> +             * such a virtual GPIO, since these are not real GPIOs
> +             * we do not have a driver for these so they will never
> +             * show up, therefor we ignore -EPROBE_DEFER.
> +             */
>              continue;
>          }
>
> @@ -429,6 +436,14 @@ static int soc_device_check_MSHW0040(struct device *dev)
>
>      dev_dbg(dev, "OEM Platform Revision %llu\n", oem_platform_rev);
>
> +    /*
> +     * Explicitly check if GPIO controller is ready. This check is done here
> +     * to avoid issues with virtual GPIOs on other chips, as elaborated above.
> +     * We are at least expecting one GPIO pin for the power button (index 0).
> +     */
> +    if (soc_button_lookup_gpio(dev, 0) == -EPROBE_DEFER)
> +        return -EPROBE_DEFER;
> +
>      return 0;
>  }
>

2019-10-05 13:36:27

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

Hi,

On 10/5/19 3:20 PM, Hans de Goede wrote:
> Hi,
>
> S-o-b is only for patches which pass through your hands, e.g. if
> you make changes to my patch and submit a v2 of it.
>
> I guess you mean / want one of:
>
> Acked-by: ...
>
> or
>
> Reviewed-by: ...
>
>
> ?

I apologize if the s-o-b was misplaced here, it was meant for the
modification proposed below. I can send a v2 if requested, but I felt
this was a change that I wanted to bring up here first, to see if anyone
has any comments on it.

Regards,

Maximilian

2019-10-05 15:06:38

by Maximilian Luz

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

Hi, again

On 10/5/19 3:20 PM, Hans de Goede wrote:
> Ok, on x86 the GPIO drivers really should all be builtin because
> various ACPI methods including device D0 / D3 (power-on/off) methods
> may depend on them. So normally this should never happen.
>
> If this (-EPROBE_DEFER on surface devices) somehow still is happening
> please let me know and we will figure something out.

I have never personally experienced this, only received reports which
indicated this and that the change (as well as manually reloading
soc_button_array) fixed it. I will come back to you if I hear anything
in regards to this again.

I have now also tested your patch on the Surface Book 2. Does not cause
any issues as far as I can tell.

Tested-by: Maximilian Luz <[email protected]>

And if that is needed/wanted

Acked-by: Maximilian Luz <[email protected]>

Regards,

Maximilian

2019-10-05 20:15:36

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

Hi,

On 05-10-2019 17:01, Maximilian Luz wrote:
> Hi, again
>
> On 10/5/19 3:20 PM, Hans de Goede wrote:
>> Ok, on x86 the GPIO drivers really should all be builtin because
>> various ACPI methods including device D0 / D3 (power-on/off) methods
>> may depend on them. So normally this should never happen.
>>
>> If this (-EPROBE_DEFER on surface devices) somehow still is happening
>> please let me know and we will figure something out.
>
> I have never personally experienced this, only received reports which
> indicated this and that the change (as well as manually reloading
> soc_button_array) fixed it. I will come back to you if I hear anything
> in regards to this again.
>
> I have now also tested your patch on the Surface Book 2. Does not cause
> any issues as far as I can tell.
>
> Tested-by: Maximilian Luz <[email protected]>
>
> And if that is needed/wanted
>
> Acked-by: Maximilian Luz <[email protected]>

Great, thank you.

Regards,

Hans

2019-10-08 23:44:54

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

On Sat, Oct 05, 2019 at 12:55:51PM +0200, Hans de Goede wrote:
> Commit c394159310d0 ("Input: soc_button_array - add support for newer
> surface devices") not only added support for the MSHW0040 ACPI HID,
> but for some reason it also makes changes to the error handling of the
> soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally
> this seamingly unrelated change would have been made in a separate commit,
> with a message explaining the what and why of this change.
>
> I guess this change may have been added to deal with -EPROBE_DEFER errors,
> but in case of the existing support for PNP0C40 devices, treating
> -EPROBE_DEFER as any other error is deliberate, see the comment this
> commit adds for why.
>
> The actual returning of -EPROBE_DEFER to the caller of soc_button_probe()
> introduced by the new error checking causes a serious regression:
>
> On devices with so called virtual GPIOs soc_button_lookup_gpio() will
> always return -EPROBE_DEFER for these fake GPIOs, when this happens
> during the second call of soc_button_device_create() we already have
> successfully registered our first child. This causes the kernel to think
> we are making progress with probing things even though we unregister the
> child before again before we return the -EPROBE_DEFER. Since we are making
> progress the kernel will retry deferred-probes again immediately ending
> up stuck in a loop with the following showing in dmesg:
>
> [ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537
> [ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538
> [ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539
> [ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540
> [ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541
> [ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542
> [ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543
> [ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544
> [ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545
> [ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546
> [ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547
> <continues on and on and on>
>
> And 1 CPU core being stuck at 100% and udev hanging since it is waiting
> for the modprobe of soc_button_array to return.
>
> This patch reverts the soc_button_lookup_gpio() error handling changes,
> fixing this regression.
>
> Fixes: c394159310d0 ("Input: soc_button_array - add support for newer surface devices")
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205031
> Cc: Maximilian Luz <[email protected]>
> Signed-off-by: Hans de Goede <[email protected]>

Applied, thank you.

> ---
> drivers/input/misc/soc_button_array.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c
> index 97e3639e99d0..97761421d6dd 100644
> --- a/drivers/input/misc/soc_button_array.c
> +++ b/drivers/input/misc/soc_button_array.c
> @@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev,
> continue;
>
> gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index);
> - if (gpio < 0 && gpio != -ENOENT) {
> - error = gpio;
> - goto err_free_mem;
> - } else if (!gpio_is_valid(gpio)) {
> - /* Skip GPIO if not present */
> + if (!gpio_is_valid(gpio)) {
> + /*
> + * Skip GPIO if not present. Note we deliberately
> + * ignore -EPROBE_DEFER errors here. On some devices
> + * Intel is using so called virtual GPIOs which are not
> + * GPIOs at all but some way for AML code to check some
> + * random status bits without need a custom opregion.
> + * In some cases the resources table we parse points to
> + * such a virtual GPIO, since these are not real GPIOs
> + * we do not have a driver for these so they will never
> + * show up, therefor we ignore -EPROBE_DEFER.
> + */
> continue;
> }
>
> --
> 2.23.0
>

--
Dmitry