2022-02-16 21:03:20

by Shreeya Patel

Subject: [PATCH v5] gpio: Return EPROBE_DEFER if gc->to_irq is NULL

Registration of .to_irq races with probing of the i2c driver,
which causes touchscreen devices to fail at random.

The following sequence illustrates the race condition:

[gpio driver] gpio driver registers gpio chip
[gpio consumer] gpio is acquired
[gpio consumer] gpiod_to_irq() fails with -ENXIO
[gpio driver] gpio driver registers irqchip
gpiod_to_irq() works at this point, but the earlier -ENXIO was already fatal

The following errors appear in the dmesg log when gc->to_irq is NULL:

[2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ
[2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22

To avoid this situation, defer probing until to_irq is registered.
Returning -EPROBE_DEFER is a first step towards avoiding device
failures caused by the race in the registration of .to_irq.
The final solution would be to avoid using the gc irq members
until they are fully initialized.

This issue has been reported many times in the past, and people have
been working around it by building pinctrl_amd into the kernel instead
of loading it as a module, or by adding a softdep for pinctrl_amd to
the config file.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>

---
Changes in v5
- Improve the explanation in the commit message and send it to the
correct email address.

Changes in v4
- Remove a blank line and capitalize the first letter of the
sentence.

Changes in v3
- Fix the error reported by kernel test robot.

Changes in v2
- Add a condition to check for irq chip to avoid bogus error.
---
drivers/gpio/gpiolib.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 3859911b61e9..a3d14277f17c 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -3147,6 +3147,16 @@ int gpiod_to_irq(const struct gpio_desc *desc)

 		return retirq;
 	}
+#ifdef CONFIG_GPIOLIB_IRQCHIP
+	if (gc->irq.chip) {
+		/*
+		 * Avoid race condition with other code, which tries to lookup
+		 * an IRQ before the irqchip has been properly registered,
+		 * i.e. while gpiochip is still being brought up.
+		 */
+		return -EPROBE_DEFER;
+	}
+#endif
 	return -ENXIO;
 }
 EXPORT_SYMBOL_GPL(gpiod_to_irq);
--
2.30.2


2022-02-24 00:47:51

by Bartosz Golaszewski

Subject: Re: [PATCH v5] gpio: Return EPROBE_DEFER if gc->to_irq is NULL

On Wed, Feb 16, 2022 at 9:27 PM Shreeya Patel
<[email protected]> wrote:
> [...]

Queued for fixes, thanks!

Bart