There is a race in registering of gc->irq.domain when
probing the I2C driver.
This sometimes leads to a Kernel NULL pointer dereference
in gpiochip_to_irq function which uses the domain variable.
To avoid this issue, set gc->to_irq after domain is
initialized. This will make sure whenever gpiochip_to_irq
is called, it has domain already initialized.
Signed-off-by: Shreeya Patel <[email protected]>
---
Following is the NULL pointer dereference Oops for reference :-
kernel: Call Trace:
kernel: gpiod_to_irq+0x53/0x70
kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel: i2c_acpi_get_irq+0xc0/0xd0
kernel: i2c_device_probe+0x28a/0x2a0
kernel: really_probe+0xf2/0x460
kernel: driver_probe_device+0xe8/0x160
kernel: ? driver_allows_async_probing+0x50/0x50
kernel: bus_for_each_drv+0x8f/0xd0
kernel: __device_attach_async_helper+0x9f/0xf0
kernel: async_run_entry_fn+0x2e/0x110
kernel: process_one_work+0x214/0x3e0
kernel: worker_thread+0x4d/0x3d0
kernel: ? process_one_work+0x3e0/0x3e0
kernel: kthread+0x133/0x150
kernel: ? kthread_associate_blkcg+0xc0/0xc0
kernel: ret_from_fork+0x22/0x30
kernel: CR2: 0000000000000028
kernel: ---[ end trace d0f5a7a0e0eb268f ]---
kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
drivers/gpio/gpiolib.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index abfbf546d159..9a6f7c265a91 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1512,7 +1512,6 @@ static int gpiochip_add_irqchip(struct gpio_chip *gc,
if (gc->to_irq)
chip_warn(gc, "to_irq is redefined in %s and you shouldn't rely on it\n", __func__);
- gc->to_irq = gpiochip_to_irq;
gc->irq.default_type = type;
gc->irq.lock_key = lock_key;
gc->irq.request_key = request_key;
@@ -1533,6 +1532,8 @@ static int gpiochip_add_irqchip(struct gpio_chip *gc,
return -EINVAL;
}
+ gc->to_irq = gpiochip_to_irq;
+
if (gc->irq.parent_handler) {
for (i = 0; i < gc->irq.num_parents; i++) {
void *data;
--
2.30.2
Hi Shreeya, all,
On 2021/11/09, Shreeya Patel wrote:
> There is a race in registering of gc->irq.domain when
> probing the I2C driver.
> This sometimes leads to a Kernel NULL pointer dereference
> in gpiochip_to_irq function which uses the domain variable.
>
> To avoid this issue, set gc->to_irq after domain is
> initialized. This will make sure whenever gpiochip_to_irq
> is called, it has domain already initialized.
>
What is stopping the next developer to moving the assignment to the
incorrect place? Aka should we add an inline comment about this?
<snip>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index abfbf546d159..9a6f7c265a91 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -1512,7 +1512,6 @@ static int gpiochip_add_irqchip(struct gpio_chip *gc,
> if (gc->to_irq)
> chip_warn(gc, "to_irq is redefined in %s and you shouldn't rely on it\n", __func__);
>
Move the warning alongside the assignment?
HTH
Emil
Emil Velikov <[email protected]> writes:
> Hi Shreeya, all,
>
> On 2021/11/09, Shreeya Patel wrote:
>> There is a race in registering of gc->irq.domain when
>> probing the I2C driver.
>> This sometimes leads to a Kernel NULL pointer dereference
>> in gpiochip_to_irq function which uses the domain variable.
>>
>> To avoid this issue, set gc->to_irq after domain is
>> initialized. This will make sure whenever gpiochip_to_irq
>> is called, it has domain already initialized.
>>
>
> What is stopping the next developer to moving the assignment to the
> incorrect place? Aka should we add an inline comment about this?
I agree with Emil. The patch seems like a workaround that doesn't
really solve the underlying issue. I'm not familiar with this code, but
it seems that gc is missing a lock during this initialization, to prevent
it from exposing a partially initialized gc->irq.
--
Gabriel Krisman Bertazi
On 16/11/21 1:23 am, Gabriel Krisman Bertazi wrote:
> Emil Velikov <[email protected]> writes:
>
>> Hi Shreeya, all,
>>
>> On 2021/11/09, Shreeya Patel wrote:
>>> There is a race in registering of gc->irq.domain when
>>> probing the I2C driver.
>>> This sometimes leads to a Kernel NULL pointer dereference
>>> in gpiochip_to_irq function which uses the domain variable.
>>>
>>> To avoid this issue, set gc->to_irq after domain is
>>> initialized. This will make sure whenever gpiochip_to_irq
>>> is called, it has domain already initialized.
>>>
>> What is stopping the next developer to moving the assignment to the
>> incorrect place? Aka should we add an inline comment about this?
> I agree with Emil. The patch seems like a workaround that doesn't
> really solve the underlying issue. I'm not familiar with this code, but
> it seems that gc is missing a lock during this initialization, to prevent
> it from exposing a partially initialized gc->irq.
I do not see any locking mechanism used for protecting the use of gc
members before they are
initialized. We faced a very similar problem with gc->to_irq as well
where we had to return EPROBE_DEFER until it was initialized and ready
to be used.
Linus, do you have any suggestion on what would be the correct way to
fix this issue of race in registration of gc members?
Thanks,
Shreeya Patel
>
On Thu, Nov 25, 2021 at 11:56 AM Shreeya Patel
<[email protected]> wrote:
> On 16/11/21 1:23 am, Gabriel Krisman Bertazi wrote:
> > Emil Velikov <[email protected]> writes:
> >> Hi Shreeya, all,
> >>
> >> On 2021/11/09, Shreeya Patel wrote:
> >>> There is a race in registering of gc->irq.domain when
> >>> probing the I2C driver.
> >>> This sometimes leads to a Kernel NULL pointer dereference
> >>> in gpiochip_to_irq function which uses the domain variable.
> >>>
> >>> To avoid this issue, set gc->to_irq after domain is
> >>> initialized. This will make sure whenever gpiochip_to_irq
> >>> is called, it has domain already initialized.
> >>>
> >> What is stopping the next developer to moving the assignment to the
> >> incorrect place? Aka should we add an inline comment about this?
> > I agree with Emil. The patch seems like a workaround that doesn't
> > really solve the underlying issue. I'm not familiar with this code, but
> > it seems that gc is missing a lock during this initialization, to prevent
> > it from exposing a partially initialized gc->irq.
>
> I do not see any locking mechanism used for protecting the use of gc
> members before they are
> initialized. We faced a very similar problem with gc->to_irq as well
> where we had to return EPROBE_DEFER until it was initialized and ready
> to be used.
>
> Linus, do you have any suggestion on what would be the correct way to
> fix this issue of race in registration of gc members?
Not really, we just haven't faced the issue until now because it is only
now that people have actually added all these devlinks and deferred
probing and what not that actually starts to stress the system and
now that results in it being less stable, right?
How do other subsystems do it?
Yours,
Linus Walleij