2024-02-27 11:34:52

by Herve Codina

[permalink] [raw]
Subject: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

Hi,

When a gpio chip device is removed while some related gpio are used by
the user-space (gpiomon for instance), the following warning can appear:
remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
...
Call trace:
remove_proc_entry+0x190/0x19c
unregister_irq_proc+0xd0/0x104
free_desc+0x4c/0xc4
irq_free_descs+0x6c/0x90
irq_dispose_mapping+0x104/0x14c
gpiochip_irqchip_remove+0xcc/0x1a4
gpiochip_remove+0x48/0x100
...

Indeed, even if the gpio removal is notified to the gpio-cdev, the
IRQ used is not released when it should be.

This series calls the gpio removal notifier sooner in the removal
process in order to give a chance to a notifier function to release
the IRQ before releasing the IRQ mapping and adds the needed
operations to release the IRQ in the gpio cdev notifier function.

Compared to the previous iteration:
https://lore.kernel.org/linux-kernel/[email protected]/
this v2 series set gdev->chip to NULL before calling gcdev_unregister().

Also, this v2 series was rebased on top of for-next branch of the GPIO
tree.

Best regards,
Hervé Codina

Changes v1 -> v2:
- Patch 1
Set gdev->chip to NULL before calling gcdev_unregister()

- Patch 2
No changes

Herve Codina (2):
gpiolib: call gcdev_unregister() sooner in the removal operations
gpiolib: cdev: release IRQs when the gpio chip device is removed

drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
drivers/gpio/gpiolib.c | 6 +++++-
2 files changed, 27 insertions(+), 12 deletions(-)

--
2.43.0



2024-02-27 11:34:56

by Herve Codina

[permalink] [raw]
Subject: [PATCH v2 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed

When gpio chip device is removed while some related gpio are used by the
user-space, the following warning can appear:
remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
...
Call trace:
remove_proc_entry+0x190/0x19c
unregister_irq_proc+0xd0/0x104
free_desc+0x4c/0xc4
irq_free_descs+0x6c/0x90
irq_dispose_mapping+0x104/0x14c
gpiochip_irqchip_remove+0xcc/0x1a4
gpiochip_remove+0x48/0x100
...

Indeed, the gpio cdev uses an IRQ but this IRQ is not released when the
gpio chip device is removed.

Release IRQs used in the device removal notifier functions.
Also move one of these function definition in order to avoid a forward
declaration (move after the edge_detector_stop() definition).

Signed-off-by: Herve Codina <[email protected]>
---
drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index f384fa278764..96c05bd326d0 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -682,17 +682,6 @@ static void line_set_debounce_period(struct line *line,
GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE | \
GPIO_V2_LINE_EDGE_FLAGS)

-static int linereq_unregistered_notify(struct notifier_block *nb,
- unsigned long action, void *data)
-{
- struct linereq *lr = container_of(nb, struct linereq,
- device_unregistered_nb);
-
- wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
-
- return NOTIFY_OK;
-}
-
static void linereq_put_event(struct linereq *lr,
struct gpio_v2_line_event *le)
{
@@ -1183,6 +1172,23 @@ static int edge_detector_update(struct line *line,
return edge_detector_setup(line, lc, line_idx, edflags);
}

+static int linereq_unregistered_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct linereq *lr = container_of(nb, struct linereq,
+ device_unregistered_nb);
+ int i;
+
+ for (i = 0; i < lr->num_lines; i++) {
+ if (lr->lines[i].desc)
+ edge_detector_stop(&lr->lines[i]);
+ }
+
+ wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
+
+ return NOTIFY_OK;
+}
+
static u64 gpio_v2_line_config_flags(struct gpio_v2_line_config *lc,
unsigned int line_idx)
{
@@ -1892,6 +1898,11 @@ static int lineevent_unregistered_notify(struct notifier_block *nb,
struct lineevent_state *le = container_of(nb, struct lineevent_state,
device_unregistered_nb);

+ if (le->irq) {
+ free_irq(le->irq, le);
+ le->irq = 0;
+ }
+
wake_up_poll(&le->wait, EPOLLIN | EPOLLERR);

return NOTIFY_OK;
--
2.43.0


2024-02-27 11:35:07

by Herve Codina

[permalink] [raw]
Subject: [PATCH v2 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations

When gpio chip device is removed while some related gpio are used by the
user-space, the following warning can appear:
remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
...
Call trace:
remove_proc_entry+0x190/0x19c
unregister_irq_proc+0xd0/0x104
free_desc+0x4c/0xc4
irq_free_descs+0x6c/0x90
irq_dispose_mapping+0x104/0x14c
gpiochip_irqchip_remove+0xcc/0x1a4
gpiochip_remove+0x48/0x100
...

Indeed, the gpio cdev uses an IRQ but this IRQ is not released
(irq_free() call) before the call to gpiochip_irqchip_remove().

In order to give a chance to the gpio dev driver to release this
irq before removing the IRQ mapping, notify the cdev driver about
the gpio device removal before the gpiochip_irqchip_remove() call.

Signed-off-by: Herve Codina <[email protected]>
---
drivers/gpio/gpiolib.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 63e793a410e3..4ad3e260dec2 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1118,6 +1118,11 @@ void gpiochip_remove(struct gpio_chip *gc)
/* Numb the device, cancelling all outstanding operations */
rcu_assign_pointer(gdev->chip, NULL);
synchronize_srcu(&gdev->srcu);
+ /*
+ * Tell gcdev that the device is removing. If any gpio resources are in
+ * use (irqs for instance), it's time for gcdev to release them.
+ */
+ gcdev_unregister(gdev);
gpiochip_irqchip_remove(gc);
acpi_gpiochip_remove(gc);
of_gpiochip_remove(gc);
@@ -1135,7 +1140,6 @@ void gpiochip_remove(struct gpio_chip *gc)
* be removed, else it will be dangling until the last user is
* gone.
*/
- gcdev_unregister(gdev);
gpio_device_put(gdev);
}
EXPORT_SYMBOL_GPL(gpiochip_remove);
--
2.43.0


2024-02-27 19:32:09

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Tue, Feb 27, 2024 at 12:34 PM Herve Codina <herve.codina@bootlincom> wrote:
>
> Hi,
>
> When a gpio chip device is removed while some related gpio are used by
> the user-space (gpiomon for instance), the following warning can appear:
> remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
> WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
> ...
> Call trace:
> remove_proc_entry+0x190/0x19c
> unregister_irq_proc+0xd0/0x104
> free_desc+0x4c/0xc4
> irq_free_descs+0x6c/0x90
> irq_dispose_mapping+0x104/0x14c
> gpiochip_irqchip_remove+0xcc/0x1a4
> gpiochip_remove+0x48/0x100
> ...
>
> Indeed, even if the gpio removal is notified to the gpio-cdev, the
> IRQ used is not released when it should be.
>
> This series calls the gpio removal notifier sooner in the removal
> process in order to give a chance to a notifier function to release
> the IRQ before releasing the IRQ mapping and adds the needed
> operations to release the IRQ in the gpio cdev notifier function.
>
> Compared to the previous iteration:
> https://lore.kernel.org/linux-kernel/[email protected]/
> this v2 series set gdev->chip to NULL before calling gcdev_unregister().
>
> Also, this v2 series was rebased on top of for-next branch of the GPIO
> tree.
>
> Best regards,
> Hervé Codina
>
> Changes v1 -> v2:
> - Patch 1
> Set gdev->chip to NULL before calling gcdev_unregister()
>
> - Patch 2
> No changes
>
> Herve Codina (2):
> gpiolib: call gcdev_unregister() sooner in the removal operations
> gpiolib: cdev: release IRQs when the gpio chip device is removed
>
> drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
> drivers/gpio/gpiolib.c | 6 +++++-
> 2 files changed, 27 insertions(+), 12 deletions(-)
>
> --
> 2.43.0
>

Sorry but this is just papering over the real issue. I'd say NAK for
now as I'd really prefer to get to the root of the problem and fix it
for all GPIO interrupt users.

Kent, Linus: what do you think?

Bart

2024-02-28 01:11:41

by Kent Gibson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Tue, Feb 27, 2024 at 08:31:33PM +0100, Bartosz Golaszewski wrote:
> On Tue, Feb 27, 2024 at 12:34 PM Herve Codina <[email protected]> wrote:
> >
> > Hi,
> >
> > When a gpio chip device is removed while some related gpio are used by
> > the user-space (gpiomon for instance), the following warning can appear:
> > remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
> > WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
> > ...
> > Call trace:
> > remove_proc_entry+0x190/0x19c
> > unregister_irq_proc+0xd0/0x104
> > free_desc+0x4c/0xc4
> > irq_free_descs+0x6c/0x90
> > irq_dispose_mapping+0x104/0x14c
> > gpiochip_irqchip_remove+0xcc/0x1a4
> > gpiochip_remove+0x48/0x100
> > ...
> >
> > Indeed, even if the gpio removal is notified to the gpio-cdev, the
> > IRQ used is not released when it should be.
> >
> > This series calls the gpio removal notifier sooner in the removal
> > process in order to give a chance to a notifier function to release
> > the IRQ before releasing the IRQ mapping and adds the needed
> > operations to release the IRQ in the gpio cdev notifier function.
> >
> > Compared to the previous iteration:
> > https://lore.kernel.org/linux-kernel/[email protected]/
> > this v2 series set gdev->chip to NULL before calling gcdev_unregister().
> >
> > Also, this v2 series was rebased on top of for-next branch of the GPIO
> > tree.
> >
> > Best regards,
> > Hervé Codina
> >
> > Changes v1 -> v2:
> > - Patch 1
> > Set gdev->chip to NULL before calling gcdev_unregister()
> >
> > - Patch 2
> > No changes
> >
> > Herve Codina (2):
> > gpiolib: call gcdev_unregister() sooner in the removal operations
> > gpiolib: cdev: release IRQs when the gpio chip device is removed
> >
> > drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
> > drivers/gpio/gpiolib.c | 6 +++++-
> > 2 files changed, 27 insertions(+), 12 deletions(-)
> >
> > --
> > 2.43.0
> >
>
> Sorry but this is just papering over the real issue. I'd say NAK for
> now as I'd really prefer to get to the root of the problem and fix it
> for all GPIO interrupt users.
>
> Kent, Linus: what do you think?
>

Agreed - a broader solution makes sense to me.

Cheers,
Kent.

2024-02-29 14:11:26

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Tue, Feb 27, 2024 at 8:31 PM Bartosz Golaszewski <[email protected]> wrote:
> On Tue, Feb 27, 2024 at 12:34 PM Herve Codina <[email protected]> wrote:

> > Herve Codina (2):
> > gpiolib: call gcdev_unregister() sooner in the removal operations
> > gpiolib: cdev: release IRQs when the gpio chip device is removed
(...)
> Sorry but this is just papering over the real issue. I'd say NAK for
> now as I'd really prefer to get to the root of the problem and fix it
> for all GPIO interrupt users.
>
> Kent, Linus: what do you think?

I'm not sure. What does "all GPIO interrupt users" mean in this context?

If you mean "also the kernel-internal" (such as some random driver
having performed gpiod_to_irq() and requested it or, taken it from a
phandle in the device tree) then I think these are slightly semantically
different.

The big difference is that users of the cdev are *expected* to *crash*
sometimes, releasing the file handle and then this cleanup needs to
happen. Also cdev is more likely to be used for hotplugged/unplugged
GPIOs.

The kernel-internal users are *not* expected to crash, but to clean up
their usage the right way. Also they are predominantly if not exclusively
used for fixed GPIOs such as those on an SoC that do not hot-unplug
and go away randomly.

Use case 1: you run gpio-mon on a random GPIO with IRQ on a board.
It is using a SoC-native GPIO. Suddenly gpio-mon crashes because
of OOM or whatever and releases the filehandle on the way down.
What to do?

Use case 2: you plug in a USB dongle with GPIOs on. Start gpio-mon
on one of the pins. Unplug the dongle. Then it is fair that the cdev cleans
up the irq, because I don't see any way that a kernel driver would
request any of these GPIOs (but I'm more uncertain here).

I just think it is necessary to think about the big picture here.

Yours,
Linus Walleij

2024-03-01 07:23:04

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Thu, Feb 29, 2024 at 3:09 PM Linus Walleij <linus.walleij@linaroorg> wrote:
>
> On Tue, Feb 27, 2024 at 8:31 PM Bartosz Golaszewski <[email protected]> wrote:
> > On Tue, Feb 27, 2024 at 12:34 PM Herve Codina <[email protected]> wrote:
>
> > > Herve Codina (2):
> > > gpiolib: call gcdev_unregister() sooner in the removal operations
> > > gpiolib: cdev: release IRQs when the gpio chip device is removed
> (...)
> > Sorry but this is just papering over the real issue. I'd say NAK for
> > now as I'd really prefer to get to the root of the problem and fix it
> > for all GPIO interrupt users.
> >
> > Kent, Linus: what do you think?
>
> I'm not sure. What does "all GPIO interrupt users" mean in this context?
>
> If you mean "also the kernel-internal" (such as some random driver
> having performed gpiod_to_irq() and requested it or, taken it from a
> phandle in the device tree) then I think these are slightly semantically
> different.
>

Yes I mean both in-kernel and user-space consumers.

> The big difference is that users of the cdev are *expected* to *crash*
> sometimes, releasing the file handle and then this cleanup needs to
> happen. Also cdev is more likely to be used for hotplugged/unplugged
> GPIOs.
>
> The kernel-internal users are *not* expected to crash, but to clean up
> their usage the right way. Also they are predominantly if not exclusively
> used for fixed GPIOs such as those on an SoC that do not hot-unplug
> and go away randomly.
>
> Use case 1: you run gpio-mon on a random GPIO with IRQ on a board.
> It is using a SoC-native GPIO. Suddenly gpio-mon crashes because
> of OOM or whatever and releases the filehandle on the way down.
> What to do?
>
> Use case 2: you plug in a USB dongle with GPIOs on. Start gpio-mon
> on one of the pins. Unplug the dongle. Then it is fair that the cdev cleans
> up the irq, because I don't see any way that a kernel driver would
> request any of these GPIOs (but I'm more uncertain here).
>
> I just think it is necessary to think about the big picture here.
>

Agreed and the big picture - just like with the reason behind the SRCU
rework - is the fact that even static GPIO chips defined in ACPI or DT
can be unbound. Unless you want to make the decision that we
arbitrarily suppress_bind_attrs for all GPIO chips which I don't think
you do.

I have shown in the discussion under the previous iteration that a
static GPIO chip defined in DT that is also marked as an
interrupt-controller may have interrupts requested directly from its
irq domain bypassing the .to_irq() callback. As long as this GPIO chip
may be unbound (and we do not restrict this) it means the splat
mentioned here can be triggered from user-space with a simple rmmod
because a requested irq does not increase the module reference count
nor do device links seem to work for interrupts not associated with a
struct device explicitly.

I DO want to fix it, don't get me wrong. I don't want to just leave it
like this, especially since we've made so much progress with
hotpluggability recently. I just don't believe this is the right fix,
I will try to come up with a solution that addresses the issue
globally.

Bart

> Yours,
> Linus Walleij

[1] https://lore.kernel.org/lkml/CAMRc=Mf5fRWoOMsJ41vzvE=-vp3wi-Obw=j5fBk3DuQaZNQP2Q@mail.gmail.com/

2024-03-01 20:15:53

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Fri, Mar 1, 2024 at 8:21 AM Bartosz Golaszewski <[email protected]> wrote:

> Agreed and the big picture - just like with the reason behind the SRCU
> rework - is the fact that even static GPIO chips defined in ACPI or DT
> can be unbound. Unless you want to make the decision that we
> arbitrarily suppress_bind_attrs for all GPIO chips which I don't think
> you do.
>
> I have shown in the discussion under the previous iteration that a
> static GPIO chip defined in DT that is also marked as an
> interrupt-controller may have interrupts requested directly from its
> irq domain bypassing the .to_irq() callback. As long as this GPIO chip
> may be unbound (and we do not restrict this) it means the splat
> mentioned here can be triggered from user-space with a simple rmmod
> because a requested irq does not increase the module reference count
> nor do device links seem to work for interrupts not associated with a
> struct device explicitly.
>
> I DO want to fix it, don't get me wrong. I don't want to just leave it
> like this, especially since we've made so much progress with
> hotpluggability recently. I just don't believe this is the right fix,
> I will try to come up with a solution that addresses the issue
> globally.

OK I trust you to come up with something better for sure!

With regards to the ability to unbind/rebind drivers from sysfs, true.
I have heard about that as a counterargument to many things.

The problem is that I have never heard about a user unbinding/binding
a driver from sysfs for anything but debugging a drivers ability to
bind/unbind. Partly I feel that thing should just be moved
to debugfs given the usecase and because it just looks like a way for
attackers to provoke crashes given how some drivers look.

Yours,
Linus Walleij

2024-03-02 09:00:12

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Fri, Mar 1, 2024 at 9:15 PM Linus Walleij <[email protected]> wrote:
>
> On Fri, Mar 1, 2024 at 8:21 AM Bartosz Golaszewski <[email protected]> wrote:
>
> > Agreed and the big picture - just like with the reason behind the SRCU
> > rework - is the fact that even static GPIO chips defined in ACPI or DT
> > can be unbound. Unless you want to make the decision that we
> > arbitrarily suppress_bind_attrs for all GPIO chips which I don't think
> > you do.
> >
> > I have shown in the discussion under the previous iteration that a
> > static GPIO chip defined in DT that is also marked as an
> > interrupt-controller may have interrupts requested directly from its
> > irq domain bypassing the .to_irq() callback. As long as this GPIO chip
> > may be unbound (and we do not restrict this) it means the splat
> > mentioned here can be triggered from user-space with a simple rmmod
> > because a requested irq does not increase the module reference count
> > nor do device links seem to work for interrupts not associated with a
> > struct device explicitly.
> >
> > I DO want to fix it, don't get me wrong. I don't want to just leave it
> > like this, especially since we've made so much progress with
> > hotpluggability recently. I just don't believe this is the right fix,
> > I will try to come up with a solution that addresses the issue
> > globally.
>
> OK I trust you to come up with something better for sure!
>
> With regards to the ability to unbind/rebind drivers from sysfs, true.
> I have heard about that as a counterargument to many things.
>
> The problem is that I have never heard about a user unbinding/binding
> a driver from sysfs for anything but debugging a drivers ability to
> bind/unbind. Partly I feel that thing should just be moved
> to debugfs given the usecase and because it just looks like a way for
> attackers to provoke crashes given how some drivers look.
>
> Yours,
> Linus Walleij

That's not the only thing - device unbind can also be triggered by
removing the module providing the driver which is a completely normal
operation for user-space.

Bart

2024-03-03 08:48:32

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Sat, Mar 2, 2024 at 10:00 AM Bartosz Golaszewski <[email protected]> wrote:
> On Fri, Mar 1, 2024 at 9:15 PM Linus Walleij <[email protected]> wrote:

> > The problem is that I have never heard about a user unbinding/binding
> > a driver from sysfs for anything but debugging a drivers ability to
> > bind/unbind. Partly I feel that thing should just be moved
> > to debugfs given the usecase and because it just looks like a way for
> > attackers to provoke crashes given how some drivers look.
>
> That's not the only thing - device unbind can also be triggered by
> removing the module providing the driver which is a completely normal
> operation for user-space.

That is one thing, but every time I hear about "have you thought about
users going in and using bind/unbind in sysfs" it's for builtin bool
drivers.

I almost feel like bool drivers should have bind/unbind disabled by
default :/

The introduction of deferred probe has made the situation much better
because it tends to exercise the bind/unbind path a bit.

Yours,
Linus Walleij

2024-04-08 18:58:51

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal

On Mon, Apr 8, 2024 at 3:27 PM Herve Codina <[email protected]> wrote:
>
> Hi Bartosz,
>
> On Fri, 1 Mar 2024 08:21:09 +0100
> Bartosz Golaszewski <[email protected]> wrote:
>
> ...
> >
> > I DO want to fix it, don't get me wrong. I don't want to just leave it
> > like this, especially since we've made so much progress with
> > hotpluggability recently. I just don't believe this is the right fix,
> > I will try to come up with a solution that addresses the issue
> > globally.
> >
>
> I didn't see anything but I could miss something.
> Did you move forward on this topic ?
>

No, I'm currently spending almost 100% of my GPIO time on getting the
libgpiod DBus API ready and I should send the first version by the end
of this week. After that I'll be at EOSS and then on vacation, so the
earliest I will get to working on it is early May.

If you feel like giving it another shot - go for it. I believe the
device link problem I described previously[1] must be fixed first. I
would also consider a more generic solution in the interrupt subsystem
that would make it aware that interrupt controllers may be torn down
with interrupts still requested.

As I said: I will get to it but not in the following couple weeks.

Bart

[1] https://lore.kernel.org/lkml/CAMRc=Mf5fRWoOMsJ41vzvE=-vp3wi-Obw=j5fBk3DuQaZNQP2Q@mail.gmail.com/