2007-05-06 14:23:39

by Russell King

[permalink] [raw]
Subject: LED stuff - why bother with error handling

While reviewing 4359/2, I found the following issue. Let's remove all
the irrelevant lines of code to illustrate the problem.

void led_trigger_register_simple(const char *name, struct led_trigger **tp)
{
struct led_trigger *trigger;

trigger = kzalloc(sizeof(struct led_trigger), GFP_KERNEL);
...
*tp = trigger;
}

void led_trigger_unregister(struct led_trigger *trigger)
{
list_del(&trigger->next_trig);
}

The thing to note is that led_trigger_register_simple() can fail but it's
a void function. There's a very good school of thought which says that
functions which can fail should never have a return type of void.

However, let's continue. Looking at patch 4359/2 in the ARM patch
system:

+static int __init h1940bt_probe(struct platform_device *pdev)
+{
...
+#ifdef CONFIG_LEDS_H1940
+ led_trigger_register_simple("h1940-bluetooth", &bt_led_trigger);
+#endif
+ err = device_create_file(&pdev->dev, &dev_attr_enable);
+
+ return err;
+}
+
+static int h1940bt_remove(struct platform_device *pdev)
+{
+#ifdef CONFIG_LEDS_H1940
+ led_trigger_unregister_simple(bt_led_trigger);
+#endif
+ return 0;
+}

Notice the lack of checking to see if bt_led_trigger is NULL or not - if
the kzalloc fails, we happily return success from the probe function.
However, when the driver is removed, we call led_trigger_unregister_simple()
which consequently does a NULL dereference.

This seems to be an endemic problem to led_trigger_register_simple() - 100%
of the merged uses of this function suffer from the same issue, eg:

static int __init ledtrig_ide_init(void)
{
led_trigger_register_simple("ide-disk", &ledtrig_ide);
return 0;
}

static void __exit ledtrig_ide_exit(void)
{
led_trigger_unregister_simple(ledtrig_ide);
}

See also nand_base_init and sharpsl_pm_probe.

If we want people to detect errors then we must *not* return values by
reference. It stops people thinking about "what if this returns a non-
zero error value" or "what if it returns NULL?". Adding __must_check
gives extra persuasion to check the return value.

Can the LED trigger API be fixed please?

I'm not going to merge 4359/2 because it's just going to spread this
problem over a wider area.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:


2007-05-07 10:01:12

by Richard Purdie

[permalink] [raw]
Subject: Re: LED stuff - why bother with error handling

On Sun, 2007-05-06 at 15:23 +0100, Russell King wrote:
> While reviewing 4359/2, I found the following issue. Let's remove all
> the irrelevant lines of code to illustrate the problem.
>
> void led_trigger_register_simple(const char *name, struct led_trigger **tp)
> {
> struct led_trigger *trigger;
>
> trigger = kzalloc(sizeof(struct led_trigger), GFP_KERNEL);
> ...
> *tp = trigger;
> }
>
> void led_trigger_unregister(struct led_trigger *trigger)
> {
> list_del(&trigger->next_trig);
> }
>
> The thing to note is that led_trigger_register_simple() can fail but it's
> a void function. There's a very good school of thought which says that
> functions which can fail should never have a return type of void.

The simple triggers are often tagged onto other subsystems and were
designed to be minimally invasive. Since they were ancillary, it was
left that they didn't return error codes. Taking the IDE trigger as an
example, the failure of the trigger should not cause the IDE subsystem
to fail, if should just skip over the LED registration. The alternative
is every simple trigger site has an error check and a printk which
seemed a bit pointless.

Note that you can still check whether it succeeded by looking at the
trigger pointer if you really need to - I've never seen a use case that
did though.

Moving to the problem you raised, I thought all the code paths a simple
trigger could end up on checked for NULL and/or were NULL safe. The
led_trigger_unregister_simple function has a missing NULL check which
I'll fix.

> If we want people to detect errors then we must *not* return values by
> reference. It stops people thinking about "what if this returns a non-
> zero error value" or "what if it returns NULL?". Adding __must_check
> gives extra persuasion to check the return value.
>
> Can the LED trigger API be fixed please?

I agree the failure shouldn't be silent as it currently is and therefore
led_trigger_register_simple() should have some warning printks added so
the user can tell if it failed to register. I think they belong in the
core rather than at every call site though for the reasons I mention
above.

Regards,

Richard