2012-10-02 13:03:29

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Hi Greg,

Em Mon, 25 Jun 2012 22:55:41 -0300
Mauro Carvalho Chehab <[email protected]> escreveu:

> Em 25-06-2012 19:33, Greg KH escreveu:
> > On Mon, Jun 25, 2012 at 05:49:25PM -0300, Mauro Carvalho Chehab wrote:
> >> Greg,
> >>
> >> Basically, the recent changes at request_firmware() exposed an issue that
> >> affects all media drivers that use firmware (64 drivers).
> >
> > What change was that? How did it break anything?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=827538
>
> Basically, userspace changes and some pm-related patches, ...
...

As I pinged you back in June, the recent changes on udev made lots of regression
on all media drivers that require firmware. The news is that the fixes are also
causing more regressions...

Btw, Linus also complained about that:
https://lkml.org/lkml/2012/8/25/136

I basically tried a few different approaches, including deferred probe(),
as you suggested, and request_firmware_async(), as Kay suggested.

The deferred probe() doesn't work as-is, as it will re-probe the driver
only if a new driver is modprobed. It should likely not be that hard to modify
the drivers core to allow it to re-probe after returning the probe status to
udev, but I failed to find a way of doing that (mostly because of the lack of
time, as I'm being too busy those days).

I tried a few other things, like using request_firmware_async() at a few
drivers that require firmware.

That caused a big regression on Kernel 3.6 that we're currently discussing
how to fix, but basically, media devices are very complex, as they have lots
of different blocks there: tuners, analog video demod, digital video demod,
audio demod, etc. On most cases, each of those "extra" blocks are implemented
via I2C, and the device initialization happens serialized.

By letting an I2C driver to do asynchronous initialization, other dependent
drivers broke, because there are some signals used by the other blocks that
are dependent on a proper initialization of a block previously initialized.

One of the specific case is a device very popular in Europe nowadays: the
PCTV 520e. This device uses an em28xx USB bridge chip, a tda18271 i2c tuner
and a drx-k DVB i2c demod (among other things). The DRX-K chipset requires a
firmware; the other chipsets there don't.

The drx-k driver is used in combination with several other drivers, like
em28xx, az6027, xc5000, tda18271, tda18271dd, etc. On my tests, with
the devices I have available (Terratec H5, H6, H7, HTC; Hauppauge HVR 530C),
the usage of request_firmware_async() worked as expected.

So, such patch that made DRX-K firmware load to be deferred were applied at
Kernel 3.6 on this changeset:

commit bd02dbcd008f92135b2c7a92b6
Author: Mauro Carvalho Chehab <[email protected]>
Date: Thu Jun 21 09:36:38 2012 -0300

[media] drxk: change it to use request_firmware_nowait()

The firmware blob may not be available when the driver probes.

Instead of blocking the whole kernel use request_firmware_nowait() and
continue without firmware.

This shouldn't be that bad on drx-k devices, as they all seem to have an
internal firmware. So, only the firmware update will take a little longer
to happen.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

It should be noticed that, for the DVB demodulation to work, the DVB demod
should be attached with the tuner module, via this code snippet[1]:

fe = drxk_attach(&pctv_520e_drxk_cfg, &i2c_adap);
tda18271_attach(fe, i2c_tuner_address, &i2c_adap, &pctv_520e_tda18271_cfg);

The evil are in the details: the DRX-K chip produces some clocks needed by
the tuner, and it has some generic GPIO pins that allow to control random
things, with are device-dependent. Also, there is an I2C switch at the board,
used to control I2C access to the tuner: when the switch is off, the tuner is
not visible at the I2C bus[2].

The tda18271 tuner driver needs to read one device register, in order to
identify the chipset version, as there are a few different setups there,
depending if the silicon is version 1 or 2. As this driver doesn't require
any firmware, such read happens at driver's attach time, with should be fine.

Well, before the patch:

1) drxk_attach() would synchronously read the firmware and initialize the
device, setting the needed GPIO pins and clock signals;

2) tda18271_attach() would read the version ID register and initialize the
device.

However, after the patch, drxk_attach() does nothing but filling some internal data
structures and defer the device init job to happen after the firmware load.

So, when tda18271_attach() is called, there wasn't enough time yet to initialize
the DRX-K device. The end result is that the tda18271 version is not identified
and the I2C driver read fails:

tda18271_read_regs: [5-0060|M] ERROR: i2c_transfer returned: -19
Unknown device (16) detected @ 5-0060, device not supported.

Ok, there are lots of ways to fix it, but I suspect that we'll just push the
problem to happen on some place else.

The fact is that coordinating the initialization between the several parts of
those devices is already complex enough when done serialized; doing it
asynchronously will make the initialization code complex and won't bring any
benefit, as the I2C bus will serialize the initialization anyway.

Also, this is just one of the 495 media drivers. Several of them require firmware
during probe() and are currently broken. Fixing all of them will likely require
years of hard work. So, we need to do something else.

For Kernel 3.6, we'll likely apply some quick hack.

However, for 3.7 or 3.8, I think that the better is to revert changeset 177bc7dade38b5
and to stop with udev's insanity of requiring asynchronous firmware load during
device driver initialization. If udev's developers are not willing to do that,
we'll likely need to add something at the drivers core to trick udev for it to
think that the modules got probed before the probe actually happens.

What do you think?

[1] This is a simplified version of the code: I removed a macro and the error
logic from it, to make the code cleaner for reading.

[2] The I2C switch is there to prevent I2C traffic to generate noise that might
interfere at the tuner functions.

Thanks,
Mauro


2012-10-02 16:33:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 2, 2012 at 6:03 AM, Mauro Carvalho Chehab
<[email protected]> wrote:
>
> I basically tried a few different approaches, including deferred probe(),
> as you suggested, and request_firmware_async(), as Kay suggested.

Stop this crazy. FIX UDEV ALREADY, DAMMIT.

Who maintains udev these days? Is it Lennart/Kai, as part of systemd?

Lennart/Kai, fix the udev regression already. Lennart was the one who
brought up kernel ABI regressions at some conference, and if you now
you have the *gall* to break udev in an incompatible manner that
requires basically impossible kernel changes for the kernel to "fix"
the udev interface, I don't know what to say.

"Two-faced lying weasel" would be the most polite thing I could say.
But it almost certainly will involve a lot of cursing.

> However, for 3.7 or 3.8, I think that the better is to revert changeset 177bc7dade38b5
> and to stop with udev's insanity of requiring asynchronous firmware load during
> device driver initialization. If udev's developers are not willing to do that,
> we'll likely need to add something at the drivers core to trick udev for it to
> think that the modules got probed before the probe actually happens.

The fact is, udev made new - and insane - rules that are simply
*invalid*. Modern udev is broken, and needs to be fixed.

I don't know where the problem started in udev, but the report I saw
was that udev175 was fine, and udev182 was broken, and would deadlock
if module_init() did a request_firmware(). That kind of nested
behavior is absolutely *required* to work, in order to not cause
idiotic problems for the kernel for no good reason.

What kind of insane udev maintainership do we have? And can we fix it?

Greg, I think you need to step up here too. You were the one who let
udev go. If the new maintainers are causing problems, they need to be
fixed some way.

Linus

2012-10-02 21:03:52

by Ivan Kalvachev

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On 10/2/12, Linus Torvalds <[email protected]> wrote:
> On Tue, Oct 2, 2012 at 6:03 AM, Mauro Carvalho Chehab
> <[email protected]> wrote:
>>
>> I basically tried a few different approaches, including deferred probe(),
>> as you suggested, and request_firmware_async(), as Kay suggested.
>
> Stop this crazy. FIX UDEV ALREADY, DAMMIT.
>
> Who maintains udev these days? Is it Lennart/Kai, as part of systemd?
>
> Lennart/Kai, fix the udev regression already. Lennart was the one who
> brought up kernel ABI regressions at some conference, and if you now
> you have the *gall* to break udev in an incompatible manner that
> requires basically impossible kernel changes for the kernel to "fix"
> the udev interface, I don't know what to say.
>
> "Two-faced lying weasel" would be the most polite thing I could say.
> But it almost certainly will involve a lot of cursing.
>
>> However, for 3.7 or 3.8, I think that the better is to revert changeset
>> 177bc7dade38b5
>> and to stop with udev's insanity of requiring asynchronous firmware load
>> during
>> device driver initialization. If udev's developers are not willing to do
>> that,
>> we'll likely need to add something at the drivers core to trick udev for
>> it to
>> think that the modules got probed before the probe actually happens.
>
> The fact is, udev made new - and insane - rules that are simply
> *invalid*. Modern udev is broken, and needs to be fixed.
>
> I don't know where the problem started in udev, but the report I saw
> was that udev175 was fine, and udev182 was broken, and would deadlock
> if module_init() did a request_firmware(). That kind of nested
> behavior is absolutely *required* to work, in order to not cause
> idiotic problems for the kernel for no good reason.
>
> What kind of insane udev maintainership do we have? And can we fix it?
>
> Greg, I think you need to step up here too. You were the one who let
> udev go. If the new maintainers are causing problems, they need to be
> fixed some way.

I'm not kernel developer and probably my opinion would be a little
naive, but here it is.


Please, make the kernel load firmware from the filesystem on its own.


This should solve almost 99.9% of the problems related to firmware loading.

I don't mind if there is still userland component that could be used
to request a firmware from repository. You can even keep the udev
userland piping as a fallback if you want, but I think you can
simplify a lot of code if you phase it out. The firmware loading
should follow the same concept as modules loading.

I've heard that the udev userland piping of firmware is done to avoid
some licensing issues. But honestly, if you can not store the firmware
on the user's disk, no free operating system should support it at all.

This piping thing have been feature creep that have metastasized all
over the kernel while keeping a lot of obscure modules broken (in hard
to find way) and requiring increasingly complicated schemes to
workaround its flaws.

KiSS.

Best Regards
Ivan Kalvachev

2012-10-02 22:12:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 02, 2012 at 09:33:03AM -0700, Linus Torvalds wrote:
> I don't know where the problem started in udev, but the report I saw
> was that udev175 was fine, and udev182 was broken, and would deadlock
> if module_init() did a request_firmware(). That kind of nested
> behavior is absolutely *required* to work, in order to not cause
> idiotic problems for the kernel for no good reason.
>
> What kind of insane udev maintainership do we have? And can we fix it?
>
> Greg, I think you need to step up here too. You were the one who let
> udev go. If the new maintainers are causing problems, they need to be
> fixed some way.

I've talked about this with Kay in the past (Plumbers conference I
think) and I thought he said it was all fixed in the latest version of
udev so there shouldn't be any problems anymore with this.

Mauro, what version of udev are you using that is still showing this
issue?

Kay, didn't you resolve this already? If not, what was the reason why?

thanks,

greg k-h

2012-10-02 22:23:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 02, 2012 at 03:12:39PM -0700, Greg KH wrote:
> On Tue, Oct 02, 2012 at 09:33:03AM -0700, Linus Torvalds wrote:
> > I don't know where the problem started in udev, but the report I saw
> > was that udev175 was fine, and udev182 was broken, and would deadlock
> > if module_init() did a request_firmware(). That kind of nested
> > behavior is absolutely *required* to work, in order to not cause
> > idiotic problems for the kernel for no good reason.
> >
> > What kind of insane udev maintainership do we have? And can we fix it?
> >
> > Greg, I think you need to step up here too. You were the one who let
> > udev go. If the new maintainers are causing problems, they need to be
> > fixed some way.
>
> I've talked about this with Kay in the past (Plumbers conference I
> think) and I thought he said it was all fixed in the latest version of
> udev so there shouldn't be any problems anymore with this.
>
> Mauro, what version of udev are you using that is still showing this
> issue?
>
> Kay, didn't you resolve this already? If not, what was the reason why?

Hm, in digging through the udev tree, the only change I found was this
one:

commit 39177382a4f92a834b568d6ae5d750eb2a5a86f9
Author: Kay Sievers <[email protected]>
Date: Thu Jul 19 12:32:24 2012 +0200

udev: firmware - do not cancel requests in the initrd

diff --git a/src/udev/udev-builtin-firmware.c b/src/udev/udev-builtin-firmware.c
index 56dc8fc..de93d7b 100644
--- a/src/udev/udev-builtin-firmware.c
+++ b/src/udev/udev-builtin-firmware.c
@@ -129,7 +129,13 @@ static int builtin_firmware(struct udev_device *dev, int argc, char *argv[], boo
err = -errno;
} while (err == -ENOENT);
rc = EXIT_FAILURE;
- set_loading(udev, loadpath, "-1");
+ /*
+ * Do not cancel the request in the initrd, the real root might have
+ * the firmware file and the 'coldplug' run in the real root will find
+ * this pending request and fulfill or cancel it.
+ * */
+ if (!in_initrd())
+ set_loading(udev, loadpath, "-1");
goto exit;
}


which went into udev release 187 which I think corresponds to the place
when people started having problems, right Mauro?

If so, Mauro, is the solution just putting the firmware into the initrd?
No wait, it looks like this change was trying to fix the problem where
firmware files were not in the initrd, so it would stick around for the
real root to show up so that they could be loaded. So this looks like
it was fixing firmware loading problems for people?

Kay, am I just looking at the totally wrong place here, and this file in
udev didn't have anything to do with the breakage?

thanks,

greg k-h

2012-10-02 22:38:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 2, 2012 at 2:03 PM, Ivan Kalvachev <[email protected]> wrote:
>
> I'm not kernel developer and probably my opinion would be a little
> naive, but here it is.
>
> Please, make the kernel load firmware from the filesystem on its own.

We probably should do that, not just for firmware, but for modules
too. It would also simplify the whole "built-in firmware" thing

Afaik, the only thing udev really does is to lok in
/lib/firmware/updates and /lib/firmware for the file, load it, and
pass it back to the kernel. We could make the kernel try to do it
manually first, and only fall back to udev if that fails.

Afaik, nobody ever does anything else anyway.

I'd prefer to not have to do that, but if the udev code is buggy or
the maintainership is flaky, I guess we don't really have much choice.

Doing the same thing for module loading is probably a good idea too.
There were already people (from the google/Android camp) who wanted to
do module loading based on filename rather than the buffer passed to
it, because they have a "I trust this filesystem" model.

> I've heard that the udev userland piping of firmware is done to avoid
> some licensing issues.

No, I think it was mainly a combination of

- some people like the whole "let's do things in user land" model
even when it makes things more complicated

- we do tend to try to punt "policy" issues to user space, and the
whole "/lib/firmware" location is an example of such a policy issue.

along with the fact that we already had the hotplug model for these
kinds of things (eg module loading used to actually have a big user
space component that did the whole relocation etc, so we had real
historical reasons to do that in user space)

Does anybody want to try to cook up a patch, leaving the udev path as
a fallback?

We already have the case of "builtin firmware" as one option, this
would go after that..

Linus

2012-10-02 22:47:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 2, 2012 at 3:23 PM, Greg KH <[email protected]> wrote:
>
> which went into udev release 187 which I think corresponds to the place
> when people started having problems, right Mauro?

According to what I've seen, people started complaining in 182, not 187.

See for example

http://patchwork.linuxtv.org/patch/13085/

which is a thread where you were involved too..

See also the arch linux thread:

https://bbs.archlinux.org/viewtopic.php?id=134012&p=1

And see this email from Kay Sievers that shows that it was all known
about and intentional in the udev camp:

http://www.spinics.net/lists/netdev/msg185742.html

There's a possible patch suggested here:

http://lists.freedesktop.org/archives/systemd-devel/2012-August/006357.html

but quite frankly, I am leery of the fact that the udev maintenance
seems to have gone into some "crazy mode" where they have made changes
that were known to be problematic, and are pure and utter stupidity.

Having the module init path load the firmware IS THE SENSIBLE AND
OBVIOUS THING TO DO for many cases. The fact that udev people have
apparently unilaterally decided that it's somehow wrong is scary.

Linus

2012-10-03 00:01:54

by Jiri Kosina

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, 2 Oct 2012, Linus Torvalds wrote:

> And see this email from Kay Sievers that shows that it was all known
> about and intentional in the udev camp:
>
> http://www.spinics.net/lists/netdev/msg185742.html

This seems confusing indeed.

That e-mail referenced above is talking about loading firmware at ifup
time. While that might work for network device drivers (I am not sure even
about that), what are the udev maintainers advice for other drivers, where
there is no analogy to ifup?

--
Jiri Kosina
SUSE Labs

2012-10-03 00:13:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 2, 2012 at 5:01 PM, Jiri Kosina <[email protected]> wrote:
> On Tue, 2 Oct 2012, Linus Torvalds wrote:
>
>> And see this email from Kay Sievers that shows that it was all known
>> about and intentional in the udev camp:
>>
>> http://www.spinics.net/lists/netdev/msg185742.html
>
> This seems confusing indeed.
>
> That e-mail referenced above is talking about loading firmware at ifup
> time. While that might work for network device drivers (I am not sure even
> about that), what are the udev maintainers advice for other drivers, where
> there is no analogy to ifup?

Yeah, it's an udev bug. It really is that simple.

This is why I'm complaining. There's no way in hell we're fixing this
in kernel space, unless we call the "bypass udev entirely because the
maintainership quality of it has taken a nose dive". Yes, I've seen
some work-around patches, but quite frankly, I think it would be
absolutely insane for the kernel to work around the fact that udev is
buggy.

The fact is, doing request_firmware() from within module_init() is
simply the easiest approach for some devices.

Now, at the same time, I do agree that network devices should
generally try to delay it until ifup time, so I'm not arguing against
that part per se. I do think that when possible, people should aim to
delay firmware loading until as late as reasonable.

But as you point out, it's simply not always reasonable, and the media
people are clearly hitting the cases where it's just painful. Now,
those cases seem to be happily fairly *rare*, so this isn't getting a
ton of attention, but we should fix it.

Because the udev behavior is all pain, no gain. There's no *reason*
for udev to be pissy about this. And it didn't use to be.

Linus

2012-10-03 14:13:05

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Em 02-10-2012 19:23, Greg KH escreveu:
> On Tue, Oct 02, 2012 at 03:12:39PM -0700, Greg KH wrote:
>> On Tue, Oct 02, 2012 at 09:33:03AM -0700, Linus Torvalds wrote:
>>> I don't know where the problem started in udev, but the report I saw
>>> was that udev175 was fine, and udev182 was broken, and would deadlock
>>> if module_init() did a request_firmware(). That kind of nested
>>> behavior is absolutely *required* to work, in order to not cause
>>> idiotic problems for the kernel for no good reason.
>>>
>>> What kind of insane udev maintainership do we have? And can we fix it?
>>>
>>> Greg, I think you need to step up here too. You were the one who let
>>> udev go. If the new maintainers are causing problems, they need to be
>>> fixed some way.
>>
>> I've talked about this with Kay in the past (Plumbers conference I
>> think) and I thought he said it was all fixed in the latest version of
>> udev so there shouldn't be any problems anymore with this.
>>
>> Mauro, what version of udev are you using that is still showing this
>> issue?
>>
>> Kay, didn't you resolve this already? If not, what was the reason why?
>
> Hm, in digging through the udev tree, the only change I found was this
> one:
>
> commit 39177382a4f92a834b568d6ae5d750eb2a5a86f9
> Author: Kay Sievers <[email protected]>
> Date: Thu Jul 19 12:32:24 2012 +0200
>
> udev: firmware - do not cancel requests in the initrd
>
> diff --git a/src/udev/udev-builtin-firmware.c b/src/udev/udev-builtin-firmware.c
> index 56dc8fc..de93d7b 100644
> --- a/src/udev/udev-builtin-firmware.c
> +++ b/src/udev/udev-builtin-firmware.c
> @@ -129,7 +129,13 @@ static int builtin_firmware(struct udev_device *dev, int argc, char *argv[], boo
> err = -errno;
> } while (err == -ENOENT);
> rc = EXIT_FAILURE;
> - set_loading(udev, loadpath, "-1");
> + /*
> + * Do not cancel the request in the initrd, the real root might have
> + * the firmware file and the 'coldplug' run in the real root will find
> + * this pending request and fulfill or cancel it.
> + * */
> + if (!in_initrd())
> + set_loading(udev, loadpath, "-1");
> goto exit;
> }
>
>
> which went into udev release 187 which I think corresponds to the place
> when people started having problems, right Mauro?

I'm using here udev-182.

> If so, Mauro, is the solution just putting the firmware into the initrd?

I don't think that putting firmware on initrd is something that we want to
for media drivers. None of the webcam drivers currently need a firmware;
those are required on more complex devices (typically digital TV ones).

While there are a number of PCI devices that require firmware, in practice,
we're seeing more people using USB devices. IMO, it doesn't make any sense
that a hot-pluggable USB device to require a firmware at initrd/initramfs,
with is available only at boot time.

> No wait, it looks like this change was trying to fix the problem where
> firmware files were not in the initrd, so it would stick around for the
> real root to show up so that they could be loaded.
>
> So this looks like it was fixing firmware loading problems for people?

I'll run some tests with this patch applied and see what happens.

> Kay, am I just looking at the totally wrong place here, and this file in
> udev didn't have anything to do with the breakage?
>
> thanks,
>
> greg k-h
>

2012-10-03 14:37:18

by Kay Sievers

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 12:12 AM, Greg KH <[email protected]> wrote:

> Mauro, what version of udev are you using that is still showing this
> issue?
>
> Kay, didn't you resolve this already? If not, what was the reason why?

It's the same in the current release, we still haven't wrapped our
head around how to fix it/work around it.

Unlike what the heated and pretty uncivilized and rude emails here
claim, udev does not dead-lock or "break" things, it's just "slow".
The modprobe event handling runs into a ~30 second event timeout.
Everything is still fully functional though, there's only this delay.

Udev ensures full dependency resolution between parent and child
events. Parent events have to finish the event handling and have to
return, before child event handlers are started. We need to ensure
such things so that (among other things) disk events have finished
their operations before the partition events are started, so they can
rely and access their fully set up parent devices.

What happens here is that the module_init() call blocks in a userspace
transaction, creating a child event that is not started until the
parent event has finished. The event handler for modprobe times out
then the child event loads the firmware.

Having kernel module relying on a running and fully functional
userspace to return from module_init() is surely a broken driver
model, at least it's not how things should work. If userspace does not
respond to firmware requests, module_init() locks up until the
firmware timeout happens.

This all is not so much about how probe() should behave, it's about a
fragile dependency on a specific userspace transaction to link a
loadable module into the kernel. Drivers should avoid such loops for
many reasons. Also, it's unclear in many cases how such a model should
work at all if the module is compiled in and initialized when no
userspace is running.

If that unfortunate module_init() lockup can't be solved properly in
the kernel, we need to find out if we need to make the modprobe
handling in udev async, or let firmware events bypass dependency
resolving. As mentioned, we haven't decided as of now which road to
take here.

Thanks,
Kay

2012-10-03 14:44:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 7:36 AM, Kay Sievers <[email protected]> wrote:
>
> If that unfortunate module_init() lockup can't be solved properly in
> the kernel

Stop this idiocy.

The kernel doesn't have a lockup problem. udev does.

As even you admit, it is *udev* that has the whole serialization
issue, and does excessive (and incorrect) serialization between
events. Resulting in the kernel timing out a udev event that takes too
long.

The fact that you then continually try to make this a "kernel issue"
is just disgusting. Talking about the kernel solving it "properly" is
pretty dishonest, when the kernel wasn't the problem in the first
place. The kernel not only does things right, but this all used to
work fine.

Linus

2012-10-03 15:15:58

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Em 02-10-2012 19:47, Linus Torvalds escreveu:
> On Tue, Oct 2, 2012 at 3:23 PM, Greg KH <[email protected]> wrote:
>>
>> which went into udev release 187 which I think corresponds to the place
>> when people started having problems, right Mauro?
>
> According to what I've seen, people started complaining in 182, not 187.

Yes. The issue was noticed with media drivers when people started using the
drivers on Fedora 17, witch came with udev-182. There's an open
bugzilla there:
https://bugzilla.redhat.com/show_bug.cgi?id=827538

> See for example
>
> http://patchwork.linuxtv.org/patch/13085/
>
> which is a thread where you were involved too..
>
> See also the arch linux thread:
>
> https://bbs.archlinux.org/viewtopic.php?id=134012&p=1
>
> And see this email from Kay Sievers that shows that it was all known
> about and intentional in the udev camp:
>
> http://www.spinics.net/lists/netdev/msg185742.html
>
> There's a possible patch suggested here:
>
> http://lists.freedesktop.org/archives/systemd-devel/2012-August/006357.html
>
> but quite frankly, I am leery of the fact that the udev maintenance
> seems to have gone into some "crazy mode" where they have made changes
> that were known to be problematic, and are pure and utter stupidity.
>
> Having the module init path load the firmware IS THE SENSIBLE AND
> OBVIOUS THING TO DO for many cases.

Yes, that is the case for most media devices. Some devices can only be
detected as a supported device after the firmware load, as we need the
firmware for the USB (or PCI) bridge to be there, in order to talk with
the media components under the board's internal I2C bus, as sometimes
the same USB/PCI ID is used by boards with different internal components.

> The fact that udev people have
> apparently unilaterally decided that it's somehow wrong is scary.
>

Thanks,
Mauro

2012-10-03 16:39:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 8:13 AM, Mauro Carvalho Chehab
<[email protected]> wrote:
>
> Yes. The issue was noticed with media drivers when people started using the
> drivers on Fedora 17, witch came with udev-182. There's an open
> bugzilla there:
> https://bugzilla.redhat.com/show_bug.cgi?id=827538

Yeah, that bugzilla shows the problem with Kay as a maintainer too,
not willing to own up to problems he caused.

Can you actually see the problem? I did add the attached patch as an
attachment to the bugzilla, so the reporter there may be able to test
it, but it's been open for a long while..

Anyway. Attached is a really stupid patch that tries to do the "direct
firmware load" as suggested by Ivan. It has not been tested very
extensively at all (but I did test that it loaded the brcmsmac
firmware images on my laptop so it has the *potential* to work).

It has a few extra printk's sprinkled in (to show whether it does
anything or not), and it has a few other issues, but it might be worth
testing as a starting point.

We are apparently better off trying to avoid udev like the plague.
Doing something very similar to this for module loading is probably a
good idea too.

I'm adding Ming Lei to the participants too, because hooking into the
firmware loader like this means that the image doesn't get cached.
Which is sad. I'm hoping Ming Lei might be open to trying to fix that.

Hmm?

Linus


Attachments:
patch.diff (2.32 kB)

2012-10-03 16:57:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 03, 2012 at 04:36:53PM +0200, Kay Sievers wrote:
> On Wed, Oct 3, 2012 at 12:12 AM, Greg KH <[email protected]> wrote:
>
> > Mauro, what version of udev are you using that is still showing this
> > issue?
> >
> > Kay, didn't you resolve this already? If not, what was the reason why?
>
> It's the same in the current release, we still haven't wrapped our
> head around how to fix it/work around it.

Ick, as this is breaking people's previously-working machines, shouldn't
this be resolved quickly?

> Unlike what the heated and pretty uncivilized and rude emails here
> claim, udev does not dead-lock or "break" things, it's just "slow".
> The modprobe event handling runs into a ~30 second event timeout.
> Everything is still fully functional though, there's only this delay.

Mauro said it broke the video drivers. Mauro, if you wait 30 seconds,
does everything then "work"?

Not to say that waiting 30 seconds is a correct thing here...

> Udev ensures full dependency resolution between parent and child
> events. Parent events have to finish the event handling and have to
> return, before child event handlers are started. We need to ensure
> such things so that (among other things) disk events have finished
> their operations before the partition events are started, so they can
> rely and access their fully set up parent devices.
>
> What happens here is that the module_init() call blocks in a userspace
> transaction, creating a child event that is not started until the
> parent event has finished. The event handler for modprobe times out
> then the child event loads the firmware.

module_init() can do lots of "bad" things, sleeping, asking for
firmware, and lots of other things. To have userspace block because of
this doesn't seem very wise.

> Having kernel module relying on a running and fully functional
> userspace to return from module_init() is surely a broken driver
> model, at least it's not how things should work. If userspace does not
> respond to firmware requests, module_init() locks up until the
> firmware timeout happens.

But previously this all "just worked" as we ran 'modprobe' in a new
thread/process right? What's wrong with going back to just execing
modprobe and letting that process go off and do what ever it wants to
do? It can't be that "expensive" as modprobe is a very slow thing, and
it should solve this issue. udev will then have handled the 'a device
has shown up, run modprobe' event in the correct order, and then
anything else that the module_init() process wants to do, it can do
without worrying about stopping anything else in the system that might
want to happen at the same time (like load multiple modules in a row).

> This all is not so much about how probe() should behave, it's about a
> fragile dependency on a specific userspace transaction to link a
> loadable module into the kernel. Drivers should avoid such loops for
> many reasons. Also, it's unclear in many cases how such a model should
> work at all if the module is compiled in and initialized when no
> userspace is running.
>
> If that unfortunate module_init() lockup can't be solved properly in
> the kernel, we need to find out if we need to make the modprobe
> handling in udev async, or let firmware events bypass dependency
> resolving. As mentioned, we haven't decided as of now which road to
> take here.

It's not a lockup, there have never been rules about what a driver could
and could not do in its module_init() function. Sure, there are some
not-nice drivers out there, but don't halt the whole system just because
of them.

I recommend making module loading async, like it used to be, and then
all should be fine, right?

That's also the way the mdev works, and I don't think that people have
been having problems there. :)

thanks,

greg k-h

2012-10-03 17:01:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 9:38 AM, Linus Torvalds
<[email protected]> wrote:
>
> Anyway. Attached is a really stupid patch that tries to do the "direct
> firmware load" as suggested by Ivan. It has not been tested very
> extensively at all (but I did test that it loaded the brcmsmac
> firmware images on my laptop so it has the *potential* to work).

Oh, and I stupidly put the new functions next to the builtin firmware
loading function, which means that the patch only works if you have
CONFIG_FW_LOADER enabled.

That's bogus, and the functions should be moved out of that #ifdef,
but I don't think it should hurt testing.

Linus

2012-10-03 17:09:16

by Al Viro

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 03, 2012 at 09:38:52AM -0700, Linus Torvalds wrote:
> Yeah, that bugzilla shows the problem with Kay as a maintainer too,
> not willing to own up to problems he caused.
>
> Can you actually see the problem? I did add the attached patch as an
> attachment to the bugzilla, so the reporter there may be able to test
> it, but it's been open for a long while..
>
> Anyway. Attached is a really stupid patch that tries to do the "direct
> firmware load" as suggested by Ivan. It has not been tested very
> extensively at all (but I did test that it loaded the brcmsmac
> firmware images on my laptop so it has the *potential* to work).

+ if (!S_ISREG(inode->i_mode))
+ return false;
+ size = i_size_read(inode);

Probably better to do vfs_getattr() and check mode and size in kstat; if
it's sufficiently hot for that to hurt, we are fucked anyway.

+ file = filp_open(path, O_RDONLY, 0);
+ if (IS_ERR(file))
+ continue;
+printk("from file '%s' ", path);
+ success = fw_read_file_contents(file, fw);
+ filp_close(file, NULL);

fput(file), please. We have enough misuses of filp_close() as it is...

> We are apparently better off trying to avoid udev like the plague.
> Doing something very similar to this for module loading is probably a
> good idea too.

That, or just adding usr/udev in the kernel git tree and telling the
vertical integrators to go kiss a lamprey...

2012-10-03 17:24:57

by Kay Sievers

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 6:57 PM, Greg KH <[email protected]> wrote:

>> It's the same in the current release, we still haven't wrapped our
>> head around how to fix it/work around it.
>
> Ick, as this is breaking people's previously-working machines, shouldn't
> this be resolved quickly?

Nothing really "breaks", It's "slow" and it will surely be fixed when
we know what's the right fix, which we haven't sorted out at this
moment.

> module_init() can do lots of "bad" things, sleeping, asking for
> firmware, and lots of other things. To have userspace block because of
> this doesn't seem very wise.

Not saying that it is right or nice, but it's the kernel itself that
blocks. Run init=/bin/sh and do a modprobe of one of these drivers and
it hangs un-interruptible until the kernel's internal firmware loading
request times out, just because userspace is not there.

> But previously this all "just worked" as we ran 'modprobe' in a new
> thread/process right?

No, we used to un-queue events which had a timeout specified in the
environment, that code caused other issues and was removed.

> it can do without worrying about stopping anything else in the system that might
> want to happen at the same time (like load multiple modules in a row).

It should not be an issue, the serialization is strictly parent <->
child, everything else runs in parallel.

>> If that unfortunate module_init() lockup can't be solved properly in
>> the kernel, we need to find out if we need to make the modprobe
>> handling in udev async, or let firmware events bypass dependency
>> resolving. As mentioned, we haven't decided as of now which road to
>> take here.
>
> It's not a lockup, there have never been rules about what a driver could
> and could not do in its module_init() function. Sure, there are some
> not-nice drivers out there, but don't halt the whole system just because
> of them.

It is a kind of lock up, just try modprobe with the init=/bin/sh boot.

> I recommend making module loading async, like it used to be, and then
> all should be fine, right?

That's the current idea, and Tom is looking into it how it could look like.

I also have no issues at all if the kernel does load the firmware from
the filesystem on its own; it sounds like the simplest and most robust
solution from a general look at the problem. It would also make the
difference between in-kernel firmware and out-of-kernel firmware less
visible, which sounds good.
Honestly, requiring firmware-loading userspace-transactions to
successfully link a module into the kernel sounds like a pretty bad
idea to start with. Unlike module loading which needs the depmod alias
database and userspace configuration; with firmware loading, there is
no policy involved where userspace would add any single additional
value to that step.

Thanks,
Kay

2012-10-03 17:32:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 10:09 AM, Al Viro <[email protected]> wrote:
>
> + if (!S_ISREG(inode->i_mode))
> + return false;
> + size = i_size_read(inode);
>
> Probably better to do vfs_getattr() and check mode and size in kstat; if
> it's sufficiently hot for that to hurt, we are fucked anyway.
>
> + file = filp_open(path, O_RDONLY, 0);
> + if (IS_ERR(file))
> + continue;
> +printk("from file '%s' ", path);
> + success = fw_read_file_contents(file, fw);
> + filp_close(file, NULL);
>
> fput(file), please. We have enough misuses of filp_close() as it is...

Ok, like this?

Linus


Attachments:
patch.diff (2.75 kB)

2012-10-03 18:07:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 10:24 AM, Kay Sievers <[email protected]> wrote:
>
> Nothing really "breaks", It's "slow" and it will surely be fixed when
> we know what's the right fix, which we haven't sorted out at this
> moment.

A thirty-second pause at bootup is easily long enough that some people
might think the machine is hung.

I also call bullshit on your "it will surely be fixed when we know
what's the right fix" excuses.

The fact is, you've spent the last several months blaming everybody
but yourself, and actively told people to stop blaming you:

https://bugzilla.redhat.com/show_bug.cgi?id=827538#c12

and have ignored patches that were sent to you:

http://lists.freedesktop.org/archives/systemd-devel/2012-August/006357.html

despite having clearly seen the patch (you *replied* to it, for
chissake, and I even told you in that same thread why that reply was
wrong at the time).

> I also have no issues at all if the kernel does load the firmware from
> the filesystem on its own; it sounds like the simplest and most robust
> solution from a general look at the problem. It would also make the
> difference between in-kernel firmware and out-of-kernel firmware less
> visible, which sounds good.

So now, after you've dismissed the patch that did the equivalent fix
in udev (Ming Lei's patch basically disabled your idiotic and wrong
sequence number test for firmware loading), you say it's ok to bypass
udev entirely, because that is "more robust".

Kay, you are so full of sh*t that it's not funny. You're refusing to
acknowledge your bugs, you refuse to fix them even when a patch is
sent to you, and then you make excuses for the fact that we have to
work around *your* bugs, and say that we should have done so from the
very beginning.

Yes, doing it in the kernel is "more robust". But don't play games,
and stop the lying. It's more robust because we have maintainers that
care, and because we know that regressions are not something we can
play fast and loose with. If something breaks, and we don't know what
the right fix for that breakage is, we *revert* the thing that broke.

So yes, we're clearly better off doing it in the kernel.

Not because firmware loading cannot be done in user space. But simply
because udev maintenance since Greg gave it up has gone downhill.

Linus

2012-10-03 19:26:57

by Al Viro

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 03, 2012 at 10:32:08AM -0700, Linus Torvalds wrote:
> On Wed, Oct 3, 2012 at 10:09 AM, Al Viro <[email protected]> wrote:
> >
> > + if (!S_ISREG(inode->i_mode))
> > + return false;
> > + size = i_size_read(inode);
> >
> > Probably better to do vfs_getattr() and check mode and size in kstat; if
> > it's sufficiently hot for that to hurt, we are fucked anyway.
> >
> > + file = filp_open(path, O_RDONLY, 0);
> > + if (IS_ERR(file))
> > + continue;
> > +printk("from file '%s' ", path);
> > + success = fw_read_file_contents(file, fw);
> > + filp_close(file, NULL);
> >
> > fput(file), please. We have enough misuses of filp_close() as it is...
>
> Ok, like this?

Looks sane. TBH, I'd still prefer to see udev forcibly taken over and put into
usr/udev in kernel tree - I don't trust that crowd at all and the fewer
critical userland bits they can play leverage games with, the safer we are.

Al, that -><- close to volunteering for maintaining that FPOS kernel-side...

2012-10-03 19:47:07

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Em 03-10-2012 13:57, Greg KH escreveu:
> On Wed, Oct 03, 2012 at 04:36:53PM +0200, Kay Sievers wrote:
>> On Wed, Oct 3, 2012 at 12:12 AM, Greg KH <[email protected]> wrote:
>>
>>> Mauro, what version of udev are you using that is still showing this
>>> issue?
>>>
>>> Kay, didn't you resolve this already? If not, what was the reason why?
>>
>> It's the same in the current release, we still haven't wrapped our
>> head around how to fix it/work around it.
>
> Ick, as this is breaking people's previously-working machines, shouldn't
> this be resolved quickly?
>
>> Unlike what the heated and pretty uncivilized and rude emails here
>> claim, udev does not dead-lock or "break" things, it's just "slow".
>> The modprobe event handling runs into a ~30 second event timeout.
>> Everything is still fully functional though, there's only this delay.
>
> Mauro said it broke the video drivers. Mauro, if you wait 30 seconds,
> does everything then "work"?
>
> Not to say that waiting 30 seconds is a correct thing here...

Before the implementation of the DVB async firmware load, as requested
by udev maintainers, and on a test case, the additional 30 seconds
wait won't hurt[1].

[1] Well, I didn't test any tm6000 devices with Kernel 3.6. On tm6000,
firmwares are needed for xc3028 tuners, via the board internal I2C bus.
Those devices only support 10 kHz speed for those transfers. Also, due
to a bug at the hardware, extra delay is needed. So, a firmware load
there takes ~30 seconds. I suspect that waiting for ~60 seconds will
trigger the max allowed time for firmware load to happen, actually
breaking completely the driver.

There are applications like MythTV, however, that opens all interfaces
produced by a media device at the same time (alsa, video nodes, dvb nodes).

I'm not a MythTV user myself, but someone told me recently that MythTV
has a 5 seconds timeout there. If it can't open an interface on that time,
it will give up for that device.

As you likely know, MythTV is used as a Media Center: there are distros
that load it (or some other applications like XBMC) at boot time.

As the media drivers are modular, the DVB, alsa and TV API's are implemented
on (almost) independent modules. On such scenario, if the DVB firmware
load takes 30 seconds, because the DVB demod requires a firmware,
while the analog TV doesn't require a firmware, I suspect that Mythtv
will assume that DVB is not there, causing a failure to the end user.

[1] The attempt to fix the issues with udev forced us to re-design a few
drivers. One of them (drxk) now loads firmware asynchronously. That effectively
broke the driver on Kernel 3.6, for PCTV520e and similar devices, as the
tuner driver, attached just after drx-k, stopped working, likely due to
some missing clock or GPIO signals that drx-k is supposed to rise after
firmware load.

The fixes for it are at:
http://git.linuxtv.org/media_tree.git/commit/6ae5e060840589f567c1837613e8a9d34fc9188a
http://git.linuxtv.org/media_tree.git/commit/8e30783b0b3270736b2cff6415c68b894bc411df
http://git.linuxtv.org/media_tree.git/commit/2425bb3d4016ed95ce83a90b53bd92c7f31091e4

In order to fix that issues, I had to defer DVB initialization for all em28xx
devices, in despite of the fact that only 5 boards (of 86 devices supported there)
require DVB firmwares. Patches are likely a little pedantic, as they are
doing async DVB initialization even when the drivers are compiled builtin,
but, as I do expect to revert the entire async thing from em28xx/drx-k,
I didn't want to make too big changes there.

FYI, I'm planning to upstream those fixes tomorrow, if nobody detects
any issue on this approach.

> I recommend making module loading async, like it used to be, and then
> all should be fine, right?

IMHO, module loading should be async: it seems a way better and more
POSIX compliant to take more time during init, than delaying firmware
init to happen open(), especially when open() is called on non-block mode.

Regards,
Mauro

2012-10-03 19:48:07

by Kirill A. Shutemov

[permalink] [raw]
Subject: Access files from kernel

On Wed, Oct 03, 2012 at 09:38:52AM -0700, Linus Torvalds wrote:
>+static bool fw_get_filesystem_firmware(struct firmware *fw, const char *name)
>+{
>+ int i;
>+ bool success = false;
>+ const char *fw_path[] = { "/lib/firmware/update", "/firmware", "/lib/firmware" };
>+ char *path = __getname();
>+
>+printk("Trying to load fw '%s' ", name);
>+ for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
>+ struct file *file;
>+ snprintf(path, PATH_MAX, "%s/%s", fw_path[i], name);
>+
>+ file = filp_open(path, O_RDONLY, 0);

AFAIK, accessing files on filesystem form kernel directly was no-go for a
long time. What's the new rule here?

Is it worth to introduce an execption, if it's possible to solve the
problem in userspace.

--
Kirill A. Shutemov

2012-10-03 19:51:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 03, 2012 at 10:32:08AM -0700, Linus Torvalds wrote:
> On Wed, Oct 3, 2012 at 10:09 AM, Al Viro <[email protected]> wrote:
> >
> > + if (!S_ISREG(inode->i_mode))
> > + return false;
> > + size = i_size_read(inode);
> >
> > Probably better to do vfs_getattr() and check mode and size in kstat; if
> > it's sufficiently hot for that to hurt, we are fucked anyway.
> >
> > + file = filp_open(path, O_RDONLY, 0);
> > + if (IS_ERR(file))
> > + continue;
> > +printk("from file '%s' ", path);
> > + success = fw_read_file_contents(file, fw);
> > + filp_close(file, NULL);
> >
> > fput(file), please. We have enough misuses of filp_close() as it is...
>
> Ok, like this?

This looks good to me. Having udev do firmware loading and tieing it to
the driver model may have not been such a good idea so many years ago.
Doing it this way makes more sense.

greg k-h

2012-10-03 20:32:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: Access files from kernel

On Wed, Oct 3, 2012 at 12:48 PM, Kirill A. Shutemov
<[email protected]> wrote:
>
> AFAIK, accessing files on filesystem form kernel directly was no-go for a
> long time. What's the new rule here?

Oh, we've *always* accessed files from the kernel.

What we don't want is random drivers doing so directly and without a
good abstraction layer, because that just ends up being a total
nightmare.

Still, they've done that too. Ugh. Too many drivers having random
hacks like that.

Linus

2012-10-03 20:39:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 12:50 PM, Greg KH <[email protected]> wrote:
>>
>> Ok, like this?
>
> This looks good to me. Having udev do firmware loading and tieing it to
> the driver model may have not been such a good idea so many years ago.
> Doing it this way makes more sense.

Ok, I wish this had been getting more testing in Linux-next or
something, but I suspect that what I'll do is to commit this patch
asap, and then commit another patch that turns off udev firmware
loading entirely for the synchronous firmware loading case.

Why? Just to get more testing, and seeing if there are reports of
breakage. Maybe some udev out there has a different search path (or
because udev runs in a different filesystem namespace or whatever), in
which case running udev as a fallback would otherwise hide the fact
that he direct kernel firmware loading isn't working.

We can (and will) revert things if that turns out to break things, but
I'd like to make any failures of the firmware direct-load path be fast
and hard, so that we can see when/what it breaks.

Ok? Comments?

Linus

2012-10-03 21:04:24

by Kay Sievers

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 10:39 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Oct 3, 2012 at 12:50 PM, Greg KH <[email protected]> wrote:
>>>
>>> Ok, like this?
>>
>> This looks good to me. Having udev do firmware loading and tieing it to
>> the driver model may have not been such a good idea so many years ago.
>> Doing it this way makes more sense.
>
> Ok, I wish this had been getting more testing in Linux-next or
> something, but I suspect that what I'll do is to commit this patch
> asap, and then commit another patch that turns off udev firmware
> loading entirely for the synchronous firmware loading case.
>
> Why? Just to get more testing, and seeing if there are reports of
> breakage. Maybe some udev out there has a different search path (or
> because udev runs in a different filesystem namespace or whatever), in
> which case running udev as a fallback would otherwise hide the fact
> that he direct kernel firmware loading isn't working.

> Ok? Comments?

The current udev directory search order is:
/lib/firmware/updates/$(uname -r)/
/lib/firmware/updates/
/lib/firmware/$(uname -r)/
/lib/firmware/

There is no commonly known /firmware directory.

http://cgit.freedesktop.org/systemd/systemd/tree/src/udev/udev-builtin-firmware.c#n100
http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n548

Kay

2012-10-03 21:05:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 03, 2012 at 01:39:23PM -0700, Linus Torvalds wrote:
> On Wed, Oct 3, 2012 at 12:50 PM, Greg KH <[email protected]> wrote:
> >>
> >> Ok, like this?
> >
> > This looks good to me. Having udev do firmware loading and tieing it to
> > the driver model may have not been such a good idea so many years ago.
> > Doing it this way makes more sense.
>
> Ok, I wish this had been getting more testing in Linux-next or
> something, but I suspect that what I'll do is to commit this patch
> asap, and then commit another patch that turns off udev firmware
> loading entirely for the synchronous firmware loading case.
>
> Why? Just to get more testing, and seeing if there are reports of
> breakage. Maybe some udev out there has a different search path (or
> because udev runs in a different filesystem namespace or whatever), in
> which case running udev as a fallback would otherwise hide the fact
> that he direct kernel firmware loading isn't working.
>
> We can (and will) revert things if that turns out to break things, but
> I'd like to make any failures of the firmware direct-load path be fast
> and hard, so that we can see when/what it breaks.
>
> Ok? Comments?

I have no objection to this. As for the firmware path, maybe we should
change that to be modified by userspace (much like /sbin/hotplug was) in
a proc file so that distros can override the location if they need to.
But for now, that's probably overkill. This solves the problem that
Mauro and others have reported and can be easily backported by any
affected distros if needed.

thanks,

greg k-h

2012-10-03 21:11:15

by Andy Walls

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Greg KH <[email protected]> wrote:

>On Wed, Oct 03, 2012 at 10:32:08AM -0700, Linus Torvalds wrote:
>> On Wed, Oct 3, 2012 at 10:09 AM, Al Viro <[email protected]>
>wrote:
>> >
>> > + if (!S_ISREG(inode->i_mode))
>> > + return false;
>> > + size = i_size_read(inode);
>> >
>> > Probably better to do vfs_getattr() and check mode and size in
>kstat; if
>> > it's sufficiently hot for that to hurt, we are fucked anyway.
>> >
>> > + file = filp_open(path, O_RDONLY, 0);
>> > + if (IS_ERR(file))
>> > + continue;
>> > +printk("from file '%s' ", path);
>> > + success = fw_read_file_contents(file, fw);
>> > + filp_close(file, NULL);
>> >
>> > fput(file), please. We have enough misuses of filp_close() as it
>is...
>>
>> Ok, like this?
>
>This looks good to me. Having udev do firmware loading and tieing it
>to
>the driver model may have not been such a good idea so many years ago.
>Doing it this way makes more sense.
>
>greg k-h
>--
>To unsubscribe from this list: send the line "unsubscribe linux-media"
>in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

I agree that not calling out to userspace for firmware load is better.

Here is an old, unresolved bug about Oops on firmware loading race condition

https://bugzilla.kernel.org/show_bug.cgi?id=15294

The firmware loading timeout in the kernel was cleaning things up, just as udev what trying to say "I'm done loading the firmware" via sysfs; and then *boom*.

Regards,
Andy

2012-10-03 21:18:29

by Kay Sievers

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 11:05 PM, Greg KH <[email protected]> wrote:

> As for the firmware path, maybe we should
> change that to be modified by userspace (much like /sbin/hotplug was) in
> a proc file so that distros can override the location if they need to.

If that's needed, a CONFIG_FIRMWARE_PATH= with the array of locations
would probably be sufficient.

Like udev's defaults here:
http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n550

Kay

2012-10-03 21:41:19

by Alan

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, 3 Oct 2012 23:18:06 +0200
Kay Sievers <[email protected]> wrote:

> On Wed, Oct 3, 2012 at 11:05 PM, Greg KH <[email protected]> wrote:
>
> > As for the firmware path, maybe we should
> > change that to be modified by userspace (much like /sbin/hotplug was) in
> > a proc file so that distros can override the location if they need to.
>
> If that's needed, a CONFIG_FIRMWARE_PATH= with the array of locations
> would probably be sufficient.

The current system permits firmware to be served by a daemon, or even
assembled on the fly from parts. You break that for one.

Just fix udev, and if you can't fix it someone please just fork the last
working one.

Alan

2012-10-03 21:58:56

by Lucas De Marchi

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 5:39 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Oct 3, 2012 at 12:50 PM, Greg KH <[email protected]> wrote:
>>>
>>> Ok, like this?
>>
>> This looks good to me. Having udev do firmware loading and tieing it to
>> the driver model may have not been such a good idea so many years ago.
>> Doing it this way makes more sense.
>
> Ok, I wish this had been getting more testing in Linux-next or
> something, but I suspect that what I'll do is to commit this patch
> asap, and then commit another patch that turns off udev firmware
> loading entirely for the synchronous firmware loading case.

This would break non-udev users with different paths. Namely Android,
which uses /system/etc/firmware and /system/vendor/firmware (not sure
about them though, I'm not keen on Android camp)

So maintaining the fallback or adding a configurable entry to set the
firmware paths might be good.


Lucas De Marchi

2012-10-03 22:15:56

by Lucas De Marchi

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, Oct 2, 2012 at 7:37 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Oct 2, 2012 at 2:03 PM, Ivan Kalvachev <[email protected]> wrote:
>>
>> I'm not kernel developer and probably my opinion would be a little
>> naive, but here it is.
>>
>> Please, make the kernel load firmware from the filesystem on its own.
>
> We probably should do that, not just for firmware, but for modules
> too. It would also simplify the whole "built-in firmware" thing
>
> Afaik, the only thing udev really does is to lok in
> /lib/firmware/updates and /lib/firmware for the file, load it, and
> pass it back to the kernel. We could make the kernel try to do it
> manually first, and only fall back to udev if that fails.
>
> Afaik, nobody ever does anything else anyway.
>
> I'd prefer to not have to do that, but if the udev code is buggy or
> the maintainership is flaky, I guess we don't really have much choice.
>
> Doing the same thing for module loading is probably a good idea too.

humn... I don't think so. It would work perfectly well for firmware,
but for modules there are more things involved like fulfilling
dependencies, soft-dependencies, aliases and the like. It would create
several regressions.

> There were already people (from the google/Android camp) who wanted to
> do module loading based on filename rather than the buffer passed to
> it, because they have a "I trust this filesystem" model.

They wanted to pass a fd instead of a buffer. That is being done in
the new finit_module syscall being discussed:
http://www.gossamer-threads.com/lists/linux/kernel/1592271?do=post_view_flat


Lucas De Marchi

2012-10-03 22:17:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 2:58 PM, Lucas De Marchi
<[email protected]> wrote:
>
> So maintaining the fallback or adding a configurable entry to set the
> firmware paths might be good.

Yeah, I do think we need to make it configurable. Probably both at
kernel compile time and dynamically.

The aim of having a user-mode daemon do this was that it would be easy
to configure. Sadly, if we can't trust the daemon, that is all totally
worthless.

Linus

2012-10-03 22:48:54

by Andy Walls

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Linus Torvalds <[email protected]> wrote:

>On Wed, Oct 3, 2012 at 12:50 PM, Greg KH <[email protected]>
>wrote:
>>>
>>> Ok, like this?
>>
>> This looks good to me. Having udev do firmware loading and tieing it
>to
>> the driver model may have not been such a good idea so many years
>ago.
>> Doing it this way makes more sense.
>
>Ok, I wish this had been getting more testing in Linux-next or
>something, but I suspect that what I'll do is to commit this patch
>asap, and then commit another patch that turns off udev firmware
>loading entirely for the synchronous firmware loading case.
>
>Why? Just to get more testing, and seeing if there are reports of
>breakage. Maybe some udev out there has a different search path (or
>because udev runs in a different filesystem namespace or whatever), in
>which case running udev as a fallback would otherwise hide the fact
>that he direct kernel firmware loading isn't working.
>
>We can (and will) revert things if that turns out to break things, but
>I'd like to make any failures of the firmware direct-load path be fast
>and hard, so that we can see when/what it breaks.
>
>Ok? Comments?
>
> Linus
>--
>To unsubscribe from this list: send the line "unsubscribe linux-media"
>in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

I don't know if you can remove the /sys/.../firmware ABI altogether, because there is at least one, somewhat popular udev replacement that also uses it: mdev

http://git.busybox.net/busybox/plain/docs/mdev.txt

Regards,
Andy

2012-10-03 22:54:02

by Stephen Rothwell

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

Hi Linus,

On Wed, 3 Oct 2012 13:39:23 -0700 Linus Torvalds <[email protected]> wrote:
>
> Ok, I wish this had been getting more testing in Linux-next or
> something

If you ever want a patch tested for a few days, just send it to me and I
will put it in my "fixes" tree which is merged into linux-next
immediately on top of your tree. If nothing else, that will give it wide
build testing (see http://kisskb.ellerman.id.au/linux-next).

--
Cheers,
Stephen Rothwell [email protected]


Attachments:
(No filename) (520.00 B)
(No filename) (836.00 B)
Download all attachments

2012-10-03 22:58:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 3:48 PM, Andy Walls <[email protected]> wrote:
>
> I don't know if you can remove the /sys/.../firmware ABI altogether, because there is at least one, somewhat popular udev replacement that also uses it: mdev
>
> http://git.busybox.net/busybox/plain/docs/mdev.txt

Heh. That web doc documents /lib/firmware as being the place to be.

That said, there's clearly enough variation here that I think that for
now I won't take the step to disable the udev part. I'll do the patch
to support "direct filesystem firmware loading" using the udev default
paths, and that hopefully fixes the particular case people see with
media modules.

We definitely want to have configurable paths and a way to configure
udev entirely off for firmware (together with a lack of paths
configuring the direct filesystem loading off - that way people can
set things up just the way they like), but since I'm travelling
tomorrow and this clearly needs more work, I'll do the first step only
for now..

Linus

2012-10-04 00:57:29

by Nix

[permalink] [raw]
Subject: Re: udev breakages -

On 3 Oct 2012, Al Viro spake thusly:

> Looks sane. TBH, I'd still prefer to see udev forcibly taken over and put into
> usr/udev in kernel tree - I don't trust that crowd at all and the fewer
> critical userland bits they can play leverage games with, the safer we are.
>
> Al, that -><- close to volunteering for maintaining that FPOS kernel-side...

<flamebait type="heartfelt" subtype="fever-induced">

Please! It has already been forked at least once in userspace by people
who have the temerity to *not use systemd*, imagine that! and still want
a udev that is up-to-date in other ways. (We are now being told that,
contrary to what was said when udev was migrated into the systemd tree,
running udev without systemd is now deprecated and untested and might go
away completely. How surprising, nobody ever predicted that when it
migrated in, oh wait yes we did, and were assured that we were wrong,
that standalone udev would always be supported for those of us who
weren't using systemd. Way to destroy your userbase's trust in you...)

Possibly udev 175, the last standalone udev as far as I know, is a good
place to start from: but you might want to go back a release or two
before that, since 174 was where they started doing insane
backward-compatibility breaks. By 175 they were requiring devtmpfs, no
longer creating device nodes (which I thought was the original *point*
of udev), moving lots of install locations on the assumption that / and
/usr were on the same filesystem, and migrating udevd from /sbin into
/lib/udev without bothering to provide a backward-compatibility symlink.
Net benefit of all this thrashing about to udev users: nil. Net sysadmin
overhead on upgrade: substantial, oh and if you don't do it your system
won't boot.

By udev 175 I, and a lot of other people, had simply stopped upgrading
udev entirely on the grounds that we could no longer tolerate the
uncertainty over whether our systems would boot every time we upgraded
it, for no discernible benefit. Yes, all the incompatible changes are
(or were, as of udev 175) called out in the release notes -- but there
are so *many* of them, it's easy to miss one. And they all seem so
completely unnecessary, and their implications for your system
configuration grow more and more invasive all the time.

When gregkh was maintaining udev it was nicely robust and kept working
from release to release, and just did its job without requiring us to
change the way our system booted in backwardly-incompatible ways on
every release, merge filesystems together or mount /usr in early boot
(which is SSH-tunneled over a network in the case of one of my systems,
that was fun to do in the initramfs), or install invasive packages that
extend tentacles throughout the entire system, require a complete
rewriting of my boot process and require millions of kernel features
that may not always be turned on.

I'd like those days back. I can trust the kernel people to maintain some
semblance of userspace compatibility between releases, as is crucial for
boot-critical processes. It is now quite clear that I cannot trust the
present udev maintainers, or anyone else involved in the ongoing Linux
desktop trainwreck, to do any such thing.

</flamebait>

--
NULL && (void)

2012-10-04 01:42:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 6:33 PM, Ming Lei <[email protected]> wrote:
>
> Yes, the patch will make firmware cache not working, I would like to fix
> that when I return from one trip next week.
>
> BTW, firmware cache is still needed even direct loading is taken.

I agree 100%, I'd have liked to do the caching for the direct-loading
case too. It's just that the freeing case for that is so intimately
tied to the firmware_buf format which is actually very inconvenient
for direct-loading, that making that happen looked a lot more
involved.

And I was indeed hoping you'd look at it, since you touched the code
last. "Tag, you're it"

It shouldn't be *too* bad to instead of doing the "vmalloc()" allocate
an array of pages and then using "vmap()" instead in order to read
them (we end up doing the vmap anyway, since the firmware *user* wants
a virtually contiguous buffer), but the code will definitely get a bit
more opaque.

Linus

2012-10-04 02:40:21

by Kay Sievers

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Thu, Oct 4, 2012 at 12:58 AM, Linus Torvalds
<[email protected]> wrote:
> That said, there's clearly enough variation here that I think that for
> now I won't take the step to disable the udev part. I'll do the patch
> to support "direct filesystem firmware loading" using the udev default
> paths, and that hopefully fixes the particular case people see with
> media modules.

If that approach looks like it works out, please aim for full
in-kernel-*only* support. I would absolutely like to get udev entirely
out of the sick game of firmware loading here. I would welcome if we
are not falling back to the blocking timeouted behaviour again.

The whole story would be contained entirely in the kernel, and we get
rid of the rather fragile "userspace transaction" to execute
module_init(), where the kernel has no idea if userspace is even up to
ever responding to its requests.

There would be no coordination with userspace tools needed, which
sounds like a better fit in the way we develop things with the loosely
coupled kernel <-> udev requirements.

If that works out, it would a bit like devtmpfs which turned out to be
very simple, reliable and absolutely the right thing we could do to
primarily mange /dev content.

The whole dance with the fake firmware struct device, which has a 60
second timeout to wait for userspace, is a long story of weird
failures at various aspects.

It would not only solve the unfortunate modprobe lockup with
init=/bin/sh we see here, also big servers with an insane amount of
devices happen to run into the 60 sec timeout, because udev, which
runs with 4000-8000 threads in parallel handling things like 30.000
disks is not scheduled in time to fulfill network card firmware
requests. It would be nice if we don't have that arbitrary timeout at
all.

Having any timeout at all to answer the simple question if a file
stored in the rootfs exists, should be a hint that there is something
really wrong with the model that stuff is done.

Thanks,
Kay

2012-10-04 10:35:28

by Nix

[permalink] [raw]
Subject: Re: udev breakages -

[Kay removed because I don't like emailing arguable flamebait directly
to the person flamed.]

On 4 Oct 2012, [email protected] stated:

> By udev 175 I, and a lot of other people, had simply stopped upgrading
> udev entirely on the grounds that we could no longer tolerate the
> uncertainty over whether our systems would boot every time we upgraded
> it, for no discernible benefit. Yes, all the incompatible changes are
> (or were, as of udev 175) called out in the release notes -- but there
> are so *many* of them, it's easy to miss one. And they all seem so
> completely unnecessary, and their implications for your system
> configuration grow more and more invasive all the time.

In the bright light of day I realize that this post is not as off-topic
for this thread as it appears. This is all of a piece. The udev
maintainer insists that everyone else adapt to udev's demands: before
now, it has been users, sysadmins, and userspace who must adapt, but now
is is pushing its demands in the other direction as well: this thread is
the result.

--
NULL && (void)

2012-10-04 13:39:44

by Josh Boyer

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 3, 2012 at 6:58 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Oct 3, 2012 at 3:48 PM, Andy Walls <[email protected]> wrote:
>>
>> I don't know if you can remove the /sys/.../firmware ABI altogether, because there is at least one, somewhat popular udev replacement that also uses it: mdev
>>
>> http://git.busybox.net/busybox/plain/docs/mdev.txt
>
> Heh. That web doc documents /lib/firmware as being the place to be.
>
> That said, there's clearly enough variation here that I think that for
> now I won't take the step to disable the udev part. I'll do the patch
> to support "direct filesystem firmware loading" using the udev default
> paths, and that hopefully fixes the particular case people see with
> media modules.

As you probably noticed, we had a tester in the RH bug report success
with the commit you included yesterday.

Do you think this is something worth including in the stable kernels
after it gets some further testing during the merge window? Perhaps
not that specific commit as there seems to be some additional changes
needed for configurable paths, etc, but a backport of the fleshed out
changeset might be wanted.

We have a new enough udev in Fedora 17 to hit this issue with 3.5 and
3.6 when we rebase. I'm sure other distributions will be in similar
circumstances soon if they aren't already. Udev isn't going to be
fixed, so having something working in these cases would be great.

josh

2012-10-04 13:58:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Thu, Oct 04, 2012 at 09:39:41AM -0400, Josh Boyer wrote:
> > That said, there's clearly enough variation here that I think that for
> > now I won't take the step to disable the udev part. I'll do the patch
> > to support "direct filesystem firmware loading" using the udev default
> > paths, and that hopefully fixes the particular case people see with
> > media modules.
>
> As you probably noticed, we had a tester in the RH bug report success
> with the commit you included yesterday.
>
> Do you think this is something worth including in the stable kernels
> after it gets some further testing during the merge window? Perhaps
> not that specific commit as there seems to be some additional changes
> needed for configurable paths, etc, but a backport of the fleshed out
> changeset might be wanted.
>
> We have a new enough udev in Fedora 17 to hit this issue with 3.5 and
> 3.6 when we rebase. I'm sure other distributions will be in similar
> circumstances soon if they aren't already. Udev isn't going to be
> fixed, so having something working in these cases would be great.

Yes, I don't have a problem taking this into the stable kernel releases
once it gets some testing and fleshed out. I'll be watching it to see
how it goes.

thanks,

greg k-h

2012-10-04 14:36:40

by Jiri Kosina

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Tue, 2 Oct 2012, Linus Torvalds wrote:

> Now, at the same time, I do agree that network devices should generally
> try to delay it until ifup time

Slightly tangential to the ongoing discussion, but still ... I think that
even "all network drivers should delay firmware loading to ifup time"
might be too general.

I would expect that there are network cards which require firmware to be
present for PHY to work, right? On such cards, if you want to have link
detection even on interfaces that are down (so that things like ifplugd
can detect the link presence and configure the interface), ifup is too
late.

I admit I haven't checked whether there actually are such cards out there,
but it doesn't seem to be completely unrealistic to me.

--
Jiri Kosina
SUSE Labs

2012-10-04 14:57:07

by Kurt H Maier

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Wed, Oct 03, 2012 at 07:27:01PM +0000, Al Viro wrote:
>
> Al, that -><- close to volunteering for maintaining that FPOS kernel-side...
>

This would be fantastic.

Kurt H Maier

2012-10-04 17:30:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: udev breakages -

Kay Sievers <[email protected]> writes:

> If that works out, it would a bit like devtmpfs which turned out to be
> very simple, reliable and absolutely the right thing we could do to
> primarily mange /dev content.

ROFL.

There are still quite a few interesting cases that devtmpfs does not
even think about supporting. Cases that were reported when devtmpfs was
being reviewed.

Additionally the devtmpfs maintainership has not dealt with legitimate
concerns any better than this firmware issue has been dealt with. I
still haven't even hear a productive suggestion back on the hole
/dev/ptmx mess.

As it happens devtmpfs wound up being a userspace process that happens
to reside in the kernel and call mknod. How it makes sense two layers
of messaging and device management instead of just one I don't know.
Certainly I would not crow about that being a success of anything except
passing the buck.

There is debacle written all over the user space interface for dealing
with devices right now.

Eric

2012-10-04 17:43:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: udev breakages -

On Thu, Oct 04, 2012 at 10:29:51AM -0700, Eric W. Biederman wrote:
> There are still quite a few interesting cases that devtmpfs does not
> even think about supporting. Cases that were reported when devtmpfs was
> being reviewed.

Care to refresh my memory?

> Additionally the devtmpfs maintainership has not dealt with legitimate
> concerns any better than this firmware issue has been dealt with. I
> still haven't even hear a productive suggestion back on the hole
> /dev/ptmx mess.

I don't know how to handle the /dev/ptmx issue properly from within
devtmpfs, does anyone? Proposals are always welcome, the last time this
came up a week or so ago, I don't recall seeing any proposals, just a
general complaint.

thanks,

greg k-h

2012-10-04 19:12:31

by Alan

[permalink] [raw]
Subject: Re: udev breakages -

> I don't know how to handle the /dev/ptmx issue properly from within
> devtmpfs, does anyone? Proposals are always welcome, the last time this
> came up a week or so ago, I don't recall seeing any proposals, just a
> general complaint.

Is it really a problem - devtmpfs is optional. It's a problem for the
userspace folks to handle and if they made it mandatory in their code
diddums, someone better go fork working versions.

Alan

2012-10-05 08:03:34

by Emmanuel Benisty

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Thu, Oct 4, 2012 at 9:50 PM, Kurt H Maier <[email protected]> wrote:
> On Wed, Oct 03, 2012 at 07:27:01PM +0000, Al Viro wrote:
>>
>> Al, that -><- close to volunteering for maintaining that FPOS kernel-side...
>>
>
> This would be fantastic.

And that would solve this very much worrying issue [1], quoting:
"(Yes, udev on non-systemd systems is in our eyes a dead end, in case you
haven't noticed it yet. I am looking forward to the day when we can drop
that support entirely.)"

[1] http://lists.freedesktop.org/archives/systemd-devel/2012-August/006066.html

2012-10-05 20:17:21

by David Lang

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

> On Thu, Oct 4, 2012 at 9:50 PM, Kurt H Maier <[email protected]> wrote:
>> On Wed, Oct 03, 2012 at 07:27:01PM +0000, Al Viro wrote:
>>>
>>> Al, that -><- close to volunteering for maintaining that FPOS kernel-side...
>>>
>>
>> This would be fantastic.
>
> And that would solve this very much worrying issue [1], quoting:
> "(Yes, udev on non-systemd systems is in our eyes a dead end, in case you
> haven't noticed it yet. I am looking forward to the day when we can drop
> that support entirely.)"
>
> [1] http://lists.freedesktop.org/archives/systemd-devel/2012-August/006066.html

I think it's worth noting that even though udev 189 was recently released,
the not-yet-released Ubuntu 10.10 is going to be including udev 175.

David Lang

Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Fri, 05 Oct 2012, [email protected] wrote:
> >On Thu, Oct 4, 2012 at 9:50 PM, Kurt H Maier <[email protected]> wrote:
> >>On Wed, Oct 03, 2012 at 07:27:01PM +0000, Al Viro wrote:
> >>>Al, that -><- close to volunteering for maintaining that FPOS kernel-side...
> >>
> >>This would be fantastic.
> >
> >And that would solve this very much worrying issue [1], quoting:
> >"(Yes, udev on non-systemd systems is in our eyes a dead end, in case you
> >haven't noticed it yet. I am looking forward to the day when we can drop
> >that support entirely.)"
> >
> >[1] http://lists.freedesktop.org/archives/systemd-devel/2012-August/006066.html
>
> I think it's worth noting that even though udev 189 was recently
> released, the not-yet-released Ubuntu 10.10 is going to be including
> udev 175.

So is Debian.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2012-10-06 19:10:21

by Nix

[permalink] [raw]
Subject: Re: udev breakages -

On 5 Oct 2012, Henrique de Moraes Holschuh told this:

> On Fri, 05 Oct 2012, [email protected] wrote:
>> >On Thu, Oct 4, 2012 at 9:50 PM, Kurt H Maier <[email protected]> wrote:
>> >>On Wed, Oct 03, 2012 at 07:27:01PM +0000, Al Viro wrote:
>> >>>Al, that -><- close to volunteering for maintaining that FPOS kernel-side...
>> >>
>> >>This would be fantastic.
>> >
>> >And that would solve this very much worrying issue [1], quoting:
>> >"(Yes, udev on non-systemd systems is in our eyes a dead end, in case you
>> >haven't noticed it yet. I am looking forward to the day when we can drop
>> >that support entirely.)"
>> >
>> >[1] http://lists.freedesktop.org/archives/systemd-devel/2012-August/006066.html
>>
>> I think it's worth noting that even though udev 189 was recently
>> released, the not-yet-released Ubuntu 10.10 is going to be including
>> udev 175.
>
> So is Debian.

I'm a bit surprised they didn't stick with 168, really. Even udev 175
requires a /usr mount in the initramfs, and devtmpfs (though the latter
is easy to replace). The former is probably a good idea anyway. The
latter... for any system other than the very largest, or embedded
systems that must boot in a second or less, I can't see a benefit.

--
NULL && (void)

2012-10-10 03:19:23

by Felipe Contreras

[permalink] [raw]
Subject: Re: udev breakages -

On Thu, Oct 4, 2012 at 9:17 PM, Alan Cox <[email protected]> wrote:
>> I don't know how to handle the /dev/ptmx issue properly from within
>> devtmpfs, does anyone? Proposals are always welcome, the last time this
>> came up a week or so ago, I don't recall seeing any proposals, just a
>> general complaint.
>
> Is it really a problem - devtmpfs is optional. It's a problem for the
> userspace folks to handle and if they made it mandatory in their code
> diddums, someone better go fork working versions.

If only there was a viable alternative to udev.

Distributions are being pushed around by the udev+systemd project
precisely because of this reason; udev maintainers have said that udev
on non-systemd systems is a dead end, so everyone that uses udev
(everyone) is being forced to switch to systemd if they want to
receive proper support, and at some point there might not be even a
choice.

I for one would like an alternative to both systemd and udev on my
Linux systems, and as of yet, I don't know of one.

Cheers.

--
Felipe Contreras

2012-10-10 16:08:33

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: udev breakages -

On Wed, Oct 10, 2012 at 5:19 AM, Felipe Contreras
<[email protected]> wrote:
> On Thu, Oct 4, 2012 at 9:17 PM, Alan Cox <[email protected]> wrote:
>>> I don't know how to handle the /dev/ptmx issue properly from within
>>> devtmpfs, does anyone? Proposals are always welcome, the last time this
>>> came up a week or so ago, I don't recall seeing any proposals, just a
>>> general complaint.
>>
>> Is it really a problem - devtmpfs is optional. It's a problem for the
>> userspace folks to handle and if they made it mandatory in their code
>> diddums, someone better go fork working versions.
>
> If only there was a viable alternative to udev.
>
> Distributions are being pushed around by the udev+systemd project
> precisely because of this reason; udev maintainers have said that udev
> on non-systemd systems is a dead end, so everyone that uses udev
> (everyone) is being forced to switch to systemd if they want to
> receive proper support, and at some point there might not be even a
> choice.
>
> I for one would like an alternative to both systemd and udev on my
> Linux systems, and as of yet, I don't know of one.

A few years ago, the OpenWRT people pointed me to hotplug2 when I mentioned
udev made my poor m68k box with 12 MiB of RAM immediately go OOM.
Don't know if it's suitable for "bigger" machines, though.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2012-10-11 03:32:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: udev breakages -

Greg KH <[email protected]> writes:

> On Thu, Oct 04, 2012 at 10:29:51AM -0700, Eric W. Biederman wrote:
>> There are still quite a few interesting cases that devtmpfs does not
>> even think about supporting. Cases that were reported when devtmpfs was
>> being reviewed.
>
> Care to refresh my memory?

Anyone who wants something besides the default policy. Containers
chroots anyone who doesn't want /dev/console to be c 5 1.

>> Additionally the devtmpfs maintainership has not dealt with legitimate
>> concerns any better than this firmware issue has been dealt with. I
>> still haven't even hear a productive suggestion back on the hole
>> /dev/ptmx mess.
>
> I don't know how to handle the /dev/ptmx issue properly from within
> devtmpfs, does anyone? Proposals are always welcome, the last time this
> came up a week or so ago, I don't recall seeing any proposals, just a
> general complaint.

The proposal at that time was to work around the silliness with a little
kernel magic.

To recap for those who haven't watched closely. devpts now has a ptmx
device node and it would be very nice if we were to use that device
node instead of /dev/ptmx.

Baically it would be nice to tell udev to not create /dev/ptmx, and
instead to make /dev/ptmx a symlink to /dev/pts/ptmx.

I got to looking at the problem and if I don't worry about systemd and
just look at older versions of udev that are out there in the wild it
turns out the following udev configuratoin line does exactly what is
needed. It creats a symlink from /dev/ptmx to /dev/pts/ptmx. And if
on the odd chance devpts is not mounted it creates /dev/pts/ptmx as
well.

KERNEL=="ptmx" NAME:="pts/ptmx" SYMLINK="ptmx"

Does assigning to NAME to specify the device naming policy work in
systemd-udev or has that capability been ripped out?

Thinking about it. Since systemd-udev no longer supports changing the
device name. And likely it no longer even supports assigning to NAME
even for purposes of changing the target of the symlink. Then I expect
what we want to do is:

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 147d1a4..7dc5bed 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -377,6 +377,7 @@ static int devtmpfsd(void *p)
goto out;
sys_chdir("/.."); /* will traverse into overmounted root */
sys_chroot(".");
+ sys_symlink("pts/ptmx", "ptmx");
complete(&setup_done);
while (1) {
spin_lock(&req_lock);



Eric

2012-10-11 18:33:59

by Shea Levy

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On 10/02/2012 06:37 PM, Linus Torvalds wrote:
> On Tue, Oct 2, 2012 at 2:03 PM, Ivan Kalvachev <[email protected]> wrote:
>> I'm not kernel developer and probably my opinion would be a little
>> naive, but here it is.
>>
>> Please, make the kernel load firmware from the filesystem on its own.
> We probably should do that, not just for firmware, but for modules
> too. It would also simplify the whole "built-in firmware" thing
>
> Afaik, the only thing udev really does is to lok in
> /lib/firmware/updates and /lib/firmware for the file, load it, and
> pass it back to the kernel. We could make the kernel try to do it
> manually first, and only fall back to udev if that fails.
>
> Afaik, nobody ever does anything else anyway.
>

FWIW (and probably that's not much), the NixOS[0] distro doesn't
currently use /lib/firmware. There is no /lib directory by default on
NixOS, instead we create a new symlink tree representing the current
system on each system change and symlink /run/current-system to that
tree. We currently build udev/systemd with the
--with-firmware-path=/run/current-system/firmware configuration-time
option, but we also patch module-init-tools and kmod to respect the
$MODULE_DIR env var and may do the same for firmware in the future. The
way we do things has significant advantages (or at least we like to
think so), but we already have exceptions for /bin/sh and /usr/bin/env,
so I suspect we'll probably add in /lib/firmware if this functionality
moves into the kernel.

[0]: https://nixos.org/

Cheers,
Shea Levy

2012-10-12 01:55:21

by Ming Lei

[permalink] [raw]
Subject: Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

On Fri, Oct 12, 2012 at 2:33 AM, Shea Levy <[email protected]> wrote:

>
> FWIW (and probably that's not much), the NixOS[0] distro doesn't currently
> use /lib/firmware. There is no /lib directory by default on NixOS, instead
> we create a new symlink tree representing the current system on each system
> change and symlink /run/current-system to that tree. We currently build
> udev/systemd with the --with-firmware-path=/run/current-system/firmware
> configuration-time option, but we also patch module-init-tools and kmod to
> respect the $MODULE_DIR env var and may do the same for firmware in the
> future. The way we do things has significant advantages (or at least we like
> to think so), but we already have exceptions for /bin/sh and /usr/bin/env,
> so I suspect we'll probably add in /lib/firmware if this functionality moves
> into the kernel.

The kernel parameter for customizing firmware search path will be added, so
you use can pass your search path from kernel command too.

Thanks,
--
Ming Lei