2017-06-07 17:09:05

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()

Martin, just for completeness on documenting on the commit log of the next
swait proposed fix for this -- what signal did the process get from which you
note the child dies below ? Exactly what in Android sent this signal ?

> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal

Luis


2017-06-07 17:54:42

by Martin Fuzzey

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On 07/06/17 19:08, Luis R. Rodriguez wrote:
> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>> 1) Android init calls write() on the sysfs file
>> 2) The sysfs .store() callback registered by a driver is called
>> 3) The driver calls request_firmware()
>> 4) request_firmware() sends the firmware load request to userspace and
>> calls wait_for_completion_interruptible()
> Martin, just for completeness on documenting on the commit log of the next
> swait proposed fix for this -- what signal did the process get from which you
> note the child dies below ? Exactly what in Android sent this signal ?

Android didn't send the signal, the kernel did (SIGCHLD).

Like this:

1) Android init (pid=1) fork()s (say pid=42) [this child process is
totally unrelated to firmware loading]
2) Android init (pid=1) does a write() on a (driver custom) sysfs file
which ends up calling request_firmware() kernel side
3) The firmware loading fallback mechanism is used, the request is sent
to userspace and pid 1 waits in the kernel on wait_*
4) before firmware loading completes pid 42 dies (for any reason - in my
case normal termination)
5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
causes -ERESTARTSYS to be returned from wait_*


Martin

2017-06-09 01:10:45

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <[email protected]> wrote:
> On 07/06/17 19:08, Luis R. Rodriguez wrote:
>>
>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>>>
>>> 1) Android init calls write() on the sysfs file
>>> 2) The sysfs .store() callback registered by a driver is called
>>> 3) The driver calls request_firmware()
>>> 4) request_firmware() sends the firmware load request to userspace and
>>> calls wait_for_completion_interruptible()
>>
>> Martin, just for completeness on documenting on the commit log of the next
>> swait proposed fix for this -- what signal did the process get from which
>> you
>> note the child dies below ? Exactly what in Android sent this signal ?
>
>
> Android didn't send the signal, the kernel did (SIGCHLD).
>
> Like this:
>
> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> unrelated to firmware loading]
> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> ends up calling request_firmware() kernel side
> 3) The firmware loading fallback mechanism is used, the request is sent to
> userspace and pid 1 waits in the kernel on wait_*
> 4) before firmware loading completes pid 42 dies (for any reason - in my
> case normal termination)

Interesting, could one interpretation here be that the process
successfully finishing + the signal being sent beats out the timing of
the firmware_class syfs code detecting that the write completed ?

> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
> causes -ERESTARTSYS to be returned from wait_*

Luis

2017-06-09 01:57:33

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <[email protected]> wrote:
>> On 07/06/17 19:08, Luis R. Rodriguez wrote:
>>>
>>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>>>>
>>>> 1) Android init calls write() on the sysfs file
>>>> 2) The sysfs .store() callback registered by a driver is called
>>>> 3) The driver calls request_firmware()
>>>> 4) request_firmware() sends the firmware load request to userspace and
>>>> calls wait_for_completion_interruptible()
>>>
>>> Martin, just for completeness on documenting on the commit log of the next
>>> swait proposed fix for this -- what signal did the process get from which
>>> you
>>> note the child dies below ? Exactly what in Android sent this signal ?
>>
>>
>> Android didn't send the signal, the kernel did (SIGCHLD).
>>
>> Like this:
>>
>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
>> unrelated to firmware loading]
>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
>> ends up calling request_firmware() kernel side
>> 3) The firmware loading fallback mechanism is used, the request is sent to
>> userspace and pid 1 waits in the kernel on wait_*
>> 4) before firmware loading completes pid 42 dies (for any reason - in my
>> case normal termination)

Martin just to be clear, by "normal case termination" do you mean
completing successfully ?? Ie the firmware actually did make it onto
the device ?

> Interesting, could one interpretation here be that the process
> successfully finishing + the signal being sent beats out the timing of
> the firmware_class syfs code detecting that the write completed ?
>
>> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
>> causes -ERESTARTSYS to be returned from wait_*
>
> Luis

2017-06-09 07:40:54

by Martin Fuzzey

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On 09/06/17 03:57, Luis R. Rodriguez wrote:
> On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <[email protected]> wrote:
>>> Android didn't send the signal, the kernel did (SIGCHLD).
>>>
>>> Like this:
>>>
>>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
>>> unrelated to firmware loading]
>>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
>>> ends up calling request_firmware() kernel side
>>> 3) The firmware loading fallback mechanism is used, the request is sent to
>>> userspace and pid 1 waits in the kernel on wait_*
>>> 4) before firmware loading completes pid 42 dies (for any reason - in my
>>> case normal termination)
> Martin just to be clear, by "normal case termination" do you mean
> completing successfully ?? Ie the firmware actually did make it onto
> the device ?

The firmware did *not* make it onto the device since the
request_firmware() call returned an error
(the code that would have transfered it to the device is only executed
following a successful request_firmware)

The process that terminates normally is unrelated to firmware loading as
I said above.

The only things that matter are:
- It is a child process of the process that calls request_firmware()
- It terminates *while* the the wait_ is still in progress


Here is a way of reproducing the problem using the test_firmware module
(which I only just saw) on normal linux with no Android or custom driver


#!/bin/sh
set -e

# Make sure the system firmware loader doesn't get in the way
/etc/init.d/udev stop

modprobe test_firmware

DIR=/sys/devices/virtual/misc/test_firmware

echo 10 >/sys/class/firmware/timeout;
sleep 2 &
echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;



If run with the "sleep 2 &" it terminates after 2 seconds
If the sleep is commented it runs for the expected 10 seconds (the
firmware loading timeout)

Since the sleep process is a child of the script process requesting a
firmware load its death causes a SIGCHLD causing request_firmware() to
abort prematurely.


Martin

2017-06-09 21:12:23

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <[email protected]> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > >
> > > > Like this:
> > > >
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
>
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
>
> The process that terminates normally is unrelated to firmware loading as I
> said above.
>
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
>
>
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
>
>
> #!/bin/sh
> set -e
>
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
>
> modprobe test_firmware
>
> DIR=/sys/devices/virtual/misc/test_firmware
>
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
>
>
>
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
>
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

Thanks this could mean we also *should* trigger a failure if init is issuing
modprobe on a series of drivers and one completes before another while
request_firmware() is called on init or probe on a subsequent driver. If true
I'm surprised this never was reported back when the fallback mechanism was
popular, I suppose it was not an issue given most firmware *was* present on
/lib/firmware/ and the direct filesystem lookup first step always found the
firmware first, so this would only be an issue for folks relying on the
fallback mechanism exclusively.

Will include a test case based on your above script. Thanks!

Luis

2017-06-09 22:55:33

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <[email protected]> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > >
> > > > Like this:
> > > >
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
>
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
>
> The process that terminates normally is unrelated to firmware loading as I
> said above.
>
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
>
>
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
>
>
> #!/bin/sh
> set -e
>
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
>
> modprobe test_firmware
>
> DIR=/sys/devices/virtual/misc/test_firmware
>
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
>
>
>
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
>
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

BTW I've run some more tests and if you use request_firmware_nowait() the above
issue also is avoidable. Also, if you use the custom fallback (uevent parameter
is false) the timeout is max value always. The only way to kill the custom
requests is through the echo -1 > loading or the max timeout triggers (don't
rely on that!). The custom interface was a bad idea though so best to just
avoid it.

Anyway I can reproduce in my tests now, will add this testcase as well.
The stable fixes will be sent as well.

Luis