Date: Fri, 9 Jun 2017 23:12:17 +0200
From: "Luis R. Rodriguez"
To: Martin Fuzzey
Cc: "Luis R. Rodriguez", Linux FS Devel, Alan Cox, "Ted Ts'o",
	Andy Lutomirski, Dmitry Torokhov, "Michael Kerrisk (man-pages)",
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse,
	jewalt@lgsinnovations.com, rafal@milecki.pl, Arend Van Spriel,
	"Rafael J. Wysocki", "Li, Yi", atull@opensource.altera.com,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans de Goede,
	"linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
Message-ID: <20170609211217.GE27288@wotan.suse.de>
References: <20170524214027.7775-1-mcgrof@kernel.org>
	<20170607170858.GK27288@wotan.suse.de>
	<59383DDA.3040702@parkeon.com>
	<593A50FF.40604@parkeon.com>
In-Reply-To: <593A50FF.40604@parkeon.com>

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > >
> > > > Like this:
> > > >
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
>
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transferred it to the device is only executed
> following a successful request_firmware)
>
> The process that terminates normally is unrelated to firmware loading as I
> said above.
>
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the wait_ is still in progress
>
>
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal Linux with no Android or custom driver
>
>
> #!/bin/sh
> set -e
>
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
>
> modprobe test_firmware
>
> DIR=/sys/devices/virtual/misc/test_firmware
>
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
>
>
>
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented out it runs for the expected 10 seconds (the firmware
> loading timeout)
>
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

Thanks, this could mean we also *should* trigger a failure if init is issuing
modprobe on a series of drivers and one completes before another while
request_firmware() is called on init or probe on a subsequent driver.

If true, I'm surprised this was never reported back when the fallback
mechanism was popular. I suppose it was not an issue given that most firmware
*was* present in /lib/firmware/ and the direct filesystem lookup, which runs
first, always found it, so this would only be an issue for folks relying on
the fallback mechanism exclusively.

Will include a test case based on your above script.

Thanks!

  Luis
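
For illustration only, here is a rough sketch of how the reproducer above
could be turned into a pass/fail check. It is not the test case that was
eventually written. The helper names (ran_full_timeout,
trigger_missing_firmware), the TIMEOUT variable and the PASS/FAIL/SKIP
wording are all hypothetical. It assumes, as in the script above, that
test_firmware is loaded, that nothing in userspace answers the fallback
request, and that it runs as root. The write has to happen in the same shell
that started the background sleep, which is why the trigger is a shell
function rather than a separate process.

```shell
#!/bin/sh
# Sketch only: helper names and the PASS/FAIL/SKIP wording are hypothetical.

DIR=/sys/devices/virtual/misc/test_firmware
TIMEOUT=10

# Succeeds if the given command or shell function ran for at least $1 seconds.
# It runs in the current shell, so background jobs started earlier remain
# children of the process that performs the sysfs write.
ran_full_timeout() {
	expected=$1
	shift
	start=$(date +%s)
	"$@" || true	# a failed firmware request is expected here
	end=$(date +%s)
	[ $((end - start)) -ge "$expected" ]
}

# Ask test_firmware for a file that does not exist.
trigger_missing_firmware() {
	echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request
}

if [ -w "$DIR"/trigger_request ]; then
	echo "$TIMEOUT" > /sys/class/firmware/timeout
	sleep 2 &	# child that exits, raising SIGCHLD, mid-request
	if ran_full_timeout "$TIMEOUT" trigger_missing_firmware; then
		echo "PASS: request waited for the full timeout"
	else
		echo "FAIL: request aborted early"
	fi
else
	echo "SKIP: test_firmware is not loaded or not writable"
fi
```

On a kernel showing the behaviour described above, the check would be
expected to report FAIL after roughly 2 seconds. With the signal handling
fixed, it should wait out the full timeout and report PASS.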