Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754523AbcKIVRr (ORCPT ); Wed, 9 Nov 2016 16:17:47 -0500 Received: from mx2.suse.de ([195.135.220.15]:57135 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753170AbcKIVRq (ORCPT ); Wed, 9 Nov 2016 16:17:46 -0500 Date: Wed, 9 Nov 2016 22:17:41 +0100 From: "Luis R. Rodriguez" To: Daniel Wagner Cc: linux-kernel@vger.kernel.org, Ming Lei , "Luis R . Rodriguez" , Greg Kroah-Hartman , "Srivatsa S . Bhat" , "Rafael J. Wysocki" , Daniel Vetter , Takashi Iwai , Bjorn Andersson , Arend van Spriel , Daniel Wagner , Andy Lutomirski , Linus Torvalds , Harald Hoyer , Jiri Kosina , Johannes Berg , Jouni Malinen , Kees Cook , Tom Gundersen , Kay Sievers , Josh Boyer , Dmitry Torokhov , Seth Forshee , Yves-Alexis Perez , Mimi Zohar Subject: Re: [PATCH v6 1/6] firmware: move umh locking code into fw_load_from_user_helper() Message-ID: <20161109211741.GI13978@wotan.suse.de> References: <1476957132-27818-1-git-send-email-wagi@monom.org> <1476957132-27818-2-git-send-email-wagi@monom.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1476957132-27818-2-git-send-email-wagi@monom.org> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11260 Lines: 246 Not trimming the thread as others were not Cc'd so letting them review my review at the bottom. This review also includes quite a bit of a huge summary of some serious complexities of the UMH which are not really well understood or documented. This all relates to your patch and the ongoing conversation on the pivot_root() races. On Thu, Oct 20, 2016 at 11:52:07AM +0200, Daniel Wagner wrote: > From: Daniel Wagner > > When we load the firmware directly we don't need to take the umh > lock(1). So move this part inside fw_load_from_user_helper which is only > available when CONFIG_FW_LOADER_USER_HELPER is set. > > This avoids a dependency on firmware_loading_timeout() even when > !CONFIG_FW_LOADER_USER_HELPER. firmware_loading_timeout() will be moved > into the CONFIG_FW_LOADER_USER_HELPER section later. > > The usermodehelper locking code was added by b298d289c792 ("PM / Sleep: > Fix freezer failures due to racy usermodehelper_is_disabled()"). > > (1) Luis analized this: > > The original goal of the usermode helper lock came can be traced back in > time via a144c6a ("PM: Print a warning if firmware is requested when > task are frozen) when Rafael noticed drivers were calling to request > firmware on *resume* calls! Why would that be an issue? It was because > of the stupid delays incurred on resume when firmware *was not found* > __due__ to the stupid user mode helper timeout delay and people > believing this often meant users confusing resume *stalling* for resume > was *broken! Later b298d289c7 > ("PM / Sleep: Fix freezer failures due to racy > usermodehelper_is_disabled()") addressed races. That code in turn was > later massaged into shape to better accept direct FS loading via commit > 4e0c92d015235 ("firmware: Refactoring for splitting user-mode helper > code"). > > So for a while we've been nagging driver developers to not add > request_firmware() calls on resume calls. In fact the > drivers/base/firmware_class.c code *kills* firmware UMH calls when we go > to suspend for the firmware cache, as such even on suspend its a stupid > idea to use the UMH, not only because it can stall suspend but *also* > because we kill these UMH calls. > > [...] > > So, as I see it the user mode helper lock should have *nothing* to do > with the race between reading files from the kernel and having a path > ready. If it was *used* for that, that was papering over the real > issue. Its no different of a hack for instance as if a driver using > request_firmware() tried to use async probe to avoid the same race. Yes > it will help, but no, it does not mean it is deterministically safe. > > Reviewing commit 247bc03742545 ("PM / Sleep: Mitigate race between the freezer > and request_firmware()") which originally extended umh state machine from > just being enabled/disabled, with the concepts of UMH_ENABLED, UMH_FREEZING, > UMH_DISABLED -- its goal was to prevent UMH uses during suspend. So -- the > "UMH lock" on firmware was actually added to help avoid races between freezing > and request_firmware(). We should not re-use UMH status notifiers when the > firmware UMH is disabled for the same concepts -- if we needed such a concept > then we should take this out from UMH code and generalize it. > > Signed-off-by: Daniel Wagner > Cc: Ming Lei > Cc: Luis R. Rodriguez > Cc: Greg Kroah-Hartman > --- > drivers/base/firmware_class.c | 52 +++++++++++++++++++++++-------------------- > 1 file changed, 28 insertions(+), 24 deletions(-) > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index 960f8f7c7aa2..d4fee06b67f3 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -981,13 +981,38 @@ static int fw_load_from_user_helper(struct firmware *firmware, > unsigned int opt_flags, long timeout) > { > struct firmware_priv *fw_priv; > + int ret; > + > + timeout = firmware_loading_timeout(); > + if (opt_flags & FW_OPT_NOWAIT) { > + timeout = usermodehelper_read_lock_wait(timeout); > + if (!timeout) { > + dev_dbg(device, "firmware: %s loading timed out\n", > + name); > + return -EBUSY; > + } > + } else { > + ret = usermodehelper_read_trylock(); > + if (WARN_ON(ret)) { > + dev_err(device, "firmware: %s will not be loaded\n", > + name); > + return ret; > + } > + } > > fw_priv = fw_create_instance(firmware, name, device, opt_flags); > - if (IS_ERR(fw_priv)) > - return PTR_ERR(fw_priv); > + if (IS_ERR(fw_priv)) { > + ret = PTR_ERR(fw_priv); > + goto release_lock; > + } > > fw_priv->buf = firmware->priv; > - return _request_firmware_load(fw_priv, opt_flags, timeout); > + ret = _request_firmware_load(fw_priv, opt_flags, timeout); > + > +release_lock: > + usermodehelper_read_unlock(); > + > + return ret; > } > > #ifdef CONFIG_PM_SLEEP > @@ -1150,25 +1175,6 @@ _request_firmware(const struct firmware **firmware_p, const char *name, > if (ret <= 0) /* error or already assigned */ > goto out; > > - ret = 0; > - timeout = firmware_loading_timeout(); > - if (opt_flags & FW_OPT_NOWAIT) { > - timeout = usermodehelper_read_lock_wait(timeout); > - if (!timeout) { > - dev_dbg(device, "firmware: %s loading timed out\n", > - name); > - ret = -EBUSY; > - goto out; > - } > - } else { > - ret = usermodehelper_read_trylock(); > - if (WARN_ON(ret)) { > - dev_err(device, "firmware: %s will not be loaded\n", > - name); > - goto out; > - } > - } > - > ret = fw_get_filesystem_firmware(device, fw->priv); > if (ret) { > if (!(opt_flags & FW_OPT_NO_WARN)) > @@ -1185,8 +1191,6 @@ _request_firmware(const struct firmware **firmware_p, const char *name, > if (!ret) > ret = assign_firmware_buf(fw, device, opt_flags); > > - usermodehelper_read_unlock(); > - > out: > if (ret < 0) { > release_firmware(fw); While this move is genuinely correct, lets recap few things: These days most distributions enable CONFIG_FW_LOADER_USER_HELPER but disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK the reason being a huge backlash against it. Furthermore its important to recall that systemd udev removed the userspace firmware loader back in 2014. Five important things to consider for this patch set in light of this: --- 0) The firmware cache is setup by adding a devres entry for each device that uses a sync call. If an async call is used the firmware cache is only set up for adevice if the UMH was not explicitly requested, that is if the second argument to request_firmware_nowait() is false. The firmware devres entry is maintained throughout the lifetime of the device, so even if you release_firmware() the firmware cache is still used on suspend today. 1) The UMH lock was originally added by Rafael to help splat a warning if the firmware API was used on resume, this was due to race issues that started creeping up and also that the firmware API with the usermode helper stalled suspend as many folks were implementing their own caching techniques on suspend and if this relied on the UMH and the firmware was not found it would stall suspend... The firmware cache cache added by Ming long ago fixed the race concerns as it implemented our own cache solution using devres so drivers no longer have to implement their own solution, it requests firmware for each device we have added a firmware devres for (explained above on item 0) prior to suspend, then upon resume if the driver request firmware it Although the lock helps protect races against suspend/resume calls the lock also prevents another race: looking for firmware not built-in if boot is too early in the process, we enable moving forward with with a direct filesystem lookup for firmware on init/main.c after usermodehelper_enable() is called right before do_initcalls() -- before we start calling each kernel level init calls. Naturally this introduces some races if you are not using initramfs, these races are being discussed elsewhere [0]. [0] https://lkml.kernel.org/r/20161108224726.GD13978@wotan.suse.de 2) The UMH lock is currently *always* used for both sync and async calls whether or not the UMH is used or required but only after the built-in firmware is checked, if the firmware is in built-in we move on with life and give it immediately. 3) The UMH is *only* used if: If you have CONFIG_FW_LOADER_USER_HELPER_FALLBACK enabled (most distros disable this now) the UMH will be used on both all sync and async calls. Because of this kernel configuration we cannot simply remove the UMH, given some kernels exist where they may have relied upon every firmware API call to use the UMH. If you do not have CONFIG_FW_LOADER_USER_HELPER_FALLBACK but do have CONFIG_FW_LOADER_USER_HELPER enabled you can only use and request the UMH to be used *iff* an async call explicitly required it. Since most Linux distributions fall in this build category in practice these days there are only a few drivers explicitly asking for the UMH, last I checked 2 drivers and they had a valid UMH requirement. Since I had not heard of *anyone* complain about compartamentalizing the UMH slowly, and given all the complexities above with the firmware cache and history of systemd interaction with udev (now removed) I have been working on grammar rules with Coccinelle to enable us to police for new UMH callers, and white-list with annotations only the callers that really need the UMH (2 drivers). --- There's two subtle issues then with moving the UMH lock to only the code paths that need the UMH -- since the UMH lock was done to help warn of firmware API callers on resume (but we now have a fw cache solution) *and* races on init, moving the UMH lock to only the code that needs the UMH could introduce races for *non-UMH* callers on init. The fw cache implemented should help issues on suspend/resume as drivers no longer need to implement their own fw calls on suspend/resume if they do not depend on the UMH, but races / issues are still possible on init if we remove the lock for them. To move the UMH lock then we need a replacement for this early race, but if you think about it -- all things reading from the FS need this race addressed as well, so now all callers of kernel_read_file_from_path(). Fixing this generally is not something I think we can do easily at this point though so I'd be content if you fix this only for the firmware_class as that is the only user of the UMH lock, the goal would be to remove it. If you want to be adventurous I suppose you can try to address this generally. There is technically also an issue for drivers that rely on the UMH and races of using the firmware API upon suspend and resume, the fw cache cannot work for them so they must implement their own solution and if they issue a fw call on suspend or resume they may race against the filesystem either being gone or not ready, respectively. To address this I think we should consider using the filesystem suspend freeze_super() so we queue up requests as we're going to suspend, this should in theory not be needed once Jiri's work on freeze_all_supers() is in place though which I think lets us get this automatically (?) Luis