2016-10-20 09:52:28

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 0/6] firmware: encapsulate firmware loading status

From: Daniel Wagner <[email protected]>

Hi,

Here is a updated version with the input from Luis.

The firmware user helper code tracks the current state of the loading
process via an member of struct firmware_buf and a completion. Let's
encapsulate this simple state machine into struct fw_status. The aim is
to increase readability and reduce the usage of the fw_lock.

I tested this series with fw_userhelper.sh and fw_filesystem.sh under
kvm and also let it run on real hardware. You can get the series as git tree:

https://git.kernel.org/cgit/linux/kernel/git/wagi/linux.git/log/?h=firmware_async-9

cheers,
daniel


This series depends on Luis' "firmware: add SmPL grammar to avoid issues"
series:

http://marc.info/?l=linux-kernel&m=147320896231418&w=2

changes since v5:
- add enum name (enum fw_status)
- renamed function prefix and structure fw_umh to fw_state (avoid to
clash with enum fw_status)
- add back patch 1 from v3
- move unused functions for direct firmeware loading into UMH section

changes since v4:
- replaced "firmware: Move umh locking code into fw_load_from_user_helper()"
with "firmware: document user mode helper lock usage"
- changed prefix fw_status_ to fw_umh_
- fixed a couple of bux pointed out by Ming
- changed type of fw_umh::status to u8 and updated commit message to
point out that all states are exclusive

changes since v3:
- added 'firmware: Move umh locking code into
fw_load_from_user_helper()'
- dropped loading_tiemout and firmware_loading_time() for
!CONFG_FW_LOADER_USER_HELPER
- rebased on Luis patches

changes since v2:
- more splitting out
- first patch factors out all the bit ops into fw_status
- second patch gets rid of the bit ops
- third get rid of fw_lock by using swait

changes since v1:
- moved swait change into its own patch
- added ifdef section for FW_LOADER_USER_HELPER_FALLBACK
- updated commit message highlighting the mutex usage drop a bit

Daniel Wagner (6):
firmware: move umh locking code into fw_load_from_user_helper()
firmware: encapsulate firmware loading status
firmware: rename fw_load_from_user_helper() and
_request_firmware_load()
firmware: drop bit ops in favor of simple state machine
firmware: do not use fw_lock for fw_state protection
firmware: move fw_state_is_{aborted|done}() into UHM section

drivers/base/firmware_class.c | 226 ++++++++++++++++++++++++------------------
1 file changed, 131 insertions(+), 95 deletions(-)

--
2.7.4


2016-10-20 09:52:31

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 4/6] firmware: drop bit ops in favor of simple state machine

From: Daniel Wagner <[email protected]>

We track the state of the firmware loading with bit ops. Since the
state machine has only a few states and they are all mutual exclusive
there are only a few simple state transition we can model this simplify.

UNKNOWN -> LOADING -> DONE | ABORTED

Because we don't use any bit ops on fw_state::status anymore we are able
to change the data type to enum fw_status and update the function
arguments accordingly.

READ_ONCE() and WRITE_ONCE() are propably not needed because there are a
lot of load and stores around fw_st->status. But let's make it explicit
and not be sorry later.

Signed-off-by: Daniel Wagner <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_class.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index a6916a7aaeb2..a1dab6a792f1 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -106,7 +106,7 @@ enum fw_status {
*/
struct fw_state {
struct completion completion;
- unsigned long status;
+ enum fw_status status;
};

static void fw_state_init(struct fw_state *fw_st)
@@ -117,7 +117,7 @@ static void fw_state_init(struct fw_state *fw_st)

static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
{
- return test_bit(status, &fw_st->status);
+ return fw_st->status == status;
}

static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
@@ -126,7 +126,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)

ret = wait_for_completion_interruptible_timeout(&fw_st->completion,
timeout);
- if (ret != 0 && test_bit(FW_STATUS_ABORTED, &fw_st->status))
+ if (ret != 0 && READ_ONCE(fw_st->status) == FW_STATUS_ABORTED)
return -ENOENT;

return ret;
@@ -135,12 +135,10 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
static void __fw_state_set(struct fw_state *fw_st,
enum fw_status status)
{
- set_bit(status, &fw_st->status);
+ WRITE_ONCE(fw_st->status, status);

- if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED) {
- clear_bit(FW_STATUS_LOADING, &fw_st->status);
+ if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED)
complete_all(&fw_st->completion);
- }
}

#define fw_state_start(fw_st) \
--
2.7.4

2016-10-20 09:52:44

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 6/6] firmware: move fw_state_is_{aborted|done}() into UHM section

From: Daniel Wagner <[email protected]>

For direct firmware loading we do not need to read the status after
fw_state_wait() returns. It either returns success or the error
code. Also assign_firmware_buf() is only called if there was no
error (especially ABORTED has been detected by
fw_get_umh_firmware()).

So we are allowed to move the remaining fw_state_is_{aborted|done}()
defintion into the !CONFIG_FW_LOADER_USER_HELPER section.

Signed-off-by: Daniel Wagner <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_class.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4f1755537314..813587f2ccbe 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -116,11 +116,6 @@ static void fw_state_init(struct fw_state *fw_st)
fw_st->status = FW_STATUS_UNKNOWN;
}

-static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
-{
- return fw_st->status == status;
-}
-
static inline bool __fw_state_is_done(enum fw_status status)
{
return status == FW_STATUS_DONE || status == FW_STATUS_ABORTED;
@@ -152,16 +147,10 @@ static void __fw_state_set(struct fw_state *fw_st,
__fw_state_set(fw_st, FW_STATUS_LOADING)
#define fw_state_done(fw_st) \
__fw_state_set(fw_st, FW_STATUS_DONE)
-#define fw_state_is_done(fw_st) \
- __fw_state_check(fw_st, FW_STATUS_DONE)
#define fw_state_wait(fw_st) \
__fw_state_wait_common(fw_st, MAX_SCHEDULE_TIMEOUT)

-#ifndef CONFIG_FW_LOADER_USER_HELPER
-
-#define fw_state_is_aborted(fw_st) false
-
-#else /* CONFIG_FW_LOADER_USER_HELPER */
+#ifdef CONFIG_FW_LOADER_USER_HELPER

static int loading_timeout = 60; /* In seconds */

@@ -170,8 +159,15 @@ static inline long firmware_loading_timeout(void)
return loading_timeout > 0 ? loading_timeout * HZ : MAX_JIFFY_OFFSET;
}

+static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
+{
+ return fw_st->status == status;
+}
+
#define fw_state_aborted(fw_st) \
__fw_state_set(fw_st, FW_STATUS_ABORTED)
+#define fw_state_is_done(fw_st) \
+ __fw_state_check(fw_st, FW_STATUS_DONE)
#define fw_state_is_loading(fw_st) \
__fw_state_check(fw_st, FW_STATUS_LOADING)
#define fw_state_is_aborted(fw_st) \
@@ -1156,7 +1152,7 @@ static int assign_firmware_buf(struct firmware *fw, struct device *device,
struct firmware_buf *buf = fw->priv;

mutex_lock(&fw_lock);
- if (!buf->size || fw_state_is_aborted(&buf->fw_st)) {
+ if (!buf->size) {
mutex_unlock(&fw_lock);
return -ENOENT;
}
--
2.7.4

2016-10-20 09:52:47

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 5/6] firmware: do not use fw_lock for fw_state protection

From: Daniel Wagner <[email protected]>

fw_lock is to use to protect 'corner cases' inside firmware_class. It
is not exactly clear what those corner cases are nor what it exactly
protects. fw_state can be used without needing the fw_lock to protect
its state transition and wake ups.

fw_state is holds the state in status and the completion is used to
wake up all waiters (in this case that is the user land helper so only
one). This operation has to be 'atomic' to avoid races. We can do this
by using swait which takes care we don't miss any wake up.

We use also swait instead of wait because don't need all the additional
features wait provides.

Note there some more cleanups possible after with this change. For
example for !CONFIG_FW_LOADER_USER_HELPER we don't check for the state
anymore. Let's to this in the next patch instead mingling to many
changes into this one. And yes you get a gcc warning "‘__fw_state_check’
defined but not used [-Wunused-function] code." for the time beeing.

Signed-off-by: Daniel Wagner <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_class.c | 52 +++++++++++++------------------------------
1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index a1dab6a792f1..4f1755537314 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -30,6 +30,7 @@
#include <linux/syscore_ops.h>
#include <linux/reboot.h>
#include <linux/security.h>
+#include <linux/swait.h>

#include <generated/utsrelease.h>

@@ -105,13 +106,13 @@ enum fw_status {
* firmwareq.
*/
struct fw_state {
- struct completion completion;
+ struct swait_queue_head wq;
enum fw_status status;
};

static void fw_state_init(struct fw_state *fw_st)
{
- init_completion(&fw_st->completion);
+ init_swait_queue_head(&fw_st->wq);
fw_st->status = FW_STATUS_UNKNOWN;
}

@@ -120,25 +121,31 @@ static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
return fw_st->status == status;
}

+static inline bool __fw_state_is_done(enum fw_status status)
+{
+ return status == FW_STATUS_DONE || status == FW_STATUS_ABORTED;
+}
+
static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
{
int ret;

- ret = wait_for_completion_interruptible_timeout(&fw_st->completion,
- timeout);
- if (ret != 0 && READ_ONCE(fw_st->status) == FW_STATUS_ABORTED)
+ ret = swait_event_interruptible_timeout(fw_st->wq,
+ __fw_state_is_done(READ_ONCE(fw_st->status)),
+ timeout);
+ if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
return -ENOENT;

return ret;
}

static void __fw_state_set(struct fw_state *fw_st,
- enum fw_status status)
+ enum fw_status status)
{
WRITE_ONCE(fw_st->status, status);

if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED)
- complete_all(&fw_st->completion);
+ swake_up(&fw_st->wq);
}

#define fw_state_start(fw_st) \
@@ -374,14 +381,6 @@ static const char * const fw_path[] = {
module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path");

-static void fw_finish_direct_load(struct device *device,
- struct firmware_buf *buf)
-{
- mutex_lock(&fw_lock);
- fw_state_done(&buf->fw_st);
- mutex_unlock(&fw_lock);
-}
-
static int
fw_get_filesystem_firmware(struct device *device, struct firmware_buf *buf)
{
@@ -428,7 +427,7 @@ fw_get_filesystem_firmware(struct device *device, struct firmware_buf *buf)
}
dev_dbg(device, "direct-loading %s\n", buf->fw_id);
buf->size = size;
- fw_finish_direct_load(device, buf);
+ fw_state_done(&buf->fw_st);
break;
}
__putname(path);
@@ -1106,25 +1105,6 @@ static inline void kill_requests_without_uevent(void) { }

#endif /* CONFIG_FW_LOADER_USER_HELPER */

-/* wait until the shared firmware_buf becomes ready (or error) */
-static int sync_cached_firmware_buf(struct firmware_buf *buf)
-{
- int ret = 0;
-
- mutex_lock(&fw_lock);
- while (!fw_state_is_done(&buf->fw_st)) {
- if (fw_state_is_aborted(&buf->fw_st)) {
- ret = -ENOENT;
- break;
- }
- mutex_unlock(&fw_lock);
- ret = fw_state_wait(&buf->fw_st);
- mutex_lock(&fw_lock);
- }
- mutex_unlock(&fw_lock);
- return ret;
-}
-
/* prepare firmware and firmware_buf structs;
* return 0 if a firmware is already assigned, 1 if need to load one,
* or a negative error code
@@ -1158,7 +1138,7 @@ _request_firmware_prepare(struct firmware **firmware_p, const char *name,
firmware->priv = buf;

if (ret > 0) {
- ret = sync_cached_firmware_buf(buf);
+ ret = fw_state_wait(&buf->fw_st);
if (!ret) {
fw_set_page_data(buf, firmware);
return 0; /* assigned */
--
2.7.4

2016-10-20 09:53:42

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 1/6] firmware: move umh locking code into fw_load_from_user_helper()

From: Daniel Wagner <[email protected]>

When we load the firmware directly we don't need to take the umh
lock(1). So move this part inside fw_load_from_user_helper which is only
available when CONFIG_FW_LOADER_USER_HELPER is set.

This avoids a dependency on firmware_loading_timeout() even when
!CONFIG_FW_LOADER_USER_HELPER. firmware_loading_timeout() will be moved
into the CONFIG_FW_LOADER_USER_HELPER section later.

The usermodehelper locking code was added by b298d289c792 ("PM / Sleep:
Fix freezer failures due to racy usermodehelper_is_disabled()").

(1) Luis analized this:

The original goal of the usermode helper lock came can be traced back in
time via a144c6a ("PM: Print a warning if firmware is requested when
task are frozen) when Rafael noticed drivers were calling to request
firmware on *resume* calls! Why would that be an issue? It was because
of the stupid delays incurred on resume when firmware *was not found*
__due__ to the stupid user mode helper timeout delay and people
believing this often meant users confusing resume *stalling* for resume
was *broken! Later b298d289c7
("PM / Sleep: Fix freezer failures due to racy
usermodehelper_is_disabled()") addressed races. That code in turn was
later massaged into shape to better accept direct FS loading via commit
4e0c92d015235 ("firmware: Refactoring for splitting user-mode helper
code").

So for a while we've been nagging driver developers to not add
request_firmware() calls on resume calls. In fact the
drivers/base/firmware_class.c code *kills* firmware UMH calls when we go
to suspend for the firmware cache, as such even on suspend its a stupid
idea to use the UMH, not only because it can stall suspend but *also*
because we kill these UMH calls.

[...]

So, as I see it the user mode helper lock should have *nothing* to do
with the race between reading files from the kernel and having a path
ready. If it was *used* for that, that was papering over the real
issue. Its no different of a hack for instance as if a driver using
request_firmware() tried to use async probe to avoid the same race. Yes
it will help, but no, it does not mean it is deterministically safe.

Reviewing commit 247bc03742545 ("PM / Sleep: Mitigate race between the freezer
and request_firmware()") which originally extended umh state machine from
just being enabled/disabled, with the concepts of UMH_ENABLED, UMH_FREEZING,
UMH_DISABLED -- its goal was to prevent UMH uses during suspend. So -- the
"UMH lock" on firmware was actually added to help avoid races between freezing
and request_firmware(). We should not re-use UMH status notifiers when the
firmware UMH is disabled for the same concepts -- if we needed such a concept
then we should take this out from UMH code and generalize it.

Signed-off-by: Daniel Wagner <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_class.c | 52 +++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 960f8f7c7aa2..d4fee06b67f3 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -981,13 +981,38 @@ static int fw_load_from_user_helper(struct firmware *firmware,
unsigned int opt_flags, long timeout)
{
struct firmware_priv *fw_priv;
+ int ret;
+
+ timeout = firmware_loading_timeout();
+ if (opt_flags & FW_OPT_NOWAIT) {
+ timeout = usermodehelper_read_lock_wait(timeout);
+ if (!timeout) {
+ dev_dbg(device, "firmware: %s loading timed out\n",
+ name);
+ return -EBUSY;
+ }
+ } else {
+ ret = usermodehelper_read_trylock();
+ if (WARN_ON(ret)) {
+ dev_err(device, "firmware: %s will not be loaded\n",
+ name);
+ return ret;
+ }
+ }

fw_priv = fw_create_instance(firmware, name, device, opt_flags);
- if (IS_ERR(fw_priv))
- return PTR_ERR(fw_priv);
+ if (IS_ERR(fw_priv)) {
+ ret = PTR_ERR(fw_priv);
+ goto release_lock;
+ }

fw_priv->buf = firmware->priv;
- return _request_firmware_load(fw_priv, opt_flags, timeout);
+ ret = _request_firmware_load(fw_priv, opt_flags, timeout);
+
+release_lock:
+ usermodehelper_read_unlock();
+
+ return ret;
}

#ifdef CONFIG_PM_SLEEP
@@ -1150,25 +1175,6 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
if (ret <= 0) /* error or already assigned */
goto out;

- ret = 0;
- timeout = firmware_loading_timeout();
- if (opt_flags & FW_OPT_NOWAIT) {
- timeout = usermodehelper_read_lock_wait(timeout);
- if (!timeout) {
- dev_dbg(device, "firmware: %s loading timed out\n",
- name);
- ret = -EBUSY;
- goto out;
- }
- } else {
- ret = usermodehelper_read_trylock();
- if (WARN_ON(ret)) {
- dev_err(device, "firmware: %s will not be loaded\n",
- name);
- goto out;
- }
- }
-
ret = fw_get_filesystem_firmware(device, fw->priv);
if (ret) {
if (!(opt_flags & FW_OPT_NO_WARN))
@@ -1185,8 +1191,6 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
if (!ret)
ret = assign_firmware_buf(fw, device, opt_flags);

- usermodehelper_read_unlock();
-
out:
if (ret < 0) {
release_firmware(fw);
--
2.7.4

2016-10-20 09:53:41

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 2/6] firmware: encapsulate firmware loading status

From: Daniel Wagner <[email protected]>

The firmware loader tracks the current state of the loading
process via unsigned long status and a completion in struct
firmware_buf. Instead of open code tracking the state introduce data
structure which encapsulate the state tracking and synchronization.

While at it also separate UHM states from direct loading states, e.g.
the loading_timeout is only defined when CONFIG_FW_LOADER_USER_HELPER.

Signed-off-by: Daniel Wagner <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_class.c | 129 ++++++++++++++++++++++++++++++------------
1 file changed, 94 insertions(+), 35 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index d4fee06b67f3..b7a9fa1c5197 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -91,12 +91,73 @@ static inline bool fw_is_builtin_firmware(const struct firmware *fw)
}
#endif

-enum {
+enum fw_status {
+ FW_STATUS_UNKNOWN,
FW_STATUS_LOADING,
FW_STATUS_DONE,
- FW_STATUS_ABORT,
+ FW_STATUS_ABORTED,
};

+/*
+ * Synchronize request_firmware() calls and defered loading of
+ * firmwares via user mode helper. It is also necassary for direct
+ * loading, that is if there are concurrent request for the same
+ * firmwareq.
+ */
+struct fw_state {
+ struct completion completion;
+ unsigned long status;
+};
+
+static void fw_state_init(struct fw_state *fw_st)
+{
+ init_completion(&fw_st->completion);
+ fw_st->status = FW_STATUS_UNKNOWN;
+}
+
+static int __fw_state_check(struct fw_state *fw_st, enum fw_status status)
+{
+ return test_bit(status, &fw_st->status);
+}
+
+static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
+{
+ int ret;
+
+ ret = wait_for_completion_interruptible_timeout(&fw_st->completion,
+ timeout);
+ if (ret != 0 && test_bit(FW_STATUS_ABORTED, &fw_st->status))
+ return -ENOENT;
+
+ return ret;
+}
+
+static void __fw_state_set(struct fw_state *fw_st,
+ enum fw_status status)
+{
+ set_bit(status, &fw_st->status);
+
+ if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED) {
+ clear_bit(FW_STATUS_LOADING, &fw_st->status);
+ complete_all(&fw_st->completion);
+ }
+}
+
+#define fw_state_start(fw_st) \
+ __fw_state_set(fw_st, FW_STATUS_LOADING)
+#define fw_state_done(fw_st) \
+ __fw_state_set(fw_st, FW_STATUS_DONE)
+#define fw_state_is_done(fw_st) \
+ __fw_state_check(fw_st, FW_STATUS_DONE)
+#define fw_state_wait(fw_st) \
+ __fw_state_wait_common(fw_st, MAX_SCHEDULE_TIMEOUT)
+
+#ifndef CONFIG_FW_LOADER_USER_HELPER
+
+#define fw_state_is_aborted(fw_st) false
+
+#else /* CONFIG_FW_LOADER_USER_HELPER */
+
static int loading_timeout = 60; /* In seconds */

static inline long firmware_loading_timeout(void)
@@ -104,6 +165,17 @@ static inline long firmware_loading_timeout(void)
return loading_timeout > 0 ? loading_timeout * HZ : MAX_JIFFY_OFFSET;
}

+#define fw_state_aborted(fw_st) \
+ __fw_state_set(fw_st, FW_STATUS_ABORTED)
+#define fw_state_is_loading(fw_st) \
+ __fw_state_check(fw_st, FW_STATUS_LOADING)
+#define fw_state_is_aborted(fw_st) \
+ __fw_state_check(fw_st, FW_STATUS_ABORTED)
+#define fw_state_wait_timeout(fw_st, timeout) \
+ __fw_state_wait_common(fw_st, timeout)
+
+#endif /* CONFIG_FW_LOADER_USER_HELPER */
+
/* firmware behavior options */
#define FW_OPT_UEVENT (1U << 0)
#define FW_OPT_NOWAIT (1U << 1)
@@ -145,9 +217,8 @@ struct firmware_cache {
struct firmware_buf {
struct kref ref;
struct list_head list;
- struct completion completion;
struct firmware_cache *fwc;
- unsigned long status;
+ struct fw_state fw_st;
void *data;
size_t size;
size_t allocated_size;
@@ -205,7 +276,7 @@ static struct firmware_buf *__allocate_fw_buf(const char *fw_name,
buf->fwc = fwc;
buf->data = dbuf;
buf->allocated_size = size;
- init_completion(&buf->completion);
+ fw_state_init(&buf->fw_st);
#ifdef CONFIG_FW_LOADER_USER_HELPER
INIT_LIST_HEAD(&buf->pending_list);
#endif
@@ -309,8 +380,7 @@ static void fw_finish_direct_load(struct device *device,
struct firmware_buf *buf)
{
mutex_lock(&fw_lock);
- set_bit(FW_STATUS_DONE, &buf->status);
- complete_all(&buf->completion);
+ fw_state_done(&buf->fw_st);
mutex_unlock(&fw_lock);
}

@@ -478,12 +548,11 @@ static void __fw_load_abort(struct firmware_buf *buf)
* There is a small window in which user can write to 'loading'
* between loading done and disappearance of 'loading'
*/
- if (test_bit(FW_STATUS_DONE, &buf->status))
+ if (fw_state_is_done(&buf->fw_st))
return;

list_del_init(&buf->pending_list);
- set_bit(FW_STATUS_ABORT, &buf->status);
- complete_all(&buf->completion);
+ fw_state_aborted(&buf->fw_st);
}

static void fw_load_abort(struct firmware_priv *fw_priv)
@@ -496,9 +565,6 @@ static void fw_load_abort(struct firmware_priv *fw_priv)
fw_priv->buf = NULL;
}

-#define is_fw_load_aborted(buf) \
- test_bit(FW_STATUS_ABORT, &(buf)->status)
-
static LIST_HEAD(pending_fw_head);

/* reboot notifier for avoid deadlock with usermode_lock */
@@ -598,7 +664,7 @@ static ssize_t firmware_loading_show(struct device *dev,

mutex_lock(&fw_lock);
if (fw_priv->buf)
- loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+ loading = fw_state_is_loading(&fw_priv->buf->fw_st);
mutex_unlock(&fw_lock);

return sprintf(buf, "%d\n", loading);
@@ -653,23 +719,20 @@ static ssize_t firmware_loading_store(struct device *dev,
switch (loading) {
case 1:
/* discarding any previous partial load */
- if (!test_bit(FW_STATUS_DONE, &fw_buf->status)) {
+ if (!fw_state_is_done(&fw_buf->fw_st)) {
for (i = 0; i < fw_buf->nr_pages; i++)
__free_page(fw_buf->pages[i]);
vfree(fw_buf->pages);
fw_buf->pages = NULL;
fw_buf->page_array_size = 0;
fw_buf->nr_pages = 0;
- set_bit(FW_STATUS_LOADING, &fw_buf->status);
+ fw_state_start(&fw_buf->fw_st);
}
break;
case 0:
- if (test_bit(FW_STATUS_LOADING, &fw_buf->status)) {
+ if (fw_state_is_loading(&fw_buf->fw_st)) {
int rc;

- set_bit(FW_STATUS_DONE, &fw_buf->status);
- clear_bit(FW_STATUS_LOADING, &fw_buf->status);
-
/*
* Several loading requests may be pending on
* one same firmware buf, so let all requests
@@ -691,10 +754,11 @@ static ssize_t firmware_loading_store(struct device *dev,
*/
list_del_init(&fw_buf->pending_list);
if (rc) {
- set_bit(FW_STATUS_ABORT, &fw_buf->status);
+ fw_state_aborted(&fw_buf->fw_st);
written = rc;
+ } else {
+ fw_state_done(&fw_buf->fw_st);
}
- complete_all(&fw_buf->completion);
break;
}
/* fallthrough */
@@ -755,7 +819,7 @@ static ssize_t firmware_data_read(struct file *filp, struct kobject *kobj,

mutex_lock(&fw_lock);
buf = fw_priv->buf;
- if (!buf || test_bit(FW_STATUS_DONE, &buf->status)) {
+ if (!buf || fw_state_is_done(&buf->fw_st)) {
ret_count = -ENODEV;
goto out;
}
@@ -842,7 +906,7 @@ static ssize_t firmware_data_write(struct file *filp, struct kobject *kobj,

mutex_lock(&fw_lock);
buf = fw_priv->buf;
- if (!buf || test_bit(FW_STATUS_DONE, &buf->status)) {
+ if (!buf || fw_state_is_done(&buf->fw_st)) {
retval = -ENODEV;
goto out;
}
@@ -955,8 +1019,7 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
timeout = MAX_JIFFY_OFFSET;
}

- retval = wait_for_completion_interruptible_timeout(&buf->completion,
- timeout);
+ retval = fw_state_wait_timeout(&buf->fw_st, timeout);
if (retval == -ERESTARTSYS || !retval) {
mutex_lock(&fw_lock);
fw_load_abort(fw_priv);
@@ -965,7 +1028,7 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
retval = 0;
}

- if (is_fw_load_aborted(buf))
+ if (fw_state_is_aborted(&buf->fw_st))
retval = -EAGAIN;
else if (buf->is_paged_buf && !buf->data)
retval = -ENOMEM;
@@ -1040,29 +1103,25 @@ fw_load_from_user_helper(struct firmware *firmware, const char *name,
return -ENOENT;
}

-/* No abort during direct loading */
-#define is_fw_load_aborted(buf) false
-
#ifdef CONFIG_PM_SLEEP
static inline void kill_requests_without_uevent(void) { }
#endif

#endif /* CONFIG_FW_LOADER_USER_HELPER */

-
/* wait until the shared firmware_buf becomes ready (or error) */
static int sync_cached_firmware_buf(struct firmware_buf *buf)
{
int ret = 0;

mutex_lock(&fw_lock);
- while (!test_bit(FW_STATUS_DONE, &buf->status)) {
- if (is_fw_load_aborted(buf)) {
+ while (!fw_state_is_done(&buf->fw_st)) {
+ if (fw_state_is_aborted(&buf->fw_st)) {
ret = -ENOENT;
break;
}
mutex_unlock(&fw_lock);
- ret = wait_for_completion_interruptible(&buf->completion);
+ ret = fw_state_wait(&buf->fw_st);
mutex_lock(&fw_lock);
}
mutex_unlock(&fw_lock);
@@ -1120,7 +1179,7 @@ static int assign_firmware_buf(struct firmware *fw, struct device *device,
struct firmware_buf *buf = fw->priv;

mutex_lock(&fw_lock);
- if (!buf->size || is_fw_load_aborted(buf)) {
+ if (!buf->size || fw_state_is_aborted(&buf->fw_st)) {
mutex_unlock(&fw_lock);
return -ENOENT;
}
--
2.7.4

2016-10-20 09:53:39

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 3/6] firmware: rename fw_load_from_user_helper() and _request_firmware_load()

From: Daniel Wagner <[email protected]>

fw_load_from_user_helper() and _request_firmware_load() are used when
CONFIG_FW_LOADER_USER_HELPER is enabled. In order to clearly mark which
part of the code is depending on UMH we stream line these functions to
match with the rest of the code.

Suggested by Luis.

Signed-off-by: Daniel Wagner <[email protected]>
Acked-by: Luis R. Rodriguez <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/base/firmware_class.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index b7a9fa1c5197..a6916a7aaeb2 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -986,9 +986,8 @@ fw_create_instance(struct firmware *firmware, const char *fw_name,
return fw_priv;
}

-/* load a firmware via user helper */
-static int _request_firmware_load(struct firmware_priv *fw_priv,
- unsigned int opt_flags, long timeout)
+static int _request_firmware_umh(struct firmware_priv *fw_priv,
+ unsigned int opt_flags, long timeout)
{
int retval = 0;
struct device *f_dev = &fw_priv->dev;
@@ -1039,9 +1038,9 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
return retval;
}

-static int fw_load_from_user_helper(struct firmware *firmware,
- const char *name, struct device *device,
- unsigned int opt_flags, long timeout)
+static int fw_get_umh_firmware(struct firmware *firmware, const char *name,
+ struct device *device, unsigned int opt_flags,
+ long timeout)
{
struct firmware_priv *fw_priv;
int ret;
@@ -1070,7 +1069,7 @@ static int fw_load_from_user_helper(struct firmware *firmware,
}

fw_priv->buf = firmware->priv;
- ret = _request_firmware_load(fw_priv, opt_flags, timeout);
+ ret = _request_firmware_umh(fw_priv, opt_flags, timeout);

release_lock:
usermodehelper_read_unlock();
@@ -1096,9 +1095,9 @@ static void kill_requests_without_uevent(void)

#else /* CONFIG_FW_LOADER_USER_HELPER */
static inline int
-fw_load_from_user_helper(struct firmware *firmware, const char *name,
- struct device *device, unsigned int opt_flags,
- long timeout)
+fw_get_umh_firmware(struct firmware *firmware, const char *name,
+ struct device *device, unsigned int opt_flags,
+ long timeout)
{
return -ENOENT;
}
@@ -1242,8 +1241,8 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
name, ret);
if (opt_flags & FW_OPT_USERHELPER) {
dev_warn(device, "Falling back to user helper\n");
- ret = fw_load_from_user_helper(fw, name, device,
- opt_flags, timeout);
+ ret = fw_get_umh_firmware(fw, name, device,
+ opt_flags, timeout);
}
}

--
2.7.4

2016-11-09 21:17:47

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v6 1/6] firmware: move umh locking code into fw_load_from_user_helper()

Not trimming the thread as others were not Cc'd so letting them review
my review at the bottom. This review also includes quite a bit of
a huge summary of some serious complexities of the UMH which are not
really well understood or documented. This all relates to your patch
and the ongoing conversation on the pivot_root() races.

On Thu, Oct 20, 2016 at 11:52:07AM +0200, Daniel Wagner wrote:
> From: Daniel Wagner <[email protected]>
>
> When we load the firmware directly we don't need to take the umh
> lock(1). So move this part inside fw_load_from_user_helper which is only
> available when CONFIG_FW_LOADER_USER_HELPER is set.
>
> This avoids a dependency on firmware_loading_timeout() even when
> !CONFIG_FW_LOADER_USER_HELPER. firmware_loading_timeout() will be moved
> into the CONFIG_FW_LOADER_USER_HELPER section later.
>
> The usermodehelper locking code was added by b298d289c792 ("PM / Sleep:
> Fix freezer failures due to racy usermodehelper_is_disabled()").
>
> (1) Luis analized this:
>
> The original goal of the usermode helper lock came can be traced back in
> time via a144c6a ("PM: Print a warning if firmware is requested when
> task are frozen) when Rafael noticed drivers were calling to request
> firmware on *resume* calls! Why would that be an issue? It was because
> of the stupid delays incurred on resume when firmware *was not found*
> __due__ to the stupid user mode helper timeout delay and people
> believing this often meant users confusing resume *stalling* for resume
> was *broken! Later b298d289c7
> ("PM / Sleep: Fix freezer failures due to racy
> usermodehelper_is_disabled()") addressed races. That code in turn was
> later massaged into shape to better accept direct FS loading via commit
> 4e0c92d015235 ("firmware: Refactoring for splitting user-mode helper
> code").
>
> So for a while we've been nagging driver developers to not add
> request_firmware() calls on resume calls. In fact the
> drivers/base/firmware_class.c code *kills* firmware UMH calls when we go
> to suspend for the firmware cache, as such even on suspend its a stupid
> idea to use the UMH, not only because it can stall suspend but *also*
> because we kill these UMH calls.
>
> [...]
>
> So, as I see it the user mode helper lock should have *nothing* to do
> with the race between reading files from the kernel and having a path
> ready. If it was *used* for that, that was papering over the real
> issue. Its no different of a hack for instance as if a driver using
> request_firmware() tried to use async probe to avoid the same race. Yes
> it will help, but no, it does not mean it is deterministically safe.
>
> Reviewing commit 247bc03742545 ("PM / Sleep: Mitigate race between the freezer
> and request_firmware()") which originally extended umh state machine from
> just being enabled/disabled, with the concepts of UMH_ENABLED, UMH_FREEZING,
> UMH_DISABLED -- its goal was to prevent UMH uses during suspend. So -- the
> "UMH lock" on firmware was actually added to help avoid races between freezing
> and request_firmware(). We should not re-use UMH status notifiers when the
> firmware UMH is disabled for the same concepts -- if we needed such a concept
> then we should take this out from UMH code and generalize it.
>
> Signed-off-by: Daniel Wagner <[email protected]>
> Cc: Ming Lei <[email protected]>
> Cc: Luis R. Rodriguez <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> ---
> drivers/base/firmware_class.c | 52 +++++++++++++++++++++++--------------------
> 1 file changed, 28 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 960f8f7c7aa2..d4fee06b67f3 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -981,13 +981,38 @@ static int fw_load_from_user_helper(struct firmware *firmware,
> unsigned int opt_flags, long timeout)
> {
> struct firmware_priv *fw_priv;
> + int ret;
> +
> + timeout = firmware_loading_timeout();
> + if (opt_flags & FW_OPT_NOWAIT) {
> + timeout = usermodehelper_read_lock_wait(timeout);
> + if (!timeout) {
> + dev_dbg(device, "firmware: %s loading timed out\n",
> + name);
> + return -EBUSY;
> + }
> + } else {
> + ret = usermodehelper_read_trylock();
> + if (WARN_ON(ret)) {
> + dev_err(device, "firmware: %s will not be loaded\n",
> + name);
> + return ret;
> + }
> + }
>
> fw_priv = fw_create_instance(firmware, name, device, opt_flags);
> - if (IS_ERR(fw_priv))
> - return PTR_ERR(fw_priv);
> + if (IS_ERR(fw_priv)) {
> + ret = PTR_ERR(fw_priv);
> + goto release_lock;
> + }
>
> fw_priv->buf = firmware->priv;
> - return _request_firmware_load(fw_priv, opt_flags, timeout);
> + ret = _request_firmware_load(fw_priv, opt_flags, timeout);
> +
> +release_lock:
> + usermodehelper_read_unlock();
> +
> + return ret;
> }
>
> #ifdef CONFIG_PM_SLEEP
> @@ -1150,25 +1175,6 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
> if (ret <= 0) /* error or already assigned */
> goto out;
>
> - ret = 0;
> - timeout = firmware_loading_timeout();
> - if (opt_flags & FW_OPT_NOWAIT) {
> - timeout = usermodehelper_read_lock_wait(timeout);
> - if (!timeout) {
> - dev_dbg(device, "firmware: %s loading timed out\n",
> - name);
> - ret = -EBUSY;
> - goto out;
> - }
> - } else {
> - ret = usermodehelper_read_trylock();
> - if (WARN_ON(ret)) {
> - dev_err(device, "firmware: %s will not be loaded\n",
> - name);
> - goto out;
> - }
> - }
> -
> ret = fw_get_filesystem_firmware(device, fw->priv);
> if (ret) {
> if (!(opt_flags & FW_OPT_NO_WARN))
> @@ -1185,8 +1191,6 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
> if (!ret)
> ret = assign_firmware_buf(fw, device, opt_flags);
>
> - usermodehelper_read_unlock();
> -
> out:
> if (ret < 0) {
> release_firmware(fw);

While this move is genuinely correct, lets recap few things:

These days most distributions enable CONFIG_FW_LOADER_USER_HELPER
but disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK
the reason being a huge backlash against it. Furthermore its important
to recall that systemd udev removed the userspace firmware loader back
in 2014. Five important things to consider for this patch set in light
of this:

---
0) The firmware cache is setup by adding a devres entry for each device
that uses a sync call. If an async call is used the firmware cache is
only set up for adevice if the UMH was not explicitly requested, that is
if the second argument to request_firmware_nowait() is false. The firmware
devres entry is maintained throughout the lifetime of the device, so even
if you release_firmware() the firmware cache is still used on suspend today.

1) The UMH lock was originally added by Rafael to help splat a warning if the
firmware API was used on resume, this was due to race issues that started creeping
up and also that the firmware API with the usermode helper stalled suspend
as many folks were implementing their own caching techniques on suspend
and if this relied on the UMH and the firmware was not found it would stall
suspend...

The firmware cache cache added by Ming long ago fixed the race concerns
as it implemented our own cache solution using devres so drivers no longer
have to implement their own solution, it requests firmware for each device
we have added a firmware devres for (explained above on item 0) prior to
suspend, then upon resume if the driver request firmware it

Although the lock helps protect races against suspend/resume calls
the lock also prevents another race: looking for firmware not built-in
if boot is too early in the process, we enable moving forward with
with a direct filesystem lookup for firmware on init/main.c after
usermodehelper_enable() is called right before do_initcalls() -- before
we start calling each kernel level init calls. Naturally this introduces
some races if you are not using initramfs, these races are being discussed
elsewhere [0].

[0] https://lkml.kernel.org/r/[email protected]

2) The UMH lock is currently *always* used for both sync and async calls
whether or not the UMH is used or required but only after the built-in
firmware is checked, if the firmware is in built-in we move on with
life and give it immediately.

3) The UMH is *only* used if:

If you have CONFIG_FW_LOADER_USER_HELPER_FALLBACK enabled (most distros
disable this now) the UMH will be used on both all sync and async calls.
Because of this kernel configuration we cannot simply remove the UMH,
given some kernels exist where they may have relied upon every firmware
API call to use the UMH.

If you do not have CONFIG_FW_LOADER_USER_HELPER_FALLBACK but do have
CONFIG_FW_LOADER_USER_HELPER enabled you can only use and request the
UMH to be used *iff* an async call explicitly required it. Since
most Linux distributions fall in this build category in practice these
days there are only a few drivers explicitly asking for the UMH, last
I checked 2 drivers and they had a valid UMH requirement. Since I
had not heard of *anyone* complain about compartamentalizing the UMH
slowly, and given all the complexities above with the firmware cache
and history of systemd interaction with udev (now removed) I have been
working on grammar rules with Coccinelle to enable us to police for
new UMH callers, and white-list with annotations only the callers that
really need the UMH (2 drivers).
---

There's two subtle issues then with moving the UMH lock to only the code
paths that need the UMH -- since the UMH lock was done to help warn of
firmware API callers on resume (but we now have a fw cache solution) *and*
races on init, moving the UMH lock to only the code that needs the UMH
could introduce races for *non-UMH* callers on init. The fw cache implemented
should help issues on suspend/resume as drivers no longer need to implement
their own fw calls on suspend/resume if they do not depend on the UMH,
but races / issues are still possible on init if we remove the lock for them.
To move the UMH lock then we need a replacement for this early race,
but if you think about it -- all things reading from the FS need this
race addressed as well, so now all callers of kernel_read_file_from_path().
Fixing this generally is not something I think we can do easily at this
point though so I'd be content if you fix this only for the firmware_class
as that is the only user of the UMH lock, the goal would be to remove it.
If you want to be adventurous I suppose you can try to address this generally.

There is technically also an issue for drivers that rely on the UMH and
races of using the firmware API upon suspend and resume, the fw cache
cannot work for them so they must implement their own solution and if they
issue a fw call on suspend or resume they may race against the filesystem
either being gone or not ready, respectively. To address this I think we
should consider using the filesystem suspend freeze_super() so we queue
up requests as we're going to suspend, this should in theory not be needed
once Jiri's work on freeze_all_supers() is in place though which I think lets
us get this automatically (?)

Luis