LinuxLists.cc - [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback

2021-04-14 14:55:50

Subject: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

This use-after-free happens when a fw_priv object has been freed but
hasn't been removed from the pending list (pending_fw_head). The next
time fw_load_sysfs_fallback tries to insert into the list, it ends up
accessing the pending_list member of the previoiusly freed fw_priv.

The root cause here is that all code paths that abort the fw load
don't delete it from the pending list. For example:

_request_firmware()
-> fw_abort_batch_reqs()
-> fw_state_aborted()

To fix this, delete the fw_priv from the list in __fw_set_state() if
the new state is DONE or ABORTED. This way, all aborts will remove
the fw_priv from the list. Accordingly, remove calls to list_del_init
that were being made before calling fw_state_(aborted|done).

Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
list if it is already aborted. Instead, just jump out and return early.

Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
Reported-by: [email protected]
Tested-by: [email protected]
Signed-off-by: Anirudh Rayabharam <[email protected]>
---

Changes in v3:
Modified the patch to incorporate suggestions by Luis Chamberlain in
order to fix the root cause instead of applying a "band-aid" kind of
fix.
https://lore.kernel.org/lkml/[email protected]/

Changes in v2:
1. Fixed 1 error and 1 warning (in the commit message) reported by
checkpatch.pl. The error was regarding the format for referring to
another commit "commit <sha> ("oneline")". The warning was for line
longer than 75 chars.

---
drivers/base/firmware_loader/fallback.c | 8 ++++++--
drivers/base/firmware_loader/firmware.h | 6 +++++-
drivers/base/firmware_loader/main.c | 2 ++
3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
index 91899d185e31..73581b6998b4 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
if (fw_sysfs_done(fw_priv))
return;

- list_del_init(&fw_priv->pending_list);
fw_state_aborted(fw_priv);
}

@@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
* Same logic as fw_load_abort, only the DONE bit
* is ignored and we set ABORT only on failure.
*/
- list_del_init(&fw_priv->pending_list);
if (rc) {
fw_state_aborted(fw_priv);
written = rc;
@@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
}

mutex_lock(&fw_lock);
+ if (fw_state_is_aborted(fw_priv)) {
+ mutex_unlock(&fw_lock);
+ retval = -EAGAIN;
+ goto out;
+ }
list_add(&fw_priv->pending_list, &pending_fw_head);
mutex_unlock(&fw_lock);

@@ -540,6 +543,7 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
} else if (fw_priv->is_paged_buf && !fw_priv->data)
retval = -ENOMEM;

+out:
device_del(f_dev);
err_put_dev:
put_device(f_dev);
diff --git a/drivers/base/firmware_loader/firmware.h b/drivers/base/firmware_loader/firmware.h
index 63bd29fdcb9c..36bdb413c998 100644
--- a/drivers/base/firmware_loader/firmware.h
+++ b/drivers/base/firmware_loader/firmware.h
@@ -117,8 +117,12 @@ static inline void __fw_state_set(struct fw_priv *fw_priv,

WRITE_ONCE(fw_st->status, status);

- if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED)
+ if (status == FW_STATUS_DONE || status == FW_STATUS_ABORTED) {
+#ifdef CONFIG_FW_LOADER_USER_HELPER
+ list_del_init(&fw_priv->pending_list);
+#endif
complete_all(&fw_st->completion);
+ }
}

static inline void fw_state_aborted(struct fw_priv *fw_priv)
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index 4fdb8219cd08..68c549d71230 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -783,8 +783,10 @@ static void fw_abort_batch_reqs(struct firmware *fw)
return;

fw_priv = fw->priv;
+ mutex_lock(&fw_lock);
if (!fw_state_is_aborted(fw_priv))
fw_state_aborted(fw_priv);
+ mutex_unlock(&fw_lock);
}

/* called from request_firmware() and request_firmware_work_func() */
--
2.26.2

2021-04-14 15:54:28

by Luis Chamberlain

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

Shuah, a question for you toward the end here.

On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
> This use-after-free happens when a fw_priv object has been freed but
> hasn't been removed from the pending list (pending_fw_head). The next
> time fw_load_sysfs_fallback tries to insert into the list, it ends up
> accessing the pending_list member of the previoiusly freed fw_priv.
>
> The root cause here is that all code paths that abort the fw load
> don't delete it from the pending list. For example:
>
> _request_firmware()
> -> fw_abort_batch_reqs()
> -> fw_state_aborted()
>
> To fix this, delete the fw_priv from the list in __fw_set_state() if
> the new state is DONE or ABORTED. This way, all aborts will remove
> the fw_priv from the list. Accordingly, remove calls to list_del_init
> that were being made before calling fw_state_(aborted|done).
>
> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
> list if it is already aborted. Instead, just jump out and return early.
>
> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
> Reported-by: [email protected]
> Tested-by: [email protected]
> Signed-off-by: Anirudh Rayabharam <[email protected]>
> ---
>
> Changes in v3:
> Modified the patch to incorporate suggestions by Luis Chamberlain in
> order to fix the root cause instead of applying a "band-aid" kind of
> fix.
> https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v2:
> 1. Fixed 1 error and 1 warning (in the commit message) reported by
> checkpatch.pl. The error was regarding the format for referring to
> another commit "commit <sha> ("oneline")". The warning was for line
> longer than 75 chars.
>
> ---
> drivers/base/firmware_loader/fallback.c | 8 ++++++--
> drivers/base/firmware_loader/firmware.h | 6 +++++-
> drivers/base/firmware_loader/main.c | 2 ++
> 3 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
> index 91899d185e31..73581b6998b4 100644
> --- a/drivers/base/firmware_loader/fallback.c
> +++ b/drivers/base/firmware_loader/fallback.c
> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
> if (fw_sysfs_done(fw_priv))
> return;
>
> - list_del_init(&fw_priv->pending_list);
> fw_state_aborted(fw_priv);
> }
>
> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
> * Same logic as fw_load_abort, only the DONE bit
> * is ignored and we set ABORT only on failure.
> */
> - list_del_init(&fw_priv->pending_list);
> if (rc) {
> fw_state_aborted(fw_priv);
> written = rc;
> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
> }
>
> mutex_lock(&fw_lock);
> + if (fw_state_is_aborted(fw_priv)) {
> + mutex_unlock(&fw_lock);
> + retval = -EAGAIN;
> + goto out;
> + }

Thanks for the quick follow up!

This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
signal abort on fallback mechanism") which I had mentioned in my follow
up email you posted a link to. It would regress it since the condition
is just being met earlier and you nullify the effort. So essentially
on Android you would make not being able to detect signal handlers
like the SIGCHLD signal sent to init, if init was the same process
dealing with the sysfs fallback firmware upload.

The way I dealt with this in my patch was I decided to return -EINTR
in the earlier case in the hunk you added, instead of -EAGAIN. In
addition to this, later on fw_load_sysfs_fallback() when
fw_sysfs_wait_timeout() is used that would also deal with checking
for error codes on wait, and only then check if it was a signal
that cancelled things (the check for -ERESTARTSYS). We therefore
only send to userspace -EAGAIN when the wait really did hit the
timeout.

But also note that my change added a check for
fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
as that was a recently intended goal.

In either case I documented well *why* we do these error checks
before sending a code to userspace on fw_sysfs_wait_timeout() since
otherwise it would be easy to regress that code, so please also
document that as I did.

I'll re-iterate again also:

Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
_request_firmware_load() return val for fw load abort") also wanted to
distinguish the timeout vs -ENOMEM, but for some reason in the timeout
case -EAGAIN was being sent back to userspace. I am no longer sure if
that is a good idea, but since we started doing that at some point I
guess we want to keep that behaviour.

Shuah, can you think of any reason to retain -EAGAIN other than you
introduced it here? If there's no real good reason I think it can
simplify the error handling here. But, we *would* change what we do
to userspace... and for that reason we may have to live with it.

Luis

2021-04-14 17:15:58

by Shuah Khan

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On 4/14/21 6:55 AM, Luis Chamberlain wrote:
> Shuah, a question for you toward the end here.
>
> On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
>> This use-after-free happens when a fw_priv object has been freed but
>> hasn't been removed from the pending list (pending_fw_head). The next
>> time fw_load_sysfs_fallback tries to insert into the list, it ends up
>> accessing the pending_list member of the previoiusly freed fw_priv.
>>
>> The root cause here is that all code paths that abort the fw load
>> don't delete it from the pending list. For example:
>>
>> _request_firmware()
>> -> fw_abort_batch_reqs()
>> -> fw_state_aborted()
>>
>> To fix this, delete the fw_priv from the list in __fw_set_state() if
>> the new state is DONE or ABORTED. This way, all aborts will remove
>> the fw_priv from the list. Accordingly, remove calls to list_del_init
>> that were being made before calling fw_state_(aborted|done).
>>
>> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
>> list if it is already aborted. Instead, just jump out and return early.
>>
>> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
>> Reported-by: [email protected]
>> Tested-by: [email protected]
>> Signed-off-by: Anirudh Rayabharam <[email protected]>
>> ---
>>
>> Changes in v3:
>> Modified the patch to incorporate suggestions by Luis Chamberlain in
>> order to fix the root cause instead of applying a "band-aid" kind of
>> fix.
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Changes in v2:
>> 1. Fixed 1 error and 1 warning (in the commit message) reported by
>> checkpatch.pl. The error was regarding the format for referring to
>> another commit "commit <sha> ("oneline")". The warning was for line
>> longer than 75 chars.
>>
>> ---
>> drivers/base/firmware_loader/fallback.c | 8 ++++++--
>> drivers/base/firmware_loader/firmware.h | 6 +++++-
>> drivers/base/firmware_loader/main.c | 2 ++
>> 3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
>> index 91899d185e31..73581b6998b4 100644
>> --- a/drivers/base/firmware_loader/fallback.c
>> +++ b/drivers/base/firmware_loader/fallback.c
>> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
>> if (fw_sysfs_done(fw_priv))
>> return;
>>
>> - list_del_init(&fw_priv->pending_list);
>> fw_state_aborted(fw_priv);
>> }
>>
>> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
>> * Same logic as fw_load_abort, only the DONE bit
>> * is ignored and we set ABORT only on failure.
>> */
>> - list_del_init(&fw_priv->pending_list);
>> if (rc) {
>> fw_state_aborted(fw_priv);
>> written = rc;
>> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
>> }
>>
>> mutex_lock(&fw_lock);
>> + if (fw_state_is_aborted(fw_priv)) {
>> + mutex_unlock(&fw_lock);
>> + retval = -EAGAIN;
>> + goto out;
>> + }
>
> Thanks for the quick follow up!
>
> This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
> signal abort on fallback mechanism") which I had mentioned in my follow
> up email you posted a link to. It would regress it since the condition
> is just being met earlier and you nullify the effort. So essentially
> on Android you would make not being able to detect signal handlers
> like the SIGCHLD signal sent to init, if init was the same process
> dealing with the sysfs fallback firmware upload.
>
> The way I dealt with this in my patch was I decided to return -EINTR
> in the earlier case in the hunk you added, instead of -EAGAIN. In
> addition to this, later on fw_load_sysfs_fallback() when
> fw_sysfs_wait_timeout() is used that would also deal with checking
> for error codes on wait, and only then check if it was a signal
> that cancelled things (the check for -ERESTARTSYS). We therefore
> only send to userspace -EAGAIN when the wait really did hit the
> timeout.
>
> But also note that my change added a check for
> fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
> as that was a recently intended goal.
>
> In either case I documented well *why* we do these error checks
> before sending a code to userspace on fw_sysfs_wait_timeout() since
> otherwise it would be easy to regress that code, so please also
> document that as I did.
>
> I'll re-iterate again also:
>
> Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
> _request_firmware_load() return val for fw load abort") also wanted to
> distinguish the timeout vs -ENOMEM, but for some reason in the timeout
> case -EAGAIN was being sent back to userspace. I am no longer sure if
> that is a good idea, but since we started doing that at some point I
> guess we want to keep that behaviour.
>
> Shuah, can you think of any reason to retain -EAGAIN other than you
> introduced it here? If there's no real good reason I think it can
> simplify the error handling here. But, we *would* change what we do
> to userspace... and for that reason we may have to live with it.
>

As I recall the reason for this patch was to be able to differentiate
between timing out vs no memory case when driver was attempting to
load firmware. I wish I added why to the change log.

The code seems to have changed a lot since my commit. I will take a look
at the closely and let you know if this is still necessary late on
today.

thanks,
-- Shuah

2021-04-15 06:06:18

by Anirudh Rayabharam

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On Wed, Apr 14, 2021 at 12:55:40PM +0000, Luis Chamberlain wrote:
> Shuah, a question for you toward the end here.
>
> On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
> > This use-after-free happens when a fw_priv object has been freed but
> > hasn't been removed from the pending list (pending_fw_head). The next
> > time fw_load_sysfs_fallback tries to insert into the list, it ends up
> > accessing the pending_list member of the previoiusly freed fw_priv.
> >
> > The root cause here is that all code paths that abort the fw load
> > don't delete it from the pending list. For example:
> >
> > _request_firmware()
> > -> fw_abort_batch_reqs()
> > -> fw_state_aborted()
> >
> > To fix this, delete the fw_priv from the list in __fw_set_state() if
> > the new state is DONE or ABORTED. This way, all aborts will remove
> > the fw_priv from the list. Accordingly, remove calls to list_del_init
> > that were being made before calling fw_state_(aborted|done).
> >
> > Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
> > list if it is already aborted. Instead, just jump out and return early.
> >
> > Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
> > Reported-by: [email protected]
> > Tested-by: [email protected]
> > Signed-off-by: Anirudh Rayabharam <[email protected]>
> > ---
> >
> > Changes in v3:
> > Modified the patch to incorporate suggestions by Luis Chamberlain in
> > order to fix the root cause instead of applying a "band-aid" kind of
> > fix.
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v2:
> > 1. Fixed 1 error and 1 warning (in the commit message) reported by
> > checkpatch.pl. The error was regarding the format for referring to
> > another commit "commit <sha> ("oneline")". The warning was for line
> > longer than 75 chars.
> >
> > ---
> > drivers/base/firmware_loader/fallback.c | 8 ++++++--
> > drivers/base/firmware_loader/firmware.h | 6 +++++-
> > drivers/base/firmware_loader/main.c | 2 ++
> > 3 files changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
> > index 91899d185e31..73581b6998b4 100644
> > --- a/drivers/base/firmware_loader/fallback.c
> > +++ b/drivers/base/firmware_loader/fallback.c
> > @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
> > if (fw_sysfs_done(fw_priv))
> > return;
> >
> > - list_del_init(&fw_priv->pending_list);
> > fw_state_aborted(fw_priv);
> > }
> >
> > @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
> > * Same logic as fw_load_abort, only the DONE bit
> > * is ignored and we set ABORT only on failure.
> > */
> > - list_del_init(&fw_priv->pending_list);
> > if (rc) {
> > fw_state_aborted(fw_priv);
> > written = rc;
> > @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
> > }
> >
> > mutex_lock(&fw_lock);
> > + if (fw_state_is_aborted(fw_priv)) {
> > + mutex_unlock(&fw_lock);
> > + retval = -EAGAIN;
> > + goto out;
> > + }
>
> Thanks for the quick follow up!
>
> This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
> signal abort on fallback mechanism") which I had mentioned in my follow
> up email you posted a link to. It would regress it since the condition
> is just being met earlier and you nullify the effort. So essentially
> on Android you would make not being able to detect signal handlers
> like the SIGCHLD signal sent to init, if init was the same process
> dealing with the sysfs fallback firmware upload.

Thanks for the detailed comments, Luis!

I don't see how my patch changes existing error code behaviour. Even
without my patch this function would return -EAGAIN if the fw is already
aborted. Without my patch, it would call fw_sysfs_wait_timeout() which
would return -ENOENT (because fw is already aborted) as follows:

ret = wait_for_completion_killable_timeout(...)
if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
return -ENOENT;
if (!ret)
return -ETIMEDOUT;

return ret < 0 ? ret : 0;

Now, this -ENOENT gets converted to -EAGAIN due to this piece of code
in fw_load_sysfs_fallback():

if (fw_state_is_aborted(fw_priv)) {
if (retval == -ERESTARTSYS)
retval = -EINTR;
else
retval = -EAGAIN;
} else if (fw_priv->is_paged_buf && !fw_priv->data)
retval = -ENOMEM;

So, at the end, fw_load_sysfs_fallback() returns -EAGAIN for the case
where the fw is already aborted. Which is what I did in my patch. So, my
patch doesn't seem to regress anything. If this is not the intended behavior
then it means that things are already regressed and not because of this
patch.

Please do correct me if I missed something here.

>
> The way I dealt with this in my patch was I decided to return -EINTR
> in the earlier case in the hunk you added, instead of -EAGAIN. In
> addition to this, later on fw_load_sysfs_fallback() when
> fw_sysfs_wait_timeout() is used that would also deal with checking
> for error codes on wait, and only then check if it was a signal
> that cancelled things (the check for -ERESTARTSYS). We therefore
> only send to userspace -EAGAIN when the wait really did hit the
> timeout.
>
> But also note that my change added a check for
> fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
> as that was a recently intended goal.
>
> In either case I documented well *why* we do these error checks
> before sending a code to userspace on fw_sysfs_wait_timeout() since
> otherwise it would be easy to regress that code, so please also
> document that as I did.

The goal of this patch is to fix the UAF reported by syzbot.

I am okay with simply documenting the reasons behind error codes, but I
would rather not do any more refactoring of the error handling code in
this patch since it doesn't directly contribute to fixing the uaf.

Does that sound reasonable?

Thanks!

- Anirudh.

2021-04-15 14:14:13

by Shuah Khan

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On 4/14/21 9:26 AM, Shuah Khan wrote:
> On 4/14/21 6:55 AM, Luis Chamberlain wrote:
>> Shuah, a question for you toward the end here.
>>
>> On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
>>> This use-after-free happens when a fw_priv object has been freed but
>>> hasn't been removed from the pending list (pending_fw_head). The next
>>> time fw_load_sysfs_fallback tries to insert into the list, it ends up
>>> accessing the pending_list member of the previoiusly freed fw_priv.
>>>
>>> The root cause here is that all code paths that abort the fw load
>>> don't delete it from the pending list. For example:
>>>
>>>     _request_firmware()
>>>       -> fw_abort_batch_reqs()
>>>           -> fw_state_aborted()
>>>
>>> To fix this, delete the fw_priv from the list in __fw_set_state() if
>>> the new state is DONE or ABORTED. This way, all aborts will remove
>>> the fw_priv from the list. Accordingly, remove calls to list_del_init
>>> that were being made before calling fw_state_(aborted|done).
>>>
>>> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
>>> list if it is already aborted. Instead, just jump out and return early.
>>>
>>> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with
>>> fw_load_sysfs_fallback")
>>> Reported-by: [email protected]
>>> Tested-by: [email protected]
>>> Signed-off-by: Anirudh Rayabharam <[email protected]>
>>> ---
>>>
>>> Changes in v3:
>>> Modified the patch to incorporate suggestions by Luis Chamberlain in
>>> order to fix the root cause instead of applying a "band-aid" kind of
>>> fix.
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Changes in v2:
>>> 1. Fixed 1 error and 1 warning (in the commit message) reported by
>>> checkpatch.pl. The error was regarding the format for referring to
>>> another commit "commit <sha> ("oneline")". The warning was for line
>>> longer than 75 chars.
>>>
>>> ---
>>> drivers/base/firmware_loader/fallback.c | 8 ++++++--
>>> drivers/base/firmware_loader/firmware.h | 6 +++++-
>>> drivers/base/firmware_loader/main.c     | 2 ++
>>> 3 files changed, 13 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/base/firmware_loader/fallback.c
>>> b/drivers/base/firmware_loader/fallback.c
>>> index 91899d185e31..73581b6998b4 100644
>>> --- a/drivers/base/firmware_loader/fallback.c
>>> +++ b/drivers/base/firmware_loader/fallback.c
>>> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
>>>       if (fw_sysfs_done(fw_priv))
>>>           return;
>>> -    list_del_init(&fw_priv->pending_list);
>>>       fw_state_aborted(fw_priv);
>>> }
>>> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct
>>> device *dev,
>>>                * Same logic as fw_load_abort, only the DONE bit
>>>                * is ignored and we set ABORT only on failure.
>>>                */
>>> -            list_del_init(&fw_priv->pending_list);
>>>               if (rc) {
>>>                   fw_state_aborted(fw_priv);
>>>                   written = rc;
>>> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct
>>> fw_sysfs *fw_sysfs, long timeout)
>>>       }
>>>       mutex_lock(&fw_lock);
>>> +    if (fw_state_is_aborted(fw_priv)) {
>>> +        mutex_unlock(&fw_lock);
>>> +        retval = -EAGAIN;
>>> +        goto out;
>>> +    }
>>
>> Thanks for the quick follow up!
>>
>> This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
>> signal abort on fallback mechanism") which I had mentioned in my follow
>> up email you posted a link to. It would regress it since the condition
>> is just being met earlier and you nullify the effort. So essentially
>> on Android you would make not being able to detect signal handlers
>> like the SIGCHLD signal sent to init, if init was the same process
>> dealing with the sysfs fallback firmware upload.
>>
>> The way I dealt with this in my patch was I decided to return -EINTR
>> in the earlier case in the hunk you added, instead of -EAGAIN. In
>> addition to this, later on fw_load_sysfs_fallback() when
>> fw_sysfs_wait_timeout() is used that would also deal with checking
>> for error codes on wait, and only then check if it was a signal
>> that cancelled things (the check for -ERESTARTSYS). We therefore
>> only send to userspace -EAGAIN when the wait really did hit the
>> timeout.
>>
>> But also note that my change added a check for
>> fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
>> as that was a recently intended goal.
>>
>> In either case I documented well *why* we do these error checks
>> before sending a code to userspace on fw_sysfs_wait_timeout() since
>> otherwise it would be easy to regress that code, so please also
>> document that as I did.
>>
>> I'll re-iterate again also:
>>
>>     Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
>>     _request_firmware_load() return val for fw load abort") also
>> wanted to
>>     distinguish the timeout vs -ENOMEM, but for some reason in the
>> timeout
>>     case -EAGAIN was being sent back to userspace. I am no longer sure if
>>     that is a good idea, but since we started doing that at some point I
>>     guess we want to keep that behaviour.
>>
>> Shuah, can you think of any reason to retain -EAGAIN other than you
>> introduced it here? If there's no real good reason I think it can
>> simplify the error handling here. But, we *would* change what we do
>> to userspace... and for that reason we may have to live with it.
>>
>
> As I recall the reason for this patch was to be able to differentiate
> between timing out vs no memory case when driver was attempting to
> load firmware. I wish I added why to the change log.
>
> The code seems to have changed a lot since my commit. I will take a look
> at the closely and let you know if this is still necessary late on
> today.
>

Luis,

I did some digging and figured out why I added this timeout logic. Media
drivers expect request_firmware() timeout or fail.

It turns out I have notes saved on this problem.

When drivers attempt to load firmware while ext_fs is going through
recovery during boot or resume (after hibernate or suspend),
request_firmware() calls fail without this timeout handling leaving
drivers and devices that need firmware loaded in failed state.

I added this timeout handling so drivers can retry loading the firmware.
Several media drivers retry based on the timeoout return. This helps
differentiate timeout and other failures.

We still need this logic or at least we can't delete without verifying
that is indeed not needed.

thanks,
-- Shuah

2021-04-23 18:42:24

by Luis Chamberlain

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On Wed, Apr 14, 2021 at 09:26:55AM -0600, Shuah Khan wrote:
> On 4/14/21 6:55 AM, Luis Chamberlain wrote:
> > In either case I documented well *why* we do these error checks
> > before sending a code to userspace on fw_sysfs_wait_timeout() since
> > otherwise it would be easy to regress that code, so please also
> > document that as I did.
> >
> > I'll re-iterate again also:
> >
> > Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
> > _request_firmware_load() return val for fw load abort") also wanted to
> > distinguish the timeout vs -ENOMEM, but for some reason in the timeout
> > case -EAGAIN was being sent back to userspace. I am no longer sure if
> > that is a good idea, but since we started doing that at some point I
> > guess we want to keep that behaviour.
> >
> > Shuah, can you think of any reason to retain -EAGAIN other than you
> > introduced it here? If there's no real good reason I think it can
> > simplify the error handling here. But, we *would* change what we do
> > to userspace... and for that reason we may have to live with it.
> >
>
> As I recall the reason for this patch was to be able to differentiate
> between timing out vs no memory case when driver was attempting to
> load firmware. I wish I added why to the change log.
>
> The code seems to have changed a lot since my commit. I will take a look
> at the closely and let you know if this is still necessary late on
> today.

Shuah, *poke*

Luis

2021-04-23 18:47:08

by Shuah Khan

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On 4/15/21 8:10 AM, Shuah Khan wrote:
> On 4/14/21 9:26 AM, Shuah Khan wrote:
>> On 4/14/21 6:55 AM, Luis Chamberlain wrote:
>>> Shuah, a question for you toward the end here.
>>>
>>> On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
>>>> This use-after-free happens when a fw_priv object has been freed but
>>>> hasn't been removed from the pending list (pending_fw_head). The next
>>>> time fw_load_sysfs_fallback tries to insert into the list, it ends up
>>>> accessing the pending_list member of the previoiusly freed fw_priv.
>>>>
>>>> The root cause here is that all code paths that abort the fw load
>>>> don't delete it from the pending list. For example:
>>>>
>>>>     _request_firmware()
>>>>       -> fw_abort_batch_reqs()
>>>>           -> fw_state_aborted()
>>>>
>>>> To fix this, delete the fw_priv from the list in __fw_set_state() if
>>>> the new state is DONE or ABORTED. This way, all aborts will remove
>>>> the fw_priv from the list. Accordingly, remove calls to list_del_init
>>>> that were being made before calling fw_state_(aborted|done).
>>>>
>>>> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
>>>> list if it is already aborted. Instead, just jump out and return early.
>>>>
>>>> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with
>>>> fw_load_sysfs_fallback")
>>>> Reported-by: [email protected]
>>>> Tested-by: [email protected]
>>>> Signed-off-by: Anirudh Rayabharam <[email protected]>
>>>> ---
>>>>
>>>> Changes in v3:
>>>> Modified the patch to incorporate suggestions by Luis Chamberlain in
>>>> order to fix the root cause instead of applying a "band-aid" kind of
>>>> fix.
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> Changes in v2:
>>>> 1. Fixed 1 error and 1 warning (in the commit message) reported by
>>>> checkpatch.pl. The error was regarding the format for referring to
>>>> another commit "commit <sha> ("oneline")". The warning was for line
>>>> longer than 75 chars.
>>>>
>>>> ---
>>>> drivers/base/firmware_loader/fallback.c | 8 ++++++--
>>>> drivers/base/firmware_loader/firmware.h | 6 +++++-
>>>> drivers/base/firmware_loader/main.c     | 2 ++
>>>> 3 files changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/base/firmware_loader/fallback.c
>>>> b/drivers/base/firmware_loader/fallback.c
>>>> index 91899d185e31..73581b6998b4 100644
>>>> --- a/drivers/base/firmware_loader/fallback.c
>>>> +++ b/drivers/base/firmware_loader/fallback.c
>>>> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
>>>>       if (fw_sysfs_done(fw_priv))
>>>>           return;
>>>> -    list_del_init(&fw_priv->pending_list);
>>>>       fw_state_aborted(fw_priv);
>>>> }
>>>> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct
>>>> device *dev,
>>>>                * Same logic as fw_load_abort, only the DONE bit
>>>>                * is ignored and we set ABORT only on failure.
>>>>                */
>>>> -            list_del_init(&fw_priv->pending_list);
>>>>               if (rc) {
>>>>                   fw_state_aborted(fw_priv);
>>>>                   written = rc;
>>>> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct
>>>> fw_sysfs *fw_sysfs, long timeout)
>>>>       }
>>>>       mutex_lock(&fw_lock);
>>>> +    if (fw_state_is_aborted(fw_priv)) {
>>>> +        mutex_unlock(&fw_lock);
>>>> +        retval = -EAGAIN;
>>>> +        goto out;
>>>> +    }
>>>
>>> Thanks for the quick follow up!
>>>
>>> This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
>>> signal abort on fallback mechanism") which I had mentioned in my follow
>>> up email you posted a link to. It would regress it since the condition
>>> is just being met earlier and you nullify the effort. So essentially
>>> on Android you would make not being able to detect signal handlers
>>> like the SIGCHLD signal sent to init, if init was the same process
>>> dealing with the sysfs fallback firmware upload.
>>>
>>> The way I dealt with this in my patch was I decided to return -EINTR
>>> in the earlier case in the hunk you added, instead of -EAGAIN. In
>>> addition to this, later on fw_load_sysfs_fallback() when
>>> fw_sysfs_wait_timeout() is used that would also deal with checking
>>> for error codes on wait, and only then check if it was a signal
>>> that cancelled things (the check for -ERESTARTSYS). We therefore
>>> only send to userspace -EAGAIN when the wait really did hit the
>>> timeout.
>>>
>>> But also note that my change added a check for
>>> fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
>>> as that was a recently intended goal.
>>>
>>> In either case I documented well *why* we do these error checks
>>> before sending a code to userspace on fw_sysfs_wait_timeout() since
>>> otherwise it would be easy to regress that code, so please also
>>> document that as I did.
>>>
>>> I'll re-iterate again also:
>>>
>>>     Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
>>>     _request_firmware_load() return val for fw load abort") also
>>> wanted to
>>>     distinguish the timeout vs -ENOMEM, but for some reason in the
>>> timeout
>>>     case -EAGAIN was being sent back to userspace. I am no longer
>>> sure if
>>>     that is a good idea, but since we started doing that at some point I
>>>     guess we want to keep that behaviour.
>>>
>>> Shuah, can you think of any reason to retain -EAGAIN other than you
>>> introduced it here? If there's no real good reason I think it can
>>> simplify the error handling here. But, we *would* change what we do
>>> to userspace... and for that reason we may have to live with it.
>>>
>>
>> As I recall the reason for this patch was to be able to differentiate
>> between timing out vs no memory case when driver was attempting to
>> load firmware. I wish I added why to the change log.
>>
>> The code seems to have changed a lot since my commit. I will take a look
>> at the closely and let you know if this is still necessary late on
>> today.
>>
>
> Luis,
>
> I did some digging and figured out why I added this timeout logic. Media
> drivers expect request_firmware() timeout or fail.
>
> It turns out I have notes saved on this problem.
>
> When drivers attempt to load firmware while ext_fs is going through
> recovery during boot or resume (after hibernate or suspend),
> request_firmware() calls fail without this timeout handling leaving
> drivers and devices that need firmware loaded in failed state.
>
> I added this timeout handling so drivers can retry loading the firmware.
> Several media drivers retry based on the timeoout return. This helps
> differentiate timeout and other failures.
>
> We still need this logic or at least we can't delete without verifying
> that is indeed not needed.
>

Doesn't look like this manded in your Inbox?

thanks,
-- Shuah

2021-04-23 18:48:09

by Shuah Khan

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On 4/23/21 12:40 PM, Luis Chamberlain wrote:
> On Wed, Apr 14, 2021 at 09:26:55AM -0600, Shuah Khan wrote:
>> On 4/14/21 6:55 AM, Luis Chamberlain wrote:
>>> In either case I documented well *why* we do these error checks
>>> before sending a code to userspace on fw_sysfs_wait_timeout() since
>>> otherwise it would be easy to regress that code, so please also
>>> document that as I did.
>>>
>>> I'll re-iterate again also:
>>>
>>> Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
>>> _request_firmware_load() return val for fw load abort") also wanted to
>>> distinguish the timeout vs -ENOMEM, but for some reason in the timeout
>>> case -EAGAIN was being sent back to userspace. I am no longer sure if
>>> that is a good idea, but since we started doing that at some point I
>>> guess we want to keep that behaviour.
>>>
>>> Shuah, can you think of any reason to retain -EAGAIN other than you
>>> introduced it here? If there's no real good reason I think it can
>>> simplify the error handling here. But, we *would* change what we do
>>> to userspace... and for that reason we may have to live with it.
>>>
>>
>> As I recall the reason for this patch was to be able to differentiate
>> between timing out vs no memory case when driver was attempting to
>> load firmware. I wish I added why to the change log.
>>
>> The code seems to have changed a lot since my commit. I will take a look
>> at the closely and let you know if this is still necessary late on
>> today.
>
> Shuah, *poke*
>
Luis,

I responded to you a week ago. Let me resend the message.

thanks,
-- Shuah

2021-04-30 02:52:46

by Anirudh Rayabharam

[permalink] [raw]

Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

On Thu, Apr 15, 2021 at 11:35:12AM +0530, Anirudh Rayabharam wrote:
> On Wed, Apr 14, 2021 at 12:55:40PM +0000, Luis Chamberlain wrote:
> > Shuah, a question for you toward the end here.
> >
> > On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
> > > This use-after-free happens when a fw_priv object has been freed but
> > > hasn't been removed from the pending list (pending_fw_head). The next
> > > time fw_load_sysfs_fallback tries to insert into the list, it ends up
> > > accessing the pending_list member of the previoiusly freed fw_priv.
> > >
> > > The root cause here is that all code paths that abort the fw load
> > > don't delete it from the pending list. For example:
> > >
> > > _request_firmware()
> > > -> fw_abort_batch_reqs()
> > > -> fw_state_aborted()
> > >
> > > To fix this, delete the fw_priv from the list in __fw_set_state() if
> > > the new state is DONE or ABORTED. This way, all aborts will remove
> > > the fw_priv from the list. Accordingly, remove calls to list_del_init
> > > that were being made before calling fw_state_(aborted|done).
> > >
> > > Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
> > > list if it is already aborted. Instead, just jump out and return early.
> > >
> > > Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
> > > Reported-by: [email protected]
> > > Tested-by: [email protected]
> > > Signed-off-by: Anirudh Rayabharam <[email protected]>
> > > ---
> > >
> > > Changes in v3:
> > > Modified the patch to incorporate suggestions by Luis Chamberlain in
> > > order to fix the root cause instead of applying a "band-aid" kind of
> > > fix.
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > Changes in v2:
> > > 1. Fixed 1 error and 1 warning (in the commit message) reported by
> > > checkpatch.pl. The error was regarding the format for referring to
> > > another commit "commit <sha> ("oneline")". The warning was for line
> > > longer than 75 chars.
> > >
> > > ---
> > > drivers/base/firmware_loader/fallback.c | 8 ++++++--
> > > drivers/base/firmware_loader/firmware.h | 6 +++++-
> > > drivers/base/firmware_loader/main.c | 2 ++
> > > 3 files changed, 13 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
> > > index 91899d185e31..73581b6998b4 100644
> > > --- a/drivers/base/firmware_loader/fallback.c
> > > +++ b/drivers/base/firmware_loader/fallback.c
> > > @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
> > > if (fw_sysfs_done(fw_priv))
> > > return;
> > >
> > > - list_del_init(&fw_priv->pending_list);
> > > fw_state_aborted(fw_priv);
> > > }
> > >
> > > @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
> > > * Same logic as fw_load_abort, only the DONE bit
> > > * is ignored and we set ABORT only on failure.
> > > */
> > > - list_del_init(&fw_priv->pending_list);
> > > if (rc) {
> > > fw_state_aborted(fw_priv);
> > > written = rc;
> > > @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
> > > }
> > >
> > > mutex_lock(&fw_lock);
> > > + if (fw_state_is_aborted(fw_priv)) {
> > > + mutex_unlock(&fw_lock);
> > > + retval = -EAGAIN;
> > > + goto out;
> > > + }
> >
> > Thanks for the quick follow up!
> >
> > This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
> > signal abort on fallback mechanism") which I had mentioned in my follow
> > up email you posted a link to. It would regress it since the condition
> > is just being met earlier and you nullify the effort. So essentially
> > on Android you would make not being able to detect signal handlers
> > like the SIGCHLD signal sent to init, if init was the same process
> > dealing with the sysfs fallback firmware upload.
>
> Thanks for the detailed comments, Luis!
>
> I don't see how my patch changes existing error code behaviour. Even
> without my patch this function would return -EAGAIN if the fw is already
> aborted. Without my patch, it would call fw_sysfs_wait_timeout() which
> would return -ENOENT (because fw is already aborted) as follows:
>
> ret = wait_for_completion_killable_timeout(...)
> if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> return -ENOENT;
> if (!ret)
> return -ETIMEDOUT;
>
> return ret < 0 ? ret : 0;
>
> Now, this -ENOENT gets converted to -EAGAIN due to this piece of code
> in fw_load_sysfs_fallback():
>
> if (fw_state_is_aborted(fw_priv)) {
> if (retval == -ERESTARTSYS)
> retval = -EINTR;
> else
> retval = -EAGAIN;
> } else if (fw_priv->is_paged_buf && !fw_priv->data)
> retval = -ENOMEM;
>
> So, at the end, fw_load_sysfs_fallback() returns -EAGAIN for the case
> where the fw is already aborted. Which is what I did in my patch. So, my
> patch doesn't seem to regress anything. If this is not the intended behavior
> then it means that things are already regressed and not because of this
> patch.
>
> Please do correct me if I missed something here.
>
> >
> > The way I dealt with this in my patch was I decided to return -EINTR
> > in the earlier case in the hunk you added, instead of -EAGAIN. In
> > addition to this, later on fw_load_sysfs_fallback() when
> > fw_sysfs_wait_timeout() is used that would also deal with checking
> > for error codes on wait, and only then check if it was a signal
> > that cancelled things (the check for -ERESTARTSYS). We therefore
> > only send to userspace -EAGAIN when the wait really did hit the
> > timeout.
> >
> > But also note that my change added a check for
> > fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
> > as that was a recently intended goal.
> >
> > In either case I documented well *why* we do these error checks
> > before sending a code to userspace on fw_sysfs_wait_timeout() since
> > otherwise it would be easy to regress that code, so please also
> > document that as I did.
>
> The goal of this patch is to fix the UAF reported by syzbot.
>
> I am okay with simply documenting the reasons behind error codes, but I
> would rather not do any more refactoring of the error handling code in
> this patch since it doesn't directly contribute to fixing the uaf.
>
> Does that sound reasonable?

Hey Luis, did you get a chance to go through this email? Shall I send a
v4 with just the comments added? Would that be acceptable to you?

Thanks!

- Anirudh.