2022-04-27 10:48:40

by Duoming Zhou

[permalink] [raw]
Subject: [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs

There are destructive operations such as nfcmrvl_fw_dnld_abort and
gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
gpio and so on could be destructed while the upper layer functions such as
nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
to double-free, use-after-free and null-ptr-deref bugs.

There are three situations that could lead to double-free bugs.

The first situation is shown below:

(Thread 1) | (Thread 2)
nfcmrvl_fw_dnld_start |
... | nfcmrvl_nci_unregister_dev
release_firmware() | nfcmrvl_fw_dnld_abort
kfree(fw) //(1) | fw_dnld_over
| release_firmware
... | kfree(fw) //(2)
| ...

The second situation is shown below:

(Thread 1) | (Thread 2)
nfcmrvl_fw_dnld_start |
... |
mod_timer |
(wait a time) |
fw_dnld_timeout | nfcmrvl_nci_unregister_dev
fw_dnld_over | nfcmrvl_fw_dnld_abort
release_firmware | fw_dnld_over
kfree(fw) //(1) | release_firmware
... | kfree(fw) //(2)

The third situation is shown below:

(Thread 1) | (Thread 2)
nfcmrvl_nci_recv_frame |
if(..->fw_download_in_progress)|
nfcmrvl_fw_dnld_recv_frame |
queue_work |
|
fw_dnld_rx_work | nfcmrvl_nci_unregister_dev
fw_dnld_over | nfcmrvl_fw_dnld_abort
release_firmware | fw_dnld_over
kfree(fw) //(1) | release_firmware
| kfree(fw) //(2)

The firmware struct is deallocated in position (1) and deallocated
in position (2) again.

The crash trace triggered by POC is like below:

[ 122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
[ 122.640457] Call Trace:
[ 122.640457] <TASK>
[ 122.640457] kfree+0xb0/0x330
[ 122.640457] fw_dnld_over+0x28/0xf0
[ 122.640457] nfcmrvl_nci_unregister_dev+0x61/0x70
[ 122.640457] nci_uart_tty_close+0x87/0xd0
[ 122.640457] tty_ldisc_kill+0x3e/0x80
[ 122.640457] tty_ldisc_hangup+0x1b2/0x2c0
[ 122.640457] __tty_hangup.part.0+0x316/0x520
[ 122.640457] tty_release+0x200/0x670
[ 122.640457] __fput+0x110/0x410
[ 122.640457] task_work_run+0x86/0xd0
[ 122.640457] exit_to_user_mode_prepare+0x1aa/0x1b0
[ 122.640457] syscall_exit_to_user_mode+0x19/0x50
[ 122.640457] do_syscall_64+0x48/0x90
[ 122.640457] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 122.640457] RIP: 0033:0x7f68433f6beb

What's more, there are also use-after-free and null-ptr-deref bugs
in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
then, we dereference firmware, gpio or the members of priv->fw_dnld in
nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.

This patch reorders destructive operations after nci_unregister_device
and adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

(Thread 1) | (Thread 2)
nfcmrvl_nci_unregister_dev |
nci_unregister_device |
nfc_unregister_device | nfc_fw_download
device_lock() |
... |
nfc_download = false; | ...
device_unlock() |
... | device_lock()
(destructive operations) | if(.. || !nfc_download)
| goto error;
| error:
| device_unlock()

If the device is detaching, the download function will goto error.
If the download function is executing, nci_unregister_device will
wait until download function is finished.

Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
Signed-off-by: Duoming Zhou <[email protected]>
---
Changes in v4:
- Change bool variable nfc_download to static.

drivers/nfc/nfcmrvl/main.c | 2 +-
net/nfc/core.c | 6 +++++-
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
{
struct nci_dev *ndev = priv->ndev;

+ nci_unregister_device(ndev);
if (priv->ndev->nfc_dev->fw_download_in_progress)
nfcmrvl_fw_dnld_abort(priv);

@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
if (gpio_is_valid(priv->config.reset_n_io))
gpio_free(priv->config.reset_n_io);

- nci_unregister_device(ndev);
nci_free_device(ndev);
kfree(priv);
}
diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..1d91334ee86 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,8 @@
#define NFC_CHECK_PRES_FREQ_MS 2000

int nfc_devlist_generation;
+/* nfc_download: used to judge whether nfc firmware download could start */
+static bool nfc_download;
DEFINE_MUTEX(nfc_devlist_mutex);

/* NFC device ID bitmap */
@@ -38,7 +40,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)

device_lock(&dev->dev);

- if (!device_is_registered(&dev->dev)) {
+ if (!device_is_registered(&dev->dev) || !nfc_download) {
rc = -ENODEV;
goto error;
}
@@ -1134,6 +1136,7 @@ int nfc_register_device(struct nfc_dev *dev)
dev->rfkill = NULL;
}
}
+ nfc_download = true;
device_unlock(&dev->dev);

rc = nfc_genl_device_added(dev);
@@ -1166,6 +1169,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
rfkill_unregister(dev->rfkill);
rfkill_destroy(dev->rfkill);
}
+ nfc_download = false;
device_unlock(&dev->dev);

if (dev->ops->check_presence) {
--
2.17.1


2022-04-28 06:18:11

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs

On Wed, 27 Apr 2022 09:14:38 +0800 Duoming Zhou wrote:
> diff --git a/net/nfc/core.c b/net/nfc/core.c
> index dc7a2404efd..1d91334ee86 100644
> --- a/net/nfc/core.c
> +++ b/net/nfc/core.c
> @@ -25,6 +25,8 @@
> #define NFC_CHECK_PRES_FREQ_MS 2000
>
> int nfc_devlist_generation;
> +/* nfc_download: used to judge whether nfc firmware download could start */
> +static bool nfc_download;
> DEFINE_MUTEX(nfc_devlist_mutex);
>
> /* NFC device ID bitmap */
> @@ -38,7 +40,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
>
> device_lock(&dev->dev);
>
> - if (!device_is_registered(&dev->dev)) {
> + if (!device_is_registered(&dev->dev) || !nfc_download) {
> rc = -ENODEV;
> goto error;
> }
> @@ -1134,6 +1136,7 @@ int nfc_register_device(struct nfc_dev *dev)
> dev->rfkill = NULL;
> }
> }
> + nfc_download = true;
> device_unlock(&dev->dev);
>
> rc = nfc_genl_device_added(dev);
> @@ -1166,6 +1169,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
> rfkill_unregister(dev->rfkill);
> rfkill_destroy(dev->rfkill);
> }
> + nfc_download = false;
> device_unlock(&dev->dev);
>
> if (dev->ops->check_presence) {

You can't use a single global variable, there can be many devices
each with their own lock.

Paolo suggested adding a lock, if spin lock doesn't fit the bill
why not add a mutex?

2022-04-28 07:45:27

by Duoming Zhou

[permalink] [raw]
Subject: Re: Re: [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs

Hello,

On Wed, 27 Apr 2022 17:45:48 -0700 Jakub Kicinski wrote:

> > diff --git a/net/nfc/core.c b/net/nfc/core.c
> > index dc7a2404efd..1d91334ee86 100644
> > --- a/net/nfc/core.c
> > +++ b/net/nfc/core.c
> > @@ -25,6 +25,8 @@
> > #define NFC_CHECK_PRES_FREQ_MS 2000
> >
> > int nfc_devlist_generation;
> > +/* nfc_download: used to judge whether nfc firmware download could start */
> > +static bool nfc_download;
> > DEFINE_MUTEX(nfc_devlist_mutex);
> >
> > /* NFC device ID bitmap */
> > @@ -38,7 +40,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
> >
> > device_lock(&dev->dev);
> >
> > - if (!device_is_registered(&dev->dev)) {
> > + if (!device_is_registered(&dev->dev) || !nfc_download) {
> > rc = -ENODEV;
> > goto error;
> > }
> > @@ -1134,6 +1136,7 @@ int nfc_register_device(struct nfc_dev *dev)
> > dev->rfkill = NULL;
> > }
> > }
> > + nfc_download = true;
> > device_unlock(&dev->dev);
> >
> > rc = nfc_genl_device_added(dev);
> > @@ -1166,6 +1169,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
> > rfkill_unregister(dev->rfkill);
> > rfkill_destroy(dev->rfkill);
> > }
> > + nfc_download = false;
> > device_unlock(&dev->dev);
> >
> > if (dev->ops->check_presence) {
>
> You can't use a single global variable, there can be many devices
> each with their own lock.
>
> Paolo suggested adding a lock, if spin lock doesn't fit the bill
> why not add a mutex?

We could not use mutex either, because the release_firmware() is also called by fw_dnld_timeout()
which is a timer handler. If we use mutex lock in a timer handler, it will cause sleep in atomic bug.
The process is shown below:

nfcmrvl_fw_dnld_start
...
mod_timer
(wait a time)
fw_dnld_timeout
fw_dnld_over
release_firmware

I will change the single global variable to dev->dev_up flag, which is shown below:

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
{
struct nci_dev *ndev = priv->ndev;

+ nci_unregister_device(ndev);
if (priv->ndev->nfc_dev->fw_download_in_progress)
nfcmrvl_fw_dnld_abort(priv);

@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
if (gpio_is_valid(priv->config.reset_n_io))
gpio_free(priv->config.reset_n_io);

- nci_unregister_device(ndev);
nci_free_device(ndev);
kfree(priv);
}
diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..09f54c599fe 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -1166,6 +1166,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
rfkill_unregister(dev->rfkill);
rfkill_destroy(dev->rfkill);
}
+ dev->dev_up = false;
device_unlock(&dev->dev);

if (dev->ops->check_presence) {

The above solution has been tested, it is well synchronized.

Best regards,
Duoming Zhou

2022-04-28 14:28:34

by Lin Ma

[permalink] [raw]
Subject: Re: [PATCH net v4] nfc: ... device_is_registered() is data race-able

Hello Jakub,

and hello there maintainers, when we tried to fix this race problem, we found another very weird issue as below

>
> You can't use a single global variable, there can be many devices
> each with their own lock.
>
> Paolo suggested adding a lock, if spin lock doesn't fit the bill
> why not add a mutex?

The lock patch can be added to nfcmrvl code but we prefer to fix this in the NFC core layer hence every other driver that supports the firmware downloading task will be free of such a problem.

But when we analyze the race between the netlink tasks and the cleanup routine, we find some *condition checks* fail to fulfill their responsibility.

For example, we once though that the device_lock + device_is_registered check can help to fix the race as below.

netlink task | cleanup routine
|
nfc_genl_fw_download | nfc_unregister_device
nfc_fw_download | device_del
device_lock | device_lock
// wait lock | kobject_del
// ... | ...
device_is_registered | device_unlock
rc = -ENODEV |

However, by dynamic debugging this issue, we find out that **even after the device_del**, the device_is_registered check still returns TRUE!

This is by no means matching our expectations as one of our previous patch relies on the device_is_registered code.

-> the patch: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")

To find out why, we find out the device_is_registered is implemented like below:

static inline int device_is_registered(struct device *dev)
{
return dev->kobj.state_in_sysfs;
}

By debugging, we find out in normal case, this kobj.state_in_sysfs will be clear out like below

[#0] 0xffffffff81f0743a → __kobject_del(kobj=0xffff888009ca7018)
[#1] 0xffffffff81f07882 → kobject_del(kobj=0xffff888009ca7018)
[#2] 0xffffffff81f07882 → kobject_del(kobj=0xffff888009ca7018)
[#3] 0xffffffff827708db → device_del(dev=0xffff888009ca7018)
[#4] 0xffffffff8396496f → nfc_unregister_device(dev=0xffff888009ca7000)
[#5] 0xffffffff839850a9 → nci_unregister_device(ndev=0xffff888009ca3000)
[#6] 0xffffffff82811308 → nfcmrvl_nci_unregister_dev(priv=0xffff88800c805c00)
[#7] 0xffffffff83990c4f → nci_uart_tty_close(tty=0xffff88800b450000)
[#8] 0xffffffff820f6bd3 → tty_ldisc_kill(tty=0xffff88800b450000)
[#9] 0xffffffff820f7fb1 → tty_ldisc_hangup(tty=0xffff88800b450000, reinit=0x0)

The clear out is in function __kobject_del

static void __kobject_del(struct kobject *kobj)
{
// ...

kobj->state_in_sysfs = 0;
kobj_kset_leave(kobj);
kobj->parent = NULL;
}

The structure of device_del is like below

void device_del(struct device *dev)
{
struct device *parent = dev->parent;
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
unsigned int noio_flag;

device_lock(dev);
kill_device(dev);
device_unlock(dev);

// ...
kobject_del(&dev->kobj);
cleanup_glue_dir(dev, glue_dir);
memalloc_noio_restore(noio_flag);
put_device(parent);
}

In another word, the device_del -> kobject_del -> __kobject_del is not protected by the device_lock.

This means the device_lock + device_is_registered is still prone to the data race. And this is not just the problem with firmware downloading. The all relevant netlink tasks that use the device_lock + device_is_registered is possible to be raced.

To this end, we will come out with two patches, one for fixing this device_is_registered by using another status variable instead. The other is the patch that reorders the code in nci_unregister_device.

Thanks
Lin Ma