2022-04-11 13:00:35

by Duoming Zhou

[permalink] [raw]
Subject: [PATCH] drivers: nfc: nfcmrvl: fix double free bug in nfcmrvl_nci_unregister_dev()

There is a potential double bug in nfcmrvl usb driver between
unregister and resume operation.

The race that cause that double free bug can be shown as below:

(FREE) | (USE)
| nfcmrvl_resume
| nfcmrvl_submit_bulk_urb
| nfcmrvl_bulk_complete
| nfcmrvl_nci_recv_frame
| nfcmrvl_fw_dnld_recv_frame
| queue_work
| fw_dnld_rx_work
| fw_dnld_over
| release_firmware
| kfree(fw); //(1)
nfcmrvl_disconnect |
nfcmrvl_nci_unregister_dev |
nfcmrvl_fw_dnld_abort |
fw_dnld_over | ...
if (priv->fw_dnld.fw) |
release_firmware |
kfree(fw); //(2) |
... | fw = NULL;

When nfcmrvl usb driver is resuming, we detach the device.
The release_firmware() will deallocate firmware in position (1),
but firmware will be deallocated again in position (2), which
leads to double free.

This patch reorders nfcmrvl_fw_dnld_deinit() before nfcmrvl_fw_dnld_abort()
in order to prevent double free bug. Because destroy_workqueue() will
not return until all work is finished.

Signed-off-by: Duoming Zhou <[email protected]>
---
drivers/nfc/nfcmrvl/main.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..d8342271f50 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,11 +183,10 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
{
struct nci_dev *ndev = priv->ndev;

+ nfcmrvl_fw_dnld_deinit(priv);
if (priv->ndev->nfc_dev->fw_download_in_progress)
nfcmrvl_fw_dnld_abort(priv);

- nfcmrvl_fw_dnld_deinit(priv);
-
if (gpio_is_valid(priv->config.reset_n_io))
gpio_free(priv->config.reset_n_io);

--
2.17.1


2022-04-11 22:19:13

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH] drivers: nfc: nfcmrvl: fix double free bug in nfcmrvl_nci_unregister_dev()

On 10/04/2022 10:31, Duoming Zhou wrote:
> There is a potential double bug in nfcmrvl usb driver between
> unregister and resume operation.
>

Thank you for your patch. There is something to discuss/improve.

> The race that cause that double free bug can be shown as below:

Your patch solves the most visible race, but because of lack of locking,
I believe race still might exist:

(FREE) | (USE)
| nfcmrvl_resume
| nfcmrvl_submit_bulk_urb
| nfcmrvl_bulk_complete
| nfcmrvl_nci_recv_frame
| nfcmrvl_fw_dnld_recv_frame
| queue_work
| fw_dnld_rx_work
nfcmrvl_disconnect |
nfcmrvl_nci_unregister_dev |
nfcmrvl_fw_dnld_deinit |
wait for the workqueue to finish |
| fw_dnld_over
| release_firmware
| kfree(fw);
| no synchronization //(1)
if (fw_download_in_progress)
- no synchronization, so CPU sees old value
nfcmrvl_fw_dnld_abort |
fw_dnld_over | ...
if (priv->fw_dnld.fw) |
release_firmware |
kfree(fw); //(2) |
... | fw = NULL;

The kfree() from (2) would still free old value. Even if fw=NULL happens
earlier, it is not propagated back to the other CPU, unless there are
some implicit barriers due to workqueue?

Is it safe then to rely on such implicit barriers from workqueue?

>
> (FREE) | (USE)
> | nfcmrvl_resume
> | nfcmrvl_submit_bulk_urb
> | nfcmrvl_bulk_complete
> | nfcmrvl_nci_recv_frame
> | nfcmrvl_fw_dnld_recv_frame
> | queue_work
> | fw_dnld_rx_work
> | fw_dnld_over
> | release_firmware
> | kfree(fw); //(1)
> nfcmrvl_disconnect |
> nfcmrvl_nci_unregister_dev |
> nfcmrvl_fw_dnld_abort |
> fw_dnld_over | ...
> if (priv->fw_dnld.fw) |
> release_firmware |
> kfree(fw); //(2) |
> ... | fw = NULL;
>

Best regards,
Krzysztof

2022-04-12 13:22:05

by Duoming Zhou

[permalink] [raw]
Subject: Re: Re: [PATCH] drivers: nfc: nfcmrvl: fix double free bug in nfcmrvl_nci_unregister_dev()

Hello,

On Sun, 10 Apr 2022 11:27:09 +0200 Krzysztof Kozlowski wrote:

> > There is a potential double bug in nfcmrvl usb driver between
> > unregister and resume operation.
> >
>
> Thank you for your patch. There is something to discuss/improve.
>
> > The race that cause that double free bug can be shown as below:
>
> Your patch solves the most visible race, but because of lack of locking,
> I believe race still might exist:
>
> (FREE) | (USE)
> | nfcmrvl_resume
> | nfcmrvl_submit_bulk_urb
> | nfcmrvl_bulk_complete
> | nfcmrvl_nci_recv_frame
> | nfcmrvl_fw_dnld_recv_frame
> | queue_work
> | fw_dnld_rx_work
> nfcmrvl_disconnect |
> nfcmrvl_nci_unregister_dev |
> nfcmrvl_fw_dnld_deinit |
> wait for the workqueue to finish |
> | fw_dnld_over
> | release_firmware
> | kfree(fw);
> | no synchronization //(1)
> if (fw_download_in_progress)
> - no synchronization, so CPU sees old value
> nfcmrvl_fw_dnld_abort |
> fw_dnld_over | ...
> if (priv->fw_dnld.fw) |
> release_firmware |
> kfree(fw); //(2) |
> ... | fw = NULL;
>
> The kfree() from (2) would still free old value. Even if fw=NULL happens
> earlier, it is not propagated back to the other CPU, unless there are
> some implicit barriers due to workqueue?
>
> Is it safe then to rely on such implicit barriers from workqueue?

I think it is safe to rely on such barriers from destroy_workqueue().
The destroy_workqueue will wait all work items to finish, the code
behind it will not execute until all work items are finished.
The progress is shown below:

destroy_workqueue()-->drain_workqueue()-->flush_workqueue()-->wait_for_completion().

The function drain_workqueue() will wait until the workqueue becomes empty.
It sets wq->flags to __WQ_DRAINING, this could ensure the new coming work items
will not be queued into the draining workqueue. Because __queue_work will check
the __WQ_DRAINING flag.

Then drain_workqueue() calls flush_workqueue() to ensure that any scheduled work
has run to completion. Because wait_for_completion() is called in flush_workqueue(),
it will block current thread and wait work items to completion.

Best regards,
Duoming Zhou.