2023-03-18 08:13:32

by Zheng Wang

[permalink] [raw]
Subject: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition

In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
with qedi_recovery_handler and bound &qedi->board_disable_work
with qedi_board_disable_work.

When it calls qedi_schedule_recovery_handler, it will finally
call schedule_delayed_work to start the work.

When we call qedi_remove to remove the driver, there
may be a sequence as follows:

Fix it by finishing the work before cleanup in qedi_remove.

CPU0 CPU1

|qedi_recovery_handler
qedi_remove |
__qedi_remove |
iscsi_host_free |
scsi_host_put |
//free shost |
|iscsi_host_for_each_session
|//use qedi->shost

Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
Signed-off-by: Zheng Wang <[email protected]>
---
drivers/scsi/qedi/qedi_main.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index f2ee49756df8..25223f6f5344 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
int rval;
u16 retry = 10;

+ /*cancel work*/
+ cancel_delayed_work_sync(&qedi->recovery_work);
+ cancel_delayed_work_sync(&qedi->board_disable_work);
+
if (mode == QEDI_MODE_NORMAL)
iscsi_host_remove(qedi->shost, false);
else if (mode == QEDI_MODE_SHUTDOWN)
--
2.25.1



2023-03-20 16:20:16

by Mike Christie

[permalink] [raw]
Subject: Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition

On 3/18/23 3:13 AM, Zheng Wang wrote:
> In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
> with qedi_recovery_handler and bound &qedi->board_disable_work
> with qedi_board_disable_work.
>
> When it calls qedi_schedule_recovery_handler, it will finally
> call schedule_delayed_work to start the work.
>
> When we call qedi_remove to remove the driver, there
> may be a sequence as follows:
>
> Fix it by finishing the work before cleanup in qedi_remove.
>
> CPU0 CPU1
>
> |qedi_recovery_handler
> qedi_remove |
> __qedi_remove |
> iscsi_host_free |
> scsi_host_put |
> //free shost |
> |iscsi_host_for_each_session
> |//use qedi->shost
>
> Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> Signed-off-by: Zheng Wang <[email protected]>
> ---
> drivers/scsi/qedi/qedi_main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> index f2ee49756df8..25223f6f5344 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
> int rval;
> u16 retry = 10;
>
> + /*cancel work*/

This comment is not needed. The name of the functions you are calling have
"cancel" and "work" in them so we know. If you want to add a comment explain
why the cancel calls are needed here.


> + cancel_delayed_work_sync(&qedi->recovery_work);
> + cancel_delayed_work_sync(&qedi->board_disable_work);


How do you know after you have called cancel_delayed_work_sync that
schedule_recovery_handler or schedule_hw_err_handler can't be called?
I don't know the qed driver well, but it looks like you could have
operations still running, so after you cancel here one of those ops
could lead to them scheduling the work again.


> +
> if (mode == QEDI_MODE_NORMAL)
> iscsi_host_remove(qedi->shost, false);
> else if (mode == QEDI_MODE_SHUTDOWN)


2023-03-23 03:49:07

by Zheng Hacker

[permalink] [raw]
Subject: Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition

Mike Christie <[email protected]> 于2023年3月21日周二 00:11写道:
>
> On 3/18/23 3:13 AM, Zheng Wang wrote:
> > In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work
> > with qedi_recovery_handler and bound &qedi->board_disable_work
> > with qedi_board_disable_work.
> >
> > When it calls qedi_schedule_recovery_handler, it will finally
> > call schedule_delayed_work to start the work.
> >
> > When we call qedi_remove to remove the driver, there
> > may be a sequence as follows:
> >
> > Fix it by finishing the work before cleanup in qedi_remove.
> >
> > CPU0 CPU1
> >
> > |qedi_recovery_handler
> > qedi_remove |
> > __qedi_remove |
> > iscsi_host_free |
> > scsi_host_put |
> > //free shost |
> > |iscsi_host_for_each_session
> > |//use qedi->shost
> >
> > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > Signed-off-by: Zheng Wang <[email protected]>
> > ---
> > drivers/scsi/qedi/qedi_main.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> > index f2ee49756df8..25223f6f5344 100644
> > --- a/drivers/scsi/qedi/qedi_main.c
> > +++ b/drivers/scsi/qedi/qedi_main.c
> > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
> > int rval;
> > u16 retry = 10;
> >
> > + /*cancel work*/
>
> This comment is not needed. The name of the functions you are calling have
> "cancel" and "work" in them so we know. If you want to add a comment explain
> why the cancel calls are needed here.
>

Hi,

Sorry for my late reply and thanks for your advice. Will remove it in
the next version of patch.

>
> > + cancel_delayed_work_sync(&qedi->recovery_work);
> > + cancel_delayed_work_sync(&qedi->board_disable_work);
>
>
> How do you know after you have called cancel_delayed_work_sync that
> schedule_recovery_handler or schedule_hw_err_handler can't be called?
> I don't know the qed driver well, but it looks like you could have
> operations still running, so after you cancel here one of those ops
> could lead to them scheduling the work again.
>

Sorry I didn't know how to make sure there's no more schedule. But I do think
this is important. Maybe there're someone else who can give us advice.

Best regards,
Zheng
>
> > +
> > if (mode == QEDI_MODE_NORMAL)
> > iscsi_host_remove(qedi->shost, false);
> > else if (mode == QEDI_MODE_SHUTDOWN)
>

2023-03-23 10:20:52

by Manish Rangankar

[permalink] [raw]
Subject: RE: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition



> -----Original Message-----
> From: Zheng Hacker <[email protected]>
> Sent: Thursday, March 23, 2023 9:15 AM
> To: Mike Christie <[email protected]>
> Cc: Zheng Wang <[email protected]>; Nilesh Javali <[email protected]>;
> Manish Rangankar <[email protected]>; GR-QLogic-Storage-
> Upstream <[email protected]>;
> [email protected]; [email protected]; linux-
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> qedi_remove due to race condition
>
> External Email
>
> ----------------------------------------------------------------------
> Mike Christie <[email protected]> 于2023年3月21日周二 00:11写
> 道:
> >
> > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > In qedi_probe, it calls __qedi_probe, which bound
> > > &qedi->recovery_work with qedi_recovery_handler and bound
> > > &qedi->board_disable_work with qedi_board_disable_work.
> > >
> > > When it calls qedi_schedule_recovery_handler, it will finally call
> > > schedule_delayed_work to start the work.
> > >
> > > When we call qedi_remove to remove the driver, there may be a
> > > sequence as follows:
> > >
> > > Fix it by finishing the work before cleanup in qedi_remove.
> > >
> > > CPU0 CPU1
> > >
> > > |qedi_recovery_handler
> > > qedi_remove |
> > > __qedi_remove |
> > > iscsi_host_free |
> > > scsi_host_put |
> > > //free shost |
> > > |iscsi_host_for_each_session
> > > |//use qedi->shost
> > >
> > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > Signed-off-by: Zheng Wang <[email protected]>
> > > ---
> > > drivers/scsi/qedi/qedi_main.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/scsi/qedi/qedi_main.c
> > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344
> > > 100644
> > > --- a/drivers/scsi/qedi/qedi_main.c
> > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev
> *pdev, int mode)
> > > int rval;
> > > u16 retry = 10;
> > >
> > > + /*cancel work*/
> >
> > This comment is not needed. The name of the functions you are calling
> > have "cancel" and "work" in them so we know. If you want to add a
> > comment explain why the cancel calls are needed here.
> >
>
> Hi,
>
> Sorry for my late reply and thanks for your advice. Will remove it in the next
> version of patch.
>
> >
> > > + cancel_delayed_work_sync(&qedi->recovery_work);
> > > + cancel_delayed_work_sync(&qedi->board_disable_work);
> >
> >
> > How do you know after you have called cancel_delayed_work_sync that
> > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > I don't know the qed driver well, but it looks like you could have
> > operations still running, so after you cancel here one of those ops
> > could lead to them scheduling the work again.
> >
>
> Sorry I didn't know how to make sure there's no more schedule. But I do
> think this is important. Maybe there're someone else who can give us advice.
>
> Best regards,
> Zheng
> >

Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and
qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver.

Thanks,
Manish

2023-03-24 15:22:57

by Zheng Hacker

[permalink] [raw]
Subject: Re: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition

Manish Rangankar <[email protected]> 于2023年3月23日周四 18:17写道:
>
>
>
> > -----Original Message-----
> > From: Zheng Hacker <[email protected]>
> > Sent: Thursday, March 23, 2023 9:15 AM
> > To: Mike Christie <[email protected]>
> > Cc: Zheng Wang <[email protected]>; Nilesh Javali <[email protected]>;
> > Manish Rangankar <[email protected]>; GR-QLogic-Storage-
> > Upstream <[email protected]>;
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> > qedi_remove due to race condition
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Mike Christie <[email protected]> 于2023年3月21日周二 00:11写
> > 道:
> > >
> > > On 3/18/23 3:13 AM, Zheng Wang wrote:
> > > > In qedi_probe, it calls __qedi_probe, which bound
> > > > &qedi->recovery_work with qedi_recovery_handler and bound
> > > > &qedi->board_disable_work with qedi_board_disable_work.
> > > >
> > > > When it calls qedi_schedule_recovery_handler, it will finally call
> > > > schedule_delayed_work to start the work.
> > > >
> > > > When we call qedi_remove to remove the driver, there may be a
> > > > sequence as follows:
> > > >
> > > > Fix it by finishing the work before cleanup in qedi_remove.
> > > >
> > > > CPU0 CPU1
> > > >
> > > > |qedi_recovery_handler
> > > > qedi_remove |
> > > > __qedi_remove |
> > > > iscsi_host_free |
> > > > scsi_host_put |
> > > > //free shost |
> > > > |iscsi_host_for_each_session
> > > > |//use qedi->shost
> > > >
> > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
> > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > ---
> > > > drivers/scsi/qedi/qedi_main.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/drivers/scsi/qedi/qedi_main.c
> > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344
> > > > 100644
> > > > --- a/drivers/scsi/qedi/qedi_main.c
> > > > +++ b/drivers/scsi/qedi/qedi_main.c
> > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev
> > *pdev, int mode)
> > > > int rval;
> > > > u16 retry = 10;
> > > >
> > > > + /*cancel work*/
> > >
> > > This comment is not needed. The name of the functions you are calling
> > > have "cancel" and "work" in them so we know. If you want to add a
> > > comment explain why the cancel calls are needed here.
> > >
> >
> > Hi,
> >
> > Sorry for my late reply and thanks for your advice. Will remove it in the next
> > version of patch.
> >
> > >
> > > > + cancel_delayed_work_sync(&qedi->recovery_work);
> > > > + cancel_delayed_work_sync(&qedi->board_disable_work);
> > >
> > >
> > > How do you know after you have called cancel_delayed_work_sync that
> > > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > > I don't know the qed driver well, but it looks like you could have
> > > operations still running, so after you cancel here one of those ops
> > > could lead to them scheduling the work again.
> > >
> >
> > Sorry I didn't know how to make sure there's no more schedule. But I do
> > think this is important. Maybe there're someone else who can give us advice.
> >
> > Best regards,
> > Zheng
> > >
>
> Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and
> qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver.
>

Sorry for my late reply. Will apply that in next version.

Best reagrds,
Zheng

> Thanks,
> Manish
>