2021-06-03 22:27:52

by Colin King

[permalink] [raw]
Subject: re: scsi: iscsi: Drop suspend calls from ep_disconnect

Hi,

Static analysis on linux-next with Coverity has found an issue in
drivers/scsi/qedi/qedi_iscsi.c with the following commit:

commit 27e986289e739d08c1a4861cc3d3ec9b3a60845e
Author: Mike Christie <[email protected]>
Date: Tue May 25 13:17:56 2021 -0500

scsi: iscsi: Drop suspend calls from ep_disconnect

The analysis is as follows:

1662 void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess)
1663 {
1664 struct iscsi_session *session = cls_sess->dd_data;
1665 struct iscsi_conn *conn = session->leadconn;

deref_ptr: Directly dereferencing pointer conn.

1666 struct qedi_conn *qedi_conn = conn->dd_data;
1667
1668 if (iscsi_is_session_online(cls_sess)) {
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking conn suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

1669 if (conn)
1670 iscsi_suspend_queue(conn);
1671 qedi_ep_disconnect(qedi_conn->iscsi_ep);
1672 }

Pointer conn is being checked to see if it is null, but earlier it has
been dereferenced on the assignment of qedi_conn. So either conn will
be null at some point and a null ptr dereference occurs when qedi_conn
is assigned, or conn can never be null and the conn null check is
redundant and can be removed.

Colin


2021-06-03 23:29:46

by Mike Christie

[permalink] [raw]
Subject: Re: scsi: iscsi: Drop suspend calls from ep_disconnect

On 6/3/21 5:25 PM, Colin Ian King wrote:
> Hi,
>
> Static analysis on linux-next with Coverity has found an issue in
> drivers/scsi/qedi/qedi_iscsi.c with the following commit:
>
> commit 27e986289e739d08c1a4861cc3d3ec9b3a60845e
> Author: Mike Christie <[email protected]>
> Date: Tue May 25 13:17:56 2021 -0500
>
> scsi: iscsi: Drop suspend calls from ep_disconnect
>
> The analysis is as follows:
>
> 1662 void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess)
> 1663 {
> 1664 struct iscsi_session *session = cls_sess->dd_data;
> 1665 struct iscsi_conn *conn = session->leadconn;
>
> deref_ptr: Directly dereferencing pointer conn.
>
> 1666 struct qedi_conn *qedi_conn = conn->dd_data;
> 1667
> 1668 if (iscsi_is_session_online(cls_sess)) {
> Dereference before null check (REVERSE_INULL)
> check_after_deref: Null-checking conn suggests that it may be null,
> but it has already been dereferenced on all paths leading to the check.
>
> 1669 if (conn)
> 1670 iscsi_suspend_queue(conn);
> 1671 qedi_ep_disconnect(qedi_conn->iscsi_ep);
> 1672 }
>
> Pointer conn is being checked to see if it is null, but earlier it has
> been dereferenced on the assignment of qedi_conn. So either conn will
> be null at some point and a null ptr dereference occurs when qedi_conn
> is assigned, or conn can never be null and the conn null check is
> redundant and can be removed.

The analysis is correct.

The bigger problem is that this entire function seems racey with the
normal conn/ep disconnect or shutdown.

Manish, when this function is run iscsid or the in-kernel conn error
cleanup handler can be running right? There is nothing preventing
those from running at the same time?

I think you want to call iscsi_host_remove at the beginning of __qedi_remove.
That will tell userpsace that the host is being removed and libiscsi will
start the session shutdown and removal process. It then waits for the
sessions to be removed. We can then proceed with the other host removal
cleanup, and at the end of __qedi_remove you do the iscsi_host_free
call.

2021-06-03 23:31:33

by Mike Christie

[permalink] [raw]
Subject: Re: scsi: iscsi: Drop suspend calls from ep_disconnect

On 6/3/21 6:25 PM, Mike Christie wrote:
> On 6/3/21 5:25 PM, Colin Ian King wrote:
>> Hi,
>>
>> Static analysis on linux-next with Coverity has found an issue in
>> drivers/scsi/qedi/qedi_iscsi.c with the following commit:
>>
>> commit 27e986289e739d08c1a4861cc3d3ec9b3a60845e
>> Author: Mike Christie <[email protected]>
>> Date: Tue May 25 13:17:56 2021 -0500
>>
>> scsi: iscsi: Drop suspend calls from ep_disconnect
>>
>> The analysis is as follows:
>>
>> 1662 void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess)
>> 1663 {
>> 1664 struct iscsi_session *session = cls_sess->dd_data;
>> 1665 struct iscsi_conn *conn = session->leadconn;
>>
>> deref_ptr: Directly dereferencing pointer conn.
>>
>> 1666 struct qedi_conn *qedi_conn = conn->dd_data;
>> 1667
>> 1668 if (iscsi_is_session_online(cls_sess)) {
>> Dereference before null check (REVERSE_INULL)
>> check_after_deref: Null-checking conn suggests that it may be null,
>> but it has already been dereferenced on all paths leading to the check.
>>
>> 1669 if (conn)
>> 1670 iscsi_suspend_queue(conn);
>> 1671 qedi_ep_disconnect(qedi_conn->iscsi_ep);
>> 1672 }
>>
>> Pointer conn is being checked to see if it is null, but earlier it has
>> been dereferenced on the assignment of qedi_conn. So either conn will
>> be null at some point and a null ptr dereference occurs when qedi_conn
>> is assigned, or conn can never be null and the conn null check is
>> redundant and can be removed.
>
> The analysis is correct.
>
> The bigger problem is that this entire function seems racey with the
> normal conn/ep disconnect or shutdown.
>
> Manish, when this function is run iscsid or the in-kernel conn error
> cleanup handler can be running right? There is nothing preventing
> those from running at the same time?
>
> I think you want to call iscsi_host_remove at the beginning of __qedi_remove.
> That will tell userpsace that the host is being removed and libiscsi will

I meant iscsid not libiscsi.

> start the session shutdown and removal process. It then waits for the
> sessions to be removed. We can then proceed with the other host removal
> cleanup, and at the end of __qedi_remove you do the iscsi_host_free
> call.
>

2021-06-04 21:49:47

by Mike Christie

[permalink] [raw]
Subject: Re: scsi: iscsi: Drop suspend calls from ep_disconnect

On 6/3/21 6:27 PM, Mike Christie wrote:
> On 6/3/21 6:25 PM, Mike Christie wrote:
>> On 6/3/21 5:25 PM, Colin Ian King wrote:
>>> Hi,
>>>
>>> Static analysis on linux-next with Coverity has found an issue in
>>> drivers/scsi/qedi/qedi_iscsi.c with the following commit:
>>>
>>> commit 27e986289e739d08c1a4861cc3d3ec9b3a60845e
>>> Author: Mike Christie <[email protected]>
>>> Date: Tue May 25 13:17:56 2021 -0500
>>>
>>> scsi: iscsi: Drop suspend calls from ep_disconnect
>>>
>>> The analysis is as follows:
>>>
>>> 1662 void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess)
>>> 1663 {
>>> 1664 struct iscsi_session *session = cls_sess->dd_data;
>>> 1665 struct iscsi_conn *conn = session->leadconn;
>>>
>>> deref_ptr: Directly dereferencing pointer conn.
>>>
>>> 1666 struct qedi_conn *qedi_conn = conn->dd_data;
>>> 1667
>>> 1668 if (iscsi_is_session_online(cls_sess)) {
>>> Dereference before null check (REVERSE_INULL)
>>> check_after_deref: Null-checking conn suggests that it may be null,
>>> but it has already been dereferenced on all paths leading to the check.
>>>
>>> 1669 if (conn)
>>> 1670 iscsi_suspend_queue(conn);
>>> 1671 qedi_ep_disconnect(qedi_conn->iscsi_ep);
>>> 1672 }
>>>
>>> Pointer conn is being checked to see if it is null, but earlier it has
>>> been dereferenced on the assignment of qedi_conn. So either conn will
>>> be null at some point and a null ptr dereference occurs when qedi_conn
>>> is assigned, or conn can never be null and the conn null check is
>>> redundant and can be removed.
>>
>> The analysis is correct.
>>
>> The bigger problem is that this entire function seems racey with the
>> normal conn/ep disconnect or shutdown.
>>
>> Manish, when this function is run iscsid or the in-kernel conn error
>> cleanup handler can be running right? There is nothing preventing
>> those from running at the same time?
>>
>> I think you want to call iscsi_host_remove at the beginning of __qedi_remove.
>> That will tell userpsace that the host is being removed and libiscsi will
>
> I meant iscsid not libiscsi.
>
>> start the session shutdown and removal process. It then waits for the
>> sessions to be removed. We can then proceed with the other host removal
>> cleanup, and at the end of __qedi_remove you do the iscsi_host_free
>> call.
>>
>

Hey Manish,

Here is a lightly tested patch.


diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h
index fb44a282613e..9f8e8ef405a1 100644
--- a/drivers/scsi/qedi/qedi_gbl.h
+++ b/drivers/scsi/qedi/qedi_gbl.h
@@ -72,6 +72,5 @@ void qedi_remove_sysfs_ctx_attr(struct qedi_ctx *qedi);
void qedi_clearsq(struct qedi_ctx *qedi,
struct qedi_conn *qedi_conn,
struct iscsi_task *task);
-void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess);

#endif
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index bf581ecea897..97f83760da88 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1659,23 +1659,6 @@ void qedi_process_iscsi_error(struct qedi_endpoint *ep,
qedi_start_conn_recovery(qedi_conn->qedi, qedi_conn);
}

-void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess)
-{
- struct iscsi_session *session = cls_sess->dd_data;
- struct iscsi_conn *conn = session->leadconn;
- struct qedi_conn *qedi_conn = conn->dd_data;
-
- if (iscsi_is_session_online(cls_sess)) {
- if (conn)
- iscsi_suspend_queue(conn);
- qedi_ep_disconnect(qedi_conn->iscsi_ep);
- }
-
- qedi_conn_destroy(qedi_conn->cls_conn);
-
- qedi_session_destroy(cls_sess);
-}
-
void qedi_process_tcp_error(struct qedi_endpoint *ep,
struct iscsi_eqe_data *data)
{
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index edf915432704..0b0acb827071 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2417,11 +2417,9 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
int rval;
u16 retry = 10;

- if (mode == QEDI_MODE_SHUTDOWN)
- iscsi_host_for_each_session(qedi->shost,
- qedi_clear_session_ctx);
-
if (mode == QEDI_MODE_NORMAL || mode == QEDI_MODE_SHUTDOWN) {
+ iscsi_host_remove(qedi->shost);
+
if (qedi->tmf_thread) {
flush_workqueue(qedi->tmf_thread);
destroy_workqueue(qedi->tmf_thread);
@@ -2482,7 +2480,6 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
if (qedi->boot_kset)
iscsi_boot_destroy_kset(qedi->boot_kset);

- iscsi_host_remove(qedi->shost);
iscsi_host_free(qedi->shost);
}
}

2021-06-07 11:07:02

by Manish Rangankar

[permalink] [raw]
Subject: RE: [EXT] Re: scsi: iscsi: Drop suspend calls from ep_disconnect

> >> Manish, when this function is run iscsid or the in-kernel conn error
> >> cleanup handler can be running right? There is nothing preventing
> >> those from running at the same time?
> >>

Yes, this code only expected to run when we have boot from SAN multipath setup,
where one connection is active and other is in FAILED state and shutdown
handler gets called.

> >> I think you want to call iscsi_host_remove at the beginning of
> __qedi_remove.
> >> That will tell userpsace that the host is being removed and libiscsi
> >> will
> >
> > I meant iscsid not libiscsi.
> >
> >> start the session shutdown and removal process. It then waits for the
> >> sessions to be removed. We can then proceed with the other host
> >> removal cleanup, and at the end of __qedi_remove you do the
> >> iscsi_host_free call.
> >>
> >
>
> Hey Manish,
>
> Here is a lightly tested patch.
>
>
> diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h index
> fb44a282613e..9f8e8ef405a1 100644
> --- a/drivers/scsi/qedi/qedi_gbl.h
> +++ b/drivers/scsi/qedi/qedi_gbl.h
> @@ -72,6 +72,5 @@ void qedi_remove_sysfs_ctx_attr(struct qedi_ctx *qedi);
> void qedi_clearsq(struct qedi_ctx *qedi,
> struct qedi_conn *qedi_conn,
> struct iscsi_task *task);
> -void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess);
>
> #endif
> diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c index
> bf581ecea897..97f83760da88 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.c
> +++ b/drivers/scsi/qedi/qedi_iscsi.c
> @@ -1659,23 +1659,6 @@ void qedi_process_iscsi_error(struct qedi_endpoint
> *ep,
> qedi_start_conn_recovery(qedi_conn->qedi, qedi_conn); }
>
> -void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess) -{
> - struct iscsi_session *session = cls_sess->dd_data;
> - struct iscsi_conn *conn = session->leadconn;
> - struct qedi_conn *qedi_conn = conn->dd_data;
> -
> - if (iscsi_is_session_online(cls_sess)) {
> - if (conn)
> - iscsi_suspend_queue(conn);
> - qedi_ep_disconnect(qedi_conn->iscsi_ep);
> - }
> -
> - qedi_conn_destroy(qedi_conn->cls_conn);
> -
> - qedi_session_destroy(cls_sess);
> -}
> -
> void qedi_process_tcp_error(struct qedi_endpoint *ep,
> struct iscsi_eqe_data *data)
> {
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index
> edf915432704..0b0acb827071 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -2417,11 +2417,9 @@ static void __qedi_remove(struct pci_dev *pdev, int
> mode)
> int rval;
> u16 retry = 10;
>
> - if (mode == QEDI_MODE_SHUTDOWN)
> - iscsi_host_for_each_session(qedi->shost,
> - qedi_clear_session_ctx);
> -
> if (mode == QEDI_MODE_NORMAL || mode ==
> QEDI_MODE_SHUTDOWN) {
> + iscsi_host_remove(qedi->shost);
> +
> if (qedi->tmf_thread) {
> flush_workqueue(qedi->tmf_thread);
> destroy_workqueue(qedi->tmf_thread);
> @@ -2482,7 +2480,6 @@ static void __qedi_remove(struct pci_dev *pdev, int
> mode)
> if (qedi->boot_kset)
> iscsi_boot_destroy_kset(qedi->boot_kset);
>
> - iscsi_host_remove(qedi->shost);
> iscsi_host_free(qedi->shost);
> }
> }

Agree, with your code changes. I will run BFS + shutdown tests with this change, and let you know in case hit any issue.

Thanks,
Manish

2021-06-09 13:00:17

by Manish Rangankar

[permalink] [raw]
Subject: RE: [EXT] Re: scsi: iscsi: Drop suspend calls from ep_disconnect



> > Hey Manish,
> >
> > Here is a lightly tested patch.
> >
> >
> > diff --git a/drivers/scsi/qedi/qedi_gbl.h
> > b/drivers/scsi/qedi/qedi_gbl.h index
> > fb44a282613e..9f8e8ef405a1 100644
> > --- a/drivers/scsi/qedi/qedi_gbl.h
> > +++ b/drivers/scsi/qedi/qedi_gbl.h
> > @@ -72,6 +72,5 @@ void qedi_remove_sysfs_ctx_attr(struct qedi_ctx
> > *qedi); void qedi_clearsq(struct qedi_ctx *qedi,
> > struct qedi_conn *qedi_conn,
> > struct iscsi_task *task);
> > -void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess);
> >
> > #endif
> > diff --git a/drivers/scsi/qedi/qedi_iscsi.c
> > b/drivers/scsi/qedi/qedi_iscsi.c index
> > bf581ecea897..97f83760da88 100644
> > --- a/drivers/scsi/qedi/qedi_iscsi.c
> > +++ b/drivers/scsi/qedi/qedi_iscsi.c
> > @@ -1659,23 +1659,6 @@ void qedi_process_iscsi_error(struct
> > qedi_endpoint *ep,
> > qedi_start_conn_recovery(qedi_conn->qedi, qedi_conn); }
> >
> > -void qedi_clear_session_ctx(struct iscsi_cls_session *cls_sess) -{
> > - struct iscsi_session *session = cls_sess->dd_data;
> > - struct iscsi_conn *conn = session->leadconn;
> > - struct qedi_conn *qedi_conn = conn->dd_data;
> > -
> > - if (iscsi_is_session_online(cls_sess)) {
> > - if (conn)
> > - iscsi_suspend_queue(conn);
> > - qedi_ep_disconnect(qedi_conn->iscsi_ep);
> > - }
> > -
> > - qedi_conn_destroy(qedi_conn->cls_conn);
> > -
> > - qedi_session_destroy(cls_sess);
> > -}
> > -
> > void qedi_process_tcp_error(struct qedi_endpoint *ep,
> > struct iscsi_eqe_data *data)
> > {
> > diff --git a/drivers/scsi/qedi/qedi_main.c
> > b/drivers/scsi/qedi/qedi_main.c index
> > edf915432704..0b0acb827071 100644
> > --- a/drivers/scsi/qedi/qedi_main.c
> > +++ b/drivers/scsi/qedi/qedi_main.c
> > @@ -2417,11 +2417,9 @@ static void __qedi_remove(struct pci_dev *pdev,
> > int
> > mode)
> > int rval;
> > u16 retry = 10;
> >
> > - if (mode == QEDI_MODE_SHUTDOWN)
> > - iscsi_host_for_each_session(qedi->shost,
> > - qedi_clear_session_ctx);
> > -
> > if (mode == QEDI_MODE_NORMAL || mode ==
> > QEDI_MODE_SHUTDOWN) {
> > + iscsi_host_remove(qedi->shost);
> > +
> > if (qedi->tmf_thread) {
> > flush_workqueue(qedi->tmf_thread);
> > destroy_workqueue(qedi->tmf_thread);
> > @@ -2482,7 +2480,6 @@ static void __qedi_remove(struct pci_dev *pdev,
> > int
> > mode)
> > if (qedi->boot_kset)
> > iscsi_boot_destroy_kset(qedi->boot_kset);
> >
> > - iscsi_host_remove(qedi->shost);
> > iscsi_host_free(qedi->shost);
> > }
> > }
>
> Agree, with your code changes. I will run BFS + shutdown tests with this change,
> and let you know in case hit any issue.
>
> Thanks,
> Manish

Mike,

No issue observed in mentioned tests.

Thanks,
Manish