2008-01-03 18:30:19

by Roland Dreier

[permalink] [raw]
Subject: [GIT PULL] please pull infiniband.git for-linus

Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will pull one fix for an oops caused by reloading the ib_srp module:

David Dillow (1):
IB/srp: Fix list corruption/oops on module reload

drivers/infiniband/ulp/srp/ib_srp.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..77e8b90 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2054,6 +2054,7 @@ static void srp_remove_one(struct ib_device *device)
list_for_each_entry_safe(target, tmp_target,
&host->target_list, list) {
scsi_remove_host(target->scsi_host);
+ srp_remove_host(target->scsi_host);
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);


2008-01-03 18:40:01

by David Dillow

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus


On Thu, 2008-01-03 at 10:29 -0800, Roland Dreier wrote:
> Linus, please pull from
>
> master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
>
> This tree is also available from kernel.org mirrors at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
>
> This will pull one fix for an oops caused by reloading the ib_srp module:
>
> David Dillow (1):
> IB/srp: Fix list corruption/oops on module reload

If we've got time before 2.6.24 final, I'd wait on this a bit.
ib_srp:srp_remove_work() has them reversed as well, and I'm currently
tracking down why it oopses when the srp_remove_host() happens before
the scsi_remove_host(), which is the documented call sequence.

This "fixes" the oops I see on unload, but I'm sure it is a correct fix.

2008-01-03 18:57:16

by Roland Dreier

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus

> If we've got time before 2.6.24 final, I'd wait on this a bit.
> ib_srp:srp_remove_work() has them reversed as well, and I'm currently
> tracking down why it oopses when the srp_remove_host() happens before
> the scsi_remove_host(), which is the documented call sequence.

I think the best thing to do is to merge this (assuming that Linus
gets to it), since it looks quite safe and definitely fixes a crash.
Then if we get to the root cause we can change the order of the calls
if it turns out a different fix is required.

- R.

2008-01-03 20:13:18

by David Dillow

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus


On Thu, 2008-01-03 at 10:56 -0800, Roland Dreier wrote:
> > If we've got time before 2.6.24 final, I'd wait on this a bit.
> > ib_srp:srp_remove_work() has them reversed as well, and I'm currently
> > tracking down why it oopses when the srp_remove_host() happens before
> > the scsi_remove_host(), which is the documented call sequence.
>
> I think the best thing to do is to merge this (assuming that Linus
> gets to it), since it looks quite safe and definitely fixes a crash.
> Then if we get to the root cause we can change the order of the calls
> if it turns out a different fix is required.

I've made progress on the root cause (posted in another thread), but we
need to fix ib_srp.c:srp_remove_work() as well, as I think it will have
the same issue. It will only be hit if we cannot reconnect to the
target, so it probably doesn't see a lot of use.

I'll send a new patch to you for just the ib_srp.c side.
Dave

2008-01-03 20:20:25

by David Dillow

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus

Subject: IB/srp: Fix list corruption/oops on module reload

ib_srp doesn't clean up the transport attributes properly when
unloading, so it leaves references around to free'd memory.

The srp_remove_host() cannot go before the scsi_remove_host() call as
the documented call sequence suggests, as it will cause an oops when the
SRP transport code will free up an object that is not its own. So,
temporarily reorder the calls in srp_remove_work() to avoid the problem
as well until the transport code can be fixed.

Signed-off-by: David Dillow <[email protected]>
---
On Thu, 2008-01-03 at 15:13 -0500, David Dillow wrote:
> I'll send a new patch to you for just the ib_srp.c side.

Should've just sent it with the last email.

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..6e7e3c8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
list_del(&target->list);
spin_unlock(&target->srp_host->target_lock);

- srp_remove_host(target->scsi_host);
scsi_remove_host(target->scsi_host);
+ srp_remove_host(target->scsi_host);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
scsi_host_put(target->scsi_host);
@@ -2054,6 +2054,7 @@ static void srp_remove_one(struct ib_device *device)
list_for_each_entry_safe(target, tmp_target,
&host->target_list, list) {
scsi_remove_host(target->scsi_host);
+ srp_remove_host(target->scsi_host);
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);

2008-01-03 21:34:49

by Roland Dreier

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus

> @@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
> list_del(&target->list);
> spin_unlock(&target->srp_host->target_lock);
>
> - srp_remove_host(target->scsi_host);
> scsi_remove_host(target->scsi_host);
> + srp_remove_host(target->scsi_host);

Thanks... I just confirmed (by crashing my system) that either this
change or the fix to the srp transport class is needed too. I think
we have time before 2.6.24 final to get the right fix in, so I'll wait
until tomorrow before asking Linus to pull this.

- R.

2008-01-03 23:12:28

by Rik van Riel

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus

On Thu, 03 Jan 2008 15:20:09 -0500
David Dillow <[email protected]> wrote:

> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 950228f..6e7e3c8 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
> list_del(&target->list);
> spin_unlock(&target->srp_host->target_lock);
>
> - srp_remove_host(target->scsi_host);
> scsi_remove_host(target->scsi_host);
> + srp_remove_host(target->scsi_host);
> ib_destroy_cm_id(target->cm_id);
> srp_free_target_ib(target);
> scsi_host_put(target->scsi_host);

These last two look suspicious. Are you freeing target before
freeing target->scsi_host or does the code simply not do what
it looks like it's doing? :)

(no, I haven't looked at the IB code - I'm probably wrong)

--
All Rights Reversed

2008-01-03 23:25:30

by David Dillow

[permalink] [raw]
Subject: Re: [GIT PULL] please pull infiniband.git for-linus

On Thu, 2008-01-03 at 18:11 -0500, Rik van Riel wrote:
> On Thu, 03 Jan 2008 15:20:09 -0500
> David Dillow <[email protected]> wrote:
>
> > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> > index 950228f..6e7e3c8 100644
> > --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > @@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
> > list_del(&target->list);
> > spin_unlock(&target->srp_host->target_lock);
> >
> > - srp_remove_host(target->scsi_host);
> > scsi_remove_host(target->scsi_host);
> > + srp_remove_host(target->scsi_host);
> > ib_destroy_cm_id(target->cm_id);
> > srp_free_target_ib(target);
> > scsi_host_put(target->scsi_host);
>
> These last two look suspicious. Are you freeing target before
> freeing target->scsi_host or does the code simply not do what
> it looks like it's doing? :)
>
> (no, I haven't looked at the IB code - I'm probably wrong)

srp_free_target_ib() just frees the buffers for the target, and
scsi_host_put() does the actual cleanup once the refcount drops to zero.

Dave