Linus, please pull from
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
This tree is also available from kernel.org mirrors at:
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
This will pull one fix for an oops caused by reloading the ib_srp module:
David Dillow (1):
IB/srp: Fix list corruption/oops on module reload
drivers/infiniband/ulp/srp/ib_srp.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..77e8b90 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2054,6 +2054,7 @@ static void srp_remove_one(struct ib_device *device)
list_for_each_entry_safe(target, tmp_target,
&host->target_list, list) {
scsi_remove_host(target->scsi_host);
+ srp_remove_host(target->scsi_host);
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
On Thu, 2008-01-03 at 10:29 -0800, Roland Dreier wrote:
> Linus, please pull from
>
> master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
>
> This tree is also available from kernel.org mirrors at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
>
> This will pull one fix for an oops caused by reloading the ib_srp module:
>
> David Dillow (1):
> IB/srp: Fix list corruption/oops on module reload
If we've got time before 2.6.24 final, I'd wait on this a bit.
ib_srp:srp_remove_work() has them reversed as well, and I'm currently
tracking down why it oopses when the srp_remove_host() happens before
the scsi_remove_host(), which is the documented call sequence.
This "fixes" the oops I see on unload, but I'm sure it is a correct fix.
> If we've got time before 2.6.24 final, I'd wait on this a bit.
> ib_srp:srp_remove_work() has them reversed as well, and I'm currently
> tracking down why it oopses when the srp_remove_host() happens before
> the scsi_remove_host(), which is the documented call sequence.
I think the best thing to do is to merge this (assuming that Linus
gets to it), since it looks quite safe and definitely fixes a crash.
Then if we get to the root cause we can change the order of the calls
if it turns out a different fix is required.
- R.
On Thu, 2008-01-03 at 10:56 -0800, Roland Dreier wrote:
> > If we've got time before 2.6.24 final, I'd wait on this a bit.
> > ib_srp:srp_remove_work() has them reversed as well, and I'm currently
> > tracking down why it oopses when the srp_remove_host() happens before
> > the scsi_remove_host(), which is the documented call sequence.
>
> I think the best thing to do is to merge this (assuming that Linus
> gets to it), since it looks quite safe and definitely fixes a crash.
> Then if we get to the root cause we can change the order of the calls
> if it turns out a different fix is required.
I've made progress on the root cause (posted in another thread), but we
need to fix ib_srp.c:srp_remove_work() as well, as I think it will have
the same issue. It will only be hit if we cannot reconnect to the
target, so it probably doesn't see a lot of use.
I'll send a new patch to you for just the ib_srp.c side.
Dave
Subject: IB/srp: Fix list corruption/oops on module reload
ib_srp doesn't clean up the transport attributes properly when
unloading, so it leaves references around to free'd memory.
The srp_remove_host() cannot go before the scsi_remove_host() call as
the documented call sequence suggests, as it will cause an oops when the
SRP transport code will free up an object that is not its own. So,
temporarily reorder the calls in srp_remove_work() to avoid the problem
as well until the transport code can be fixed.
Signed-off-by: David Dillow <[email protected]>
---
On Thu, 2008-01-03 at 15:13 -0500, David Dillow wrote:
> I'll send a new patch to you for just the ib_srp.c side.
Should've just sent it with the last email.
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..6e7e3c8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
list_del(&target->list);
spin_unlock(&target->srp_host->target_lock);
- srp_remove_host(target->scsi_host);
scsi_remove_host(target->scsi_host);
+ srp_remove_host(target->scsi_host);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
scsi_host_put(target->scsi_host);
@@ -2054,6 +2054,7 @@ static void srp_remove_one(struct ib_device *device)
list_for_each_entry_safe(target, tmp_target,
&host->target_list, list) {
scsi_remove_host(target->scsi_host);
+ srp_remove_host(target->scsi_host);
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
> @@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
> list_del(&target->list);
> spin_unlock(&target->srp_host->target_lock);
>
> - srp_remove_host(target->scsi_host);
> scsi_remove_host(target->scsi_host);
> + srp_remove_host(target->scsi_host);
Thanks... I just confirmed (by crashing my system) that either this
change or the fix to the srp transport class is needed too. I think
we have time before 2.6.24 final to get the right fix in, so I'll wait
until tomorrow before asking Linus to pull this.
- R.
On Thu, 03 Jan 2008 15:20:09 -0500
David Dillow <[email protected]> wrote:
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 950228f..6e7e3c8 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
> list_del(&target->list);
> spin_unlock(&target->srp_host->target_lock);
>
> - srp_remove_host(target->scsi_host);
> scsi_remove_host(target->scsi_host);
> + srp_remove_host(target->scsi_host);
> ib_destroy_cm_id(target->cm_id);
> srp_free_target_ib(target);
> scsi_host_put(target->scsi_host);
These last two look suspicious. Are you freeing target before
freeing target->scsi_host or does the code simply not do what
it looks like it's doing? :)
(no, I haven't looked at the IB code - I'm probably wrong)
--
All Rights Reversed
On Thu, 2008-01-03 at 18:11 -0500, Rik van Riel wrote:
> On Thu, 03 Jan 2008 15:20:09 -0500
> David Dillow <[email protected]> wrote:
>
> > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> > index 950228f..6e7e3c8 100644
> > --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > @@ -423,8 +423,8 @@ static void srp_remove_work(struct work_struct *work)
> > list_del(&target->list);
> > spin_unlock(&target->srp_host->target_lock);
> >
> > - srp_remove_host(target->scsi_host);
> > scsi_remove_host(target->scsi_host);
> > + srp_remove_host(target->scsi_host);
> > ib_destroy_cm_id(target->cm_id);
> > srp_free_target_ib(target);
> > scsi_host_put(target->scsi_host);
>
> These last two look suspicious. Are you freeing target before
> freeing target->scsi_host or does the code simply not do what
> it looks like it's doing? :)
>
> (no, I haven't looked at the IB code - I'm probably wrong)
srp_free_target_ib() just frees the buffers for the target, and
scsi_host_put() does the actual cleanup once the refcount drops to zero.
Dave