2006-02-11 20:22:26

by Roland Dreier

[permalink] [raw]
Subject: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

Fix the following race scenario:
- Device is up.
- Port event or set mcast list triggers ipoib_mcast_stop_thread,
this cancels the query and waits on mcast "done" completion.
- Completion is called and "done" is set.
- Meanwhile, ipoib_mcast_send arrives and starts a new query,
re-initializing "done".

Fix this by adding a "multicast started" bit and checking it before
starting a send-only join.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

---

drivers/infiniband/ulp/ipoib/ipoib.h | 1 +
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 15 +++++++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)

479a079663bd4c5f3d2714643b1b8c406aaba3e0
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index e0a5412..2f85a9a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -78,6 +78,7 @@ enum {
IPOIB_FLAG_SUBINTERFACE = 4,
IPOIB_MCAST_RUN = 5,
IPOIB_STOP_REAPER = 6,
+ IPOIB_MCAST_STARTED = 7,

IPOIB_MAX_BACKOFF_SECONDS = 16,

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index ccaa0c3..1c71482 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -601,6 +601,10 @@ int ipoib_mcast_start_thread(struct net_
queue_work(ipoib_workqueue, &priv->mcast_task);
mutex_unlock(&mcast_mutex);

+ spin_lock_irq(&priv->lock);
+ set_bit(IPOIB_MCAST_STARTED, &priv->flags);
+ spin_unlock_irq(&priv->lock);
+
return 0;
}

@@ -611,6 +615,10 @@ int ipoib_mcast_stop_thread(struct net_d

ipoib_dbg_mcast(priv, "stopping multicast thread\n");

+ spin_lock_irq(&priv->lock);
+ clear_bit(IPOIB_MCAST_STARTED, &priv->flags);
+ spin_unlock_irq(&priv->lock);
+
mutex_lock(&mcast_mutex);
clear_bit(IPOIB_MCAST_RUN, &priv->flags);
cancel_delayed_work(&priv->mcast_task);
@@ -693,6 +701,12 @@ void ipoib_mcast_send(struct net_device
*/
spin_lock(&priv->lock);

+ if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)) {
+ ++priv->stats.tx_dropped;
+ dev_kfree_skb_any(skb);
+ goto unlock;
+ }
+
mcast = __ipoib_mcast_find(dev, mgid);
if (!mcast) {
/* Let's create a new send only group now */
@@ -754,6 +768,7 @@ out:
ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
}

+unlock:
spin_unlock(&priv->lock);
}

--
1.1.3


2006-02-11 20:22:48

by Roland Dreier

[permalink] [raw]
Subject: [git patch review 2/4] IPoIB: Fix another send-only join race

Further, there's an additional issue that I saw in testing:
ipoib_mcast_send may get called when priv->broadcast is NULL (e.g. if
the device was downed and then upped internally because of a port
event).

If this happends and the send-only join request gets completed before
priv->broadcast is set, we get an oops.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

---

drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

7bcb974ef6a0ae903888272c92c66ea779388c01
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 1c71482..932bf13 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -701,7 +701,7 @@ void ipoib_mcast_send(struct net_device
*/
spin_lock(&priv->lock);

- if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)) {
+ if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) {
++priv->stats.tx_dropped;
dev_kfree_skb_any(skb);
goto unlock;
--
1.1.3

2006-02-11 20:22:49

by Roland Dreier

[permalink] [raw]
Subject: [git patch review 4/4] IPoIB: Yet another fix for send-only joins

Even after the last fix, it's still possible for a send-only join to
start before the join for the broadcast group has finished. This
could cause us to create a multicast group using attributes from the
broadcast group that haven't been initialized yet, so we would use
garbage for the Q_Key, etc. Fix this by waiting until the broadcast
group's attached flag is set before starting send-only joins.

Signed-off-by: Roland Dreier <[email protected]>

---

drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)

20b83382d1c5d4d1a73fc5671261db5239d1dbb3
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 932bf13..a2408d7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -533,8 +533,10 @@ void ipoib_mcast_join_task(void *dev_ptr
}

if (!priv->broadcast) {
- priv->broadcast = ipoib_mcast_alloc(dev, 1);
- if (!priv->broadcast) {
+ struct ipoib_mcast *broadcast;
+
+ broadcast = ipoib_mcast_alloc(dev, 1);
+ if (!broadcast) {
ipoib_warn(priv, "failed to allocate broadcast group\n");
mutex_lock(&mcast_mutex);
if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
@@ -544,10 +546,11 @@ void ipoib_mcast_join_task(void *dev_ptr
return;
}

- memcpy(priv->broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
+ spin_lock_irq(&priv->lock);
+ memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid));
+ priv->broadcast = broadcast;

- spin_lock_irq(&priv->lock);
__ipoib_mcast_add(dev, priv->broadcast);
spin_unlock_irq(&priv->lock);
}
@@ -701,7 +704,9 @@ void ipoib_mcast_send(struct net_device
*/
spin_lock(&priv->lock);

- if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) {
+ if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) ||
+ !priv->broadcast ||
+ !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
++priv->stats.tx_dropped;
dev_kfree_skb_any(skb);
goto unlock;
--
1.1.3

2006-02-11 20:22:28

by Roland Dreier

[permalink] [raw]
Subject: [git patch review 3/4] IB/mthca: Don't print debugging info until we have all values

When debugging is enabled, the mthca_QUERY_DEV_LIM() firmware command
function prints out some of the device limits that it queries.
However the debugging prints happen before all of the fields are
extracted from the firmware response, so some of the values that get
printed are uninitialized junk. Move the prints to the end of the
function to fix this.

Signed-off-by: Roland Dreier <[email protected]>

---

drivers/infiniband/hw/mthca/mthca_cmd.c | 38 ++++++++++++++++---------------
1 files changed, 19 insertions(+), 19 deletions(-)

f295c79b6766b25fe8c1aad88211c54d1caa7e0b
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index f9b9b93..2825615 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1029,25 +1029,6 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
MTHCA_GET(size, outbox, QUERY_DEV_LIM_UAR_ENTRY_SZ_OFFSET);
dev_lim->uar_scratch_entry_sz = size;

- mthca_dbg(dev, "Max QPs: %d, reserved QPs: %d, entry size: %d\n",
- dev_lim->max_qps, dev_lim->reserved_qps, dev_lim->qpc_entry_sz);
- mthca_dbg(dev, "Max SRQs: %d, reserved SRQs: %d, entry size: %d\n",
- dev_lim->max_srqs, dev_lim->reserved_srqs, dev_lim->srq_entry_sz);
- mthca_dbg(dev, "Max CQs: %d, reserved CQs: %d, entry size: %d\n",
- dev_lim->max_cqs, dev_lim->reserved_cqs, dev_lim->cqc_entry_sz);
- mthca_dbg(dev, "Max EQs: %d, reserved EQs: %d, entry size: %d\n",
- dev_lim->max_eqs, dev_lim->reserved_eqs, dev_lim->eqc_entry_sz);
- mthca_dbg(dev, "reserved MPTs: %d, reserved MTTs: %d\n",
- dev_lim->reserved_mrws, dev_lim->reserved_mtts);
- mthca_dbg(dev, "Max PDs: %d, reserved PDs: %d, reserved UARs: %d\n",
- dev_lim->max_pds, dev_lim->reserved_pds, dev_lim->reserved_uars);
- mthca_dbg(dev, "Max QP/MCG: %d, reserved MGMs: %d\n",
- dev_lim->max_pds, dev_lim->reserved_mgms);
- mthca_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n",
- dev_lim->max_cq_sz, dev_lim->max_qp_sz, dev_lim->max_srq_sz);
-
- mthca_dbg(dev, "Flags: %08x\n", dev_lim->flags);
-
if (mthca_is_memfree(dev)) {
MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SRQ_SZ_OFFSET);
dev_lim->max_srq_sz = 1 << field;
@@ -1093,6 +1074,25 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
dev_lim->mpt_entry_sz = MTHCA_MPT_ENTRY_SIZE;
}

+ mthca_dbg(dev, "Max QPs: %d, reserved QPs: %d, entry size: %d\n",
+ dev_lim->max_qps, dev_lim->reserved_qps, dev_lim->qpc_entry_sz);
+ mthca_dbg(dev, "Max SRQs: %d, reserved SRQs: %d, entry size: %d\n",
+ dev_lim->max_srqs, dev_lim->reserved_srqs, dev_lim->srq_entry_sz);
+ mthca_dbg(dev, "Max CQs: %d, reserved CQs: %d, entry size: %d\n",
+ dev_lim->max_cqs, dev_lim->reserved_cqs, dev_lim->cqc_entry_sz);
+ mthca_dbg(dev, "Max EQs: %d, reserved EQs: %d, entry size: %d\n",
+ dev_lim->max_eqs, dev_lim->reserved_eqs, dev_lim->eqc_entry_sz);
+ mthca_dbg(dev, "reserved MPTs: %d, reserved MTTs: %d\n",
+ dev_lim->reserved_mrws, dev_lim->reserved_mtts);
+ mthca_dbg(dev, "Max PDs: %d, reserved PDs: %d, reserved UARs: %d\n",
+ dev_lim->max_pds, dev_lim->reserved_pds, dev_lim->reserved_uars);
+ mthca_dbg(dev, "Max QP/MCG: %d, reserved MGMs: %d\n",
+ dev_lim->max_pds, dev_lim->reserved_mgms);
+ mthca_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n",
+ dev_lim->max_cq_sz, dev_lim->max_qp_sz, dev_lim->max_srq_sz);
+
+ mthca_dbg(dev, "Flags: %08x\n", dev_lim->flags);
+
out:
mthca_free_mailbox(dev, mailbox);
return err;
--
1.1.3

2006-02-11 22:02:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

Roland Dreier <[email protected]> wrote:
>
> + spin_lock_irq(&priv->lock);
> + set_bit(IPOIB_MCAST_STARTED, &priv->flags);
> + spin_unlock_irq(&priv->lock);

Strange to put a lock around an atomic op like that.

Sometimes it's valid. If another cpu was doing:

spin_lock(lock);

if (test_bit(IPOIB_MCAST_STARTED))
something();
...
if (test_bit(IPOIB_MCAST_STARTED))
something_else();

spin_unlock(lock);

then the locked set_bit() makes sense.

But often it doesn't ;)

2006-02-12 02:03:23

by Roland Dreier

[permalink] [raw]
Subject: Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

> Roland Dreier <[email protected]> wrote:
> >
> > + spin_lock_irq(&priv->lock);
> > + set_bit(IPOIB_MCAST_STARTED, &priv->flags);
> > + spin_unlock_irq(&priv->lock);
>
> Strange to put a lock around an atomic op like that.
>
> Sometimes it's valid. If another cpu was doing:
>
> spin_lock(lock);
>
> if (test_bit(IPOIB_MCAST_STARTED))
> something();
> ...
> if (test_bit(IPOIB_MCAST_STARTED))
> something_else();
>
> spin_unlock(lock);
>
> then the locked set_bit() makes sense.
>
> But often it doesn't ;)

Good point. Michael, any reason why the lock is there around the
set_bit()? (And similarly for the corresponding clear_bit())

Thanks,
Roland

2006-02-12 07:49:23

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

Quoting r. Roland Dreier <[email protected]>:
> Subject: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
>
> > Roland Dreier <[email protected]> wrote:
> > >
> > > + spin_lock_irq(&priv->lock);
> > > + set_bit(IPOIB_MCAST_STARTED, &priv->flags);
> > > + spin_unlock_irq(&priv->lock);
> >
> > Strange to put a lock around an atomic op like that.
> >
> > Sometimes it's valid. If another cpu was doing:
> >
> > spin_lock(lock);
> >
> > if (test_bit(IPOIB_MCAST_STARTED))
> > something();
> > ...
> > if (test_bit(IPOIB_MCAST_STARTED))
> > something_else();
> >
> > spin_unlock(lock);
> >
> > then the locked set_bit() makes sense.
> >
> > But often it doesn't ;)
>
> Good point. Michael, any reason why the lock is there around the
> set_bit()? (And similarly for the corresponding clear_bit())
>
> Thanks,
> Roland

Basically, its as Andrew said: the lock around clear_bit is there to ensure that
ipoib_mcast_send isnt running already when we stop the thread. Thats why
test_bit has to be inside the lock, too.

This was discussed with Krishna Kumar when I posted the patch originally.
For more detail, please review this thread:
http://www.mail-archive.com/[email protected]/msg13206.html
or here
http://openib.org/pipermail/openib-general/2005-December/014370.html


--
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

2006-02-12 08:13:12

by Kyle Moffett

[permalink] [raw]
Subject: Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

On Feb 12, 2006, at 02:50, Michael S. Tsirkin wrote:
> Basically, its as Andrew said: the lock around clear_bit is there
> to ensure that ipoib_mcast_send isnt running already when we stop
> the thread. Thats why test_bit has to be inside the lock, too.

Looks like you guys could use nonatomic versions to improve bus
efficiency slightly, but they appear to be relying on the fact that
when the function calling set_bit() returns, the multicast thread
will be guaranteed to be finished and never run again. The set_bit()
can only happen when the thread is not doing work (due to the lock),
and since the thread firsts checks the bit before doing any work, it
provides more guarantees than just the atomics would.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/E/U d- s++: a18 C++++>$ ULBX*++++(+++)>$ P++++(+++)>$ L++++
(+++)>$ !E- W+++(++) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP
+ t+(+++) 5 X R? !tv-(--) b++++(++) DI+(++) D+++ G e>++++$ h*(+)>++$ r
%(--) !y?-(--)
------END GEEK CODE BLOCK------



2006-02-12 08:17:55

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

Quoting r. Kyle Moffett <[email protected]>:
> On Feb 12, 2006, at 02:50, Michael S. Tsirkin wrote:
> >Basically, its as Andrew said: the lock around clear_bit is there
> >to ensure that ipoib_mcast_send isnt running already when we stop
> >the thread. Thats why test_bit has to be inside the lock, too.
>
> Looks like you guys could use nonatomic versions to improve bus
> efficiency slightly

I think we need atomics since other places touch bits in the same word without
taking the lock.

--
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

2006-02-12 16:37:51

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

Michael> Basically, its as Andrew said: the lock around clear_bit
Michael> is there to ensure that ipoib_mcast_send isnt running
Michael> already when we stop the thread. Thats why test_bit has
Michael> to be inside the lock, too.

Makes sense I guess. If I'm understanding correctly, the lock isn't
really there to serialize the bit ops, but rather to make sure
ipoib_mcast_send() won't do anything after we clear the bit.

Does that mean that there's no reason to take the lock around the set_bit()?

- R.

2006-02-12 16:55:40

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped

Quoting r. Roland Dreier <[email protected]>:
> Subject: Re: [openib-general] Re: [git patch review 1/4] IPoIB: Don't start send-only joins while multicast thread is stopped
>
> Michael> Basically, its as Andrew said: the lock around clear_bit
> Michael> is there to ensure that ipoib_mcast_send isnt running
> Michael> already when we stop the thread. Thats why test_bit has
> Michael> to be inside the lock, too.
>
> Makes sense I guess. If I'm understanding correctly, the lock isn't
> really there to serialize the bit ops, but rather to make sure
> ipoib_mcast_send() won't do anything after we clear the bit.

Right. Thats one way to put it.

> Does that mean that there's no reason to take the lock around the set_bit()?

Ugh, sorry, I dont really remember why I put it there.

I guess I just have easier time reasoning about locks than barriers and atomic
operations. "bit is protected by priv->lock" is a simple rule, and we are not on
data path here. The fact that the race went unnoticed for a while validates
this approach in my eyes.

I guess longer term we will replace mcast_mutex with priv->lock anyway, so it
doesnt matter much.

--
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies