2006-03-09 23:26:46

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

> +static int ipath_sma_release(struct inode *in, struct file *fp)
> +{
> + int s;
> +
> + ipath_sma_alive = 0;
> + ipath_cdbg(SMA, "Closing SMA device\n");
> + for (s = 0; s < atomic_read(&ipath_max); s++) {
> + struct ipath_devdata *dd = ipath_lookup(s);
> +
> + if (!dd || !(dd->ipath_flags & IPATH_INITTED))
> + continue;
> + *dd->ipath_statusp &= ~IPATH_STATUS_SMA;
> + if (dd->verbs_layer.l_flags & IPATH_VERBS_KERNEL_SMA)
> + *dd->ipath_statusp |= IPATH_STATUS_OIB_SMA;
> + }
> + return 0;
> +}

Similarly what protects against another process opening the device
right after the ipath_sma_alive = 0 setting, but before you do all the
cleanup that's after that?

And what protects against a hot unplug of a device after the test of s
against ipath_max?

- R.


2006-03-09 23:52:49

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, 2006-03-09 at 15:26 -0800, Roland Dreier wrote:

> Similarly what protects against another process opening the device
> right after the ipath_sma_alive = 0 setting, but before you do all the
> cleanup that's after that?

This is fixed by the stuff I just did in response to your earlier
message.

> And what protects against a hot unplug of a device after the test of s
> against ipath_max?

We don't support hotplugged devices at the moment. If you're asking
whether an rmmod at the wrong time could cause something bad to happen,
I don't *think* so.

<b

2006-03-10 00:00:29

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

Bryan> We don't support hotplugged devices at the moment. If
Bryan> you're asking whether an rmmod at the wrong time could
Bryan> cause something bad to happen, I don't *think* so.

How do you stop someone from hot plugging a PCIe device?

- R.

2006-03-10 00:04:35

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, 2006-03-09 at 16:00 -0800, Roland Dreier wrote:
> Bryan> We don't support hotplugged devices at the moment.
>
> How do you stop someone from hot plugging a PCIe device?

You say "we don't support this yet" somewhere in big letters. The chips
and cards support hotplug electrically and logically. We just haven't
had time yet to do the driver support work, and won't for a while.

<b

2006-03-10 00:45:26

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, Mar 09, 2006 at 03:52:47PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 15:26 -0800, Roland Dreier wrote:
>
> > Similarly what protects against another process opening the device
> > right after the ipath_sma_alive = 0 setting, but before you do all the
> > cleanup that's after that?
>
> This is fixed by the stuff I just did in response to your earlier
> message.
>
> > And what protects against a hot unplug of a device after the test of s
> > against ipath_max?
>
> We don't support hotplugged devices at the moment.

Why not? Your cards can't be placed in a machine that supports PCI
Hotplug (or PCI-E hotplug)? You can't really tell users that (no matter
how often I have wished I could...)

thanks,

greg k-h

2006-03-10 00:48:50

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, 2006-03-09 at 16:45 -0800, Greg KH wrote:

> > We don't support hotplugged devices at the moment.
>
> Why not? Your cards can't be placed in a machine that supports PCI
> Hotplug (or PCI-E hotplug)?

No, the driver and userspace code doesn't support it yet. That's all.

> You can't really tell users that (no matter
> how often I have wished I could...)

I don't expect this to be a practical problem. We're planning to add
hotplug support to the driver once we have some cycles free.

<b

2006-03-10 01:04:22

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, Mar 09, 2006 at 04:48:45PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 16:45 -0800, Greg KH wrote:
>
> > > We don't support hotplugged devices at the moment.
> >
> > Why not? Your cards can't be placed in a machine that supports PCI
> > Hotplug (or PCI-E hotplug)?
>
> No, the driver and userspace code doesn't support it yet. That's all.
>
> > You can't really tell users that (no matter
> > how often I have wished I could...)
>
> I don't expect this to be a practical problem. We're planning to add
> hotplug support to the driver once we have some cycles free.

Ugh, that means it's never going to be there.

All new PCI drivers have the requirement that they work properly in
hotplug systems, as they should follow the PCI core api. If not, odds
are they will not be accepted into the tree :(

thanks,

greg k-h

2006-03-10 04:41:42

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, 2006-03-09 at 17:04 -0800, Greg KH wrote:

> > I don't expect this to be a practical problem. We're planning to add
> > hotplug support to the driver once we have some cycles free.
>
> Ugh, that means it's never going to be there.
>
> All new PCI drivers have the requirement that they work properly in
> hotplug systems, as they should follow the PCI core api. If not, odds
> are they will not be accepted into the tree :(

Okay, maybe we're talking at cross purposes here. We do follow the PCI
core API. We have a __devinit probe and __devexit remove routine, a
MODULE_DEVICE_TABLE, the kernel generates hotplug events when a device
is detected or the driver is unloaded, and so on.

I *assumed* that there was something more that we would need to do in
order to support real hotplug of actual physical cards, but now that I
look more closely, it doesn't appear that there is. At least, there's
nothing in Documentation/pci.txt or LDD3 that indicates to me that we
ought to be doing more.

Am I missing something?

<b

2006-03-10 05:48:43

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, Mar 09, 2006 at 08:41:36PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 17:04 -0800, Greg KH wrote:
>
> > > I don't expect this to be a practical problem. We're planning to add
> > > hotplug support to the driver once we have some cycles free.
> >
> > Ugh, that means it's never going to be there.
> >
> > All new PCI drivers have the requirement that they work properly in
> > hotplug systems, as they should follow the PCI core api. If not, odds
> > are they will not be accepted into the tree :(
>
> Okay, maybe we're talking at cross purposes here. We do follow the PCI
> core API. We have a __devinit probe and __devexit remove routine, a
> MODULE_DEVICE_TABLE, the kernel generates hotplug events when a device
> is detected or the driver is unloaded, and so on.
>
> I *assumed* that there was something more that we would need to do in
> order to support real hotplug of actual physical cards, but now that I
> look more closely, it doesn't appear that there is. At least, there's
> nothing in Documentation/pci.txt or LDD3 that indicates to me that we
> ought to be doing more.
>
> Am I missing something?

Nope, that's all that you need to do. Your driver will be notified that
the device will be going away by calling the disconnect function. So
great, nothing needs to be done :)

Oh, and you can test this out if you don't have a pci hotplug system by
using the fakephp driver and disconnecting your device that way.

thanks,

greg k-h

2006-03-10 05:55:07

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

Bryan> I *assumed* that there was something more that we would
Bryan> need to do in order to support real hotplug of actual
Bryan> physical cards, but now that I look more closely, it
Bryan> doesn't appear that there is. At least, there's nothing in
Bryan> Documentation/pci.txt or LDD3 that indicates to me that we
Bryan> ought to be doing more.

Bryan> Am I missing something?

No, the only problems are with the way the various pieces of your
drivers refer to devices by index. There are obvious races such as

> +int __init ipath_verbs_init(void)
> +{
> + int i;
> +
> + number_of_devices = ipath_layer_get_num_of_dev();
> + i = number_of_devices * sizeof(struct ipath_ibdev *);
> + ipath_devices = kmalloc(i, GFP_ATOMIC);
> + if (ipath_devices == NULL)
> + return -ENOMEM;
> +
> + for (i = 0; i < number_of_devices; i++) {
> + struct ipath_devdata *dd;
> + int ret = ipath_verbs_register(i, ipath_ib_piobufavail,
> + ipath_ib_rcv, ipath_ib_timer,
> + &dd);

suppose number_of_devices gets set to 5 but by the time you call
ipath_verbs_register(5,...), the 5th device has been unplugged?

Also you only do this when the module is loaded, so you won't handle
devices that are hot-plugged later. And I don't see anything that
would handle hot unplug either.

Pretty much any use of ipath_max is probably broken by hot plug.

- R.

2006-03-10 13:40:48

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, 2006-03-09 at 21:48 -0800, Greg KH wrote:

> Oh, and you can test this out if you don't have a pci hotplug system by
> using the fakephp driver and disconnecting your device that way.

That's good to know, thanks.

<b

2006-03-10 13:43:26

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Thu, 2006-03-09 at 21:55 -0800, Roland Dreier wrote:

> No, the only problems are with the way the various pieces of your
> drivers refer to devices by index.

OK. What's a safe way to iterate over the devices in the presence of
hotplug, then? I assume it's list_for_each_mumble; I just don't know
what mumble is :-)

> Also you only do this when the module is loaded, so you won't handle
> devices that are hot-plugged later.

No, ipath_max is updated any time a probe routine is called.

> And I don't see anything that
> would handle hot unplug either.

What would this anything look like, if I were hoping for an example to
emulate? There's nothing in LDD3 about this, so I'm kind of in the
dark.

<b

--
Bryan O'Sullivan <[email protected]>

2006-03-10 16:58:36

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Fri, Mar 10, 2006 at 05:43:50AM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 21:55 -0800, Roland Dreier wrote:
>
> > No, the only problems are with the way the various pieces of your
> > drivers refer to devices by index.
>
> OK. What's a safe way to iterate over the devices in the presence of
> hotplug, then? I assume it's list_for_each_mumble; I just don't know
> what mumble is :-)

You keep an internal list of devices, if you really need to do such a
thing.

> > Also you only do this when the module is loaded, so you won't handle
> > devices that are hot-plugged later.
>
> No, ipath_max is updated any time a probe routine is called.
>
> > And I don't see anything that
> > would handle hot unplug either.
>
> What would this anything look like, if I were hoping for an example to
> emulate? There's nothing in LDD3 about this, so I'm kind of in the
> dark.

It's just the "disconnect" PCI function being called, which can happen
at any time.

thanks,

greg k-h

2006-03-10 17:05:25

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Fri, 2006-03-10 at 08:58 -0800, Greg KH wrote:

> You keep an internal list of devices, if you really need to do such a
> thing.

OK. I do think we need it, because we have a dozen or so cases where we
do "iterate over all of the known devices".

> It's just the "disconnect" PCI function being called, which can happen
> at any time.

Thanks.

<b

2006-03-10 17:08:31

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

Bryan> OK. What's a safe way to iterate over the devices in the
Bryan> presence of hotplug, then? I assume it's
Bryan> list_for_each_mumble; I just don't know what mumble is :-)

You need something that takes a reference to each device while you're
looking at it, like for_each_pci_dev(). But in general iterating
through devices is usually the wrong thing to do, because devices can
come and go in the middle of your loop. It's better to be driven by
the add and remove callbacks.

Bryan> No, ipath_max is updated any time a probe routine is
Bryan> called.

Yes, that's true. (BTW, what does making ipath_max an atomic_t get
you? The updates are protected by a lock anyway). But I was talking
about the code in ipath_verbs_init(), which is the only place you call
ipath_verbs_register() that I could find. You make one pass through
the devices that are present when ipath_verbs_init() is called at
module load time, and any devices that get added later are missed.

Similarly, if a device is unplugged while the verbs module is loaded,
there's no notification from the core driver of that, and you'll go
ahead and do ipath_verbs_unregister() on a device that is long gone
when you get to ipath_verbs_cleanup().

- R.

2006-03-10 17:32:38

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

On Fri, 2006-03-10 at 09:08 -0800, Roland Dreier wrote:
> Bryan> OK. What's a safe way to iterate over the devices in the
> Bryan> presence of hotplug, then? I assume it's
> Bryan> list_for_each_mumble; I just don't know what mumble is :-)
>
> You need something that takes a reference to each device while you're
> looking at it, like for_each_pci_dev().

OK, thanks.

It seems like the thing to do to be fully safe might be to have
ipath_get() (just rename ipath_lookup) and ipath_put() functions. Embed
a kref inside ipath_devdata, and have ipath_dev_get increment the ref
count on both the ipath_devdata and the pci_dev.

Is that sane, or am I way off base?

> But in general iterating
> through devices is usually the wrong thing to do, because devices can
> come and go in the middle of your loop.

I understand that. In practice, though, I think this is not a good
approach in many of the cases we're dealing with. Every use of
ipath_max iterates over all devices for a reason.

For example, the diag open routine wants to find at least one device
that's up. We could maintain a separate list of devices that are in the
state that it needs, so that it can just get the first entry off that
list or fail if the list is empty, but that's more complex than simply
iterating over every device and checking each one.

> (BTW, what does making ipath_max an atomic_t get
> you? The updates are protected by a lock anyway).

Probably not much. The motivation was to ensure that if it got
incremented during an iteration, whoever was iterating would see the
update in a timely fashion.

> But I was talking
> about the code in ipath_verbs_init(), which is the only place you call
> ipath_verbs_register() that I could find. You make one pass through
> the devices that are present when ipath_verbs_init() is called at
> module load time, and any devices that get added later are missed.

Yes, that code ought to be restructured.

Thanks for the helpful feedback.

<b

2006-03-10 22:21:06

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

Bryan> Probably not much. The motivation was to ensure that if it
Bryan> got incremented during an iteration, whoever was iterating
Bryan> would see the update in a timely fashion.

But as far as I can see, you never do atomic_inc() -- only
atomic_set() under a spinlock.