2010-01-25 23:50:33

by Alex Chiang

Subject: infiniband limit of 32 cards per system?

Hello,

I'm pretty unfamiliar with InfiniBand, so apologies for the
stupid question.

Is there a limit on how many IB devices a system might support?

My colleague points out the following enum in uverbs_main.c:

enum {
	IB_UVERBS_MAJOR = 231,
	IB_UVERBS_BASE_MINOR = 192,
	IB_UVERBS_MAX_DEVICES = 32
};

Experimentally, we've determined that on a system where we
plugged in 40 IB cards, OFED only reports 32 cards are present.

If that enum is indeed the limiting factor, would someone mind
explaining (or pointing me at TFM ;) why it's limited to 32
devices?

Thanks,
/ac


2010-01-26 03:48:37

by Roland Dreier

Subject: Re: infiniband limit of 32 cards per system?


> My colleague points out the following enum in uverbs_main.c:
>
> enum {
> 	IB_UVERBS_MAJOR = 231,
> 	IB_UVERBS_BASE_MINOR = 192,
> 	IB_UVERBS_MAX_DEVICES = 32
> };
>
> Experimentally, we've determined that on a system where we
> plugged in 40 IB cards, OFED only reports 32 cards are present.

Wow, 40 HCAs in one system!

> If that enum is indeed the limiting factor, would someone mind
> explaining (or pointing me at TFM ;) why it's limited to 32
> devices?

That dates back to when device #s had 8 bits for major and 8 bits for
minor. We got one major assigned for IB, and had to split up the 256
minors that gave us among userspace verbs, management access, etc. And
32 seemed like a pretty reasonable limit for most uses.
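
To make the 8-bit/8-bit split concrete, here is a minimal userspace
sketch of how such a 16-bit device number packs together. The names
old_mkdev(), old_major() and old_minor() are invented for this example
and are not the kernel's own macros; the only point is that one major
yields exactly 256 minors to share.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this sketch: 8 bits of minor per major. */
enum {
	OLD_MINOR_BITS = 8,
	OLD_MINORS_PER_MAJOR = 1 << OLD_MINOR_BITS
};

/* Pack an 8-bit major and an 8-bit minor into a 16-bit device number. */
uint16_t old_mkdev(unsigned major, unsigned minor)
{
	assert(major < 256 && minor < OLD_MINORS_PER_MAJOR);
	return (uint16_t)((major << OLD_MINOR_BITS) | minor);
}

/* Recover the major: the upper 8 bits. */
unsigned old_major(uint16_t dev)
{
	return dev >> OLD_MINOR_BITS;
}

/* Recover the minor: the lower 8 bits. */
unsigned old_minor(uint16_t dev)
{
	return dev & (OLD_MINORS_PER_MAJOR - 1);
}
```

With this encoding, old_mkdev(231, 192) is the first uverbs device, and
no minor above 255 can be expressed under major 231 at all.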

Nowadays I guess we should look into expanding that to dynamic device
numbers on overflow, assuming you do have a realistic situation where
someone would want to use that many adapters per system.

- R.

2010-01-26 03:59:12

by Alex Chiang

Subject: Re: infiniband limit of 32 cards per system?

* Roland Dreier <[email protected]>:
>
> > My colleague points out the following enum in uverbs_main.c:
> >
> > enum {
> > 	IB_UVERBS_MAJOR = 231,
> > 	IB_UVERBS_BASE_MINOR = 192,
> > 	IB_UVERBS_MAX_DEVICES = 32
> > };
> >
> > Experimentally, we've determined that on a system where we
> > plugged in 40 IB cards, OFED only reports 32 cards are present.
>
> Wow, 40 HCAs in one system!

HP sell some pretty big systems. :)

> > If that enum is indeed the limiting factor, would someone mind
> > explaining (or pointing me at TFM ;) why it's limited to 32
> > devices?
>
> That dates back to when device #s had 8 bits for major and 8 bits for
> minor. We got one major assigned for IB, and had to split up the 256
> minors that gave us among userspace verbs, management access, etc. And
> 32 seemed like a pretty reasonable limit for most uses.

Thanks for the explanation.

> Nowadays I guess we should look into expanding that to dynamic device
> numbers on overflow, assuming you do have a realistic situation where
> someone would want to use that many adapters per system.

Think of a large scale-up ia64 box, possibly running some
virtualization stack.

I'm guessing that it's not just a simple kernel fix though since
OFED has to change too, right?

/ac

2010-01-26 04:04:07

by Roland Dreier

Subject: Re: infiniband limit of 32 cards per system?


> I'm guessing that it's not just a simple kernel fix though since
> OFED has to change too, right?

Dunno about OFED. Nothing sane is hard-coding major/minor numbers
though -- so I think OFED should be OK, assuming there are no crazy
scripts that bypass udev creating device nodes etc.

I don't think that it's _totally_ trivial in the kernel -- we do need to
add some code in several places to allocate dynamic device numbers when
we run out of the static allocation (probably best to keep the legacy
device numbers for "small" < 32 adapter systems, since there may be
really small systems with static hard-coded /dev etc).

- R.

2010-01-26 21:38:27

by Alex Chiang

Subject: Re: infiniband limit of 32 cards per system?

* Roland Dreier <[email protected]>:
>
> > I'm guessing that it's not just a simple kernel fix though since
> > OFED has to change too, right?
>
> Dunno about OFED. Nothing sane is hard-coding major/minor numbers
> though -- so I think OFED should be OK, assuming there are no crazy
> scripts that bypass udev creating device nodes etc.

Ok.

> I don't think that it's _totally_ trivial in the kernel -- we
> do need to add some code in several places to allocate dynamic
> device numbers when we run out of the static allocation
> (probably best to keep the legacy device numbers for "small" <
> 32 adapter systems, since there may be really small systems
> with static hard-coded /dev etc).

I take it this concern is what prevents us from simply increasing
IB_UVERBS_MAX_DEVICES to 64 or something?

thanks,
/ac

2010-01-26 22:03:37

by Alex Chiang

Subject: Re: infiniband limit of 32 cards per system?

* Roland Dreier <[email protected]>:
>
> > If that enum is indeed the limiting factor, would someone
> > mind explaining (or pointing me at TFM ;) why it's limited
> > to 32 devices?
>
> That dates back to when device #s had 8 bits for major and 8
> bits for minor. We got one major assigned for IB, and had to
> split up the 256 minors that gave us among userspace verbs,
> management access, etc. And 32 seemed like a pretty reasonable
> limit for most uses.

Hm...

IB_UMAD_MAX_PORTS = 64,
IB_UMAD_MAJOR = 231,
IB_UMAD_MINOR_BASE = 0
register_chrdev_region(base_dev, IB_UMAD_MAX_PORTS * 2, "infiniband_mad");

IB_UVERBS_MAJOR = 231,
IB_UVERBS_BASE_MINOR = 192,
IB_UVERBS_MAX_DEVICES = 32
register_chrdev_region(IB_UVERBS_BASE_DEV, IB_UVERBS_MAX_DEVICES, "infiniband_verbs");

IB_UCM_MAJOR = 231,
IB_UCM_BASE_MINOR = 224,
IB_UCM_MAX_DEVICES = 32
register_chrdev_region(IB_UCM_BASE_DEV, IB_UCM_MAX_DEVICES, "infiniband_cm");

It looks like we have a hole from [128, 192).
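
As a sanity check on that arithmetic, here is a minimal userspace
sketch that tabulates the three regions and scans the 256 legacy minors
for unclaimed ones. The bases and counts come from the excerpts above;
the struct, the table and the function names are invented for
illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range of minors [base, base + count) under major 231. */
struct minor_range {
	const char *name;
	unsigned base;
	unsigned count;
};

/* Values copied from the enums and register_chrdev_region() calls
 * quoted above; the table itself is hypothetical. */
const struct minor_range ib_ranges[] = {
	{ "infiniband_mad",     0, 64 * 2 },	/* IB_UMAD_MAX_PORTS * 2 */
	{ "infiniband_verbs", 192, 32 },
	{ "infiniband_cm",    224, 32 },
};

enum {
	IB_NUM_RANGES = sizeof(ib_ranges) / sizeof(ib_ranges[0]),
	LEGACY_MINORS = 256
};

/* Return 1 if some region claims this minor, 0 otherwise. */
int minor_claimed(unsigned minor)
{
	size_t i;

	assert(minor < LEGACY_MINORS);
	for (i = 0; i < IB_NUM_RANGES; i++)
		if (minor >= ib_ranges[i].base &&
		    minor < ib_ranges[i].base + ib_ranges[i].count)
			return 1;
	return 0;
}

/* Count unclaimed minors in [0, 256) and report the lowest and the
 * highest one. Leaves the outputs untouched if nothing is free. */
unsigned free_minors(unsigned *lowest, unsigned *highest)
{
	unsigned m, n = 0;

	for (m = 0; m < LEGACY_MINORS; m++) {
		if (minor_claimed(m))
			continue;
		if (n == 0)
			*lowest = m;
		*highest = m;
		n++;
	}
	return n;
}
```

free_minors() finds 64 free minors, the lowest being 128 and the
highest 191. Since 191 - 128 + 1 is also 64, the free minors form a
single contiguous run.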

Would it be something as simple as this?

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uve
index 5f284ff..b9aa2b8 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -57,8 +57,8 @@ MODULE_LICENSE("Dual BSD/GPL");

 enum {
 	IB_UVERBS_MAJOR = 231,
-	IB_UVERBS_BASE_MINOR = 192,
-	IB_UVERBS_MAX_DEVICES = 32
+	IB_UVERBS_BASE_MINOR = 128,
+	IB_UVERBS_MAX_DEVICES = 64
 };
 
 #define IB_UVERBS_BASE_DEV MKDEV(IB_UVERBS_MAJOR, IB_UVERBS_BASE_MINOR)

2010-01-26 23:54:57

by Roland Dreier

Subject: Re: infiniband limit of 32 cards per system?


> I take it this concern is what prevents us from simply increasing
> IB_UVERBS_MAX_DEVICES to 64 or something?

Yes, I think then the device numbers will overlap.
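
The overlap is easy to check by hand: with the base left at 192, 64
uverbs minors would cover [192, 256), which contains the [224, 256)
range that infiniband_cm registers. A minimal sketch of that test,
where struct minor_span and spans_overlap() are hypothetical names used
only for this example:

```c
#include <assert.h>

/* Half-open range of minors [base, base + count). */
struct minor_span {
	unsigned base;
	unsigned count;
};

/* Two non-empty half-open ranges overlap exactly when each one
 * starts before the other one ends. */
int spans_overlap(struct minor_span a, struct minor_span b)
{
	assert(a.count > 0 && b.count > 0);
	return a.base < b.base + b.count && b.base < a.base + a.count;
}
```

Called with {192, 32} and {224, 32}, spans_overlap() returns 0, which
is today's layout. Called with {192, 64} and {224, 32}, it returns 1.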

- R.

2010-01-27 00:09:27

by Roland Dreier

Subject: Re: infiniband limit of 32 cards per system?


> It looks like we have a hole from [128, 192).
>
> Would it be something as simple as this?

> - IB_UVERBS_BASE_MINOR = 192,
> - IB_UVERBS_MAX_DEVICES = 32
> + IB_UVERBS_BASE_MINOR = 128,
> + IB_UVERBS_MAX_DEVICES = 64

I don't think this is a good idea for two reasons:

- It doesn't take into account the fact that the infiniband_mad and
infiniband_cm drivers will take up more minors if more devices appear
(in the best case, you would only be able to run opensm on the first
32 HCAs or something like that).

- It changes the minor of the first uverbs device, so something like a
system with hardcoded static /dev would break in a mysterious way.

I think unfortunately we have to extend the device # assignment so the
first 32 HCAs get the same minors they would have and then overflow into
some dynamic region.
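
Roughly, the numbering policy could look like the userspace sketch
below. This is only an illustration of the idea, not a patch: struct
devnum, uverbs_devnum() and the dynamic_major parameter are made up
here. In the kernel the overflow region would presumably come from
something like alloc_chrdev_region(), and numbering overflow minors
from 0 is an assumption of this sketch.

```c
#include <assert.h>

enum {
	IB_UVERBS_MAJOR = 231,
	IB_UVERBS_BASE_MINOR = 192,
	IB_UVERBS_MAX_DEVICES = 32
};

/* Hypothetical (major, minor) pair for this sketch. */
struct devnum {
	unsigned major;
	unsigned minor;
};

/*
 * Map the Nth uverbs device (counting from 0) to a device number.
 * The first 32 devices keep their legacy static minors; later ones
 * overflow into a separate region under dynamic_major, which stands
 * in for a major handed out at runtime.
 */
struct devnum uverbs_devnum(unsigned index, unsigned dynamic_major)
{
	struct devnum d;

	/* The overflow region must not reuse the static major. */
	assert(dynamic_major != IB_UVERBS_MAJOR);

	if (index < IB_UVERBS_MAX_DEVICES) {
		d.major = IB_UVERBS_MAJOR;
		d.minor = IB_UVERBS_BASE_MINOR + index;
	} else {
		d.major = dynamic_major;
		d.minor = index - IB_UVERBS_MAX_DEVICES;
	}
	return d;
}
```

With this mapping, device 0 stays at 231:192 and device 31 at 231:223,
so a static /dev keeps working, while device 32 onward lands at minor 0
and up of whatever major was handed out dynamically.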

- R.