2004-03-15 22:57:42

by Woodruff, Robert J

[permalink] [raw]
Subject: RE: PATCH - InfiniBand Access Layer (IBAL)

On Sun, Mar 14, Greg KH wrote:

First, as my boss reminded me this morning, let me make sure I was
clear: there are not two different efforts now, only one, openib.org.

1) OpenIB represents a number of companies coming together with lots of
InfiniBand source code, including duplicate implementations of the
access layer and most of the ULPs.
2) The SourceForge work is already part of this.
3) The foundation of InfiniBand support will be the Access Layer, so it
needs the community's feedback first.
4) We are looking for feedback on both the access layer code in the
current openib snapshot and the access layer code that we submitted a
few weeks ago, to learn which is more acceptable to the community.

Now to answer a couple of specific questions.

>Hm, without open source drivers, the Intel stack doesn't seem very
>viable, correct?

Correct. That is why we hope that Mellanox will contribute their IBAL
driver as open source.

>> The comments you have given on IBAL would probably only take a few
>> weeks to change.
>Is that work already underway? Finished? If neither, why not?

Work is (or at least was) underway, but we put it aside last week to
review the rest of the code now on openib.org. We also need an open
source driver.

>What are the issues with the OpenIB stack?

As I stated above, we are part of the openib.org collaboration and
will be working on helping develop a stack that is "best of
breed" from all of the available code. Starting from the bottom up,
we first need to review the various proposals for the
Access Layer and determine which code base to start with.
The initial agreement was to use the Topspin code for the access layer.
This agreement was made before anyone got to see any code.
After reviewing this code, we think it needs a lot of work. We were
waiting for the openib.org email lists to open so we could send our
comments there. That way we could work out a lot of the details away
from lkml, since lots of discussion will be needed.

But since you asked, here are a few:

1.) The TSAPI interfaces look like Windows APIs (at least in the
original drop).
2.) Looking at the API specification document, it is missing lots of
verbs required by the InfiniBand Specification: CloseCA, ModifyAV,
QueryCQ, CreateEEC, ModifyEEC, QueryEEC, DestroyEEC, QueryMR,
ReregisterMR, ReregisterPhysMR, RegisterSharedMR, AllocMW, QueryMW,
BindMW, and FreeMW.
3.) The code is not compliant with the InfiniBand specification and has
proprietary implementations of things like "path records", so it will
only work with the Topspin subnet manager, which requires you to buy a
Topspin switch.
4.) Not sure if they have fixed this yet in the 2.6 code, but the 2.4
code has about 18 different loadable modules. This could probably be
collapsed into 5: one for the HCA driver, one for the access layer, one
for the IPoIB driver, one for the SRP driver and one for the SDP driver.
5.) There is no user-mode access layer, so ULPs must code to the HCA
user-mode driver APIs directly. This means that new user-mode ULPs will
need to be developed for each new HCA that comes along.
6.) The VAPI code has extra proprietary verbs that are not specified by
the InfiniBand Specification.
7.) The implementation is deficient in its support for InfiniBand
management services, like the required RMPP protocol, MAD services, and
SA query helper functions.
8.) Some of the message fields of the CM are hard-coded.
9.) The CM does not support reliable datagrams.
10.) There is no built-in support for plug and play events: port
up/down, LID change, SM change.
11.) The VAPI call stack is deep and puts a lot of big data structures
on the stack.

There is more, but as I stated before, we suggest discussing most of
these issues within openib.org first, trying to come to agreement on
what is best, and then reviewing our suggestions with lkml to make sure
we are on track.

>If there are any, how does the Intel stack solve those
>issues?

The SourceForge IBAL code (not developed just by Intel; it has
contributions from several companies, including InfiniCon, Mellanox,
Fujitsu and Intel) is feature complete and compliant with the InfiniBand
specification. It may not be quite as hardened as the Topspin stack, but
that gap is rapidly closing. We'd also like to know from the other
openib.org people: what are the issues with the SourceForge IBAL?

We know about the issues raised on lkml and think they can be fixed.

The biggest problem I see is that we do not have an open source HCA
driver. That could be fixed too, if Mellanox wanted to, or someone could
take the VAPI code they open sourced and port it to IBAL.

>Could the Intel solutions be merged
>into the OpenIB stack to solve these issues?

Given there are so many issues with TSAPI, would it be easier to fix
the issues lkml raised with IBAL and port the "best of breed" ULPs to
it? Since all the TSAPI interfaces will have to change anyway to
de-Windows-ize them, all the ULPs will need to be re-ported anyway.


2004-03-15 23:17:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

On Mon, Mar 15, 2004 at 02:52:44PM -0800, Woodruff, Robert J wrote:
> As I stated above, we are part of the openib.org collaboration and
> will be working on helping develop a stack that is "best of
> breed" from all of the available code. Starting from the bottom up,
> we first need to review the various proposals for the
> Access Layer and determine which code base to start with.

From looking at both codebases, starting from scratch sounds like the
best idea to me..

2004-03-15 23:44:27

by Johannes Erdfelt

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

On Mon, Mar 15, 2004, Christoph Hellwig <[email protected]> wrote:
> On Mon, Mar 15, 2004 at 02:52:44PM -0800, Woodruff, Robert J wrote:
> > As I stated above, we are part of the openib.org collaboration and
> > will be working on helping develop a stack that is "best of
> > breed" from all of the available code. Starting from the bottom up,
> > we first need to review the various proposals for the
> > Access Layer and determine which code base to start with.
>
> From looking at both codebases starting from scratch sounds like the
> best idea to me..

What's fatally wrong with the code that's currently available via
openib.org?

JE

2004-03-15 23:48:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

On Mon, Mar 15, 2004 at 03:44:25PM -0800, Johannes Erdfelt wrote:
> > From looking at both codebases starting from scratch sounds like the
> > best idea to me..
>
> What's fatally wrong with the code that's currently available via
> openib.org?

Did you actually read it?

p.s. if you reply to my mails please keep me in the To line. Really,
please don't do any fancy reply-to-group tricks unless people explicitly
request it in the Mail-Followup-To header.

2004-03-15 23:54:24

by Johannes Erdfelt

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

On Mon, Mar 15, 2004, Christoph Hellwig <[email protected]> wrote:
> On Mon, Mar 15, 2004 at 03:44:25PM -0800, Johannes Erdfelt wrote:
> > > From looking at both codebases starting from scratch sounds like the
> > > best idea to me..
> >
> > What's fatally wrong with the code that's currently available via
> > openib.org?
>
> Did you actually read it?

The code on openib.org? Yes, I wrote some of it.

I would be the first to say that there are portions that need to be
rewritten, but I definitely do not think all or even most of it does.

That's why I was asking what specifically you found fatally wrong with
it. I haven't seen many critiques, so I can only assume it's the same
things I see wrong with it.

> p.s. if you reply to my mails please keep me in the To line. Really,
> please don't do any fancy reply-to-group tricks unless people explicitly
> request it in the Mail-Followup-To header.

If you really want duplicates of all the replies, sure, I'll make an
exception for you.

I don't see why a smarter client, or mail filter, couldn't do the same
thing without depending on the behaviour of the sender.

JE

2004-03-16 00:45:32

by Johannes Erdfelt

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

On Tue, Mar 16, 2004, Christoph Hellwig <[email protected]> wrote:
> On Mon, Mar 15, 2004 at 03:54:14PM -0800, Johannes Erdfelt wrote:
> > > Did you actually read it?
> >
> > The code on openib.org? Yes, I wrote some of it.
> >
> > I would be the first to say that there are portions that need to be
> > rewritten, but I definitely do not think all or even most of it does.
> >
> > That's why I was asking what specifically you found fatally wrong with
> > it. I haven't seen many critiques, so I can only assume it's the same
> > things I see wrong with it.
>
> Start with the things Robert already mentioned. And on top of that:
>
> - the horrible Windows/Linux compat code. We all know this kind
> of compat code is messy. But the way it's done in this code is just
> incredibly idiotic.
> - totally braindead use of macro abstraction
> - those split into far too many files
> - wrong use of the DMA mapping abstraction
> - braindead malloc code
> - wrong modversions handling duplicated in every file
>
> I'm really surprised you're admitting to having touched that code.
> I'd have guessed everyone who did would hide in his house ashamed.

You do realize that the code on openib.org is from multiple vendors,
right? I only touched one part of that code. That's why I said 'some'.

Only some of the code has the problems you listed, and some of those are
far from fatal.

How about I ask you what parts of the code do you feel don't need a
complete rewrite?

> > > p.s. if you reply to my mails please keep me in the To line. Really,
> > please don't do any fancy reply-to-group tricks unless people explicitly
> > request it in the Mail-Followup-To header.
> >
> > If you really want duplicates of all the replies, sure, I'll make an
> > exception for you.
> >
> > I don't see why a smarter client, or mail filter, couldn't do the same
> > thing without depending on the behaviour of the sender.
>
> Replying to people personally is good taste. You might know I'm on
> lkml, but there are many other lists I'm not on, and the same is true
> for other people on lkml. A filter can easily filter out duplicates,
> but it can't magically create copies of mails not addressed to you.

Sure it can. I replied to your email. Check the mail headers,
specifically the ones labeled References and In-Reply-To.

> In addition, I tend to read my inbox fairly quickly all the time and
> the lkml mailbox only when I'm at least a little idle.

I didn't need an immediate answer.

JE
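Johannes' point about the References and In-Reply-To headers can be made concrete. The following is a minimal Python sketch, not the behaviour of any particular mail client: it assumes the filter already knows the Message-IDs of mails you sent (the hypothetical `mine` set), flags list traffic that cites one of them, and drops a second copy that carries an already-seen Message-ID. The helper names `referenced_ids`, `is_reply_to_me` and `Deduplicator` are illustrative only; just the standard-library `email` package is used.

```python
"""Sketch of the mail-filter idea discussed above (hypothetical helpers)."""
from email import message_from_string
from email.message import Message


def referenced_ids(msg: Message) -> set[str]:
    """Return the Message-IDs named in the In-Reply-To and References headers."""
    ids: set[str] = set()
    for header in ("In-Reply-To", "References"):
        # str() guards against non-str header objects; a missing header gives "".
        value = str(msg.get(header, ""))
        # Message-IDs are angle-bracketed tokens separated by whitespace.
        for token in value.split():
            if token.startswith("<") and token.endswith(">"):
                ids.add(token)
    return ids


def is_reply_to_me(msg: Message, my_ids: set[str]) -> bool:
    """True if the message cites any Message-ID that I sent."""
    return not referenced_ids(msg).isdisjoint(my_ids)


class Deduplicator:
    """Reports a second copy of a message whose Message-ID was already seen."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def is_duplicate(self, msg: Message) -> bool:
        mid = msg.get("Message-ID")
        if mid is None:
            return False  # without an ID there is nothing to compare
        mid = str(mid).strip()
        if mid in self.seen:
            return True
        self.seen.add(mid)
        return False


if __name__ == "__main__":
    raw = (
        "Message-ID: <reply-1@example.org>\n"
        "In-Reply-To: <mine-1@example.org>\n"
        "References: <root@example.org> <mine-1@example.org>\n"
        "Subject: Re: PATCH\n"
        "\n"
        "body\n"
    )
    mine = {"<mine-1@example.org>"}
    dedup = Deduplicator()
    for copy in (raw, raw):  # e.g. one copy via lkml, one sent directly
        msg = message_from_string(copy)
        print(is_reply_to_me(msg, mine), dedup.is_duplicate(msg))
```

This only works when the replier's client fills in those headers; RFC 2822 recommends them for replies but does not require them, so a filter like this can miss replies from clients that omit them.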

2004-03-16 00:42:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

On Mon, Mar 15, 2004 at 03:54:14PM -0800, Johannes Erdfelt wrote:
> > Did you actually read it?
>
> The code on openib.org? Yes, I wrote some of it.
>
> I would be the first to say that there are portions that need to be
> rewritten, but I definitely do not think all or even most of it does.
>
> That's why I was asking what specifically you found fatally wrong with
> it. I haven't seen many critiques, so I can only assume it's the same
> things I see wrong with it.

Start with the things Robert already mentioned. And on top of that:

- the horrible Windows/Linux compat code. We all know this kind
of compat code is messy. But the way it's done in this code is just
incredibly idiotic.
- totally braindead use of macro abstraction
- those split into far too many files
- wrong use of the DMA mapping abstraction
- braindead malloc code
- wrong modversions handling duplicated in every file

I'm really surprised you're admitting to having touched that code.
I'd have guessed everyone who did would hide in his house ashamed.

> > p.s. if you reply to my mails please keep me in the To line. Really,
> > please don't do any fancy reply-to-group tricks unless people explicitly
> > request it in the Mail-Followup-To header.
>
> If you really want duplicates of all the replies, sure, I'll make an
> exception for you.
>
> I don't see why a smarter client, or mail filter, couldn't do the same
> thing without depending on the behaviour of the sender.

Replying to people personally is good taste. You might know I'm on
lkml, but there are many other lists I'm not on, and the same is true
for other people on lkml. A filter can easily filter out duplicates,
but it can't magically create copies of mails not addressed to you.

In addition, I tend to read my inbox fairly quickly all the time and
the lkml mailbox only when I'm at least a little idle.

2004-03-16 01:45:39

by Roland Dreier

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

I understand that you've only had a short time to review the code, and
many of your comments are points well taken. I think most of the
technical comments can wait to be debated on the openib.org mailing
lists (which I am assured are coming soon). However, since this is
being preserved for posterity in the linux-kernel archive, I wanted to
correct a few inaccuracies.

Robert> 3.) The code is not compliant with the InfiniBand
Robert> specification and has proprietary implementations of
Robert> things like "path records" so it will only work with the
Robert> TopSpin subnet manager that requires you to buy a topspin
Robert> switch.

This is definitely not our intent, and we fix whatever IB compliance
bugs we find as soon as we know about them. In addition, I know that
people have been able to use the Topspin/Mellanox code from OpenIB
with OpenSM and no Topspin switch (this did require some compliance
fixes to both OpenSM and the Topspin code).

Robert> 6.) The VAPI code has extra proprietary verbs that are not
Robert> specified by the InfiniBand Specification.

True, but I'm not sure why this is a deficiency. We found certain
things that were required for good performance were not specified by
the IBTA, and we added them. The Linux way is definitely not to
follow a spec when the spec is wrong.

Robert> 9.) The CM does not support reliable datagrams.

Fair enough, but I'm sure we can easily add RD support before any
hardware supporting RD is ready. We took a pragmatic approach and
didn't implement "speculative" features.

Robert> 10) There is no built in support for plug and play events,
Robert> port up/down, LID change, SM change

I'm not sure what plug and play events are (certainly they're not part
of the IB spec). However, we did add extended IB asynchronous events
for LID/SM changes and P_Key changes (port up and down are already
part of the IB spec).

As I said above, the rest of your points are well taken (although of
course there are two sides to every story) and we can talk about them
when we get the openib.org mailing lists up.

- Roland

2004-03-19 18:48:26

by Ulrich Drepper

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

I think I should say something about another nuance of this problem.


Parties interested in Infiniband have been working under the OpenGroup
umbrella for quite some time now on API extensions to better accommodate
interconnect fabrics. They've even presented at an Austin Group meeting
in 2001 (I think) to get on the road map for inclusion in POSIX.

But when I wanted to take a look at the specs, this was categorically
rejected. My contacts were explicitly forbidden to give the drafts to
anybody but the elite circle. Mind you, Red Hat is a member of the OpenGroup.

So, these people come up with their own software stacks, unreviewed
interface extensions, and demand that everybody accepts what they were
"designing" without the ability to question anything.

I surely find this completely unacceptable, and any consideration of
accepting anything the Infiniband group comes up with should be
postponed until every bit of the design can be reviewed. If bits and
pieces are accepted prematurely it'll just be "now that this is supported
you have to add this too, otherwise it'll not be useful".

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2004-03-19 19:25:55

by Fab Tillier

[permalink] [raw]
Subject: RE: PATCH - InfiniBand Access Layer (IBAL)

> -----Original Message-----
> From: Ulrich Drepper [mailto:[email protected]]
> Sent: Friday, March 19, 2004 10:47 AM
>
> So, these people come up with their own software stacks, unreviewed
> interface extensions, and demand that everybody accepts what they were
> "designing" without the ability to question anything.

Yes, a design review with a period to provide feedback at the design level,
not the code level, would make sense. I don't see how one could argue
against that.

>
> I surely find this completely unacceptable and any consideration of
> accepting anything the Infiniband group comes up with should be
> postponed until every bit of the design can be reviewed. If bits and
> pieces are accepted prematurely it'll just be "now that this is supported
> you have to add this too, otherwise it'll not be useful".

For the IBAL stack, there are numerous documents on the Linux InfiniBand
Project (http://infiniband.sourceforge.net/) describing most everything from
the overall architecture to the APIs. On the project home page is a general
overview of what InfiniBand is, and how it fits into the OS. More detailed
documentation is available there too. Of particular interest to this thread
would be the Access Layer documents. Below are links to documents of
interest.

- The overall software architecture spec is the "Linux SAS", available at
http://infiniband.sourceforge.net/LinuxSAS.1.0.1.pdf.
- A presentation describing the IBAL APIs is here:
http://infiniband.sourceforge.net/IAL/Access/AlInterface.pdf
- The IBAL high level design is here:
http://infiniband.sourceforge.net/IAL/Access/IBA_AL_HLD.pdf
- A user's guide to IBAL is here:
http://infiniband.sourceforge.net/IAL/Access/AL_Users_Guide.pdf
- And finally, the API documentation is here:
http://infiniband.sourceforge.net/IAL/Access/IBAL/IBAL_mi.html

HTH,

- Fab

2004-03-19 20:21:00

by Ulrich Drepper

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)


> For the IBAL stack, there are numerous documents on the Linux InfiniBand
> Project (http://infiniband.sourceforge.net/) describing most everything from
> the overall architecture to the APIs.

That's not the same, and it incidentally shows a problem in what you do.

First, the working group I refer to is this:

http://www.opengroup.org/icsc/

Dave Edmondson from Sun gave a presentation at the Austin Group meeting
(in 2002, not 2001 as I said before), which you can see here:

http://www.opengroup.org/austin/docs/austin_105.pdf


This is the information I cannot get to. Now why would I want that
instead of what you do?

First of all, these are extensions to the existing interfaces. These
extensions not only influence existing code, but they can also be
useful for non-interconnect usage, at least if we are able to adopt
them where necessary. Many of the interconnect problems are also present
in Gig and 10Gig Ethernet. The ICSC refused to give anyone outside
their group access to the document despite the fact that they want to
have the extensions added to POSIX. If we develop separate extensions,
they will collide and/or cause unnecessary duplication of code and effort.


Now, why is ICSC relevant? In my opinion (and I'm not really that
knowledgeable about all these networking issues, but I know a thing or
two about APIs), going down the road with Infiniband-specific APIs is
bad. Why would we want to have separate APIs for other interconnect
fabrics? The ICSC, according to my understanding, tries to unify the
APIs so that they are usable not only by Infiniband. This is IMO highly
desirable.

You can get a glimpse of what they are doing by looking at the documents
referenced in

http://www.opengroup.org/icsc/

But that's all there is. The socket extension working group

http://www.opengroup.org/icsc/sockets/

only has some meeting minutes available.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2004-03-19 20:48:15

by Roland Dreier

[permalink] [raw]
Subject: Re: PATCH - InfiniBand Access Layer (IBAL)

Ulrich> I think I should say something about another nuance
Ulrich> of this problem. Parties interested in Infiniband have
Ulrich> been working under the OpenGroup umbrella for quite some
Ulrich> time now on API extensions to better accommodate
Ulrich> interconnect fabrics. They've even presented at an Austin
Ulrich> Group meeting in 2001 (I think) to get on the road map for
Ulrich> being included in POSIX.

Ulrich> But when I wanted to take a look at the specs this was
Ulrich> categorically rejected. My contacts were explicitly
Ulrich> forbidden to give the drafts to anybody but the elite
Ulrich> circle. Mind you, Red Hat is a member of the OpenGroup.

Ulrich> So, these people come up with their own software stacks,
Ulrich> unreviewed interface extensions, and demand that everybody
Ulrich> accepts what they were "designing" without the ability to
Ulrich> question anything.

Ulrich> I surely find this completely unacceptable and any
Ulrich> consideration of accepting anything the Infiniband group
Ulrich> comes up with should be postponed until every bit of the
Ulrich> design can be reviewed. If bits and pieces are accepted
Ulrich> prematurely it'll just be "now that this is supported you
Ulrich> have to add this too, otherwise it'll not be useful".

I believe what you are referring to is the OpenGroup's "ITAPI" work.
This is in a sense a competing spec to the DAT Collaborative
(http://www.datcollaborative.org) work. The DAT Collaborative seems to be
far more open, and their spec is available without any hurdles.

Any demands from groups designing unacceptable specs should be treated
with the appropriate level of cooperation.

Note that neither the OpenGroup nor the DAT Collaborative are
affiliated with the InfiniBand Trade Association or the OpenIB group,
although they may have members in common.

Best,
Roland