2007-08-20 16:55:21

by Felix Marti

Subject: RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.



> -----Original Message-----
> From: Evgeniy Polyakov [mailto:[email protected]]
> Sent: Monday, August 20, 2007 2:43 AM
> To: Felix Marti
> Cc: David Miller; [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCP ports from the host TCP port space.
>
> On Sun, Aug 19, 2007 at 05:47:59PM -0700, Felix Marti
> ([email protected]) wrote:
> > [Felix Marti] David and Herbert, so you agree that the user<>kernel
> > space memory copy overhead is a significant overhead and we want to
> > enable zero-copy in both the receive and transmit path? - Yes, copy
>
> It depends. If you need to access that data after it is received, you
> will get a cache miss and performance will not be much better (if any)
> than with a copy.
Yes, the app will take the cache hits when accessing the data. However,
the fact remains that if there is a copy in the receive path, you
require an additional 3x memory BW (which is very significant at these
high rates and most likely the bottleneck for most current systems)...
and somebody always has to take the cache miss, be it copy_to_user or
the app.
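Felix's "additional 3x memory BW" figure can be sanity-checked with quick arithmetic (a sketch; the 10 Gb/s line rate is an assumed example, not a number from the thread): with one copy in the receive path, each payload byte crosses the memory bus three times, once for the NIC's DMA write into the kernel buffer and twice more for the copy's read and write.

```python
# Back-of-the-envelope check of the "additional 3x memory BW" claim for a
# copied receive path. The 10 Gb/s line rate is an assumed example.

def rx_memory_traffic(line_rate_gbps, copies=1):
    """Memory-bus traffic (GB/s) on the receive path.

    Every received byte is DMA-written once by the NIC; each software
    copy then reads and re-writes it (two more bus crossings).
    """
    wire_gbs = line_rate_gbps / 8       # wire rate in GB/s
    crossings = 1 + 2 * copies          # DMA write + (read + write) per copy
    return wire_gbs * crossings

print(rx_memory_traffic(10, copies=1))  # 3.75 GB/s: 3x the 1.25 GB/s wire rate
print(rx_memory_traffic(10, copies=0))  # 1.25 GB/s: zero-copy touches memory once
```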
>
> > avoidance is mainly an API issue and unfortunately the so widely
> > used (synchronous) sockets API doesn't make copy avoidance easy,
> > which is one area where protocol offload can help. Yes, some apps
> > can resort to sendfile() but there are many apps which seem to have
> > trouble switching to that API... and what about the receive path?
>
> There are a number of implementations, and all they are suitable for
> is recvfile(), since that is likely the only case which can work
> without cache.
>
> And actually the RDMA stack exists and no one said it should be thrown
> away _until_ it messes with the main stack. It has started to steal
> ports. What will happen when it gets all of the port space and no new
> legal network connection can be opened, although there is no way to
> show the user who got it? What will happen if a hardware RDMA
> connection gets terminated and software could not free the port? Will
> RDMA request to export connection reset functions out of the stack to
> drop network connections which are on the ports which are supposed to
> be used by new RDMA connections?
Yes, RDMA support is there... but we could make it better and easier to
use. We have a problem today with port sharing and there was a proposal
to address the issue by tighter integration (see the beginning of the
thread) but the proposal got shot down immediately... because it is RDMA
and not for technical reasons. I believe this email thread shows in
detail how RDMA (a network technology) is treated as a bastard child by
the network folks, well, at least by one of them.
>
> RDMA is not a problem, but how it influences the network stack is.
> Let's better think about how to work correctly with the network stack
> (since we already have that cr^Wdifferent hardware) instead of saying
> that others do bad work and not allowing the shiny new feature to
> exist.
By no means did I want to imply that others do bad work; are you
referring to me using TSO implementation issues as an example? If so,
let me clarify: I understand that the TSO implementation took some time
to get right. What I was referring to is that TSO(/LRO) have their own
issues, some alluded to by Roland and me. In fact, customers working on
the LSR couldn't use TSO due to the burstiness it introduces and had to
fall back to our fine-grained packet scheduling done in the offload
device. I am for variety; let us support new technologies that solve
real problems (lots of folks are buying this stuff for a reason) instead
of the 'ah, it's brain-dead and has no future' attitude... there is
precedent for offloading the host CPUs: have a look at graphics.
Graphics used to be done by the host CPU and now we have dedicated
graphics adapters that do a much better job... so, why is it so
far-fetched that offload devices can do a better job at a data-flow
problem?
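For the transmit side, the sendfile() API mentioned earlier in the thread can be exercised in a few lines. This is an illustrative sketch (Linux-specific; a socketpair stands in for a real TCP peer): the kernel moves file pages straight onto the socket, so the payload never passes through a user-space buffer. The receive side has no equivalent in the standard sockets API, which is the recvfile() point above.

```python
# Zero-copy transmit with sendfile(): the kernel feeds file pages to the
# socket directly, so the payload never crosses a user-space buffer.
# Illustrative sketch (Linux); a socketpair stands in for a TCP peer.
import os
import socket
import tempfile

def sendfile_all(out_fd, in_fd, count):
    """Push `count` bytes from in_fd to out_fd via zero-copy sendfile()."""
    sent = 0
    while sent < count:
        sent += os.sendfile(out_fd, in_fd, sent, count - sent)
    return sent

payload = b"zero-copy transmit\n" * 1000
with tempfile.NamedTemporaryFile() as f:
    f.write(payload)
    f.flush()
    left, right = socket.socketpair()
    in_fd = os.open(f.name, os.O_RDONLY)
    n = sendfile_all(left.fileno(), in_fd, len(payload))
    left.shutdown(socket.SHUT_WR)          # let the reader see EOF
    received = b"".join(iter(lambda: right.recv(65536), b""))
    os.close(in_fd)
    left.close()
    right.close()

print(n == len(payload) and received == payload)  # True
```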
>
> --
> Evgeniy Polyakov


2007-08-20 17:17:20

by Andi Kleen

Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

"Felix Marti" <[email protected]> writes:

> What I was referring to is that TSO(/LRO) have their own
> issues, some alluded to by Roland and me. In fact, customers working on
> the LSR couldn't use TSO due to the burstiness it introduces

That was in old kernels where TSO didn't honor the initial cwnd correctly,
right? I assume it's long fixed.

If not please clarify what the problem was.

> have a look at graphics.
> Graphics used to be done by the host CPU and now we have dedicated
> graphics adapters that do a much better job...

Is your offload device as programmable as a modern GPU?

> farfetched that offload devices can do a better job at a data-flow
> problem?

One big difference is that there is no potentially adverse and
always-varying Internet between the graphics card and your monitor.

-Andi

2007-08-20 21:29:17

by Patrick Geoffray

Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

Felix Marti wrote:
> Yes, the app will take the cache hits when accessing the data. However,
> the fact remains that if there is a copy in the receive path, you
> require an additional 3x memory BW (which is very significant at these
> high rates and most likely the bottleneck for most current systems)...
> and somebody always has to take the cache miss, be it copy_to_user or
> the app.

The cache miss is going to cost you half the memory bandwidth of a full
copy. If the data is already in cache, then the copy is cheaper.

However, removing the copy removes the kernel from the picture on the
receive side, so you lose demultiplexing, asynchronism, security,
accounting, flow-control, swapping, etc. If it's OK with you not to use
the kernel stack, then why expect to fit into the existing
infrastructure anyway?

> Yes, RDMA support is there... but we could make it better and easier to

What do you need from the kernel for RDMA support beyond HW drivers? A
fast way to pin and translate user memory (i.e., registration). That is
pretty much the sandbox that David referred to.

Eventually, it would be useful to be able to track the VM space to
implement a registration cache instead of using ugly hacks in user-space
to hijack malloc, but this is completely independent from the net stack.
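Patrick's registration-cache idea can be sketched in a few lines. This is a toy illustration, not a real verbs API: `pin`/`unpin` are hypothetical stand-ins for driver calls such as ibv_reg_mr()/ibv_dereg_mr(), and the hard part he alludes to (noticing when the VM layer frees or remaps the pages behind a cached entry) is exactly what the malloc-hijacking hacks approximate.

```python
# Toy registration cache: memoize pinned (addr, length) regions so that
# repeated transfers from the same buffer skip the expensive pin/translate
# step. `pin`/`unpin` are hypothetical stand-ins for driver calls such as
# ibv_reg_mr()/ibv_dereg_mr(); they are NOT a real kernel interface here.

class RegistrationCache:
    def __init__(self, pin, unpin):
        self._pin, self._unpin = pin, unpin
        self._cache = {}                   # (addr, length) -> handle

    def get(self, addr, length):
        key = (addr, length)
        if key not in self._cache:         # miss: pay the pin cost once
            self._cache[key] = self._pin(addr, length)
        return self._cache[key]

    def invalidate(self, addr, length):
        """Must be hooked into free()/munmap(), hence the malloc hijacking."""
        handle = self._cache.pop((addr, length), None)
        if handle is not None:
            self._unpin(handle)

# Demo with counting stubs instead of real driver calls.
pins = []
cache = RegistrationCache(pin=lambda a, l: pins.append((a, l)) or len(pins),
                          unpin=lambda h: None)
h1 = cache.get(0x1000, 4096)
h2 = cache.get(0x1000, 4096)               # hit: no second pin
print(h1 == h2, len(pins))                 # True 1
```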

> use. We have a problem today with port sharing and there was a proposal

The port spaces are either totally separate and there is no issue, or
completely identical and you should then run your connection manager in
user-space or fix your middlewares.

> and not for technical reasons. I believe this email thread shows in
> detail how RDMA (a network technology) is treated as a bastard child by
> the network folks, well, at least by one of them.

I don't think that's fair. This thread actually shows how pushy some
RDMA folks are about not acknowledging that the current infrastructure
is here for a reason, and about mistaking zero-copy for RDMA.

This is a similar argument to the TOE discussion, and it was
definitely a good decision not to mess up the Linux stack with TOEs.

Patrick