2007-08-19 19:50:47

by Felix Marti

Subject: RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.



> -----Original Message-----
> From: David Miller [mailto:[email protected]]
> Sent: Sunday, August 19, 2007 12:32 PM
> To: Felix Marti
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCP ports from the host TCP port space.
>
> From: "Felix Marti" <[email protected]>
> Date: Sun, 19 Aug 2007 10:33:31 -0700
>
> > I know that you don't agree that TSO has drawbacks, as outlined by
> > Roland, but its history shows something else: the addition of TSO
> > took a fair amount of time and network performance was erratic for
> > multiple kernel revisions and the TSO code is sprinkled across the
> > network stack.
>
> This thing you call "sprinkled" is a necessity of any hardware
> offload when it is possible for a packet to later get "steered"
> to a device which cannot perform the offload.
>
> Therefore we need a software implementation of TSO so that those
> packets can still get output to the non-TSO-capable device.
>
> We do the same thing for checksum offloading.
>
> And for free we can use the software offloading mechanism to
> get batching to arbitrary network devices, even those which cannot
> do TSO.
>
> What benefits does RDMA infrastructure give to non-RDMA capable
> devices? None? I see, that's great.
>
> And again the TSO bugs and issues are being overstated and, also for
> the second time, these issues are more indicative of my bad
> programming skills than they are of intrinsic issues of TSO. The
> TSO implementation was looking for a good design, and it took me
> a while to find it because I personally suck.
>
> Face it, stateless offloads are always going to be better in the long
> term. And this is proven.
>
> You RDMA folks really do live in some kind of fantasy land.
[Felix Marti] You're not at all addressing the fact that RDMA does solve
the memory BW problem and stateless offload doesn't. Apart from that, I
don't quite understand your argument with respect to the benefits of the
RDMA infrastructure; what benefits does the TSO infrastructure give the
non-TSO capable devices? Isn't the answer none, and yet you added TSO
support?! I don't think that the argument is stateless _versus_ stateful
offload; both have their advantages and disadvantages. Stateless offload
does help, e.g. TSO/LRO do improve performance in back-to-back
benchmarks. It seems to me that _you_ claim that there is no benefit to
stateful offload, and that is where we disagree; there is benefit,
and the much lower memory BW requirement is just one example, yet
an important one. We'll probably never agree, but it seems to me that
we're asking only for small changes to the software stack, and then we
can give the choice to the end users: they can opt for stateless offload
if it fits their performance needs or for stateful offload if their apps
require the extra boost in performance.


2007-08-19 22:33:52

by Andi Kleen

Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

"Felix Marti" <[email protected]> writes:

> what benefits does the TSO infrastructure give the
> non-TSO capable devices?

It improves performance on software queueing devices between guests
and hypervisors. This is a more and more important application these
days. Even when the system running the Hypervisor has a non TSO
capable device in the end it'll still save CPU cycles this way. Right now
virtualized IO tends to be much more CPU intensive than direct IO so any
help it can get is beneficial.

It also makes loopback faster, although, granted, that's probably not
that useful.

And a lot of the "TSO infrastructure" was needed for zero copy TX anyways,
which benefits most reasonable modern NICs (anything with hardware
checksumming)

-Andi

2007-08-19 23:08:38

by David Miller

Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

From: "Felix Marti" <[email protected]>
Date: Sun, 19 Aug 2007 12:49:05 -0700

> You're not at all addressing the fact that RDMA does solve the
> memory BW problem and stateless offload doesn't.

It does, I just didn't retort to your claims because they were
so blatantly wrong.

2007-08-19 23:12:21

by David Miller

Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

From: Andi Kleen <[email protected]>
Date: 20 Aug 2007 01:27:35 +0200

> "Felix Marti" <[email protected]> writes:
>
> > what benefits does the TSO infrastructure give the
> > non-TSO capable devices?
>
> It improves performance on software queueing devices between guests
> and hypervisors. This is a more and more important application these
> days. Even when the system running the Hypervisor has a non TSO
> capable device in the end it'll still save CPU cycles this way. Right now
> virtualized IO tends to be much more CPU intensive than direct IO so any
> help it can get is beneficial.
>
> It also makes loopback faster, although, granted, that's probably not
> that useful.
>
> And a lot of the "TSO infrastructure" was needed for zero copy TX anyways,
> which benefits most reasonable modern NICs (anything with hardware
> checksumming)

And also, you can enable TSO generation for a non-TSO-hw device and
get all of the segmentation overhead reduction gains, which works out
as a pure win as long as the device can at a minimum do checksumming.
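
The software fallback discussed in this thread (the stack builds one
large send, and segmentation happens in software only when the output
device cannot do it in hardware) can be sketched roughly as follows.
This is a toy model in Python, not kernel code: the Device class, its
hw_tso flag, and the function names are hypothetical, and headers and
checksums are left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Hypothetical output device; hw_tso is an assumed capability flag."""
    name: str
    hw_tso: bool


def software_segment(payload: bytes, mss: int) -> List[bytes]:
    """Split one large payload into chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def transmit(dev: Device, payload: bytes, mss: int) -> List[bytes]:
    """Return the buffers handed to the device for one large send.

    A TSO-capable device receives the payload whole and segments it
    itself. Any other device receives software-built segments, so the
    upper layers can batch either way.
    """
    if dev.hw_tso:
        return [payload]
    return software_segment(payload, mss)
```

In this model the layers above transmit() always work on one large
buffer; the per-segment cost is paid once, at the last step, and only
for devices that lack the offload.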