2002-09-10 14:54:46

by Mala Anand

Subject: Early SPECWeb99 results on 2.5.33 with TSO on e1000

I am resending this note with the subject heading so that
it can be viewed under the subject category.

?>?"David S. Miller" wrote:
>> NAPI is also not the panacea to all problems in the world.

> Mala did some testing on this a couple of weeks back. It appears that
> NAPI damaged performance significantly.



>http://www-124.ibm.com/developerworks/opensource/linuxperf/netperf/results/july_02/netperf2.5.25results.htm



>Unfortunately it is not listed what e1000 and core NAPI
>patch was used. Also, not listed, are the RX/TX mitigation
>and ring sizes given to the kernel module upon loading.
The default driver included in the 2.5.25 kernel for the Intel
gigabit adapter was used for the baseline test, and the NAPI driver
was downloaded from Robert Olsson's website. I have updated my web
page to include Robert's patch; however, it is given there for
reference purposes only. Except for the values mentioned explicitly,
the configurable values used are the defaults. The default RX/TX
interrupt mitigation delay is 64 microseconds and the default ring
size is 80.

I have added the statistics collected during the test to my website. I do
want to analyze and understand how NAPI can be improved in my tcp_stream
test. Last year, around November, when I first tested NAPI, I did find the
NAPI results better than the baseline using udp_stream. However, I am
concentrating on tcp_stream since that is where NAPI can be improved in
my setup. I will update the website as I do more work on this.


>Robert can comment on optimal settings
I saw Robert's postings. It looks like he may have a more recent version
of the NAPI driver than the one I used. I also see that 2.5.33 has NAPI,
so I will move to 2.5.33 and continue my work there.


>Robert and Jamal can make a more detailed analysis of Mala's
>graphs than I.
Jamal asked about the socket buffer size that I used. I have tried a
132k socket buffer size in the past and didn't see much difference in
my tests. I will add that to my list again.


Regards,
Mala


Mala Anand
IBM Linux Technology Center - Kernel Performance
E-mail:[email protected]
http://www-124.ibm.com/developerworks/opensource/linuxperf
http://www-124.ibm.com/developerworks/projects/linuxperf
Phone:838-8088; Tie-line:678-8088






2002-09-10 16:52:08

by Manfred Spraul

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

Robert Olsson wrote:
>
> Anyway. A tulip NAPI variant added mitigation when we reached "some
> load" to avoid the static interrupt delay. (Still keeping things
> pretty simple):
>
> Load "Mode"
> -------------------
> Lo 1) RxIntDelay=0
> Mid 2) RxIntDelay=fix (When we had X pkts on the RX ring)
> Hi 3) Consecutive polling. No RX interrupts.
>
Sounds good.

The difficult part is when to go from Lo to Mid. Unfortunately my tulip
card is braindead (LC82C168), but I'll try to find something usable for
benchmarking.

In my tests with the winbond card, I've switched at a fixed packet rate:

< 2000 packets/sec: no delay
> 2000 packets/sec: poll rx at 0.5 ms
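For illustration only, a minimal sketch of that kind of fixed-rate
switch. All names and fields here (nic_state, nic_set_rx_delay, the
thresholds) are made up for the example and are not the actual
winbond, tulip or e1000 driver API:

#define RX_RATE_THRESHOLD  2000     /* packets/sec, as above */
#define RX_POLL_DELAY_US    500     /* roughly 0.5 ms */

struct nic_state {
    unsigned long rx_packets;   /* packets seen this interval */
    unsigned long rx_rate;      /* packets/sec, refreshed once a second */
    int           delayed;      /* 0 = interrupt per packet, 1 = delayed poll */
};

/* Hypothetical hardware hook: program the RX interrupt delay. */
static void nic_set_rx_delay(struct nic_state *nic, unsigned int usecs)
{
    /* a real driver would write the delay to the NIC's mitigation register */
    (void)nic; (void)usecs;
}

/* Called once a second (e.g. from a timer) to re-evaluate the mode. */
static void nic_update_mode(struct nic_state *nic)
{
    nic->rx_rate    = nic->rx_packets;
    nic->rx_packets = 0;

    if (!nic->delayed && nic->rx_rate > RX_RATE_THRESHOLD) {
        /* busy: stop interrupting per packet, poll roughly every 0.5 ms */
        nic_set_rx_delay(nic, RX_POLL_DELAY_US);
        nic->delayed = 1;
    } else if (nic->delayed && nic->rx_rate < RX_RATE_THRESHOLD) {
        /* quiet again: back to immediate interrupts for low latency */
        nic_set_rx_delay(nic, 0);
        nic->delayed = 0;
    }
}

(A real driver would presumably want two different thresholds for
hysteresis so it does not flap right around 2000 packets/sec.)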



--
Manfred

2002-09-11 07:34:48

by Robert Olsson

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000



> > Load "Mode"
> > -------------------
> > Lo 1) RxIntDelay=0
> > Mid 2) RxIntDelay=fix (When we had X pkts on the RX ring)
> > Hi 3) Consecutive polling. No RX interrupts.

Manfred Spraul writes:

> Sounds good.
>
> The difficult part is when to go from Lo to Mid. Unfortunately my tulip
> card is braindead (LC82C168), but I'll try to find something usable for
> benchmarking.

21143 for tulips. Well, any NIC with "RxIntDelay" should do.

> In my tests with the winbond card, I've switched at a fixed packet rate:
>
> < 2000 packets/sec: no delay
> > 2000 packets/sec: poll rx at 0.5 ms

I was experimenting with all sorts of moving averages but never got a good
correlation with bursty network traffic at this level of resolution. The
only measure I found fast and simple enough for this was the number of
packets on the RX ring, as I mentioned.
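To contrast with a rate estimate, a rough sketch of deriving the mode
from RX ring occupancy alone. The thresholds and names are invented
for the example; they are not from the actual tulip NAPI patch:

enum rx_mode { RX_MODE_LO, RX_MODE_MID, RX_MODE_HI };

#define RX_MID_THRESH   8   /* "X pkts on the RX ring": turn on RxIntDelay */
#define RX_HI_THRESH   32   /* ring filling up (out of e.g. 80 descriptors) */

static enum rx_mode pick_rx_mode(unsigned int pkts_on_ring)
{
    if (pkts_on_ring >= RX_HI_THRESH)
        return RX_MODE_HI;   /* stay in the poll loop, RX IRQs remain masked */
    if (pkts_on_ring >= RX_MID_THRESH)
        return RX_MODE_MID;  /* re-enable IRQs but with a fixed RxIntDelay */
    return RX_MODE_LO;       /* light load: RxIntDelay=0 for lowest latency */
}

The attraction is that the ring occupancy is already sitting in the
driver and needs no averaging, which fits the point above about moving
averages reacting too slowly to bursty traffic.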


Cheers.
--ro

2002-09-16 21:33:25

by David Miller

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

From: [email protected]
Date: Mon, 16 Sep 2002 15:32:56 -0600 (MDT)

> new system calls into the networking code

The system calls would go into the VFS; sys_receivefile is not
networking specific in any way, shape, or form.

And to answer your question, if I had the time I'd work on it, yes.

Right now the answer to "well, do you have the time" is no; I am
working on something much more important wrt. Linux networking. I've
hinted at what this is in previous postings, and if people can't
figure out what it is, I'm not going to mention it explicitly :-)

2002-09-16 22:48:09

by David Woodhouse

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000


[email protected] said:
> new system calls into the networking code
> The system calls would go into the VFS, sys_receivefile is not
> networking specific in any way shape or form.

Er, surely the same goes for sys_sendfile? Why have a new system call
rather than just swapping the 'in' and 'out' fds?

--
dwmw2


2002-09-16 22:50:38

by David Miller

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

From: David Woodhouse <[email protected]>
Date: Mon, 16 Sep 2002 23:53:00 +0100

> Er, surely the same goes for sys_sendfile? Why have a new system call
> rather than just swapping the 'in' and 'out' fds?

There is an assumption that one is a linear stream of output (in this
case a socket) and the other one is a page-cache-based file.

It would be nice to extend sys_sendfile to work properly in both
directions in a manner that Linus would accept; want to work on that?
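For reference, the direction the current call supports (page-cache
file in, socket out) is the classic static-web-server pattern. A
minimal userspace sketch, assuming client_fd is an already-connected
TCP socket obtained elsewhere (send_whole_file is just an
illustrative name):

#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Push a whole regular file down an already-connected socket. */
static int send_whole_file(int client_fd, const char *path)
{
    struct stat st;
    off_t off = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    while (off < st.st_size) {
        /* sendfile() advances off by however much it managed to send */
        ssize_t n = sendfile(client_fd, fd, &off, st.st_size - off);
        if (n <= 0)
            break;      /* error, or the peer went away */
    }

    close(fd);
    return off == st.st_size ? 0 : -1;
}

Swapping the two descriptors (socket in, regular file out) is exactly
the case that does not work today, which is what sys_receivefile is
meant to cover.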

2002-09-16 22:58:24

by David Woodhouse

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000


[email protected] said:
> > Er, surely the same goes for sys_sendfile? Why have a new system
> > call rather than just swapping the 'in' and 'out' fds?

> There is an assumption that one is a linear stream of output (in this
> case a socket) and the other one is a page cache based file.

That's an implementation detail and it's not clear we should be exposing it
to the user. It's not entirely insane to contemplate socket->socket or
file->file sendfile either -- would we invent new system calls for those
too? File descriptors are file descriptors.

> It would be nice to extend sys_sendfile to work properly in both ways
> in a manner that Linus would accept, want to work on that?

Yeah -- I'll add it to the TODO list. Scheduled for some time in 2007 :)

More seriously though, I'd hope that whoever implemented what you call
'sys_receivefile' would solve this issue, as 'sys_receivefile' isn't really
useful as anything more than a handy nomenclature for describing the
process in question.

--
dwmw2


2002-09-16 23:03:51

by Jeff Garzik

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

David Woodhouse wrote:
> [email protected] said:
>
>>> Er, surely the same goes for sys_sendfile? Why have a new system
>>> call rather than just swapping the 'in' and 'out' fds?
>>
>
>>There is an assumption that one is a linear stream of output (in this
>>case a socket) and the other one is a page cache based file.
>
>
> That's an implementation detail and it's not clear we should be exposing it
> to the user. It's not entirely insane to contemplate socket->socket or
> file->file sendfile either -- would we invent new system calls for those
> too? File descriptors are file descriptors.

I was rather disappointed when file->file sendfile was [purposefully?]
broken in 2.5.x...

Jeff



2002-09-16 23:06:09

by David Miller

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

From: Jeff Garzik <[email protected]>
Date: Mon, 16 Sep 2002 19:08:15 -0400

> I was rather disappointed when file->file sendfile was [purposefully?]
> broken in 2.5.x...

What change made this happen?

2002-09-16 23:44:13

by Jeff Garzik

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

David S. Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Mon, 16 Sep 2002 19:08:15 -0400
>
> I was rather disappointed when file->file sendfile was [purposefully?]
> broken in 2.5.x...
>
> What change made this happen?


I dunno when it happened, but 2.5.x now returns EINVAL for all
file->file cases.

In 2.4.x, if sendpage is NULL, file_send_actor in mm/filemap.c faked a
call to fops->write().
In 2.5.x, if sendpage is NULL, EINVAL is unconditionally returned.
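The behavioural difference boils down to a dispatch like the
following. This is only a paraphrase of what is described above, not
the real mm/filemap.c code; the types and signatures are stand-ins so
the logic reads on its own, and fake_write_from_page is a
hypothetical helper:

struct page;
struct file;

struct file_operations {
    /* zero-copy hook: sockets provide it, most regular files do not */
    long (*sendpage)(struct file *out, struct page *page,
                     unsigned long offset, unsigned long size);
};

struct file {
    const struct file_operations *f_op;
};

#define EINVAL 22

/* Hypothetical stand-in for the 2.4 trick of kmap()ing the page and
 * feeding the bytes to the destination's ordinary ->write() method. */
static long fake_write_from_page(struct file *out, struct page *page,
                                 unsigned long offset, unsigned long size);

static long send_one_page(struct file *out, struct page *page,
                          unsigned long offset, unsigned long size)
{
    if (out->f_op->sendpage)
        return out->f_op->sendpage(out, page, offset, size);

#ifdef KERNEL_2_4_BEHAVIOUR
    /* 2.4.x: no sendpage method, so fake it via the ordinary write path */
    return fake_write_from_page(out, page, offset, size);
#else
    /* 2.5.x: no fallback, so a file->file sendfile() simply fails */
    return -EINVAL;
#endif
}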

2002-09-16 23:47:45

by David Miller

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

From: Jeff Garzik <[email protected]>
Date: Mon, 16 Sep 2002 19:48:37 -0400

> I dunno when it happened, but 2.5.x now returns EINVAL for all
> file->file cases.
>
> In 2.4.x, if sendpage is NULL, file_send_actor in mm/filemap.c faked a
> call to fops->write().
> In 2.5.x, if sendpage is NULL, EINVAL is unconditionally returned.


What if the source and destination file and offsets match?
Sounds like 2.4.x might deadlock.

In fact it sounds similar to the "read() with buf pointed at the same
page in a MAP_WRITE mmap()'d area" deadlock we had ages ago.

2002-09-16 23:57:00

by Jeff Garzik

Subject: Re: Early SPECWeb99 results on 2.5.33 with TSO on e1000

#include <sys/sendfile.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int in, out;
    struct stat st;
    off_t off = 0;
    ssize_t rc;

    /* open the same file once for reading ... */
    in = open("test.data", O_RDONLY);
    if (in < 0) {
        perror("test.data read");
        return 1;
    }

    fstat(in, &st);

    /* ... and once for writing, so source and destination coincide */
    out = open("test.data", O_WRONLY);
    if (out < 0) {
        perror("test.data write");
        return 1;
    }

    /* file->file sendfile over the whole file, both offsets starting at 0 */
    rc = sendfile(out, in, &off, st.st_size);
    if (rc < 0) {
        perror("sendfile");
        close(in);
        unlink("out");
        close(out);
        return 1;
    }

    close(in);
    close(out);
    return 0;
}


Attachments:
sendfile-test-2.c (635.00 B)