I am resending this note with the subject heading so that
it can be viewed through the subject category.
"David S. Miller" wrote:
>> NAPI is also not the panacea to all problems in the world.
>> Mala did some testing on this a couple of weeks back. It appears that
>> NAPI damaged performance significantly.
>http://www-124.ibm.com/developerworks/opensource/linuxperf/netperf/results/july_02/netperf2.5.25results.htm
>Unfortunately it is not listed what e1000 and core NAPI
>patch was used. Also, not listed, are the RX/TX mitigation
>and ring sizes given to the kernel module upon loading.
The default driver included in the 2.5.25 kernel for the Intel
gigabit adapter was used for the baseline test, and the NAPI driver
was downloaded from Robert Olsson's website. I have updated my web
page to include Robert's patch; however, it is given there for reference
purposes only. Except for the ones mentioned explicitly, the rest of
the configurable values used are the defaults. The default for RX/TX
mitigation is 64 microseconds and the default ring size is 80.
I have added statistics collected during the test to my web site. I do
want to analyze and understand how NAPI can be improved in my tcp_stream
test. Last year around November, when I first tested NAPI, I did find the
NAPI results better than the baseline using udp_stream. However, I am
concentrating on tcp_stream, since that is where NAPI can be improved in
my setup. I will update the website as I do more work on this.
>Robert can comment on optimal settings
I saw Robert's postings. It looks like he may have a more recent version
of the NAPI driver than the one I used. I also see that 2.5.33 has NAPI;
I will move to 2.5.33 and continue my work on that.
>Robert and Jamal can make a more detailed analysis of Mala's
>graphs than I.
Jamal asked about the socket buffer size that I used. I have tried a
132k socket buffer size in the past and I didn't see much difference in
my tests. I will add that to my list again.
Regards,
Mala
Mala Anand
IBM Linux Technology Center - Kernel Performance
E-mail:[email protected]
http://www-124.ibm.com/developerworks/opensource/linuxperf
http://www-124.ibm.com/developerworks/projects/linuxperf
Phone:838-8088; Tie-line:678-8088
Robert Olsson wrote:
>
> Anyway. A tulip NAPI variant added mitigation when we reached "some
> load" to avoid the static interrupt delay. (Still keeping things
> pretty simple):
>
> Load "Mode"
> -------------------
> Lo 1) RxIntDelay=0
> Mid 2) RxIntDelay=fix (When we had X pkts on the RX ring)
> Hi 3) Consecutive polling. No RX interrupts.
>
Sounds good.
The difficult part is deciding when to go from Lo to Mid. Unfortunately my
tulip card is braindead (LC82C168), but I'll try to find something usable
for benchmarking.
In my tests with the winbond card, I've switched at a fixed packet rate:
< 2000 packets/sec: no delay
> 2000 packets/sec: poll rx at 0.5 ms
--
Manfred
> > Load "Mode"
> > -------------------
> > Lo 1) RxIntDelay=0
> > Mid 2) RxIntDelay=fix (When we had X pkts on the RX ring)
> > Hi 3) Consecutive polling. No RX interrupts.
Manfred Spraul writes:
> Sounds good.
>
> The difficult part is when to go from Lo to Mid. Unfortunately my tulip
> card is braindead (LC82C168), but I'll try to find something usable for
> benchmarking
For tulips, try a 21143. Well, any NIC with "RxIntDelay" should do.
> In my tests with the winbond card, I've switched at a fixed packet rate:
>
> < 2000 packets/sec: no delay
> > 2000 packets/sec: poll rx at 0.5 ms
I was experimenting with all sorts of moving averages but never got a good
correlation with bursty network traffic at this level of resolution. The
only measure I found fast and simple enough for this was the number of
packets on the RX ring, as I mentioned.
Cheers.
--ro
From: [email protected]
Date: Mon, 16 Sep 2002 15:32:56 -0600 (MDT)
new system calls into the networking code
The system calls would go into the VFS, sys_receivefile is not
networking specific in any way shape or form.
And to answer your question, if I had the time I'd work on it yes.
Right now the answer to "well do you have the time" is no, I am
working on something much more important wrt. Linux networking. I've
hinted at what this is in previous postings, and if people can't
figure out what it is I'm not going to mention this explicitly :-)
[email protected] said:
> new system calls into the networking code
> The system calls would go into the VFS, sys_receivefile is not
> networking specific in any way shape or form.
Er, surely the same goes for sys_sendfile? Why have a new system call
rather than just swapping the 'in' and 'out' fds?
--
dwmw2
From: David Woodhouse <[email protected]>
Date: Mon, 16 Sep 2002 23:53:00 +0100
Er, surely the same goes for sys_sendfile? Why have a new system call
rather than just swapping the 'in' and 'out' fds?
There is an assumption that one is a linear stream of output (in this
case a socket) and the other one is a page cache based file.
It would be nice to extend sys_sendfile to work properly in both
directions in a manner that Linus would accept. Want to work on that?
[email protected] said:
> > Er, surely the same goes for sys_sendfile? Why have a new system
> > call rather than just swapping the 'in' and 'out' fds?
> There is an assumption that one is a linear stream of output (in this
> case a socket) and the other one is a page cache based file.
That's an implementation detail and it's not clear we should be exposing it
to the user. It's not entirely insane to contemplate socket->socket or
file->file sendfile either -- would we invent new system calls for those
too? File descriptors are file descriptors.
> It would be nice to extend sys_sendfile to work properly in both ways
> in a manner that Linus would accept, want to work on that?
Yeah -- I'll add it to the TODO list. Scheduled for some time in 2007 :)
More seriously though, I'd hope that whoever implemented what you call
'sys_receivefile' would solve this issue, as 'sys_receivefile' isn't really
useful as anything more than a handy nomenclature for describing the
process in question.
--
dwmw2
David Woodhouse wrote:
> [email protected] said:
>
>>> Er, surely the same goes for sys_sendfile? Why have a new system
>>> call rather than just swapping the 'in' and 'out' fds?
>>
>
>>There is an assumption that one is a linear stream of output (in this
>>case a socket) and the other one is a page cache based file.
>
>
> That's an implementation detail and it's not clear we should be exposing it
> to the user. It's not entirely insane to contemplate socket->socket or
> file->file sendfile either -- would we invent new system calls for those
> too? File descriptors are file descriptors.
I was rather disappointed when file->file sendfile was [purposefully?]
broken in 2.5.x...
Jeff
From: Jeff Garzik <[email protected]>
Date: Mon, 16 Sep 2002 19:08:15 -0400
I was rather disappointed when file->file sendfile was [purposefully?]
broken in 2.5.x...
What change made this happen?
David S. Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Mon, 16 Sep 2002 19:08:15 -0400
>
> I was rather disappointed when file->file sendfile was [purposefully?]
> broken in 2.5.x...
>
> What change made this happen?
I dunno when it happened, but 2.5.x now returns EINVAL for all
file->file cases.
In 2.4.x, if sendpage is NULL, file_send_actor in mm/filemap.c faked a
call to fops->write().
In 2.5.x, if sendpage is NULL, EINVAL is unconditionally returned.
From: Jeff Garzik <[email protected]>
Date: Mon, 16 Sep 2002 19:48:37 -0400
I dunno when it happened, but 2.5.x now returns EINVAL for all
file->file cases.
In 2.4.x, if sendpage is NULL, file_send_actor in mm/filemap.c faked a
call to fops->write().
In 2.5.x, if sendpage is NULL, EINVAL is unconditionally returned.
What if source and destination file and offsets match?
Sounds like 2.4.x might deadlock.
In fact it sounds similar to the "read() with buf pointed to same
page in MAP_WRITE mmap()'d area" deadlock we had ages ago.
/* Test case for the potential self-overlap deadlock discussed
 * above: open the same file for reading and for writing, then
 * sendfile() it onto itself at the same offset. */
#include <sys/sendfile.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int in, out;
    struct stat st;
    off_t off = 0;
    ssize_t rc;

    in = open("test.data", O_RDONLY);
    if (in < 0) {
        perror("test.data read");
        return 1;
    }
    if (fstat(in, &st) < 0) {
        perror("fstat");
        close(in);
        return 1;
    }

    out = open("test.data", O_WRONLY);
    if (out < 0) {
        perror("test.data write");
        close(in);
        return 1;
    }

    /* Source and destination are the same file, same offset. */
    rc = sendfile(out, in, &off, st.st_size);
    if (rc < 0) {
        perror("sendfile");
        close(in);
        close(out);
        return 1;
    }

    close(in);
    close(out);
    return 0;
}