Mark Mielke wrote:
> As far as I understand, sendfile() still requires the data to get from the
> disk to a page in memory, similar to how send() referencing an mmap()'d page
> may cause a page fault, reading the data from disk to a page in memory. One
> copy each. I don't know of a kernel interface that lets data be copied from
> disk to ethernet card without involving a temporary copy to be in paged
> memory at some point in time... perhaps the iSCSI stuff can do this? I dunno.
According to this:
http://asia.cnet.com/builder/program/dev/0,39009360,39062783,00.htm
using sendfile() is easier on the CPU due to less thrashing of the TLB.
I do get your point about protocol limitations though.
Chris
--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]
On Fri, May 02, 2003 at 12:19:25AM -0400, Chris Friesen wrote:
> According to this:
> http://asia.cnet.com/builder/program/dev/0,39009360,39062783,00.htm
> using sendfile() is easier on the CPU due to less thrashing of the TLB.
Thanks for the link. It looks quite accurate.
One question it raises in my mind is whether there would be value in
improving write()/send() such that they detect that the userspace
pointer refers entirely to mmap()'d file pages, and therefore no copy
of data from userspace -> kernelspace should be performed. The pages
could be loaded and accessed directly (as they are with sendfile())
rather than generating a page fault to load the pages. The TLB thrashing
does not need to occur.
I guess the first response to this question would be 'why not use
sendfile()? it already exists, and people have already begun to use
it...'
My answer is that I don't like sendfile(). It is not-yet-standard, and
is fairly limited. I could just be naive, but I think that:
write(fd, mmapped_file_pages, length);
Could be transparently mapped to the sendfile() code in the kernel,
minimizing the benefit of sendfile() having its own system call. It all
comes down to optimization. The current implementation of mmap() is not
optimal where mmap()'d file pages are passed as data to system calls.
mark
--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
On Fri, 2 May 2003, Mark Mielke wrote:
> On Fri, May 02, 2003 at 12:19:25AM -0400, Chris Friesen wrote:
> > According to this:
> > http://asia.cnet.com/builder/program/dev/0,39009360,39062783,00.htm
> > using sendfile() is easier on the CPU due to less thrashing of the TLB.
>
> Thanks for the link. It looks quite accurate.
>
> One question it raises in my mind is whether there would be value in
> improving write()/send() such that they detect that the userspace
> pointer refers entirely to mmap()'d file pages, and therefore no copy
> of data from userspace -> kernelspace should be performed. The pages
> could be loaded and accessed directly (as they are with sendfile())
> rather than generating a page fault to load the pages. The TLB thrashing
> does not need to occur.
>
> I guess the first response to this question would be 'why not use
> sendfile()? it already exists, and people have already begun to use
> it...'
>
> My answer is that I don't like sendfile(). It is not-yet-standard, and
> is fairly limited. I could just be naive, but I think that:
>
> write(fd, mmapped_file_pages, length);
>
> Could be transparently mapped to the sendfile() code in the kernel,
> minimizing the benefit of sendfile() having its own system call. It all
> comes down to optimization. The current implementation of mmap() is not
> optimal where mmap()'d file pages are passed as data to system calls.
This is similar to what I want to do as well. If sendfile() can avoid the
copy, why can't write()/send()/... do the same and remove the copy
operation too? That would make it easier to support streaming applications
(or applications needing more control than sendfile() offers)!
-ph
> mark
> Sat, May 03, 2003 at 12:42:59AM +0000, Miquel van Smoorenburg wrote:
> > In article <[email protected]>,
> > Mark Mielke <[email protected]> wrote:
> > >One question it raises in my mind is whether there would be value in
> > >improving write()/send() such that they detect that the userspace
> > >pointer refers entirely to mmap()'d file pages, and therefore no copy
> > >of data from userspace -> kernelspace should be performed.
> > You mean like
> >
> > http://hypermail.idiosynkrasia.net/linux-kernel/archived/2003/week00/0056.html
>
> Yes, definitely, and thank you for referring us to work that has already
> been done.
>
> mark
Does this mean that if you memory map a file and send it through TCP,
you'll have no copy operations transferring data from disk to NIC (except
the DMA transfers disk->memory and memory->NIC)?
Is there also work implementing this for UDP?
-ph
On Sat, 03 May 2003 23:01:21, Pål Halvorsen wrote:
>
> > Sat, May 03, 2003 at 12:42:59AM +0000, Miquel van Smoorenburg wrote:
> > > In article <[email protected]>,
> > > Mark Mielke <[email protected]> wrote:
> > > >One question it raises in my mind is whether there would be value in
> > > >improving write()/send() such that they detect that the userspace
> > > >pointer refers entirely to mmap()'d file pages, and therefore no copy
> > > >of data from userspace -> kernelspace should be performed.
> > > You mean like
> > >
> > > http://hypermail.idiosynkrasia.net/linux-kernel/archived/2003/week00/0056.html
> >
> > Yes, definitely, and thank you for referring us to work that has already
> > been done.
> >
> > mark
>
> Does this mean that if you memory map a file and send it through TCP,
> you'll have no copy operations transferring data from disk to NIC (except
> the DMA transfers disk->memory and memory->NIC)?
No. I just referred to an earlier discussion about this topic. That doesn't
mean it has been implemented. In fact, if you actually read that discussion
you'll see that it probably won't be implemented at all.
Mike.
--
| Miquel van Smoorenburg        | "I know one million ways, to always pick |
| miquels@{drinkel.,}cistron.nl | the wrong fantasy" - the Black Crowes.   |