2009-09-23 06:08:58

by Xavier Roche

Subject: Inter-process send()/recv() using zero-copy ?

Hi folks,

I was wondering whether there is a way to have zero-copy send()/recv() when
the socket is connected to the local machine (to another process on the
same machine, for example)?

Such a feature would only be feasible with page-aligned blocks, from one
mmap'ed block to another, I guess.

Typical case:

Process #1 (uid A)
buff = mmap(0, size, ..) /* anonymous or not */
...
send(s, buff, size, 0)
munmap(buff, size)

Process #2 (uid B)
buff = mmap(0, size, .. | MAP_ANONYMOUS, ..)
recv(s, buff, size, 0)

In an ideal fantasy world, the first process would use send() to
transmit the complete page-aligned memory block to the other side, and
the second process would use recv() to get the memory block on a similar
anonymously mmap'ed block, and the only operation the kernel would do
would be to share the memory block between the two processes with
copy-on-write.

In the real world, the same operation requires a first read of the whole
memory block (possibly partially on disk) and a complete write (possibly
partially on disk, too) with two copies of the same memory region at the
end.
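
For reference, the copying path described above can be sketched as below. This is only a minimal sketch under assumptions that are not in the original post: it uses an AF_UNIX socketpair() and fork() instead of two unrelated processes with different UIDs, and demo_copying_transfer() is a hypothetical name. It shows the data being written into the socket from one mmap'ed block and read into a second anonymous block, so both copies exist at the end.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MSG_NOSIGNAL on Linux */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: page-aligned anonymous block, NULL on failure. */
static void *anon_block(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Sends `size` bytes of value `fill` to a child process over a local
 * socket. Returns 0 if the child received an identical block. */
int demo_copying_transfer(size_t size, unsigned char fill)
{
    int sv[2];
    assert(size > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) { /* Process #2: receiver */
        close(sv[0]);
        unsigned char *buff = anon_block(size);
        if (!buff)
            _exit(1);
        size_t got = 0;
        while (got < size) { /* recv() may return partial data */
            ssize_t n = recv(sv[1], buff + got, size - got, 0);
            if (n <= 0)
                _exit(1);
            got += (size_t)n;
        }
        for (size_t i = 0; i < size; i++)
            if (buff[i] != fill)
                _exit(1);
        _exit(0);
    }

    /* Process #1: sender */
    close(sv[1]);
    int ok = 0;
    unsigned char *buff = anon_block(size);
    if (buff) {
        memset(buff, fill, size);
        size_t sent = 0;
        ok = 1;
        while (sent < size) { /* the kernel copies the data here */
            ssize_t n = send(sv[0], buff + sent, size - sent, MSG_NOSIGNAL);
            if (n <= 0) {
                ok = 0;
                break;
            }
            sent += (size_t)n;
        }
        munmap(buff, size);
    }
    close(sv[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling demo_copying_transfer(64 * 1024, 0x5a) from a test program should return 0; every byte goes through the socket buffer on the way to the receiver.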

Two solutions can be used to emulate such a feature:

1. use a temporary mmap'ed file
- but requires a temporary file
- permissions for the file ? (not necessarily from the same UID)
- special case for local network block transmissions vs. machine-to-machine

2. use shared memory explicitly
- handling of permissions ? (ditto)
- special case for local network block transmissions vs. machine-to-machine
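
As an illustration of solution 1, here is a minimal sketch of passing a block through a temporary mmap'ed file. The assumptions are mine, not the original post's: the receiver is a fork()ed child rather than an unrelated process, the file lives under /tmp, and map_shared_file() and demo_shared_file() are hypothetical names. In a real setup only the file path and length would travel over the socket. mkstemp() creates the file with mode 0600, so a process running under another UID could not open it, which is exactly the permission question raised in the list above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for mkstemp() under strict compiler modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: maps `size` bytes of `path` as shared memory.
 * If `resize` is non-zero the file is first grown to `size` bytes. */
static void *map_shared_file(const char *path, size_t size, int resize)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    if (resize && ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}

/* Writes `msg` into a file-backed shared page and checks that a child
 * process sees it through its own mapping. Returns 0 on success. */
int demo_shared_file(const char *msg)
{
    assert(msg != NULL);
    char path[] = "/tmp/zc-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    size_t size = (size_t)sysconf(_SC_PAGESIZE);
    char *tx = strlen(msg) < size ? map_shared_file(path, size, 1) : NULL;
    if (!tx) {
        unlink(path);
        return -1;
    }
    strcpy(tx, msg); /* "send": the data is written in place */

    int status = -1;
    pid_t pid = fork();
    if (pid == 0) { /* receiver: maps the same pages, no data copy */
        char *rx = map_shared_file(path, size, 0);
        _exit(rx && strcmp(rx, msg) == 0 ? 0 : 1);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);

    munmap(tx, size);
    unlink(path);
    return (pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

demo_shared_file("hello") should return 0. Note that MAP_SHARED gives both sides a live view of the same pages rather than the copy-on-write snapshot wished for earlier, so the sender must not reuse the block until the receiver is done with it.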

splice() and friends do not appear to help in this case, and I was
wondering whether there is any chance of doing that?


2009-09-23 06:43:00

by Arjan van de Ven

Subject: Re: Inter-process send()/recv() using zero-copy ?

On Wed, 23 Sep 2009 08:01:27 +0200
Xavier Roche wrote:

> Hi folks,
>
> I was wondering if there was a way to have zero-copy send()/recv(),
> when the socket is connected to the local machine (to another process
> on the same machine, for example) ?
>
> Such feature would be only feasible with page-aligned blocks, from an
> a mmap'ed block to another one, I guess.
>
> Typical case:
>
> Process #1 (uid A)
> buff = mmap(0, size, ..) /* anonymous or not */
> ...
> send(s, buff, size, 0)
> munmap(buff, size)
>
> Process #2 (uid B)
> buff = mmap(0, size, .. | MAP_ANONYMOUS, ..)
> recv(s, buff, size, 0)
>
> In an ideal fantasy world, the first process would use send() to
> transmit the complete page-aligned memory block to the other side,
> and the second process would use recv() to get the memory block on a
> similar anonymously mmap'ed block, and the only operation the kernel
> would do would be to share the memory block between the two processes
> with copy-on-write.
>
> On the real world, the same operation requires a first read of the
> whole memory block (possibly partially on disk) and a complete write
> (possibly partially on disk, too) with two copies of the same memory
> region at the end.
>
> Two solutions can be used to

the problem you have is that
1) memory copies are cheap
(say, 3000 cycles/page or less)
2) page table operations (mmap etc) are very expensive.

these two combined tend to not make it a win to substitute simple
copies with complex pagetable tricks.
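
The trade-off above can be probed with a rough micro-benchmark like the sketch below. It is an assumption-laden proxy, not a measurement of the real thing: it compares memcpy() of pages that are already faulted in against an mmap()/touch/munmap() cycle on a single anonymous page, and it does not model cross-process remapping or TLB shootdowns on other CPUs. copy_ns_per_page() and map_ns_per_page() are hypothetical names, and the results are in nanoseconds, not cycles.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

static void *anon_pages(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Average time to memcpy() one page; negative on failure. */
double copy_ns_per_page(size_t pages)
{
    assert(pages > 0);
    size_t len = pages * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *src = anon_pages(len);
    unsigned char *dst = anon_pages(len);
    double result = -1.0;
    if (src && dst) {
        memset(src, 1, len); /* fault both blocks in before timing */
        memset(dst, 2, len);
        double t0 = now_ns();
        memcpy(dst, src, len);
        double t1 = now_ns();
        volatile unsigned char sink = dst[len - 1]; /* keep the copy live */
        (void)sink;
        result = (t1 - t0) / (double)pages;
    }
    if (src)
        munmap(src, len);
    if (dst)
        munmap(dst, len);
    return result;
}

/* Average time to map, touch and unmap one page; negative on failure. */
double map_ns_per_page(size_t rounds)
{
    assert(rounds > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    double t0 = now_ns();
    for (size_t i = 0; i < rounds; i++) {
        volatile unsigned char *p = anon_pages(page);
        if (!p)
            return -1.0;
        p[0] = 1; /* force the page fault */
        munmap((void *)p, page);
    }
    double t1 = now_ns();
    return (t1 - t0) / (double)rounds;
}
```

Comparing copy_ns_per_page(1024) with map_ns_per_page(1024) on a given machine gives a first idea of the ratio; the absolute numbers depend heavily on the CPU, the kernel version and cache state.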

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-23 06:51:30

by Nikita V. Youshchenko

Subject: Re: Inter-process send()/recv() using zero-copy ?

> the problem you have is that
> 1) memory copies are cheap
> (say, 3000 cycles/page or less)

What about L1 cache pollution? Doesn't it change the situation?

> 2) page table operations (mmap etc) are very expensive.
>
> these two combined tend to not make it a win to substitute simple
> copies with complex pagetable tricks.

2009-09-23 07:04:44

by Xavier Roche

Subject: Re: Inter-process send()/recv() using zero-copy ?

Arjan van de Ven wrote:
> 1) memory copies are cheap
> (say, 3000 cycles/page or less)

Yes, but this case would be more than useful for large memory blocks
(typically memory_size/N, with N typically 2..10) -- something you
generally have when you deal with mmap'ed blocks.
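
To put a number on the large-block case, the arithmetic can be sketched as below. Everything except the 3000 cycles/page figure quoted above is an assumption chosen for illustration: 4 KiB pages, a 3 GHz clock, and a machine with 8 GiB of memory with N = 4, that is, a 2 GiB block. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages needed to hold `bytes`, rounded up. */
uint64_t pages_for(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Estimated cycles to copy `bytes` at a fixed per-page cost. */
uint64_t copy_cost_cycles(uint64_t bytes, uint64_t page_size,
                          uint64_t cycles_per_page)
{
    return pages_for(bytes, page_size) * cycles_per_page;
}

/* Converts a cycle count to seconds for a clock of `hz` cycles/second. */
double cycles_to_seconds(uint64_t cycles, double hz)
{
    return (double)cycles / hz;
}
```

With these assumptions, a 2 GiB block is 524288 pages, so copy_cost_cycles(2ULL << 30, 4096, 3000) gives 1572864000 cycles, or roughly 0.52 seconds at 3 GHz for a single copy.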