2005-01-30 09:16:07

by Oded Shimon

Subject: Pipes and fd question. Large amounts of data.

A Unix C programming question, mostly to do with pipes, so I hope I am
asking in the right place.

I have a rather unusual situation. I have 2 programs, neither of which I have
control over.
Program A writes into TWO fifos.
Program B reads from the same two fifos.

My program is the middle step.

The problem: neither program is aware of the other, and each writes to or
reads from either fifo at will. Each also blocks until whatever transfer it
started is complete.

Meaning, with the direct approach and no middle step, the programs would
deadlock instantly, as one program may be writing into fifo 1 while the other
is reading from fifo 2.

The amount of data is very large: GBs in total, at a rate of at least 10 MB a
second and possibly as much as 300 MB a second. So efficiency in context
switching is very important.

Programs A and B both write and read in large chunks, usually 300 KB.

So far, my solution uses select() and non-blocking pipes, with large buffers
(20 MB). In my measurements, in the worst case the programs write/read 6 MB
before switching to the other fifo, so 20 MB is a safe margin.

I have implemented this, but it has a major disadvantage: each write() moves
only 4 KB at a time, never more, because of how non-blocking pipe writes
work. At 20,000 context switches a second, this method barely reaches 10 MB a
second, if that.

Blocking pipes have an advantage: they can move large chunks at a time. They
have a more serious disadvantage, though: the amount of data you ask to have
written or read IS the amount that will be moved, and the call blocks until
all of it has been. I cannot know beforehand exactly how much data the
programs want, so this could easily fall into a deadlock.

Ideally, I could do this:
my program: write(20mb);
program B: read(300k);
my program: write() returns with return value '300,000'

I was unable to find anything like this. No combination of
blocking/non-blocking fds, nor any system call, gives this behavior.
I am looking for alternative or better suggestions.

- ods15.


2005-01-30 10:48:51

by Oded Shimon

Subject: Re: Pipes and fd question. Large amounts of data.

On Sunday 30 January 2005 11:41, Miles wrote:
> My suggestion would be to perform blocking writes in a separate thread
> for each of the two written-to fds. You can still use select/poll for
> the read side ... tho' once you're using threading on the write side it
> might be more straightforward to use threading on the read side as
> well. Bear in mind that if you do that you'll need to dedicate threads
> to _each_ of the four fds, because each of them could block
> independently while progress is required on one or more of the others.
>
> I'd say that this was one of the rare cases where a solution using
> threads is not only superior to one using event-driven IO, but actually
> required.

Yeah, I reached just about the same conclusion. At first I thought only 2
threads were necessary, one for each data flow, but I realized a deadlock
could happen just as well there too. Making a 4-thread implementation I
trust is gonna be hard... I'd better get working. :)

Thanks for the reply,
- ods15

2005-01-30 19:41:40

by Miquel van Smoorenburg

Subject: Re: Pipes and fd question. Large amounts of data.

In article <[email protected]>,
Oded Shimon <[email protected]> wrote:
>I have implemented this, but it has a major disadvantage: each write() moves
>only 4 KB at a time, never more, because of how non-blocking pipe writes
>work. At 20,000 context switches a second, this method barely reaches 10 MB a
>second, if that.

If you're using pipe(), you might want to try socketpair()
instead. You can setsockopt() SO_RCVBUF and SO_SNDBUF to
large values if you want.

Mike.

2005-01-31 15:03:05

by Chris Friesen

Subject: Re: Pipes and fd question. Large amounts of data.

Oded Shimon wrote:
> On Sunday 30 January 2005 11:41, Miles wrote:

>>I'd say that this was one of the rare cases where a solution using
>>threads is not only superior to one using event-driven IO, but actually
>>required.

> Yeah, I reached just about the same conclusion. At first I thought only 2
> threads were necessary, one for each data flow, but I realized a deadlock
> could happen just as well in that too. Making a 4 thread implementation I
> trust is gonna be hard... I better get working. :)

Your other option would be to use processes with shared memory (either
SysV or memory-mapped files). This gets you the speed of shared memory
maps, but also lets you keep the reliability of not sharing your entire
memory space.

If you use NPTL, your locking should be quick as well. If not, you can
always roll your own futex-based locking.

Chris

2005-01-31 15:14:22

by Oded Shimon

Subject: Re: Pipes and fd question. Large amounts of data.

On Monday 31 January 2005 17:02, Chris Friesen wrote:
> Your other option would be to use processes with shared memory (either
> sysV or memory-mapped files). This gets you the speed of shared memory
> maps, but also lets you get the reliability of not sharing your entire
> memory space.
>
> If you use NPTL, your locking should be quick as well. If not, you can
> always roll your own futex-based locking.

To be honest, most of that was gibberish to me (NPTL, futex, SysV..). Most of
my experience with system calls is with pipes and files; I know very little
about these other things...
Either way, you are a bit late: just half an hour ago I completed my
program, and it works. :) I finished the pthread-instead-of-select()
implementation pretty quickly (now I understand why lazy programmers use
threads.. heh); what took me so long was trouble with the 2 other programs,
I had to refine their command-line params carefully...
(BTW, the 2 other programs are MPlayer and MEncoder, and my job was
transferring video AND audio between them.)

Thank you for the reply,
- ods15