2007-10-25 06:05:47

by Ph. Marek

Subject: Exporting a lot of data to other processes?

Hello everybody,

I've been pondering a question for some time, and would like to ask for
better ideas here. It's not entirely about the kernel - although the kernel
surely has some impact, too.


I've got a process/daemon that wants to export information to other
processes. As a model for exporting that data I found sysfs and procfs
nice - a simple "cat" gives you the needed data.

Now, in order to do something like that from userspace, I'd either have to:

-) use FUSE
  - feels slow (many context switches)
  - much overhead for such a common thing (another daemon)
-) use named pipes in some directory structure, and keep them open in
   the daemon, waiting to be written to
  - many open filehandles
  - doesn't feel really usable for bigger (>1000) structures
-) use some ramfs/shmfs or similar, and overwrite the data occasionally
  - not current data
  - runtime overhead (processor load)

Now, the open/close events wouldn't be interesting; only the read() and
(possibly) write() events would have to be relayed (which is not the case
with FUSE, IIUC).

Is there some better way? For small structures the pipes seem to be the
best way ... just wait for a reader, give it the data, and be done.
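For reference, the blocking-open behaviour of FIFOs makes that "wait for a
reader, give it data" pattern quite compact. A minimal sketch in Python (a
thread stands in for the long-running daemon, and the "uptime" file name
and payload are made up for illustration):

```python
import os, tempfile, threading

# The daemon creates one FIFO per exported item; open(..., "w") on a FIFO
# blocks until a reader shows up, so the data is generated only on demand
# and is always current at the moment of the read.
fifo = os.path.join(tempfile.mkdtemp(), "uptime")
os.mkfifo(fifo)

def serve_once():
    with open(fifo, "w") as f:    # blocks until a reader opens the FIFO
        f.write("uptime: 42\n")   # freshly generated per request

t = threading.Thread(target=serve_once)
t.start()

# A "cat"-style client on the other end:
with open(fifo) as f:             # unblocks the daemon's open()
    data = f.read()
t.join()
os.unlink(fifo)
print(data, end="")
```

The downside stated above remains, though: a real daemon needs one FIFO
(and, while serving, one open filehandle) per exported entry.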


Thank you for all ideas.


Regards,

Phil


2007-10-26 22:58:39

by Nix

Subject: Re: Exporting a lot of data to other processes?

On 25 Oct 2007, Ph. Marek told this:
> -) use some ramfs/shmfs or similar, and overwrite the data occasionally
> - not current data
> - runtime overhead (processor load)

This is roughly what the nscd implementation in glibc does: the client
can work over a socket, but prefers to ask the daemon for a file
descriptor to an mmap()ed copy of the database. Then it works from that.

Properly done, this is as efficient as working in the local process, with
only context switch overhead involved, and even that only when the
database is being updated. You *do* have to think about proper locking
of some kind, perhaps by designing the data structure in the shared
mmap()ed region appropriately.

(nscd has an ill-deserved reputation for sloth and memory-hungriness.
This is probably caused by bad experiences with old Solaris nscds, which
were quite appallingly inefficient. glibc nscd manages to speed up
passwd lookups even if your passwd database is only about ten lines
long, simply by amortizing the parsing overhead... and the nscd process
itself incurs *no* CPU overhead except as needed to reparse and update
the database, which is overhead that would otherwise be duplicated in
every process that did service parsing. As for memory hungriness, well,
yes, it looks like nscd is huge, but it's a huge *sparse* mmap()ed
block, so doesn't actually consume more than a few hundred Kb, not the
dozens of Mb it looks like in ps(1).)

--
`Some people don't think performance issues are "real bugs", and I think
such people shouldn't be allowed to program.' --- Linus Torvalds