LinuxLists.cc - [PATCH] POSIX message queues

2002-08-27 21:43:50

Subject: [PATCH] POSIX message queues

Hello,

We (Michal Wronski and I) have written a patch for the 2.4.18 and 2.4.19
kernels adding POSIX message queues (the mq_* family of functions). Unlike
standard SysV message queues (msgsnd() etc.), they allow sending messages
with different priorities, support asynchronous receive, and more.

As the patches are quite large, here is the URL of our page:
http://www.mat.uni.torun.pl/~wrona/posix_ipc

and direct links:
http://www.mat.uni.torun.pl/~wrona/posix_ipc/mqueue_patch-2.4.18.tar.gz
http://www.mat.uni.torun.pl/~wrona/posix_ipc/mqueue_patch-2.4.19.tar.gz

There is also an accompanying library:
http://www.mat.uni.torun.pl/~wrona/posix_ipc/mqueue_lib-2.0.tar.gz

I'd like to point out some issues:

1) Architecture & syscall - We have added support only for i386, but the
only architecture-dependent part is the new system call, so porting
is fairly simple. We haven't done it only because we are afraid of
introducing silly bugs without any way to test. We know that a
_new_ system call is rather "undesirable", but we could instead invoke our
code from an existing one, e.g. the ipc syscall.

2) SMP - I'm not an SMP specialist, so everything touching SMP is done in
the simplest way, meaning it could be done more efficiently. There
may also be bugs - we don't have an SMP box to test on (this may change
during the academic year).

Awaiting comments, feedback, etc. (especially about the two points above)

Krzysiek Benedyczak

2002-08-27 22:12:07

by Christoph Hellwig

Subject: Re: [PATCH] POSIX message queues

On Tue, Aug 27, 2002 at 11:48:07PM +0200, Krzysztof Benedyczak wrote:
> Awaiting comments, feedback, etc. (especially about the two points above)

The multiplexer syscall is horribly ugly. I'd suggest implementing it
as a filesystem so that each message queue object can be represented as a
file, using the defined file methods as much as possible.

2002-08-29 22:05:06

by Peter Waechtler

Subject: Re: [PATCH] POSIX message queues

some comments as asked for:

I know that it's nowhere stated, but POSIX mqueues are perfectly
designed to be implemented in userspace with locking facilities
provided by the system.

With PROCESS_SHARED mutexes and condvars in NGPT we have that - and I
am in the process of porting the mmap()-based implementation from
Richard Stevens' UNPv2 to Linux.

The messages are stored in shmem and the library routines access the
structures with proper locking. I am not very happy about the fact
that with futexes the whole cooperating system gets stuck when one
process crashes inside a critical region (yes, then your system is
screwed anyway). BUT the messages are not copied between user- and
kernelspace like they are in SysV msgsnd().

POSIX mqueues have "kernel persistence", i.e. they live until
mq_unlink() is called. They do not vanish with the creator on exit().
Without rlimits you can easily consume all available kernel memory
(DoS) by creating a mqueue and filling it with garbage.

When implemented in kernel space, you have to create a thread with the
brand new sys_clone_startup (or whatever name it gets) as notification
(SIGEV_THREAD) - which is SCOPE_SYSTEM, with no control over this, and
not always what is desired.

2002-08-30 14:45:53

by Amos Waterland

Subject: Re: [PATCH] POSIX message queues

On Thu, Aug 29, 2002 at 11:53:50PM +0200, Peter Waechtler wrote:
> some comments as asked for:
>
> I know that it's nowhere stated, but POSIX mqueues are perfectly
> designed to be implemented in userspace with locking facilities
> provided by the system.

I am not sure if this is correct. You can achieve proper locking in
userspace, but I do not think you achieve proper security.

I assume you are proposing an implementation based on shared memory:
which means that at least some pages of the shared memory must be
writable. If the processes cooperate and only write to the shared pages
through library routines which use synchronization, things are ok, but a
malicious process could forge messages or perform DoS attacks etc. by
bypassing the mq_*() functions and using write().

> With PROCESS_SHARED mutexes and condvars in NGPT we have that - and I
> am in the process in converting the mmap() based implementation of
> Richard Stevens in UNPv2 onto Linux.
>
> The messages are stored in shmem and the library routines access the
> structures with proper locking. I am not very happy about the fact,
> that with futexes the whole cooperating system get stuck when 1
> process crashes inside a critical region (yes, then your system is
> screwed anyway). BUT the messages are not copied between user- and
> kernelspace like they are in SysV msgsnd.
>
> POSIX mqueues have "kernel persistence", i.e. they live until
> mq_unlink() is called. They do not vanish with the creator on exit().
> Without rlimits you can easily consume all available kernel memory
> (DoS) by creating a mqueue and filling it with garbage.

The mq_maxmsg and mq_msgsize members of the mq_attr structure required
if O_CREAT is passed to mq_open() ensure that an implementation can
prevent the kernel memory DoS you mention: a malicious application can
only fill up the MQ memory.

> When implemented in kernel space, you have to create a thread with the
> brand new sys_clone_startup (or whatever name it gets) as notification
> (SIGEV_THREAD) - which is SCOPE_SYSTEM, no control about this and not
> always what is desired.

2002-08-31 11:42:34

by Peter Waechtler

Subject: Re: [PATCH] POSIX message queues

On Friday, 30 August 2002, at 11:48, Amos Waterland wrote:

> On Thu, Aug 29, 2002 at 11:53:50PM +0200, Peter Waechtler wrote:
>> some comments as asked for:
>>
>> I know that it's nowhere stated, but POSIX mqueues are perfectly
>> designed to be implemented in userspace with locking facilities
>> provided by the system.
>
> I am not sure if this is correct. You can achieve proper locking in
> userspace, but I do not think you achieve proper security.

Well, I can't think of efficient inter-process locks without
kernel/scheduler help.
Do you want to use a spinlock with priority lowering or even sleeping, use
a pipe/fifo/flock, or wait in sigsuspend()? All of this is implemented in
kernel space.
How would you implement locking entirely in userspace?
With futexes (fast "userspace" locks) only the uncontended case is
handled in userspace - if there's contention, the process waits inside
the kernel or gets a notification from the kernel (AWAIT or FD).

> I assume you are proposing an implementation based on shared memory:
> which means that at least some pages of the shared memory must be
> writable. If the processes cooperate and only write to the shared pages
> through library routines which use synchronization, things are ok, but a
> malicious process could forge messages or perform DoS attacks etc. by
> bypassing the mq_*() functions and using write().

Yes, of course that could be compromised by a process with the same uid.
Such a process could simply kill the other process too.
shm_open() applies proper file system permissions to the object.

> The mq_maxmsg and mq_msgsize members of the mq_attr structure required
> if O_CREAT is passed to mq_open() ensure that an implementation can
> prevent the kernel memory DoS you mention: a malicious application can
> only fill up the MQ memory.

And how many mqueues am I allowed to create?
You would need an extra resource limit for that.

2002-08-31 12:49:35

by Krzysztof Benedyczak

Subject: Re: [PATCH] POSIX message queues

Hello,

On Thu, 29 Aug 2002 Peter Waechtler wrote:

> I know that it's nowhere stated, but POSIX mqueues are perfectly
> designed to be
> implemented in userspace with locking facilities provided by the system.
> ...
> with proper locking. I am not very happy about the fact, that with
> futexes the whole
> cooperating system get stuck when 1 process crashes inside a critical
> region
> (yes, then your system is screwed anyway).
> BUT the messages are not copied between user- and kernelspace like they
> are
> in SysV msgsnd.
Is copying between user and kernel space so bad? As you pointed
out, there are problems with a userspace-only implementation.

> POSIX mqueues have "kernel persistence", i.e. they live until
> mq_unlink() is called.
> They do not vanish with the creator on exit().
Yes. But I don't see what is wrong with our system. Our queues _don't_
vanish when the creator exits. (Our addition to exit() (and fork()) is
there to keep track of processes that have a mqueue open, so that
mq_unlink() can postpone deleting the queue until no one has it open.)

> Without rlimits you can easily consume all available kernel memory (DoS)
> by creating
> a mqueue and filling it with garbage.
I reply to this in my answer to your next post :)

>
> When implemented in kernel space, you have to create a thread with the
> brand new
> sys_clone_startup (or whatever name it gets) as notification
> (SIGEV_THREAD) - which
> is SCOPE_SYSTEM, no control about this and not always what is desired.
I don't fully understand this. Can you explain it in more detail?

Thanks

Krzysiek Benedyczak

2002-08-31 13:10:08

by Krzysztof Benedyczak

Subject: Re: [PATCH] POSIX message queues

On Sat, 31 Aug 2002 Peter Waechtler wrote:
> > The mq_maxmsg and mq_msgsize members of the mq_attr structure required
> > if O_CREAT is passed to mq_open() ensure that an implementation can
> > prevent the kernel memory DoS you mention: a malicious application can
> > only fill up the MQ memory.
> And how many mqueues am I allowed to create?
> You would need an extra resource limit for that.

Some explanation about limits:
1) POSIX specifies the following limits:

When creating a new queue you can specify mq_maxmsg (max number of messages
in the queue) and mq_msgsize (max message size). The values of these
parameters are limited by MQ_MAXMSG and MQ_MSGSIZE. The defaults are 40 and
16384, but you can change them. The max number of queues (in the system) is
limited by MQ_MAX (default=64). Anyway, the problem is how to prevent a DoS
while keeping these constants at a sensible level. (40*16384*64 = 40 MB)

So we added
2) a non-POSIX limit:

MQ_MAXSYSSIZE, which limits the space used by all messages (NOT queues)
system-wide. It may not be POSIX, but I think it is useful. The default is
1 MB. It can be set to MQ_MAX*MQ_MAXMSG*MQ_MSGSIZE; then it has no effect
and you have only the POSIX limits, if you want that.

2002-08-31 13:24:21

by Krzysztof Benedyczak

Subject: Re: [PATCH] POSIX message queues

Hello,

On Tue, 27 Aug 2002, Christoph Hellwig wrote:
> The multiplexer syscall is horribly ugly. I'd suggest implementing it
> as filesystems so each message queue object can be represented as file,
> using defined file methods as much as possible.

It seems clever. In fact a previous version used a file representation in
a very simple and rather undesirable way, so we abandoned it. But we
can try to change it.

BTW two questions: who is the IPC maintainer? Is there any chance of
getting mqueues incorporated into the kernel?

Krzysztof Benedyczak

2002-09-01 05:20:03

by Amos Waterland

Subject: Re: [PATCH] POSIX message queues

2002-09-01 06:48:12

by Amos Waterland

Subject: Re: [PATCH] POSIX message queues

On Sat, Aug 31, 2002 at 01:43:56PM +0200, Peter Waechtler wrote:
> On Friday, 30 August 2002, at 11:48, Amos Waterland wrote:
>
> > On Thu, Aug 29, 2002 at 11:53:50PM +0200, Peter Waechtler wrote:
> >> some comments as asked for:
> >>
> >> I know that it's nowhere stated, but POSIX mqueues are perfectly
> >> designed to be implemented in userspace with locking facilities
> >> provided by the system.
> >
> > I am not sure if this is correct. You can achieve proper locking in
> > userspace, but I do not think you achieve proper security.
>
> Well, I can't think of efficient inter-process locks without
> kernel/scheduler help.
> Do you want to use a spinlock with priority lowering or even sleeping, use
> a pipe/fifo/flock, or wait in sigsuspend()? All of this is implemented in
> kernel space.
> How would you implement locking entirely in userspace?
> With futexes (fast "userspace" locks) only the uncontended case is
> handled in userspace - if there's contention, the process waits inside
> the kernel or gets a notification from the kernel (AWAIT or FD).

I think we are agreeing: POSIX mqueues are not perfectly designed to be
implemented in userspace.

> > I assume you are proposing an implementation based on shared memory:
> > which means that at least some pages of the shared memory must be
> > writable. If the processes cooperate and only write to the shared pages
> > through library routines which use synchronization, things are ok, but a
> > malicious process could forge messages or perform DoS attacks etc. by
> > bypassing the mq_*() functions and using write().
>
> yes, of course that could be compromised by a process with the same uid.
> This process could simply kill the other process too.
> The shm_open() employs proper file system permission on the object.

No, it is more complicated than that. They can be compromised by an
arbitrary process if the permissions on the mq include S_IWOTH.

That is the fundamental problem with a userspace shared memory
implementation: write permissions on a message queue should grant
mq_send(), but write permissions on shared memory grant a lot more than
just that.

Amos Waterland

2002-09-01 07:20:06

by Jakub Jelinek

Subject: Re: [PATCH] POSIX message queues

On Sat, Aug 31, 2002 at 03:28:43PM +0200, Krzysztof Benedyczak wrote:
> Hello,
>
> On Tue, 27 Aug 2002, Christoph Hellwig wrote:
> > The multiplexer syscall is horribly ugly. I'd suggest implementing it
> > as filesystems so each message queue object can be represented as file,
> > using defined file methods as much as possible.
>
> It seems clever. In fact previous version used file representation in
> very simple and rather undesirable way so we resigned from it. But we
> can try to change it.

I wrote MQ as a filesystem about 2 years ago, see:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0011.2/0639.html
(though I have not managed to update it since then).
Dunno whether it is easier to update the patch or write it from scratch, though...

Jakub

2002-09-04 11:04:41

by Ingo Molnar

Subject: Re: [PATCH] POSIX message queues

On Sun, 1 Sep 2002, Amos Waterland wrote:

> That is the fundamental problem with a userspace shared memory
> implementation: write permissions on a message queue should grant
> mq_send(), but write permissions on shared memory grant a lot more than
> just that.

is it really a problem? As long as the read and write queues are separated
per sender, all that can happen is that a sender is allowed to read his
own messages - that is not an exciting capability.

Ingo

2002-09-04 15:58:39

by Manfred Spraul

Subject: Re: [PATCH] POSIX message queues

Ingo wrote:
> On Sun, 1 Sep 2002, Amos Waterland wrote:
>
>> That is the fundamental problem with a userspace shared memory
>> implementation: write permissions on a message queue should grant
>> mq_send(), but write permissions on shared memory grant a lot more than
>> just that.
>
> is it really a problem? As long as the read and write queues are separated
> per sender, all that can happen is that a sender is allowed to read his
> own messages - that is not an exciting capability.
>
Messages with the same priority are ordered - separate per-sender queues
would break SUS (the Single UNIX Specification).

--
Manfred

2002-09-06 20:53:43

by Pavel Machek

Subject: Re: [PATCH] POSIX message queues

Hi!

> > That is the fundamental problem with a userspace shared memory
> > implementation: write permissions on a message queue should grant
> > mq_send(), but write permissions on shared memory grant a lot more than
> > just that.
>
> is it really a problem? As long as the read and write queues are separated
> per sender, all that can happen is that a sender is allowed to read his
> own messages - that is not an exciting capability.

Imagine something that writes data into the queue, then erases the data and
drops its setuid privileges.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2002-09-06 22:49:37

by Amos Waterland

Subject: Re: [PATCH] POSIX message queues

On Wed, Sep 04, 2002 at 01:13:28PM +0200, Ingo Molnar wrote:
>
> On Sun, 1 Sep 2002, Amos Waterland wrote:
>
> > That is the fundamental problem with a userspace shared memory
> > implementation: write permissions on a message queue should grant
> > mq_send(), but write permissions on shared memory grant a lot more than
> > just that.
>
> is it really a problem? As long as the read and write queues are separated
> per sender, all that can happen is that a sender is allowed to read his
> own messages - that is not an exciting capability.

Ingo:

I can see no way to keep the queues separated per sender if userspace
shared memory is used. The data structures used to keep track of the
messages must be writable by both the senders and receivers, because of
the kernel persistent nature of message queues: messages must stay in
the queue during the interval of arbitrary length between the sender
calling mq_send() and the receiver calling mq_receive().

That is, suppose process X posts message M, and then exits. Process Y
wants to receive message M, which means it must acquire a process-shared
lock and then remove M from the queue: so Y must be able to update the
data structures representing the queue. I see no way to allow Y to
update the data structures in shared memory without giving Y write
permission to the pages. If Y has write permission to the pages, it can
spoof messages/wreck the queue, etc. If you see a way around this,
please correct me. Thanks.

Ulrich/Jakub:

Is the above related to glibc's position that mq's will not go in until
there is kernel support? Thanks.

Amos Waterland

2002-09-07 15:04:48

by Peter Waechtler

Subject: Re: [PATCH] POSIX message queues

On Friday, 6 September 2002, at 12:04, Pavel Machek wrote:

> Hi!
>
>>> That is the fundamental problem with a userspace shared memory
>>> implementation: write permissions on a message queue should grant
>>> mq_send(), but write permissions on shared memory grant a lot more
>>> than
>>> just that.
>>
>> is it really a problem? As long as the read and write queues are
>> separated
>> per sender, all that can happen is that a sender is allowed to read his
>> own messages - that is not an exciting capability.
>
> Imagine something that writes data into the queue, then erases the data and
> drops its setuid privileges.
>
Well, I can imagine that - but what do you mean by it?
Do you mean replacing the data with shellcode, or manipulating the length
field to provoke buffer overflows?

2002-09-07 15:05:46

by Peter Waechtler

Subject: Re: [PATCH] POSIX message queues

On Saturday, 31 August 2002, at 14:53, Krzysztof Benedyczak wrote:

> On Thu, 29 Aug 2002 Peter Waechtler wrote:
>>
>> When implemented in kernel space, you have to create a thread with the
>> brand new
>> sys_clone_startup (or whatever name it gets) as notification
>> (SIGEV_THREAD) - which
>> is SCOPE_SYSTEM, no control about this and not always what is desired.
> I don't fully understand it. Can you explain it in more details?
>
Yes, sounds weird.

The app requests SIGEV_THREAD notification for when a new message arrives.
It stores the thread's function pointer in the structure that is passed
into the kernel.

If you do not provide some sort of demultiplexer in userspace, the kernel
has to create the thread. But unlike with fork(), the thread is started
asynchronously - no code in userspace is there to notice it. As a result,
the thread scheduler in userspace does not know about this thread.

If you want to create a "userspace thread", scheduled by NGPT's scheduler,
NGPT has to provide support for this. For this you would need a
registry, so that when the event is triggered (and a signal with siginfo_t
is sent to the thread group) a new thread could be spawned by the NGPT
scheduler itself.

2002-09-07 15:04:47

by Peter Waechtler

Subject: Re: [PATCH] POSIX message queues

On Saturday, 7 September 2002, at 00:48, Amos Waterland wrote:

> On Wed, Sep 04, 2002 at 01:13:28PM +0200, Ingo Molnar wrote:
>>
>> On Sun, 1 Sep 2002, Amos Waterland wrote:
>>
>>> That is the fundamental problem with a userspace shared memory
>>> implementation: write permissions on a message queue should grant
>>> mq_send(), but write permissions on shared memory grant a lot more
>>> than
>>> just that.
>>
>> is it really a problem? As long as the read and write queues are
>> separated
>> per sender, all that can happen is that a sender is allowed to read his
>> own messages - that is not an exciting capability.
>
> Ingo:
>
> I can see no way to keep the queues separated per sender if userspace
> shared memory is used. The data structures used to keep track of the
> messages must be writable by both the senders and receivers, because of
> the kernel persistent nature of message queues: messages must stay in
> the queue during the interval of arbitrary length between the sender
> calling mq_send() and the receiver calling mq_receive().

It would be really hard to fetch the oldest, highest-priority message
out of several "per sender" queues. This effort is not worthwhile.
OTOH I can't see a _big_ problem when a process with sufficient
permissions can trash the message queues - otherwise I wonder why file
permissions are granted "per user" and not "per process".
The app's processes are designed to cooperate and trust each other for
this. The "application" would not work correctly anyway if a message is
not sent.

>
> Ulrich/Jakub:
>
> Is the above related to glibc's position that mq's will not go in until
> there is kernel support? Thanks.

I know this is no real argument, but: on IRIX 6.5 the POSIX mqueues are
implemented with shared mem and flocks in libc.
I have an almost working userspace implementation.

But I have another argument: speed and robustness.
I did some measurements with small 32-byte messages and with 4 KiB
messages, with the kernel implementation and with a userspace one:

userspace mqueue (NGPT PROCESS_SHARED MUTEX/COND)
==================================================
peewee:~/src/mqtest:>time ./uq_receive -c 99999 -q mmm >/dev/null
real 0m21.325s
user 0m9.140s
sys 0m8.340s
peewee:~/src/mqtest:>time ./uq_send -b 4092 -c 99999 mmm
real 0m21.039s
user 0m11.260s
sys 0m8.760s

kernel mqueue
==============
peewee:~/src/mqtest:>time ./mq_receive -c 99999 -q mmm >/dev/null
real 0m11.172s
user 0m0.260s
sys 0m7.130s
peewee:~/src/mqtest:>time ./mq_send -b 4092 -c 99999 mmm
real 0m10.880s
user 0m1.160s
sys 0m7.540s

The kernel one is about 2 times faster, regardless of message size!
I don't know yet where the user time is spent; I think it could be the
"slow" implementation of mutexes/condvars in NGPT.
I will retry the tests with futex-2.0 locks.

Then there are problems with the locks needed to protect the mq headers.
What happens when a signal arrives that causes the process to exit?
When I protect the calls with sigprocmask() there is an even higher
overhead in the number of syscalls - if I don't protect them, I get
dangling locks...

I therefore vote for implementing mqueues in the kernel and sharing most
of the code between SysV and POSIX mqueues.

2002-09-07 15:10:39

by Ingo Molnar

Subject: Re: [PATCH] POSIX message queues

On Sat, 7 Sep 2002 Peter Waechtler wrote:

> OTOH I can't see a _big_ problem when a process with sufficient
> permissions can trash the message queues - otherwise I wonder why file
> permissions are granted "per user" and not "per process".

yes - furthermore, processes from the same user can 'trash' queues anyway,
via ptrace() or mmapping /proc.

Ingo

2002-09-08 21:56:32

by Amos Waterland

Subject: Re: [PATCH] POSIX message queues

On Sat, Sep 07, 2002 at 05:17:35PM +0200, Ingo Molnar wrote:
>
> On Sat, 7 Sep 2002 Peter Waechtler wrote:
>
> > OTOH I can't see a _big_ problem when a process with sufficient
> > permissions can trash the message queues - otherwise I wonder why file
> > permissions are granted "per user" and not "per process".
>
> yes - furthermore, processes from the same user can 'trash' queues anyway,
> via ptrace() or mmaping /proc.

That is correct, but it is not the issue. The issue is that
completely unrelated processes can spoof/destroy each other's messages.

If a queue is set up with mq_open(name, O_CREAT|O_RDWR, S_IWOTH, &attr),
the process which set it up expects that "others" (processes not owned
by the user or by users in his/her group) will be able to send, and only
send, messages. If shared memory is used, "others" must be able to
update the data structures representing the queue, so they will be able
to do a lot more than just send.

The fundamental problem is that filesystem permissions do not map
cleanly to message queue permissions. Does this make sense? Thanks.

Amos Waterland