2002-08-30 21:01:10

by Anton Lavrentiev

[permalink] [raw]
Subject: BUG:: IPC/semop clobbers PID of the last process performed semop() on the semaphore

Dear Linux Developers:

I reported this bug a while ago but seems that it is still there.

If "wait for zero" on the IPC semaphore is going to be blocked,
then it clobbers PID of the last process, which operated on
semaphore, actually reverting the PID to be the PID of the last but
one process that made an operation.

In particular, if the syscall was interrupted by a signal, restarted,
and interrupted again, the PID of last process operated on the semaphore
(obtainable via semctl(...GETPID...)) becomes unconditionally zero.

The bug comes from the following fragment of file (look at the lines
marked with exclamation points)

ipc/sem.c, try_atomic_semop():
---------------------------------------------------------------------------
if (!sem_op && curr->semval) /*!!!!!!*/
goto would_block;

curr->sempid = (curr->sempid << 16) | pid; /*!!!!!!*/

........

would_block:
if (sop->sem_flg & IPC_NOWAIT)
result = -EAGAIN;
else
result = 1;

undo:
while (sop >= sops) {
curr = sma->sem_base + sop->sem_num;
curr->semval -= sop->sem_op;
curr->sempid >>= 16; /*!!!!!!*/
---------------------------------------------------------------------------

The simplest fix (that works!) is just to swap the "wait for zero" condition
and PID backup, like this:

curr->sempid = (curr->sempid << 16) | pid;

if (!sem_op && curr->semval)
goto would_block;


Best regards,

Anton Lavrentiev
NCBI/NLM/NIH
Bethesda MD 20894


P.S. There is a potential bug in the very idea of temporary saving of PID in the
spare most significant word of "curr->sempid": if the user requests more that one
operation on the same semaphore in the same "sem_op" block, and the fact of blocking
is not figured out at the first operation, but later, than backup in 16 upper
bits will not work, because there will be two (or more) 16-bit rshifts in restore
operation, effectively zeroing "curr->sempid". On the other hand, this is a very
odd (and rather "theoretical") situation, cause normally the fact of blocking
is checked by "sem_op" first, prior to doing something else without further blocking.