I posted yesterday about a problem in 2.4.0-test10 regarding *LONG*
stalls in 'ps' and 'vmstat'. After a conversation with Rik van Riel, it
seems that this may be caused by contention over the mmap_sem semaphore.
I have a question about the fairness of the semaphore implementation
that may be an explanation for the 'bug' that stops top and vmstat from
updating.
Assume some process, A, is constantly requiring some resource that's
protected by a semaphore, S. Assume also that the resource is not
available, and that A sleeps inside the kernel, waiting for the
resource, while holding S.
Assume also that some other process, B, is sleeping while trying to
acquire S.
Is it possible for the following to happen repeatedly, keeping B from
ever acquiring S?
1) Resource becomes available.
2) A is 'runnable' and is given an entire timeslice.
3) schedule() to A
4) A releases S
5) A returns to userspace
6) A uses much less than entire timeslice doing calculation
7) A needs some resource again
8) A enters kernel and acquires S
9) A sleeps on resource, rest of timeslice not used, A's 'goodness'
isn't messed up.
10) goto 1.
In this scenario, as long as A never uses its full timeslice, B will
never get to acquire S.
Specifically, A is some memory hogging program, B is 'ps'. S is the
mmap_sem and the 'resource' that A is constantly getting in trouble
about is memory (it enters the kernel via a page fault).
Can anyone explain why this wouldn't happen, and wouldn't cause infinite
starvation of B?
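To make the loop concrete, here is a toy, deterministic model of the
steps above. It is not kernel code: the names (simulate, wake_policy,
WAKE_ONLY, HANDOFF) are made up for illustration, and the model simply
assumes that A always gets back into down(S) before B is scheduled.
WAKE_ONLY stands for an up() that frees S and only marks B runnable;
HANDOFF stands for a hypothetical up() that gives S straight to the
sleeper.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this models the scenario, not the kernel. */
enum wake_policy {
    WAKE_ONLY,  /* up() frees S and merely marks the waiter runnable */
    HANDOFF     /* up() passes ownership of S straight to the waiter */
};

enum owner { NOBODY, PROC_A, PROC_B };

struct sim_result {
    int a_acquires;  /* times A took S */
    int b_acquires;  /* times B took S */
};

struct sim_result simulate(enum wake_policy policy, int rounds)
{
    struct sim_result res = { 1, 0 };  /* A starts out holding S, asleep */
    enum owner holder = PROC_A;
    bool b_waiting = true;             /* B is asleep in down(S) */

    for (int i = 0; i < rounds; i++) {
        /* Resource arrives, A is scheduled and calls up(S). */
        if (policy == HANDOFF && b_waiting) {
            holder = PROC_B;           /* S goes directly to the sleeper */
            b_waiting = false;
            res.b_acquires++;
        } else {
            holder = NOBODY;           /* B is runnable, but A keeps the CPU */
        }

        /* A computes briefly, faults again, and calls down(S). */
        if (holder == PROC_B) {
            /* A blocks; B runs, finishes, and its up(S) lets A proceed. */
            holder = NOBODY;
        }
        holder = PROC_A;
        res.a_acquires++;
        /* A now sleeps holding S. Under WAKE_ONLY, B finally runs here,
         * finds S taken, and goes back to sleep. */
    }
    return res;
}
```

Under these assumptions, simulate(WAKE_ONLY, 1000) leaves B with zero
acquisitions, while simulate(HANDOFF, 1000) lets B in on the first
release.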
David Mansfield
On 2 Nov 00 at 13:51, David Mansfield wrote:
> Is it possible for the following to happen repeatedly, keeping B from
> ever acquiring S?
>
> 1) Resource becomes available.
> 2) A is 'runnable' and is given an entire timeslice.
> 3) schedule() to A
> 4) A releases S
> 5) A returns to userspace
> 6) A uses much less than entire timeslice doing calculation
> 7) A needs some resource again
> 8) A enters kernel and acquires S
> 9) A sleeps on resource, rest of timeslice not used, A's 'goodness'
> isn't messed up.
> 10) goto 1.
>
> In this scenario, as long as A never uses its full timeslice, B will
> never get to acquire S.
Yes, it can happen. It certainly happens in ncpfs: since ncpfs uses a
ping-pong protocol, and I am too lazy to use anything other than a
semaphore, the connection to the server is guarded by a semaphore.
If one task does a long read/write, nobody else can perform any operation
on the mountpoint (these processes are listed in "D" state, as usual). As
soon as the copying task pauses (writing the data it has read to another
FS, or taking a pagefault), one of the other competing tasks starts... If
you start two copies in parallel, each task usually copies a whole file
without any progress by the other task. After one file is copied, the
other task starts copying...
For now I solved it by adding a second processor to the box ;-)
But if anybody has an easy (or nice) implementation of a fair semaphore,
or an idea how to fix it, I'd like to put it into ncpfs...
Probably creating a safe_semaphore by moving sem->count++ from up()
(when up() calls __up_wakeup) to __down_failed could work. But it is not
nice, as an implementation of this idea requires a spinlock in the
safe_up() fastpath...
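As a rough userspace illustration only, a fair semaphore could look like
the sketch below. Note that it is a ticket scheme built on pthreads, not
the sem->count change described above, and every name in it (fair_sem,
fair_down, fair_up) is hypothetical. It does show the same trade-off:
the release path has to take a lock, but a task that releases and
immediately re-acquires ends up queued behind anyone already sleeping.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace sketch; not the kernel's struct semaphore. */
struct fair_sem {
    pthread_mutex_t lock;
    pthread_cond_t  turn;
    unsigned long   next_ticket;  /* ticket the next fair_down() caller gets */
    unsigned long   now_serving;  /* ticket that currently owns the semaphore */
};

void fair_sem_init(struct fair_sem *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->turn, NULL);
    s->next_ticket = 0;
    s->now_serving = 0;
}

void fair_down(struct fair_sem *s)
{
    pthread_mutex_lock(&s->lock);
    /* Queue position is fixed on arrival, so late arrivals cannot jump ahead. */
    unsigned long my_ticket = s->next_ticket++;
    while (s->now_serving != my_ticket)
        pthread_cond_wait(&s->turn, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

void fair_up(struct fair_sem *s)
{
    pthread_mutex_lock(&s->lock);   /* the lock in the release path */
    s->now_serving++;               /* ownership moves to the next ticket */
    pthread_cond_broadcast(&s->turn);
    pthread_mutex_unlock(&s->lock);
}
```

With this, if B is already sleeping in fair_down() when A calls
fair_up(), A's next fair_down() draws a later ticket than B's and waits
until B has had its turn.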
Best regards,
Petr Vandrovec
On Thu, 2 Nov 2000, Petr Vandrovec wrote:
> Yes, it can happen. It certainly happens in ncpfs: since ncpfs uses a
> ping-pong protocol, and I am too lazy to use anything other than a
> semaphore, the connection to the server is guarded by a semaphore.
If you use a rw semaphore taken for writing by two writers, it will
alternate between the two because the second writer will bias the lock
(its next state is predetermined when the second writer goes to sleep).
This is also true for the mix of reader -> writer and writer -> reader.
-ben