On 09/02/2013 09:59 PM, Manfred Spraul wrote:
> Hi,
>
> [forgot to cc everyone, thus I'll summarize some mails...]
> On 09/02/2013 06:58 AM, Vineet Gupta wrote:
>> On 08/31/2013 11:20 PM, Linus Torvalds wrote:
>>> Vineet, actual patch for what Davidlohr suggests attached. Can you try it?
>>>
>>> Linus
>> Apologies for the late reply - I was away from my computer for a bit.
>>
>> Unfortunately, with a quick test, this patch doesn't help.
>> FWIW, this is latest mainline (.config attached).
>>
>> Let me know what diagnostics I can add to help with this.
> msgctl08 is a bulk message send/receive test. I had to look at it once
> before; that time the cause was broken hardware:
> https://lkml.org/lkml/2008/6/12/365
> Broken hardware can be ruled out here, because the test works with 3.10.
>
> msgctl08 uses pairs of threads: one thread does msgsnd(), the other one
> msgrcv().
> There is no synchronization, i.e. msgsnd() can race ahead until the
> kernel buffer is full, followed by a block of msgrcv() calls, or the two
> can run as pairs of alternating msgsnd()/msgrcv() operations.
> No special features are used: each pair of threads has its own message
> queue, all messages have type=1.
>
> Vineet ran strace - and just before the signal from killing msgctl08,
> there are only msgsnd()/msgrcv() calls.
> Vineet:
> a) could you run strace tomorrow again, with '-ttt' as an additional
> option? I don't see where exactly it hangs.
Yet to do this.
> b) Could you check that it is not just a performance regression?
> Does ./msgctl08 1000 16 hang, too?
Nope, that doesn't hang. The minimal configuration that hangs reliably is
./msgctl08 50000 2
With this config there are 3 processes.
...
555 554 root S 1208 0.4 0 0.0 ./msgctl08 50000 2
554 551 root S 1208 0.4 0 0.0 ./msgctl08 50000 2
551 496 root S 1208 0.4 0 0.0 ./msgctl08 50000 2
...
[ARCLinux]$ cat /proc/551/stack
[<80aec3c6>] do_wait+0xa02/0xc94
[<80aecad2>] SyS_wait4+0x52/0xa4
[<80ae24fc>] ret_from_system_call+0x0/0x4
[ARCLinux]$ cat /proc/555/stack
[<80c2950e>] SyS_msgrcv+0x252/0x420
[<80ae24fc>] ret_from_system_call+0x0/0x4
[ARCLinux]$ cat /proc/554/stack
[<80c28c82>] do_msgsnd+0x116/0x35c
[<80ae24fc>] ret_from_system_call+0x0/0x4
Is this a case of a lost wakeup or some such? I'm running with some more diagnostics
and will report soon ...
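
For readers following along, the sender/receiver pairing visible in the stacks above (one task in do_msgsnd, one in SyS_msgrcv) can be sketched roughly as below. This is only an illustration, not the LTP msgctl08 source: pair_send(), pair_recv(), struct pair_msg and PAYLOAD_MAX are hypothetical names, and the only properties taken from this thread are blocking msgsnd()/msgrcv() calls and type=1 messages.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define PAYLOAD_MAX 64          /* hypothetical payload limit for this sketch */

/* Message layout expected by msgsnd()/msgrcv(): a long type, then the data. */
struct pair_msg {
    long mtype;
    char mtext[PAYLOAD_MAX];
};

/* Sender side: blocks in msgsnd() while the queue has no room (flags == 0). */
int pair_send(int msqid, const char *text)
{
    struct pair_msg m;
    size_t len;

    assert(text != NULL);
    len = strlen(text) + 1;     /* include the terminating NUL */
    if (len > PAYLOAD_MAX)
        return -1;

    m.mtype = 1;                /* all messages use type=1 */
    memcpy(m.mtext, text, len);
    return msgsnd(msqid, &m, len, 0);
}

/* Receiver side: blocks in msgrcv() while no type=1 message is queued. */
ssize_t pair_recv(int msqid, char *out, size_t outsz)
{
    struct pair_msg m;
    ssize_t n;

    assert(out != NULL);
    n = msgrcv(msqid, &m, sizeof(m.mtext), 1, 0);
    if (n < 0 || (size_t)n > outsz)
        return -1;

    memcpy(out, m.mtext, (size_t)n);
    return n;                   /* number of payload bytes received */
}
```

With no synchronization between the two sides, each one relies entirely on the kernel to wake it when the queue state changes, which is why a lost wakeup would leave both tasks asleep as shown.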
-Vineet
On 09/03/2013 10:44 AM, Vineet Gupta wrote:
>> b) Could you check that it is not just a performance regression?
>> Does ./msgctl08 1000 16 hang, too?
> Nope, that doesn't hang. The minimal configuration that hangs reliably is
> ./msgctl08 50000 2
>
> With this config there are 3 processes.
> ...
> 555 554 root S 1208 0.4 0 0.0 ./msgctl08 50000 2
> 554 551 root S 1208 0.4 0 0.0 ./msgctl08 50000 2
> 551 496 root S 1208 0.4 0 0.0 ./msgctl08 50000 2
> ...
>
> [ARCLinux]$ cat /proc/551/stack
> [<80aec3c6>] do_wait+0xa02/0xc94
> [<80aecad2>] SyS_wait4+0x52/0xa4
> [<80ae24fc>] ret_from_system_call+0x0/0x4
>
> [ARCLinux]$ cat /proc/555/stack
> [<80c2950e>] SyS_msgrcv+0x252/0x420
> [<80ae24fc>] ret_from_system_call+0x0/0x4
>
> [ARCLinux]$ cat /proc/554/stack
> [<80c28c82>] do_msgsnd+0x116/0x35c
> [<80ae24fc>] ret_from_system_call+0x0/0x4
>
> Is this a case of a lost wakeup or some such? I'm running with some more diagnostics
> and will report soon ...
What is the output of ipcs -q? Is the queue full or empty when it hangs?
I.e. do we forget to wake up a receiver or forget to wake up a sender?
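
If it is easier to check from a small helper than from ipcs, the same counters can be read with msgctl(IPC_STAT). A minimal sketch, under assumptions: classify_queue(), probe_queue() and enum queue_state are made-up names, msg_cbytes is a Linux extension (hence _GNU_SOURCE), and the "no room" test is a simplification that only compares byte counts, so the kernel's actual check may differ.

```c
#define _GNU_SOURCE             /* exposes the non-POSIX msg_cbytes field */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical classification of a queue snapshot. */
enum queue_state { QUEUE_EMPTY, QUEUE_PARTIAL, QUEUE_NO_ROOM };

/*
 * Classify a snapshot, given the size of the next message a sender
 * would try to enqueue. Pure function: it does not touch the kernel.
 */
enum queue_state classify_queue(const struct msqid_ds *ds, size_t next_msgsz)
{
    assert(ds != NULL);

    if (ds->msg_qnum == 0)
        return QUEUE_EMPTY;     /* nothing for a receiver to take */

    /* Simplified: the next message would exceed the byte limit. */
    if ((unsigned long long)ds->msg_cbytes + next_msgsz >
        (unsigned long long)ds->msg_qbytes)
        return QUEUE_NO_ROOM;

    return QUEUE_PARTIAL;
}

/* Take a snapshot with IPC_STAT and classify it; returns -1 on error. */
int probe_queue(int msqid, size_t next_msgsz, enum queue_state *state)
{
    struct msqid_ds ds;

    if (msgctl(msqid, IPC_STAT, &ds) == -1)
        return -1;

    *state = classify_queue(&ds, next_msgsz);
    return 0;
}
```

If both tasks sleep while the result is QUEUE_EMPTY, the sender was presumably not woken; if it is QUEUE_NO_ROOM, the receiver was not.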
--
Manfred