Message-ID: <5224BCF6.2080401@colorfullife.com>
Date: Mon, 02 Sep 2013 18:29:42 +0200
From: Manfred Spraul
To: Vineet Gupta
CC: Linus Torvalds, Davidlohr Bueso, Sedat Dilek, linux-next, LKML, Stephen Rothwell, Andrew Morton, linux-mm, Andi Kleen, Rik van Riel, Jonathan Gonzalez
Subject: Re: ipc-msg broken again on 3.11-rc7?
References: <1372192414.1888.8.camel@buesod1.americas.hpqcorp.net> <1372202983.1888.22.camel@buesod1.americas.hpqcorp.net> <521DE5D7.4040305@synopsys.com> <52205597.3090609@synopsys.com>

Hi,

[forgot to cc everyone, thus I'll summarize some mails...]

On 09/02/2013 06:58 AM, Vineet Gupta wrote:
> On 08/31/2013 11:20 PM, Linus Torvalds wrote:
>> Vineet, actual patch for what Davidlohr suggests attached. Can you try it?
>>
>> Linus
> Apologies for late in getting back to this - I was away from my computer for a bit.
>
> Unfortunately, with a quick test, this patch doesn't help.
> FWIW, this is latest mainline (.config attached).
>
> Let me know what diagnostics I can add to help with this.

msgctl08 is a bulk message send/receive test. I had to look at it once before; back then the cause was broken hardware:
https://lkml.org/lkml/2008/6/12/365
That can be ruled out here, because the test works with 3.10.

msgctl08 uses pairs of threads: one thread does msgsnd(), the other one msgrcv(). There is no synchronization, i.e. the msgsnd() thread can race ahead until the kernel buffer is full, followed by a block of msgrcv() calls, or the pair can settle into alternating msgsnd()/msgrcv() operations.
No special features are used: each pair of threads has its own message queue, and all messages have type=1.

Vineet ran strace - and just before the signal from killing msgctl08, there are only msgsnd()/msgrcv() calls.

Vineet:
a) Could you run strace again tomorrow, with '-ttt' as an additional option? I don't see where exactly it hangs.
b) Could you check that it is not just a performance regression? Does ./msgctl08 1000 16 hang, too?

In ipc/msg.c, I haven't seen any obvious reason why it should hang.
The only race I spotted so far is this one:
>         for (;;) {
>                 struct msg_sender s;
>
>                 err = -EACCES;
>                 if (ipcperms(ns, &msq->q_perm, S_IWUGO))
>                         goto out_unlock1;
>
>                 err = security_msg_queue_msgsnd(msq, msg, msgflg);
>                 if (err)
>                         goto out_unlock1;
>
>                 if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
>                                 1 + msq->q_qnum <= msq->q_qbytes) {
>                         break;
>                 }
>
[snip]
>         if (!pipelined_send(msq, msg)) {
>                 /* no one is waiting for this message, enqueue it */
>                 list_add_tail(&msg->m_list, &msq->q_messages);
>                 msq->q_cbytes += msgsz;
>                 msq->q_qnum++;
>                 atomic_add(msgsz, &ns->msg_bytes);

The access to msq->q_cbytes is not protected. Thus two parallel msgsnd() calls could both succeed, even if the two messages together bring the queue length above the limit.
But that can't explain why 3.11-rc7 hangs: as explained above, msgctl08 uses one queue for each thread pair.

--
    Manfred
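
For illustration only, the q_cbytes race described above can be replayed deterministically in user space. This is a toy model under stated assumptions, not kernel code: struct toy_queue and the toy_* helpers are hypothetical names that mirror only the limit check and the accounting from the quoted snippet, and the "parallel" senders are simulated by running both checks before either update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the msg_queue fields involved; NOT the kernel struct. */
struct toy_queue {
    size_t q_cbytes;  /* bytes currently queued */
    size_t q_qbytes;  /* byte limit of the queue */
    size_t q_qnum;    /* messages currently queued */
};

/* Same shape as the limit check in the quoted for (;;) loop. */
static bool toy_has_room(const struct toy_queue *q, size_t msgsz)
{
    return msgsz + q->q_cbytes <= q->q_qbytes &&
           1 + q->q_qnum <= q->q_qbytes;
}

/* Same shape as the accounting done when a message is enqueued. */
static void toy_enqueue(struct toy_queue *q, size_t msgsz)
{
    q->q_cbytes += msgsz;
    q->q_qnum++;
}

/*
 * Replays one bad interleaving of two senders on the same queue:
 * both check the limit before either one updates the counters.
 * Returns the final q_cbytes.
 */
static size_t toy_race_replay(size_t limit, size_t msgsz)
{
    struct toy_queue q = { .q_cbytes = 0, .q_qbytes = limit, .q_qnum = 0 };

    assert(msgsz <= limit);               /* a single message must fit */

    bool a_ok = toy_has_room(&q, msgsz);  /* sender A checks */
    bool b_ok = toy_has_room(&q, msgsz);  /* sender B checks before A updates */

    if (a_ok)
        toy_enqueue(&q, msgsz);
    if (b_ok)
        toy_enqueue(&q, msgsz);

    return q.q_cbytes;
}
```

With a limit of 100 bytes and two 60-byte messages, both checks pass and the queue ends up holding 120 bytes. The model needs two senders on one queue to overshoot, which is the same reason the race cannot account for the msgctl08 hang.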