Date: Wed, 11 Feb 2015 11:59:06 -0800
Subject: Re: smp_call_function_single lockups
From: Linus Torvalds
To: Rafael David Tinoco
Cc: LKML, Thomas Gleixner, Jens Axboe
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 11, 2015 at 10:18 AM, Linus Torvalds wrote:
>
> I'll think about this all, but we couldn't figure anything out last
> time we looked at it, so without more clues, don't hold your breath.

So having looked at it once more, one thing struck me:

Look at smp_call_function_single_async(). The comment says

     * Like smp_call_function_single(), but the call is asynchonous and
     * can thus be done from contexts with disabled interrupts.

but that is *only* true if we don't have to wait for the csd lock.

The comments even clarify that:

     * The caller passes his own pre-allocated data structure
     * (ie: embedded in an object) and is responsible for synchronizing it
     * such that the IPIs performed on the @csd are strictly serialized.

but it's not at all clear that the caller *can* do that. Since the
"csd_unlock()" is done *after* the call to the callback function, any
serialization done by the caller is fundamentally not trustworthy,
since it cannot serialize with the csd lock - if it releases things in
the callback, the csd lock will still be set after releasing things.

So the caller has a really hard time guaranteeing that CSD_LOCK isn't set.
And if the call is done in interrupt context, for all we know it is
interrupting the code that is going to clear CSD_LOCK, so CSD_LOCK
will never be cleared at all, and csd_lock() will wait forever.

So I actually think that for the async case, we really *should* unlock
before doing the callback (which is what Thomas' old patch did).

And we might well be better off doing something like

    WARN_ON_ONCE(csd->flags & CSD_LOCK);

in smp_call_function_single_async(), because that really is a hard
requirement.

And it strikes me that hrtick_csd is one of these cases that do this
with interrupts disabled, and use the callback for serialization. So I
really wonder if this is part of the problem..

Thomas? Am I missing something?

                     Linus
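
For illustration only, here is a user-space sketch of the ordering issue
described above. It is not kernel code: "struct csd", the value of
CSD_LOCK, "struct owner" and every function name below are made up for
the example, and there is no real IPI or concurrency involved. It only
shows what a callback that releases the caller's own serialization
observes under the two orderings, plus a non-spinning check standing in
for the suggested WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>

#define CSD_LOCK 0x01u	/* illustrative value, not taken from the kernel */

/* Simplified stand-in for the kernel's call_single_data. */
struct csd {
	unsigned int flags;
	void (*func)(void *info);
	void *info;
};

/* Hypothetical caller object: embeds its csd, serializes via the callback. */
struct owner {
	struct csd csd;
	bool in_flight;			/* caller's own "csd is busy" marker */
	bool lock_set_at_release;	/* was CSD_LOCK still set on release? */
};

static void owner_callback(void *info)
{
	struct owner *o = info;

	o->lock_set_at_release = (o->csd.flags & CSD_LOCK) != 0;
	o->in_flight = false;	/* caller now believes the csd is reusable */
}

/* Ordering described in the mail: run the callback, then clear the lock. */
static void run_unlock_after(struct csd *csd)
{
	csd->func(csd->info);
	csd->flags &= ~CSD_LOCK;
}

/* Proposed async ordering: copy what is needed, unlock, then call. */
static void run_unlock_before(struct csd *csd)
{
	void (*func)(void *) = csd->func;
	void *info = csd->info;

	csd->flags &= ~CSD_LOCK;
	func(info);
}

/* Stand-in for the WARN_ON_ONCE() idea: refuse a locked csd, never spin. */
static bool submit_async(struct csd *csd)
{
	if (csd->flags & CSD_LOCK)
		return false;
	csd->flags |= CSD_LOCK;
	return true;
}

/* Returns true if the callback observed CSD_LOCK still set. */
static bool lock_visible_in_callback(void (*run)(struct csd *))
{
	struct owner o = { .in_flight = false };
	bool ok;

	o.csd.func = owner_callback;
	o.csd.info = &o;

	ok = submit_async(&o.csd);	/* csd is free, so this succeeds */
	assert(ok);
	(void)ok;
	o.in_flight = true;

	run(&o.csd);			/* the "target CPU" handles the request */
	assert(!o.in_flight && !(o.csd.flags & CSD_LOCK));
	return o.lock_set_at_release;
}
```

With these definitions, lock_visible_in_callback(run_unlock_after)
returns true and lock_visible_in_callback(run_unlock_before) returns
false. In the first ordering, anything that reacts to "in_flight" going
false can reach the csd while CSD_LOCK is still set, which is the window
described above. In the second, the lock is already clear by the time
the callback releases the caller's serialization.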