Date: Wed, 3 Dec 2014 14:40:49 -0600
From: Alex Thorlton
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Peter Zijlstra, Fabian Frederick, Ingo Molnar,
    Alex Thorlton, Russ Anderson, linux-kernel@vger.kernel.org
Subject: [BUG] Possible locking issues in stop_machine code on 6k core machine
Message-ID: <20141203204048.GJ4720@sgi.com>

Hey guys,

While working to get our newly upgraded 6k core machine online, we've
discovered a few possible locking issues in the stop_machine code that
we're trying to get sorted out.

We think the problems we're seeing stem from an interaction between
stop_cpus and stop_one_cpu.  The issue presents as a deadlock, and it
only shows itself intermittently.  After quite a bit of debugging, we've
narrowed the issue down to the fact that stop_one_cpu does not respect
many of the locks that are taken on the stop_cpus code path.  For
reference, the stop_cpus code path takes the stop_cpus_mutex, then
stop_cpus_lock, and then takes each cpu's stopper->lock.  stop_one_cpu
relies solely on the stopper->lock.

What appears to be happening to cause our deadlock is this: stop_cpus
works its way down to queue_stop_cpus_work, which tells each cpu's
stopper task to wake up, take its lock, and do its work.  As the loop
that queues this work progresses, the lowest-numbered cpus complete
their work and are allowed to go on about their business.  The problem
occurs when one of these lower-numbered cpus calls stop_one_cpu,
targeting a higher-numbered cpu that the stop_cpus loop has not yet
reached.  If this happens, that higher-numbered cpu's completion
variable gets stomped on, and the wait_for_completion in the stop_cpus
code path never returns.

A quick example: CPU 0 calls stop_cpus, which will hit all 6,000 cores.
CPU 50 completes its stopper work and, at some point in the near future,
calls stop_one_cpu on CPU 5000.  This clobbers CPU 5000's pointer to the
cpu_stop_done struct set up in queue_stop_cpus_work, meaning that, once
CPU 5000 completes its work, it will decrement nr_todo on the wrong
cpu_stop_done struct, and CPU 0's wait_for_completion will never return.

Again, much of this is semi-educated guesswork, put together from
examining lots of debug output in an attempt to spot the problem.  We're
fairly certain that we've pinned down our issue, but we'd like to ask
those who are more knowledgeable about these code paths to weigh in with
their opinions here.  We'd really appreciate any help that anyone can
offer.

Thanks!

- Alex
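
P.S.  In case it helps to see the failure mode we're describing in
miniature, here's a toy userspace model of it.  To be clear, this is
*not* the real kernel/stop_machine.c code: the structs and helpers below
are stripped-down stand-ins (no locking, no stopper threads, no struct
completion), and NR_CPUS, stopper_runs, done_all, and done_one are made
up purely for the illustration.  It just shows how, if each cpu has a
single pending work slot, a later stop_one_cpu can stomp on the 'done'
pointer that stop_cpus is still waiting on:

/*
 * Toy model of the suspected race -- NOT actual kernel code.
 * Everything here is simplified so the pointer clobbering is
 * easy to see.
 */
#include <stdio.h>

#define NR_CPUS 8	/* stand-in for the 6k cores */

struct cpu_stop_done {
	int nr_todo;	/* cpus that still owe this waiter a completion */
};

struct cpu_stop_work {
	struct cpu_stop_done *done;	/* waiter to credit when work runs */
};

/* one pending work slot per cpu, modeled after queue_stop_cpus_work */
static struct cpu_stop_work work[NR_CPUS];

/* stop_cpus path: aim every cpu's pending work at one shared 'done' */
static void queue_stop_cpus_work(struct cpu_stop_done *done)
{
	done->nr_todo = NR_CPUS;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		work[cpu].done = done;
}

/* stop_one_cpu path: aim one cpu's pending work at a private 'done' */
static void stop_one_cpu(int cpu, struct cpu_stop_done *done)
{
	done->nr_todo = 1;
	work[cpu].done = done;	/* stomps a still-pending pointer! */
}

/* the stopper task finally runs on 'cpu' and credits whoever is there */
static void stopper_runs(int cpu)
{
	work[cpu].done->nr_todo--;
}

int main(void)
{
	struct cpu_stop_done done_all = { 0 }, done_one = { 0 };

	queue_stop_cpus_work(&done_all);	/* "CPU 0" calls stop_cpus */

	/* the low-numbered cpus get to their work first... */
	for (int cpu = 0; cpu < 4; cpu++)
		stopper_runs(cpu);

	/* ...and one of them targets a cpu that hasn't run yet */
	stop_one_cpu(5, &done_one);

	/* the remaining stoppers run, but cpu 5 now credits done_one */
	for (int cpu = 4; cpu < NR_CPUS; cpu++)
		stopper_runs(cpu);

	/* done_all never reaches 0: wait_for_completion would hang */
	printf("done_all.nr_todo = %d\n", done_all.nr_todo);	/* 1 */
	printf("done_one.nr_todo = %d\n", done_one.nr_todo);	/* 0 */
	return 0;
}

Run it and done_all.nr_todo lands at 1 while done_one.nr_todo hits 0,
which is the analogue of what we're seeing: the stop_one_cpu caller
returns happily while CPU 0's wait_for_completion hangs forever.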