From: Linus Torvalds
Date: Fri, 18 Aug 2017 13:29:53 -0700
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
To: Andi Kleen
Cc: "Liang, Kan", Mel Gorman, "Kirill A. Shutemov", Tim Chen, Peter Zijlstra, Ingo Molnar, Andrew Morton, Johannes Weiner, Jan Kara, linux-mm, Linux Kernel Mailing List
List-ID: linux-kernel@vger.kernel.org

On Fri, Aug 18, 2017 at 1:05 PM, Andi Kleen wrote:
>
> I think what's happening is that it allows more parallelism during wakeup:
>
> Normally it's like
>
>   CPU 1                         CPU 2               CPU 3 .....
>
>   LOCK
>   wake up tasks on other CPUs   woken up            woken up
>   UNLOCK                        SPIN on waitq lock  SPIN on waitq lock

Hmm. The processes that are woken up shouldn't need to touch the waitq lock after wakeup.

The default "autoremove_wake_function()" does the wait list removal, so if you just use the normal wait/wakeup, you're all done and don't need to do anything more.
That's very much by design. In fact, it's why "finish_wait()" uses that "list_empty_careful()" thing on the entry - exactly so that it only needs to take the wait queue lock if the entry is still on the wait list (ie it was woken up by something else).

Now, it *is* racy, in the sense that autoremove_wake_function() removes the entry *after* having successfully woken up the process, so with bad luck and a quick wakeup, the woken process may not see the good list_empty_careful() case. So we really *should* do the removal earlier, inside the pi_lock region in ttwu(). We don't have that kind of interface, though.

If you actually do see tasks getting stuck on the waitqueue lock after being woken up, it might be worth looking at, though.

The other possibility is that you were looking at cases that didn't use "autoremove_wake_function()" at all, of course. Maybe they are worth fixing.

The autoremoval really does make a difference, exactly because of the issue you point to.

Linus