Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
From: Tim Chen
Date: Tue, 15 Aug 2017 12:05:40 -0700
To: Linus Torvalds, Andi Kleen
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Andrew Morton, Johannes Weiner, Jan Kara, linux-mm, Linux Kernel Mailing List
Message-ID: <0b7b6132-a374-9636-53f9-c2e1dcec230f@linux.intel.com>

On 08/14/2017 08:28 PM, Linus Torvalds wrote:
> On Mon, Aug 14, 2017 at 8:15 PM, Andi Kleen wrote:
>> But what should we do when some other (non page) wait queue runs into the
>> same problem?
>
> Hopefully the same: root-cause it.
>
> Once you have a test-case, it should generally be fairly simple to do
> with profiles, just seeing who the caller is when ttwu() (or whatever
> it is that ends up being the most noticeable part of the wakeup chain)
> shows up very heavily.

We have a test case, but it is a customer workload. We'll try to get a
bit more info.

>
> And I think that ends up being true whether the "break up long chains"
> patch goes in or not.
> Even if we end up allowing interrupts in the
> middle, a long wait-queue is a problem.
>
> I think the "break up long chains" thing may be the right thing
> against actual malicious attacks, but not for any actual real
> benchmark or load.

This is a concern for our customer, because we could trigger the
watchdog timer by running user-space workloads.

>
> I don't think we normally have cases of long wait-queues, though. At
> least not the kinds that cause problems. The real (and valid)
> thundering herd cases should already be using exclusive waiters that
> only wake up one process at a time.
>
> The page bit-waiting is hopefully special. As mentioned, we used to
> have some _really_ special code for it for other reasons, and I
> suspect you see this problem with them because we over-simplified it
> from being a per-zone dynamically sized one (where the per-zone thing
> caused both performance problems and actual bugs) to being that
> "static small array".
>
> So I think/hope that just re-introducing some dynamic sizing will help
> sufficiently, and that this really is an odd and unusual case.

I agree that dynamic sizing makes a lot of sense. We'll check whether
increasing the size of the hash table helps, assuming that the waiters
in our test case are distributed among different pages.

Thanks.

Tim