Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
From: Tim Chen
Date: Tue, 29 Aug 2017 09:13:29 -0700
To: Linus Torvalds, "Liang, Kan"
Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar, Andi Kleen, Andrew Morton, Johannes Weiner, Jan Kara, Christopher Lameter, "Eric W . Biederman", Davidlohr Bueso, linux-mm, Linux Kernel Mailing List

On 08/29/2017 09:01 AM, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 5:57 AM, Liang, Kan wrote:
>>>
>>> Attached is an ALMOST COMPLETELY UNTESTED forward-port of those two
>>> patches, now without that nasty WQ_FLAG_ARRIVALS logic, because we now
>>> always put the new entries at the end of the waitqueue.
>>
>> The patches fix the long wait issue.
>>
>> Tested-by: Kan Liang
>
> Ok. I'm not 100% comfortable applying them at rc7, so let me think
> about it.
> There's only one known load triggering this, and by "known"
> I mean "not really known" since we don't even know what the heck it
> does outside of intel and whoever your customer is.
>
> So I suspect I'll apply the patches next merge window, and we can
> maybe mark them for stable if this actually ends up mattering.
>
> Can you tell if the problem is actually hitting _production_ use or
> was some kind of benchmark stress-test?

It is not affecting production use; it is affecting the customer's
acceptance tests for their systems. So I suspect it is a stress test.

Tim