Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752801AbdHRUFO (ORCPT ); Fri, 18 Aug 2017 16:05:14 -0400
Received: from mga01.intel.com ([192.55.52.88]:26698 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752772AbdHRUFK (ORCPT ); Fri, 18 Aug 2017 16:05:10 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.41,393,1498546800"; d="scan'208";a="120513789"
Date: Fri, 18 Aug 2017 13:05:10 -0700
From: Andi Kleen
To: Linus Torvalds
Cc: "Liang, Kan" , Mel Gorman , Mel Gorman , "Kirill A. Shutemov" ,
	Tim Chen , Peter Zijlstra , Ingo Molnar , Andrew Morton ,
	Johannes Weiner , Jan Kara , linux-mm ,
	Linux Kernel Mailing List
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
Message-ID: <20170818200510.GQ28715@tassilo.jf.intel.com>
References: <37D7C6CF3E00A74B8858931C1DB2F07753786CE9@SHSMSX103.ccr.corp.intel.com>
	<37D7C6CF3E00A74B8858931C1DB2F0775378761B@SHSMSX103.ccr.corp.intel.com>
	<20170818122339.24grcbzyhnzmr4qw@techsingularity.net>
	<37D7C6CF3E00A74B8858931C1DB2F077537879BB@SHSMSX103.ccr.corp.intel.com>
	<20170818144622.oabozle26hasg5yo@techsingularity.net>
	<37D7C6CF3E00A74B8858931C1DB2F07753787AE4@SHSMSX103.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1354
Lines: 35

> I was really hoping that we'd root-cause this and have a solution (and
> then apply Tim's patch as a "belt and suspenders" kind of thing), but

One thing I wanted to point out is that Tim's patch seems to make
several schedule-intensive micro benchmarks faster. I think what's
happening is that it allows more parallelism during wakeup:

Normally it's like

CPU 1                      CPU 2               CPU 3 .....
LOCK
wake up tasks
on other CPUs              woken up            woken up
UNLOCK                     SPIN on waitq lock  SPIN on waitq lock
                           LOCK
                           remove waitq
                           UNLOCK
                                               LOCK
                                               remove waitq
                                               UNLOCK

So everything is serialized. But with the bookmark patch the other
CPUs can go through the "remove waitq" sequence earlier, because they
get a chance at the lock and can do it in parallel with the main
wakeup.

Tim used a 64 task threshold for the bookmark. That may actually be
too large; it may even be faster to use a shorter one.

So I think it's more than a band-aid: it's likely a useful performance
improvement even for less extreme wait queues.

-Andi