Date: Wed, 25 Apr 2018 17:36:00 +0200
From: Peter Zijlstra
To: Subhra Mazumdar
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com, daniel.lezcano@linaro.org, steven.sistare@oracle.com, dhaval.giani@oracle.com, rohit.k.jain@oracle.com
Subject: Re: [PATCH 3/3] sched: limit cpu search and rotate search window for scalability
Message-ID: <20180425153600.GA4043@hirez.programming.kicks-ass.net>
References: <20180424004116.28151-1-subhra.mazumdar@oracle.com> <20180424004116.28151-4-subhra.mazumdar@oracle.com> <20180424125349.GU4082@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 24, 2018 at 05:10:34PM -0700, Subhra Mazumdar wrote:
> On 04/24/2018 05:53 AM, Peter Zijlstra wrote:

> > Why do you need to put a max on? Why isn't the proportional thing
> > working as is? (is the average no good because of big variance or what)

> Firstly the choosing of 512 seems arbitrary.

It is; it is a crude attempt to deal with big variance. The comment says
as much.

> Secondly the logic here is that the enqueuing cpu should search up to
> time it can get work itself. Why is that the optimal amount to
> search?

1/512-th of the time in fact, per the above random number, but yes.
Because searching for longer than we're expecting to be idle for is
clearly bad, at that point we're inhibiting doing useful work.

But while thinking about all this, I think I've spotted a few more
issues, aside from the variance:

Firstly, while avg_idle estimates the average duration for _when_ we go
idle, it doesn't give a good measure when we do not in fact go idle.
So when we get wakeups while fully busy, avg_idle is a poor measure.

Secondly, the number of wakeups performed is also important. If we have
a lot of wakeups, we need to look at aggregate wakeup time over a
period. Not just single wakeup time.

And thirdly, we're sharing the idle duration with newidle balance.

And I think the 512 is a result of me not having recognised these
additional issues when looking at the traces, I saw variance and left it
there.

This leaves me thinking we need a better estimator for wakeups. Because
if there really is significant idle time, not looking for idle CPUs to
run on is bad. Placing that upper limit, especially such a low one, is
just an indication of failure.

I'll see if I can come up with something.
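
For readers following along, here is a rough user-space sketch of the
proportional limit discussed above: the number of CPUs scanned grows
with the expected idle time (divided by 512) relative to the per-CPU
scan cost. It is not the kernel code. The names scan_limit(),
aggregate_scan_ns(), IDLE_DIVISOR and MIN_SCAN, the floor of 4 and the
clamp to the domain size are all assumptions made for illustration. The
second helper only illustrates the "aggregate wakeup time over a
period" point; it is not a proposed estimator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; 512 is the divisor discussed in this thread. */
#define IDLE_DIVISOR 512u
#define MIN_SCAN     4u   /* assumed floor on the number of CPUs scanned */

/*
 * Sketch: how many CPUs to scan for an idle one, proportional to the
 * expected idle time and inversely proportional to the per-CPU scan cost.
 * span_weight is the number of CPUs in the domain being searched.
 */
static unsigned int scan_limit(uint64_t avg_idle_ns,
                               uint64_t avg_scan_cost_ns,
                               unsigned int span_weight)
{
    assert(span_weight > 0);

    uint64_t budget = avg_idle_ns / IDLE_DIVISOR; /* 1/512th of avg idle */
    uint64_t cost = avg_scan_cost_ns + 1;         /* avoid dividing by 0 */
    uint64_t span_avg = (uint64_t)span_weight * budget;
    uint64_t nr = MIN_SCAN;

    if (span_avg > MIN_SCAN * cost)
        nr = span_avg / cost;

    /* Assumption of this sketch: never scan more CPUs than exist. */
    if (nr > span_weight)
        nr = span_weight;

    return (unsigned int)nr;
}

/*
 * Sketch: total time spent scanning across all wakeups in a period.
 * A per-wakeup limit can look cheap while the aggregate is not.
 */
static uint64_t aggregate_scan_ns(uint64_t wakeups, unsigned int nr,
                                  uint64_t avg_scan_cost_ns)
{
    return wakeups * (uint64_t)nr * avg_scan_cost_ns;
}
```

With a 44-CPU domain and a 500ns scan cost, an avg_idle of 100us gives
a limit of 17 CPUs per wakeup in this sketch. At 1000 wakeups in a
period that adds up to 8.5ms of scanning, which is the kind of
aggregate a single-wakeup measure does not show.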