Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp1303192ybk; Thu, 21 May 2020 03:40:44 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxkwhJK/mdOwvDUmk+VHQ52qJ34O5dwHYgzbqNyTYE2mL2EP4FnrWhxtUjlG5bMJj0xtE2V X-Received: by 2002:a50:be8f:: with SMTP id b15mr6990357edk.182.1590057643755; Thu, 21 May 2020 03:40:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590057643; cv=none; d=google.com; s=arc-20160816; b=CprEh30dGsZBCva3AKlKxxxvZmdQNwhk/ygtSdWHgY+oir5gtnUo2iM+XEysRREEgh ohycSpBs0YXDIzuAD6v2BnvhwEcvPoTkKrUDH9KjFrGxPX1tKw5Th28Xh2j1Ukrwjcx/ HJ1LjxEuwjPDMwOib1iKN9cs3lwpum7VcRh8RhXS8vE4pTNdVshalRCLUyL7hZOIdwgI ZelkGw90rep64/HVRPy9oUelXTfOGzk4lmpDA9/8iVBQtxTB5uMghZUlwx029/5RU6IU 0x9qe3I8aa15ygM//XWZw/zWS2TlUdAhbXj/Yd0lRdGloFwlDkI/AlGhDO6eUywdNaRo 9wLw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=TfvAGowDk+lnh+ayXjgCyGn2pv9xvE8s8AhW5TgZbIA=; b=uCRrhk2952LmlLCARX1zxEsLEsQxoYXDuZIWZlxw1FDLTmN5Ftuhqzl00dt9vId22X ozWlXHmVtmRxluIv4sgaaTk0ofrG855T4xPM0N9GNs0RNPMQOomyu03saqtl6c1jDIvD RaObRWBMSKPrw3WoQQNf6FR8hKQB0hQlV2bx1LoEvxU3MNF2IHksc2TT8pw3fvG3zXMT Umi3n8jJZp/EER3PWIPSEzUephfSK1GtrAt9MMbnoEjzPOMyIVXALrp6AGqoVooxFC2f auATi3bbwNttIG+PYPskHCpQmi8Fc++8r+pRkYH4Jj0x1Vi2l5wn7CbnGiC+AskmqCCE vLEA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id qx7si3139114ejb.526.2020.05.21.03.40.19; Thu, 21 May 2020 03:40:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728391AbgEUKiZ (ORCPT + 99 others); Thu, 21 May 2020 06:38:25 -0400 Received: from outbound-smtp31.blacknight.com ([81.17.249.62]:48138 "EHLO outbound-smtp31.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726871AbgEUKiY (ORCPT ); Thu, 21 May 2020 06:38:24 -0400 Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp31.blacknight.com (Postfix) with ESMTPS id 5045DC0BCB for ; Thu, 21 May 2020 11:38:22 +0100 (IST) Received: (qmail 19210 invoked from network); 21 May 2020 10:38:22 -0000 Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.18.57]) by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 21 May 2020 10:38:21 -0000 Date: Thu, 21 May 2020 11:38:16 +0100 From: Mel Gorman To: Peter Zijlstra Cc: Jirka Hladky , Phil Auld , Ingo Molnar , Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Valentin Schneider , Hillf Danton , LKML , Douglas Shakshober , Waiman Long , Joe Mario , Bill Gray , riel@surriel.com Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6 Message-ID: <20200521103816.GA7167@techsingularity.net> References: <20200507155422.GD3758@techsingularity.net> <20200508092212.GE3758@techsingularity.net> <20200513153023.GF3758@techsingularity.net> <20200514153122.GE2978@hirez.programming.kicks-ass.net> <20200515084740.GJ3758@techsingularity.net> <20200515111732.GS2957@hirez.programming.kicks-ass.net> <20200515142444.GK3001@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20200515142444.GK3001@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 15, 2020 at 04:24:44PM +0200, Peter Zijlstra wrote: > On Fri, May 15, 2020 at 01:17:32PM +0200, Peter Zijlstra wrote: > > On Fri, May 15, 2020 at 09:47:40AM +0100, Mel Gorman wrote: > > > > > However, the wakeups are so rapid that the wakeup > > > happens while the server is descheduling. That forces the waker to spin > > > on smp_cond_load_acquire for longer. In this case, it can be cheaper to > > > add the task to the rq->wake_list even if that potentially requires an IPI. > > > > Right, I think Rik ran into that as well at some point. He wanted to > > make ->on_cpu do a hand-off, but simply queueing the wakeup on the prev > > cpu (which is currently in the middle of schedule()) should be an easier > > proposition. > > > > Maybe something like this untested thing... could explode most mighty, > > didn't thing too hard. > > > > Mel pointed out that that patch got mutilated somewhere (my own .Sent > copy was fine), let me try again. > Sorry for the slow response. My primary work machine suffered a catastrophic failure on Sunday night which is a fantastic way to start a Monday morning so I'm playing catchup. IIUC, this patch front-loads as much work as possible before checking if the task is on_rq and then the waker/wakee shares a cache, queue task on the wake_list and otherwise do a direct wakeup. The advantage is that spinning is avoided on p->on_rq when p does not share a cache. The disadvantage is that it may result in tasks being stacked but this should only happen when the domain is overloaded and select_task_eq() is unlikely to find an idle CPU. The load balancer would soon correct the situation anyway. In terms of netperf for my testing, the benefit is marginal because the wakeups are primarily between tasks that share cache. It does trigger as perf indicates that some time is spent in ttwu_queue_remote with this patch, it's just that the overall time spent spinning on p->on_rq is very similar. I'm still waiting on other workloads to complete to see what the impact is. However, intuitively at least, it makes sense to avoid spinning on p->on_rq when it's unnecessary and the other changes appear to be safe. Even if wake_list should be used in some cases for local wakeups, it would make sense to put that on top of this patch. Do you want to slap a changelog around this and update the comments or do you want me to do it? I should have more results in a few hours even if they are limited to one machine but ideally Rik would test his workload too. -- Mel Gorman SUSE Labs