Subject: Re: [RFC PATCH V2] sched: Improve scalability of select_idle_sibling using SMT balance
From: Subhra Mazumdar
Date: Wed, 10 Jan 2018 18:09:56 -0800
To: Steven Sistare
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, mingo@redhat.com, dhaval.giani@oracle.com

On 01/09/2018 06:50 AM, Steven Sistare wrote:
> On 1/8/2018 5:18 PM, Peter Zijlstra wrote:
>> On Mon, Jan 08, 2018 at 02:12:37PM -0800, subhra mazumdar wrote:
>>> @@ -2751,6 +2763,31 @@ context_switch(struct rq *rq, struct task_struct *prev,
>>>  	       struct task_struct *next, struct rq_flags *rf)
>>>  {
>>>  	struct mm_struct *mm, *oldmm;
>>> +	int this_cpu = rq->cpu;
>>> +	struct sched_domain *sd;
>>> +	int prev_busy, next_busy;
>>> +
>>> +	if (rq->curr_util == UTIL_UNINITIALIZED)
>>> +		prev_busy = 0;
>>> +	else
>>> +		prev_busy = (prev != rq->idle);
>>> +	next_busy = (next != rq->idle);
>>> +
>>> +	/*
>>> +	 * From sd_llc downward update the SMT utilization.
>>> +	 * Skip the lowest level 0.
>>> +	 */
>>> +	sd = rcu_dereference_sched(per_cpu(sd_llc, this_cpu));
>>> +	if (next_busy != prev_busy) {
>>> +		for_each_lower_domain(sd) {
>>> +			if (sd->level == 0)
>>> +				break;
>>> +			sd_context_switch(sd, rq, next_busy - prev_busy);
>>> +		}
>>> +	}
>>> +
>>
>> No, we're not going to be adding atomic ops here. We've been arguing
>> over adding a single memory barrier to this path; atomics are just not
>> going to happen.
>>
>> Also this is entirely the wrong way to do this; we already have code
>> paths that _know_ if they're going into or coming out of idle.
>
> Yes, it would be more efficient to adjust the busy-cpu count of each level
> of the hierarchy in pick_next_task_idle and put_prev_task_idle.

OK, I have moved it to pick_next_task_idle/put_prev_task_idle. Will send
out the v3.

Thanks,
Subhra

>
> - Steve