Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751281AbdFESdx (ORCPT ); Mon, 5 Jun 2017 14:33:53 -0400
Received: from smtp.codeaurora.org ([198.145.29.96]:40130 "EHLO
        smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751173AbdFESdw (ORCPT );
        Mon, 5 Jun 2017 14:33:52 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org B814B607BC
Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org
Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=jhugo@codeaurora.org
Subject: Re: [PATCH V4 1/2] sched/fair: Fix load_balance() affinity redo path
From: Jeffrey Hugo
To: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: Dietmar Eggemann, Austin Christ, Tyler Baicar, Timur Tabi
References: <1496442432-330-1-git-send-email-jhugo@codeaurora.org>
        <1496442432-330-2-git-send-email-jhugo@codeaurora.org>
        <783f8711-0f90-9480-2b76-b1ffcf4c34a8@codeaurora.org>
Message-ID:
Date: Mon, 5 Jun 2017 12:33:48 -0600
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0
MIME-Version: 1.0
In-Reply-To: <783f8711-0f90-9480-2b76-b1ffcf4c34a8@codeaurora.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2299
Lines: 58

On 6/5/2017 11:23 AM, Jeffrey Hugo wrote:
> On 6/2/2017 4:27 PM, Jeffrey Hugo wrote:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index d711093..84255ab 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6737,10 +6737,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>           * our sched_group. We may want to revisit it if we couldn't
>>           * meet load balance goals by pulling other tasks on src_cpu.
>>           *
>> -         * Also avoid computing new_dst_cpu if we have already computed
>> -         * one in current iteration.
>> +         * Avoid computing new_dst_cpu for NEWLY_IDLE or if we have
>> +         * already computed one in current iteration.
>>           */
>> -        if (!env->dst_grpmask || (env->flags & LBF_DST_PINNED))
>> +        if (env->idle == CPU_NEWLY_IDLE || (env->flags & LBF_DST_PINNED))
>>                  return 0;
>
> Self NACK.  This breaks active_load_balance_cpu_stop().  Looks like
> env->idle == CPU_IDLE, but env->dst_grpmask is uninitialized, so it can
> be NULL, which causes a null pointer dereference a few lines later.
>
> I'm still having a look to see what makes sense to address the issue.
>
>

As far as I can see, there appear to be two options to resolve the issue:

1. Update active_load_balance_cpu_stop() to initialize dst_grpmask to a
   sane value.
2. Undo the proposed changes in load_balance() to "ensure" dst_grpmask is
   valid, and calculate the value on demand when checking to see if the
   redo path needs to be done.

The downside to #1 is that dst_grpmask is not needed in the
active_load_balance_cpu_stop() path, and the loop to calculate a new
dst_cpu will be used.  Extra code is evaluated, but there appear to be no
side effects.

The downside to #2 is that dst_grpmask is valid the majority of the time
in load_balance(), so calculating it on demand is redundant most of the
time, but again there appear to be no side effects.

It somewhat feels like a choice of which option is less bad.
Peter/Dietmar, any preferences?

-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
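
For illustration only, here is a minimal userspace sketch of the failure described in the self NACK and of how option 1 would avoid it. It is not kernel code: every type, constant, and function name below (lb_env_model, guard_before_patch(), guard_after_patch(), would_deref_null(), the MODEL_* values) is a hypothetical stand-in, and the "dereference" is only modelled as a boolean check rather than performed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
typedef unsigned long cpumask_model;    /* bit n set => CPU n is in the mask */

enum idle_model { MODEL_CPU_IDLE, MODEL_CPU_NOT_IDLE, MODEL_CPU_NEWLY_IDLE };

#define MODEL_LBF_DST_PINNED 0x04u

struct lb_env_model {
	enum idle_model idle;
	unsigned int flags;
	const cpumask_model *dst_grpmask;   /* NULL if the caller never set it */
};

typedef int (*guard_fn)(const struct lb_env_model *env);

/* Models the guard before the quoted hunk: a NULL dst_grpmask returns early. */
int guard_before_patch(const struct lb_env_model *env)
{
	assert(env != NULL);
	return env->dst_grpmask == NULL ||
	       (env->flags & MODEL_LBF_DST_PINNED) != 0;
}

/* Models the guard after the quoted hunk: dst_grpmask is no longer checked. */
int guard_after_patch(const struct lb_env_model *env)
{
	assert(env != NULL);
	return env->idle == MODEL_CPU_NEWLY_IDLE ||
	       (env->flags & MODEL_LBF_DST_PINNED) != 0;
}

/*
 * Returns 1 when the guard lets execution continue into the code that
 * walks dst_grpmask while the pointer is still NULL, i.e. the case that
 * would fault.
 */
int would_deref_null(const struct lb_env_model *env, guard_fn guard)
{
	return !guard(env) && env->dst_grpmask == NULL;
}
```

With an environment shaped like the one described for active_load_balance_cpu_stop() (idle set to the CPU_IDLE stand-in, no flags, dst_grpmask left NULL), would_deref_null() reports the hazard only for guard_after_patch(). Pointing dst_grpmask at any valid mask, which is what option 1 amounts to in this model, clears it.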