Subject: Re: NULL pointer dereference in pick_next_task_fair
From: Valentin Schneider
To: Ram Muthiah, Quentin Perret
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, aaron.lwe@gmail.com, mingo@kernel.org, pauld@redhat.com, jdesfossez@digitalocean.com, naravamudan@digitalocean.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, juri.lelli@redhat.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, kernel-team@android.com, john.stultz@linaro.org
Date: Thu, 31 Oct 2019 23:15:57 +0100

On 30/10/2019 23:50, Ram Muthiah wrote:
> Quentin and I were able to create a setup which reproduces the issue.
>
> Given this, I tried Peter's proposed fix and was still able to reproduce the
> issue unfortunately. Current patch is located here -
> https://android-review.googlesource.com/c/kernel/common/+/1153487
>
> Our mitigation for this issue on the android-mainline branch has been to
> revert 67692435c411 ("sched: Rework pick_next_task() slow-path").
> https://android-review.googlesource.com/c/kernel/common/+/1152564
>

Still no cigar, but one thing to note: we got a similar splat on a Juno just
yesterday. Here it is, fed through decode_stacktrace.sh:

[ 22.930829] Mem abort info:
[ 22.935904]   ESR = 0x96000006
[ 22.938924]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 22.944178]   SET = 0, FnV = 0
[ 22.947197]   EA = 0, S1PTW = 0
[ 22.950300] Data abort info:
[ 22.953145]   ISV = 0, ISS = 0x00000006
[ 22.956937]   CM = 0, WnR = 0
[ 22.959870] user pgtable: 4k pages, 48-bit VAs, pgdp=00000009f3905000
[ 22.966243] [0000000000000040] pgd=00000009efa6e003, pud=00000009f41c3003, pmd=0000000000000000
[ 22.974858] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 22.980369] Modules linked in: tda998x drm_kms_helper drm crct10dif_ce ip_tables x_tables ipv6 nf_defrag_ipv6
[ 22.990200] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc2-00001-gaa57157be69f #1
[ 22.998036] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Oct 19 2018
[ 23.008710] pstate: 60000085 (nZCv daIf -PAN -UAO)
[ 23.013457] pc : set_next_entity (kernel/sched/fair.c:4156)
[ 23.017511] lr : pick_next_task_fair (kernel/sched/fair.c:6829 (discriminator 1))
[ 23.021991] sp : ffff800011a13e10
[ 23.025267] x29: ffff800011a13e10 x28: 0000000000000000
[ 23.030525] x27: ffff800011a13f10 x26: ffff800010c205ec
[ 23.035782] x25: ffff000975cea108 x24: ffff800011789000
[ 23.041038] x23: ffff8000113cf000 x22: ffff000975ce9b80
[ 23.046294] x21: ffff00097ef78d40 x20: ffff000974e21a00
[ 23.051550] x19: 0000000000000000 x18: 0000000000000000
[ 23.056806] x17: 0000000000000000 x16: 0000000000000000
[ 23.062062] x15: 0000000000000000 x14: 00000000000001ad
[ 23.067317] x13: 0000000000000001 x12: 071c71c71c71c71c
[ 23.072573] x11: 0000000000000500 x10: ffff00097ef77dc8
[ 23.077829] x9 : 0000000000000087 x8 : ffff00097ef77de8
[ 23.083085] x7 : 0000000000000003 x6 : 0000000000000000
[ 23.088340] x5 : 0000000000000000 x4 : 0000000000000000
[ 23.093596] x3 : 000000054f6e7800 x2 : 0000000000000000
[ 23.098851] x1 : 0000000000000000 x0 : ffff000974e21a00
[ 23.104107] Call trace:
[ 23.106527] set_next_entity (kernel/sched/fair.c:4156)
[ 23.110236] pick_next_task_fair (kernel/sched/fair.c:6829 (discriminator 1))
[ 23.114377] __schedule (kernel/sched/core.c:3920 kernel/sched/core.c:4039)
[ 23.117828] schedule_idle (kernel/sched/core.c:4165 (discriminator 1))
[ 23.121365] do_idle (./arch/arm64/include/asm/current.h:19 ./arch/arm64/include/asm/preempt.h:31 kernel/sched/idle.c:275)
[ 23.124557] cpu_startup_entry (kernel/sched/idle.c:355 (discriminator 1))
[ 23.128439] secondary_start_kernel (arch/arm64/kernel/smp.c:262)

The faulting line is the very first dereference of se in set_next_entity().
As Peter pointed out on IRC, this happens in the 'simple' path (prev is the
idle task).
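The "Mem abort info" / "Data abort info" block above is just ESR = 0x96000006 split into its architectural fields. A small decoder for those fields, following the ESR_ELx layout in the Arm Architecture Reference Manual, makes the reading explicit: EC 0x25 is a data abort taken at the current EL, and the data fault status code (DFSC) 0x06 is a translation fault at lookup level 2, which is consistent with the page-table dump showing valid pgd/pud entries but pmd=0000000000000000. The helper names below are hypothetical and only meant as a sketch; they are not the kernel's own accessors.

```c
#include <stdint.h>
#include <stdio.h>

/* ESR_ELx top-level fields. */
static unsigned int esr_ec(uint32_t esr)  { return (esr >> 26) & 0x3f; }  /* Exception Class */
static unsigned int esr_il(uint32_t esr)  { return (esr >> 25) & 0x1; }   /* 1 => 32-bit instruction */
static uint32_t     esr_iss(uint32_t esr) { return esr & 0x1ffffff; }     /* Instruction Specific Syndrome */

/* ISS fields for a data abort (EC 0x24 or 0x25). */
static unsigned int iss_isv(uint32_t iss)   { return (iss >> 24) & 0x1; }
static unsigned int iss_set(uint32_t iss)   { return (iss >> 11) & 0x3; }
static unsigned int iss_fnv(uint32_t iss)   { return (iss >> 10) & 0x1; }
static unsigned int iss_ea(uint32_t iss)    { return (iss >> 9) & 0x1; }
static unsigned int iss_cm(uint32_t iss)    { return (iss >> 8) & 0x1; }
static unsigned int iss_s1ptw(uint32_t iss) { return (iss >> 7) & 0x1; }
static unsigned int iss_wnr(uint32_t iss)   { return (iss >> 6) & 0x1; }
static unsigned int iss_dfsc(uint32_t iss)  { return iss & 0x3f; }

/* DFSC 0b0001LL is a translation fault at lookup level LL; returns -1 otherwise. */
static int dfsc_translation_level(unsigned int dfsc)
{
	return (dfsc & 0x3c) == 0x04 ? (int)(dfsc & 0x3) : -1;
}

/* Print the same fields the oops reports, plus the decoded fault level. */
static void print_abort(uint32_t esr)
{
	uint32_t iss = esr_iss(esr);

	printf("ESR = 0x%08x\n", (unsigned int)esr);
	printf("EC = 0x%02x, IL = %u bits\n", esr_ec(esr), esr_il(esr) ? 32u : 16u);
	printf("SET = %u, FnV = %u\n", iss_set(iss), iss_fnv(iss));
	printf("EA = %u, S1PTW = %u\n", iss_ea(iss), iss_s1ptw(iss));
	printf("ISV = %u, ISS = 0x%08x\n", iss_isv(iss), (unsigned int)iss);
	printf("CM = %u, WnR = %u\n", iss_cm(iss), iss_wnr(iss));
	printf("translation fault level = %d\n", dfsc_translation_level(iss_dfsc(iss)));
}
```

Calling print_abort(0x96000006) reproduces the values in the splat; WnR = 0 also tells us the faulting access was a read.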
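The faulting address in the page-table line, [0000000000000040], fits a NULL se: assuming that, as in v5.4, the first dereference in set_next_entity() is the se->on_rq check, a NULL se makes the CPU read address 0 + offsetof(struct sched_entity, on_rq). The sketch below mirrors only the leading fields of the v5.4 struct sched_entity in userspace to compute that offset; the *_mirror types are hypothetical stand-ins, and the layout assumes an LP64 target such as arm64 with no structure randomization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirrors of the kernel types, leading fields only. */
struct load_weight_mirror {
	unsigned long weight;
	unsigned int inv_weight;		/* padded to 16 bytes on LP64 */
};

struct rb_node_mirror {
	unsigned long parent_color;
	struct rb_node_mirror *rb_right;
	struct rb_node_mirror *rb_left;
};

struct list_head_mirror {
	struct list_head_mirror *next, *prev;
};

/* Field order assumed to match v5.4 struct sched_entity up to on_rq. */
struct sched_entity_mirror {
	struct load_weight_mirror load;		/* 0x00 */
	unsigned long runnable_weight;		/* 0x10 */
	struct rb_node_mirror run_node;		/* 0x18 */
	struct list_head_mirror group_node;	/* 0x30 */
	unsigned int on_rq;			/* 0x40 */
};

/* Address touched by reading se->on_rq when se == NULL. */
static uintptr_t null_se_on_rq_address(void)
{
	return (uintptr_t)offsetof(struct sched_entity_mirror, on_rq);
}
```

On LP64 this yields 0x40, matching the bracketed address in the oops, which supports reading the splat as set_next_entity() being handed se == NULL.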
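For the 'simple' path itself, as I read the v5.4 code it walks down the group hierarchy with a do/while loop of pick_next_entity(), set_next_entity() and group_cfs_rq(). One way se can end up NULL there is a group runqueue that its parent believes is runnable but whose tree has nothing to pick. The toy model below is loosely based on that loop to show the failure shape; toy_cfs_rq, toy_se, toy_pick() and toy_pick_hierarchy() are hypothetical, this is not kernel code, and it makes no claim about the actual root cause of this splat.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy types; NOT the kernel's struct cfs_rq / sched_entity. */
struct toy_cfs_rq;

struct toy_se {
	const char *name;
	struct toy_cfs_rq *my_q;	/* non-NULL when this entity is a group */
};

struct toy_cfs_rq {
	unsigned int nr_running;	/* how many entities the runqueue claims to hold */
	struct toy_se *leftmost;	/* stand-in for the leftmost tree node; NULL if empty */
};

/* Toy pick: like taking the leftmost node of an empty tree, this can yield NULL. */
static struct toy_se *toy_pick(struct toy_cfs_rq *cfs_rq)
{
	return cfs_rq->leftmost;
}

/*
 * Descend the hierarchy: pick an entity and, if it is a group, continue
 * into its own runqueue. Where the kernel would go on to dereference the
 * pick, this toy reports the empty pick and returns NULL instead.
 */
static struct toy_se *toy_pick_hierarchy(struct toy_cfs_rq *cfs_rq)
{
	struct toy_se *se;

	do {
		se = toy_pick(cfs_rq);
		if (!se) {
			fprintf(stderr, "empty pick from a cfs_rq claiming nr_running=%u\n",
				cfs_rq->nr_running);
			return NULL;
		}
		cfs_rq = se->my_q;
	} while (cfs_rq);

	return se;
}
```

With a consistent root -> group -> task hierarchy the walk returns the task; emptying the child runqueue while it still claims nr_running=1 makes the walk hit the empty pick, which is the point at which the real loop would pass NULL to set_next_entity().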