From: Yafang Shao
Date: Tue, 1 Aug 2017 17:20:11 +0800
Subject: Re: [PATCH] sched: fix NULL pointer issue in pick_next_entity()
To: Peter Zijlstra
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org

Hi Peter,

2017-08-01 17:12 GMT+08:00 Peter Zijlstra:
> On Tue, Aug 01, 2017 at 04:57:43PM +0800, Yafang Shao wrote:
>> > And how would that happen? We only call pick_next_entity(.curr=NULL)
>> > when we _know_ cfs_rq->nr_running.
>>
>> It crashed my machine when I did hadoop test, and after I made this change
>> it works now.
>> On SMP system, cfs_rq->nr_running isn't protected well, although we _know_
>> cfs_rq->nr_running,
>> but it is modified by other thread running on other CPU and the
>> sched_entity is set NULL as well.
>> Then this thread broken here as accessed the NULL pointer here.
>
> cfs_rq->nr_running should be protected by the rq->lock. If it is not,
> something else is buggered.

Yes, I admit that something else is buggered, but unfortunately I don't
understand the scheduler deeply enough to find the root cause.

But from my understanding, there is clearly a bug here, because it can
happen that both 'se' and 'curr' are NULL.
That's why I submitted this patch to fix it.
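
To make the case concrete, here is a minimal userspace sketch of the
situation I mean. It is not the kernel code: struct entity, struct
runqueue, pick_next() and rq_consistent() are made-up names for
illustration only, and the model ignores locking entirely. It only
shows that when the queue is empty and 'curr' is NULL, the pick has
nothing to return, and that this state contradicts a non-zero
nr_running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct entity {
	unsigned long long vruntime;
};

struct runqueue {
	struct entity *leftmost;	/* first queued entity, NULL if none */
	unsigned int nr_running;	/* expected count of runnable entities */
};

/* True if a should run before b (wrap-safe vruntime comparison). */
static bool entity_before(const struct entity *a, const struct entity *b)
{
	return (long long)(a->vruntime - b->vruntime) < 0;
}

/*
 * Pick curr when the queue is empty or curr sorts before the leftmost
 * queued entity. Returns NULL when both candidates are NULL, so the
 * caller must check the result before dereferencing it.
 */
static struct entity *pick_next(struct runqueue *rq, struct entity *curr)
{
	struct entity *left = rq->leftmost;

	if (!left || (curr && entity_before(curr, left)))
		left = curr;

	return left;
}

/*
 * The invariant under discussion: a non-zero nr_running implies that
 * at least one of the two candidates exists.
 */
static bool rq_consistent(const struct runqueue *rq,
			  const struct entity *curr)
{
	return rq->nr_running == 0 || rq->leftmost != NULL || curr != NULL;
}
```

In this model, a NULL check on the result of pick_next() only avoids the
crash. If rq_consistent() is false, the accounting is already wrong
somewhere else, which is the point made above about rq->lock.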