Date: Tue, 1 Aug 2017 11:12:14 +0200
From: Peter Zijlstra
To: Yafang Shao
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: fix NULL pointer issue in pick_next_entity()
Message-ID: <20170801091213.mcygpbrf3c5c5qf5@hirez.programming.kicks-ass.net>
References: <1501581716-8608-1-git-send-email-laoar.shao@gmail.com> <20170801083812.GH6524@worktop.programming.kicks-ass.net>

On Tue, Aug 01, 2017 at 04:57:43PM +0800, Yafang Shao wrote:
> > And how would that happen? We only call pick_next_entity(.curr=NULL)
> > when we _know_ cfs_rq->nr_running.
>
> It crashed my machine when I did hadoop test, and after I made this change
> it works now.
> On SMP system, cfs_rq->nr_running isn't protected well, although we _know_
> cfs_rq->nr_running,
> but it is modified by other thread running on other CPU and the
> sched_entity is set NULL as well.
> Then this thread broken here as accessed the NULL pointer here.

cfs_rq->nr_running should be protected by the rq->lock. If it is not,
something else is buggered.
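
The locking rule stated in the reply can be illustrated with a minimal
user-space sketch. This is not kernel code: a pthread mutex stands in for
rq->lock, and struct toy_rq, struct entity, toy_rq_init(), toy_enqueue() and
toy_pick_next() are hypothetical names invented for this example. The point
it tries to show is only that when the counter and the queued entities are
changed together under one lock, a picker holding that lock can rely on
"nr_running > 0 implies a non-NULL entity".

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_RQ_CAP 8

/* Hypothetical stand-ins, not the kernel's sched_entity or cfs_rq. */
struct entity {
	int id;
};

struct toy_rq {
	pthread_mutex_t lock;              /* plays the role of rq->lock */
	unsigned int nr_running;           /* plays the role of cfs_rq->nr_running */
	struct entity *queue[TOY_RQ_CAP];  /* toy run queue, used as a stack */
};

static void toy_rq_init(struct toy_rq *rq)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->nr_running = 0;
}

/* The counter and the queue slot are updated together, under the lock. */
static int toy_enqueue(struct toy_rq *rq, struct entity *se)
{
	int ok = 0;

	pthread_mutex_lock(&rq->lock);
	if (se != NULL && rq->nr_running < TOY_RQ_CAP) {
		rq->queue[rq->nr_running++] = se;
		ok = 1;
	}
	pthread_mutex_unlock(&rq->lock);
	return ok;
}

/*
 * With the lock held, no other thread can change nr_running or the queue,
 * so a non-zero count guarantees a valid entity. Returns NULL only when
 * the queue is empty.
 */
static struct entity *toy_pick_next(struct toy_rq *rq)
{
	struct entity *se = NULL;

	pthread_mutex_lock(&rq->lock);
	if (rq->nr_running > 0) {
		se = rq->queue[--rq->nr_running];
		assert(se != NULL);  /* the invariant the lock is meant to preserve */
	}
	pthread_mutex_unlock(&rq->lock);
	return se;
}
```

If some path modified nr_running or the queue without taking the lock, the
assert above could fire. In this toy model that corresponds to the
"something else is buggered" case: the fix would belong in the unlocked
writer, not in a NULL check at the pick site.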