From: Vincent Guittot
Date: Fri, 18 Jan 2019 15:06:28 +0100
Subject: Re: Crash in list_add_leaf_cfs_rq due to bad tmp_alone_branch
To: Sargun Dhillon
Cc: LKML, Ingo Molnar, Peter Zijlstra, Tejun Heo, Peter Zijlstra, Gabriel Hartmann, Gabriel Hartmann
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 18 Jan 2019 at 11:16, Vincent Guittot wrote:
>
> On Wed, 9 Jan 2019 at 23:43, Sargun Dhillon wrote:
> >
> > On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon wrote:
> > >
> > > I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
> > > infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
> > > and put it on top of 4.19.13. In addition to this, I uninlined
> > > list_add_leaf_cfs_rq for debugging.

With the fix above applied, the code that manages the leaf_cfs_rq_list
has been the same since v4.9. Have you noticed a similar problem on
older kernel versions between v4.9 and v4.19?

The problem might have been introduced while modifying other parts of
the scheduler, such as the sequence for adding/removing cgroups.
Knowing the most recent kernel version without the problem could help
to narrow it down.

Thanks,
Vincent

> > >
> > > This revealed a new bug that we didn't get to because we kept getting
> > > crashes from the previous issue. When we are running with cgroups that
> > > are rapidly changing, with CFS bandwidth control, and in addition
> > > using the cpusets cgroup, we see this crash.
> > > Specifically, it seems to occur with cgroups that are throttled and we
> > > change the allowed cpuset.
>
> Thanks for the context, I will try to reproduce the problem and
> understand how we can stop in the middle of walking to the
> sched_entity branch with a parent not already added.
>
> How many cgroup levels have you got in your setup?
>
> > >
> >
> > This patch from Gabriel should fix the problem:
> >
> >
> > [PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete
> >
> > When a child cfs_rq is added to the leaf cfs_rq list before its parent,
> > tmp_alone_branch is set to point to the child in preparation for the
> > parent being added.
> >
> > If the child is deleted before the parent is added then tmp_alone_branch
> > points to a freed cfs_rq. Any future reference to tmp_alone_branch will
> > result in a use after free.
>
> So, the patch below is a temporary fix that helps to recover from the
> situation where tmp_alone_branch doesn't end up back at
> rq->leaf_cfs_rq_list.
> But this situation should not happen in the first place.
>
> >
> > Signed-off-by: Gabriel Hartmann
> > Reported-by: Sargun Dhillon
> > ---
> >  kernel/sched/fair.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7137bc343b4a..0987629cbb76 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> >  static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> >  {
> >  	if (cfs_rq->on_list) {
> > +		struct rq *rq = rq_of(cfs_rq);
> > +
> > +		if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
> > +			rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
> > +
> >  		list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
> >  		cfs_rq->on_list = 0;
> >  	}
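
For illustration only, the scenario in the patch description can be modelled in userspace roughly as below. This is a sketch under assumptions, not the kernel code: the structs are stripped down to the fields named in the diff, RCU and locking are left out, and add_child_before_parent(), del_leaf_cfs_rq(), its reset_cursor flag and cursor_dangles_after_delete() are hypothetical names made up for the example. Nothing is actually freed here, so the dangling pointer is only detected, never dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (loosely modelled on list_head). */
struct list_head { struct list_head *next, *prev; };

struct rq {
	struct list_head leaf_cfs_rq_list;   /* head of the leaf list */
	struct list_head *tmp_alone_branch;  /* cursor for pending branch inserts */
};

struct cfs_rq {
	struct list_head leaf_cfs_rq_list;   /* this cfs_rq's node in the list */
	struct rq *rq;
	bool on_list;
};

static void rq_init(struct rq *rq)
{
	rq->leaf_cfs_rq_list.next = &rq->leaf_cfs_rq_list;
	rq->leaf_cfs_rq_list.prev = &rq->leaf_cfs_rq_list;
	rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
}

/* Insert node right after head. */
static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

/*
 * Assumed simplification of the "parent not yet on the list" case: the
 * child is linked at the cursor and the cursor is left pointing at the
 * child, waiting for the parent to be added.
 */
static void add_child_before_parent(struct cfs_rq *child)
{
	struct rq *rq = child->rq;

	list_add(&child->leaf_cfs_rq_list, rq->tmp_alone_branch);
	rq->tmp_alone_branch = &child->leaf_cfs_rq_list;
	child->on_list = true;
}

/* Deletion with the reset from the patch made optional for comparison. */
static void del_leaf_cfs_rq(struct cfs_rq *cfs_rq, bool reset_cursor)
{
	if (!cfs_rq->on_list)
		return;

	struct rq *rq = cfs_rq->rq;

	if (reset_cursor && rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;

	list_del(&cfs_rq->leaf_cfs_rq_list);
	cfs_rq->on_list = false;
}

/* Returns 1 if the cursor still points at the deleted child, else 0. */
int cursor_dangles_after_delete(bool reset_cursor)
{
	struct rq rq;
	struct cfs_rq child = { .rq = &rq, .on_list = false };

	rq_init(&rq);
	add_child_before_parent(&child);
	assert(child.on_list && rq.tmp_alone_branch == &child.leaf_cfs_rq_list);

	del_leaf_cfs_rq(&child, reset_cursor);
	return rq.tmp_alone_branch == &child.leaf_cfs_rq_list;
}
```

In this model, deleting the child without the reset leaves tmp_alone_branch pointing at the removed node, which in the kernel would be freed memory; with the reset the cursor falls back to the list head. It says nothing about why the parent never gets added, which is the open question in this thread.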