From: "Schmid, Carsten"
To: Peter Zijlstra
CC: "mingo@redhat.com", "linux-kernel@vger.kernel.org", "walken@google.com", "dave@stgolabs.net"
Subject: AW: Crash in fair scheduler
Date: Thu, 5 Dec 2019 10:56:13 +0000
In-Reply-To: <20191203140153.GP2844@hirez.programming.kicks-ass.net>

> From: Peter Zijlstra [mailto:peterz@infradead.org]
>
> Exactly.
>
> I suppose one approach is to add code to both __enqueue_entity() and
> __dequeue_entity() that compares ->rb_leftmost to the result of
> rb_first(). That'd incur some overhead but it'd double check the logic.

As this has happened only once and there is no reproducer, I would prefer
an approach that checks for exactly this case in the code path where it
crashed. Something like this (in pseudo-code):

simple:
    ....
    do {
        se = pick_next_entity(..)
        if (unlikely(!se)) { /* here we check for the issue */
            write warning and some useful data to dmesg
            if (cur_rq->rb_leftmost == NULL) { /* our case */
                set cur_rq->rb_leftmost to itself as mentioned in the discussion
                se = pick_next_entity(..) /* should now return a valid pointer */
            } else { /* another case happened, unknown */
                write warning to dmesg UNKNOWN
                panic() /* not known what to do here, would crash anyway. */
            }
        }
        set_next_entity(se, ..)
        cfs_rq = group_cfs_rq(...)
    } while (cfs_rq);

This will definitely not fix the cause of rb_leftmost being NULL, but we
can't tell at all where that happened, so we are digging in the dark.
Maybe the data written to dmesg will help to diagnose this further if the
issue happens again. It also will not affect performance much, which I
have to take care of as well.

Thanks for all your suggestions.
Carsten
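
For illustration only, the two ideas in this mail (comparing the cached leftmost pointer against the real first element, and warning plus repairing when the cache is unexpectedly NULL) can be modelled in plain userspace C. This is a minimal sketch and not kernel code: struct entity, struct toy_rq and all toy_* functions are hypothetical names made up for this example, and a sorted linked list stands in for the rbtree, so toy_first() only plays the role that rb_first() has in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a scheduling entity (not struct sched_entity). */
struct entity {
    unsigned long vruntime;
    struct entity *next;        /* sorted list, smallest vruntime first */
};

/* Hypothetical stand-in for a runqueue with a cached leftmost pointer. */
struct toy_rq {
    struct entity *head;        /* authoritative ordered structure */
    struct entity *leftmost;    /* cache, should always equal head */
    unsigned int nr_running;
};

/* Plays the role of rb_first(): ask the structure itself, not the cache. */
static struct entity *toy_first(const struct toy_rq *rq)
{
    return rq->head;
}

/* Insert in vruntime order and update the cache when the front changes. */
static void toy_enqueue(struct toy_rq *rq, struct entity *se)
{
    struct entity **link = &rq->head;

    assert(se != NULL);
    while (*link && (*link)->vruntime <= se->vruntime)
        link = &(*link)->next;
    se->next = *link;
    *link = se;
    if (link == &rq->head)
        rq->leftmost = se;
    rq->nr_running++;
}

/* The double check: returns 1 if the cache agrees with the structure. */
static int toy_check_leftmost(const struct toy_rq *rq)
{
    return rq->leftmost == toy_first(rq);
}

/*
 * The check in the pick path: if the cache is NULL although entities are
 * queued, log a warning, rebuild the cache from the structure and retry.
 * *repaired is set to 1 when that fallback was taken.
 */
static struct entity *toy_pick_checked(struct toy_rq *rq, int *repaired)
{
    struct entity *se = rq->leftmost;

    *repaired = 0;
    if (!se && rq->nr_running) {
        fprintf(stderr, "warning: leftmost is NULL, nr_running=%u\n",
                rq->nr_running);
        rq->leftmost = toy_first(rq);
        se = rq->leftmost;
        *repaired = 1;
    }
    return se;
}
```

As in the pseudo-code, the repair only hides the symptom. The warning is the useful part, because it records the state at the moment the inconsistency is noticed.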