From: "Schmid, Carsten"
To: Davidlohr Bueso, Peter Zijlstra
CC: mingo@redhat.com, linux-kernel@vger.kernel.org, walken@google.com
Subject: AW: Crash in fair scheduler
Date: Fri, 6 Dec 2019 10:11:25 +0000
Message-ID: <1575627084926.26450@mentor.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> From: Davidlohr Bueso [mailto:dave@stgolabs.net]
> Sent: Thursday, 5 December 2019 18:41
>
> Yeah I had never seen this either, and would expect the world to fall
> apart if leftmost is buggy (much less a one time occurrence), but the
> following certainly raises a red flag:
>
>    &cfs_rq->tasks_timeline->rb_leftmost
>      tasks_timeline = {
>        rb_root = {
>          rb_node = 0xffff99a9502e0d10
>        },
>        rb_leftmost = 0x0
>      },
>

Meanwhile I am digging a bit deeper into the kernel dump.
The rb_root points to a tree of two nodes. The root node:

crash> p -x *(struct rb_node *)0xffff99a9502e0d10
$7 = {
  __rb_parent_color = 0xffff99a9502e0d10,   <- points to self
  rb_right = 0xffff99a9502e0d10,            <- points to self
  rb_left = 0xffff99a9502e1990              <- and we have a node on the left
}

The rb_left node:

crash> p -x *(struct rb_node *)0xffff99a9502e1990
$6 = {
  __rb_parent_color = 0xffff99a9502e0d11,   <- points to the rb_root node (bit 0 is color)
  rb_right = 0x0,                           <- no child
  rb_left = 0x0                             <- no child
}

I'm currently trying to work out which se (scheduling entity) these nodes
belong to.

Anyway, cfs_rq->tasks_timeline.rb_leftmost should point to 0xffff99a9502e1990,
as far as I understand the rbtree, right?

> >
> >I suppose one approach is to add code to both __enqueue_entity() and
> >__dequeue_entity() that compares ->rb_leftmost to the result of
> >rb_first().  That'd incur some overhead but it'd double check the logic.
>
> We could benefit from improved debugging in rbtrees, not only the cached
> flavor. Perhaps we can start with the following -- this would at least
> let us know if the case where the tree is non-empty and leftmost is nil
> was hit, whether in the scheduler or another user...
>
> Thanks,
> Davidlohr
>

That's what I will do too: add some debugging code. I will add it to the
project I'm working on here, not upstream, and try to log as much debug data
as possible if a similar case occurs again. But as the rbtree is used so
extensively, I need to be careful where I add debug code because of the
performance impact. Your approach of a configurable rbtree debug option might
help me here, yes; I would have taken a similar approach.

Thanks,
Carsten
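
The check suggested in the quoted text (compare the cached leftmost pointer
with the result of walking the tree, as rb_first() does) can be sketched in
userspace C. This is only an illustration under stated assumptions: struct
node, struct root_cached and every function name below are hypothetical
stand-ins, not the kernel's rbtree API, and the demo replaces the odd root
node from the dump with a clean one, since only the left links matter for
the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for struct rb_node and
 * struct rb_root_cached; field order mirrors the crash output. */
struct node {
    uintptr_t parent_color;   /* parent pointer, bit 0 holds the color */
    struct node *right;
    struct node *left;
};

struct root_cached {
    struct node *root;
    struct node *leftmost;    /* cached leftmost node, NULL if tree is empty */
};

/* Store the parent pointer and the color bit in one word. */
static void node_set_parent(struct node *n, struct node *parent, unsigned color)
{
    /* The low two bits must be free, which pointer alignment guarantees. */
    assert(((uintptr_t)parent & 3u) == 0);
    n->parent_color = (uintptr_t)parent | (color & 1u);
}

/* Mask off the low bits to recover the parent pointer. */
static struct node *node_parent(const struct node *n)
{
    return (struct node *)(n->parent_color & ~(uintptr_t)3);
}

/* Follow left links from the root until there is no left child. */
static struct node *leftmost_by_walk(const struct root_cached *t)
{
    struct node *n = t->root;

    if (!n)
        return NULL;
    while (n->left)
        n = n->left;
    return n;
}

/* Returns 1 if the cached leftmost pointer agrees with the walk, else 0. */
static int leftmost_consistent(const struct root_cached *t)
{
    return t->leftmost == leftmost_by_walk(t);
}

/* Build a two-node tree shaped like the dump: a root with only a left
 * child. With fix_cache == 0 the cached pointer stays NULL, as in the dump;
 * with fix_cache != 0 it is set to the left child. Returns the check result. */
static int demo_dump_like_tree(int fix_cache)
{
    struct node child = { 0, NULL, NULL };
    struct node top = { 0, NULL, &child };
    struct root_cached t = { &top, NULL };

    node_set_parent(&child, &top, 1);   /* color bit set, as in 0x...0d11 */
    assert(node_parent(&child) == &top);

    if (fix_cache)
        t.leftmost = &child;
    return leftmost_consistent(&t);
}
```

Calling demo_dump_like_tree(0) reports an inconsistency because the tree is
non-empty while the cached pointer is NULL; demo_dump_like_tree(1) passes.
In a real kernel the walk costs O(log n) per call, which is why such a check
would belong behind a debug option.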