Date: Mon, 24 Jun 2019 22:03:45 -0500
From: Josh Poimboeuf
To: Linus Torvalds
Cc: Chris Wilson, Linux List Kernel Mailing, Steven Rostedt, Thomas Gleixner
Subject: Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]
Message-ID: <20190625030345.dwbydi2w67mpp4zq@treble>
References: <156094799629.21217.4574572565333265288@skylake-alporthouse-com>
 <156097197830.664.13418742301997062555@skylake-alporthouse-com>

On Wed, Jun 19, 2019 at 01:42:53PM -0700, Linus Torvalds wrote:
> On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson wrote:
> >
> > > Do you have the oops itself at all?
> >
> > An example at
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/boot0.log
> >
> > The bug causing the oops is clearly a driver problem. The rc5 fallout
> > just seems to be because of some shrinker changes affecting some object
> > reaping that were unfortunately still active. What perturbed the CI
> > team was the machine failed to panic & reboot.
>
> Hmm. It's hard to guess at the cause of that. The oopses themselves
> don't look like they are happening in any particularly bad context, so
> all the normal reboot-on-oops etc stuff _should_ work.

Looking at the dmesg, panic_on_oops doesn't seem to be enabled: it went
through the rewind_stack_do_exit() path instead of the panic() path. So
the system is apparently not configured to reboot on oops.

So I'd say the hang was presumably caused by a lock held by the oopsing
code.

So it looks normal to me, other than the original oops.

-- 
Josh
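
The reasoning in the reply above can be sketched as a toy model. This is not kernel code: the function names `oops_outcome` and `simulate` are hypothetical, and the model assumes the only input to the decision is the `panic_on_oops` setting. It shows why a system that takes the rewind_stack_do_exit() path can hang rather than reboot: the oopsing task is killed while still holding a lock, so anything that later needs that lock never gets it.

```python
import threading


def oops_outcome(panic_on_oops: bool) -> str:
    """Toy model of the choice made once an oops has been reported.

    Assumption: only panic_on_oops is considered here; the real kernel
    path checks more state than this.
    """
    if panic_on_oops:
        return "panic"      # stands in for the panic() path
    return "kill_task"      # stands in for the rewind_stack_do_exit() path


def simulate(panic_on_oops: bool) -> str:
    """Model an oops that happens while a lock is held."""
    lock = threading.Lock()
    lock.acquire()          # the oopsing code takes a lock, then oopses

    if oops_outcome(panic_on_oops) == "panic":
        return "panic"      # the whole system stops; the stale lock no longer matters

    # The task was killed without releasing the lock. A later user of the
    # same lock cannot acquire it; in the kernel it would wait forever.
    acquired = lock.acquire(blocking=False)
    return "running" if acquired else "hang"


if __name__ == "__main__":
    print(simulate(panic_on_oops=False))  # hang
    print(simulate(panic_on_oops=True))   # panic
```

On a real system the corresponding knobs are the `kernel.panic_on_oops` sysctl, which makes an oops escalate to a panic, and the `kernel.panic` sysctl, whose non-zero value is the number of seconds to wait before rebooting after a panic. Whether the CI machines set either of them is not stated in this thread beyond what the dmesg suggests.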