Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp1114778ybi; Wed, 19 Jun 2019 13:51:35 -0700 (PDT) X-Google-Smtp-Source: APXvYqzKI3K2DoOhK7segYMvMtZqQF26bTBExY3UvpKd+XrJb+gigfZm6ylJCSuUQ0xzFTQ5brK+ X-Received: by 2002:a17:902:8203:: with SMTP id x3mr37692614pln.304.1560977495566; Wed, 19 Jun 2019 13:51:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1560977495; cv=none; d=google.com; s=arc-20160816; b=eUJzVC0nVClH+dE9wBgvNSfSZsB934XRJg7rHrebuusQPEpPWWEtaVu8Sxs1PCR8vH E1GL9Zep9SpV9lASJaQO+BS4tI2tiqOwEFYlVRNmwODPv/RiYUYgsagiI1Vz4Ui8qirU jJayLSrdnNtLhFBhyo9SCJ3ZwsDxjCG2imaSprXo+g+BCatZ1YKT6cWEPce4pqHpHcbH gqbhBZCFZQPaP8D3qTBe7rnmir3c+dkGB9DH58E/+31VTacYB5FZPnqRsF5xnly9tqnu LdlxkjrDVhoVnZVptGyEbYo0nb/hUJJ/lkdHLv7/GS8Y28JqGFFARc9Pu+QLdX8ntoFi DQNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=kr00XCalu9BxXvOgaKUwp/Tuo/EGDGjVmPbF4AysT5A=; b=NFAkUkHsckzRCWC6LQBtwZzwUor2jMpZDT8OBu3CUSWq9rSoZRHj10Zjush0cu5taZ M4Rgdp76jtKw7SbR/iQahFvg8UJBI2Lrr/eV5yki0j/pQWh0I8WCGS5SzkY1n7Qky9eb Vtok5EveORuhq93SFD06/+9PCDC54goZUt5IIW1AnVXbDVymlulHu3YxXuhvCNE4RGcV u2COG8qXZOM9OsaS1pkuANuYHw/Pv6FPaZU0qXgrtv4FhaFuq1pMlccnuIF5MnPXTLYX IgOConi3aZO/V9Q0ihrPJPj4e1b+Kltyr3Xx9YB923ygKXDuIe9U8egDVOnAYksbqa4t +P2A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=Q1FREvAh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o4si15995663pls.391.2019.06.19.13.51.19; Wed, 19 Jun 2019 13:51:35 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=Q1FREvAh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730074AbfFSUvG (ORCPT + 99 others); Wed, 19 Jun 2019 16:51:06 -0400 Received: from mail-lj1-f195.google.com ([209.85.208.195]:44169 "EHLO mail-lj1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726230AbfFSUvG (ORCPT ); Wed, 19 Jun 2019 16:51:06 -0400 Received: by mail-lj1-f195.google.com with SMTP id k18so514012ljc.11 for ; Wed, 19 Jun 2019 13:51:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=kr00XCalu9BxXvOgaKUwp/Tuo/EGDGjVmPbF4AysT5A=; b=Q1FREvAhfqHLPs7hiXDan54VkBHE5Oj2EVjFMwHkWWC7wX7DAacLPKkQYU+IIcSvWH x1sSpDVOPkn0QmS6iGkFCAWW0CpcasVSCZOovaaN/vpoiMsyshW78DIDZMUx6Yu0bxeI xqUOuX56dOc/mPuWrrgkbOgpt+3frD4UIqPjc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=kr00XCalu9BxXvOgaKUwp/Tuo/EGDGjVmPbF4AysT5A=; b=OpSRyjcV2ONF9Ul8OCcO9VCOnZAueC7iP7HCuDCfWWDm6d8tWie7S8VVqNk+doWNVN uloxQgERzY89mmwEzVrUoJHfPKMHnYtiUXCTV5CQOwrxlXOO73EbE0FOnqruxK7HPL/V KxKCkZShqNwCTDzHfQiN7DBOLIp6h34iMxpOhDQBflWFLmqH/qRVi1781Bv7xJx/v7m2 vL/y7DcjWfpoCTvuPzm/0MevJ9Sxxy4t1WaT88cDF63+LibwPKE2IC/G95Pc4qdkor+S phQ8BiOtZw2pnV1iNUJHMjyaLOh2gABccKlUlH9FybsRgqeL3P+X5pD4Eho6aPZmq/q/ F46g== X-Gm-Message-State: APjAAAXJmUuqDRr30s38NwnAAmKTOxFOjKGgAzUK5OlTxBd4tMvGrLAt F97A35QhXRyAOJsb9UKPe8WfghiIq0Y= X-Received: by 2002:a2e:7c15:: with SMTP id x21mr4186216ljc.55.1560976990379; Wed, 19 Jun 2019 13:43:10 -0700 (PDT) Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com. [209.85.208.175]) by smtp.gmail.com with ESMTPSA id a17sm3169259lfk.0.2019.06.19.13.43.09 for (version=TLS1_3 cipher=AEAD-AES128-GCM-SHA256 bits=128/128); Wed, 19 Jun 2019 13:43:09 -0700 (PDT) Received: by mail-lj1-f175.google.com with SMTP id 16so504449ljv.10 for ; Wed, 19 Jun 2019 13:43:09 -0700 (PDT) X-Received: by 2002:a2e:b003:: with SMTP id y3mr34133328ljk.72.1560976989040; Wed, 19 Jun 2019 13:43:09 -0700 (PDT) MIME-Version: 1.0 References: <156094799629.21217.4574572565333265288@skylake-alporthouse-com> <156097197830.664.13418742301997062555@skylake-alporthouse-com> In-Reply-To: <156097197830.664.13418742301997062555@skylake-alporthouse-com> From: Linus Torvalds Date: Wed, 19 Jun 2019 13:42:53 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5] To: Chris Wilson Cc: Linux List Kernel Mailing , Steven Rostedt , Josh Poimboeuf , Thomas Gleixner Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson wrote: > > > Do you have the oops itself at all? > > An example at > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/boot0.log > > The bug causing the oops is clearly a driver problem. The rc5 fallout > just seems to be because of some shrinker changes affecting some object > reaping that were unfortunately still active. What perturbed the CI > team was the machine failed to panic & reboot. Hmm. It's hard to guess at the cause of that. The oopses themselves don't look like they are happening in any particularly bad context, so all the normal reboot-on-oops etc stuff _should_ work. So it would help a lot if you could bisect the bad problem at least a bit, if it is at all reproducible. Because with no other clues, it's hard to even guess at what might be up. The fact that you say "NMI watchdog firing as we dumped the ftrace" means that maybe it might be some ftrace / stacktrace issue where the dumping itself leads to some endless loop, but who knows. For example, one thing that has happened during this development cycle is the stacktrace common infrastructure changes (arch_stack_walk() and friends). I'm, not seeing why that would cause your issues, but I'm adding a few random people for ftrace / stacktrace changes. Linus