From: Chris Wilson
To: Thomas Gleixner
Cc: Linus Torvalds, Linux List Kernel Mailing, Steven Rostedt, Josh Poimboeuf, Joerg Roedel
Subject: Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]
Date: Fri, 21 Jun 2019 20:56:13 +0100
Message-ID: <156114697311.2401.13492363493607545412@skylake-alporthouse-com>

Quoting Thomas Gleixner (2019-06-21 20:33:36)
> On Fri, 21 Jun 2019, Chris Wilson wrote:
>
> > Quoting Thomas Gleixner (2019-06-21 16:30:52)
> > > Chris, do you have the actual NMI lockup detector splats somewhere?
> >
> > Sorry, I'm having a hard time reproducing this at will now. The test
> > case depends on the right timing of the wrong event to cause the GPU to
> > hang.
> >
> > From memory, I got the
> > "Watchdog detected hard LOCKUP on cpu foo"
> > followed by the register dump and then nothing. At which point I had to
> > power cycle the machine.
>
> Hmm. Do you have a serial log of that incident?

I use netconsole. I think Tomi has a serial console for most things
available, but not permanently hooked up. And I didn't have it in a tee
as it was late, with the lockup an annoyance to the bug I was trying to
solve.

I'll keep trying to recreate that bug as once I do have that recipe, it
should be possible to bisect. I can check with Tomi on Monday if he can
pull a machine out of the farm and see how it locked up.
-Chris