From: Karol Herbst
Date: Sat, 30 Apr 2022 00:56:26 +0200
Subject: Re: [PATCH] drm/nouveau/subdev/bus: Ratelimit logging for fault errors
To: Lyude Paul
Cc: nouveau, dri-devel, Ben Skeggs, David Airlie, Daniel Vetter, open list
References: <20220429195350.85620-1-lyude@redhat.com>
In-Reply-To: <20220429195350.85620-1-lyude@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 29, 2022 at 9:54 PM Lyude Paul wrote:
>
> There's plenty of ways to fudge the GPU when developing on nouveau by
> mistake, some of which can result in nouveau seriously spamming dmesg with
> fault errors. This can be somewhat annoying, as it can quickly overrun the
> message buffer (or your terminal emulator's buffer) and get rid of actually
> useful feedback from the driver. While working on my new atomic only MST
> branch, I ran into this issue a couple of times.
>
> So, let's fix this by adding nvkm_error_ratelimited(), and using it to
> ratelimit errors from faults. This should be fine for developers, since
> it's nearly always only the first few faults that we care about seeing.
> Plus, you can turn off rate limiting in the kernel if you really need to.
>

good idea :)

Although I think we ultimately need a detection of "GPU state is
trashed" where we cease to do anything and disable interrupts as some
of those issues actually cause the kernel to get DOSed by nouveau
spamming/handling interrupts.

Not sure if we should go with this patch in the meantime, or if we
should just go straight ahead and disable interrupts, which would also
stop the spam.

Do you know if this patch here is enough to get the system to reboot
normally or is it only spamming less?

> Signed-off-by: Lyude Paul
> ---
>  drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h |  2 ++
>  drivers/gpu/drm/nouveau/nvkm/subdev/bus/gf100.c    | 14 +++++++-------
>  drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv31.c     |  6 +++---
>  drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv50.c     |  6 +++---
>  4 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h b/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h
> index 1665738948fb..96113c8bee8c 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h
> @@ -62,4 +62,6 @@ void nvkm_subdev_intr(struct nvkm_subdev *);
>  #define nvkm_debug(s,f,a...) nvkm_printk((s), DEBUG, info, f, ##a)
>  #define nvkm_trace(s,f,a...) nvkm_printk((s), TRACE, info, f, ##a)
>  #define nvkm_spam(s,f,a...) nvkm_printk((s), SPAM, dbg, f, ##a)
> +
> +#define nvkm_error_ratelimited(s,f,a...) nvkm_printk((s), ERROR, err_ratelimited, f, ##a)
>  #endif
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bus/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bus/gf100.c
> index 53a6651ac225..80b5aaceeaad 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bus/gf100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bus/gf100.c
> @@ -35,13 +35,13 @@ gf100_bus_intr(struct nvkm_bus *bus)
>  		u32 addr = nvkm_rd32(device, 0x009084);
>  		u32 data = nvkm_rd32(device, 0x009088);
>
> -		nvkm_error(subdev,
> -			   "MMIO %s of %08x FAULT at %06x [ %s%s%s]\n",
> -			   (addr & 0x00000002) ? "write" : "read", data,
> -			   (addr & 0x00fffffc),
> -			   (stat & 0x00000002) ? "!ENGINE " : "",
> -			   (stat & 0x00000004) ? "PRIVRING " : "",
> -			   (stat & 0x00000008) ? "TIMEOUT " : "");
> +		nvkm_error_ratelimited(subdev,
> +				       "MMIO %s of %08x FAULT at %06x [ %s%s%s]\n",
> +				       (addr & 0x00000002) ? "write" : "read", data,
> +				       (addr & 0x00fffffc),
> +				       (stat & 0x00000002) ? "!ENGINE " : "",
> +				       (stat & 0x00000004) ? "PRIVRING " : "",
> +				       (stat & 0x00000008) ? "TIMEOUT " : "");
>
>  		nvkm_wr32(device, 0x009084, 0x00000000);
>  		nvkm_wr32(device, 0x001100, (stat & 0x0000000e));
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv31.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv31.c
> index ad8da523bb22..c75e463f3501 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv31.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv31.c
> @@ -45,9 +45,9 @@ nv31_bus_intr(struct nvkm_bus *bus)
>  		u32 addr = nvkm_rd32(device, 0x009084);
>  		u32 data = nvkm_rd32(device, 0x009088);
>
> -		nvkm_error(subdev, "MMIO %s of %08x FAULT at %06x\n",
> -			   (addr & 0x00000002) ? "write" : "read", data,
> -			   (addr & 0x00fffffc));
> +		nvkm_error_ratelimited(subdev, "MMIO %s of %08x FAULT at %06x\n",
> +				       (addr & 0x00000002) ? "write" : "read", data,
> +				       (addr & 0x00fffffc));
>
>  		stat &= ~0x00000008;
>  		nvkm_wr32(device, 0x001100, 0x00000008);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv50.c
> index 3a1e45adeedc..2055d0b100d3 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv50.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bus/nv50.c
> @@ -60,9 +60,9 @@ nv50_bus_intr(struct nvkm_bus *bus)
>  		u32 addr = nvkm_rd32(device, 0x009084);
>  		u32 data = nvkm_rd32(device, 0x009088);
>
> -		nvkm_error(subdev, "MMIO %s of %08x FAULT at %06x\n",
> -			   (addr & 0x00000002) ? "write" : "read", data,
> -			   (addr & 0x00fffffc));
> +		nvkm_error_ratelimited(subdev, "MMIO %s of %08x FAULT at %06x\n",
> +				       (addr & 0x00000002) ? "write" : "read", data,
> +				       (addr & 0x00fffffc));
>
>  		stat &= ~0x00000008;
>  		nvkm_wr32(device, 0x001100, 0x00000008);
> --
> 2.35.1
>
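
For readers unfamiliar with how interval/burst rate limiting behaves during a fault storm, here is a rough userspace sketch. It is not the kernel implementation and not part of the patch: struct ratelimit, ratelimit_init(), ratelimit_allow() and simulate_fault_storm() are hypothetical names made up for illustration, and the 5 second / 10 message values in the usage note are assumed to match the kernel's usual ratelimit defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a per-callsite ratelimit state. */
struct ratelimit {
	uint64_t interval_ms;  /* length of one window */
	unsigned burst;        /* messages allowed per window */
	uint64_t window_start; /* start of the current window, in ms */
	unsigned printed;      /* messages emitted in the current window */
	unsigned missed;       /* messages suppressed in the current window */
	bool started;          /* false until the first message arrives */
};

void ratelimit_init(struct ratelimit *rs, uint64_t interval_ms, unsigned burst)
{
	assert(rs != NULL);
	rs->interval_ms = interval_ms;
	rs->burst = burst;
	rs->window_start = 0;
	rs->printed = 0;
	rs->missed = 0;
	rs->started = false;
}

/* Returns true if a message arriving at now_ms may be emitted. */
bool ratelimit_allow(struct ratelimit *rs, uint64_t now_ms)
{
	assert(rs != NULL);

	/* Open a fresh window on first use or once the interval has elapsed. */
	if (!rs->started || now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* Feeds `faults` evenly spaced faults through the limiter; returns how many were logged. */
unsigned simulate_fault_storm(struct ratelimit *rs, unsigned faults,
			      uint64_t start_ms, uint64_t spacing_ms)
{
	unsigned logged = 0;

	for (unsigned i = 0; i < faults; i++) {
		if (ratelimit_allow(rs, start_ms + (uint64_t)i * spacing_ms))
			logged++;
	}
	return logged;
}
```

With ratelimit_init(&rs, 5000, 10), a storm of 1000 faults spaced 1 ms apart logs only the first 10 and suppresses the remaining 990, and logging resumes once the 5 second window has passed. Note that this only limits the log output: the interrupt handler itself still runs for every fault, which is the concern raised above about the kernel getting DOSed.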