From: Qiang Yu
Date: Thu, 18 Jan 2024 10:46:46 +0800
Subject: Re: [PATCH v1 4/6] drm/lima: handle spurious timeouts due to high irq latency
To: Erico Nunes
Cc: dri-devel@lists.freedesktop.org, lima@lists.freedesktop.org,
 anarsoul@gmail.com, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
 David Airlie, Daniel Vetter, Sumit Semwal, christian.koenig@amd.com,
 linux-kernel@vger.kernel.org
In-Reply-To: <20240117031212.1104034-5-nunes.erico@gmail.com>
References: <20240117031212.1104034-1-nunes.erico@gmail.com>
 <20240117031212.1104034-5-nunes.erico@gmail.com>

On Wed, Jan 17, 2024 at
11:12 AM Erico Nunes wrote:
>
> There are several unexplained and unreproduced cases of rendering
> timeouts with lima, for which one theory is high IRQ latency coming from
> somewhere else in the system.
> This kind of occurrence may cause applications to trigger unnecessary
> resets of the GPU or even applications to hang if it hits an issue in
> the recovery path.
> Panfrost already does some special handling to account for such
> "spurious timeouts", it makes sense to have this in lima too to reduce
> the chance that it hit users.
>
> Signed-off-by: Erico Nunes
> ---
>  drivers/gpu/drm/lima/lima_sched.c | 32 ++++++++++++++++++++++++++-----
>  drivers/gpu/drm/lima/lima_sched.h |  2 ++
>  2 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index 66317296d831..9449b81bcd5b 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0 OR MIT
>  /* Copyright 2017-2019 Qiang Yu */
>
> +#include
>  #include
>  #include
>  #include
> @@ -223,10 +224,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
>
>         task->fence = &fence->base;
>
> -       /* for caller usage of the fence, otherwise irq handler
> -        * may consume the fence before caller use it
> -        */
> -       dma_fence_get(task->fence);
> +       task->done_fence = dma_fence_get(task->fence);
>
>         pipe->current_task = task;
>
> @@ -401,9 +399,33 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
>         struct lima_sched_pipe *pipe = to_lima_pipe(job->sched);
>         struct lima_sched_task *task = to_lima_task(job);
>         struct lima_device *ldev = pipe->ldev;
> +       struct lima_ip *ip = pipe->processor[0];
> +
> +       /*
> +        * If the GPU managed to complete this jobs fence, the timeout is
> +        * spurious. Bail out.
> +        */
> +       if (dma_fence_is_signaled(task->done_fence)) {
> +               DRM_WARN("%s spurious timeout\n", lima_ip_name(ip));
> +               return DRM_GPU_SCHED_STAT_NOMINAL;
> +       }
> +
> +       /*
> +        * Lima IRQ handler may take a long time to process an interrupt
> +        * if there is another IRQ handler hogging the processing.
> +        * In order to catch such cases and not report spurious Lima job
> +        * timeouts, synchronize the IRQ handler and re-check the fence
> +        * status.
> +        */
> +       synchronize_irq(ip->irq);
> +
> +       if (dma_fence_is_signaled(task->done_fence)) {
> +               DRM_WARN("%s unexpectedly high interrupt latency\n", lima_ip_name(ip));
> +               return DRM_GPU_SCHED_STAT_NOMINAL;
> +       }
>
>         if (!pipe->error)
> -               DRM_ERROR("lima job timeout\n");
> +               DRM_ERROR("%s lima job timeout\n", lima_ip_name(ip));
>
>         drm_sched_stop(&pipe->base, &task->base);
>
> diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
> index 6a11764d87b3..34050facb110 100644
> --- a/drivers/gpu/drm/lima/lima_sched.h
> +++ b/drivers/gpu/drm/lima/lima_sched.h
> @@ -29,6 +29,8 @@ struct lima_sched_task {
>         bool recoverable;
>         struct lima_bo *heap;
>
> +       struct dma_fence *done_fence;

This is the same as the fence below; do we really need a duplicate?

> +
>         /* pipe fence */
>         struct dma_fence *fence;
>  };
> --
> 2.43.0
>
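
For reference, the check / synchronize / re-check sequence in the quoted
timeout handler can be modelled in plain userspace C. This is only a sketch
under assumptions, not driver code: mock_fence, mock_task, mock_fence_get,
mock_synchronize_irq, mock_timedout_job and the MOCK_* results are
hypothetical stand-ins for the kernel objects, and the "in-flight handler"
flag is a simplification of what synchronize_irq() waits for. It also shows
why the extra pointer looks redundant: dma_fence_get() returns the pointer it
was given after taking a reference, so done_fence and fence would refer to
the same object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: a refcount and a flag. */
struct mock_fence {
	int refcount;
	bool signaled;
};

/* Same contract as dma_fence_get(): take a reference, return the argument. */
static struct mock_fence *mock_fence_get(struct mock_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

/* Hypothetical task: one fence pointer, as the review comment suggests. */
struct mock_task {
	struct mock_fence *fence;
	bool irq_in_flight; /* a handler has started but not yet signaled */
};

enum mock_timeout_result {
	MOCK_SPURIOUS,     /* fence was already signaled on entry */
	MOCK_LATE_IRQ,     /* fence signaled once the in-flight handler finished */
	MOCK_REAL_TIMEOUT, /* fence still unsignaled: go on to reset the GPU */
};

/* Models synchronize_irq(): returns once any in-flight handler has finished. */
static void mock_synchronize_irq(struct mock_task *t)
{
	if (t->irq_in_flight) {
		t->fence->signaled = true;
		t->irq_in_flight = false;
	}
}

/* Two-stage check: test the fence, wait for the handler, test again. */
static enum mock_timeout_result mock_timedout_job(struct mock_task *t)
{
	if (t->fence->signaled)
		return MOCK_SPURIOUS;

	mock_synchronize_irq(t);

	if (t->fence->signaled)
		return MOCK_LATE_IRQ;

	return MOCK_REAL_TIMEOUT;
}
```

With a task whose fence is unsignaled and no handler in flight,
mock_timedout_job() reports MOCK_REAL_TIMEOUT; setting irq_in_flight first
makes the same call report MOCK_LATE_IRQ, and a later call on that task
reports MOCK_SPURIOUS because the fence is now signaled.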