References: <20240117031212.1104034-1-nunes.erico@gmail.com> <20240117031212.1104034-5-nunes.erico@gmail.com>
From: Qiang Yu
Date: Sun, 21 Jan 2024 11:04:38 +0800
Subject: Re: [PATCH v1 4/6] drm/lima: handle spurious timeouts due to high irq latency
To: Erico Nunes
Cc: dri-devel@lists.freedesktop.org, lima@lists.freedesktop.org, anarsoul@gmail.com, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Sumit Semwal, christian.koenig@amd.com, linux-kernel@vger.kernel.org

On Fri, Jan 19, 2024 at 9:43 AM Qiang Yu wrote:
>
> On Wed, Jan 17, 2024 at 11:12 AM Erico Nunes wrote:
> >
> > There are several unexplained and unreproduced cases of rendering
> > timeouts with lima, for which one theory is high IRQ latency coming from
> > somewhere else in the system.
> > This kind of occurrence may cause applications to trigger unnecessary
> > resets of the GPU or even applications to hang if it hits an issue in
> > the recovery path.
> > Panfrost already does some special handling to account for such
> > "spurious timeouts", it makes sense to have this in lima too to reduce
> > the chance that it hit users.
> >
> > Signed-off-by: Erico Nunes
> > ---
> >  drivers/gpu/drm/lima/lima_sched.c | 32 ++++++++++++++++++++++++++-----
> >  drivers/gpu/drm/lima/lima_sched.h |  2 ++
> >  2 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index 66317296d831..9449b81bcd5b 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -1,6 +1,7 @@
> >  // SPDX-License-Identifier: GPL-2.0 OR MIT
> >  /* Copyright 2017-2019 Qiang Yu */
> >
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -223,10 +224,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
> >
> >  	task->fence = &fence->base;
> >
> > -	/* for caller usage of the fence, otherwise irq handler
> > -	 * may consume the fence before caller use it
> > -	 */
> > -	dma_fence_get(task->fence);
> > +	task->done_fence = dma_fence_get(task->fence);
> >
> >  	pipe->current_task = task;
> >
> > @@ -401,9 +399,33 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
> >  	struct lima_sched_pipe *pipe = to_lima_pipe(job->sched);
> >  	struct lima_sched_task *task = to_lima_task(job);
> >  	struct lima_device *ldev = pipe->ldev;
> > +	struct lima_ip *ip = pipe->processor[0];
> > +
> > +	/*
> > +	 * If the GPU managed to complete this jobs fence, the timeout is
> > +	 * spurious. Bail out.
> > +	 */
> > +	if (dma_fence_is_signaled(task->done_fence)) {
> > +		DRM_WARN("%s spurious timeout\n", lima_ip_name(ip));
> > +		return DRM_GPU_SCHED_STAT_NOMINAL;
> > +	}
> > +
> You may just remove this check and left the check after sync irq.
>
After more thinking, this is only for handling spurious timeouts more
efficiently, not for totally reliable timeout handling. So this check
should be OK.

> > +	/*
> > +	 * Lima IRQ handler may take a long time to process an interrupt
> > +	 * if there is another IRQ handler hogging the processing.
> > +	 * In order to catch such cases and not report spurious Lima job
> > +	 * timeouts, synchronize the IRQ handler and re-check the fence
> > +	 * status.
> > +	 */
> > +	synchronize_irq(ip->irq);
> This should be done after drm_sched_stop() to prevent drm scheduler
> run more jobs. And we need to disable interrupt of GP/PP for no more
> running job triggered fresh INT.

This is OK too. We just need to solve reliable timeout handling after
drm_sched_stop() in another patch.

>
> PP may have more than one IRQ, so we need to wait on all of them.
>
> > +
> > +	if (dma_fence_is_signaled(task->done_fence)) {
> > +		DRM_WARN("%s unexpectedly high interrupt latency\n", lima_ip_name(ip));
> > +		return DRM_GPU_SCHED_STAT_NOMINAL;
> > +	}
> >
> >  	if (!pipe->error)
> > -		DRM_ERROR("%s lima job timeout\n");
> > +		DRM_ERROR("%s lima job timeout\n", lima_ip_name(ip));
> >
> >  	drm_sched_stop(&pipe->base, &task->base);
> >
> > diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
> > index 6a11764d87b3..34050facb110 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.h
> > +++ b/drivers/gpu/drm/lima/lima_sched.h
> > @@ -29,6 +29,8 @@ struct lima_sched_task {
> >  	bool recoverable;
> >  	struct lima_bo *heap;
> >
> > +	struct dma_fence *done_fence;
> > +
> >  	/* pipe fence */
> >  	struct dma_fence *fence;
> >  };
> > --
> > 2.43.0
> >
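
For illustration only, here is a minimal userspace sketch of the flow discussed in this thread: check the fence, synchronize every IRQ line of the pipe, then re-check before declaring a real timeout. It is not lima driver code. All sim_* names are hypothetical stand-ins (sim_synchronize_irq() only models what synchronize_irq() achieves, namely that a delayed handler has finished), and it assumes for simplicity that any one handler can signal the fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_IRQS 4

/* Hypothetical stand-in for one interrupt line of a GPU processor. */
struct sim_irq {
	bool handler_pending;	/* raised, but the handler has not finished */
	bool completes_job;	/* the delayed handler would signal the fence */
};

/* Hypothetical stand-in for a scheduled job and its completion fence. */
struct sim_task {
	bool fence_signaled;
	struct sim_irq irqs[SIM_MAX_IRQS];
	size_t num_irqs;
};

enum sim_stat { SIM_STAT_NOMINAL, SIM_STAT_RESET };

/* Models the effect of synchronize_irq(): a delayed handler runs to the end. */
static void sim_synchronize_irq(struct sim_task *task, struct sim_irq *irq)
{
	if (!irq->handler_pending)
		return;

	irq->handler_pending = false;
	if (irq->completes_job)
		task->fence_signaled = true;
}

/* Timeout path: fast check, wait on all IRQ lines, then re-check. */
static enum sim_stat sim_timedout_job(struct sim_task *task)
{
	size_t i;

	assert(task->num_irqs <= SIM_MAX_IRQS);

	/* Fast path: the job already completed, so the timeout is spurious. */
	if (task->fence_signaled)
		return SIM_STAT_NOMINAL;

	/* A pipe may have more than one IRQ, so wait on every one of them. */
	for (i = 0; i < task->num_irqs; i++)
		sim_synchronize_irq(task, &task->irqs[i]);

	/* Re-check: a late handler may have signaled the fence meanwhile. */
	if (task->fence_signaled)
		return SIM_STAT_NOMINAL;

	/* Genuine timeout: the caller would go on to stop and reset. */
	return SIM_STAT_RESET;
}
```

The first check is only an optimization, as noted above; the loop followed by the re-check is the part that catches a completion delayed by interrupt latency on any of the lines.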