Date: Tue, 25 Jul 2023 07:02:22 +0000
To: André Almeida
From: Simon Ser
Cc: dri-devel@lists.freedesktop.org,
 kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com,
 pierre-eric.pelloux-prayer@amd.com, amd-gfx@lists.freedesktop.org,
 Rob Clark, Pekka Paalanen, linux-kernel@vger.kernel.org, Daniel Vetter,
 Daniel Stone, 'Marek Olšák', Dave Airlie, Michel Dänzer, Samuel Pitoiset,
 Timur Kristóf, Bas Nieuwenhuizen, Randy Dunlap, Pekka Paalanen
Subject: Re: Non-robust apps and resets (was Re: [PATCH v5 1/1] drm/doc: Document DRM device reset expectations)
Message-ID: <4ImkvYT8BoTiT_R4YqwMS2k20KRuGBvPF05lQX9R_zDtLKNRP646_V2VnGid__mG_h1cI8cL-uco3aXd9cpFnaBlfbwCaQOVwRYCthEhuQI=@emersion.fr>
References: <20230627132323.115440-1-andrealmeid@igalia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tuesday, July 25th, 2023 at 04:55, André Almeida wrote:

> It's not clear what we should do about non-robust OpenGL apps after GPU
> resets, so I'll try to summarize the topic, show some options and my
> proposal to move forward on that.
>
> On 27/06/2023 10:23, André Almeida wrote:
>
> > +Robustness
> > +----------
> > +
> > +The only way to try to keep an application working after a reset is if it
> > +complies with the robustness aspects of the graphical API that it is using.
> > +
> > +Graphical APIs provide ways to applications to deal with device resets.
> > +However, there is no guarantee that the app will use such features
> > +correctly, and the UMD can implement policies to close the app if it is a
> > +repeating offender, likely in a broken loop. This is done to ensure that it
> > +does not keep blocking the user interface from being correctly displayed.
> > +This should be done even if the app is correct but happens to trigger some
> > +bug in the hardware/driver.
> > +
>
> Depending on the OpenGL version, there are different robustness APIs
> available:
>
> - OpenGL ARB extension [0]
> - OpenGL KHR extension [1]
> - OpenGL ES extension [2]
>
> Apps written in OpenGL should use whatever version is available to them
> to make the app robust against GPU resets. That usually means calling
> GetGraphicsResetStatusARB() and checking the status: if it returns
> anything other than NO_ERROR, a reset has happened, the context is
> considered lost and should be recreated. An app that follows this will
> likely succeed in recovering from a reset.
>
> What should non-robust apps do then? They certainly will not be
> notified if a reset happens, and thus can't recover if their context is
> lost. The OpenGL specification does not explicitly define what should be
> done in such situations [3], and I believe that if the spec intended the
> app to be closed, it would say so explicitly.
>
> However, in reality there are different types of device resets, causing
> different results. A reset can be precise enough to damage only the
> guilty context, and keep others alive.
>
> Given that, I believe drivers have the following options:
>
> a) Kill all non-robust apps after a reset. This may lose work from
> innocent applications.
>
> b) Ignore all OpenGL calls from non-robust apps. That means that
> applications would still be alive, but the user interface would be frozen.
> The user would need to close it manually anyway, but in some corner cases,
> the app could autosave some work or the user might be able to interact with
> it using some alternative method (command line?).
>
> c) Kill just the affected non-robust applications. To do that, the
> driver needs to be 100% sure of the impact of its resets.

We've discussed this a while back on #dri-devel IIRC. I think the best
experience would be for the Wayland compositor to gray out apps which
lost their GL context, and display an information dialog explaining to
the user what happened, with a button to kill the app. I'm not exactly
sure how that would translate to a kernel or Mesa uAPI, and if there's
appetite to do a lot of work to get "the best GPU reset UX" (IOW: maybe
it's not worth all of the trouble).
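
For illustration, the robust-app path described in the quoted text (query
the reset status, treat anything other than NO_ERROR as a lost context,
recreate it) could look roughly like the sketch below. This is only a
sketch under assumptions: struct app_context, check_reset_before_frame()
and the sim_* helpers are hypothetical names made up for this example,
and the driver is simulated so the code builds without GL headers. In a
real app the query callback would call glGetGraphicsResetStatusARB() on a
context created with the robustness extension, and the recreate callback
would rebuild the context and all of its objects. The enum values are
copied from GL_ARB_robustness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as defined by GL_ARB_robustness, copied here so the sketch
 * compiles without GL headers. */
#define GL_NO_ERROR                   0
#define GL_GUILTY_CONTEXT_RESET_ARB   0x8253
#define GL_INNOCENT_CONTEXT_RESET_ARB 0x8254
#define GL_UNKNOWN_CONTEXT_RESET_ARB  0x8255

typedef unsigned int GLenum;

/* Hypothetical app-side wrapper around a GL context. */
struct app_context {
    /* Real app: return glGetGraphicsResetStatusARB(). */
    GLenum (*query_reset_status)(struct app_context *ctx);
    /* Real app: destroy the lost context, create a new one, re-upload. */
    bool (*recreate)(struct app_context *ctx);
    /* Bumped on every successful recreation. */
    unsigned int generation;
};

enum frame_result { FRAME_OK, FRAME_RECOVERED, FRAME_FATAL };

/* Call once per frame before issuing GL commands. */
enum frame_result check_reset_before_frame(struct app_context *ctx)
{
    assert(ctx != NULL);
    assert(ctx->query_reset_status != NULL && ctx->recreate != NULL);

    GLenum status = ctx->query_reset_status(ctx);
    if (status == GL_NO_ERROR)
        return FRAME_OK;

    /* Guilty, innocent or unknown: the context is lost either way. */
    if (!ctx->recreate(ctx))
        return FRAME_FATAL;

    ctx->generation++;
    return FRAME_RECOVERED;
}

/* Simulated driver state, for demonstration only: the pending status is
 * reported once and then cleared. */
GLenum sim_pending_status = GL_NO_ERROR;

GLenum sim_query_reset_status(struct app_context *ctx)
{
    (void)ctx;
    GLenum status = sim_pending_status;
    sim_pending_status = GL_NO_ERROR;
    return status;
}

bool sim_recreate(struct app_context *ctx)
{
    (void)ctx;
    return true; /* pretend recreation always succeeds */
}
```

A non-robust app has no equivalent of check_reset_before_frame(), which
is why the options a) to c) above have to be decided on the driver side.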