From: Michel Dänzer
To: André Almeida, dri-devel@lists.freedesktop.org
Cc: pierre-eric.pelloux-prayer@amd.com, Samuel Pitoiset, Randy Dunlap,
 Pekka Paalanen, 'Marek Olšák', Timur Kristóf, Pekka Paalanen,
 amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com
Subject: Re: Non-robust apps and resets (was Re: [PATCH v5 1/1] drm/doc: Document DRM device reset expectations)
Date: Tue, 25 Jul 2023 10:03:39 +0200
Message-ID: <45a1e527-f5dc-aa6f-9482-8958566ecb96@mailbox.org>
References: <20230627132323.115440-1-andrealmeid@igalia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/25/23 04:55, André Almeida wrote:
> Hi everyone,
>
> It's not clear what we should do about non-robust OpenGL apps after GPU
> resets, so I'll try to summarize the topic, show some options and my
> proposal to move forward on that.
>
> On 27/06/2023 10:23, André Almeida wrote:
>> +Robustness
>> +----------
>> +
>> +The only way to try to keep an application working after a reset is if it
>> +complies with the robustness aspects of the graphical API that it is using.
>> +
>> +Graphical APIs provide ways to applications to deal with device resets. However,
>> +there is no guarantee that the app will use such features correctly, and the
>> +UMD can implement policies to close the app if it is a repeating offender,
>> +likely in a broken loop. This is done to ensure that it does not keep blocking
>> +the user interface from being correctly displayed. This should be done even if
>> +the app is correct but happens to trigger some bug in the hardware/driver.
>> +
> Depending on the OpenGL version, there are different robustness APIs
> available:
>
> - OpenGL ARB extension [0]
> - OpenGL KHR extension [1]
> - OpenGL ES extension  [2]
>
> Apps written in OpenGL should use whatever version is available to them
> to make the app robust against GPU resets. That usually means calling
> GetGraphicsResetStatusARB() and checking the status: anything other than
> NO_ERROR means that a reset has happened, the context is considered lost
> and should be recreated. An app that follows this will likely succeed in
> recovering from a reset.
>
> What should non-robust apps do then? They certainly will not be notified
> if a reset happens, and thus can't recover if their context is lost. The
> OpenGL specification does not explicitly define what should be done in
> such situations [3], and I believe that if the spec mandated closing the
> app, it would note that explicitly.
>
> However, in reality there are different types of device resets, causing
> different results. A reset can be precise enough to damage only the
> guilty context, and keep others alive.
>
> Given that, I believe drivers have the following options:
>
> a) Kill all non-robust apps after a reset. This may lead to lost work
>    from innocent applications.
>
> b) Ignore all OpenGL calls from non-robust apps. That means that
>    applications would still be alive, but the user interface would be
>    frozen. The user would need to close it manually anyway, but in some
>    corner cases, the app could autosave some work or the user might be
>    able to interact with it using some alternative method (command line?).
>
> c) Kill just the affected non-robust applications. To do that, the
>    driver needs to be 100% sure of the impact of its resets.
>
> RadeonSI currently implements a), as can be seen at [4], while Iris
> implements what I think is c) [5].
>
> From the user experience point of view, c) is clearly the best option,
> but it's the hardest to achieve. There's not much gain in having b) over
> a); perhaps it could be an optional env var for such corner case
> applications.

I disagree with these conclusions.

c) is certainly better than a), but it's not "clearly the best" in all
cases. The OpenGL UMD is not a privileged/special component and is in no
position to decide whether the process as a whole (only some thread(s) of
which may use OpenGL at all) gets to continue running.


> [0] https://registry.khronos.org/OpenGL/extensions/ARB/ARB_robustness.txt
> [1] https://registry.khronos.org/OpenGL/extensions/KHR/KHR_robustness.txt
> [2] https://registry.khronos.org/OpenGL/extensions/EXT/EXT_robustness.txt
> [3] https://registry.khronos.org/OpenGL/specs/gl/glspec46.core.pdf
> [4] https://gitlab.freedesktop.org/mesa/mesa/-/blob/23.1/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c#L1657
> [5] https://gitlab.freedesktop.org/mesa/mesa/-/blob/23.1/src/gallium/drivers/iris/iris_batch.c#L842

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
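
For readers comparing options a), b) and c) from the quoted message, here is a rough toy model of what each policy would mean per application. It is only a sketch under stated assumptions: it is not Mesa or kernel code, and every name in it (Policy, App, after_reset, the outcome strings) is hypothetical and chosen for illustration. It also assumes that a robust app always notices a lost context and recreates it.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """The three driver options listed in the quoted message (names are hypothetical)."""
    KILL_ALL = "a"        # kill every non-robust app after a reset
    IGNORE_CALLS = "b"    # keep non-robust apps alive, ignore their OpenGL calls
    KILL_AFFECTED = "c"   # kill only non-robust apps whose context was lost


@dataclass(frozen=True)
class App:
    name: str
    robust: bool         # the app queries the reset status itself
    context_lost: bool   # the reset damaged this app's context


def after_reset(app: App, policy: Policy) -> str:
    """Return what happens to `app` once the GPU has been reset (toy model)."""
    if app.robust:
        # Assumption: a robust app sees a status other than NO_ERROR
        # and recreates its context; the driver policy does not apply.
        return "recreates context" if app.context_lost else "keeps running"
    if policy is Policy.KILL_ALL:
        return "killed"
    if policy is Policy.IGNORE_CALLS:
        return "alive, GL calls ignored"
    # Policy.KILL_AFFECTED: the driver must know the exact impact of the reset.
    return "killed" if app.context_lost else "keeps running"


if __name__ == "__main__":
    apps = [
        App("robust compositor", robust=True, context_lost=True),
        App("guilty game", robust=False, context_lost=True),
        App("innocent editor", robust=False, context_lost=False),
    ]
    for policy in Policy:
        for app in apps:
            print(f"{policy.value}) {app.name}: {after_reset(app, policy)}")
```

The model only differs between policies for non-robust apps; the "innocent editor" case is where a) and c) diverge, and neither outcome says anything about who should be allowed to make that decision for the whole process.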