Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp17980526rwd; Tue, 27 Jun 2023 09:56:11 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5wNtoepmpQzjAi0iJtWgq1j9BEAyxnxkQ5IQcLE13W914dKPp87U/qXaaWHF29s/Yd5KiD X-Received: by 2002:a17:90a:ea13:b0:263:131d:a6e9 with SMTP id w19-20020a17090aea1300b00263131da6e9mr2275567pjy.19.1687884971039; Tue, 27 Jun 2023 09:56:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687884971; cv=none; d=google.com; s=arc-20160816; b=iDwMr5O1zni8/XjlIU3AU15GnbSI/Zof7hSZybgM0GALUkprOSC7pB42NmpRiVX1tm X1SmIfgUKPNckIbyjh9ur26+/XCaQIvjy9EhXMKd2KxV2HtRUDMWcwUW/bqsxEkjuzEB +g614oCL2C7iNPEDVjkOHIuw6i5LsfDetkzIMs7M4kIOf/kVBnAZJUhYyIhQi3C28XOW LffLwi5ZC6ZldrHpzrNv+Ae2z6RYhUhargn7T43tRvOJNzQ28sJg7u0n0avmf6ujY1qC xuWHgq9+spbw059O1kn+tE269Ba882ty0YluHry6NIohdC+Vhy4yMZgL79ybfhGINONt 2zXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=mrpMVPePqZcp/TARbg/0wmIdMAyOFZ9lEopNh2bXlGI=; fh=nYqqj/aU81vsIIPq5/PF8Cn6DdzDhGkfc18wLTMuwaY=; b=sizfbguJVTzvXJQyzBdxOopnyIIOGwZkQIBw/r1IYPtq8QY3UADYjW13hvQGPQDs0s QuXIFfdAdT4pIr0BzzvZvAH/KtacLwXon6i+2nol1qX5u/givSpJaYacbE2EbgvOhU3f Qa9coYikjIarubwm3iuDvzXQ0mRNrVFduePtt+F8oKqhicLtlvH1/py0/1priK2tv6aq 1O8Tup134629/AVjMqqwRj/QSJqI3QvgY1SxgX+ZKXv2sLl+QuuarOKmlfRJX3PmKhwV sfdgdeguBO785MsphWyCD8E3vLRIW788v0J8XRGp7nKNFawspIXkZdt7XL/iv8zRcSz4 DSDw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=IVmklvoT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id on16-20020a17090b1d1000b00250a2c9a793si8126019pjb.152.2023.06.27.09.55.56; Tue, 27 Jun 2023 09:56:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=bombadil.20210309 header.b=IVmklvoT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231766AbjF0QJR (ORCPT + 99 others); Tue, 27 Jun 2023 12:09:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230401AbjF0QJQ (ORCPT ); Tue, 27 Jun 2023 12:09:16 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1834D3590 for ; Tue, 27 Jun 2023 09:09:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: Content-Type:In-Reply-To:From:References:Cc:To:Subject:MIME-Version:Date: Message-ID:Sender:Reply-To:Content-ID:Content-Description; bh=mrpMVPePqZcp/TARbg/0wmIdMAyOFZ9lEopNh2bXlGI=; b=IVmklvoTr61w2Y5JHaAluqrrkB QwielsATQOForcre5ApJ9MjIbkF8fEFy/wGOG5WKWKvMU/lKVfOBu4NlEZ2iqPT5uV0eBbi5Yt80Q 5CZ4Tm9aJCdXIx9Rwa9o6bc86Okrc+tMhsr2vY35p3QeVwJJJyrkWy1IjE+hnwrXFrKANLxjiSqF4 QBYV3cnNUvN8HliTm0RlH4XHR028lTdYvGV2al0Ju3+ctFHtngCtSkjPM1zlylzCajOm42iR/4Znq DMBUpYOOBapxI87fWPI2/ETyetfW6hkI1pBktjX8FXUuqCocpjiZexjEjrdxdCZW96K4Kp9AAQCaO WURxOjjg==; Received: from [2601:1c2:980:9ec0::2764] by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1qEBFW-00DbMZ-1S; Tue, 27 Jun 2023 16:09:02 +0000 Message-ID: Date: Tue, 27 Jun 2023 09:09:01 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.2 Subject: Re: [PATCH v5 1/1] drm/doc: Document DRM device reset expectations Content-Language: en-US To: =?UTF-8?Q?Andr=c3=a9_Almeida?= , dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, Simon Ser , Rob Clark , Pekka Paalanen , Daniel Vetter , Daniel Stone , =?UTF-8?B?J01hcmVrIE9sxaHDoWsn?= , Dave Airlie , =?UTF-8?Q?Michel_D=c3=a4nzer?= , Samuel Pitoiset , =?UTF-8?Q?Timur_Krist=c3=b3f?= , Bas Nieuwenhuizen , Pekka Paalanen References: <20230627132323.115440-1-andrealmeid@igalia.com> From: Randy Dunlap In-Reply-To: <20230627132323.115440-1-andrealmeid@igalia.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi André, I have just a few more below: On 6/27/23 06:23, André Almeida wrote: > Create a section that specifies how to deal with DRM device resets for > kernel and userspace drivers. > > Acked-by: Pekka Paalanen > Signed-off-by: André Almeida > --- > > v4: https://lore.kernel.org/lkml/20230626183347.55118-1-andrealmeid@igalia.com/ > > Changes: > - Grammar fixes (Randy) > > Documentation/gpu/drm-uapi.rst | 68 ++++++++++++++++++++++++++++++++++ > 1 file changed, 68 insertions(+) > > diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst > index 65fb3036a580..3cbffa25ed93 100644 > --- a/Documentation/gpu/drm-uapi.rst > +++ b/Documentation/gpu/drm-uapi.rst > @@ -285,6 +285,74 @@ for GPU1 and GPU2 from different vendors, and a third handler for > mmapped regular files. Threads cause additional pain with signal > handling as well. > > +Device reset > +============ > + > +The GPU stack is really complex and is prone to errors, from hardware bugs, > +faulty applications and everything in between the many layers. Some errors > +require resetting the device in order to make the device usable again. This > +sections describes the expectations for DRM and usermode drivers when a section > +device resets and how to propagate the reset status. > + > +Kernel Mode Driver > +------------------ > + > +The KMD is responsible for checking if the device needs a reset, and to perform > +it as needed. Usually a hang is detected when a job gets stuck executing. KMD > +should keep track of resets, because userspace can query any time about the > +reset stats for an specific context. This is needed to propagate to the rest of for a specific stats or status? > +the stack that a reset has happened. Currently, this is implemented by each > +driver separately, with no common DRM interface. > + > +User Mode Driver > +---------------- > + > +The UMD should check before submitting new commands to the KMD if the device has > +been reset, and this can be checked more often if the UMD requires it. After > +detecting a reset, UMD will then proceed to report it to the application using > +the appropriate API error code, as explained in the section below about > +robustness. > + > +Robustness > +---------- > + > +The only way to try to keep an application working after a reset is if it > +complies with the robustness aspects of the graphical API that it is using. > + > +Graphical APIs provide ways to applications to deal with device resets. However, > +there is no guarantee that the app will use such features correctly, and the > +UMD can implement policies to close the app if it is a repeating offender, > +likely in a broken loop. This is done to ensure that it does not keep blocking > +the user interface from being correctly displayed. This should be done even if > +the app is correct but happens to trigger some bug in the hardware/driver. > + > +OpenGL > +~~~~~~ > + > +Apps using OpenGL should use the available robust interfaces, like the > +extension ``GL_ARB_robustness`` (or ``GL_EXT_robustness`` for OpenGL ES). This > +interface tells if a reset has happened, and if so, all the context state is > +considered lost and the app proceeds by creating new ones. If it is possible to > +determine that robustness is not in use, the UMD will terminate the app when a > +reset is detected, giving that the contexts are lost and the app won't be able > +to figure this out and recreate the contexts. > + > +Vulkan > +~~~~~~ > + > +Apps using Vulkan should check for ``VK_ERROR_DEVICE_LOST`` for submissions. > +This error code means, among other things, that a device reset has happened and > +it needs to recreate the contexts to keep going. > + > +Reporting causes of resets > +-------------------------- > + > +Apart from propagating the reset through the stack so apps can recover, it's > +really useful for driver developers to learn more about what caused the reset in > +first place. DRM devices should make use of devcoredump to store relevant the first place. > +information about the reset, so this information can be added to user bug > +reports. > + > .. _drm_driver_ioctl: > > IOCTL Support on Device Nodes and with those addressed: Reviewed-by: Randy Dunlap Thanks for adding the documentation. -- ~Randy