Date: Mon, 28 Nov 2022 11:27:55 +0200
From: Pekka Paalanen
To: André Almeida
Cc: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, kernel-dev@igalia.com,
	alexander.deucher@amd.com, contactshashanksharma@gmail.com,
	amaranath.somalapuram@amd.com, christian.koenig@amd.com,
	pierre-eric.pelloux-prayer@amd.com, Simon Ser, Rob Clark,
	Andrey Grodzovsky, Daniel Vetter, Daniel Stone, 'Marek Olšák',
	Dave Airlie, "Pierre-Loup A . Griffais", Shashank Sharma
Subject: Re: [PATCH v3 1/2] drm: Add GPU reset sysfs event
Message-ID: <20221128112755.52df3f6b@eldfell>
In-Reply-To: <20221125175203.52481-2-andrealmeid@igalia.com>
References: <20221125175203.52481-1-andrealmeid@igalia.com>
	<20221125175203.52481-2-andrealmeid@igalia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 25 Nov 2022 14:52:02 -0300
André Almeida wrote:

> From: Shashank Sharma
> 
> Add a sysfs event to notify userspace about GPU resets providing:
> - PID that triggered the GPU reset, if any. Resets can happen from
>   kernel threads as well, in that case no PID is provided
> - Information about the reset (e.g. was VRAM lost?)
> 
> Co-developed-by: André Almeida
> Signed-off-by: André Almeida
> Signed-off-by: Shashank Sharma
> ---
> 
> V3:
>   - Reduce information to just PID and flags
>   - Use pid pointer instead of just pid number
>   - BUG() if no reset info is provided
> 
> V2:
>   - Addressed review comments from Christian and Amar
>   - move the reset information structure to DRM layer
>   - drop _ctx from struct name
>   - make pid 32 bit(than 64)
>   - set flag when VRAM invalid (than valid)
>   - add process name as well (Amar)
> ---
>  drivers/gpu/drm/drm_sysfs.c | 26 ++++++++++++++++++++++++++
>  include/drm/drm_sysfs.h     | 13 +++++++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 430e00b16eec..85777abf4194 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -409,6 +409,32 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>  
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> + * @dev: DRM device
> + * @reset_info: The contextual information about the reset (like PID, flags)
> + *
> + * Send a uevent for the DRM device specified by @dev. This informs
> + * user that a GPU reset has occurred, so that an interested client
> + * can take any recovery or profiling measure.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event_info *reset_info)
> +{
> +	unsigned char pid_str[13];

Hi,

"PID=4111222333\0" I count 15 bytes instead of 13?

> +	unsigned char flags_str[18];
> +	unsigned char reset_str[] = "RESET=1";
> +	char *envp[] = { reset_str, pid_str, flags_str, NULL };

It seems you always send PID attribute, even if it's nonsense (I guess
zero). Should sending nonsense be avoided?

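To make the byte counts concrete, here is a rough userspace sketch of
the sizing and of leaving PID out when there is none. It is not the
patch's code and not kernel code. The helper name build_reset_env() and
the convention that pid == 0 means "no PID" are assumptions made only
for illustration. By the same count, "FLAGS=0x" plus up to 16 hex digits
plus the NUL needs 25 bytes for a 64-bit value, so 18 also looks short.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Worst-case string sizes, including the terminating NUL. */
#define PID_STR_MAX   sizeof("PID=4294967295")           /* 4 + 10 + 1 = 15 */
#define FLAGS_STR_MAX sizeof("FLAGS=0xffffffffffffffff") /* 8 + 16 + 1 = 25 */

/*
 * Hypothetical helper (illustration only): fill envp with RESET=1,
 * an optional PID=<n>, and FLAGS=0x<hex>, then NULL-terminate it.
 * Assumption: pid == 0 means "no PID available", so the attribute is
 * omitted instead of sending a meaningless value.
 * Returns the number of entries written, not counting the NULL.
 */
static size_t build_reset_env(uint32_t pid, uint64_t flags,
			      char pid_str[PID_STR_MAX],
			      char flags_str[FLAGS_STR_MAX],
			      char *envp[4])
{
	static char reset_str[] = "RESET=1";
	size_t n = 0;
	int len;

	envp[n++] = reset_str;

	if (pid != 0) {
		len = snprintf(pid_str, PID_STR_MAX, "PID=%" PRIu32, pid);
		/* snprintf reports the untruncated length; check it fits. */
		assert(len > 0 && (size_t)len < PID_STR_MAX);
		envp[n++] = pid_str;
	}

	len = snprintf(flags_str, FLAGS_STR_MAX, "FLAGS=0x%" PRIx64, flags);
	assert(len > 0 && (size_t)len < FLAGS_STR_MAX);
	envp[n++] = flags_str;

	envp[n] = NULL;
	return n;
}
```

Deriving the sizes from sizeof() on the worst-case literal keeps the
count next to the format it belongs to, which might be easier to review
than a bare 13 or 18.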
> +
> +	DRM_DEBUG("generating reset event\n");
> +
> +	BUG_ON(!reset_info);
> +
> +	snprintf(pid_str, sizeof(pid_str), "PID=%u", pid_vnr(reset_info->pid));

Passing PID by number is racy, but I suppose it has two rationales:
- there is no way to pass a pidfd?
- it's safe enough because the kernel avoids aggressive PID re-use?

Maybe this would be good to note in the commit message to justify the
design.

What about pid namespaces, are they handled by pid_vnr() auto-magically
somehow? Does it mean that the daemon handling these events *must not*
be running under a (non-root?) pid namespace to work at all?

E.g. if you have a container that is given a dedicated GPU device, I
guess it might be reasonable to want to run the daemon inside that
container as well. I have no idea how pid namespaces work, but I recall
hearing they are a thing.

> +	snprintf(flags_str, sizeof(flags_str), "FLAGS=0x%llx", reset_info->flags);
> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_reset_event);
> +
>  /**
>   * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
>   * change
> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> index 6273cac44e47..dbb0ac6230b8 100644
> --- a/include/drm/drm_sysfs.h
> +++ b/include/drm/drm_sysfs.h
> @@ -2,15 +2,28 @@
>  #ifndef _DRM_SYSFS_H_
>  #define _DRM_SYSFS_H_
>  
> +#define DRM_RESET_EVENT_VRAM_LOST (1 << 0)

Since flags are UAPI, they should be documented somewhere in UAPI docs.

Shouldn't this whole event be documented somewhere in UAPI docs to say
what it means and what attributes it may have and what they mean?


Thanks,
pq

> +
>  struct drm_device;
>  struct device;
>  struct drm_connector;
>  struct drm_property;
>  
> +/**
> + * struct drm_reset_event_info - Information about a GPU reset event
> + * @pid: Process that triggered the reset, if any
> + * @flags: Extra information around the reset event (e.g. is VRAM lost?)
> + */
> +struct drm_reset_event_info {
> +	struct pid *pid;
> +	uint64_t flags;
> +};
> +
>  int drm_class_device_register(struct device *dev);
>  void drm_class_device_unregister(struct device *dev);
>  
>  void drm_sysfs_hotplug_event(struct drm_device *dev);
> +void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event_info *reset_info);
>  void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
>  void drm_sysfs_connector_status_event(struct drm_connector *connector,
>  				      struct drm_property *property);