Date: Thu, 19 May 2022 14:17:46 -0400
From: Demi Marie Obenour
To: Juergen Gross, Xen developer discussion
Cc: Boris Ostrovski, Marek Marczykowski-Górecki,
    linux-kernel@vger.kernel.org, Jani Nikula, Joonas Lahtinen,
    Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
    Intel Graphics Development, DRI Development
Subject: Re: Hang in 5.17.4+ that appears to be due to Xen
References: <55436ae1-8255-1898-00df-51261080cd41@suse.com>
In-Reply-To: <55436ae1-8255-1898-00df-51261080cd41@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 16, 2022 at 08:48:17AM +0200, Juergen Gross wrote:
> On 14.05.22 17:55, Demi Marie Obenour wrote:
> > In https://github.com/QubesOS/qubes-issues/issues/7481, a user reported
> > that Xorg locked up when resizing a VM window. While I do not have the
> > same hardware the user does and thus cannot reproduce the bug, the stack
> > trace seems to indicate a deadlock between xen_gntdev and i915.
> > It appears that gnttab_unmap_refs_sync() is waiting for i915 to free the
> > pages, while i915 is waiting for the MMU notifier that called
> > gnttab_unmap_refs_sync() to return. Result: deadlock.
> >
> > The problem appears to be that a mapped grant in PV mode will stay in
> > the “invalidating” state until it is freed. While MMU notifiers are
> > allowed to sleep, it appears that they cannot wait for the page to be
> > freed, as is happening here. That said, I am not very familiar with
> > this code, so my diagnosis might be incorrect.
>
> All I can say for now is that your patch seems to be introducing a use after
> free issue, as the parameters of the delayed work might get freed now before
> the delayed work is being executed.
>
> I don't know why this is happening only with rather recent kernels, as the
> last gntdev changes in this area have been made in kernel 4.13.
>
> I'd suggest to look at i915, as quite some work has happened in the code
> visible in your stack backtraces rather recently. Maybe it would be possible
> to free the pages in i915 before calling the MMU notifier?

Honestly, I would rather fix this in gntdev, regardless of where the actual
bug lies. GPU drivers get little testing under Xen, so if something like
this happens again, it is likely to remain undiscovered until a Qubes user
files a bug report. This results in a bad experience for Qubes users. I
would much rather have code that works on bare metal also work in Xen dom0.
I have had random hangs in the past (with various kernel versions) that
might be due to similar problems.

Furthermore, similar problems can arise whenever a driver removes an MMU
notifier on userspace pages that someone else has references to. It is hard
for me to see how this is the fault of the driver that removed the MMU
notifier.
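
The lifetime concern raised in the quoted reply (parameters of delayed work
being freed before the work runs) can be illustrated with a minimal
userspace sketch in C. This is not gntdev code and does not use any kernel
API: struct unmap_params, schedule_release(), run_pending_work() and
total_released() are hypothetical names, and a simple single-threaded queue
stands in for the kernel's delayed work machinery. The only point it makes is
that the work item owns a heap copy of its parameters and frees that copy
itself, after the callback has finished.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from gntdev or the kernel. */
struct unmap_params {
    int first_ref;  /* illustrative payload only */
    int count;      /* number of pages this work item should release */
};

struct deferred_work {
    void (*fn)(struct deferred_work *w);
    struct unmap_params *params;   /* heap copy, owned by the work item */
    struct deferred_work *next;
};

/* A trivial FIFO standing in for a real work queue. */
static struct deferred_work *queue_head;
static struct deferred_work **queue_tail = &queue_head;
static int pages_released;

/* The deferred callback: it only touches the work item's own copy. */
void release_pages(struct deferred_work *w)
{
    pages_released += w->params->count;
}

/*
 * Queue deferred work. The caller's parameters are copied to the heap, so
 * the caller may reuse or free its own copy as soon as this returns.
 */
int schedule_release(const struct unmap_params *caller_params)
{
    struct deferred_work *w = malloc(sizeof(*w));

    if (!w)
        return -1;
    w->params = malloc(sizeof(*w->params));
    if (!w->params) {
        free(w);
        return -1;
    }
    memcpy(w->params, caller_params, sizeof(*w->params));
    w->fn = release_pages;
    w->next = NULL;
    *queue_tail = w;
    queue_tail = &w->next;
    return 0;
}

/* Run everything queued; each item is freed only after its callback returns. */
int run_pending_work(void)
{
    int ran = 0;

    while (queue_head) {
        struct deferred_work *w = queue_head;

        queue_head = w->next;
        if (!queue_head)
            queue_tail = &queue_head;
        w->fn(w);
        free(w->params);
        free(w);
        ran++;
    }
    return ran;
}

int total_released(void)
{
    return pages_released;
}
```

In this sketch a caller can fill a struct unmap_params on its stack, pass it
to schedule_release(), and then overwrite or drop it; run_pending_work()
still sees the values that were current at scheduling time. Real kernel code
would additionally have to deal with locking, cancellation, and allocation
context, none of which is modelled here.
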
I find it much more plausible that the correct fix is on the Xen side:
allocate the delayed work parameters on the heap, and free them after the
work is finished. This eliminates this entire class of bugs.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab