Received: by 2002:ab2:6857:0:b0:1ef:ffd0:ce49 with SMTP id l23csp599922lqp; Thu, 21 Mar 2024 09:54:21 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCX26BItgUt8jjLvEa90/DBRBKhOKFFeUxouJiOI6DZ6gEPy4C88/Dkzk5ADyJZ9mtXP6Xelj3zUo0yqm0TWXcKjvQxkdq6UgTRHLByIKw== X-Google-Smtp-Source: AGHT+IFqUchVDukw4d/0xKRlMnXubYycEQz97NsR+Qx6K0hIH8gIL75aGwtRMdFPsFYpdGcxSHBi X-Received: by 2002:aa7:ccd0:0:b0:56b:9b23:40bc with SMTP id y16-20020aa7ccd0000000b0056b9b2340bcmr5052517edt.15.1711040061514; Thu, 21 Mar 2024 09:54:21 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711040061; cv=pass; d=google.com; s=arc-20160816; b=FnckIo+aqt1+r04VR3Yd+JcnUNdo+trfjxFJPMch3MyQwDW7pergKTGKGUx7c+wxw7 184jEgp9kzkEaQW+pmgWZGI773GZvjhRQEzdiVg1R+Xxmp+uQd/18d19Dz+pFYfyyKSW TS6Kpvn+bFOL7YAo1e5NyA6vQozsFW67hOd9ihDSt/dwlHsplfD31+D7nuAa7EHj1xSa +RPEPXI3/MaiyStRk2W9wEK8H7S8mSJHxvFRdXD3KKj5V9+hQ75Ee5qioi+9F55HdwFq Na98+JV2tAplpJWtLGWk5utk0JgedkAOdTLdhhGu10tBnXU3gOteb+Z0tfq1jFASsSBX qlPA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=zG0bkifvwKsIQlrm90OwcFfslvJB5gnEYqOxXgRC5+Y=; fh=/Tvl+uDAQ0HiiOTkl8shMIh6FPVPcGwJkfACwg8QSrE=; b=n97UwDXH1GV9oUayIr+LUjAheduoDAr77iT90G0GLcK/BObjSfrsOXfFx8qPtuq/8S fg9g6qHaSecGBjLKnRFkzh+0Of/bKMw5SCCIy2A5+JkmaTM5eLugkOjuXB8p3fzKaoJT 6oq2QdWDyigZGQRg/JUCamYaQ5aM/aYma7JX0IAbtDEs9/kcCqpo0G8fMLCtrccjkOFb 02dXMNcLMgI1Y8+/mbt/JNS8uTfjrfEDvOsVP93/JYbdjIIB6dzk/QmUGsfUXT1ZGMSX ErT/r3s1y/IYsDaX39IBDHhhJqEL+Q/Vwf94xQkd4OtFoKrdk2X8PvOgrHDtT3bKXVzd eHPQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=mQYwfrRK; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-110399-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-110399-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id w9-20020a056402128900b0056bb698acccsi57385edv.430.2024.03.21.09.54.21 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Mar 2024 09:54:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-110399-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=mQYwfrRK; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-110399-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-110399-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 0F82D1F21454 for ; Thu, 21 Mar 2024 16:54:21 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id A5DFFE54D; Thu, 21 Mar 2024 16:40:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mQYwfrRK" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E91112B7E; Thu, 21 Mar 2024 16:40:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711039258; cv=none; b=L3XaVyLriFyUeE3dFEiTy4BRR73zwSZje6WCqamQtsfOkoekwzJowyEtoIPFwK1fixEWgrh/+K4qHW3Lk+qaGjpwUzZPV+b50M1ikxZwLJnOfTP8y4/6tVoSzZnGaEIVF4fH2Qqf2drr63M+YcxqUgf4k6HIinG2bFuoFNXmtGY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711039258; c=relaxed/simple; bh=Rg3Qyg/aCiHbBDT5o4Egzs5J5LChZshjXRXkzKdFYoY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=MPg85tYRU76/lmcUA5eqeJon8IwRua5/ew6Rf6xUBTfrH8ljD8OpK74dCMwFirlnFf6H18l1Q3VBWPH1Tcjw5D19FnuLlI1yndth8H+d3anDet5TNBDbx37EguHkzFI43u/do+v5BMAYBRgK2JfLHT0mPFSxHUdRnfN37aIvTd8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mQYwfrRK; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2F2E6C433C7; Thu, 21 Mar 2024 16:40:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1711039258; bh=Rg3Qyg/aCiHbBDT5o4Egzs5J5LChZshjXRXkzKdFYoY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=mQYwfrRKlmBpZtRh1sVLmwtfHGxXtsJlqNmo7dVaTfOapv2REcGyrzPgytUQaNsI1 cPcLN3qiahtFIPWyKZ9T0fAXrFHKwVQBW34wxyRIZ+hWmtkfcx16FKQdwwcBPgl5XE 2plqgCGYA+6yi+IWm9viO2oONOtI1aevMJiH9zsCSJwaztgUoMSmAzrloaPWlHyRTG t2mvRCCC5ujTAaLOxyYLvyIqrgvJogGFKxW2a8frOJwoaHeD4Ge0G/UPjzE/2tH/z4 w1GL3bt+o9ASG47/cPbHsBesEatxUqiOdqij4KhEp2AIHI/lE+jKwOWiQqeuZAjYFE tzRX8goZyCciQ== Received: from johan by xi.lan with local (Exim 4.97.1) (envelope-from ) id 1rnLTV-0000000082A-2Ev6; Thu, 21 Mar 2024 17:41:06 +0100 Date: Thu, 21 Mar 2024 17:41:05 +0100 From: Johan Hovold To: Abhinav Kumar Cc: freedreno@lists.freedesktop.org, Rob Clark , Dmitry Baryshkov , Sean Paul , Marijn Suijten , David Airlie , Daniel Vetter , Kuogee Hsieh , dri-devel@lists.freedesktop.org, swboyd@chromium.org, quic_jesszhan@quicinc.com, quic_parellan@quicinc.com, quic_bjorande@quicinc.com, Rob Clark , linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] drm/msm/dp: move link_ready out of HPD event thread Message-ID: References: <8e125a99-543d-8328-a2a9-100e223e4faf@quicinc.com> <9313aa00-41f0-15af-a646-3f4e4b3098c7@quicinc.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Mon, Mar 18, 2024 at 11:01:25AM -0700, Abhinav Kumar wrote: > On 3/15/2024 8:57 AM, Johan Hovold wrote: > > On Thu, Mar 14, 2024 at 09:30:57AM -0700, Abhinav Kumar wrote: > >> The race condition is between the time we get disconnect event and set > >> link_ready to false, a commit can come in. Because setting link_ready to > >> false happens in the event thread so it could be slightly delayed. > > > > I get this part, just not why, or rather when, that becomes a problem. > > > > Once the disconnect event is processed, dp_hpd_unplug_handle() will > > update the state to ST_DISCONNECT_PENDING, and queue a notification > > event. link_ready is (before this patch) still set to 1. > This is the case I am thinking of: > > 1) Disconnect event happens which will call dp_hpd_unplug_handle() but > link_ready is not false yet. > > 2) There is a commit with a modeset, which shall trigger > atomic_disable() followed by an atomic_enable() > > atomic_disable() will go through disable clocks and set hpd_state to > ST_DISCONNECTED. > > 3) atomic_enable() will not go through because we will bail out because > state was ST_DISCONNECTED. > > if (state != ST_DISPLAY_OFF && state != ST_MAINLINK_READY) { > mutex_unlock(&dp_display->event_mutex); > return; > } > > 4) Now, if there is another commit with a modeset, it will go and crash > at atomic_disable() Right, that's what I described in the mail you replied to but that still doesn't answer what triggers those mode sets. > > Here a commit comes in; what exactly are you suggesting would trigger > > that? And in such a way that it breaks the state machine? > Like we have seen, the commit can either come directly from userspace as > one last frame (the original bug I had given the link to) or from the > __drm_fb_helper_restore_fbdev_mode_unlocked() which happened in > sc8280xp's case. This is totally independent of the hpd_thread() with no > mutual exclusion. Right . Still not sure about the details about "that last frame" issue, that you saw in the past, and if that's still an issue or not. You claimed that you had fixed that, right? > This commit() can come before the link_ready was set to false. If it had > come after link_ready was set to false, atomic_check() would have failed > and no issue would have been seen. > > My change is making the link_ready false sooner in the disconnect case. Yes, but again, and as you have confirmed, you're only papering over the issue at such a mode set can still come in before you set link_state to false. > > One way this could cause trouble is if you end up with a call to > > dp_bridge_atomic_post_disable() which updates the state to > > ST_DISCONNECTED. (1) > > > > This would then need to be followed by another call to > > dp_bridge_atomic_enable() which bails out early with the link clock > > disabled. (2) (And if link_ready were to be set to 0 sooner, the > > likelihood of this is reduced.) > > > > This in turn, would trigger a reset when dp_bridge_atomic_disable() is > > later called. > Yes, this is exactly what I have written above. Thanks for confirming. > > This is the kind of description of the race I expect to see in the > > commit message, and I'm still not sure what would trigger the call to > > dp_bridge_atomic_post_disable() and dp_bridge_atomic_enable() (i.e. (1) > > and (2) above) and whether this is a real issue or not. > > I have explained what triggers the disable/enable call below. > > > Also note that the above scenario is quite different from the one I've > > hit and described earlier. > Why is that so? Eventually it will also translate to the same scenario. > I would like to understand why this is different. I think in your case, > probably we do not know what triggers the modeset, but its a minor > detail like I have written before. The state transitions are different and the enable event comes in before the bridge has been fully tore down unlike in the scenario we outlined above. And it's certainly not a minor detail, as in the sc8280xp VT case, those spurious hotplug events that trigger the atomic_enable would not have caused any trouble if it wasn't for the case that the bridge was stuck in the ST_MAINLINK_READY state. That explains why the hotplug notification revert in rc7 made a difference on sc8280xp. You're talking about an entirely different and, as far as I can tell, hypothetical scenario where are user executes a modeset while pulling the plug. This is certainly not why we had a number of user suddenly starting to hit this crash after they upgraded to 6.8-rc1. And, just to be clear, we know what triggers the modeset in the VT case, and I posted a detailed explanation with a strack trace here: https://lore.kernel.org/lkml/Ze8Ke_M2xHyPYCu-@hovoldconsulting.com/ > >> It will be hard to reproduce this. Only way I can think of is to delay > >> the EV_NOTIFICATION for sometime and see in dp_bridge_hpd_notify() > >> > >> else if (dp_display->link_ready && status == > >> connector_status_disconnected) > >> dp_add_event(dp, EV_HPD_UNPLUG_INT, 0, 0); > >> > >> as dp_add_event() will add the event, then wakeup the event_q. > > > > Sure that would increase the race window with the current code, but that > > alone isn't enough to trigger the bug AFAICT. > > > >> Before the event thread wakes up and processes this unplug event, the > >> commit can come in. This is the race condition i was thinking of. > > > > Yes, but what triggers the commit? And why would it lead to a mode set > > that disables the bridge? > Commit was triggered from the userspace as it did not process the > disconnect event on time and the userspace was triggering a couple of > modesets by by changing the mode on the CRTC from 1080P to NONE to 1080P. > > [drm:drm_atomic_helper_check_modeset] [CRTC:60:crtc-1] mode changed But *why* would user space do that? Pushing out another frame would generally not trigger a modeset, right? And as I've alluded to repeatedly, your patch only seems concerned with something like the above, where a hypothetical user space is triggering modesets after receiving a notification. And we know that that is not relevant for the crashes I've seen as there is no user space processing any events in my VT or X setup. Johan