Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp116655rwe; Fri, 26 Aug 2022 01:38:50 -0700 (PDT) X-Google-Smtp-Source: AA6agR5cVYjA4DCrzs0sa1wUItaIgXnkjNLX2XP7Fx/Nc+VBA+wU+wyKt3fSGJZJXrsGH5d8PVqR X-Received: by 2002:a63:ad0d:0:b0:41d:433:8c42 with SMTP id g13-20020a63ad0d000000b0041d04338c42mr2422556pgf.105.1661503129707; Fri, 26 Aug 2022 01:38:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661503129; cv=none; d=google.com; s=arc-20160816; b=cZd/bxUX/IDFXA+Hkq+RYjKib1wBuwjffszU3EJiq74Gfafk7YORyFFMS/uG7ZDV9+ sYf0X4sjVkYvJi+XNExMdMPP3OuFGvgIAvzifknouRdfxwdcDvk8Opbzqn97965FJ8to oikvk7jm97SVE18j+uH5dSv//FoQi9eolrwvJFoxLUaYrBUb9U0Cq9hZ6A0SC7cRyR09 q+vSCWgCm/kwU03FKzSC3y/KXT+PZL7KDvqcIQoy9gQ6CwFljixP7Iin8KltxF5UP/jH gJHi7UvoeZ+UiOlVOZseJmfmErD6uXEiBPo9t/Jntae7ZL+v61VNEw0tgmf4v68Qom5U Y0dg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=NwyaCFfEMgdgvoY41Arzh1xx+CoAI+pbpsz9rR9RZv0=; b=FiGvY4kh5BiyvM/b1dJQwiXbfgXEzUL2s/kMEXN45WyokEZYRMw7wzmEsdn0YYK6fE w3TAtWLrom5xuWFhUyLjB8iD2D7Br7zJqvNwI9hleMfezbVWijR8SoixKX91qcnRtL9w +d7qsRtT87NL32SoR/OSuyPDuSsJlzcr9LbhrFhjg6T/SHkdtsRJWd7gzwINlcFWo3rP PU9fyQc+iz4w1hmRwk5ztpJH07CdWFX/s6U7/ShyG4H8p0BSrs3cHZeubVD0s6vjNXcv Ebg9AgM372Cka1KsBQjZ7AsGdEmeEvhzAgTZhtLMjzfS+u3PJ5BElZ/JnfBTi8EQNetN doRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TmzUhFNg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a64-20020a639043000000b0041610ee2f17si1190072pge.261.2022.08.26.01.38.38; Fri, 26 Aug 2022 01:38:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=TmzUhFNg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230207AbiHZIT6 (ORCPT + 99 others); Fri, 26 Aug 2022 04:19:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230109AbiHZITy (ORCPT ); Fri, 26 Aug 2022 04:19:54 -0400 Received: from mail-lj1-x231.google.com (mail-lj1-x231.google.com [IPv6:2a00:1450:4864:20::231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 800C3B98 for ; Fri, 26 Aug 2022 01:19:52 -0700 (PDT) Received: by mail-lj1-x231.google.com with SMTP id bn9so878940ljb.6 for ; Fri, 26 Aug 2022 01:19:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc; bh=NwyaCFfEMgdgvoY41Arzh1xx+CoAI+pbpsz9rR9RZv0=; b=TmzUhFNgBDMPKhvHU8R2w4J91J4zP7ynPvouc6XI4eH8gywVjkg51E4cEFM1TgDXgK 6++5o9rDdD0ZM0GANQ/GhvYbqFMXElfJOLoRsFvyGVnZgquXBcjeOmM5CtUxMyspUYqY G+2VViBc3Q67+lHYT//yYuZL7y1cDDngIJS/NitYCEDribRHwd8fUgS/xOoKnDifFwdn Vno6osTu85V7/JAFofswxhCMEL3EnebKtkvfWFBjyuS4ET2TkuNL09BhyuIdpaNfyT5o 9q5/LW2P270XGTVrzhEP87NVMhaT7FSbkc7QIyCi8N854B1Uqcft5Ssd36tAhNoHBH1q qBVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc; bh=NwyaCFfEMgdgvoY41Arzh1xx+CoAI+pbpsz9rR9RZv0=; b=CNbGN2xmpcm395aD4tjpn8EOzuO+QVbiarUVYGH3lumD5jECpWhoyHnXWaE0oxkjjr clRCyD+y45zIpSZo/k181+4Erj1LwV6cigUAzrZWccbfkmXg4A9HJFxleE3+2qwCS1h9 XJ8aDcqaGaxQK6pLDoWqOluK+2y2ytcbeH408ogcY+qT508wVDFN022D0kBoutON03/J iakVXVjtO4IKxqhzpM44LQGCoolYNMqfHhlYzendI7K8gDI1YHpuG0e3Z1PkoCNaoMbw kGk3HnO0j43UhfSd0pC1wWQO4JKq2ha76v0Bi5AG0w9kduh0L9iTSn3N2SaNFYZcytez ZAtg== X-Gm-Message-State: ACgBeo3CLfaPDHfSAjakksXAQ3CCWqBzQT1jn3kJ4xfGYxdqZb3AzRDU v2fcCU5xD9sJNve5/dlVwxJowkEY8h2X7g== X-Received: by 2002:a05:651c:238b:b0:261:ca71:5945 with SMTP id bk11-20020a05651c238b00b00261ca715945mr2017204ljb.253.1661501990732; Fri, 26 Aug 2022 01:19:50 -0700 (PDT) Received: from [192.168.1.211] ([37.153.55.125]) by smtp.gmail.com with ESMTPSA id a13-20020a19ca0d000000b0048b045a5c3asm293092lfg.201.2022.08.26.01.19.49 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 26 Aug 2022 01:19:50 -0700 (PDT) Message-ID: <1f31afce-6807-5d33-15e2-78bc42894764@linaro.org> Date: Fri, 26 Aug 2022 11:19:49 +0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.1.2 Subject: Re: [PATCH] drm/msm/dp: add atomic_check to bridge ops Content-Language: en-GB To: Abhinav Kumar Cc: Kuogee Hsieh , robdclark@gmail.com, sean@poorly.run, swboyd@chromium.org, dianders@chromium.org, vkoul@kernel.org, daniel@ffwll.ch, airlied@linux.ie, agross@kernel.org, bjorn.andersson@linaro.org, quic_sbillaka@quicinc.com, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, quic_aravindh@quicinc.com, freedreno@lists.freedesktop.org References: <1660759314-28088-1-git-send-email-quic_khsieh@quicinc.com> <266c0531-344e-5589-2143-02ab1fe9b276@linaro.org> <724d695d-0293-db81-7014-57cb96bd6d4b@quicinc.com> <13509c06-cf2b-e37b-d8ec-b5cc5370f566@quicinc.com> <16f2c33f-c91e-8b4a-f67a-81f13adb2eac@quicinc.com> <35384419-0801-dabf-3481-1b5717bc11e9@quicinc.com> From: Dmitry Baryshkov In-Reply-To: <35384419-0801-dabf-3481-1b5717bc11e9@quicinc.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 24/08/2022 22:16, Abhinav Kumar wrote: > > > On 8/24/2022 1:25 AM, Dmitry Baryshkov wrote: >> On Wed, 24 Aug 2022 at 01:59, Abhinav Kumar >> wrote: >>> >>> >>> >>> On 8/23/2022 3:41 PM, Dmitry Baryshkov wrote: >>>> On Wed, 24 Aug 2022 at 01:07, Abhinav Kumar >>>> wrote: >>>>> On 8/22/2022 11:33 AM, Dmitry Baryshkov wrote: >>>>>> On 22/08/2022 20:32, Abhinav Kumar wrote: >>>>>>> >>>>>>> >>>>>>> On 8/22/2022 9:49 AM, Dmitry Baryshkov wrote: >>>>>>>> On 22/08/2022 19:38, Abhinav Kumar wrote: >>>>>>>>> Hi Dmitry >>>>>>>>> >>>>>>>>> On 8/22/2022 9:18 AM, Dmitry Baryshkov wrote: >>>>>>>>>> On 17/08/2022 21:01, Kuogee Hsieh wrote: >>>>>>>>>>> DRM commit_tails() will disable downstream >>>>>>>>>>> crtc/encoder/bridge if >>>>>>>>>>> both disable crtc is required and crtc->active is set before >>>>>>>>>>> pushing >>>>>>>>>>> a new frame downstream. >>>>>>>>>>> >>>>>>>>>>> There is a rare case that user space display manager issue an >>>>>>>>>>> extra >>>>>>>>>>> screen update immediately followed by close DRM device while >>>>>>>>>>> down >>>>>>>>>>> stream display interface is disabled. This extra screen >>>>>>>>>>> update will >>>>>>>>>>> timeout due to the downstream interface is disabled but will >>>>>>>>>>> cause >>>>>>>>>>> crtc->active be set. Hence the followed commit_tails() called by >>>>>>>>>>> drm_release() will pass the disable downstream >>>>>>>>>>> crtc/encoder/bridge >>>>>>>>>>> conditions checking even downstream interface is disabled. >>>>>>>>>>> This cause the crash to happen at dp_bridge_disable() due to it >>>>>>>>>>> trying >>>>>>>>>>> to access the main link register to push the idle pattern out >>>>>>>>>>> while main >>>>>>>>>>> link clocks is disabled. >>>>>>>>>>> >>>>>>>>>>> This patch adds atomic_check to prevent the extra frame will not >>>>>>>>>>> be pushed down if display interface is down so that crtc->active >>>>>>>>>>> will not be set neither. This will fail the conditions checking >>>>>>>>>>> of disabling down stream crtc/encoder/bridge which prevent >>>>>>>>>>> drm_release() from calling dp_bridge_disable() so that crash >>>>>>>>>>> at dp_bridge_disable() prevented. >>>>>>>>>> >>>>>>>>>> I must admit I had troubles parsing this description. However >>>>>>>>>> if I >>>>>>>>>> got you right, I think the check that the main link clock is >>>>>>>>>> running in the dp_bridge_disable() or dp_ctrl_push_idle() >>>>>>>>>> would be >>>>>>>>>> a better fix. >>>>>>>>> >>>>>>>>> Originally, thats what was posted >>>>>>>>> https://patchwork.freedesktop.org/patch/496984/. >>>>>>>> >>>>>>>> This patch is also not so correct from my POV. It checks for the >>>>>>>> hpd >>>>>>>> status, while in reality it should check for main link clocks being >>>>>>>> enabled. >>>>>>>> >>>>>>> >>>>>>> We can push another fix to check for the clk state instead of the >>>>>>> hpd >>>>>>> status. But I must say we are again just masking something which the >>>>>>> fwk should have avoided isnt it? >>>>>>> >>>>>>> As per the doc in the include/drm/drm_bridge.h it says, >>>>>>> >>>>>>> "* >>>>>>>     * The bridge can assume that the display pipe (i.e. clocks >>>>>>> and timing >>>>>>>     * signals) feeding it is still running when this callback is >>>>>>> called. >>>>>>>     *" >>>>>> >>>>>> Yes, that's what I meant about this chunk begging to go to the >>>>>> core. In >>>>>> my opinion, if we are talking about the disconnected sinks, it is the >>>>>> framework who should disallow submitting the frames to the >>>>>> disconnected >>>>>> sinks. >>>>>> >>>>>>> >>>>>>> By adding an extra layers of protection in the driver, we are just >>>>>>> avoiding another issue but the commit should not have been issued in >>>>>>> the first place. >>>>>>> >>>>>>> So shouldnt we do both then? That is add protection to check if >>>>>>> clock >>>>>>> is ON and also, reject commits when display is disconnected. >>>>>>> >>>>>>>>> >>>>>>>>> Then it seemed like we were just protecting against an issue in >>>>>>>>> the >>>>>>>>> framework which was allowing the frames to be pushed even after >>>>>>>>> the >>>>>>>>> display was disconnected. The DP driver did send out the >>>>>>>>> disconnect >>>>>>>>> event correctly and as per the logs, this frame came down after >>>>>>>>> that >>>>>>>>> and the DRM fwk did allow it. >>>>>>>>> >>>>>>>>> So after discussing on IRC with Rob, we came up with this >>>>>>>>> approach that >>>>>>>>> if the display is not connected, then atomic_check should fail. >>>>>>>>> That >>>>>>>>> way the commit will not happen. >>>>>>>>> >>>>>>>>> Just seemed a bit cleaner instead of adding all our protections. >>>>>>>> >>>>>>>> The check to fail atomic_check if display is not connected seems >>>>>>>> out >>>>>>>> of place. In its current way it begs go to the upper layer, >>>>>>>> forbidding using disconnected sinks for all the drivers. There is >>>>>>>> nothing special in the MSM DP driver with respect to the HPD events >>>>>>>> processing and failing atomic_check() based on that. >>>>>>>> >>>>>>> >>>>>>> Why all the drivers? This is only for MSM DP bridge. >>>>>> >>>>>> Yes, we change the MSM DRM driver. But the check is generic >>>>>> enough. I'm >>>>>> not actually insisting on pushing the check to the core, just >>>>>> trying to >>>>>> understand the real cause here. >>>>>> >>>>>>> >>>>> >>>>> I actually wanted to push this to the core and thats what I had >>>>> originally asked on IRC because it does seem to be generic enough that >>>>> it should belong to the core but after discussion with Rob on >>>>> freedreno, >>>>> he felt this was a better approach because for some of the legacy >>>>> connectors like VGA, this need not belong to the DRM core, hence we >>>>> went >>>>> with this approach. >>>> >>>> It might be better to whitelist such connectors (S-VIDEO/composite >>>> comes to my mind rather than VGA). >>> >>> I am fine with that approach, if Rob is onboard with that. >>> >>>> >>>>>>>>>>> SError Interrupt on CPU7, code 0x00000000be000411 -- SError >>>>>>>>>>> CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19 >>>>>>>>>>> Hardware name: Google Lazor (rev3 - 8) (DT) >>>>>>>>>>> pstate: a04000c9 (NzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>>>>>>>>>> pc : __cmpxchg_case_acq_32+0x14/0x2c >>>>>>>>>>> lr : do_raw_spin_lock+0xa4/0xdc >>>>>>>>>>> sp : ffffffc01092b6a0 >>>>>>>>>>> x29: ffffffc01092b6a0 x28: 0000000000000028 x27: >>>>>>>>>>> 0000000000000038 >>>>>>>>>>> x26: 0000000000000004 x25: ffffffd2973dce48 x24: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> x23: 00000000ffffffff x22: 00000000ffffffff x21: >>>>>>>>>>> ffffffd2978d0008 >>>>>>>>>>> x20: ffffffd2978d0008 x19: ffffff80ff759fc0 x18: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> x17: 004800a501260460 x16: 0441043b04600438 x15: >>>>>>>>>>> 04380000089807d0 >>>>>>>>>>> x14: 07b0089807800780 x13: 0000000000000000 x12: >>>>>>>>>>> 0000000000000000 >>>>>>>>>>> x11: 0000000000000438 x10: 00000000000007d0 x9 : >>>>>>>>>>> ffffffd2973e09e4 >>>>>>>>>>> x8 : ffffff8092d53300 x7 : ffffff808902e8b8 x6 : >>>>>>>>>>> 0000000000000001 >>>>>>>>>>> x5 : ffffff808902e880 x4 : 0000000000000000 x3 : >>>>>>>>>>> ffffff80ff759fc0 >>>>>>>>>>> x2 : 0000000000000001 x1 : 0000000000000000 x0 : >>>>>>>>>>> ffffff80ff759fc0 >>>>>>>>>>> Kernel panic - not syncing: Asynchronous SError Interrupt >>>>>>>>>>> CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19 >>>>>>>>>>> Hardware name: Google Lazor (rev3 - 8) (DT) >>>>>>>>>>> Call trace: >>>>>>>>>>>     dump_backtrace.part.0+0xbc/0xe4 >>>>>>>>>>>     show_stack+0x24/0x70 >>>>>>>>>>>     dump_stack_lvl+0x68/0x84 >>>>>>>>>>>     dump_stack+0x18/0x34 >>>>>>>>>>>     panic+0x14c/0x32c >>>>>>>>>>>     nmi_panic+0x58/0x7c >>>>>>>>>>>     arm64_serror_panic+0x78/0x84 >>>>>>>>>>>     do_serror+0x40/0x64 >>>>>>>>>>>     el1h_64_error_handler+0x30/0x48 >>>>>>>>>>>     el1h_64_error+0x68/0x6c >>>>>>>>>>>     __cmpxchg_case_acq_32+0x14/0x2c >>>>>>>>>>>     _raw_spin_lock_irqsave+0x38/0x4c >>>>>> >>>>>> You know, after re-reading the trace, I could not help but notice >>>>>> that >>>>>> the issue seems to be related to completion/timer/spinlock memory >>>>>> becoming unavailable rather than disabling the main link clock. >>>>>> See, the SError comes in the spin_lock path, not during register >>>>>> read. >>>>>> >>>>>> Thus I think the commit message is a bit misleading. >>>>>> >>>>> >>>>> No, this issue is due to unclocked access. Please check this part >>>>> of the >>>>> stack: >>>> >>>> Well, if it were for the unlocked access, we would see SError on the >>>> register access, wouldn't we? However in this case the SError comes >>>> from the raw spinlock code. >>> >>> This is not uncommon. With unclocked access, we have seen in the past >>> that sometimes the stack is off by one line. The fact that this issue >>> got resolved even with the older version of the patch >>> https://patchwork.freedesktop.org/patch/496984/ is pointing towards an >>> unclocked access and not the dp/dp->ctrl memory pointers. >> >> As far as I understood, the bug is reproducible. Just to make me feel >> safe, can we please: >> -  either have a trace which shows when the clocks are disabled (or >> not enabled) >> - or make sure that keeping the mainlink clock on would also mitigate >> the issue? > > Yes, this trace is already available with all the drm_dbg_dp messages > enabled. Please refer to the attachment named > 2022-08-15-dmesg-drm-4K-crash.txt in the bug > https://gitlab.freedesktop.org/drm/msm/-/issues/17. > > You can jump to this section of the log. > > [   99.191216] msm_dpu ae01000.mdp: [drm:dp_display_host_phy_exit] > type=10 core_init=1 phy_init=1 > [   99.192354] [drm:dp_ctrl_phy_exit] phy=00000000b9b91350 init=0 > power_on=0 > [   99.192369] msm_dpu ae01000.mdp: > [drm:dp_display_disable.constprop.0.isra.0] sink count: 1 > > Here is the dp_display_disable() you were looking for. > > [   99.192378] msm_dpu ae01000.mdp: [drm:dp_bridge_post_disable] type=10 > Done > [   99.192389] msm_dpu ae01000.mdp: > [drm:drm_atomic_helper_commit_modeset_disables] disabling [CRTC:60:crtc-1] > [   99.192561] [drm:dpu_crtc_disable] no frames pending > [   99.192571] [drm:dpu_core_perf_crtc_update] crtc:60 stop_req:1 > core_clk:200000000 > [   99.192581] [drm:dpu_core_perf_crtc_update] crtc=60 disable > [   99.192588] [drm:_dpu_core_perf_crtc_update_bus] crtc=59 bw=0 paths:1 > [   99.192595] [drm:_dpu_core_perf_crtc_update_bus] crtc=60 bw=0 paths:1 > [   99.192700] [drm:dpu_core_perf_crtc_update] clk:200000000 > [   99.192714] [drm:dpu_core_perf_crtc_update] update clk rate = > 200000000 HZ > [   99.192729] msm_dpu ae01000.mdp: > [drm:drm_atomic_helper_commit_modeset_disables] modeset on > [ENCODER:33:TMDS-33] > [   99.192738] [drm:dpu_encoder_virt_atomic_mode_set] enc33 > [   99.192749] [drm:dpu_crtc_atomic_begin] crtc59 >> >>> >>>> >>>>>    >>>>>>   wait_for_completion_timeout+0x2c/0x54 >>>>>    >>>>>>   dp_ctrl_push_idle+0x40/0x88 >>>>>    >>>>>>   dp_bridge_disable+0x24/0x30 >>>>>    >>>>>>   drm_atomic_bridge_chain_disable+0x90/0xbc >>>>>    >>>>>>   drm_atomic_helper_commit_modeset_disables+0x198/0x444 >>>>>    >>>>>>   msm_atomic_commit_tail+0x1d0/0x374 >>>>>    >>>>>>   commit_tail+0x80/0x108 >>>>>    >>>>>>   drm_atomic_helper_commit+0x118/0x11c >>>>>    >>>>>>   drm_atomic_commit+0xb4/0xe0 >>>>>    >>>>>>   drm_client_modeset_commit_atomic+0x184/0x224 >>>>>    >>>>>>   drm_client_modeset_commit_locked+0x58/0x160 >>>>>    >>>>>>   drm_client_modeset_commit+0x3c/0x64 >>>>> >>>>>> Can we please get a trace checking which calls were actually made for >>>>>> the dp bridge and if the dp/dp->ctrl memory pointers are correct? >>>>>> >>>>>> I do not see the dp_display_disable() being called. Maybe I just >>>>>> missed >>>>>> the call. >>>>>> >>>>> >>>>> Yes it is called, please refer to the above part of the stack that I >>>>> have pasted. >>>> >>>> The stacktrace mentions dp_bridge_disable(), not dp_display_disable() >>>> (which I asked for). >>>> >>> >>> So whats happening here is the crash is happening in >>> dp_bridge_disable(). >>> >>> dp_display_disable() is called from post_disable() thats why it doesnt >>> show up in the stack. >>> >> >> Yes. But the mainlink clocks are disabled in dp_display_disable() >> that's why I'm asking if the function was called at all. > > Now, I see why you were asking about dp_display_disable(). So basically > your question is that when did dp_display_disable() happen that disabled > the clocks causing this issue. > > dp_display_disable() happened when the cable was disconnected as shown > in the above section of the logs. > > We also sent the disconnected uevent to the usermode. But this commit is > happening from the drm_lastclose() context which doesnt check the > connection status. > > This leads to a commit after the cable has been disconnected causing the > unclocked access. > > You can refer this log and comment if something is still not clear to you. I have spent some time comparing the log and the programming logic. I found what I was looking for: a safeguard for not doing the disable twice. The disable_outputs() function, the one which calls drm_atomic_bridge_chain_disable() has a logical check which should have acted as a safety net here: it checks whether crtc_needs_disable(). Can you please doublecheck why doesn't it reflect the fact that CRTC doesn't need disabling as it has been already disabled. If I understand correctly this boils down to CRTC's old_state->active being set, while the CRTC has been effectively disabled. -- With best wishes Dmitry