Received: by 2002:a05:6a10:a852:0:0:0:0 with SMTP id d18csp2989390pxy; Mon, 3 May 2021 12:25:11 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwAmuzV60uZ002eMvC0Umn08sM8k1msIE7DibmqQkIV5Rqos8pYLN+T6ckg5TWMDcENXlyv X-Received: by 2002:a05:6402:138f:: with SMTP id b15mr21926124edv.121.1620069911104; Mon, 03 May 2021 12:25:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620069911; cv=none; d=google.com; s=arc-20160816; b=DbYspJanIV5+mL5zP4H4NY5tVfU2iaBz76QBlF7kXtxh9ycrC9gIdT9PZbCsFripdk W6hRS/567FLiujbYrfPC49hFnUyW4Bh9dmWWhzSDoUihB7IW9sYnP+x802tUR0k3LlUM jS0y+kqWxsPm7evQKePqpGQ7xPVKBO0U8QNfU4cQ7SIKp1/dII3614IY8UToXCV2xjQA x2DF+cCTjRnrM7HRvb7ZYz7OXOvfNtPJOloBNBTaIhZNPCGrOxUVpzf+9eI8f18QFDNb 1RAv3Si46SNHHY6dXMRT2Jo2z9Y4j+IXAYuBhnktMm9Qi9dDDsnXNKg7EriNGxkqDzF6 V+Pg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:message-id:references:in-reply-to :subject:cc:to:from:date:content-transfer-encoding:mime-version :sender:dkim-signature; bh=zSbtczGrL5Klsa/uGQiQx/Y/vrfEEMLdvsxiLktthNg=; b=tZly0ilm0ymxuL3RwRUVGiHFkofJQXpKsRDgvrWGybdDIDCMuwf6Og+Xo7BjFSPBX9 gmVcjZm+xg1iAoQxDrkgSNunN6spXaRmt6lWQyKMPMDPp9auGiqSvXEduIpcLtnZIYbJ /G3GbByPtdozyoGlFa39VplIPBbcMml/Dxfrb7wlKdr49jHE80DZbp3IdNWm5Knvf6Q0 /Bx5U2FcBcCeqedTVjSISk5PnvHioI7JJTH4qc/SX2Nt+zG/7Vwgq7sm2ADkRIBpWarR HSxE5JYMN7ikau91fE0Qihk5ZTRyJXyAacc0P3x3nGEqX81t93ZmHPeusD20Abe+vq0L FR4g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=pqTf+Lsy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g8si755619ejm.240.2021.05.03.12.24.45; Mon, 03 May 2021 12:25:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=pqTf+Lsy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229570AbhECTYd (ORCPT + 99 others); Mon, 3 May 2021 15:24:33 -0400 Received: from so254-9.mailgun.net ([198.61.254.9]:13607 "EHLO so254-9.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229497AbhECTYc (ORCPT ); Mon, 3 May 2021 15:24:32 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1620069819; h=Message-ID: References: In-Reply-To: Subject: Cc: To: From: Date: Content-Transfer-Encoding: Content-Type: MIME-Version: Sender; bh=zSbtczGrL5Klsa/uGQiQx/Y/vrfEEMLdvsxiLktthNg=; b=pqTf+Lsyg5UxGTLPZXAD3FtksIV/DSXitNxFc3HGjbTSgyVw7HF0F1fYYFoYSHoAAA5LGCsN Pt2QeanSEg6KkJTSNKCICoXrFrmBHsq8yPkrh2cSj/LZxrqdrD6lq+iTorPmrTWTfcL6vdgc EzMwvk2C3aUpWjFDMT/qiDYtKJk= X-Mailgun-Sending-Ip: 198.61.254.9 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n02.prod.us-west-2.postgun.com with SMTP id 60904db79a9ff96d95325f73 (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Mon, 03 May 2021 19:23:35 GMT Sender: khsieh=codeaurora.org@mg.codeaurora.org Received: by smtp.codeaurora.org (Postfix, from userid 1001) id 37DD9C433F1; Mon, 3 May 2021 19:23:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: khsieh) by smtp.codeaurora.org (Postfix) with ESMTPSA id B68B5C433D3; Mon, 3 May 2021 19:23:31 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 03 May 2021 12:23:31 -0700 From: khsieh@codeaurora.org To: Stephen Boyd Cc: aravindh@codeaurora.org, robdclark@gmail.com, sean@poorly.run, abhinavk@codeaurora.org, airlied@linux.ie, daniel@ffwll.ch, linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org, freedreno@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] drm/msm/dp: service only one irq_hpd if there are multiple irq_hpd pending In-Reply-To: References: <1618604877-28297-1-git-send-email-khsieh@codeaurora.org> <161895606268.46595.2841353121480638642@swboyd.mtv.corp.google.com> <9ccdef6e1a1b47bd8d99594831f51094@codeaurora.org> Message-ID: <9564c684fb1c14a1df0068d42c749a8e@codeaurora.org> X-Sender: khsieh@codeaurora.org User-Agent: Roundcube Webmail/1.3.9 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021-04-29 20:11, Stephen Boyd wrote: > Quoting khsieh@codeaurora.org (2021-04-29 10:23:31) >> On 2021-04-29 02:26, Stephen Boyd wrote: >> > Quoting khsieh@codeaurora.org (2021-04-28 10:38:11) >> >> On 2021-04-27 17:00, Stephen Boyd wrote: >> >> > Quoting aravindh@codeaurora.org (2021-04-21 11:55:21) >> >> >> On 2021-04-21 10:26, khsieh@codeaurora.org wrote: >> >> >> >> >> >> >> >>> + >> >> >> >>> mutex_unlock(&dp->event_mutex); >> >> >> >>> >> >> >> >>> return 0; >> >> >> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp, >> >> >> >>> struct drm_encoder *encoder) >> >> >> >>> /* stop sentinel checking */ >> >> >> >>> dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT); >> >> >> >>> >> >> >> >>> + /* link is down, delete pending irq_hdps */ >> >> >> >>> + dp_del_event(dp_display, EV_IRQ_HPD_INT); >> >> >> >>> + >> >> >> >> >> >> >> >> I'm becoming convinced that the whole kthread design and event queue >> >> >> >> is >> >> >> >> broken. These sorts of patches are working around the larger problem >> >> >> >> that the kthread is running independently of the driver and irqs can >> >> >> >> come in at any time but the event queue is not checked from the irq >> >> >> >> handler to debounce the irq event. Is the event queue necessary at >> >> >> >> all? >> >> >> >> I wonder if it would be simpler to just use an irq thread and process >> >> >> >> the hpd signal from there. Then we're guaranteed to not get an irq >> >> >> >> again >> >> >> >> until the irq thread is done processing the event. This would >> >> >> >> naturally >> >> >> >> debounce the irq hpd event that way. >> >> >> > event q just like bottom half of irq handler. it turns irq into event >> >> >> > and handle them sequentially. >> >> >> > irq_hpd is asynchronous event from panel to bring up attention of hsot >> >> >> > during run time of operation. >> >> >> > Here, the dongle is unplugged and main link had teared down so that no >> >> >> > need to service pending irq_hpd if any. >> >> >> > >> >> >> >> >> >> As Kuogee mentioned, IRQ_HPD is a message received from the panel and >> >> >> is >> >> >> not like your typical HW generated IRQ. There is no guarantee that we >> >> >> will not receive an IRQ_HPD until we are finished with processing of >> >> >> an >> >> >> earlier HPD message or an IRQ_HPD message. For example - when you run >> >> >> the protocol compliance, when we get a HPD from the sink, we are >> >> >> expected to start reading DPCD, EDID and proceed with link training. >> >> >> As >> >> >> soon as link training is finished (which is marked by a specific DPCD >> >> >> register write), the sink is going to issue an IRQ_HPD. At this point, >> >> >> we may not done with processing the HPD high as after link training we >> >> >> would typically notify the user mode of the newly connected display, >> >> >> etc. > > I re-read this. I think you're saying that IRQ_HPD can come in after > HPD > goes high and we finish link training? That sounds like we should > enable > IRQ_HPD in the hardware once we finish link training, instead of having > it enabled all the time. Then we can finish the threaded irq handler > and > the irq should be pending again once IRQ_HPD is sent over. Is there > ever > a need to be processing some IRQ_HPD and then get another IRQ_HPD while > processing the first one? yes, for example 1) plug dongle only 2) plug hdmi monitor into dongle (generated irq_hpd with sinc_count = 1) 3) unplug hdmi monitor out of the dongle (generate irq_hpd with sinc_count = 0) 4) go back to 2) for n times 5) unplug dongle This patch is not fix this problem either. The existing code has major issue which is handle irq_hpd with sink_count = 0 same way as handle of dongle unplugged. I think this cause external dp display failed to work and cause crash at suspend/resume test case. I will drop this patch. I am working on handle irq_hpd with sink_count = 0 as asymmetric as opposite to irq_hpd with sink_count = 1. This means irq_hdp sink_count = 0 handle only tear down the main link but keep phy/aux intact. I will re submit patch for review. > >> >> > >> >> > Given that the irq comes in and is then forked off to processing at a >> >> > later time implies that IRQ_HPD can come in at practically anytime. >> >> > Case >> >> > in point, this patch, which is trying to selectively search through the >> >> > "event queue" and then remove the event that is no longer relevant >> >> > because the display is being turned off either by userspace or because >> >> > HPD has gone away. If we got rid of the queue and kthread and processed >> >> > irqs in a threaded irq handler I suspect the code would be simpler and >> >> > not have to search through an event queue when we disable the display. >> >> > Instead while disabling the display we would make sure that the irq >> >> > thread isn't running anymore with synchronize_irq() or even disable the >> >> > irq entirely, but really it would be better to just disable the irq in >> >> > the hardware with a register write to some irq mask register. >> >> > >> >> > This pushes more of the logic for HPD and connect/disconnect into the >> >> > hardware and avoids reimplementing that in software: searching through >> >> > the queue, checking for duplicate events, etc. >> >> >> >> I wish we can implemented as you suggested. but it more complicate >> >> than >> >> that. >> >> Let me explain below, >> >> we have 3 transactions defined as below, >> >> >> >> plugin transaction: irq handle do host dp ctrl initialization and link >> >> training. If sink_count = 0 or link train failed, then transaction >> >> ended. otherwise send display up uevent to frame work and wait for >> >> frame >> >> work thread to do mode set, start pixel clock and start video to end >> >> transaction. >> > >> > Why do we need to wait for userspace to start video? HPD is indicating >> > that we have something connected, so shouldn't we merely signal to >> > userspace that something is ready to display and then enable the irq >> > for >> > IRQ_HPD? >> > >> yes, it is correct. >> The problem is unplug happen after signal user space. >> if unplug happen before user space start mode set and video, then it >> can >> just do nothing and return. >> but if unplugged happen at the middle of user space doing mode set and >> start video? > > I expect the link training to fail, maybe slowly, but userspace should > still be notified that the state has changed to disconnected when the > irq comes in, around the same time that the cable is physically > disconnected. > >> >> remember we had run into problem system show in connect state when >> dongle unplugged, vice versa. >> > > These problems are still happening as far as I can tell. I've heard > reports that external panels are showing up as connected when no dongle > is there, implying that HPD handling is broken. > >> >> >> >> >> >> >> unplugged transaction: irq handle send display off uevent to frame >> >> work and wait for frame work to disable pixel clock ,tear down main >> >> link and dp ctrl host de initialization. >> > >> > What do we do if userspace is slow and doesn't disable the display >> > before the cable is physically plugged in again? >> > >> plugin is not handle (re enter back into event q) until unplugged >> handle >> completed. >> >> >> >> irq_hpd transaction: This only happen after plugin transaction and >> >> before unplug transaction. irq handle read panel dpcd register and >> >> perform requesting action. Action including perform dp compliant >> >> phy/link testing. >> >> >> >> since dongle can be plugged/unplugged at ant time, three conditions >> >> have >> >> to be met to avoid race condition, >> >> 1) no irq lost >> >> 2) irq happen timing order enforced at execution >> >> 3) no irq handle done in the middle transaction >> >> >> >> for example we do not want to see >> >> plugin --> unplug --> plugin --> unplug become plugin --> plugin--> >> >> unplug >> >> >> >> The purpose of this patch is to not handle pending irq_hpd after >> >> either >> >> dongle or monitor had been unplugged until next plug in. >> >> >> > >> > I'm not suggesting to block irq handling entirely for long running >> > actions. A plug irq due to HPD could still notify userspace that the >> > display is connected but when an IRQ_HPD comes in we process it in the >> > irq thread instead of trying to figure out what sort of action is >> > necessary to quickly fork it off to a kthread to process later. >> > >> > The problem seems to be that this quick forking off of the real IRQ_HPD >> > processing is letting the event come in, and then an unplug to come in >> > after that, and then a plug in to come in after that, leading to the >> > event queue getting full of events that are no longer relevant but >> > still >> > need to be processed. If this used a workqueue instead of an open-coded >> > one, I'd say we should cancel any work items on the queue if an unplug >> > irq came in. That way we would make sure that we're not trying to do >> > anything with the link when it isn't present anymore. >> > >> is this same as we delete irq_hpd from event q? >> What happen if the workqueue had been launched? > > Yes workqueues are basically functions you run on a kthread with > various > ways to either make sure that the work has finished processing or to > try > to cancel it out so that it either doesn't run at all because the > kthread hasn't picked it up or that it runs to completion before > continuing. The event queue should be replaced with a workqueue design, > but even better would be to use a threaded irq if possible so that > hardware can't raise more irqs while one is being handled. > >> >> > But even then it doesn't make much sense. Userspace could be heavily >> > delayed after the plug in irq, when HPD is asserted, and not display >> > anything. The user could physically unplug and plug during that time so >> > we really need to not wait at all or do anything besides note the state >> > of the HPD when this happens. The IRQ_HPD irq is different. I don't >> > think we care to keep getting them if we're not done processing the >> > previous irq. I view it as basically an "edge" irq that we see, >> > process, >> > and then if another one comes in during the processing time we ignore >> > it. There's only so much we can do, hence the suggestion to use a >> > threaded irq. >> > >> I do not think you can ignore irq_hpd. >> for example, you connect hdmi monitor to dongle then plug in dongle >> into >> DUT and unplug hdmi monitor immediatly. >> DP driver will see plugin irq with sink_count=1 followed by irq_hpd >> with >> sink_count= 0. >> Then we may end up you think it is in connect state but actually it >> shold be in disconnect state. > > Yes I'm saying that we should be able to use the hardware to coalesce > multiple IRQ_HPDs so that we don't unmask the IRQ_HPD until a connect > irq tells us a cable is connected, and then we mask IRQ_HPD when a > disconnect irq happens, and ignore extra IRQ_HPDs by processing the > IRQ_HPD in a threaded irq handler. > > Maybe this can't work because the same hardware irq is used for the HPD > high/low and IRQ_HPD? If that's true, we should be able to keep the > IRQ_HPD masked until the event is processed by calling > dp_catalog_hpd_config_intr() to disable DP_DP_IRQ_HPD_INT_MASK when we > see it in the irq handler and only enable the irq again once we've > processed it, which I guess would be the end of dp_irq_hpd_handle()? > >> I do not think we can ignore irq_hpd but combine multiple irq_hpd into >> one and handle it. >> >> >> > This is why IRQ_HPD is yanking the HPD line down to get the attention >> > of >> > the source, but HPD high and HPD low for an extended period of time >> > means the cable has been plugged or unplugged. We really do care if the >> > line goes low for a long time, but if it only temporarily goes low for >> > an IRQ_HPD then we either saw it or we didn't have time to process it >> > yet. >> > >> > It's like a person at your door ringing the doorbell. They're there >> > (HPD >> > high), and they're ringing the doorbell over and over (IRQ_HPD) and >> > eventually they go away when you don't answer (HPD low). We don't have >> > to keep track of every single doorbell/IRQ_HPD event because it's >> > mostly >> > a ping from the sink telling us we need to go do something, i.e. a >> > transitory event. The IRQ_HPD should always work once HPD is there, but >> > once HPD is gone we should mask it and ignore that irq until we see an >> > HPD high again. >> >> if amazon deliver ring the door bell 3 times, then we answer the door >> at >> the third time. this mean the first and second door bell ring can be >> ignored. >> Also if door bell ring 3 times and left an package at door then >> deliver >> left, you saw deliver left form window then you still need to answer >> to >> find out there is package left at door. If you ignore doorbell, then >> you >> will missed the package. > > There isn't a package being left at the door. When HPD goes away, > there's nothing to do anymore. Stop going to the door to look for > anything. Maybe a better analogy is that the entire door and doorbell > is > gone when HPD goes away. > >> >> >> I believe both thread_irq and event q works. >> But I think event q give us more finer controller. > > What sort of finer control? Opinions need supporting facts or they're > just opinions. > >> We are trying to fix an extreme case which generate un expected number >> of irq_hpd at an unexpected timing. >> I believe other dp driver (not Qcom) will also failed on this >> particular >> case. >> > > I don't understand why that matters. This driver being just as bad as > other drivers isn't a good quality.