Message-ID: <03b551925d079fcc151239afa735562332cfd557.camel@pengutronix.de>
Subject: Re: [PATCH v3 2/4] drm/imx: Add initial support for DCSS on iMX8MQ
From: Lucas Stach <l.stach@pengutronix.de>
To: Laurentiu Palcu, Philipp Zabel, Shawn Guo, Sascha Hauer,
    Pengutronix Kernel Team, Fabio Estevam, NXP Linux Team
Cc: agx@sigxcpu.org, lukas@mntmn.com, linux-kernel@vger.kernel.org,
    dri-devel@lists.freedesktop.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Feb 2020 14:19:11 +0100
In-Reply-To: <1575625964-27102-3-git-send-email-laurentiu.palcu@nxp.com>
References: <1575625964-27102-1-git-send-email-laurentiu.palcu@nxp.com>
    <1575625964-27102-3-git-send-email-laurentiu.palcu@nxp.com>
Hi Laurentiu,

again a day later than promised, but here we go with some more in-depth
comments. Apologies if I missed something, my mental bandwidth was
pretty exhausted by the time I made my way to the scaler code. :)

On Fr, 2019-12-06 at 11:52 +0200, Laurentiu Palcu wrote:
> This adds initial support for iMX8MQ's Display Controller Subsystem (DCSS).
> Some of its capabilities include:
> * 4K@60fps;
> * HDR10;
> * one graphics and 2 video pipelines;
> * on-the-fly decompression of compressed video and graphics;
> 
> The reference manual can be found here:
> https://www.nxp.com/webapp/Download?colCode=IMX8MDQLQRM
> 
> The current patch adds only basic functionality: one primary plane for
> graphics, linear, tiled and super-tiled buffers support (no graphics
> decompression yet), no HDR10 and no video planes.
> 
> Video planes support and HDR10 will be added in subsequent patches once
> per-plane de-gamma/CSC/gamma support is in.
> 
> Signed-off-by: Laurentiu Palcu
> ---
[...]
> diff --git a/drivers/gpu/drm/imx/dcss/dcss-blkctl.c b/drivers/gpu/drm/imx/dcss/dcss-blkctl.c
> new file mode 100644
> index 00000000..ee7ffa1
> --- /dev/null
> +++ b/drivers/gpu/drm/imx/dcss/dcss-blkctl.c
> @@ -0,0 +1,75 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2019 NXP.
> + */
> +
> +#include
> +#include
> +
> +#include "dcss-dev.h"
> +
> +#define DCSS_BLKCTL_RESET_CTRL          0x00
> +#define B_CLK_RESETN                    BIT(0)
> +#define APB_CLK_RESETN                  BIT(1)
> +#define P_CLK_RESETN                    BIT(2)
> +#define RTR_CLK_RESETN                  BIT(3)
> +#define DCSS_BLKCTL_CONTROL0            0x10
> +#define HDMI_MIPI_CLK_SEL               BIT(0)
> +#define DISPMIX_REFCLK_SEL_POS          4
> +#define DISPMIX_REFCLK_SEL_MASK         GENMASK(5, 4)
> +#define DISPMIX_PIXCLK_SEL              BIT(8)
> +#define HDMI_SRC_SECURE_EN              BIT(16)
> +
> +struct dcss_blkctl {
> +        struct device *dev;
> +        void __iomem *base_reg;
> +
> +        bool hdmi_output;

Do we need this state in this structure? It's a duplication of state in
struct dcss_dev, so maybe better just keep a pointer to the parent
dcss_dev here?

> +};
> +
> +void dcss_blkctl_cfg(struct dcss_blkctl *blkctl)
> +{
> +        if (blkctl->hdmi_output)
> +                dcss_writel(0, blkctl->base_reg + DCSS_BLKCTL_CONTROL0);
> +        else
> +                dcss_writel(DISPMIX_PIXCLK_SEL,
> +                            blkctl->base_reg + DCSS_BLKCTL_CONTROL0);

The documentation is a bit sparse on why this needs to be configured
differently for HDMI output. Is HDMI the only case where this clock mux
needs to be configured to source from PLL2?

> +
> +        dcss_set(B_CLK_RESETN | APB_CLK_RESETN | P_CLK_RESETN | RTR_CLK_RESETN,
> +                 blkctl->base_reg + DCSS_BLKCTL_RESET_CTRL);
> +}
> +
> +int dcss_blkctl_init(struct dcss_dev *dcss, unsigned long blkctl_base)
> +{
> +        struct dcss_blkctl *blkctl;
> +
> +        blkctl = devm_kzalloc(dcss->dev, sizeof(*blkctl), GFP_KERNEL);

A general remark that also applies to all the other DCSS subblock
drivers: I don't think devm is the right choice for the resource
management here. All those drivers also have explicit resource freeing
in the _exit functions and there is an explicit rollback in
dcss_submodules_init, so I think we should not pretend to use devm
here, but just do explicit resource management without any devm
involvement.
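For illustration, an untested sketch of how init/exit could look with
plain kzalloc/ioremap instead of devm (just the existing code with the
devm wrappers dropped):

        int dcss_blkctl_init(struct dcss_dev *dcss, unsigned long blkctl_base)
        {
                struct dcss_blkctl *blkctl;

                blkctl = kzalloc(sizeof(*blkctl), GFP_KERNEL);
                if (!blkctl)
                        return -ENOMEM;

                blkctl->base_reg = ioremap(blkctl_base, SZ_4K);
                if (!blkctl->base_reg) {
                        dev_err(dcss->dev, "unable to remap BLK CTRL base\n");
                        kfree(blkctl);
                        return -ENOMEM;
                }

                dcss->blkctl = blkctl;
                blkctl->dev = dcss->dev;
                blkctl->hdmi_output = dcss->hdmi_output;

                dcss_blkctl_cfg(blkctl);

                return 0;
        }

        void dcss_blkctl_exit(struct dcss_blkctl *blkctl)
        {
                dcss_clr(P_CLK_RESETN | RTR_CLK_RESETN,
                         blkctl->base_reg + DCSS_BLKCTL_RESET_CTRL);

                iounmap(blkctl->base_reg);
                kfree(blkctl);
        }

This way the _exit functions really own the teardown and there is no
mixing of devm and manual freeing.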
> +        if (!blkctl)
> +                return -ENOMEM;
> +
> +        blkctl->base_reg = devm_ioremap(dcss->dev, blkctl_base, SZ_4K);
> +        if (!blkctl->base_reg) {
> +                dev_err(dcss->dev, "unable to remap BLK CTRL base\n");
> +                devm_kfree(dcss->dev, blkctl);
> +                return -ENOMEM;
> +        }
> +
> +        dcss->blkctl = blkctl;
> +        blkctl->dev = dcss->dev;
> +        blkctl->hdmi_output = dcss->hdmi_output;
> +
> +        dcss_blkctl_cfg(blkctl);
> +
> +        return 0;
> +}
> +
> +void dcss_blkctl_exit(struct dcss_blkctl *blkctl)
> +{
> +        dcss_clr(P_CLK_RESETN | RTR_CLK_RESETN,
> +                 blkctl->base_reg + DCSS_BLKCTL_RESET_CTRL);
> +
> +        if (blkctl->base_reg)
> +                devm_iounmap(blkctl->dev, blkctl->base_reg);
> +
> +        devm_kfree(blkctl->dev, blkctl);
> +}
> diff --git a/drivers/gpu/drm/imx/dcss/dcss-crtc.c b/drivers/gpu/drm/imx/dcss/dcss-crtc.c
> new file mode 100644
> index 00000000..567bf07
> --- /dev/null
> +++ b/drivers/gpu/drm/imx/dcss/dcss-crtc.c
> @@ -0,0 +1,224 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2019 NXP.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +#include "dcss-dev.h"
> +#include "dcss-kms.h"
> +
> +static int dcss_enable_vblank(struct drm_crtc *crtc)
> +{
> +        struct dcss_crtc *dcss_crtc = container_of(crtc, struct dcss_crtc,
> +                                                   base);
> +        struct dcss_dev *dcss = crtc->dev->dev_private;
> +
> +        if (dcss_crtc->irq_enabled)
> +                return 0;
> +
> +        dcss_crtc->irq_enabled = true;

This state should not be necessary. Unless there is a reference
counting bug somewhere in the driver, the DRM core should never call
enable_vblank and disable_vblank unbalanced.

> +
> +        dcss_dtg_vblank_irq_enable(dcss->dtg, true);
> +
> +        dcss_dtg_ctxld_kick_irq_enable(dcss->dtg, true);
> +
> +        enable_irq(dcss_crtc->irq);
> +
> +        return 0;
> +}
> +
> +static void dcss_disable_vblank(struct drm_crtc *crtc)
> +{
> +        struct dcss_crtc *dcss_crtc = container_of(crtc, struct dcss_crtc,
> +                                                   base);
> +        struct dcss_dev *dcss = dcss_crtc->base.dev->dev_private;
> +
> +        disable_irq_nosync(dcss_crtc->irq);
> +
> +        dcss_dtg_vblank_irq_enable(dcss->dtg, false);
> +
> +        dcss_dtg_ctxld_kick_irq_enable(dcss->dtg, false);
> +
> +        dcss_crtc->irq_enabled = false;
> +}
> +
> +static const struct drm_crtc_funcs dcss_crtc_funcs = {
> +        .set_config = drm_atomic_helper_set_config,
> +        .destroy = drm_crtc_cleanup,
> +        .page_flip = drm_atomic_helper_page_flip,
> +        .reset = drm_atomic_helper_crtc_reset,
> +        .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state,
> +        .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state,
> +        .enable_vblank = dcss_enable_vblank,
> +        .disable_vblank = dcss_disable_vblank,
> +};
> +
> +static void dcss_crtc_atomic_begin(struct drm_crtc *crtc,
> +                                   struct drm_crtc_state *old_crtc_state)
> +{
> +        drm_crtc_vblank_on(crtc);
> +
> +        spin_lock_irq(&crtc->dev->event_lock);
> +        if (crtc->state->event) {
> +                WARN_ON(drm_crtc_vblank_get(crtc));
> +                drm_crtc_arm_vblank_event(crtc, crtc->state->event);

Arming of the vblank event should move into atomic flush. Otherwise the
state update is racing with the vblank IRQ. In practice you probably
don't hit this issue on a moderately loaded system very often as the
CTXLD allows for almost a full frame duration to prepare the state
update, but the following race is present in the current code:

userspace        commit worker                  HW
atomic-commit
                 dcss_crtc_atomic_begin
                 -> drm_crtc_arm_vblank_event
                 [prepare plane state
                  updates]
                      ...                       vblank
                  preempted                     [reuses buffers from
                      ...                       last commit, while
                                                they are still scanned
                                                out, as state update
                                                didn't finish in time]
                 dcss_crtc_atomic_flush
                 -> CTXLD arm
                                                vblank
                                                (freeing buffers from
                                                 last commit)

> +                crtc->state->event = NULL;
> +        }
> +        spin_unlock_irq(&crtc->dev->event_lock);
> +}
> +
> +static void dcss_crtc_atomic_flush(struct drm_crtc *crtc,
> +                                   struct drm_crtc_state *old_crtc_state)
> +{
> +        struct dcss_crtc *dcss_crtc = container_of(crtc, struct dcss_crtc,
> +                                                   base);
> +        struct dcss_dev *dcss = dcss_crtc->base.dev->dev_private;
> +
> +        if (dcss_dtg_is_enabled(dcss->dtg))
> +                dcss_ctxld_enable(dcss->ctxld);
> +}
> +
> +static void dcss_crtc_atomic_enable(struct drm_crtc *crtc,
> +                                    struct drm_crtc_state *old_crtc_state)
> +{
> +        struct dcss_crtc *dcss_crtc = container_of(crtc, struct dcss_crtc,
> +                                                   base);
> +        struct dcss_dev *dcss = dcss_crtc->base.dev->dev_private;
> +        struct drm_display_mode *mode = &crtc->state->adjusted_mode;
> +        struct videomode vm;
> +
> +        drm_display_mode_to_videomode(mode, &vm);
> +
> +        pm_runtime_get_sync(dcss->dev);
> +
> +        vm.pixelclock = mode->crtc_clock * 1000;
> +
> +        dcss_dtg_sync_set(dcss->dtg, &vm);
> +
> +        dcss_ss_subsam_set(dcss->ss);
> +        dcss_ss_sync_set(dcss->ss, &vm, mode->flags & DRM_MODE_FLAG_PHSYNC,
> +                         mode->flags & DRM_MODE_FLAG_PVSYNC);
> +
> +        dcss_dtg_css_set(dcss->dtg);
> +
> +        dcss_ss_enable(dcss->ss);
> +        dcss_dtg_enable(dcss->dtg, true, NULL);
> +        dcss_ctxld_enable(dcss->ctxld);
> +
> +        dcss_enable_vblank(crtc);

Why is this needed? The drm_crtc_vblank_on in dcss_crtc_atomic_begin
should keep vblank IRQs enabled as long as needed.

> +
> +        reinit_completion(&dcss_crtc->en_completion);

This is racing with the IRQ. The completion should be initialized
before enabling the hardware, as otherwise the IRQ might fire before we
get to the point where the completion is initialized, causing the wait
below to time out.

> +        wait_for_completion_timeout(&dcss_crtc->en_completion,
> +                                    msecs_to_jiffies(500));
> +}
> +
> +static void dcss_crtc_atomic_disable(struct drm_crtc *crtc,
> +                                     struct drm_crtc_state *old_crtc_state)
> +{
> +        struct dcss_crtc *dcss_crtc = container_of(crtc, struct dcss_crtc,
> +                                                   base);
> +        struct dcss_dev *dcss = dcss_crtc->base.dev->dev_private;
> +
> +        drm_atomic_helper_disable_planes_on_crtc(old_crtc_state, false);
> +
> +        spin_lock_irq(&crtc->dev->event_lock);
> +        if (crtc->state->event) {
> +                drm_crtc_send_vblank_event(crtc, crtc->state->event);
> +                crtc->state->event = NULL;
> +        }
> +        spin_unlock_irq(&crtc->dev->event_lock);
> +
> +        dcss_dtg_ctxld_kick_irq_enable(dcss->dtg, true);
> +
> +        dcss_ss_disable(dcss->ss);
> +        dcss_dtg_enable(dcss->dtg, false, &dcss_crtc->dis_completion);
> +        dcss_ctxld_enable(dcss->ctxld);
> +
> +        reinit_completion(&dcss_crtc->dis_completion);

Same as above. Init completion before calling the function, which might
potentially complete it.

> +        wait_for_completion_timeout(&dcss_crtc->dis_completion,
> +                                    msecs_to_jiffies(100));
> +
> +        drm_crtc_vblank_off(crtc);

Those should be replaced by drm_crtc_vblank_get and drm_crtc_vblank_put
where actually needed.
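Putting the atomic_begin/atomic_flush comments from above together,
this is roughly what I have in mind for the flush side (untested
sketch, simply the existing arming code moved out of atomic_begin):

        static void dcss_crtc_atomic_flush(struct drm_crtc *crtc,
                                           struct drm_crtc_state *old_crtc_state)
        {
                struct dcss_crtc *dcss_crtc = container_of(crtc, struct dcss_crtc,
                                                           base);
                struct dcss_dev *dcss = dcss_crtc->base.dev->dev_private;

                if (dcss_dtg_is_enabled(dcss->dtg))
                        dcss_ctxld_enable(dcss->ctxld);

                /* Arm the event only after the CTXLD has been armed, so
                 * the vblank IRQ can't signal it before the state update
                 * actually reached the hardware. */
                spin_lock_irq(&crtc->dev->event_lock);
                if (crtc->state->event) {
                        WARN_ON(drm_crtc_vblank_get(crtc));
                        drm_crtc_arm_vblank_event(crtc, crtc->state->event);
                        crtc->state->event = NULL;
                }
                spin_unlock_irq(&crtc->dev->event_lock);
        }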
> +
> +        dcss_dtg_ctxld_kick_irq_enable(dcss->dtg, false);
> +
> +        pm_runtime_put_sync(dcss->dev);
> +}
> +
> +static const struct drm_crtc_helper_funcs dcss_helper_funcs = {
> +        .atomic_begin = dcss_crtc_atomic_begin,
> +        .atomic_flush = dcss_crtc_atomic_flush,
> +        .atomic_enable = dcss_crtc_atomic_enable,
> +        .atomic_disable = dcss_crtc_atomic_disable,
> +};
> +
> +static irqreturn_t dcss_crtc_irq_handler(int irq, void *dev_id)
> +{
> +        struct dcss_crtc *dcss_crtc = dev_id;
> +        struct dcss_dev *dcss = dcss_crtc->base.dev->dev_private;
> +
> +        if (!dcss_dtg_vblank_irq_valid(dcss->dtg))
> +                return IRQ_HANDLED;

This should be IRQ_NONE, to give the IRQ core a chance to detect and
disable a runaway spurious IRQ.

> +
> +        complete(&dcss_crtc->en_completion);
> +
> +        if (dcss_ctxld_is_flushed(dcss->ctxld))
> +                drm_crtc_handle_vblank(&dcss_crtc->base);
> +
> +        dcss_dtg_vblank_irq_clear(dcss->dtg);
> +
> +        return IRQ_HANDLED;
> +}
> +
> +int dcss_crtc_init(struct dcss_crtc *crtc, struct drm_device *drm)
> +{
> +        struct dcss_dev *dcss = drm->dev_private;
> +        struct platform_device *pdev = to_platform_device(dcss->dev);
> +        int ret;
> +
> +        crtc->plane[0] = dcss_plane_init(drm, drm_crtc_mask(&crtc->base),
> +                                         DRM_PLANE_TYPE_PRIMARY, 0);
> +        if (IS_ERR(crtc->plane[0]))
> +                return PTR_ERR(crtc->plane[0]);
> +
> +        crtc->base.port = dcss->of_port;
> +
> +        drm_crtc_helper_add(&crtc->base, &dcss_helper_funcs);
> +        ret = drm_crtc_init_with_planes(drm, &crtc->base, &crtc->plane[0]->base,
> +                                        NULL, &dcss_crtc_funcs, NULL);
> +        if (ret) {
> +                dev_err(dcss->dev, "failed to init crtc\n");
> +                return ret;
> +        }
> +
> +        crtc->irq = platform_get_irq_byname(pdev, "vblank");
> +        if (crtc->irq < 0) {
> +                dev_err(dcss->dev, "unable to get vblank interrupt\n");

platform_get_irq_byname already prints an error message if the IRQ
can't be found, so no point in doing the same here.

> +                return crtc->irq;
> +        }
> +
> +        init_completion(&crtc->en_completion);
> +        init_completion(&crtc->dis_completion);
> +
> +        ret = devm_request_irq(dcss->dev, crtc->irq, dcss_crtc_irq_handler,
> +                               IRQF_TRIGGER_RISING, "dcss_drm", crtc);

AFAIK the irqsteer IRQs are level triggered. While irqsteer will just
ignore the wrong trigger flags, better just remove the flags and rely
on the DT to provide the correct flags.

> +        if (ret) {
> +                dev_err(dcss->dev, "irq request failed with %d.\n", ret);
> +                return ret;
> +        }
> +
> +        disable_irq(crtc->irq);
> +
> +        return 0;
> +}
> +
> +void dcss_crtc_deinit(struct dcss_crtc *crtc, struct drm_device *drm)
> +{
> +        struct dcss_dev *dcss = drm->dev_private;
> +
> +        devm_free_irq(dcss->dev, crtc->irq, crtc);
> +}
> diff --git a/drivers/gpu/drm/imx/dcss/dcss-ctxld.c b/drivers/gpu/drm/imx/dcss/dcss-ctxld.c
> new file mode 100644
> index 00000000..4fe35b2b
> --- /dev/null
> +++ b/drivers/gpu/drm/imx/dcss/dcss-ctxld.c
> @@ -0,0 +1,447 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2019 NXP.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +
> +#include "dcss-dev.h"
> +
> +#define DCSS_CTXLD_DEVNAME              "dcss_ctxld"
> +
> +#define DCSS_CTXLD_CONTROL_STATUS       0x0
> +#define CTXLD_ENABLE                    BIT(0)
> +#define ARB_SEL                         BIT(1)
> +#define RD_ERR_EN                       BIT(2)
> +#define DB_COMP_EN                      BIT(3)
> +#define SB_HP_COMP_EN                   BIT(4)
> +#define SB_LP_COMP_EN                   BIT(5)
> +#define DB_PEND_SB_REC_EN               BIT(6)
> +#define SB_PEND_DISP_ACTIVE_EN          BIT(7)
> +#define AHB_ERR_EN                      BIT(8)
> +#define RD_ERR                          BIT(16)
> +#define DB_COMP                         BIT(17)
> +#define SB_HP_COMP                      BIT(18)
> +#define SB_LP_COMP                      BIT(19)
> +#define DB_PEND_SB_REC                  BIT(20)
> +#define SB_PEND_DISP_ACTIVE             BIT(21)
> +#define AHB_ERR                         BIT(22)
> +#define DCSS_CTXLD_DB_BASE_ADDR         0x10
> +#define DCSS_CTXLD_DB_COUNT             0x14
> +#define DCSS_CTXLD_SB_BASE_ADDR         0x18
> +#define DCSS_CTXLD_SB_COUNT             0x1C
> +#define SB_HP_COUNT_POS                 0
> +#define SB_HP_COUNT_MASK                0xffff
> +#define SB_LP_COUNT_POS                 16
> +#define SB_LP_COUNT_MASK                0xffff0000
> +#define DCSS_AHB_ERR_ADDR               0x20
> +
> +#define CTXLD_IRQ_NAME                  "ctx_ld"
> +#define CTXLD_IRQ_COMPLETION            (DB_COMP | SB_HP_COMP | SB_LP_COMP)
> +#define CTXLD_IRQ_ERROR                 (RD_ERR | DB_PEND_SB_REC | AHB_ERR)
> +
> +/* The following sizes are in context loader entries, 8 bytes each. */
> +#define CTXLD_DB_CTX_ENTRIES            1024    /* max 65536 */
> +#define CTXLD_SB_LP_CTX_ENTRIES         10240   /* max 65536 */
> +#define CTXLD_SB_HP_CTX_ENTRIES         20000   /* max 65536 */
> +#define CTXLD_SB_CTX_ENTRIES            (CTXLD_SB_LP_CTX_ENTRIES + \
> +                                         CTXLD_SB_HP_CTX_ENTRIES)
> +
> +/* Sizes, in entries, of the DB, SB_HP and SB_LP context regions. */
> +static u16 dcss_ctxld_ctx_size[3] = {
> +        CTXLD_DB_CTX_ENTRIES,
> +        CTXLD_SB_HP_CTX_ENTRIES,
> +        CTXLD_SB_LP_CTX_ENTRIES
> +};
> +
> +/* this represents an entry in the context loader map */
> +struct dcss_ctxld_item {
> +        u32 val;
> +        u32 ofs;
> +};
> +
> +#define CTX_ITEM_SIZE                   sizeof(struct dcss_ctxld_item)
> +
> +struct dcss_ctxld {
> +        struct device *dev;
> +        void __iomem *ctxld_reg;
> +        int irq;
> +        bool irq_en;
> +
> +        struct dcss_ctxld_item *db[2];
> +        struct dcss_ctxld_item *sb_hp[2];
> +        struct dcss_ctxld_item *sb_lp[2];
> +
> +        dma_addr_t db_paddr[2];
> +        dma_addr_t sb_paddr[2];
> +
> +        u16 ctx_size[2][3]; /* holds the sizes of DB, SB_HP and SB_LP ctx */
> +        u8 current_ctx;
> +
> +        bool in_use;
> +        bool armed;
> +
> +        spinlock_t lock; /* protects concurent access to private data */
> +
> +        void (*dtg_disable_cb)(void *data);
> +        void *dtg_disable_data;

The only use of this callback is to signal a completion. As this can be
done from IRQ context, couldn't we just remember the completion and
remove this callback indirection?

> +};
> +
> +static int __dcss_ctxld_enable(struct dcss_ctxld *ctxld);

This forward declaration isn't needed AFAICS.

> +static irqreturn_t dcss_ctxld_irq_handler(int irq, void *data)
> +{
> +        struct dcss_ctxld *ctxld = data;
> +        u32 irq_status;
> +
> +        irq_status = dcss_readl(ctxld->ctxld_reg + DCSS_CTXLD_CONTROL_STATUS);
> +
> +        if (irq_status & CTXLD_IRQ_COMPLETION &&
> +            !(irq_status & CTXLD_ENABLE) && ctxld->in_use) {
> +                ctxld->in_use = false;
> +
> +                if (ctxld->dtg_disable_cb) {
> +                        ctxld->dtg_disable_cb(ctxld->dtg_disable_data);
> +                        ctxld->dtg_disable_cb = NULL;
> +                        ctxld->dtg_disable_data = NULL;
> +                }
> +        } else if (irq_status & CTXLD_IRQ_ERROR) {
> +                /*
> +                 * Except for throwing an error message and clearing the status
> +                 * register, there's not much we can do here.
> +                 */
> +                dev_err(ctxld->dev, "ctxld: error encountered: %08x\n",
> +                        irq_status);
> +                dev_err(ctxld->dev, "ctxld: db=%d, sb_hp=%d, sb_lp=%d\n",
> +                        ctxld->ctx_size[ctxld->current_ctx ^ 1][CTX_DB],
> +                        ctxld->ctx_size[ctxld->current_ctx ^ 1][CTX_SB_HP],
> +                        ctxld->ctx_size[ctxld->current_ctx ^ 1][CTX_SB_LP]);
> +        }
> +
> +        dcss_clr(irq_status & (CTXLD_IRQ_ERROR | CTXLD_IRQ_COMPLETION),
> +                 ctxld->ctxld_reg + DCSS_CTXLD_CONTROL_STATUS);
> +
> +        return IRQ_HANDLED;
> +}
> +
> +static int dcss_ctxld_irq_config(struct dcss_ctxld *ctxld,
> +                                 struct platform_device *pdev)
> +{
> +        int ret;
> +
> +        ctxld->irq = platform_get_irq_byname(pdev, CTXLD_IRQ_NAME);

Why have a define for the IRQ name? It's only used a single time in
this call, so I think it would be clearer to just have the string here.

> +        if (ctxld->irq < 0) {
> +                dev_err(ctxld->dev, "ctxld: can't get irq number\n");

platform_get_irq_byname already prints an error message if the IRQ
can't be found, so no point in doing the same here.

> +                return ctxld->irq;
> +        }
> +
> +        ret = devm_request_irq(ctxld->dev, ctxld->irq,
> +                               dcss_ctxld_irq_handler,
> +                               IRQF_ONESHOT | IRQF_TRIGGER_HIGH,

Same as with the vblank IRQ, remove the trigger flags and rely on DT to
provide the correct trigger. Also IRQF_ONESHOT doesn't really make
sense for a non-threaded interrupt handler.

> +                               DCSS_CTXLD_DEVNAME, ctxld);
> +        if (ret) {
> +                dev_err(ctxld->dev, "ctxld: irq request failed.\n");
> +                return ret;
> +        }
> +
> +        ctxld->irq_en = true;
> +
> +        return 0;
> +}
> +
> +void dcss_ctxld_hw_cfg(struct dcss_ctxld *ctxld)

Missing static annotation for local function.

> +{
> +        dcss_writel(RD_ERR_EN | SB_HP_COMP_EN |
> +                    DB_PEND_SB_REC_EN | AHB_ERR_EN | RD_ERR | AHB_ERR,
> +                    ctxld->ctxld_reg + DCSS_CTXLD_CONTROL_STATUS);
> +}
> +
> +static void dcss_ctxld_free_ctx(struct dcss_ctxld *ctxld)
> +{
> +        struct dcss_ctxld_item *ctx;
> +        int i;
> +
> +        for (i = 0; i < 2; i++) {
> +                if (ctxld->db[i]) {
> +                        dmam_free_coherent(ctxld->dev,
> +                                           CTXLD_DB_CTX_ENTRIES * sizeof(*ctx),
> +                                           ctxld->db[i], ctxld->db_paddr[i]);

Same comment as with the other devm_ functions: just use the raw
dma_*_coherent functions when doing explicit resource management.

> +                        ctxld->db[i] = NULL;
> +                        ctxld->db_paddr[i] = 0;
> +                }
> +
> +                if (ctxld->sb_hp[i]) {
> +                        dmam_free_coherent(ctxld->dev,
> +                                           CTXLD_SB_CTX_ENTRIES * sizeof(*ctx),
> +                                           ctxld->sb_hp[i], ctxld->sb_paddr[i]);
> +                        ctxld->sb_hp[i] = NULL;
> +                        ctxld->sb_paddr[i] = 0;
> +                }
> +        }
> +}
> +
> +static int dcss_ctxld_alloc_ctx(struct dcss_ctxld *ctxld)
> +{
> +        struct dcss_ctxld_item *ctx;
> +        int i;
> +        dma_addr_t dma_handle;
> +
> +        for (i = 0; i < 2; i++) {
> +                ctx = dmam_alloc_coherent(ctxld->dev,
> +                                          CTXLD_DB_CTX_ENTRIES * sizeof(*ctx),
> +                                          &dma_handle, GFP_KERNEL);

What's the point of the local dma_handle variable? Just use
ctxld->db_paddr[i] directly.
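Concretely, with the dma_*_coherent remark from above applied as well,
something like this untested snippet:

        ctx = dma_alloc_coherent(ctxld->dev,
                                 CTXLD_DB_CTX_ENTRIES * sizeof(*ctx),
                                 &ctxld->db_paddr[i], GFP_KERNEL);

and the same pattern for the SB allocation below.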
> +                if (!ctx)
> +                        return -ENOMEM;
> +
> +                ctxld->db[i] = ctx;
> +                ctxld->db_paddr[i] = dma_handle;
> +
> +                ctx = dmam_alloc_coherent(ctxld->dev,
> +                                          CTXLD_SB_CTX_ENTRIES * sizeof(*ctx),
> +                                          &dma_handle, GFP_KERNEL);
> +                if (!ctx)
> +                        return -ENOMEM;
> +
> +                ctxld->sb_hp[i] = ctx;
> +                ctxld->sb_lp[i] = ctx + CTXLD_SB_HP_CTX_ENTRIES;
> +
> +                ctxld->sb_paddr[i] = dma_handle;
> +        }
> +
> +        return 0;
> +}
> +
> +int dcss_ctxld_init(struct dcss_dev *dcss, unsigned long ctxld_base)
> +{
> +        struct dcss_ctxld *ctxld;
> +        int ret;
> +
> +        ctxld = devm_kzalloc(dcss->dev, sizeof(struct dcss_ctxld),
> +                             GFP_KERNEL);
> +        if (!ctxld)
> +                return -ENOMEM;
> +
> +        dcss->ctxld = ctxld;
> +        ctxld->dev = dcss->dev;
> +
> +        spin_lock_init(&ctxld->lock);
> +
> +        ret = dcss_ctxld_alloc_ctx(ctxld);
> +        if (ret) {
> +                dev_err(dcss->dev, "ctxld: cannot allocate context memory.\n");
> +                goto err;
> +        }
> +
> +        ctxld->ctxld_reg = devm_ioremap(dcss->dev, ctxld_base, SZ_4K);
> +        if (!ctxld->ctxld_reg) {
> +                dev_err(dcss->dev, "ctxld: unable to remap ctxld base\n");
> +                ret = -ENOMEM;
> +                goto err;
> +        }
> +
> +        ret = dcss_ctxld_irq_config(ctxld, to_platform_device(dcss->dev));
> +        if (ret)
> +                goto err_irq;
> +
> +        dcss_ctxld_hw_cfg(ctxld);
> +
> +        return 0;
> +
> +err_irq:
> +        devm_iounmap(ctxld->dev, ctxld->ctxld_reg);
> +
> +err:
> +        dcss_ctxld_free_ctx(ctxld);
> +        devm_kfree(ctxld->dev, ctxld);
> +
> +        return ret;
> +}
> +
> +void dcss_ctxld_exit(struct dcss_ctxld *ctxld)
> +{
> +        devm_free_irq(ctxld->dev, ctxld->irq, ctxld);
> +
> +        if (ctxld->ctxld_reg)
> +                devm_iounmap(ctxld->dev, ctxld->ctxld_reg);
> +
> +        dcss_ctxld_free_ctx(ctxld);
> +        devm_kfree(ctxld->dev, ctxld);
> +}
> +
> +static int __dcss_ctxld_enable(struct dcss_ctxld *ctxld)

I don't like those __something local functions. Maybe call this
dcss_ctxld_enable_locked instead?

> +{
> +        int curr_ctx = ctxld->current_ctx;
> +        u32 db_base, sb_base, sb_count;
> +        u32 sb_hp_cnt, sb_lp_cnt, db_cnt;
> +        struct dcss_dev *dcss = dcss_drv_dev_to_dcss(ctxld->dev);
> +
> +        dcss_dpr_write_sysctrl(dcss->dpr);
> +
> +        dcss_scaler_write_sclctrl(dcss->scaler);
> +
> +        sb_hp_cnt = ctxld->ctx_size[curr_ctx][CTX_SB_HP];
> +        sb_lp_cnt = ctxld->ctx_size[curr_ctx][CTX_SB_LP];
> +        db_cnt = ctxld->ctx_size[curr_ctx][CTX_DB];
> +
> +        /* make sure SB_LP context area comes after SB_HP */
> +        if (sb_lp_cnt &&
> +            ctxld->sb_lp[curr_ctx] != ctxld->sb_hp[curr_ctx] + sb_hp_cnt) {
> +                struct dcss_ctxld_item *sb_lp_adjusted;
> +
> +                sb_lp_adjusted = ctxld->sb_hp[curr_ctx] + sb_hp_cnt;
> +
> +                memcpy(sb_lp_adjusted, ctxld->sb_lp[curr_ctx],
> +                       sb_lp_cnt * CTX_ITEM_SIZE);
> +        }
> +
> +        db_base = db_cnt ? ctxld->db_paddr[curr_ctx] : 0;
> +
> +        dcss_writel(db_base, ctxld->ctxld_reg + DCSS_CTXLD_DB_BASE_ADDR);
> +        dcss_writel(db_cnt, ctxld->ctxld_reg + DCSS_CTXLD_DB_COUNT);
> +
> +        if (sb_hp_cnt)
> +                sb_count = ((sb_hp_cnt << SB_HP_COUNT_POS) & SB_HP_COUNT_MASK) |
> +                           ((sb_lp_cnt << SB_LP_COUNT_POS) & SB_LP_COUNT_MASK);
> +        else
> +                sb_count = (sb_lp_cnt << SB_HP_COUNT_POS) & SB_HP_COUNT_MASK;
> +
> +        sb_base = sb_count ? ctxld->sb_paddr[curr_ctx] : 0;
> +
> +        dcss_writel(sb_base, ctxld->ctxld_reg + DCSS_CTXLD_SB_BASE_ADDR);
> +        dcss_writel(sb_count, ctxld->ctxld_reg + DCSS_CTXLD_SB_COUNT);
> +
> +        /* enable the context loader */
> +        dcss_set(CTXLD_ENABLE, ctxld->ctxld_reg + DCSS_CTXLD_CONTROL_STATUS);
> +
> +        ctxld->in_use = true;
> +
> +        /*
> +         * Toggle the current context to the alternate one so that any updates
> +         * in the modules' settings take place there.
> +         */
> +        ctxld->current_ctx ^= 1;
> +
> +        ctxld->ctx_size[ctxld->current_ctx][CTX_DB] = 0;
> +        ctxld->ctx_size[ctxld->current_ctx][CTX_SB_HP] = 0;
> +        ctxld->ctx_size[ctxld->current_ctx][CTX_SB_LP] = 0;
> +
> +        return 0;
> +}
> +
> +int dcss_ctxld_enable(struct dcss_ctxld *ctxld)
> +{
> +        unsigned long flags;
> +
> +        spin_lock_irqsave(&ctxld->lock, flags);

This function is never called from IRQ context, so you can use the
lower overhead spin_lock_irq here.

> +        ctxld->armed = true;
> +        spin_unlock_irqrestore(&ctxld->lock, flags);
> +
> +        return 0;
> +}
> +
> +void dcss_ctxld_kick(struct dcss_ctxld *ctxld)
> +{
> +        unsigned long flags;
> +
> +        spin_lock_irqsave(&ctxld->lock, flags);
> +        if (ctxld->armed && !ctxld->in_use) {
> +                ctxld->armed = false;
> +                __dcss_ctxld_enable(ctxld);
> +        }
> +        spin_unlock_irqrestore(&ctxld->lock, flags);
> +}
> +
> +void dcss_ctxld_write_irqsafe(struct dcss_ctxld *ctxld, u32 ctx_id, u32 val,
> +                              u32 reg_ofs)
> +{
> +        int curr_ctx = ctxld->current_ctx;
> +        struct dcss_ctxld_item *ctx[] = {
> +                [CTX_DB] = ctxld->db[curr_ctx],
> +                [CTX_SB_HP] = ctxld->sb_hp[curr_ctx],
> +                [CTX_SB_LP] = ctxld->sb_lp[curr_ctx]
> +        };
> +        int item_idx = ctxld->ctx_size[curr_ctx][ctx_id];
> +
> +        if (item_idx + 1 > dcss_ctxld_ctx_size[ctx_id]) {
> +                WARN_ON(1);
> +                return;
> +        }
> +
> +        ctx[ctx_id][item_idx].val = val;
> +        ctx[ctx_id][item_idx].ofs = reg_ofs;
> +        ctxld->ctx_size[curr_ctx][ctx_id] += 1;
> +}
> +
> +void dcss_ctxld_write(struct dcss_ctxld *ctxld, u32 ctx_id,
> +                      u32 val, u32 reg_ofs)
> +{
> +        unsigned long flags;
> +
> +        spin_lock_irqsave(&ctxld->lock, flags);

I have not actually validated all callers, but I suspect that this
function also isn't called from IRQ context, so can use spin_lock_irq.

> +        dcss_ctxld_write_irqsafe(ctxld, ctx_id, val, reg_ofs);
> +        spin_unlock_irqrestore(&ctxld->lock, flags);
> +}
> +
> +bool dcss_ctxld_is_flushed(struct dcss_ctxld *ctxld)
> +{
> +        return ctxld->ctx_size[ctxld->current_ctx][CTX_DB] == 0 &&
> +               ctxld->ctx_size[ctxld->current_ctx][CTX_SB_HP] == 0 &&
> +               ctxld->ctx_size[ctxld->current_ctx][CTX_SB_LP] == 0;
> +}
> +
> +int dcss_ctxld_resume(struct dcss_ctxld *ctxld)
> +{
> +        dcss_ctxld_hw_cfg(ctxld);
> +
> +        if (!ctxld->irq_en) {
> +                enable_irq(ctxld->irq);
> +                ctxld->irq_en = true;
> +        }
> +
> +        return 0;
> +}
> +
> +int dcss_ctxld_suspend(struct dcss_ctxld *ctxld)
> +{
> +        int ret = 0;
> +        int wait_time_ms = 0;
> +        unsigned long flags;
> +
> +        dcss_ctxld_kick(ctxld);
> +
> +        while (ctxld->in_use && wait_time_ms < 500) {
> +                msleep(20);
> +                wait_time_ms += 20;
> +        }

msleep might sleep a lot longer than the specified time, so if you care
about the timeout being somewhat accurate you should use a timeout
based on jiffies:

        unsigned long timeout = jiffies + msecs_to_jiffies(500);

        while (!time_after(jiffies, timeout) && ctxld->in_use)
                msleep(20);

> +        if (wait_time_ms > 500)
> +                return -ETIMEDOUT;
> +
> +        spin_lock_irqsave(&ctxld->lock, flags);

spin_lock_irq

> +
> +        if (ctxld->irq_en) {
> +                disable_irq_nosync(ctxld->irq);
> +                ctxld->irq_en = false;
> +        }
> +
> +        /* reset context region and sizes */
> +        ctxld->current_ctx = 0;
> +        ctxld->ctx_size[0][CTX_DB] = 0;
> +        ctxld->ctx_size[0][CTX_SB_HP] = 0;
> +        ctxld->ctx_size[0][CTX_SB_LP] = 0;
> +
> +        spin_unlock_irqrestore(&ctxld->lock, flags);
> +
> +        return ret;
> +}
> +
> +void dcss_ctxld_register_dtg_disable_cb(struct dcss_ctxld *ctxld,
> +                                        void (*cb)(void *),
> +                                        void *data)
> +{
> +        ctxld->dtg_disable_cb = cb;
> +        ctxld->dtg_disable_data = data;
> +}
> diff --git a/drivers/gpu/drm/imx/dcss/dcss-dev.c b/drivers/gpu/drm/imx/dcss/dcss-dev.c
> new file mode 100644
> index 00000000..265bf3c
> --- /dev/null
> +++ b/drivers/gpu/drm/imx/dcss/dcss-dev.c
> @@ -0,0 +1,286 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2019 NXP.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "dcss-dev.h"
> +
> +static void dcss_clocks_enable(struct dcss_dev *dcss)
> +{
> +        if (dcss->clks_on)
> +                return;

AFAICS dcss_clocks_enable and dcss_clocks_disable are always called
balanced, so there is no point in having this clks_on state, right?

> +
> +        clk_prepare_enable(dcss->axi_clk);
> +        clk_prepare_enable(dcss->apb_clk);
> +        clk_prepare_enable(dcss->rtrm_clk);
> +        clk_prepare_enable(dcss->dtrc_clk);
> +        clk_prepare_enable(dcss->pix_clk);
> +
> +        dcss->clks_on = true;
> +}
> +
> +static void dcss_clocks_disable(struct dcss_dev *dcss)
> +{
> +        if (!dcss->clks_on)
> +                return;
> +
> +        clk_disable_unprepare(dcss->pix_clk);
> +        clk_disable_unprepare(dcss->dtrc_clk);
> +        clk_disable_unprepare(dcss->rtrm_clk);
> +        clk_disable_unprepare(dcss->apb_clk);
> +        clk_disable_unprepare(dcss->axi_clk);
> +
> +        dcss->clks_on = false;
> +}
> +
> +static int dcss_submodules_init(struct dcss_dev *dcss)
> +{
> +        int ret = 0;
> +        u32 base_addr = dcss->start_addr;
> +        const struct dcss_type_data *devtype = dcss->devtype;
> +
> +        dcss_clocks_enable(dcss);
> +
> +        ret = dcss_blkctl_init(dcss, base_addr + devtype->blkctl_ofs);
> +        if (ret)
> +                return ret;
> +
> +        ret = dcss_ctxld_init(dcss, base_addr + devtype->ctxld_ofs);
> +        if (ret)
> +                goto ctxld_err;
> +
> +        ret = dcss_dtg_init(dcss, base_addr + devtype->dtg_ofs);
> +        if (ret)
> +                goto dtg_err;
> +
> +        ret = dcss_ss_init(dcss, base_addr + devtype->ss_ofs);
> +        if (ret)
> +                goto ss_err;
> +
> +        ret = dcss_dpr_init(dcss, base_addr + devtype->dpr_ofs);
> +        if (ret)
> +                goto dpr_err;
> +
> +        ret = dcss_scaler_init(dcss, base_addr + devtype->scaler_ofs);
> +        if (ret)
> +                goto scaler_err;
> +
> +        return 0;
> +
> +scaler_err:
> +        dcss_dpr_exit(dcss->dpr);
> +
> +dpr_err:
> +        dcss_ss_exit(dcss->ss);
> +
> +ss_err:
> +        dcss_dtg_exit(dcss->dtg);
> +
> +dtg_err:
> +        dcss_ctxld_exit(dcss->ctxld);
> +
> +ctxld_err:
> +        dcss_blkctl_exit(dcss->blkctl);
> +
> +        dcss_clocks_disable(dcss);
> +
> +        return ret;
> +}
> +
> +static void dcss_submodules_stop(struct dcss_dev *dcss)
> +{
> +        dcss_clocks_enable(dcss);
> +        dcss_scaler_exit(dcss->scaler);
> +        dcss_dpr_exit(dcss->dpr);
> +        dcss_ss_exit(dcss->ss);
> +        dcss_dtg_exit(dcss->dtg);
> +        dcss_ctxld_exit(dcss->ctxld);
> +        dcss_blkctl_exit(dcss->blkctl);
> +        dcss_clocks_disable(dcss);
> +}
> +
> +static int dcss_clks_init(struct dcss_dev *dcss)
> +{
> +        int i;
> +        struct {
> +                const char *id;
> +                struct clk **clk;
> +        } clks[] = {
> +                {"apb", &dcss->apb_clk},
> +                {"axi", &dcss->axi_clk},
> +                {"pix", &dcss->pix_clk},
> +                {"rtrm", &dcss->rtrm_clk},
> +                {"dtrc", &dcss->dtrc_clk},
> +        };
> +
> +        for (i = 0; i < ARRAY_SIZE(clks); i++) {
> +                *clks[i].clk = devm_clk_get(dcss->dev, clks[i].id);
> +                if (IS_ERR(*clks[i].clk)) {
> +                        dev_err(dcss->dev, "failed to get %s clock\n",
> +                                clks[i].id);
> +                        return PTR_ERR(*clks[i].clk);
> +                }
> +        }
> +
> +        dcss->clks_on = false;
> +
> +        return 0;
> +}
> +
> +static void dcss_clks_release(struct dcss_dev *dcss)
> +{
> +        devm_clk_put(dcss->dev, dcss->dtrc_clk);
> +        devm_clk_put(dcss->dev, dcss->rtrm_clk);
> +        devm_clk_put(dcss->dev, dcss->pix_clk);
> +        devm_clk_put(dcss->dev, dcss->axi_clk);
> +        devm_clk_put(dcss->dev, dcss->apb_clk);
> +}
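Coming back to the clks_on remark above: with that state dropped, the
two helpers reduce to plain enable/disable sequences (untested sketch):

        static void dcss_clocks_enable(struct dcss_dev *dcss)
        {
                clk_prepare_enable(dcss->axi_clk);
                clk_prepare_enable(dcss->apb_clk);
                clk_prepare_enable(dcss->rtrm_clk);
                clk_prepare_enable(dcss->dtrc_clk);
                clk_prepare_enable(dcss->pix_clk);
        }

        static void dcss_clocks_disable(struct dcss_dev *dcss)
        {
                clk_disable_unprepare(dcss->pix_clk);
                clk_disable_unprepare(dcss->dtrc_clk);
                clk_disable_unprepare(dcss->rtrm_clk);
                clk_disable_unprepare(dcss->apb_clk);
                clk_disable_unprepare(dcss->axi_clk);
        }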
> +
> +struct dcss_dev *dcss_dev_create(struct device *dev, bool hdmi_output)
> +{
> +        struct platform_device *pdev = to_platform_device(dev);
> +        int ret;
> +        struct resource *res;
> +        struct dcss_dev *dcss;
> +        const struct dcss_type_data *devtype;
> +
> +        devtype = of_device_get_match_data(dev);
> +        if (!devtype) {
> +                dev_err(dev, "no device match found\n");
> +                return ERR_PTR(-ENODEV);
> +        }
> +
> +        res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +        if (!res) {
> +                dev_err(dev, "cannot get memory resource\n");
> +                return ERR_PTR(-EINVAL);
> +        }
> +
> +        dcss = devm_kzalloc(dev, sizeof(struct dcss_dev), GFP_KERNEL);
> +        if (!dcss)
> +                return ERR_PTR(-ENOMEM);
> +
> +        dcss->dev = dev;
> +        dcss->devtype = devtype;
> +        dcss->hdmi_output = hdmi_output;
> +
> +        ret = dcss_clks_init(dcss);
> +        if (ret) {
> +                dev_err(dev, "clocks initialization failed\n");
> +                goto err;
> +        }
> +
> +        dcss->of_port = of_graph_get_port_by_id(dev->of_node, 0);
> +        if (!dcss->of_port) {
> +                dev_err(dev, "no port@0 node in %s\n", dev->of_node->full_name);
> +                ret = -ENODEV;
> +                goto clks_err;
> +        }
> +
> +        dcss->start_addr = res->start;
> +
> +        ret = dcss_submodules_init(dcss);
> +        if (ret) {
> +                dev_err(dev, "submodules initialization failed\n");
> +                goto clks_err;
> +        }
> +
> +        pm_runtime_enable(dev);
> +
> +        return dcss;
> +
> +clks_err:
> +        dcss_clks_release(dcss);
> +
> +err:
> +        devm_kfree(dcss->dev, dcss);
> +
> +        return ERR_PTR(ret);
> +}
> +
> +void dcss_dev_destroy(struct dcss_dev *dcss)
> +{
> +        pm_runtime_disable(dcss->dev);
> +
> +        dcss_submodules_stop(dcss);
> +
> +        dcss_clks_release(dcss);
> +
> +        devm_kfree(dcss->dev, dcss);
> +}
> +
> +#ifdef CONFIG_PM_SLEEP
> +int dcss_dev_suspend(struct device *dev)
> +{
> +        struct dcss_dev *dcss = dcss_drv_dev_to_dcss(dev);
> +        int ret;
> +
> +        drm_mode_config_helper_suspend(dcss_drv_dev_to_drm(dev));
> +
> +        if (pm_runtime_suspended(dev))
> +                return 0;
> +
> +        ret = dcss_ctxld_suspend(dcss->ctxld);
> +        if (ret)
> +                return ret;
> +
> +        dcss_clocks_disable(dcss);
> +
> +        return 0;
> +}
> +
> +int dcss_dev_resume(struct device *dev)
> +{
> +        struct dcss_dev *dcss = dcss_drv_dev_to_dcss(dev);
> +
> +        if (pm_runtime_suspended(dev)) {
> +                drm_mode_config_helper_resume(dcss_drv_dev_to_drm(dev));
> +                return 0;
> +        }
> +
> +        dcss_clocks_enable(dcss);
> +
> +        dcss_blkctl_cfg(dcss->blkctl);
> +
> +        dcss_ctxld_resume(dcss->ctxld);
> +
> +        drm_mode_config_helper_resume(dcss_drv_dev_to_drm(dev));
> +
> +        return 0;
> +}
> +#endif /* CONFIG_PM_SLEEP */
> +
> +#ifdef CONFIG_PM
> +int dcss_dev_runtime_suspend(struct device *dev)
> +{
> +        struct dcss_dev *dcss = dcss_drv_dev_to_dcss(dev);
> +        int ret;
> +
> +        ret = dcss_ctxld_suspend(dcss->ctxld);
> +        if (ret)
> +                return ret;
> +
> +        dcss_clocks_disable(dcss);
> +
> +        return 0;
> +}
> +
> +int dcss_dev_runtime_resume(struct device *dev)
> +{
> +        struct dcss_dev *dcss = dcss_drv_dev_to_dcss(dev);
> +
> +        dcss_clocks_enable(dcss);
> +
> +        dcss_blkctl_cfg(dcss->blkctl);
> +
> +        dcss_ctxld_resume(dcss->ctxld);
> +
> +        return 0;
> +}
> +#endif /* CONFIG_PM */
> diff --git a/drivers/gpu/drm/imx/dcss/dcss-dev.h b/drivers/gpu/drm/imx/dcss/dcss-dev.h
> new file mode 100644
> index 00000000..15c5de3
> --- /dev/null
> +++ b/drivers/gpu/drm/imx/dcss/dcss-dev.h
> @@ -0,0 +1,195 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2019 NXP.
> + */
> +
> +#ifndef __DCSS_PRV_H__
> +#define __DCSS_PRV_H__
> +
> +#include
> +#include
> +#include