2017-09-25 22:20:33

by Dmitry Osipenko

[permalink] [raw]
Subject: [PATCH v1 0/2] NVIDIA Tegra20 video decoder driver

This driver provides accelerated video decoding to NVIDIA Tegra20 SoC's,
it is a result of reverse-engineering efforts. Driver has been tested on
Toshiba AC100 and Acer A500, it should work on any Tegra20 device.

In userspace this driver is utilized by libvdpau-tegra [0] that implements
VDPAU interface, so any video player that supports VDPAU can provide
accelerated video decoding on Tegra20 on Linux.

[0] https://github.com/grate-driver/libvdpau-tegra

Dmitry Osipenko (2):
staging: Introduce NVIDIA Tegra20 video decoder driver
ARM: dts: tegra20: Add video decoder node

.../bindings/arm/tegra/nvidia,tegra20-vde.txt | 38 +
arch/arm/boot/dts/tegra20.dtsi | 16 +
drivers/staging/Kconfig | 2 +
drivers/staging/Makefile | 1 +
drivers/staging/tegra-vde/Kconfig | 6 +
drivers/staging/tegra-vde/Makefile | 1 +
drivers/staging/tegra-vde/TODO | 8 +
drivers/staging/tegra-vde/uapi.h | 99 +++
drivers/staging/tegra-vde/vde.c | 971 +++++++++++++++++++++
9 files changed, 1142 insertions(+)
create mode 100644 Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
create mode 100644 drivers/staging/tegra-vde/Kconfig
create mode 100644 drivers/staging/tegra-vde/Makefile
create mode 100644 drivers/staging/tegra-vde/TODO
create mode 100644 drivers/staging/tegra-vde/uapi.h
create mode 100644 drivers/staging/tegra-vde/vde.c

--
2.14.1


2017-09-25 22:20:45

by Dmitry Osipenko

[permalink] [raw]
Subject: [PATCH v1 2/2] ARM: dts: tegra20: Add video decoder node

Add a device node for the video decoder engine found on Tegra20.

Signed-off-by: Dmitry Osipenko <[email protected]>
---
arch/arm/boot/dts/tegra20.dtsi | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi
index 7c85f97f72ea..fb485a5e63d7 100644
--- a/arch/arm/boot/dts/tegra20.dtsi
+++ b/arch/arm/boot/dts/tegra20.dtsi
@@ -249,6 +249,22 @@
*/
};

+ vde@6001a000 {
+ compatible = "nvidia,tegra20-vde";
+ reg = <0x6001a000 0x3D00 /* VDE registers */
+ 0x40000400 0x3FC00>; /* IRAM area */
+ reg-names = "regs", "iram";
+ interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>, /* UCQ error interrupt */
+ <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>, /* Sync token interrupt */
+ <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, /* BSE-V interrupt */
+ <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>, /* BSE-A interrupt */
+ <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>; /* SXE interrupt */
+ interrupt-names = "ucq-error", "sync-token", "bsev", "bsea", "sxe";
+ clocks = <&tegra_car TEGRA20_CLK_VDE>;
+ resets = <&tegra_car 61>;
+ reset-names = "vde";
+ };
+
apbmisc@70000800 {
compatible = "nvidia,tegra20-apbmisc";
reg = <0x70000800 0x64 /* Chip revision */
--
2.14.1

2017-09-25 22:21:05

by Dmitry Osipenko

[permalink] [raw]
Subject: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
decoding of CAVLC H.264 only.

Signed-off-by: Dmitry Osipenko <[email protected]>
---
.../bindings/arm/tegra/nvidia,tegra20-vde.txt | 38 +
drivers/staging/Kconfig | 2 +
drivers/staging/Makefile | 1 +
drivers/staging/tegra-vde/Kconfig | 6 +
drivers/staging/tegra-vde/Makefile | 1 +
drivers/staging/tegra-vde/TODO | 8 +
drivers/staging/tegra-vde/uapi.h | 99 +++
drivers/staging/tegra-vde/vde.c | 971 +++++++++++++++++++++
8 files changed, 1126 insertions(+)
create mode 100644 Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
create mode 100644 drivers/staging/tegra-vde/Kconfig
create mode 100644 drivers/staging/tegra-vde/Makefile
create mode 100644 drivers/staging/tegra-vde/TODO
create mode 100644 drivers/staging/tegra-vde/uapi.h
create mode 100644 drivers/staging/tegra-vde/vde.c

diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
new file mode 100644
index 000000000000..e5ca6f5e96c0
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
@@ -0,0 +1,38 @@
+NVIDIA Tegra Video Decoder Engine
+
+Required properties:
+- compatible : "nvidia,tegra20-vde"
+- reg : Must contain 2 register ranges: registers and IRAM area.
+- reg-names : Must include the following entries:
+ - regs
+ - iram
+- interrupts : Must contain an entry for each entry in interrupt-names.
+- interrupt-names : Must include the following entries:
+ - ucq-error
+ - sync-token
+ - bsev
+ - bsea
+ - sxe
+- clocks : Must contain one entry, for the module clock.
+ See ../clocks/clock-bindings.txt for details.
+- resets : Must contain an entry for each entry in reset-names.
+ See ../reset/reset.txt for details.
+- reset-names : Must include the following entries:
+ - vde
+
+Example:
+ vde@6001a000 {
+ compatible = "nvidia,tegra20-vde";
+ reg = <0x6001a000 0x3D00 /* VDE registers */
+ 0x40000400 0x3FC00>; /* IRAM area */
+ reg-names = "regs", "iram";
+ interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>, /* UCQ error interrupt */
+ <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>, /* Sync token interrupt */
+ <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, /* BSE-V interrupt */
+ <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>, /* BSE-A interrupt */
+ <GIC_SPI 12 IRQ_TYPE_LEVEL_HIGH>; /* SXE interrupt */
+ interrupt-names = "ucq-error", "sync-token", "bsev", "bsea", "sxe";
+ clocks = <&tegra_car TEGRA20_CLK_VDE>;
+ resets = <&tegra_car 61>;
+ reset-names = "vde";
+ };
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 554683912cff..10c982811093 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -118,4 +118,6 @@ source "drivers/staging/vboxvideo/Kconfig"

source "drivers/staging/pi433/Kconfig"

+source "drivers/staging/tegra-vde/Kconfig"
+
endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 8951c37d8d80..d07c167d5773 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -49,3 +49,4 @@ obj-$(CONFIG_BCM2835_VCHIQ) += vc04_services/
obj-$(CONFIG_CRYPTO_DEV_CCREE) += ccree/
obj-$(CONFIG_DRM_VBOXVIDEO) += vboxvideo/
obj-$(CONFIG_PI433) += pi433/
+obj-$(CONFIG_ARCH_TEGRA) += tegra-vde/
diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig
new file mode 100644
index 000000000000..b947c012a373
--- /dev/null
+++ b/drivers/staging/tegra-vde/Kconfig
@@ -0,0 +1,6 @@
+config TEGRA_VDE
+ tristate "NVIDIA Tegra20 video decoder driver"
+ depends on ARCH_TEGRA_2x_SOC
+ help
+ Say Y here to enable support for a NVIDIA Tegra20 video decoder
+ driver.
diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile
new file mode 100644
index 000000000000..e7c0df1174bf
--- /dev/null
+++ b/drivers/staging/tegra-vde/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_TEGRA_VDE) += vde.o
diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO
new file mode 100644
index 000000000000..533ddfc5190e
--- /dev/null
+++ b/drivers/staging/tegra-vde/TODO
@@ -0,0 +1,8 @@
+All TODO's require reverse-engineering to be done first, it is very
+unlikely that NVIDIA would ever release HW specs to public.
+
+TODO:
+ - properly handle decoding faults
+ - support more formats
+
+Contact: Dmitry Osipenko <[email protected]>
diff --git a/drivers/staging/tegra-vde/uapi.h b/drivers/staging/tegra-vde/uapi.h
new file mode 100644
index 000000000000..4e60449f688e
--- /dev/null
+++ b/drivers/staging/tegra-vde/uapi.h
@@ -0,0 +1,99 @@
+/*
+ * Copyright (C) 2016-2017 Dmitry Osipenko <[email protected]>
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _UAPI_TEGRA_VDE_H_
+#define _UAPI_TEGRA_VDE_H_
+
+#include <linux/types.h>
+#include <asm/ioctl.h>
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define FLAG_IS_B_FRAME (1 << 0)
+#define FLAG_IS_REFERENCE (1 << 1)
+
+struct tegra_vde_h264_frame {
+ __s32 y_fd;
+ __s32 cb_fd;
+ __s32 cr_fd;
+ __s32 aux_fd;
+ __u32 y_offset;
+ __u32 cb_offset;
+ __u32 cr_offset;
+ __u32 aux_offset;
+ __u32 frame_num;
+ __u32 flags;
+
+ __u32 reserved;
+} __attribute__((packed));
+
+struct tegra_vde_h264_decoder_ctx {
+ __s32 bitstream_data_fd;
+ __u32 bitstream_data_offset;
+
+ __u32 dpb_frames_ptr;
+ __u8 dpb_frames_nb;
+ __u8 dpb_ref_frames_with_earlier_poc_nb;
+
+ // SPS
+ __u8 is_baseline_profile;
+ __u8 level_idc;
+ __u8 log2_max_pic_order_cnt_lsb;
+ __u8 log2_max_frame_num;
+ __u8 pic_order_cnt_type;
+ __u8 direct_8x8_inference_flag;
+ __u8 pic_width_in_mbs;
+ __u8 pic_height_in_mbs;
+
+ // PPS
+ __u8 pic_init_qp;
+ __u8 deblocking_filter_control_present_flag;
+ __u8 constrained_intra_pred_flag;
+ __u8 chroma_qp_index_offset;
+ __u8 pic_order_present_flag;
+
+ // Slice header
+ __u8 num_ref_idx_l0_active_minus1;
+ __u8 num_ref_idx_l1_active_minus1;
+
+ __u32 reserved;
+} __attribute__((packed));
+
+#define VDE_IOCTL_BASE 'v'
+#define VDE_IO(nr) _IO(VDE_IOCTL_BASE,nr)
+#define VDE_IOR(nr,type) _IOR(VDE_IOCTL_BASE,nr,type)
+#define VDE_IOW(nr,type) _IOW(VDE_IOCTL_BASE,nr,type)
+#define VDE_IOWR(nr,type) _IOWR(VDE_IOCTL_BASE,nr,type)
+
+#define TEGRA_VDE_DECODE_H264 0x01
+
+#define TEGRA_VDE_IOCTL_DECODE_H264 VDE_IOW(VDE_IOCTL_BASE + TEGRA_VDE_DECODE_H264, struct tegra_vde_h264_decoder_ctx)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif // _UAPI_TEGRA_VDE_H_
diff --git a/drivers/staging/tegra-vde/vde.c b/drivers/staging/tegra-vde/vde.c
new file mode 100644
index 000000000000..309d87926e21
--- /dev/null
+++ b/drivers/staging/tegra-vde/vde.c
@@ -0,0 +1,971 @@
+/*
+ * NVIDIA Tegra20 Video decoder driver
+ *
+ * Copyright (C) 2016-2017 Dmitry Osipenko <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <soc/tegra/pmc.h>
+
+#include "uapi.h"
+
+#define SXE(offt) (0x0000 + (offt))
+#define BSEV(offt) (0x1000 + (offt))
+#define MBE(offt) (0x2000 + (offt))
+#define PPE(offt) (0x2200 + (offt))
+#define MCE(offt) (0x2400 + (offt))
+#define TFE(offt) (0x2600 + (offt))
+#define VDMA(offt) (0x2A00 + (offt))
+#define FRAMEID(offt) (0x3800 + (offt))
+
+#define ICMDQUE_WR 0x00
+#define CMDQUE_CONTROL 0x08
+#define INTR_STATUS 0x18
+#define BSE_INT_ENB 0x40
+#define BSE_CONFIG 0x44
+
+#define BSE_ICMDQUE_EMPTY BIT(3)
+#define BSE_DMA_BUSY BIT(23)
+
+#define TEGRA_VDE_TIMEOUT (msecs_to_jiffies(1000))
+
+#define VDE_WR(data, addr) \
+do { \
+ pr_debug("%s: %d: 0x%08X => " #addr ")\n", \
+ __func__, __LINE__, (data)); \
+ writel_relaxed(data, addr); \
+} while (0)
+
+struct video_frame {
+ struct dma_buf_attachment *y_dmabuf_attachment;
+ struct dma_buf_attachment *cb_dmabuf_attachment;
+ struct dma_buf_attachment *cr_dmabuf_attachment;
+ struct dma_buf_attachment *aux_dmabuf_attachment;
+ dma_addr_t y_paddr;
+ dma_addr_t cb_paddr;
+ dma_addr_t cr_paddr;
+ dma_addr_t aux_paddr;
+ u32 frame_num;
+ u32 flags;
+};
+
+struct tegra_vde {
+ void __iomem *regs;
+ void __iomem *iram;
+ struct mutex lock;
+ struct miscdevice miscdev;
+ struct reset_control *rst;
+ struct clk *clk;
+ struct completion decode_completion;
+ phys_addr_t iram_lists_paddr;
+ int irq;
+};
+
+static void tegra_vde_set_bits(void __iomem *regs, u32 mask, u32 offset)
+{
+ u32 value = readl_relaxed(regs + offset);
+
+ VDE_WR(value | mask, regs + offset);
+}
+
+static int tegra_vde_wait_mbe(void __iomem *regs)
+{
+ u32 tmp;
+
+ return readl_poll_timeout(regs + MBE(0x8C), tmp, (tmp >= 0x10), 1, 100);
+}
+
+static int tegra_vde_setup_mbe_frame_idx(void __iomem *regs,
+ int setup_refs, int refs_nb)
+{
+ u32 frame_idx_enb_mask = 0;
+ u32 value;
+ int frame_idx;
+ int idx;
+
+ VDE_WR(0xD0000000 | (0 << 23), regs + MBE(0x80));
+ VDE_WR(0xD0200000 | (0 << 23), regs + MBE(0x80));
+
+ if (tegra_vde_wait_mbe(regs))
+ return -EIO;
+
+ if (!setup_refs)
+ return 0;
+
+ for (idx = 0, frame_idx = 1; idx < refs_nb; idx++, frame_idx++) {
+ VDE_WR(0xD0000000 | (frame_idx << 23), regs + MBE(0x80));
+ VDE_WR(0xD0200000 | (frame_idx << 23), regs + MBE(0x80));
+
+ frame_idx_enb_mask |= frame_idx << (6 * (idx % 4));
+
+ if (idx % 4 == 3 || idx == refs_nb - 1) {
+ value = 0xC0000000;
+ value |= (idx >> 2) << 24;
+ value |= frame_idx_enb_mask;
+
+ VDE_WR(value, regs + MBE(0x80));
+
+ if (tegra_vde_wait_mbe(regs))
+ return -EIO;
+
+ frame_idx_enb_mask = 0;
+ }
+ }
+
+ return 0;
+}
+
+static void tegra_vde_mbe_set_0xa_reg(void __iomem *regs, int reg, u32 val)
+{
+ VDE_WR(0xA0000000 | (reg << 24) | (val & 0xFFFF), regs + MBE(0x80));
+ VDE_WR(0xA0000000 | ((reg + 1) << 24) | (val >> 16), regs + MBE(0x80));
+}
+
+static int tegra_vde_wait_bsev(struct tegra_vde *vde, bool wait_dma)
+{
+ struct device *dev = vde->miscdev.parent;
+ u32 polled;
+ int ret;
+
+ ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled,
+ !(polled & BIT(2)), 1, 100);
+ if (ret) {
+ dev_err(dev, "BSEV unknown bit timeout\n");
+ return -EIO;
+ }
+
+ ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled,
+ (polled & BSE_ICMDQUE_EMPTY), 1, 100);
+ if (ret) {
+ dev_err(dev, "BSEV ICMDQUE flush timeout\n");
+ return -EIO;
+ }
+
+ if (!wait_dma)
+ return 0;
+
+ ret = readl_poll_timeout(vde->regs + BSEV(INTR_STATUS), polled,
+ !(polled & BSE_DMA_BUSY), 1, 100);
+ if (ret) {
+ dev_err(dev, "BSEV DMA timeout\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int tegra_vde_push_bsev_icmdqueue(struct tegra_vde *vde,
+ u32 value, bool wait_dma)
+{
+ VDE_WR(value, vde->regs + BSEV(ICMDQUE_WR));
+
+ return tegra_vde_wait_bsev(vde, wait_dma);
+}
+
+static void tegra_vde_setup_frameid(void __iomem *regs,
+ struct video_frame *frame, int frameid,
+ u32 mbs_width, u32 mbs_height)
+{
+ u32 y_paddr = frame ? frame->y_paddr : 0xFCDEAD00;
+ u32 cb_paddr = frame ? frame->cb_paddr : 0xFCDEAD00;
+ u32 cr_paddr = frame ? frame->cr_paddr : 0xFCDEAD00;
+ u32 v1 = frame ? ((mbs_width << 16) | mbs_height) : 0;
+ u32 v2 = frame ? ((((mbs_width + 1) >> 1) << 6) | 1) : 0;
+
+ VDE_WR( y_paddr >> 8, regs + FRAMEID(0x000 + frameid * 4));
+ VDE_WR(cb_paddr >> 8, regs + FRAMEID(0x100 + frameid * 4));
+ VDE_WR(cr_paddr >> 8, regs + FRAMEID(0x180 + frameid * 4));
+ VDE_WR(v1, regs + FRAMEID(0x080 + frameid * 4));
+ VDE_WR(v2, regs + FRAMEID(0x280 + frameid * 4));
+}
+
+static void tegra_setup_frameidx(void __iomem *regs,
+ struct video_frame *frames, int frames_nb,
+ u32 mbs_width, u32 mbs_height)
+{
+ int idx;
+
+ for (idx = 0; idx < frames_nb; idx++)
+ tegra_vde_setup_frameid(regs, &frames[idx], idx,
+ mbs_width, mbs_height);
+ for (; idx < 17; idx++)
+ tegra_vde_setup_frameid(regs, NULL, idx, 0, 0);
+}
+
+static void tegra_vde_write_iram_entry(void __iomem *tables,
+ int table, int row,
+ u32 value1, u32 value2)
+{
+ VDE_WR(value1, tables + 0x80 * table + row * 8);
+ VDE_WR(value2, tables + 0x80 * table + row * 8 + 4);
+}
+
+static void tegra_vde_setup_iram_tables(void __iomem *iram_tables,
+ struct video_frame *dpb_frames,
+ int ref_frames_nb,
+ int with_earlier_poc_nb)
+{
+ struct video_frame *frame;
+ u32 value, aux_paddr;
+ int with_later_poc_nb;
+ int i, k;
+
+ pr_debug("DPB: Frame 0: frame_num = %d\n", dpb_frames[0].frame_num);
+
+ pr_debug("REF L0:\n");
+
+ for (i = 0; i < 16; i++) {
+ if (i < ref_frames_nb) {
+ frame = &dpb_frames[i + 1];
+
+ aux_paddr = frame->aux_paddr;
+
+ value = (i + 1) << 26;
+ value |= !(frame->flags & FLAG_IS_B_FRAME) << 25;
+ value |= 1 << 24;
+ value |= frame->frame_num;
+
+ pr_debug("\tFrame %d: frame_num = %d is_B_frame = %d\n",
+ i + 1, frame->frame_num,
+ (frame->flags & FLAG_IS_B_FRAME));
+ } else {
+ aux_paddr = 0xFADEAD00;
+ value = 0;
+ }
+
+ tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);
+ tegra_vde_write_iram_entry(iram_tables, 1, i, value, aux_paddr);
+ tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr);
+ tegra_vde_write_iram_entry(iram_tables, 3, i, value, aux_paddr);
+ }
+
+ if (!(dpb_frames[0].flags & FLAG_IS_B_FRAME))
+ return;
+
+ if (with_earlier_poc_nb >= ref_frames_nb)
+ return;
+
+ with_later_poc_nb = ref_frames_nb - with_earlier_poc_nb;
+
+ pr_debug("REF L1: with_later_poc_nb %d with_earlier_poc_nb %d\n",
+ with_later_poc_nb, with_earlier_poc_nb);
+
+ for (i = 0, k = with_earlier_poc_nb; i < with_later_poc_nb; i++, k++) {
+ frame = &dpb_frames[k + 1];
+
+ aux_paddr = frame->aux_paddr;
+
+ value = (k + 1) << 26;
+ value |= !(frame->flags & FLAG_IS_B_FRAME) << 25;
+ value |= 1 << 24;
+ value |= frame->frame_num;
+
+ pr_debug("\tFrame %d: frame_num = %d\n",
+ k + 1, frame->frame_num);
+
+ tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr);
+ }
+
+ for (k = 0; i < ref_frames_nb; i++, k++) {
+ frame = &dpb_frames[k + 1];
+
+ aux_paddr = frame->aux_paddr;
+
+ value = (k + 1) << 26;
+ value |= !(frame->flags & FLAG_IS_B_FRAME) << 25;
+ value |= 1 << 24;
+ value |= frame->frame_num;
+
+ pr_debug("\tFrame %d: frame_num = %d\n",
+ k + 1, frame->frame_num);
+
+ tegra_vde_write_iram_entry(iram_tables, 2, i, value, aux_paddr);
+ }
+}
+
+static int tegra_vde_setup_context(struct tegra_vde *vde,
+ struct tegra_vde_h264_decoder_ctx *ctx,
+ struct video_frame *dpb_frames,
+ phys_addr_t bitstream_data_paddr,
+ int bitstream_data_size,
+ int macroblocks_nb)
+{
+ struct device *dev = vde->miscdev.parent;
+ u32 value;
+
+ tegra_vde_set_bits(vde->regs, 0xA, SXE(0xF0));
+ tegra_vde_set_bits(vde->regs, 0xB, BSEV(CMDQUE_CONTROL));
+ tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50));
+ tegra_vde_set_bits(vde->regs, 0xA, MBE(0xA0));
+ tegra_vde_set_bits(vde->regs, 0xA, PPE(0x14));
+ tegra_vde_set_bits(vde->regs, 0xA, PPE(0x28));
+ tegra_vde_set_bits(vde->regs, 0xA00, MCE(0x08));
+ tegra_vde_set_bits(vde->regs, 0xA, TFE(0x00));
+ tegra_vde_set_bits(vde->regs, 0x5, VDMA(0x04));
+
+ VDE_WR(0x00000000, vde->regs + VDMA(0x1C));
+ VDE_WR(0x00000000, vde->regs + VDMA(0x00));
+ VDE_WR(0x00000007, vde->regs + VDMA(0x04));
+ VDE_WR(0x00000007, vde->regs + FRAMEID(0x200));
+ VDE_WR(0x00000005, vde->regs + TFE(0x04));
+ VDE_WR(0x00000000, vde->regs + MBE(0x84));
+ VDE_WR(0x00000010, vde->regs + SXE(0x08));
+ VDE_WR(0x00000150, vde->regs + SXE(0x54));
+ VDE_WR(0x0000054C, vde->regs + SXE(0x58));
+ VDE_WR(0x00000E34, vde->regs + SXE(0x5C));
+ VDE_WR(0x063C063C, vde->regs + MCE(0x10));
+ VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS));
+ VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG));
+ VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB));
+ VDE_WR(0x00000000, vde->regs + BSEV(0x98));
+ VDE_WR(0x00000060, vde->regs + BSEV(0x9C));
+
+ memset_io(vde->iram + 512, 0, macroblocks_nb / 2);
+
+ tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb,
+ ctx->pic_width_in_mbs, ctx->pic_height_in_mbs);
+
+ tegra_vde_setup_iram_tables(vde->iram, dpb_frames,
+ ctx->dpb_frames_nb - 1,
+ ctx->dpb_ref_frames_with_earlier_poc_nb);
+
+ VDE_WR(0x00000000, vde->regs + BSEV(0x8C));
+ VDE_WR(bitstream_data_paddr + bitstream_data_size,
+ vde->regs + BSEV(0x54));
+
+ value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3;
+
+ VDE_WR(value, vde->regs + BSEV(0x88));
+
+ if (tegra_vde_wait_bsev(vde, false))
+ return -EIO;
+
+ if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false))
+ return -EIO;
+
+ value = 0x01500000;
+ value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF;
+
+ if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
+ return -EIO;
+
+ if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false))
+ return -EIO;
+
+ if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false))
+ return -EIO;
+
+ value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF);
+
+ if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
+ return -EIO;
+
+ value = (1 << 23) | 5;
+ value |= ctx->pic_width_in_mbs << 11;
+ value |= ctx->pic_height_in_mbs << 3;
+
+ VDE_WR(value, vde->regs + SXE(0x10));
+
+ value = !ctx->is_baseline_profile << 17;
+ value |= ctx->level_idc << 13;
+ value |= ctx->log2_max_pic_order_cnt_lsb << 7;
+ value |= ctx->pic_order_cnt_type << 5;
+ value |= ctx->log2_max_frame_num;
+
+ VDE_WR(value, vde->regs + SXE(0x40));
+
+ value = ctx->pic_init_qp << 25;
+ value |= !!(ctx->deblocking_filter_control_present_flag) << 2;
+ value |= !!ctx->pic_order_present_flag;
+
+ VDE_WR(value, vde->regs + SXE(0x44));
+
+ value = ctx->chroma_qp_index_offset;
+ value |= ctx->num_ref_idx_l0_active_minus1 << 5;
+ value |= ctx->num_ref_idx_l1_active_minus1 << 10;
+ value |= !!ctx->constrained_intra_pred_flag << 15;
+
+ VDE_WR(value, vde->regs + SXE(0x48));
+
+ value = 0x0C000000;
+ value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24;
+
+ VDE_WR(value, vde->regs + SXE(0x4C));
+
+ value = 0x03800000;
+ value |= min(bitstream_data_size, SZ_1M);
+
+ VDE_WR(value, vde->regs + SXE(0x68));
+
+ VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C));
+
+ value = (1 << 28) | 5;
+ value |= ctx->pic_width_in_mbs << 11;
+ value |= ctx->pic_height_in_mbs << 3;
+
+ VDE_WR(value, vde->regs + MBE(0x80));
+
+ value = 0x26800000;
+ value |= ctx->level_idc << 4;
+ value |= !ctx->is_baseline_profile << 1;
+ value |= !!ctx->direct_8x8_inference_flag;
+
+ VDE_WR(value, vde->regs + MBE(0x80));
+
+ VDE_WR(0xF4000001, vde->regs + MBE(0x80));
+ VDE_WR(0x20000000, vde->regs + MBE(0x80));
+ VDE_WR(0xF4000101, vde->regs + MBE(0x80));
+
+ value = 0x20000000;
+ value |= ctx->chroma_qp_index_offset << 8;
+
+ VDE_WR(value, vde->regs + MBE(0x80));
+
+ if (tegra_vde_setup_mbe_frame_idx(vde->regs,
+ ctx->pic_order_cnt_type == 0,
+ ctx->dpb_frames_nb - 1)) {
+ dev_err(dev, "MBE frames setup failed\n");
+ return -EIO;
+ }
+
+ tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC);
+ tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00);
+ tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00);
+ tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00);
+ tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr);
+
+ value = 0xFC000000;
+ value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2;
+
+ if (!ctx->is_baseline_profile)
+ value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1;
+
+ VDE_WR(value, vde->regs + MBE(0x80));
+
+ if (tegra_vde_wait_mbe(vde->regs)) {
+ dev_err(dev, "MBE programming failed\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb)
+{
+ VDE_WR(0x00000001, vde->regs + BSEV(0x8C));
+ VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00));
+}
+
+static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a)
+{
+ struct dma_buf *dmabuf = a->dmabuf;
+
+ if (IS_ERR_OR_NULL(a))
+ return;
+
+ dma_buf_detach(dmabuf, a);
+ dma_buf_put(dmabuf);
+}
+
+static int tegra_vde_attach_dmabuf(struct device *dev, int fd,
+ unsigned long offset, int min_size,
+ struct dma_buf_attachment **a,
+ phys_addr_t *paddr, u32 *size,
+ enum dma_data_direction dma_dir)
+{
+ struct dma_buf_attachment *attachment;
+ struct dma_buf *dmabuf;
+ struct sg_table *sgt;
+
+ *a = NULL;
+ *paddr = 0xFBDEAD00;
+
+ dmabuf = dma_buf_get(fd);
+ if (IS_ERR(dmabuf)) {
+ dev_err(dev, "Invalid dmabuf FD\n");
+ return PTR_ERR(dmabuf);
+ }
+
+ if ((u64)offset + min_size > dmabuf->size) {
+ dev_err(dev, "Too small dmabuf size %d @0x%lX, "
+ "should be at least %d\n",
+ dmabuf->size, offset, min_size);
+ return -EINVAL;
+ }
+
+ attachment = dma_buf_attach(dmabuf, dev);
+ if (IS_ERR(attachment)) {
+ dev_err(dev, "Failed to attach dmabuf\n");
+ dma_buf_put(dmabuf);
+ return PTR_ERR(attachment);
+ }
+
+ sgt = dma_buf_map_attachment(attachment, dma_dir);
+ if (IS_ERR(sgt)) {
+ dev_err(dev, "Failed to get dmabuf sg_table\n");
+ dma_buf_detach(dmabuf, attachment);
+ dma_buf_put(dmabuf);
+ return PTR_ERR(sgt);
+ }
+
+ if (sgt->nents != 1) {
+ dev_err(dev, "Sparsed DMA area is unsupported\n");
+ dma_buf_unmap_attachment(attachment, sgt, dma_dir);
+ dma_buf_detach(dmabuf, attachment);
+ dma_buf_put(dmabuf);
+ return -EINVAL;
+ }
+
+ *paddr = sg_dma_address(sgt->sgl) + offset;
+
+ dma_buf_unmap_attachment(attachment, sgt, dma_dir);
+
+ *a = attachment;
+
+ if (size)
+ *size = dmabuf->size - offset;
+
+ return 0;
+}
+
+static int tegra_vde_attach_frame_dmabufs(struct device *dev,
+ struct video_frame *frame,
+ struct tegra_vde_h264_frame *source,
+ enum dma_data_direction dma_dir,
+ int is_baseline_profile, int csize)
+{
+ int ret;
+
+ ret = tegra_vde_attach_dmabuf(dev, source->y_fd,
+ source->y_offset, csize * 4,
+ &frame->y_dmabuf_attachment,
+ &frame->y_paddr, NULL, dma_dir);
+ if (ret)
+ return ret;
+
+ ret = tegra_vde_attach_dmabuf(dev, source->cb_fd,
+ source->cb_offset, csize,
+ &frame->cb_dmabuf_attachment,
+ &frame->cb_paddr, NULL, dma_dir);
+ if (ret)
+ return ret;
+
+ ret = tegra_vde_attach_dmabuf(dev, source->cr_fd,
+ source->cr_offset, csize,
+ &frame->cr_dmabuf_attachment,
+ &frame->cr_paddr, NULL, dma_dir);
+ if (ret)
+ return ret;
+
+ if (is_baseline_profile)
+ frame->aux_paddr = 0xF4DEAD00;
+ else
+ ret = tegra_vde_attach_dmabuf(dev, source->aux_fd,
+ source->aux_offset, csize,
+ &frame->aux_dmabuf_attachment,
+ &frame->aux_paddr, NULL, dma_dir);
+
+ return ret;
+}
+
+static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame)
+{
+ tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
+ tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
+ tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
+ tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment);
+}
+
+static int tegra_vde_copy_and_validate_frame(struct device *dev,
+ struct tegra_vde_h264_frame *frame,
+ unsigned long vaddr)
+{
+ if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) {
+ dev_err(dev, "Copying of H.264 frame from user failed\n");
+ return -EFAULT;
+ }
+
+ if (frame->frame_num > 0x7FFFFF) {
+ dev_err(dev, "Bad frame_num %u\n", frame->frame_num);
+ return -EINVAL;
+ }
+
+ if (frame->y_offset & 0xFF) {
+ dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset);
+ return -EINVAL;
+ }
+
+ if (frame->cb_offset & 0xFF) {
+ dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset);
+ return -EINVAL;
+ }
+
+ if (frame->cr_offset & 0xFF) {
+ dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int tegra_vde_validate_h264_ctx(struct device *dev,
+ struct tegra_vde_h264_decoder_ctx *ctx)
+{
+ if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) {
+ dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb);
+ return -EINVAL;
+ }
+
+ if (ctx->level_idc > 15) {
+ dev_err(dev, "Bad level value %u\n", ctx->level_idc);
+ return -EINVAL;
+ }
+
+ if (ctx->pic_init_qp > 52) {
+ dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp);
+ return -EINVAL;
+ }
+
+ if (ctx->log2_max_pic_order_cnt_lsb > 16) {
+ dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n",
+ ctx->log2_max_pic_order_cnt_lsb);
+ return -EINVAL;
+ }
+
+ if (ctx->log2_max_frame_num > 16) {
+ dev_err(dev, "Bad log2_max_frame_num value %u\n",
+ ctx->log2_max_frame_num);
+ return -EINVAL;
+ }
+
+ if (ctx->chroma_qp_index_offset > 31) {
+ dev_err(dev, "Bad chroma_qp_index_offset value %u\n",
+ ctx->chroma_qp_index_offset);
+ return -EINVAL;
+ }
+
+ if (ctx->pic_order_cnt_type > 2) {
+ dev_err(dev, "Bad pic_order_cnt_type value %u\n",
+ ctx->pic_order_cnt_type);
+ return -EINVAL;
+ }
+
+ if (ctx->num_ref_idx_l0_active_minus1 > 15) {
+ dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n",
+ ctx->num_ref_idx_l0_active_minus1);
+ return -EINVAL;
+ }
+
+ if (ctx->num_ref_idx_l1_active_minus1 > 15) {
+ dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n",
+ ctx->num_ref_idx_l1_active_minus1);
+ return -EINVAL;
+ }
+
+ if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) {
+ dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n",
+ ctx->pic_width_in_mbs);
+ return -EINVAL;
+ }
+
+ if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) {
+ dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n",
+ ctx->pic_height_in_mbs);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
+ unsigned long vaddr)
+{
+ struct tegra_vde_h264_decoder_ctx ctx;
+ struct tegra_vde_h264_frame frame;
+ struct device *dev = vde->miscdev.parent;
+ struct video_frame *dpb_frames = NULL;
+ struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL;
+ enum dma_data_direction dma_dir;
+ phys_addr_t bitstream_data_paddr;
+ phys_addr_t bsev_paddr;
+ u32 bitstream_data_size;
+ u32 macroblocks_nb;
+ bool timeout = false;
+ int i, ret;
+
+ if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) {
+ dev_err(dev, "Copying of H.264 CTX from user failed\n");
+ return -EFAULT;
+ }
+
+ ret = tegra_vde_validate_h264_ctx(dev, &ctx);
+ if (ret)
+ return -EINVAL;
+
+ ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd,
+ ctx.bitstream_data_offset, 0,
+ &bitstream_data_dmabuf_attachment,
+ &bitstream_data_paddr,
+ &bitstream_data_size,
+ DMA_TO_DEVICE);
+ if (ret)
+ goto cleanup;
+
+ dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames),
+ GFP_KERNEL);
+ if (!dpb_frames) {
+ ret = -ENOMEM;
+ goto cleanup;
+ }
+
+ macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs;
+ vaddr = ctx.dpb_frames_ptr;
+
+ for (i = 0; i < ctx.dpb_frames_nb; i++) {
+ ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr);
+ if (ret)
+ goto cleanup;
+
+ dpb_frames[i].flags = frame.flags;
+ dpb_frames[i].frame_num = frame.frame_num;
+
+ dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+ ret = tegra_vde_attach_frame_dmabufs(dev,
+ &dpb_frames[i], &frame,
+ dma_dir,
+ ctx.is_baseline_profile,
+ macroblocks_nb * 64);
+ if (ret) {
+ tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
+ goto cleanup;
+ }
+
+ vaddr += sizeof(frame);
+ }
+
+ ret = clk_prepare_enable(vde->clk);
+ if (ret) {
+ dev_err(dev, "Failed to enable clk: %d\n", ret);
+ goto cleanup;
+ }
+
+ ret = mutex_lock_interruptible(&vde->lock);
+ if (ret)
+ goto clkgate;
+
+ ret = reset_control_deassert(vde->rst);
+ if (ret) {
+ dev_err(dev, "Failed to deassert reset: %d\n", ret);
+ clk_disable_unprepare(vde->clk);
+ goto unlock;
+ }
+
+ ret = tegra_vde_setup_context(vde, &ctx, dpb_frames,
+ bitstream_data_paddr,
+ bitstream_data_size,
+ macroblocks_nb);
+ if (ret)
+ goto reset;
+
+ tegra_vde_decode_frame(vde, macroblocks_nb);
+
+ timeout = !wait_for_completion_io_timeout(&vde->decode_completion,
+ TEGRA_VDE_TIMEOUT);
+ if (timeout) {
+ bsev_paddr = readl(vde->regs + BSEV(0x10));
+ macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF;
+
+ dev_err(dev, "Decoding failed, "
+ "read 0x%X bytes : %u macroblocks parsed\n",
+ bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0,
+ macroblocks_nb);
+ }
+
+reset:
+ /*
+ * We rely on the VDE registers reset value, otherwise VDE would
+ * cause bus lockup.
+ */
+ ret = reset_control_assert(vde->rst);
+ if (ret)
+ dev_err(dev, "Failed to assert reset: %d\n", ret);
+
+ if (timeout)
+ ret = -EIO;
+
+unlock:
+ mutex_unlock(&vde->lock);
+
+clkgate:
+ clk_disable_unprepare(vde->clk);
+
+cleanup:
+ if (dpb_frames)
+ while (i--)
+ tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
+
+ kfree(dpb_frames);
+
+ tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment);
+
+ return ret;
+}
+
+static long tegra_vde_unlocked_ioctl(struct file *filp,
+ unsigned int cmd, unsigned long arg)
+{
+ struct miscdevice *miscdev = filp->private_data;
+ struct tegra_vde *vde = container_of(miscdev, struct tegra_vde,
+ miscdev);
+
+ switch (cmd) {
+ case TEGRA_VDE_IOCTL_DECODE_H264:
+ return tegra_vde_ioctl_decode_h264(vde, arg);
+ }
+
+ dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd);
+
+ return -ENOTTY;
+}
+
+static const struct file_operations tegra_vde_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = tegra_vde_unlocked_ioctl,
+};
+
+static irqreturn_t tegra_vde_isr(int irq, void *data)
+{
+ struct tegra_vde *vde = data;
+
+ tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208));
+ complete(&vde->decode_completion);
+
+ return IRQ_HANDLED;
+}
+
+static int tegra_vde_probe(struct platform_device *pdev)
+{
+ struct resource *res_regs, *res_iram;
+ struct device *dev = &pdev->dev;
+ struct tegra_vde *vde;
+ int ret;
+
+ vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL);
+ if (!vde)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, vde);
+
+ res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
+ if (!res_regs)
+ return -ENODEV;
+
+ res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram");
+ if (!res_iram)
+ return -ENODEV;
+
+ vde->irq = platform_get_irq_byname(pdev, "sync-token");
+ if (vde->irq < 0)
+ return vde->irq;
+
+ vde->regs = devm_ioremap_resource(dev, res_regs);
+ if (IS_ERR(vde->regs))
+ return PTR_ERR(vde->regs);
+
+ vde->iram = devm_ioremap_resource(dev, res_iram);
+ if (IS_ERR(vde->iram))
+ return PTR_ERR(vde->iram);
+
+ vde->clk = devm_clk_get(dev, NULL);
+ if (IS_ERR(vde->clk)) {
+ dev_err(dev, "Could not get VDE clk\n");
+ return PTR_ERR(vde->clk);
+ }
+
+ vde->rst = devm_reset_control_get(dev, "vde");
+ if (IS_ERR(vde->rst)) {
+ dev_err(dev, "Could not get VDE reset\n");
+ return PTR_ERR(vde->rst);
+ }
+
+ ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED,
+ dev_name(dev), vde);
+ if (ret)
+ return -ENODEV;
+
+ ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC,
+ vde->clk, vde->rst);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to power up VDE unit\n");
+ return ret;
+ }
+
+ ret = reset_control_assert(vde->rst);
+ if (ret) {
+ dev_err(dev, "Failed to assert reset: %d\n", ret);
+ return ret;
+ }
+
+ clk_disable_unprepare(vde->clk);
+
+ mutex_init(&vde->lock);
+ init_completion(&vde->decode_completion);
+
+ vde->iram_lists_paddr = res_iram->start;
+ vde->miscdev.minor = MISC_DYNAMIC_MINOR;
+ vde->miscdev.name = "tegra_vde";
+ vde->miscdev.fops = &tegra_vde_fops;
+ vde->miscdev.parent = dev;
+
+ ret = misc_register(&vde->miscdev);
+ if (ret)
+ return -ENODEV;
+
+ return 0;
+}
+
+static int tegra_vde_remove(struct platform_device *pdev)
+{
+ struct tegra_vde *vde = platform_get_drvdata(pdev);
+
+ misc_deregister(&vde->miscdev);
+
+ return 0;
+}
+
+static const struct of_device_id tegra_vde_of_match[] = {
+ { .compatible = "nvidia,tegra20-vde", },
+ { },
+};
+MODULE_DEVICE_TABLE(of, tegra_vde_of_match);
+
+static struct platform_driver tegra_vde_driver = {
+ .probe = tegra_vde_probe,
+ .remove = tegra_vde_remove,
+ .driver = {
+ .name = "tegra-vde",
+ .of_match_table = tegra_vde_of_match,
+ },
+};
+module_platform_driver(tegra_vde_driver);
+
+MODULE_DESCRIPTION("NVIDIA Tegra20 Video Decoder driver");
+MODULE_AUTHOR("Dmitry Osipenko");
+MODULE_LICENSE("GPL");
--
2.14.1

2017-09-25 23:07:07

by Stephen Warren

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
> decoding of CAVLC H.264 only.

Note: I don't know anything much about video decoding on Tegra (just NV
desktop GPUs, and that was a while ago), but I had a couple small
comments on the DT binding:

> diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt

> +NVIDIA Tegra Video Decoder Engine
> +
> +Required properties:
> +- compatible : "nvidia,tegra20-vde"
> +- reg : Must contain 2 register ranges: registers and IRAM area.
> +- reg-names : Must include the following entries:
> + - regs
> + - iram

I think the IRAM region needs more explanation: What is the region used
for and by what? Can it be moved, and if so does the move need to be
co-ordinated with any other piece of SW?

> +- clocks : Must contain one entry, for the module clock.
> + See ../clocks/clock-bindings.txt for details.
> +- resets : Must contain an entry for each entry in reset-names.
> + See ../reset/reset.txt for details.
> +- reset-names : Must include the following entries:
> + - vde

Let's require a clock-names property too.

2017-09-25 23:45:30

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 26.09.2017 02:01, Stephen Warren wrote:
> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
>> decoding of CAVLC H.264 only.
>
> Note: I don't know anything much about video decoding on Tegra (just NV desktop
> GPUs, and that was a while ago), but I had a couple small comments on the DT
> binding:
>
>> diff --git
>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>
>> +NVIDIA Tegra Video Decoder Engine
>> +
>> +Required properties:
>> +- compatible : "nvidia,tegra20-vde"
>> +- reg : Must contain 2 register ranges: registers and IRAM area.
>> +- reg-names : Must include the following entries:
>> +  - regs
>> +  - iram
>
> I think the IRAM region needs more explanation: What is the region used for and
> by what? Can it be moved, and if so does the move need to be co-ordinated with
> any other piece of SW?
>

IRAM region is used by Video Decoder HW for internal use and some of decoding
parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses
are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure.
Should it be explained in the binding?

>> +- clocks : Must contain one entry, for the module clock.
>> +  See ../clocks/clock-bindings.txt for details.
>> +- resets : Must contain an entry for each entry in reset-names.
>> +  See ../reset/reset.txt for details.
>> +- reset-names : Must include the following entries:
>> +  - vde
>
> Let's require a clock-names property too.

Okay, I'll add this property to the binding.

--
Dmitry

2017-09-26 05:11:39

by Stephen Warren

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 09/25/2017 05:45 PM, Dmitry Osipenko wrote:
> On 26.09.2017 02:01, Stephen Warren wrote:
>> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
>>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
>>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
>>> decoding of CAVLC H.264 only.
>>
>> Note: I don't know anything much about video decoding on Tegra (just NV desktop
>> GPUs, and that was a while ago), but I had a couple small comments on the DT
>> binding:
>>
>>> diff --git
>>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>
>>> +NVIDIA Tegra Video Decoder Engine
>>> +
>>> +Required properties:
>>> +- compatible : "nvidia,tegra20-vde"
>>> +- reg : Must contain 2 register ranges: registers and IRAM area.
>>> +- reg-names : Must include the following entries:
>>> + - regs
>>> + - iram
>>
>> I think the IRAM region needs more explanation: What is the region used for and
>> by what? Can it be moved, and if so does the move need to be co-ordinated with
>> any other piece of SW?
>
> IRAM region is used by Video Decoder HW for internal use and some of decoding
> parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses
> are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure.
> Should it be explained in the binding?

I think this should be briefly mentioned, yes. Otherwise at least people
who don't know the VDE HW well (like me) will wonder why on earth VDE
interacts with IRAM at all. I would have assumed all parameters were
supplied via registers or via descriptors in DRAM.

Thanks.

2017-09-26 06:54:41

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] NVIDIA Tegra20 video decoder driver

On Tue, Sep 26, 2017 at 01:15:41AM +0300, Dmitry Osipenko wrote:
> This driver provides accelerated video decoding to NVIDIA Tegra20 SoC's,
> it is a result of reverse-engineering efforts. Driver has been tested on
> Toshiba AC100 and Acer A500, it should work on any Tegra20 device.
>
> In userspace this driver is utilized by libvdpau-tegra [0] that implements
> VDPAU interface, so any video player that supports VDPAU can provide
> accelerated video decoding on Tegra20 on Linux.

Why not use the v4l2 api instead? Doesn't that provide the same needed
user/kernel api here instead of creating yet-another-custom ioctl?

thanks,

greg k-h

2017-09-26 12:02:50

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 26.09.2017 08:11, Stephen Warren wrote:
> On 09/25/2017 05:45 PM, Dmitry Osipenko wrote:
>> On 26.09.2017 02:01, Stephen Warren wrote:
>>> On 09/25/2017 04:15 PM, Dmitry Osipenko wrote:
>>>> Video decoder, found on NVIDIA Tegra20 SoC, supports a standard set of
>>>> video formats like H.264 / MPEG-4 / WMV / VC1. Currently driver supports
>>>> decoding of CAVLC H.264 only.
>>>
>>> Note: I don't know anything much about video decoding on Tegra (just NV desktop
>>> GPUs, and that was a while ago), but I had a couple small comments on the DT
>>> binding:
>>>
>>>> diff --git
>>>> a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>>> b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-vde.txt
>>>
>>>> +NVIDIA Tegra Video Decoder Engine
>>>> +
>>>> +Required properties:
>>>> +- compatible : "nvidia,tegra20-vde"
>>>> +- reg : Must contain 2 register ranges: registers and IRAM area.
>>>> +- reg-names : Must include the following entries:
>>>> + - regs
>>>> + - iram
>>>
>>> I think the IRAM region needs more explanation: What is the region used for and
>>> by what? Can it be moved, and if so does the move need to be co-ordinated with
>>> any other piece of SW?
>>
>> IRAM region is used by Video Decoder HW for internal use and some of decoding
>> parameters are supplied via IRAM, like frames order list. AFAIK IRAM addresses
>> are hardwired in HW and aren't movable, it is not 100% but I'm pretty sure.
>> Should it be explained in the binding?
>
> I think this should be briefly mentioned, yes. Otherwise at least people
> who don't know the VDE HW well (like me) will wonder why on earth VDE
> interacts with IRAM at all. I would have assumed all parameters were
> supplied via registers or via descriptors in DRAM.
>
> Thanks.
>

I also forgot to mention that VDE scrubs that IRAM region on HW reset. So yeah,
it's definitely a part of HW definition. I'll add a brief explanation to the
binding.

--
Dmitry

2017-09-26 12:32:30

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] NVIDIA Tegra20 video decoder driver

On 26.09.2017 09:54, Greg Kroah-Hartman wrote:
> On Tue, Sep 26, 2017 at 01:15:41AM +0300, Dmitry Osipenko wrote:
>> This driver provides accelerated video decoding to NVIDIA Tegra20 SoC's,
>> it is a result of reverse-engineering efforts. Driver has been tested on
>> Toshiba AC100 and Acer A500, it should work on any Tegra20 device.
>>
>> In userspace this driver is utilized by libvdpau-tegra [0] that implements
>> VDPAU interface, so any video player that supports VDPAU can provide
>> accelerated video decoding on Tegra20 on Linux.
>
> Why not use the v4l2 api instead? Doesn't that provide the same needed
> user/kernel api here instead of creating yet-another-custom ioctl?
>

1) The HW doesn't generalize for the common API. Like for example, it isn't
capable of unpacking bitstream encoded with CABAC (Context-adaptive binary
arithmetic coding), so unpacking should be done in software and then VDE HW
isn't capable of decoding such a stream in a fully-automated manner, software
would have to feed engine with a chunks of macroblocks untill the whole frame is
decoded. That lameness is partially hidden in the BLOB's firmware, that firmware
actually is just a driver BTW.

2) We want to have decoding integrated with the presentation of the decoded
video frame. So having v4l interface for decoding would be just an extra
unnecessary shim, increasing CPU / memory resources usage and complexity of the
code.

3) The decoding and presentation are already implemented using VDPAU API and
proven to work decently in that way.

--
Dmitry

2017-09-26 21:35:42

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] NVIDIA Tegra20 video decoder driver

On Tue, Sep 26, 2017 at 03:32:23PM +0300, Dmitry Osipenko wrote:
> On 26.09.2017 09:54, Greg Kroah-Hartman wrote:
> > On Tue, Sep 26, 2017 at 01:15:41AM +0300, Dmitry Osipenko wrote:
> >> This driver provides accelerated video decoding to NVIDIA Tegra20 SoC's,
> >> it is a result of reverse-engineering efforts. Driver has been tested on
> >> Toshiba AC100 and Acer A500, it should work on any Tegra20 device.
> >>
> >> In userspace this driver is utilized by libvdpau-tegra [0] that implements
> >> VDPAU interface, so any video player that supports VDPAU can provide
> >> accelerated video decoding on Tegra20 on Linux.
> >
> > Why not use the v4l2 api instead? Doesn't that provide the same needed
> > user/kernel api here instead of creating yet-another-custom ioctl?
> >
>
> 1) The HW doesn't generalize for the common API. Like for example, it isn't
> capable of unpacking bitstream encoded with CABAC (Context-adaptive binary
> arithmetic coding), so unpacking should be done in software and then VDE HW
> isn't capable of decoding such a stream in a fully-automated manner, software
> would have to feed engine with a chunks of macroblocks untill the whole frame is
> decoded. That lameness is partially hidden in the BLOB's firmware, that firmware
> actually is just a driver BTW.
>
> 2) We want to have decoding integrated with the presentation of the decoded
> video frame. So having v4l interface for decoding would be just an extra
> unnecessary shim, increasing CPU / memory resources usage and complexity of the
> code.
>
> 3) The decoding and presentation are already implemented using VDPAU API and
> proven to work decently in that way.

This sounds like something you should be talking over with the media
driver developers, why are they not even cc:ed on this submission?

I need their ack on this new api before I can take this.

thanks,

greg k-h

2017-09-26 22:17:16

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] NVIDIA Tegra20 video decoder driver

On 27.09.2017 00:35, Greg Kroah-Hartman wrote:
> On Tue, Sep 26, 2017 at 03:32:23PM +0300, Dmitry Osipenko wrote:
>> On 26.09.2017 09:54, Greg Kroah-Hartman wrote:
>>> On Tue, Sep 26, 2017 at 01:15:41AM +0300, Dmitry Osipenko wrote:
>>>> This driver provides accelerated video decoding to NVIDIA Tegra20 SoC's,
>>>> it is a result of reverse-engineering efforts. Driver has been tested on
>>>> Toshiba AC100 and Acer A500, it should work on any Tegra20 device.
>>>>
>>>> In userspace this driver is utilized by libvdpau-tegra [0] that implements
>>>> VDPAU interface, so any video player that supports VDPAU can provide
>>>> accelerated video decoding on Tegra20 on Linux.
>>>
>>> Why not use the v4l2 api instead? Doesn't that provide the same needed
>>> user/kernel api here instead of creating yet-another-custom ioctl?
>>>
>>
>> 1) The HW doesn't generalize for the common API. Like for example, it isn't
>> capable of unpacking bitstream encoded with CABAC (Context-adaptive binary
>> arithmetic coding), so unpacking should be done in software and then VDE HW
>> isn't capable of decoding such a stream in a fully-automated manner, software
>> would have to feed engine with a chunks of macroblocks untill the whole frame is
>> decoded. That lameness is partially hidden in the BLOB's firmware, that firmware
>> actually is just a driver BTW.
>>
>> 2) We want to have decoding integrated with the presentation of the decoded
>> video frame. So having v4l interface for decoding would be just an extra
>> unnecessary shim, increasing CPU / memory resources usage and complexity of the
>> code.
>>
>> 3) The decoding and presentation are already implemented using VDPAU API and
>> proven to work decently in that way.
>
> This sounds like something you should be talking over with the media
> driver developers, why are they not even cc:ed on this submission?
>
> I need their ack on this new api before I can take this.
>

Indeed, will cc media on V2.

--
Dmitry

2017-09-27 09:46:25

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

I feel like this code is good enough to go into the regular kernel
instead of staging, but I don't really know what "- properly handle
decoding faults" means in this context. Does the driver Oops all the
time or what?

Anyway, minor comments inline.

On Tue, Sep 26, 2017 at 01:15:42AM +0300, Dmitry Osipenko wrote:
> diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig
> new file mode 100644
> index 000000000000..b947c012a373
> --- /dev/null
> +++ b/drivers/staging/tegra-vde/Kconfig
> @@ -0,0 +1,6 @@
> +config TEGRA_VDE
> + tristate "NVIDIA Tegra20 video decoder driver"
> + depends on ARCH_TEGRA_2x_SOC

Could we get a || COMPILE_TEST here as well?

> + help
> + Say Y here to enable support for a NVIDIA Tegra20 video decoder
> + driver.
> diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile
> new file mode 100644
> index 000000000000..e7c0df1174bf
> --- /dev/null
> +++ b/drivers/staging/tegra-vde/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_TEGRA_VDE) += vde.o
> diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO
> new file mode 100644
> index 000000000000..533ddfc5190e
> --- /dev/null
> +++ b/drivers/staging/tegra-vde/TODO
> @@ -0,0 +1,8 @@
> +All TODO's require reverse-engineering to be done first, it is very
> +unlikely that NVIDIA would ever release HW specs to public.
> +
> +TODO:
> + - properly handle decoding faults
> + - support more formats

Adding more formats is not a reason to put this into staging instead of
the normal drivers/ dir.

> +static int tegra_vde_setup_context(struct tegra_vde *vde,
> + struct tegra_vde_h264_decoder_ctx *ctx,
> + struct video_frame *dpb_frames,
> + phys_addr_t bitstream_data_paddr,
> + int bitstream_data_size,
> + int macroblocks_nb)
> +{
> + struct device *dev = vde->miscdev.parent;
> + u32 value;
> +
> + tegra_vde_set_bits(vde->regs, 0xA, SXE(0xF0));
> + tegra_vde_set_bits(vde->regs, 0xB, BSEV(CMDQUE_CONTROL));
> + tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50));
> + tegra_vde_set_bits(vde->regs, 0xA, MBE(0xA0));
> + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x14));
> + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x28));
> + tegra_vde_set_bits(vde->regs, 0xA00, MCE(0x08));
> + tegra_vde_set_bits(vde->regs, 0xA, TFE(0x00));
> + tegra_vde_set_bits(vde->regs, 0x5, VDMA(0x04));
> +
> + VDE_WR(0x00000000, vde->regs + VDMA(0x1C));
> + VDE_WR(0x00000000, vde->regs + VDMA(0x00));
> + VDE_WR(0x00000007, vde->regs + VDMA(0x04));
> + VDE_WR(0x00000007, vde->regs + FRAMEID(0x200));
> + VDE_WR(0x00000005, vde->regs + TFE(0x04));
> + VDE_WR(0x00000000, vde->regs + MBE(0x84));
> + VDE_WR(0x00000010, vde->regs + SXE(0x08));
> + VDE_WR(0x00000150, vde->regs + SXE(0x54));
> + VDE_WR(0x0000054C, vde->regs + SXE(0x58));
> + VDE_WR(0x00000E34, vde->regs + SXE(0x5C));
> + VDE_WR(0x063C063C, vde->regs + MCE(0x10));
> + VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS));
> + VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG));
> + VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB));
> + VDE_WR(0x00000000, vde->regs + BSEV(0x98));
> + VDE_WR(0x00000060, vde->regs + BSEV(0x9C));
> +
> + memset_io(vde->iram + 512, 0, macroblocks_nb / 2);
> +
> + tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb,
> + ctx->pic_width_in_mbs, ctx->pic_height_in_mbs);
> +
> + tegra_vde_setup_iram_tables(vde->iram, dpb_frames,
> + ctx->dpb_frames_nb - 1,
> + ctx->dpb_ref_frames_with_earlier_poc_nb);
> +
> + VDE_WR(0x00000000, vde->regs + BSEV(0x8C));
> + VDE_WR(bitstream_data_paddr + bitstream_data_size,
> + vde->regs + BSEV(0x54));
> +
> + value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3;
> +
> + VDE_WR(value, vde->regs + BSEV(0x88));
> +
> + if (tegra_vde_wait_bsev(vde, false))
> + return -EIO;
> +
> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false))

Preserve the error code from tegra_vde_push_bsev_icmdqueue(). It's
still -EIO so this doesn't affect runtime.

> + return -EIO;
> +
> + value = 0x01500000;
> + value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF;
> +
> + if (tegra_vde_push_bsev_icmdqueue(vde, value, true))

Same for all.

> + return -EIO;
> +
> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false))
> + return -EIO;
> +
> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false))
> + return -EIO;
> +
> + value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF);
> +
> + if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
> + return -EIO;
> +
> + value = (1 << 23) | 5;

I don't like these magic numbers.

> + value |= ctx->pic_width_in_mbs << 11;
> + value |= ctx->pic_height_in_mbs << 3;
> +
> + VDE_WR(value, vde->regs + SXE(0x10));
> +
> + value = !ctx->is_baseline_profile << 17;
> + value |= ctx->level_idc << 13;
> + value |= ctx->log2_max_pic_order_cnt_lsb << 7;
> + value |= ctx->pic_order_cnt_type << 5;
> + value |= ctx->log2_max_frame_num;
> +
> + VDE_WR(value, vde->regs + SXE(0x40));
> +
> + value = ctx->pic_init_qp << 25;
> + value |= !!(ctx->deblocking_filter_control_present_flag) << 2;
> + value |= !!ctx->pic_order_present_flag;
> +
> + VDE_WR(value, vde->regs + SXE(0x44));
> +
> + value = ctx->chroma_qp_index_offset;
> + value |= ctx->num_ref_idx_l0_active_minus1 << 5;
> + value |= ctx->num_ref_idx_l1_active_minus1 << 10;
> + value |= !!ctx->constrained_intra_pred_flag << 15;
> +
> + VDE_WR(value, vde->regs + SXE(0x48));
> +
> + value = 0x0C000000;
> + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24;
> +
> + VDE_WR(value, vde->regs + SXE(0x4C));
> +
> + value = 0x03800000;
> + value |= min(bitstream_data_size, SZ_1M);
> +
> + VDE_WR(value, vde->regs + SXE(0x68));
> +
> + VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C));
> +
> + value = (1 << 28) | 5;
> + value |= ctx->pic_width_in_mbs << 11;
> + value |= ctx->pic_height_in_mbs << 3;
> +
> + VDE_WR(value, vde->regs + MBE(0x80));
> +
> + value = 0x26800000;
> + value |= ctx->level_idc << 4;
> + value |= !ctx->is_baseline_profile << 1;
> + value |= !!ctx->direct_8x8_inference_flag;
> +
> + VDE_WR(value, vde->regs + MBE(0x80));
> +
> + VDE_WR(0xF4000001, vde->regs + MBE(0x80));
> + VDE_WR(0x20000000, vde->regs + MBE(0x80));
> + VDE_WR(0xF4000101, vde->regs + MBE(0x80));
> +
> + value = 0x20000000;
> + value |= ctx->chroma_qp_index_offset << 8;
> +
> + VDE_WR(value, vde->regs + MBE(0x80));
> +
> + if (tegra_vde_setup_mbe_frame_idx(vde->regs,
> + ctx->pic_order_cnt_type == 0,
> + ctx->dpb_frames_nb - 1)) {


Preserve the error code.

> + dev_err(dev, "MBE frames setup failed\n");
> + return -EIO;
> + }
> +
> + tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC);
> + tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00);
> + tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00);
> + tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00);
> + tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr);
> +
> + value = 0xFC000000;
> + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2;
> +
> + if (!ctx->is_baseline_profile)
> + value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1;
> +
> + VDE_WR(value, vde->regs + MBE(0x80));
> +
> + if (tegra_vde_wait_mbe(vde->regs)) {
> + dev_err(dev, "MBE programming failed\n");
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb)
> +{
> + VDE_WR(0x00000001, vde->regs + BSEV(0x8C));
> + VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00));
> +}
> +
> +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a)
> +{
> + struct dma_buf *dmabuf = a->dmabuf;
> +
> + if (IS_ERR_OR_NULL(a))

You just dereferenced this on the line before so it's too late.

Obviously we don't want to dereference either an error pointer or a NULL
but I'm wondering the significance of having it be both. Normally that
would mean that NULL is a special case of success. In other words,
error pointer means the hardware is broken but NULL means it's missing
and not required. I guess I'm suggesting to just delete the check and
make sure we never set this to either NULL or ERR_PTR.

> + return;
> +
> + dma_buf_detach(dmabuf, a);
> + dma_buf_put(dmabuf);
> +}
> +
> +static int tegra_vde_attach_dmabuf(struct device *dev, int fd,
> + unsigned long offset, int min_size,
> + struct dma_buf_attachment **a,
> + phys_addr_t *paddr, u32 *size,
> + enum dma_data_direction dma_dir)
> +{
> + struct dma_buf_attachment *attachment;
> + struct dma_buf *dmabuf;
> + struct sg_table *sgt;
> +
> + *a = NULL;
> + *paddr = 0xFBDEAD00;
> +
> + dmabuf = dma_buf_get(fd);
> + if (IS_ERR(dmabuf)) {
> + dev_err(dev, "Invalid dmabuf FD\n");
> + return PTR_ERR(dmabuf);
> + }
> +
> + if ((u64)offset + min_size > dmabuf->size) {
> + dev_err(dev, "Too small dmabuf size %d @0x%lX, "
> + "should be at least %d\n",
> + dmabuf->size, offset, min_size);
> + return -EINVAL;
> + }
> +
> + attachment = dma_buf_attach(dmabuf, dev);
> + if (IS_ERR(attachment)) {
> + dev_err(dev, "Failed to attach dmabuf\n");
> + dma_buf_put(dmabuf);
> + return PTR_ERR(attachment);
> + }
> +
> + sgt = dma_buf_map_attachment(attachment, dma_dir);
> + if (IS_ERR(sgt)) {
> + dev_err(dev, "Failed to get dmabuf sg_table\n");
> + dma_buf_detach(dmabuf, attachment);
> + dma_buf_put(dmabuf);
> + return PTR_ERR(sgt);
> + }
> +
> + if (sgt->nents != 1) {
> + dev_err(dev, "Sparsed DMA area is unsupported\n");

s/Sparsed/Sparse/

> + dma_buf_unmap_attachment(attachment, sgt, dma_dir);
> + dma_buf_detach(dmabuf, attachment);
> + dma_buf_put(dmabuf);
> + return -EINVAL;

This function would be cleaner using gotos to unwind. Pick the goto
name to reflect what the goto does. For example, here it would be:

if (sgt->nents != 1) {
dev_err(dev, "Sparse DMA area is unsupported\n");
ret = -EINVAL;
goto err_umap;
}



> + }
> +
> + *paddr = sg_dma_address(sgt->sgl) + offset;
> +
> + dma_buf_unmap_attachment(attachment, sgt, dma_dir);
> +
> + *a = attachment;
> +
> + if (size)
> + *size = dmabuf->size - offset;
> +
> + return 0;

return 0;

err_unmap:
dma_buf_unmap_attachment(attachment, sgt, dma_dir);
err_detach:
dma_buf_detach(dmabuf, attachment);
err_put:
dma_buf_put(dmabuf);
return ret;

> +}
> +
> +static int tegra_vde_attach_frame_dmabufs(struct device *dev,
> + struct video_frame *frame,
> + struct tegra_vde_h264_frame *source,
> + enum dma_data_direction dma_dir,
> + int is_baseline_profile, int csize)
> +{
> + int ret;
> +
> + ret = tegra_vde_attach_dmabuf(dev, source->y_fd,
> + source->y_offset, csize * 4,
> + &frame->y_dmabuf_attachment,
> + &frame->y_paddr, NULL, dma_dir);
> + if (ret)
> + return ret;
> +
> + ret = tegra_vde_attach_dmabuf(dev, source->cb_fd,
> + source->cb_offset, csize,
> + &frame->cb_dmabuf_attachment,
> + &frame->cb_paddr, NULL, dma_dir);
> + if (ret)
> + return ret;
> +
> + ret = tegra_vde_attach_dmabuf(dev, source->cr_fd,
> + source->cr_offset, csize,
> + &frame->cr_dmabuf_attachment,
> + &frame->cr_paddr, NULL, dma_dir);
> + if (ret)
> + return ret;
> +
> + if (is_baseline_profile)
> + frame->aux_paddr = 0xF4DEAD00;

The handling of is_baseline_profile is strange to me. It feels like we
should always check it before we use ->aux_paddr but we don't ever do
that.

> + else
> + ret = tegra_vde_attach_dmabuf(dev, source->aux_fd,
> + source->aux_offset, csize,
> + &frame->aux_dmabuf_attachment,
> + &frame->aux_paddr, NULL, dma_dir);


This function should have some error handling to undo the earlier
attach calls if the latter ones fail. See below:


> +
> + return ret;

return 0;

err_detach_cr:
tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
err_detach_cb:
tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
err_detach_y:
tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);

return ret;


> +}
> +
> +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame)
> +{
> + tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
> + tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
> + tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
> + tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment);


It would be happier to write this in the reverse order so it matches
the error handling that I wrote for you.


> +}
> +
> +static int tegra_vde_copy_and_validate_frame(struct device *dev,
> + struct tegra_vde_h264_frame *frame,
> + unsigned long vaddr)
> +{
> + if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) {
> + dev_err(dev, "Copying of H.264 frame from user failed\n");
> + return -EFAULT;

Error message are a funny thing and different people feel different
ways. These can be triggered by the user so they let you spam dmesg
but I can see how many of them would be useful. These ones for
copy_from_user() are not useful since we assume the programmer should
know that stuff. The error code seems enough.

> + }
> +
> + if (frame->frame_num > 0x7FFFFF) {
> + dev_err(dev, "Bad frame_num %u\n", frame->frame_num);
> + return -EINVAL;
> + }
> +
> + if (frame->y_offset & 0xFF) {
> + dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset);
> + return -EINVAL;
> + }
> +
> + if (frame->cb_offset & 0xFF) {
> + dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset);
> + return -EINVAL;
> + }
> +
> + if (frame->cr_offset & 0xFF) {
> + dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int tegra_vde_validate_h264_ctx(struct device *dev,
> + struct tegra_vde_h264_decoder_ctx *ctx)
> +{
> + if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) {
> + dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb);
> + return -EINVAL;
> + }
> +
> + if (ctx->level_idc > 15) {
> + dev_err(dev, "Bad level value %u\n", ctx->level_idc);
> + return -EINVAL;
> + }
> +
> + if (ctx->pic_init_qp > 52) {
> + dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp);
> + return -EINVAL;
> + }
> +
> + if (ctx->log2_max_pic_order_cnt_lsb > 16) {
> + dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n",
> + ctx->log2_max_pic_order_cnt_lsb);
> + return -EINVAL;
> + }
> +
> + if (ctx->log2_max_frame_num > 16) {
> + dev_err(dev, "Bad log2_max_frame_num value %u\n",
> + ctx->log2_max_frame_num);
> + return -EINVAL;
> + }
> +
> + if (ctx->chroma_qp_index_offset > 31) {
> + dev_err(dev, "Bad chroma_qp_index_offset value %u\n",
> + ctx->chroma_qp_index_offset);
> + return -EINVAL;
> + }
> +
> + if (ctx->pic_order_cnt_type > 2) {
> + dev_err(dev, "Bad pic_order_cnt_type value %u\n",
> + ctx->pic_order_cnt_type);
> + return -EINVAL;
> + }
> +
> + if (ctx->num_ref_idx_l0_active_minus1 > 15) {
> + dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n",
> + ctx->num_ref_idx_l0_active_minus1);
> + return -EINVAL;
> + }
> +
> + if (ctx->num_ref_idx_l1_active_minus1 > 15) {
> + dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n",
> + ctx->num_ref_idx_l1_active_minus1);
> + return -EINVAL;
> + }
> +
> + if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) {
> + dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n",
> + ctx->pic_width_in_mbs);
> + return -EINVAL;
> + }
> +
> + if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) {
> + dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n",
> + ctx->pic_height_in_mbs);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
> + unsigned long vaddr)
> +{
> + struct tegra_vde_h264_decoder_ctx ctx;
> + struct tegra_vde_h264_frame frame;
> + struct device *dev = vde->miscdev.parent;
> + struct video_frame *dpb_frames = NULL;
> + struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL;
> + enum dma_data_direction dma_dir;
> + phys_addr_t bitstream_data_paddr;
> + phys_addr_t bsev_paddr;
> + u32 bitstream_data_size;
> + u32 macroblocks_nb;
> + bool timeout = false;
> + int i, ret;
> +
> + if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) {
> + dev_err(dev, "Copying of H.264 CTX from user failed\n");
> + return -EFAULT;
> + }
> +
> + ret = tegra_vde_validate_h264_ctx(dev, &ctx);
> + if (ret)
> + return -EINVAL;
> +
> + ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd,
> + ctx.bitstream_data_offset, 0,
> + &bitstream_data_dmabuf_attachment,
> + &bitstream_data_paddr,
> + &bitstream_data_size,
> + DMA_TO_DEVICE);
> + if (ret)
> + goto cleanup;


I hate this label name. What are we cleaning up??? Now I have to set a
bookmark so I can remember where I left and then scroll down to the
bottom of the function and take a look.

[pause]

OK. I'm back. I call this "one err" style error handling. And it's
very bug prone. Please read my essay on the topic:

https://plus.google.com/106378716002406849458/posts/dnanfhQ4mHQ

The bug is that we call tegra_vde_detach_and_put_dmabuf() with a NULL
pointer and there was that dereference before check that I mentioned
earlier.

> +
> + dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames),
> + GFP_KERNEL);
> + if (!dpb_frames) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs;
> + vaddr = ctx.dpb_frames_ptr;
> +
> + for (i = 0; i < ctx.dpb_frames_nb; i++) {
> + ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr);
> + if (ret)
> + goto cleanup;
> +
> + dpb_frames[i].flags = frame.flags;
> + dpb_frames[i].frame_num = frame.frame_num;
> +
> + dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> + ret = tegra_vde_attach_frame_dmabufs(dev,
> + &dpb_frames[i], &frame,
> + dma_dir,
> + ctx.is_baseline_profile,
> + macroblocks_nb * 64);
> + if (ret) {
> + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
> + goto cleanup;
> + }
> +
> + vaddr += sizeof(frame);
> + }
> +
> + ret = clk_prepare_enable(vde->clk);
> + if (ret) {
> + dev_err(dev, "Failed to enable clk: %d\n", ret);
> + goto cleanup;
> + }
> +
> + ret = mutex_lock_interruptible(&vde->lock);
> + if (ret)
> + goto clkgate;
> +
> + ret = reset_control_deassert(vde->rst);
> + if (ret) {
> + dev_err(dev, "Failed to deassert reset: %d\n", ret);
> + clk_disable_unprepare(vde->clk);

We do a second clk_disable_unprepare(vde->clk); after the unlock.
Delete this?

> + goto unlock;
> + }
> +
> + ret = tegra_vde_setup_context(vde, &ctx, dpb_frames,
> + bitstream_data_paddr,
> + bitstream_data_size,
> + macroblocks_nb);
> + if (ret)
> + goto reset;
> +
> + tegra_vde_decode_frame(vde, macroblocks_nb);
> +
> + timeout = !wait_for_completion_io_timeout(&vde->decode_completion,
> + TEGRA_VDE_TIMEOUT);
> + if (timeout) {
> + bsev_paddr = readl(vde->regs + BSEV(0x10));
> + macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF;
> +
> + dev_err(dev, "Decoding failed, "
> + "read 0x%X bytes : %u macroblocks parsed\n",
> + bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0,
> + macroblocks_nb);
> + }
> +
> +reset:
> + /*
> + * We rely on the VDE registers reset value, otherwise VDE would
> + * cause bus lockup.
> + */
> + ret = reset_control_assert(vde->rst);
> + if (ret)
> + dev_err(dev, "Failed to assert reset: %d\n", ret);

We're overwriting "ret" here when we probably want to preserve the error
code from reset_control_deassert().

> +
> + if (timeout)
> + ret = -EIO;
> +
> +unlock:
> + mutex_unlock(&vde->lock);
> +
> +clkgate:
> + clk_disable_unprepare(vde->clk);
> +
> +cleanup:
> + if (dpb_frames)
> + while (i--)
> + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
> +
> + kfree(dpb_frames);
> +
> + tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment);
> +
> + return ret;
> +}
> +
> +static long tegra_vde_unlocked_ioctl(struct file *filp,
> + unsigned int cmd, unsigned long arg)
> +{
> + struct miscdevice *miscdev = filp->private_data;
> + struct tegra_vde *vde = container_of(miscdev, struct tegra_vde,
> + miscdev);
> +
> + switch (cmd) {
> + case TEGRA_VDE_IOCTL_DECODE_H264:
> + return tegra_vde_ioctl_decode_h264(vde, arg);
> + }
> +
> + dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd);
> +
> + return -ENOTTY;
> +}
> +
> +static const struct file_operations tegra_vde_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = tegra_vde_unlocked_ioctl,
> +};
> +
> +static irqreturn_t tegra_vde_isr(int irq, void *data)
> +{
> + struct tegra_vde *vde = data;
> +
> + tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208));
> + complete(&vde->decode_completion);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int tegra_vde_probe(struct platform_device *pdev)
> +{
> + struct resource *res_regs, *res_iram;
> + struct device *dev = &pdev->dev;
> + struct tegra_vde *vde;
> + int ret;
> +
> + vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL);
> + if (!vde)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, vde);
> +
> + res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
> + if (!res_regs)
> + return -ENODEV;
> +
> + res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram");
> + if (!res_iram)
> + return -ENODEV;
> +
> + vde->irq = platform_get_irq_byname(pdev, "sync-token");
> + if (vde->irq < 0)
> + return vde->irq;
> +
> + vde->regs = devm_ioremap_resource(dev, res_regs);
> + if (IS_ERR(vde->regs))
> + return PTR_ERR(vde->regs);
> +
> + vde->iram = devm_ioremap_resource(dev, res_iram);
> + if (IS_ERR(vde->iram))
> + return PTR_ERR(vde->iram);
> +
> + vde->clk = devm_clk_get(dev, NULL);
> + if (IS_ERR(vde->clk)) {
> + dev_err(dev, "Could not get VDE clk\n");
> + return PTR_ERR(vde->clk);
> + }
> +
> + vde->rst = devm_reset_control_get(dev, "vde");
> + if (IS_ERR(vde->rst)) {
> + dev_err(dev, "Could not get VDE reset\n");
> + return PTR_ERR(vde->rst);
> + }
> +
> + ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED,
> + dev_name(dev), vde);
> + if (ret)
> + return -ENODEV;

Presever the error code.

> +
> + ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC,
> + vde->clk, vde->rst);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to power up VDE unit\n");
> + return ret;
> + }
> +
> + ret = reset_control_assert(vde->rst);
> + if (ret) {
> + dev_err(dev, "Failed to assert reset: %d\n", ret);
> + return ret;
> + }
> +
> + clk_disable_unprepare(vde->clk);
> +
> + mutex_init(&vde->lock);
> + init_completion(&vde->decode_completion);
> +
> + vde->iram_lists_paddr = res_iram->start;
> + vde->miscdev.minor = MISC_DYNAMIC_MINOR;
> + vde->miscdev.name = "tegra_vde";
> + vde->miscdev.fops = &tegra_vde_fops;
> + vde->miscdev.parent = dev;
> +
> + ret = misc_register(&vde->miscdev);
> + if (ret)
> + return -ENODEV;

Preserve the error code.

> +
> + return 0;
> +}
> +

regards,
dan carpenter


2017-09-27 23:28:15

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 27.09.2017 12:45, Dan Carpenter wrote:
> I feel like this code is good enough to go into the regular kernel
> instead of staging, but I don't really know what "- properly handle
> decoding faults" means in this context.

As Greg pointed out, this patch should be cc'ed to media maintainers for a
review. I'll cc them on V2, will see what they would suggest. Yeah, probably the
decoding faults isn't a very candidate for a TODO for staging.

Does the driver Oops all the
> time or what?
>

Driver works fine.

> Anyway, minor comments inline.
>

Thank you very much for the awesome review. I agree with the most of the comments.

> On Tue, Sep 26, 2017 at 01:15:42AM +0300, Dmitry Osipenko wrote:
>> diff --git a/drivers/staging/tegra-vde/Kconfig b/drivers/staging/tegra-vde/Kconfig
>> new file mode 100644
>> index 000000000000..b947c012a373
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/Kconfig
>> @@ -0,0 +1,6 @@
>> +config TEGRA_VDE
>> + tristate "NVIDIA Tegra20 video decoder driver"
>> + depends on ARCH_TEGRA_2x_SOC
>
> Could we get a || COMPILE_TEST here as well?
>
>> + help
>> + Say Y here to enable support for a NVIDIA Tegra20 video decoder
>> + driver.
>> diff --git a/drivers/staging/tegra-vde/Makefile b/drivers/staging/tegra-vde/Makefile
>> new file mode 100644
>> index 000000000000..e7c0df1174bf
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/Makefile
>> @@ -0,0 +1 @@
>> +obj-$(CONFIG_TEGRA_VDE) += vde.o
>> diff --git a/drivers/staging/tegra-vde/TODO b/drivers/staging/tegra-vde/TODO
>> new file mode 100644
>> index 000000000000..533ddfc5190e
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/TODO
>> @@ -0,0 +1,8 @@
>> +All TODO's require reverse-engineering to be done first, it is very
>> +unlikely that NVIDIA would ever release HW specs to public.
>> +
>> +TODO:
>> + - properly handle decoding faults
>> + - support more formats
>
> Adding more formats is not a reason to put this into staging instead of
> the normal drivers/ dir.
>

Well, it feels that the driver isn't very acceptable for the core drivers/. But
maybe it's a wrong feeling. Again, will see what media/ maintainers would suggest.

>> +static int tegra_vde_setup_context(struct tegra_vde *vde,
>> + struct tegra_vde_h264_decoder_ctx *ctx,
>> + struct video_frame *dpb_frames,
>> + phys_addr_t bitstream_data_paddr,
>> + int bitstream_data_size,
>> + int macroblocks_nb)
>> +{
>> + struct device *dev = vde->miscdev.parent;
>> + u32 value;
>> +
>> + tegra_vde_set_bits(vde->regs, 0xA, SXE(0xF0));
>> + tegra_vde_set_bits(vde->regs, 0xB, BSEV(CMDQUE_CONTROL));
>> + tegra_vde_set_bits(vde->regs, 0x8002, MBE(0x50));
>> + tegra_vde_set_bits(vde->regs, 0xA, MBE(0xA0));
>> + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x14));
>> + tegra_vde_set_bits(vde->regs, 0xA, PPE(0x28));
>> + tegra_vde_set_bits(vde->regs, 0xA00, MCE(0x08));
>> + tegra_vde_set_bits(vde->regs, 0xA, TFE(0x00));
>> + tegra_vde_set_bits(vde->regs, 0x5, VDMA(0x04));
>> +
>> + VDE_WR(0x00000000, vde->regs + VDMA(0x1C));
>> + VDE_WR(0x00000000, vde->regs + VDMA(0x00));
>> + VDE_WR(0x00000007, vde->regs + VDMA(0x04));
>> + VDE_WR(0x00000007, vde->regs + FRAMEID(0x200));
>> + VDE_WR(0x00000005, vde->regs + TFE(0x04));
>> + VDE_WR(0x00000000, vde->regs + MBE(0x84));
>> + VDE_WR(0x00000010, vde->regs + SXE(0x08));
>> + VDE_WR(0x00000150, vde->regs + SXE(0x54));
>> + VDE_WR(0x0000054C, vde->regs + SXE(0x58));
>> + VDE_WR(0x00000E34, vde->regs + SXE(0x5C));
>> + VDE_WR(0x063C063C, vde->regs + MCE(0x10));
>> + VDE_WR(0x0003FC00, vde->regs + BSEV(INTR_STATUS));
>> + VDE_WR(0x0000150D, vde->regs + BSEV(BSE_CONFIG));
>> + VDE_WR(0x00000100, vde->regs + BSEV(BSE_INT_ENB));
>> + VDE_WR(0x00000000, vde->regs + BSEV(0x98));
>> + VDE_WR(0x00000060, vde->regs + BSEV(0x9C));
>> +
>> + memset_io(vde->iram + 512, 0, macroblocks_nb / 2);
>> +
>> + tegra_setup_frameidx(vde->regs, dpb_frames, ctx->dpb_frames_nb,
>> + ctx->pic_width_in_mbs, ctx->pic_height_in_mbs);
>> +
>> + tegra_vde_setup_iram_tables(vde->iram, dpb_frames,
>> + ctx->dpb_frames_nb - 1,
>> + ctx->dpb_ref_frames_with_earlier_poc_nb);
>> +
>> + VDE_WR(0x00000000, vde->regs + BSEV(0x8C));
>> + VDE_WR(bitstream_data_paddr + bitstream_data_size,
>> + vde->regs + BSEV(0x54));
>> +
>> + value = ctx->pic_width_in_mbs << 11 | ctx->pic_height_in_mbs << 3;
>> +
>> + VDE_WR(value, vde->regs + BSEV(0x88));
>> +
>> + if (tegra_vde_wait_bsev(vde, false))
>> + return -EIO;
>> +
>> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x800003FC, false))
>
> Preserve the error code from tegra_vde_push_bsev_icmdqueue(). It's
> still -EIO so this doesn't affect runtime.
>

Okay, I'll propagate error code in all other places as well.

>> + return -EIO;
>> +
>> + value = 0x01500000;
>> + value |= ((vde->iram_lists_paddr + 512) >> 2) & 0xFFFF;
>> +
>> + if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
>
> Same for all.
>
>> + return -EIO;
>> +
>> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x840F054C, false))
>> + return -EIO;
>> +
>> + if (tegra_vde_push_bsev_icmdqueue(vde, 0x80000080, false))
>> + return -EIO;
>> +
>> + value = 0x0E340000 | ((vde->iram_lists_paddr >> 2) & 0xFFFF);
>> +
>> + if (tegra_vde_push_bsev_icmdqueue(vde, value, true))
>> + return -EIO;
>> +
>> + value = (1 << 23) | 5;
>
> I don't like these magic numbers.
>

I don't know what these bits do, at best I can replace them with a hex for
consistency.

>> + value |= ctx->pic_width_in_mbs << 11;
>> + value |= ctx->pic_height_in_mbs << 3;
>> +
>> + VDE_WR(value, vde->regs + SXE(0x10));
>> +
>> + value = !ctx->is_baseline_profile << 17;
>> + value |= ctx->level_idc << 13;
>> + value |= ctx->log2_max_pic_order_cnt_lsb << 7;
>> + value |= ctx->pic_order_cnt_type << 5;
>> + value |= ctx->log2_max_frame_num;
>> +
>> + VDE_WR(value, vde->regs + SXE(0x40));
>> +
>> + value = ctx->pic_init_qp << 25;
>> + value |= !!(ctx->deblocking_filter_control_present_flag) << 2;
>> + value |= !!ctx->pic_order_present_flag;
>> +
>> + VDE_WR(value, vde->regs + SXE(0x44));
>> +
>> + value = ctx->chroma_qp_index_offset;
>> + value |= ctx->num_ref_idx_l0_active_minus1 << 5;
>> + value |= ctx->num_ref_idx_l1_active_minus1 << 10;
>> + value |= !!ctx->constrained_intra_pred_flag << 15;
>> +
>> + VDE_WR(value, vde->regs + SXE(0x48));
>> +
>> + value = 0x0C000000;
>> + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 24;
>> +
>> + VDE_WR(value, vde->regs + SXE(0x4C));
>> +
>> + value = 0x03800000;
>> + value |= min(bitstream_data_size, SZ_1M);
>> +
>> + VDE_WR(value, vde->regs + SXE(0x68));
>> +
>> + VDE_WR(bitstream_data_paddr, vde->regs + SXE(0x6C));
>> +
>> + value = (1 << 28) | 5;
>> + value |= ctx->pic_width_in_mbs << 11;
>> + value |= ctx->pic_height_in_mbs << 3;
>> +
>> + VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> + value = 0x26800000;
>> + value |= ctx->level_idc << 4;
>> + value |= !ctx->is_baseline_profile << 1;
>> + value |= !!ctx->direct_8x8_inference_flag;
>> +
>> + VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> + VDE_WR(0xF4000001, vde->regs + MBE(0x80));
>> + VDE_WR(0x20000000, vde->regs + MBE(0x80));
>> + VDE_WR(0xF4000101, vde->regs + MBE(0x80));
>> +
>> + value = 0x20000000;
>> + value |= ctx->chroma_qp_index_offset << 8;
>> +
>> + VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> + if (tegra_vde_setup_mbe_frame_idx(vde->regs,
>> + ctx->pic_order_cnt_type == 0,
>> + ctx->dpb_frames_nb - 1)) {
>
>
> Preserve the error code.
>
>> + dev_err(dev, "MBE frames setup failed\n");
>> + return -EIO;
>> + }
>> +
>> + tegra_vde_mbe_set_0xa_reg(vde->regs, 0, 0x000009FC);
>> + tegra_vde_mbe_set_0xa_reg(vde->regs, 2, 0xF1DEAD00);
>> + tegra_vde_mbe_set_0xa_reg(vde->regs, 4, 0xF2DEAD00);
>> + tegra_vde_mbe_set_0xa_reg(vde->regs, 6, 0xF3DEAD00);
>> + tegra_vde_mbe_set_0xa_reg(vde->regs, 8, dpb_frames[0].aux_paddr);
>> +
>> + value = 0xFC000000;
>> + value |= !!(dpb_frames[0].flags & FLAG_IS_B_FRAME) << 2;
>> +
>> + if (!ctx->is_baseline_profile)
>> + value |= !!(dpb_frames[0].flags & FLAG_IS_REFERENCE) << 1;
>> +
>> + VDE_WR(value, vde->regs + MBE(0x80));
>> +
>> + if (tegra_vde_wait_mbe(vde->regs)) {
>> + dev_err(dev, "MBE programming failed\n");
>> + return -EIO;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void tegra_vde_decode_frame(struct tegra_vde *vde, int macroblocks_nb)
>> +{
>> + VDE_WR(0x00000001, vde->regs + BSEV(0x8C));
>> + VDE_WR(0x20000000 | (macroblocks_nb - 1), vde->regs + SXE(0x00));
>> +}
>> +
>> +static void tegra_vde_detach_and_put_dmabuf(struct dma_buf_attachment *a)
>> +{
>> + struct dma_buf *dmabuf = a->dmabuf;
>> +
>> + if (IS_ERR_OR_NULL(a))
>
> You just dereferenced this on the line before so it's too late.
>

Indeed, good catch!

> Obviously we don't want to dereference either an error pointer or a NULL
> but I'm wondering the significance of having it be both. Normally that
> would mean that NULL is a special case of success. In other words,
> error pointer means the hardware is broken but NULL means it's missing
> and not required. I guess I'm suggesting to just delete the check and
> make sure we never set this to either NULL or ERR_PTR.
>

NULL here exactly means that attachment is missing and not required. But anyway
I'll get rid of IS_ERR_OR_NULL, thanks for the suggestion.

>> + return;
>> +
>> + dma_buf_detach(dmabuf, a);
>> + dma_buf_put(dmabuf);
>> +}
>> +
>> +static int tegra_vde_attach_dmabuf(struct device *dev, int fd,
>> + unsigned long offset, int min_size,
>> + struct dma_buf_attachment **a,
>> + phys_addr_t *paddr, u32 *size,
>> + enum dma_data_direction dma_dir)
>> +{
>> + struct dma_buf_attachment *attachment;
>> + struct dma_buf *dmabuf;
>> + struct sg_table *sgt;
>> +
>> + *a = NULL;
>> + *paddr = 0xFBDEAD00;
>> +
>> + dmabuf = dma_buf_get(fd);
>> + if (IS_ERR(dmabuf)) {
>> + dev_err(dev, "Invalid dmabuf FD\n");
>> + return PTR_ERR(dmabuf);
>> + }
>> +
>> + if ((u64)offset + min_size > dmabuf->size) {
>> + dev_err(dev, "Too small dmabuf size %d @0x%lX, "
>> + "should be at least %d\n",
>> + dmabuf->size, offset, min_size);
>> + return -EINVAL;
>> + }
>> +
>> + attachment = dma_buf_attach(dmabuf, dev);
>> + if (IS_ERR(attachment)) {
>> + dev_err(dev, "Failed to attach dmabuf\n");
>> + dma_buf_put(dmabuf);
>> + return PTR_ERR(attachment);
>> + }
>> +
>> + sgt = dma_buf_map_attachment(attachment, dma_dir);
>> + if (IS_ERR(sgt)) {
>> + dev_err(dev, "Failed to get dmabuf sg_table\n");
>> + dma_buf_detach(dmabuf, attachment);
>> + dma_buf_put(dmabuf);
>> + return PTR_ERR(sgt);
>> + }
>> +
>> + if (sgt->nents != 1) {
>> + dev_err(dev, "Sparsed DMA area is unsupported\n");
>
> s/Sparsed/Sparse/
>

ACK.

>> + dma_buf_unmap_attachment(attachment, sgt, dma_dir);
>> + dma_buf_detach(dmabuf, attachment);
>> + dma_buf_put(dmabuf);
>> + return -EINVAL;
>
> This function would be cleaner using gotos to unwind. Pick the goto
> name to reflect what the goto does. For example, here it would be:
>

Okay.

> if (sgt->nents != 1) {
> dev_err(dev, "Sparse DMA area is unsupported\n");
> ret = -EINVAL;
> goto err_umap;
> }
>
>
>
>> + }
>> +
>> + *paddr = sg_dma_address(sgt->sgl) + offset;
>> +
>> + dma_buf_unmap_attachment(attachment, sgt, dma_dir);
>> +
>> + *a = attachment;
>> +
>> + if (size)
>> + *size = dmabuf->size - offset;
>> +
>> + return 0;
>
> return 0;
>
> err_unmap:
> dma_buf_unmap_attachment(attachment, sgt, dma_dir);
> err_detach:
> dma_buf_detach(dmabuf, attachment);
> err_put:
> dma_buf_put(dmabuf);
> return ret;
>
>> +}
>> +
>> +static int tegra_vde_attach_frame_dmabufs(struct device *dev,
>> + struct video_frame *frame,
>> + struct tegra_vde_h264_frame *source,
>> + enum dma_data_direction dma_dir,
>> + int is_baseline_profile, int csize)
>> +{
>> + int ret;
>> +
>> + ret = tegra_vde_attach_dmabuf(dev, source->y_fd,
>> + source->y_offset, csize * 4,
>> + &frame->y_dmabuf_attachment,
>> + &frame->y_paddr, NULL, dma_dir);
>> + if (ret)
>> + return ret;
>> +
>> + ret = tegra_vde_attach_dmabuf(dev, source->cb_fd,
>> + source->cb_offset, csize,
>> + &frame->cb_dmabuf_attachment,
>> + &frame->cb_paddr, NULL, dma_dir);
>> + if (ret)
>> + return ret;
>> +
>> + ret = tegra_vde_attach_dmabuf(dev, source->cr_fd,
>> + source->cr_offset, csize,
>> + &frame->cr_dmabuf_attachment,
>> + &frame->cr_paddr, NULL, dma_dir);
>> + if (ret)
>> + return ret;
>> +
>> + if (is_baseline_profile)
>> + frame->aux_paddr = 0xF4DEAD00;
>
> The handling of is_baseline_profile is strange to me. It feels like we
> should always check it before we use ->aux_paddr but we don't ever do
> that.
>

In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux
phys address is set to a predefined and invalid address, so that in a case of
VDE trying to use it, its invalid memory accesses would be reported in KMSG by
memory controller driver and the reported invalid addresses would be known to be
associated with the aux buffer. I'm not sure what you are meaning.

>> + else
>> + ret = tegra_vde_attach_dmabuf(dev, source->aux_fd,
>> + source->aux_offset, csize,
>> + &frame->aux_dmabuf_attachment,
>> + &frame->aux_paddr, NULL, dma_dir);
>
>
> This function should have some error handling to undo the earlier
> attach calls if the latter ones fail. See below:
>
>

Okay.

>> +
>> + return ret;
>
> return 0;
>
> err_detach_cr:
> tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
> err_detach_cb:
> tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
> err_detach_y:
> tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
>
> return ret;
>
>
>> +}
>> +
>> +static void tegra_vde_deattach_frame_dmabufs(struct video_frame *frame)
>> +{
>> + tegra_vde_detach_and_put_dmabuf(frame->y_dmabuf_attachment);
>> + tegra_vde_detach_and_put_dmabuf(frame->cb_dmabuf_attachment);
>> + tegra_vde_detach_and_put_dmabuf(frame->cr_dmabuf_attachment);
>> + tegra_vde_detach_and_put_dmabuf(frame->aux_dmabuf_attachment);
>
>
> It would be happier to write this in the reverse order so it matches
> the error handling that I wrote for you.
>
>

Okay.

>> +}
>> +
>> +static int tegra_vde_copy_and_validate_frame(struct device *dev,
>> + struct tegra_vde_h264_frame *frame,
>> + unsigned long vaddr)
>> +{
>> + if (copy_from_user(frame, (void __user *)vaddr, sizeof(*frame))) {
>> + dev_err(dev, "Copying of H.264 frame from user failed\n");
>> + return -EFAULT;
>
> Error message are a funny thing and different people feel different
> ways. These can be triggered by the user so they let you spam dmesg
> but I can see how many of them would be useful. These ones for
> copy_from_user() are not useful since we assume the programmer should
> know that stuff. The error code seems enough.
>

Agree.

>> + }
>> +
>> + if (frame->frame_num > 0x7FFFFF) {
>> + dev_err(dev, "Bad frame_num %u\n", frame->frame_num);
>> + return -EINVAL;
>> + }
>> +
>> + if (frame->y_offset & 0xFF) {
>> + dev_err(dev, "Bad y_offset 0x%X\n", frame->y_offset);
>> + return -EINVAL;
>> + }
>> +
>> + if (frame->cb_offset & 0xFF) {
>> + dev_err(dev, "Bad cb_offset 0x%X\n", frame->cb_offset);
>> + return -EINVAL;
>> + }
>> +
>> + if (frame->cr_offset & 0xFF) {
>> + dev_err(dev, "Bad cr_offset 0x%X\n", frame->cr_offset);
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_vde_validate_h264_ctx(struct device *dev,
>> + struct tegra_vde_h264_decoder_ctx *ctx)
>> +{
>> + if (ctx->dpb_frames_nb == 0 || ctx->dpb_frames_nb > 17) {
>> + dev_err(dev, "Bad DPB size %u\n", ctx->dpb_frames_nb);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->level_idc > 15) {
>> + dev_err(dev, "Bad level value %u\n", ctx->level_idc);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->pic_init_qp > 52) {
>> + dev_err(dev, "Bad pic_init_qp value %u\n", ctx->pic_init_qp);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->log2_max_pic_order_cnt_lsb > 16) {
>> + dev_err(dev, "Bad log2_max_pic_order_cnt_lsb value %u\n",
>> + ctx->log2_max_pic_order_cnt_lsb);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->log2_max_frame_num > 16) {
>> + dev_err(dev, "Bad log2_max_frame_num value %u\n",
>> + ctx->log2_max_frame_num);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->chroma_qp_index_offset > 31) {
>> + dev_err(dev, "Bad chroma_qp_index_offset value %u\n",
>> + ctx->chroma_qp_index_offset);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->pic_order_cnt_type > 2) {
>> + dev_err(dev, "Bad pic_order_cnt_type value %u\n",
>> + ctx->pic_order_cnt_type);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->num_ref_idx_l0_active_minus1 > 15) {
>> + dev_err(dev, "Bad num_ref_idx_l0_active_minus1 value %u\n",
>> + ctx->num_ref_idx_l0_active_minus1);
>> + return -EINVAL;
>> + }
>> +
>> + if (ctx->num_ref_idx_l1_active_minus1 > 15) {
>> + dev_err(dev, "Bad num_ref_idx_l1_active_minus1 value %u\n",
>> + ctx->num_ref_idx_l1_active_minus1);
>> + return -EINVAL;
>> + }
>> +
>> + if (!ctx->pic_width_in_mbs || ctx->pic_width_in_mbs > 127) {
>> + dev_err(dev, "Bad pic_width_in_mbs value %u, min 1 max 127\n",
>> + ctx->pic_width_in_mbs);
>> + return -EINVAL;
>> + }
>> +
>> + if (!ctx->pic_height_in_mbs || ctx->pic_height_in_mbs > 127) {
>> + dev_err(dev, "Bad pic_height_in_mbs value %u, min 1 max 127\n",
>> + ctx->pic_height_in_mbs);
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
>> + unsigned long vaddr)
>> +{
>> + struct tegra_vde_h264_decoder_ctx ctx;
>> + struct tegra_vde_h264_frame frame;
>> + struct device *dev = vde->miscdev.parent;
>> + struct video_frame *dpb_frames = NULL;
>> + struct dma_buf_attachment *bitstream_data_dmabuf_attachment = NULL;
>> + enum dma_data_direction dma_dir;
>> + phys_addr_t bitstream_data_paddr;
>> + phys_addr_t bsev_paddr;
>> + u32 bitstream_data_size;
>> + u32 macroblocks_nb;
>> + bool timeout = false;
>> + int i, ret;
>> +
>> + if (copy_from_user(&ctx, (void __user *)vaddr, sizeof(ctx))) {
>> + dev_err(dev, "Copying of H.264 CTX from user failed\n");
>> + return -EFAULT;
>> + }
>> +
>> + ret = tegra_vde_validate_h264_ctx(dev, &ctx);
>> + if (ret)
>> + return -EINVAL;
>> +
>> + ret = tegra_vde_attach_dmabuf(dev, ctx.bitstream_data_fd,
>> + ctx.bitstream_data_offset, 0,
>> + &bitstream_data_dmabuf_attachment,
>> + &bitstream_data_paddr,
>> + &bitstream_data_size,
>> + DMA_TO_DEVICE);
>> + if (ret)
>> + goto cleanup;
>
>
> I hate this label name. What are we cleaning up??? Now I have to set a
> bookmark so I can remember where I left and then scroll down to the
> bottom of the function and take a look.
>
> [pause]
>
> OK. I'm back. I call this "one err" style error handling. And it's
> very bug prone. Please read my essay on the topic:
>
> https://plus.google.com/106378716002406849458/posts/dnanfhQ4mHQ
>
> The bug is that we call tegra_vde_detach_and_put_dmabuf() with a NULL
> pointer and there was that dereference before check that I mentioned
> earlier.
>

Again, this NULL pointer was intentional. But yes, I messed up the
IS_ERR_OR_NULL/derefernce order and will get rid of it as you suggested. No
objections to what you wrote in the essay ;)

>> +
>> + dpb_frames = kcalloc(ctx.dpb_frames_nb, sizeof(*dpb_frames),
>> + GFP_KERNEL);
>> + if (!dpb_frames) {
>> + ret = -ENOMEM;
>> + goto cleanup;
>> + }
>> +
>> + macroblocks_nb = ctx.pic_width_in_mbs * ctx.pic_height_in_mbs;
>> + vaddr = ctx.dpb_frames_ptr;
>> +
>> + for (i = 0; i < ctx.dpb_frames_nb; i++) {
>> + ret = tegra_vde_copy_and_validate_frame(dev, &frame, vaddr);
>> + if (ret)
>> + goto cleanup;
>> +
>> + dpb_frames[i].flags = frame.flags;
>> + dpb_frames[i].frame_num = frame.frame_num;
>> +
>> + dma_dir = (i == 0) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
>> +
>> + ret = tegra_vde_attach_frame_dmabufs(dev,
>> + &dpb_frames[i], &frame,
>> + dma_dir,
>> + ctx.is_baseline_profile,
>> + macroblocks_nb * 64);
>> + if (ret) {
>> + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
>> + goto cleanup;
>> + }
>> +
>> + vaddr += sizeof(frame);
>> + }
>> +
>> + ret = clk_prepare_enable(vde->clk);
>> + if (ret) {
>> + dev_err(dev, "Failed to enable clk: %d\n", ret);
>> + goto cleanup;
>> + }
>> +
>> + ret = mutex_lock_interruptible(&vde->lock);
>> + if (ret)
>> + goto clkgate;
>> +
>> + ret = reset_control_deassert(vde->rst);
>> + if (ret) {
>> + dev_err(dev, "Failed to deassert reset: %d\n", ret);
>> + clk_disable_unprepare(vde->clk);
>
> We do a second clk_disable_unprepare(vde->clk); after the unlock.
> Delete this?
>

Good catch! I've reshuffled clk managment several times before..

>> + goto unlock;
>> + }
>> +
>> + ret = tegra_vde_setup_context(vde, &ctx, dpb_frames,
>> + bitstream_data_paddr,
>> + bitstream_data_size,
>> + macroblocks_nb);
>> + if (ret)
>> + goto reset;
>> +
>> + tegra_vde_decode_frame(vde, macroblocks_nb);
>> +
>> + timeout = !wait_for_completion_io_timeout(&vde->decode_completion,
>> + TEGRA_VDE_TIMEOUT);
>> + if (timeout) {
>> + bsev_paddr = readl(vde->regs + BSEV(0x10));
>> + macroblocks_nb = readl(vde->regs + SXE(0xC8)) & 0x1FFF;
>> +
>> + dev_err(dev, "Decoding failed, "
>> + "read 0x%X bytes : %u macroblocks parsed\n",
>> + bsev_paddr ? bsev_paddr - bitstream_data_paddr : 0,
>> + macroblocks_nb);
>> + }
>> +
>> +reset:
>> + /*
>> + * We rely on the VDE registers reset value, otherwise VDE would
>> + * cause bus lockup.
>> + */
>> + ret = reset_control_assert(vde->rst);
>> + if (ret)
>> + dev_err(dev, "Failed to assert reset: %d\n", ret);
>
> We're overwriting "ret" here when we probably want to preserve the error
> code from reset_control_deassert().
>

I think it doesn't really matter. It's very unlikely that the reset assert could
ever fail, it might result in a further system hang on the next decode invocation.

>> +
>> + if (timeout)
>> + ret = -EIO;
>> +
>> +unlock:
>> + mutex_unlock(&vde->lock);
>> +
>> +clkgate:
>> + clk_disable_unprepare(vde->clk);
>> +
>> +cleanup:
>> + if (dpb_frames)
>> + while (i--)
>> + tegra_vde_deattach_frame_dmabufs(&dpb_frames[i]);
>> +
>> + kfree(dpb_frames);
>> +
>> + tegra_vde_detach_and_put_dmabuf(bitstream_data_dmabuf_attachment);
>> +
>> + return ret;
>> +}
>> +
>> +static long tegra_vde_unlocked_ioctl(struct file *filp,
>> + unsigned int cmd, unsigned long arg)
>> +{
>> + struct miscdevice *miscdev = filp->private_data;
>> + struct tegra_vde *vde = container_of(miscdev, struct tegra_vde,
>> + miscdev);
>> +
>> + switch (cmd) {
>> + case TEGRA_VDE_IOCTL_DECODE_H264:
>> + return tegra_vde_ioctl_decode_h264(vde, arg);
>> + }
>> +
>> + dev_err(miscdev->parent, "Invalid IOCTL command %u\n", cmd);
>> +
>> + return -ENOTTY;
>> +}
>> +
>> +static const struct file_operations tegra_vde_fops = {
>> + .owner = THIS_MODULE,
>> + .unlocked_ioctl = tegra_vde_unlocked_ioctl,
>> +};
>> +
>> +static irqreturn_t tegra_vde_isr(int irq, void *data)
>> +{
>> + struct tegra_vde *vde = data;
>> +
>> + tegra_vde_set_bits(vde->regs, 0, FRAMEID(0x208));
>> + complete(&vde->decode_completion);
>> +
>> + return IRQ_HANDLED;
>> +}
>> +
>> +static int tegra_vde_probe(struct platform_device *pdev)
>> +{
>> + struct resource *res_regs, *res_iram;
>> + struct device *dev = &pdev->dev;
>> + struct tegra_vde *vde;
>> + int ret;
>> +
>> + vde = devm_kzalloc(&pdev->dev, sizeof(*vde), GFP_KERNEL);
>> + if (!vde)
>> + return -ENOMEM;
>> +
>> + platform_set_drvdata(pdev, vde);
>> +
>> + res_regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
>> + if (!res_regs)
>> + return -ENODEV;
>> +
>> + res_iram = platform_get_resource_byname(pdev, IORESOURCE_MEM, "iram");
>> + if (!res_iram)
>> + return -ENODEV;
>> +
>> + vde->irq = platform_get_irq_byname(pdev, "sync-token");
>> + if (vde->irq < 0)
>> + return vde->irq;
>> +
>> + vde->regs = devm_ioremap_resource(dev, res_regs);
>> + if (IS_ERR(vde->regs))
>> + return PTR_ERR(vde->regs);
>> +
>> + vde->iram = devm_ioremap_resource(dev, res_iram);
>> + if (IS_ERR(vde->iram))
>> + return PTR_ERR(vde->iram);
>> +
>> + vde->clk = devm_clk_get(dev, NULL);
>> + if (IS_ERR(vde->clk)) {
>> + dev_err(dev, "Could not get VDE clk\n");
>> + return PTR_ERR(vde->clk);
>> + }
>> +
>> + vde->rst = devm_reset_control_get(dev, "vde");
>> + if (IS_ERR(vde->rst)) {
>> + dev_err(dev, "Could not get VDE reset\n");
>> + return PTR_ERR(vde->rst);
>> + }
>> +
>> + ret = devm_request_irq(dev, vde->irq, tegra_vde_isr, IRQF_SHARED,
>> + dev_name(dev), vde);
>> + if (ret)
>> + return -ENODEV;
>
> Presever the error code.
>
>> +
>> + ret = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_VDEC,
>> + vde->clk, vde->rst);
>> + if (ret) {
>> + dev_err(&pdev->dev, "Failed to power up VDE unit\n");
>> + return ret;
>> + }
>> +
>> + ret = reset_control_assert(vde->rst);
>> + if (ret) {
>> + dev_err(dev, "Failed to assert reset: %d\n", ret);
>> + return ret;
>> + }
>> +
>> + clk_disable_unprepare(vde->clk);
>> +
>> + mutex_init(&vde->lock);
>> + init_completion(&vde->decode_completion);
>> +
>> + vde->iram_lists_paddr = res_iram->start;
>> + vde->miscdev.minor = MISC_DYNAMIC_MINOR;
>> + vde->miscdev.name = "tegra_vde";
>> + vde->miscdev.fops = &tegra_vde_fops;
>> + vde->miscdev.parent = dev;
>> +
>> + ret = misc_register(&vde->miscdev);
>> + if (ret)
>> + return -ENODEV;
>
> Preserve the error code.
>
>> +
>> + return 0;
>> +}
>> +
>
--
Dmitry

2017-09-27 23:31:29

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 27.09.2017 12:45, Dan Carpenter wrote:
>> --- /dev/null
>> +++ b/drivers/staging/tegra-vde/Kconfig
>> @@ -0,0 +1,6 @@
>> +config TEGRA_VDE
>> + tristate "NVIDIA Tegra20 video decoder driver"
>> + depends on ARCH_TEGRA_2x_SOC
>
> Could we get a || COMPILE_TEST here as well?
>

Good point, I'll address it in V2.

--
Dmitry

2017-09-28 07:23:28

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On Thu, Sep 28, 2017 at 02:28:04AM +0300, Dmitry Osipenko wrote:
> >> + if (is_baseline_profile)
> >> + frame->aux_paddr = 0xF4DEAD00;
> >
> > The handling of is_baseline_profile is strange to me. It feels like we
> > should always check it before we use ->aux_paddr but we don't ever do
> > that.
> >
>
> In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux
> phys address is set to a predefined and invalid address, so that in a case of
> VDE trying to use it, its invalid memory accesses would be reported in KMSG by
> memory controller driver and the reported invalid addresses would be known to be
> associated with the aux buffer. I'm not sure what you are meaning.

It's not used perhaps, but we do write it to the hardware, right?

tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);

It's just strange.

regards,
dan carpenter

2017-09-28 11:37:09

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] staging: Introduce NVIDIA Tegra20 video decoder driver

On 28.09.2017 10:23, Dan Carpenter wrote:
> On Thu, Sep 28, 2017 at 02:28:04AM +0300, Dmitry Osipenko wrote:
>>>> + if (is_baseline_profile)
>>>> + frame->aux_paddr = 0xF4DEAD00;
>>>
>>> The handling of is_baseline_profile is strange to me. It feels like we
>>> should always check it before we use ->aux_paddr but we don't ever do
>>> that.
>>>
>>
>> In a case of baseline profile, aux buffer isn't needed, HW should't use it. Aux
>> phys address is set to a predefined and invalid address, so that in a case of
>> VDE trying to use it, its invalid memory accesses would be reported in KMSG by
>> memory controller driver and the reported invalid addresses would be known to be
>> associated with the aux buffer. I'm not sure what you are meaning.
>
> It's not used perhaps, but we do write it to the hardware, right?
>

That's right. I'm pretty sure HW won't use the aux in a case of baseline
profile, haven't seen an evidence of it. But in order to be on a safe side, the
addresses are initialized to an invalid value, so HW won't have a chance to
silently fetch/trash 'arbitrary' main memory and we will know if it tries to do it.

if (baseline_profile)
frame->aux_paddr = 0xF4DEAD00;
else {

So here ^ all *used* frame entries are being initialized.

> tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);
>
> It's just strange.
>

There up to 16 reference video frames (already decoded) that could be used for
decoding of a video frame. Addresses of reference frames that shouldn't be used
for decoding of the current frame are set to an invalid address. Userspace my
supply wrong frames list or frames list setup code may contain an obscure bug
and we will know about it.

} else {
aux_paddr = 0xFADEAD00;
value = 0;
}

Here ^ all *unused* frame entries are being initialized.

tegra_vde_write_iram_entry(iram_tables, 0, i, value, aux_paddr);

And here ^ these entries are written to the tables that are read by HW.

--
Dmitry