2020-10-13 13:43:07

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support

Dear all,

This series introduces a regmap infrastructure for the Hantro driver
which is used to compensate for different HW-revision register layouts.
To justify it h264 decoding capability is added for newer VC8000 chips.

This is a gradual conversion to the new infra - a complete conversion
would have been very big and I do not have all the HW yet to test (I'm
expecting a RK3399 shipment next week though ;). I think converting the
h264 decoder provides a nice blueprint for how the other codecs can be
converted and enabled for different HW revisions.

The end goal of this is to make the driver more generic and eliminate
entirely custom boilerplate like `struct hantro_reg` or headers with
core-specific bit manipulations like `hantro_g1_regs.h` and instead rely
on the well-tested albeit more verbose regmap subsytem.

To give just two examples of bugs which are easily discovered by using
more verbose regmap fields (very easy to compare with the datasheets)
instead of relying on bit-magic tricks: G1_REG_DEC_CTRL3_INIT_QP(x) was
off-by-1 and the wrong .clk_gate bit was set in hantro_postproc.c.

Anyway, this series also extends the MMIO regmap API to allow relaxed
writes for the theoretical reason that avoiding unnecessary membarriers
leads to less CPU usage and small improvements to battery life. However,
in practice I could not measure differences between relaxed/non-relaxed
IO, so I'm on the fence whether to keep or remove the relaxed calls.

What I could masure is the performance impact of adding more sub-reg
field acesses: a constant ~ 20 microsecond bump per G1 h264 frame. This
is acceptable considering the total time to decode a frame takes three
orders of magnitude longer, i.e. miliseconds ranges, depending on the
frame size and bitstream params, so it is an acceptable trade-off to
have a more generic driver.

This has been tested on next-20201009 with imx8mq for G1 and an SoC with
VC8000 which has not yet been added (hopefuly support lands soon).

Kind regards,
Adrian

Adrian Ratiu (18):
media: hantro: document all int reg bits up to vc8000
media: hantro: make consistent use of decimal register notation
media: hantro: make G1_REG_SOFT_RESET Rockchip specific
media: hantro: add reset controller support
media: hantro: prepare clocks before variant inits are run
media: hantro: imx8mq: simplify ctrlblk reset logic
regmap: mmio: add config option to allow relaxed MMIO accesses
media: hantro: add initial MMIO regmap infrastructure
media: hantro: default regmap to relaxed MMIO
media: hantro: convert G1 h264 decoder to regmap fields
media: hantro: convert G1 postproc to regmap
media: hantro: add VC8000D h264 decoding
media: hantro: add VC8000D postproc support
media: hantro: make PP enablement logic a bit smarter
media: hantro: add user-selectable, platform-selectable H264 High10
media: hantro: rename h264_dec as it's not G1 specific anymore
media: hantro: add dump registers debug option before decode start
media: hantro: document encoder reg fields

drivers/base/regmap/regmap-mmio.c | 34 +-
drivers/staging/media/hantro/Makefile | 3 +-
drivers/staging/media/hantro/hantro.h | 79 +-
drivers/staging/media/hantro/hantro_drv.c | 41 +-
drivers/staging/media/hantro/hantro_g1_regs.h | 92 +-
...hantro_g1_h264_dec.c => hantro_h264_dec.c} | 237 +++-
drivers/staging/media/hantro/hantro_hw.h | 23 +-
.../staging/media/hantro/hantro_postproc.c | 144 ++-
drivers/staging/media/hantro/hantro_regmap.c | 1015 +++++++++++++++++
drivers/staging/media/hantro/hantro_regmap.h | 295 +++++
drivers/staging/media/hantro/hantro_v4l2.c | 3 +-
drivers/staging/media/hantro/imx8m_vpu_hw.c | 75 +-
drivers/staging/media/hantro/rk3288_vpu_hw.c | 5 +-
include/linux/regmap.h | 5 +
14 files changed, 1795 insertions(+), 256 deletions(-)
rename drivers/staging/media/hantro/{hantro_g1_h264_dec.c => hantro_h264_dec.c} (58%)
create mode 100644 drivers/staging/media/hantro/hantro_regmap.c
create mode 100644 drivers/staging/media/hantro/hantro_regmap.h

--
2.28.0


2020-10-13 13:43:08

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 03/18] media: hantro: make G1_REG_SOFT_RESET Rockchip specific

This register is not documented in either the G1 or VC8000D register
maps and on VC8000D there is a conflict because at the same offset the
VPU IP defines another register with a very different meaning.

What likely happened is the HW integrator which uses only the G1 IP
core added some reset/control logic at the end of the VPU map, so
it makes sense to make this register RK-specific.

Signed-off-by: Adrian Ratiu <[email protected]>
---
drivers/staging/media/hantro/hantro_g1_regs.h | 1 -
drivers/staging/media/hantro/rk3288_vpu_hw.c | 4 +++-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_g1_regs.h b/drivers/staging/media/hantro/hantro_g1_regs.h
index 073b64cbe295..a482a2ba6dfe 100644
--- a/drivers/staging/media/hantro/hantro_g1_regs.h
+++ b/drivers/staging/media/hantro/hantro_g1_regs.h
@@ -315,7 +315,6 @@
#define G1_REG_REF_BUF_CTRL2_REFBU2_THR(x) (((x) & 0xfff) << 19)
#define G1_REG_REF_BUF_CTRL2_REFBU2_PICID(x) (((x) & 0x1f) << 14)
#define G1_REG_REF_BUF_CTRL2_APF_THRESHOLD(x) (((x) & 0x3fff) << 0)
-#define G1_REG_SOFT_RESET 0x194

/* Post-processor registers. */
#define G1_REG_PP_INTERRUPT G1_SWREG(60)
diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c b/drivers/staging/media/hantro/rk3288_vpu_hw.c
index 7b299ee3e93d..4ad578b1236e 100644
--- a/drivers/staging/media/hantro/rk3288_vpu_hw.c
+++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c
@@ -13,6 +13,8 @@
#include "hantro_g1_regs.h"
#include "hantro_h1_regs.h"

+#define VDPU_REG_SOFT_RESET 0x194
+
#define RK3288_ACLK_MAX_FREQ (400 * 1000 * 1000)

/*
@@ -167,7 +169,7 @@ static void rk3288_vpu_dec_reset(struct hantro_ctx *ctx)

vdpu_write(vpu, G1_REG_INTERRUPT_DEC_IRQ_DIS, G1_REG_INTERRUPT);
vdpu_write(vpu, G1_REG_CONFIG_DEC_CLK_GATE_E, G1_REG_CONFIG);
- vdpu_write(vpu, 1, G1_REG_SOFT_RESET);
+ vdpu_write(vpu, 1, VDPU_REG_SOFT_RESET);
}

/*
--
2.28.0

2020-10-13 13:55:04

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 10/18] media: hantro: convert G1 h264 decoder to regmap fields

Populate the regmap field API for G1 h264 decoding and convert the
G1 h264 decoder source to use the new API. This is done because we
will add support for the newer VC8000D core which will configure
the regmap API fields differently to match its own hwreg layout.

Signed-off-by: Adrian Ratiu <[email protected]>
---
.../staging/media/hantro/hantro_g1_h264_dec.c | 71 ++++++++++-------
drivers/staging/media/hantro/hantro_regmap.c | 79 ++++++++++++++++++-
drivers/staging/media/hantro/hantro_regmap.h | 26 +++++-
3 files changed, 145 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 845bef73d218..8592dfabbc5e 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -18,6 +18,9 @@
#include "hantro_g1_regs.h"
#include "hantro_hw.h"
#include "hantro_v4l2.h"
+#include "hantro_regmap.h"
+
+extern struct regmap_config hantro_regmap_dec;

static void set_params(struct hantro_ctx *ctx)
{
@@ -27,10 +30,15 @@ static void set_params(struct hantro_ctx *ctx)
const struct v4l2_ctrl_h264_pps *pps = ctrls->pps;
struct vb2_v4l2_buffer *src_buf = hantro_get_src_buf(ctx);
struct hantro_dev *vpu = ctx->dev;
+ struct hantro_regmap_fields_dec *fields = vpu->reg_fields_dec;
+ u32 width = MB_WIDTH(ctx->src_fmt.width);
+ u32 height = MB_HEIGHT(ctx->src_fmt.height);
u32 reg;

+ regmap_field_write(fields->dec_axi_wr_id, 0x0);
+
/* Decoder control register 0. */
- reg = G1_REG_DEC_CTRL0_DEC_AXI_WR_ID(0x0);
+ reg = 0;
if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
reg |= G1_REG_DEC_CTRL0_SEQ_MBAFF_E;
if (sps->profile_idc > 66) {
@@ -50,10 +58,11 @@ static void set_params(struct hantro_ctx *ctx)
vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);

/* Decoder control register 1. */
- reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(MB_WIDTH(ctx->src_fmt.width)) |
- G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(MB_HEIGHT(ctx->src_fmt.height)) |
- G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
- vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
+ regmap_field_write(fields->dec_pic_width, width);
+ regmap_field_write(fields->dec_pic_height, height);
+
+ regmap_field_write(fields->dec_num_ref_frames,
+ sps->max_num_ref_frames);

/* Decoder control register 2. */
reg = G1_REG_DEC_CTRL2_CH_QP_OFFSET(pps->chroma_qp_index_offset) |
@@ -66,10 +75,11 @@ static void set_params(struct hantro_ctx *ctx)
vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL2);

/* Decoder control register 3. */
- reg = G1_REG_DEC_CTRL3_START_CODE_E |
- G1_REG_DEC_CTRL3_INIT_QP(pps->pic_init_qp_minus26 + 26) |
- G1_REG_DEC_CTRL3_STREAM_LEN(vb2_get_plane_payload(&src_buf->vb2_buf, 0));
- vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL3);
+ regmap_field_write(fields->dec_start_code_e, 1);
+ regmap_field_write(fields->dec_init_qp,
+ pps->pic_init_qp_minus26 + 26);
+ regmap_field_write(fields->dec_stream_len,
+ vb2_get_plane_payload(&src_buf->vb2_buf, 0));

/* Decoder control register 4. */
reg = G1_REG_DEC_CTRL4_FRAMENUM_LEN(sps->log2_max_frame_num_minus4 + 4) |
@@ -121,8 +131,7 @@ static void set_params(struct hantro_ctx *ctx)
vdpu_write_relaxed(vpu, 0, G1_REG_REF_BUF_CTRL);

/* Reference picture buffer control register 2. */
- vdpu_write_relaxed(vpu, G1_REG_REF_BUF_CTRL2_APF_THRESHOLD(8),
- G1_REG_REF_BUF_CTRL2);
+ regmap_field_write(fields->dec_apf_threshold, 8);
}

static void set_ref(struct hantro_ctx *ctx)
@@ -221,7 +230,6 @@ static void set_ref(struct hantro_ctx *ctx)
/* Set up addresses of DPB buffers. */
for (i = 0; i < HANTRO_H264_DPB_SIZE; i++) {
dma_addr_t dma_addr = hantro_h264_get_ref_buf(ctx, i);
-
vdpu_write_relaxed(vpu, dma_addr, G1_REG_ADDR_REF(i));
}
}
@@ -231,6 +239,7 @@ static void set_buffers(struct hantro_ctx *ctx)
const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
struct vb2_v4l2_buffer *src_buf, *dst_buf;
struct hantro_dev *vpu = ctx->dev;
+ struct hantro_regmap_fields_dec *fields = vpu->reg_fields_dec;
dma_addr_t src_dma, dst_dma;
size_t offset = 0;

@@ -239,14 +248,14 @@ static void set_buffers(struct hantro_ctx *ctx)

/* Source (stream) buffer. */
src_dma = vb2_dma_contig_plane_dma_addr(&src_buf->vb2_buf, 0);
- vdpu_write_relaxed(vpu, src_dma, G1_REG_ADDR_STR);
+ regmap_field_write(fields->dec_addr_str, src_dma);

/* Destination (decoded frame) buffer. */
dst_dma = hantro_get_dec_buf_addr(ctx, &dst_buf->vb2_buf);
/* Adjust dma addr to start at second line for bottom field */
if (ctrls->decode->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
offset = ALIGN(ctx->src_fmt.width, MB_DIM);
- vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DST);
+ regmap_field_write(fields->dec_addr_dst, dst_dma + offset);

/* Higher profiles require DMV buffer appended to reference frames. */
if (ctrls->sps->profile_idc > 66 && ctrls->decode->nal_ref_idc) {
@@ -266,16 +275,18 @@ static void set_buffers(struct hantro_ctx *ctx)
if (ctrls->decode->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
offset += 32 * MB_WIDTH(ctx->src_fmt.width) *
MB_HEIGHT(ctx->src_fmt.height);
- vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DIR_MV);
+ regmap_field_write(fields->dec_addr_dir_mv, dst_dma + offset);
}

/* Auxiliary buffer prepared in hantro_g1_h264_dec_prepare_table(). */
- vdpu_write_relaxed(vpu, ctx->h264_dec.priv.dma, G1_REG_ADDR_QTABLE);
+ regmap_field_write(fields->dec_addr_qtable, ctx->h264_dec.priv.dma);
}

void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
{
struct hantro_dev *vpu = ctx->dev;
+ struct hantro_regmap_fields_dec *fields = vpu->reg_fields_dec;
+ int reg;

/* Prepare the H264 decoder context. */
if (hantro_h264_dec_prepare_run(ctx))
@@ -288,17 +299,23 @@ void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)

hantro_end_prepare_run(ctx);

+ switch (vpu->core_hw_dec_rev) {
+ case HANTRO_G1_REV:
+ reg = G1_REG_CONFIG_DEC_TIMEOUT_E |
+ G1_REG_CONFIG_DEC_OUT_ENDIAN |
+ G1_REG_CONFIG_DEC_STRENDIAN_E |
+ G1_REG_CONFIG_DEC_OUTSWAP32_E |
+ G1_REG_CONFIG_DEC_INSWAP32_E |
+ G1_REG_CONFIG_DEC_STRSWAP32_E;
+ vdpu_write_relaxed(vpu, reg, G1_REG_CONFIG);
+ break;
+ /* TODO: add VC8000 support */
+ }
+
+ regmap_field_write(fields->dec_clk_gate_e, 1);
+ regmap_field_write(fields->dec_max_burst, 16);
+ regmap_field_write(fields->dec_axi_rd_id, 16);
+
/* Start decoding! */
- vdpu_write_relaxed(vpu,
- G1_REG_CONFIG_DEC_AXI_RD_ID(0xffu) |
- G1_REG_CONFIG_DEC_TIMEOUT_E |
- G1_REG_CONFIG_DEC_OUT_ENDIAN |
- G1_REG_CONFIG_DEC_STRENDIAN_E |
- G1_REG_CONFIG_DEC_MAX_BURST(16) |
- G1_REG_CONFIG_DEC_OUTSWAP32_E |
- G1_REG_CONFIG_DEC_INSWAP32_E |
- G1_REG_CONFIG_DEC_STRSWAP32_E |
- G1_REG_CONFIG_DEC_CLK_GATE_E,
- G1_REG_CONFIG);
vdpu_write(vpu, G1_REG_INTERRUPT_DEC_E, G1_REG_INTERRUPT);
}
diff --git a/drivers/staging/media/hantro/hantro_regmap.c b/drivers/staging/media/hantro/hantro_regmap.c
index 2fc409cbd797..fbc39abedc7d 100644
--- a/drivers/staging/media/hantro/hantro_regmap.c
+++ b/drivers/staging/media/hantro/hantro_regmap.c
@@ -33,12 +33,54 @@ struct regmap_config hantro_regmap_enc = {
.name = "hantro_regmap_enc",
};

+struct hantro_field_dec {
+ struct reg_field cfg_dec_axi_rd_id;
+ struct reg_field cfg_dec_axi_wr_id;
+ struct reg_field cfg_dec_rlc_mode_e;
+ struct reg_field cfg_dec_mode;
+ struct reg_field cfg_dec_max_burst;
+ struct reg_field cfg_dec_apf_threshold;
+ struct reg_field cfg_dec_stream_len;
+ struct reg_field cfg_dec_init_qp;
+ struct reg_field cfg_dec_start_code_e;
+ struct reg_field cfg_dec_pic_width;
+ struct reg_field cfg_dec_pic_height;
+ struct reg_field cfg_dec_num_ref_frames;
+ struct reg_field cfg_dec_scaling_list_e;
+ struct reg_field cfg_dec_addr_str;
+ struct reg_field cfg_dec_addr_dst;
+ struct reg_field cfg_dec_ilace_mode;
+ struct reg_field cfg_dec_addr_qtable;
+ struct reg_field cfg_dec_addr_dir_mv;
+ struct reg_field cfg_dec_tiled_mode_lsb;
+ struct reg_field cfg_dec_clk_gate_e;
+};
+
struct hantro_field_enc {
/* TODO: populate encoder fields */
};

-struct hantro_field_dec {
- /* TODO: populate decoder fields */
+static const struct hantro_field_dec g1_field = {
+ .cfg_dec_tiled_mode_lsb = REG_FIELD(SWREG(2), 7, 7),
+ .cfg_dec_clk_gate_e = REG_FIELD(SWREG(2), 10, 10),
+ .cfg_dec_axi_rd_id = REG_FIELD(SWREG(2), 24, 31),
+ .cfg_dec_axi_wr_id = REG_FIELD(SWREG(3), 0, 7),
+ .cfg_dec_rlc_mode_e = REG_FIELD(SWREG(3), 27, 27),
+ .cfg_dec_mode = REG_FIELD(SWREG(3), 28, 31),
+ .cfg_dec_max_burst = REG_FIELD(SWREG(2), 0, 4),
+ .cfg_dec_apf_threshold = REG_FIELD(SWREG(55), 0, 13),
+ .cfg_dec_stream_len = REG_FIELD(SWREG(6), 0, 23),
+ .cfg_dec_init_qp = REG_FIELD(SWREG(6), 25, 30),
+ .cfg_dec_start_code_e = REG_FIELD(SWREG(6), 31, 31),
+ .cfg_dec_pic_width = REG_FIELD(SWREG(4), 23, 31),
+ .cfg_dec_pic_height = REG_FIELD(SWREG(4), 11, 18),
+ .cfg_dec_num_ref_frames = REG_FIELD(SWREG(4), 0, 4),
+ .cfg_dec_scaling_list_e = REG_FIELD(SWREG(5), 24, 24),
+ .cfg_dec_addr_str = REG_FIELD(SWREG(12), 0, 31),
+ .cfg_dec_addr_dst = REG_FIELD(SWREG(13), 0, 31),
+ .cfg_dec_ilace_mode = REG_FIELD(SWREG(13), 1, 1),
+ .cfg_dec_addr_qtable = REG_FIELD(SWREG(40), 0, 31),
+ .cfg_dec_addr_dir_mv = REG_FIELD(SWREG(41), 0, 31),
};

#define INIT_FIELD_CFG(f, codec, conf) ({ \
@@ -61,7 +103,27 @@ static int hantro_regmap_fields_init_dec(struct hantro_dev *vpu,
if (!vpu->reg_fields_dec)
return -ENOMEM;

- /* TODO: add decoder fields */
+ /* Decoder */
+ INIT_DEC_FIELD(dec_axi_wr_id);
+ INIT_DEC_FIELD(dec_axi_rd_id);
+ INIT_DEC_FIELD(dec_rlc_mode_e);
+ INIT_DEC_FIELD(dec_mode);
+ INIT_DEC_FIELD(dec_max_burst);
+ INIT_DEC_FIELD(dec_apf_threshold);
+ INIT_DEC_FIELD(dec_stream_len);
+ INIT_DEC_FIELD(dec_init_qp);
+ INIT_DEC_FIELD(dec_start_code_e);
+ INIT_DEC_FIELD(dec_pic_width);
+ INIT_DEC_FIELD(dec_pic_height);
+ INIT_DEC_FIELD(dec_num_ref_frames);
+ INIT_DEC_FIELD(dec_scaling_list_e);
+ INIT_DEC_FIELD(dec_addr_str);
+ INIT_DEC_FIELD(dec_addr_dst);
+ INIT_DEC_FIELD(dec_ilace_mode);
+ INIT_DEC_FIELD(dec_addr_qtable);
+ INIT_DEC_FIELD(dec_addr_dir_mv);
+ INIT_DEC_FIELD(dec_tiled_mode_lsb);
+ INIT_DEC_FIELD(dec_clk_gate_e);

return 0;
}
@@ -133,6 +195,17 @@ int hantro_regmap_init_dec(struct hantro_dev *vpu)

clk_bulk_disable(vpu->variant->num_clocks, vpu->clocks);

+ switch (vpu->core_hw_dec_rev) {
+ case HANTRO_G1_REV:
+ hantro_regmap_dec.max_register = 0x1D8;
+ field = &g1_field;
+ break;
+ default:
+ dev_err(vpu->dev, "Decoder revision 0x%x not supported by driver.\n",
+ vpu->core_hw_dec_rev);
+ return -ENODEV;
+ }
+
vpu->regs_dec = devm_regmap_init_mmio(vpu->dev, dec_base,
&hantro_regmap_dec);
if (IS_ERR(vpu->regs_dec)) {
diff --git a/drivers/staging/media/hantro/hantro_regmap.h b/drivers/staging/media/hantro/hantro_regmap.h
index 52668a8bafb9..e94fdc055784 100644
--- a/drivers/staging/media/hantro/hantro_regmap.h
+++ b/drivers/staging/media/hantro/hantro_regmap.h
@@ -9,8 +9,32 @@
#ifndef HANTRO_REGMAP_H_
#define HANTRO_REGMAP_H_

+#define HANTRO_G1_REV 0x6731
+
+#define SWREG(nr) ((nr) << 2)
+
struct hantro_regmap_fields_dec {
- /* TODO: populate decoder fields */
+ /* Decoder */
+ struct regmap_field *dec_axi_rd_id;
+ struct regmap_field *dec_axi_wr_id;
+ struct regmap_field *dec_max_burst;
+ struct regmap_field *dec_rlc_mode_e;
+ struct regmap_field *dec_mode;
+ struct regmap_field *dec_apf_threshold;
+ struct regmap_field *dec_stream_len;
+ struct regmap_field *dec_init_qp;
+ struct regmap_field *dec_start_code_e;
+ struct regmap_field *dec_pic_width;
+ struct regmap_field *dec_pic_height;
+ struct regmap_field *dec_num_ref_frames;
+ struct regmap_field *dec_scaling_list_e;
+ struct regmap_field *dec_addr_str;
+ struct regmap_field *dec_addr_dst;
+ struct regmap_field *dec_ilace_mode;
+ struct regmap_field *dec_addr_qtable;
+ struct regmap_field *dec_addr_dir_mv;
+ struct regmap_field *dec_tiled_mode_lsb;
+ struct regmap_field *dec_clk_gate_e;
};

struct hantro_regmap_fields_enc {
--
2.28.0

2020-10-13 13:55:05

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 09/18] media: hantro: default regmap to relaxed MMIO

This is done to match the pre-regmap membarrier behaviour, ensuring
default regmap_write calls in _relaxed() are indeed relaxed while
the non-relaxed versions include an explicit mem-barrier call.

Signed-off-by: Adrian Ratiu <[email protected]>
---
drivers/staging/media/hantro/hantro.h | 4 ++++
drivers/staging/media/hantro/hantro_regmap.c | 1 +
2 files changed, 5 insertions(+)

diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
index c5425cd5ac84..5b7fbdc3779d 100644
--- a/drivers/staging/media/hantro/hantro.h
+++ b/drivers/staging/media/hantro/hantro.h
@@ -346,6 +346,7 @@ static inline void vepu_write_relaxed(struct hantro_dev *vpu,
static inline void vepu_write(struct hantro_dev *vpu, u32 val, u32 reg)
{
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
+ wmb(); /* flush encoder previous relaxed writes */
regmap_write(vpu->regs_enc, reg, val);
}

@@ -354,6 +355,7 @@ static inline u32 vepu_read(struct hantro_dev *vpu, u32 reg)
u32 val;

regmap_read(vpu->regs_enc, reg, &val);
+ rmb(); /* read encoder swreg data in order */
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);

return val;
@@ -369,6 +371,7 @@ static inline void vdpu_write_relaxed(struct hantro_dev *vpu,
static inline void vdpu_write(struct hantro_dev *vpu, u32 val, u32 reg)
{
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
+ wmb();/* flush decoder previous relaxed writes */
regmap_write(vpu->regs_dec, reg, val);
}

@@ -377,6 +380,7 @@ static inline u32 vdpu_read(struct hantro_dev *vpu, u32 reg)
u32 val;

regmap_read(vpu->regs_dec, reg, &val);
+ rmb(); /* read decoder swreg data in order */
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);

return val;
diff --git a/drivers/staging/media/hantro/hantro_regmap.c b/drivers/staging/media/hantro/hantro_regmap.c
index 890e443688e2..2fc409cbd797 100644
--- a/drivers/staging/media/hantro/hantro_regmap.c
+++ b/drivers/staging/media/hantro/hantro_regmap.c
@@ -21,6 +21,7 @@ struct regmap_config hantro_regmap_dec = {
.reg_stride = 4,
/* all hantro accesses are sequential, even with respect to irq ctx */
.disable_locking = true,
+ .use_relaxed_mmio = true,
.name = "hantro_regmap_dec",
};

--
2.28.0

2020-10-13 13:55:08

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 08/18] media: hantro: add initial MMIO regmap infrastructure

This creates regmaps on top of the memory mapped regions for encoders
and decoders and converts the helpers in hantro.h to do their R/W via
these regmaps.

In itself this indirection layer is quite useless, but the key is the
field API also initialized using the regmaps which is currently empty.

Further changes can define any necessary regmap field APIs for various
HW revisions like G1, G2 or configure the fields for different HW reg
layouts to support newer HW revisions like VC8000D.

No regmap is defined for the ctrl registers of imx8m because their
usage is very simple and there is no known register layout divergence.

Signed-off-by: Adrian Ratiu <[email protected]>
Signed-off-by: Ezequiel Garcia <[email protected]>
---
drivers/staging/media/hantro/Makefile | 1 +
drivers/staging/media/hantro/hantro.h | 35 +++--
drivers/staging/media/hantro/hantro_drv.c | 15 +-
drivers/staging/media/hantro/hantro_regmap.c | 144 +++++++++++++++++++
drivers/staging/media/hantro/hantro_regmap.h | 23 +++
5 files changed, 206 insertions(+), 12 deletions(-)
create mode 100644 drivers/staging/media/hantro/hantro_regmap.c
create mode 100644 drivers/staging/media/hantro/hantro_regmap.h

diff --git a/drivers/staging/media/hantro/Makefile b/drivers/staging/media/hantro/Makefile
index 743ce08eb184..52bc0ee73569 100644
--- a/drivers/staging/media/hantro/Makefile
+++ b/drivers/staging/media/hantro/Makefile
@@ -9,6 +9,7 @@ hantro-vpu-y += \
hantro_h1_jpeg_enc.o \
hantro_g1_h264_dec.o \
hantro_g1_mpeg2_dec.o \
+ hantro_regmap.o \
hantro_g1_vp8_dec.o \
rk3399_vpu_hw_jpeg_enc.o \
rk3399_vpu_hw_mpeg2_dec.o \
diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
index 2dd4362d4080..c5425cd5ac84 100644
--- a/drivers/staging/media/hantro/hantro.h
+++ b/drivers/staging/media/hantro/hantro.h
@@ -16,6 +16,7 @@
#include <linux/videodev2.h>
#include <linux/wait.h>
#include <linux/clk.h>
+#include <linux/regmap.h>

#include <media/v4l2-ctrls.h>
#include <media/v4l2-device.h>
@@ -28,6 +29,8 @@

struct hantro_ctx;
struct hantro_codec_ops;
+struct hantro_regmap_dec_fields;
+struct hantro_regmap_enc_fields;

#define HANTRO_JPEG_ENCODER BIT(0)
#define HANTRO_ENCODERS 0x0000ffff
@@ -165,8 +168,12 @@ hantro_vdev_to_func(struct video_device *vdev)
* dev_ macros.
* @clocks: Array of clock handles.
* @reg_bases: Mapped addresses of VPU registers.
- * @enc_base: Mapped address of VPU encoder register for convenience.
- * @dec_base: Mapped address of VPU decoder register for convenience.
+ * @regs_enc: MMIO regmap of VPU encoder block for convenience.
+ * @regs_dec: MMIO regmap of VPU decoder block for convenience.
+ * @reg_fields_dec: Decoder regfields inside above regamp region.
+ * @reg_fields_enc: Encoder regfields inside above regamp region.
+ * @core_hw_dec_rev Runtime detected HW decoder core revision.
+ * @core_hw_enc_rev Runtime detected HW encoder core revision.
* @vpu_mutex: Mutex to synchronize V4L2 calls.
* @irqlock: Spinlock to synchronize access to data structures
* shared with interrupt handlers.
@@ -184,8 +191,12 @@ struct hantro_dev {
struct clk_bulk_data *clocks;
struct reset_control *reset;
void __iomem **reg_bases;
- void __iomem *enc_base;
- void __iomem *dec_base;
+ struct regmap *regs_dec;
+ struct regmap *regs_enc;
+ struct hantro_regmap_fields_dec *reg_fields_dec;
+ struct hantro_regmap_fields_enc *reg_fields_enc;
+ u32 core_hw_dec_rev;
+ u32 core_hw_enc_rev;

struct mutex vpu_mutex; /* video_device lock */
spinlock_t irqlock;
@@ -329,20 +340,22 @@ static inline void vepu_write_relaxed(struct hantro_dev *vpu,
u32 val, u32 reg)
{
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
- writel_relaxed(val, vpu->enc_base + reg);
+ regmap_write(vpu->regs_enc, reg, val);
}

static inline void vepu_write(struct hantro_dev *vpu, u32 val, u32 reg)
{
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
- writel(val, vpu->enc_base + reg);
+ regmap_write(vpu->regs_enc, reg, val);
}

static inline u32 vepu_read(struct hantro_dev *vpu, u32 reg)
{
- u32 val = readl(vpu->enc_base + reg);
+ u32 val;

+ regmap_read(vpu->regs_enc, reg, &val);
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
+
return val;
}

@@ -350,20 +363,22 @@ static inline void vdpu_write_relaxed(struct hantro_dev *vpu,
u32 val, u32 reg)
{
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
- writel_relaxed(val, vpu->dec_base + reg);
+ regmap_write(vpu->regs_dec, reg, val);
}

static inline void vdpu_write(struct hantro_dev *vpu, u32 val, u32 reg)
{
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
- writel(val, vpu->dec_base + reg);
+ regmap_write(vpu->regs_dec, reg, val);
}

static inline u32 vdpu_read(struct hantro_dev *vpu, u32 reg)
{
- u32 val = readl(vpu->dec_base + reg);
+ u32 val;

+ regmap_read(vpu->regs_dec, reg, &val);
vpu_debug(6, "0x%04x = 0x%08x\n", reg / 4, val);
+
return val;
}

diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index 3734efa80a7e..e225515d6985 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -28,6 +28,7 @@
#include "hantro_v4l2.h"
#include "hantro.h"
#include "hantro_hw.h"
+#include "hantro_regmap.h"

#define DRIVER_NAME "hantro-vpu"

@@ -782,8 +783,6 @@ static int hantro_probe(struct platform_device *pdev)
if (IS_ERR(vpu->reg_bases[i]))
return PTR_ERR(vpu->reg_bases[i]);
}
- vpu->enc_base = vpu->reg_bases[0] + vpu->variant->enc_offset;
- vpu->dec_base = vpu->reg_bases[0] + vpu->variant->dec_offset;

ret = dma_set_coherent_mask(vpu->dev, DMA_BIT_MASK(32));
if (ret) {
@@ -829,6 +828,18 @@ static int hantro_probe(struct platform_device *pdev)
pm_runtime_use_autosuspend(vpu->dev);
pm_runtime_enable(vpu->dev);

+ ret = hantro_regmap_init_dec(vpu);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to init decoder regmap\n");
+ goto err_clk_unprepare;
+ }
+
+ ret = hantro_regmap_init_enc(vpu);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to init encoder regmap\n");
+ goto err_clk_unprepare;
+ }
+
ret = v4l2_device_register(&pdev->dev, &vpu->v4l2_dev);
if (ret) {
dev_err(&pdev->dev, "Failed to register v4l2 device\n");
diff --git a/drivers/staging/media/hantro/hantro_regmap.c b/drivers/staging/media/hantro/hantro_regmap.c
new file mode 100644
index 000000000000..890e443688e2
--- /dev/null
+++ b/drivers/staging/media/hantro/hantro_regmap.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hantro VPU codec driver
+ *
+ * Copyright (C) 2020 Collabora, Ltd.
+ *
+ */
+
+#include "hantro.h"
+#include "hantro_regmap.h"
+
+/**
+ * struct regmap_config - hantro regmap configuration, for now just a single cfg
+ * is used for all registers and fields, in the future this can be granularised
+ * to have different configs for eg enc, dec, pp, each with its own checks and
+ * valiations (bounds, read/write enforcement and so on)
+ */
+struct regmap_config hantro_regmap_dec = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+ /* all hantro accesses are sequential, even with respect to irq ctx */
+ .disable_locking = true,
+ .name = "hantro_regmap_dec",
+};
+
+struct regmap_config hantro_regmap_enc = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+ .disable_locking = true,
+ .name = "hantro_regmap_enc",
+};
+
+struct hantro_field_enc {
+ /* TODO: populate encoder fields */
+};
+
+struct hantro_field_dec {
+ /* TODO: populate decoder fields */
+};
+
+#define INIT_FIELD_CFG(f, codec, conf) ({ \
+ vpu->reg_fields_##codec->(f) = devm_regmap_field_alloc(vpu->dev,\
+ vpu->regs_##codec, \
+ field->(conf)); \
+ if (IS_ERR(vpu->reg_fields_##codec->f)) { \
+ dev_warn(vpu->dev, "Couldn't create regmap field " #f "\n"); \
+ return PTR_ERR(vpu->reg_fields_##codec->f); \
+ }})
+
+#define INIT_DEC_FIELD(f) INIT_FIELD_CFG(f, dec, cfg_##f)
+#define INIT_ENC_FIELD(f) INIT_FIELD_CFG(f, enc, cfg_##f)
+
+static int hantro_regmap_fields_init_dec(struct hantro_dev *vpu,
+ const struct hantro_field_dec *field)
+{
+ vpu->reg_fields_dec = devm_kzalloc(vpu->dev, sizeof(*vpu->reg_fields_dec),
+ GFP_KERNEL);
+ if (!vpu->reg_fields_dec)
+ return -ENOMEM;
+
+ /* TODO: add decoder fields */
+
+ return 0;
+}
+
+static int hantro_regmap_fields_init_enc(struct hantro_dev *vpu,
+ const struct hantro_field_enc *field)
+{
+ vpu->reg_fields_enc = devm_kzalloc(vpu->dev, sizeof(*vpu->reg_fields_enc),
+ GFP_KERNEL);
+ if (!vpu->reg_fields_enc)
+ return -ENOMEM;
+
+ /* TODO: add encoder fields */
+
+ return 0;
+}
+
+int hantro_regmap_init_enc(struct hantro_dev *vpu)
+{
+ const struct hantro_field_enc *field = NULL;
+ void __iomem *enc_base = vpu->reg_bases[0] + vpu->variant->enc_offset;
+ int ret;
+
+ if (!vpu->variant->enc_fmts)
+ return 0;
+
+ ret = clk_bulk_enable(vpu->variant->num_clocks, vpu->clocks);
+ if (ret) {
+ dev_err(vpu->dev, "Failed to enable clocks\n");
+ return ret;
+ }
+
+ vpu->core_hw_enc_rev = (readl(enc_base) >> 16) & 0xffff;
+ vpu_debug(0, "Detected hantro encoder revision %x\n",
+ vpu->core_hw_enc_rev);
+
+ clk_bulk_disable(vpu->variant->num_clocks, vpu->clocks);
+
+ vpu->regs_enc = devm_regmap_init_mmio(vpu->dev, enc_base,
+ &hantro_regmap_enc);
+ if (IS_ERR(vpu->regs_enc)) {
+ ret = PTR_ERR(vpu->regs_enc);
+ dev_err(vpu->dev, "Failed to create encoder regmap: %d\n", ret);
+ return ret;
+ }
+
+ return hantro_regmap_fields_init_enc(vpu, field);
+}
+
+int hantro_regmap_init_dec(struct hantro_dev *vpu)
+{
+ const struct hantro_field_dec *field = NULL;
+ void __iomem *dec_base = vpu->reg_bases[0] + vpu->variant->dec_offset;
+ int ret;
+
+ if (!vpu->variant->dec_fmts)
+ return 0;
+
+ ret = clk_bulk_enable(vpu->variant->num_clocks, vpu->clocks);
+ if (ret) {
+ dev_err(vpu->dev, "Failed to enable clocks\n");
+ return ret;
+ }
+
+ vpu->core_hw_dec_rev = (readl(dec_base) >> 16) & 0xffff;
+
+ vpu_debug(0, "Detected hantro decoder revision %x\n",
+ vpu->core_hw_dec_rev);
+
+ clk_bulk_disable(vpu->variant->num_clocks, vpu->clocks);
+
+ vpu->regs_dec = devm_regmap_init_mmio(vpu->dev, dec_base,
+ &hantro_regmap_dec);
+ if (IS_ERR(vpu->regs_dec)) {
+ ret = PTR_ERR(vpu->regs_dec);
+ dev_err(vpu->dev, "Failed to create decoder regmap: %d\n", ret);
+ return ret;
+ }
+
+ return hantro_regmap_fields_init_dec(vpu, field);
+}
diff --git a/drivers/staging/media/hantro/hantro_regmap.h b/drivers/staging/media/hantro/hantro_regmap.h
new file mode 100644
index 000000000000..52668a8bafb9
--- /dev/null
+++ b/drivers/staging/media/hantro/hantro_regmap.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Hantro VPU codec driver
+ *
+ * Copyright (C) 2020 Collabora, Ltd.
+ *
+ */
+
+#ifndef HANTRO_REGMAP_H_
+#define HANTRO_REGMAP_H_
+
+struct hantro_regmap_fields_dec {
+ /* TODO: populate decoder fields */
+};
+
+struct hantro_regmap_fields_enc {
+ /* TODO: populate encoder fields */
+};
+
+int hantro_regmap_init_dec(struct hantro_dev *vpu);
+int hantro_regmap_init_enc(struct hantro_dev *vpu);
+
+#endif /* HANTRO_REGMAP_H_ */
--
2.28.0

2020-10-13 13:55:12

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 17/18] media: hantro: add dump registers debug option before decode start

It is very useful to know the status of all the decoder configuration
registers right before starting a decode operation, so add an option
to print them if register debugging is enabled (debug bit 7 is set).

Signed-off-by: Adrian Ratiu <[email protected]>
---
drivers/staging/media/hantro/hantro.h | 1 +
drivers/staging/media/hantro/hantro_h264_dec.c | 9 ++++++++-
2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
index 70aeb11b1149..1b0c441ff15a 100644
--- a/drivers/staging/media/hantro/hantro.h
+++ b/drivers/staging/media/hantro/hantro.h
@@ -304,6 +304,7 @@ struct hantro_reg {
* bit 4 - detail fmt, ctrl, buffer q/dq information
* bit 5 - detail function enter/leave trace information
* bit 6 - register write/read information
+ * bit 7 - dump
*/
extern int hantro_debug;

diff --git a/drivers/staging/media/hantro/hantro_h264_dec.c b/drivers/staging/media/hantro/hantro_h264_dec.c
index e64b59c84111..2c53394cbb0c 100644
--- a/drivers/staging/media/hantro/hantro_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_h264_dec.c
@@ -381,7 +381,9 @@ void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
struct hantro_dev *vpu = ctx->dev;
struct hantro_regmap_fields_dec *fields = vpu->reg_fields_dec;
bool do_high10 = (vpu->h264_hw_mode == HANTRO_H264_HIGH10);
- int reg;
+ u32 max_reg = hantro_regmap_dec.max_register;
+ u32 reg_stride = hantro_regmap_dec.reg_stride;
+ int reg, i;

/* Prepare the H264 decoder context. */
if (hantro_h264_dec_prepare_run(ctx))
@@ -421,6 +423,11 @@ void hantro_g1_h264_dec_run(struct hantro_ctx *ctx)
regmap_field_write(fields->dec_max_burst, 16);
regmap_field_write(fields->dec_axi_rd_id, 16);

+ vpu_debug(7, "Reg dump at decoding start\n");
+ for (i = 0; hantro_debug & BIT(7) && i <= max_reg; i += reg_stride)
+ vpu_debug(7, "swreg %03d: %08x\n", i / 4, vdpu_read(vpu, i));
+ vpu_debug(7, "Reg dump end\n");
+
/* Start decoding! */
vdpu_write(vpu, G1_REG_INTERRUPT_DEC_E, G1_REG_INTERRUPT);
}
--
2.28.0

2020-10-13 13:56:52

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 06/18] media: hantro: imx8mq: simplify ctrlblk reset logic

The G1 and G2 cores on imx8mq share a common "control block" used
to reset and enable the core clocks as well as enable functioning
via ctrl FUSE registers (these are not the FUSEs on the VPU cores,
they are just used to enable/disable the cores and allow the real
VPU FUSE regs to become available).

The problem is that, while the cores can be operated independently
from one another (different config reg mem regions, separate IRQs),
they can not be reset or powered down independently as the current
code implies. This has been a source for many bugs and frustration
when trying to enable G2 which this driver does not support yet.

So we simplify the ctrlblk reset logic to always reset both cores,
exactly like the vendor linux-imx provided driver "hantrodec" does
for this SoC.

Going forward, this simplified code should be moved in the future to
its own reset controller driver as the reset framework also supports
shared reset resources so the runtime PM logic can disable both cores
when none of them are in use (this is not done yet because only G1
is supported in the driver so there is no need to account for G2).

Signed-off-by: Adrian Ratiu <[email protected]>
---
drivers/staging/media/hantro/hantro.h | 2 -
drivers/staging/media/hantro/imx8m_vpu_hw.c | 74 +++++++--------------
2 files changed, 24 insertions(+), 52 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro.h b/drivers/staging/media/hantro/hantro.h
index bb442eb1974e..2dd4362d4080 100644
--- a/drivers/staging/media/hantro/hantro.h
+++ b/drivers/staging/media/hantro/hantro.h
@@ -167,7 +167,6 @@ hantro_vdev_to_func(struct video_device *vdev)
* @reg_bases: Mapped addresses of VPU registers.
* @enc_base: Mapped address of VPU encoder register for convenience.
* @dec_base: Mapped address of VPU decoder register for convenience.
- * @ctrl_base: Mapped address of VPU control block.
* @vpu_mutex: Mutex to synchronize V4L2 calls.
* @irqlock: Spinlock to synchronize access to data structures
* shared with interrupt handlers.
@@ -187,7 +186,6 @@ struct hantro_dev {
void __iomem **reg_bases;
void __iomem *enc_base;
void __iomem *dec_base;
- void __iomem *ctrl_base;

struct mutex vpu_mutex; /* video_device lock */
spinlock_t irqlock;
diff --git a/drivers/staging/media/hantro/imx8m_vpu_hw.c b/drivers/staging/media/hantro/imx8m_vpu_hw.c
index c222de075ef4..b2a401a33992 100644
--- a/drivers/staging/media/hantro/imx8m_vpu_hw.c
+++ b/drivers/staging/media/hantro/imx8m_vpu_hw.c
@@ -24,34 +24,13 @@
#define CTRL_G1_PP_FUSE 0x0c
#define CTRL_G2_DEC_FUSE 0x10

-static void imx8m_soft_reset(struct hantro_dev *vpu, u32 reset_bits)
-{
- u32 val;
-
- /* Assert */
- val = readl(vpu->ctrl_base + CTRL_SOFT_RESET);
- val &= ~reset_bits;
- writel(val, vpu->ctrl_base + CTRL_SOFT_RESET);
-
- udelay(2);
-
- /* Release */
- val = readl(vpu->ctrl_base + CTRL_SOFT_RESET);
- val |= reset_bits;
- writel(val, vpu->ctrl_base + CTRL_SOFT_RESET);
-}
-
-static void imx8m_clk_enable(struct hantro_dev *vpu, u32 clock_bits)
-{
- u32 val;
-
- val = readl(vpu->ctrl_base + CTRL_CLOCK_ENABLE);
- val |= clock_bits;
- writel(val, vpu->ctrl_base + CTRL_CLOCK_ENABLE);
-}
-
-static int imx8mq_runtime_resume(struct hantro_dev *vpu)
+/*
+ * Due to a HW limitation, both G1 and G2 VPU cores on imx8mq need to be reset
+ * together via their unified ctrl block.
+ */
+static int imx8mq_ctrlblk_reset(struct hantro_dev *vpu)
{
+ void __iomem *ctrl_base = vpu->reg_bases[vpu->variant->num_regs - 1];
int ret;

ret = clk_bulk_prepare_enable(vpu->variant->num_clocks, vpu->clocks);
@@ -60,13 +39,18 @@ static int imx8mq_runtime_resume(struct hantro_dev *vpu)
return ret;
}

- imx8m_soft_reset(vpu, RESET_G1 | RESET_G2);
- imx8m_clk_enable(vpu, CLOCK_G1 | CLOCK_G2);
+ /* reset HW and ungate clocks via ctrl block */
+ writel(RESET_G1 | RESET_G2, ctrl_base + CTRL_SOFT_RESET);
+ writel(CLOCK_G1 | CLOCK_G2, ctrl_base + CTRL_CLOCK_ENABLE);

- /* Set values of the fuse registers */
- writel(0xffffffff, vpu->ctrl_base + CTRL_G1_DEC_FUSE);
- writel(0xffffffff, vpu->ctrl_base + CTRL_G1_PP_FUSE);
- writel(0xffffffff, vpu->ctrl_base + CTRL_G2_DEC_FUSE);
+ /*
+ * enable fuse functionalities for each core, these are not real fuses
+ * but registers which enable the cores and makes accesible their real
+ * read-only fuse registers describing supported features.
+ */
+ writel(0xffffffff, ctrl_base + CTRL_G1_DEC_FUSE);
+ writel(0xffffffff, ctrl_base + CTRL_G1_PP_FUSE);
+ writel(0xffffffff, ctrl_base + CTRL_G2_DEC_FUSE);

clk_bulk_disable_unprepare(vpu->variant->num_clocks, vpu->clocks);

@@ -148,19 +132,9 @@ static irqreturn_t imx8m_vpu_g1_irq(int irq, void *dev_id)
return IRQ_HANDLED;
}

-static int imx8mq_vpu_hw_init(struct hantro_dev *vpu)
+static void imx8m_vpu_reset(struct hantro_ctx *ctx)
{
- vpu->dec_base = vpu->reg_bases[0];
- vpu->ctrl_base = vpu->reg_bases[vpu->variant->num_regs - 1];
-
- return 0;
-}
-
-static void imx8m_vpu_g1_reset(struct hantro_ctx *ctx)
-{
- struct hantro_dev *vpu = ctx->dev;
-
- imx8m_soft_reset(vpu, RESET_G1);
+ imx8mq_ctrlblk_reset(ctx->dev);
}

/*
@@ -170,19 +144,19 @@ static void imx8m_vpu_g1_reset(struct hantro_ctx *ctx)
static const struct hantro_codec_ops imx8mq_vpu_codec_ops[] = {
[HANTRO_MODE_MPEG2_DEC] = {
.run = hantro_g1_mpeg2_dec_run,
- .reset = imx8m_vpu_g1_reset,
+ .reset = imx8m_vpu_reset,
.init = hantro_mpeg2_dec_init,
.exit = hantro_mpeg2_dec_exit,
},
[HANTRO_MODE_VP8_DEC] = {
.run = hantro_g1_vp8_dec_run,
- .reset = imx8m_vpu_g1_reset,
+ .reset = imx8m_vpu_reset,
.init = hantro_vp8_dec_init,
.exit = hantro_vp8_dec_exit,
},
[HANTRO_MODE_H264_DEC] = {
.run = hantro_g1_h264_dec_run,
- .reset = imx8m_vpu_g1_reset,
+ .reset = imx8m_vpu_reset,
.init = hantro_h264_dec_init,
.exit = hantro_h264_dec_exit,
},
@@ -209,8 +183,8 @@ const struct hantro_variant imx8mq_vpu_variant = {
.codec = HANTRO_MPEG2_DECODER | HANTRO_VP8_DECODER |
HANTRO_H264_DECODER,
.codec_ops = imx8mq_vpu_codec_ops,
- .init = imx8mq_vpu_hw_init,
- .runtime_resume = imx8mq_runtime_resume,
+ .init = imx8mq_ctrlblk_reset,
+ .runtime_resume = imx8mq_ctrlblk_reset,
.irqs = imx8mq_irqs,
.num_irqs = ARRAY_SIZE(imx8mq_irqs),
.clk_names = imx8mq_clk_names,
--
2.28.0

2020-10-13 14:07:00

by Adrian Ratiu

[permalink] [raw]
Subject: [PATCH 05/18] media: hantro: prepare clocks before variant inits are run

The fundamental idea is: clocks are prepared in the driver probe() then
each use-case will enable/disable them as needed.

Some variants like imx8mq need to have the clocks enabled during the
HW init phase, so they will benefit from having the clocks prepared
before the variant init callback to avoid duing a full prepare_enable/
unprepare_disable, so move the clk prepare a bit earlier.

Signed-off-by: Adrian Ratiu <[email protected]>
---
drivers/staging/media/hantro/hantro_drv.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index c2ea54552ce9..3734efa80a7e 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -813,22 +813,22 @@ static int hantro_probe(struct platform_device *pdev)
}
}

+ ret = clk_bulk_prepare(vpu->variant->num_clocks, vpu->clocks);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to prepare clocks\n");
+ return ret;
+ }
+
ret = vpu->variant->init(vpu);
if (ret) {
dev_err(&pdev->dev, "Failed to init VPU hardware\n");
- return ret;
+ goto err_clk_unprepare;
}

pm_runtime_set_autosuspend_delay(vpu->dev, 100);
pm_runtime_use_autosuspend(vpu->dev);
pm_runtime_enable(vpu->dev);

- ret = clk_bulk_prepare(vpu->variant->num_clocks, vpu->clocks);
- if (ret) {
- dev_err(&pdev->dev, "Failed to prepare clocks\n");
- return ret;
- }
-
ret = v4l2_device_register(&pdev->dev, &vpu->v4l2_dev);
if (ret) {
dev_err(&pdev->dev, "Failed to register v4l2 device\n");
--
2.28.0

2020-10-29 13:09:28

by Ezequiel Garcia

[permalink] [raw]
Subject: Re: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support

Hello Adrian,

On Mon, 2020-10-12 at 23:59 +0300, Adrian Ratiu wrote:
> Dear all,
>
> This series introduces a regmap infrastructure for the Hantro driver
> which is used to compensate for different HW-revision register layouts.
> To justify it h264 decoding capability is added for newer VC8000 chips.
>
> This is a gradual conversion to the new infra - a complete conversion
> would have been very big and I do not have all the HW yet to test (I'm
> expecting a RK3399 shipment next week though ;). I think converting the
> h264 decoder provides a nice blueprint for how the other codecs can be
> converted and enabled for different HW revisions.
>
> The end goal of this is to make the driver more generic and eliminate
> entirely custom boilerplate like `struct hantro_reg` or headers with
> core-specific bit manipulations like `hantro_g1_regs.h` and instead rely
> on the well-tested albeit more verbose regmap subsytem.
>
> To give just two examples of bugs which are easily discovered by using
> more verbose regmap fields (very easy to compare with the datasheets)
> instead of relying on bit-magic tricks: G1_REG_DEC_CTRL3_INIT_QP(x) was
> off-by-1 and the wrong .clk_gate bit was set in hantro_postproc.c.
>
> Anyway, this series also extends the MMIO regmap API to allow relaxed
> writes for the theoretical reason that avoiding unnecessary membarriers
> leads to less CPU usage and small improvements to battery life. However,
> in practice I could not measure differences between relaxed/non-relaxed
> IO, so I'm on the fence whether to keep or remove the relaxed calls.
>
> What I could masure is the performance impact of adding more sub-reg
> field acesses: a constant ~ 20 microsecond bump per G1 h264 frame. This
> is acceptable considering the total time to decode a frame takes three
> orders of magnitude longer, i.e. miliseconds ranges, depending on the
> frame size and bitstream params, so it is an acceptable trade-off to
> have a more generic driver.
>

Before going forward with using regmap, I would like to have a sense
of the footprint it adds, and see if we can avoid that 20 us penalty.

I'd also like to try another approach, something that has less
memory footprint and less runtime penalty.

How about something like this:

#define G1_PIC_WIDTH 4, 0xff8, 23
#define ...

struct hantro_swreg {
u32 value[399 /*whatever size goes here*/];
};

void hantro_reg_write(struct hantro_swreg *r,
unsigned int swreg, u32 mask, u32 offset, u32 new_val)
{
r->value[swreg] = (r->value[swreg] & ~(mask)) |
((new_val << offset) & mask);
}

Which you can then use in a very similar way as the current proposal:

hantro_reg_write(&swreg, G1_PIC_WIDTH, width);

The first advantage here is that we no longer have any
footprint for the fields.

The ugly macros for "4, 0xff8, 23" can be auto-generated from
existing vendor headers, when possible, so that shouldn't
bother us.

The register set is "flushed" using _relaxed, but it
could be still costly.

If that is indeed costly, perhaps we can avoid writing
the entire set by having a dirty bit somewhere.

In any case, it's worth exploring our options first, I think.

PS: Another option is to just fork RK3399 to its
own driver and call the day, given how different it is :)

Thanks!
Ezequiel


2020-10-29 14:18:08

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support

On 2020-10-29 13:07, Ezequiel Garcia wrote:
> Hello Adrian,
>
> On Mon, 2020-10-12 at 23:59 +0300, Adrian Ratiu wrote:
>> Dear all,
>>
>> This series introduces a regmap infrastructure for the Hantro driver
>> which is used to compensate for different HW-revision register layouts.
>> To justify it h264 decoding capability is added for newer VC8000 chips.
>>
>> This is a gradual conversion to the new infra - a complete conversion
>> would have been very big and I do not have all the HW yet to test (I'm
>> expecting a RK3399 shipment next week though ;). I think converting the
>> h264 decoder provides a nice blueprint for how the other codecs can be
>> converted and enabled for different HW revisions.
>>
>> The end goal of this is to make the driver more generic and eliminate
>> entirely custom boilerplate like `struct hantro_reg` or headers with
>> core-specific bit manipulations like `hantro_g1_regs.h` and instead rely
>> on the well-tested albeit more verbose regmap subsytem.
>>
>> To give just two examples of bugs which are easily discovered by using
>> more verbose regmap fields (very easy to compare with the datasheets)
>> instead of relying on bit-magic tricks: G1_REG_DEC_CTRL3_INIT_QP(x) was
>> off-by-1 and the wrong .clk_gate bit was set in hantro_postproc.c.
>>
>> Anyway, this series also extends the MMIO regmap API to allow relaxed
>> writes for the theoretical reason that avoiding unnecessary membarriers
>> leads to less CPU usage and small improvements to battery life. However,
>> in practice I could not measure differences between relaxed/non-relaxed
>> IO, so I'm on the fence whether to keep or remove the relaxed calls.
>>
>> What I could masure is the performance impact of adding more sub-reg
>> field acesses: a constant ~ 20 microsecond bump per G1 h264 frame. This
>> is acceptable considering the total time to decode a frame takes three
>> orders of magnitude longer, i.e. miliseconds ranges, depending on the
>> frame size and bitstream params, so it is an acceptable trade-off to
>> have a more generic driver.
>>
>
> Before going forward with using regmap, I would like to have a sense
> of the footprint it adds, and see if we can avoid that 20 us penalty.
>
> I'd also like to try another approach, something that has less
> memory footprint and less runtime penalty.
>
> How about something like this:
>
> #define G1_PIC_WIDTH 4, 0xff8, 23
> #define ...
>
> struct hantro_swreg {
> u32 value[399 /*whatever size goes here*/];
> };
>
> void hantro_reg_write(struct hantro_swreg *r,
> unsigned int swreg, u32 mask, u32 offset, u32 new_val)
> {
> r->value[swreg] = (r->value[swreg] & ~(mask)) |
> ((new_val << offset) & mask);
> }
>
> Which you can then use in a very similar way as the current proposal:
>
> hantro_reg_write(&swreg, G1_PIC_WIDTH, width);
>
> The first advantage here is that we no longer have any
> footprint for the fields.
>
> The ugly macros for "4, 0xff8, 23" can be auto-generated from
> existing vendor headers, when possible, so that shouldn't
> bother us.
>
> The register set is "flushed" using _relaxed, but it
> could be still costly.
>
> If that is indeed costly, perhaps we can avoid writing
> the entire set by having a dirty bit somewhere.
>
> In any case, it's worth exploring our options first, I think.

Or maybe the regmap API itself deserves extending with a "deferred"
operating mode where updates to the cached state can be separated from
committing that state to the underlying hardware.

...which, after a brief code search out of curiosity, apparently already
exists in the form of regcache_cache_only()/regcache_sync(), so there's
probably no need to reinvent it :)

Robin.

>
> PS: Another option is to just fork RK3399 to its
> own driver and call the day, given how different it is :)
>
> Thanks!
> Ezequiel
>
>
>
> _______________________________________________
> Linux-rockchip mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-rockchip
>

2020-10-29 14:50:21

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support

On Thu, Oct 29, 2020 at 02:15:10PM +0000, Robin Murphy wrote:

> Or maybe the regmap API itself deserves extending with a "deferred"
> operating mode where updates to the cached state can be separated from
> committing that state to the underlying hardware.

> ...which, after a brief code search out of curiosity, apparently already
> exists in the form of regcache_cache_only()/regcache_sync(), so there's
> probably no need to reinvent it :)

Yes, exactly. One of the big use cases for regmap on MMIO devices is
being able to access the register map without the hardware being there,
this would be another application of the cache stuff.


Attachments:
(No filename) (655.00 B)
signature.asc (499.00 B)
Download all attachments

2020-10-29 16:29:19

by Ezequiel Garcia

[permalink] [raw]
Subject: Re: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support

Hi Robin,

On Thu, 2020-10-29 at 14:15 +0000, Robin Murphy wrote:
> On 2020-10-29 13:07, Ezequiel Garcia wrote:
> > Hello Adrian,
> >
> > On Mon, 2020-10-12 at 23:59 +0300, Adrian Ratiu wrote:
> > > Dear all,
> > >
> > > This series introduces a regmap infrastructure for the Hantro driver
> > > which is used to compensate for different HW-revision register layouts.
> > > To justify it h264 decoding capability is added for newer VC8000 chips.
> > >
> > > This is a gradual conversion to the new infra - a complete conversion
> > > would have been very big and I do not have all the HW yet to test (I'm
> > > expecting a RK3399 shipment next week though ;). I think converting the
> > > h264 decoder provides a nice blueprint for how the other codecs can be
> > > converted and enabled for different HW revisions.
> > >
> > > The end goal of this is to make the driver more generic and eliminate
> > > entirely custom boilerplate like `struct hantro_reg` or headers with
> > > core-specific bit manipulations like `hantro_g1_regs.h` and instead rely
> > > on the well-tested albeit more verbose regmap subsytem.
> > >
> > > To give just two examples of bugs which are easily discovered by using
> > > more verbose regmap fields (very easy to compare with the datasheets)
> > > instead of relying on bit-magic tricks: G1_REG_DEC_CTRL3_INIT_QP(x) was
> > > off-by-1 and the wrong .clk_gate bit was set in hantro_postproc.c.
> > >
> > > Anyway, this series also extends the MMIO regmap API to allow relaxed
> > > writes for the theoretical reason that avoiding unnecessary membarriers
> > > leads to less CPU usage and small improvements to battery life. However,
> > > in practice I could not measure differences between relaxed/non-relaxed
> > > IO, so I'm on the fence whether to keep or remove the relaxed calls.
> > >
> > > What I could masure is the performance impact of adding more sub-reg
> > > field acesses: a constant ~ 20 microsecond bump per G1 h264 frame. This
> > > is acceptable considering the total time to decode a frame takes three
> > > orders of magnitude longer, i.e. miliseconds ranges, depending on the
> > > frame size and bitstream params, so it is an acceptable trade-off to
> > > have a more generic driver.
> > >
> >
> > Before going forward with using regmap, I would like to have a sense
> > of the footprint it adds, and see if we can avoid that 20 us penalty.
> >
> > I'd also like to try another approach, something that has less
> > memory footprint and less runtime penalty.
> >
> > How about something like this:
> >
> > #define G1_PIC_WIDTH 4, 0xff8, 23
> > #define ...
> >
> > struct hantro_swreg {
> > u32 value[399 /*whatever size goes here*/];
> > };
> >
> > void hantro_reg_write(struct hantro_swreg *r,
> > unsigned int swreg, u32 mask, u32 offset, u32 new_val)
> > {
> > r->value[swreg] = (r->value[swreg] & ~(mask)) |
> > ((new_val << offset) & mask);
> > }
> >
> > Which you can then use in a very similar way as the current proposal:
> >
> > hantro_reg_write(&swreg, G1_PIC_WIDTH, width);
> >
> > The first advantage here is that we no longer have any
> > footprint for the fields.
> >
> > The ugly macros for "4, 0xff8, 23" can be auto-generated from
> > existing vendor headers, when possible, so that shouldn't
> > bother us.
> >
> > The register set is "flushed" using _relaxed, but it
> > could be still costly.
> >
> > If that is indeed costly, perhaps we can avoid writing
> > the entire set by having a dirty bit somewhere.
> >
> > In any case, it's worth exploring our options first, I think.
>
> Or maybe the regmap API itself deserves extending with a "deferred"
> operating mode where updates to the cached state can be separated from
> committing that state to the underlying hardware.
>
> ...which, after a brief code search out of curiosity, apparently already
> exists in the form of regcache_cache_only()/regcache_sync(), so there's
> probably no need to reinvent it :)
>

To be fair, and despite it could seem an anti-pattern, this particular
wheel is so tiny and trivial, that I'm starting to seriously consider
reinventing it.

I've been thinking long about this but just can't seem to see exactly
what benefit we're getting from using MMIO regmaps here,
as opposed to just a simple macro with an index, a mask, and an offset.

Ezequiel


2020-10-29 18:02:21

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support

On Thu, Oct 29, 2020 at 01:27:08PM -0300, Ezequiel Garcia wrote:
> On Thu, 2020-10-29 at 14:15 +0000, Robin Murphy wrote:

> > Or maybe the regmap API itself deserves extending with a "deferred"
> > operating mode where updates to the cached state can be separated from
> > committing that state to the underlying hardware.

> > ...which, after a brief code search out of curiosity, apparently already
> > exists in the form of regcache_cache_only()/regcache_sync(), so there's
> > probably no need to reinvent it :)

> To be fair, and despite it could seem an anti-pattern, this particular
> wheel is so tiny and trivial, that I'm starting to seriously consider
> reinventing it.

> I've been thinking long about this but just can't seem to see exactly
> what benefit we're getting from using MMIO regmaps here,
> as opposed to just a simple macro with an index, a mask, and an offset.

As a rule of thumb if you're not using a cache or fitting into some
other higher level framework stuff that uses regmap then I wouldn't
bother for MMIO devices.


Attachments:
(No filename) (1.05 kB)
signature.asc (499.00 B)
Download all attachments