This series adds Tegra210, Tegra186, and Tehra194 QSPI driver and
enables QSPI on Jetson Nano and Jetson Xavier NX.
QSPI controller is available on Tegra210, Tegra186 and Tegra194.
Tegra186 and Tegra194 has additional feature of combined sequence mode
where command, address and data can all be transferred in a single transfer.
Combined sequence mode is useful with DMA mode transfer.
This series does not have combined sequence mode feature as Tegra186/Tegra194
GPCDMA driver is not upstreamed yet.
This series includes
- dt-binding document
- QSPI driver for Tegra210/Tegra186/Tegra194
- Enables QSPI on Jetson Nano and Jetson Xavier NX.
Sowjanya Komatineni (7):
MAINTAINERS: Add Tegra QSPI driver section
dt-bindings: spi: Add Tegra QSPI device tree binding
spi: qspi-tegra: Add support for Tegra210 QSPI controller
spi: qspi-tegra: Add Tegra186 and Tegra194 QSPI compatibles
arm64: tegra: Enable QSPI on Jetson Nano
arm64: tegra: Add QSPI nodes on Tegra194
arm64: tegra: Enable QSPI on Jetson Xavier NX
.../devicetree/bindings/spi/nvidia,tegra-qspi.yaml | 77 ++
MAINTAINERS | 8 +
.../dts/nvidia/tegra194-p3509-0000+p3668-0000.dts | 12 +
arch/arm64/boot/dts/nvidia/tegra194.dtsi | 26 +
arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts | 12 +
drivers/spi/Kconfig | 9 +
drivers/spi/Makefile | 1 +
drivers/spi/qspi-tegra.c | 1420 ++++++++++++++++++++
8 files changed, 1565 insertions(+)
create mode 100644 Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
create mode 100644 drivers/spi/qspi-tegra.c
--
2.7.4
This patch adds YAML based device tree binding document for Tegra
QSPI driver.
Signed-off-by: Sowjanya Komatineni <[email protected]>
---
.../devicetree/bindings/spi/nvidia,tegra-qspi.yaml | 77 ++++++++++++++++++++++
1 file changed, 77 insertions(+)
create mode 100644 Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
new file mode 100644
index 0000000..038a085
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
@@ -0,0 +1,77 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/spi/nvidia,tegra-qspi.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Tegra Quad SPI Controller
+
+maintainers:
+ - Thierry Reding <[email protected]>
+ - Jonathan Hunter <[email protected]>
+
+properties:
+ compatible:
+ enum:
+ - nvidia,tegra210-qspi
+ - nvidia,tegra186-qspi
+ - nvidia,tegra194-qspi
+
+ reg:
+ items:
+ - description: QSPI registers
+
+ interrupts:
+ maxItems: 1
+
+ clock-names:
+ items:
+ - const: qspi
+
+ clocks:
+ maxItems: 1
+
+ reset-names:
+ minItems: 1
+ items:
+ - const: qspi
+
+ resets:
+ maxItems: 1
+
+ dmas:
+ maxItems: 2
+
+ dma-names:
+ items:
+ - const: rx
+ - const: tx
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clock-names
+ - clocks
+ - reset-names
+ - resets
+
+additionalProperties: true
+
+examples:
+ - |
+ #include <dt-bindings/clock/tegra210-car.h>
+ #include <dt-bindings/reset/tegra210-car.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ spi@70410000 {
+ compatible = "nvidia,tegra210-qspi";
+ reg = <0x70410000 0x1000>;
+ interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&tegra_car TEGRA210_CLK_QSPI>;
+ clock-names = "qspi";
+ resets = <&tegra_car 211>;
+ reset-names = "qspi";
+ dmas = <&apbdma 5>, <&apbdma 5>;
+ dma-names = "rx", "tx";
+ };
--
2.7.4
QSPI controller on Tegra186 and Tegra194 is similar to Tegra210
QSPI controller in terms of transferring command, address and data
in multiple transfers.
This patch adds Tegra186 and Tegra194 compatibles to QSPI driver.
Signed-off-by: Sowjanya Komatineni <[email protected]>
---
drivers/spi/qspi-tegra.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/spi/qspi-tegra.c b/drivers/spi/qspi-tegra.c
index 67a8b44..cd6a3c3 100644
--- a/drivers/spi/qspi-tegra.c
+++ b/drivers/spi/qspi-tegra.c
@@ -1181,6 +1181,8 @@ static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
static const struct of_device_id tegra_qspi_of_match[] = {
{ .compatible = "nvidia,tegra210-qspi", },
+ { .compatible = "nvidia,tegra186-qspi", },
+ { .compatible = "nvidia,tegra194-qspi", },
{}
};
MODULE_DEVICE_TABLE(of, tegra_qspi_of_match);
--
2.7.4
Add maintainers and mailing list entries to Tegra QSPI driver section.
Signed-off-by: Sowjanya Komatineni <[email protected]>
---
MAINTAINERS | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 1cec5a0..d0fe832 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17346,6 +17346,14 @@ M: Laxman Dewangan <[email protected]>
S: Supported
F: drivers/spi/spi-tegra*
+TEGRA QSPI DRIVER
+M: Thierry Reding <[email protected]>
+M: Jonathan Hunter <[email protected]>
+M: Sowjanya Komatineni <[email protected]>
+L: [email protected]
+S: Maintained
+F: drivers/spi/qspi-tegra.c
+
TEGRA VIDEO DRIVER
M: Thierry Reding <[email protected]>
M: Jonathan Hunter <[email protected]>
--
2.7.4
Jetson Nano has Macronix MX25U3235F Quad SPI Flash.
This patch enables QSPI on Jetson Nano.
Signed-off-by: Sowjanya Komatineni <[email protected]>
---
arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts b/arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts
index 6a877de..a1b4603 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts
+++ b/arch/arm64/boot/dts/nvidia/tegra210-p3450-0000.dts
@@ -638,6 +638,18 @@
};
};
+ spi@70410000 {
+ status = "okay";
+
+ flash@0 {
+ compatible = "spi-nor";
+ reg = <0>;
+ spi-max-frequency = <104000000>;
+ spi-tx-bus-width = <2>;
+ spi-rx-bus-width = <2>;
+ };
+ };
+
clk32k_in: clock@0 {
compatible = "fixed-clock";
clock-frequency = <32768>;
--
2.7.4
Tegra SoC has a Quad SPI controller starting from Tegra210.
This patch adds support for Tegra210 QSPI controller.
Signed-off-by: Sowjanya Komatineni <[email protected]>
---
drivers/spi/Kconfig | 9 +
drivers/spi/Makefile | 1 +
drivers/spi/qspi-tegra.c | 1418 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 1428 insertions(+)
create mode 100644 drivers/spi/qspi-tegra.c
diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
index 3fd16b7..1a021e8 100644
--- a/drivers/spi/Kconfig
+++ b/drivers/spi/Kconfig
@@ -844,6 +844,15 @@ config SPI_MXS
help
SPI driver for Freescale MXS devices.
+config QSPI_TEGRA
+ tristate "Nvidia Tegra QSPI Controller"
+ depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
+ depends on RESET_CONTROLLER
+ help
+ QSPI driver for Nvidia Tegra QSPI Controller interface. This
+ controller is different from the spi controller and is available
+ on Tegra SoCs starting from Tegra210.
+
config SPI_TEGRA114
tristate "NVIDIA Tegra114 SPI Controller"
depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile
index 6fea582..eb56f89 100644
--- a/drivers/spi/Makefile
+++ b/drivers/spi/Makefile
@@ -115,6 +115,7 @@ obj-$(CONFIG_SPI_ST_SSC4) += spi-st-ssc4.o
obj-$(CONFIG_SPI_SUN4I) += spi-sun4i.o
obj-$(CONFIG_SPI_SUN6I) += spi-sun6i.o
obj-$(CONFIG_SPI_SYNQUACER) += spi-synquacer.o
+obj-$(CONFIG_QSPI_TEGRA) += qspi-tegra.o
obj-$(CONFIG_SPI_TEGRA114) += spi-tegra114.o
obj-$(CONFIG_SPI_TEGRA20_SFLASH) += spi-tegra20-sflash.o
obj-$(CONFIG_SPI_TEGRA20_SLINK) += spi-tegra20-slink.o
diff --git a/drivers/spi/qspi-tegra.c b/drivers/spi/qspi-tegra.c
new file mode 100644
index 0000000..67a8b44
--- /dev/null
+++ b/drivers/spi/qspi-tegra.c
@@ -0,0 +1,1418 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 NVIDIA CORPORATION. All rights reserved.
+ */
+
+#include <linux/clk.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/reset.h>
+#include <linux/spi/spi.h>
+
+#define QSPI_COMMAND1 0x000
+#define QSPI_BIT_LENGTH(x) (((x) & 0x1f) << 0)
+#define QSPI_PACKED BIT(5)
+#define QSPI_INTERFACE_WIDTH_MASK (0x03 << 7)
+#define QSPI_INTERFACE_WIDTH(x) (((x) & 0x03) << 7)
+#define QSPI_INTERFACE_WIDTH_SINGLE QSPI_INTERFACE_WIDTH(0)
+#define QSPI_INTERFACE_WIDTH_DUAL QSPI_INTERFACE_WIDTH(1)
+#define QSPI_INTERFACE_WIDTH_QUAD QSPI_INTERFACE_WIDTH(2)
+#define QSPI_SDR_DDR_SEL BIT(9)
+#define QSPI_TX_EN BIT(11)
+#define QSPI_RX_EN BIT(12)
+#define QSPI_CS_SW_VAL BIT(20)
+#define QSPI_CS_SW_HW BIT(21)
+#define QSPI_CONTROL_MODE_0 (0 << 28)
+#define QSPI_CONTROL_MODE_3 (3 << 28)
+#define QSPI_CONTROL_MODE_MASK (3 << 28)
+#define QSPI_M_S BIT(30)
+#define QSPI_PIO BIT(31)
+
+#define QSPI_COMMAND2 0x004
+#define QSPI_TX_TAP_DELAY(x) (((x) & 0x3f) << 10)
+#define QSPI_RX_TAP_DELAY(x) (((x) & 0xff) << 0)
+
+#define QSPI_CS_TIMING1 0x008
+#define QSPI_SETUP_HOLD(setup, hold) (((setup) << 4) | (hold))
+
+#define QSPI_CS_TIMING2 0x00c
+#define CYCLES_BETWEEN_PACKETS_0(x) (((x) & 0x1f) << 0)
+#define CS_ACTIVE_BETWEEN_PACKETS_0 BIT(5)
+
+#define QSPI_TRANS_STATUS 0x010
+#define QSPI_BLK_CNT(val) (((val) >> 0) & 0xffff)
+#define QSPI_RDY BIT(30)
+
+#define QSPI_FIFO_STATUS 0x014
+#define QSPI_RX_FIFO_EMPTY BIT(0)
+#define QSPI_RX_FIFO_FULL BIT(1)
+#define QSPI_TX_FIFO_EMPTY BIT(2)
+#define QSPI_TX_FIFO_FULL BIT(3)
+#define QSPI_RX_FIFO_UNF BIT(4)
+#define QSPI_RX_FIFO_OVF BIT(5)
+#define QSPI_TX_FIFO_UNF BIT(6)
+#define QSPI_TX_FIFO_OVF BIT(7)
+#define QSPI_ERR BIT(8)
+#define QSPI_TX_FIFO_FLUSH BIT(14)
+#define QSPI_RX_FIFO_FLUSH BIT(15)
+#define QSPI_TX_FIFO_EMPTY_COUNT(val) (((val) >> 16) & 0x7f)
+#define QSPI_RX_FIFO_FULL_COUNT(val) (((val) >> 23) & 0x7f)
+
+#define QSPI_FIFO_ERROR (QSPI_RX_FIFO_UNF | \
+ QSPI_RX_FIFO_OVF | \
+ QSPI_TX_FIFO_UNF | \
+ QSPI_TX_FIFO_OVF)
+#define QSPI_FIFO_EMPTY (QSPI_RX_FIFO_EMPTY | \
+ QSPI_TX_FIFO_EMPTY)
+
+#define QSPI_TX_DATA 0x018
+#define QSPI_RX_DATA 0x01c
+
+#define QSPI_DMA_CTL 0x020
+#define QSPI_TX_TRIG(n) (((n) & 0x3) << 15)
+#define QSPI_TX_TRIG_1 QSPI_TX_TRIG(0)
+#define QSPI_TX_TRIG_4 QSPI_TX_TRIG(1)
+#define QSPI_TX_TRIG_8 QSPI_TX_TRIG(2)
+#define QSPI_TX_TRIG_16 QSPI_TX_TRIG(3)
+
+#define QSPI_RX_TRIG(n) (((n) & 0x3) << 19)
+#define QSPI_RX_TRIG_1 QSPI_RX_TRIG(0)
+#define QSPI_RX_TRIG_4 QSPI_RX_TRIG(1)
+#define QSPI_RX_TRIG_8 QSPI_RX_TRIG(2)
+#define QSPI_RX_TRIG_16 QSPI_RX_TRIG(3)
+
+#define QSPI_DMA_EN BIT(31)
+
+#define QSPI_DMA_BLK 0x024
+#define QSPI_DMA_BLK_SET(x) (((x) & 0xffff) << 0)
+
+#define QSPI_TX_FIFO 0x108
+#define QSPI_RX_FIFO 0x188
+
+#define QSPI_FIFO_DEPTH 64
+
+#define QSPI_INTR_MASK 0x18c
+#define QSPI_INTR_RX_FIFO_UNF_MASK BIT(25)
+#define QSPI_INTR_RX_FIFO_OVF_MASK BIT(26)
+#define QSPI_INTR_TX_FIFO_UNF_MASK BIT(27)
+#define QSPI_INTR_TX_FIFO_OVF_MASK BIT(28)
+#define QSPI_INTR_RDY_MASK BIT(29)
+#define QSPI_INTR_RX_TX_FIFO_ERR (QSPI_INTR_RX_FIFO_UNF_MASK | \
+ QSPI_INTR_RX_FIFO_OVF_MASK | \
+ QSPI_INTR_TX_FIFO_UNF_MASK | \
+ QSPI_INTR_TX_FIFO_OVF_MASK)
+
+#define QSPI_MISC_REG 0x194
+#define QSPI_NUM_DUMMY_CYCLE(x) (((x) & 0xff) << 0)
+
+#define DATA_DIR_TX BIT(0)
+#define DATA_DIR_RX BIT(1)
+
+#define QSPI_DMA_TIMEOUT (msecs_to_jiffies(1000))
+#define DEFAULT_QSPI_DMA_BUF_LEN (64 * 1024)
+#define QSPI_MAX_SPEED 102000000
+
+enum transfer_phase {
+ CMD_BYTE_XFER = 0,
+ ADDR_BYTES_XFER,
+ DUMMY_BYTES_XFER,
+ DATA_BYTES_XFER,
+ MAX_XFERS,
+};
+
+struct tegra_qspi_client_data {
+ int tx_clk_tap_delay;
+ int rx_clk_tap_delay;
+};
+
+struct tegra_qspi_data {
+ struct device *dev;
+ struct spi_master *master;
+ /* Lock to protect data accessed by irq */
+ spinlock_t lock;
+
+ struct clk *clk;
+ struct reset_control *rst;
+ void __iomem *base;
+ phys_addr_t phys;
+ unsigned int irq;
+
+ u32 cur_speed;
+ unsigned int cur_pos;
+ unsigned int words_per_32bit;
+ unsigned int bytes_per_word;
+ unsigned int curr_dma_words;
+ unsigned int cur_direction;
+
+ unsigned int cur_rx_pos;
+ unsigned int cur_tx_pos;
+
+ unsigned int dma_buf_size;
+ unsigned int max_buf_size;
+ bool is_curr_dma_xfer;
+
+ struct completion rx_dma_complete;
+ struct completion tx_dma_complete;
+
+ u32 tx_status;
+ u32 rx_status;
+ u32 status_reg;
+ bool is_packed;
+ bool use_dma;
+
+ u32 command1_reg;
+ u32 dma_control_reg;
+ u32 def_command1_reg;
+ u32 def_command2_reg;
+ u32 spi_cs_timing1;
+ u32 spi_cs_timing2;
+ u8 dummy_cycles;
+
+ struct completion xfer_completion;
+ struct spi_transfer *curr_xfer;
+
+ struct dma_chan *rx_dma_chan;
+ u32 *rx_dma_buf;
+ dma_addr_t rx_dma_phys;
+ struct dma_async_tx_descriptor *rx_dma_desc;
+
+ struct dma_chan *tx_dma_chan;
+ u32 *tx_dma_buf;
+ dma_addr_t tx_dma_phys;
+ struct dma_async_tx_descriptor *tx_dma_desc;
+};
+
+static int tegra_qspi_runtime_suspend(struct device *dev);
+static int tegra_qspi_runtime_resume(struct device *dev);
+
+static inline u32 tegra_qspi_readl(struct tegra_qspi_data *tqspi,
+ unsigned long reg)
+{
+ return readl(tqspi->base + reg);
+}
+
+static inline void tegra_qspi_writel(struct tegra_qspi_data *tqspi,
+ u32 val, unsigned long reg)
+{
+ writel(val, tqspi->base + reg);
+
+ /* Read back register to make sure that register writes completed */
+ if (reg != QSPI_TX_FIFO)
+ readl(tqspi->base + QSPI_COMMAND1);
+}
+
+static void tegra_qspi_mask_clear_irq(struct tegra_qspi_data *tqspi)
+{
+ u32 val;
+
+ /* Write 1 to clear status register */
+ val = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
+ tegra_qspi_writel(tqspi, val, QSPI_TRANS_STATUS);
+
+ val = tegra_qspi_readl(tqspi, QSPI_INTR_MASK);
+ if (!(val & QSPI_INTR_RDY_MASK)) {
+ val |= (QSPI_INTR_RDY_MASK | QSPI_INTR_RX_TX_FIFO_ERR);
+ tegra_qspi_writel(tqspi, val, QSPI_INTR_MASK);
+ }
+
+ /* Clear fifo status error if any */
+ val = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ if (val & QSPI_ERR)
+ tegra_qspi_writel(tqspi, QSPI_ERR | QSPI_FIFO_ERROR,
+ QSPI_FIFO_STATUS);
+}
+
+static unsigned int
+tegra_qspi_calculate_curr_xfer_param(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ unsigned int remain_len = t->len - tqspi->cur_pos;
+ unsigned int max_word;
+ unsigned int bits_per_word = t->bits_per_word;
+ unsigned int max_len;
+ unsigned int total_fifo_words;
+
+ tqspi->bytes_per_word = DIV_ROUND_UP(bits_per_word, 8);
+
+ if ((bits_per_word == 8 || bits_per_word == 16 ||
+ bits_per_word == 32) && t->len > 3) {
+ tqspi->is_packed = true;
+ tqspi->words_per_32bit = 32 / bits_per_word;
+ } else {
+ tqspi->is_packed = false;
+ tqspi->words_per_32bit = 1;
+ }
+
+ if (tqspi->is_packed) {
+ max_len = min(remain_len, tqspi->max_buf_size);
+ tqspi->curr_dma_words = max_len / tqspi->bytes_per_word;
+ total_fifo_words = (max_len + 3) / 4;
+ } else {
+ max_word = (remain_len - 1) / tqspi->bytes_per_word + 1;
+ max_word = min(max_word, tqspi->max_buf_size / 4);
+ tqspi->curr_dma_words = max_word;
+ total_fifo_words = max_word;
+ }
+
+ return total_fifo_words;
+}
+
+static unsigned int
+tegra_qspi_fill_tx_fifo_from_client_txbuf(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ unsigned int len;
+ unsigned int tx_empty_count;
+ u32 fifo_status, x;
+ unsigned int max_n_32bit;
+ unsigned int i, count;
+ unsigned int written_words;
+ unsigned int fifo_words_left;
+ u8 *tx_buf = (u8 *)t->tx_buf + tqspi->cur_tx_pos;
+
+ fifo_status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ tx_empty_count = QSPI_TX_FIFO_EMPTY_COUNT(fifo_status);
+
+ if (tqspi->is_packed) {
+ fifo_words_left = tx_empty_count * tqspi->words_per_32bit;
+ written_words = min(fifo_words_left, tqspi->curr_dma_words);
+ len = written_words * tqspi->bytes_per_word;
+ max_n_32bit = DIV_ROUND_UP(len, 4);
+ for (count = 0; count < max_n_32bit; count++) {
+ x = 0;
+
+ for (i = 0; (i < 4) && len; i++, len--)
+ x |= (u32)(*tx_buf++) << (i * 8);
+ tegra_qspi_writel(tqspi, x, QSPI_TX_FIFO);
+ }
+
+ tqspi->cur_tx_pos += written_words * tqspi->bytes_per_word;
+ } else {
+ unsigned int write_bytes;
+ u8 bytes_per_word = tqspi->bytes_per_word;
+
+ max_n_32bit = min(tqspi->curr_dma_words, tx_empty_count);
+ written_words = max_n_32bit;
+ len = written_words * tqspi->bytes_per_word;
+ if (len > t->len - tqspi->cur_pos)
+ len = t->len - tqspi->cur_pos;
+ write_bytes = len;
+ for (count = 0; count < max_n_32bit; count++) {
+ x = 0;
+
+ for (i = 0; len && (i < bytes_per_word); i++, len--)
+ x |= (u32)(*tx_buf++) << (i * 8);
+ tegra_qspi_writel(tqspi, x, QSPI_TX_FIFO);
+ }
+
+ tqspi->cur_tx_pos += write_bytes;
+ }
+
+ return written_words;
+}
+
+static unsigned int
+tegra_qspi_read_rx_fifo_to_client_rxbuf(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ unsigned int rx_full_count;
+ u32 fifo_status, x;
+ unsigned int i, count;
+ unsigned int read_words = 0;
+ unsigned int len;
+ u8 *rx_buf = (u8 *)t->rx_buf + tqspi->cur_rx_pos;
+
+ fifo_status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ rx_full_count = QSPI_RX_FIFO_FULL_COUNT(fifo_status);
+ if (tqspi->is_packed) {
+ len = tqspi->curr_dma_words * tqspi->bytes_per_word;
+ for (count = 0; count < rx_full_count; count++) {
+ x = tegra_qspi_readl(tqspi, QSPI_RX_FIFO);
+
+ for (i = 0; len && (i < 4); i++, len--)
+ *rx_buf++ = (x >> i * 8) & 0xff;
+ }
+
+ read_words += tqspi->curr_dma_words;
+ tqspi->cur_rx_pos += tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
+ } else {
+ u32 rx_mask = ((u32)1 << t->bits_per_word) - 1;
+ u8 bytes_per_word = tqspi->bytes_per_word;
+ unsigned int read_bytes;
+
+ len = rx_full_count * bytes_per_word;
+ if (len > t->len - tqspi->cur_pos)
+ len = t->len - tqspi->cur_pos;
+ read_bytes = len;
+ for (count = 0; count < rx_full_count; count++) {
+ x = tegra_qspi_readl(tqspi, QSPI_RX_FIFO) & rx_mask;
+
+ for (i = 0; len && (i < bytes_per_word); i++, len--)
+ *rx_buf++ = (x >> (i * 8)) & 0xff;
+ }
+
+ read_words += rx_full_count;
+ tqspi->cur_rx_pos += read_bytes;
+ }
+
+ return read_words;
+}
+
+static void
+tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ /* Make the dma buffer to read by cpu */
+ dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
+ tqspi->dma_buf_size, DMA_TO_DEVICE);
+
+ if (tqspi->is_packed) {
+ unsigned int len = tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
+
+ memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
+ tqspi->cur_tx_pos += tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
+ } else {
+ u8 *tx_buf = (u8 *)t->tx_buf + tqspi->cur_tx_pos;
+ unsigned int i;
+ unsigned int count;
+ unsigned int consume;
+ unsigned int write_bytes;
+
+ consume = tqspi->curr_dma_words * tqspi->bytes_per_word;
+ if (consume > t->len - tqspi->cur_pos)
+ consume = t->len - tqspi->cur_pos;
+ write_bytes = consume;
+ for (count = 0; count < tqspi->curr_dma_words; count++) {
+ u32 x = 0;
+
+ for (i = 0; consume && (i < tqspi->bytes_per_word);
+ i++, consume--)
+ x |= (u32)(*tx_buf++) << (i * 8);
+ tqspi->tx_dma_buf[count] = x;
+ }
+
+ tqspi->cur_tx_pos += write_bytes;
+ }
+
+ /* Make the dma buffer to read by dma */
+ dma_sync_single_for_device(tqspi->dev, tqspi->tx_dma_phys,
+ tqspi->dma_buf_size, DMA_TO_DEVICE);
+}
+
+static void
+tegra_qspi_copy_qspi_rxbuf_to_client_rxbuf(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ /* Make the dma buffer to read by cpu */
+ dma_sync_single_for_cpu(tqspi->dev, tqspi->rx_dma_phys,
+ tqspi->dma_buf_size, DMA_FROM_DEVICE);
+
+ if (tqspi->is_packed) {
+ unsigned int len = tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
+
+ memcpy(t->rx_buf + tqspi->cur_rx_pos, tqspi->rx_dma_buf, len);
+ tqspi->cur_rx_pos += tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
+ } else {
+ unsigned char *rx_buf = t->rx_buf + tqspi->cur_rx_pos;
+ u32 rx_mask = ((u32)1 << t->bits_per_word) - 1;
+ unsigned int i;
+ unsigned int count;
+ unsigned int consume;
+ unsigned int read_bytes;
+
+ consume = tqspi->curr_dma_words * tqspi->bytes_per_word;
+ if (consume > t->len - tqspi->cur_pos)
+ consume = t->len - tqspi->cur_pos;
+ read_bytes = consume;
+ for (count = 0; count < tqspi->curr_dma_words; count++) {
+ u32 x = tqspi->rx_dma_buf[count] & rx_mask;
+
+ for (i = 0; consume && (i < tqspi->bytes_per_word);
+ i++, consume--)
+ *rx_buf++ = (x >> (i * 8)) & 0xFF;
+ }
+
+ tqspi->cur_rx_pos += read_bytes;
+ }
+
+ /* Make the dma buffer to read by dma */
+ dma_sync_single_for_device(tqspi->dev, tqspi->rx_dma_phys,
+ tqspi->dma_buf_size, DMA_FROM_DEVICE);
+}
+
+static void tegra_qspi_dma_complete(void *args)
+{
+ struct completion *dma_complete = args;
+
+ complete(dma_complete);
+}
+
+static int tegra_qspi_start_tx_dma(struct tegra_qspi_data *tqspi, int len)
+{
+ reinit_completion(&tqspi->tx_dma_complete);
+ tqspi->tx_dma_desc = dmaengine_prep_slave_single(tqspi->tx_dma_chan,
+ tqspi->tx_dma_phys, len, DMA_MEM_TO_DEV,
+ DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ if (!tqspi->tx_dma_desc) {
+ dev_err(tqspi->dev, "Not able to get desc for Tx\n");
+ return -EIO;
+ }
+
+ tqspi->tx_dma_desc->callback = tegra_qspi_dma_complete;
+ tqspi->tx_dma_desc->callback_param = &tqspi->tx_dma_complete;
+
+ dmaengine_submit(tqspi->tx_dma_desc);
+ dma_async_issue_pending(tqspi->tx_dma_chan);
+ return 0;
+}
+
+static int tegra_qspi_start_rx_dma(struct tegra_qspi_data *tqspi, int len)
+{
+ reinit_completion(&tqspi->rx_dma_complete);
+ tqspi->rx_dma_desc = dmaengine_prep_slave_single(tqspi->rx_dma_chan,
+ tqspi->rx_dma_phys, len, DMA_DEV_TO_MEM,
+ DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ if (!tqspi->rx_dma_desc) {
+ dev_err(tqspi->dev, "Not able to get desc for Rx\n");
+ return -EIO;
+ }
+
+ tqspi->rx_dma_desc->callback = tegra_qspi_dma_complete;
+ tqspi->rx_dma_desc->callback_param = &tqspi->rx_dma_complete;
+
+ dmaengine_submit(tqspi->rx_dma_desc);
+ dma_async_issue_pending(tqspi->rx_dma_chan);
+ return 0;
+}
+
+static int tegra_qspi_flush_fifos(struct tegra_qspi_data *tqspi)
+{
+ unsigned long timeout = jiffies + HZ;
+ u32 status;
+
+ status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ if ((status & QSPI_FIFO_EMPTY) == QSPI_FIFO_EMPTY)
+ return 0;
+
+ status |= QSPI_RX_FIFO_FLUSH | QSPI_TX_FIFO_FLUSH;
+ tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
+ while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
+ status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ if (time_after(jiffies, timeout)) {
+ dev_err(tqspi->dev,
+ "timeout waiting for fifo flush\n");
+ return -EIO;
+ }
+
+ udelay(1);
+ }
+
+ return 0;
+}
+
+static void tegra_qspi_unmask_irq(struct tegra_qspi_data *tqspi)
+{
+ u32 intr_mask;
+
+ intr_mask = tegra_qspi_readl(tqspi, QSPI_INTR_MASK);
+ intr_mask &= ~(QSPI_INTR_RDY_MASK | QSPI_INTR_RX_TX_FIFO_ERR);
+ tegra_qspi_writel(tqspi, intr_mask, QSPI_INTR_MASK);
+}
+
+static int tegra_qspi_start_dma_based_transfer(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ u32 val;
+ unsigned int len;
+ int ret = 0;
+ u8 dma_burst;
+ struct dma_slave_config dma_sconfig = {0};
+
+ val = QSPI_DMA_BLK_SET(tqspi->curr_dma_words - 1);
+ tegra_qspi_writel(tqspi, val, QSPI_DMA_BLK);
+
+ tegra_qspi_unmask_irq(tqspi);
+
+ if (tqspi->is_packed)
+ len = DIV_ROUND_UP(tqspi->curr_dma_words *
+ tqspi->bytes_per_word, 4) * 4;
+ else
+ len = tqspi->curr_dma_words * 4;
+
+ /* Set attention level based on length of transfer */
+ val = 0;
+ if (len & 0xF) {
+ val |= QSPI_TX_TRIG_1 | QSPI_RX_TRIG_1;
+ dma_burst = 1;
+ } else if (((len) >> 4) & 0x1) {
+ val |= QSPI_TX_TRIG_4 | QSPI_RX_TRIG_4;
+ dma_burst = 4;
+ } else {
+ val |= QSPI_TX_TRIG_8 | QSPI_RX_TRIG_8;
+ dma_burst = 8;
+ }
+
+ tegra_qspi_writel(tqspi, val, QSPI_DMA_CTL);
+ tqspi->dma_control_reg = val;
+
+ dma_sconfig.device_fc = true;
+ if (tqspi->cur_direction & DATA_DIR_TX) {
+ dma_sconfig.dst_addr = tqspi->phys + QSPI_TX_FIFO;
+ dma_sconfig.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ dma_sconfig.dst_maxburst = dma_burst;
+ ret = dmaengine_slave_config(tqspi->tx_dma_chan, &dma_sconfig);
+ if (ret < 0) {
+ dev_err(tqspi->dev,
+ "DMA slave config failed: %d\n", ret);
+ return ret;
+ }
+
+ tegra_qspi_copy_client_txbuf_to_qspi_txbuf(tqspi, t);
+ ret = tegra_qspi_start_tx_dma(tqspi, len);
+ if (ret < 0) {
+ dev_err(tqspi->dev,
+ "Starting tx dma failed: %d\n", ret);
+ return ret;
+ }
+ }
+
+ if (tqspi->cur_direction & DATA_DIR_RX) {
+ dma_sconfig.src_addr = tqspi->phys + QSPI_RX_FIFO;
+ dma_sconfig.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ dma_sconfig.src_maxburst = dma_burst;
+ ret = dmaengine_slave_config(tqspi->rx_dma_chan, &dma_sconfig);
+ if (ret < 0) {
+ dev_err(tqspi->dev,
+ "DMA slave config failed: %d\n", ret);
+ return ret;
+ }
+
+ /* Make the dma buffer to read by dma */
+ dma_sync_single_for_device(tqspi->dev, tqspi->rx_dma_phys,
+ tqspi->dma_buf_size,
+ DMA_FROM_DEVICE);
+
+ ret = tegra_qspi_start_rx_dma(tqspi, len);
+ if (ret < 0) {
+ dev_err(tqspi->dev,
+ "Starting rx dma failed: %d\n", ret);
+ if (tqspi->cur_direction & DATA_DIR_TX)
+ dmaengine_terminate_all(tqspi->tx_dma_chan);
+ return ret;
+ }
+ }
+
+ tegra_qspi_writel(tqspi, tqspi->command1_reg, QSPI_COMMAND1);
+
+ tqspi->is_curr_dma_xfer = true;
+ tqspi->dma_control_reg = val;
+ val |= QSPI_DMA_EN;
+ tegra_qspi_writel(tqspi, val, QSPI_DMA_CTL);
+
+ return ret;
+}
+
+static int tegra_qspi_start_cpu_based_transfer(struct tegra_qspi_data *qspi,
+ struct spi_transfer *t)
+{
+ u32 val;
+ unsigned int cur_words;
+
+ if (qspi->cur_direction & DATA_DIR_TX)
+ cur_words = tegra_qspi_fill_tx_fifo_from_client_txbuf(qspi, t);
+ else
+ cur_words = qspi->curr_dma_words;
+
+ val = QSPI_DMA_BLK_SET(cur_words - 1);
+ tegra_qspi_writel(qspi, val, QSPI_DMA_BLK);
+
+ tegra_qspi_unmask_irq(qspi);
+
+ qspi->is_curr_dma_xfer = false;
+ val = qspi->command1_reg;
+ val |= QSPI_PIO;
+ tegra_qspi_writel(qspi, val, QSPI_COMMAND1);
+
+ return 0;
+}
+
+static int tegra_qspi_init_dma_param(struct tegra_qspi_data *tqspi,
+ bool dma_to_memory)
+{
+ struct dma_chan *dma_chan;
+ u32 *dma_buf;
+ dma_addr_t dma_phys;
+
+ if (!device_property_present(tqspi->dev, "dmas"))
+ return 0;
+
+ dma_chan = dma_request_chan(tqspi->dev, dma_to_memory ? "rx" : "tx");
+ if (IS_ERR(dma_chan))
+ return dev_err_probe(tqspi->dev, PTR_ERR(dma_chan),
+ "Dma channel is not available\n");
+
+ dma_buf = dma_alloc_coherent(tqspi->dev, tqspi->dma_buf_size,
+ &dma_phys, GFP_KERNEL);
+ if (!dma_buf) {
+ dma_release_channel(dma_chan);
+ return -ENOMEM;
+ }
+
+ if (dma_to_memory) {
+ tqspi->rx_dma_chan = dma_chan;
+ tqspi->rx_dma_buf = dma_buf;
+ tqspi->rx_dma_phys = dma_phys;
+ } else {
+ tqspi->tx_dma_chan = dma_chan;
+ tqspi->tx_dma_buf = dma_buf;
+ tqspi->tx_dma_phys = dma_phys;
+ }
+
+ tqspi->use_dma = true;
+
+ return 0;
+}
+
+static void tegra_qspi_deinit_dma_param(struct tegra_qspi_data *tqspi,
+ bool dma_to_memory)
+{
+ u32 *dma_buf;
+ dma_addr_t dma_phys;
+ struct dma_chan *dma_chan;
+
+ if (dma_to_memory) {
+ dma_buf = tqspi->rx_dma_buf;
+ dma_chan = tqspi->rx_dma_chan;
+ dma_phys = tqspi->rx_dma_phys;
+ tqspi->rx_dma_chan = NULL;
+ tqspi->rx_dma_buf = NULL;
+ } else {
+ dma_buf = tqspi->tx_dma_buf;
+ dma_chan = tqspi->tx_dma_chan;
+ dma_phys = tqspi->tx_dma_phys;
+ tqspi->tx_dma_buf = NULL;
+ tqspi->tx_dma_chan = NULL;
+ }
+ if (!dma_chan)
+ return;
+
+ dma_free_coherent(tqspi->dev, tqspi->dma_buf_size, dma_buf, dma_phys);
+ dma_release_channel(dma_chan);
+}
+
+static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi,
+ struct spi_transfer *t,
+ bool is_first_of_msg)
+{
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
+ u32 speed = t->speed_hz;
+ u8 bits_per_word = t->bits_per_word;
+ u32 command1;
+ int req_mode;
+
+ if (speed != tqspi->cur_speed) {
+ clk_set_rate(tqspi->clk, speed);
+ tqspi->cur_speed = speed;
+ }
+
+ tqspi->cur_pos = 0;
+ tqspi->cur_rx_pos = 0;
+ tqspi->cur_tx_pos = 0;
+ tqspi->curr_xfer = t;
+
+ if (is_first_of_msg) {
+ tegra_qspi_mask_clear_irq(tqspi);
+
+ command1 = tqspi->def_command1_reg;
+ command1 |= QSPI_BIT_LENGTH(bits_per_word - 1);
+
+ command1 &= ~QSPI_CONTROL_MODE_MASK;
+ req_mode = spi->mode & 0x3;
+ if (req_mode == SPI_MODE_3)
+ command1 |= QSPI_CONTROL_MODE_3;
+ else
+ command1 |= QSPI_CONTROL_MODE_0;
+ tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
+
+ /* toggle cs to active state */
+ if (spi->mode & SPI_CS_HIGH)
+ command1 |= QSPI_CS_SW_VAL;
+ else
+ command1 &= ~QSPI_CS_SW_VAL;
+ tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
+ } else {
+ command1 = tqspi->command1_reg;
+ command1 &= ~QSPI_BIT_LENGTH(~0);
+ command1 |= QSPI_BIT_LENGTH(bits_per_word - 1);
+ }
+
+ command1 &= ~QSPI_SDR_DDR_SEL;
+
+ return command1;
+}
+
+static int tegra_qspi_start_transfer_one(struct spi_device *spi,
+ struct spi_transfer *t, u32 command1)
+{
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
+ unsigned int total_fifo_words;
+ u8 bus_width = 0;
+ int ret;
+
+ total_fifo_words = tegra_qspi_calculate_curr_xfer_param(tqspi, t);
+
+ command1 &= ~QSPI_PACKED;
+ if (tqspi->is_packed)
+ command1 |= QSPI_PACKED;
+ tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
+
+ command1 &= ~(QSPI_TX_EN | QSPI_RX_EN);
+ tqspi->cur_direction = 0;
+ if (t->rx_buf) {
+ command1 |= QSPI_RX_EN;
+ tqspi->cur_direction |= DATA_DIR_RX;
+ bus_width = t->rx_nbits;
+ }
+
+ if (t->tx_buf) {
+ command1 |= QSPI_TX_EN;
+ tqspi->cur_direction |= DATA_DIR_TX;
+ bus_width = t->tx_nbits;
+ }
+
+ command1 &= ~QSPI_INTERFACE_WIDTH_MASK;
+ if (bus_width == SPI_NBITS_QUAD)
+ command1 |= QSPI_INTERFACE_WIDTH_QUAD;
+ else if (bus_width == SPI_NBITS_DUAL)
+ command1 |= QSPI_INTERFACE_WIDTH_DUAL;
+ else
+ command1 |= QSPI_INTERFACE_WIDTH_SINGLE;
+ tqspi->command1_reg = command1;
+
+ tegra_qspi_writel(tqspi, QSPI_NUM_DUMMY_CYCLE(tqspi->dummy_cycles),
+ QSPI_MISC_REG);
+
+ ret = tegra_qspi_flush_fifos(tqspi);
+ if (ret < 0)
+ return ret;
+
+ if (tqspi->use_dma && total_fifo_words > QSPI_FIFO_DEPTH)
+ ret = tegra_qspi_start_dma_based_transfer(tqspi, t);
+ else
+ ret = tegra_qspi_start_cpu_based_transfer(tqspi, t);
+ return ret;
+}
+
+static struct tegra_qspi_client_data
+ *tegra_qspi_parse_cdata_dt(struct spi_device *spi)
+{
+ struct tegra_qspi_client_data *cdata;
+ struct device_node *slave_np;
+
+ slave_np = spi->dev.of_node;
+ if (!slave_np) {
+ dev_dbg(&spi->dev, "device node not found\n");
+ return NULL;
+ }
+
+ cdata = kzalloc(sizeof(*cdata), GFP_KERNEL);
+ if (!cdata)
+ return NULL;
+
+ of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
+ &cdata->tx_clk_tap_delay);
+ of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
+ &cdata->rx_clk_tap_delay);
+ return cdata;
+}
+
+static void tegra_qspi_cleanup(struct spi_device *spi)
+{
+ struct tegra_qspi_client_data *cdata = spi->controller_data;
+
+ spi->controller_data = NULL;
+ if (spi->dev.of_node)
+ kfree(cdata);
+}
+
+static int tegra_qspi_setup(struct spi_device *spi)
+{
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
+ struct tegra_qspi_client_data *cdata = spi->controller_data;
+ u32 tx_tap = 0, rx_tap = 0;
+ u32 val;
+ unsigned long flags;
+ int ret;
+
+ dev_dbg(&spi->dev, "setup %d bpw, %scpol, %scpha, %dHz\n",
+ spi->bits_per_word,
+ spi->mode & SPI_CPOL ? "" : "~",
+ spi->mode & SPI_CPHA ? "" : "~",
+ spi->max_speed_hz);
+
+ if (!cdata) {
+ cdata = tegra_qspi_parse_cdata_dt(spi);
+ spi->controller_data = cdata;
+ }
+
+ ret = pm_runtime_get_sync(tqspi->dev);
+ if (ret < 0) {
+ dev_err(tqspi->dev, "runtime resume failed: %d\n", ret);
+ if (cdata)
+ tegra_qspi_cleanup(spi);
+ return ret;
+ }
+
+ spin_lock_irqsave(&tqspi->lock, flags);
+ /* keep default cs state to inactive */
+ val = tqspi->def_command1_reg;
+ if (spi->mode & SPI_CS_HIGH)
+ val &= ~QSPI_CS_SW_VAL;
+ else
+ val |= QSPI_CS_SW_VAL;
+
+ tqspi->def_command1_reg = val;
+ tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
+
+ if (cdata && cdata->tx_clk_tap_delay)
+ tx_tap = cdata->tx_clk_tap_delay;
+ if (cdata && cdata->rx_clk_tap_delay)
+ rx_tap = cdata->rx_clk_tap_delay;
+ tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
+ QSPI_RX_TAP_DELAY(rx_tap);
+ tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
+
+ spin_unlock_irqrestore(&tqspi->lock, flags);
+
+ pm_runtime_put(tqspi->dev);
+ return 0;
+}
+
+static void tegra_qspi_dump_regs(struct tegra_qspi_data *tqspi)
+{
+ dev_dbg(tqspi->dev, "============ QSPI REGISTER DUMP ============\n");
+ dev_dbg(tqspi->dev, "Command1: 0x%08x | Command2: 0x%08x\n",
+ tegra_qspi_readl(tqspi, QSPI_COMMAND1),
+ tegra_qspi_readl(tqspi, QSPI_COMMAND2));
+ dev_dbg(tqspi->dev, "DMA_CTL: 0x%08x | DMA_BLK: 0x%08x\n",
+ tegra_qspi_readl(tqspi, QSPI_DMA_CTL),
+ tegra_qspi_readl(tqspi, QSPI_DMA_BLK));
+ dev_dbg(tqspi->dev, "INTR_MASK: 0x%08x | MISC: 0x%08x\n",
+ tegra_qspi_readl(tqspi, QSPI_INTR_MASK),
+ tegra_qspi_readl(tqspi, QSPI_MISC_REG));
+ dev_dbg(tqspi->dev, "TRANS_STAT: 0x%08x | FIFO_STATUS: 0x%08x\n",
+ tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS),
+ tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS));
+}
+
+static int tegra_qspi_transfer_one_message(struct spi_master *master,
+ struct spi_message *msg)
+{
+ bool is_first_msg = true;
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
+ struct spi_transfer *xfer;
+ struct spi_device *spi = msg->spi;
+ u8 dummy_cycles, ntransfers = 0;
+ int ret;
+ bool skip = false;
+ int xfer_phase = 0;
+
+ msg->status = 0;
+ msg->actual_length = 0;
+ tqspi->tx_status = 0;
+ tqspi->rx_status = 0;
+
+ /*
+ * Tegra QSPI hardware support dummy bytes transfer based on the
+ * programmed dummy clock cyles in QSPI register.
+ * So, get the total dummy bytes from the dummy bytes transfer in
+ * spi_messages and convert to dummy clock cyles.
+ */
+ list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+ if (ntransfers == DUMMY_BYTES_XFER &&
+ !(list_is_last(&xfer->transfer_list, &msg->transfers)))
+ dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
+ ntransfers++;
+ }
+
+ list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+ u32 cmd1;
+
+ /*
+ * Program dummy clock cycles in Tegra QSPI register only
+ * during address transfer phase.
+ */
+ if (xfer_phase == ADDR_BYTES_XFER)
+ tqspi->dummy_cycles = dummy_cycles;
+ else
+ tqspi->dummy_cycles = 0;
+
+ /*
+ * Tegra QSPI hardware support dummy bytes transfer.
+ * So, skip the transfer of dummy bytes from the software.
+ */
+ if (ntransfers == MAX_XFERS &&
+ xfer_phase == DUMMY_BYTES_XFER) {
+ msg->actual_length += xfer->len;
+ xfer_phase++;
+ continue;
+ }
+
+ reinit_completion(&tqspi->xfer_completion);
+
+ cmd1 = tegra_qspi_setup_transfer_one(spi, xfer, is_first_msg);
+ if (!xfer->len) {
+ ret = 0;
+ skip = true;
+ goto exit;
+ }
+
+ ret = tegra_qspi_start_transfer_one(spi, xfer, cmd1);
+ if (ret < 0) {
+ dev_err(tqspi->dev,
+ "failed to start transfer: %d\n", ret);
+ goto exit;
+ }
+
+ is_first_msg = false;
+ ret = wait_for_completion_timeout(&tqspi->xfer_completion,
+ QSPI_DMA_TIMEOUT);
+ if (WARN_ON(ret == 0)) {
+ dev_err(tqspi->dev,
+ "qspi transfer timeout: %d\n", ret);
+ if (tqspi->is_curr_dma_xfer &&
+ (tqspi->cur_direction & DATA_DIR_TX))
+ dmaengine_terminate_all(tqspi->tx_dma_chan);
+ if (tqspi->is_curr_dma_xfer &&
+ (tqspi->cur_direction & DATA_DIR_RX))
+ dmaengine_terminate_all(tqspi->rx_dma_chan);
+ ret = -EIO;
+ tegra_qspi_dump_regs(tqspi);
+ tegra_qspi_flush_fifos(tqspi);
+ reset_control_assert(tqspi->rst);
+ udelay(2);
+ reset_control_deassert(tqspi->rst);
+ goto exit;
+ }
+
+ if (tqspi->tx_status || tqspi->rx_status) {
+ ret = -EIO;
+ dev_err(tqspi->dev, "error in transfer: %d\n", ret);
+ tegra_qspi_dump_regs(tqspi);
+ goto exit;
+ }
+
+ msg->actual_length += xfer->len;
+ xfer_phase++;
+ }
+
+ ret = 0;
+exit:
+ tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
+ msg->status = ret;
+ spi_finalize_current_message(master);
+ return ret;
+}
+
+static irqreturn_t handle_cpu_based_xfer(struct tegra_qspi_data *tqspi)
+{
+ struct spi_transfer *t = tqspi->curr_xfer;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tqspi->lock, flags);
+ if (tqspi->tx_status || tqspi->rx_status) {
+ dev_err(tqspi->dev, "CpuXfer ERROR bit set 0x%x\n",
+ tqspi->status_reg);
+ dev_err(tqspi->dev, "CpuXfer 0x%08x:0x%08x\n",
+ tqspi->command1_reg, tqspi->dma_control_reg);
+ tegra_qspi_dump_regs(tqspi);
+ tegra_qspi_flush_fifos(tqspi);
+ complete(&tqspi->xfer_completion);
+ spin_unlock_irqrestore(&tqspi->lock, flags);
+ reset_control_assert(tqspi->rst);
+ udelay(2);
+ reset_control_deassert(tqspi->rst);
+ return IRQ_HANDLED;
+ }
+
+ if (tqspi->cur_direction & DATA_DIR_RX)
+ tegra_qspi_read_rx_fifo_to_client_rxbuf(tqspi, t);
+
+ if (tqspi->cur_direction & DATA_DIR_TX)
+ tqspi->cur_pos = tqspi->cur_tx_pos;
+ else
+ tqspi->cur_pos = tqspi->cur_rx_pos;
+
+ if (tqspi->cur_pos == t->len) {
+ complete(&tqspi->xfer_completion);
+ goto exit;
+ }
+
+ tegra_qspi_calculate_curr_xfer_param(tqspi, t);
+ tegra_qspi_start_cpu_based_transfer(tqspi, t);
+exit:
+ spin_unlock_irqrestore(&tqspi->lock, flags);
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t handle_dma_based_xfer(struct tegra_qspi_data *tqspi)
+{
+ struct spi_transfer *t = tqspi->curr_xfer;
+ long wait_status;
+ int err = 0;
+ unsigned int total_fifo_words;
+ unsigned long flags;
+
+ /* Abort dmas if any error */
+ if (tqspi->cur_direction & DATA_DIR_TX) {
+ if (tqspi->tx_status) {
+ dmaengine_terminate_all(tqspi->tx_dma_chan);
+ err += 1;
+ } else {
+ wait_status = wait_for_completion_interruptible_timeout(
+ &tqspi->tx_dma_complete, QSPI_DMA_TIMEOUT);
+ if (wait_status <= 0) {
+ dmaengine_terminate_all(tqspi->tx_dma_chan);
+ dev_err(tqspi->dev, "TxDma Xfer failed\n");
+ err += 1;
+ }
+ }
+ }
+
+ if (tqspi->cur_direction & DATA_DIR_RX) {
+ if (tqspi->rx_status) {
+ dmaengine_terminate_all(tqspi->rx_dma_chan);
+ err += 2;
+ } else {
+ wait_status = wait_for_completion_interruptible_timeout(
+ &tqspi->rx_dma_complete, QSPI_DMA_TIMEOUT);
+ if (wait_status <= 0) {
+ dmaengine_terminate_all(tqspi->rx_dma_chan);
+ dev_err(tqspi->dev, "RxDma Xfer failed\n");
+ err += 2;
+ }
+ }
+ }
+
+ spin_lock_irqsave(&tqspi->lock, flags);
+ if (err) {
+ dev_err(tqspi->dev, "DmaXfer: ERROR bit set 0x%x\n",
+ tqspi->status_reg);
+ dev_err(tqspi->dev, "DmaXfer 0x%08x:0x%08x\n",
+ tqspi->command1_reg, tqspi->dma_control_reg);
+ tegra_qspi_dump_regs(tqspi);
+ tegra_qspi_flush_fifos(tqspi);
+ complete(&tqspi->xfer_completion);
+ spin_unlock_irqrestore(&tqspi->lock, flags);
+ reset_control_assert(tqspi->rst);
+ udelay(2);
+ reset_control_deassert(tqspi->rst);
+ return IRQ_HANDLED;
+ }
+
+ if (tqspi->cur_direction & DATA_DIR_RX)
+ tegra_qspi_copy_qspi_rxbuf_to_client_rxbuf(tqspi, t);
+
+ if (tqspi->cur_direction & DATA_DIR_TX)
+ tqspi->cur_pos = tqspi->cur_tx_pos;
+ else
+ tqspi->cur_pos = tqspi->cur_rx_pos;
+
+ if (tqspi->cur_pos == t->len) {
+ complete(&tqspi->xfer_completion);
+ goto exit;
+ }
+
+ /* Continue transfer in current message */
+ total_fifo_words = tegra_qspi_calculate_curr_xfer_param(tqspi, t);
+ if (total_fifo_words > QSPI_FIFO_DEPTH)
+ err = tegra_qspi_start_dma_based_transfer(tqspi, t);
+ else
+ err = tegra_qspi_start_cpu_based_transfer(tqspi, t);
+
+exit:
+ spin_unlock_irqrestore(&tqspi->lock, flags);
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t tegra_qspi_isr_thread(int irq, void *context_data)
+{
+ struct tegra_qspi_data *tqspi = context_data;
+
+ if (!tqspi->is_curr_dma_xfer)
+ return handle_cpu_based_xfer(tqspi);
+ return handle_dma_based_xfer(tqspi);
+}
+
+static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
+{
+ struct tegra_qspi_data *tqspi = context_data;
+
+ tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ if (tqspi->cur_direction & DATA_DIR_TX)
+ tqspi->tx_status = tqspi->status_reg &
+ (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
+
+ if (tqspi->cur_direction & DATA_DIR_RX)
+ tqspi->rx_status = tqspi->status_reg &
+ (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
+ tegra_qspi_mask_clear_irq(tqspi);
+
+ return IRQ_WAKE_THREAD;
+}
+
+static const struct of_device_id tegra_qspi_of_match[] = {
+ { .compatible = "nvidia,tegra210-qspi", },
+ {}
+};
+MODULE_DEVICE_TABLE(of, tegra_qspi_of_match);
+
+static int tegra_qspi_probe(struct platform_device *pdev)
+{
+ struct spi_master *master;
+ struct tegra_qspi_data *tqspi;
+ struct resource *r;
+ int ret, qspi_irq;
+ int bus_num;
+
+ master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
+ if (!master) {
+ dev_err(&pdev->dev, "master allocation failed\n");
+ return -ENOMEM;
+ }
+
+ platform_set_drvdata(pdev, master);
+ tqspi = spi_master_get_devdata(master);
+
+ if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
+ &master->max_speed_hz))
+ master->max_speed_hz = QSPI_MAX_SPEED;
+
+ /* the spi->mode bits understood by this driver: */
+ master->mode_bits = SPI_MODE_0 | SPI_MODE_3 | SPI_CS_HIGH |
+ SPI_TX_DUAL | SPI_RX_DUAL | SPI_TX_QUAD |
+ SPI_RX_QUAD;
+ master->bits_per_word_mask = SPI_BPW_MASK(32) | SPI_BPW_MASK(16) |
+ SPI_BPW_MASK(8);
+ master->setup = tegra_qspi_setup;
+ master->cleanup = tegra_qspi_cleanup;
+ master->transfer_one_message = tegra_qspi_transfer_one_message;
+ master->num_chipselect = 1;
+ master->auto_runtime_pm = true;
+ bus_num = of_alias_get_id(pdev->dev.of_node, "spi");
+ if (bus_num >= 0)
+ master->bus_num = bus_num;
+
+ tqspi->master = master;
+ tqspi->dev = &pdev->dev;
+ spin_lock_init(&tqspi->lock);
+
+ r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ tqspi->base = devm_ioremap_resource(&pdev->dev, r);
+ if (IS_ERR(tqspi->base)) {
+ ret = PTR_ERR(tqspi->base);
+ goto exit_free_master;
+ }
+
+ tqspi->phys = r->start;
+ qspi_irq = platform_get_irq(pdev, 0);
+ tqspi->irq = qspi_irq;
+
+ tqspi->clk = devm_clk_get(&pdev->dev, "qspi");
+ if (IS_ERR(tqspi->clk)) {
+ ret = PTR_ERR(tqspi->clk);
+ dev_err(&pdev->dev, "failed to get clock: %d\n", ret);
+ goto exit_free_master;
+ }
+
+ tqspi->rst = devm_reset_control_get_exclusive(&pdev->dev, "qspi");
+ if (IS_ERR(tqspi->rst)) {
+ ret = PTR_ERR(tqspi->rst);
+ dev_err(&pdev->dev, "failed to get reset control: %d\n", ret);
+ goto exit_free_master;
+ }
+
+ tqspi->max_buf_size = QSPI_FIFO_DEPTH << 2;
+ tqspi->dma_buf_size = DEFAULT_QSPI_DMA_BUF_LEN;
+
+ ret = tegra_qspi_init_dma_param(tqspi, true);
+ if (ret < 0)
+ goto exit_free_master;
+ ret = tegra_qspi_init_dma_param(tqspi, false);
+ if (ret < 0)
+ goto exit_rx_dma_free;
+
+ if (tqspi->use_dma)
+ tqspi->max_buf_size = tqspi->dma_buf_size;
+
+ init_completion(&tqspi->tx_dma_complete);
+ init_completion(&tqspi->rx_dma_complete);
+
+ init_completion(&tqspi->xfer_completion);
+
+ pm_runtime_enable(&pdev->dev);
+ if (!pm_runtime_enabled(&pdev->dev)) {
+ ret = tegra_qspi_runtime_resume(&pdev->dev);
+ if (ret)
+ goto exit_pm_disable;
+ }
+
+ ret = pm_runtime_get_sync(&pdev->dev);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "runtime resume failed: %d\n", ret);
+ pm_runtime_put_noidle(&pdev->dev);
+ goto exit_pm_disable;
+ }
+
+ reset_control_assert(tqspi->rst);
+ udelay(2);
+ reset_control_deassert(tqspi->rst);
+ tqspi->def_command1_reg = QSPI_M_S | QSPI_CS_SW_HW | QSPI_CS_SW_VAL;
+ tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
+ tqspi->spi_cs_timing1 = tegra_qspi_readl(tqspi, QSPI_CS_TIMING1);
+ tqspi->spi_cs_timing2 = tegra_qspi_readl(tqspi, QSPI_CS_TIMING2);
+ tqspi->def_command2_reg = tegra_qspi_readl(tqspi, QSPI_COMMAND2);
+
+ pm_runtime_put(&pdev->dev);
+
+ ret = request_threaded_irq(tqspi->irq, tegra_qspi_isr,
+ tegra_qspi_isr_thread, IRQF_ONESHOT,
+ dev_name(&pdev->dev), tqspi);
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "failed to request IRQ#%u: %d\n", tqspi->irq, ret);
+ goto exit_pm_disable;
+ }
+
+ master->dev.of_node = pdev->dev.of_node;
+ ret = devm_spi_register_master(&pdev->dev, master);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "failed to register master: %d\n", ret);
+ goto exit_free_irq;
+ }
+ return ret;
+
+exit_free_irq:
+ free_irq(qspi_irq, tqspi);
+exit_pm_disable:
+ pm_runtime_disable(&pdev->dev);
+ if (!pm_runtime_status_suspended(&pdev->dev))
+ tegra_qspi_runtime_suspend(&pdev->dev);
+ tegra_qspi_deinit_dma_param(tqspi, false);
+exit_rx_dma_free:
+ tegra_qspi_deinit_dma_param(tqspi, true);
+exit_free_master:
+ spi_master_put(master);
+ return ret;
+}
+
+static int tegra_qspi_remove(struct platform_device *pdev)
+{
+ struct spi_master *master = platform_get_drvdata(pdev);
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
+
+ free_irq(tqspi->irq, tqspi);
+
+ tegra_qspi_deinit_dma_param(tqspi, false);
+ tegra_qspi_deinit_dma_param(tqspi, true);
+
+ pm_runtime_disable(&pdev->dev);
+ if (!pm_runtime_status_suspended(&pdev->dev))
+ tegra_qspi_runtime_suspend(&pdev->dev);
+
+ return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int tegra_qspi_suspend(struct device *dev)
+{
+ struct spi_master *master = dev_get_drvdata(dev);
+
+ return spi_master_suspend(master);
+}
+
+static int tegra_qspi_resume(struct device *dev)
+{
+ struct spi_master *master = dev_get_drvdata(dev);
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
+ int ret;
+
+ ret = pm_runtime_get_sync(dev);
+ if (ret < 0) {
+ dev_err(dev, "runtime resume failed: %d\n", ret);
+ return ret;
+ }
+
+ tegra_qspi_writel(tqspi, tqspi->command1_reg, QSPI_COMMAND1);
+ tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
+ pm_runtime_put(dev);
+
+ return spi_master_resume(master);
+}
+#endif
+
+static int tegra_qspi_runtime_suspend(struct device *dev)
+{
+ struct spi_master *master = dev_get_drvdata(dev);
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
+
+ /* Flush all write which are in PPSB queue by reading back */
+ tegra_qspi_readl(tqspi, QSPI_COMMAND1);
+
+ clk_disable_unprepare(tqspi->clk);
+ return 0;
+}
+
+static int tegra_qspi_runtime_resume(struct device *dev)
+{
+ struct spi_master *master = dev_get_drvdata(dev);
+ struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
+ int ret;
+
+ ret = clk_prepare_enable(tqspi->clk);
+ if (ret < 0) {
+ dev_err(tqspi->dev, "clk_prepare failed: %d\n", ret);
+ return ret;
+ }
+ return 0;
+}
+
+static const struct dev_pm_ops tegra_qspi_pm_ops = {
+ SET_RUNTIME_PM_OPS(tegra_qspi_runtime_suspend,
+ tegra_qspi_runtime_resume, NULL)
+ SET_SYSTEM_SLEEP_PM_OPS(tegra_qspi_suspend, tegra_qspi_resume)
+};
+
+static struct platform_driver tegra_qspi_driver = {
+ .driver = {
+ .name = "tegra-qspi",
+ .pm = &tegra_qspi_pm_ops,
+ .of_match_table = tegra_qspi_of_match,
+ },
+ .probe = tegra_qspi_probe,
+ .remove = tegra_qspi_remove,
+};
+module_platform_driver(tegra_qspi_driver);
+
+MODULE_ALIAS("platform:qspi-tegra");
+MODULE_DESCRIPTION("NVIDIA Tegra QSPI Controller Driver");
+MODULE_AUTHOR("Sowjanya Komatineni <[email protected]>");
+MODULE_LICENSE("GPL v2");
--
2.7.4
Jetson Xavier NX has Spansion S25FS26S Quad SPI Flash.
This patch enables QSPI on Jetson Xavier NX.
Signed-off-by: Sowjanya Komatineni <[email protected]>
---
.../arm64/boot/dts/nvidia/tegra194-p3509-0000+p3668-0000.dts | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p3509-0000+p3668-0000.dts b/arch/arm64/boot/dts/nvidia/tegra194-p3509-0000+p3668-0000.dts
index 7f97b34..f1053e7 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p3509-0000+p3668-0000.dts
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p3509-0000+p3668-0000.dts
@@ -100,6 +100,18 @@
phy-names = "usb2-1", "usb2-2", "usb3-2";
};
+ spi@3270000 {
+ status = "okay";
+
+ flash@0 {
+ compatible = "spi-nor";
+ reg = <0>;
+ spi-max-frequency = <102000000>;
+ spi-tx-bus-width = <4>;
+ spi-rx-bus-width = <4>;
+ };
+ };
+
pwm@32d0000 {
status = "okay";
};
--
2.7.4
Hi Sowjanya,
I love your patch! Perhaps something to improve:
[auto build test WARNING on spi/for-next]
[also build test WARNING on robh/for-next tegra/for-next v5.10-rc6 next-20201201]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Sowjanya-Komatineni/Add-Tegra-QSPI-driver/20201202-051629
base: https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
config: riscv-allyesconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/a8da475b681c83105cdb8daf7408cc92aca9f65a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Sowjanya-Komatineni/Add-Tegra-QSPI-driver/20201202-051629
git checkout a8da475b681c83105cdb8daf7408cc92aca9f65a
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
drivers/spi/qspi-tegra.c: In function 'tegra_qspi_transfer_one_message':
>> drivers/spi/qspi-tegra.c:935:7: warning: variable 'skip' set but not used [-Wunused-but-set-variable]
935 | bool skip = false;
| ^~~~
vim +/skip +935 drivers/spi/qspi-tegra.c
925
926 static int tegra_qspi_transfer_one_message(struct spi_master *master,
927 struct spi_message *msg)
928 {
929 bool is_first_msg = true;
930 struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
931 struct spi_transfer *xfer;
932 struct spi_device *spi = msg->spi;
933 u8 dummy_cycles, ntransfers = 0;
934 int ret;
> 935 bool skip = false;
936 int xfer_phase = 0;
937
938 msg->status = 0;
939 msg->actual_length = 0;
940 tqspi->tx_status = 0;
941 tqspi->rx_status = 0;
942
943 /*
944 * Tegra QSPI hardware support dummy bytes transfer based on the
945 * programmed dummy clock cyles in QSPI register.
946 * So, get the total dummy bytes from the dummy bytes transfer in
947 * spi_messages and convert to dummy clock cyles.
948 */
949 list_for_each_entry(xfer, &msg->transfers, transfer_list) {
950 if (ntransfers == DUMMY_BYTES_XFER &&
951 !(list_is_last(&xfer->transfer_list, &msg->transfers)))
952 dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
953 ntransfers++;
954 }
955
956 list_for_each_entry(xfer, &msg->transfers, transfer_list) {
957 u32 cmd1;
958
959 /*
960 * Program dummy clock cycles in Tegra QSPI register only
961 * during address transfer phase.
962 */
963 if (xfer_phase == ADDR_BYTES_XFER)
964 tqspi->dummy_cycles = dummy_cycles;
965 else
966 tqspi->dummy_cycles = 0;
967
968 /*
969 * Tegra QSPI hardware support dummy bytes transfer.
970 * So, skip the transfer of dummy bytes from the software.
971 */
972 if (ntransfers == MAX_XFERS &&
973 xfer_phase == DUMMY_BYTES_XFER) {
974 msg->actual_length += xfer->len;
975 xfer_phase++;
976 continue;
977 }
978
979 reinit_completion(&tqspi->xfer_completion);
980
981 cmd1 = tegra_qspi_setup_transfer_one(spi, xfer, is_first_msg);
982 if (!xfer->len) {
983 ret = 0;
984 skip = true;
985 goto exit;
986 }
987
988 ret = tegra_qspi_start_transfer_one(spi, xfer, cmd1);
989 if (ret < 0) {
990 dev_err(tqspi->dev,
991 "failed to start transfer: %d\n", ret);
992 goto exit;
993 }
994
995 is_first_msg = false;
996 ret = wait_for_completion_timeout(&tqspi->xfer_completion,
997 QSPI_DMA_TIMEOUT);
998 if (WARN_ON(ret == 0)) {
999 dev_err(tqspi->dev,
1000 "qspi transfer timeout: %d\n", ret);
1001 if (tqspi->is_curr_dma_xfer &&
1002 (tqspi->cur_direction & DATA_DIR_TX))
1003 dmaengine_terminate_all(tqspi->tx_dma_chan);
1004 if (tqspi->is_curr_dma_xfer &&
1005 (tqspi->cur_direction & DATA_DIR_RX))
1006 dmaengine_terminate_all(tqspi->rx_dma_chan);
1007 ret = -EIO;
1008 tegra_qspi_dump_regs(tqspi);
1009 tegra_qspi_flush_fifos(tqspi);
1010 reset_control_assert(tqspi->rst);
1011 udelay(2);
1012 reset_control_deassert(tqspi->rst);
1013 goto exit;
1014 }
1015
1016 if (tqspi->tx_status || tqspi->rx_status) {
1017 ret = -EIO;
1018 dev_err(tqspi->dev, "error in transfer: %d\n", ret);
1019 tegra_qspi_dump_regs(tqspi);
1020 goto exit;
1021 }
1022
1023 msg->actual_length += xfer->len;
1024 xfer_phase++;
1025 }
1026
1027 ret = 0;
1028 exit:
1029 tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
1030 msg->status = ret;
1031 spi_finalize_current_message(master);
1032 return ret;
1033 }
1034
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Tue, Dec 01, 2020 at 01:12:45PM -0800, Sowjanya Komatineni wrote:
> QSPI controller on Tegra186 and Tegra194 is similar to Tegra210
> QSPI controller in terms of transferring command, address and data
> in multiple transfers.
Given that you're adding the driver in the same series this doesn't
really seem like it needs a separate patch.
On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
> Tegra SoC has a Quad SPI controller starting from Tegra210.
>
> This patch adds support for Tegra210 QSPI controller.
This looks pretty clean but I've got a few questions below about how
this integrates with the frameworks as well as some more minor issues.
> +config QSPI_TEGRA
> + tristate "Nvidia Tegra QSPI Controller"
Everything else in this file is SPI_, even the qspi controllers.
> +++ b/drivers/spi/qspi-tegra.c
> @@ -0,0 +1,1418 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 NVIDIA CORPORATION. All rights reserved.
> + */
Please make the entire comment a C++ one. It also appears that the "All
rights reserved" here conflicts with the GPL-2.0-only SPDX statement...
> +static void
> +tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
> + struct spi_transfer *t)
> +{
> + /* Make the dma buffer to read by cpu */
> + dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
> + tqspi->dma_buf_size, DMA_TO_DEVICE);
> +
> + if (tqspi->is_packed) {
> + unsigned int len = tqspi->curr_dma_words *
> + tqspi->bytes_per_word;
> +
> + memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
> + tqspi->cur_tx_pos += tqspi->curr_dma_words *
> + tqspi->bytes_per_word;
It seems weird that this device needs us to do a memcpy() to do DMA,
most devices are able to DMA directly from the buffers provided by the
SPI API (and let the SPI core sync things). What is going on here?
> + tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
> + while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
> + status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
> + if (time_after(jiffies, timeout)) {
> + dev_err(tqspi->dev,
> + "timeout waiting for fifo flush\n");
> + return -EIO;
> + }
> +
> + udelay(1);
> + }
It'd be good to put a cpu_relax() in the busy loop.
> +static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi,
> + struct spi_transfer *t,
> + bool is_first_of_msg)
> +{
> + /* toggle cs to active state */
> + if (spi->mode & SPI_CS_HIGH)
> + command1 |= QSPI_CS_SW_VAL;
> + else
> + command1 &= ~QSPI_CS_SW_VAL;
> + tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
This is worrying, the client device might be confused if /CS is doing
things outside of the standard handling.
> + of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
> + &cdata->tx_clk_tap_delay);
> + of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
> + &cdata->rx_clk_tap_delay);
These properties are not mentioned in the binding document.
> +static int tegra_qspi_setup(struct spi_device *spi)
> +{
> + if (cdata && cdata->tx_clk_tap_delay)
> + tx_tap = cdata->tx_clk_tap_delay;
> + if (cdata && cdata->rx_clk_tap_delay)
> + rx_tap = cdata->rx_clk_tap_delay;
> + tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
> + QSPI_RX_TAP_DELAY(rx_tap);
> + tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
The setup for one device shouldn't be able to affect the operation of
another, already running, device so either these need to be configured
as part of the controller probe or these configurations need to be
deferred until we're actually doing a transfer.
> + /*
> + * Tegra QSPI hardware support dummy bytes transfer based on the
> + * programmed dummy clock cyles in QSPI register.
> + * So, get the total dummy bytes from the dummy bytes transfer in
> + * spi_messages and convert to dummy clock cyles.
> + */
> + list_for_each_entry(xfer, &msg->transfers, transfer_list) {
> + if (ntransfers == DUMMY_BYTES_XFER &&
> + !(list_is_last(&xfer->transfer_list, &msg->transfers)))
> + dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
> + ntransfers++;
> + }
This seems weird, there's some hard coded assumption about particular
patterns that the client device is going to send. What's going on here?
I don't really understand what this is trying to do.
> +static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
> +{
> + struct tegra_qspi_data *tqspi = context_data;
> +
> + tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
> + if (tqspi->cur_direction & DATA_DIR_TX)
> + tqspi->tx_status = tqspi->status_reg &
> + (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
> +
> + if (tqspi->cur_direction & DATA_DIR_RX)
> + tqspi->rx_status = tqspi->status_reg &
> + (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
> + tegra_qspi_mask_clear_irq(tqspi);
> +
> + return IRQ_WAKE_THREAD;
> +}
It's a bit unclear to me the value we gain from having this handler - if
we don't specify a handler genirq will already mask the interrupt until
we get to the thread anyway and we could just read the status in the
threaded handler. OTOH it doesn't do any harm, just struck me as a bit
odd.
> + master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
> + if (!master) {
> + dev_err(&pdev->dev, "master allocation failed\n");
> + return -ENOMEM;
> + }
Please switch to using the devm_ version of the API to allocate
controller, it makes things much more robust.
> + if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
> + &master->max_speed_hz))
> + master->max_speed_hz = QSPI_MAX_SPEED;
The core will do this for you.
On 12/2/20 9:27 AM, Mark Brown wrote:
> On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
>> Tegra SoC has a Quad SPI controller starting from Tegra210.
>>
>> This patch adds support for Tegra210 QSPI controller.
> This looks pretty clean but I've got a few questions below about how
> this integrates with the frameworks as well as some more minor issues.
>
>> +config QSPI_TEGRA
>> + tristate "Nvidia Tegra QSPI Controller"
> Everything else in this file is SPI_, even the qspi controllers.
Will rename in v2
>> +++ b/drivers/spi/qspi-tegra.c
>> @@ -0,0 +1,1418 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (C) 2020 NVIDIA CORPORATION. All rights reserved.
>> + */
> Please make the entire comment a C++ one. It also appears that the "All
> rights reserved" here conflicts with the GPL-2.0-only SPDX statement...
Will fix in v2
>
>> +static void
>> +tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
>> + struct spi_transfer *t)
>> +{
>> + /* Make the dma buffer to read by cpu */
>> + dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
>> + tqspi->dma_buf_size, DMA_TO_DEVICE);
>> +
>> + if (tqspi->is_packed) {
>> + unsigned int len = tqspi->curr_dma_words *
>> + tqspi->bytes_per_word;
>> +
>> + memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
>> + tqspi->cur_tx_pos += tqspi->curr_dma_words *
>> + tqspi->bytes_per_word;
> It seems weird that this device needs us to do a memcpy() to do DMA,
> most devices are able to DMA directly from the buffers provided by the
> SPI API (and let the SPI core sync things). What is going on here?
For transfers of size more than max DMA transfer limit, data transfer
happens in multiple iterations with each iteration transferring up to
max DMA transfer limit.
So using separate dma buffers and on every iteration copying them to SPI
core provided tx/rx buffers.
Transferring data logic in this driver is similar as Tegra SPI driver
except register changes and some QSPI specific register programming.
>
>> + tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
>> + while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
>> + status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>> + if (time_after(jiffies, timeout)) {
>> + dev_err(tqspi->dev,
>> + "timeout waiting for fifo flush\n");
>> + return -EIO;
>> + }
>> +
>> + udelay(1);
>> + }
> It'd be good to put a cpu_relax() in the busy loop.
Will update in v2.
>
>> +static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi,
>> + struct spi_transfer *t,
>> + bool is_first_of_msg)
>> +{
>> + /* toggle cs to active state */
>> + if (spi->mode & SPI_CS_HIGH)
>> + command1 |= QSPI_CS_SW_VAL;
>> + else
>> + command1 &= ~QSPI_CS_SW_VAL;
>> + tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
> This is worrying, the client device might be confused if /CS is doing
> things outside of the standard handling.
Do you mean to honor spi_transfer cs_change flag?
Tegra QSPI is master and is used only with QSPI flash devices. Looking
at SPI NOR driver, I see QSPI Flash commands are executed with one flash
command per spi_message and I dont see cs_change flag usage w.r.t QSPI
flash. So, using SW based CS control for QSPI.
Please correct me if I miss something to understand here.
Also Tegra186 and later QSPI controller supports combined sequence mode
where command, address, data phases can be combined in a single GO.
This saves some cycles in transfer and for this we need to use SW based
CS control only.
>> + of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
>> + &cdata->tx_clk_tap_delay);
>> + of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
>> + &cdata->rx_clk_tap_delay);
> These properties are not mentioned in the binding document.
Thanks Mark. Missed them. Will add in v2.
>
>> +static int tegra_qspi_setup(struct spi_device *spi)
>> +{
>> + if (cdata && cdata->tx_clk_tap_delay)
>> + tx_tap = cdata->tx_clk_tap_delay;
>> + if (cdata && cdata->rx_clk_tap_delay)
>> + rx_tap = cdata->rx_clk_tap_delay;
>> + tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
>> + QSPI_RX_TAP_DELAY(rx_tap);
>> + tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
> The setup for one device shouldn't be able to affect the operation of
> another, already running, device so either these need to be configured
> as part of the controller probe or these configurations need to be
> deferred until we're actually doing a transfer.
We will only have 1 device on QSPI as we only support single chip select.
>
>> + /*
>> + * Tegra QSPI hardware support dummy bytes transfer based on the
>> + * programmed dummy clock cyles in QSPI register.
>> + * So, get the total dummy bytes from the dummy bytes transfer in
>> + * spi_messages and convert to dummy clock cyles.
>> + */
>> + list_for_each_entry(xfer, &msg->transfers, transfer_list) {
>> + if (ntransfers == DUMMY_BYTES_XFER &&
>> + !(list_is_last(&xfer->transfer_list, &msg->transfers)))
>> + dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
>> + ntransfers++;
>> + }
> This seems weird, there's some hard coded assumption about particular
> patterns that the client device is going to send. What's going on here?
> I don't really understand what this is trying to do.
QSPI flash needs dummy cycles for data read operation which is actually
the initial read latency and no. of dummy cycles required are vendor
specific.
SPI NOR driver gets required dummy cycles based on mode clock cycles and
wait state clock cycles.
During read operations, spi_nor_spimem_read_data() converts dummy cycles
to number of dummy bytes.
Tegra QSPI controller supports dummy clock cycles register and when
programmed QSPI controller sends dummy bytes rather than SW handling
extra cycles for transferring dummy bytes.
Above equation converts this dummy bytes back to dummy clock cycles to
program into QSPI register and avoid manual SW transfer of dummy bytes.
>
>> +static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
>> +{
>> + struct tegra_qspi_data *tqspi = context_data;
>> +
>> + tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>> + if (tqspi->cur_direction & DATA_DIR_TX)
>> + tqspi->tx_status = tqspi->status_reg &
>> + (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
>> +
>> + if (tqspi->cur_direction & DATA_DIR_RX)
>> + tqspi->rx_status = tqspi->status_reg &
>> + (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
>> + tegra_qspi_mask_clear_irq(tqspi);
>> +
>> + return IRQ_WAKE_THREAD;
>> +}
> It's a bit unclear to me the value we gain from having this handler - if
> we don't specify a handler genirq will already mask the interrupt until
> we get to the thread anyway and we could just read the status in the
> threaded handler. OTOH it doesn't do any harm, just struck me as a bit
> odd.
I started QSPI driver by taking SPI driver as data transfer and
interrupt handling are similar.
So kept this handler for clearing status registers and masking
interrupts as I did not see anything wrong with this.
>
>> + master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
>> + if (!master) {
>> + dev_err(&pdev->dev, "master allocation failed\n");
>> + return -ENOMEM;
>> + }
> Please switch to using the devm_ version of the API to allocate
> controller, it makes things much more robust.
Will update in v2
>
>> + if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
>> + &master->max_speed_hz))
>> + master->max_speed_hz = QSPI_MAX_SPEED;
> The core will do this for you.
Will remove this in v2.
Thanks
Sowjanya
On 12/2/20 11:17 AM, Sowjanya Komatineni wrote:
>
> On 12/2/20 9:27 AM, Mark Brown wrote:
>> On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
>>> Tegra SoC has a Quad SPI controller starting from Tegra210.
>>>
>>> This patch adds support for Tegra210 QSPI controller.
>> This looks pretty clean but I've got a few questions below about how
>> this integrates with the frameworks as well as some more minor issues.
>>
>>> +config QSPI_TEGRA
>>> +??? tristate "Nvidia Tegra QSPI Controller"
>> Everything else in this file is SPI_, even the qspi controllers.
> Will rename in v2
>>> +++ b/drivers/spi/qspi-tegra.c
>>> @@ -0,0 +1,1418 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * Copyright (C) 2020 NVIDIA CORPORATION.? All rights reserved.
>>> + */
>> Please make the entire comment a C++ one.? It also appears that the "All
>> rights reserved" here conflicts with the GPL-2.0-only SPDX statement...
> Will fix in v2
>>
>>> +static void
>>> +tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data
>>> *tqspi,
>>> +?????????????????????? struct spi_transfer *t)
>>> +{
>>> +??? /* Make the dma buffer to read by cpu */
>>> +??? dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
>>> +??????????????? tqspi->dma_buf_size, DMA_TO_DEVICE);
>>> +
>>> +??? if (tqspi->is_packed) {
>>> +??????? unsigned int len = tqspi->curr_dma_words *
>>> +?????????????????? tqspi->bytes_per_word;
>>> +
>>> +??????? memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
>>> +??????? tqspi->cur_tx_pos += tqspi->curr_dma_words *
>>> +???????????????????? tqspi->bytes_per_word;
>> It seems weird that this device needs us to do a memcpy() to do DMA,
>> most devices are able to DMA directly from the buffers provided by the
>> SPI API (and let the SPI core sync things).? What is going on here?
>
> For transfers of size more than max DMA transfer limit, data transfer
> happens in multiple iterations with each iteration transferring up to
> max DMA transfer limit.
>
> So using separate dma buffers and on every iteration copying them to
> SPI core provided tx/rx buffers.
Also unpack mode needs to manually put the bytes together from read data
to SPI core rx buffer.
>
> Transferring data logic in this driver is similar as Tegra SPI driver
> except register changes and some QSPI specific register programming.
>
>>
>>> +??? tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
>>> +??? while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
>>> +??????? status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>>> +??????? if (time_after(jiffies, timeout)) {
>>> +??????????? dev_err(tqspi->dev,
>>> +??????????????? "timeout waiting for fifo flush\n");
>>> +??????????? return -EIO;
>>> +??????? }
>>> +
>>> +??????? udelay(1);
>>> +??? }
>> It'd be good to put a cpu_relax() in the busy loop.
> Will update in v2.
>>
>>> +static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi,
>>> +???????????????????? struct spi_transfer *t,
>>> +???????????????????? bool is_first_of_msg)
>>> +{
>>> +??????? /* toggle cs to active state */
>>> +??????? if (spi->mode & SPI_CS_HIGH)
>>> +??????????? command1 |= QSPI_CS_SW_VAL;
>>> +??????? else
>>> +??????????? command1 &= ~QSPI_CS_SW_VAL;
>>> +??????? tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
>> This is worrying, the client device might be confused if /CS is doing
>> things outside of the standard handling.
>
> Do you mean to honor spi_transfer cs_change flag?
>
> Tegra QSPI is master and is used only with QSPI flash devices. Looking
> at SPI NOR driver, I see QSPI Flash commands are executed with one
> flash command per spi_message and I dont see cs_change flag usage
> w.r.t QSPI flash. So, using SW based CS control for QSPI.
>
> Please correct me if I miss something to understand here.
>
> Also Tegra186 and later QSPI controller supports combined sequence
> mode where command, address, data phases can be combined in a single GO.
>
> This saves some cycles in transfer and for this we need to use SW
> based CS control only.
>
>
>>> +??? of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
>>> +???????????????? &cdata->tx_clk_tap_delay);
>>> +??? of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
>>> +???????????????? &cdata->rx_clk_tap_delay);
>> These properties are not mentioned in the binding document.
> Thanks Mark. Missed them. Will add in v2.
>>
>>> +static int tegra_qspi_setup(struct spi_device *spi)
>>> +{
>>> +??? if (cdata && cdata->tx_clk_tap_delay)
>>> +??????? tx_tap = cdata->tx_clk_tap_delay;
>>> +??? if (cdata && cdata->rx_clk_tap_delay)
>>> +??????? rx_tap = cdata->rx_clk_tap_delay;
>>> +??? tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
>>> +????????????????? QSPI_RX_TAP_DELAY(rx_tap);
>>> +??? tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
>> The setup for one device shouldn't be able to affect the operation of
>> another, already running, device so either these need to be configured
>> as part of the controller probe or these configurations need to be
>> deferred until we're actually doing a transfer.
> We will only have 1 device on QSPI as we only support single chip select.
>>
>>> +??? /*
>>> +???? * Tegra QSPI hardware support dummy bytes transfer based on the
>>> +???? * programmed dummy clock cyles in QSPI register.
>>> +???? * So, get the total dummy bytes from the dummy bytes transfer in
>>> +???? * spi_messages and convert to dummy clock cyles.
>>> +???? */
>>> +??? list_for_each_entry(xfer, &msg->transfers, transfer_list) {
>>> +??????? if (ntransfers == DUMMY_BYTES_XFER &&
>>> +??????????? !(list_is_last(&xfer->transfer_list, &msg->transfers)))
>>> +??????????? dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
>>> +??????? ntransfers++;
>>> +??? }
>> This seems weird, there's some hard coded assumption about particular
>> patterns that the client device is going to send.? What's going on here?
>> I don't really understand what this is trying to do.
>
> QSPI flash needs dummy cycles for data read operation which is
> actually the initial read latency and no. of dummy cycles required are
> vendor specific.
>
> SPI NOR driver gets required dummy cycles based on mode clock cycles
> and wait state clock cycles.
>
> During read operations, spi_nor_spimem_read_data() converts dummy
> cycles to number of dummy bytes.
>
> Tegra QSPI controller supports dummy clock cycles register and when
> programmed QSPI controller sends dummy bytes rather than SW handling
> extra cycles for transferring dummy bytes.
>
> Above equation converts this dummy bytes back to dummy clock cycles to
> program into QSPI register and avoid manual SW transfer of dummy bytes.
>
>>
>>> +static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
>>> +{
>>> +??? struct tegra_qspi_data *tqspi = context_data;
>>> +
>>> +??? tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>>> +??? if (tqspi->cur_direction & DATA_DIR_TX)
>>> +??????? tqspi->tx_status = tqspi->status_reg &
>>> +?????????????????? (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
>>> +
>>> +??? if (tqspi->cur_direction & DATA_DIR_RX)
>>> +??????? tqspi->rx_status = tqspi->status_reg &
>>> +?????????????????? (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
>>> +??? tegra_qspi_mask_clear_irq(tqspi);
>>> +
>>> +??? return IRQ_WAKE_THREAD;
>>> +}
>> It's a bit unclear to me the value we gain from having this handler - if
>> we don't specify a handler genirq will already mask the interrupt until
>> we get to the thread anyway and we could just read the status in the
>> threaded handler.? OTOH it doesn't do any harm, just struck me as a bit
>> odd.
>
> I started QSPI driver by taking SPI driver as data transfer and
> interrupt handling are similar.
>
> So kept this handler for clearing status registers and masking
> interrupts as I did not see anything wrong with this.
>
>>
>>> +??? master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
>>> +??? if (!master) {
>>> +??????? dev_err(&pdev->dev, "master allocation failed\n");
>>> +??????? return -ENOMEM;
>>> +??? }
>> Please switch to using the devm_ version of the API to allocate
>> controller, it makes things much more robust.
> Will update in v2
>>
>>> +??? if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
>>> +???????????????? &master->max_speed_hz))
>>> +??????? master->max_speed_hz = QSPI_MAX_SPEED;
>> The core will do this for you.
>
> Will remove this in v2.
>
> Thanks
>
> Sowjanya
>
On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
> Tegra SoC has a Quad SPI controller starting from Tegra210.
>
> This patch adds support for Tegra210 QSPI controller.
>
> Signed-off-by: Sowjanya Komatineni <[email protected]>
> ---
> drivers/spi/Kconfig | 9 +
> drivers/spi/Makefile | 1 +
> drivers/spi/qspi-tegra.c | 1418 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 1428 insertions(+)
> create mode 100644 drivers/spi/qspi-tegra.c
>
> diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
> index 3fd16b7..1a021e8 100644
> --- a/drivers/spi/Kconfig
> +++ b/drivers/spi/Kconfig
> @@ -844,6 +844,15 @@ config SPI_MXS
> help
> SPI driver for Freescale MXS devices.
>
> +config QSPI_TEGRA
You already discussed this with Mark, but perhaps a better name would be
SPI_TEGRA210_QUAD or something. SPI_TEGRA210 is too generic because
there is a regular SPI controller on Tegra210 as well.
SPI_TEGRA210_QUAD is in line with the likes of SPI_TEGRA20_SFLASH and
SPI_TEGRA20_SLINK.
> + tristate "Nvidia Tegra QSPI Controller"
NVIDIA
> + depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
I don't think we need the ARCH_TEGRA part there because TEGRA20_APB_DMA
already depends on that. Also, there's not strictly a dependency on
TEGRA20_APB_DMA specifically, but rather a dependency on DMA_ENGINE,
right? The DMA channels could be coming from some other driver on some
other SoC generation, such as the Tegra186 and later GPCDMA.
> + depends on RESET_CONTROLLER
> + help
> + QSPI driver for Nvidia Tegra QSPI Controller interface. This
NVIDIA
> + controller is different from the spi controller and is available
SPI
> + on Tegra SoCs starting from Tegra210.
> +
> config SPI_TEGRA114
> tristate "NVIDIA Tegra114 SPI Controller"
> depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
[...]
> diff --git a/drivers/spi/qspi-tegra.c b/drivers/spi/qspi-tegra.c
[...]
> +struct tegra_qspi_client_data {
> + int tx_clk_tap_delay;
> + int rx_clk_tap_delay;
> +};
If this is client data, why are we dealing with this in the controller
driver? Is this perhaps something that could be added to struct
spi_device?
> +
> +struct tegra_qspi_data {
That _data just seems to be 5 extra characters that don't add any value.
> + struct device *dev;
> + struct spi_master *master;
> + /* Lock to protect data accessed by irq */
> + spinlock_t lock;
> +
> + struct clk *clk;
> + struct reset_control *rst;
> + void __iomem *base;
> + phys_addr_t phys;
> + unsigned int irq;
> +
> + u32 cur_speed;
> + unsigned int cur_pos;
> + unsigned int words_per_32bit;
> + unsigned int bytes_per_word;
> + unsigned int curr_dma_words;
> + unsigned int cur_direction;
> +
> + unsigned int cur_rx_pos;
> + unsigned int cur_tx_pos;
> +
> + unsigned int dma_buf_size;
> + unsigned int max_buf_size;
> + bool is_curr_dma_xfer;
> +
> + struct completion rx_dma_complete;
> + struct completion tx_dma_complete;
> +
> + u32 tx_status;
> + u32 rx_status;
> + u32 status_reg;
> + bool is_packed;
> + bool use_dma;
> +
> + u32 command1_reg;
> + u32 dma_control_reg;
> + u32 def_command1_reg;
> + u32 def_command2_reg;
> + u32 spi_cs_timing1;
> + u32 spi_cs_timing2;
> + u8 dummy_cycles;
> +
> + struct completion xfer_completion;
> + struct spi_transfer *curr_xfer;
> +
> + struct dma_chan *rx_dma_chan;
> + u32 *rx_dma_buf;
> + dma_addr_t rx_dma_phys;
> + struct dma_async_tx_descriptor *rx_dma_desc;
> +
> + struct dma_chan *tx_dma_chan;
> + u32 *tx_dma_buf;
> + dma_addr_t tx_dma_phys;
> + struct dma_async_tx_descriptor *tx_dma_desc;
> +};
> +
> +static int tegra_qspi_runtime_suspend(struct device *dev);
> +static int tegra_qspi_runtime_resume(struct device *dev);
Can't we just reorder the code so that these don't have to be forward-
declared?
> +
> +static inline u32 tegra_qspi_readl(struct tegra_qspi_data *tqspi,
> + unsigned long reg)
Nit: I prefer "offset" over "reg" because I think it's slightly more
accurate.
> +{
> + return readl(tqspi->base + reg);
> +}
> +
> +static inline void tegra_qspi_writel(struct tegra_qspi_data *tqspi,
> + u32 val, unsigned long reg)
I also prefer "value" over "val" because "val" could also be short for
"valid". Anyway, I am pedantic that way, so feel free to ignore that.
[...]
> +static unsigned int
> +tegra_qspi_calculate_curr_xfer_param(struct tegra_qspi_data *tqspi,
> + struct spi_transfer *t)
> +{
> + unsigned int remain_len = t->len - tqspi->cur_pos;
> + unsigned int max_word;
> + unsigned int bits_per_word = t->bits_per_word;
> + unsigned int max_len;
> + unsigned int total_fifo_words;
You could list some of these on the same line to make this a bit more
compact. Something I've often seen done that makes this really clean is
to have uninitialized variables on one line and then assignments on
separate lines, then sort lines by length:
unsigned int max_word, max_len, total_fifo_words;
unsigned int remain_len = t->len - tqspi->cur_pos;
unsigned int bits_per_word = t->bits_per_word;
This also applies to some other functions further down.
[...]
> +static void
> +tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
> + struct spi_transfer *t)
> +{
> + /* Make the dma buffer to read by cpu */
This comment seems a bit redundant. dma_sync_single_for_cpu() is a well-
documented function that doesn't need explanation.
> + dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
> + tqspi->dma_buf_size, DMA_TO_DEVICE);
> +
> + if (tqspi->is_packed) {
> + unsigned int len = tqspi->curr_dma_words *
> + tqspi->bytes_per_word;
> +
> + memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
> + tqspi->cur_tx_pos += tqspi->curr_dma_words *
> + tqspi->bytes_per_word;
> + } else {
> + u8 *tx_buf = (u8 *)t->tx_buf + tqspi->cur_tx_pos;
> + unsigned int i;
> + unsigned int count;
> + unsigned int consume;
> + unsigned int write_bytes;
> +
> + consume = tqspi->curr_dma_words * tqspi->bytes_per_word;
> + if (consume > t->len - tqspi->cur_pos)
> + consume = t->len - tqspi->cur_pos;
> + write_bytes = consume;
> + for (count = 0; count < tqspi->curr_dma_words; count++) {
> + u32 x = 0;
> +
> + for (i = 0; consume && (i < tqspi->bytes_per_word);
> + i++, consume--)
> + x |= (u32)(*tx_buf++) << (i * 8);
> + tqspi->tx_dma_buf[count] = x;
> + }
> +
> + tqspi->cur_tx_pos += write_bytes;
> + }
> +
> + /* Make the dma buffer to read by dma */
Same here.
> + dma_sync_single_for_device(tqspi->dev, tqspi->tx_dma_phys,
> + tqspi->dma_buf_size, DMA_TO_DEVICE);
> +}
> +
> +static void
> +tegra_qspi_copy_qspi_rxbuf_to_client_rxbuf(struct tegra_qspi_data *tqspi,
> + struct spi_transfer *t)
> +{
> + /* Make the dma buffer to read by cpu */
And here.
> + dma_sync_single_for_cpu(tqspi->dev, tqspi->rx_dma_phys,
> + tqspi->dma_buf_size, DMA_FROM_DEVICE);
> +
> + if (tqspi->is_packed) {
> + unsigned int len = tqspi->curr_dma_words *
> + tqspi->bytes_per_word;
> +
> + memcpy(t->rx_buf + tqspi->cur_rx_pos, tqspi->rx_dma_buf, len);
> + tqspi->cur_rx_pos += tqspi->curr_dma_words *
> + tqspi->bytes_per_word;
> + } else {
> + unsigned char *rx_buf = t->rx_buf + tqspi->cur_rx_pos;
> + u32 rx_mask = ((u32)1 << t->bits_per_word) - 1;
> + unsigned int i;
> + unsigned int count;
> + unsigned int consume;
> + unsigned int read_bytes;
> +
> + consume = tqspi->curr_dma_words * tqspi->bytes_per_word;
> + if (consume > t->len - tqspi->cur_pos)
> + consume = t->len - tqspi->cur_pos;
> + read_bytes = consume;
> + for (count = 0; count < tqspi->curr_dma_words; count++) {
> + u32 x = tqspi->rx_dma_buf[count] & rx_mask;
> +
> + for (i = 0; consume && (i < tqspi->bytes_per_word);
> + i++, consume--)
> + *rx_buf++ = (x >> (i * 8)) & 0xFF;
> + }
> +
> + tqspi->cur_rx_pos += read_bytes;
> + }
> +
> + /* Make the dma buffer to read by dma */
And here.
> + dma_sync_single_for_device(tqspi->dev, tqspi->rx_dma_phys,
> + tqspi->dma_buf_size, DMA_FROM_DEVICE);
> +}
> +
> +static void tegra_qspi_dma_complete(void *args)
> +{
> + struct completion *dma_complete = args;
> +
> + complete(dma_complete);
> +}
> +
> +static int tegra_qspi_start_tx_dma(struct tegra_qspi_data *tqspi, int len)
> +{
> + reinit_completion(&tqspi->tx_dma_complete);
> + tqspi->tx_dma_desc = dmaengine_prep_slave_single(tqspi->tx_dma_chan,
> + tqspi->tx_dma_phys, len, DMA_MEM_TO_DEV,
> + DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
Looks like most of this driver is wrapped at 80 columns. That rule was
relaxed a bit a while ago and checkpatch now only warns if you exceed
100 columns. There are various places in this driver that could benefit
from longer lines, such as the above.
> + if (!tqspi->tx_dma_desc) {
> + dev_err(tqspi->dev, "Not able to get desc for Tx\n");
Perhaps: "Unable to get TX descriptor\n"?
> + return -EIO;
> + }
> +
> + tqspi->tx_dma_desc->callback = tegra_qspi_dma_complete;
> + tqspi->tx_dma_desc->callback_param = &tqspi->tx_dma_complete;
> +
> + dmaengine_submit(tqspi->tx_dma_desc);
> + dma_async_issue_pending(tqspi->tx_dma_chan);
> + return 0;
> +}
> +
> +static int tegra_qspi_start_rx_dma(struct tegra_qspi_data *tqspi, int len)
> +{
> + reinit_completion(&tqspi->rx_dma_complete);
> + tqspi->rx_dma_desc = dmaengine_prep_slave_single(tqspi->rx_dma_chan,
> + tqspi->rx_dma_phys, len, DMA_DEV_TO_MEM,
> + DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> + if (!tqspi->rx_dma_desc) {
> + dev_err(tqspi->dev, "Not able to get desc for Rx\n");
Same here.
> + return -EIO;
> + }
> +
> + tqspi->rx_dma_desc->callback = tegra_qspi_dma_complete;
> + tqspi->rx_dma_desc->callback_param = &tqspi->rx_dma_complete;
> +
> + dmaengine_submit(tqspi->rx_dma_desc);
> + dma_async_issue_pending(tqspi->rx_dma_chan);
> + return 0;
> +}
> +
> +static int tegra_qspi_flush_fifos(struct tegra_qspi_data *tqspi)
> +{
> + unsigned long timeout = jiffies + HZ;
> + u32 status;
> +
> + status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
> + if ((status & QSPI_FIFO_EMPTY) == QSPI_FIFO_EMPTY)
> + return 0;
> +
> + status |= QSPI_RX_FIFO_FLUSH | QSPI_TX_FIFO_FLUSH;
> + tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
> + while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
> + status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
> + if (time_after(jiffies, timeout)) {
> + dev_err(tqspi->dev,
> + "timeout waiting for fifo flush\n");
FIFO is an abbreviation, so it should be all uppercase in text.
> + return -EIO;
> + }
> +
> + udelay(1);
It looks like this function can be called from both interrupt and
process contexts, where the latter is the more common case and it is
only ever used in interrupt context to clean up on error.
I wonder if it's worth adding an "atomic" parameter to the function and
make this a sleeping loop whenever possible. Also, can this not use one
of the helpers from linux/iopoll.h?
> + }
> +
> + return 0;
> +}
> +
> +static void tegra_qspi_unmask_irq(struct tegra_qspi_data *tqspi)
> +{
> + u32 intr_mask;
> +
> + intr_mask = tegra_qspi_readl(tqspi, QSPI_INTR_MASK);
> + intr_mask &= ~(QSPI_INTR_RDY_MASK | QSPI_INTR_RX_TX_FIFO_ERR);
> + tegra_qspi_writel(tqspi, intr_mask, QSPI_INTR_MASK);
> +}
> +
> +static int tegra_qspi_start_dma_based_transfer(struct tegra_qspi_data *tqspi,
> + struct spi_transfer *t)
> +{
> + u32 val;
> + unsigned int len;
> + int ret = 0;
> + u8 dma_burst;
> + struct dma_slave_config dma_sconfig = {0};
I think checkpatch wants spaces after { and before }.
> +
> + val = QSPI_DMA_BLK_SET(tqspi->curr_dma_words - 1);
> + tegra_qspi_writel(tqspi, val, QSPI_DMA_BLK);
> +
> + tegra_qspi_unmask_irq(tqspi);
> +
> + if (tqspi->is_packed)
> + len = DIV_ROUND_UP(tqspi->curr_dma_words *
> + tqspi->bytes_per_word, 4) * 4;
> + else
> + len = tqspi->curr_dma_words * 4;
> +
> + /* Set attention level based on length of transfer */
> + val = 0;
> + if (len & 0xF) {
Nit: hexadecimal literals are usually lowercase.
> + val |= QSPI_TX_TRIG_1 | QSPI_RX_TRIG_1;
> + dma_burst = 1;
> + } else if (((len) >> 4) & 0x1) {
> + val |= QSPI_TX_TRIG_4 | QSPI_RX_TRIG_4;
> + dma_burst = 4;
> + } else {
> + val |= QSPI_TX_TRIG_8 | QSPI_RX_TRIG_8;
> + dma_burst = 8;
> + }
> +
> + tegra_qspi_writel(tqspi, val, QSPI_DMA_CTL);
> + tqspi->dma_control_reg = val;
> +
> + dma_sconfig.device_fc = true;
> + if (tqspi->cur_direction & DATA_DIR_TX) {
> + dma_sconfig.dst_addr = tqspi->phys + QSPI_TX_FIFO;
> + dma_sconfig.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> + dma_sconfig.dst_maxburst = dma_burst;
> + ret = dmaengine_slave_config(tqspi->tx_dma_chan, &dma_sconfig);
> + if (ret < 0) {
> + dev_err(tqspi->dev,
> + "DMA slave config failed: %d\n", ret);
> + return ret;
> + }
> +
> + tegra_qspi_copy_client_txbuf_to_qspi_txbuf(tqspi, t);
> + ret = tegra_qspi_start_tx_dma(tqspi, len);
> + if (ret < 0) {
> + dev_err(tqspi->dev,
> + "Starting tx dma failed: %d\n", ret);
"TX" and "DMA"
> + return ret;
> + }
> + }
> +
> + if (tqspi->cur_direction & DATA_DIR_RX) {
> + dma_sconfig.src_addr = tqspi->phys + QSPI_RX_FIFO;
> + dma_sconfig.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> + dma_sconfig.src_maxburst = dma_burst;
> + ret = dmaengine_slave_config(tqspi->rx_dma_chan, &dma_sconfig);
> + if (ret < 0) {
> + dev_err(tqspi->dev,
> + "DMA slave config failed: %d\n", ret);
> + return ret;
> + }
> +
> + /* Make the dma buffer to read by dma */
Again, not a useful comment.
> + dma_sync_single_for_device(tqspi->dev, tqspi->rx_dma_phys,
> + tqspi->dma_buf_size,
> + DMA_FROM_DEVICE);
> +
> + ret = tegra_qspi_start_rx_dma(tqspi, len);
> + if (ret < 0) {
> + dev_err(tqspi->dev,
> + "Starting rx dma failed: %d\n", ret);
"RX" and "DMA"
[...]
> +static int tegra_qspi_init_dma_param(struct tegra_qspi_data *tqspi,
> + bool dma_to_memory)
> +{
> + struct dma_chan *dma_chan;
> + u32 *dma_buf;
> + dma_addr_t dma_phys;
> +
> + if (!device_property_present(tqspi->dev, "dmas"))
> + return 0;
If DMA support is optional, then we definitely don't want to depend on
TEGRA20_APB_DMA (or any other specific driver).
> +
> + dma_chan = dma_request_chan(tqspi->dev, dma_to_memory ? "rx" : "tx");
Does this return some specific value when there's no channel for this?
I.e. what happens if the "dmas" property exists in device tree but we
don't have a driver for the provider enabled? Since we have code to use
as fallback, we may want to special case that here and allow the driver
to continue.
> + if (IS_ERR(dma_chan))
> + return dev_err_probe(tqspi->dev, PTR_ERR(dma_chan),
> + "Dma channel is not available\n");
"DMA"
[...]
> +static int tegra_qspi_start_transfer_one(struct spi_device *spi,
> + struct spi_transfer *t, u32 command1)
> +{
[...]
> + command1 &= ~QSPI_INTERFACE_WIDTH_MASK;
> + if (bus_width == SPI_NBITS_QUAD)
> + command1 |= QSPI_INTERFACE_WIDTH_QUAD;
> + else if (bus_width == SPI_NBITS_DUAL)
> + command1 |= QSPI_INTERFACE_WIDTH_DUAL;
> + else
> + command1 |= QSPI_INTERFACE_WIDTH_SINGLE;
> + tqspi->command1_reg = command1;
This (and in some other places in the driver) could use a couple of
blank lines to make this less cluttered.
[...]
> +static void tegra_qspi_cleanup(struct spi_device *spi)
> +{
> + struct tegra_qspi_client_data *cdata = spi->controller_data;
> +
> + spi->controller_data = NULL;
> + if (spi->dev.of_node)
Can this ever fail? Do we support SPI device instantiation from
somewhere else than DT?
> + kfree(cdata);
> +}
> +
> +static int tegra_qspi_setup(struct spi_device *spi)
> +{
> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
> + struct tegra_qspi_client_data *cdata = spi->controller_data;
> + u32 tx_tap = 0, rx_tap = 0;
> + u32 val;
> + unsigned long flags;
> + int ret;
> +
> + dev_dbg(&spi->dev, "setup %d bpw, %scpol, %scpha, %dHz\n",
> + spi->bits_per_word,
> + spi->mode & SPI_CPOL ? "" : "~",
> + spi->mode & SPI_CPHA ? "" : "~",
> + spi->max_speed_hz);
> +
> + if (!cdata) {
> + cdata = tegra_qspi_parse_cdata_dt(spi);
Oh... I see that this is actually parsed from the SPI device node, so
perhaps this is okay to do.
[...]
> +static int tegra_qspi_transfer_one_message(struct spi_master *master,
> + struct spi_message *msg)
> +{
[...]
> + is_first_msg = false;
> + ret = wait_for_completion_timeout(&tqspi->xfer_completion,
> + QSPI_DMA_TIMEOUT);
> + if (WARN_ON(ret == 0)) {
> + dev_err(tqspi->dev,
> + "qspi transfer timeout: %d\n", ret);
"QSPI", or alternatively leave this out completely because it should be
obvious that the transfer that failed is a QSPI transfer just from the
device name.
> + if (tqspi->is_curr_dma_xfer &&
> + (tqspi->cur_direction & DATA_DIR_TX))
> + dmaengine_terminate_all(tqspi->tx_dma_chan);
> + if (tqspi->is_curr_dma_xfer &&
> + (tqspi->cur_direction & DATA_DIR_RX))
> + dmaengine_terminate_all(tqspi->rx_dma_chan);
> + ret = -EIO;
> + tegra_qspi_dump_regs(tqspi);
> + tegra_qspi_flush_fifos(tqspi);
> + reset_control_assert(tqspi->rst);
> + udelay(2);
> + reset_control_deassert(tqspi->rst);
> + goto exit;
> + }
> +
> + if (tqspi->tx_status || tqspi->rx_status) {
> + ret = -EIO;
> + dev_err(tqspi->dev, "error in transfer: %d\n", ret);
Would it make sense to output tx_status and/or rx_status in this message
rather than -EIO unconditionally?
[...]
> +static irqreturn_t handle_cpu_based_xfer(struct tegra_qspi_data *tqspi)
> +{
> + struct spi_transfer *t = tqspi->curr_xfer;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&tqspi->lock, flags);
> + if (tqspi->tx_status || tqspi->rx_status) {
> + dev_err(tqspi->dev, "CpuXfer ERROR bit set 0x%x\n",
> + tqspi->status_reg);
> + dev_err(tqspi->dev, "CpuXfer 0x%08x:0x%08x\n",
> + tqspi->command1_reg, tqspi->dma_control_reg);
> + tegra_qspi_dump_regs(tqspi);
> + tegra_qspi_flush_fifos(tqspi);
> + complete(&tqspi->xfer_completion);
> + spin_unlock_irqrestore(&tqspi->lock, flags);
> + reset_control_assert(tqspi->rst);
> + udelay(2);
> + reset_control_deassert(tqspi->rst);
> + return IRQ_HANDLED;
> + }
Can this error handling be somehow unified with the DMA error handling?
Having markers such as "CpuXfer" in error messages is not really helpful
to users. Maybe have the driver output an INFO message or so during
probe when it falls back to PIO mode, that would be a good enough
indicator to someone debugging some issue that PIO mode is being used.
And then you can just treat transfer failures more uniformly.
Thierry
On Wed, Dec 02, 2020 at 05:27:21PM +0000, Mark Brown wrote:
> On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
> > Tegra SoC has a Quad SPI controller starting from Tegra210.
> >
> > This patch adds support for Tegra210 QSPI controller.
>
> This looks pretty clean but I've got a few questions below about how
> this integrates with the frameworks as well as some more minor issues.
>
> > +config QSPI_TEGRA
> > + tristate "Nvidia Tegra QSPI Controller"
>
> Everything else in this file is SPI_, even the qspi controllers.
>
> > +++ b/drivers/spi/qspi-tegra.c
> > @@ -0,0 +1,1418 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 NVIDIA CORPORATION. All rights reserved.
> > + */
>
> Please make the entire comment a C++ one. It also appears that the "All
> rights reserved" here conflicts with the GPL-2.0-only SPDX statement...
My understanding is that this phrase is pretty much irrelevant these
days. Furthermore, I don't think this is to be interpreted as a claim to
any rights other than the copyright, so it's mostly equivalent to the
"Copyright (C)" on that same line. Any license terms associated with the
file do still apply regardless.
That said, I'm not a lawyer, so don't take this as legal advice. FWIW,
there's something on the order of 8000 occurrences of that phrase in the
Linux kernel sources, so I think with or without the phrase we should be
okay.
Thierry
On Wed, Dec 02, 2020 at 11:17:18AM -0800, Sowjanya Komatineni wrote:
> On 12/2/20 9:27 AM, Mark Brown wrote:
> > On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
[...]
> > > +static int tegra_qspi_setup(struct spi_device *spi)
> > > +{
> > > + if (cdata && cdata->tx_clk_tap_delay)
> > > + tx_tap = cdata->tx_clk_tap_delay;
> > > + if (cdata && cdata->rx_clk_tap_delay)
> > > + rx_tap = cdata->rx_clk_tap_delay;
> > > + tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
> > > + QSPI_RX_TAP_DELAY(rx_tap);
> > > + tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
> > The setup for one device shouldn't be able to affect the operation of
> > another, already running, device so either these need to be configured
> > as part of the controller probe or these configurations need to be
> > deferred until we're actually doing a transfer.
> We will only have 1 device on QSPI as we only support single chip select.
Even so we could make the driver operate as if there were multiple
devices. This has the advantage of setting a better example for someone
who might be reading this code as reference, and it might come in handy
if for whatever reason we ever end up with a QSPI controller that does
support multiple chip selects.
If that's overly complicated, maybe a compromise would be to document
very explicitly that this only works because Tegra QSPI supports a
single chip select?
Thierry
On 01/12/2020 21:12, Sowjanya Komatineni wrote:
> Tegra SoC has a Quad SPI controller starting from Tegra210.
>
> This patch adds support for Tegra210 QSPI controller.
>
> Signed-off-by: Sowjanya Komatineni <[email protected]>
> ---
> drivers/spi/Kconfig | 9 +
> drivers/spi/Makefile | 1 +
> drivers/spi/qspi-tegra.c | 1418 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 1428 insertions(+)
> create mode 100644 drivers/spi/qspi-tegra.c
>
> diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
> index 3fd16b7..1a021e8 100644
> --- a/drivers/spi/Kconfig
> +++ b/drivers/spi/Kconfig
> @@ -844,6 +844,15 @@ config SPI_MXS
> help
> SPI driver for Freescale MXS devices.
>
> +config QSPI_TEGRA
> + tristate "Nvidia Tegra QSPI Controller"
> + depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
I assume that the dependency on the APBDMA is for Tegra210. Does it work
on Tegra210 without the DMA? I am wondering if this is a dependency?
> +static void tegra_qspi_deinit_dma_param(struct tegra_qspi_data *tqspi,
> + bool dma_to_memory)
> +{
> + u32 *dma_buf;
> + dma_addr_t dma_phys;
> + struct dma_chan *dma_chan;
> +
> + if (dma_to_memory) {
> + dma_buf = tqspi->rx_dma_buf;
> + dma_chan = tqspi->rx_dma_chan;
> + dma_phys = tqspi->rx_dma_phys;
> + tqspi->rx_dma_chan = NULL;
> + tqspi->rx_dma_buf = NULL;
> + } else {
> + dma_buf = tqspi->tx_dma_buf;
> + dma_chan = tqspi->tx_dma_chan;
> + dma_phys = tqspi->tx_dma_phys;
> + tqspi->tx_dma_buf = NULL;
> + tqspi->tx_dma_chan = NULL;
> + }
> + if (!dma_chan)
> + return;
The above seemed odd to me at first, but I guess if a device does not
support DMA yet, then this will be NULL. However, would it be clearer to
just ...
if (!tqspi->use_dma)
return;
You could also do this right at the beginning of the function.
> +static struct tegra_qspi_client_data
> + *tegra_qspi_parse_cdata_dt(struct spi_device *spi)
> +{
> + struct tegra_qspi_client_data *cdata;
> + struct device_node *slave_np;
> +
> + slave_np = spi->dev.of_node;
> + if (!slave_np) {
This test should not be necessary as we only support device-tree.
> + dev_dbg(&spi->dev, "device node not found\n");
> + return NULL;
> + }
> +
> + cdata = kzalloc(sizeof(*cdata), GFP_KERNEL);
> + if (!cdata)
> + return NULL;
> +
> + of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
> + &cdata->tx_clk_tap_delay);
> + of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
> + &cdata->rx_clk_tap_delay);
> + return cdata;
> +}
> +
> +static void tegra_qspi_cleanup(struct spi_device *spi)
> +{
> + struct tegra_qspi_client_data *cdata = spi->controller_data;
> +
> + spi->controller_data = NULL;
> + if (spi->dev.of_node)
> + kfree(cdata);
> +}
> +
> +static int tegra_qspi_setup(struct spi_device *spi)
> +{
> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
> + struct tegra_qspi_client_data *cdata = spi->controller_data;
> + u32 tx_tap = 0, rx_tap = 0;
> + u32 val;
> + unsigned long flags;
> + int ret;
> +
> + dev_dbg(&spi->dev, "setup %d bpw, %scpol, %scpha, %dHz\n",
> + spi->bits_per_word,
> + spi->mode & SPI_CPOL ? "" : "~",
> + spi->mode & SPI_CPHA ? "" : "~",
> + spi->max_speed_hz);
> +
> + if (!cdata) {
> + cdata = tegra_qspi_parse_cdata_dt(spi);
> + spi->controller_data = cdata;
> + }
> +
> + ret = pm_runtime_get_sync(tqspi->dev);
> + if (ret < 0) {
> + dev_err(tqspi->dev, "runtime resume failed: %d\n", ret);
> + if (cdata)
> + tegra_qspi_cleanup(spi);
> + return ret;
> + }
Does it simplify the code to do the pm_runtime_get_sync() before the
parsing of the cdata?
> +static int tegra_qspi_probe(struct platform_device *pdev)
> +{
> + struct spi_master *master;
> + struct tegra_qspi_data *tqspi;
> + struct resource *r;
> + int ret, qspi_irq;
> + int bus_num;
> +
> + master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
> + if (!master) {
> + dev_err(&pdev->dev, "master allocation failed\n");
> + return -ENOMEM;
> + }
> +
> + platform_set_drvdata(pdev, master);
> + tqspi = spi_master_get_devdata(master);
> +
> + if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
> + &master->max_speed_hz))
> + master->max_speed_hz = QSPI_MAX_SPEED;
> +
> + /* the spi->mode bits understood by this driver: */
> + master->mode_bits = SPI_MODE_0 | SPI_MODE_3 | SPI_CS_HIGH |
> + SPI_TX_DUAL | SPI_RX_DUAL | SPI_TX_QUAD |
> + SPI_RX_QUAD;
> + master->bits_per_word_mask = SPI_BPW_MASK(32) | SPI_BPW_MASK(16) |
> + SPI_BPW_MASK(8);
> + master->setup = tegra_qspi_setup;
> + master->cleanup = tegra_qspi_cleanup;
> + master->transfer_one_message = tegra_qspi_transfer_one_message;
> + master->num_chipselect = 1;
> + master->auto_runtime_pm = true;
> + bus_num = of_alias_get_id(pdev->dev.of_node, "spi");
> + if (bus_num >= 0)
> + master->bus_num = bus_num;
> +
> + tqspi->master = master;
> + tqspi->dev = &pdev->dev;
> + spin_lock_init(&tqspi->lock);
> +
> + r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + tqspi->base = devm_ioremap_resource(&pdev->dev, r);
> + if (IS_ERR(tqspi->base)) {
> + ret = PTR_ERR(tqspi->base);
> + goto exit_free_master;
> + }
> +
> + tqspi->phys = r->start;
> + qspi_irq = platform_get_irq(pdev, 0);
> + tqspi->irq = qspi_irq;
> +
> + tqspi->clk = devm_clk_get(&pdev->dev, "qspi");
> + if (IS_ERR(tqspi->clk)) {
> + ret = PTR_ERR(tqspi->clk);
> + dev_err(&pdev->dev, "failed to get clock: %d\n", ret);
> + goto exit_free_master;
> + }
> +
> + tqspi->rst = devm_reset_control_get_exclusive(&pdev->dev, "qspi");
> + if (IS_ERR(tqspi->rst)) {
> + ret = PTR_ERR(tqspi->rst);
> + dev_err(&pdev->dev, "failed to get reset control: %d\n", ret);
> + goto exit_free_master;
> + }
> +
> + tqspi->max_buf_size = QSPI_FIFO_DEPTH << 2;
> + tqspi->dma_buf_size = DEFAULT_QSPI_DMA_BUF_LEN;
> +
> + ret = tegra_qspi_init_dma_param(tqspi, true);
> + if (ret < 0)
> + goto exit_free_master;
> + ret = tegra_qspi_init_dma_param(tqspi, false);
> + if (ret < 0)
> + goto exit_rx_dma_free;
I would be tempted to combine the init for the TX and RX into a single
function. Then we can have a single function to deinit.
> +
> + if (tqspi->use_dma)
> + tqspi->max_buf_size = tqspi->dma_buf_size;
> +
> + init_completion(&tqspi->tx_dma_complete);
> + init_completion(&tqspi->rx_dma_complete);
> +
Unnecessary blank line.
> + init_completion(&tqspi->xfer_completion);
> +
> + pm_runtime_enable(&pdev->dev);
> + if (!pm_runtime_enabled(&pdev->dev)) {
RPM is always enabled for Tegra and so if this fails we should just fail.
> + ret = tegra_qspi_runtime_resume(&pdev->dev);
> + if (ret)
> + goto exit_pm_disable;
> + }
> +
> + ret = pm_runtime_get_sync(&pdev->dev);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "runtime resume failed: %d\n", ret);
> + pm_runtime_put_noidle(&pdev->dev)
You can use pm_runtime_resume_and_get() now and then you don't need to
call pm_runtime_put_noidle() on failure.
> + goto exit_pm_disable;
> + }
> +
> + reset_control_assert(tqspi->rst);
> + udelay(2);
> + reset_control_deassert(tqspi->rst);
> + tqspi->def_command1_reg = QSPI_M_S | QSPI_CS_SW_HW | QSPI_CS_SW_VAL;
> + tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
> + tqspi->spi_cs_timing1 = tegra_qspi_readl(tqspi, QSPI_CS_TIMING1);
> + tqspi->spi_cs_timing2 = tegra_qspi_readl(tqspi, QSPI_CS_TIMING2);
> + tqspi->def_command2_reg = tegra_qspi_readl(tqspi, QSPI_COMMAND2);
> +
> + pm_runtime_put(&pdev->dev);
> +
> + ret = request_threaded_irq(tqspi->irq, tegra_qspi_isr,
> + tegra_qspi_isr_thread, IRQF_ONESHOT,
> + dev_name(&pdev->dev), tqspi);
> + if (ret < 0) {
> + dev_err(&pdev->dev,
> + "failed to request IRQ#%u: %d\n", tqspi->irq, ret);
> + goto exit_pm_disable;
> + }
> +
> + master->dev.of_node = pdev->dev.of_node;
> + ret = devm_spi_register_master(&pdev->dev, master);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "failed to register master: %d\n", ret);
> + goto exit_free_irq;
> + }
> + return ret;
return 0
> +static int tegra_qspi_runtime_resume(struct device *dev)
> +{
> + struct spi_master *master = dev_get_drvdata(dev);
> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
> + int ret;
> +
> + ret = clk_prepare_enable(tqspi->clk);
> + if (ret < 0) {
> + dev_err(tqspi->dev, "clk_prepare failed: %d\n", ret);
> + return ret;
> + }
> + return 0;
Always just 'return ret' here.
Cheers
Jon
--
nvpublic
On Thu, Dec 03, 2020 at 04:22:54PM -0800, Sowjanya Komatineni wrote:
> On 12/2/20 11:17 AM, Sowjanya Komatineni wrote:
> > > It seems weird that this device needs us to do a memcpy() to do DMA,
> > > most devices are able to DMA directly from the buffers provided by the
> > > SPI API (and let the SPI core sync things).? What is going on here?
> > For transfers of size more than max DMA transfer limit, data transfer
> > happens in multiple iterations with each iteration transferring up to
> > max DMA transfer limit.
> > So using separate dma buffers and on every iteration copying them to SPI
> > core provided tx/rx buffers.
I don't understand this - there's no restriction on where DMA transfers
can be done from within a DMA mapped region, the driver can do multiple
transfers from different chunks of the source buffer without having to
copy anything. That's a very common scenario.
> Also unpack mode needs to manually put the bytes together from read data to
> SPI core rx buffer.
Could you be more explicit here, I don't know what "unpack mode" is?
> > > This is worrying, the client device might be confused if /CS is doing
> > > things outside of the standard handling.
> > Do you mean to honor spi_transfer cs_change flag?
At least, yes - more generally just if there's any feature to with chip
select then the driver will need to open code it. The driver should at
least be double checking that what it's doing matches what it was told
to do, though just letting this be handled by the generic code if
there's no limitation on the hardware tends to be easier all round.
> > Tegra QSPI is master and is used only with QSPI flash devices. Looking
> > at SPI NOR driver, I see QSPI Flash commands are executed with one flash
> > command per spi_message and I dont see cs_change flag usage w.r.t QSPI
> > flash. So, using SW based CS control for QSPI.
> > Please correct me if I miss something to understand here.
Someone might build a system that does something different, they may see
a spare SPI controller and decide they can use it for something else or
there may be some future change with the flash code which does something
different.
> > > > +??? tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
> > > The setup for one device shouldn't be able to affect the operation of
> > > another, already running, device so either these need to be configured
> > > as part of the controller probe or these configurations need to be
> > > deferred until we're actually doing a transfer.
> > We will only have 1 device on QSPI as we only support single chip select.
It's quite common for people to do things like add additional devices
with GPIO chip selects.
> > > > +??? /*
> > > > +???? * Tegra QSPI hardware support dummy bytes transfer based on the
> > > > +???? * programmed dummy clock cyles in QSPI register.
> > > > +???? * So, get the total dummy bytes from the dummy bytes transfer in
> > > > +???? * spi_messages and convert to dummy clock cyles.
> > > > +???? */
> > > > +??? list_for_each_entry(xfer, &msg->transfers, transfer_list) {
> > > > +??????? if (ntransfers == DUMMY_BYTES_XFER &&
> > > > +??????????? !(list_is_last(&xfer->transfer_list, &msg->transfers)))
> > > > +??????????? dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
> > > > +??????? ntransfers++;
> > > > +??? }
> > > This seems weird, there's some hard coded assumption about particular
> > > patterns that the client device is going to send.? What's going on here?
> > > I don't really understand what this is trying to do.
> > QSPI flash needs dummy cycles for data read operation which is actually
> > the initial read latency and no. of dummy cycles required are vendor
> > specific.
> > SPI NOR driver gets required dummy cycles based on mode clock cycles and
> > wait state clock cycles.
> > During read operations, spi_nor_spimem_read_data() converts dummy cycles
> > to number of dummy bytes.
> > Tegra QSPI controller supports dummy clock cycles register and when
> > programmed QSPI controller sends dummy bytes rather than SW handling
> > extra cycles for transferring dummy bytes.
> > Above equation converts this dummy bytes back to dummy clock cycles to
> > program into QSPI register and avoid manual SW transfer of dummy bytes.
This is not a good idea, attempting to reverse engineer the message and
guess at the contents isn't going to be robust and if it's useful it
will if nothing else lead to a bunch of duplicated code in drivers as
every device that has this feature will need to reimplment it. Instead
we should extend the framework so there's explicit support for
specifying transfers that are padding bytes, then there's no guesswork
that can go wrong and no duplicated code between drivers. A flag in the
transfer struct might work?
On 12/4/20 10:52 AM, Mark Brown wrote:
> On Thu, Dec 03, 2020 at 04:22:54PM -0800, Sowjanya Komatineni wrote:
>> On 12/2/20 11:17 AM, Sowjanya Komatineni wrote:
>>>> It seems weird that this device needs us to do a memcpy() to do DMA,
>>>> most devices are able to DMA directly from the buffers provided by the
>>>> SPI API (and let the SPI core sync things).? What is going on here?
>>> For transfers of size more than max DMA transfer limit, data transfer
>>> happens in multiple iterations with each iteration transferring up to
>>> max DMA transfer limit.
>>> So using separate dma buffers and on every iteration copying them to SPI
>>> core provided tx/rx buffers.
> I don't understand this - there's no restriction on where DMA transfers
> can be done from within a DMA mapped region, the driver can do multiple
> transfers from different chunks of the source buffer without having to
> copy anything. That's a very common scenario.
>
>> Also unpack mode needs to manually put the bytes together from read data to
>> SPI core rx buffer.
> Could you be more explicit here, I don't know what "unpack mode" is?
Tegra SPI/QSPI controller support packed mode and unpacked mode based on
bits per word in a transfer.
Packed Mode: When enabled, all 32-bits of data in FIFO contains valid
data packets of 8-bit/16-bit/32-bit length.
Non packed mode: For transfers like 24-bit data for example we disable
packed mode and only 24-bits of FIFO data are valid and other bits are 0's.
So during TX for FIFO filling and during receive when FIFO data is read,
SW need to skip invalid bits and should align order from/to SPI core
tx/rx buffers.
>
>>>> This is worrying, the client device might be confused if /CS is doing
>>>> things outside of the standard handling.
>>> Do you mean to honor spi_transfer cs_change flag?
> At least, yes - more generally just if there's any feature to with chip
> select then the driver will need to open code it. The driver should at
> least be double checking that what it's doing matches what it was told
> to do, though just letting this be handled by the generic code if
> there's no limitation on the hardware tends to be easier all round.
OK Will honor cs_change flag at end of transfer and will program CS
state based on that.
>
>>> Tegra QSPI is master and is used only with QSPI flash devices. Looking
>>> at SPI NOR driver, I see QSPI Flash commands are executed with one flash
>>> command per spi_message and I dont see cs_change flag usage w.r.t QSPI
>>> flash. So, using SW based CS control for QSPI.
>>> Please correct me if I miss something to understand here.
> Someone might build a system that does something different, they may see
> a spare SPI controller and decide they can use it for something else or
> there may be some future change with the flash code which does something
> different.
>
>>>>> +??? tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
>>>> The setup for one device shouldn't be able to affect the operation of
>>>> another, already running, device so either these need to be configured
>>>> as part of the controller probe or these configurations need to be
>>>> deferred until we're actually doing a transfer.
>>> We will only have 1 device on QSPI as we only support single chip select.
> It's quite common for people to do things like add additional devices
> with GPIO chip selects.
Will move tap delay programming to happen during spi transfer..
>
>>>>> +??? /*
>>>>> +???? * Tegra QSPI hardware support dummy bytes transfer based on the
>>>>> +???? * programmed dummy clock cyles in QSPI register.
>>>>> +???? * So, get the total dummy bytes from the dummy bytes transfer in
>>>>> +???? * spi_messages and convert to dummy clock cyles.
>>>>> +???? */
>>>>> +??? list_for_each_entry(xfer, &msg->transfers, transfer_list) {
>>>>> +??????? if (ntransfers == DUMMY_BYTES_XFER &&
>>>>> +??????????? !(list_is_last(&xfer->transfer_list, &msg->transfers)))
>>>>> +??????????? dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
>>>>> +??????? ntransfers++;
>>>>> +??? }
>>>> This seems weird, there's some hard coded assumption about particular
>>>> patterns that the client device is going to send.? What's going on here?
>>>> I don't really understand what this is trying to do.
>>> QSPI flash needs dummy cycles for data read operation which is actually
>>> the initial read latency and no. of dummy cycles required are vendor
>>> specific.
>>> SPI NOR driver gets required dummy cycles based on mode clock cycles and
>>> wait state clock cycles.
>>> During read operations, spi_nor_spimem_read_data() converts dummy cycles
>>> to number of dummy bytes.
>>> Tegra QSPI controller supports dummy clock cycles register and when
>>> programmed QSPI controller sends dummy bytes rather than SW handling
>>> extra cycles for transferring dummy bytes.
>>> Above equation converts this dummy bytes back to dummy clock cycles to
>>> program into QSPI register and avoid manual SW transfer of dummy bytes.
> This is not a good idea, attempting to reverse engineer the message and
> guess at the contents isn't going to be robust and if it's useful it
> will if nothing else lead to a bunch of duplicated code in drivers as
> every device that has this feature will need to reimplment it. Instead
> we should extend the framework so there's explicit support for
> specifying transfers that are padding bytes, then there's no guesswork
> that can go wrong and no duplicated code between drivers. A flag in the
> transfer struct might work?
As per QSPI spec, Dummy bytes for initial read latency are always FF's.
So its not like guessing the contents.
Tegra QSPI controller HW supports transferring dummy bytes (sending FF's
after address) based on dummy clock cycles programmed.
To allow Tegra QSPI HW transfer dummy bytes directly, controller driver
need number of dummy bytes / actual dummy clock cycles which core driver
gets from flash sfdp read data.
So, we can add flag to transfer and based on this flag if controller HW
supports then we can ignore filling dummy bytes in spi_mem_exec_op but
controller driver still needs dummy_cycles value. So probably along with
flag, do you agree to have dummy_cycle as part of transfer struct which
can be set to nor->read_dummy value?
Thanks
Sowjanya
Thanks Jon. Will address below suggestions in v2.
Regards,
Sowjanya
On 12/4/20 6:41 AM, Jon Hunter wrote:
> On 01/12/2020 21:12, Sowjanya Komatineni wrote:
>> Tegra SoC has a Quad SPI controller starting from Tegra210.
>>
>> This patch adds support for Tegra210 QSPI controller.
>>
>> Signed-off-by: Sowjanya Komatineni <[email protected]>
>> ---
>> drivers/spi/Kconfig | 9 +
>> drivers/spi/Makefile | 1 +
>> drivers/spi/qspi-tegra.c | 1418 ++++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 1428 insertions(+)
>> create mode 100644 drivers/spi/qspi-tegra.c
>>
>> diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
>> index 3fd16b7..1a021e8 100644
>> --- a/drivers/spi/Kconfig
>> +++ b/drivers/spi/Kconfig
>> @@ -844,6 +844,15 @@ config SPI_MXS
>> help
>> SPI driver for Freescale MXS devices.
>>
>> +config QSPI_TEGRA
>> + tristate "Nvidia Tegra QSPI Controller"
>> + depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
> I assume that the dependency on the APBDMA is for Tegra210. Does it work
> on Tegra210 without the DMA? I am wondering if this is a dependency?
>
>> +static void tegra_qspi_deinit_dma_param(struct tegra_qspi_data *tqspi,
>> + bool dma_to_memory)
>> +{
>> + u32 *dma_buf;
>> + dma_addr_t dma_phys;
>> + struct dma_chan *dma_chan;
>> +
>> + if (dma_to_memory) {
>> + dma_buf = tqspi->rx_dma_buf;
>> + dma_chan = tqspi->rx_dma_chan;
>> + dma_phys = tqspi->rx_dma_phys;
>> + tqspi->rx_dma_chan = NULL;
>> + tqspi->rx_dma_buf = NULL;
>> + } else {
>> + dma_buf = tqspi->tx_dma_buf;
>> + dma_chan = tqspi->tx_dma_chan;
>> + dma_phys = tqspi->tx_dma_phys;
>> + tqspi->tx_dma_buf = NULL;
>> + tqspi->tx_dma_chan = NULL;
>> + }
>> + if (!dma_chan)
>> + return;
> The above seemed odd to me at first, but I guess if a device does not
> support DMA yet, then this will be NULL. However, would it be clearer to
> just ...
>
> if (!tqspi->use_dma)
> return;
>
> You could also do this right at the beginning of the function.
>
>> +static struct tegra_qspi_client_data
>> + *tegra_qspi_parse_cdata_dt(struct spi_device *spi)
>> +{
>> + struct tegra_qspi_client_data *cdata;
>> + struct device_node *slave_np;
>> +
>> + slave_np = spi->dev.of_node;
>> + if (!slave_np) {
> This test should not be necessary as we only support device-tree.
>
>> + dev_dbg(&spi->dev, "device node not found\n");
>> + return NULL;
>> + }
>> +
>> + cdata = kzalloc(sizeof(*cdata), GFP_KERNEL);
>> + if (!cdata)
>> + return NULL;
>> +
>> + of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
>> + &cdata->tx_clk_tap_delay);
>> + of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
>> + &cdata->rx_clk_tap_delay);
>> + return cdata;
>> +}
>> +
>> +static void tegra_qspi_cleanup(struct spi_device *spi)
>> +{
>> + struct tegra_qspi_client_data *cdata = spi->controller_data;
>> +
>> + spi->controller_data = NULL;
>> + if (spi->dev.of_node)
>> + kfree(cdata);
>> +}
>> +
>> +static int tegra_qspi_setup(struct spi_device *spi)
>> +{
>> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
>> + struct tegra_qspi_client_data *cdata = spi->controller_data;
>> + u32 tx_tap = 0, rx_tap = 0;
>> + u32 val;
>> + unsigned long flags;
>> + int ret;
>> +
>> + dev_dbg(&spi->dev, "setup %d bpw, %scpol, %scpha, %dHz\n",
>> + spi->bits_per_word,
>> + spi->mode & SPI_CPOL ? "" : "~",
>> + spi->mode & SPI_CPHA ? "" : "~",
>> + spi->max_speed_hz);
>> +
>> + if (!cdata) {
>> + cdata = tegra_qspi_parse_cdata_dt(spi);
>> + spi->controller_data = cdata;
>> + }
>> +
>> + ret = pm_runtime_get_sync(tqspi->dev);
>> + if (ret < 0) {
>> + dev_err(tqspi->dev, "runtime resume failed: %d\n", ret);
>> + if (cdata)
>> + tegra_qspi_cleanup(spi);
>> + return ret;
>> + }
> Does it simplify the code to do the pm_runtime_get_sync() before the
> parsing of the cdata?
>
>> +static int tegra_qspi_probe(struct platform_device *pdev)
>> +{
>> + struct spi_master *master;
>> + struct tegra_qspi_data *tqspi;
>> + struct resource *r;
>> + int ret, qspi_irq;
>> + int bus_num;
>> +
>> + master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
>> + if (!master) {
>> + dev_err(&pdev->dev, "master allocation failed\n");
>> + return -ENOMEM;
>> + }
>> +
>> + platform_set_drvdata(pdev, master);
>> + tqspi = spi_master_get_devdata(master);
>> +
>> + if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
>> + &master->max_speed_hz))
>> + master->max_speed_hz = QSPI_MAX_SPEED;
>> +
>> + /* the spi->mode bits understood by this driver: */
>> + master->mode_bits = SPI_MODE_0 | SPI_MODE_3 | SPI_CS_HIGH |
>> + SPI_TX_DUAL | SPI_RX_DUAL | SPI_TX_QUAD |
>> + SPI_RX_QUAD;
>> + master->bits_per_word_mask = SPI_BPW_MASK(32) | SPI_BPW_MASK(16) |
>> + SPI_BPW_MASK(8);
>> + master->setup = tegra_qspi_setup;
>> + master->cleanup = tegra_qspi_cleanup;
>> + master->transfer_one_message = tegra_qspi_transfer_one_message;
>> + master->num_chipselect = 1;
>> + master->auto_runtime_pm = true;
>> + bus_num = of_alias_get_id(pdev->dev.of_node, "spi");
>> + if (bus_num >= 0)
>> + master->bus_num = bus_num;
>> +
>> + tqspi->master = master;
>> + tqspi->dev = &pdev->dev;
>> + spin_lock_init(&tqspi->lock);
>> +
>> + r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> + tqspi->base = devm_ioremap_resource(&pdev->dev, r);
>> + if (IS_ERR(tqspi->base)) {
>> + ret = PTR_ERR(tqspi->base);
>> + goto exit_free_master;
>> + }
>> +
>> + tqspi->phys = r->start;
>> + qspi_irq = platform_get_irq(pdev, 0);
>> + tqspi->irq = qspi_irq;
>> +
>> + tqspi->clk = devm_clk_get(&pdev->dev, "qspi");
>> + if (IS_ERR(tqspi->clk)) {
>> + ret = PTR_ERR(tqspi->clk);
>> + dev_err(&pdev->dev, "failed to get clock: %d\n", ret);
>> + goto exit_free_master;
>> + }
>> +
>> + tqspi->rst = devm_reset_control_get_exclusive(&pdev->dev, "qspi");
>> + if (IS_ERR(tqspi->rst)) {
>> + ret = PTR_ERR(tqspi->rst);
>> + dev_err(&pdev->dev, "failed to get reset control: %d\n", ret);
>> + goto exit_free_master;
>> + }
>> +
>> + tqspi->max_buf_size = QSPI_FIFO_DEPTH << 2;
>> + tqspi->dma_buf_size = DEFAULT_QSPI_DMA_BUF_LEN;
>> +
>> + ret = tegra_qspi_init_dma_param(tqspi, true);
>> + if (ret < 0)
>> + goto exit_free_master;
>> + ret = tegra_qspi_init_dma_param(tqspi, false);
>> + if (ret < 0)
>> + goto exit_rx_dma_free;
> I would be tempted to combine the init for the TX and RX into a single
> function. Then we can have a single function to deinit.
>
>> +
>> + if (tqspi->use_dma)
>> + tqspi->max_buf_size = tqspi->dma_buf_size;
>> +
>> + init_completion(&tqspi->tx_dma_complete);
>> + init_completion(&tqspi->rx_dma_complete);
>> +
> Unnecessary blank line.
>
>> + init_completion(&tqspi->xfer_completion);
>> +
>> + pm_runtime_enable(&pdev->dev);
>> + if (!pm_runtime_enabled(&pdev->dev)) {
> RPM is always enabled for Tegra and so if this fails we should just fail.
>
>> + ret = tegra_qspi_runtime_resume(&pdev->dev);
>> + if (ret)
>> + goto exit_pm_disable;
>> + }
>> +
>> + ret = pm_runtime_get_sync(&pdev->dev);
>> + if (ret < 0) {
>> + dev_err(&pdev->dev, "runtime resume failed: %d\n", ret);
>> + pm_runtime_put_noidle(&pdev->dev)
> You can use pm_runtime_resume_and_get() now and then you don't need to
> call pm_runtime_put_noidle() on failure.
>
>> + goto exit_pm_disable;
>> + }
>> +
>> + reset_control_assert(tqspi->rst);
>> + udelay(2);
>> + reset_control_deassert(tqspi->rst);
>> + tqspi->def_command1_reg = QSPI_M_S | QSPI_CS_SW_HW | QSPI_CS_SW_VAL;
>> + tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
>> + tqspi->spi_cs_timing1 = tegra_qspi_readl(tqspi, QSPI_CS_TIMING1);
>> + tqspi->spi_cs_timing2 = tegra_qspi_readl(tqspi, QSPI_CS_TIMING2);
>> + tqspi->def_command2_reg = tegra_qspi_readl(tqspi, QSPI_COMMAND2);
>> +
>> + pm_runtime_put(&pdev->dev);
>> +
>> + ret = request_threaded_irq(tqspi->irq, tegra_qspi_isr,
>> + tegra_qspi_isr_thread, IRQF_ONESHOT,
>> + dev_name(&pdev->dev), tqspi);
>> + if (ret < 0) {
>> + dev_err(&pdev->dev,
>> + "failed to request IRQ#%u: %d\n", tqspi->irq, ret);
>> + goto exit_pm_disable;
>> + }
>> +
>> + master->dev.of_node = pdev->dev.of_node;
>> + ret = devm_spi_register_master(&pdev->dev, master);
>> + if (ret < 0) {
>> + dev_err(&pdev->dev, "failed to register master: %d\n", ret);
>> + goto exit_free_irq;
>> + }
>> + return ret;
> return 0
>
>> +static int tegra_qspi_runtime_resume(struct device *dev)
>> +{
>> + struct spi_master *master = dev_get_drvdata(dev);
>> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
>> + int ret;
>> +
>> + ret = clk_prepare_enable(tqspi->clk);
>> + if (ret < 0) {
>> + dev_err(tqspi->dev, "clk_prepare failed: %d\n", ret);
>> + return ret;
>> + }
>> + return 0;
> Always just 'return ret' here.
>
> Cheers
> Jon
>
Thanks Thierry. Will address below suggestions in v2.
Some inline comments below regarding client data part of controller driver.
Regards,
Sowjanya
On 12/4/20 4:11 AM, Thierry Reding wrote:
> On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
>> Tegra SoC has a Quad SPI controller starting from Tegra210.
>>
>> This patch adds support for Tegra210 QSPI controller.
>>
>> Signed-off-by: Sowjanya Komatineni <[email protected]>
>> ---
>> drivers/spi/Kconfig | 9 +
>> drivers/spi/Makefile | 1 +
>> drivers/spi/qspi-tegra.c | 1418 ++++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 1428 insertions(+)
>> create mode 100644 drivers/spi/qspi-tegra.c
>>
>> diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
>> index 3fd16b7..1a021e8 100644
>> --- a/drivers/spi/Kconfig
>> +++ b/drivers/spi/Kconfig
>> @@ -844,6 +844,15 @@ config SPI_MXS
>> help
>> SPI driver for Freescale MXS devices.
>>
>> +config QSPI_TEGRA
> You already discussed this with Mark, but perhaps a better name would be
> SPI_TEGRA210_QUAD or something. SPI_TEGRA210 is too generic because
> there is a regular SPI controller on Tegra210 as well.
>
> SPI_TEGRA210_QUAD is in line with the likes of SPI_TEGRA20_SFLASH and
> SPI_TEGRA20_SLINK.
>
>> + tristate "Nvidia Tegra QSPI Controller"
> NVIDIA
>
>> + depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
> I don't think we need the ARCH_TEGRA part there because TEGRA20_APB_DMA
> already depends on that. Also, there's not strictly a dependency on
> TEGRA20_APB_DMA specifically, but rather a dependency on DMA_ENGINE,
> right? The DMA channels could be coming from some other driver on some
> other SoC generation, such as the Tegra186 and later GPCDMA.
>
>> + depends on RESET_CONTROLLER
>> + help
>> + QSPI driver for Nvidia Tegra QSPI Controller interface. This
> NVIDIA
>
>> + controller is different from the spi controller and is available
> SPI
>
>> + on Tegra SoCs starting from Tegra210.
>> +
>> config SPI_TEGRA114
>> tristate "NVIDIA Tegra114 SPI Controller"
>> depends on (ARCH_TEGRA && TEGRA20_APB_DMA) || COMPILE_TEST
> [...]
>> diff --git a/drivers/spi/qspi-tegra.c b/drivers/spi/qspi-tegra.c
> [...]
>> +struct tegra_qspi_client_data {
>> + int tx_clk_tap_delay;
>> + int rx_clk_tap_delay;
>> +};
> If this is client data, why are we dealing with this in the controller
> driver? Is this perhaps something that could be added to struct
> spi_device?
tx/rx clock tap delays are based on platform design and values may vary
depending on trace lengths to corresponding spi device.
These tap delays are programmed in Tegra QSPI controller.
>
>> +
>> +struct tegra_qspi_data {
> That _data just seems to be 5 extra characters that don't add any value.
>
>> + struct device *dev;
>> + struct spi_master *master;
>> + /* Lock to protect data accessed by irq */
>> + spinlock_t lock;
>> +
>> + struct clk *clk;
>> + struct reset_control *rst;
>> + void __iomem *base;
>> + phys_addr_t phys;
>> + unsigned int irq;
>> +
>> + u32 cur_speed;
>> + unsigned int cur_pos;
>> + unsigned int words_per_32bit;
>> + unsigned int bytes_per_word;
>> + unsigned int curr_dma_words;
>> + unsigned int cur_direction;
>> +
>> + unsigned int cur_rx_pos;
>> + unsigned int cur_tx_pos;
>> +
>> + unsigned int dma_buf_size;
>> + unsigned int max_buf_size;
>> + bool is_curr_dma_xfer;
>> +
>> + struct completion rx_dma_complete;
>> + struct completion tx_dma_complete;
>> +
>> + u32 tx_status;
>> + u32 rx_status;
>> + u32 status_reg;
>> + bool is_packed;
>> + bool use_dma;
>> +
>> + u32 command1_reg;
>> + u32 dma_control_reg;
>> + u32 def_command1_reg;
>> + u32 def_command2_reg;
>> + u32 spi_cs_timing1;
>> + u32 spi_cs_timing2;
>> + u8 dummy_cycles;
>> +
>> + struct completion xfer_completion;
>> + struct spi_transfer *curr_xfer;
>> +
>> + struct dma_chan *rx_dma_chan;
>> + u32 *rx_dma_buf;
>> + dma_addr_t rx_dma_phys;
>> + struct dma_async_tx_descriptor *rx_dma_desc;
>> +
>> + struct dma_chan *tx_dma_chan;
>> + u32 *tx_dma_buf;
>> + dma_addr_t tx_dma_phys;
>> + struct dma_async_tx_descriptor *tx_dma_desc;
>> +};
>> +
>> +static int tegra_qspi_runtime_suspend(struct device *dev);
>> +static int tegra_qspi_runtime_resume(struct device *dev);
> Can't we just reorder the code so that these don't have to be forward-
> declared?
>
>> +
>> +static inline u32 tegra_qspi_readl(struct tegra_qspi_data *tqspi,
>> + unsigned long reg)
> Nit: I prefer "offset" over "reg" because I think it's slightly more
> accurate.
>
>> +{
>> + return readl(tqspi->base + reg);
>> +}
>> +
>> +static inline void tegra_qspi_writel(struct tegra_qspi_data *tqspi,
>> + u32 val, unsigned long reg)
> I also prefer "value" over "val" because "val" could also be short for
> "valid". Anyway, I am pedantic that way, so feel free to ignore that.
>
> [...]
>> +static unsigned int
>> +tegra_qspi_calculate_curr_xfer_param(struct tegra_qspi_data *tqspi,
>> + struct spi_transfer *t)
>> +{
>> + unsigned int remain_len = t->len - tqspi->cur_pos;
>> + unsigned int max_word;
>> + unsigned int bits_per_word = t->bits_per_word;
>> + unsigned int max_len;
>> + unsigned int total_fifo_words;
> You could list some of these on the same line to make this a bit more
> compact. Something I've often seen done that makes this really clean is
> to have uninitialized variables on one line and then assignments on
> separate lines, then sort lines by length:
>
> unsigned int max_word, max_len, total_fifo_words;
> unsigned int remain_len = t->len - tqspi->cur_pos;
> unsigned int bits_per_word = t->bits_per_word;
>
> This also applies to some other functions further down.
>
> [...]
>> +static void
>> +tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
>> + struct spi_transfer *t)
>> +{
>> + /* Make the dma buffer to read by cpu */
> This comment seems a bit redundant. dma_sync_single_for_cpu() is a well-
> documented function that doesn't need explanation.
>
>> + dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
>> + tqspi->dma_buf_size, DMA_TO_DEVICE);
>> +
>> + if (tqspi->is_packed) {
>> + unsigned int len = tqspi->curr_dma_words *
>> + tqspi->bytes_per_word;
>> +
>> + memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
>> + tqspi->cur_tx_pos += tqspi->curr_dma_words *
>> + tqspi->bytes_per_word;
>> + } else {
>> + u8 *tx_buf = (u8 *)t->tx_buf + tqspi->cur_tx_pos;
>> + unsigned int i;
>> + unsigned int count;
>> + unsigned int consume;
>> + unsigned int write_bytes;
>> +
>> + consume = tqspi->curr_dma_words * tqspi->bytes_per_word;
>> + if (consume > t->len - tqspi->cur_pos)
>> + consume = t->len - tqspi->cur_pos;
>> + write_bytes = consume;
>> + for (count = 0; count < tqspi->curr_dma_words; count++) {
>> + u32 x = 0;
>> +
>> + for (i = 0; consume && (i < tqspi->bytes_per_word);
>> + i++, consume--)
>> + x |= (u32)(*tx_buf++) << (i * 8);
>> + tqspi->tx_dma_buf[count] = x;
>> + }
>> +
>> + tqspi->cur_tx_pos += write_bytes;
>> + }
>> +
>> + /* Make the dma buffer to read by dma */
> Same here.
>
>> + dma_sync_single_for_device(tqspi->dev, tqspi->tx_dma_phys,
>> + tqspi->dma_buf_size, DMA_TO_DEVICE);
>> +}
>> +
>> +static void
>> +tegra_qspi_copy_qspi_rxbuf_to_client_rxbuf(struct tegra_qspi_data *tqspi,
>> + struct spi_transfer *t)
>> +{
>> + /* Make the dma buffer to read by cpu */
> And here.
>
>> + dma_sync_single_for_cpu(tqspi->dev, tqspi->rx_dma_phys,
>> + tqspi->dma_buf_size, DMA_FROM_DEVICE);
>> +
>> + if (tqspi->is_packed) {
>> + unsigned int len = tqspi->curr_dma_words *
>> + tqspi->bytes_per_word;
>> +
>> + memcpy(t->rx_buf + tqspi->cur_rx_pos, tqspi->rx_dma_buf, len);
>> + tqspi->cur_rx_pos += tqspi->curr_dma_words *
>> + tqspi->bytes_per_word;
>> + } else {
>> + unsigned char *rx_buf = t->rx_buf + tqspi->cur_rx_pos;
>> + u32 rx_mask = ((u32)1 << t->bits_per_word) - 1;
>> + unsigned int i;
>> + unsigned int count;
>> + unsigned int consume;
>> + unsigned int read_bytes;
>> +
>> + consume = tqspi->curr_dma_words * tqspi->bytes_per_word;
>> + if (consume > t->len - tqspi->cur_pos)
>> + consume = t->len - tqspi->cur_pos;
>> + read_bytes = consume;
>> + for (count = 0; count < tqspi->curr_dma_words; count++) {
>> + u32 x = tqspi->rx_dma_buf[count] & rx_mask;
>> +
>> + for (i = 0; consume && (i < tqspi->bytes_per_word);
>> + i++, consume--)
>> + *rx_buf++ = (x >> (i * 8)) & 0xFF;
>> + }
>> +
>> + tqspi->cur_rx_pos += read_bytes;
>> + }
>> +
>> + /* Make the dma buffer to read by dma */
> And here.
>
>> + dma_sync_single_for_device(tqspi->dev, tqspi->rx_dma_phys,
>> + tqspi->dma_buf_size, DMA_FROM_DEVICE);
>> +}
>> +
>> +static void tegra_qspi_dma_complete(void *args)
>> +{
>> + struct completion *dma_complete = args;
>> +
>> + complete(dma_complete);
>> +}
>> +
>> +static int tegra_qspi_start_tx_dma(struct tegra_qspi_data *tqspi, int len)
>> +{
>> + reinit_completion(&tqspi->tx_dma_complete);
>> + tqspi->tx_dma_desc = dmaengine_prep_slave_single(tqspi->tx_dma_chan,
>> + tqspi->tx_dma_phys, len, DMA_MEM_TO_DEV,
>> + DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> Looks like most of this driver is wrapped at 80 columns. That rule was
> relaxed a bit a while ago and checkpatch now only warns if you exceed
> 100 columns. There are various places in this driver that could benefit
> from longer lines, such as the above.
>
>> + if (!tqspi->tx_dma_desc) {
>> + dev_err(tqspi->dev, "Not able to get desc for Tx\n");
> Perhaps: "Unable to get TX descriptor\n"?
>
>> + return -EIO;
>> + }
>> +
>> + tqspi->tx_dma_desc->callback = tegra_qspi_dma_complete;
>> + tqspi->tx_dma_desc->callback_param = &tqspi->tx_dma_complete;
>> +
>> + dmaengine_submit(tqspi->tx_dma_desc);
>> + dma_async_issue_pending(tqspi->tx_dma_chan);
>> + return 0;
>> +}
>> +
>> +static int tegra_qspi_start_rx_dma(struct tegra_qspi_data *tqspi, int len)
>> +{
>> + reinit_completion(&tqspi->rx_dma_complete);
>> + tqspi->rx_dma_desc = dmaengine_prep_slave_single(tqspi->rx_dma_chan,
>> + tqspi->rx_dma_phys, len, DMA_DEV_TO_MEM,
>> + DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>> + if (!tqspi->rx_dma_desc) {
>> + dev_err(tqspi->dev, "Not able to get desc for Rx\n");
> Same here.
>
>> + return -EIO;
>> + }
>> +
>> + tqspi->rx_dma_desc->callback = tegra_qspi_dma_complete;
>> + tqspi->rx_dma_desc->callback_param = &tqspi->rx_dma_complete;
>> +
>> + dmaengine_submit(tqspi->rx_dma_desc);
>> + dma_async_issue_pending(tqspi->rx_dma_chan);
>> + return 0;
>> +}
>> +
>> +static int tegra_qspi_flush_fifos(struct tegra_qspi_data *tqspi)
>> +{
>> + unsigned long timeout = jiffies + HZ;
>> + u32 status;
>> +
>> + status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>> + if ((status & QSPI_FIFO_EMPTY) == QSPI_FIFO_EMPTY)
>> + return 0;
>> +
>> + status |= QSPI_RX_FIFO_FLUSH | QSPI_TX_FIFO_FLUSH;
>> + tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
>> + while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
>> + status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>> + if (time_after(jiffies, timeout)) {
>> + dev_err(tqspi->dev,
>> + "timeout waiting for fifo flush\n");
> FIFO is an abbreviation, so it should be all uppercase in text.
>
>> + return -EIO;
>> + }
>> +
>> + udelay(1);
> It looks like this function can be called from both interrupt and
> process contexts, where the latter is the more common case and it is
> only ever used in interrupt context to clean up on error.
>
> I wonder if it's worth adding an "atomic" parameter to the function and
> make this a sleeping loop whenever possible. Also, can this not use one
> of the helpers from linux/iopoll.h?
>
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void tegra_qspi_unmask_irq(struct tegra_qspi_data *tqspi)
>> +{
>> + u32 intr_mask;
>> +
>> + intr_mask = tegra_qspi_readl(tqspi, QSPI_INTR_MASK);
>> + intr_mask &= ~(QSPI_INTR_RDY_MASK | QSPI_INTR_RX_TX_FIFO_ERR);
>> + tegra_qspi_writel(tqspi, intr_mask, QSPI_INTR_MASK);
>> +}
>> +
>> +static int tegra_qspi_start_dma_based_transfer(struct tegra_qspi_data *tqspi,
>> + struct spi_transfer *t)
>> +{
>> + u32 val;
>> + unsigned int len;
>> + int ret = 0;
>> + u8 dma_burst;
>> + struct dma_slave_config dma_sconfig = {0};
> I think checkpatch wants spaces after { and before }.
>
>> +
>> + val = QSPI_DMA_BLK_SET(tqspi->curr_dma_words - 1);
>> + tegra_qspi_writel(tqspi, val, QSPI_DMA_BLK);
>> +
>> + tegra_qspi_unmask_irq(tqspi);
>> +
>> + if (tqspi->is_packed)
>> + len = DIV_ROUND_UP(tqspi->curr_dma_words *
>> + tqspi->bytes_per_word, 4) * 4;
>> + else
>> + len = tqspi->curr_dma_words * 4;
>> +
>> + /* Set attention level based on length of transfer */
>> + val = 0;
>> + if (len & 0xF) {
> Nit: hexadecimal literals are usually lowercase.
>
>> + val |= QSPI_TX_TRIG_1 | QSPI_RX_TRIG_1;
>> + dma_burst = 1;
>> + } else if (((len) >> 4) & 0x1) {
>> + val |= QSPI_TX_TRIG_4 | QSPI_RX_TRIG_4;
>> + dma_burst = 4;
>> + } else {
>> + val |= QSPI_TX_TRIG_8 | QSPI_RX_TRIG_8;
>> + dma_burst = 8;
>> + }
>> +
>> + tegra_qspi_writel(tqspi, val, QSPI_DMA_CTL);
>> + tqspi->dma_control_reg = val;
>> +
>> + dma_sconfig.device_fc = true;
>> + if (tqspi->cur_direction & DATA_DIR_TX) {
>> + dma_sconfig.dst_addr = tqspi->phys + QSPI_TX_FIFO;
>> + dma_sconfig.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>> + dma_sconfig.dst_maxburst = dma_burst;
>> + ret = dmaengine_slave_config(tqspi->tx_dma_chan, &dma_sconfig);
>> + if (ret < 0) {
>> + dev_err(tqspi->dev,
>> + "DMA slave config failed: %d\n", ret);
>> + return ret;
>> + }
>> +
>> + tegra_qspi_copy_client_txbuf_to_qspi_txbuf(tqspi, t);
>> + ret = tegra_qspi_start_tx_dma(tqspi, len);
>> + if (ret < 0) {
>> + dev_err(tqspi->dev,
>> + "Starting tx dma failed: %d\n", ret);
> "TX" and "DMA"
>
>> + return ret;
>> + }
>> + }
>> +
>> + if (tqspi->cur_direction & DATA_DIR_RX) {
>> + dma_sconfig.src_addr = tqspi->phys + QSPI_RX_FIFO;
>> + dma_sconfig.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>> + dma_sconfig.src_maxburst = dma_burst;
>> + ret = dmaengine_slave_config(tqspi->rx_dma_chan, &dma_sconfig);
>> + if (ret < 0) {
>> + dev_err(tqspi->dev,
>> + "DMA slave config failed: %d\n", ret);
>> + return ret;
>> + }
>> +
>> + /* Make the dma buffer to read by dma */
> Again, not a useful comment.
>
>> + dma_sync_single_for_device(tqspi->dev, tqspi->rx_dma_phys,
>> + tqspi->dma_buf_size,
>> + DMA_FROM_DEVICE);
>> +
>> + ret = tegra_qspi_start_rx_dma(tqspi, len);
>> + if (ret < 0) {
>> + dev_err(tqspi->dev,
>> + "Starting rx dma failed: %d\n", ret);
> "RX" and "DMA"
>
> [...]
>> +static int tegra_qspi_init_dma_param(struct tegra_qspi_data *tqspi,
>> + bool dma_to_memory)
>> +{
>> + struct dma_chan *dma_chan;
>> + u32 *dma_buf;
>> + dma_addr_t dma_phys;
>> +
>> + if (!device_property_present(tqspi->dev, "dmas"))
>> + return 0;
> If DMA support is optional, then we definitely don't want to depend on
> TEGRA20_APB_DMA (or any other specific driver).
>
>> +
>> + dma_chan = dma_request_chan(tqspi->dev, dma_to_memory ? "rx" : "tx");
> Does this return some specific value when there's no channel for this?
> I.e. what happens if the "dmas" property exists in device tree but we
> don't have a driver for the provider enabled? Since we have code to use
> as fallback, we may want to special case that here and allow the driver
> to continue.
In current Tegra194 device tree, I don't see dma properties populated
and we don't have GPCDMA upstream yet.
So kind of thought we will add dma properties to device tree only when
we have corresponding dma driver support.
Yeah understood there's nothing restricting to follow this. Will fix in
v2 to switch PIO mode even for the case when dma properties are present
but dma request fails.
>
>> + if (IS_ERR(dma_chan))
>> + return dev_err_probe(tqspi->dev, PTR_ERR(dma_chan),
>> + "Dma channel is not available\n");
> "DMA"
>
> [...]
>> +static int tegra_qspi_start_transfer_one(struct spi_device *spi,
>> + struct spi_transfer *t, u32 command1)
>> +{
> [...]
>> + command1 &= ~QSPI_INTERFACE_WIDTH_MASK;
>> + if (bus_width == SPI_NBITS_QUAD)
>> + command1 |= QSPI_INTERFACE_WIDTH_QUAD;
>> + else if (bus_width == SPI_NBITS_DUAL)
>> + command1 |= QSPI_INTERFACE_WIDTH_DUAL;
>> + else
>> + command1 |= QSPI_INTERFACE_WIDTH_SINGLE;
>> + tqspi->command1_reg = command1;
> This (and in some other places in the driver) could use a couple of
> blank lines to make this less cluttered.
>
> [...]
>> +static void tegra_qspi_cleanup(struct spi_device *spi)
>> +{
>> + struct tegra_qspi_client_data *cdata = spi->controller_data;
>> +
>> + spi->controller_data = NULL;
>> + if (spi->dev.of_node)
> Can this ever fail? Do we support SPI device instantiation from
> somewhere else than DT?
>
>> + kfree(cdata);
>> +}
>> +
>> +static int tegra_qspi_setup(struct spi_device *spi)
>> +{
>> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(spi->master);
>> + struct tegra_qspi_client_data *cdata = spi->controller_data;
>> + u32 tx_tap = 0, rx_tap = 0;
>> + u32 val;
>> + unsigned long flags;
>> + int ret;
>> +
>> + dev_dbg(&spi->dev, "setup %d bpw, %scpol, %scpha, %dHz\n",
>> + spi->bits_per_word,
>> + spi->mode & SPI_CPOL ? "" : "~",
>> + spi->mode & SPI_CPHA ? "" : "~",
>> + spi->max_speed_hz);
>> +
>> + if (!cdata) {
>> + cdata = tegra_qspi_parse_cdata_dt(spi);
> Oh... I see that this is actually parsed from the SPI device node, so
> perhaps this is okay to do.
yeah. ok
>
> [...]
>> +static int tegra_qspi_transfer_one_message(struct spi_master *master,
>> + struct spi_message *msg)
>> +{
> [...]
>> + is_first_msg = false;
>> + ret = wait_for_completion_timeout(&tqspi->xfer_completion,
>> + QSPI_DMA_TIMEOUT);
>> + if (WARN_ON(ret == 0)) {
>> + dev_err(tqspi->dev,
>> + "qspi transfer timeout: %d\n", ret);
> "QSPI", or alternatively leave this out completely because it should be
> obvious that the transfer that failed is a QSPI transfer just from the
> device name.
>
>> + if (tqspi->is_curr_dma_xfer &&
>> + (tqspi->cur_direction & DATA_DIR_TX))
>> + dmaengine_terminate_all(tqspi->tx_dma_chan);
>> + if (tqspi->is_curr_dma_xfer &&
>> + (tqspi->cur_direction & DATA_DIR_RX))
>> + dmaengine_terminate_all(tqspi->rx_dma_chan);
>> + ret = -EIO;
>> + tegra_qspi_dump_regs(tqspi);
>> + tegra_qspi_flush_fifos(tqspi);
>> + reset_control_assert(tqspi->rst);
>> + udelay(2);
>> + reset_control_deassert(tqspi->rst);
>> + goto exit;
>> + }
>> +
>> + if (tqspi->tx_status || tqspi->rx_status) {
>> + ret = -EIO;
>> + dev_err(tqspi->dev, "error in transfer: %d\n", ret);
> Would it make sense to output tx_status and/or rx_status in this message
> rather than -EIO unconditionally?
>
> [...]
>> +static irqreturn_t handle_cpu_based_xfer(struct tegra_qspi_data *tqspi)
>> +{
>> + struct spi_transfer *t = tqspi->curr_xfer;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&tqspi->lock, flags);
>> + if (tqspi->tx_status || tqspi->rx_status) {
>> + dev_err(tqspi->dev, "CpuXfer ERROR bit set 0x%x\n",
>> + tqspi->status_reg);
>> + dev_err(tqspi->dev, "CpuXfer 0x%08x:0x%08x\n",
>> + tqspi->command1_reg, tqspi->dma_control_reg);
>> + tegra_qspi_dump_regs(tqspi);
>> + tegra_qspi_flush_fifos(tqspi);
>> + complete(&tqspi->xfer_completion);
>> + spin_unlock_irqrestore(&tqspi->lock, flags);
>> + reset_control_assert(tqspi->rst);
>> + udelay(2);
>> + reset_control_deassert(tqspi->rst);
>> + return IRQ_HANDLED;
>> + }
> Can this error handling be somehow unified with the DMA error handling?
> Having markers such as "CpuXfer" in error messages is not really helpful
> to users. Maybe have the driver output an INFO message or so during
> probe when it falls back to PIO mode, that would be a good enough
> indicator to someone debugging some issue that PIO mode is being used.
> And then you can just treat transfer failures more uniformly.
>
> Thierry
On Fri, Dec 04, 2020 at 01:04:46PM -0800, Sowjanya Komatineni wrote:
> On 12/4/20 10:52 AM, Mark Brown wrote:
> > On Thu, Dec 03, 2020 at 04:22:54PM -0800, Sowjanya Komatineni wrote:
> > > Also unpack mode needs to manually put the bytes together from read data to
> > > SPI core rx buffer.
> > Could you be more explicit here, I don't know what "unpack mode" is?
> Tegra SPI/QSPI controller support packed mode and unpacked mode based on
> bits per word in a transfer.
> Packed Mode: When enabled, all 32-bits of data in FIFO contains valid data
> packets of 8-bit/16-bit/32-bit length.
> Non packed mode: For transfers like 24-bit data for example we disable
> packed mode and only 24-bits of FIFO data are valid and other bits are 0's.
> So during TX for FIFO filling and during receive when FIFO data is read, SW
> need to skip invalid bits and should align order from/to SPI core tx/rx
> buffers.
That's pretty surprising - is it really worth the overhead of using
non-packed mode compared to just doing the transfer in 8 bit mode? In
any case it seems better to only do the memcpy() stuff in the cases
where it's actually required since it looks like fairly obvious overhead
otherwise, and the code could use some comments explaining why we're
doing this. It may actually be that the implementation is already doing
the most sensible thing and it just needs more comments explaining why
that's the case.
> > This is not a good idea, attempting to reverse engineer the message and
> > guess at the contents isn't going to be robust and if it's useful it
> > will if nothing else lead to a bunch of duplicated code in drivers as
> > every device that has this feature will need to reimplment it. Instead
> > we should extend the framework so there's explicit support for
> > specifying transfers that are padding bytes, then there's no guesswork
> > that can go wrong and no duplicated code between drivers. A flag in the
> > transfer struct might work?
> As per QSPI spec, Dummy bytes for initial read latency are always FF's. So
> its not like guessing the contents.
The guesswork I was thinking of was deciding to do this rather than the
pattern being output - the bit where the driver figures out that the
intent of that transfer is to provide dummy bytes.
> Tegra QSPI controller HW supports transferring dummy bytes (sending FF's
> after address) based on dummy clock cycles programmed.
> To allow Tegra QSPI HW transfer dummy bytes directly, controller driver need
> number of dummy bytes / actual dummy clock cycles which core driver gets
> from flash sfdp read data.
Sure, the use case makes sense.
> So, we can add flag to transfer and based on this flag if controller HW
> supports then we can ignore filling dummy bytes in spi_mem_exec_op but
> controller driver still needs dummy_cycles value. So probably along with
> flag, do you agree to have dummy_cycle as part of transfer struct which can
> be set to nor->read_dummy value?
Yeah, or given that perhaps just skip the flag and do this by specifying
dummy_cycles. Or if this is always a multiple of 8 (which I guess it
must be to do it using normal byte transfers) perhaps just have the flag
and use the existing length field to infer the number of cycles? I've
not actually looked at the details at all so one or both of those
suggestions may not actually make sense, please take them with a grain
of salt.
I'd recommend doing this as a followup to introducing the base driver,
start off with the less efficient explicit writes and then add the API
and add the support in the driver - that way the new API can be
reviewed without it holding up the rest of the driver.
On 12/4/20 2:46 PM, Mark Brown wrote:
> On Fri, Dec 04, 2020 at 01:04:46PM -0800, Sowjanya Komatineni wrote:
>> On 12/4/20 10:52 AM, Mark Brown wrote:
>>> On Thu, Dec 03, 2020 at 04:22:54PM -0800, Sowjanya Komatineni wrote:
>>>> Also unpack mode needs to manually put the bytes together from read data to
>>>> SPI core rx buffer.
>>> Could you be more explicit here, I don't know what "unpack mode" is?
>> Tegra SPI/QSPI controller support packed mode and unpacked mode based on
>> bits per word in a transfer.
>> Packed Mode: When enabled, all 32-bits of data in FIFO contains valid data
>> packets of 8-bit/16-bit/32-bit length.
>> Non packed mode: For transfers like 24-bit data for example we disable
>> packed mode and only 24-bits of FIFO data are valid and other bits are 0's.
>> So during TX for FIFO filling and during receive when FIFO data is read, SW
>> need to skip invalid bits and should align order from/to SPI core tx/rx
>> buffers.
> That's pretty surprising - is it really worth the overhead of using
> non-packed mode compared to just doing the transfer in 8 bit mode? In
> any case it seems better to only do the memcpy() stuff in the cases
> where it's actually required since it looks like fairly obvious overhead
> otherwise, and the code could use some comments explaining why we're
> doing this. It may actually be that the implementation is already doing
> the most sensible thing and it just needs more comments explaining why
> that's the case.
Understand the overhead but If any device specific transfers use/need 24
bits per word, without non-packed mode we should fail the transfer.
Tegra HW has non-packed mode for such cases.
OK. Will use dma_map/unmap for packed mode transfer and for non-packed
mode will use dma buf for fifo data and then can fill SPI core rx_buf
with valid bytes from dma buf contents.
Sure will add comments for non-packed mode logic.
>>> This is not a good idea, attempting to reverse engineer the message and
>>> guess at the contents isn't going to be robust and if it's useful it
>>> will if nothing else lead to a bunch of duplicated code in drivers as
>>> every device that has this feature will need to reimplment it. Instead
>>> we should extend the framework so there's explicit support for
>>> specifying transfers that are padding bytes, then there's no guesswork
>>> that can go wrong and no duplicated code between drivers. A flag in the
>>> transfer struct might work?
>> As per QSPI spec, Dummy bytes for initial read latency are always FF's. So
>> its not like guessing the contents.
> The guesswork I was thinking of was deciding to do this rather than the
> pattern being output - the bit where the driver figures out that the
> intent of that transfer is to provide dummy bytes.
>
>> Tegra QSPI controller HW supports transferring dummy bytes (sending FF's
>> after address) based on dummy clock cycles programmed.
>> To allow Tegra QSPI HW transfer dummy bytes directly, controller driver need
>> number of dummy bytes / actual dummy clock cycles which core driver gets
>> from flash sfdp read data.
> Sure, the use case makes sense.
>
>> So, we can add flag to transfer and based on this flag if controller HW
>> supports then we can ignore filling dummy bytes in spi_mem_exec_op but
>> controller driver still needs dummy_cycles value. So probably along with
>> flag, do you agree to have dummy_cycle as part of transfer struct which can
>> be set to nor->read_dummy value?
> Yeah, or given that perhaps just skip the flag and do this by specifying
> dummy_cycles. Or if this is always a multiple of 8 (which I guess it
> must be to do it using normal byte transfers) perhaps just have the flag
> and use the existing length field to infer the number of cycles? I've
> not actually looked at the details at all so one or both of those
> suggestions may not actually make sense, please take them with a grain
> of salt.
>
> I'd recommend doing this as a followup to introducing the base driver,
> start off with the less efficient explicit writes and then add the API
> and add the support in the driver - that way the new API can be
> reviewed without it holding up the rest of the driver.
ok I think adding dummy_cycles to transfer is enough without flag.
If dummy cycles is 0, definitely no dummy bytes transfer.
So will get rid of code that detects dummy bytes xfer phase from list of
transfers.
Thanks Mark.
Regards,
Sowjanya
On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
> + ret = devm_spi_register_master(&pdev->dev, master);
[...]
> +static int tegra_qspi_remove(struct platform_device *pdev)
> +{
> + struct spi_master *master = platform_get_drvdata(pdev);
> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
> +
> + free_irq(tqspi->irq, tqspi);
> +
> + tegra_qspi_deinit_dma_param(tqspi, false);
> + tegra_qspi_deinit_dma_param(tqspi, true);
> +
> + pm_runtime_disable(&pdev->dev);
> + if (!pm_runtime_status_suspended(&pdev->dev))
> + tegra_qspi_runtime_suspend(&pdev->dev);
> +
> + return 0;
> +}
With devm_spi_register_master(), the SPI controller is unregistered
*after* tegra_qspi_remove(). SPI transactions may still be ongoing
until the SPI controller is unregistered, yet you perform teardown
steps (such as freeing the IRQ) while it is still registered.
Bottom line is, you can't use devm_spi_register_master() in this case.
You need to use spi_register_master() and explicitly call
spi_unregister_master() in tegra_qspi_remove() *before* performing
teardown steps.
However, be sure to use the devm variant to *allocate* the SPI controller,
i.e. use devm_spi_alloc_master() instead of spi_alloc_master().
Thanks,
Lukas
On 12/6/20 10:16 AM, Lukas Wunner wrote:
> On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
>> + ret = devm_spi_register_master(&pdev->dev, master);
> [...]
>> +static int tegra_qspi_remove(struct platform_device *pdev)
>> +{
>> + struct spi_master *master = platform_get_drvdata(pdev);
>> + struct tegra_qspi_data *tqspi = spi_master_get_devdata(master);
>> +
>> + free_irq(tqspi->irq, tqspi);
>> +
>> + tegra_qspi_deinit_dma_param(tqspi, false);
>> + tegra_qspi_deinit_dma_param(tqspi, true);
>> +
>> + pm_runtime_disable(&pdev->dev);
>> + if (!pm_runtime_status_suspended(&pdev->dev))
>> + tegra_qspi_runtime_suspend(&pdev->dev);
>> +
>> + return 0;
>> +}
> With devm_spi_register_master(), the SPI controller is unregistered
> *after* tegra_qspi_remove(). SPI transactions may still be ongoing
> until the SPI controller is unregistered, yet you perform teardown
> steps (such as freeing the IRQ) while it is still registered.
>
> Bottom line is, you can't use devm_spi_register_master() in this case.
> You need to use spi_register_master() and explicitly call
> spi_unregister_master() in tegra_qspi_remove() *before* performing
> teardown steps.
>
> However, be sure to use the devm variant to *allocate* the SPI controller,
> i.e. use devm_spi_alloc_master() instead of spi_alloc_master().
>
> Thanks,
>
> Lukas
Thanks Lukas. I see devm_spi_alloc_master() in 5.4 but not from 5.5
Thanks
Sowjanya
On Mon, Dec 07, 2020 at 04:14:53PM -0800, Sowjanya Komatineni wrote:
> On 12/6/20 10:16 AM, Lukas Wunner wrote:
> > However, be sure to use the devm variant to *allocate* the SPI controller,
> > i.e. use devm_spi_alloc_master() instead of spi_alloc_master().
>
> Thanks Lukas. I see devm_spi_alloc_master() in 5.4 but not from 5.5
devm_spi_alloc_master() was introduced in v5.10-rc5 with commit
5e844cc37a5c and then backported to 5.9-stable and 5.4-stable.
Patches are pending to also backport it to 4.19-stable, 4.14-stable,
4.9-stable and 4.4-stable.
If your development branch is based on v5.5, just cherry-pick
5e844cc37a5c and you should be good to go.
Thanks,
Lukas
On Mon, Dec 07, 2020 at 04:02:15PM -0800, Sowjanya Komatineni wrote:
> So by having it in spi_message, controller driver can get dummy cycles info
> a head of transfers so it can program this along with address bytes transfer
> phase.
That sounds fine.
On Tue, Dec 01, 2020 at 01:12:43PM -0800, Sowjanya Komatineni wrote:
> This patch adds YAML based device tree binding document for Tegra
> QSPI driver.
>
> Signed-off-by: Sowjanya Komatineni <[email protected]>
> ---
> .../devicetree/bindings/spi/nvidia,tegra-qspi.yaml | 77 ++++++++++++++++++++++
> 1 file changed, 77 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>
> diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
> new file mode 100644
> index 0000000..038a085
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
> @@ -0,0 +1,77 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/spi/nvidia,tegra-qspi.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Tegra Quad SPI Controller
> +
> +maintainers:
> + - Thierry Reding <[email protected]>
> + - Jonathan Hunter <[email protected]>
> +
> +properties:
> + compatible:
> + enum:
> + - nvidia,tegra210-qspi
> + - nvidia,tegra186-qspi
> + - nvidia,tegra194-qspi
> +
> + reg:
> + items:
> + - description: QSPI registers
Just 'maxItems: 1'
> +
> + interrupts:
> + maxItems: 1
> +
> + clock-names:
> + items:
> + - const: qspi
Kind of a pointless name.
> +
> + clocks:
> + maxItems: 1
> +
> + reset-names:
> + minItems: 1
> + items:
> + - const: qspi
Same here.
> +
> + resets:
> + maxItems: 1
> +
> + dmas:
> + maxItems: 2
> +
> + dma-names:
> + items:
> + - const: rx
> + - const: tx
> +
> +required:
> + - compatible
> + - reg
> + - interrupts
> + - clock-names
> + - clocks
> + - reset-names
> + - resets
> +
> +additionalProperties: true
> +
> +examples:
> + - |
> + #include <dt-bindings/clock/tegra210-car.h>
> + #include <dt-bindings/reset/tegra210-car.h>
> + #include <dt-bindings/interrupt-controller/arm-gic.h>
> +
> + spi@70410000 {
> + compatible = "nvidia,tegra210-qspi";
> + reg = <0x70410000 0x1000>;
> + interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
> + clocks = <&tegra_car TEGRA210_CLK_QSPI>;
> + clock-names = "qspi";
> + resets = <&tegra_car 211>;
> + reset-names = "qspi";
> + dmas = <&apbdma 5>, <&apbdma 5>;
> + dma-names = "rx", "tx";
> + };
> --
> 2.7.4
>
On 12/9/20 9:26 AM, Rob Herring wrote:
> On Tue, Dec 01, 2020 at 01:12:43PM -0800, Sowjanya Komatineni wrote:
>> This patch adds YAML based device tree binding document for Tegra
>> QSPI driver.
>>
>> Signed-off-by: Sowjanya Komatineni <[email protected]>
>> ---
>> .../devicetree/bindings/spi/nvidia,tegra-qspi.yaml | 77 ++++++++++++++++++++++
>> 1 file changed, 77 insertions(+)
>> create mode 100644 Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>>
>> diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>> new file mode 100644
>> index 0000000..038a085
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>> @@ -0,0 +1,77 @@
>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/spi/nvidia,tegra-qspi.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Tegra Quad SPI Controller
>> +
>> +maintainers:
>> + - Thierry Reding <[email protected]>
>> + - Jonathan Hunter <[email protected]>
>> +
>> +properties:
>> + compatible:
>> + enum:
>> + - nvidia,tegra210-qspi
>> + - nvidia,tegra186-qspi
>> + - nvidia,tegra194-qspi
>> +
>> + reg:
>> + items:
>> + - description: QSPI registers
> Just 'maxItems: 1'
>
>> +
>> + interrupts:
>> + maxItems: 1
>> +
>> + clock-names:
>> + items:
>> + - const: qspi
> Kind of a pointless name.
Thanks Rob for feedback. Do you mean to change name something like
qspi-clk instead of qspi?
>
>> +
>> + clocks:
>> + maxItems: 1
>> +
>> + reset-names:
>> + minItems: 1
>> + items:
>> + - const: qspi
> Same here.
>
>> +
>> + resets:
>> + maxItems: 1
>> +
>> + dmas:
>> + maxItems: 2
>> +
>> + dma-names:
>> + items:
>> + - const: rx
>> + - const: tx
>> +
>> +required:
>> + - compatible
>> + - reg
>> + - interrupts
>> + - clock-names
>> + - clocks
>> + - reset-names
>> + - resets
>> +
>> +additionalProperties: true
>> +
>> +examples:
>> + - |
>> + #include <dt-bindings/clock/tegra210-car.h>
>> + #include <dt-bindings/reset/tegra210-car.h>
>> + #include <dt-bindings/interrupt-controller/arm-gic.h>
>> +
>> + spi@70410000 {
>> + compatible = "nvidia,tegra210-qspi";
>> + reg = <0x70410000 0x1000>;
>> + interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
>> + clocks = <&tegra_car TEGRA210_CLK_QSPI>;
>> + clock-names = "qspi";
>> + resets = <&tegra_car 211>;
>> + reset-names = "qspi";
>> + dmas = <&apbdma 5>, <&apbdma 5>;
>> + dma-names = "rx", "tx";
>> + };
>> --
>> 2.7.4
>>
On 12/9/20 12:28 PM, Sowjanya Komatineni wrote:
>
> On 12/9/20 9:26 AM, Rob Herring wrote:
>> On Tue, Dec 01, 2020 at 01:12:43PM -0800, Sowjanya Komatineni wrote:
>>> This patch adds YAML based device tree binding document for Tegra
>>> QSPI driver.
>>>
>>> Signed-off-by: Sowjanya Komatineni <[email protected]>
>>> ---
>>> .../devicetree/bindings/spi/nvidia,tegra-qspi.yaml | 77
>>> ++++++++++++++++++++++
>>> 1 file changed, 77 insertions(+)
>>> create mode 100644
>>> Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>>>
>>> diff --git
>>> a/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>>> b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>>> new file mode 100644
>>> index 0000000..038a085
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/spi/nvidia,tegra-qspi.yaml
>>> @@ -0,0 +1,77 @@
>>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/spi/nvidia,tegra-qspi.yaml#
>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>> +
>>> +title: Tegra Quad SPI Controller
>>> +
>>> +maintainers:
>>> + - Thierry Reding <[email protected]>
>>> + - Jonathan Hunter <[email protected]>
>>> +
>>> +properties:
>>> + compatible:
>>> + enum:
>>> + - nvidia,tegra210-qspi
>>> + - nvidia,tegra186-qspi
>>> + - nvidia,tegra194-qspi
>>> +
>>> + reg:
>>> + items:
>>> + - description: QSPI registers
>> Just 'maxItems: 1'
>>
>>> +
>>> + interrupts:
>>> + maxItems: 1
>>> +
>>> + clock-names:
>>> + items:
>>> + - const: qspi
>> Kind of a pointless name.
> Thanks Rob for feedback. Do you mean to change name something like
> qspi-clk instead of qspi?
Rob, reason I added clock name is later when we add ddr mode we will
have additional clock.
So driver uses clock name to retrieve.
Will add both clocks to dt-binding doc in v2 although support for ddr
mode in qspi driver can be implemented later.
>>
>>> +
>>> + clocks:
>>> + maxItems: 1
>>> +
>>> + reset-names:
>>> + minItems: 1
>>> + items:
>>> + - const: qspi
>> Same here.
>>
>>> +
>>> + resets:
>>> + maxItems: 1
>>> +
>>> + dmas:
>>> + maxItems: 2
>>> +
>>> + dma-names:
>>> + items:
>>> + - const: rx
>>> + - const: tx
>>> +
>>> +required:
>>> + - compatible
>>> + - reg
>>> + - interrupts
>>> + - clock-names
>>> + - clocks
>>> + - reset-names
>>> + - resets
>>> +
>>> +additionalProperties: true
>>> +
>>> +examples:
>>> + - |
>>> + #include <dt-bindings/clock/tegra210-car.h>
>>> + #include <dt-bindings/reset/tegra210-car.h>
>>> + #include <dt-bindings/interrupt-controller/arm-gic.h>
>>> +
>>> + spi@70410000 {
>>> + compatible = "nvidia,tegra210-qspi";
>>> + reg = <0x70410000 0x1000>;
>>> + interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>;
>>> + clocks = <&tegra_car TEGRA210_CLK_QSPI>;
>>> + clock-names = "qspi";
>>> + resets = <&tegra_car 211>;
>>> + reset-names = "qspi";
>>> + dmas = <&apbdma 5>, <&apbdma 5>;
>>> + dma-names = "rx", "tx";
>>> + };
>>> --
>>> 2.7.4
>>>