2023-02-07 06:09:48

by Potthuri, Sai Krishna

[permalink] [raw]
Subject: [PATCH 0/3] spi: spi-cadence-quadspi: Add Rx tuning support for DTR mode

Enable PHY and DQS required for Xilinx Versal Octal SPI to operate in DTR
protocol.
Add and update device_id field in spi_mem structure with flash id
information. Xilinx Versal Octal SPI driver requires the device id
information to perform the Rx tuning operation. Since there is no common
Tuning Data Pattern defined across all vendors, controllers like Xilinx
Versal Octal SPI which requires Rx tuning to find out the optimal sampling
point for data lines, this device id information will be used as a golden
data.
The reason behind choosing this approach instead of reading the ID again
in the controller driver is to make it generic solution.
- Other controller drivers which want to use similar tuning process, they
will make use of this ID instead of reading the ID again in the driver.
- Also, we can avoid hardcoding the command information and initiating the
transfer in the controller driver as this should happen from spi-nor.

Sai Krishna Potthuri (3):
spi: cadence-quadspi: Add support for PHY module and DQS
mtd: spi-nor: Add and update device_id field in spi_mem structure
spi: cadence-quadspi: Add Rx tuning support for Xilinx Versal OSPI

drivers/mtd/spi-nor/core.c | 1 +
drivers/spi/spi-cadence-quadspi.c | 226 +++++++++++++++++++++++++++++-
include/linux/spi/spi-mem.h | 4 +
3 files changed, 230 insertions(+), 1 deletion(-)

--
2.25.1



2023-02-07 06:09:51

by Potthuri, Sai Krishna

[permalink] [raw]
Subject: [PATCH 1/3] spi: cadence-quadspi: Add support for PHY module and DQS

Cadence Octal SPI Flash Controller has PHY module, which is used for
handling SPI transactions when there is a requirement for high
performance. Also, this controller supports DQS(data strobe) used for
data capturing in PHY mode rather than internally generated ref_clk.
Xilinx Versal Octal SPI use PHY module and DQS signal when operating in
DTR protocol.

Signed-off-by: Sai Krishna Potthuri <[email protected]>
---
drivers/spi/spi-cadence-quadspi.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index 676313e1bdad..b55b763003f0 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -40,6 +40,8 @@
#define CQSPI_SUPPORT_EXTERNAL_DMA BIT(2)
#define CQSPI_NO_SUPPORT_WR_COMPLETION BIT(3)
#define CQSPI_SLOW_SRAM BIT(4)
+#define CQSPI_HAS_PHY BIT(5)
+#define CQSPI_HAS_DQS BIT(6)

/* Capabilities */
#define CQSPI_SUPPORTS_OCTAL BIT(0)
@@ -89,6 +91,8 @@ struct cqspi_st {
u32 pd_dev_id;
bool wr_completion;
bool slow_sram;
+ bool use_phy;
+ bool use_dqs;
};

struct cqspi_driver_platdata {
@@ -112,6 +116,7 @@ struct cqspi_driver_platdata {
/* Register map */
#define CQSPI_REG_CONFIG 0x00
#define CQSPI_REG_CONFIG_ENABLE_MASK BIT(0)
+#define CQSPI_REG_CONFIG_PHY_MODE BIT(3)
#define CQSPI_REG_CONFIG_ENB_DIR_ACC_CTRL BIT(7)
#define CQSPI_REG_CONFIG_DECODE_MASK BIT(9)
#define CQSPI_REG_CONFIG_CHIPSELECT_LSB 10
@@ -151,6 +156,7 @@ struct cqspi_driver_platdata {
#define CQSPI_REG_DELAY_TSHSL_MASK 0xFF

#define CQSPI_REG_READCAPTURE 0x10
+#define CQSPI_REG_READCAPTURE_DQS BIT(8)
#define CQSPI_REG_READCAPTURE_BYPASS_LSB 0
#define CQSPI_REG_READCAPTURE_DELAY_LSB 1
#define CQSPI_REG_READCAPTURE_DELAY_MASK 0xF
@@ -468,6 +474,8 @@ static int cqspi_enable_dtr(struct cqspi_flash_pdata *f_pdata,
if (op->cmd.dtr) {
reg |= CQSPI_REG_CONFIG_DTR_PROTO;
reg |= CQSPI_REG_CONFIG_DUAL_OPCODE;
+ if (cqspi->use_phy)
+ reg |= CQSPI_REG_CONFIG_PHY_MODE;

/* Set up command opcode extension. */
ret = cqspi_setup_opcode_ext(f_pdata, op, shift);
@@ -476,10 +484,20 @@ static int cqspi_enable_dtr(struct cqspi_flash_pdata *f_pdata,
} else {
reg &= ~CQSPI_REG_CONFIG_DTR_PROTO;
reg &= ~CQSPI_REG_CONFIG_DUAL_OPCODE;
+ reg &= ~CQSPI_REG_CONFIG_PHY_MODE;
}

writel(reg, reg_base + CQSPI_REG_CONFIG);

+ reg = readl(reg_base + CQSPI_REG_READCAPTURE);
+
+ if (op->cmd.dtr)
+ reg |= CQSPI_REG_READCAPTURE_DQS;
+ else
+ reg &= ~CQSPI_REG_READCAPTURE_DQS;
+
+ writel(reg, reg_base + CQSPI_REG_READCAPTURE);
+
return cqspi_wait_idle(cqspi);
}

@@ -1700,6 +1718,10 @@ static int cqspi_probe(struct platform_device *pdev)
cqspi->wr_completion = false;
if (ddata->quirks & CQSPI_SLOW_SRAM)
cqspi->slow_sram = true;
+ if (ddata->quirks & CQSPI_HAS_PHY)
+ cqspi->use_phy = true;
+ if (ddata->quirks & CQSPI_HAS_DQS)
+ cqspi->use_dqs = true;

if (of_device_is_compatible(pdev->dev.of_node,
"xlnx,versal-ospi-1.0"))
@@ -1820,7 +1842,8 @@ static const struct cqspi_driver_platdata socfpga_qspi = {

static const struct cqspi_driver_platdata versal_ospi = {
.hwcaps_mask = CQSPI_SUPPORTS_OCTAL,
- .quirks = CQSPI_DISABLE_DAC_MODE | CQSPI_SUPPORT_EXTERNAL_DMA,
+ .quirks = CQSPI_DISABLE_DAC_MODE | CQSPI_SUPPORT_EXTERNAL_DMA
+ | CQSPI_HAS_PHY | CQSPI_HAS_DQS,
.indirect_read_dma = cqspi_versal_indirect_read_dma,
.get_dma_status = cqspi_get_versal_dma_status,
};
--
2.25.1


2023-02-07 06:10:02

by Potthuri, Sai Krishna

[permalink] [raw]
Subject: [PATCH 2/3] mtd: spi-nor: Add and update device_id field in spi_mem structure

Xilinx Versal Octal SPI controller drivers might require the device id
information to perform the Rx tuning operation. Since there is no common
Tuning Data Pattern defined across all vendors, controllers which requires
Rx tuning to find out the optimal sampling point for data lines, this
device id information will be used as a golden data.
spi-nor core updates the device id information in device_id field
while reading the id of the flash device.

Signed-off-by: Sai Krishna Potthuri <[email protected]>
---
drivers/mtd/spi-nor/core.c | 1 +
include/linux/spi/spi-mem.h | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index bee8fc4c9f07..9811e74ef3b6 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -1644,6 +1644,7 @@ static const struct flash_info *spi_nor_match_id(struct spi_nor *nor,
if (part->id_len &&
!memcmp(part->id, id, part->id_len)) {
nor->manufacturer = manufacturers[i];
+ memcpy(nor->spimem->device_id, id, SPI_NOR_MAX_ID_LEN);
return part;
}
}
diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h
index 8e984d75f5b6..e9ae5c1b7391 100644
--- a/include/linux/spi/spi-mem.h
+++ b/include/linux/spi/spi-mem.h
@@ -181,11 +181,14 @@ struct spi_mem_dirmap_desc {
void *priv;
};

+#define SPI_MEM_DEV_MAX_ID_LEN 6
+
/**
* struct spi_mem - describes a SPI memory device
* @spi: the underlying SPI device
* @drvpriv: spi_mem_driver private data
* @name: name of the SPI memory device
+ * @device_id: device id of the SPI memory device
*
* Extra information that describe the SPI memory device and may be needed by
* the controller to properly handle this device should be placed here.
@@ -197,6 +200,7 @@ struct spi_mem {
struct spi_device *spi;
void *drvpriv;
const char *name;
+ u8 device_id[SPI_MEM_DEV_MAX_ID_LEN];
};

/**
--
2.25.1


2023-02-07 06:10:11

by Potthuri, Sai Krishna

[permalink] [raw]
Subject: [PATCH 3/3] spi: cadence-quadspi: Add Rx tuning support for Xilinx Versal OSPI

Add Rx tuning support for Xilinx Versal Octal SPI Controller(OSPI).
Xilinx Versal Octal SPI controller requires Rx tuning to find out the
optimal tap value when running at higher clock in DTR protocol. As there
is no common Tuning Data Pattern defined across all vendors, Xilinx Versal
Octal SPI uses READ_ID based tuning algorithm for which this device_id
field in spi_mem will be used as a Tuning data pattern.
After enabling the DTR protocol, spi-nor sends READ ID command to verify
the DTR protocol switching is successfully done or not. If execute_tuning
is defined(not NULL) and clk_tuned is not set then Rx tuning algorithm
will be executed before it actually read the ID in DTR protocol.
Xilinx Versal tuning algorithm (as suggested by IP design team):
- Sweep all possible 127 taps to read the flash id for each tap.
- To consider a particular tap as valid make sure to get the correct id
for 10 times.
- Once valid window is found(window size >= 3), stop parsing the
remaining taps.
- Perform the above steps with extra dummy clock 1.
- Identify the maximum window size between extra dummy clock 0 and 1 and
configure the median of min and max valid tap of the corresponding window.
- If the maximum window size is with dummy clock 1 then all the subsequent
read operations should be performed with one extra dummy clock.

Signed-off-by: Sai Krishna Potthuri <[email protected]>
---
drivers/spi/spi-cadence-quadspi.c | 201 ++++++++++++++++++++++++++++++
1 file changed, 201 insertions(+)

diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index b55b763003f0..bc4395736def 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -93,6 +93,8 @@ struct cqspi_st {
bool slow_sram;
bool use_phy;
bool use_dqs;
+ bool clk_tuned;
+ u8 extra_dummy_clks;
};

struct cqspi_driver_platdata {
@@ -101,6 +103,7 @@ struct cqspi_driver_platdata {
int (*indirect_read_dma)(struct cqspi_flash_pdata *f_pdata,
u_char *rxbuf, loff_t from_addr, size_t n_rx);
u32 (*get_dma_status)(struct cqspi_st *cqspi);
+ int (*execute_tuning)(struct spi_mem *mem, const struct spi_mem_op *op);
};

/* Operation timeout value */
@@ -192,6 +195,7 @@ struct cqspi_driver_platdata {

#define CQSPI_REG_IRQSTATUS 0x40
#define CQSPI_REG_IRQMASK 0x44
+#define CQSPI_REG_VERSAL_ECO 0x48

#define CQSPI_REG_INDIRECTRD 0x60
#define CQSPI_REG_INDIRECTRD_START_MASK BIT(0)
@@ -238,6 +242,19 @@ struct cqspi_driver_platdata {
#define CQSPI_REG_POLLING_STATUS 0xB0
#define CQSPI_REG_POLLING_STATUS_DUMMY_LSB 16

+#define CQSPI_REG_VERSAL_PHY_CONFIG 0xB4
+#define CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK BIT(31)
+#define CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK BIT(30)
+#define CQSPI_REG_PHY_CONFIG_TX_DLL_DLY_LSB 16
+
+#define CQSPI_REG_VERSAL_PHY_MASTER_CTRL 0xB8
+#define CQSPI_REG_DLL_LOWER 0xBC
+#define CQSPI_REG_DLL_LOWER_LPBK_LOCK_MASK BIT(15)
+#define CQSPI_REG_DLL_LOWER_DLL_LOCK_MASK BIT(0)
+
+#define CQSPI_REG_VERSAL_DLL_OBSVBLE_UPPER 0xC0
+#define CQSPI_REG_DLL_UPPER_RX_FLD_MASK 0x7F
+
#define CQSPI_REG_OP_EXT_LOWER 0xE0
#define CQSPI_REG_OP_EXT_READ_LSB 24
#define CQSPI_REG_OP_EXT_WRITE_LSB 16
@@ -282,6 +299,12 @@ struct cqspi_driver_platdata {
#define CQSPI_DMA_UNALIGN 0x3

#define CQSPI_REG_VERSAL_DMA_VAL 0x602
+#define CQSPI_VERSAL_MAX_DLL_TAPS 127
+#define CQSPI_VERSAL_TX_TAP_MASTER 0x1E
+#define CQSPI_VERSAL_ECO_MIN_FREQ 120000000
+#define CQSPI_VERSAL_INIT_DLY 4
+#define CQSPI_VERSAL_RDID_MAX_CNT 10
+#define CQSPI_VERSAL_MIN_WINDOW_SZ 3

static int cqspi_wait_for_bit(void __iomem *reg, const u32 mask, bool clr)
{
@@ -540,6 +563,14 @@ static int cqspi_command_read(struct cqspi_flash_pdata *f_pdata,
if (dummy_clk > CQSPI_DUMMY_CLKS_MAX)
return -EOPNOTSUPP;

+ /*
+ * Based on the tuning, some controllers requires extra dummy clocks due to additional
+ * delays in read path. For such controllers add required extra dummy clocks for all
+ * subsequent read operations.
+ */
+ if (cqspi->extra_dummy_clks)
+ dummy_clk += cqspi->extra_dummy_clks;
+
if (dummy_clk)
reg |= (dummy_clk & CQSPI_REG_CMDCTRL_DUMMY_MASK)
<< CQSPI_REG_CMDCTRL_DUMMY_LSB;
@@ -662,6 +693,14 @@ static int cqspi_read_setup(struct cqspi_flash_pdata *f_pdata,
if (dummy_clk > CQSPI_DUMMY_CLKS_MAX)
return -EOPNOTSUPP;

+ /*
+ * Based on the tuning, some controllers requires extra dummy clocks due to additional
+ * delays in read path. For such controllers add required extra dummy clocks for all
+ * subsequent read operations.
+ */
+ if (cqspi->extra_dummy_clks)
+ dummy_clk += cqspi->extra_dummy_clks;
+
if (dummy_clk)
reg |= (dummy_clk & CQSPI_REG_RD_INSTR_DUMMY_MASK)
<< CQSPI_REG_RD_INSTR_DUMMY_LSB;
@@ -1056,6 +1095,157 @@ static int cqspi_indirect_write_execute(struct cqspi_flash_pdata *f_pdata,
return ret;
}

+static int cqspi_versal_execute_tuning(struct spi_mem *mem, const struct spi_mem_op *op)
+{
+ struct cqspi_st *cqspi = spi_master_get_devdata(mem->spi->master);
+ u8 dummy_incr, dummy_flag = 0, count, max_index, min_index;
+ u8 min_rxtap, max_rxtap, avg_index, max_tap, windowsize;
+ struct platform_device *pdev = cqspi->pdev;
+ struct cqspi_flash_pdata *f_pdata;
+ bool id_matched, rxtapfound;
+ u32 val, i, reg, txtap = 0;
+ s8 max_windowsize = -1;
+ int ret;
+
+ f_pdata = &cqspi->f_pdata[mem->spi->chip_select];
+
+ reg = readl(cqspi->iobase + CQSPI_REG_CONFIG);
+ reg &= CQSPI_REG_CONFIG_ENABLE_MASK;
+ if (!reg)
+ return 0;
+
+ if (cqspi->master_ref_clk_hz >= CQSPI_VERSAL_ECO_MIN_FREQ)
+ writel(1, cqspi->iobase + CQSPI_REG_VERSAL_ECO);
+
+ /* Drive DLL reset bit to low */
+ writel(0, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+
+ /* Set initial delay value */
+ writel(CQSPI_VERSAL_INIT_DLY, cqspi->iobase + CQSPI_REG_VERSAL_PHY_MASTER_CTRL);
+
+ /* Set DLL reset bit */
+ writel(CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK,
+ cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+
+ /* Check for loopback lock */
+ ret = cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_DLL_LOWER,
+ CQSPI_REG_DLL_LOWER_LPBK_LOCK_MASK, 0);
+ if (ret) {
+ dev_err(&pdev->dev, "Loopback lock bit error (%i)\n", ret);
+ return ret;
+ }
+
+ /* Re-synchronize slave DLLs */
+ writel(CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+ writel(CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK | CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK,
+ cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+
+ txtap = CQSPI_VERSAL_TX_TAP_MASTER << CQSPI_REG_PHY_CONFIG_TX_DLL_DLY_LSB;
+ max_tap = CQSPI_VERSAL_MAX_DLL_TAPS;
+
+ for (dummy_incr = 0; dummy_incr <= 1; dummy_incr++) {
+ /* Update the extra dummy clocks required to read the ID */
+ cqspi->extra_dummy_clks = dummy_incr;
+ rxtapfound = false;
+ min_rxtap = 0;
+ max_rxtap = 0;
+ max_index = 0;
+ min_index = 0;
+
+ for (i = 0; i <= max_tap; i++) {
+ val = txtap | i | CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK;
+ writel(val, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+ writel(val | CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK,
+ cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+ ret = cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_DLL_LOWER,
+ CQSPI_REG_DLL_LOWER_DLL_LOCK_MASK, 0);
+ if (ret)
+ return ret;
+
+ /*
+ * To consider the tap as valid, make sure to read the Flash ID
+ * 10 times(recommended by IP design team). This makes the algorithm more
+ * robust and gives more confidence about the valid tap, this will also
+ * avoid getting the correct id by a fluke with a particular tap.
+ * In any one of the iteration if the Flash ID is not matches with the
+ * device_id, then that particular tap should not be considered as valid.
+ */
+ count = 0;
+ do {
+ /* Execute the command and wait for the complete */
+ ret = cqspi_command_read(f_pdata, op);
+ if (ret < 0) {
+ dev_err(&pdev->dev,
+ "error %d reading JEDEC ID\n", ret);
+ return ret;
+ }
+
+ id_matched = true;
+ if (memcmp(mem->device_id, op->data.buf.in, op->data.nbytes)) {
+ id_matched = false;
+ break;
+ }
+ } while (id_matched && (count++ <= CQSPI_VERSAL_RDID_MAX_CNT));
+
+ if (id_matched) {
+ u8 current_rxtap = readl(cqspi->iobase +
+ CQSPI_REG_VERSAL_DLL_OBSVBLE_UPPER) &
+ CQSPI_REG_DLL_UPPER_RX_FLD_MASK;
+
+ if (!rxtapfound) {
+ min_rxtap = current_rxtap;
+ min_index = i;
+ rxtapfound = true;
+ }
+
+ max_rxtap = current_rxtap;
+ max_index = i;
+ }
+
+ if (!rxtapfound)
+ continue;
+
+ /* If ID doesn't match or reach the max tap, calculate the window size */
+ if (!id_matched || i == max_tap) {
+ windowsize = max_rxtap - min_rxtap + 1;
+ if (windowsize > max_windowsize) {
+ dummy_flag = dummy_incr;
+ max_windowsize = windowsize;
+ avg_index = (max_index + min_index) / 2;
+ }
+
+ rxtapfound = false;
+ if (windowsize >= CQSPI_VERSAL_MIN_WINDOW_SZ)
+ break;
+ }
+ }
+ }
+
+ /*
+ * Look for minimum window size to ensure reliable data sampling even during drifts due
+ * to PVT, so that a drift either towards left or right can be well addressed.
+ */
+ if (max_windowsize < CQSPI_VERSAL_MIN_WINDOW_SZ)
+ return -EINVAL;
+
+ /* Update the extra dummy clocks based on the maximum window size */
+ cqspi->extra_dummy_clks = dummy_flag;
+
+ val = txtap | avg_index | CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK;
+ writel(val, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+ writel((CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK | val),
+ cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
+
+ ret = cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_DLL_LOWER,
+ CQSPI_REG_DLL_LOWER_DLL_LOCK_MASK, 0);
+ if (ret)
+ return ret;
+
+ cqspi->clk_tuned = true;
+
+ return 0;
+}
+
static void cqspi_chipselect(struct cqspi_flash_pdata *f_pdata)
{
struct cqspi_st *cqspi = f_pdata->cqspi;
@@ -1345,11 +1535,21 @@ static ssize_t cqspi_read(struct cqspi_flash_pdata *f_pdata,
static int cqspi_mem_process(struct spi_mem *mem, const struct spi_mem_op *op)
{
struct cqspi_st *cqspi = spi_master_get_devdata(mem->spi->master);
+ const struct cqspi_driver_platdata *ddata;
+ struct device *dev = &cqspi->pdev->dev;
struct cqspi_flash_pdata *f_pdata;

+ ddata = of_device_get_match_data(dev);
f_pdata = &cqspi->f_pdata[mem->spi->chip_select];
cqspi_configure(f_pdata, mem->spi->max_speed_hz);

+ if (op->cmd.dtr && ddata && ddata->execute_tuning && !cqspi->clk_tuned) {
+ int ret = ddata->execute_tuning(mem, op);
+
+ if (ret)
+ return ret;
+ }
+
if (op->data.dir == SPI_MEM_DATA_IN && op->data.buf.in) {
if (!op->addr.nbytes)
return cqspi_command_read(f_pdata, op);
@@ -1846,6 +2046,7 @@ static const struct cqspi_driver_platdata versal_ospi = {
| CQSPI_HAS_PHY | CQSPI_HAS_DQS,
.indirect_read_dma = cqspi_versal_indirect_read_dma,
.get_dma_status = cqspi_get_versal_dma_status,
+ .execute_tuning = cqspi_versal_execute_tuning,
};

static const struct of_device_id cqspi_dt_ids[] = {
--
2.25.1


2023-02-07 06:48:13

by Tudor Ambarus

[permalink] [raw]
Subject: Re: [PATCH 0/3] spi: spi-cadence-quadspi: Add Rx tuning support for DTR mode



On 2/7/23 06:09, Sai Krishna Potthuri wrote:
> Enable PHY and DQS required for Xilinx Versal Octal SPI to operate in DTR
> protocol.
> Add and update device_id field in spi_mem structure with flash id
> information. Xilinx Versal Octal SPI driver requires the device id
> information to perform the Rx tuning operation. Since there is no common
> Tuning Data Pattern defined across all vendors, controllers like Xilinx
> Versal Octal SPI which requires Rx tuning to find out the optimal sampling
> point for data lines, this device id information will be used as a golden
> data.

Using only 6 bytes as golden pattern seems fragile, but you are aware of
that, as I see that you chose to read the ID 10 times to make the
decision whether the tap is valid or not. Other option (which is not
perfect) is to use SFDP data as golden pattern. If I remember
correctly, JESD216 suggests to use the Read SFDP cmd at 50 MHz, so it
won't help you much. In practice SPI NOR uses the Read SFDP command at
the flash's maximum speed and we haven't seen problems. But better would
be to use some flash OTP data maybe? I remember Pratyush has submitted a
phy calibration series in the past, I haven't had the chance to read his
proposal. Did you? How's your proposal different than his?

Cheers,
ta

> The reason behind choosing this approach instead of reading the ID again
> in the controller driver is to make it generic solution.
> - Other controller drivers which want to use similar tuning process, they
> will make use of this ID instead of reading the ID again in the driver.
> - Also, we can avoid hardcoding the command information and initiating the
> transfer in the controller driver as this should happen from spi-nor.
>
> Sai Krishna Potthuri (3):
> spi: cadence-quadspi: Add support for PHY module and DQS
> mtd: spi-nor: Add and update device_id field in spi_mem structure
> spi: cadence-quadspi: Add Rx tuning support for Xilinx Versal OSPI
>
> drivers/mtd/spi-nor/core.c | 1 +
> drivers/spi/spi-cadence-quadspi.c | 226 +++++++++++++++++++++++++++++-
> include/linux/spi/spi-mem.h | 4 +
> 3 files changed, 230 insertions(+), 1 deletion(-)
>

2023-02-07 07:23:30

by Tudor Ambarus

[permalink] [raw]
Subject: Re: [PATCH 3/3] spi: cadence-quadspi: Add Rx tuning support for Xilinx Versal OSPI



On 2/7/23 06:09, Sai Krishna Potthuri wrote:
> Add Rx tuning support for Xilinx Versal Octal SPI Controller(OSPI).
> Xilinx Versal Octal SPI controller requires Rx tuning to find out the
> optimal tap value when running at higher clock in DTR protocol. As there

Is the tunning required just for DTR? Is STR spared of the tunning
requirements? Because as of now, we use the maximum supported flash
speed from the begining and we don't change it.

An option is to start the NOR flash with min(max-flash-freq, 50 MHz), do
the read ID, SFDP read, golden pattern read, increase the freq to the
max-flash-freq, then do the calibration. The golden pattern must not
necessarily be a OTP-like data (ID, SFDP, OTP), we can reserve a portion
of the flash (non - OTP) for the golden pattern, can't we?

> is no common Tuning Data Pattern defined across all vendors, Xilinx Versal
> Octal SPI uses READ_ID based tuning algorithm for which this device_id
> field in spi_mem will be used as a Tuning data pattern.
> After enabling the DTR protocol, spi-nor sends READ ID command to verify
> the DTR protocol switching is successfully done or not. If execute_tuning
> is defined(not NULL) and clk_tuned is not set then Rx tuning algorithm

looks hackish, an explicit call to a phy-calibrate method would be
better IMO.

Cheers,
ta
> will be executed before it actually read the ID in DTR protocol.
> Xilinx Versal tuning algorithm (as suggested by IP design team):
> - Sweep all possible 127 taps to read the flash id for each tap.
> - To consider a particular tap as valid make sure to get the correct id
> for 10 times.
> - Once valid window is found(window size >= 3), stop parsing the
> remaining taps.
> - Perform the above steps with extra dummy clock 1.
> - Identify the maximum window size between extra dummy clock 0 and 1 and
> configure the median of min and max valid tap of the corresponding window.
> - If the maximum window size is with dummy clock 1 then all the subsequent
> read operations should be performed with one extra dummy clock.
>
> Signed-off-by: Sai Krishna Potthuri <[email protected]>
> ---
> drivers/spi/spi-cadence-quadspi.c | 201 ++++++++++++++++++++++++++++++
> 1 file changed, 201 insertions(+)
>
> diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
> index b55b763003f0..bc4395736def 100644
> --- a/drivers/spi/spi-cadence-quadspi.c
> +++ b/drivers/spi/spi-cadence-quadspi.c
> @@ -93,6 +93,8 @@ struct cqspi_st {
> bool slow_sram;
> bool use_phy;
> bool use_dqs;
> + bool clk_tuned;
> + u8 extra_dummy_clks;
> };
>
> struct cqspi_driver_platdata {
> @@ -101,6 +103,7 @@ struct cqspi_driver_platdata {
> int (*indirect_read_dma)(struct cqspi_flash_pdata *f_pdata,
> u_char *rxbuf, loff_t from_addr, size_t n_rx);
> u32 (*get_dma_status)(struct cqspi_st *cqspi);
> + int (*execute_tuning)(struct spi_mem *mem, const struct spi_mem_op *op);
> };
>
> /* Operation timeout value */
> @@ -192,6 +195,7 @@ struct cqspi_driver_platdata {
>
> #define CQSPI_REG_IRQSTATUS 0x40
> #define CQSPI_REG_IRQMASK 0x44
> +#define CQSPI_REG_VERSAL_ECO 0x48
>
> #define CQSPI_REG_INDIRECTRD 0x60
> #define CQSPI_REG_INDIRECTRD_START_MASK BIT(0)
> @@ -238,6 +242,19 @@ struct cqspi_driver_platdata {
> #define CQSPI_REG_POLLING_STATUS 0xB0
> #define CQSPI_REG_POLLING_STATUS_DUMMY_LSB 16
>
> +#define CQSPI_REG_VERSAL_PHY_CONFIG 0xB4
> +#define CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK BIT(31)
> +#define CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK BIT(30)
> +#define CQSPI_REG_PHY_CONFIG_TX_DLL_DLY_LSB 16
> +
> +#define CQSPI_REG_VERSAL_PHY_MASTER_CTRL 0xB8
> +#define CQSPI_REG_DLL_LOWER 0xBC
> +#define CQSPI_REG_DLL_LOWER_LPBK_LOCK_MASK BIT(15)
> +#define CQSPI_REG_DLL_LOWER_DLL_LOCK_MASK BIT(0)
> +
> +#define CQSPI_REG_VERSAL_DLL_OBSVBLE_UPPER 0xC0
> +#define CQSPI_REG_DLL_UPPER_RX_FLD_MASK 0x7F
> +
> #define CQSPI_REG_OP_EXT_LOWER 0xE0
> #define CQSPI_REG_OP_EXT_READ_LSB 24
> #define CQSPI_REG_OP_EXT_WRITE_LSB 16
> @@ -282,6 +299,12 @@ struct cqspi_driver_platdata {
> #define CQSPI_DMA_UNALIGN 0x3
>
> #define CQSPI_REG_VERSAL_DMA_VAL 0x602
> +#define CQSPI_VERSAL_MAX_DLL_TAPS 127
> +#define CQSPI_VERSAL_TX_TAP_MASTER 0x1E
> +#define CQSPI_VERSAL_ECO_MIN_FREQ 120000000
> +#define CQSPI_VERSAL_INIT_DLY 4
> +#define CQSPI_VERSAL_RDID_MAX_CNT 10
> +#define CQSPI_VERSAL_MIN_WINDOW_SZ 3
>
> static int cqspi_wait_for_bit(void __iomem *reg, const u32 mask, bool clr)
> {
> @@ -540,6 +563,14 @@ static int cqspi_command_read(struct cqspi_flash_pdata *f_pdata,
> if (dummy_clk > CQSPI_DUMMY_CLKS_MAX)
> return -EOPNOTSUPP;
>
> + /*
> + * Based on the tuning, some controllers requires extra dummy clocks due to additional
> + * delays in read path. For such controllers add required extra dummy clocks for all
> + * subsequent read operations.
> + */
> + if (cqspi->extra_dummy_clks)
> + dummy_clk += cqspi->extra_dummy_clks;
> +
> if (dummy_clk)
> reg |= (dummy_clk & CQSPI_REG_CMDCTRL_DUMMY_MASK)
> << CQSPI_REG_CMDCTRL_DUMMY_LSB;
> @@ -662,6 +693,14 @@ static int cqspi_read_setup(struct cqspi_flash_pdata *f_pdata,
> if (dummy_clk > CQSPI_DUMMY_CLKS_MAX)
> return -EOPNOTSUPP;
>
> + /*
> + * Based on the tuning, some controllers requires extra dummy clocks due to additional
> + * delays in read path. For such controllers add required extra dummy clocks for all
> + * subsequent read operations.
> + */
> + if (cqspi->extra_dummy_clks)
> + dummy_clk += cqspi->extra_dummy_clks;
> +
> if (dummy_clk)
> reg |= (dummy_clk & CQSPI_REG_RD_INSTR_DUMMY_MASK)
> << CQSPI_REG_RD_INSTR_DUMMY_LSB;
> @@ -1056,6 +1095,157 @@ static int cqspi_indirect_write_execute(struct cqspi_flash_pdata *f_pdata,
> return ret;
> }
>
> +static int cqspi_versal_execute_tuning(struct spi_mem *mem, const struct spi_mem_op *op)
> +{
> + struct cqspi_st *cqspi = spi_master_get_devdata(mem->spi->master);
> + u8 dummy_incr, dummy_flag = 0, count, max_index, min_index;
> + u8 min_rxtap, max_rxtap, avg_index, max_tap, windowsize;
> + struct platform_device *pdev = cqspi->pdev;
> + struct cqspi_flash_pdata *f_pdata;
> + bool id_matched, rxtapfound;
> + u32 val, i, reg, txtap = 0;
> + s8 max_windowsize = -1;
> + int ret;
> +
> + f_pdata = &cqspi->f_pdata[mem->spi->chip_select];
> +
> + reg = readl(cqspi->iobase + CQSPI_REG_CONFIG);
> + reg &= CQSPI_REG_CONFIG_ENABLE_MASK;
> + if (!reg)
> + return 0;
> +
> + if (cqspi->master_ref_clk_hz >= CQSPI_VERSAL_ECO_MIN_FREQ)
> + writel(1, cqspi->iobase + CQSPI_REG_VERSAL_ECO);
> +
> + /* Drive DLL reset bit to low */
> + writel(0, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> +
> + /* Set initial delay value */
> + writel(CQSPI_VERSAL_INIT_DLY, cqspi->iobase + CQSPI_REG_VERSAL_PHY_MASTER_CTRL);
> +
> + /* Set DLL reset bit */
> + writel(CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK,
> + cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> +
> + /* Check for loopback lock */
> + ret = cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_DLL_LOWER,
> + CQSPI_REG_DLL_LOWER_LPBK_LOCK_MASK, 0);
> + if (ret) {
> + dev_err(&pdev->dev, "Loopback lock bit error (%i)\n", ret);
> + return ret;
> + }
> +
> + /* Re-synchronize slave DLLs */
> + writel(CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> + writel(CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK | CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK,
> + cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> +
> + txtap = CQSPI_VERSAL_TX_TAP_MASTER << CQSPI_REG_PHY_CONFIG_TX_DLL_DLY_LSB;
> + max_tap = CQSPI_VERSAL_MAX_DLL_TAPS;
> +
> + for (dummy_incr = 0; dummy_incr <= 1; dummy_incr++) {
> + /* Update the extra dummy clocks required to read the ID */
> + cqspi->extra_dummy_clks = dummy_incr;
> + rxtapfound = false;
> + min_rxtap = 0;
> + max_rxtap = 0;
> + max_index = 0;
> + min_index = 0;
> +
> + for (i = 0; i <= max_tap; i++) {
> + val = txtap | i | CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK;
> + writel(val, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> + writel(val | CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK,
> + cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> + ret = cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_DLL_LOWER,
> + CQSPI_REG_DLL_LOWER_DLL_LOCK_MASK, 0);
> + if (ret)
> + return ret;
> +
> + /*
> + * To consider the tap as valid, make sure to read the Flash ID
> + * 10 times(recommended by IP design team). This makes the algorithm more
> + * robust and gives more confidence about the valid tap, this will also
> + * avoid getting the correct id by a fluke with a particular tap.
> + * In any one of the iteration if the Flash ID is not matches with the
> + * device_id, then that particular tap should not be considered as valid.
> + */
> + count = 0;
> + do {
> + /* Execute the command and wait for the complete */
> + ret = cqspi_command_read(f_pdata, op);
> + if (ret < 0) {
> + dev_err(&pdev->dev,
> + "error %d reading JEDEC ID\n", ret);
> + return ret;
> + }
> +
> + id_matched = true;
> + if (memcmp(mem->device_id, op->data.buf.in, op->data.nbytes)) {
> + id_matched = false;
> + break;
> + }
> + } while (id_matched && (count++ <= CQSPI_VERSAL_RDID_MAX_CNT));
> +
> + if (id_matched) {
> + u8 current_rxtap = readl(cqspi->iobase +
> + CQSPI_REG_VERSAL_DLL_OBSVBLE_UPPER) &
> + CQSPI_REG_DLL_UPPER_RX_FLD_MASK;
> +
> + if (!rxtapfound) {
> + min_rxtap = current_rxtap;
> + min_index = i;
> + rxtapfound = true;
> + }
> +
> + max_rxtap = current_rxtap;
> + max_index = i;
> + }
> +
> + if (!rxtapfound)
> + continue;
> +
> + /* If ID doesn't match or reach the max tap, calculate the window size */
> + if (!id_matched || i == max_tap) {
> + windowsize = max_rxtap - min_rxtap + 1;
> + if (windowsize > max_windowsize) {
> + dummy_flag = dummy_incr;
> + max_windowsize = windowsize;
> + avg_index = (max_index + min_index) / 2;
> + }
> +
> + rxtapfound = false;
> + if (windowsize >= CQSPI_VERSAL_MIN_WINDOW_SZ)
> + break;
> + }
> + }
> + }
> +
> + /*
> + * Look for minimum window size to ensure reliable data sampling even during drifts due
> + * to PVT, so that a drift either towards left or right can be well addressed.
> + */
> + if (max_windowsize < CQSPI_VERSAL_MIN_WINDOW_SZ)
> + return -EINVAL;
> +
> + /* Update the extra dummy clocks based on the maximum window size */
> + cqspi->extra_dummy_clks = dummy_flag;
> +
> + val = txtap | avg_index | CQSPI_REG_PHY_CONFIG_RESET_FLD_MASK;
> + writel(val, cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> + writel((CQSPI_REG_PHY_CONFIG_RESYNC_FLD_MASK | val),
> + cqspi->iobase + CQSPI_REG_VERSAL_PHY_CONFIG);
> +
> + ret = cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_DLL_LOWER,
> + CQSPI_REG_DLL_LOWER_DLL_LOCK_MASK, 0);
> + if (ret)
> + return ret;
> +
> + cqspi->clk_tuned = true;
> +
> + return 0;
> +}
> +
> static void cqspi_chipselect(struct cqspi_flash_pdata *f_pdata)
> {
> struct cqspi_st *cqspi = f_pdata->cqspi;
> @@ -1345,11 +1535,21 @@ static ssize_t cqspi_read(struct cqspi_flash_pdata *f_pdata,
> static int cqspi_mem_process(struct spi_mem *mem, const struct spi_mem_op *op)
> {
> struct cqspi_st *cqspi = spi_master_get_devdata(mem->spi->master);
> + const struct cqspi_driver_platdata *ddata;
> + struct device *dev = &cqspi->pdev->dev;
> struct cqspi_flash_pdata *f_pdata;
>
> + ddata = of_device_get_match_data(dev);
> f_pdata = &cqspi->f_pdata[mem->spi->chip_select];
> cqspi_configure(f_pdata, mem->spi->max_speed_hz);
>
> + if (op->cmd.dtr && ddata && ddata->execute_tuning && !cqspi->clk_tuned) {
> + int ret = ddata->execute_tuning(mem, op);
> +
> + if (ret)
> + return ret;
> + }
> +
> if (op->data.dir == SPI_MEM_DATA_IN && op->data.buf.in) {
> if (!op->addr.nbytes)
> return cqspi_command_read(f_pdata, op);
> @@ -1846,6 +2046,7 @@ static const struct cqspi_driver_platdata versal_ospi = {
> | CQSPI_HAS_PHY | CQSPI_HAS_DQS,
> .indirect_read_dma = cqspi_versal_indirect_read_dma,
> .get_dma_status = cqspi_get_versal_dma_status,
> + .execute_tuning = cqspi_versal_execute_tuning,
> };
>
> static const struct of_device_id cqspi_dt_ids[] = {

2023-07-18 10:10:58

by Michael Walle

[permalink] [raw]
Subject: Re: [PATCH 0/3] spi: spi-cadence-quadspi: Add Rx tuning support for DTR mode

Am 2023-02-07 07:48, schrieb Tudor Ambarus:
> On 2/7/23 06:09, Sai Krishna Potthuri wrote:
>> Enable PHY and DQS required for Xilinx Versal Octal SPI to operate in
>> DTR
>> protocol.
>> Add and update device_id field in spi_mem structure with flash id
>> information. Xilinx Versal Octal SPI driver requires the device id
>> information to perform the Rx tuning operation. Since there is no
>> common
>> Tuning Data Pattern defined across all vendors, controllers like
>> Xilinx
>> Versal Octal SPI which requires Rx tuning to find out the optimal
>> sampling
>> point for data lines, this device id information will be used as a
>> golden
>> data.
>
> Using only 6 bytes as golden pattern seems fragile, but you are aware
> of
> that, as I see that you chose to read the ID 10 times to make the
> decision whether the tap is valid or not. Other option (which is not
> perfect) is to use SFDP data as golden pattern. If I remember
> correctly, JESD216 suggests to use the Read SFDP cmd at 50 MHz, so it
> won't help you much. In practice SPI NOR uses the Read SFDP command at
> the flash's maximum speed and we haven't seen problems. But better
> would
> be to use some flash OTP data maybe? I remember Pratyush has submitted
> a
> phy calibration series in the past, I haven't had the chance to read
> his
> proposal. Did you? How's your proposal different than his?

And its not 6 bytes.. it's usually only three. The last three bytes will
probably be undefined. So the might return ff or just wrap around and
return the first three bytes again.

Is there a datasheet where you can read how the calibration is done? Is
this
the same for all i/o pads or individual per i/o pad?

I cannot see where the op to read the id is coming from. Are you relying
on the fact that a RDID is the first command which gets executed. If so,
please don't.

Do you calibrate only one pad? RDID (9f) is single bit i/o, right? And
I guess you are calibrating with the highest frequency, are
we sure that RDID will work with any frequency (on any flash).


>> The reason behind choosing this approach instead of reading the ID
>> again
>> in the controller driver is to make it generic solution.
>> - Other controller drivers which want to use similar tuning process,
>> they
>> will make use of this ID instead of reading the ID again in the
>> driver.

Honestly, I'm not sure this is the way to go. Pratyush proposed solution
to have a dedicated memory area within the flash array with a know
pattern
seems to make more sense, because you are calibrating on the thing you
are going
to use later, that is quad/ocal read with the fastest frequency.

>> - Also, we can avoid hardcoding the command information and initiating
>> the
>> transfer in the controller driver as this should happen from spi-nor.

So how you know that this is a RDID instruction?

-michael

2023-07-18 11:42:19

by Potthuri, Sai Krishna

[permalink] [raw]
Subject: RE: [PATCH 0/3] spi: spi-cadence-quadspi: Add Rx tuning support for DTR mode

Hi Michael,

> -----Original Message-----
> From: Michael Walle <[email protected]>
> Sent: Tuesday, July 18, 2023 3:31 PM
> To: Tudor Ambarus <[email protected]>
> Cc: Potthuri, Sai Krishna <[email protected]>; Mark Brown
> <[email protected]>; Tudor Ambarus <[email protected]>;
> Pratyush Yadav <[email protected]>; Miquel Raynal
> <[email protected]>; Richard Weinberger <[email protected]>; Vignesh
> Raghavendra <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> git (AMD-Xilinx) <[email protected]>
> Subject: Re: [PATCH 0/3] spi: spi-cadence-quadspi: Add Rx tuning support for DTR
> mode
>
> Am 2023-02-07 07:48, schrieb Tudor Ambarus:
> > On 2/7/23 06:09, Sai Krishna Potthuri wrote:
> >> Enable PHY and DQS required for Xilinx Versal Octal SPI to operate in
> >> DTR protocol.
> >> Add and update device_id field in spi_mem structure with flash id
> >> information. Xilinx Versal Octal SPI driver requires the device id
> >> information to perform the Rx tuning operation. Since there is no
> >> common Tuning Data Pattern defined across all vendors, controllers
> >> like Xilinx Versal Octal SPI which requires Rx tuning to find out the
> >> optimal sampling point for data lines, this device id information
> >> will be used as a golden data.
> >
> > Using only 6 bytes as golden pattern seems fragile, but you are aware
> > of that, as I see that you chose to read the ID 10 times to make the
> > decision whether the tap is valid or not. Other option (which is not
> > perfect) is to use SFDP data as golden pattern. If I remember
> > correctly, JESD216 suggests to use the Read SFDP cmd at 50 MHz, so it
> > won't help you much. In practice SPI NOR uses the Read SFDP command at
> > the flash's maximum speed and we haven't seen problems. But better
> > would be to use some flash OTP data maybe? I remember Pratyush has
> > submitted a phy calibration series in the past, I haven't had the
> > chance to read his proposal. Did you? How's your proposal different
> > than his?
>
> And its not 6 bytes.. it's usually only three. The last three bytes will probably be
> undefined. So the might return ff or just wrap around and return the first three
> bytes again.
>
> Is there a datasheet where you can read how the calibration is done? Is this the
> same for all i/o pads or individual per i/o pad?
>
> I cannot see where the op to read the id is coming from. Are you relying on the
> fact that a RDID is the first command which gets executed. If so, please don't.
>
> Do you calibrate only one pad? RDID (9f) is single bit i/o, right? And I guess you
> are calibrating with the highest frequency, are we sure that RDID will work with
> any frequency (on any flash).
>
>
> >> The reason behind choosing this approach instead of reading the ID
> >> again in the controller driver is to make it generic solution.
> >> - Other controller drivers which want to use similar tuning process,
> >> they will make use of this ID instead of reading the ID again in the
> >> driver.
>
> Honestly, I'm not sure this is the way to go. Pratyush proposed solution to have a
> dedicated memory area within the flash array with a know pattern seems to
> make more sense, because you are calibrating on the thing you are going to use
> later, that is quad/ocal read with the fastest frequency.
Yes, I am going through his proposal and working on it to adapt that solution.
Agree, this approach makes more sense than RDID also, with this we can avoid the
cons with RDID approach like small pattern, calibrating one line, frequency limitations
with RDID command(if any) etc.,

>
> >> - Also, we can avoid hardcoding the command information and
> >> initiating the transfer in the controller driver as this should
> >> happen from spi-nor.
>
> So how you know that this is a RDID instruction?
Yes, I am expecting the first command after switching to DDR mode is RDID.
Agree, we cannot rely on this.

Regards
Sai Krishna