2017-06-24 06:37:39

by Finn Thain

[permalink] [raw]
Subject: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

Ondrej, would you please test this new series?

Changed since v1:
- PDMA transfer residual is calculated earlier.
- End of DMA flag check is now polled (if there is any residual).


Finn Thain (2):
g_NCR5380: Limit sg_tablesize to avoid PDMA read overruns on DTC436
g_NCR5380: Cleanup comments and whitespace

Ondrej Zary (3):
g_NCR5380: Fix PDMA transfer size
g_NCR5380: End PDMA transfer correctly on target disconnection
g_NCR5380: Re-work PDMA loops

drivers/scsi/g_NCR5380.c | 231 ++++++++++++++++++++++-------------------------
1 file changed, 107 insertions(+), 124 deletions(-)

--
2.13.0


2017-06-24 06:37:40

by Finn Thain

[permalink] [raw]
Subject: [PATCH v2 1/5] g_NCR5380: Fix PDMA transfer size

From: Ondrej Zary <[email protected]>

generic_NCR5380_dma_xfer_len() incorrectly uses cmd->transfersize
which causes rescan-scsi-bus and CD-ROM access to hang the system.
Use cmd->SCp.this_residual instead, like other NCR5380 drivers.

Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
---
drivers/scsi/g_NCR5380.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 67c8dac321ad..14ef4e8c4713 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -76,6 +76,7 @@
#define IRQ_AUTO 254

#define MAX_CARDS 8
+#define DMA_MAX_SIZE 32768

/* old-style parameters for compatibility */
static int ncr_irq = -1;
@@ -629,23 +630,16 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata,
struct scsi_cmnd *cmd)
{
- int transfersize = cmd->transfersize;
+ int transfersize = cmd->SCp.this_residual;

if (hostdata->flags & FLAG_NO_PSEUDO_DMA)
return 0;

- /* Limit transfers to 32K, for xx400 & xx406
- * pseudoDMA that transfers in 128 bytes blocks.
- */
- if (transfersize > 32 * 1024 && cmd->SCp.this_residual &&
- !(cmd->SCp.this_residual % transfersize))
- transfersize = 32 * 1024;
-
/* 53C400 datasheet: non-modulo-128-byte transfers should use PIO */
if (transfersize % 128)
transfersize = 0;

- return transfersize;
+ return min(transfersize, DMA_MAX_SIZE);
}

/*
--
2.13.0

2017-06-24 06:37:42

by Finn Thain

[permalink] [raw]
Subject: [PATCH v2 3/5] g_NCR5380: Limit sg_tablesize to avoid PDMA read overruns on DTC436

Back-to-back DMA receive transfers can lose a byte due to a 5380
flaw. This makes scatter-receive difficult or impossible on affected
hardware, so limit the scatter/gather tablesize to 1.

Signed-off-by: Finn Thain <[email protected]>
---
drivers/scsi/g_NCR5380.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 911a4300ea51..7b3626fe120b 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -247,6 +247,7 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt,
case BOARD_DTC3181E:
ports = dtc_3181e_ports;
magic = ncr_53c400a_magic;
+ tpnt->sg_tablesize = 1;
break;
}

--
2.13.0

2017-06-24 06:37:41

by Finn Thain

[permalink] [raw]
Subject: [PATCH v2 4/5] g_NCR5380: Cleanup comments and whitespace

Signed-off-by: Finn Thain <[email protected]>
---
drivers/scsi/g_NCR5380.c | 61 ++++++++++++++++++++++--------------------------
1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 7b3626fe120b..d51a2ba07a24 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -1,17 +1,17 @@
/*
* Generic Generic NCR5380 driver
- *
+ *
* Copyright 1993, Drew Eckhardt
- * Visionary Computing
- * (Unix and Linux consulting and custom programming)
- * [email protected]
- * +1 (303) 440-4894
+ * Visionary Computing
+ * (Unix and Linux consulting and custom programming)
+ * [email protected]
+ * +1 (303) 440-4894
*
* NCR53C400 extensions (c) 1994,1995,1996, Kevin Lentin
- * [email protected]
+ * [email protected]
*
* NCR53C400A extensions (c) 1996, Ingmar Baumgart
- * [email protected]
+ * [email protected]
*
* DTC3181E extensions (c) 1997, Ronald van Cuijlenborg
* [email protected] or [email protected]
@@ -482,15 +482,14 @@ static void generic_NCR5380_release_resources(struct Scsi_Host *instance)
}

/**
- * generic_NCR5380_pread - pseudo DMA read
- * @hostdata: scsi host private data
- * @dst: buffer to read into
- * @len: buffer length
+ * generic_NCR5380_pread - pseudo DMA receive
+ * @hostdata: scsi host private data
+ * @dst: buffer to write into
+ * @len: transfer size
*
- * Perform a pseudo DMA mode read from an NCR53C400 or equivalent
- * controller
+ * Perform a pseudo DMA mode receive from a 53C400 or equivalent device.
*/
-
+
static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
unsigned char *dst, int len)
{
@@ -509,10 +508,10 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,

if (hostdata->io_port && hostdata->io_width == 2)
insw(hostdata->io_port + hostdata->c400_host_buf,
- dst + start, 64);
+ dst + start, 64);
else if (hostdata->io_port)
insb(hostdata->io_port + hostdata->c400_host_buf,
- dst + start, 128);
+ dst + start, 128);
else
memcpy_fromio(dst + start,
hostdata->io + NCR53C400_host_buffer, 128);
@@ -559,13 +558,12 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
}

/**
- * generic_NCR5380_pwrite - pseudo DMA write
- * @hostdata: scsi host private data
- * @dst: buffer to read into
- * @len: buffer length
+ * generic_NCR5380_pwrite - pseudo DMA send
+ * @hostdata: scsi host private data
+ * @src: buffer to read from
+ * @len: transfer size
*
- * Perform a pseudo DMA mode read from an NCR53C400 or equivalent
- * controller
+ * Perform a pseudo DMA mode send to a 53C400 or equivalent device.
*/

static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
@@ -604,10 +602,10 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,

if (hostdata->io_port && hostdata->io_width == 2)
outsw(hostdata->io_port + hostdata->c400_host_buf,
- src + start, 64);
+ src + start, 64);
else if (hostdata->io_port)
outsb(hostdata->io_port + hostdata->c400_host_buf,
- src + start, 128);
+ src + start, 128);
else
memcpy_toio(hostdata->io + NCR53C400_host_buffer,
src + start, 128);
@@ -657,10 +655,8 @@ static int generic_NCR5380_dma_residual(struct NCR5380_hostdata *hostdata)
return hostdata->pdma_residual;
}

-/*
- * Include the NCR5380 core code that we build our driver around
- */
-
+/* Include the core driver code. */
+
#include "NCR5380.c"

static struct scsi_host_template driver_template = {
@@ -680,11 +676,10 @@ static struct scsi_host_template driver_template = {
.max_sectors = 128,
};

-
static int generic_NCR5380_isa_match(struct device *pdev, unsigned int ndev)
{
int ret = generic_NCR5380_init_one(&driver_template, pdev, base[ndev],
- irq[ndev], card[ndev]);
+ irq[ndev], card[ndev]);
if (ret) {
if (base[ndev])
printk(KERN_WARNING "Card not found at address 0x%03x\n",
@@ -696,7 +691,7 @@ static int generic_NCR5380_isa_match(struct device *pdev, unsigned int ndev)
}

static int generic_NCR5380_isa_remove(struct device *pdev,
- unsigned int ndev)
+ unsigned int ndev)
{
generic_NCR5380_release_resources(dev_get_drvdata(pdev));
dev_set_drvdata(pdev, NULL);
@@ -719,7 +714,7 @@ static struct pnp_device_id generic_NCR5380_pnp_ids[] = {
MODULE_DEVICE_TABLE(pnp, generic_NCR5380_pnp_ids);

static int generic_NCR5380_pnp_probe(struct pnp_dev *pdev,
- const struct pnp_device_id *id)
+ const struct pnp_device_id *id)
{
int base, irq;

@@ -730,7 +725,7 @@ static int generic_NCR5380_pnp_probe(struct pnp_dev *pdev,
irq = pnp_irq(pdev, 0);

return generic_NCR5380_init_one(&driver_template, &pdev->dev, base, irq,
- id->driver_data);
+ id->driver_data);
}

static void generic_NCR5380_pnp_remove(struct pnp_dev *pdev)
--
2.13.0

2017-06-24 06:38:22

by Finn Thain

[permalink] [raw]
Subject: [PATCH v2 5/5] g_NCR5380: Re-work PDMA loops

From: Ondrej Zary <[email protected]>

The polling loops in pread() and pwrite() can easily become infinite
loops and hang the machine.

On DTC chips, IRQ can arrive late and we miss it because we only check
once. Merge the IRQ check into host buffer wait and add polling limit.

Also place a limit on polling for 53C80 registers accessibility.

[Use poll_politely2() for register polling. Rely on polling for IRQ
rather than polling for bus status, like the algorithm in the datasheet.
Calculate residual from block count register instead of the loop count.
Factor-out common code as wait_for_53c80_access(). -- F.T.]

Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
---
drivers/scsi/g_NCR5380.c | 149 ++++++++++++++++++++---------------------------
1 file changed, 64 insertions(+), 85 deletions(-)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index d51a2ba07a24..2546e6076a90 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -481,6 +481,30 @@ static void generic_NCR5380_release_resources(struct Scsi_Host *instance)
release_mem_region(base, region_size);
}

+/* wait_for_53c80_access - wait for 53C80 registers to become accessible
+ * @hostdata: scsi host private data
+ *
+ * The registers within the 53C80 logic block are inaccessible until
+ * bit 7 in the 53C400 control status register gets asserted.
+ */
+
+static int wait_for_53c80_access(struct NCR5380_hostdata *hostdata)
+{
+ int count = 10000;
+
+ do {
+ udelay(4); /* DTC436 chip hangs without this */
+ if (NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG)
+ return 0;
+ } while (--count > 0);
+
+ scmd_printk(KERN_ERR, hostdata->connected,
+ "53c80 registers not accessible, device will be reset\n");
+ NCR5380_write(hostdata->c400_ctl_status, CSR_RESET);
+ NCR5380_write(hostdata->c400_ctl_status, CSR_BASE);
+ return -1;
+}
+
/**
* generic_NCR5380_pread - pseudo DMA receive
* @hostdata: scsi host private data
@@ -493,18 +517,19 @@ static void generic_NCR5380_release_resources(struct Scsi_Host *instance)
static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
unsigned char *dst, int len)
{
- int blocks = len / 128;
- int start = 0;
+ int result;
+ int start;

NCR5380_write(hostdata->c400_ctl_status, CSR_BASE | CSR_TRANS_DIR);
- NCR5380_write(hostdata->c400_blk_cnt, blocks);
- while (1) {
- if (NCR5380_read(hostdata->c400_blk_cnt) == 0)
+ NCR5380_write(hostdata->c400_blk_cnt, len / 128);
+
+ for (start = 0; start < len; start += 128) {
+ if (NCR5380_poll_politely2(hostdata, hostdata->c400_ctl_status,
+ CSR_HOST_BUF_NOT_RDY, 0,
+ hostdata->c400_ctl_status,
+ CSR_GATED_53C80_IRQ,
+ CSR_GATED_53C80_IRQ, HZ / 64) < 0)
break;
- if (NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ)
- goto out_wait;
- while (NCR5380_read(hostdata->c400_ctl_status) & CSR_HOST_BUF_NOT_RDY)
- ; /* FIXME - no timeout */

if (hostdata->io_port && hostdata->io_width == 2)
insw(hostdata->io_port + hostdata->c400_host_buf,
@@ -515,46 +540,20 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
else
memcpy_fromio(dst + start,
hostdata->io + NCR53C400_host_buffer, 128);
-
- start += 128;
- blocks--;
- }
-
- if (blocks) {
- while (NCR5380_read(hostdata->c400_ctl_status) & CSR_HOST_BUF_NOT_RDY)
- ; /* FIXME - no timeout */
-
- if (hostdata->io_port && hostdata->io_width == 2)
- insw(hostdata->io_port + hostdata->c400_host_buf,
- dst + start, 64);
- else if (hostdata->io_port)
- insb(hostdata->io_port + hostdata->c400_host_buf,
- dst + start, 128);
- else
- memcpy_fromio(dst + start,
- hostdata->io + NCR53C400_host_buffer, 128);
-
- start += 128;
- blocks--;
}

- if (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ))
- printk("53C400r: no 53C80 gated irq after transfer");
-
-out_wait:
- hostdata->pdma_residual = len - start;
+ hostdata->pdma_residual = NCR5380_read(hostdata->c400_blk_cnt) * 128;

- /* wait for 53C80 registers to be available */
- while (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG))
- ;
+ result = wait_for_53c80_access(hostdata);

- if (NCR5380_poll_politely(hostdata, BUS_AND_STATUS_REG,
+ if (hostdata->pdma_residual == 0 &&
+ NCR5380_poll_politely(hostdata, BUS_AND_STATUS_REG,
BASR_END_DMA_TRANSFER, BASR_END_DMA_TRANSFER,
HZ / 64) < 0)
scmd_printk(KERN_ERR, hostdata->connected, "%s: End of DMA timeout (%d)\n",
__func__, hostdata->pdma_residual);

- return 0;
+ return result;
}

/**
@@ -569,36 +568,19 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
unsigned char *src, int len)
{
- int blocks = len / 128;
- int start = 0;
+ int result;
+ int start;

NCR5380_write(hostdata->c400_ctl_status, CSR_BASE);
- NCR5380_write(hostdata->c400_blk_cnt, blocks);
- while (1) {
- if (NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ)
- goto out_wait;
-
- if (NCR5380_read(hostdata->c400_blk_cnt) == 0)
+ NCR5380_write(hostdata->c400_blk_cnt, len / 128);
+
+ for (start = 0; start < len; start += 128) {
+ if (NCR5380_poll_politely2(hostdata, hostdata->c400_ctl_status,
+ CSR_HOST_BUF_NOT_RDY, 0,
+ hostdata->c400_ctl_status,
+ CSR_GATED_53C80_IRQ,
+ CSR_GATED_53C80_IRQ, HZ / 64) < 0)
break;
- while (NCR5380_read(hostdata->c400_ctl_status) & CSR_HOST_BUF_NOT_RDY)
- ; // FIXME - timeout
-
- if (hostdata->io_port && hostdata->io_width == 2)
- outsw(hostdata->io_port + hostdata->c400_host_buf,
- src + start, 64);
- else if (hostdata->io_port)
- outsb(hostdata->io_port + hostdata->c400_host_buf,
- src + start, 128);
- else
- memcpy_toio(hostdata->io + NCR53C400_host_buffer,
- src + start, 128);
-
- start += 128;
- blocks--;
- }
- if (blocks) {
- while (NCR5380_read(hostdata->c400_ctl_status) & CSR_HOST_BUF_NOT_RDY)
- ; // FIXME - no timeout

if (hostdata->io_port && hostdata->io_width == 2)
outsw(hostdata->io_port + hostdata->c400_host_buf,
@@ -609,30 +591,27 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
else
memcpy_toio(hostdata->io + NCR53C400_host_buffer,
src + start, 128);
-
- start += 128;
- blocks--;
}

-out_wait:
- hostdata->pdma_residual = len - start;
+ hostdata->pdma_residual = NCR5380_read(hostdata->c400_blk_cnt) * 128;

- /* wait for 53C80 registers to be available */
- while (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG)) {
- udelay(4); /* DTC436 chip hangs without this */
- /* FIXME - no timeout */
- }
+ result = wait_for_53c80_access(hostdata);

- while (!(NCR5380_read(TARGET_COMMAND_REG) & TCR_LAST_BYTE_SENT))
- ; // TIMEOUT
+ if (hostdata->pdma_residual == 0) {
+ if (NCR5380_poll_politely(hostdata, TARGET_COMMAND_REG,
+ TCR_LAST_BYTE_SENT, TCR_LAST_BYTE_SENT,
+ HZ / 64) < 0)
+ scmd_printk(KERN_ERR, hostdata->connected,
+ "%s: Last Byte Sent timeout\n", __func__);

- if (NCR5380_poll_politely(hostdata, BUS_AND_STATUS_REG,
- BASR_END_DMA_TRANSFER, BASR_END_DMA_TRANSFER,
- HZ / 64) < 0)
- scmd_printk(KERN_ERR, hostdata->connected, "%s: End of DMA timeout (%d)\n",
- __func__, hostdata->pdma_residual);
+ if (NCR5380_poll_politely(hostdata, BUS_AND_STATUS_REG,
+ BASR_END_DMA_TRANSFER, BASR_END_DMA_TRANSFER,
+ HZ / 64) < 0)
+ scmd_printk(KERN_ERR, hostdata->connected, "%s: End of DMA timeout (%d)\n",
+ __func__, hostdata->pdma_residual);
+ }

- return 0;
+ return result;
}

static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata,
--
2.13.0

2017-06-24 06:38:44

by Finn Thain

[permalink] [raw]
Subject: [PATCH v2 2/5] g_NCR5380: End PDMA transfer correctly on target disconnection

From: Ondrej Zary <[email protected]>

When an IRQ arrives during PDMA transfer, pread() and pwrite() return
without waiting for the 53C80 registers to be ready and this ends up
messing up the chip state. This was observed with SONY CDU-55S which is
slow enough to disconnect during 4096-byte reads.

IRQ during PDMA is not an error so don't return -1. Instead, store the
remaining byte count for use by NCR5380_dma_residual().

[Poll for the BASR_END_DMA_TRANSFER condition rather than remove the
error message -- F.T.]

Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
---
drivers/scsi/g_NCR5380.c | 48 +++++++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 14ef4e8c4713..911a4300ea51 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -44,12 +44,13 @@
int c400_ctl_status; \
int c400_blk_cnt; \
int c400_host_buf; \
- int io_width
+ int io_width; \
+ int pdma_residual

#define NCR5380_dma_xfer_len generic_NCR5380_dma_xfer_len
#define NCR5380_dma_recv_setup generic_NCR5380_pread
#define NCR5380_dma_send_setup generic_NCR5380_pwrite
-#define NCR5380_dma_residual NCR5380_dma_residual_none
+#define NCR5380_dma_residual generic_NCR5380_dma_residual

#define NCR5380_intr generic_NCR5380_intr
#define NCR5380_queue_command generic_NCR5380_queue_command
@@ -500,10 +501,8 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
while (1) {
if (NCR5380_read(hostdata->c400_blk_cnt) == 0)
break;
- if (NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ) {
- printk(KERN_ERR "53C400r: Got 53C80_IRQ start=%d, blocks=%d\n", start, blocks);
- return -1;
- }
+ if (NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ)
+ goto out_wait;
while (NCR5380_read(hostdata->c400_ctl_status) & CSR_HOST_BUF_NOT_RDY)
; /* FIXME - no timeout */

@@ -542,13 +541,19 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
if (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ))
printk("53C400r: no 53C80 gated irq after transfer");

+out_wait:
+ hostdata->pdma_residual = len - start;
+
/* wait for 53C80 registers to be available */
while (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG))
;

- if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER))
- printk(KERN_ERR "53C400r: no end dma signal\n");
-
+ if (NCR5380_poll_politely(hostdata, BUS_AND_STATUS_REG,
+ BASR_END_DMA_TRANSFER, BASR_END_DMA_TRANSFER,
+ HZ / 64) < 0)
+ scmd_printk(KERN_ERR, hostdata->connected, "%s: End of DMA timeout (%d)\n",
+ __func__, hostdata->pdma_residual);
+
return 0;
}

@@ -571,10 +576,8 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
NCR5380_write(hostdata->c400_ctl_status, CSR_BASE);
NCR5380_write(hostdata->c400_blk_cnt, blocks);
while (1) {
- if (NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ) {
- printk(KERN_ERR "53C400w: Got 53C80_IRQ start=%d, blocks=%d\n", start, blocks);
- return -1;
- }
+ if (NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ)
+ goto out_wait;

if (NCR5380_read(hostdata->c400_blk_cnt) == 0)
break;
@@ -612,18 +615,24 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
blocks--;
}

+out_wait:
+ hostdata->pdma_residual = len - start;
+
/* wait for 53C80 registers to be available */
while (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG)) {
udelay(4); /* DTC436 chip hangs without this */
/* FIXME - no timeout */
}

- if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER)) {
- printk(KERN_ERR "53C400w: no end dma signal\n");
- }
-
while (!(NCR5380_read(TARGET_COMMAND_REG) & TCR_LAST_BYTE_SENT))
; // TIMEOUT
+
+ if (NCR5380_poll_politely(hostdata, BUS_AND_STATUS_REG,
+ BASR_END_DMA_TRANSFER, BASR_END_DMA_TRANSFER,
+ HZ / 64) < 0)
+ scmd_printk(KERN_ERR, hostdata->connected, "%s: End of DMA timeout (%d)\n",
+ __func__, hostdata->pdma_residual);
+
return 0;
}

@@ -642,6 +651,11 @@ static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata,
return min(transfersize, DMA_MAX_SIZE);
}

+static int generic_NCR5380_dma_residual(struct NCR5380_hostdata *hostdata)
+{
+ return hostdata->pdma_residual;
+}
+
/*
* Include the NCR5380 core code that we build our driver around
*/
--
2.13.0

2017-06-25 09:26:41

by Ondrej Zary

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

On Saturday 24 June 2017 08:37:36 Finn Thain wrote:
> Ondrej, would you please test this new series?
>
> Changed since v1:
> - PDMA transfer residual is calculated earlier.
> - End of DMA flag check is now polled (if there is any residual).
>
>
> Finn Thain (2):
> g_NCR5380: Limit sg_tablesize to avoid PDMA read overruns on DTC436
> g_NCR5380: Cleanup comments and whitespace
>
> Ondrej Zary (3):
> g_NCR5380: Fix PDMA transfer size
> g_NCR5380: End PDMA transfer correctly on target disconnection
> g_NCR5380: Re-work PDMA loops
>
> drivers/scsi/g_NCR5380.c | 231
> ++++++++++++++++++++++------------------------- 1 file changed, 107
> insertions(+), 124 deletions(-)

It mostly works, but there are some problems:

It's not reliable - we continue the data transfer after poll_politely2
returns zero but we don't know if it returned because of host buffer
being ready of because of an IRQ. So if a device disconnects during write,
we continue to fill the buffer and only then find out that wait for
53c80 registers timed out. Then PDMA gets disabled:
[ 3458.774251] sd 2:0:1:0: [sdb] tag#0 53c80 registers not accessible, device will be reset
[ 3458.774251] sd 2:0:1:0: [sdb] tag#0 switching to slow handshake

We can just reset and continue with a new PDMA transfer. Found no problems
with reads. But when this happens during a write, we might have lost some
data buffers that we need to transfer again. The chip's PDMA block counter
does not seem to be very helpful here - testing shows that either one buffer
is missing in the file or is duplicated.

That's why my code had separate host buffer ready and IRQ checks. Host
buffer first - if it's ready, transfer the data. If not, check for IRQ -
if it was an error, rollback 2 buffers (the same if the host buffer is not
ready in time).



There's also a performance regression on DTC436 - the sg_tablesize limit
affects performance badly.
Before:
# hdparm -t --direct /dev/sdb

/dev/sdb:
Timing O_DIRECT disk reads: 4 MB in 3.21 seconds = 1.25 MB/sec

Now:
# hdparm -t --direct /dev/sdb

/dev/sdb:
Timing O_DIRECT disk reads: 4 MB in 4.89 seconds = 837.69 kB/sec

We should limit the transfer size instead:
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -45,7 +45,8 @@
int c400_blk_cnt; \
int c400_host_buf; \
int io_width; \
- int pdma_residual
+ int pdma_residual; \
+ int board;

#define NCR5380_dma_xfer_len generic_NCR5380_dma_xfer_len
#define NCR5380_dma_recv_setup generic_NCR5380_pread
@@ -247,7 +248,6 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt,
case BOARD_DTC3181E:
ports = dtc_3181e_ports;
magic = ncr_53c400a_magic;
- tpnt->sg_tablesize = 1;
break;
}

@@ -317,6 +317,7 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt,
}
hostdata = shost_priv(instance);

+ hostdata->board = board;
hostdata->io = iomem;
hostdata->region_size = region_size;

@@ -625,6 +626,9 @@ static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata,
/* 53C400 datasheet: non-modulo-128-byte transfers should use PIO */
if (transfersize % 128)
transfersize = 0;
+ /* Limit transfers to 512B to prevent random write corruption on DTC */
+ if (hostdata->board == BOARD_DTC3181E && transfersize > 512)
+ transfersize = 512;

return min(transfersize, DMA_MAX_SIZE);
}


No data corruption and no performance regression:
# hdparm -t --direct /dev/sdb

/dev/sdb:
Timing O_DIRECT disk reads: 4 MB in 3.25 seconds = 1.23 MB/sec

As the data corruption affects only writes, we could keep transfersize
unlimited for reads:
+ /* Limit write transfers to 512B to prevent random corruption on DTC */
+ if (hostdata->board == BOARD_DTC3181E &&
+ cmd->sc_data_direction == DMA_TO_DEVICE && transfersize > 512)
+ transfersize = 512;

So we can get some performance gain at least for reads:
# hdparm -t --direct /dev/sdb

/dev/sdb:
Timing O_DIRECT disk reads: 6 MB in 4.17 seconds = 1.44 MB/sec


--
Ondrej Zary

2017-06-26 07:35:37

by Finn Thain

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

On Sun, 25 Jun 2017, Ondrej Zary wrote:
>
> It mostly works, but there are some problems:
>
> It's not reliable - we continue the data transfer after poll_politely2
> returns zero but we don't know if it returned because of host buffer
> being ready of because of an IRQ. So if a device disconnects during write,
> we continue to fill the buffer and only then find out that wait for
> 53c80 registers timed out. Then PDMA gets disabled:
> [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 53c80 registers not accessible, device will be reset
> [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 switching to slow handshake
>

Sorry about that. I messed up the Gated-IRQ-is-asserted case when I
changed the algorithm to avoid adding more polling loops.

> We can just reset and continue with a new PDMA transfer.

We should only reset the 53c400 logic when there is a real failure (like
no access to 53c80 registers). If a reset is needed to handle normal
target behaviour (like disconnection during a slow transfer) then we are
doing something wrong.

> Found no problems with reads. But when this happens during a write, we
> might have lost some data buffers that we need to transfer again. The
> chip's PDMA block counter does not seem to be very helpful here -
> testing shows that either one buffer is missing in the file or is
> duplicated.
>
> That's why my code had separate host buffer ready and IRQ checks. Host
> buffer first - if it's ready, transfer the data. If not, check for IRQ -
> if it was an error, rollback 2 buffers (the same if the host buffer is
> not ready in time).
>

In v3 of the patch series I've fixed the Gated IRQ logic so the transfer
loop will terminate early.

>
> There's also a performance regression on DTC436 - the sg_tablesize limit
> affects performance badly.
> Before:
> # hdparm -t --direct /dev/sdb
>
> /dev/sdb:
> Timing O_DIRECT disk reads: 4 MB in 3.21 seconds = 1.25 MB/sec
>
> Now:
> # hdparm -t --direct /dev/sdb
>
> /dev/sdb:
> Timing O_DIRECT disk reads: 4 MB in 4.89 seconds = 837.69 kB/sec
>

The lost throughput can perhaps be explained by extra kernel-mode CPU
overhead. Next time maybe check for that with "time hdparm". Anyway, since
you have a patch, we should try to avoid this regression.

> We should limit the transfer size instead:
> --- a/drivers/scsi/g_NCR5380.c
> +++ b/drivers/scsi/g_NCR5380.c
> @@ -45,7 +45,8 @@
> int c400_blk_cnt; \
> int c400_host_buf; \
> int io_width; \
> - int pdma_residual
> + int pdma_residual; \
> + int board;
>
> #define NCR5380_dma_xfer_len generic_NCR5380_dma_xfer_len
> #define NCR5380_dma_recv_setup generic_NCR5380_pread
> @@ -247,7 +248,6 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt,
> case BOARD_DTC3181E:
> ports = dtc_3181e_ports;
> magic = ncr_53c400a_magic;
> - tpnt->sg_tablesize = 1;
> break;
> }
>

I've dropped my "sg_tablesize = 1" patch from the v3 series.

> @@ -317,6 +317,7 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt,
> }
> hostdata = shost_priv(instance);
>
> + hostdata->board = board;
> hostdata->io = iomem;
> hostdata->region_size = region_size;
>

I've added the hostdata->board variable in v3. It looks like we are going
to need it one way or another.

> @@ -625,6 +626,9 @@ static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata,
> /* 53C400 datasheet: non-modulo-128-byte transfers should use PIO */
> if (transfersize % 128)
> transfersize = 0;
> + /* Limit transfers to 512B to prevent random write corruption on DTC */
> + if (hostdata->board == BOARD_DTC3181E && transfersize > 512)
> + transfersize = 512;
>
> return min(transfersize, DMA_MAX_SIZE);
> }
>
>
> No data corruption

How did you confirm that? What about a 1024 byte limit? 2048? 4096? Does
it make any difference?

Do you have any theories that might explain the need for a 512 B limit?

My theory about the "read overrun" issue doesn't seem applicable, given
that this is actually the CMOS version, which has different errata than
the NMOS version.

The need for udelay(4) on this board does suggest timing issues. What
happens if you add a udelay into the generic_NCR5380_pwrite() loop?

> and no performance regression:
> # hdparm -t --direct /dev/sdb
>

Well, I'd still expect some added CPU overhead. The driver is being handed
a 4096 byte transfer and is now splitting that up into eight separate
transfers. That means eight times as many DMA setup and completion calls
and presumably a similar increase in interrupts.

> /dev/sdb:
> Timing O_DIRECT disk reads: 4 MB in 3.25 seconds = 1.23 MB/sec
>
> As the data corruption affects only writes, we could keep transfersize
> unlimited for reads:
> + /* Limit write transfers to 512B to prevent random corruption on DTC */
> + if (hostdata->board == BOARD_DTC3181E &&
> + cmd->sc_data_direction == DMA_TO_DEVICE && transfersize > 512)
> + transfersize = 512;
>
> So we can get some performance gain at least for reads:
> # hdparm -t --direct /dev/sdb
>
> /dev/sdb:
> Timing O_DIRECT disk reads: 6 MB in 4.17 seconds = 1.44 MB/sec
>

Right. No need to penalize read performance by splitting up transfers.

Anyway, I'd really like to get some idea of the root cause before I sign
off on this particular approach.

--

2017-06-26 08:20:50

by Ondrej Zary

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

On Monday 26 June 2017, Finn Thain wrote:
> On Sun, 25 Jun 2017, Ondrej Zary wrote:
> > It mostly works, but there are some problems:
> >
> > It's not reliable - we continue the data transfer after poll_politely2
> > returns zero but we don't know if it returned because of host buffer
> > being ready of because of an IRQ. So if a device disconnects during
> > write, we continue to fill the buffer and only then find out that wait
> > for 53c80 registers timed out. Then PDMA gets disabled:
> > [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 53c80 registers not accessible,
> > device will be reset [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 switching to
> > slow handshake
>
> Sorry about that. I messed up the Gated-IRQ-is-asserted case when I
> changed the algorithm to avoid adding more polling loops.
>
> > We can just reset and continue with a new PDMA transfer.
>
> We should only reset the 53c400 logic when there is a real failure (like
> no access to 53c80 registers). If a reset is needed to handle normal
> target behaviour (like disconnection during a slow transfer) then we are
> doing something wrong.

The reset is probably the only way to terminate a PDMA transfer.

> > Found no problems with reads. But when this happens during a write, we
> > might have lost some data buffers that we need to transfer again. The
> > chip's PDMA block counter does not seem to be very helpful here -
> > testing shows that either one buffer is missing in the file or is
> > duplicated.
> >
> > That's why my code had separate host buffer ready and IRQ checks. Host
> > buffer first - if it's ready, transfer the data. If not, check for IRQ -
> > if it was an error, rollback 2 buffers (the same if the host buffer is
> > not ready in time).
>
> In v3 of the patch series I've fixed the Gated IRQ logic so the transfer
> loop will terminate early.
>
> > There's also a performance regression on DTC436 - the sg_tablesize limit
> > affects performance badly.
> > Before:
> > # hdparm -t --direct /dev/sdb
> >
> > /dev/sdb:
> > Timing O_DIRECT disk reads: 4 MB in 3.21 seconds = 1.25 MB/sec
> >
> > Now:
> > # hdparm -t --direct /dev/sdb
> >
> > /dev/sdb:
> > Timing O_DIRECT disk reads: 4 MB in 4.89 seconds = 837.69 kB/sec
>
> The lost throughput can perhaps be explained by extra kernel-mode CPU
> overhead. Next time maybe check for that with "time hdparm". Anyway, since
> you have a patch, we should try to avoid this regression.

I don't think the CPU is that slow. It probably decreases the SCSI read
command length and we must wait for the drive to process each command (it
probably does no read-ahead).

> > We should limit the transfer size instead:
> > --- a/drivers/scsi/g_NCR5380.c
> > +++ b/drivers/scsi/g_NCR5380.c
> > @@ -45,7 +45,8 @@
> > int c400_blk_cnt; \
> > int c400_host_buf; \
> > int io_width; \
> > - int pdma_residual
> > + int pdma_residual; \
> > + int board;
> >
> > #define NCR5380_dma_xfer_len generic_NCR5380_dma_xfer_len
> > #define NCR5380_dma_recv_setup generic_NCR5380_pread
> > @@ -247,7 +248,6 @@ static int generic_NCR5380_init_one(struct
> > scsi_host_template *tpnt, case BOARD_DTC3181E:
> > ports = dtc_3181e_ports;
> > magic = ncr_53c400a_magic;
> > - tpnt->sg_tablesize = 1;
> > break;
> > }
>
> I've dropped my "sg_tablesize = 1" patch from the v3 series.
>
> > @@ -317,6 +317,7 @@ static int generic_NCR5380_init_one(struct
> > scsi_host_template *tpnt, }
> > hostdata = shost_priv(instance);
> >
> > + hostdata->board = board;
> > hostdata->io = iomem;
> > hostdata->region_size = region_size;
>
> I've added the hostdata->board variable in v3. It looks like we are going
> to need it one way or another.
>
> > @@ -625,6 +626,9 @@ static int generic_NCR5380_dma_xfer_len(struct
> > NCR5380_hostdata *hostdata, /* 53C400 datasheet: non-modulo-128-byte
> > transfers should use PIO */ if (transfersize % 128)
> > transfersize = 0;
> > + /* Limit transfers to 512B to prevent random write corruption on
> > DTC */ + if (hostdata->board == BOARD_DTC3181E && transfersize >
> > 512) + transfersize = 512;
> >
> > return min(transfersize, DMA_MAX_SIZE);
> > }
> >
> >
> > No data corruption
>
> How did you confirm that? What about a 1024 byte limit? 2048? 4096? Does
> it make any difference?
>
> Do you have any theories that might explain the need for a 512 B limit?

Tested it by copying a large file (230 MB of random data) to a slow Quantum
240MB HDD, then read it back and compare. No corruption with 512B, corruption
with anything more.

My theory is that with 512 B size, no disconnect occurs during a transfer
(because the drive has 512 B sectors).

> My theory about the "read overrun" issue doesn't seem applicable, given
> that this is actually the CMOS version, which has different errata than
> the NMOS version.
>
> The need for udelay(4) on this board does suggest timing issues. What
> happens if you add a udelay into the generic_NCR5380_pwrite() loop?

I've had major problems with this DTC chip on one mainboard (PCPartner with
VIA 694T chipset) - the chip locked up and died completely until reset. This
udelay worked it around at one place but it eventually died somewhere else.
No amount of delays helped.

Tested the DTC card with another mainboard - Procomp BIZ1A (i440ZX chipset)
and all the weird problems went away! Looks like there's a HW incompatibility
between the PCPartner board (or its VIA chipset) and the DTC chip/card. So
I'm now using the Procomp board for testing.

> > and no performance regression:
> > # hdparm -t --direct /dev/sdb
>
> Well, I'd still expect some added CPU overhead. The driver is being handed
> a 4096 byte transfer and is now splitting that up into eight separate
> transfers. That means eight times as many DMA setup and completion calls
> and presumably a similar increase in interrupts.

Yes, there's still some overhead but it seems minimal compared to the drive
slowness.

> > /dev/sdb:
> > Timing O_DIRECT disk reads: 4 MB in 3.25 seconds = 1.23 MB/sec
> >
> > As the data corruption affects only writes, we could keep transfersize
> > unlimited for reads:
> > + /* Limit write transfers to 512B to prevent random corruption on
> > DTC */ + if (hostdata->board == BOARD_DTC3181E &&
> > + cmd->sc_data_direction == DMA_TO_DEVICE && transfersize >
> > 512) + transfersize = 512;
> >
> > So we can get some performance gain at least for reads:
> > # hdparm -t --direct /dev/sdb
> >
> > /dev/sdb:
> > Timing O_DIRECT disk reads: 6 MB in 4.17 seconds = 1.44 MB/sec
>
> Right. No need to penalize read performance by splitting up transfers.
>
> Anyway, I'd really like to get some idea of the root cause before I sign
> off on this particular approach.

I don't like this workaround but haven't found anything better yet.

--
Ondrej Zary