2023-10-18 16:00:04

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 0/6] i3c: master: svc: collection of bugs fixes

Each patch is indepedents. See commit message for detail.

Change from v1 to v2.
See each patch notes

Frank Li (6):
i3c: master: svc: fix race condition in ibi work thread
i3c: master: svc: fix wrong data return when IBI happen during start
frame
i3c: master: svc: fix ibi may not return mandatory data byte
i3c: master: svc: fix check wrong status register in irq handler
i3c: master: svc: fix SDA keep low when polling IBIWON timeout happen
i3c: master: svc: fix random hot join failure since timeout error

drivers/i3c/master/svc-i3c-master.c | 51 ++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)

--
2.34.1


2023-10-18 16:00:42

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 2/6] i3c: master: svc: fix wrong data return when IBI happen during start frame

┌─────┐ ┏──┐ ┏──┐ ┏──┐ ┏──┐ ┏──┐ ┏──┐ ┏──┐ ┏──┐ ┌─────
SCL: ┘ └─────┛ └──┛ └──┛ └──┛ └──┛ └──┛ └──┛ └──┛ └──┘
───┐ ┌─────┐ ┌─────┐ ┌───────────┐
SDA: └───────────────────────┘ └─────┘ └─────┘ └─────
xxx╱ ╲╱ ╲╱ ╲╱ ╲╱ ╲
: xxx╲IBI ╱╲ Addr(0x0a) ╱╲ RW ╱╲NACK╱╲ S ╱

If an In-Band Interrupt (IBI) occurs and IBI work thread is not immediately
scheduled, When svc_i3c_master_priv_xfers() initiates the I3C transfer and
attempts to send address 0x7e, the target interprets it as an
IBI handler and returns the target address 0x0a.

However, svc_i3c_master_priv_xfers() does not handle this case and proceeds
with other transfers, resulting in incorrect data being returned.

Add IBIWON check in svc_i3c_master_xfer(). In case this situation occurs,
return a failure to the driver.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: [email protected]
Reviewed-by: Miquel Raynal <[email protected]>
Signed-off-by: Frank Li <[email protected]>
---

Notes:
Change from v1 to v2
- update commit message
- fix typo yeild and falure

drivers/i3c/master/svc-i3c-master.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index 8cd708c965085..abebef666b2bb 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -1010,6 +1010,9 @@ static int svc_i3c_master_xfer(struct svc_i3c_master *master,
u32 reg;
int ret;

+ /* clean SVC_I3C_MINT_IBIWON w1c bits */
+ writel(SVC_I3C_MINT_IBIWON, master->regs + SVC_I3C_MSTATUS);
+
writel(SVC_I3C_MCTRL_REQUEST_START_ADDR |
xfer_type |
SVC_I3C_MCTRL_IBIRESP_NACK |
@@ -1028,6 +1031,23 @@ static int svc_i3c_master_xfer(struct svc_i3c_master *master,
goto emit_stop;
}

+ /*
+ * According to I3C spec ver 1.1.1, 5.1.2.2.3 Consequence of Controller Starting a Frame
+ * with I3C Target Address.
+ *
+ * The I3C Controller normally should start a Frame, the Address may be arbitrated, and so
+ * the Controller shall monitor to see whether an In-Band Interrupt request, a Controller
+ * Role Request (i.e., Secondary Controller requests to become the Active Controller), or
+ * a Hot-Join Request has been made.
+ *
+ * If missed IBIWON check, the wrong data will be return. When IBIWON happen, return failure
+ * and yield the above events handler.
+ */
+ if (SVC_I3C_MSTATUS_IBIWON(reg)) {
+ ret = -ENXIO;
+ goto emit_stop;
+ }
+
if (rnw)
ret = svc_i3c_master_read(master, in, xfer_len);
else
--
2.34.1

2023-10-18 16:00:58

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 1/6] i3c: master: svc: fix race condition in ibi work thread

The ibi work thread operates asynchronously with other transfers, such as
svc_i3c_master_priv_xfers(). Introduce mutex protection to ensure the
completion of the entire i3c/i2c transaction.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: [email protected]
Reviewed-by: Miquel Raynal <[email protected]>
Signed-off-by: Frank Li <[email protected]>
---

Notes:
Change from v1 to v2
- update commit message
- Add Reviewed-by: Miquel Raynal <[email protected]>

drivers/i3c/master/svc-i3c-master.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index d29de5fe533e6..8cd708c965085 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -174,6 +174,7 @@ struct svc_i3c_regs_save {
* @ibi.slots: Available IBI slots
* @ibi.tbq_slot: To be queued IBI slot
* @ibi.lock: IBI lock
+ * @lock: Transfer lock, protect between IBI work thread and callbacks from master
*/
struct svc_i3c_master {
struct i3c_master_controller base;
@@ -202,6 +203,7 @@ struct svc_i3c_master {
/* Prevent races within IBI handlers */
spinlock_t lock;
} ibi;
+ struct mutex lock;
};

/**
@@ -383,6 +385,7 @@ static void svc_i3c_master_ibi_work(struct work_struct *work)
u32 status, val;
int ret;

+ mutex_lock(&master->lock);
/* Acknowledge the incoming interrupt with the AUTOIBI mechanism */
writel(SVC_I3C_MCTRL_REQUEST_AUTO_IBI |
SVC_I3C_MCTRL_IBIRESP_AUTO,
@@ -459,6 +462,7 @@ static void svc_i3c_master_ibi_work(struct work_struct *work)

reenable_ibis:
svc_i3c_master_enable_interrupts(master, SVC_I3C_MINT_SLVSTART);
+ mutex_unlock(&master->lock);
}

static irqreturn_t svc_i3c_master_irq_handler(int irq, void *dev_id)
@@ -1203,9 +1207,11 @@ static int svc_i3c_master_send_bdcast_ccc_cmd(struct svc_i3c_master *master,
cmd->read_len = 0;
cmd->continued = false;

+ mutex_lock(&master->lock);
svc_i3c_master_enqueue_xfer(master, xfer);
if (!wait_for_completion_timeout(&xfer->comp, msecs_to_jiffies(1000)))
svc_i3c_master_dequeue_xfer(master, xfer);
+ mutex_unlock(&master->lock);

ret = xfer->ret;
kfree(buf);
@@ -1249,9 +1255,11 @@ static int svc_i3c_master_send_direct_ccc_cmd(struct svc_i3c_master *master,
cmd->read_len = read_len;
cmd->continued = false;

+ mutex_lock(&master->lock);
svc_i3c_master_enqueue_xfer(master, xfer);
if (!wait_for_completion_timeout(&xfer->comp, msecs_to_jiffies(1000)))
svc_i3c_master_dequeue_xfer(master, xfer);
+ mutex_unlock(&master->lock);

if (cmd->read_len != xfer_len)
ccc->dests[0].payload.len = cmd->read_len;
@@ -1308,9 +1316,11 @@ static int svc_i3c_master_priv_xfers(struct i3c_dev_desc *dev,
cmd->continued = (i + 1) < nxfers;
}

+ mutex_lock(&master->lock);
svc_i3c_master_enqueue_xfer(master, xfer);
if (!wait_for_completion_timeout(&xfer->comp, msecs_to_jiffies(1000)))
svc_i3c_master_dequeue_xfer(master, xfer);
+ mutex_unlock(&master->lock);

ret = xfer->ret;
svc_i3c_master_free_xfer(xfer);
@@ -1346,9 +1356,11 @@ static int svc_i3c_master_i2c_xfers(struct i2c_dev_desc *dev,
cmd->continued = (i + 1 < nxfers);
}

+ mutex_lock(&master->lock);
svc_i3c_master_enqueue_xfer(master, xfer);
if (!wait_for_completion_timeout(&xfer->comp, msecs_to_jiffies(1000)))
svc_i3c_master_dequeue_xfer(master, xfer);
+ mutex_unlock(&master->lock);

ret = xfer->ret;
svc_i3c_master_free_xfer(xfer);
@@ -1539,6 +1551,8 @@ static int svc_i3c_master_probe(struct platform_device *pdev)

INIT_WORK(&master->hj_work, svc_i3c_master_hj_work);
INIT_WORK(&master->ibi_work, svc_i3c_master_ibi_work);
+ mutex_init(&master->lock);
+
ret = devm_request_irq(dev, master->irq, svc_i3c_master_irq_handler,
IRQF_NO_SUSPEND, "svc-i3c-irq", master);
if (ret)
--
2.34.1

2023-10-18 16:01:21

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 3/6] i3c: master: svc: fix ibi may not return mandatory data byte

MSTATUS[RXPEND] is only updated after the data transfer cycle started. This
creates an issue when the I3C clock is slow, and the CPU is running fast
enough that MSTATUS[RXPEND] may not be updated when the code reach checking
point. As a result, mandatory data are being missed.

Add a wait for MSTATUS[COMPLETE] to ensure that all mandatory data is
already in FIFO.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: [email protected]
Signed-off-by: Frank Li <[email protected]>
---

Notes:
Change from v1 to v2
- update commit message
it also works without mandatory bytes

drivers/i3c/master/svc-i3c-master.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index abebef666b2bb..dd06b7c9333f1 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -332,6 +332,7 @@ static int svc_i3c_master_handle_ibi(struct svc_i3c_master *master,
struct i3c_ibi_slot *slot;
unsigned int count;
u32 mdatactrl;
+ int ret, val;
u8 *buf;

slot = i3c_generic_ibi_get_free_slot(data->ibi_pool);
@@ -341,6 +342,13 @@ static int svc_i3c_master_handle_ibi(struct svc_i3c_master *master,
slot->len = 0;
buf = slot->data;

+ ret = readl_relaxed_poll_timeout(master->regs + SVC_I3C_MSTATUS, val,
+ SVC_I3C_MSTATUS_COMPLETE(val), 0, 1000);
+ if (ret) {
+ dev_err(master->dev, "Timeout when polling for COMPLETE\n");
+ return ret;
+ }
+
while (SVC_I3C_MSTATUS_RXPEND(readl(master->regs + SVC_I3C_MSTATUS)) &&
slot->len < SVC_I3C_FIFO_SIZE) {
mdatactrl = readl(master->regs + SVC_I3C_MDATACTRL);
--
2.34.1

2023-10-18 16:01:22

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

master side report:
silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000

BIT 20: TIMEOUT error
The module has stalled too long in a frame. This happens when:
- The TX FIFO or RX FIFO is not handled and the bus is stuck in the
middle of a message,
- No STOP was issued and between messages,
- IBI manual is used and no decision was made.
The maximum stall period is 10 KHz or 100 μs.

This is a just warning. System irq thread schedule latency is possible
bigger than 100us. Just omit this waring.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: [email protected]
Signed-off-by: Frank Li <[email protected]>
---

Notes:
Change from v1 to v2
-none

drivers/i3c/master/svc-i3c-master.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index 1a57fdebaa26d..fedb31e0076c4 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -93,6 +93,7 @@
#define SVC_I3C_MINTMASKED 0x098
#define SVC_I3C_MERRWARN 0x09C
#define SVC_I3C_MERRWARN_NACK BIT(2)
+#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)
#define SVC_I3C_MDMACTRL 0x0A0
#define SVC_I3C_MDATACTRL 0x0AC
#define SVC_I3C_MDATACTRL_FLUSHTB BIT(0)
@@ -226,6 +227,11 @@ static bool svc_i3c_master_error(struct svc_i3c_master *master)
if (SVC_I3C_MSTATUS_ERRWARN(mstatus)) {
merrwarn = readl(master->regs + SVC_I3C_MERRWARN);
writel(merrwarn, master->regs + SVC_I3C_MERRWARN);
+
+ /* ignore timeout error */
+ if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
+ return false;
+
dev_err(master->dev,
"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
mstatus, merrwarn);
--
2.34.1

2023-10-18 16:01:24

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 4/6] i3c: master: svc: fix check wrong status register in irq handler

svc_i3c_master_irq_handler() wrong check register SVC_I3C_MINTMASKED. It
should be SVC_I3C_MSTATUS.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: [email protected]
Reviewed-by: Miquel Raynal <[email protected]>
Signed-off-by: Frank Li <[email protected]>
---

Notes:
Change from v1 to v2
- Add Reviewed-by: Miquel Raynal <[email protected]>

drivers/i3c/master/svc-i3c-master.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index dd06b7c9333f1..b113460f059c3 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -476,7 +476,7 @@ static void svc_i3c_master_ibi_work(struct work_struct *work)
static irqreturn_t svc_i3c_master_irq_handler(int irq, void *dev_id)
{
struct svc_i3c_master *master = (struct svc_i3c_master *)dev_id;
- u32 active = readl(master->regs + SVC_I3C_MINTMASKED);
+ u32 active = readl(master->regs + SVC_I3C_MSTATUS);

if (!SVC_I3C_MSTATUS_SLVSTART(active))
return IRQ_NONE;
--
2.34.1

2023-10-18 16:01:27

by Frank Li

[permalink] [raw]
Subject: [PATCH v2 Resent 5/6] i3c: master: svc: fix SDA keep low when polling IBIWON timeout happen

If IBIWON timeout happen, SDA line will always keep low because miss emit
stop.

Call svc_i3c_master_emit_stop() to let i3c bus come back to idle statue
when IBIWON timeout happen.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: [email protected]
Signed-off-by: Frank Li <[email protected]>
---

Notes:
Change from v1 to v2
- Update commite message

drivers/i3c/master/svc-i3c-master.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index b113460f059c3..1a57fdebaa26d 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -404,6 +404,7 @@ static void svc_i3c_master_ibi_work(struct work_struct *work)
SVC_I3C_MSTATUS_IBIWON(val), 0, 1000);
if (ret) {
dev_err(master->dev, "Timeout when polling for IBIWON\n");
+ svc_i3c_master_emit_stop(master);
goto reenable_ibis;
}

--
2.34.1

2023-10-19 06:29:44

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 3/6] i3c: master: svc: fix ibi may not return mandatory data byte

Hi Frank,

[email protected] wrote on Wed, 18 Oct 2023 11:59:23 -0400:

> MSTATUS[RXPEND] is only updated after the data transfer cycle started. This
> creates an issue when the I3C clock is slow, and the CPU is running fast
> enough that MSTATUS[RXPEND] may not be updated when the code reach checking
> point. As a result, mandatory data are being missed.

reaches

can be missed

> Add a wait for MSTATUS[COMPLETE] to ensure that all mandatory data is
> already in FIFO.
>
> Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> Cc: [email protected]
> Signed-off-by: Frank Li <[email protected]>
> ---
>
> Notes:
> Change from v1 to v2
> - update commit message
> it also works without mandatory bytes

This could be mentioned in the commit log?

>
> drivers/i3c/master/svc-i3c-master.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
> index abebef666b2bb..dd06b7c9333f1 100644
> --- a/drivers/i3c/master/svc-i3c-master.c
> +++ b/drivers/i3c/master/svc-i3c-master.c
> @@ -332,6 +332,7 @@ static int svc_i3c_master_handle_ibi(struct svc_i3c_master *master,
> struct i3c_ibi_slot *slot;
> unsigned int count;
> u32 mdatactrl;
> + int ret, val;
> u8 *buf;
>
> slot = i3c_generic_ibi_get_free_slot(data->ibi_pool);
> @@ -341,6 +342,13 @@ static int svc_i3c_master_handle_ibi(struct svc_i3c_master *master,
> slot->len = 0;
> buf = slot->data;
>
> + ret = readl_relaxed_poll_timeout(master->regs + SVC_I3C_MSTATUS, val,
> + SVC_I3C_MSTATUS_COMPLETE(val), 0, 1000);
> + if (ret) {
> + dev_err(master->dev, "Timeout when polling for COMPLETE\n");
> + return ret;
> + }
> +
> while (SVC_I3C_MSTATUS_RXPEND(readl(master->regs + SVC_I3C_MSTATUS)) &&
> slot->len < SVC_I3C_FIFO_SIZE) {
> mdatactrl = readl(master->regs + SVC_I3C_MDATACTRL);


Thanks,
Miquèl

2023-10-19 06:32:20

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 5/6] i3c: master: svc: fix SDA keep low when polling IBIWON timeout happen

Hi Frank,

[email protected] wrote on Wed, 18 Oct 2023 11:59:25 -0400:

> If IBIWON timeout happen, SDA line will always keep low because miss emit
> stop.

"
Upon IBIWON timeout, the SDA line will always be kept low if we don't
emit a stop. Calling svc_i3c_master_emit_stop() there will let the bus
return to idle state.
"

With this commit log,

Reviewed-by: Miquel Raynal <[email protected]>

>
> Call svc_i3c_master_emit_stop() to let i3c bus come back to idle statue
> when IBIWON timeout happen.
>
> Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> Cc: [email protected]
> Signed-off-by: Frank Li <[email protected]>
> ---
>
> Notes:
> Change from v1 to v2
> - Update commite message
>
> drivers/i3c/master/svc-i3c-master.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
> index b113460f059c3..1a57fdebaa26d 100644
> --- a/drivers/i3c/master/svc-i3c-master.c
> +++ b/drivers/i3c/master/svc-i3c-master.c
> @@ -404,6 +404,7 @@ static void svc_i3c_master_ibi_work(struct work_struct *work)
> SVC_I3C_MSTATUS_IBIWON(val), 0, 1000);
> if (ret) {
> dev_err(master->dev, "Timeout when polling for IBIWON\n");
> + svc_i3c_master_emit_stop(master);
> goto reenable_ibis;
> }
>


Thanks,
Miquèl

2023-10-19 06:46:09

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

Hi Frank,

[email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:

> master side report:
> silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
>
> BIT 20: TIMEOUT error
> The module has stalled too long in a frame. This happens when:
> - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> middle of a message,
> - No STOP was issued and between messages,
> - IBI manual is used and no decision was made.

I am still not convinced this should be ignored in all cases.

Case 1 is a problem because the hardware failed somehow.
Case 2 is fine I guess.
Case 3 is not possible in Linux, this will not be supported.

> The maximum stall period is 10 KHz or 100 μs.

s/10 KHz//

>
> This is a just warning. System irq thread schedule latency is possible
> bigger than 100us. Just omit this waring.

This can be considered as being just a warning as the system IRQ
latency can easily be greater than 100us.

>
> Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> Cc: [email protected]
> Signed-off-by: Frank Li <[email protected]>
> ---
>
> Notes:
> Change from v1 to v2
> -none
>
> drivers/i3c/master/svc-i3c-master.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
> index 1a57fdebaa26d..fedb31e0076c4 100644
> --- a/drivers/i3c/master/svc-i3c-master.c
> +++ b/drivers/i3c/master/svc-i3c-master.c
> @@ -93,6 +93,7 @@
> #define SVC_I3C_MINTMASKED 0x098
> #define SVC_I3C_MERRWARN 0x09C
> #define SVC_I3C_MERRWARN_NACK BIT(2)
> +#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)
> #define SVC_I3C_MDMACTRL 0x0A0
> #define SVC_I3C_MDATACTRL 0x0AC
> #define SVC_I3C_MDATACTRL_FLUSHTB BIT(0)
> @@ -226,6 +227,11 @@ static bool svc_i3c_master_error(struct svc_i3c_master *master)
> if (SVC_I3C_MSTATUS_ERRWARN(mstatus)) {
> merrwarn = readl(master->regs + SVC_I3C_MERRWARN);
> writel(merrwarn, master->regs + SVC_I3C_MERRWARN);
> +
> + /* ignore timeout error */
> + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> + return false;
> +
> dev_err(master->dev,
> "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> mstatus, merrwarn);


Thanks,
Miquèl

2023-10-19 15:40:04

by Frank Li

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> Hi Frank,
>
> [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
>
> > master side report:
> > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> >
> > BIT 20: TIMEOUT error
> > The module has stalled too long in a frame. This happens when:
> > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > middle of a message,
> > - No STOP was issued and between messages,
> > - IBI manual is used and no decision was made.
>
> I am still not convinced this should be ignored in all cases.
>
> Case 1 is a problem because the hardware failed somehow.

But so far, no action to handle this case in current code.

In svc_i3c_master_xfer() have not check this flags. also have not enable
ERRWARN irq.

If we met this case, we can add new functions/argument to handle this.
Then we can real debug the code and recover bus.

Without this patch, simplest add some debug message before issue
SVC_I3C_MCTRL_REQUEST_AUTO_IBI, TIMEOUT will be set.

And svc_i3c_master_error() was only called by svc_i3c_master_ibi_work().
So I can think only case 3 happen in svc_i3c_master_ibi_work().

Frank

> Case 2 is fine I guess.
> Case 3 is not possible in Linux, this will not be supported.
>
> > The maximum stall period is 10 KHz or 100 μs.
>
> s/10 KHz//
>
> >
> > This is a just warning. System irq thread schedule latency is possible
> > bigger than 100us. Just omit this waring.
>
> This can be considered as being just a warning as the system IRQ
> latency can easily be greater than 100us.
>
> >
> > Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> > Cc: [email protected]
> > Signed-off-by: Frank Li <[email protected]>
> > ---
> >
> > Notes:
> > Change from v1 to v2
> > -none
> >
> > drivers/i3c/master/svc-i3c-master.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
> > index 1a57fdebaa26d..fedb31e0076c4 100644
> > --- a/drivers/i3c/master/svc-i3c-master.c
> > +++ b/drivers/i3c/master/svc-i3c-master.c
> > @@ -93,6 +93,7 @@
> > #define SVC_I3C_MINTMASKED 0x098
> > #define SVC_I3C_MERRWARN 0x09C
> > #define SVC_I3C_MERRWARN_NACK BIT(2)
> > +#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)
> > #define SVC_I3C_MDMACTRL 0x0A0
> > #define SVC_I3C_MDATACTRL 0x0AC
> > #define SVC_I3C_MDATACTRL_FLUSHTB BIT(0)
> > @@ -226,6 +227,11 @@ static bool svc_i3c_master_error(struct svc_i3c_master *master)
> > if (SVC_I3C_MSTATUS_ERRWARN(mstatus)) {
> > merrwarn = readl(master->regs + SVC_I3C_MERRWARN);
> > writel(merrwarn, master->regs + SVC_I3C_MERRWARN);
> > +
> > + /* ignore timeout error */
> > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > + return false;
> > +
> > dev_err(master->dev,
> > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > mstatus, merrwarn);
>
>
> Thanks,
> Miquèl

2023-10-20 14:07:04

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

Hi Frank,

[email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:

> On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> >
> > > master side report:
> > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > >
> > > BIT 20: TIMEOUT error
> > > The module has stalled too long in a frame. This happens when:
> > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > middle of a message,
> > > - No STOP was issued and between messages,
> > > - IBI manual is used and no decision was made.
> >
> > I am still not convinced this should be ignored in all cases.
> >
> > Case 1 is a problem because the hardware failed somehow.
>
> But so far, no action to handle this case in current code.

Yes, but if you detect an issue and ignore it, it's not better than
reporting it without handling it. Instead of totally ignoring this I
would at least write a debug message (identical to what's below) before
returning false, even though I am not convinced unconditionally
returning false here is wise. If you fail a hardware sequence because
you added a printk, it's a problem. Maybe you consider this line as
noise, but I believe it's still an error condition. Maybe, however,
this bit gets set after the whole sequence, and this is just a "bus
is idle" condition. If that's the case, then you need some
additional heuristics to properly ignore the bit?

> In svc_i3c_master_xfer() have not check this flags. also have not enable
> ERRWARN irq.
>
> If we met this case, we can add new functions/argument to handle this.
> Then we can real debug the code and recover bus.
>
> Without this patch, simplest add some debug message before issue
> SVC_I3C_MCTRL_REQUEST_AUTO_IBI, TIMEOUT will be set.

Yes, and sometimes it won't be an issue, but sometimes it may. Maybe we
can find more advanced heuristics there.

> And svc_i3c_master_error() was only called by svc_i3c_master_ibi_work().
>
> So I can think only case 3 happen in svc_i3c_master_ibi_work().

Case 3 cannot be handled by Linux (because of the natural latency of
the OS).

>
> Frank
>
> > Case 2 is fine I guess.
> > Case 3 is not possible in Linux, this will not be supported.
> >
> > > The maximum stall period is 10 KHz or 100 μs.
> >
> > s/10 KHz//
> >
> > >
> > > This is a just warning. System irq thread schedule latency is possible
> > > bigger than 100us. Just omit this waring.
> >
> > This can be considered as being just a warning as the system IRQ
> > latency can easily be greater than 100us.

This was skipped in your v3.

> > > Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> > > Cc: [email protected]
> > > Signed-off-by: Frank Li <[email protected]>
> > > ---

Thanks,
Miquèl

2023-10-20 14:19:25

by Frank Li

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> Hi Frank,
>
> [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
>
> > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > >
> > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > >
> > > > master side report:
> > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > >
> > > > BIT 20: TIMEOUT error
> > > > The module has stalled too long in a frame. This happens when:
> > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > middle of a message,
> > > > - No STOP was issued and between messages,
> > > > - IBI manual is used and no decision was made.
> > >
> > > I am still not convinced this should be ignored in all cases.
> > >
> > > Case 1 is a problem because the hardware failed somehow.
> >
> > But so far, no action to handle this case in current code.
>
> Yes, but if you detect an issue and ignore it, it's not better than
> reporting it without handling it. Instead of totally ignoring this I
> would at least write a debug message (identical to what's below) before
> returning false, even though I am not convinced unconditionally
> returning false here is wise. If you fail a hardware sequence because
> you added a printk, it's a problem. Maybe you consider this line as
> noise, but I believe it's still an error condition. Maybe, however,
> this bit gets set after the whole sequence, and this is just a "bus
> is idle" condition. If that's the case, then you need some
> additional heuristics to properly ignore the bit?
>

dev_err(master->dev,
"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
mstatus, merrwarn);
+
+ /* ignore timeout error */
+ if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
+ return false;
+

Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?

Frank


> > In svc_i3c_master_xfer() have not check this flags. also have not enable
> > ERRWARN irq.
> >
> > If we met this case, we can add new functions/argument to handle this.
> > Then we can real debug the code and recover bus.
> >
> > Without this patch, simplest add some debug message before issue
> > SVC_I3C_MCTRL_REQUEST_AUTO_IBI, TIMEOUT will be set.
>
> Yes, and sometimes it won't be an issue, but sometimes it may. Maybe we
> can find more advanced heuristics there.
>
> > And svc_i3c_master_error() was only called by svc_i3c_master_ibi_work().
> >
> > So I can think only case 3 happen in svc_i3c_master_ibi_work().
>
> Case 3 cannot be handled by Linux (because of the natural latency of
> the OS).
>
> >
> > Frank
> >
> > > Case 2 is fine I guess.
> > > Case 3 is not possible in Linux, this will not be supported.
> > >
> > > > The maximum stall period is 10 KHz or 100 μs.
> > >
> > > s/10 KHz//
> > >
> > > >
> > > > This is a just warning. System irq thread schedule latency is possible
> > > > bigger than 100us. Just omit this waring.
> > >
> > > This can be considered as being just a warning as the system IRQ
> > > latency can easily be greater than 100us.
>
> This was skipped in your v3.
>
> > > > Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> > > > Cc: [email protected]
> > > > Signed-off-by: Frank Li <[email protected]>
> > > > ---
>
> Thanks,
> Miquèl

2023-10-20 14:35:50

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

Hi Frank,

[email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:

> On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> >
> > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > Hi Frank,
> > > >
> > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > >
> > > > > master side report:
> > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > >
> > > > > BIT 20: TIMEOUT error
> > > > > The module has stalled too long in a frame. This happens when:
> > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > middle of a message,
> > > > > - No STOP was issued and between messages,
> > > > > - IBI manual is used and no decision was made.
> > > >
> > > > I am still not convinced this should be ignored in all cases.
> > > >
> > > > Case 1 is a problem because the hardware failed somehow.
> > >
> > > But so far, no action to handle this case in current code.
> >
> > Yes, but if you detect an issue and ignore it, it's not better than
> > reporting it without handling it. Instead of totally ignoring this I
> > would at least write a debug message (identical to what's below) before
> > returning false, even though I am not convinced unconditionally
> > returning false here is wise. If you fail a hardware sequence because
> > you added a printk, it's a problem. Maybe you consider this line as
> > noise, but I believe it's still an error condition. Maybe, however,
> > this bit gets set after the whole sequence, and this is just a "bus
> > is idle" condition. If that's the case, then you need some
> > additional heuristics to properly ignore the bit?
> >
>
> dev_err(master->dev,
> "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> mstatus, merrwarn);
> +
> + /* ignore timeout error */
> + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> + return false;
> +
>
> Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?

I think you mentioned earlier that the problem was not the printk but
the return value. So perhaps there is a way to know if the timeout
happened after a transaction and was legitimate or not?

In any case we should probably lower the log level for this error.

Thanks,
Miquèl

2023-10-20 14:48:22

by Frank Li

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> Hi Frank,
>
> [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
>
> > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > >
> > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > >
> > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > Hi Frank,
> > > > >
> > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > >
> > > > > > master side report:
> > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > >
> > > > > > BIT 20: TIMEOUT error
> > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > middle of a message,
> > > > > > - No STOP was issued and between messages,
> > > > > > - IBI manual is used and no decision was made.
> > > > >
> > > > > I am still not convinced this should be ignored in all cases.
> > > > >
> > > > > Case 1 is a problem because the hardware failed somehow.
> > > >
> > > > But so far, no action to handle this case in current code.
> > >
> > > Yes, but if you detect an issue and ignore it, it's not better than
> > > reporting it without handling it. Instead of totally ignoring this I
> > > would at least write a debug message (identical to what's below) before
> > > returning false, even though I am not convinced unconditionally
> > > returning false here is wise. If you fail a hardware sequence because
> > > you added a printk, it's a problem. Maybe you consider this line as
> > > noise, but I believe it's still an error condition. Maybe, however,
> > > this bit gets set after the whole sequence, and this is just a "bus
> > > is idle" condition. If that's the case, then you need some
> > > additional heuristics to properly ignore the bit?
> > >
> >
> > dev_err(master->dev,
> > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > mstatus, merrwarn);
> > +
> > + /* ignore timeout error */
> > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > + return false;
> > +
> >
> > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
>
> I think you mentioned earlier that the problem was not the printk but
> the return value. So perhaps there is a way to know if the timeout
> happened after a transaction and was legitimate or not?

Error message just annoise user, don't impact function. But return false
let IBI thread running to avoid dead lock.

>
> In any case we should probably lower the log level for this error.

Only SVC_I3C_MERRWARN_TIMEOUT is warning

Maybe below logic is better

if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
dev_dbg(master->dev,
"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
mstatus, merrwarn);
return false;
}

dev_err(master->dev,
"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
mstatus, merrwarn);
....

Frank

>
> Thanks,
> Miqu?l

2023-10-20 15:17:40

by Frank Li

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

On Fri, Oct 20, 2023 at 10:47:52AM -0400, Frank Li wrote:
> On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> >
> > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > Hi Frank,
> > > >
> > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > >
> > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > Hi Frank,
> > > > > >
> > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > >
> > > > > > > master side report:
> > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > >
> > > > > > > BIT 20: TIMEOUT error
> > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > middle of a message,
> > > > > > > - No STOP was issued and between messages,
> > > > > > > - IBI manual is used and no decision was made.
> > > > > >
> > > > > > I am still not convinced this should be ignored in all cases.
> > > > > >
> > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > >
> > > > > But so far, no action to handle this case in current code.
> > > >
> > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > reporting it without handling it. Instead of totally ignoring this I
> > > > would at least write a debug message (identical to what's below) before
> > > > returning false, even though I am not convinced unconditionally
> > > > returning false here is wise. If you fail a hardware sequence because
> > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > is idle" condition. If that's the case, then you need some
> > > > additional heuristics to properly ignore the bit?
> > > >
> > >
> > > dev_err(master->dev,
> > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > mstatus, merrwarn);
> > > +
> > > + /* ignore timeout error */
> > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > + return false;
> > > +
> > >
> > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> >
> > I think you mentioned earlier that the problem was not the printk but
> > the return value. So perhaps there is a way to know if the timeout
> > happened after a transaction and was legitimate or not?
>
> Error message just annoise user, don't impact function. But return false
> let IBI thread running to avoid dead lock.

I forget mention one thing. Any error message here will make SDA low for
longer. Before emit stop, SDA is low.

I have not checked I3C spec yet about how long SDA will be allowed. it will
worser if message go through uart port. The bus will be locked longer.

It's better to print error message after emit_stop to reduce SDA low time.

Frank

>
> >
> > In any case we should probably lower the log level for this error.
>
> Only SVC_I3C_MERRWARN_TIMEOUT is warning
>
> Maybe below logic is better
>
> if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> dev_dbg(master->dev,
> "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> mstatus, merrwarn);
> return false;
> }
>
> dev_err(master->dev,
> "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> mstatus, merrwarn);
> ....
>
> Frank
>
> >
> > Thanks,
> > Miqu?l

2023-10-20 15:21:14

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

Hi Frank,

[email protected] wrote on Fri, 20 Oct 2023 10:47:52 -0400:

> On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> >
> > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > Hi Frank,
> > > >
> > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > >
> > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > Hi Frank,
> > > > > >
> > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > >
> > > > > > > master side report:
> > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > >
> > > > > > > BIT 20: TIMEOUT error
> > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > middle of a message,
> > > > > > > - No STOP was issued and between messages,
> > > > > > > - IBI manual is used and no decision was made.
> > > > > >
> > > > > > I am still not convinced this should be ignored in all cases.
> > > > > >
> > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > >
> > > > > But so far, no action to handle this case in current code.
> > > >
> > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > reporting it without handling it. Instead of totally ignoring this I
> > > > would at least write a debug message (identical to what's below) before
> > > > returning false, even though I am not convinced unconditionally
> > > > returning false here is wise. If you fail a hardware sequence because
> > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > is idle" condition. If that's the case, then you need some
> > > > additional heuristics to properly ignore the bit?
> > > >
> > >
> > > dev_err(master->dev,
> > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > mstatus, merrwarn);
> > > +
> > > + /* ignore timeout error */
> > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > + return false;
> > > +
> > >
> > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> >
> > I think you mentioned earlier that the problem was not the printk but
> > the return value. So perhaps there is a way to know if the timeout
> > happened after a transaction and was legitimate or not?
>
> Error message just annoise user, don't impact function. But return false
> let IBI thread running to avoid dead lock.
>
> >
> > In any case we should probably lower the log level for this error.
>
> Only SVC_I3C_MERRWARN_TIMEOUT is warning
>
> Maybe below logic is better
>
> if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> dev_dbg(master->dev,
> "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> mstatus, merrwarn);
> return false;
> }
>
> dev_err(master->dev,
> "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> mstatus, merrwarn);
> ....
>

Yes, this looks better but I wonder if we should add an additional
condition to just return false in this case; something saying "this
timeout is legitimate and has no impact".

Thanks,
Miquèl

2023-10-20 15:25:40

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

Hi Frank,

[email protected] wrote on Fri, 20 Oct 2023 11:17:17 -0400:

> On Fri, Oct 20, 2023 at 10:47:52AM -0400, Frank Li wrote:
> > On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > >
> > > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> > >
> > > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > > Hi Frank,
> > > > >
> > > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > > >
> > > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > > Hi Frank,
> > > > > > >
> > > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > > >
> > > > > > > > master side report:
> > > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > > >
> > > > > > > > BIT 20: TIMEOUT error
> > > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > > middle of a message,
> > > > > > > > - No STOP was issued and between messages,
> > > > > > > > - IBI manual is used and no decision was made.
> > > > > > >
> > > > > > > I am still not convinced this should be ignored in all cases.
> > > > > > >
> > > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > > >
> > > > > > But so far, no action to handle this case in current code.
> > > > >
> > > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > > reporting it without handling it. Instead of totally ignoring this I
> > > > > would at least write a debug message (identical to what's below) before
> > > > > returning false, even though I am not convinced unconditionally
> > > > > returning false here is wise. If you fail a hardware sequence because
> > > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > > is idle" condition. If that's the case, then you need some
> > > > > additional heuristics to properly ignore the bit?
> > > > >
> > > >
> > > > dev_err(master->dev,
> > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > mstatus, merrwarn);
> > > > +
> > > > + /* ignore timeout error */
> > > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > > + return false;
> > > > +
> > > >
> > > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> > >
> > > I think you mentioned earlier that the problem was not the printk but
> > > the return value. So perhaps there is a way to know if the timeout
> > > happened after a transaction and was legitimate or not?
> >
> > Error message just annoise user, don't impact function. But return false
> > let IBI thread running to avoid dead lock.
>
> I forget mention one thing. Any error message here will make SDA low for
> longer. Before emit stop, SDA is low.
>
> I have not checked I3C spec yet about how long SDA will be allowed. it will
> worser if message go through uart port. The bus will be locked longer.
>
> It's better to print error message after emit_stop to reduce SDA low time.

That's fine I guess.

Thanks,
Miquèl

2023-10-20 15:48:15

by Frank Li

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout errory

On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> Hi Frank,
>
> [email protected] wrote on Fri, 20 Oct 2023 10:47:52 -0400:
>
> > On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > >
> > > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> > >
> > > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > > Hi Frank,
> > > > >
> > > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > > >
> > > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > > Hi Frank,
> > > > > > >
> > > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > > >
> > > > > > > > master side report:
> > > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > > >
> > > > > > > > BIT 20: TIMEOUT error
> > > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > > middle of a message,
> > > > > > > > - No STOP was issued and between messages,
> > > > > > > > - IBI manual is used and no decision was made.
> > > > > > >
> > > > > > > I am still not convinced this should be ignored in all cases.
> > > > > > >
> > > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > > >
> > > > > > But so far, no action to handle this case in current code.
> > > > >
> > > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > > reporting it without handling it. Instead of totally ignoring this I
> > > > > would at least write a debug message (identical to what's below) before
> > > > > returning false, even though I am not convinced unconditionally
> > > > > returning false here is wise. If you fail a hardware sequence because
> > > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > > is idle" condition. If that's the case, then you need some
> > > > > additional heuristics to properly ignore the bit?
> > > > >
> > > >
> > > > dev_err(master->dev,
> > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > mstatus, merrwarn);
> > > > +
> > > > + /* ignore timeout error */
> > > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > > + return false;
> > > > +
> > > >
> > > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> > >
> > > I think you mentioned earlier that the problem was not the printk but
> > > the return value. So perhaps there is a way to know if the timeout
> > > happened after a transaction and was legitimate or not?
> >
> > Error message just annoise user, don't impact function. But return false
> > let IBI thread running to avoid dead lock.
> >
> > >
> > > In any case we should probably lower the log level for this error.
> >
> > Only SVC_I3C_MERRWARN_TIMEOUT is warning
> >
> > Maybe below logic is better
> >
> > if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> > dev_dbg(master->dev,
> > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > mstatus, merrwarn);
> > return false;
> > }
> >
> > dev_err(master->dev,
> > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > mstatus, merrwarn);
> > ....
> >
>
> Yes, this looks better but I wonder if we should add an additional
> condition to just return false in this case;

What's additional condition we can check?

> something saying "this
> timeout is legitimate and has no impact".

Add comments "this timeout is legitimate and has no impact" or dev_dbg
print that?

>
> Thanks,
> Miqu?l

2023-10-20 17:03:50

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout errory

Hi Frank,

[email protected] wrote on Fri, 20 Oct 2023 11:47:48 -0400:

> On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > [email protected] wrote on Fri, 20 Oct 2023 10:47:52 -0400:
> >
> > > On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > > > Hi Frank,
> > > >
> > > > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> > > >
> > > > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > > > Hi Frank,
> > > > > >
> > > > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > > > >
> > > > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > > > Hi Frank,
> > > > > > > >
> > > > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > > > >
> > > > > > > > > master side report:
> > > > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > > > >
> > > > > > > > > BIT 20: TIMEOUT error
> > > > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > > > middle of a message,
> > > > > > > > > - No STOP was issued and between messages,
> > > > > > > > > - IBI manual is used and no decision was made.
> > > > > > > >
> > > > > > > > I am still not convinced this should be ignored in all cases.
> > > > > > > >
> > > > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > > > >
> > > > > > > But so far, no action to handle this case in current code.
> > > > > >
> > > > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > > > reporting it without handling it. Instead of totally ignoring this I
> > > > > > would at least write a debug message (identical to what's below) before
> > > > > > returning false, even though I am not convinced unconditionally
> > > > > > returning false here is wise. If you fail a hardware sequence because
> > > > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > > > is idle" condition. If that's the case, then you need some
> > > > > > additional heuristics to properly ignore the bit?
> > > > > >
> > > > >
> > > > > dev_err(master->dev,
> > > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > > mstatus, merrwarn);
> > > > > +
> > > > > + /* ignore timeout error */
> > > > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > > > + return false;
> > > > > +
> > > > >
> > > > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> > > >
> > > > I think you mentioned earlier that the problem was not the printk but
> > > > the return value. So perhaps there is a way to know if the timeout
> > > > happened after a transaction and was legitimate or not?
> > >
> > > Error message just annoise user, don't impact function. But return false
> > > let IBI thread running to avoid dead lock.
> > >
> > > >
> > > > In any case we should probably lower the log level for this error.
> > >
> > > Only SVC_I3C_MERRWARN_TIMEOUT is warning
> > >
> > > Maybe below logic is better
> > >
> > > if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> > > dev_dbg(master->dev,
> > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > mstatus, merrwarn);
> > > return false;
> > > }
> > >
> > > dev_err(master->dev,
> > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > mstatus, merrwarn);
> > > ....
> > >
> >
> > Yes, this looks better but I wonder if we should add an additional
> > condition to just return false in this case;
>
> What's additional condition we can check?

Well, you're the one bothered with an error case which is not a real
error. You're saying "this error is never a problem" and I am saying
that I believe it is not a problem is your particular case, but in
general there might be situations where it *is* a problem. So you need
to find proper conditions to check against in order to determine
whether this is just an info with no consequence or an error.

> > something saying "this
> > timeout is legitimate and has no impact".
>
> Add comments "this timeout is legitimate and has no impact" or dev_dbg
> print that?

No I'm talking about the additional heuristics.

Thanks,
Miquèl

2023-10-20 19:59:00

by Frank Li

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout errory

On Fri, Oct 20, 2023 at 07:03:37PM +0200, Miquel Raynal wrote:
> Hi Frank,
>
> [email protected] wrote on Fri, 20 Oct 2023 11:47:48 -0400:
>
> > On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > >
> > > [email protected] wrote on Fri, 20 Oct 2023 10:47:52 -0400:
> > >
> > > > On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > > > > Hi Frank,
> > > > >
> > > > > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> > > > >
> > > > > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > > > > Hi Frank,
> > > > > > >
> > > > > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > > > > >
> > > > > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > > > > Hi Frank,
> > > > > > > > >
> > > > > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > > > > >
> > > > > > > > > > master side report:
> > > > > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > > > > >
> > > > > > > > > > BIT 20: TIMEOUT error
> > > > > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > > > > middle of a message,
> > > > > > > > > > - No STOP was issued and between messages,
> > > > > > > > > > - IBI manual is used and no decision was made.
> > > > > > > > >
> > > > > > > > > I am still not convinced this should be ignored in all cases.
> > > > > > > > >
> > > > > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > > > > >
> > > > > > > > But so far, no action to handle this case in current code.
> > > > > > >
> > > > > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > > > > reporting it without handling it. Instead of totally ignoring this I
> > > > > > > would at least write a debug message (identical to what's below) before
> > > > > > > returning false, even though I am not convinced unconditionally
> > > > > > > returning false here is wise. If you fail a hardware sequence because
> > > > > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > > > > is idle" condition. If that's the case, then you need some
> > > > > > > additional heuristics to properly ignore the bit?
> > > > > > >
> > > > > >
> > > > > > dev_err(master->dev,
> > > > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > > > mstatus, merrwarn);
> > > > > > +
> > > > > > + /* ignore timeout error */
> > > > > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > > > > + return false;
> > > > > > +
> > > > > >
> > > > > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> > > > >
> > > > > I think you mentioned earlier that the problem was not the printk but
> > > > > the return value. So perhaps there is a way to know if the timeout
> > > > > happened after a transaction and was legitimate or not?
> > > >
> > > > Error message just annoise user, don't impact function. But return false
> > > > let IBI thread running to avoid dead lock.
> > > >
> > > > >
> > > > > In any case we should probably lower the log level for this error.
> > > >
> > > > Only SVC_I3C_MERRWARN_TIMEOUT is warning
> > > >
> > > > Maybe below logic is better
> > > >
> > > > if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> > > > dev_dbg(master->dev,
> > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > mstatus, merrwarn);
> > > > return false;
> > > > }
> > > >
> > > > dev_err(master->dev,
> > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > mstatus, merrwarn);
> > > > ....
> > > >
> > >
> > > Yes, this looks better but I wonder if we should add an additional
> > > condition to just return false in this case;
> >
> > What's additional condition we can check?
>
> Well, you're the one bothered with an error case which is not a real
> error. You're saying "this error is never a problem" and I am saying
> that I believe it is not a problem is your particular case, but in
> general there might be situations where it *is* a problem. So you need
> to find proper conditions to check against in order to determine
> whether this is just an info with no consequence or an error.

I checked R** code of this TIMEOUT, which is quite simple, set to 1 if SDA
is low over 100us if I understand correctly. I also checked, if I add delay
before emit stop, TIMEOUT will be set. (Read can auto emit stop accoring to
RDTERM, so just saw TIMEOUT at write transaction).

TIMEOUT just means condition "I3C bus's SDA low over 100us" happened since
written 1 to TIMEOUT.

I think "I3C bus's SDA over 100us" means nothing for linux drivers.

I think there are NO sitation where it *is* a problem. If it was problem,
there are NO solution to resolve it at linux driver side. And I think it
already happen many times silencely.

Frank

>
> > > something saying "this
> > > timeout is legitimate and has no impact".
> >
> > Add comments "this timeout is legitimate and has no impact" or dev_dbg
> > print that?
>
> No I'm talking about the additional heuristics.
>
> Thanks,
> Miqu?l

2023-10-23 07:49:30

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout errory

Hi Frank,

[email protected] wrote on Fri, 20 Oct 2023 15:58:25 -0400:

> On Fri, Oct 20, 2023 at 07:03:37PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > [email protected] wrote on Fri, 20 Oct 2023 11:47:48 -0400:
> >
> > > On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> > > > Hi Frank,
> > > >
> > > > [email protected] wrote on Fri, 20 Oct 2023 10:47:52 -0400:
> > > >
> > > > > On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > > > > > Hi Frank,
> > > > > >
> > > > > > [email protected] wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> > > > > >
> > > > > > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > > > > > Hi Frank,
> > > > > > > >
> > > > > > > > [email protected] wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > > > > > >
> > > > > > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > > > > > Hi Frank,
> > > > > > > > > >
> > > > > > > > > > [email protected] wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > > > > > >
> > > > > > > > > > > master side report:
> > > > > > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > > > > > >
> > > > > > > > > > > BIT 20: TIMEOUT error
> > > > > > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > > > > > middle of a message,
> > > > > > > > > > > - No STOP was issued and between messages,
> > > > > > > > > > > - IBI manual is used and no decision was made.
> > > > > > > > > >
> > > > > > > > > > I am still not convinced this should be ignored in all cases.
> > > > > > > > > >
> > > > > > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > > > > > >
> > > > > > > > > But so far, no action to handle this case in current code.
> > > > > > > >
> > > > > > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > > > > > reporting it without handling it. Instead of totally ignoring this I
> > > > > > > > would at least write a debug message (identical to what's below) before
> > > > > > > > returning false, even though I am not convinced unconditionally
> > > > > > > > returning false here is wise. If you fail a hardware sequence because
> > > > > > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > > > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > > > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > > > > > is idle" condition. If that's the case, then you need some
> > > > > > > > additional heuristics to properly ignore the bit?
> > > > > > > >
> > > > > > >
> > > > > > > dev_err(master->dev,
> > > > > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > > > > mstatus, merrwarn);
> > > > > > > +
> > > > > > > + /* ignore timeout error */
> > > > > > > + if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > > > > > + return false;
> > > > > > > +
> > > > > > >
> > > > > > > Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?
> > > > > >
> > > > > > I think you mentioned earlier that the problem was not the printk but
> > > > > > the return value. So perhaps there is a way to know if the timeout
> > > > > > happened after a transaction and was legitimate or not?
> > > > >
> > > > > Error message just annoise user, don't impact function. But return false
> > > > > let IBI thread running to avoid dead lock.
> > > > >
> > > > > >
> > > > > > In any case we should probably lower the log level for this error.
> > > > >
> > > > > Only SVC_I3C_MERRWARN_TIMEOUT is warning
> > > > >
> > > > > Maybe below logic is better
> > > > >
> > > > > if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> > > > > dev_dbg(master->dev,
> > > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > > mstatus, merrwarn);
> > > > > return false;
> > > > > }
> > > > >
> > > > > dev_err(master->dev,
> > > > > "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > > > mstatus, merrwarn);
> > > > > ....
> > > > >
> > > >
> > > > Yes, this looks better but I wonder if we should add an additional
> > > > condition to just return false in this case;
> > >
> > > What's additional condition we can check?
> >
> > Well, you're the one bothered with an error case which is not a real
> > error. You're saying "this error is never a problem" and I am saying
> > that I believe it is not a problem is your particular case, but in
> > general there might be situations where it *is* a problem. So you need
> > to find proper conditions to check against in order to determine
> > whether this is just an info with no consequence or an error.
>
> I checked R** code of this TIMEOUT, which is quite simple, set to 1 if SDA
> is low over 100us if I understand correctly. I also checked, if I add delay
> before emit stop, TIMEOUT will be set. (Read can auto emit stop accoring to
> RDTERM, so just saw TIMEOUT at write transaction).
>
> TIMEOUT just means condition "I3C bus's SDA low over 100us" happened since
> written 1 to TIMEOUT.
>
> I think "I3C bus's SDA over 100us" means nothing for linux drivers.
>
> I think there are NO sitation where it *is* a problem. If it was problem,
> there are NO solution to resolve it at linux driver side. And I think it
> already happen many times silencely.

Ok then, I'll opt for your last proposal of printing the error message
at the debug loglevel and return false.

Thanks,
Miquèl