2014-05-15 12:56:07

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 0/3] ath10k: pci fixes 2014-05-15

Hi,

Patches 1 and 2 were made as a result of analysing
`ath10k firmware crash after 4 hours of heavy TCP
traffic` bug reported by Avery.

Patch 3 fixes a bug reported by Avery that was
apparently related to missing memory barriers on
his ARM test system. @Avery, can you take a look
or test it, please? I was never able to reproduce
this but it just seems like the right thing to do.
Or you can simply post your patch and I'll drop
mine then.

Note: this is based on ath-next-test branch.


Michal Kazior (3):
ath10k: protect src_ring state with ce_lock in tx_sg()
ath10k: revert incomplete scatter-gather pci tx
ath10k: add explicit memory barrier for ring index update

drivers/net/wireless/ath/ath10k/ce.c | 29 +++++++++++++++++++++++++++++
drivers/net/wireless/ath/ath10k/ce.h | 2 ++
drivers/net/wireless/ath/ath10k/pci.c | 27 ++++++++++++++++++---------
3 files changed, 49 insertions(+), 9 deletions(-)

--
1.8.5.3



2014-05-15 12:56:09

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 1/3] ath10k: protect src_ring state with ce_lock in tx_sg()

It was possible to read invalid state of CE ring
buffer indexes. This could lead to scatter-gather
transfer failure in mid-way and crash firmware
later by leaving garbage data on the ring.

Reported-By: Avery Pennarun <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/pci.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 91d6076..04d8cbf 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -762,13 +762,17 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
struct ath10k_pci_pipe *pci_pipe = &ar_pci->pipe_info[pipe_id];
struct ath10k_ce_pipe *ce_pipe = pci_pipe->ce_hdl;
struct ath10k_ce_ring *src_ring = ce_pipe->src_ring;
- unsigned int nentries_mask = src_ring->nentries_mask;
- unsigned int sw_index = src_ring->sw_index;
- unsigned int write_index = src_ring->write_index;
+ unsigned int nentries_mask;
+ unsigned int sw_index;
+ unsigned int write_index;
int err, i;

spin_lock_bh(&ar_pci->ce_lock);

+ nentries_mask = src_ring->nentries_mask;
+ sw_index = src_ring->sw_index;
+ write_index = src_ring->write_index;
+
if (unlikely(CE_RING_DELTA(nentries_mask,
write_index, sw_index - 1) < n_items)) {
err = -ENOBUFS;
--
1.8.5.3


2014-05-26 10:10:20

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v2 1/2] ath10k: protect src_ring state with ce_lock in tx_sg()

It was possible to read invalid state of CE ring
buffer indexes. This could lead to scatter-gather
transfer failure in mid-way and crash firmware
later by leaving garbage data on the ring.

Reported-By: Avery Pennarun <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
---
drivers/net/wireless/ath/ath10k/pci.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 71ab110..b1eb915 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -762,13 +762,17 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
struct ath10k_pci_pipe *pci_pipe = &ar_pci->pipe_info[pipe_id];
struct ath10k_ce_pipe *ce_pipe = pci_pipe->ce_hdl;
struct ath10k_ce_ring *src_ring = ce_pipe->src_ring;
- unsigned int nentries_mask = src_ring->nentries_mask;
- unsigned int sw_index = src_ring->sw_index;
- unsigned int write_index = src_ring->write_index;
+ unsigned int nentries_mask;
+ unsigned int sw_index;
+ unsigned int write_index;
int err, i;

spin_lock_bh(&ar_pci->ce_lock);

+ nentries_mask = src_ring->nentries_mask;
+ sw_index = src_ring->sw_index;
+ write_index = src_ring->write_index;
+
if (unlikely(CE_RING_DELTA(nentries_mask,
write_index, sw_index - 1) < n_items)) {
err = -ENOBUFS;
--
1.8.5.3


2014-05-26 05:37:15

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH 2/3] ath10k: revert incomplete scatter-gather pci tx

On 25 May 2014 09:53, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> This prevents leaving incomplete scatter-gather
>> transfer on CE rings which can lead firmware to
>> crash.
>>
>> Reported-By: Avery Pennarun <[email protected]>
>> Signed-off-by: Michal Kazior <[email protected]>
>
> The title is a bit misleading as usually the commit log with the word
> revert means that the commit is reverting another git commit. Maybe
> something like this is better:
>
> ath10k: drop incomplete scatter-gather pci tx transfers

Good point. I was actually thinking 'abort' .. 'properly'.


>> + if (WARN_ON(src_ring->write_index == src_ring->sw_index))
>> + return;
>> +
>> + if (WARN_ON(src_ring->write_index ==
>> + ath10k_ce_src_ring_write_index_get(ar, ctrl_addr)))
>> + return;
>
> WARN_ON() on data path is dangerous. WARN_ON_ONCE() or ath10k_warn() is
> better.

Good point!


>> +err:
>> + for (; i > 0; i--)
>
> Isn't this just a fancy way to say 'while (i-- > 0)'?

Not really. More like do { .. } while (--i > 0), no? First iteration
must use unmodified `i`.


MichaƂ

2014-05-26 10:10:22

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v2 2/2] ath10k: abort incomplete scatter-gather pci tx properly

This prevents leaving incomplete scatter-gather
transfer on CE rings which can lead firmware to
crash.

Reported-By: Avery Pennarun <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
---

Notes:
v2:
* send_revert(): s/WARN_ON/WARN_ON_ONCE/ [Kalle]
* commit subject: s/revert/abort/, s/$/properly/ [Kalle]

drivers/net/wireless/ath/ath10k/ce.c | 27 +++++++++++++++++++++++++++
drivers/net/wireless/ath/ath10k/ce.h | 2 ++
drivers/net/wireless/ath/ath10k/pci.c | 17 +++++++++++------
3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 1e4cad8..d185dc0 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -329,6 +329,33 @@ exit:
return ret;
}

+void __ath10k_ce_send_revert(struct ath10k_ce_pipe *pipe)
+{
+ struct ath10k *ar = pipe->ar;
+ struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
+ struct ath10k_ce_ring *src_ring = pipe->src_ring;
+ u32 ctrl_addr = pipe->ctrl_addr;
+
+ lockdep_assert_held(&ar_pci->ce_lock);
+
+ /*
+ * This function must be called only if there is an incomplete
+ * scatter-gather transfer (before index register is updated)
+ * that needs to be cleaned up.
+ */
+ if (WARN_ON_ONCE(src_ring->write_index == src_ring->sw_index))
+ return;
+
+ if (WARN_ON_ONCE(src_ring->write_index ==
+ ath10k_ce_src_ring_write_index_get(ar, ctrl_addr)))
+ return;
+
+ src_ring->write_index--;
+ src_ring->write_index &= src_ring->nentries_mask;
+
+ src_ring->per_transfer_context[src_ring->write_index] = NULL;
+}
+
int ath10k_ce_send(struct ath10k_ce_pipe *ce_state,
void *per_transfer_context,
u32 buffer,
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index fd0bc35..7a5a36f 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -160,6 +160,8 @@ int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
unsigned int transfer_id,
unsigned int flags);

+void __ath10k_ce_send_revert(struct ath10k_ce_pipe *pipe);
+
void ath10k_ce_send_cb_register(struct ath10k_ce_pipe *ce_state,
void (*send_cb)(struct ath10k_ce_pipe *),
int disable_interrupts);
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index b1eb915..d0004d5 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -765,7 +765,7 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
unsigned int nentries_mask;
unsigned int sw_index;
unsigned int write_index;
- int err, i;
+ int err, i = 0;

spin_lock_bh(&ar_pci->ce_lock);

@@ -776,7 +776,7 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
if (unlikely(CE_RING_DELTA(nentries_mask,
write_index, sw_index - 1) < n_items)) {
err = -ENOBUFS;
- goto unlock;
+ goto err;
}

for (i = 0; i < n_items - 1; i++) {
@@ -793,7 +793,7 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
items[i].transfer_id,
CE_SEND_FLAG_GATHER);
if (err)
- goto unlock;
+ goto err;
}

/* `i` is equal to `n_items -1` after for() */
@@ -811,10 +811,15 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
items[i].transfer_id,
0);
if (err)
- goto unlock;
+ goto err;
+
+ spin_unlock_bh(&ar_pci->ce_lock);
+ return 0;
+
+err:
+ for (; i > 0; i--)
+ __ath10k_ce_send_revert(ce_pipe);

- err = 0;
-unlock:
spin_unlock_bh(&ar_pci->ce_lock);
return err;
}
--
1.8.5.3


2014-05-25 07:54:04

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 2/3] ath10k: revert incomplete scatter-gather pci tx

Michal Kazior <[email protected]> writes:

> This prevents leaving incomplete scatter-gather
> transfer on CE rings which can lead firmware to
> crash.
>
> Reported-By: Avery Pennarun <[email protected]>
> Signed-off-by: Michal Kazior <[email protected]>

The title is a bit misleading as usually the commit log with the word
revert means that the commit is reverting another git commit. Maybe
something like this is better:

ath10k: drop incomplete scatter-gather pci tx transfers

> + if (WARN_ON(src_ring->write_index == src_ring->sw_index))
> + return;
> +
> + if (WARN_ON(src_ring->write_index ==
> + ath10k_ce_src_ring_write_index_get(ar, ctrl_addr)))
> + return;

WARN_ON() on data path is dangerous. WARN_ON_ONCE() or ath10k_warn() is
better.

> +err:
> + for (; i > 0; i--)

Isn't this just a fancy way to say 'while (i-- > 0)'?

--
Kalle Valo

2014-05-15 12:56:12

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 3/3] ath10k: add explicit memory barrier for ring index update

Avery reported he had some issues related to
instructions being re-ordered on his ARM test
system resulting in firmware crashes.

This makes sure that data is in place before CE
ring index is updated telling firmware it can
fetch the data.

Reported-By: Avery Pennarun <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/ce.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 966f26e..e66159e 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -63,6 +63,7 @@ static inline void ath10k_ce_dest_ring_write_index_set(struct ath10k *ar,
u32 ce_ctrl_addr,
unsigned int n)
{
+ mb();
ath10k_pci_write32(ar, ce_ctrl_addr + DST_WR_INDEX_ADDRESS, n);
}

@@ -76,6 +77,7 @@ static inline void ath10k_ce_src_ring_write_index_set(struct ath10k *ar,
u32 ce_ctrl_addr,
unsigned int n)
{
+ mb();
ath10k_pci_write32(ar, ce_ctrl_addr + SR_WR_INDEX_ADDRESS, n);
}

--
1.8.5.3


2014-05-25 07:44:52

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 3/3] ath10k: add explicit memory barrier for ring index update

Michal Kazior <[email protected]> writes:

> Avery reported he had some issues related to
> instructions being re-ordered on his ARM test
> system resulting in firmware crashes.
>
> This makes sure that data is in place before CE
> ring index is updated telling firmware it can
> fetch the data.
>
> Reported-By: Avery Pennarun <[email protected]>
> Signed-off-by: Michal Kazior <[email protected]>

I have dropped this patch for now. Like I said before, let's first be
sure that all memory barrier issues are resolved and only then apply
this.

--
Kalle Valo

2014-05-16 12:34:40

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 0/3] ath10k: pci fixes 2014-05-15

Michal Kazior <[email protected]> writes:

> Hi,
>
> Patches 1 and 2 were made as a result of analysing
> `ath10k firmware crash after 4 hours of heavy TCP
> traffic` bug reported by Avery.
>
> Patch 3 fixes a bug reported by Avery that was
> apparently related to missing memory barriers on
> his ARM test system. @Avery, can you take a look
> or test it, please? I was never able to reproduce
> this but it just seems like the right thing to do.
> Or you can simply post your patch and I'll drop
> mine then.

I think it would be good still to wait a while with patch 3 as
apparently Avery is still seeing some odd firmware crashes. Once we are
confident that the memory barrier issues are handled we come up with a
final patch.

--
Kalle Valo

2014-05-16 12:32:17

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 3/3] ath10k: add explicit memory barrier for ring index update

Michal Kazior <[email protected]> writes:

> Avery reported he had some issues related to
> instructions being re-ordered on his ARM test
> system resulting in firmware crashes.
>
> This makes sure that data is in place before CE
> ring index is updated telling firmware it can
> fetch the data.
>
> Reported-By: Avery Pennarun <[email protected]>
> Signed-off-by: Michal Kazior <[email protected]>
> ---

[...]

> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -63,6 +63,7 @@ static inline void ath10k_ce_dest_ring_write_index_set(struct ath10k *ar,
> u32 ce_ctrl_addr,
> unsigned int n)
> {
> + mb();
> ath10k_pci_write32(ar, ce_ctrl_addr + DST_WR_INDEX_ADDRESS, n);
> }
>
> @@ -76,6 +77,7 @@ static inline void ath10k_ce_src_ring_write_index_set(struct ath10k *ar,
> u32 ce_ctrl_addr,
> unsigned int n)
> {
> + mb();
> ath10k_pci_write32(ar, ce_ctrl_addr + SR_WR_INDEX_ADDRESS, n);
> }

I see two new checkpatch warnings:

drivers/net/wireless/ath/ath10k/ce.c:66: CHECK: memory barrier without comment
drivers/net/wireless/ath/ath10k/ce.c:80: CHECK: memory barrier without comment

--
Kalle Valo

2014-05-15 12:56:10

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 2/3] ath10k: revert incomplete scatter-gather pci tx

This prevents leaving incomplete scatter-gather
transfer on CE rings which can lead firmware to
crash.

Reported-By: Avery Pennarun <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/ce.c | 27 +++++++++++++++++++++++++++
drivers/net/wireless/ath/ath10k/ce.h | 2 ++
drivers/net/wireless/ath/ath10k/pci.c | 17 +++++++++++------
3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 1e4cad8..966f26e 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -329,6 +329,33 @@ exit:
return ret;
}

+void __ath10k_ce_send_revert(struct ath10k_ce_pipe *pipe)
+{
+ struct ath10k *ar = pipe->ar;
+ struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
+ struct ath10k_ce_ring *src_ring = pipe->src_ring;
+ u32 ctrl_addr = pipe->ctrl_addr;
+
+ lockdep_assert_held(&ar_pci->ce_lock);
+
+ /*
+ * This function must be called only if there is an incomplete
+ * scatter-gather transfer (before index register is updated)
+ * that needs to be cleaned up.
+ */
+ if (WARN_ON(src_ring->write_index == src_ring->sw_index))
+ return;
+
+ if (WARN_ON(src_ring->write_index ==
+ ath10k_ce_src_ring_write_index_get(ar, ctrl_addr)))
+ return;
+
+ src_ring->write_index--;
+ src_ring->write_index &= src_ring->nentries_mask;
+
+ src_ring->per_transfer_context[src_ring->write_index] = NULL;
+}
+
int ath10k_ce_send(struct ath10k_ce_pipe *ce_state,
void *per_transfer_context,
u32 buffer,
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index fd0bc35..7a5a36f 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -160,6 +160,8 @@ int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
unsigned int transfer_id,
unsigned int flags);

+void __ath10k_ce_send_revert(struct ath10k_ce_pipe *pipe);
+
void ath10k_ce_send_cb_register(struct ath10k_ce_pipe *ce_state,
void (*send_cb)(struct ath10k_ce_pipe *),
int disable_interrupts);
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 04d8cbf..059add4 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -765,7 +765,7 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
unsigned int nentries_mask;
unsigned int sw_index;
unsigned int write_index;
- int err, i;
+ int err, i = 0;

spin_lock_bh(&ar_pci->ce_lock);

@@ -776,7 +776,7 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
if (unlikely(CE_RING_DELTA(nentries_mask,
write_index, sw_index - 1) < n_items)) {
err = -ENOBUFS;
- goto unlock;
+ goto err;
}

for (i = 0; i < n_items - 1; i++) {
@@ -793,7 +793,7 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
items[i].transfer_id,
CE_SEND_FLAG_GATHER);
if (err)
- goto unlock;
+ goto err;
}

/* `i` is equal to `n_items -1` after for() */
@@ -811,10 +811,15 @@ static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
items[i].transfer_id,
0);
if (err)
- goto unlock;
+ goto err;
+
+ spin_unlock_bh(&ar_pci->ce_lock);
+ return 0;
+
+err:
+ for (; i > 0; i--)
+ __ath10k_ce_send_revert(ce_pipe);

- err = 0;
-unlock:
spin_unlock_bh(&ar_pci->ce_lock);
return err;
}
--
1.8.5.3


2014-05-26 10:10:20

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v2 0/2] ath10k: pci fixes 2014-05-15

Hi,

Patches are a result of analysing `ath10k firmware
crash after 4 hours of heavy TCP traffic` bug
reported by Avery.

v2:
* dropped patch 3/3 (memory barrier addition) [Kalle]

Based on github.com/kvalo/ath/master


Michal Kazior (2):
ath10k: protect src_ring state with ce_lock in tx_sg()
ath10k: abort incomplete scatter-gather pci tx properly

drivers/net/wireless/ath/ath10k/ce.c | 27 +++++++++++++++++++++++++++
drivers/net/wireless/ath/ath10k/ce.h | 2 ++
drivers/net/wireless/ath/ath10k/pci.c | 27 ++++++++++++++++++---------
3 files changed, 47 insertions(+), 9 deletions(-)

--
1.8.5.3


2014-05-26 09:19:39

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 2/3] ath10k: revert incomplete scatter-gather pci tx

Michal Kazior <[email protected]> writes:

> On 25 May 2014 09:53, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> This prevents leaving incomplete scatter-gather
>>> transfer on CE rings which can lead firmware to
>>> crash.
>>>
>>> Reported-By: Avery Pennarun <[email protected]>
>>> Signed-off-by: Michal Kazior <[email protected]>
>>
>> The title is a bit misleading as usually the commit log with the word
>> revert means that the commit is reverting another git commit. Maybe
>> something like this is better:
>>
>> ath10k: drop incomplete scatter-gather pci tx transfers
>
> Good point. I was actually thinking 'abort' .. 'properly'.

Sounds good to me.

>>> +err:
>>> + for (; i > 0; i--)
>>
>> Isn't this just a fancy way to say 'while (i-- > 0)'?
>
> Not really. More like do { .. } while (--i > 0), no? First iteration
> must use unmodified `i`.

Ok, I missed that. Then the for loop is good here.

--
Kalle Valo

2014-05-27 09:33:03

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] ath10k: pci fixes 2014-05-15

Michal Kazior <[email protected]> writes:

> Hi,
>
> Patches are a result of analysing `ath10k firmware
> crash after 4 hours of heavy TCP traffic` bug
> reported by Avery.
>
> v2:
> * dropped patch 3/3 (memory barrier addition) [Kalle]
>
> Based on github.com/kvalo/ath/master
>
>
> Michal Kazior (2):
> ath10k: protect src_ring state with ce_lock in tx_sg()
> ath10k: abort incomplete scatter-gather pci tx properly

Thanks, both patches applied.

--
Kalle Valo