2014-06-05 07:56:19

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 0/2] ath10k: pci fixes 2014-06-05

Hi,

I've found a few issues while I was testing ath10k
in QEMU/KVM with PCI passthrough. The second patch
is something that I've found while analysing the
bmi race and I don't think anyone experienced
problems associated with it.

There's still a weird synchronization issue with
(what I think is) iomap write propagation. The
initial fw bootup interrupt doesn't come in and
it seems CE interrupts are unmasked with a lag
because explicit polling after ctl_resp timeout
seems to be sufficient to work around it. I
suspect this might be a virtualization problem
rather than ath10k. Ideas, anyone?


Michal Kazior (2):
ath10k: fix bmi exchange tx/rx race
ath10k: sanitize tx ring index access properly

drivers/net/wireless/ath/ath10k/ce.c | 11 +++++++----
drivers/net/wireless/ath/ath10k/pci.c | 11 +++--------
drivers/net/wireless/ath/ath10k/pci.h | 3 ++-
3 files changed, 12 insertions(+), 13 deletions(-)

--
1.8.5.3



2014-06-05 07:56:22

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 2/2] ath10k: sanitize tx ring index access properly

The tx ring index was immediately trimmed with a
bitmask. This discarded the 0xFFFFFFFF error case
(which theoretically can happen when a device is
abruptly disconnected) and led to using an invalid
tx ring index. This could lead to memory
corruption.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/ce.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index d185dc0..7c6c7d5 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -603,16 +603,19 @@ static int ath10k_ce_completed_send_next_nolock(struct ath10k_ce_pipe *ce_state,
if (ret)
return ret;

- src_ring->hw_index =
- ath10k_ce_src_ring_read_index_get(ar, ctrl_addr);
- src_ring->hw_index &= nentries_mask;
+ read_index = ath10k_ce_src_ring_read_index_get(ar, ctrl_addr);
+ if (read_index == 0xFFFFFFFF)
+ return -ENODEV;
+
+ read_index &= nentries_mask;
+ src_ring->hw_index = read_index;

ath10k_pci_sleep(ar);
}

read_index = src_ring->hw_index;

- if ((read_index == sw_index) || (read_index == 0xffffffff))
+ if (read_index == sw_index)
return -EIO;

sbase = src_ring->shadow_base;
--
1.8.5.3


2014-06-05 07:56:21

by Michal Kazior

[permalink] [raw]
Subject: [PATCH 1/2] ath10k: fix bmi exchange tx/rx race

It was possible for tx completion not to be
processed. In that case an old stack pointer was
left on copy engine tx ring. Next bmi exchange
would immediately pop it and use complete() on the
completion struct there causing corruption.

Make sure to wait for both tx and rx completions
properly.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/pci.c | 11 +++--------
drivers/net/wireless/ath/ath10k/pci.h | 3 ++-
2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index d0004d5..06840d1 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1362,8 +1362,6 @@ static int ath10k_pci_hif_exchange_bmi_msg(struct ath10k *ar,
ath10k_ce_recv_buf_enqueue(ce_rx, &xfer, resp_paddr);
}

- init_completion(&xfer.done);
-
ret = ath10k_ce_send(ce_tx, &xfer, req_paddr, req_len, -1, 0);
if (ret)
goto err_resp;
@@ -1414,10 +1412,7 @@ static void ath10k_pci_bmi_send_done(struct ath10k_ce_pipe *ce_state)
&nbytes, &transfer_id))
return;

- if (xfer->wait_for_resp)
- return;
-
- complete(&xfer->done);
+ xfer->tx_done = true;
}

static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state)
@@ -1438,7 +1433,7 @@ static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state)
}

xfer->resp_len = nbytes;
- complete(&xfer->done);
+ xfer->rx_done = true;
}

static int ath10k_pci_bmi_wait(struct ath10k_ce_pipe *tx_pipe,
@@ -1451,7 +1446,7 @@ static int ath10k_pci_bmi_wait(struct ath10k_ce_pipe *tx_pipe,
ath10k_pci_bmi_send_done(tx_pipe);
ath10k_pci_bmi_recv_data(rx_pipe);

- if (completion_done(&xfer->done))
+ if (xfer->tx_done && (xfer->rx_done == xfer->wait_for_resp))
return 0;

schedule();
diff --git a/drivers/net/wireless/ath/ath10k/pci.h b/drivers/net/wireless/ath/ath10k/pci.h
index dfdebb4..9401292 100644
--- a/drivers/net/wireless/ath/ath10k/pci.h
+++ b/drivers/net/wireless/ath/ath10k/pci.h
@@ -38,7 +38,8 @@
#define DIAG_TRANSFER_LIMIT 2048

struct bmi_xfer {
- struct completion done;
+ bool tx_done;
+ bool rx_done;
bool wait_for_resp;
u32 resp_len;
};
--
1.8.5.3


2014-07-15 08:32:47

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 0/2] ath10k: pci fixes 2014-06-05

Michal Kazior <[email protected]> writes:

> I've found a few issues while I was testing ath10k
> in QEMU/KVM with PCI passthrough. The second patch
> is something that I've found while analysing the
> bmi race and I don't think anyone experienced
> problems associated with it.
>
> There's still a weird synchronization issue with
> (what I think is) iomap write propagation. The
> initial fw bootup interrupt doesn't come in and
> it seems CE interrupts are unmasked with a lag
> because explicit polling after ctl_resp timeout
> seems to be sufficient to work around it. I
> suspect this might be a virtualization problem
> rather than ath10k. Ideas, anyone?

No ideas, but it would be great to get ath10k working in KVM. If we
cannot find a better solution, could we add the explicit polling anyway?
(With an approriate comment why such a hack is needed, of course)

> Michal Kazior (2):
> ath10k: fix bmi exchange tx/rx race
> ath10k: sanitize tx ring index access properly

Thanks, both patches applied.

--
Kalle Valo

2014-07-14 13:20:31

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 2/2] ath10k: sanitize tx ring index access properly

Michal Kazior <[email protected]> writes:

> The tx ring index was immediately trimmed with a
> bitmask. This discarded the 0xFFFFFFFF error case
> (which theoretically can happen when a device is
> abruptly disconnected) and led to using an invalid
> tx ring index. This could lead to memory
> corruption.
>
> Signed-off-by: Michal Kazior <[email protected]>
> ---
> drivers/net/wireless/ath/ath10k/ce.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
> index d185dc0..7c6c7d5 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -603,16 +603,19 @@ static int ath10k_ce_completed_send_next_nolock(struct ath10k_ce_pipe *ce_state,
> if (ret)
> return ret;
>
> - src_ring->hw_index =
> - ath10k_ce_src_ring_read_index_get(ar, ctrl_addr);
> - src_ring->hw_index &= nentries_mask;
> + read_index = ath10k_ce_src_ring_read_index_get(ar, ctrl_addr);
> + if (read_index == 0xFFFFFFFF)
> + return -ENODEV;

I changed this to lower case, as it was before. Let's use lower case hex
values in ath10k.

--
Kalle Valo