LinuxLists.cc - [PATCH] ath10k: double check bmi xfer pointers

2014-04-10 10:17:10

Subject: [PATCH] ath10k: double check bmi xfer pointers

If for some reason copy engine ring buffer became
corrupt ath10k could crash the machine due to
invalid pointer dereference. It's very unlikely
but devices can never be fully trusted so verify
if the bmi xfer pointer read back from copy engine
matches the original pointer. The bug looked as
follows:

BUG: unable to handle kernel paging request at ffffffff815d6133
...
Call Trace:
[<ffffffff810fdaf4>] ? mark_held_locks+0x71/0x99
[<ffffffff810c6b6d>] ? __local_bh_enable_ip+0xaa/0xd9
[<ffffffff810fd7eb>] lock_acquire+0x82/0x9d
[<ffffffff810f5999>] ? complete+0x19/0x45
[<ffffffff810c6b72>] ? __local_bh_enable_ip+0xaf/0xd9
[<ffffffff815d5f9b>] _raw_spin_lock_irqsave+0x47/0x5a
[<ffffffff810f5999>] ? complete+0x19/0x45
[<ffffffff810f5999>] complete+0x19/0x45
[<ffffffffa056d977>] ath10k_pci_hif_exchange_bmi_msg+0x267/0x3f4 [ath10k_pci]
[<ffffffffa0471b42>] ath10k_hif_exchange_bmi_msg+0xe/0x10 [ath10k_core]
[<ffffffffa0471f01>] ath10k_bmi_write_memory+0xc4/0x12d [ath10k_core]
[<ffffffffa046877f>] ath10k_core_start+0x207/0x828 [ath10k_core]
[<ffffffffa0469723>] ath10k_core_register+0x5ca/0x77f [ath10k_core]
...

Reported-By: Ben Greear <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/pci.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index bf1083d..85e84c9 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1390,35 +1390,49 @@ err_dma:
return ret;
}

-static void ath10k_pci_bmi_send_done(struct ath10k_ce_pipe *ce_state)
+static void ath10k_pci_bmi_send_done(struct ath10k_ce_pipe *ce_state,
+ struct bmi_xfer *xfer)
{
- struct bmi_xfer *xfer;
+ void *ptr;
u32 ce_data;
unsigned int nbytes;
unsigned int transfer_id;

- if (ath10k_ce_completed_send_next(ce_state, (void **)&xfer, &ce_data,
+ if (ath10k_ce_completed_send_next(ce_state, (void **)&ptr, &ce_data,
&nbytes, &transfer_id))
return;

+ if (xfer != ptr) {
+ ath10k_warn("failed to verify bmi xfer tx pointer (got %p expected %p)\n",
+ ptr, xfer);
+ return;
+ }
+
if (xfer->wait_for_resp)
return;

complete(&xfer->done);
}

-static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state)
+static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state,
+ struct bmi_xfer *xfer)
{
- struct bmi_xfer *xfer;
+ void *ptr;
u32 ce_data;
unsigned int nbytes;
unsigned int transfer_id;
unsigned int flags;

- if (ath10k_ce_completed_recv_next(ce_state, (void **)&xfer, &ce_data,
+ if (ath10k_ce_completed_recv_next(ce_state, (void **)&ptr, &ce_data,
&nbytes, &transfer_id, &flags))
return;

+ if (xfer != ptr) {
+ ath10k_warn("failed to verify bmi xfer rx pointer (got %p expected %p)\n",
+ ptr, xfer);
+ return;
+ }
+
if (!xfer->wait_for_resp) {
ath10k_warn("unexpected: BMI data received; ignoring\n");
return;
@@ -1435,8 +1449,8 @@ static int ath10k_pci_bmi_wait(struct ath10k_ce_pipe *tx_pipe,
unsigned long timeout = jiffies + BMI_COMMUNICATION_TIMEOUT_HZ;

while (time_before_eq(jiffies, timeout)) {
- ath10k_pci_bmi_send_done(tx_pipe);
- ath10k_pci_bmi_recv_data(rx_pipe);
+ ath10k_pci_bmi_send_done(tx_pipe, xfer);
+ ath10k_pci_bmi_recv_data(rx_pipe, xfer);

if (completion_done(&xfer->done))
return 0;
--
1.8.5.3

2014-04-11 05:41:03

by Kalle Valo

[permalink] [raw]

Subject: Re: [PATCH] ath10k: double check bmi xfer pointers

Michal Kazior <[email protected]> writes:

> If for some reason copy engine ring buffer became
> corrupt ath10k could crash the machine due to
> invalid pointer dereference. It's very unlikely
> but devices can never be fully trusted so verify
> if the bmi xfer pointer read back from copy engine
> matches the original pointer.

The big question is why does this happen? Does this happen only with
Ben's firmware or is it a more generic problem?

--
Kalle Valo

2014-04-14 08:05:17

by Kalle Valo

[permalink] [raw]

Subject: Re: [PATCH] ath10k: double check bmi xfer pointers

Michal Kazior <[email protected]> writes:

> I conclude the bug is not present in the vanilla ath10k code.
>
> TL;DR drop the patch, please

Hehe. Thanks, I have dropped the patch.

--
Kalle Valo

2014-04-11 05:47:45

by Michal Kazior

[permalink] [raw]

Subject: Re: [PATCH] ath10k: double check bmi xfer pointers

On 11 April 2014 07:40, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> If for some reason copy engine ring buffer became
>> corrupt ath10k could crash the machine due to
>> invalid pointer dereference. It's very unlikely
>> but devices can never be fully trusted so verify
>> if the bmi xfer pointer read back from copy engine
>> matches the original pointer.
>
> The big question is why does this happen? Does this happen only with
> Ben's firmware or is it a more generic problem?

I'll look more into this.

Michał

2014-04-10 13:43:09

by Ben Greear

[permalink] [raw]

Subject: Re: [PATCH] ath10k: double check bmi xfer pointers

On 04/10/2014 03:05 AM, Michal Kazior wrote:
> If for some reason copy engine ring buffer became
> corrupt ath10k could crash the machine due to
> invalid pointer dereference. It's very unlikely
> but devices can never be fully trusted so verify
> if the bmi xfer pointer read back from copy engine
> matches the original pointer. The bug looked as
> follows:

Thanks for fixing this. I'll add this patch to my
tree and run more tests today.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2014-04-11 07:58:56

by Michal Kazior

[permalink] [raw]

Subject: Re: [PATCH] ath10k: double check bmi xfer pointers

On 11 April 2014 07:47, Michal Kazior <[email protected]> wrote:
> On 11 April 2014 07:40, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> If for some reason copy engine ring buffer became
>>> corrupt ath10k could crash the machine due to
>>> invalid pointer dereference. It's very unlikely
>>> but devices can never be fully trusted so verify
>>> if the bmi xfer pointer read back from copy engine
>>> matches the original pointer.
>>
>> The big question is why does this happen? Does this happen only with
>> Ben's firmware or is it a more generic problem?
>
> I'll look more into this.

Hmm.. After going through the code a few times I think the bug is
actually something else.

If the crash happened in complete() it means the completion structure
wasn't set up properly. However it is always initialized in
ath10k_pci_hif_exchange_bmi_msg() before proceeding. This means xfer
pointer read back from ath10k_ce_completed_send_next() or
ath10k_ce_completed_recv_next() is wrong. Since the pointer to it is
kept on host getting wrong xfer means sw_index must be wrong. If I
assume indexes are managed correctly (i.e. no overflows, locking is
fine) then it is the entry the sw_index points to that is actually
unexpected.

This could happen if concurrent transfers occur on pipe 0 or 1 (used
for bmi communication) during device bootup. This could be either a
concurrent bmi transfer or a non-bmi buffer or an old bmi xfer data.
The latter shouldn't be the case because
ath10k_pci_hif_exchange_bmi_msg() cleans up after itself. Concurrent
access doesn't seem to be the case either.

I conclude the bug is not present in the vanilla ath10k code.

TL;DR drop the patch, please

Michał