[ Resending due to cut and paste failure of email address ]
From: Steven Rostedt (Google) <[email protected]>
I was looking at a crash report of a timer list being corrupted, which
usually happens when a timer is freed while still active. This is
commonly triggered by code calling del_timer() instead of
del_timer_sync() just before freeing.
One possible culprit is the hci_qca driver, which does exactly that.
Cc: [email protected]
Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
---
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index f6e91fb432a3..73a8c72b5aae 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -696,8 +696,8 @@ static int qca_close(struct hci_uart *hu)
 	skb_queue_purge(&qca->tx_wait_q);
 	skb_queue_purge(&qca->txq);
 	skb_queue_purge(&qca->rx_memdump_q);
-	del_timer(&qca->tx_idle_timer);
-	del_timer(&qca->wake_retrans_timer);
+	del_timer_sync(&qca->tx_idle_timer);
+	del_timer_sync(&qca->wake_retrans_timer);
 	destroy_workqueue(qca->workqueue);
 	qca->hu = NULL;
On 4/4/22 15:22, Steven Rostedt wrote:
> [ Resending due to cut and paste failure of email address ]
>
> From: Steven Rostedt (Google) <[email protected]>
>
> I was looking at a crash report of a timer list being corrupted, which
> usually happens when a timer is freed while still active. This is
> commonly triggered by code calling del_timer() instead of
> del_timer_sync() just before freeing.
>
> One possible culprit is the hci_qca driver, which does exactly that.
>
> Cc: [email protected]
> Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART")
> Signed-off-by: Steven Rostedt (Google) <[email protected]>
> ---
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index f6e91fb432a3..73a8c72b5aae 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -696,8 +696,8 @@ static int qca_close(struct hci_uart *hu)
> skb_queue_purge(&qca->tx_wait_q);
> skb_queue_purge(&qca->txq);
> skb_queue_purge(&qca->rx_memdump_q);
> - del_timer(&qca->tx_idle_timer);
> - del_timer(&qca->wake_retrans_timer);
> + del_timer_sync(&qca->tx_idle_timer);
> + del_timer_sync(&qca->wake_retrans_timer);
It seems the wake_retrans_timer could be re-armed from a work queue.
So perhaps we need to make sure qca->workqueue is destroyed
before these del_timer_sync() calls?
> destroy_workqueue(qca->workqueue);
i.e., move this destroy_workqueue() up?
> qca->hu = NULL;
>
On Mon, 4 Apr 2022 17:22:00 -0700
Eric Dumazet <[email protected]> wrote:
> > diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> > index f6e91fb432a3..73a8c72b5aae 100644
> > --- a/drivers/bluetooth/hci_qca.c
> > +++ b/drivers/bluetooth/hci_qca.c
> > @@ -696,8 +696,8 @@ static int qca_close(struct hci_uart *hu)
> > skb_queue_purge(&qca->tx_wait_q);
> > skb_queue_purge(&qca->txq);
> > skb_queue_purge(&qca->rx_memdump_q);
> > - del_timer(&qca->tx_idle_timer);
> > - del_timer(&qca->wake_retrans_timer);
> > + del_timer_sync(&qca->tx_idle_timer);
> > + del_timer_sync(&qca->wake_retrans_timer);
>
>
> It seems the wake_retrans_timer could be re-armed from a work queue.
>
> So perhaps we need to make sure qca->workqueue is destroyed
>
> before these del_timer_sync() calls ?
>
> > destroy_workqueue(qca->workqueue);
>
>
> ie move this destroy_workqueue() up ?
Yeah, that could be a problem. I would think moving it up would help,
if that's what requeues the timers.
-- Steve
>
>
> > qca->hu = NULL;
> >
On Mon, Apr 04, 2022 at 05:22:00PM -0700, Eric Dumazet wrote:
>
> On 4/4/22 15:22, Steven Rostedt wrote:
> > [ Resending due to cut and paste failure of email address ]
> >
> > From: Steven Rostedt (Google) <[email protected]>
> >
> > I was looking at a crash report of a timer list being corrupted, which
> > usually happens when a timer is freed while still active. This is
> > commonly triggered by code calling del_timer() instead of
> > del_timer_sync() just before freeing.
> >
> > One possible culprit is the hci_qca driver, which does exactly that.
> >
> > Cc: [email protected]
> > Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART")
> > Signed-off-by: Steven Rostedt (Google) <[email protected]>
> > ---
> > diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> > index f6e91fb432a3..73a8c72b5aae 100644
> > --- a/drivers/bluetooth/hci_qca.c
> > +++ b/drivers/bluetooth/hci_qca.c
> > @@ -696,8 +696,8 @@ static int qca_close(struct hci_uart *hu)
> > skb_queue_purge(&qca->tx_wait_q);
> > skb_queue_purge(&qca->txq);
> > skb_queue_purge(&qca->rx_memdump_q);
> > - del_timer(&qca->tx_idle_timer);
> > - del_timer(&qca->wake_retrans_timer);
> > + del_timer_sync(&qca->tx_idle_timer);
> > + del_timer_sync(&qca->wake_retrans_timer);
>
>
> It seems the wake_retrans_timer could be re-armed from a work queue.
>
> So perhaps we need to make sure qca->workqueue is destroyed
>
> before these del_timer_sync() calls ?
>
> > destroy_workqueue(qca->workqueue);
>
>
> ie move this destroy_workqueue() up ?
>
What prevents the timer code from queueing work into the destroyed
workqueue ?
Thanks,
Guenter
On Wed, 6 Apr 2022 08:39:07 -0700
Guenter Roeck <[email protected]> wrote:
> > ie move this destroy_workqueue() up ?
> >
>
> What prevents the timer code from queueing work into the destroyed
> workqueue ?
So we have a chicken versus egg issue here?
-- Steve
On 4/6/22 08:46, Steven Rostedt wrote:
> On Wed, 6 Apr 2022 08:39:07 -0700
> Guenter Roeck <[email protected]> wrote:
>
>>> ie move this destroy_workqueue() up ?
>>>
>>
>> What prevents the timer code from queueing work into the destroyed
>> workqueue ?
>
> So we have a chicken versus egg issue here?
>
Almost looks like it, unless I am missing something. Maybe some flag
is needed to prevent the timer handling code from queuing into the
destroyed workqueue, or the workqueue handler from updating the timer.
Guenter
On Wed, 6 Apr 2022 09:36:10 -0700
Guenter Roeck <[email protected]> wrote:
> > So we have a chicken versus egg issue here?
> >
>
> Almost looks like it, unless I am missing something. Maybe some flag
> is needed to prevent the timer handling code from queuing into the
> destroyed workqueue, or the workqueue handler from updating the timer.
That's exactly what I was thinking. I do not know all the code here. I
could try to write a patch, but I would likely miss something.
-- Steve
On 4/6/22 09:46, Steven Rostedt wrote:
> On Wed, 6 Apr 2022 09:36:10 -0700
> Guenter Roeck <[email protected]> wrote:
>
>>> So we have a chicken versus egg issue here?
>>>
>> Almost looks like it, unless I am missing something. Maybe some flag
>> is needed to prevent the timer handling code from queuing into the
>> destroyed workqueue, or the workqueue handler from updating the timer.
> That's exactly what I was thinking. I do not know all the code here. I
> could try to write a patch, but I may likely miss something.
>
> -- Steve
Take a look at
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=1946014ca3b19be9e485e780e862c375c6f98bad
I.e., use a ->live (or ->dead) field.
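
To make the ->dead idea concrete, here is a minimal single-threaded
userspace model of it. This is only a sketch: struct model_dev and the
model_*() helpers are hypothetical names, not anything in hci_qca.c, and
in a real driver the flag would have to be set and tested under a lock
that both the timer handler and the work handler take.

```c
#include <stdbool.h>

/*
 * Userspace model only. All names here are hypothetical; nothing below
 * exists in hci_qca.c. There is no real concurrency: the model shows only
 * the order of checks that breaks the timer <-> workqueue cycle.
 */
struct model_dev {
	bool dead;        /* set once teardown has started */
	bool timer_armed; /* stands in for a pending timer */
	int work_queued;  /* stands in for items on the workqueue */
};

static void model_init(struct model_dev *d)
{
	d->dead = false;
	d->timer_armed = false;
	d->work_queued = 0;
}

/* Work handler side: re-arm the timer only while the device is live. */
static bool model_rearm_timer(struct model_dev *d)
{
	if (d->dead)
		return false;
	d->timer_armed = true;
	return true;
}

/* Timer handler side: queue work only while the device is live. */
static bool model_timer_fired(struct model_dev *d)
{
	d->timer_armed = false;
	if (d->dead)
		return false;
	d->work_queued++;
	return true;
}

/*
 * Teardown: mark the device dead first, so that neither side can restart
 * the other, then cancel whatever is still pending. In a driver this is
 * where del_timer_sync() and destroy_workqueue() would be called.
 */
static void model_close(struct model_dev *d)
{
	d->dead = true;
	d->timer_armed = false;
	d->work_queued = 0;
}
```

With the flag set before anything is cancelled, the order of the
del_timer_sync() and destroy_workqueue() calls matters less, because
neither handler can re-arm or re-queue the other once teardown has begun.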