When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <[email protected]>
---
net/ax25/ax25_ds_timer.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
index c4f8adbf814..5624c0d174c 100644
--- a/net/ax25/ax25_ds_timer.c
+++ b/net/ax25/ax25_ds_timer.c
@@ -43,7 +43,12 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
void ax25_ds_del_timer(ax25_dev *ax25_dev)
{
- if (ax25_dev)
+ if (!ax25_dev)
+ return;
+
+ if (!ax25_dev->device_up)
+ timer_shutdown_sync(&ax25_dev->dama.slave_timer);
+ else
del_timer(&ax25_dev->dama.slave_timer);
}
--
2.17.1
On Tue, Mar 26, 2024 at 10:25:42PM +0800, Duoming Zhou wrote:
> When the ax25 device is detaching, the ax25_dev_device_down()
> calls ax25_ds_del_timer() to cleanup the slave_timer. When
> the timer handler is running, the ax25_ds_del_timer() that
> calls del_timer() in it will return directly. As a result,
> the use-after-free bugs could happen, one of the scenarios
> is shown below:
>
> (Thread 1) | (Thread 2)
> | ax25_ds_timeout()
> ax25_dev_device_down() |
> ax25_ds_del_timer() |
> del_timer() |
> ax25_dev_put() //FREE |
> | ax25_dev-> //USE
>
> In order to mitigate bugs, when the device is detaching, use
> timer_shutdown_sync() to stop the timer.
FWIIW, in my reading of things there is another failure mode whereby
ax25_ds_timeout may rearm the timer after the call to del_timer() but
before the call to ax25_dev_put().
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Duoming Zhou <[email protected]>
> ---
> net/ax25/ax25_ds_timer.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
> index c4f8adbf814..5624c0d174c 100644
> --- a/net/ax25/ax25_ds_timer.c
> +++ b/net/ax25/ax25_ds_timer.c
> @@ -43,7 +43,12 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
>
> void ax25_ds_del_timer(ax25_dev *ax25_dev)
> {
> - if (ax25_dev)
> + if (!ax25_dev)
> + return;
> +
> + if (!ax25_dev->device_up)
> + timer_shutdown_sync(&ax25_dev->dama.slave_timer);
> + else
> del_timer(&ax25_dev->dama.slave_timer);
> }
I think that a) it is always correct to call timer_shutdown_sync,
and b) ax25_dev->device_up is always true. So a call to
timer_shutdown_sync can simply replace the call to del_timer.
Also, not strictly related, I think ax25_dev cannot be NULL,
so that check could be dropped. But perhaps that is better left alone.
Zooming out a bit, has removal of ax25 been considered.
I didn't check the logs thoroughly, but I'm not convinced it's been
maintained - other than clean-ups and by-inspection bug fixes - since git
history began.
On Wed, 27 Mar 2024 19:10:25 +0000 Simon Horman wrote:
> > When the ax25 device is detaching, the ax25_dev_device_down()
> > calls ax25_ds_del_timer() to cleanup the slave_timer. When
> > the timer handler is running, the ax25_ds_del_timer() that
> > calls del_timer() in it will return directly. As a result,
> > the use-after-free bugs could happen, one of the scenarios
> > is shown below:
> >
> > (Thread 1) | (Thread 2)
> > | ax25_ds_timeout()
> > ax25_dev_device_down() |
> > ax25_ds_del_timer() |
> > del_timer() |
> > ax25_dev_put() //FREE |
> > | ax25_dev-> //USE
> >
> > In order to mitigate bugs, when the device is detaching, use
> > timer_shutdown_sync() to stop the timer.
>
> FWIIW, in my reading of things there is another failure mode whereby
> ax25_ds_timeout may rearm the timer after the call to del_timer() but
> before the call to ax25_dev_put().
I think using timer_shutdown_sync() or del_timer_sync() to replace del_timer()
could prevent the rearm.
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Signed-off-by: Duoming Zhou <[email protected]>
> > ---
> > net/ax25/ax25_ds_timer.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
> > index c4f8adbf814..5624c0d174c 100644
> > --- a/net/ax25/ax25_ds_timer.c
> > +++ b/net/ax25/ax25_ds_timer.c
> > @@ -43,7 +43,12 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
> >
> > void ax25_ds_del_timer(ax25_dev *ax25_dev)
> > {
> > - if (ax25_dev)
> > + if (!ax25_dev)
> > + return;
> > +
> > + if (!ax25_dev->device_up)
> > + timer_shutdown_sync(&ax25_dev->dama.slave_timer);
> > + else
> > del_timer(&ax25_dev->dama.slave_timer);
> > }
>
> I think that a) it is always correct to call timer_shutdown_sync,
> and b) ax25_dev->device_up is always true. So a call to
> timer_shutdown_sync can simply replace the call to del_timer.
I think timer_shutdown*() is used for the code path to clean up the
driver or detach the device. If timer is shut down by timer_shutdown*(),
it could not be re-armed again unless we reinitialize the timer. The
slave_timer should only be shut down when the ax25 device is detaching or
the driver is removing. And it should not be shut down in other scenarios,
such as called in ax25_ds_state2_machine() or ax25_ds_state3_machine().
So I think calling timer_shutdown_sync() is not always correct.
What's more, the ax25_dev->device_up is not always true. It is set to
false in ax25_kill_by_device().
In a word, the timer_shutdown_sync() could not replace the del_timer()
completely.
> Also, not strictly related, I think ax25_dev cannot be NULL,
> so that check could be dropped. But perhaps that is better left alone.
The ax25_dev cannot not be NULL, because we only use ax25_dev_put() to
free the ax25_dev instead of setting is to NULL. So I think the check
could be dropped.
Do you think the following plan is proper?
diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
index c4f8adbf8144..f1cab4effa44 100644
--- a/net/ax25/ax25_ds_timer.c
+++ b/net/ax25/ax25_ds_timer.c
@@ -43,8 +43,7 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
void ax25_ds_del_timer(ax25_dev *ax25_dev)
{
- if (ax25_dev)
- del_timer(&ax25_dev->dama.slave_timer);
+ del_timer_sync(&ax25_dev->dama.slave_timer);
}
There is no deadlock will happen.
> Zooming out a bit, has removal of ax25 been considered.
> I didn't check the logs thoroughly, but I'm not convinced it's been
> maintained - other than clean-ups and by-inspection bug fixes - since git
> history began.
Best regards,
Duoming Zhou
On Thu, Mar 28, 2024 at 01:34:48PM +0800, [email protected] wrote:
> On Wed, 27 Mar 2024 19:10:25 +0000 Simon Horman wrote:
> > > When the ax25 device is detaching, the ax25_dev_device_down()
> > > calls ax25_ds_del_timer() to cleanup the slave_timer. When
> > > the timer handler is running, the ax25_ds_del_timer() that
> > > calls del_timer() in it will return directly. As a result,
> > > the use-after-free bugs could happen, one of the scenarios
> > > is shown below:
> > >
> > > (Thread 1) | (Thread 2)
> > > | ax25_ds_timeout()
> > > ax25_dev_device_down() |
> > > ax25_ds_del_timer() |
> > > del_timer() |
> > > ax25_dev_put() //FREE |
> > > | ax25_dev-> //USE
> > >
> > > In order to mitigate bugs, when the device is detaching, use
> > > timer_shutdown_sync() to stop the timer.
> >
> > FWIIW, in my reading of things there is another failure mode whereby
> > ax25_ds_timeout may rearm the timer after the call to del_timer() but
> > before the call to ax25_dev_put().
>
> I think using timer_shutdown_sync() or del_timer_sync() to replace del_timer()
> could prevent the rearm.
I think only timer_shutdown() and timer_shutdown_sync() will prevent a
rearm. But I also think (but am not entirely sure) this is only important
in the ax25_dev_device_down() case (there are others, as you mention
below).
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Signed-off-by: Duoming Zhou <[email protected]>
> > > ---
> > > net/ax25/ax25_ds_timer.c | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
> > > index c4f8adbf814..5624c0d174c 100644
> > > --- a/net/ax25/ax25_ds_timer.c
> > > +++ b/net/ax25/ax25_ds_timer.c
> > > @@ -43,7 +43,12 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
> > >
> > > void ax25_ds_del_timer(ax25_dev *ax25_dev)
> > > {
> > > - if (ax25_dev)
> > > + if (!ax25_dev)
> > > + return;
> > > +
> > > + if (!ax25_dev->device_up)
> > > + timer_shutdown_sync(&ax25_dev->dama.slave_timer);
> > > + else
> > > del_timer(&ax25_dev->dama.slave_timer);
> > > }
> >
> > I think that a) it is always correct to call timer_shutdown_sync,
> > and b) ax25_dev->device_up is always true. So a call to
> > timer_shutdown_sync can simply replace the call to del_timer.
>
> I think timer_shutdown*() is used for the code path to clean up the
> driver or detach the device. If timer is shut down by timer_shutdown*(),
> it could not be re-armed again unless we reinitialize the timer. The
> slave_timer should only be shut down when the ax25 device is detaching or
> the driver is removing. And it should not be shut down in other scenarios,
> such as called in ax25_ds_state2_machine() or ax25_ds_state3_machine().
> So I think calling timer_shutdown_sync() is not always correct.
>
> What's more, the ax25_dev->device_up is not always true. It is set to
> false in ax25_kill_by_device().
>
> In a word, the timer_shutdown_sync() could not replace the del_timer()
> completely.
Yes, sorry. I missed that ax25_ds_del_timer() is not
only called from ax25_dev_device_down().
> > Also, not strictly related, I think ax25_dev cannot be NULL,
> > so that check could be dropped. But perhaps that is better left alone.
>
> The ax25_dev cannot not be NULL, because we only use ax25_dev_put() to
> free the ax25_dev instead of setting is to NULL. So I think the check
> could be dropped.
>
> Do you think the following plan is proper?
>
> diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
> index c4f8adbf8144..f1cab4effa44 100644
> --- a/net/ax25/ax25_ds_timer.c
> +++ b/net/ax25/ax25_ds_timer.c
> @@ -43,8 +43,7 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
>
> void ax25_ds_del_timer(ax25_dev *ax25_dev)
> {
> - if (ax25_dev)
> - del_timer(&ax25_dev->dama.slave_timer);
> + del_timer_sync(&ax25_dev->dama.slave_timer);
> }
>
> There is no deadlock will happen.
I'm actually getting to think that your original patch was correct.
But perhaps a different approach would be to simply call
timer_shutdown_sync() in ax25_dev_device_down(). And leave
ax25_ds_del_timer() alone.
>
> > Zooming out a bit, has removal of ax25 been considered.
> > I didn't check the logs thoroughly, but I'm not convinced it's been
> > maintained - other than clean-ups and by-inspection bug fixes - since git
> > history began.
>
> Best regards,
> Duoming Zhou
On Thu, 28 Mar 2024 18:12:50 +0000 Simon Horman wrote:
> > > > When the ax25 device is detaching, the ax25_dev_device_down()
> > > > calls ax25_ds_del_timer() to cleanup the slave_timer. When
> > > > the timer handler is running, the ax25_ds_del_timer() that
> > > > calls del_timer() in it will return directly. As a result,
> > > > the use-after-free bugs could happen, one of the scenarios
> > > > is shown below:
> > > >
> > > > (Thread 1) | (Thread 2)
> > > > | ax25_ds_timeout()
> > > > ax25_dev_device_down() |
> > > > ax25_ds_del_timer() |
> > > > del_timer() |
> > > > ax25_dev_put() //FREE |
> > > > | ax25_dev-> //USE
> > > >
> > > > In order to mitigate bugs, when the device is detaching, use
> > > > timer_shutdown_sync() to stop the timer.
> > >
> > > FWIIW, in my reading of things there is another failure mode whereby
> > > ax25_ds_timeout may rearm the timer after the call to del_timer() but
> > > before the call to ax25_dev_put().
> >
> > I think using timer_shutdown_sync() or del_timer_sync() to replace del_timer()
> > could prevent the rearm.
>
> I think only timer_shutdown() and timer_shutdown_sync() will prevent a
> rearm. But I also think (but am not entirely sure) this is only important
> in the ax25_dev_device_down() case (there are others, as you mention
> below).
When timer is rearmed in it's handler, the del_timer_sync() could prevent the
rearming. But when timer is rearmed in other threads, the del_timer_sync() could
not prevent it. The following code is apart of the del_timer_sync().
do {
ret = __try_to_del_timer_sync(timer, shutdown);
if (unlikely(ret < 0)) {
del_timer_wait_running(timer);
cpu_relax();
}
} while (ret < 0);
In the ax25_dev_device_down() case, I think it is better to use
timer_shutdown_sync().
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Duoming Zhou <[email protected]>
> > > > ---
> > > > net/ax25/ax25_ds_timer.c | 7 ++++++-
> > > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
> > > > index c4f8adbf814..5624c0d174c 100644
> > > > --- a/net/ax25/ax25_ds_timer.c
> > > > +++ b/net/ax25/ax25_ds_timer.c
> > > > @@ -43,7 +43,12 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
> > > >
> > > > void ax25_ds_del_timer(ax25_dev *ax25_dev)
> > > > {
> > > > - if (ax25_dev)
> > > > + if (!ax25_dev)
> > > > + return;
> > > > +
> > > > + if (!ax25_dev->device_up)
> > > > + timer_shutdown_sync(&ax25_dev->dama.slave_timer);
> > > > + else
> > > > del_timer(&ax25_dev->dama.slave_timer);
> > > > }
> > >
> > > I think that a) it is always correct to call timer_shutdown_sync,
> > > and b) ax25_dev->device_up is always true. So a call to
> > > timer_shutdown_sync can simply replace the call to del_timer.
> >
> > I think timer_shutdown*() is used for the code path to clean up the
> > driver or detach the device. If timer is shut down by timer_shutdown*(),
> > it could not be re-armed again unless we reinitialize the timer. The
> > slave_timer should only be shut down when the ax25 device is detaching or
> > the driver is removing. And it should not be shut down in other scenarios,
> > such as called in ax25_ds_state2_machine() or ax25_ds_state3_machine().
> > So I think calling timer_shutdown_sync() is not always correct.
> >
> > What's more, the ax25_dev->device_up is not always true. It is set to
> > false in ax25_kill_by_device().
> >
> > In a word, the timer_shutdown_sync() could not replace the del_timer()
> > completely.
>
> Yes, sorry. I missed that ax25_ds_del_timer() is not
> only called from ax25_dev_device_down().
>
> > > Also, not strictly related, I think ax25_dev cannot be NULL,
> > > so that check could be dropped. But perhaps that is better left alone.
> >
> > The ax25_dev cannot not be NULL, because we only use ax25_dev_put() to
> > free the ax25_dev instead of setting is to NULL. So I think the check
> > could be dropped.
> >
> > Do you think the following plan is proper?
> >
> > diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c
> > index c4f8adbf8144..f1cab4effa44 100644
> > --- a/net/ax25/ax25_ds_timer.c
> > +++ b/net/ax25/ax25_ds_timer.c
> > @@ -43,8 +43,7 @@ void ax25_ds_setup_timer(ax25_dev *ax25_dev)
> >
> > void ax25_ds_del_timer(ax25_dev *ax25_dev)
> > {
> > - if (ax25_dev)
> > - del_timer(&ax25_dev->dama.slave_timer);
> > + del_timer_sync(&ax25_dev->dama.slave_timer);
> > }
> >
> > There is no deadlock will happen.
>
> I'm actually getting to think that your original patch was correct.
> But perhaps a different approach would be to simply call
> timer_shutdown_sync() in ax25_dev_device_down(). And leave
> ax25_ds_del_timer() alone.
I think using timer_shutdown_sync() in ax25_dev_device_down() and
leaving ax25_ds_del_timer() alone is better than the original patch.
Thank you for your suggestions!
Best regards,
Duoming Zhou