2023-07-25 03:45:21

by Zheng Wang

[permalink] [raw]
Subject: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
If timeout occurs, it will start the work. And if we call
ravb_remove without finishing the work, there may be a
use-after-free bug on ndev.

Fix it by finishing the job before cleanup in ravb_remove.

Note that this bug is found by static analysis, it might be
false positive.

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Zheng Wang <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
---
v4:
- add information about the bug was found suggested by Yunsheng Lin
v3:
- fix typo in commit message
v2:
- stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
add an empty line to make code clear suggested by Sergey Shtylyov
---
drivers/net/ethernet/renesas/ravb_main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 4d6b3b7d6abb..ce2da5101e51 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
struct ravb_private *priv = netdev_priv(ndev);
const struct ravb_hw_info *info = priv->info;

+ netif_carrier_off(ndev);
+ netif_tx_disable(ndev);
+ cancel_work_sync(&priv->work);
/* Stop PTP Clock driver */
if (info->ccc_gac)
ravb_ptp_stop(ndev);
--
2.25.1



2023-07-26 04:07:21

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

On Tue, 25 Jul 2023 11:00:26 +0800 Zheng Wang wrote:
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 4d6b3b7d6abb..ce2da5101e51 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> struct ravb_private *priv = netdev_priv(ndev);
> const struct ravb_hw_info *info = priv->info;
>
> + netif_carrier_off(ndev);
> + netif_tx_disable(ndev);
> + cancel_work_sync(&priv->work);

Still racy, the carrier can come back up after canceling the work.
But whatever, this is a non-issue in the first place.
The fact that ravb_tx_timeout_work doesn't take any locks seems much
more suspicious.

2023-07-27 09:08:02

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

On Tue, 2023-07-25 at 20:19 -0700, Jakub Kicinski wrote:
> On Tue, 25 Jul 2023 11:00:26 +0800 Zheng Wang wrote:
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index 4d6b3b7d6abb..ce2da5101e51 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> > struct ravb_private *priv = netdev_priv(ndev);
> > const struct ravb_hw_info *info = priv->info;
> >
> > + netif_carrier_off(ndev);
> > + netif_tx_disable(ndev);
> > + cancel_work_sync(&priv->work);
>
> Still racy, the carrier can come back up after canceling the work.

I must admit I don't see how/when this driver sets the carrier on ?!?

> But whatever, this is a non-issue in the first place.

Do you mean the UaF can't happen? I think that is real.

> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> more suspicious.

Indeed! But that should be a different patch, right?

Waiting a little more for feedback from renesas.

/P


2023-07-27 19:37:05

by Sergey Shtylyov

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello!

On 7/27/23 11:21 AM, Paolo Abeni wrote:
[...]
>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 4d6b3b7d6abb..ce2da5101e51 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
>>> struct ravb_private *priv = netdev_priv(ndev);
>>> const struct ravb_hw_info *info = priv->info;
>>>
>>> + netif_carrier_off(ndev);
>>> + netif_tx_disable(ndev);
>>> + cancel_work_sync(&priv->work);
>>
>> Still racy, the carrier can come back up after canceling the work.
>
> I must admit I don't see how/when this driver sets the carrier on ?!?

The phylib code does it for this MAC driver, see the call tree of
phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/...

>> But whatever, this is a non-issue in the first place.
>
> Do you mean the UaF can't happen? I think that is real.

Looks possible to me, at least now... and anyway, shouldn't we clean up
after ourselves if we call schedule_work()?However my current impression is
that cancel_work_sync() should be called from ravb_close(), after calling
phy_{stop|disconnect}()...

>> The fact that ravb_tx_timeout_work doesn't take any locks seems much
>> more suspicious.
>
> Indeed! But that should be a different patch, right?

Yes.

> Waiting a little more for feedback from renesas.

Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
driver patches, so I took that task upon myself. I also happen to be a nominal
author of this driver... :-)

> /P

MBR, Sergey

2023-07-28 00:44:44

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

On Thu, 27 Jul 2023 21:48:41 +0300 Sergey Shtylyov wrote:
> >> Still racy, the carrier can come back up after canceling the work.
> >
> > I must admit I don't see how/when this driver sets the carrier on ?!?
>
> The phylib code does it for this MAC driver, see the call tree of
> phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/...
>
> >> But whatever, this is a non-issue in the first place.
> >
> > Do you mean the UaF can't happen? I think that is real.
>
> Looks possible to me, at least now... and anyway, shouldn't we clean up
> after ourselves if we call schedule_work()?However my current impression is
> that cancel_work_sync() should be called from ravb_close(), after calling
> phy_{stop|disconnect}()...
>
> >> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >> more suspicious.
> >
> > Indeed! But that should be a different patch, right?
>
> Yes.
>
> > Waiting a little more for feedback from renesas.
>
> Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
> driver patches, so I took that task upon myself. I also happen to be a nominal
> author of this driver... :-)

Simplest fix I can think of is to take a reference on the netdev before
scheduling the work, and then check if it's still registered in the work
itself. Wrap the timeout work in rtnl_lock() to avoid any races there.

2023-07-28 11:30:04

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

On Thu, Jul 27, 2023 at 09:48:41PM +0300, Sergey Shtylyov wrote:
> Hello!
>
> On 7/27/23 11:21 AM, Paolo Abeni wrote:
> [...]
> >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>> index 4d6b3b7d6abb..ce2da5101e51 100644
> >>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> >>> struct ravb_private *priv = netdev_priv(ndev);
> >>> const struct ravb_hw_info *info = priv->info;
> >>>
> >>> + netif_carrier_off(ndev);
> >>> + netif_tx_disable(ndev);
> >>> + cancel_work_sync(&priv->work);
> >>
> >> Still racy, the carrier can come back up after canceling the work.
> >
> > I must admit I don't see how/when this driver sets the carrier on ?!?
>
> The phylib code does it for this MAC driver, see the call tree of
> phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/...
>
> >> But whatever, this is a non-issue in the first place.
> >
> > Do you mean the UaF can't happen? I think that is real.
>
> Looks possible to me, at least now... and anyway, shouldn't we clean up
> after ourselves if we call schedule_work()?However my current impression is
> that cancel_work_sync() should be called from ravb_close(), after calling
> phy_{stop|disconnect}()...
>
> >> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >> more suspicious.
> >
> > Indeed! But that should be a different patch, right?
>
> Yes.
>
> > Waiting a little more for feedback from renesas.
>
> Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
> driver patches, so I took that task upon myself. I also happen to be a nominal
> author of this driver... :-)

FWIIW, that matches my recollection.
Although it may be out of date by now.

2023-08-15 15:30:25

by Lee Jones

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

On Tue, 25 Jul 2023, Zheng Wang wrote:

> In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> If timeout occurs, it will start the work. And if we call
> ravb_remove without finishing the work, there may be a
> use-after-free bug on ndev.
>
> Fix it by finishing the job before cleanup in ravb_remove.
>
> Note that this bug is found by static analysis, it might be
> false positive.
>
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Signed-off-by: Zheng Wang <[email protected]>
> Reviewed-by: Sergey Shtylyov <[email protected]>
> ---
> v4:
> - add information about the bug was found suggested by Yunsheng Lin
> v3:
> - fix typo in commit message
> v2:
> - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> add an empty line to make code clear suggested by Sergey Shtylyov
> ---
> drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> 1 file changed, 3 insertions(+)

Trying my best not to sound like a broken record, but ...

What's the latest with this fix? Is a v5 en route?

--
Lee Jones [李琼斯]

2023-08-29 19:18:32

by Lee Jones

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

On Tue, 15 Aug 2023, Lee Jones wrote:

> On Tue, 25 Jul 2023, Zheng Wang wrote:
>
> > In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> > If timeout occurs, it will start the work. And if we call
> > ravb_remove without finishing the work, there may be a
> > use-after-free bug on ndev.
> >
> > Fix it by finishing the job before cleanup in ravb_remove.
> >
> > Note that this bug is found by static analysis, it might be
> > false positive.
> >
> > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > Signed-off-by: Zheng Wang <[email protected]>
> > Reviewed-by: Sergey Shtylyov <[email protected]>
> > ---
> > v4:
> > - add information about the bug was found suggested by Yunsheng Lin
> > v3:
> > - fix typo in commit message
> > v2:
> > - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> > add an empty line to make code clear suggested by Sergey Shtylyov
> > ---
> > drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> > 1 file changed, 3 insertions(+)
>
> Trying my best not to sound like a broken record, but ...
>
> What's the latest with this fix? Is a v5 en route?

Any update please Zheng Wang?

--
Lee Jones [李琼斯]

2023-08-30 18:33:47

by Zheng Hacker

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Sorry for my late reply. I'll update another patch later today.

Best regards,
Zheng

Lee Jones <[email protected]> 于2023年8月29日周二 21:46写道:
>
> On Tue, 15 Aug 2023, Lee Jones wrote:
>
> > On Tue, 25 Jul 2023, Zheng Wang wrote:
> >
> > > In ravb_probe, priv->work was bound with ravb_tx_timeout_work.
> > > If timeout occurs, it will start the work. And if we call
> > > ravb_remove without finishing the work, there may be a
> > > use-after-free bug on ndev.
> > >
> > > Fix it by finishing the job before cleanup in ravb_remove.
> > >
> > > Note that this bug is found by static analysis, it might be
> > > false positive.
> > >
> > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > > Signed-off-by: Zheng Wang <[email protected]>
> > > Reviewed-by: Sergey Shtylyov <[email protected]>
> > > ---
> > > v4:
> > > - add information about the bug was found suggested by Yunsheng Lin
> > > v3:
> > > - fix typo in commit message
> > > v2:
> > > - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin,
> > > add an empty line to make code clear suggested by Sergey Shtylyov
> > > ---
> > > drivers/net/ethernet/renesas/ravb_main.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> >
> > Trying my best not to sound like a broken record, but ...
> >
> > What's the latest with this fix? Is a v5 en route?
>
> Any update please Zheng Wang?
>
> --
> Lee Jones [李琼斯]

2023-09-20 07:18:28

by Yoshihiro Shimoda

[permalink] [raw]
Subject: RE: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello Sergey!

> From: Sergey Shtylyov, Sent: Friday, July 28, 2023 3:49 AM
>
> Hello!
>
> On 7/27/23 11:21 AM, Paolo Abeni wrote:
> [...]
> >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>> index 4d6b3b7d6abb..ce2da5101e51 100644
> >>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> >>> struct ravb_private *priv = netdev_priv(ndev);
> >>> const struct ravb_hw_info *info = priv->info;
> >>>
> >>> + netif_carrier_off(ndev);
> >>> + netif_tx_disable(ndev);
> >>> + cancel_work_sync(&priv->work);
> >>
> >> Still racy, the carrier can come back up after canceling the work.
> >
> > I must admit I don't see how/when this driver sets the carrier on ?!?
>
> The phylib code does it for this MAC driver, see the call tree of
> phy_link_change(), on e.g.
> https://elixir.bootlin.com/linux/v6.5-rc3/source
>
> >> But whatever, this is a non-issue in the first place.
> >
> > Do you mean the UaF can't happen? I think that is real.
>
> Looks possible to me, at least now... and anyway, shouldn't we clean up
> after ourselves if we call schedule_work()?However my current impression is
> that cancel_work_sync() should be called from ravb_close(), after calling
> phy_{stop|disconnect}()...

I also think so.

ravb_remove() calls unregister_netdev().
-> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
--> unregiter_netdevice_queue()
---> unregiter_netdevice_many()
----> unregiter_netdevice_many_notify().
-----> dev_close_many()
------> __dev_close_many()
-------> ops->ndo_stop()

ravb_close() calls phy_stop()
-> phy_state_machine() with PHY_HALTED
--> phy_link_down()
---> phy_link_change()
----> netif_carrier_off()

The patch will be the following:
---
drivers/net/ethernet/renesas/ravb_main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 7df9f9f8e134..e452d90de7c2 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev)
of_phy_deregister_fixed_link(np);
}

+ cancel_work_sync(&priv->work);
+
if (info->multi_irqs) {
free_irq(priv->tx_irqs[RAVB_NC], ndev);
free_irq(priv->rx_irqs[RAVB_NC], ndev);
---

If this patch is acceptable, I'll submit it. But, what do you think?

Best regards,
Yoshihiro Shimoda

> >> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >> more suspicious.
> >
> > Indeed! But that should be a different patch, right?
>
> Yes.
>
> > Waiting a little more for feedback from renesas.
>
> Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb
> driver patches, so I took that task upon myself. I also happen to be a nominal
> author of this driver... :-)
>
> > /P
>
> MBR, Sergey

2023-09-29 21:46:34

by Sergey Shtylyov

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello!

On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote:

Sorry, I got ill that same day and still have subfebrile temperature,
and I forgot about your mail. I'll try replying to it on this weekend...

[...]

MBR, Sergey

2023-10-02 08:12:42

by Yoshihiro Shimoda

[permalink] [raw]
Subject: RE: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello Sergey,

> From: Sergey Shtylyov, Sent: Saturday, September 30, 2023 5:23 AM
>
> Hello!
>
> On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote:
>
> Sorry, I got ill that same day and still have subfebrile temperature,
> and I forgot about your mail. I'll try replying to it on this weekend...

Thank you for your reply! I understood it. Please take care of yourself.
I hope you will get well soon.

Best regards,
Yoshihiro Shimoda

> [...]
>
> MBR, Sergey

2023-10-03 19:39:50

by Sergey Shtylyov

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello!

Concerning the subject: I doubt that UAF acronym is known to
everybody (e.g. it wasn't known to me), I think we should be able
to afford spelling out use-after-free there...

On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote:
[...]

>>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> index 4d6b3b7d6abb..ce2da5101e51 100644
>>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
>>>>> struct ravb_private *priv = netdev_priv(ndev);
>>>>> const struct ravb_hw_info *info = priv->info;
>>>>>
>>>>> + netif_carrier_off(ndev);
>>>>> + netif_tx_disable(ndev);
>>>>> + cancel_work_sync(&priv->work);
>>>>
>>>> Still racy, the carrier can come back up after canceling the work.
>>>
>>> I must admit I don't see how/when this driver sets the carrier on ?!?
>>
>> The phylib code does it for this MAC driver, see the call tree of
>> phy_link_change(), on e.g.
>> https://elixir.bootlin.com/linux/v6.5-rc3/source
>>
>>>> But whatever, this is a non-issue in the first place.
>>>
>>> Do you mean the UaF can't happen? I think that is real.
>>
>> Looks possible to me, at least now... and anyway, shouldn't we clean up
>> after ourselves if we call schedule_work()?However my current impression is
>> that cancel_work_sync() should be called from ravb_close(), after calling
>> phy_{stop|disconnect}()...
>
> I also think so.
>
> ravb_remove() calls unregister_netdev().
> -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> --> unregiter_netdevice_queue()
> ---> unregiter_netdevice_many()
> ----> unregiter_netdevice_many_notify().
> -----> dev_close_many()
> ------> __dev_close_many()
> -------> ops->ndo_stop()
>
> ravb_close() calls phy_stop()
> -> phy_state_machine() with PHY_HALTED
> --> phy_link_down()
> ---> phy_link_change()
> ----> netif_carrier_off()

Thanks for sharing the call chain, I've followed it once again... :-)

> The patch will be the following:
> ---
> drivers/net/ethernet/renesas/ravb_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 7df9f9f8e134..e452d90de7c2 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev)
> of_phy_deregister_fixed_link(np);
> }
>
> + cancel_work_sync(&priv->work);
> +
> if (info->multi_irqs) {
> free_irq(priv->tx_irqs[RAVB_NC], ndev);
> free_irq(priv->rx_irqs[RAVB_NC], ndev);
> ---
>
> If this patch is acceptable, I'll submit it. But, what do you think?

I think it should do the job. And I suspect you can even test it... :-)

> Best regards,
> Yoshihiro Shimoda

[...]

MBR, Sergey

2023-10-04 05:06:16

by Yoshihiro Shimoda

[permalink] [raw]
Subject: RE: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello Sergey,

> From: Sergey Shtylyov, Sent: Wednesday, October 4, 2023 4:39 AM
>
> Hello!
>
> Concerning the subject: I doubt that UAF acronym is known to
> everybody (e.g. it wasn't known to me), I think we should be able
> to afford spelling out use-after-free there...

I got it. I'll change the subject.

> On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote:
> [...]
>
> >>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> index 4d6b3b7d6abb..ce2da5101e51 100644
> >>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev)
> >>>>> struct ravb_private *priv = netdev_priv(ndev);
> >>>>> const struct ravb_hw_info *info = priv->info;
> >>>>>
> >>>>> + netif_carrier_off(ndev);
> >>>>> + netif_tx_disable(ndev);
> >>>>> + cancel_work_sync(&priv->work);
> >>>>
> >>>> Still racy, the carrier can come back up after canceling the work.
> >>>
> >>> I must admit I don't see how/when this driver sets the carrier on ?!?
> >>
> >> The phylib code does it for this MAC driver, see the call tree of
> >> phy_link_change(), on e.g.
> >>
<snip URL>
> >>
> >>>> But whatever, this is a non-issue in the first place.
> >>>
> >>> Do you mean the UaF can't happen? I think that is real.
> >>
> >> Looks possible to me, at least now... and anyway, shouldn't we clean up
> >> after ourselves if we call schedule_work()?However my current impression is
> >> that cancel_work_sync() should be called from ravb_close(), after calling
> >> phy_{stop|disconnect}()...
> >
> > I also think so.
> >
> > ravb_remove() calls unregister_netdev().
> > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> > --> unregiter_netdevice_queue()
> > ---> unregiter_netdevice_many()
> > ----> unregiter_netdevice_many_notify().
> > -----> dev_close_many()
> > ------> __dev_close_many()
> > -------> ops->ndo_stop()
> >
> > ravb_close() calls phy_stop()
> > -> phy_state_machine() with PHY_HALTED
> > --> phy_link_down()
> > ---> phy_link_change()
> > ----> netif_carrier_off()
>
> Thanks for sharing the call chain, I've followed it once again... :-)

Thank you :)

> > The patch will be the following:
> > ---
> > drivers/net/ethernet/renesas/ravb_main.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index 7df9f9f8e134..e452d90de7c2 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev)
> > of_phy_deregister_fixed_link(np);
> > }
> >
> > + cancel_work_sync(&priv->work);
> > +
> > if (info->multi_irqs) {
> > free_irq(priv->tx_irqs[RAVB_NC], ndev);
> > free_irq(priv->rx_irqs[RAVB_NC], ndev);
> > ---
> >
> > If this patch is acceptable, I'll submit it. But, what do you think?
>
> I think it should do the job.

Thank you for your comment! I'll make such a patch.

> And I suspect you can even test it... :-)

IIUC, causing tx timeout is difficult. But, I guess
we can add a fault injection code somehow.

Best regards,
Yoshihiro Shimoda

> > Best regards,
> > Yoshihiro Shimoda
>
> [...]
>
> MBR, Sergey

2023-10-10 12:59:39

by Dirk Behme

[permalink] [raw]
Subject: Re: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove


On 26.07.2023 05:19, Jakub Kicinski wrote:
...
> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> more suspicious.
Does anybody plan to look into this, too?

Best regards

Dirk

2023-10-12 08:40:03

by Yoshihiro Shimoda

[permalink] [raw]
Subject: RE: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hello Behme,

> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM
>
> On 26.07.2023 05:19, Jakub Kicinski wrote:
> ...
> > The fact that ravb_tx_timeout_work doesn't take any locks seems much
> > more suspicious.
> Does anybody plan to look into this, too?

I believe my fixed patch [1] resolved this issue too. Let me explain it in detail below.

In the thread, Jakub also mentioned [2] like below:
---
Simplest fix I can think of is to take a reference on the netdev before
scheduling the work, and then check if it's still registered in the work
itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
---

Sergey suggested to add cancel_work_sync() into the ravb_close () [3].
And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4]
like below:
-----------------------------------------------------------------------
ravb_remove() calls unregister_netdev().
-> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
--> unregiter_netdevice_queue()
---> unregiter_netdevice_many()
----> unregiter_netdevice_many_notify().
-----> dev_close_many()
------> __dev_close_many()
-------> ops->ndo_stop()

ravb_close() calls phy_stop()
-> phy_state_machine() with PHY_HALTED
--> phy_link_down()
---> phy_link_change()
----> netif_carrier_off()
-----------------------------------------------------------------------

So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(),
it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89
[2] https://lore.kernel.org/netdev/[email protected]/
[3] https://lore.kernel.org/netdev/[email protected]/
[4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@OSYPR01MB5334.jpnprd01.prod.outlook.com/

Best regards,
Yoshihiro Shimoda

> Best regards
>
> Dirk

2023-10-13 06:05:16

by Dirk Behme

[permalink] [raw]
Subject: RE: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hi,

On 12.10.2023 10:39, Yoshihiro Shimoda wrote:
> Hello Behme,
>
>> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM
>>
>> On 26.07.2023 05:19, Jakub Kicinski wrote:
>> ...
>>> The fact that ravb_tx_timeout_work doesn't take any locks seems much
>>> more suspicious.
>> Does anybody plan to look into this, too?
>
> I believe my fixed patch [1] resolved this issue too.


I'm not an expert of this driver nor the network stack, so sorry if I'm
totally wrong here ;) But somehow this answer confuses me. Let me explain:

What you did with [1] is to stop/cancel the workqueue in ravb_close().
That's fine. But that is at driver's close time.

What's about driver's normal runtime? What I understood is that
ravb_tx_timeout_work() might run totally asynchronously to the rest of
the driver. And therefore needs locking/protection/synchronization if it
uses resources of the driver which are used elsewhere in the driver, too.

I think this is exactly what is described with:

> ---
> Simplest fix I can think of is to take a reference on the netdev before
> scheduling the work, and then check if it's still registered in the work
> itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
> ---

So, where is above done? Not at close time, but at normal run time of
the driver?

Best regards

Dirk

> Sergey suggested to add cancel_work_sync() into the ravb_close () [3].
> And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4]
> like below:
> -----------------------------------------------------------------------
> ravb_remove() calls unregister_netdev().
> -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> --> unregiter_netdevice_queue()
> ---> unregiter_netdevice_many()
> ----> unregiter_netdevice_many_notify().
> -----> dev_close_many()
> ------> __dev_close_many()
> -------> ops->ndo_stop()
>
> ravb_close() calls phy_stop()
> -> phy_state_machine() with PHY_HALTED
> --> phy_link_down()
> ---> phy_link_change()
> ----> netif_carrier_off()
> -----------------------------------------------------------------------
>
> So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(),
> it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work().
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89
> [2] https://lore.kernel.org/netdev/[email protected]/
> [3] https://lore.kernel.org/netdev/[email protected]/
> [4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@OSYPR01MB5334.jpnprd01.prod.outlook.com/
>
> Best regards,
> Yoshihiro Shimoda
>
>> Best regards
>>
>> Dirk

--
======================================================================
Dirk Behme Robert Bosch Car Multimedia GmbH
CM/ESO2
Phone: +49 5121 49-3274 Dirk Behme
Fax: +49 711 811 5053274 PO Box 77 77 77
mailto:[email protected] D-31132 Hildesheim - Germany

Bosch Group, Car Multimedia (CM)
Engineering SW Operating Systems 2 (ESO2)

Robert Bosch Car Multimedia GmbH - Ein Unternehmen der Bosch Gruppe
Sitz: Hildesheim
Registergericht: Amtsgericht Hildesheim HRB 201334
Aufsichtsratsvorsitzender: Dr. Dirk Hoheisel
Geschäftsführung: Dr. Steffen Berns;
Dr. Sven Ost, Jörg Pollak, Dr. Walter Schirm
======================================================================

2023-10-13 08:32:32

by Yoshihiro Shimoda

[permalink] [raw]
Subject: RE: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove

Hi,

> From: Behme Dirk (CM/ESO2), Sent: Friday, October 13, 2023 3:05 PM
>
> Hi,
>
> On 12.10.2023 10:39, Yoshihiro Shimoda wrote:
> > Hello Behme,
> >
> >> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM
> >>
> >> On 26.07.2023 05:19, Jakub Kicinski wrote:
> >> ...
> >>> The fact that ravb_tx_timeout_work doesn't take any locks seems much
> >>> more suspicious.
> >> Does anybody plan to look into this, too?
> >
> > I believe my fixed patch [1] resolved this issue too.
>
>
> I'm not an expert of this driver nor the network stack, so sorry if I'm
> totally wrong here ;) But somehow this answer confuses me. Let me explain:
>
> What you did with [1] is to stop/cancel the workqueue in ravb_close().
> That's fine. But that is at driver's close time.
>
> What's about driver's normal runtime? What I understood is that
> ravb_tx_timeout_work() might run totally asynchronously to the rest of
> the driver. And therefore needs locking/protection/synchronization if it
> uses resources of the driver which are used elsewhere in the driver, too.
>
> I think this is exactly what is described with:
>
> > ---
> > Simplest fix I can think of is to take a reference on the netdev before
> > scheduling the work, and then check if it's still registered in the work
> > itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
> > ---
>
> So, where is above done? Not at close time, but at normal run time of
> the driver?

Thank you very much for your detailed explanation. I understood it.
ravb_tx_timeout_work() still has races between ethtool ops for instance.
So, I'll make a patch for it by early next week. However, IIUC, using
rtnl_lock() in ravb_tx_timeout_work() is possible to cause deadlock from
cancel_work_sync() in ravb_close(). So, I'll use rtnl_trylock() instead.

Best regards,
Yoshihiro Shimoda

> Best regards
>
> Dirk
>
> > Sergey suggested to add cancel_work_sync() into the ravb_close () [3].
> > And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4]
> > like below:
> > -----------------------------------------------------------------------
> > ravb_remove() calls unregister_netdev().
> > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
> > --> unregiter_netdevice_queue()
> > ---> unregiter_netdevice_many()
> > ----> unregiter_netdevice_many_notify().
> > -----> dev_close_many()
> > ------> __dev_close_many()
> > -------> ops->ndo_stop()
> >
> > ravb_close() calls phy_stop()
> > -> phy_state_machine() with PHY_HALTED
> > --> phy_link_down()
> > ---> phy_link_change()
> > ----> netif_carrier_off()
> > -----------------------------------------------------------------------
> >
> > So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(),
> > it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work().
> >
> > [1]
> https://git.kernel.org/pub/scm/linux/kernel/git%25
> 2Fnetdev%2Fnet.git%2Fcommit%2F%3Fid%3D3971442870713de527684398416970cf025b4f89&data=05%7C01%7Cyoshihiro.shimoda.uh%4
> 0renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUn
> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HkA8f5a
> gawjXMvAGkaE6tELaSpjpbIn7M3mU5xbDTD0%3D&reserved=0
> > [2]
> https://lore.kernel.org/netdev/20230727164820.48c9e685
> %40kernel.org%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d82571da19
> 47e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTi
> I6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=cGvnA8WqxM%2FUDa%2FNS2OBztr1IWgjCX4IzBYXe1LGkZU%3D&reserved=0
> > [3]
> https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2
> -0cf769b48187%40omp.ru%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d
> 82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luM
> zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=OWwBKy%2Fdckgo3clPPfn2hxE4H6ToyqdcbhPhGoqoo30%3D&reserved=0
> > [4]
> https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3
> BA41A6752CD8F9A%40OSYPR01MB5334.jpnprd01.prod.outlook.com%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466
> e046b20b548b264f808dbcbb255f6%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3
> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Jfypf10jiUfTqWUAukjnPzIQp
> urx7m0ETF5N2Toq8wE%3D&reserved=0
> >
> > Best regards,
> > Yoshihiro Shimoda
> >
> >> Best regards
> >>
> >> Dirk
>
> --
> ======================================================================
> Dirk Behme Robert Bosch Car Multimedia GmbH
> CM/ESO2
> Phone: +49 5121 49-3274 Dirk Behme
> Fax: +49 711 811 5053274 PO Box 77 77 77
> mailto:[email protected] D-31132 Hildesheim - Germany
>
> Bosch Group, Car Multimedia (CM)
> Engineering SW Operating Systems 2 (ESO2)
>
> Robert Bosch Car Multimedia GmbH - Ein Unternehmen der Bosch Gruppe
> Sitz: Hildesheim
> Registergericht: Amtsgericht Hildesheim HRB 201334
> Aufsichtsratsvorsitzender: Dr. Dirk Hoheisel
> Gesch?ftsf?hrung: Dr. Steffen Berns;
> Dr. Sven Ost, J?rg Pollak, Dr. Walter Schirm
> ======================================================================