2021-10-27 21:27:11

by Ziyang Xuan (William)

[permalink] [raw]
Subject: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

The real_dev of a vlan net_device may be freed after
unregister_vlan_dev(). Access the real_dev continually by
vlan_dev_real_dev() will trigger the UAF problem for the
real_dev like following:

==================================================================
BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
Call Trace:
kasan_report.cold+0x83/0xdf
vlan_dev_real_dev+0xf9/0x120
is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
is_eth_port_of_netdev_filter+0x28/0x40
ib_enum_roce_netdev+0x1a3/0x300
ib_enum_all_roce_netdevs+0xc7/0x140
netdevice_event_work_handler+0x9d/0x210
...

Freed by task 9288:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xfc/0x130
slab_free_freelist_hook+0xdd/0x240
kfree+0xe4/0x690
kvfree+0x42/0x50
device_release+0x9f/0x240
kobject_put+0x1c8/0x530
put_device+0x1b/0x30
free_netdev+0x370/0x540
ppp_destroy_interface+0x313/0x3d0
...

Set vlan->real_dev to NULL after dev_put(real_dev) in
unregister_vlan_dev(). Check real_dev is not NULL before
access it in vlan_dev_real_dev().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Signed-off-by: Ziyang Xuan <[email protected]>
---
net/8021q/vlan.c | 1 +
net/8021q/vlan_core.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 55275ef9a31a..1106da84e725 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)

/* Get rid of the vlan's reference to real_dev */
dev_put(real_dev);
+ vlan->real_dev = NULL;
}

int vlan_check_real_dev(struct net_device *real_dev,
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 59bc13b5f14f..343f34479d8b 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
{
struct net_device *ret = vlan_dev_priv(dev)->real_dev;

- while (is_vlan_dev(ret))
+ while (ret && is_vlan_dev(ret))
ret = vlan_dev_priv(ret)->real_dev;

return ret;
--
2.25.1


2021-10-28 01:48:49

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Wed, 27 Oct 2021 20:16:06 +0800 Ziyang Xuan wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
>
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
> kasan_report.cold+0x83/0xdf
> vlan_dev_real_dev+0xf9/0x120
> is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> is_eth_port_of_netdev_filter+0x28/0x40
> ib_enum_roce_netdev+0x1a3/0x300
> ib_enum_all_roce_netdevs+0xc7/0x140
> netdevice_event_work_handler+0x9d/0x210
> ...
>
> Freed by task 9288:
> kasan_save_stack+0x1b/0x40
> kasan_set_track+0x1c/0x30
> kasan_set_free_info+0x20/0x30
> __kasan_slab_free+0xfc/0x130
> slab_free_freelist_hook+0xdd/0x240
> kfree+0xe4/0x690
> kvfree+0x42/0x50
> device_release+0x9f/0x240
> kobject_put+0x1c8/0x530
> put_device+0x1b/0x30
> free_netdev+0x370/0x540
> ppp_destroy_interface+0x313/0x3d0
> ...
>
> Set vlan->real_dev to NULL after dev_put(real_dev) in
> unregister_vlan_dev(). Check real_dev is not NULL before
> access it in vlan_dev_real_dev().
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: [email protected]
> Signed-off-by: Ziyang Xuan <[email protected]>
> ---
> net/8021q/vlan.c | 1 +
> net/8021q/vlan_core.c | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 55275ef9a31a..1106da84e725 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>
> /* Get rid of the vlan's reference to real_dev */
> dev_put(real_dev);
> + vlan->real_dev = NULL;
> }
>
> int vlan_check_real_dev(struct net_device *real_dev,
> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index 59bc13b5f14f..343f34479d8b 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
> {
> struct net_device *ret = vlan_dev_priv(dev)->real_dev;
>
> - while (is_vlan_dev(ret))
> + while (ret && is_vlan_dev(ret))
> ret = vlan_dev_priv(ret)->real_dev;
>
> return ret;

But will make all the callers of vlan_dev_real_dev() feel like they
should NULL-check the result, which is not necessary.

RDMA must be calling this helper on a vlan which was already
unregistered, can we fix RDMA instead?

2021-10-28 05:47:10

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Wed, Oct 27, 2021 at 06:46:40PM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 20:16:06 +0800 Ziyang Xuan wrote:
> > The real_dev of a vlan net_device may be freed after
> > unregister_vlan_dev(). Access the real_dev continually by
> > vlan_dev_real_dev() will trigger the UAF problem for the
> > real_dev like following:
> >
> > ==================================================================
> > BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> > Call Trace:
> > kasan_report.cold+0x83/0xdf
> > vlan_dev_real_dev+0xf9/0x120
> > is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> > is_eth_port_of_netdev_filter+0x28/0x40
> > ib_enum_roce_netdev+0x1a3/0x300
> > ib_enum_all_roce_netdevs+0xc7/0x140
> > netdevice_event_work_handler+0x9d/0x210
> > ...
> >
> > Freed by task 9288:
> > kasan_save_stack+0x1b/0x40
> > kasan_set_track+0x1c/0x30
> > kasan_set_free_info+0x20/0x30
> > __kasan_slab_free+0xfc/0x130
> > slab_free_freelist_hook+0xdd/0x240
> > kfree+0xe4/0x690
> > kvfree+0x42/0x50
> > device_release+0x9f/0x240
> > kobject_put+0x1c8/0x530
> > put_device+0x1b/0x30
> > free_netdev+0x370/0x540
> > ppp_destroy_interface+0x313/0x3d0
> > ...
> >
> > Set vlan->real_dev to NULL after dev_put(real_dev) in
> > unregister_vlan_dev(). Check real_dev is not NULL before
> > access it in vlan_dev_real_dev().
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Reported-by: [email protected]
> > Signed-off-by: Ziyang Xuan <[email protected]>
> > ---
> > net/8021q/vlan.c | 1 +
> > net/8021q/vlan_core.c | 2 +-
> > 2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..1106da84e725 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> >
> > /* Get rid of the vlan's reference to real_dev */
> > dev_put(real_dev);
> > + vlan->real_dev = NULL;
> > }
> >
> > int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> > index 59bc13b5f14f..343f34479d8b 100644
> > --- a/net/8021q/vlan_core.c
> > +++ b/net/8021q/vlan_core.c
> > @@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
> > {
> > struct net_device *ret = vlan_dev_priv(dev)->real_dev;
> >
> > - while (is_vlan_dev(ret))
> > + while (ret && is_vlan_dev(ret))
> > ret = vlan_dev_priv(ret)->real_dev;
> >
> > return ret;
>
> But will make all the callers of vlan_dev_real_dev() feel like they
> should NULL-check the result, which is not necessary.
>
> RDMA must be calling this helper on a vlan which was already
> unregistered, can we fix RDMA instead?

He tried to fix RDMA first, but we came to conclusion that we real
solution is in netdev land.

https://lore.kernel.org/linux-rdma/[email protected]/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

Thanks

2021-10-28 11:46:36

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Wed, Oct 27, 2021 at 06:46:40PM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 20:16:06 +0800 Ziyang Xuan wrote:
> > The real_dev of a vlan net_device may be freed after
> > unregister_vlan_dev(). Access the real_dev continually by
> > vlan_dev_real_dev() will trigger the UAF problem for the
> > real_dev like following:
> >
> > ==================================================================
> > BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> > Call Trace:
> > kasan_report.cold+0x83/0xdf
> > vlan_dev_real_dev+0xf9/0x120
> > is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> > is_eth_port_of_netdev_filter+0x28/0x40
> > ib_enum_roce_netdev+0x1a3/0x300
> > ib_enum_all_roce_netdevs+0xc7/0x140
> > netdevice_event_work_handler+0x9d/0x210
> > ...
> >
> > Freed by task 9288:
> > kasan_save_stack+0x1b/0x40
> > kasan_set_track+0x1c/0x30
> > kasan_set_free_info+0x20/0x30
> > __kasan_slab_free+0xfc/0x130
> > slab_free_freelist_hook+0xdd/0x240
> > kfree+0xe4/0x690
> > kvfree+0x42/0x50
> > device_release+0x9f/0x240
> > kobject_put+0x1c8/0x530
> > put_device+0x1b/0x30
> > free_netdev+0x370/0x540
> > ppp_destroy_interface+0x313/0x3d0
> > ...
> >
> > Set vlan->real_dev to NULL after dev_put(real_dev) in
> > unregister_vlan_dev(). Check real_dev is not NULL before
> > access it in vlan_dev_real_dev().
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Reported-by: [email protected]
> > Signed-off-by: Ziyang Xuan <[email protected]>
> > net/8021q/vlan.c | 1 +
> > net/8021q/vlan_core.c | 2 +-
> > 2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..1106da84e725 100644
> > +++ b/net/8021q/vlan.c
> > @@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> >
> > /* Get rid of the vlan's reference to real_dev */
> > dev_put(real_dev);
> > + vlan->real_dev = NULL;
> > }
> >
> > int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> > index 59bc13b5f14f..343f34479d8b 100644
> > +++ b/net/8021q/vlan_core.c
> > @@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
> > {
> > struct net_device *ret = vlan_dev_priv(dev)->real_dev;
> >
> > - while (is_vlan_dev(ret))
> > + while (ret && is_vlan_dev(ret))
> > ret = vlan_dev_priv(ret)->real_dev;
> >
> > return ret;
>
> But will make all the callers of vlan_dev_real_dev() feel like they
> should NULL-check the result, which is not necessary.

Isn't it better to reliably return NULL instead of a silent UAF in
this edge case?

> RDMA must be calling this helper on a vlan which was already
> unregistered, can we fix RDMA instead?

RDMA holds a get on the netdev which prevents unregistration, however
unregister_vlan_dev() does:

unregister_netdevice_queue(dev, head);
dev_put(real_dev);

Which corrupts the still registered vlan device while it is sitting in
the queue waiting to unregister. So, it is not true that a registered
vlan device always has working vlan_dev_real_dev().

Jason

2021-10-28 14:04:04

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
> > But will make all the callers of vlan_dev_real_dev() feel like they
> > should NULL-check the result, which is not necessary.
>
> Isn't it better to reliably return NULL instead of a silent UAF in
> this edge case?

I don't know what the best practice is for maintaining sanity of
unregistered objects.

If there really is a requirement for the real_dev pointer to be sane we
may want to move the put_device(real_dev) to vlan_dev_free(). There
should not be any risk of circular dependency but I'm not 100% sure.

> > RDMA must be calling this helper on a vlan which was already
> > unregistered, can we fix RDMA instead?
>
> RDMA holds a get on the netdev which prevents unregistration, however
> unregister_vlan_dev() does:
>
> unregister_netdevice_queue(dev, head);
> dev_put(real_dev);
>
> Which corrupts the still registered vlan device while it is sitting in
> the queue waiting to unregister. So, it is not true that a registered
> vlan device always has working vlan_dev_real_dev().

That's not my reading, unless we have a different definition of
"registered". The RDMA code in question runs from a workqueue, at the
time the UNREGISTER notification is generated all objects are still
alive and no UAF can happen. Past UNREGISTER extra care is needed when
accessing the object.

Note that unregister_vlan_dev() may queue the unregistration, without
running it. If it clears real_dev the UNREGISTER notification will no
longer be able to access real_dev, which used to be completely legal.

2021-10-29 07:05:55

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

> On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
>>> But will make all the callers of vlan_dev_real_dev() feel like they
>>> should NULL-check the result, which is not necessary.
>>
>> Isn't it better to reliably return NULL instead of a silent UAF in
>> this edge case?
>
> I don't know what the best practice is for maintaining sanity of
> unregistered objects.
>
> If there really is a requirement for the real_dev pointer to be sane we
> may want to move the put_device(real_dev) to vlan_dev_free(). There
> should not be any risk of circular dependency but I'm not 100% sure.
>
>>> RDMA must be calling this helper on a vlan which was already
>>> unregistered, can we fix RDMA instead?
>>
>> RDMA holds a get on the netdev which prevents unregistration, however
>> unregister_vlan_dev() does:
>>
>> unregister_netdevice_queue(dev, head);
>> dev_put(real_dev);
>>
>> Which corrupts the still registered vlan device while it is sitting in
>> the queue waiting to unregister. So, it is not true that a registered
>> vlan device always has working vlan_dev_real_dev().
>
> That's not my reading, unless we have a different definition of
> "registered". The RDMA code in question runs from a workqueue, at the
> time the UNREGISTER notification is generated all objects are still
> alive and no UAF can happen. Past UNREGISTER extra care is needed when
> accessing the object.
>
> Note that unregister_vlan_dev() may queue the unregistration, without
> running it. If it clears real_dev the UNREGISTER notification will no
> longer be able to access real_dev, which used to be completely legal.
> .
>

I am sorry. I have made a misunderstanding and given a wrong conclusion
that unregister_vlan_dev() just move the vlan_ndev to a list to unregister
later and it is possible the real_dev has been freed when we access in
netdevice_queue_work().

real_ndev UNREGISTE trigger NETDEV_UNREGISTER notification in
vlan_device_event(), unregister_vlan_dev() and unregister_netdevice_many()
are within real_ndev UNREGISTE process. real_dev and vlan_ndev are all
alive before real_ndev UNREGISTE finished.

Above is the correction for my previous misunderstanding. But the real
scenario of the problem is as following:

__rtnl_newlink
vlan_newlink
register_vlan_dev(vlan_ndev, ...)
register_netdevice(vlan_ndev)
netdevice_queue_work(..., vlan_ndev) [dev_hold(vlan_ndev)]
queue_work(gid_cache_wq, ...)
...
rtnl_configure_link(vlan_ndev, ...) [failed]
ops->dellink(vlan_ndev, &list_kill) [unregister_vlan_dev]
unregister_netdevice_many(&list_kill)
...
ppp_release
unregister_netdevice(real_dev)
ppp_destroy_interface
free_netdev(real_dev)
netdev_freemem(real_dev) [real_dev freed]
...
netdevice_event_work_handler [vlan_ndev NETDEV_REGISTER notifier work]
is_eth_port_of_netdev_filter
vlan_dev_real_dev [real_dev UAF]

So my first solution as following for the problem is correct.
https://lore.kernel.org/linux-rdma/[email protected]/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

Thank you!

2021-10-29 12:18:21

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Fri, Oct 29, 2021 at 03:04:35PM +0800, Ziyang Xuan (William) wrote:
> > On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
> >>> But will make all the callers of vlan_dev_real_dev() feel like they
> >>> should NULL-check the result, which is not necessary.
> >>
> >> Isn't it better to reliably return NULL instead of a silent UAF in
> >> this edge case?
> >
> > I don't know what the best practice is for maintaining sanity of
> > unregistered objects.
> >
> > If there really is a requirement for the real_dev pointer to be sane we
> > may want to move the put_device(real_dev) to vlan_dev_free(). There
> > should not be any risk of circular dependency but I'm not 100% sure.
> >
> >>> RDMA must be calling this helper on a vlan which was already
> >>> unregistered, can we fix RDMA instead?
> >>
> >> RDMA holds a get on the netdev which prevents unregistration, however
> >> unregister_vlan_dev() does:
> >>
> >> unregister_netdevice_queue(dev, head);
> >> dev_put(real_dev);
> >>
> >> Which corrupts the still registered vlan device while it is sitting in
> >> the queue waiting to unregister. So, it is not true that a registered
> >> vlan device always has working vlan_dev_real_dev().
> >
> > That's not my reading, unless we have a different definition of
> > "registered". The RDMA code in question runs from a workqueue, at the
> > time the UNREGISTER notification is generated all objects are still
> > alive and no UAF can happen. Past UNREGISTER extra care is needed when
> > accessing the object.
> >
> > Note that unregister_vlan_dev() may queue the unregistration, without
> > running it. If it clears real_dev the UNREGISTER notification will no
> > longer be able to access real_dev, which used to be completely legal.
> > .
> >
>
> I am sorry. I have made a misunderstanding and given a wrong conclusion
> that unregister_vlan_dev() just move the vlan_ndev to a list to unregister
> later and it is possible the real_dev has been freed when we access in
> netdevice_queue_work().
>
> real_ndev UNREGISTE trigger NETDEV_UNREGISTER notification in
> vlan_device_event(), unregister_vlan_dev() and unregister_netdevice_many()
> are within real_ndev UNREGISTE process. real_dev and vlan_ndev are all
> alive before real_ndev UNREGISTE finished.
>
> Above is the correction for my previous misunderstanding. But the real
> scenario of the problem is as following:
>
> __rtnl_newlink
> vlan_newlink
> register_vlan_dev(vlan_ndev, ...)
> register_netdevice(vlan_ndev)
> netdevice_queue_work(..., vlan_ndev) [dev_hold(vlan_ndev)]
> queue_work(gid_cache_wq, ...)

This is exactly what I'm saying, the rdma code saw a registered device
and captured a ref on it, passing it to a work queue.

> rtnl_configure_link(vlan_ndev, ...) [failed]
> ops->dellink(vlan_ndev, &list_kill) [unregister_vlan_dev]
/* Get rid of the vlan's reference to real_dev */
dev_put(real_dev);
> unregister_netdevice_many(&list_kill)

Then it released the real_dev reference, leaving a dangled pointer and
goes into unregister_netdevice_many which does:

dev->reg_state = NETREG_UNREGISTERING;
and eventually

net_set_todo(dev);

then unlocks RTNL. The get prevents it from progressing past
NETREG_UNREGISTERING

Now later we touch the vlan dev, it is reg_state UNREGISTERED and it's
memory is corrupted because it dropped the ref it was holding on the
pointer it returns, which has now since been freed.

The only reason the dangled pointer doesn't cause larger problems, is
because rtnl saves it - but continuing to reference a pointer that no
longer has a valid ref is certainly a bad practice.

> So my first solution as following for the problem is correct.
> https://lore.kernel.org/linux-rdma/[email protected]/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

No, it still isn't.

Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
in the work queue, but that feels pretty hacky to me as the main point
of the UNREGISTERING state is to keep the object alive enough that
those with outstanding gets can compelte their work and release the
get. Leaving a wrecked object in UNREGISTERING is a bad design.

Jason

2021-10-29 13:49:52

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Fri, 29 Oct 2021 09:13:24 -0300 Jason Gunthorpe wrote:
> Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
> in the work queue, but that feels pretty hacky to me as the main point
> of the UNREGISTERING state is to keep the object alive enough that
> those with outstanding gets can compelte their work and release the
> get. Leaving a wrecked object in UNREGISTERING is a bad design.

That or we should investigate if we could hold the ref for real_dev all
the way until vlan_dev_free().

2021-10-29 18:50:51

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Fri, Oct 29, 2021 at 06:46:10AM -0700, Jakub Kicinski wrote:
> On Fri, 29 Oct 2021 09:13:24 -0300 Jason Gunthorpe wrote:
> > Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
> > in the work queue, but that feels pretty hacky to me as the main point
> > of the UNREGISTERING state is to keep the object alive enough that
> > those with outstanding gets can compelte their work and release the
> > get. Leaving a wrecked object in UNREGISTERING is a bad design.
>
> That or we should investigate if we could hold the ref for real_dev all
> the way until vlan_dev_free().

The latter is certainly better if it works out, no circular deps, etc.

Thanks,
Jason

2021-10-30 04:10:45

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

> On Fri, Oct 29, 2021 at 06:46:10AM -0700, Jakub Kicinski wrote:
>> On Fri, 29 Oct 2021 09:13:24 -0300 Jason Gunthorpe wrote:
>>> Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
>>> in the work queue, but that feels pretty hacky to me as the main point
>>> of the UNREGISTERING state is to keep the object alive enough that
>>> those with outstanding gets can compelte their work and release the
>>> get. Leaving a wrecked object in UNREGISTERING is a bad design.
>>
>> That or we should investigate if we could hold the ref for real_dev all
>> the way until vlan_dev_free().
>

Synchronize test results with the following modification:

@@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
}

vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
-
- /* Get rid of the vlan's reference to real_dev */
- dev_put(real_dev);
}

@@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)

free_percpu(vlan->vlan_pcpu_stats);
vlan->vlan_pcpu_stats = NULL;
+
+ /* Get rid of the vlan's reference to real_dev */
+ dev_put(vlan->real_dev);
}

It works on the UAF problem. And I have taken kmemleak tests for vlan_dev and real_dev,
no kmemleak problem and new UAF problem.

So take this modification for this problem?

> The latter is certainly better if it works out, no circular deps, etc.
>
> Thanks,
> Jason
> .
>