2021-11-02 01:58:23

by Ziyang Xuan (William)

[permalink] [raw]
Subject: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

The real_dev of a vlan net_device may be freed after
unregister_vlan_dev(). Access the real_dev continually by
vlan_dev_real_dev() will trigger the UAF problem for the
real_dev like following:

==================================================================
BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
Call Trace:
kasan_report.cold+0x83/0xdf
vlan_dev_real_dev+0xf9/0x120
is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
is_eth_port_of_netdev_filter+0x28/0x40
ib_enum_roce_netdev+0x1a3/0x300
ib_enum_all_roce_netdevs+0xc7/0x140
netdevice_event_work_handler+0x9d/0x210
...

Freed by task 9288:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xfc/0x130
slab_free_freelist_hook+0xdd/0x240
kfree+0xe4/0x690
kvfree+0x42/0x50
device_release+0x9f/0x240
kobject_put+0x1c8/0x530
put_device+0x1b/0x30
free_netdev+0x370/0x540
ppp_destroy_interface+0x313/0x3d0
...

Move the put_device(real_dev) to vlan_dev_free(). Ensure
real_dev not be freed before vlan_dev unregistered.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Signed-off-by: Ziyang Xuan <[email protected]>
---
net/8021q/vlan.c | 3 ---
net/8021q/vlan_dev.c | 3 +++
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 55275ef9a31a..a3a0a5e994f5 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
}

vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
-
- /* Get rid of the vlan's reference to real_dev */
- dev_put(real_dev);
}

int vlan_check_real_dev(struct net_device *real_dev,
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 0c21d1fec852..aeeb5f90417b 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)

free_percpu(vlan->vlan_pcpu_stats);
vlan->vlan_pcpu_stats = NULL;
+
+ /* Get rid of the vlan's reference to real_dev */
+ dev_put(vlan->real_dev);
}

void vlan_setup(struct net_device *dev)
--
2.25.1


2021-11-03 13:53:04

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

On Tue, Nov 02, 2021 at 10:12:18AM +0800, Ziyang Xuan wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
>
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
> kasan_report.cold+0x83/0xdf
> vlan_dev_real_dev+0xf9/0x120
> is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> is_eth_port_of_netdev_filter+0x28/0x40
> ib_enum_roce_netdev+0x1a3/0x300
> ib_enum_all_roce_netdevs+0xc7/0x140
> netdevice_event_work_handler+0x9d/0x210
> ...
>
> Freed by task 9288:
> kasan_save_stack+0x1b/0x40
> kasan_set_track+0x1c/0x30
> kasan_set_free_info+0x20/0x30
> __kasan_slab_free+0xfc/0x130
> slab_free_freelist_hook+0xdd/0x240
> kfree+0xe4/0x690
> kvfree+0x42/0x50
> device_release+0x9f/0x240
> kobject_put+0x1c8/0x530
> put_device+0x1b/0x30
> free_netdev+0x370/0x540
> ppp_destroy_interface+0x313/0x3d0
> ...
>
> Move the put_device(real_dev) to vlan_dev_free(). Ensure
> real_dev not be freed before vlan_dev unregistered.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: [email protected]
> Signed-off-by: Ziyang Xuan <[email protected]>
> ---
> net/8021q/vlan.c | 3 ---
> net/8021q/vlan_dev.c | 3 +++
> 2 files changed, 3 insertions(+), 3 deletions(-)

Suggested-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>

(though I can't tell either if there is a possiblecircular dep problem
in some oddball case)

Jason

2021-11-03 14:32:25

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <[email protected]>:

On Tue, 2 Nov 2021 10:12:18 +0800 you wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
>
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
> kasan_report.cold+0x83/0xdf
> vlan_dev_real_dev+0xf9/0x120
> is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> is_eth_port_of_netdev_filter+0x28/0x40
> ib_enum_roce_netdev+0x1a3/0x300
> ib_enum_all_roce_netdevs+0xc7/0x140
> netdevice_event_work_handler+0x9d/0x210
> ...
>
> [...]

Here is the summary with links:
- [net,v2] net: vlan: fix a UAF in vlan_dev_real_dev()
https://git.kernel.org/netdev/net/c/563bcbae3ba2

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


2021-11-03 15:50:28

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote:
> (though I can't tell either if there is a possiblecircular dep problem
> in some oddball case)

Same, luckily we're just starting a new dev cycle and syzbot can have
at it.

We should probably not let this patch get into stable right away -
assuming you agree - would you mind nacking the selection if it happens?
I'm not sure I'll get CCed since it doesn't have my tags.

2021-11-03 16:14:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

On Wed, Nov 03, 2021 at 08:47:46AM -0700, Jakub Kicinski wrote:
> On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote:
> > (though I can't tell either if there is a possiblecircular dep problem
> > in some oddball case)
>
> Same, luckily we're just starting a new dev cycle and syzbot can have
> at it.
>
> We should probably not let this patch get into stable right away -
> assuming you agree - would you mind nacking the selection if it happens?
> I'm not sure I'll get CCed since it doesn't have my tags.

I will make an effort, sure. It is hard to be confident due to all the
stable selection emails I get :|

Thanks,
Jason

2021-11-15 17:06:13

by Petr Machata

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()


Ziyang Xuan <[email protected]> writes:

> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 55275ef9a31a..a3a0a5e994f5 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> }
>
> vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
> -
> - /* Get rid of the vlan's reference to real_dev */
> - dev_put(real_dev);
> }
>
> int vlan_check_real_dev(struct net_device *real_dev,
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 0c21d1fec852..aeeb5f90417b 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>
> free_percpu(vlan->vlan_pcpu_stats);
> vlan->vlan_pcpu_stats = NULL;
> +
> + /* Get rid of the vlan's reference to real_dev */
> + dev_put(vlan->real_dev);
> }
>
> void vlan_setup(struct net_device *dev)

This is causing reference counting issues when vetoing is involved.
Consider the following snippet:

ip link add name bond1 type bond mode 802.3ad
ip link set dev swp1 master bond1
ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
# ^ vetoed, no netdevice created
ip link del dev bond1

The setup process goes like this: vlan_newlink() calls
register_vlan_dev() calls netdev_upper_dev_link() calls
__netdev_upper_dev_link(), which issues a notifier
NETDEV_PRECHANGEUPPER, which yields a non-zero error,
because a listener vetoed it.

So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
up decreasing reference count of the real_dev. Then when when the bond
netdevice is removed, we get an endless loop of:

kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0

Moving the dev_hold(real_dev) to always happen even if the
netdev_upper_dev_link() call makes the issue go away.

I'm not sure why this wasn't happening before. After the veto,
register_vlan_dev() follows with a goto out_unregister_netdev, which
calls unregister_netdevice() calls unregister_netdevice_queue(), which
issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
which calls unregister_vlan_dev(), which used to dev_put(real_dev),
which seems like it should have caused the same issue. Dunno.

2021-11-16 01:54:56

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
> Ziyang Xuan <[email protected]> writes:
>
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..a3a0a5e994f5 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> > }
> >
> > vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
> > -
> > - /* Get rid of the vlan's reference to real_dev */
> > - dev_put(real_dev);
> > }
> >
> > int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> > index 0c21d1fec852..aeeb5f90417b 100644
> > --- a/net/8021q/vlan_dev.c
> > +++ b/net/8021q/vlan_dev.c
> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
> >
> > free_percpu(vlan->vlan_pcpu_stats);
> > vlan->vlan_pcpu_stats = NULL;
> > +
> > + /* Get rid of the vlan's reference to real_dev */
> > + dev_put(vlan->real_dev);
> > }
> >
> > void vlan_setup(struct net_device *dev)
>
> This is causing reference counting issues when vetoing is involved.
> Consider the following snippet:
>
> ip link add name bond1 type bond mode 802.3ad
> ip link set dev swp1 master bond1
> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
> # ^ vetoed, no netdevice created
> ip link del dev bond1
>
> The setup process goes like this: vlan_newlink() calls
> register_vlan_dev() calls netdev_upper_dev_link() calls
> __netdev_upper_dev_link(), which issues a notifier
> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
> because a listener vetoed it.
>
> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
> up decreasing reference count of the real_dev. Then when when the bond
> netdevice is removed, we get an endless loop of:
>
> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0
>
> Moving the dev_hold(real_dev) to always happen even if the
> netdev_upper_dev_link() call makes the issue go away.
>
> I'm not sure why this wasn't happening before. After the veto,
> register_vlan_dev() follows with a goto out_unregister_netdev, which
> calls unregister_netdevice() calls unregister_netdevice_queue(), which
> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
> which seems like it should have caused the same issue. Dunno.

Does the notifier trigger unregister_vlan_dev()? I thought the notifier
triggers when lower dev is unregistered.

I think we should move the dev_hold() to ndo_init(), otherwise
it's hard to reason if destructor was invoked or not if
register_netdevice() errors out.

2021-11-16 14:21:14

by Petr Machata

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()


Jakub Kicinski <[email protected]> writes:

> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>
>> I'm not sure why this wasn't happening before. After the veto,
>> register_vlan_dev() follows with a goto out_unregister_netdev, which
>> calls unregister_netdevice() calls unregister_netdevice_queue(), which
>> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
>> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
>> which seems like it should have caused the same issue. Dunno.
>
> Does the notifier trigger unregister_vlan_dev()? I thought the notifier
> triggers when lower dev is unregistered.

Right, I misinterpreted this bit:

vlan_info = rtnl_dereference(dev->vlan_info);
if (!vlan_info)
goto out;

2021-11-17 11:51:11

by Petr Machata

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()


Jakub Kicinski <[email protected]> writes:

> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>> Ziyang Xuan <[email protected]> writes:
>>
>> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> > index 55275ef9a31a..a3a0a5e994f5 100644
>> > --- a/net/8021q/vlan.c
>> > +++ b/net/8021q/vlan.c
>> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>> > }
>> >
>> > vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>> > -
>> > - /* Get rid of the vlan's reference to real_dev */
>> > - dev_put(real_dev);
>> > }
>> >
>> > int vlan_check_real_dev(struct net_device *real_dev,
>> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> > index 0c21d1fec852..aeeb5f90417b 100644
>> > --- a/net/8021q/vlan_dev.c
>> > +++ b/net/8021q/vlan_dev.c
>> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>> >
>> > free_percpu(vlan->vlan_pcpu_stats);
>> > vlan->vlan_pcpu_stats = NULL;
>> > +
>> > + /* Get rid of the vlan's reference to real_dev */
>> > + dev_put(vlan->real_dev);
>> > }
>> >
>> > void vlan_setup(struct net_device *dev)
>>
>> This is causing reference counting issues when vetoing is involved.
>> Consider the following snippet:
>>
>> ip link add name bond1 type bond mode 802.3ad
>> ip link set dev swp1 master bond1
>> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>> # ^ vetoed, no netdevice created
>> ip link del dev bond1
>>
>> The setup process goes like this: vlan_newlink() calls
>> register_vlan_dev() calls netdev_upper_dev_link() calls
>> __netdev_upper_dev_link(), which issues a notifier
>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>> because a listener vetoed it.
>>
>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>> up decreasing reference count of the real_dev. Then when when the bond
>> netdevice is removed, we get an endless loop of:
>>
>> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0
>>
>> Moving the dev_hold(real_dev) to always happen even if the
>> netdev_upper_dev_link() call makes the issue go away.
>
> I think we should move the dev_hold() to ndo_init(), otherwise
> it's hard to reason if destructor was invoked or not if
> register_netdevice() errors out.

Ziyang Xuan, do you intend to take care of this?

2021-11-18 01:46:31

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

>
> Jakub Kicinski <[email protected]> writes:
>
>> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>>> Ziyang Xuan <[email protected]> writes:
>>>
>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>>> index 55275ef9a31a..a3a0a5e994f5 100644
>>>> --- a/net/8021q/vlan.c
>>>> +++ b/net/8021q/vlan.c
>>>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>>>> }
>>>>
>>>> vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>>>> -
>>>> - /* Get rid of the vlan's reference to real_dev */
>>>> - dev_put(real_dev);
>>>> }
>>>>
>>>> int vlan_check_real_dev(struct net_device *real_dev,
>>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>>> index 0c21d1fec852..aeeb5f90417b 100644
>>>> --- a/net/8021q/vlan_dev.c
>>>> +++ b/net/8021q/vlan_dev.c
>>>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>>>
>>>> free_percpu(vlan->vlan_pcpu_stats);
>>>> vlan->vlan_pcpu_stats = NULL;
>>>> +
>>>> + /* Get rid of the vlan's reference to real_dev */
>>>> + dev_put(vlan->real_dev);
>>>> }
>>>>
>>>> void vlan_setup(struct net_device *dev)
>>>
>>> This is causing reference counting issues when vetoing is involved.
>>> Consider the following snippet:
>>>
>>> ip link add name bond1 type bond mode 802.3ad
>>> ip link set dev swp1 master bond1
>>> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>>> # ^ vetoed, no netdevice created
>>> ip link del dev bond1
>>>
>>> The setup process goes like this: vlan_newlink() calls
>>> register_vlan_dev() calls netdev_upper_dev_link() calls
>>> __netdev_upper_dev_link(), which issues a notifier
>>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>>> because a listener vetoed it.
>>>
>>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>>> up decreasing reference count of the real_dev. Then when when the bond
>>> netdevice is removed, we get an endless loop of:
>>>
>>> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0
>>>
>>> Moving the dev_hold(real_dev) to always happen even if the
>>> netdev_upper_dev_link() call makes the issue go away.
>>
>> I think we should move the dev_hold() to ndo_init(), otherwise
>> it's hard to reason if destructor was invoked or not if
>> register_netdevice() errors out.
>
> Ziyang Xuan, do you intend to take care of this?
> .

I am reading the related processes according to the problem scenario.
And I will give a more clear sequence and root cause as soon as possible
by some necessary tests.

Thank you!

2021-11-18 14:17:41

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote:
> >> I think we should move the dev_hold() to ndo_init(), otherwise
> >> it's hard to reason if destructor was invoked or not if
> >> register_netdevice() errors out.
> >
> > Ziyang Xuan, do you intend to take care of this?
> > .
>
> I am reading the related processes according to the problem scenario.
> And I will give a more clear sequence and root cause as soon as possible
> by some necessary tests.

Okay, I still don't see the fix.

Petr would you mind submitting since you have a repro handy?

2021-11-19 03:05:01

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

>
> Ziyang Xuan <[email protected]> writes:
>
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index 55275ef9a31a..a3a0a5e994f5 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>> }
>>
>> vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>> -
>> - /* Get rid of the vlan's reference to real_dev */
>> - dev_put(real_dev);
>> }
>>
>> int vlan_check_real_dev(struct net_device *real_dev,
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index 0c21d1fec852..aeeb5f90417b 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>
>> free_percpu(vlan->vlan_pcpu_stats);
>> vlan->vlan_pcpu_stats = NULL;
>> +
>> + /* Get rid of the vlan's reference to real_dev */
>> + dev_put(vlan->real_dev);
>> }
>>
>> void vlan_setup(struct net_device *dev)
>
> This is causing reference counting issues when vetoing is involved.
> Consider the following snippet:
>
> ip link add name bond1 type bond mode 802.3ad
> ip link set dev swp1 master bond1
> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
> # ^ vetoed, no netdevice created
> ip link del dev bond1
>
> The setup process goes like this: vlan_newlink() calls
> register_vlan_dev() calls netdev_upper_dev_link() calls
> __netdev_upper_dev_link(), which issues a notifier
> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
> because a listener vetoed it.
>
> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
> up decreasing reference count of the real_dev. Then when when the bond
> netdevice is removed, we get an endless loop of:
>
> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0
>
> Moving the dev_hold(real_dev) to always happen even if the
> netdev_upper_dev_link() call makes the issue go away.
>
> I'm not sure why this wasn't happening before. After the veto,
> register_vlan_dev() follows with a goto out_unregister_netdev, which
> calls unregister_netdevice() calls unregister_netdevice_queue(), which
> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
> which seems like it should have caused the same issue. Dunno.

netdev_upper_dev_link() failure in register_vlan_dev() will result in
no dev_hold(real_dev) and vlan_group_set_device(). So NETDEV_UNREGISTER
notifier caused by vlan_dev invokes vlan_device_event() will not find the
vlan_dev in vlan_group, and no unregister_vlan_dev() for the vlan_dev.

>

2021-11-19 03:29:06

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

> On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote:
>>>> I think we should move the dev_hold() to ndo_init(), otherwise
>>>> it's hard to reason if destructor was invoked or not if
>>>> register_netdevice() errors out.
>>>
>>> Ziyang Xuan, do you intend to take care of this?
>>> .
>>
>> I am reading the related processes according to the problem scenario.
>> And I will give a more clear sequence and root cause as soon as possible
>> by some necessary tests.
>
> Okay, I still don't see the fix.
>
> Petr would you mind submitting since you have a repro handy?
> .
The sequence for the problem maybe as following:

=============================================================
# create vlan device
vlan_newlink [assume real_dev refcnt == 2 just referenced by itself]
register_vlan_dev
register_netdevice(vlan_dev) [success]
netdev_upper_dev_link [failed]
unregister_netdevice(vlan_dev) [failure process]
...
netdev_run_todo [vlan_dev]
vlan_dev_free(vlan_dev) [priv_destructor cb]
dev_put(real_dev) [real_dev refcnt == 1]

# delete real device
unregister_netdevice(real_dev) [real_dev refcnt == 1]
unregister_netdevice_many
dev_put(real_dev) [real_dev refcnt == 0]
net_set_todo(real_dev)
...
netdev_run_todo [real_dev]
netdev_wait_allrefs(real_dev) [real_dev refcnt == 0]
pr_emerg("unregister_netdevice: ...", ...)

I am thinking about how to fix the problem. priv_destructor() of the
net_device referenced not only by net_set_todo() but also failure process
in register_netdevice().

I need some time to test my some ideas. And anyone has good ideas, please
do not be stingy.

Thank you!



2021-11-19 10:07:11

by Petr Machata

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()


Ziyang Xuan (William) <[email protected]> writes:

> I need some time to test my some ideas. And anyone has good ideas, please
> do not be stingy.

Jakub Kicinski <[email protected]> writes:

> I think we should move the dev_hold() to ndo_init(), otherwise it's
> hard to reason if destructor was invoked or not if
> register_netdevice() errors out.

That makes sense to me. We always put real_dev in the destructor, so we
should always hold it in the constructor...

2021-11-23 09:01:33

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

>
> Ziyang Xuan (William) <[email protected]> writes:
>
>> I need some time to test my some ideas. And anyone has good ideas, please
>> do not be stingy.
>
> Jakub Kicinski <[email protected]> writes:
>
>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>> hard to reason if destructor was invoked or not if
>> register_netdevice() errors out.
>
> That makes sense to me. We always put real_dev in the destructor, so we
> should always hold it in the constructor...

Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
the following testcase:

ip link add dev dummy1 type dummy
ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
ip link del dev dummy1

Make the problem repro. The problem is solved using the following fix
according to the Jakub's suggestion:

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index a3a0a5e994f5..abaa5d96ded2 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
if (err)
goto out_unregister_netdev;

- /* Account for reference in struct vlan_dev_priv */
- dev_hold(real_dev);
-
vlan_stacked_transfer_operstate(real_dev, dev, vlan);
linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index ab6dee28536d..a54535cbcf4c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
if (!vlan->vlan_pcpu_stats)
return -ENOMEM;

+ /* Get vlan's reference to real_dev */
+ dev_hold(real_dev);


If there is not any other idea and objection, I will send the fix patch later.

Thank you!

2021-11-23 12:36:11

by Petr Machata

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()


Ziyang Xuan (William) <[email protected]> writes:

>>
>> Ziyang Xuan (William) <[email protected]> writes:
>>
>>> I need some time to test my some ideas. And anyone has good ideas, please
>>> do not be stingy.
>>
>> Jakub Kicinski <[email protected]> writes:
>>
>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>> hard to reason if destructor was invoked or not if
>>> register_netdevice() errors out.
>>
>> That makes sense to me. We always put real_dev in the destructor, so we
>> should always hold it in the constructor...
>
> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
> the following testcase:
>
> ip link add dev dummy1 type dummy
> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
> ip link del dev dummy1
>
> Make the problem repro. The problem is solved using the following fix
> according to the Jakub's suggestion:
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index a3a0a5e994f5..abaa5d96ded2 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
> if (err)
> goto out_unregister_netdev;
>
> - /* Account for reference in struct vlan_dev_priv */
> - dev_hold(real_dev);
> -
> vlan_stacked_transfer_operstate(real_dev, dev, vlan);
> linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index ab6dee28536d..a54535cbcf4c 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
> if (!vlan->vlan_pcpu_stats)
> return -ENOMEM;
>
> + /* Get vlan's reference to real_dev */
> + dev_hold(real_dev);
>
>
> If there is not any other idea and objection, I will send the fix patch later.
>
> Thank you!

This fixes the issue in my repro, and does not seems to break anything.
We'll take it to full regression overnight, I'll report back tomorrow
about the result.

2021-11-25 11:35:25

by Petr Machata

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()


Petr Machata <[email protected]> writes:

> Ziyang Xuan (William) <[email protected]> writes:
>
>>>
>>> Ziyang Xuan (William) <[email protected]> writes:
>>>
>>>> I need some time to test my some ideas. And anyone has good ideas, please
>>>> do not be stingy.
>>>
>>> Jakub Kicinski <[email protected]> writes:
>>>
>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>>> hard to reason if destructor was invoked or not if
>>>> register_netdevice() errors out.
>>>
>>> That makes sense to me. We always put real_dev in the destructor, so we
>>> should always hold it in the constructor...
>>
>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
>> the following testcase:
>>
>> ip link add dev dummy1 type dummy
>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
>> ip link del dev dummy1
>>
>> Make the problem repro. The problem is solved using the following fix
>> according to the Jakub's suggestion:
>>
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index a3a0a5e994f5..abaa5d96ded2 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>> if (err)
>> goto out_unregister_netdev;
>>
>> - /* Account for reference in struct vlan_dev_priv */
>> - dev_hold(real_dev);
>> -
>> vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>> linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>>
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index ab6dee28536d..a54535cbcf4c 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>> if (!vlan->vlan_pcpu_stats)
>> return -ENOMEM;
>>
>> + /* Get vlan's reference to real_dev */
>> + dev_hold(real_dev);
>>
>>
>> If there is not any other idea and objection, I will send the fix patch later.
>>
>> Thank you!
>
> This fixes the issue in my repro, and does not seems to break anything.
> We'll take it to full regression overnight, I'll report back tomorrow
> about the result.

Sorry, was AFK yesterday. It does look good.

2021-11-26 01:50:31

by Ziyang Xuan (William)

[permalink] [raw]
Subject: Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

>
> Petr Machata <[email protected]> writes:
>
>> Ziyang Xuan (William) <[email protected]> writes:
>>
>>>>
>>>> Ziyang Xuan (William) <[email protected]> writes:
>>>>
>>>>> I need some time to test my some ideas. And anyone has good ideas, please
>>>>> do not be stingy.
>>>>
>>>> Jakub Kicinski <[email protected]> writes:
>>>>
>>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>>>> hard to reason if destructor was invoked or not if
>>>>> register_netdevice() errors out.
>>>>
>>>> That makes sense to me. We always put real_dev in the destructor, so we
>>>> should always hold it in the constructor...
>>>
>>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
>>> the following testcase:
>>>
>>> ip link add dev dummy1 type dummy
>>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
>>> ip link del dev dummy1
>>>
>>> Make the problem repro. The problem is solved using the following fix
>>> according to the Jakub's suggestion:
>>>
>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>> index a3a0a5e994f5..abaa5d96ded2 100644
>>> --- a/net/8021q/vlan.c
>>> +++ b/net/8021q/vlan.c
>>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>>> if (err)
>>> goto out_unregister_netdev;
>>>
>>> - /* Account for reference in struct vlan_dev_priv */
>>> - dev_hold(real_dev);
>>> -
>>> vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>>> linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>>>
>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>> index ab6dee28536d..a54535cbcf4c 100644
>>> --- a/net/8021q/vlan_dev.c
>>> +++ b/net/8021q/vlan_dev.c
>>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>>> if (!vlan->vlan_pcpu_stats)
>>> return -ENOMEM;
>>>
>>> + /* Get vlan's reference to real_dev */
>>> + dev_hold(real_dev);
>>>
>>>
>>> If there is not any other idea and objection, I will send the fix patch later.
>>>
>>> Thank you!
>>
>> This fixes the issue in my repro, and does not seems to break anything.
>> We'll take it to full regression overnight, I'll report back tomorrow
>> about the result.
>
> Sorry, was AFK yesterday. It does look good.
> .

Thank you for your efforts very much! I have sent the fix patch.