2019-04-28 03:08:31

by Yue Haibing

[permalink] [raw]
Subject: [PATCH] tun: Fix use-after-free in tun_net_xmit

From: YueHaibing <[email protected]>

KASAN report this:

BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0

CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xca/0x13e lib/dump_stack.c:113
print_address_description+0x79/0x330 mm/kasan/report.c:253
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
__netdev_start_xmit include/linux/netdevice.h:4300 [inline]
netdev_start_xmit include/linux/netdevice.h:4309 [inline]
xmit_one net/core/dev.c:3243 [inline]
dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
qdisc_restart net/sched/sch_generic.c:390 [inline]
__qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
qdisc_run include/net/pkt_sched.h:120 [inline]
__dev_xmit_skb net/core/dev.c:3438 [inline]
__dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
neigh_output include/net/neighbour.h:501 [inline]
ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
NF_HOOK_COND include/linux/netfilter.h:278 [inline]
ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
dst_output include/net/dst.h:444 [inline]
NF_HOOK include/linux/netfilter.h:289 [inline]
mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
mld_send_cr net/ipv6/mcast.c:1979 [inline]
mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers kernel/time/timer.c:1682 [inline]
run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
__do_softirq+0x26d/0xabd kernel/softirq.c:292
invoke_softirq kernel/softirq.c:372 [inline]
irq_exit+0x209/0x290 kernel/softirq.c:412
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
</IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
cpuidle_idle_call kernel/sched/idle.c:153 [inline]
do_idle+0x2ca/0x420 kernel/sched/idle.c:262
cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

Allocated by task 19764:
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
__kmalloc+0x11b/0x2d0 mm/slub.c:3750
kmalloc include/linux/slab.h:518 [inline]
sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
sk_alloc+0x3d/0xc00 net/core/sock.c:1523
tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
misc_open+0x367/0x4e0 drivers/char/misc.c:141
chrdev_open+0x212/0x580 fs/char_dev.c:417
do_dentry_open+0x704/0x1050 fs/open.c:777
do_last fs/namei.c:3418 [inline]
path_openat+0x7ed/0x2ae0 fs/namei.c:3533
do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
do_sys_open+0x307/0x430 fs/open.c:1069
do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19764:
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
slab_free_hook mm/slub.c:1370 [inline]
slab_free_freelist_hook mm/slub.c:1397 [inline]
slab_free mm/slub.c:2952 [inline]
kfree+0xeb/0x2f0 mm/slub.c:3905
sk_prot_free net/core/sock.c:1506 [inline]
__sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
sk_destruct+0x48/0x70 net/core/sock.c:1596
__sk_free+0xa9/0x270 net/core/sock.c:1607
sk_free+0x2a/0x30 net/core/sock.c:1618
sock_put include/net/sock.h:1696 [inline]
__tun_detach+0x464/0xf70 drivers/net/tun.c:735
tun_detach drivers/net/tun.c:747 [inline]
tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
__fput+0x27f/0x7f0 fs/file_table.c:278
task_work_run+0x136/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:193 [inline]
exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88836cc26600
which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 1136 bytes inside of
4096-byte region [ffff88836cc26600, ffff88836cc27600)
The buggy address belongs to the page:
page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
flags: 0x2fffff80008100(slab|head)
raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

If tun driver have multiqueues, user close the last queue by
tun_detach, then tun->tfiles[index] is not cleared. Then a new
queue may add to the tun, which using rcu_assign_pointer
tun->tfiles[index] to the new tfile and increase the numqueues.
However if there send a packet during this time, which picking the last
queue, it may uses the old tun->tfiles[index], beacause there no
RCU grace period.

1) tun_chr_close //close the last queue
--> __tun_detach //close the last queue, but tun->tfiles[index] still exist

2) tun_chr_open //attach a new queue
--> tun_attach
-->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
//there need a RCU grace period

-->tun->numqueues++;

3) tun_net_xmit //a new packet is sending, which pick the last queue
-->if (txq >= tun->numqueues)
//above check passed, but tfile still not renew
-->if (tfile->socket.sk->sk_filter ...
//use the old tfile,trigger use-after-free

Reported-by: Hulk Robot <[email protected]>
Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
Signed-off-by: YueHaibing <[email protected]>
---
drivers/net/tun.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c0..3770aba 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
*/
rcu_assign_pointer(tfile->tun, tun);
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
+ synchronize_net();
tun->numqueues++;
tun_set_real_num_queues(tun);
out:
--
2.7.0



2019-04-28 03:26:22

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit


On 2019/4/28 上午11:05, Yue Haibing wrote:
> From: YueHaibing <[email protected]>
>
> KASAN report this:
>
> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xca/0x13e lib/dump_stack.c:113
> print_address_description+0x79/0x330 mm/kasan/report.c:253
> kasan_report_error mm/kasan/report.c:351 [inline]
> kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
> tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
> netdev_start_xmit include/linux/netdevice.h:4309 [inline]
> xmit_one net/core/dev.c:3243 [inline]
> dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
> sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
> qdisc_restart net/sched/sch_generic.c:390 [inline]
> __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
> qdisc_run include/net/pkt_sched.h:120 [inline]
> __dev_xmit_skb net/core/dev.c:3438 [inline]
> __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
> neigh_output include/net/neighbour.h:501 [inline]
> ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
> ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
> NF_HOOK_COND include/linux/netfilter.h:278 [inline]
> ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
> dst_output include/net/dst.h:444 [inline]
> NF_HOOK include/linux/netfilter.h:289 [inline]
> mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
> mld_send_cr net/ipv6/mcast.c:1979 [inline]
> mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
> call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
> expire_timers kernel/time/timer.c:1363 [inline]
> __run_timers kernel/time/timer.c:1682 [inline]
> run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
> __do_softirq+0x26d/0xabd kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:372 [inline]
> irq_exit+0x209/0x290 kernel/softirq.c:412
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
> </IRQ>
> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
> arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
> default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
> cpuidle_idle_call kernel/sched/idle.c:153 [inline]
> do_idle+0x2ca/0x420 kernel/sched/idle.c:262
> cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
> start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>
> Allocated by task 19764:
> set_track mm/kasan/kasan.c:460 [inline]
> kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
> __kmalloc+0x11b/0x2d0 mm/slub.c:3750
> kmalloc include/linux/slab.h:518 [inline]
> sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
> sk_alloc+0x3d/0xc00 net/core/sock.c:1523
> tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
> misc_open+0x367/0x4e0 drivers/char/misc.c:141
> chrdev_open+0x212/0x580 fs/char_dev.c:417
> do_dentry_open+0x704/0x1050 fs/open.c:777
> do_last fs/namei.c:3418 [inline]
> path_openat+0x7ed/0x2ae0 fs/namei.c:3533
> do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
> do_sys_open+0x307/0x430 fs/open.c:1069
> do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 19764:
> set_track mm/kasan/kasan.c:460 [inline]
> __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
> slab_free_hook mm/slub.c:1370 [inline]
> slab_free_freelist_hook mm/slub.c:1397 [inline]
> slab_free mm/slub.c:2952 [inline]
> kfree+0xeb/0x2f0 mm/slub.c:3905
> sk_prot_free net/core/sock.c:1506 [inline]
> __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
> sk_destruct+0x48/0x70 net/core/sock.c:1596
> __sk_free+0xa9/0x270 net/core/sock.c:1607
> sk_free+0x2a/0x30 net/core/sock.c:1618
> sock_put include/net/sock.h:1696 [inline]
> __tun_detach+0x464/0xf70 drivers/net/tun.c:735
> tun_detach drivers/net/tun.c:747 [inline]
> tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
> __fput+0x27f/0x7f0 fs/file_table.c:278
> task_work_run+0x136/0x1b0 kernel/task_work.c:113
> tracehook_notify_resume include/linux/tracehook.h:193 [inline]
> exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
> do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff88836cc26600
> which belongs to the cache kmalloc-4096 of size 4096
> The buggy address is located 1136 bytes inside of
> 4096-byte region [ffff88836cc26600, ffff88836cc27600)
> The buggy address belongs to the page:
> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
> flags: 0x2fffff80008100(slab|head)
> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> If tun driver have multiqueues, user close the last queue by
> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> queue may add to the tun, which using rcu_assign_pointer
> tun->tfiles[index] to the new tfile and increase the numqueues.
> However if there send a packet during this time, which picking the last
> queue, it may uses the old tun->tfiles[index], beacause there no
> RCU grace period.
>
> 1) tun_chr_close //close the last queue
> --> __tun_detach //close the last queue, but tun->tfiles[index] still exist
>
> 2) tun_chr_open //attach a new queue
> --> tun_attach
> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> //there need a RCU grace period
>
> -->tun->numqueues++;
>
> 3) tun_net_xmit //a new packet is sending, which pick the last queue
> -->if (txq >= tun->numqueues)
> //above check passed, but tfile still not renew
> -->if (tfile->socket.sk->sk_filter ...
> //use the old tfile,trigger use-after-free
>
> Reported-by: Hulk Robot <[email protected]>
> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
> Signed-off-by: YueHaibing <[email protected]>
> ---
> drivers/net/tun.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c0..3770aba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
> */
> rcu_assign_pointer(tfile->tun, tun);
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> + synchronize_net();
> tun->numqueues++;
> tun_set_real_num_queues(tun);
> out:


Good catch, that's one of my suspicion as well. Assume that this has
been tested.

Acked-by: Jason Wang <[email protected]>

This patch is needed for -stable.

I will post another theoretical issue similar to this soon.

Thanks

2019-04-28 04:28:24

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit


On 2019/4/28 上午11:24, Jason Wang wrote:
>
> On 2019/4/28 上午11:05, Yue Haibing wrote:
>> From: YueHaibing <[email protected]>
>>
>> KASAN report this:
>>
>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750
>> drivers/net/tun.c:1104
>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>
>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.10.2-1ubuntu1 04/01/2014
>> Call Trace:
>>   <IRQ>
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0xca/0x13e lib/dump_stack.c:113
>>   print_address_description+0x79/0x330 mm/kasan/report.c:253
>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>>   xmit_one net/core/dev.c:3243 [inline]
>>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>>   qdisc_restart net/sched/sch_generic.c:390 [inline]
>>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>>   qdisc_run include/net/pkt_sched.h:120 [inline]
>>   __dev_xmit_skb net/core/dev.c:3438 [inline]
>>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>>   neigh_output include/net/neighbour.h:501 [inline]
>>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>>   dst_output include/net/dst.h:444 [inline]
>>   NF_HOOK include/linux/netfilter.h:289 [inline]
>>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
>>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>>   expire_timers kernel/time/timer.c:1363 [inline]
>>   __run_timers kernel/time/timer.c:1682 [inline]
>>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
>>   invoke_softirq kernel/softirq.c:372 [inline]
>>   irq_exit+0x209/0x290 kernel/softirq.c:412
>>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>>   </IRQ>
>> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
>> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9
>> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3>
>> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
>> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
>> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
>> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>>
>> Allocated by task 19764:
>>   set_track mm/kasan/kasan.c:460 [inline]
>>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>>   kmalloc include/linux/slab.h:518 [inline]
>>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
>>   chrdev_open+0x212/0x580 fs/char_dev.c:417
>>   do_dentry_open+0x704/0x1050 fs/open.c:777
>>   do_last fs/namei.c:3418 [inline]
>>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>>   do_sys_open+0x307/0x430 fs/open.c:1069
>>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Freed by task 19764:
>>   set_track mm/kasan/kasan.c:460 [inline]
>>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>>   slab_free_hook mm/slub.c:1370 [inline]
>>   slab_free_freelist_hook mm/slub.c:1397 [inline]
>>   slab_free mm/slub.c:2952 [inline]
>>   kfree+0xeb/0x2f0 mm/slub.c:3905
>>   sk_prot_free net/core/sock.c:1506 [inline]
>>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>>   sk_destruct+0x48/0x70 net/core/sock.c:1596
>>   __sk_free+0xa9/0x270 net/core/sock.c:1607
>>   sk_free+0x2a/0x30 net/core/sock.c:1618
>>   sock_put include/net/sock.h:1696 [inline]
>>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>>   tun_detach drivers/net/tun.c:747 [inline]
>>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>>   __fput+0x27f/0x7f0 fs/file_table.c:278
>>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
>>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> The buggy address belongs to the object at ffff88836cc26600
>>   which belongs to the cache kmalloc-4096 of size 4096
>> The buggy address is located 1136 bytes inside of
>>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
>> The buggy address belongs to the page:
>> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600
>> index:0x0 compound_mapcount: 0
>> flags: 0x2fffff80008100(slab|head)
>> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
>> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                               ^
>>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>
>> If tun driver have multiqueues, user close the last queue by
>> tun_detach, then tun->tfiles[index] is not cleared. Then a new
>> queue may add to the tun, which using rcu_assign_pointer
>> tun->tfiles[index] to the new tfile and increase the numqueues.
>> However if there send a packet during this time, which picking the last
>> queue, it may uses the old tun->tfiles[index], beacause there no
>> RCU grace period.
>>
>> 1) tun_chr_close //close the last queue
>>      --> __tun_detach  //close the last queue, but tun->tfiles[index]
>> still exist
>>
>> 2) tun_chr_open  //attach a new queue
>>      --> tun_attach
>> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>       //there need a RCU grace period
>>
>>      -->tun->numqueues++;
>>
>> 3) tun_net_xmit //a new packet is sending, which pick the last queue
>>       -->if (txq >= tun->numqueues)
>>        //above check passed, but tfile still not renew
>>       -->if (tfile->socket.sk->sk_filter ...
>>        //use the old tfile,trigger use-after-free
>>
>> Reported-by: Hulk Robot <[email protected]>
>> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
>> Signed-off-by: YueHaibing <[email protected]>
>> ---
>>   drivers/net/tun.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index e9ca1c0..3770aba 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun,
>> struct file *file,
>>        */
>>       rcu_assign_pointer(tfile->tun, tun);
>>       rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>> +    synchronize_net();
>>       tun->numqueues++;
>>       tun_set_real_num_queues(tun);
>>   out:
>
>
> Good catch, that's one of my suspicion as well. Assume that this has
> been tested.
>
> Acked-by: Jason Wang <[email protected]>
>
> This patch is needed for -stable.
>
> I will post another theoretical issue similar to this soon.
>
> Thanks
>

Ok, a second thought. All evil come from the access of tun->numqueues
without sufficient synchronization. This is a partial fix, another
possible case is __tun_detach(), we need either

1) synchronize through pointers in tfiles

2) or smp_load_acquire()/smp_store_release() to solve it completely.

Let me post a complete fix.

Thanks

2019-04-28 05:46:42

by Wei Yongjun

[permalink] [raw]
Subject: RE: [PATCH] tun: Fix use-after-free in tun_net_xmit

Hi Jason,

> On 2019/4/28 上午11:24, Jason Wang wrote:
> >
> > On 2019/4/28 上午11:05, Yue Haibing wrote:
> >> From: YueHaibing <[email protected]>
> >>
> >> KASAN report this:
> >>
> >> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750
> >> drivers/net/tun.c:1104
> >> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
> >>
> >> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> >> 1.10.2-1ubuntu1 04/01/2014
> >> Call Trace:
> >>   <IRQ>
> >>   __dump_stack lib/dump_stack.c:77 [inline]
> >>   dump_stack+0xca/0x13e lib/dump_stack.c:113
> >>   print_address_description+0x79/0x330 mm/kasan/report.c:253
> >>   kasan_report_error mm/kasan/report.c:351 [inline]
> >>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
> >>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> >>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
> >>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
> >>   xmit_one net/core/dev.c:3243 [inline]
> >>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
> >>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
> >>   qdisc_restart net/sched/sch_generic.c:390 [inline]
> >>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
> >>   qdisc_run include/net/pkt_sched.h:120 [inline]
> >>   __dev_xmit_skb net/core/dev.c:3438 [inline]
> >>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
> >>   neigh_output include/net/neighbour.h:501 [inline]
> >>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
> >>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
> >>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
> >>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
> >>   dst_output include/net/dst.h:444 [inline]
> >>   NF_HOOK include/linux/netfilter.h:289 [inline]
> >>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
> >>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
> >>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
> >>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
> >>   expire_timers kernel/time/timer.c:1363 [inline]
> >>   __run_timers kernel/time/timer.c:1682 [inline]
> >>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
> >>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
> >>   invoke_softirq kernel/softirq.c:372 [inline]
> >>   irq_exit+0x209/0x290 kernel/softirq.c:412
> >>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> >>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
> >>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
> >>   </IRQ>
> >> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> >> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9
> >> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3>
> >> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> >> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> >> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> >> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> >> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
> >>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
> >>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
> >>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
> >>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
> >>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
> >>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
> >>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> >>
> >> Allocated by task 19764:
> >>   set_track mm/kasan/kasan.c:460 [inline]
> >>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
> >>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
> >>   kmalloc include/linux/slab.h:518 [inline]
> >>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
> >>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
> >>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
> >>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
> >>   chrdev_open+0x212/0x580 fs/char_dev.c:417
> >>   do_dentry_open+0x704/0x1050 fs/open.c:777
> >>   do_last fs/namei.c:3418 [inline]
> >>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
> >>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
> >>   do_sys_open+0x307/0x430 fs/open.c:1069
> >>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
> >>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>
> >> Freed by task 19764:
> >>   set_track mm/kasan/kasan.c:460 [inline]
> >>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
> >>   slab_free_hook mm/slub.c:1370 [inline]
> >>   slab_free_freelist_hook mm/slub.c:1397 [inline]
> >>   slab_free mm/slub.c:2952 [inline]
> >>   kfree+0xeb/0x2f0 mm/slub.c:3905
> >>   sk_prot_free net/core/sock.c:1506 [inline]
> >>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
> >>   sk_destruct+0x48/0x70 net/core/sock.c:1596
> >>   __sk_free+0xa9/0x270 net/core/sock.c:1607
> >>   sk_free+0x2a/0x30 net/core/sock.c:1618
> >>   sock_put include/net/sock.h:1696 [inline]
> >>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
> >>   tun_detach drivers/net/tun.c:747 [inline]
> >>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
> >>   __fput+0x27f/0x7f0 fs/file_table.c:278
> >>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
> >>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
> >>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
> >>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
> >>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
> >>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
> >>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>
> >> The buggy address belongs to the object at ffff88836cc26600
> >>   which belongs to the cache kmalloc-4096 of size 4096
> >> The buggy address is located 1136 bytes inside of
> >>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
> >> The buggy address belongs to the page:
> >> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600
> >> index:0x0 compound_mapcount: 0
> >> flags: 0x2fffff80008100(slab|head)
> >> raw: 002fffff80008100 dead000000000100 dead000000000200
> ffff8883e280e600
> >> raw: 0000000000000000 0000000000070007 00000001ffffffff
> 0000000000000000
> >> page dumped because: kasan: bad access detected
> >>
> >> Memory state around the buggy address:
> >>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>                                                               ^
> >>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>
> >> If tun driver have multiqueues, user close the last queue by
> >> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> >> queue may add to the tun, which using rcu_assign_pointer
> >> tun->tfiles[index] to the new tfile and increase the numqueues.
> >> However if there send a packet during this time, which picking the last
> >> queue, it may uses the old tun->tfiles[index], beacause there no
> >> RCU grace period.
> >>
> >> 1) tun_chr_close //close the last queue
> >>      --> __tun_detach  //close the last queue, but tun->tfiles[index]
> >> still exist
> >>
> >> 2) tun_chr_open  //attach a new queue
> >>      --> tun_attach
> >> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> >>       //there need a RCU grace period
> >>
> >>      -->tun->numqueues++;
> >>
> >> 3) tun_net_xmit //a new packet is sending, which pick the last queue
> >>       -->if (txq >= tun->numqueues)
> >>        //above check passed, but tfile still not renew
> >>       -->if (tfile->socket.sk->sk_filter ...
> >>        //use the old tfile,trigger use-after-free
> >>
> >> Reported-by: Hulk Robot <[email protected]>
> >> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
> >> Signed-off-by: YueHaibing <[email protected]>
> >> ---
> >>   drivers/net/tun.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index e9ca1c0..3770aba 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun,
> >> struct file *file,
> >>        */
> >>       rcu_assign_pointer(tfile->tun, tun);
> >>       rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> >> +    synchronize_net();
> >>       tun->numqueues++;
> >>       tun_set_real_num_queues(tun);
> >>   out:
> >
> >
> > Good catch, that's one of my suspicion as well. Assume that this has
> > been tested.
> >
> > Acked-by: Jason Wang <[email protected]>
> >
> > This patch is needed for -stable.
> >
> > I will post another theoretical issue similar to this soon.
> >
> > Thanks
> >
>
> Ok, a second thought. All evil come from the access of tun->numqueues
> without sufficient synchronization. This is a partial fix, another
> possible case is __tun_detach(), we need either

In __tun_detach(),tun->tfiles changes should always call synchronize_net(),
and free tfile after RCU grace period. tun_net_xmit() doesn't have the chance to
access the change because it holding the rcu_read_lock().

So __tun_detach() path cannot cause the use after free, isn't it?

Regards

>
> 1) synchronize through pointers in tfiles
>
> 2) or smp_load_acquire()/smp_store_release() to solve it completely.
>
> Let me post a complete fix.
>
> Thanks

2019-04-28 07:54:22

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit


On 2019/4/28 下午1:40, weiyongjun (A) wrote:
> Hi Jason,
>
>> On 2019/4/28 上午11:24, Jason Wang wrote:
>>> On 2019/4/28 上午11:05, Yue Haibing wrote:
>>>> From: YueHaibing <[email protected]>
>>>>
>>>> KASAN report this:
>>>>
>>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750
>>>> drivers/net/tun.c:1104
>>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>>>
>>>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>>> 1.10.2-1ubuntu1 04/01/2014
>>>> Call Trace:
>>>>   <IRQ>
>>>>   __dump_stack lib/dump_stack.c:77 [inline]
>>>>   dump_stack+0xca/0x13e lib/dump_stack.c:113
>>>>   print_address_description+0x79/0x330 mm/kasan/report.c:253
>>>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>>>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>>>>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>>>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>>>>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>>>>   xmit_one net/core/dev.c:3243 [inline]
>>>>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>>>>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>>>>   qdisc_restart net/sched/sch_generic.c:390 [inline]
>>>>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>>>>   qdisc_run include/net/pkt_sched.h:120 [inline]
>>>>   __dev_xmit_skb net/core/dev.c:3438 [inline]
>>>>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>>>>   neigh_output include/net/neighbour.h:501 [inline]
>>>>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>>>>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>>>>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>>>>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>>>>   dst_output include/net/dst.h:444 [inline]
>>>>   NF_HOOK include/linux/netfilter.h:289 [inline]
>>>>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>>>>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
>>>>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>>>>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>>>>   expire_timers kernel/time/timer.c:1363 [inline]
>>>>   __run_timers kernel/time/timer.c:1682 [inline]
>>>>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>>>>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
>>>>   invoke_softirq kernel/softirq.c:372 [inline]
>>>>   irq_exit+0x209/0x290 kernel/softirq.c:412
>>>>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>>>>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>>>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>>>>   </IRQ>
>>>> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
>>>> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9
>>>> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3>
>>>> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
>>>> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>>> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
>>>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
>>>> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
>>>> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>>>>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>>>>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>>>>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>>>>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>>>>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>>>>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>>>>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>>>>
>>>> Allocated by task 19764:
>>>>   set_track mm/kasan/kasan.c:460 [inline]
>>>>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>>>>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>>>>   kmalloc include/linux/slab.h:518 [inline]
>>>>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>>>>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>>>>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>>>>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
>>>>   chrdev_open+0x212/0x580 fs/char_dev.c:417
>>>>   do_dentry_open+0x704/0x1050 fs/open.c:777
>>>>   do_last fs/namei.c:3418 [inline]
>>>>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>>>>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>>>>   do_sys_open+0x307/0x430 fs/open.c:1069
>>>>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> Freed by task 19764:
>>>>   set_track mm/kasan/kasan.c:460 [inline]
>>>>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>>>>   slab_free_hook mm/slub.c:1370 [inline]
>>>>   slab_free_freelist_hook mm/slub.c:1397 [inline]
>>>>   slab_free mm/slub.c:2952 [inline]
>>>>   kfree+0xeb/0x2f0 mm/slub.c:3905
>>>>   sk_prot_free net/core/sock.c:1506 [inline]
>>>>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>>>>   sk_destruct+0x48/0x70 net/core/sock.c:1596
>>>>   __sk_free+0xa9/0x270 net/core/sock.c:1607
>>>>   sk_free+0x2a/0x30 net/core/sock.c:1618
>>>>   sock_put include/net/sock.h:1696 [inline]
>>>>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>>>>   tun_detach drivers/net/tun.c:747 [inline]
>>>>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>>>>   __fput+0x27f/0x7f0 fs/file_table.c:278
>>>>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
>>>>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>>>>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>>>>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>>>>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>>>>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>>>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> The buggy address belongs to the object at ffff88836cc26600
>>>>   which belongs to the cache kmalloc-4096 of size 4096
>>>> The buggy address is located 1136 bytes inside of
>>>>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
>>>> The buggy address belongs to the page:
>>>> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600
>>>> index:0x0 compound_mapcount: 0
>>>> flags: 0x2fffff80008100(slab|head)
>>>> raw: 002fffff80008100 dead000000000100 dead000000000200
>> ffff8883e280e600
>>>> raw: 0000000000000000 0000000000070007 00000001ffffffff
>> 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>>
>>>> Memory state around the buggy address:
>>>>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>                                                               ^
>>>>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>
>>>> If tun driver have multiqueues, user close the last queue by
>>>> tun_detach, then tun->tfiles[index] is not cleared. Then a new
>>>> queue may add to the tun, which using rcu_assign_pointer
>>>> tun->tfiles[index] to the new tfile and increase the numqueues.
>>>> However if there send a packet during this time, which picking the last
>>>> queue, it may uses the old tun->tfiles[index], beacause there no
>>>> RCU grace period.
>>>>
>>>> 1) tun_chr_close //close the last queue
>>>>      --> __tun_detach  //close the last queue, but tun->tfiles[index]
>>>> still exist
>>>>
>>>> 2) tun_chr_open  //attach a new queue
>>>>      --> tun_attach
>>>> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>>>       //there need a RCU grace period
>>>>
>>>>      -->tun->numqueues++;
>>>>
>>>> 3) tun_net_xmit //a new packet is sending, which pick the last queue
>>>>       -->if (txq >= tun->numqueues)
>>>>        //above check passed, but tfile still not renew
>>>>       -->if (tfile->socket.sk->sk_filter ...
>>>>        //use the old tfile,trigger use-after-free
>>>>
>>>> Reported-by: Hulk Robot <[email protected]>
>>>> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
>>>> Signed-off-by: YueHaibing <[email protected]>
>>>> ---
>>>>   drivers/net/tun.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index e9ca1c0..3770aba 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun,
>>>> struct file *file,
>>>>        */
>>>>       rcu_assign_pointer(tfile->tun, tun);
>>>>       rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>>> +    synchronize_net();
>>>>       tun->numqueues++;
>>>>       tun_set_real_num_queues(tun);
>>>>   out:
>>>
>>> Good catch, that's one of my suspicion as well. Assume that this has
>>> been tested.
>>>
>>> Acked-by: Jason Wang <[email protected]>
>>>
>>> This patch is needed for -stable.
>>>
>>> I will post another theoretical issue similar to this soon.
>>>
>>> Thanks
>>>
>> Ok, a second thought. All evil come from the access of tun->numqueues
>> without sufficient synchronization. This is a partial fix, another
>> possible case is __tun_detach(), we need either
> In __tun_detach(),tun->tfiles changes should always call synchronize_net(),
> and free tfile after RCU grace period.


Exactly, and that's why I suggest to check pointers in tfiles against
NULL for method 1) I mentioned below.


> tun_net_xmit() doesn't have the chance to
> access the change because it holding the rcu_read_lock().


The problem is the following codes:


        --tun->numqueues;

        ...

        synchronize_net();

We need make sure the decrement of tun->numqueues be visible to readers
after synchronize_net(). And in tun_net_xmit():


    rcu_read_lock();
    tfile = rcu_dereference(tun->tfiles[txq]);

    /* Drop packet if interface is not attached */
    if (txq >= tun->numqueues)
        goto drop;


We should make sure tun->numqueues were read after tfiles array dereference.

Unfortunately, we don't have such guarantee even with this patch. So we
should change to use smp_load_acquire()/smp_store_release() for those
cases (method 2).

It looks to me checking file against NULL (and NULL out
tfiles[tun->numqueues] in __tun_detach) is much more easier to be
understand.

What do you think?

Thanks


>
> So __tun_detach() path cannot cause the use after free, isn't it?
>
> Regards
>
>> 1) synchronize through pointers in tfiles
>>
>> 2) or smp_load_acquire()/smp_store_release() to solve it completely.
>>
>> Let me post a complete fix.
>>
>> Thanks

2019-04-28 17:50:43

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Sat, Apr 27, 2019 at 8:06 PM Yue Haibing <[email protected]> wrote:
>
> If tun driver have multiqueues, user close the last queue by
> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> queue may add to the tun, which using rcu_assign_pointer
> tun->tfiles[index] to the new tfile and increase the numqueues.
> However if there send a packet during this time, which picking the last
> queue, it may uses the old tun->tfiles[index], beacause there no
> RCU grace period.

This analysis makes sense. It is a normal scenario for RCU,
where readers could still read even after we unpublish the RCU
protected structure, we only need to worry about when we free it.


> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c0..3770aba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
> */
> rcu_assign_pointer(tfile->tun, tun);
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> + synchronize_net();
> tun->numqueues++;
> tun_set_real_num_queues(tun);

But this fix doesn't make any sense, we only wait for RCU
grace period when freeing old ones, not for new ones. RCU
grace period is all about readers against free.

This is why I came up with the SOCK_RCU_FREE patch, which
is also blocking-free.


Thanks.

2019-04-28 18:02:20

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]> wrote:
> > tun_net_xmit() doesn't have the chance to
> > access the change because it holding the rcu_read_lock().
>
>
> The problem is the following codes:
>
>
> --tun->numqueues;
>
> ...
>
> synchronize_net();
>
> We need make sure the decrement of tun->numqueues be visible to readers
> after synchronize_net(). And in tun_net_xmit():

It doesn't matter at all. Readers are okay to read it even they still use the
stale tun->numqueues, as long as the tfile is not freed readers can read
whatever they want...

The decrement of tun->numqueues is just how we unpublish the old
tfile, it is still valid for readers to read it _after_ unpublish, we only need
to worry about free, not about unpublish. This is the whole spirit of RCU.

You need to rethink about my SOCK_RCU_FREE patch.

Thanks.

2019-04-29 02:24:47

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit


On 2019/4/29 上午1:59, Cong Wang wrote:
> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]> wrote:
>>> tun_net_xmit() doesn't have the chance to
>>> access the change because it holding the rcu_read_lock().
>>
>>
>> The problem is the following codes:
>>
>>
>> --tun->numqueues;
>>
>> ...
>>
>> synchronize_net();
>>
>> We need make sure the decrement of tun->numqueues be visible to readers
>> after synchronize_net(). And in tun_net_xmit():
>
> It doesn't matter at all. Readers are okay to read it even they still use the
> stale tun->numqueues, as long as the tfile is not freed readers can read
> whatever they want...

This is only true if we set SOCK_RCU_FREE, isn't it?

>
> The decrement of tun->numqueues is just how we unpublish the old
> tfile, it is still valid for readers to read it _after_ unpublish, we only need
> to worry about free, not about unpublish. This is the whole spirit of RCU.
>

The point is we don't convert tun->numqueues to RCU but use
synchronize_net().

> You need to rethink about my SOCK_RCU_FREE patch.

The code is wrote before SOCK_RCU_FREE is introduced and assume no
de-reference from device after synchronize_net(). It doesn't harm to
figure out the root cause which may give us more confidence to the fix
(e.g like SOCK_RCU_FREE).

I don't object to fix with SOCK_RCU_FREE, but then we should remove
the redundant synchronize_net(). But I still prefer to synchronize
everything explicitly like (completely untested):

From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 2001
From: Jason Wang <[email protected]>
Date: Mon, 29 Apr 2019 10:21:06 +0800
Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues

Signed-off-by: Jason Wang <[email protected]>
---
drivers/net/tun.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 80bff1b4ec17..03715f605fb5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)

rcu_assign_pointer(tun->tfiles[index],
tun->tfiles[tun->numqueues - 1]);
+ rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL);
ntfile = rtnl_dereference(tun->tfiles[index]);
ntfile->queue_index = index;

@@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
tfile = rcu_dereference(tun->tfiles[txq]);

/* Drop packet if interface is not attached */
- if (txq >= tun->numqueues)
+ if (!tfile)
goto drop;

if (!rcu_dereference(tun->steering_prog))
@@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, int n,

rcu_read_lock();

- numqueues = READ_ONCE(tun->numqueues);
- if (!numqueues) {
+ tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
+ tun->numqueues]);
+ if (!tfile) {
rcu_read_unlock();
return -ENXIO; /* Caller will free/return all frames */
}

- tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
- numqueues]);
-
spin_lock(&tfile->tx_ring.producer_lock);
for (i = 0; i < n; i++) {
struct xdp_frame *xdp = frames[i];
--
2.19.1

2019-04-29 03:56:16

by Wei Yongjun

[permalink] [raw]
Subject: RE: [PATCH] tun: Fix use-after-free in tun_net_xmit

> > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]>
> wrote:
> >>> tun_net_xmit() doesn't have the chance to
> >>> access the change because it holding the rcu_read_lock().
> >>
> >>
> >> The problem is the following codes:
> >>
> >>
> >> --tun->numqueues;
> >>
> >> ...
> >>
> >> synchronize_net();
> >>
> >> We need make sure the decrement of tun->numqueues be visible to
> readers
> >> after synchronize_net(). And in tun_net_xmit():
> >
> > It doesn't matter at all. Readers are okay to read it even they still use the
> > stale tun->numqueues, as long as the tfile is not freed readers can read
> > whatever they want...
>
> This is only true if we set SOCK_RCU_FREE, isn't it?
>
> >
> > The decrement of tun->numqueues is just how we unpublish the old
> > tfile, it is still valid for readers to read it _after_ unpublish, we only need
> > to worry about free, not about unpublish. This is the whole spirit of RCU.
> >
>
> The point is we don't convert tun->numqueues to RCU but use
> synchronize_net().
>
> > You need to rethink about my SOCK_RCU_FREE patch.
>
> The code is wrote before SOCK_RCU_FREE is introduced and assume no
> de-reference from device after synchronize_net(). It doesn't harm to
> figure out the root cause which may give us more confidence to the fix
> (e.g like SOCK_RCU_FREE).
>
> I don't object to fix with SOCK_RCU_FREE, but then we should remove
> the redundant synchronize_net(). But I still prefer to synchronize
> everything explicitly like (completely untested):
>
> From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00
> 2001
> From: Jason Wang <[email protected]>
> Date: Mon, 29 Apr 2019 10:21:06 +0800
> Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues
>
> Signed-off-by: Jason Wang <[email protected]>
> ---
> drivers/net/tun.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 80bff1b4ec17..03715f605fb5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool
> clean)
>
> rcu_assign_pointer(tun->tfiles[index],
> tun->tfiles[tun->numqueues - 1]);
> + rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL);

Should be "rcu_assign_pointer(tun->tfiles[tun->numqueues - 1], NULL);"

> ntfile = rtnl_dereference(tun->tfiles[index]);
> ntfile->queue_index = index;
>
> @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff
> *skb, struct net_device *dev)
> tfile = rcu_dereference(tun->tfiles[txq]);
>
> /* Drop packet if interface is not attached */
> - if (txq >= tun->numqueues)
> + if (!tfile)
> goto drop;
>
> if (!rcu_dereference(tun->steering_prog))
> @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev,
> int n,
>
> rcu_read_lock();
>
> - numqueues = READ_ONCE(tun->numqueues);
> - if (!numqueues) {
> + tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> + tun->numqueues]);
> + if (!tfile) {
> rcu_read_unlock();
> return -ENXIO; /* Caller will free/return all frames */
> }
>
> - tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> - numqueues]);
> -
> spin_lock(&tfile->tx_ring.producer_lock);
> for (i = 0; i < n; i++) {
> struct xdp_frame *xdp = frames[i];
> --
> 2.19.1

2019-04-29 13:08:54

by Yue Haibing

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On 2019/4/29 10:23, Jason Wang wrote:
>
> On 2019/4/29 上午1:59, Cong Wang wrote:
>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]> wrote:
>>>> tun_net_xmit() doesn't have the chance to
>>>> access the change because it holding the rcu_read_lock().
>>>
>>>
>>> The problem is the following codes:
>>>
>>>
>>> --tun->numqueues;
>>>
>>> ...
>>>
>>> synchronize_net();
>>>
>>> We need make sure the decrement of tun->numqueues be visible to readers
>>> after synchronize_net(). And in tun_net_xmit():
>>
>> It doesn't matter at all. Readers are okay to read it even they still use the
>> stale tun->numqueues, as long as the tfile is not freed readers can read
>> whatever they want...
>
> This is only true if we set SOCK_RCU_FREE, isn't it?
>
>>
>> The decrement of tun->numqueues is just how we unpublish the old
>> tfile, it is still valid for readers to read it _after_ unpublish, we only need
>> to worry about free, not about unpublish. This is the whole spirit of RCU.
>>
>
> The point is we don't convert tun->numqueues to RCU but use
> synchronize_net().
>
>> You need to rethink about my SOCK_RCU_FREE patch.
>
> The code is wrote before SOCK_RCU_FREE is introduced and assume no
> de-reference from device after synchronize_net(). It doesn't harm to
> figure out the root cause which may give us more confidence to the fix
> (e.g like SOCK_RCU_FREE).
>
> I don't object to fix with SOCK_RCU_FREE, but then we should remove
> the redundant synchronize_net(). But I still prefer to synchronize
> everything explicitly like (completely untested):
>
>>From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 2001
> From: Jason Wang <[email protected]>
> Date: Mon, 29 Apr 2019 10:21:06 +0800
> Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues
>
> Signed-off-by: Jason Wang <[email protected]>
> ---
> drivers/net/tun.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 80bff1b4ec17..03715f605fb5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>
> rcu_assign_pointer(tun->tfiles[index],
> tun->tfiles[tun->numqueues - 1]);
> + rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL);
> ntfile = rtnl_dereference(tun->tfiles[index]);

move here to avoid NULL pointer dereference, it works for me

rcu_assign_pointer(tun->tfiles[tun->numqueues -1 ], NULL);

> ntfile->queue_index = index;
>
> @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> tfile = rcu_dereference(tun->tfiles[txq]);
>
> /* Drop packet if interface is not attached */
> - if (txq >= tun->numqueues)
> + if (!tfile)
> goto drop;
>
> if (!rcu_dereference(tun->steering_prog))
> @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, int n,
>
> rcu_read_lock();
>
> - numqueues = READ_ONCE(tun->numqueues);
> - if (!numqueues) {
> + tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> + tun->numqueues]);
> + if (!tfile) {
> rcu_read_unlock();
> return -ENXIO; /* Caller will free/return all frames */
> }
>
> - tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> - numqueues]);
> -
> spin_lock(&tfile->tx_ring.producer_lock);
> for (i = 0; i < n; i++) {
> struct xdp_frame *xdp = frames[i];
>

2019-04-29 14:57:39

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Sun, Apr 28, 2019 at 11:05:39AM +0800, Yue Haibing wrote:
> From: YueHaibing <[email protected]>
>
> KASAN report this:
>
> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xca/0x13e lib/dump_stack.c:113
> print_address_description+0x79/0x330 mm/kasan/report.c:253
> kasan_report_error mm/kasan/report.c:351 [inline]
> kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
> tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
> netdev_start_xmit include/linux/netdevice.h:4309 [inline]
> xmit_one net/core/dev.c:3243 [inline]
> dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
> sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
> qdisc_restart net/sched/sch_generic.c:390 [inline]
> __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
> qdisc_run include/net/pkt_sched.h:120 [inline]
> __dev_xmit_skb net/core/dev.c:3438 [inline]
> __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
> neigh_output include/net/neighbour.h:501 [inline]
> ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
> ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
> NF_HOOK_COND include/linux/netfilter.h:278 [inline]
> ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
> dst_output include/net/dst.h:444 [inline]
> NF_HOOK include/linux/netfilter.h:289 [inline]
> mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
> mld_send_cr net/ipv6/mcast.c:1979 [inline]
> mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
> call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
> expire_timers kernel/time/timer.c:1363 [inline]
> __run_timers kernel/time/timer.c:1682 [inline]
> run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
> __do_softirq+0x26d/0xabd kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:372 [inline]
> irq_exit+0x209/0x290 kernel/softirq.c:412
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
> </IRQ>
> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
> arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
> default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
> cpuidle_idle_call kernel/sched/idle.c:153 [inline]
> do_idle+0x2ca/0x420 kernel/sched/idle.c:262
> cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
> start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>
> Allocated by task 19764:
> set_track mm/kasan/kasan.c:460 [inline]
> kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
> __kmalloc+0x11b/0x2d0 mm/slub.c:3750
> kmalloc include/linux/slab.h:518 [inline]
> sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
> sk_alloc+0x3d/0xc00 net/core/sock.c:1523
> tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
> misc_open+0x367/0x4e0 drivers/char/misc.c:141
> chrdev_open+0x212/0x580 fs/char_dev.c:417
> do_dentry_open+0x704/0x1050 fs/open.c:777
> do_last fs/namei.c:3418 [inline]
> path_openat+0x7ed/0x2ae0 fs/namei.c:3533
> do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
> do_sys_open+0x307/0x430 fs/open.c:1069
> do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 19764:
> set_track mm/kasan/kasan.c:460 [inline]
> __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
> slab_free_hook mm/slub.c:1370 [inline]
> slab_free_freelist_hook mm/slub.c:1397 [inline]
> slab_free mm/slub.c:2952 [inline]
> kfree+0xeb/0x2f0 mm/slub.c:3905
> sk_prot_free net/core/sock.c:1506 [inline]
> __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
> sk_destruct+0x48/0x70 net/core/sock.c:1596
> __sk_free+0xa9/0x270 net/core/sock.c:1607
> sk_free+0x2a/0x30 net/core/sock.c:1618
> sock_put include/net/sock.h:1696 [inline]
> __tun_detach+0x464/0xf70 drivers/net/tun.c:735
> tun_detach drivers/net/tun.c:747 [inline]
> tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
> __fput+0x27f/0x7f0 fs/file_table.c:278
> task_work_run+0x136/0x1b0 kernel/task_work.c:113
> tracehook_notify_resume include/linux/tracehook.h:193 [inline]
> exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
> do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff88836cc26600
> which belongs to the cache kmalloc-4096 of size 4096
> The buggy address is located 1136 bytes inside of
> 4096-byte region [ffff88836cc26600, ffff88836cc27600)
> The buggy address belongs to the page:
> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
> flags: 0x2fffff80008100(slab|head)
> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> If tun driver have multiqueues, user close the last queue by
> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> queue may add to the tun, which using rcu_assign_pointer
> tun->tfiles[index] to the new tfile and increase the numqueues.
> However if there send a packet during this time, which picking the last
> queue, it may uses the old tun->tfiles[index], beacause there no
> RCU grace period.
>
> 1) tun_chr_close //close the last queue
> --> __tun_detach //close the last queue, but tun->tfiles[index] still exist
>
> 2) tun_chr_open //attach a new queue
> --> tun_attach
> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> //there need a RCU grace period
>
> -->tun->numqueues++;
>
> 3) tun_net_xmit //a new packet is sending, which pick the last queue
> -->if (txq >= tun->numqueues)
> //above check passed, but tfile still not renew
> -->if (tfile->socket.sk->sk_filter ...
> //use the old tfile,trigger use-after-free
>
> Reported-by: Hulk Robot <[email protected]>
> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
> Signed-off-by: YueHaibing <[email protected]>
> ---
> drivers/net/tun.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c0..3770aba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
> */
> rcu_assign_pointer(tfile->tun, tun);
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> + synchronize_net();
> tun->numqueues++;
> tun_set_real_num_queues(tun);
> out:

The problem seems real enough, but an extra synchronize_net on tun_attach
might be a problem, slowing guest startup significantly.
Better ideas?

> --
> 2.7.0
>

2019-04-29 16:40:22

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <[email protected]> wrote:
>
>
> On 2019/4/29 上午1:59, Cong Wang wrote:
> > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]> wrote:
> >>> tun_net_xmit() doesn't have the chance to
> >>> access the change because it holding the rcu_read_lock().
> >>
> >>
> >> The problem is the following codes:
> >>
> >>
> >> --tun->numqueues;
> >>
> >> ...
> >>
> >> synchronize_net();
> >>
> >> We need make sure the decrement of tun->numqueues be visible to readers
> >> after synchronize_net(). And in tun_net_xmit():
> >
> > It doesn't matter at all. Readers are okay to read it even they still use the
> > stale tun->numqueues, as long as the tfile is not freed readers can read
> > whatever they want...
>
> This is only true if we set SOCK_RCU_FREE, isn't it?


Sure, this is how RCU is supposed to work.

>
> >
> > The decrement of tun->numqueues is just how we unpublish the old
> > tfile, it is still valid for readers to read it _after_ unpublish, we only need
> > to worry about free, not about unpublish. This is the whole spirit of RCU.
> >
>
> The point is we don't convert tun->numqueues to RCU but use
> synchronize_net().

Why tun->numqueues needs RCU? It is an integer, and reading a stale
value is _perfectly_ fine.

If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array,
it doesn't shrink or grow, so we don't need RCU for it. This is also why
a stale tun->numqueues is fine, as long as it never goes out-of-bound.


>
> > You need to rethink about my SOCK_RCU_FREE patch.
>
> The code is wrote before SOCK_RCU_FREE is introduced and assume no
> de-reference from device after synchronize_net(). It doesn't harm to
> figure out the root cause which may give us more confidence to the fix
> (e.g like SOCK_RCU_FREE).

I believe SOCK_RCU_FREE is the fix for the root cause, not just a
cover-up.


>
> I don't object to fix with SOCK_RCU_FREE, but then we should remove
> the redundant synchronize_net(). But I still prefer to synchronize
> everything explicitly like (completely untested):

I agree that synchronize_net() can be removed. However I don't
understand your untested patch at all, it looks like to fix a completely
different problem rather than this use-after-free.

Thanks.

2019-04-29 17:00:11

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Mon, Apr 29, 2019 at 7:55 AM Michael S. Tsirkin <[email protected]> wrote:
> The problem seems real enough, but an extra synchronize_net on tun_attach
> might be a problem, slowing guest startup significantly.
> Better ideas?

Yes, I proposed the following patch in the other thread.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c088d0b..31c3210288cb 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
struct file * file)
file->private_data = tfile;
INIT_LIST_HEAD(&tfile->next);

+ sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);

return 0;

2019-04-30 02:46:21

by Yue Haibing

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit



On 2019/4/30 0:38, Cong Wang wrote:
> On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <[email protected]> wrote:
>>
>>
>> On 2019/4/29 上午1:59, Cong Wang wrote:
>>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]> wrote:
>>>>> tun_net_xmit() doesn't have the chance to
>>>>> access the change because it holding the rcu_read_lock().
>>>>
>>>>
>>>> The problem is the following codes:
>>>>
>>>>
>>>> --tun->numqueues;
>>>>
>>>> ...
>>>>
>>>> synchronize_net();
>>>>
>>>> We need make sure the decrement of tun->numqueues be visible to readers
>>>> after synchronize_net(). And in tun_net_xmit():
>>>
>>> It doesn't matter at all. Readers are okay to read it even they still use the
>>> stale tun->numqueues, as long as the tfile is not freed readers can read
>>> whatever they want...
>>
>> This is only true if we set SOCK_RCU_FREE, isn't it?
>
>
> Sure, this is how RCU is supposed to work.
>
>>
>>>
>>> The decrement of tun->numqueues is just how we unpublish the old
>>> tfile, it is still valid for readers to read it _after_ unpublish, we only need
>>> to worry about free, not about unpublish. This is the whole spirit of RCU.
>>>
>>
>> The point is we don't convert tun->numqueues to RCU but use
>> synchronize_net().
>
> Why tun->numqueues needs RCU? It is an integer, and reading a stale
> value is _perfectly_ fine.
>
> If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array,
> it doesn't shrink or grow, so we don't need RCU for it. This is also why
> a stale tun->numqueues is fine, as long as it never goes out-of-bound.
>
>
>>
>>> You need to rethink about my SOCK_RCU_FREE patch.
>>
>> The code is wrote before SOCK_RCU_FREE is introduced and assume no
>> de-reference from device after synchronize_net(). It doesn't harm to
>> figure out the root cause which may give us more confidence to the fix
>> (e.g like SOCK_RCU_FREE).
>
> I believe SOCK_RCU_FREE is the fix for the root cause, not just a
> cover-up.

With SOCK_RCU_FREE tfile is ok ,

but tfile->sk is freed by sock_put in __tun_detach, it will trgger

use-after-free in tun_net_xmit if tun->numqueues check passed.

>
>
>>
>> I don't object to fix with SOCK_RCU_FREE, but then we should remove
>> the redundant synchronize_net(). But I still prefer to synchronize
>> everything explicitly like (completely untested):
>
> I agree that synchronize_net() can be removed. However I don't
> understand your untested patch at all, it looks like to fix a completely
> different problem rather than this use-after-free.
>
> Thanks.
>
> .
>

2019-04-30 05:13:11

by Wei Yongjun

[permalink] [raw]
Subject: RE: [PATCH] tun: Fix use-after-free in tun_net_xmit

> Network Developers <[email protected]>
> Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
>
> On Mon, Apr 29, 2019 at 7:55 AM Michael S. Tsirkin <[email protected]>
> wrote:
> > The problem seems real enough, but an extra synchronize_net on
> tun_attach
> > might be a problem, slowing guest startup significantly.
> > Better ideas?
>
> Yes, I proposed the following patch in the other thread.
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c088d0b..31c3210288cb 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
> struct file * file)
> file->private_data = tfile;
> INIT_LIST_HEAD(&tfile->next);
>
> + sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
> sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
>
> return 0;


This patch should not work. The key point is that when detach the queue
with index is equal to tun->numqueues - 1, we do not clear the point
in tun->tfiles:

static void __tun_detach(...)
{
...
**** if index == tun->numqueues - 1, nothing changed ****
rcu_assign_pointer(tun->tfiles[index],
tun->tfiles[tun->numqueues - 1]);
....
}

And after tfile free, xmit have change to get and use the freed file point.

Regards

2019-04-30 23:36:27

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Mon, Apr 29, 2019 at 7:44 PM YueHaibing <[email protected]> wrote:
>
> With SOCK_RCU_FREE tfile is ok ,
>
> but tfile->sk is freed by sock_put in __tun_detach, it will trgger

SOCK_RCU_FREE is exactly for sock and for sock_put(),
you need to look into sock_put() path to see where SOCK_RCU_FREE
is tested.


>
> use-after-free in tun_net_xmit if tun->numqueues check passed.

Why do you believe we still have use-after-free with SOCK_RCU_FREE?

tun_net_xmit() holds RCU read lock, so with SOCK_RCU_FREE,
the sock won't be freed until tun_net_xmit() releases RCU read lock.
This is just how RCU works...

2019-04-30 23:43:52

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit

On Mon, Apr 29, 2019 at 10:11 PM weiyongjun (A) <[email protected]> wrote:
> This patch should not work. The key point is that when detach the queue
> with index is equal to tun->numqueues - 1, we do not clear the point
> in tun->tfiles:
>
> static void __tun_detach(...)
> {
> ...
> **** if index == tun->numqueues - 1, nothing changed ****
> rcu_assign_pointer(tun->tfiles[index],
> tun->tfiles[tun->numqueues - 1]);
> ....
> }


This is _perfectly_ fine. This is just how we _unpublish_ it, RCU is NOT
against unpublish, you keep missing this point.

Think about list_del_rcu(). RCU readers could still read the list entry
even _after_ list_del_rcu(), this is perfectly fine, list_del_rcu() just
unpublishes the list entry from a global list, kfree_rcu() is the one frees
it. So, RCU readers never hate "unpublish", they just hate "free".


>
> And after tfile free, xmit have change to get and use the freed file point.

With SOCK_RCU_FREE, it won't be freed until the last reader is gone.
This is the fundamental of RCU.

Please, at least look into sk_destruct().

Thanks.

2019-05-05 09:11:36

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit


On 2019/4/30 上午12:38, Cong Wang wrote:
> On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <[email protected]> wrote:
>>
>> On 2019/4/29 上午1:59, Cong Wang wrote:
>>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <[email protected]> wrote:
>>>>> tun_net_xmit() doesn't have the chance to
>>>>> access the change because it holding the rcu_read_lock().
>>>>
>>>> The problem is the following codes:
>>>>
>>>>
>>>> --tun->numqueues;
>>>>
>>>> ...
>>>>
>>>> synchronize_net();
>>>>
>>>> We need make sure the decrement of tun->numqueues be visible to readers
>>>> after synchronize_net(). And in tun_net_xmit():
>>> It doesn't matter at all. Readers are okay to read it even they still use the
>>> stale tun->numqueues, as long as the tfile is not freed readers can read
>>> whatever they want...
>> This is only true if we set SOCK_RCU_FREE, isn't it?
>
> Sure, this is how RCU is supposed to work.
>
>>> The decrement of tun->numqueues is just how we unpublish the old
>>> tfile, it is still valid for readers to read it _after_ unpublish, we only need
>>> to worry about free, not about unpublish. This is the whole spirit of RCU.
>>>
>> The point is we don't convert tun->numqueues to RCU but use
>> synchronize_net().
> Why tun->numqueues needs RCU? It is an integer, and reading a stale
> value is _perfectly_ fine.


I meant we don't want e.g tun_net_xmit() to see the stale value after
synchronize_net() in __tun_detach(), since it has various other steps
with the assumption that no tfile dereference from data path. E.g one
example is XDP rxq information un-registering which looks racy in the
case of XDP_TX.


>
> If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array,
> it doesn't shrink or grow, so we don't need RCU for it. This is also why
> a stale tun->numqueues is fine, as long as it never goes out-of-bound.


We do kind of shrinking or growing through tun->numqueues. That's why we
check against it in various places. But, of course this is buggy.


>
>
>>> You need to rethink about my SOCK_RCU_FREE patch.
>> The code is wrote before SOCK_RCU_FREE is introduced and assume no
>> de-reference from device after synchronize_net(). It doesn't harm to
>> figure out the root cause which may give us more confidence to the fix
>> (e.g like SOCK_RCU_FREE).
> I believe SOCK_RCU_FREE is the fix for the root cause, not just a
> cover-up.
>
>
>> I don't object to fix with SOCK_RCU_FREE, but then we should remove
>> the redundant synchronize_net(). But I still prefer to synchronize
>> everything explicitly like (completely untested):
> I agree that synchronize_net() can be removed. However I don't
> understand your untested patch at all, it looks like to fix a completely
> different problem rather than this use-after-free.


As has been mentioned, the problem of current code is that we still
leave pointers  to freed tfile in tfiles[] array in __tun_detach() and
the check with tun->numqueues seems racy. So the patch just NULL out the
detached tfile pointers and make sure no it can not be dereferenced from
tfile after synchronize_net() by dereferencing tfile instead of checking
tun->numqueues .


Thanks

>
> Thanks.