2018-12-17 16:23:37

by Vasily Averin

Subject: [PATCH 0/4] use-after-free in svc_process_common()

Unfortunately NFSv4.1+ clients are still not properly net-namespace-ified.

OpenVz got a report about a crash in svc_process_common()
and found that bc_svc_process() cannot use serv->sv_bc_xprt as a pointer.

serv is a global structure, but sv_bc_xprt is assigned per net namespace.
If NFSv4.1+ shares (with the same minorversion) are mounted in several containers,
bc_svc_process() can use the wrong backchannel or even access already freed memory.

After careful investigation Evgenii Shatokhin found a reproducer for the crash,
and I was then able to reproduce the problem on the latest mainline kernel.

The described scenario requires:
- nodeA: a VM with 2 interfaces and a debug kernel with KASAN enabled
- nodeB: any other node
- NFS-SRV: an NFSv4.1+ server (4.2 is used in the example below)

1) nodeA: mount an NFSv4.1+ share
# mount -t nfs4 -o vers=4.2 NFS-SRV:/export/ /mnt/ns1
VvS: here serv->sv_bc_xprt is assigned for the first time;
in xs_tcp_bc_up() it is set to the svc_xprt of this mount's backchannel.
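For reference, a rough sketch of the assignment path (simplified, not
verbatim kernel code): xs_tcp_bc_up() creates a "tcp-bc" svc_xprt bound
to this mount's netns, and the backchannel socket setup then remembers
it in the global svc_serv:

static int xs_tcp_bc_up(struct svc_serv *serv, struct net *net)
{
	int ret;

	/* creates a "tcp-bc" backchannel svc_xprt for this netns */
	ret = svc_create_xprt(serv, "tcp-bc", net, PF_INET, 0,
			      SVC_SOCK_ANONYMOUS);
	if (ret < 0)
		return ret;
	return 0;
}

/* ... which ends up in svc_bc_create_socket(), where the per-netns
 * xprt is stored in the *global* svc_serv ("last mount wins"):
 *	serv->sv_bc_xprt = xprt;
 */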

2) nodeA: create a net namespace and mount the same (or any other) NFSv4.1+ share
# ip netns add second
# ip link set ens2 netns second
# ip netns exec second bash
(inside netns second) # dhclient ens2
VvS: now the netns has access to the external network
(inside netns second) # mount -t nfs4 -o vers=4.2 NFS-SRV:/export/ /mnt/ns2
VvS: now serv->sv_bc_xprt is overwritten with the svc_xprt of the new mount's backchannel
NB: you can mount any other NFS share, but the minorversion must be the same.
NB2: if the hardware allows, you can use the rdma transport here.
NB3: you do not need to access anything in the mounted share; the problem's trigger is already armed.

3) nodeA: destroy the mount inside the netns and then the netns itself.

(inside netns second) # umount /mnt/ns2
(inside netns second) # ip link set ens2 netns 1
(inside netns second) # exit
VvS: return to init_net
# ip netns del second
VvS: now the second NFS mount and the second net namespace have been destroyed.

4) nodeA: prepare a backchannel event
# echo test1 > /mnt/ns1/test1.txt
# echo test2 > /mnt/ns1/test2.txt
# python
>>> fl=open('/mnt/ns1/test1.txt','r')
>>>

5) nodeB: replace the file opened by nodeA
# mount -t nfs -o vers=4.2 NFS-SRV:/export/ /mnt/
# mv /mnt/test2.txt /mnt/test1.txt

===> KASAN on nodeA detects an access to already freed memory
(see the dmesg example below for details).

svc_process_common():
	/* Setup reply header */
	rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

svc_process_common() uses the already freed rqstp->rq_xprt;
it was assigned in bc_svc_process(), which took it from serv->sv_bc_xprt.
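
To make the consumer side concrete, here is a simplified sketch (not
verbatim kernel code) of how bc_svc_process() picks up the stale
pointer before handing it to svc_process_common():

int bc_svc_process(struct svc_serv *serv, struct rpc_rqst *req,
		   struct svc_rqst *rqstp)
{
	...
	/* serv is global; sv_bc_xprt may belong to another netns
	 * or may already have been freed */
	rqstp->rq_xprt = serv->sv_bc_xprt;
	...
	/* svc_process_common() then dereferences rqstp->rq_xprt */
	...
}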

serv->sv_bc_xprt cannot be used as a pointer:
it is assigned per net namespace, either in svc_bc_tcp_create()
or in xprt_rdma_bc_up().
(Fortunately, both transports cannot be used together in the same netns.)

To fix this problem I've added a new callback to struct rpc_xprt_ops;
it calls svc_find_xprt() with the proper name of the transport's backchannel class.
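
A rough sketch of the idea; the callback name (bc_get_xprt here) and
the exact svc_find_xprt() arguments are illustrative, the real patches
may differ:

struct rpc_xprt_ops {
	...
	/* return the backchannel svc_xprt of this transport in @net */
	struct svc_xprt	*(*bc_get_xprt)(struct svc_serv *serv,
					struct net *net);
	...
};

/* TCP implementation: look up the "tcp-bc" xprt in the right netns
 * instead of trusting the global serv->sv_bc_xprt pointer */
static struct svc_xprt *xs_tcp_bc_get_xprt(struct svc_serv *serv,
					    struct net *net)
{
	return svc_find_xprt(serv, "tcp-bc", net, AF_UNSPEC, 0);
}

/* the rdma transport does the same with its own backchannel class
 * name, and bc_svc_process() calls the callback via req->rq_xprt->ops */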

serv->sv_bc_xprt is used in svc_is_backchannel() too.
There the field is used not as a pointer but as a marker of
backchannel-capable svc servers.
My 2nd patch replaces the sv_bc_xprt pointer with a boolean flag;
I hope it helps to prevent misuse of sv_bc_xprt in the future.
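
For illustration, the check could become something like this
(the sv_bc_enabled name is taken from the patch subject; this is a
sketch, not the exact diff):

/* before: the pointer doubles as a "backchannel is enabled" flag */
static inline bool svc_is_backchannel(const struct svc_rqst *rqstp)
{
	return rqstp->rq_server->sv_bc_xprt != NULL;
}

/* after: a plain boolean, so nothing is tempted to dereference it */
static inline bool svc_is_backchannel(const struct svc_rqst *rqstp)
{
	return rqstp->rq_server->sv_bc_enabled;
}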

The 3rd and 4th patches are minor cleanups of debug messages.

Vasily Averin (4):
nfs: serv->sv_bc_xprt misuse in bc_svc_process()
nfs: remove sv_bc_enabled using in svc_is_backchannel()
nfs: minor typo in nfs4_callback_up_net()
nfs: fix debug message in svc_create_xprt()

fs/nfs/callback.c | 2 +-
include/linux/sunrpc/bc_xprt.h | 10 ++++------
include/linux/sunrpc/svc.h | 2 +-
include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/svc.c | 22 ++++++++++++++++------
net/sunrpc/svc_xprt.c | 4 ++--
net/sunrpc/svcsock.c | 2 +-
net/sunrpc/xprtrdma/backchannel.c | 5 +++++
net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
net/sunrpc/xprtrdma/transport.c | 1 +
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
net/sunrpc/xprtsock.c | 7 +++++++
12 files changed, 41 insertions(+), 18 deletions(-)

--
2.17.1

==================================================================
BUG: KASAN: use-after-free in svc_process_common+0xec/0xd80 [sunrpc]
Read of size 8 at addr ffff8881d69d4590 by task NFSv4 callback/1907

CPU: 0 PID: 1907 Comm: NFSv4 callback Not tainted 4.20.0-rc6+ #1
Hardware name: Virtuozzo KVM, BIOS 1.10.2-3.1.vz7.3 04/01/2014
Call Trace:
dump_stack+0xc6/0x150
? dump_stack_print_info.cold.0+0x1b/0x1b
? kmsg_dump_rewind_nolock+0x59/0x59
? _raw_write_lock_irqsave+0x100/0x100
? __switch_to_asm+0x34/0x70
? svc_process_common+0xec/0xd80 [sunrpc]
print_address_description+0x65/0x22e
? svc_process_common+0xec/0xd80 [sunrpc]
kasan_report.cold.5+0x241/0x306
svc_process_common+0xec/0xd80 [sunrpc]
? __cpuidle_text_end+0x8/0x8
? _raw_write_lock_irqsave+0xe0/0x100
? svc_printk+0x190/0x190 [sunrpc]
? __cpuidle_text_end+0x8/0x8
? _raw_write_lock_irqsave+0xe0/0x100
? prepare_to_wait+0x11f/0x210
bc_svc_process+0x24b/0x3a0 [sunrpc]
? kthread_freezable_should_stop+0xff/0x170
? svc_fill_symlink_pathname+0xe0/0xe0 [sunrpc]
? _raw_spin_lock+0xe0/0xe0
nfs41_callback_svc+0x2c1/0x340 [nfsv4]
? nfs_map_gid_to_group+0x230/0x230 [nfsv4]
? finish_wait+0x1f0/0x1f0
? wait_woken+0x130/0x130
? _raw_write_lock_irqsave+0xe0/0x100
? __cpuidle_text_end+0x8/0x8
? nfs_map_gid_to_group+0x230/0x230 [nfsv4]
kthread+0x1ae/0x1d0
? kthread_park+0xb0/0xb0
ret_from_fork+0x35/0x40
Allocated by task 1923:
kasan_kmalloc+0xbf/0xe0
kmem_cache_alloc_trace+0x125/0x270
svc_bc_tcp_create+0x38/0x80 [sunrpc]
_svc_create_xprt+0x2dd/0x400 [sunrpc]
svc_create_xprt+0x58/0xd0 [sunrpc]
xs_tcp_bc_up+0x22/0x30 [sunrpc]
nfs_callback_up+0x226/0x660 [nfsv4]
nfs4_init_client+0x2e5/0x4b0 [nfsv4]
nfs_get_client+0x7d3/0x860 [nfs]
nfs4_set_client+0x1ef/0x290 [nfsv4]
nfs4_create_server+0x268/0x520 [nfsv4]
nfs4_remote_mount+0x31/0x60 [nfsv4]
mount_fs+0x5c/0x19d
vfs_kern_mount.part.33+0xbc/0x2a0
nfs_do_root_mount+0x7f/0xc0 [nfsv4]
nfs4_try_mount+0x7f/0xd0 [nfsv4]
nfs_fs_mount+0xd10/0x1430 [nfs]
mount_fs+0x5c/0x19d
vfs_kern_mount.part.33+0xbc/0x2a0
do_mount+0x3ab/0x16d0
ksys_mount+0xba/0xd0
__x64_sys_mount+0x62/0x70
do_syscall_64+0x112/0x310
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 1984:
__kasan_slab_free+0x125/0x170
kfree+0x90/0x1e0
svc_xprt_free+0xbc/0xe0 [sunrpc]
svc_delete_xprt+0x44c/0x4d0 [sunrpc]
svc_close_net+0x2de/0x340 [sunrpc]
svc_shutdown_net+0x14/0x50 [sunrpc]
nfs_callback_down_net+0x105/0x140 [nfsv4]
nfs_callback_down+0x4d/0xf0 [nfsv4]
nfs4_free_client+0x123/0x130 [nfsv4]
nfs_put_client.part.6+0x392/0x3d0 [nfs]
nfs41_sequence_release+0xb5/0x100 [nfsv4]
rpc_free_task+0x5d/0xa0 [sunrpc]
__rpc_execute+0x6f0/0x700 [sunrpc]
process_one_work+0x5bd/0x9e0
worker_thread+0x181/0xa90
kthread+0x1ae/0x1d0
ret_from_fork+0x35/0x40

The buggy address belongs to the object at ffff8881d69d4588
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 8 bytes inside of
4096-byte region [ffff8881d69d4588, ffff8881d69d5588)
The buggy address belongs to the page:
page:ffffea00075a7400 count:1 mapcount:0 mapping:ffff8881f600ea40 index:0x0 compound_mapcount: 0
flags: 0x17ffe000010200(slab|head)
raw: 0017ffe000010200 ffffea0007c26e08 ffffea000774a808 ffff8881f600ea40
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8881d69d4480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8881d69d4500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881d69d4580: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8881d69d4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881d69d4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================