Hi,
I spent some time debugging the syzkaller-reported issue in the subject:
https://syzkaller.appspot.com/bug?id=b8febdb3c7c8c1f1b606fb903cee66b21b2fd02f
I traced the UAF back to the fact that cma_listen_on_all() adds
"id_priv->list" to the global variable "listen_any_list", but that
element is never removed in rdma_destroy_id() (as far as I can see,
the call to cma_release_dev() in rdma_destroy_id() should do the
removal, but it does not get executed).
Therefore, if a program allocates a "struct rdma_cm_id" (through
ucma_open + ucma_create_id), then executes cma_listen_on_all(), then
frees the struct and repeats, the second execution of
cma_listen_on_all() makes the kernel update the list pointers of the
freed node, triggering the UAF. I was able to fix the UAF with this ugly
patch:
--- b/drivers/infiniband/core/cma.c 2018-07-07 02:28:03.214589868 +0200
+++ a/drivers/infiniband/core/cma.c 2018-07-07 03:35:44.325301216 +0200
@@ -1678,6 +1678,11 @@ void rdma_destroy_id(struct rdma_cm_id *
 	mutex_lock(&id_priv->handler_mutex);
 	mutex_unlock(&id_priv->handler_mutex);
+	mutex_lock(&lock);
+	if(id_priv->list.next!=0 && id_priv->list.prev!=0)
+		list_del(&id_priv->list);
+	mutex_unlock(&lock);
+
 	if (id_priv->cma_dev) {
 		rdma_restrack_del(&id_priv->res);
 		if (rdma_cap_ib_cm(id_priv->id.device, 1)) {
Note: I only tested this patch against the shortest reproducer for this
issue (not any other use of rdma_cm):
https://syzkaller.appspot.com/text?tag=ReproC&x=1334f10f800000
I had to add that "if" to the patch because, after several iterations
of the reproducer, the added list_del() call hit a NULL dereference:
sometimes the next and prev pointers of the list in question are zero,
for a reason I have not figured out yet (zeroed by what?).
Moreover, I noticed that running the reproducer for a long time exhausts
all the available memory. To spot the memory leaks I recompiled with:
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=10000
The reproducer apparently induces two memory leaks, as reported by kmemleak:
unreferenced object 0xffff880069f49d40 (size 512):
comm "repro", pid 4263, jiffies 4294722196 (age 688.262s)
hex dump (first 32 bytes):
00 b8 13 5a 00 88 ff ff 40 9d f4 69 00 88 ff ff ...Z....@..i....
0a 00 98 a6 00 00 00 00 fe 80 00 00 00 00 00 00 ................
backtrace:
[<0000000075a2f334>] kmem_cache_alloc_trace+0x1b2/0x3d0
[<0000000075fd9fea>] rdma_resolve_ip+0xc0/0x6b0
[<0000000033592b0b>] rdma_resolve_addr+0x490/0x2580
[<00000000d6f2cd9d>] ucma_resolve_ip+0x193/0x260
[<0000000068f1c2b7>] ucma_write+0x2ec/0x3f0
[<00000000015692cc>] __vfs_write+0x107/0x920
[<000000009528b010>] vfs_write+0x189/0x510
[<000000001a5d169b>] ksys_write+0xfa/0x240
[<00000000b747746a>] __x64_sys_write+0x73/0xb0
[<0000000071590ffb>] do_syscall_64+0x18c/0x760
[<000000003c31113f>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0000000059247e9d>] 0xffffffffffffffff
unreferenced object 0xffff88006c0c0bc0 (size 576):
comm "repro", pid 4261, jiffies 4294722191 (age 688.261s)
hex dump (first 32 bytes):
00 02 00 00 00 00 00 00 80 b8 07 6c 00 88 ff ff ...........l....
b0 7d 2c 6b 00 88 ff ff d8 0b 0c 6c 00 88 ff ff .},k.......l....
backtrace:
[<0000000039511ef2>] kmem_cache_alloc+0x1b2/0x3d0
[<00000000106bf668>] radix_tree_node_alloc.constprop.18+0x5e/0x2e0
[<000000005b2f026d>] idr_get_free+0x9f5/0x1000
[<00000000445baa5a>] idr_alloc_u32+0x1bc/0x3d0
[<000000007fd1b6f4>] idr_alloc+0xfd/0x190
[<00000000d706389e>] cma_alloc_port+0xb0/0x170
[<000000008f968f9e>] rdma_bind_addr+0x1252/0x1f00
[<00000000e3361215>] rdma_resolve_addr+0x39e/0x2580
[<00000000d6f2cd9d>] ucma_resolve_ip+0x193/0x260
[<0000000068f1c2b7>] ucma_write+0x2ec/0x3f0
[<00000000015692cc>] __vfs_write+0x107/0x920
[<000000009528b010>] vfs_write+0x189/0x510
[<000000001a5d169b>] ksys_write+0xfa/0x240
[<00000000b747746a>] __x64_sys_write+0x73/0xb0
[<0000000071590ffb>] do_syscall_64+0x18c/0x760
[<000000003c31113f>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
I don't have a background on usage or internals of the driver at issue
but I hope these clues will help in finding the proper fix.
Tomas
On Sat, Jul 07, 2018 at 03:41:30AM +0200, Tomas Bortoli wrote:
> I don't have a background on usage or internals of the driver at issue
> but I hope these clues will help in finding the proper fix.
I think anything is useful, thanks.
The truth is that nobody is left who seems to really understand this
code, and syzkaller has shown it is full of various bugs.
If there is someone out there that would like to tackle it, let me
know. There might be a possibility to support such work.
Jason