Received: by 10.192.165.148 with SMTP id m20csp2219560imm; Thu, 3 May 2018 12:25:48 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqCtsjfbdGTr4LDNzYuJ1VQKVVUsFK3lE1iYz3o4C2NnEMNQLDgipcG423yVWz2hIaDwmNk X-Received: by 2002:a63:77ce:: with SMTP id s197-v6mr19711489pgc.272.1525375548465; Thu, 03 May 2018 12:25:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525375548; cv=none; d=google.com; s=arc-20160816; b=Z1HdKLFdohf4bObenKMKYrTeBb3UPnVsUyKnpXd/e8P1sUCZPn7N2h5e+TKPQLChgW daZUOP592CoGkxa48DPGAsHQLI5UF66h2gB6UolcryAPf95cgVoYPW6mulAJGjpK6Zwu wz+BKYOAdrOP/pR8S83r1O9rYjbgL8GRwwnt3EhDgrDKh5imGZ3MTGLY6z2syiT8o8ql 05+DBM8EVfGysLLFY7E1HJvNR9MKt390BG6X+IN9ezk2f0wuHxf6MN2SzxTBLTn0AqWk g2pygVELUtvWDbfwCCiEA4OEFbgYxSXbBa5L7PzNu7EYJYmARse/atoqFfNmrUbpiLZx Zmtw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:arc-authentication-results; bh=Y2n+j7XUumCv4BK2SVOWV8oMKQD0c1Tze8sLwREEIDg=; b=0y/yzPf7MtMDfUdpstbpD5VLLx3HJgHUm0DWJ4g6qJd6m495bvNociqfv5jtFYxq9i J23rTU2q6SsFtAf8qPTPtwbxcc6vxKLSe94J5Xqd1I45iBJ7IFwkSNJPcfdohh9LwgG8 kqXIafkSZgb0qDM8KT1pLNg58Jd1KTKIfjEgxbaBUztjjRatoK2ggV/QiAsN92PpuOsI wdF2BDwebGd5zXXVQFM50iwbZTV9rQcqrlGK9Ul9ngdXVr+3lZxHN7w6INLpSHWdw/ae 7NCNtNtC0MrMDC6obQKt+rNoFBkvHvzT2H/7z0wC4leRKQVV6CCCy5UbHTnNqEOw+2lV Pkiw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e6-v6si11393914pgr.374.2018.05.03.12.25.34; Thu, 03 May 2018 12:25:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751711AbeECTZA (ORCPT + 99 others); Thu, 3 May 2018 15:25:00 -0400 Received: from mga02.intel.com ([134.134.136.20]:8166 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751307AbeECTYh (ORCPT ); Thu, 3 May 2018 15:24:37 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 May 2018 12:24:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,359,1520924400"; d="scan'208";a="225573202" Received: from theros.lm.intel.com ([10.232.112.164]) by fmsmga005.fm.intel.com with ESMTP; 03 May 2018 12:24:35 -0700 From: Ross Zwisler To: Andrew Morton , linux-kernel@vger.kernel.org, Matthew Wilcox Cc: Ross Zwisler , Christoph Hellwig , Dan Williams , Dave Chinner , Jan Kara , linux-nvdimm@lists.01.org Subject: [PATCH 4/5] radix tree test suite: multi-order iteration race Date: Thu, 3 May 2018 13:24:29 -0600 Message-Id: <20180503192430.7582-5-ross.zwisler@linux.intel.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180503192430.7582-1-ross.zwisler@linux.intel.com> References: <20180503192430.7582-1-ross.zwisler@linux.intel.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add a test which shows a race in the multi-order iteration code. This test reliably hits the race in under a second on my machine, and is the result of a real bug report against kernel a production v4.15 based kernel (4.15.6-300.fc27.x86_64). With a real kernel this issue is hit when using order 9 PMD DAX radix tree entries. The race has to do with how we tear down multi-order sibling entries when we are removing an item from the tree. Remember that an order 2 entry looks like this: struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] where 'entry' is in some slot in the struct radix_tree_node, and the three slots following 'entry' contain sibling pointers which point back to 'entry.' When we delete 'entry' from the tree, we call : radix_tree_delete() radix_tree_delete_item() __radix_tree_delete() replace_slot() replace_slot() first removes the siblings in order from the first to the last, then at then replaces 'entry' with NULL. This means that for a brief period of time we end up with one or more of the siblings removed, so: struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] This causes an issue if you have a reader iterating over the slots in the tree via radix_tree_for_each_slot() while only under rcu_read_lock()/rcu_read_unlock() protection. This is a common case in mm/filemap.c. The issue is that when __radix_tree_next_slot() => skip_siblings() tries to skip over the sibling entries in the slots, it currently does so with an exact match on the slot directly preceding our current slot. Normally this works: V preceding slot struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] ^ current slot This lets you find the first sibling, and you skip them all in order. But in the case where one of the siblings is NULL, that slot is skipped and then our sibling detection is interrupted: V preceding slot struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] ^ current slot This means that the sibling pointers aren't recognized since they point all the way back to 'entry', so we think that they are normal internal radix tree pointers. This causes us to think we need to walk down to a struct radix_tree_node starting at the address of 'entry'. In a real running kernel this will crash the thread with a GP fault when you try and dereference the slots in your broken node starting at 'entry'. In the radix tree test suite this will be caught by the address sanitizer: ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0 READ of size 8 at 0x60c0008ae400 thread T3 #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660 #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567 #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655 #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a) #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e) Signed-off-by: Ross Zwisler --- tools/testing/radix-tree/multiorder.c | 63 +++++++++++++++++++++++++++++++++++ tools/testing/radix-tree/test.h | 1 + 2 files changed, 64 insertions(+) diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c index 59245b3d587c..7bf405638b0b 100644 --- a/tools/testing/radix-tree/multiorder.c +++ b/tools/testing/radix-tree/multiorder.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "test.h" @@ -624,6 +625,67 @@ static void multiorder_account(void) item_kill_tree(&tree); } +bool stop_iteration = false; + +static void *creator_func(void *ptr) +{ + /* 'order' is set up to ensure we have sibling entries */ + unsigned int order = RADIX_TREE_MAP_SHIFT - 1; + struct radix_tree_root *tree = ptr; + int i; + + for (i = 0; i < 10000; i++) { + item_insert_order(tree, 0, order); + item_delete_rcu(tree, 0); + } + + stop_iteration = true; + return NULL; +} + +static void *iterator_func(void *ptr) +{ + struct radix_tree_root *tree = ptr; + struct radix_tree_iter iter; + struct item *item; + void **slot; + + while (!stop_iteration) { + rcu_read_lock(); + radix_tree_for_each_slot(slot, tree, &iter, 0) { + item = radix_tree_deref_slot(slot); + + if (!item) + continue; + if (radix_tree_deref_retry(item)) { + slot = radix_tree_iter_retry(&iter); + continue; + } + + item_sanity(item, iter.index); + } + rcu_read_unlock(); + } + return NULL; +} + +static void multiorder_iteration_race(void) +{ + const int num_threads = sysconf(_SC_NPROCESSORS_ONLN); + pthread_t worker_thread[num_threads]; + RADIX_TREE(tree, GFP_KERNEL); + int i; + + pthread_create(&worker_thread[0], NULL, &creator_func, &tree); + for (i = 1; i < num_threads; i++) + pthread_create(&worker_thread[i], NULL, &iterator_func, &tree); + + for (i = 0; i < num_threads; i++) + pthread_join(worker_thread[i], NULL); + + item_kill_tree(&tree); +} + void multiorder_checks(void) { int i; @@ -644,6 +706,7 @@ void multiorder_checks(void) multiorder_join(); multiorder_split(); multiorder_account(); + multiorder_iteration_race(); radix_tree_cpu_dead(0); } diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h index 8cf4a7a7f94c..31f1d9b6f506 100644 --- a/tools/testing/radix-tree/test.h +++ b/tools/testing/radix-tree/test.h @@ -13,6 +13,7 @@ struct item { struct item *item_create(unsigned long index, unsigned int order); int __item_insert(struct radix_tree_root *root, struct item *item); int item_insert(struct radix_tree_root *root, unsigned long index); +void item_sanity(struct item *item, unsigned long index); int item_insert_order(struct radix_tree_root *root, unsigned long index, unsigned order); int item_delete(struct radix_tree_root *root, unsigned long index); -- 2.14.3