Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp797664imm; Mon, 21 May 2018 14:40:02 -0700 (PDT) X-Google-Smtp-Source: AB8JxZo/2q9UmzBSQxYfQUB/cNut4NPiFlcY7dOOpJsPUO4CGgGcEQq/lZYbIXBy2wQPzFS7beEq X-Received: by 2002:a62:9099:: with SMTP id q25-v6mr21866069pfk.66.1526938802826; Mon, 21 May 2018 14:40:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526938802; cv=none; d=google.com; s=arc-20160816; b=UzycmuVADyiMEgBnkHdaNqxKVslnC4TyxVPzN0YT+pO3niMvbWC/0+t82sIppa9fez E2VTW3rbuPTRSF6HPrNq5X4KxL4B1FGTIXHkLBEMDvragbmkUPxkqHcAzUPDbRZY+7Qj ABhVhMMsWtVbadC0daQG4uCRsMi1G2z8mmCIxBXEQexJs8zX9u9RsTfjGJsx2EBWGD5Y BRyl5chtmvaEMQ5YbaFjXour3/B9gPbZsNrU9hRQNyWp/eX5ucXxvIkcmMyFhfy/AXsj YAyje0DkXmW3EtPEVxzo/dZsnmEfY2F7bpoLFibQgHXNOhbbDrHtCSPhIVxIesf2T4Eb cteQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from:dkim-signature :arc-authentication-results; bh=Ed7hAdiKdjMACtzoaF1VKyJjKkel3cY4wCL9NOy+RKQ=; b=vyUmdFp1edoTwsV7V694V01FWCX4FQgIJ5crUv9wOJH2qbK/YbckDdMklIDGkCzC+O yx1zpANQLzTW9v1lj6vfO1RYJNX3mJITpSuo9+K6kOBrcjPzpQ5KRW34B/+SvVfTszqM 6nT5Pf6Tq8vCArhonZ4tPOYa4avYlEyxgcTsAcMP4At2XFjBlouXtxWvZL0za8lOEal3 bWhn8lBgPqX7JeIIbucLlgfxccyFqo3DLU5Ysmxmac+iVxMAbWy/AgdZA9x4FFYhjDPN NIiU4uNPjiEGeg9JBr3uXJwQSuClcBn03epMx1KH4EUqxdnl05Tt0pXfiwhIHIQgJiub QuCQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=aErsSBXh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 33-v6si15216490plf.308.2018.05.21.14.39.48; Mon, 21 May 2018 14:40:02 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=aErsSBXh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754612AbeEUVie (ORCPT + 99 others); Mon, 21 May 2018 17:38:34 -0400 Received: from mail.kernel.org ([198.145.29.99]:39988 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751188AbeEUVZH (ORCPT ); Mon, 21 May 2018 17:25:07 -0400 Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 844A420853; Mon, 21 May 2018 21:25:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1526937907; bh=tmazs3FBfHymNt20yEOFA8mbq1HIvJ/UvTFL54dZ3FU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aErsSBXhHFI6/TbwnciBUGa824qvX2MCqDu77N597RBHpxqvkqA7/U8rdi0qJ2aXo s3uZUFNiNzKI1kPBmV3nLOaMy4Io8cUgsde4OWmaEQUHIk+44rU+jlHcSq74omNTHP EMvAKzKLI0QgBJw4Y+Xc9SgmCfA1meBpiXHuRzvA= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Ross Zwisler , "CR, Sapthagirish" , Jan Kara , Matthew Wilcox , Christoph Hellwig , Dan Williams , Dave Chinner , Andrew Morton , Linus Torvalds Subject: [PATCH 4.16 037/110] radix tree: fix multi-order iteration race Date: Mon, 21 May 2018 23:11:34 +0200 Message-Id: <20180521210507.009374384@linuxfoundation.org> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180521210503.823249477@linuxfoundation.org> References: <20180521210503.823249477@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.16-stable review patch. If anyone has any objections, please let me know. ------------------ From: Ross Zwisler commit 9f418224e8114156d995b98fa4e0f4fd21f685fe upstream. Fix a race in the multi-order iteration code which causes the kernel to hit a GP fault. This was first seen with a production v4.15 based kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD DAX entries. The race has to do with how we tear down multi-order sibling entries when we are removing an item from the tree. Remember for example that an order 2 entry looks like this: struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] where 'entry' is in some slot in the struct radix_tree_node, and the three slots following 'entry' contain sibling pointers which point back to 'entry.' When we delete 'entry' from the tree, we call : radix_tree_delete() radix_tree_delete_item() __radix_tree_delete() replace_slot() replace_slot() first removes the siblings in order from the first to the last, then at then replaces 'entry' with NULL. This means that for a brief period of time we end up with one or more of the siblings removed, so: struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] This causes an issue if you have a reader iterating over the slots in the tree via radix_tree_for_each_slot() while only under rcu_read_lock()/rcu_read_unlock() protection. This is a common case in mm/filemap.c. The issue is that when __radix_tree_next_slot() => skip_siblings() tries to skip over the sibling entries in the slots, it currently does so with an exact match on the slot directly preceding our current slot. Normally this works: V preceding slot struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling] ^ current slot This lets you find the first sibling, and you skip them all in order. But in the case where one of the siblings is NULL, that slot is skipped and then our sibling detection is interrupted: V preceding slot struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling] ^ current slot This means that the sibling pointers aren't recognized since they point all the way back to 'entry', so we think that they are normal internal radix tree pointers. This causes us to think we need to walk down to a struct radix_tree_node starting at the address of 'entry'. In a real running kernel this will crash the thread with a GP fault when you try and dereference the slots in your broken node starting at 'entry'. We fix this race by fixing the way that skip_siblings() detects sibling nodes. Instead of testing against the preceding slot we instead look for siblings via is_sibling_entry() which compares against the position of the struct radix_tree_node.slots[] array. This ensures that sibling entries are properly identified, even if they are no longer contiguous with the 'entry' they point to. Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators") Signed-off-by: Ross Zwisler Reported-by: CR, Sapthagirish Reviewed-by: Jan Kara Cc: Matthew Wilcox Cc: Christoph Hellwig Cc: Dan Williams Cc: Dave Chinner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- lib/radix-tree.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1612,11 +1612,9 @@ static void set_iter_tags(struct radix_t static void __rcu **skip_siblings(struct radix_tree_node **nodep, void __rcu **slot, struct radix_tree_iter *iter) { - void *sib = node_to_entry(slot - 1); - while (iter->index < iter->next_index) { *nodep = rcu_dereference_raw(*slot); - if (*nodep && *nodep != sib) + if (*nodep && !is_sibling_entry(iter->node, *nodep)) return slot; slot++; iter->index = __radix_tree_iter_add(iter, 1); @@ -1631,7 +1629,7 @@ void __rcu **__radix_tree_next_slot(void struct radix_tree_iter *iter, unsigned flags) { unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK; - struct radix_tree_node *node = rcu_dereference_raw(*slot); + struct radix_tree_node *node; slot = skip_siblings(&node, slot, iter);