Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753334AbbKZJiA (ORCPT ); Thu, 26 Nov 2015 04:38:00 -0500 Received: from victor.provo.novell.com ([137.65.250.26]:34692 "EHLO prv3-mh.provo.novell.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752825AbbKZJh4 (ORCPT ); Thu, 26 Nov 2015 04:37:56 -0500 Subject: Re: [PATCH 3.2 38/52] Btrfs: fix race when listing an inode's xattrs To: Ben Hutchings , Luis Henriques References: <20151125231100.GC21715@charon> <1448498376.27159.42.camel@decadent.org.uk> Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, akpm@linux-foundation.org From: Filipe Manana Message-ID: <5656D338.60303@suse.com> Date: Thu, 26 Nov 2015 09:39:04 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <1448498376.27159.42.camel@decadent.org.uk> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3931 Lines: 100 On 11/26/2015 12:39 AM, Ben Hutchings wrote: > On Wed, 2015-11-25 at 23:11 +0000, Luis Henriques wrote: >> On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote: >>> 3.2.74-rc1 review patch. If anyone has any objections, please let me know. >>> >>> ------------------ >>> >>> From: Filipe Manana >>> >>> commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream. >>> >>> When listing a inode's xattrs we have a time window where we race against >>> a concurrent operation for adding a new hard link for our inode that makes >>> us not return any xattr to user space. In order for this to happen, the >>> first xattr of our inode needs to be at slot 0 of a leaf and the previous >>> leaf must still have room for an inode ref (or extref) item, and this can >>> happen because an inode's listxattrs callback does not lock the inode's >>> i_mutex (nor does the VFS does it for us), but adding a hard link to an >>> inode makes the VFS lock the inode's i_mutex before calling the inode's >>> link callback. >>> >>> If we have the following leafs: >>> >>> Leaf X (has N items) Leaf Y >>> >>> [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 XATTR_ITEM 12345), ... ] >>> slot N - 2 slot N - 1 slot 0 >>> >>> The race illustrated by the following sequence diagram is possible: >>> >>> CPU 1 CPU 2 >>> >>> btrfs_listxattr() >>> >>> searches for key (257 XATTR_ITEM 0) >>> >>> gets path with path->nodes[0] == leaf X >>> and path->slots[0] == N >>> >>> because path->slots[0] is >= >>> btrfs_header_nritems(leaf X), it calls >>> btrfs_next_leaf() >>> >>> btrfs_next_leaf() >>> releases the path >>> >>> adds key (257 INODE_REF 666) >>> to the end of leaf X (slot N), >>> and leaf X now has N + 1 items >>> >>> searches for the key (257 INODE_REF 256), >>> with path->keep_locks == 1, because that >>> is the last key it saw in leaf X before >>> releasing the path >>> >>> ends up at leaf X again and it verifies >>> that the key (257 INODE_REF 256) is no >>> longer the last key in leaf X, so it >>> returns with path->nodes[0] == leaf X >>> and path->slots[0] == N, pointing to >>> the new item with key (257 INODE_REF 666) >>> >>> btrfs_listxattr's loop iteration sees that >>> the type of the key pointed by the path is >>> different from the type BTRFS_XATTR_ITEM_KEY >>> and so it breaks the loop and stops looking >>> for more xattr items >>> --> the application doesn't get any xattr >>> listed for our inode >>> >>> So fix this by breaking the loop only if the key's type is greater than >>> BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller. >>> >>> Signed-off-by: Filipe Manana >>> [bwh: Backported to 3.2: s/found_key\.type/btrfs_key_type(\&found_key)/] >> >> Actually, in my backport to 3.16 I decided to keep the usage of >> 'found_key.type' instead, as the usage of btrfs_key_type() has been >> dropped with commit 962a298f3511 ("btrfs: kill the key type accessor >> helpers"). > [...] > > OK, that makes sense. btrfs in 3.2 is pretty inconsistent about using > btrfs_key_type() anyway. Using the type field directly, instead of the accessor, is perfectly safe (the field is an u8 so no worries about endianness conversions unlike, other field of struct btrfs_key which are u64s). > > Ben. > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/