From: Jann Horn
Date: Wed, 24 Mar 2021 04:10:27 +0100
Subject: Re: [PATCH v30 07/12] landlock: Support filesystem access-control
To: Mickaël Salaün
Cc: James Morris, "Serge E. Hallyn", Al Viro, Andrew Morton,
 Andy Lutomirski, Anton Ivanov, Arnd Bergmann, Casey Schaufler,
 David Howells, Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
 Richard Weinberger, Shuah Khan, Vincent Dagonneau, Kernel Hardening,
 Linux API, linux-arch, "open list:DOCUMENTATION", linux-fsdevel,
 kernel list, "open list:KERNEL SELFTEST FRAMEWORK",
 linux-security-module, "the arch/x86 maintainers", Mickaël Salaün
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 23, 2021 at 8:22 PM Mickaël Salaün wrote:
> On 23/03/2021 18:49, Jann Horn wrote:
> > On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün wrote:
> >> On 23/03/2021 01:13, Jann Horn wrote:
> >>> On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün wrote:
> >>>> Using Landlock objects and ruleset, it is possible to tag inodes
> >>>> according to a process's domain.
> >>> [...]
> >>>> +static void release_inode(struct landlock_object *const object)
> >>>> +	__releases(object->lock)
> >>>> +{
> >>>> +	struct inode *const inode = object->underobj;
> >>>> +	struct super_block *sb;
> >>>> +
> >>>> +	if (!inode) {
> >>>> +		spin_unlock(&object->lock);
> >>>> +		return;
> >>>> +	}
> >>>> +
> >>>> +	/*
> >>>> +	 * Protects against concurrent use by hook_sb_delete() of the reference
> >>>> +	 * to the underlying inode.
> >>>> +	 */
> >>>> +	object->underobj = NULL;
> >>>> +	/*
> >>>> +	 * Makes sure that if the filesystem is concurrently unmounted,
> >>>> +	 * hook_sb_delete() will wait for us to finish iput().
> >>>> +	 */
> >>>> +	sb = inode->i_sb;
> >>>> +	atomic_long_inc(&landlock_superblock(sb)->inode_refs);
> >>>> +	spin_unlock(&object->lock);
> >>>> +	/*
> >>>> +	 * Because object->underobj was not NULL, hook_sb_delete() and
> >>>> +	 * get_inode_object() guarantee that it is safe to reset
> >>>> +	 * landlock_inode(inode)->object while it is not NULL.  It is therefore
> >>>> +	 * not necessary to lock inode->i_lock.
> >>>> +	 */
> >>>> +	rcu_assign_pointer(landlock_inode(inode)->object, NULL);
> >>>> +	/*
> >>>> +	 * Now, new rules can safely be tied to @inode with get_inode_object().
> >>>> +	 */
> >>>> +
> >>>> +	iput(inode);
> >>>> +	if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
> >>>> +		wake_up_var(&landlock_superblock(sb)->inode_refs);
> >>>> +}
> >>> [...]
> >>>> +static struct landlock_object *get_inode_object(struct inode *const inode)
> >>>> +{
> >>>> +	struct landlock_object *object, *new_object;
> >>>> +	struct landlock_inode_security *inode_sec = landlock_inode(inode);
> >>>> +
> >>>> +	rcu_read_lock();
> >>>> +retry:
> >>>> +	object = rcu_dereference(inode_sec->object);
> >>>> +	if (object) {
> >>>> +		if (likely(refcount_inc_not_zero(&object->usage))) {
> >>>> +			rcu_read_unlock();
> >>>> +			return object;
> >>>> +		}
> >>>> +		/*
> >>>> +		 * We are racing with release_inode(), the object is going
> >>>> +		 * away.  Wait for release_inode(), then retry.
> >>>> +		 */
> >>>> +		spin_lock(&object->lock);
> >>>> +		spin_unlock(&object->lock);
> >>>> +		goto retry;
> >>>> +	}
> >>>> +	rcu_read_unlock();
> >>>> +
> >>>> +	/*
> >>>> +	 * If there is no object tied to @inode, then create a new one (without
> >>>> +	 * holding any locks).
> >>>> +	 */
> >>>> +	new_object = landlock_create_object(&landlock_fs_underops, inode);
> >>>> +	if (IS_ERR(new_object))
> >>>> +		return new_object;
> >>>> +
> >>>> +	/* Protects against concurrent get_inode_object() calls. */
> >>>> +	spin_lock(&inode->i_lock);
> >>>> +	object = rcu_dereference_protected(inode_sec->object,
> >>>> +			lockdep_is_held(&inode->i_lock));
> >>>
> >>> rcu_dereference_protected() requires that inode_sec->object is not
> >>> concurrently changed, but I think another thread could call
> >>> get_inode_object() while we're in landlock_create_object(), and then
> >>> we could race with the NULL write in release_inode() here? (It
> >>> wouldn't actually be a UAF though, because we're not actually accessing
> >>> `object` here.) Or am I missing a lock that prevents this?
> >>>
> >>> In v28 this wasn't an issue because release_inode() was holding
> >>> inode->i_lock (and object->lock) during the NULL store; but in v29 and
> >>> this version the NULL store in release_inode() moved out of the locked
> >>> region. I think you could just move the NULL store in release_inode()
> >>> back up (and maybe add a comment explaining the locking rules for
> >>> landlock_inode(...)->object)?
> >>>
> >>> (Or alternatively you could use rcu_dereference_raw() with a comment
> >>> explaining that the read pointer is only used to check for NULL-ness,
> >>> and that it is guaranteed that the pointer can't change if it is NULL
> >>> and we're holding the lock. But that'd be needlessly complicated, I
> >>> think.)
> >>
> >> To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in
> >> release_inode() or in hook_sb_delete(),
> >> landlock_inode(inode)->object needs to be non-NULL,
> >
> > Yes.
> >
> >> which implies that a call to get_inode_object(inode) either retries
> >> (because release_inode() is only called by landlock_put_object(),
> >> which sets object->usage to 0) until it creates a new object, or
> >> reuses the existing referenced object (and increments object->usage).
> >
> > But it can be that landlock_inode(inode)->object only becomes non-NULL
> > after get_inode_object() has checked
> > rcu_dereference(inode_sec->object).
> >
> >> The worst case would be if get_inode_object(inode) is called just
> >> before the rcu_assign_pointer(landlock_inode(inode)->object, NULL)
> >> from hook_sb_delete(), which would result in an object with a NULL
> >> underobj; that is the expected behavior (and checked by
> >> release_inode()).
> >
> > The scenario I'm talking about doesn't involve hook_sb_delete().
> >
> >> The line rcu_assign_pointer(inode_sec->object, new_object) from
> >> get_inode_object() can only be reached if the underlying inode doesn't
> >> reference an object,
> >
> > Yes.
> >
> >> in which case hook_sb_delete() will not reach the
> >> rcu_assign_pointer(landlock_inode(inode)->object, NULL) line for this
> >> same inode.
> >>
> >> This works because get_inode_object(inode) is mutually exclusive with
> >> itself for the same inode (i.e. an inode can only point to an object
> >> that references this same inode).
> >
> > To clarify: You can concurrently call get_inode_object() multiple
> > times on the same inode, right? There are no locks held on entry to
> > that function.
> >
> >> I tried to explain this with the comment "Protects against concurrent
> >> get_inode_object() calls" in get_inode_object(), and the comments just
> >> before both rcu_assign_pointer(landlock_inode(inode)->object, NULL)
> >> calls.
> >
> > The scenario I'm talking about is:
> >
> > Initially the inode does not have an associated landlock_object. There
> > are two threads, A and B. Thread A is going to execute
> > get_inode_object(). Thread B is going to execute get_inode_object()
> > followed immediately by landlock_put_object().
> >
> > thread A: enters get_inode_object()
> > thread A: rcu_dereference(inode_sec->object) returns NULL
> > thread A: enters landlock_create_object()
> > thread B: enters get_inode_object()
> > thread B: rcu_dereference(inode_sec->object) returns NULL
> > thread B: calls landlock_create_object()
> > thread B: sets inode_sec->object while holding inode->i_lock
> > thread B: leaves get_inode_object()
> > thread B: enters landlock_put_object()
> > thread B: object->usage drops to 0, object->lock is taken
> > thread B: calls release_inode()
> > thread B: drops object->lock
> > thread A: returns from landlock_create_object()
> > thread A: takes inode->i_lock
> >
> > At this point, thread B will run:
> >
> >   rcu_assign_pointer(landlock_inode(inode)->object, NULL);
> >
> > while thread A runs:
> >
> >   rcu_dereference_protected(inode_sec->object,
> >           lockdep_is_held(&inode->i_lock));
> >
> > meaning there is a (theoretical) data race, since
> > rcu_dereference_protected() doesn't use READ_ONCE().
>
> Hum, I see, that is what I was missing. And that explains why there is
> (in practice) no impact from winning the race.
>
> I would prefer to use rcu_access_pointer() instead of
> rcu_dereference_protected() to avoid pitfalls, and it reflects what I
> was expecting:
>
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -117,9 +117,7 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
>
>  	/* Protects against concurrent get_inode_object() calls. */
>  	spin_lock(&inode->i_lock);
> -	object = rcu_dereference_protected(inode_sec->object,
> -			lockdep_is_held(&inode->i_lock));
> -	if (unlikely(object)) {
> +	if (unlikely(rcu_access_pointer(inode_sec->object))) {
>  		/* Someone else just created the object, bail out and retry. */
>  		spin_unlock(&inode->i_lock);
>  		kfree(new_object);

Ah, yeah, that should work. I had forgotten about rcu_access_pointer().

> But I'm not sure about your proposition to move the NULL store in
> release_inode() back up. Do you mean to add back the inode lock in
> release_inode() like this?
>
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -59,16 +59,12 @@ static void release_inode(struct landlock_object *const object)
>  	 * Makes sure that if the filesystem is concurrently unmounted,
>  	 * hook_sb_delete() will wait for us to finish iput().
>  	 */
> +	spin_lock(&inode->i_lock);
>  	sb = inode->i_sb;
>  	atomic_long_inc(&landlock_superblock(sb)->inode_refs);
>  	spin_unlock(&object->lock);
> -	/*
> -	 * Because object->underobj was not NULL, hook_sb_delete() and
> -	 * get_inode_object() guarantee that it is safe to reset
> -	 * landlock_inode(inode)->object while it is not NULL.  It is therefore
> -	 * not necessary to lock inode->i_lock.
> -	 */
>  	rcu_assign_pointer(landlock_inode(inode)->object, NULL);
> +	spin_unlock(&inode->i_lock);
>  	/*
>  	 * Now, new rules can safely be tied to @inode with get_inode_object().
>  	 */
>
> I would prefer to avoid nested locks if it is not necessary, though.

Hm, yeah, you have a point there. Doing it locklessly does make the
locking rules a little complicated though, and you'll have to update
the comment inside struct landlock_inode_security.
At the moment, it says:

 * @object: Weak pointer to an allocated object.  All writes (i.e.
 *          creating a new object or removing one) are protected by the
 *          underlying inode->i_lock.  Disassociating @object from the
 *          inode is additionally protected by @object->lock, from the
 *          time @object's usage refcount drops to zero to the time this
 *          pointer is nulled out.

which isn't true anymore.
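
As background for the READ_ONCE() point above: the two primitives
differ precisely on the aspect Jann raises. Below is a simplified
sketch of their behavior, with the sparse annotations and lockdep
plumbing of the real include/linux/rcupdate.h definitions elided; the
sketch_ names are placeholders, not the kernel's macros:

	/*
	 * rcu_dereference_protected() does a plain load and merely asserts
	 * the caller's claim that no updater can run concurrently, so a
	 * concurrent rcu_assign_pointer() makes this load a data race.
	 */
	#define sketch_rcu_dereference_protected(p, c) \
		({ RCU_LOCKDEP_WARN(!(c), "suspicious access"); (p); })

	/*
	 * rcu_access_pointer() wraps the load in READ_ONCE(), so it may
	 * safely race with rcu_assign_pointer() -- but its result must
	 * only be tested (e.g. against NULL), never dereferenced.
	 */
	#define sketch_rcu_access_pointer(p)  READ_ONCE(p)

This is why swapping in rcu_access_pointer() resolves the (theoretical)
race without taking any extra lock.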
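To see the lockless lookup-or-create scheme being discussed in
isolation, here is a minimal, self-contained sketch under hypothetical
names (struct holder, struct obj, create_and_publish() -- an
illustration of the pattern, not the actual Landlock code), with the
publish-side check done via rcu_access_pointer() as in the first diff
above:

	#include <linux/err.h>
	#include <linux/rcupdate.h>
	#include <linux/refcount.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct obj {
		refcount_t usage;
	};

	struct holder {
		spinlock_t lock;	/* serializes writers of @obj */
		struct obj __rcu *obj;	/* lazily created, RCU-published */
	};

	static struct obj *create_and_publish(struct holder *h)
	{
		struct obj *new_object = kzalloc(sizeof(*new_object), GFP_KERNEL);

		if (!new_object)
			return ERR_PTR(-ENOMEM);
		refcount_set(&new_object->usage, 1);

		spin_lock(&h->lock);
		/*
		 * Only creators hold h->lock; a releaser may store NULL to
		 * h->obj concurrently.  rcu_access_pointer() merely checks
		 * whether an object is already published and never
		 * dereferences the result, so that concurrent store is fine.
		 */
		if (unlikely(rcu_access_pointer(h->obj))) {
			/* Someone else published one first; drop ours. */
			spin_unlock(&h->lock);
			kfree(new_object);
			return ERR_PTR(-EAGAIN);	/* caller retries */
		}
		rcu_assign_pointer(h->obj, new_object);
		spin_unlock(&h->lock);
		return new_object;
	}

In the real get_inode_object() the retry is an internal goto back to
the RCU fast-path lookup rather than an -EAGAIN return; the sketch only
isolates the publish-side check that the proposed diff changes.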