Date: Tue, 22 Oct 2019 14:47:00 +0100
From: Luis Henriques
To: "Yan, Zheng"
Cc: Jeff Layton, Sage Weil, Ilya Dryomov, ceph-devel,
	Linux Kernel Mailing List
Subject: Re: [PATCH] ceph: Fix use-after-free in __ceph_remove_cap
Message-ID: <20191022134700.GA23308@hermes.olymp>
References: <20191017144636.28617-1-lhenriques@suse.com>
	<87a79unocw.fsf@suse.com>

On Tue, Oct 22, 2019 at 08:48:56PM +0800, Yan, Zheng wrote:
> On Mon, Oct 21, 2019 at 10:55 PM Luis Henriques wrote:
> >
> > Jeff Layton writes:
> >
> > > On Thu, 2019-10-17 at 15:46 +0100, Luis Henriques wrote:
> > >> KASAN reports a use-after-free when running xfstest generic/531,
> > >> with the following trace:
> > >>
> > >> [ 293.903362]  kasan_report+0xe/0x20
> > >> [ 293.903365]  rb_erase+0x1f/0x790
> > >> [ 293.903370]  __ceph_remove_cap+0x201/0x370
> > >> [ 293.903375]  __ceph_remove_caps+0x4b/0x70
> > >> [ 293.903380]  ceph_evict_inode+0x4e/0x360
> > >> [ 293.903386]  evict+0x169/0x290
> > >> [ 293.903390]  __dentry_kill+0x16f/0x250
> > >> [ 293.903394]  dput+0x1c6/0x440
> > >> [ 293.903398]  __fput+0x184/0x330
> > >> [ 293.903404]  task_work_run+0xb9/0xe0
> > >> [ 293.903410]  exit_to_usermode_loop+0xd3/0xe0
> > >> [ 293.903413]  do_syscall_64+0x1a0/0x1c0
> > >> [ 293.903417]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >>
> > >> This happens because __ceph_remove_cap() may queue a cap release
> > >> (__ceph_queue_cap_release) which can be scheduled before that cap is
> > >> removed from the inode list with
> > >>
> > >>   rb_erase(&cap->ci_node, &ci->i_caps);
> > >>
> > >> And, when this finally happens, the use-after-free will occur.
> > >>
> > >> This can be fixed by protecting the rb_erase with the s_cap_lock
> > >> spinlock, which is used by ceph_send_cap_releases(), before the cap
> > >> is freed.
> > >>
> > >> Signed-off-by: Luis Henriques
> > >> ---
> > >>  fs/ceph/caps.c | 4 ++--
> > >>  1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > >> index d3b9c9d5c1bd..21ee38cabe98 100644
> > >> --- a/fs/ceph/caps.c
> > >> +++ b/fs/ceph/caps.c
> > >> @@ -1089,13 +1089,13 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> > >>  	}
> > >>  	cap->cap_ino = ci->i_vino.ino;
> > >>
> > >> -	spin_unlock(&session->s_cap_lock);
> > >> -
> > >>  	/* remove from inode list */
> > >>  	rb_erase(&cap->ci_node, &ci->i_caps);
> > >>  	if (ci->i_auth_cap == cap)
> > >>  		ci->i_auth_cap = NULL;
> > >>
> > >> +	spin_unlock(&session->s_cap_lock);
> > >> +
> > >>  	if (removed)
> > >>  		ceph_put_cap(mdsc, cap);
> > >
> > > Is there any reason we need to wait until this point to remove it from
> > > the rbtree? ISTM that we ought to just do that at the beginning of the
> > > function, before we take the s_cap_lock.
> >
> > That sounds good to me, at least at first glance.  I spent some time
> > looking for any possible issues in the code, and even ran a few tests.
> >
> > However, looking at the git log I found commit f818a73674c5 ("ceph: fix
> > cap removal races"), which moved that rb_erase from the beginning of
> > the function to its current position.  So, unless the race mentioned in
> > this commit has disappeared in the meantime (which is possible, this
> > commit is from 2010!), this rbtree operation shouldn't be changed.
> >
> > And I now wonder if my patch isn't introducing a race too...
> > __ceph_remove_cap() is supposed to always be called with the session
> > mutex held, except for the ceph_evict_inode() path, which is where I'm
> > seeing the UAF.  So, maybe what's missing here is the s_mutex.  Hmm...
> >
>
> we can't lock s_mutex here, because i_ceph_lock is locked

Well, my idea wasn't to take s_mutex here but earlier in the stack, maybe
in ceph_evict_inode(), protecting the call to __ceph_remove_caps().  But I
haven't really looked into that yet, so I'm not sure it's feasible (or
even whether it would actually fix this UAF).  I suspect it's not possible
anyway, due to the comment above __ceph_remove_cap(): the caller will not
hold the session s_mutex if called from destroy_inode.
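
Just to make that idea concrete, here's a completely untested sketch.
ceph_evict_remove_caps() is a made-up helper name, and ceph_evict_inode()
would call it instead of calling __ceph_remove_caps() directly.  The way
it finds the session is hand-waved, and is probably exactly where this
falls apart:

	static void ceph_evict_remove_caps(struct ceph_inode_info *ci)
	{
		struct ceph_mds_session *session = NULL;

		/*
		 * Racy: peeks at i_auth_cap without i_ceph_lock held, and
		 * takes no reference on the session.  We can't simply take
		 * i_ceph_lock first, because the locking order is
		 * s_mutex -> i_ceph_lock.
		 */
		if (ci->i_auth_cap)
			session = ci->i_auth_cap->session;

		if (session)
			mutex_lock(&session->s_mutex);
		__ceph_remove_caps(ci);
		if (session)
			mutex_unlock(&session->s_mutex);
	}

So, unless there's a safe way to pin the auth cap's session at that point,
moving the locking up the stack doesn't look like an option either.
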
Cheers,
--
Luís