Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754028AbdDQQB1 (ORCPT ); Mon, 17 Apr 2017 12:01:27 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:35258 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753434AbdDQQBX (ORCPT ); Mon, 17 Apr 2017 12:01:23 -0400 Date: Mon, 17 Apr 2017 09:01:21 -0700 From: "Paul E. McKenney" To: Nicolai Stange Cc: Greg Kroah-Hartman , Johannes Berg , linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances Reply-To: paulmck@linux.vnet.ibm.com References: <871stdyg0u.fsf@gmail.com> <20170416095137.2784-1-nicstange@gmail.com> <20170416095137.2784-10-nicstange@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170416095137.2784-10-nicstange@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17041716-0024-0000-0000-0000024A8F84 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006935; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000208; SDB=6.00848715; UDB=6.00419007; IPR=6.00627355; BA=6.00005292; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00015070; XFM=3.00000013; UTC=2017-04-17 16:01:22 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17041716-0025-0000-0000-0000436F3E9A Message-Id: <20170417160121.GP3956@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-04-17_14:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=2 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1702020001 definitions=main-1704170143 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9718 Lines: 268 On Sun, Apr 16, 2017 at 11:51:37AM +0200, Nicolai Stange wrote: > Currently, a dentry's debugfs_fsdata instance is allocated from > debugfs_file_get() at first usage, i.e. at first file opening. > > It won't ever get freed though. > > Ideally, these instances would get freed after the last open file handle > gets closed again, that is from the fops' ->release(). > > Unfortunately, we can't hook easily into those ->release()ers of files > created through debugfs_create_file_unsafe(), i.e. of those proxied through > debugfs_open_proxy_file_operations rather than through the "full" > debugfs_full_proxy_file_operations proxy. > > Hence, free unreferenced debugfs_fsdata instances from debugfs_file_put(), > with the drawback of a potential allocation + deallocation per > debugfs_file_get() + debugfs_file_put() pair, that is per fops invocation. > > In addition to its former role of tracking pending fops, use the > ->active_users for reference counting on the debugfs_fsdata instance > itself. In particular, don't keep a dummy reference to be dropped from > __debugfs_remove_file(): a d_delete()ed dentry and thus, request for > completion notification, is now signaled by the d_unlinked() dentry itself. > > Once ->active_users drops to zero (and the dentry is still intact), free > the debugfs_fsdata instance from debugfs_file_put(). RCU protects any > concurrent debugfs_file_get() attempts to get a hold of the instance here. > Likewise for full_proxy_release() which lacks a call to debugfs_file_get(). > > Note that due to non-atomic updates to the d_unlinked() + ->d_fsdata pair, > care must be taken in order to avoid races between debugfs_file_put() and > debugfs_file_get() as well as __debugfs_remove_file(). Rather than > introducing a global lock, exploit the fact that there will ever be only a > single !d_unlinked() -> d_unlinked() transition and add memory barriers > where needed. Given the lack of proper benchmarking, that debugfs fops > aren't performance critical and that we've already got a potential > allocation/deallocation pair anyway, the added code complexity might be > highly questionable though. > > Signed-off-by: Nicolai Stange If you have not already done so, please run this with debug enabled, especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y). This is important because there are configurations for which the deadlocks you saw with SRCU turn into silent failure, including memory corruption. CONFIG_PROVE_RCU=y will catch many of those situations. (And yes, kfree_rcu() doesn't have that problem, but...) Another issue called out inline. Thanx, Paul > --- > fs/debugfs/file.c | 102 ++++++++++++++++++++++++++++++++++++++++---------- > fs/debugfs/inode.c | 8 +++- > fs/debugfs/internal.h | 1 + > 3 files changed, 90 insertions(+), 21 deletions(-) > > diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c > index f4dfd7d0d625..b2cc25d44a39 100644 > --- a/fs/debugfs/file.c > +++ b/fs/debugfs/file.c > @@ -22,6 +22,7 @@ > #include > #include > #include > +#include > #include > > #include "internal.h" > @@ -78,10 +79,39 @@ int debugfs_file_get(struct dentry *dentry) > struct debugfs_fsdata *fsd; > void *d_fsd; > > - d_fsd = READ_ONCE(dentry->d_fsdata); > + rcu_read_lock(); > +retry: > + d_fsd = rcu_dereference(dentry->d_fsdata); > if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) { > + /* > + * Paired with the control dependency in > + * debugfs_file_put(): if we saw the debugfs_fsdata > + * instance "restored" there but not the dead dentry, > + * we'd erroneously instantiate a fresh debugfs_fsdata > + * instance below. > + */ > + smp_rmb(); > + if (d_unlinked(dentry)) { > + rcu_read_unlock(); > + return -EIO; > + } > + > fsd = d_fsd; > + if (!refcount_inc_not_zero(&fsd->active_users)) { > + /* > + * A concurrent debugfs_file_put() dropped the > + * count to zero and is about to free the > + * debugfs_fsdata. Help out resetting the > + * ->d_fsdata and retry. > + */ > + d_fsd = (void *)((unsigned long)fsd->real_fops | > + DEBUGFS_FSDATA_IS_REAL_FOPS_BIT); > + RCU_INIT_POINTER(dentry->d_fsdata, d_fsd); This is an infrequent race, I hope? If on the other hand there is a possibility of this branch being taken a huge number of times in one call, it would be good to exit the RCU read-side critical section before retrying. > + goto retry; > + } > + rcu_read_unlock(); > } else { > + rcu_read_unlock(); > fsd = kmalloc(sizeof(*fsd), GFP_KERNEL); > if (!fsd) > return -ENOMEM; > @@ -91,25 +121,28 @@ int debugfs_file_get(struct dentry *dentry) > refcount_set(&fsd->active_users, 1); > init_completion(&fsd->active_users_drained); > if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) { > + /* > + * Another debugfs_file_get() has installed a > + * debugfs_fsdata instance concurrently. > + * Free ours and retry to grab a reference on > + * the installed one. > + */ > kfree(fsd); > - fsd = READ_ONCE(dentry->d_fsdata); > + rcu_read_lock(); > + goto retry; And given this code path, why not put the retry: label before the initial rcu_read_lock()? Same number of lines of code, rcu_read_lock() and rcu_read_unlock() are very lightweight, the extra executions should be rare, and you might be avoiding a grace-period-starvation problem. > + } > + /* > + * In case of a successful cmpxchg() above, this check is > + * strictly necessary and must follow it, see the comment in > + * __debugfs_remove_file(). > + */ > + if (d_unlinked(dentry)) { > + if (refcount_dec_and_test(&fsd->active_users)) > + complete(&fsd->active_users_drained); > + return -EIO; > } > } > > - /* > - * In case of a successful cmpxchg() above, this check is > - * strictly necessary and must follow it, see the comment in > - * __debugfs_remove_file(). > - * OTOH, if the cmpxchg() hasn't been executed or wasn't > - * successful, this serves the purpose of not starving > - * removers. > - */ > - if (d_unlinked(dentry)) > - return -EIO; > - > - if (!refcount_inc_not_zero(&fsd->active_users)) > - return -EIO; > - > return 0; > } > EXPORT_SYMBOL_GPL(debugfs_file_get); > @@ -126,9 +159,29 @@ EXPORT_SYMBOL_GPL(debugfs_file_get); > void debugfs_file_put(struct dentry *dentry) > { > struct debugfs_fsdata *fsd = READ_ONCE(dentry->d_fsdata); > + void *d_fsd; > > - if (refcount_dec_and_test(&fsd->active_users)) > - complete(&fsd->active_users_drained); > + if (refcount_dec_and_test(&fsd->active_users)) { > + d_fsd = (void *)((unsigned long)fsd->real_fops | > + DEBUGFS_FSDATA_IS_REAL_FOPS_BIT); > + RCU_INIT_POINTER(dentry->d_fsdata, d_fsd); > + /* Paired with smp_mb() in __debugfs_remove_file(). */ > + smp_mb(); > + if (d_unlinked(dentry)) { > + /* > + * We have a control dependency paired with the > + * smp_rmb() in debugfs_file_get() here. > + * > + * Restore the debugfs_fsdata instance into > + * ->d_fsdata s.t. ->d_release() can free > + * it. > + */ > + WRITE_ONCE(dentry->d_fsdata, fsd); > + complete(&fsd->active_users_drained); > + } else { > + kfree_rcu(fsd, rcu_head); > + } > + } > } > EXPORT_SYMBOL_GPL(debugfs_file_put); > > @@ -221,9 +274,20 @@ static unsigned int full_proxy_poll(struct file *filp, > static int full_proxy_release(struct inode *inode, struct file *filp) > { > const struct dentry *dentry = F_DENTRY(filp); > - const struct file_operations *real_fops = debugfs_real_fops(filp); > const struct file_operations *proxy_fops = filp->f_op; > int r = 0; > + void *d_fsd; > + const struct file_operations *real_fops; > + > + rcu_read_lock(); > + d_fsd = rcu_dereference(F_DENTRY(filp)->d_fsdata); > + if ((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) { > + real_fops = (void *)((unsigned long)d_fsd & > + ~DEBUGFS_FSDATA_IS_REAL_FOPS_BIT); > + } else { > + real_fops = ((struct debugfs_fsdata *)d_fsd)->real_fops; > + } > + rcu_read_unlock(); > > /* > * We must not protect this against removal races here: the > diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c > index 2360c17ec00a..bacb4d6bf178 100644 > --- a/fs/debugfs/inode.c > +++ b/fs/debugfs/inode.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #include "internal.h" > > @@ -636,13 +637,16 @@ static void __debugfs_remove_file(struct dentry *dentry, struct dentry *parent) > * cmpxchg() in debugfs_file_get(): either > * debugfs_file_get() must see a dead dentry or we must see a > * debugfs_fsdata instance at ->d_fsdata here (or both). > + * > + * Also paired with the smp_mb() in debugfs_file_put(): if we > + * see a debugfs_fsdata instance here, then debugfs_file_put() > + * must see a dead dentry. > */ > smp_mb(); > fsd = READ_ONCE(dentry->d_fsdata); > if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) > return; > - if (!refcount_dec_and_test(&fsd->active_users)) > - wait_for_completion(&fsd->active_users_drained); > + wait_for_completion(&fsd->active_users_drained); > } > > static int __debugfs_remove(struct dentry *dentry, struct dentry *parent) > diff --git a/fs/debugfs/internal.h b/fs/debugfs/internal.h > index cb1e8139c398..0445bd7d11f2 100644 > --- a/fs/debugfs/internal.h > +++ b/fs/debugfs/internal.h > @@ -23,6 +23,7 @@ struct debugfs_fsdata { > const struct file_operations *real_fops; > refcount_t active_users; > struct completion active_users_drained; > + struct rcu_head rcu_head; > }; > > /* > -- > 2.12.2 >