Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754322AbdCaJpB (ORCPT ); Fri, 31 Mar 2017 05:45:01 -0400 Received: from s3.sipsolutions.net ([5.9.151.49]:44236 "EHLO sipsolutions.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750903AbdCaJo7 (ORCPT ); Fri, 31 Mar 2017 05:44:59 -0400 Message-ID: <1490953494.6288.5.camel@sipsolutions.net> Subject: Re: deadlock in synchronize_srcu() in debugfs? From: Johannes Berg To: Nicolai Stange Cc: linux-kernel , "Paul E.McKenney" , gregkh Date: Fri, 31 Mar 2017 11:44:54 +0200 In-Reply-To: <871stdyg0u.fsf@gmail.com> (sfid-20170331_110332_008161_46B67339) References: <1490280886.2766.4.camel@sipsolutions.net> <87o9ws6m4s.fsf@gmail.com> <1490614617.3393.4.camel@sipsolutions.net> <87lgrnmd8b.fsf@gmail.com> <1490860547.32041.1.camel@sipsolutions.net> <87inmroy9w.fsf@d147-183.mpimet.mpg.de> <1490872283.32041.4.camel@sipsolutions.net> <871stdyg0u.fsf@gmail.com> (sfid-20170331_110332_008161_46B67339) Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.22.4-1 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1730 Lines: 54 On Fri, 2017-03-31 at 11:03 +0200, Nicolai Stange wrote: > > 2) > > There's a complete deadlock situation if this happens: > > > > CPU1 CPU2 > > > > debugfs_file_read(file="foo") mutex_lock(&M); > > srcu_read_lock(&debugfs_srcu); debugfs_remove(file=" > > bar") > > mutex_lock(&M); synchronize_srcu(&de > > bugfs_srcu) > > > > This is intrinsically unrecoverable. > > Let's address this in a second step. I suspect that it's actually better to address both in the same step, but whatever :) > > That seems like a strange argument to me - something has to exist > > for a process to be able to look up the file, and currently the > > proxy also has to exist? > > No, the proxies are created at file _open_ time and installed at the > struct file. > > Rationale: there are potentially many debugfs files with only few of > them opened at a time and a proxy, i.e. a struct file_operations, is > quite large. Ok, that makes sense. But that's not really a show-stopper, is it? You can either have a proxy or not have it at remove time, and if you don't have one then you can remove safely, right? And if you do have a proxy, then you have to write_lock() it. Lookup of the proxy itself can still be protected by (S)RCU, but you can't go into the debugfs file callbacks while you hold (S)RCU, so that you can safely determine whether or not a proxy exists. I'm handwaving though - there are problems here with freeing the proxy again when you close a file. Perhaps something like * first, remove the pointer and wait for a grace period * write_lock() it to make sure nobody is still inside it * delete it now works. > I'll work out a solution this weekend and send some RFC patches then. > Thanks! johannes