Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752651AbcLGI4D (ORCPT ); Wed, 7 Dec 2016 03:56:03 -0500 Received: from mx2.suse.de ([195.135.220.15]:35648 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751609AbcLGI4B (ORCPT ); Wed, 7 Dec 2016 03:56:01 -0500 Date: Tue, 6 Dec 2016 18:07:19 +0100 From: Jan Kara To: Miklos Szeredi Cc: Jan Kara , Amir Goldstein , Eric Paris , linux-fsdevel , linux-kernel Subject: Re: fsnotify_mark_srcu wtf? Message-ID: <20161206170719.GA2347@quack2.suse.cz> References: <20161102220851.GA1839@veci.piliscsaba.szeredi.hu> <20161105213411.GA32353@quack2.suse.cz> <20161109111005.GA32353@quack2.suse.cz> <20161110194625.GG31098@quack2.suse.cz> <20161202104804.GC26086@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2907 Lines: 56 On Fri 02-12-16 12:02:33, Miklos Szeredi wrote: > On Fri, Dec 2, 2016 at 11:48 AM, Jan Kara wrote: > > On Fri 02-12-16 09:26:51, Miklos Szeredi wrote: > >> On Thu, Nov 10, 2016 at 8:46 PM, Jan Kara wrote: > >> > On Wed 09-11-16 20:26:16, Amir Goldstein wrote: > >> >> On Wed, Nov 9, 2016 at 1:10 PM, Jan Kara wrote: > >> > >> >> > And this does not work as well... Fanotify must notify groups by their > >> >> > priority so you cannot arbitrarily reorder ordering in which groups get > >> >> > notified. I'm currently pondering on using mark refcount to pin it when > >> >> > processing permission event but there are still some details to check. > >> >> > > >> >> > >> >> All right, mark refcount sound like the proper solution. > >> > > >> > Except it doesn't quite work. We can pin the current marks by a refcount > >> > but they can still be removed from the list so after we regain srcu lock, > >> > we are not sure their ->next pointers still point to still allocated marks > >> > :-| Sadly I realized this only after implementing all this. > >> > >> Hmm, how about this: when removing mark from inode, drop refcount. If > >> refcount is zero can remove from list. Otherwise mark the mark "dead" > >> and leave it on the list. > >> > >> And fsnotify can just skip dead marks. > > > > I had this idea as well and when trying to implement this, I've stumbled > > over some problems. I think the biggest problem was that destruction of a > > notification mark is relatively complex operation (doing iput() for > > example) and quite a few places dropping mark references are in a context > > where this can cause problems. Also I don't want to defer iput() to a > > workqueue as that will have unexpected consequences such as unlinked > > watched inode lingering in the system (possibly colliding with umount > > etc.). > > Okay, but all we need from the deleted mark is the ->next pointer, no? > So everything else related to destruction can be done. Yes, all we need for continuing the iteration itself is a ->next pointer and possibly a ->group pointer. However the list itself is starting at the inode->i_fsnotify_marks and the removal of the mark from the list needs to be protected by inode->i_lock so we do need inode reference while the mark is still attached to the inode list. So more plausible looks to wait while detaching mark from the object for all responses for that mark. Another possibility would be to create separate object for the head of the list of fsnotify marks (containing also the lock for the list). That would allow us to detach list of marks from the object and thus we should not have issues with conflicting lifetime needs... And the additional memory overhead (about 3 longs) is not that big compared to the size of fsnotify mark (currently 9 longs). We'll see. Honza -- Jan Kara SUSE Labs, CR