Date: Fri, 19 Jun 2009 12:10:48 -0700 (PDT)
From: Davide Libenzi <davidel@xmailserver.org>
To: Gregory Haskins <ghaskins@novell.com>
cc: mst@redhat.com, kvm@vger.kernel.org,
       Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
       avi@redhat.com, paulmck@linux.vnet.ibm.com, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix
 notifier race conditions
In-Reply-To: <20090619185138.31118.14916.stgit@dev.haskins.net>
Message-ID: <alpine.DEB.1.10.0906191156580.14884@makko.or.mcafeemobile.com>
References: <20090619183534.31118.30934.stgit@dev.haskins.net> <20090619185138.31118.14916.stgit@dev.haskins.net>
User-Agent: Alpine 1.10 (DEB 962 2008-03-14)
X-GPG-PUBLIC_KEY: http://www.xmailserver.org/davidel.asc
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2166
Lines: 70

On Fri, 19 Jun 2009, Gregory Haskins wrote:

> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
> notifier->release() callback.  This lets notification clients know if
> the eventfd is about to go away and is very useful particularly for
> in-kernel clients.  However, as it stands today it is not possible to
> use the notification API in a race-free way.  This patch adds some
> additional logic to the notification subsystem to rectify this problem.
> 
> Background:
> -----------------------
> Eventfd currently only has one reference count mechanism: fget/fput.  This
> in and of itself is normally fine.  However, if a client expects to be
> notified if the eventfd is closed, it cannot hold a fget() reference
> itself or the underlying f_ops->release() callback will never be invoked
> by VFS.  Therefore we have this somewhat unusual situation where we may
> hold a pointer to an eventfd object (by virtue of having a waiter registered
> in its wait-queue), but no reference.  This makes it nearly impossible to
> design a mutual decoupling algorithm: you cannot unhook one side from the
> other (or vice versa) without racing.

And why is that?

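struct xxx {
	struct mutex mtx;	/* protects ->file */
	struct file *file;	/* NULL once the file is gone */
	...
};

/*
 * Returns x->file with x->mtx held, or NULL with x->mtx already
 * dropped. A non-NULL return must be paired with xxx_release_file().
 */
struct file *xxx_get_file(struct xxx *x)
{
	struct file *file;

	mutex_lock(&x->mtx);
	file = x->file;
	if (!file)
		mutex_unlock(&x->mtx);
	return file;
}

/* Drops the lock taken by a successful xxx_get_file(). */
void xxx_release_file(struct xxx *x)
{
	mutex_unlock(&x->mtx);
}

/* Called on the POLLHUP wakeup: detach from the file under the lock. */
void handle_POLLHUP(struct xxx *x)
{
	struct file *file;

	file = xxx_get_file(x);
	if (file) {
		unhook_waitqueue(file, ...);
		x->file = NULL;
		xxx_release_file(x);
	}
}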


Every time you need to "use" file, you call xxx_get_file(), and if you get
NULL, it means it's gone and you handle it according to your IRQ fd
policies. As soon as you are done with the file, you call xxx_release_file().
Replace "mtx" with the lock that fits your needs.
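
For anyone who wants to poke at the pattern outside the kernel, here is a
rough user-space sketch of the same get/release scheme. It assumes a pthread
mutex in place of the kernel mutex. The struct file definition, its "hooked"
flag, the one-argument unhook_waitqueue() and the xxx_use_file() helper are
hypothetical stand-ins made up for illustration, not kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file. */
struct file {
	int hooked;	/* nonzero while our waiter is on its wait queue */
};

struct xxx {
	pthread_mutex_t mtx;	/* protects ->file */
	struct file *file;	/* NULL once the file is gone */
};

/* Returns x->file with x->mtx held, or NULL with x->mtx dropped. */
struct file *xxx_get_file(struct xxx *x)
{
	struct file *file;

	pthread_mutex_lock(&x->mtx);
	file = x->file;
	if (!file)
		pthread_mutex_unlock(&x->mtx);
	return file;
}

/* Drops the lock taken by a successful xxx_get_file(). */
void xxx_release_file(struct xxx *x)
{
	pthread_mutex_unlock(&x->mtx);
}

/* Stand-in for removing our waiter from the file's wait queue. */
void unhook_waitqueue(struct file *file)
{
	assert(file->hooked);	/* the lock ensures we unhook only once */
	file->hooked = 0;
}

/* Detach from the file; a second call finds NULL and does nothing. */
void handle_POLLHUP(struct xxx *x)
{
	struct file *file = xxx_get_file(x);

	if (file) {
		unhook_waitqueue(file);
		x->file = NULL;
		xxx_release_file(x);
	}
}

/* An ordinary user: returns 0 on success, -1 if the file is gone. */
int xxx_use_file(struct xxx *x)
{
	struct file *file = xxx_get_file(x);

	if (!file)
		return -1;
	/* Operate on file here; it cannot be unhooked while we hold mtx. */
	xxx_release_file(x);
	return 0;
}
```

With x.mtx set to PTHREAD_MUTEX_INITIALIZER and x.file pointing at a hooked
file, xxx_use_file() succeeds until handle_POLLHUP() runs, and returns -1
afterwards. Build with -pthread.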


- Davide

