MIME-Version: 1.0
In-Reply-To: <1351518466.8467.65.camel@gandalf.local.home>
References: <1351517296-9173-1-git-send-email-fweisbec@gmail.com>
	<1351517296-9173-2-git-send-email-fweisbec@gmail.com>
	<1351518466.8467.65.camel@gandalf.local.home>
Date: Mon, 29 Oct 2012 20:18:22 +0100
Message-ID: <CAFTL4hxzGV4QFOLnTHYMhvKVsF_CrG7pOyi9hdgPBYLKdMB5zA@mail.gmail.com>
Subject: Re: [RFC PATCH 1/9] irq_work: Fix racy check on work pending flag
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>,
        Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
        Andrew Morton <akpm@linux-foundation.org>,
        Paul Gortmaker <paul.gortmaker@windriver.com>
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1621
Lines: 32

2012/10/29 Steven Rostedt <rostedt@goodmis.org>:
> I wonder if the bug is just a memory barrier missing here? But that also
> suggests that the other CPU used a memory barrier too (or cmpxchg()
> which implies one).
>
> But this change looks fine too.

I fear a memory barrier wouldn't be much helpful here.

Thing is I'm not sure exactly what strict semantics is expected there:
do we allow for concurrent claiming only locally or also across CPUs?
Given that we only disable preemption on work queueing, I assumed we
allow for concurrent claiming across CPUs. Also we have irq work users
that may claim on random CPUs already and thus this can result in
cross-CPUs concurrent claiming. Task wide (not per cpu) perf event are
such examples. May be drivers/acpi/apei/ghes.c and perhaps others.

Also if we allow for cross-CPUs concurrent claiming, we should set
IRQ_WORK_BUSY with cmpxchg() (or xchg() if it's atomic, because nobody
can modify it concurrently at this stage as it's claimed). Otherwise a
CPU can fail its claim whereas the other CPU that claimed it may
actually have executed the work already, or is in the middle of doing
so. The race window stops when we clear the IRQ_WORK_BUSY flag with
cmpxchg so it's very tight. But still it's there.

Also irq_work_sync() should poll with ACCESS_ONCE(). Same problem with
foggy semantics here: do we allow remote sync?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/