From: Roman Penyaev
Date: Mon, 25 Apr 2016 19:39:52 +0200
Subject: Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO
To: Tejun Heo
Cc: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Howells
In-Reply-To: <20160425170304.GB7822@mtj.duckdns.org>
References: <1461597771-25352-1-git-send-email-roman.penyaev@profitbricks.com>
    <20160425154847.GZ7822@mtj.duckdns.org>
    <20160425170304.GB7822@mtj.duckdns.org>

On Mon, Apr 25, 2016 at 7:03 PM, Tejun Heo wrote:
> Hello, Roman.
>
> On Mon, Apr 25, 2016 at 06:34:45PM +0200, Roman Penyaev wrote:
>> I can assure you that smp_mb() helps (at least running for 30 minutes
>> under IO).  That was my first variant, but I did not like it because I
>> could not explain to myself why:
>>
>> 1. not smp_wmb()?  We need to do a flush after an update.
>>    (I tried that also, and it does not help)
>
> Regardless of the success of queue_work(), the interface guarantees
> that there will be at least one execution instance which sees whatever
> updates the queuer has made prior to calling queue_work().  The
> PENDING bit is what synchronizes this operation.
>
>         A                               B
>
>                                 Make updates
>   clear PENDING                 test_and_set PENDING
>   start execution
>
> So, if B's test_and_set takes place before clearing of PENDING, what
> should be guaranteed is that A's execution must be able to see B's
> updates; however, as there's no barrier between "clear PENDING" and
> "start execution", memory loads of execution can be scheduled before
> clearing of PENDING which leads to a situation where B loses queueing
> but its updates are not seen by the prior instance's execution.  It's
> a classic "either a sees b (clear PENDING) or b sees a (prior
> updates)" interlocking situation.

Ok, that's clear now.  Thanks.  I was also confused by the spin lock,
which is released right after PENDING is cleared:

   set_work_pool_and_clear_pending(work, pool->id);
   spin_unlock_irq(&pool->lock);
   ...
   worker->current_func(work);

But it seems that the memory operations of the execution can leak in
and take effect before the PENDING bit is cleared and the spin lock is
released (see Documentation/memory-barriers.txt, (6) RELEASE
operations).

>> 2. what protects us from this situation?
>>
>>   CPU#0                          CPU#1
>>                                  set_work_data()
>>   test_and_set_bit()
>>                                  smp_mb()
>
> The above would be completely fine as CPU#1's execution would see all
> the changes CPU#0 has made up to that point.
>
>> And question 2 was crucial to me, because even a tiny delay "fixes"
>> the problem, e.g. ndelay also "fixes" the bug:
>>
>>          smp_wmb();
>>          set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
>>  +       ndelay(40);
>>   }
>>
>> Why ndelay(40)?  Because on this machine smp_mb() takes 40 ns on average.
>
> Yeah, this is the CPU rescheduling loads for the execution ahead of
> clearing of PENDING and doing anything in between is likely to reduce
> the chance of it happening drastically, but smp_mb() in between is
> actually the right solution here.

Tejun, do you need an updated patch for that?  With a proper smp_mb()?

--
Roman
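
For reference, the interlock discussed above can be modelled in
userspace.  The following is only a rough sketch with C11 atomics and
pthreads, not kernel code and not the patch itself: pending, payload,
worker(), queuer(), run_round() and count_violations() are hypothetical
names, and atomic_thread_fence(memory_order_seq_cst) is assumed to be a
reasonable stand-in for smp_mb().  It checks the guarantee Tejun
describes: either B wins the test_and_set (a new execution will follow),
or A's execution sees B's updates.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's data structures. */
static atomic_bool pending;  /* models the PENDING bit of a work item  */
static atomic_int payload;   /* models the updates made by the queuer  */
static int seen;             /* payload value observed by A's execution */
static bool queued;          /* true if B's test_and_set found it clear */

/* A: clear PENDING, full barrier, then start execution. */
static void *worker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&pending, false, memory_order_relaxed);
	/* Assumed analogue of the smp_mb() discussed in this thread. */
	atomic_thread_fence(memory_order_seq_cst);
	seen = atomic_load_explicit(&payload, memory_order_relaxed);
	return NULL;
}

/* B: make updates, then test_and_set PENDING. */
static void *queuer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&payload, 1, memory_order_relaxed);
	/*
	 * The kernel's test_and_set_bit() is assumed to be fully ordered;
	 * in the C11 model that needs an explicit fence before the RMW.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	queued = !atomic_exchange_explicit(&pending, true,
					   memory_order_relaxed);
	return NULL;
}

/* Run one race; return true if the queue_work() guarantee held. */
bool run_round(void)
{
	pthread_t a, b;
	int rc;

	atomic_store(&pending, true);  /* work is queued, about to run */
	atomic_store(&payload, 0);
	seen = 0;
	queued = false;

	rc = pthread_create(&a, NULL, worker, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, queuer, NULL);
	assert(rc == 0);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* Either B queued a new execution, or A saw B's update. */
	return queued || seen == 1;
}

/* Repeat the race and count the rounds that broke the guarantee. */
int count_violations(int rounds)
{
	int bad = 0;

	for (int i = 0; i < rounds; i++)
		if (!run_round())
			bad++;
	return bad;
}
```

Calling count_violations() from a small main() and building with
-pthread should report zero violations.  Removing the fence in worker()
is the analogue of the missing barrier between "clear PENDING" and
"start execution"; whether the lost update then shows up in practice
depends on the CPU, the compiler and timing.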