Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
MIME-Version: 1.0
In-Reply-To: <20180307145408.GC3918@linux.vnet.ibm.com>
References: <20180306172657.3060270-1-tj@kernel.org> <20180306173316.3088458-1-tj@kernel.org>
 <20180306173316.3088458-7-tj@kernel.org> <CAJhGHyAuz3TRCpPnCBzZ+om3k9N3ykCWYOq_bv+hucV=XXiY9w@mail.gmail.com>
 <20180307145408.GC3918@linux.vnet.ibm.com>
From:   Lai Jiangshan <jiangshanlai+lkml@gmail.com>
Date:   Thu, 8 Mar 2018 08:29:53 +0800
Message-ID: <CAJhGHyDEpXG88uDHiogxUH0Yp5sU3v1p-kUSvJbGYCZ+HGucvQ@mail.gmail.com>
Subject: Re: [PATCH 7/7] RCU, workqueue: Implement rcu_work
To:     "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc:     Tejun Heo <tj@kernel.org>, torvalds@linux-foundation.org,
        Jann Horn <jannh@google.com>, bcrl@kvack.org,
        viro@zeniv.linux.org.uk,
        Kent Overstreet <kent.overstreet@gmail.com>,
        security@kernel.org, LKML <linux-kernel@vger.kernel.org>,
        kernel-team@fb.com
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

On Wed, Mar 7, 2018 at 10:54 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Mar 07, 2018 at 10:49:49AM +0800, Lai Jiangshan wrote:
>> On Wed, Mar 7, 2018 at 1:33 AM, Tejun Heo <tj@kernel.org> wrote:
>>
>> > +/**
>> > + * queue_rcu_work_on - queue work on specific CPU after a RCU grace period
>> > + * @cpu: CPU number to execute work on
>> > + * @wq: workqueue to use
>> > + * @rwork: work to queue
>>
>> For many people, "RCU grace period" is clear enough, but not ALL.
>>
>> So please make it a little more clear that it just queues work after
>> a *Normal* RCU grace period. it supports only one RCU variant.
>>
>>
>> > + *
>> > + * Return: %false if @work was already on a queue, %true otherwise.
>> > + */
>>
>> I'm afraid this will be a hard-using API.
>>
>> The user can't find a plan B when it returns false, especially when
>> the user expects the work must be called at least once again
>> after an RCU grace period.
>>
>> And the error-prone part of it is that, like other queue_work() functions,
>> the return value of it is often ignored and makes the problem worse.
>>
>> So, since workqueue.c provides this API, it should handle this
>> problem. For example, by calling call_rcu() again in this case, but
>> everything will be much more complex: a synchronization is needed
>> for "calling call_rcu() again" and allowing the work item called
>> twice after the last queue_rcu_work() is not workqueue style.
>
> I confess that I had not thought of this aspect, but doesn't the
> rcu_barrier() in v2 of this patchset guarantee that it has passed
> the RCU portion of the overall wait time?  Given that, what I am
> missing is now this differs from flush_work() on a normal work item.
>
> So could you please show me the sequence of events that leads to this
> problem?  (Again, against v2 of this patch, which replaces the
> synchronize_rcu() with rcu_barrier().)

I mentioned a subtle use case that user would think it is supported
since the comment doesn't disallow it.

It is clear that the user expects
   the work must be called at least once after the API returns
   the work must be called after an RCU grace period

But in the case when the user expects the work must be called
at least once again after "queue_rcu_work() + an RCU grace period",
the API is not competent to it if the work is queued.
Although the user can detect it by the return value of
queue_rcu_work(), the user hardly makes his expectation
happen by adding appropriate code.

>
>> Some would argue that the delayed_work has the same problem
>> when the user expects the work must be called at least once again
>> after a period of time. But time interval is easy to detect, the user
>> can check the time and call the queue_delayed_work() again
>> when needed which is also a frequent design pattern. And
>> for rcu, it is hard to use this design pattern since it is hard
>> to detect (new) rcu grace period without using call_rcu().
>>
>> I would not provide this API. it is not a NACK. I'm just trying
>> expressing my thinking about the API. I'd rather RCU be changed
>> and RCU callbacks are changed to be sleepable. But a complete
>> overhaul cleanup on the whole source tree for compatibility
>> is needed at first, an even more complex job.
>
> One downside of allowing RCU callback functions to sleep is that
> one poorly written callback can block a bunch of other ones.
> One advantage of Tejun's approach is that such delays only affect
> the workqueues, which are already designed to handle such delays.
>
>                                                 Thanx, Paul
>
>> > +bool queue_rcu_work_on(int cpu, struct workqueue_struct *wq,
>> > +                      struct rcu_work *rwork)
>> > +{
>> > +       struct work_struct *work = &rwork->work;
>> > +
>> > +       if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
>> > +               rwork->wq = wq;
>> > +               rwork->cpu = cpu;
>> > +               call_rcu(&rwork->rcu, rcu_work_rcufn);
>> > +               return true;
>> > +       }
>> > +
>> > +       return false;
>> > +}
>> > +EXPORT_SYMBOL(queue_rcu_work_on);
>> > +
>>
>