Date: Mon, 19 Sep 2022 18:19:08 -0500
From: David Vernet
To: Andrii Nakryiko
Cc: Joanne Koong, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	martin.lau@linux.dev, bpf@vger.kernel.org, song@kernel.org, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, jolsa@kernel.org, tj@kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 2/4] bpf: Add bpf_user_ringbuf_drain() helper
References: <20220902234317.2518808-1-void@manifault.com>
	<20220902234317.2518808-3-void@manifault.com>

On Mon, Sep 19, 2022 at 03:19:36PM -0500, David Vernet wrote:
> > > +static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
> > > +{
> > > +	u64 consumer_pos;
> > > +	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> > > +
> > > +	/* Using smp_load_acquire() is unnecessary here, as the busy-bit
> > > +	 * prevents another task from writing to consumer_pos after it was read
> > > +	 * by this task with smp_load_acquire() in __bpf_user_ringbuf_peek().
> > > +	 */
> > > +	consumer_pos = rb->consumer_pos;
> > > +	/* Synchronizes with smp_load_acquire() in user-space producer. */
> > > +	smp_store_release(&rb->consumer_pos, consumer_pos + rounded_size);
> > > +
> > > +	/* Prevent the clearing of the busy-bit from being reordered before the
> > > +	 * storing of the updated rb->consumer_pos value.
> > > +	 */
> > > +	smp_mb__before_atomic();
> > > +	atomic_set(&rb->busy, 0);
> > > +
> > > +	if (flags & BPF_RB_FORCE_WAKEUP)
> > > +		irq_work_queue(&rb->work);
> >
> > I think this part is new: you decided to define that FORCE_WAKEUP
> > sends a wakeup after every single consumed sample? I have no strong
> > opinion on this, tbh, just wondering if it wouldn't be enough to do
> > it once after the drain?
>
> I didn't have a strong reason for doing this other than that I think it
> more closely matches the behavior of BPF_MAP_TYPE_RINGBUF (which invokes
> irq_work_queue() after every call to bpf_ringbuf_commit() if
> BPF_RB_FORCE_WAKEUP is passed). Let's just match that behavior unless we
> have a good reason not to? I think that will be more intuitive for users.
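For reference, the wakeup decision at the tail of bpf_ringbuf_commit() in
kernel/bpf/ringbuf.c looks roughly like the following (paraphrased from
memory rather than copied verbatim, but it should capture the semantics):

	/* A forced wakeup fires on every commit; otherwise a wakeup is
	 * only sent if the consumer has caught up to the record that is
	 * being committed.
	 */
	if (flags & BPF_RB_FORCE_WAKEUP)
		irq_work_queue(&rb->work);
	else if (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP))
		irq_work_queue(&rb->work);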
Hmm, something else to consider is that if we move the busy-bit setting
into bpf_user_ringbuf_drain() per your suggestion below, the critical
section becomes the whole sample drain loop. That's of course _not_ the
case for BPF_MAP_TYPE_RINGBUF, which only holds the spinlock while
reserving the sample. It seems excessive to invoke irq_work_queue() on
every sample while the busy bit is held, so I think BPF_RB_FORCE_WAKEUP
should instead mean that a single wakeup is always sent after the drain,
even if no sample was consumed. Let me know if you disagree, but for now
I'll work on spinning up a v6 that only issues the forced wakeup event
once after the drain loop.

> > > +}
> > > +
> > > +BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
> > > +	   void *, callback_fn, void *, callback_ctx, u64, flags)
> > > +{
> > > +	struct bpf_ringbuf *rb;
> > > +	long samples, discarded_samples = 0, ret = 0;
> > > +	bpf_callback_t callback = (bpf_callback_t)callback_fn;
> > > +	u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP;
> > > +
> > > +	if (unlikely(flags & ~wakeup_flags))
> > > +		return -EINVAL;
> > > +
> > > +	rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
> > > +	for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) {
> > > +		int err;
> > > +		u32 size;
> > > +		void *sample;
> > > +		struct bpf_dynptr_kern dynptr;
> > > +
> > > +		err = __bpf_user_ringbuf_peek(rb, &sample, &size);
> >
> > So I also just realized that ringbuf_peek will keep setting and
> > resetting the busy flag, and in practically all cases that's
> > completely useless work, since we don't intend to have competing
> > consumers, right? So maybe move the busy-bit handling into drain
> > itself, and document that peek expects busy to already be taken care
> > of?
> >
> > This should be noticeably faster when there are multiple records
> > consumed in one drain.
>
> Great idea, I'll do this in v6.

A rough sketch of the drain loop with the busy bit hoisted out of peek is
below my signature.

Thanks,
David
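Here's the shape I have in mind for v6, with the busy bit taken once for
the whole drain and the forced wakeup issued a single time after the
loop. This is untested and abbreviated (the peek/callback/release loop
body and its error handling are elided), so treat the details as
illustrative rather than final:

BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
	   void *, callback_fn, void *, callback_ctx, u64, flags)
{
	struct bpf_ringbuf *rb;
	long samples, discarded_samples = 0, ret = 0;
	bpf_callback_t callback = (bpf_callback_t)callback_fn;
	u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP;
	int busy = 0;

	if (unlikely(flags & ~wakeup_flags))
		return -EINVAL;

	rb = container_of(map, struct bpf_ringbuf_map, map)->rb;

	/* Take the busy bit once for the entire drain, rather than
	 * setting and clearing it around every sample in peek/release.
	 */
	if (!atomic_try_cmpxchg(&rb->busy, &busy, 1))
		return -EBUSY;

	for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) {
		/* __bpf_user_ringbuf_peek() no longer touches rb->busy;
		 * peek the next sample, invoke the callback on a dynptr
		 * wrapping it, then release it by advancing consumer_pos.
		 */
	}

	/* Prevent the clearing of the busy bit from being reordered
	 * before the final consumer_pos update from the release path.
	 */
	smp_mb__before_atomic();
	atomic_set(&rb->busy, 0);

	/* Issue the forced wakeup exactly once, after the drain. */
	if (flags & BPF_RB_FORCE_WAKEUP)
		irq_work_queue(&rb->work);
	else if (!(flags & BPF_RB_NO_WAKEUP) && samples > 0)
		irq_work_queue(&rb->work);

	return ret;
}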