From: Francis Laniel
To: bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Francis Laniel, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Joanne Koong, Dave Marchevsky, Lorenzo Bianconi, Geliang Tang, Hengqi Chen
Subject: [RFC PATCH v1 1/3] bpf: Make ring buffer overwritable.
Date: Wed, 10 Aug 2022 19:16:52 +0200
Message-Id: <20220810171702.74932-2-flaniel@linux.microsoft.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220810171702.74932-1-flaniel@linux.microsoft.com>
References: <20220810171702.74932-1-flaniel@linux.microsoft.com>

By default, a BPF ring buffer is bounded in size: once producers have
filled the buffer, they must wait for the consumer to read the data before
they can add more.
In terms of API, bpf_ringbuf_reserve() returns NULL if the buffer is full.

This patch makes it possible to create an overwritable BPF ring buffer.
Once producers have written as much data as the buffer can hold, they begin
to overwrite existing data, so the oldest records are replaced first.
As a result, bpf_ringbuf_reserve() no longer returns NULL because the
buffer is full.

Signed-off-by: Francis Laniel
---
 include/uapi/linux/bpf.h |  3 +++
 kernel/bpf/ringbuf.c     | 51 +++++++++++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ef78e0e1a754..19c7039265d8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1226,6 +1226,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* Create an over writable BPF_RINGBUF */
+	BFP_F_RB_OVER_WRITABLE	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index ded4faeca192..e2d907df4989 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -12,7 +12,7 @@
 #include
 #include
 
-#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)
+#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE | BFP_F_RB_OVER_WRITABLE)
 
 /* non-mmap()'able part of bpf_ringbuf (everything up to consumer page) */
 #define RINGBUF_PGOFF \
@@ -37,6 +37,8 @@ struct bpf_ringbuf {
 	u64 mask;
 	struct page **pages;
 	int nr_pages;
+	__u8 over_writable: 1,
+	     __reserved: 7;
 	spinlock_t spinlock ____cacheline_aligned_in_smp;
 	/* Consumer and producer counters are put into separate pages to allow
 	 * mapping consumer page as r/w, but restrict producer page to r/o.
@@ -127,7 +129,12 @@ static void bpf_ringbuf_notify(struct irq_work *work)
 	wake_up_all(&rb->waitq);
 }
 
-static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
+static inline bool is_over_writable(struct bpf_ringbuf *rb)
+{
+	return !!rb->over_writable;
+}
+
+static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node, __u32 flags)
 {
 	struct bpf_ringbuf *rb;
 
@@ -142,6 +149,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
 	rb->mask = data_sz - 1;
 	rb->consumer_pos = 0;
 	rb->producer_pos = 0;
+	rb->over_writable = !!(flags & BFP_F_RB_OVER_WRITABLE);
 
 	return rb;
 }
@@ -170,7 +178,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&rb_map->map, attr);
 
-	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
+	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node, attr->map_flags);
 	if (!rb_map->rb) {
 		kfree(rb_map);
 		return ERR_PTR(-ENOMEM);
@@ -244,11 +252,15 @@ static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
 
 static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
 {
-	unsigned long cons_pos, prod_pos;
+	unsigned long cons_pos, prod_pos, diff;
 
 	cons_pos = smp_load_acquire(&rb->consumer_pos);
 	prod_pos = smp_load_acquire(&rb->producer_pos);
-	return prod_pos - cons_pos;
+	diff = prod_pos - cons_pos;
+
+	if (is_over_writable(rb) && diff > rb->mask)
+		return rb->mask;
+	return diff;
 }
 
 static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
@@ -327,12 +339,29 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
 	prod_pos = rb->producer_pos;
 	new_prod_pos = prod_pos + len;
 
-	/* check for out of ringbuf space by ensuring producer position
-	 * doesn't advance more than (ringbuf_size - 1) ahead
-	 */
-	if (new_prod_pos - cons_pos > rb->mask) {
-		spin_unlock_irqrestore(&rb->spinlock, flags);
-		return NULL;
+	if (!is_over_writable(rb)) {
+		/* check for out of ringbuf space by ensuring producer position
+		 * doesn't advance more than (ringbuf_size - 1) ahead
+		 */
+		if (new_prod_pos - cons_pos > rb->mask) {
+			spin_unlock_irqrestore(&rb->spinlock, flags);
+			return NULL;
+		}
+	} else {
+		/*
+		 * Data length is already rounded to be divisible by 8, but in
+		 * the case of over writing buffer we need to round it again.
+		 * Indeed, when the producer position will cross the buffer
+		 * size, it is possible new position will not be divisible by
+		 * buffer size.
+		 * For example, if len is 520 and buffer size is 4096, then the
+		 * next position after 4096 is 4160.
+		 * This is a problem as it will impede us to over write data
+		 * (4160 & 4095 = 64 which is different from 0).
+		 * So by subtracting the modulo of len, we are able to over
+		 * write existing data.
+		 */
+		new_prod_pos -= (new_prod_pos & rb->mask) % len;
 	}
 
 	hdr = (void *)rb->data + (prod_pos & rb->mask);
-- 
2.25.1