LinuxLists.cc - [PATCH v5 net-next 1/6] xdp: allow same allocator usage

2019-06-30 17:24:45

Subject: [PATCH v5 net-next 1/6] xdp: allow same allocator usage

XDP rxqs can be the same for ndevs running under the same rx napi softirq.
But there is no ability to register the same allocator for both rxqs,
although in fact it can be the same rxq, only with a different ndev as a
reference.

Due to recent changes, allocator destruction can be deferred until
all packets are recycled by the destination interface, after which the
allocator is freed. In order to schedule allocator destruction only after
all users are unregistered, add a refcnt to the allocator object and
schedule destruction only once it reaches 0.

Signed-off-by: Ivan Khoronzhuk <[email protected]>
---
include/net/xdp_priv.h | 1 +
net/core/xdp.c | 46 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)

diff --git a/include/net/xdp_priv.h b/include/net/xdp_priv.h
index 6a8cba6ea79a..995b21da2f27 100644
--- a/include/net/xdp_priv.h
+++ b/include/net/xdp_priv.h
@@ -18,6 +18,7 @@ struct xdp_mem_allocator {
struct rcu_head rcu;
struct delayed_work defer_wq;
unsigned long defer_warn;
+ unsigned long refcnt;
};

#endif /* __LINUX_NET_XDP_PRIV_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index b29d7b513a18..a44621190fdc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -98,6 +98,18 @@ bool __mem_id_disconnect(int id, bool force)
WARN(1, "Request remove non-existing id(%d), driver bug?", id);
return true;
}
+
+ /* to avoid calling hash lookup twice, decrement refcnt here till it
+ * reaches zero, then it can be called from workqueue afterwards.
+ */
+ if (xa->refcnt)
+ xa->refcnt--;
+
+ if (xa->refcnt) {
+ mutex_unlock(&mem_id_lock);
+ return true;
+ }
+
xa->disconnect_cnt++;

/* Detects in-flight packet-pages for page_pool */
@@ -312,6 +324,33 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
return true;
}

+static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
+{
+ struct xdp_mem_allocator *xae, *xa = NULL;
+ struct rhashtable_iter iter;
+
+ mutex_lock(&mem_id_lock);
+ rhashtable_walk_enter(mem_id_ht, &iter);
+ do {
+ rhashtable_walk_start(&iter);
+
+ while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
+ if (xae->allocator == allocator) {
+ xae->refcnt++;
+ xa = xae;
+ break;
+ }
+ }
+
+ rhashtable_walk_stop(&iter);
+
+ } while (xae == ERR_PTR(-EAGAIN));
+ rhashtable_walk_exit(&iter);
+ mutex_unlock(&mem_id_lock);
+
+ return xa;
+}
+
int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
enum xdp_mem_type type, void *allocator)
{
@@ -347,6 +386,12 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
}
}

+ xdp_alloc = xdp_allocator_get(allocator);
+ if (xdp_alloc) {
+ xdp_rxq->mem.id = xdp_alloc->mem.id;
+ return 0;
+ }
+
xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
if (!xdp_alloc)
return -ENOMEM;
@@ -360,6 +405,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
xdp_rxq->mem.id = id;
xdp_alloc->mem = xdp_rxq->mem;
xdp_alloc->allocator = allocator;
+ xdp_alloc->refcnt = 1;

/* Insert allocator into ID lookup table */
ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
--
2.17.1
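A minimal userspace sketch of the refcounting scheme the commit message describes, assuming a fixed-size array in place of the kernel's rhashtable and a pthread mutex in place of mem_id_lock. All names here (mem_entry, reg_mem_model, unreg_mem_model) are hypothetical and are not kernel APIs; the point is only that registration reuses an entry keyed by the allocator pointer, and that only the last unregistration reports that destruction may be scheduled.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of the xdp_mem_allocator bookkeeping:
 * one entry per allocator pointer, shared by several rxqs via refcnt.
 * A fixed array replaces the kernel's rhashtable; table_lock plays the
 * role of mem_id_lock. None of these names are kernel APIs. */
#define MAX_ENTRIES 16

struct mem_entry {
	void *allocator;      /* lookup key, e.g. a page_pool pointer */
	int id;               /* stands in for mem.id */
	unsigned long refcnt; /* registered users; 0 means slot is free */
};

static struct mem_entry table[MAX_ENTRIES];
static int next_id = 1;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Linear search by allocator pointer; caller must hold table_lock. */
static struct mem_entry *entry_find(void *allocator)
{
	for (int i = 0; i < MAX_ENTRIES; i++)
		if (table[i].refcnt && table[i].allocator == allocator)
			return &table[i];
	return NULL;
}

/* Register a user: reuse an existing entry (refcnt++) or create one.
 * Returns the mem id, or -1 if the table is full. */
int reg_mem_model(void *allocator)
{
	struct mem_entry *e;
	int id = -1;

	assert(allocator); /* NULL is never a valid key in this model */

	pthread_mutex_lock(&table_lock);
	e = entry_find(allocator);
	if (e) {
		e->refcnt++;
		id = e->id;
		goto out;
	}
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (table[i].refcnt == 0) {
			table[i].allocator = allocator;
			table[i].id = next_id++;
			table[i].refcnt = 1;
			id = table[i].id;
			break;
		}
	}
out:
	pthread_mutex_unlock(&table_lock);
	return id;
}

/* Drop one user. Returns 1 if this was the last user, i.e. the point
 * where the kernel would schedule the (deferred) allocator destroy. */
int unreg_mem_model(int id)
{
	int last = 0;

	pthread_mutex_lock(&table_lock);
	for (int i = 0; i < MAX_ENTRIES; i++) {
		struct mem_entry *e = &table[i];

		if (e->refcnt && e->id == id) {
			last = (--e->refcnt == 0);
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return last;
}
```

In this model, two net_devices sharing one rxq would each call reg_mem_model() with the same pool pointer and get back the same id; only the second unreg_mem_model() reports that it was the last user.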

2019-07-01 12:40:37

by Jesper Dangaard Brouer

Subject: Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage

I'm very skeptical about this approach.

On Sun, 30 Jun 2019 20:23:43 +0300
Ivan Khoronzhuk <[email protected]> wrote:

> XDP rxqs can be the same for ndevs running under the same rx napi softirq.
> But there is no ability to register the same allocator for both rxqs,
> although in fact it can be the same rxq, only with a different ndev as a
> reference.

This description is not very clear. It can easily be misunderstood.

It is an absolute requirement that each RX-queue has its own
page_pool object/allocator. This is where the performance comes from, as
the page_pool has a NAPI-protected array for alloc and XDP_DROP recycling.

Your driver/hardware seems to have a special case, where a single
RX-queue can receive packets for two different net_devices.

Do you violate this XDP devmap redirect assumption[1]?
[1] https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329

> Due to recent changes, allocator destruction can be deferred until
> all packets are recycled by the destination interface, after which the
> allocator is freed. In order to schedule allocator destruction only after
> all users are unregistered, add a refcnt to the allocator object and
> schedule destruction only once it reaches 0.

The guiding principle when designing an API is to make it easy to
use, but also hard to misuse.

Your change makes the API easy to misuse, as it makes it easy to
(re)use the allocator pointer (likely a page_pool) for multiple
xdp_rxq_info structs. That is only valid for your use-case, because you
have hardware where a single RX-queue delivers to two different
net_devices. For other, normal use-cases this would be a violation.

If I were a user of this API and saw your xdp_allocator_get(), I
would assume that this was the normal case. At minimum, we need to add
a comment in the code about this specific, intended use-case. I
thought about detecting misuse by adding a queue_index to
xdp_mem_allocator, which can be checked when calling
xdp_rxq_info_reg_mem_model() with another xdp_rxq_info struct (to catch
the obvious mistake where the queue_index mismatches).
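The queue_index misuse check proposed above can be sketched in plain C as follows. The struct and function names (alloc_entry, rxq, share_allocator) are hypothetical simplifications rather than the kernel's definitions, and locking is deliberately left out so that only the check itself is shown.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct alloc_entry {
	void *allocator;
	unsigned int queue_index; /* rx queue that first registered it */
	unsigned long refcnt;
};

struct rxq {
	unsigned int queue_index;
};

/* Let a second rxq share an already-registered allocator only if it
 * sits on the same hardware RX-queue; anything else is rejected as a
 * likely driver bug. Locking is left out of this sketch. */
int share_allocator(struct alloc_entry *e, const struct rxq *rxq)
{
	assert(e->refcnt > 0); /* caller must pass a live entry */

	if (e->queue_index != rxq->queue_index)
		return -EINVAL;
	e->refcnt++;
	return 0;
}
```

Rejecting the mismatch before touching refcnt means a buggy caller leaves the shared entry unchanged.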

> [...]
> +static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)

API-wise, when you have a "get" operation, you usually also have a "put"
operation...

> [...]
> + xdp_alloc = xdp_allocator_get(allocator);
> + if (xdp_alloc) {
> + xdp_rxq->mem.id = xdp_alloc->mem.id;
> + return 0;
> + }
> +

In practice, the allocator pointer becomes the identifier for the
mem.id (which the rhashtable maps to the xdp_mem_allocator object).

> [...]

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2019-07-02 10:29:12

by Ivan Khoronzhuk

Subject: Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage

On Mon, Jul 01, 2019 at 01:40:59PM +0200, Jesper Dangaard Brouer wrote:
>
>I'm very skeptical about this approach.
>
>On Sun, 30 Jun 2019 20:23:43 +0300
>Ivan Khoronzhuk <[email protected]> wrote:
>
>> XDP rxqs can be the same for ndevs running under the same rx napi softirq.
>> But there is no ability to register the same allocator for both rxqs,
>> although in fact it can be the same rxq, only with a different ndev as a
>> reference.
>
>This description is not very clear. It can easily be misunderstood.
>
>It is an absolute requirement that each RX-queue has its own
>page_pool object/allocator. This is where the performance comes from, as
>the page_pool has a NAPI-protected array for alloc and XDP_DROP recycling.
>
>Your driver/hardware seems to have a special case, where a single
>RX-queue can receive packets for two different net_devices.
>
>Do you violate this XDP devmap redirect assumption[1]?
>[1] https://github.com/torvalds/linux/blob/v5.2-rc7/kernel/bpf/devmap.c#L324-L329
It seems so, yes, but for now that is only used for tracing.
As it runs under NAPI and the flush clears dev_rx, I must do it right in the
rx_handler. So the next patchset version will have one patch fewer.

Thanks!

>
>
>> Due to recent changes, allocator destruction can be deferred until
>> all packets are recycled by the destination interface, after which the
>> allocator is freed. In order to schedule allocator destruction only after
>> all users are unregistered, add a refcnt to the allocator object and
>> schedule destruction only once it reaches 0.
>
>The guiding principle when designing an API is to make it easy to
>use, but also hard to misuse.
>
>Your change makes the API easy to misuse, as it makes it easy to
>(re)use the allocator pointer (likely a page_pool) for multiple
>xdp_rxq_info structs. That is only valid for your use-case, because you
>have hardware where a single RX-queue delivers to two different
>net_devices. For other, normal use-cases this would be a violation.
>
>If I were a user of this API and saw your xdp_allocator_get(), I
>would assume that this was the normal case. At minimum, we need to add
>a comment in the code about this specific, intended use-case. I
>thought about detecting misuse by adding a queue_index to
>xdp_mem_allocator, which can be checked when calling
>xdp_rxq_info_reg_mem_model() with another xdp_rxq_info struct (to catch
>the obvious mistake where the queue_index mismatches).

I can add it, but I'm not sure whether it has, or could have, conflicts with
other memory allocators, now or in the future. The main thing here is not to
become a stumbling block in some non-obvious use-cases.

So, for now, let it be this way:

--- a/include/net/xdp_priv.h
+++ b/include/net/xdp_priv.h
@@ -19,6 +19,7 @@ struct xdp_mem_allocator {
struct delayed_work defer_wq;
unsigned long defer_warn;
unsigned long refcnt;
+ u32 queue_index;
};

#endif /* __LINUX_NET_XDP_PRIV_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index a44621190fdc..c4bf29810f4d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -324,7 +324,7 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
return true;
}

-static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
+static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
{
struct xdp_mem_allocator *xae, *xa = NULL;
struct rhashtable_iter iter;
@@ -336,7 +336,6 @@ static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)

while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
if (xae->allocator == allocator) {
- xae->refcnt++;
xa = xae;
break;
}
@@ -386,9 +385,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
}
}

- xdp_alloc = xdp_allocator_get(allocator);
+ xdp_alloc = xdp_allocator_find(allocator);
if (xdp_alloc) {
+ if (xdp_alloc->queue_index != xdp_rxq->queue_index)
+ return -EINVAL;
+
xdp_rxq->mem.id = xdp_alloc->mem.id;
+ xdp_alloc->refcnt++;
return 0;
}

@@ -406,6 +409,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
xdp_alloc->mem = xdp_rxq->mem;
xdp_alloc->allocator = allocator;
xdp_alloc->refcnt = 1;
+ xdp_alloc->queue_index = xdp_rxq->queue_index;

/* Insert allocator into ID lookup table */
ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);

Jesper, are you Ok with this version?

>
>
>> [...]
>> +static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
>
>API wise, when you have "get" operation, you usually also have a "put"
>operation...

It's not part of the external API.
I propose renaming it to xdp_allocator_find(), as in the diff above.
What do you think?

>
>> [...]
>> + xdp_alloc = xdp_allocator_get(allocator);
>> + if (xdp_alloc) {
>> + xdp_rxq->mem.id = xdp_alloc->mem.id;
>> + return 0;
>> + }
>> +
>
>In practice, the allocator pointer becomes the identifier for the
>mem.id (which the rhashtable maps to the xdp_mem_allocator object).

So, you have no objection to it?

[...]

--
Regards,
Ivan Khoronzhuk

2019-07-02 14:47:41

by Jesper Dangaard Brouer

Subject: Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage

On Tue, 2 Jul 2019 13:27:07 +0300
Ivan Khoronzhuk <[email protected]> wrote:

> [...]
> So, for now, let it be this way:
>
> --- a/include/net/xdp_priv.h
> +++ b/include/net/xdp_priv.h
> @@ -19,6 +19,7 @@ struct xdp_mem_allocator {
> struct delayed_work defer_wq;
> unsigned long defer_warn;
> unsigned long refcnt;
> + u32 queue_index;
> };
>
> #endif /* __LINUX_NET_XDP_PRIV_H__ */
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index a44621190fdc..c4bf29810f4d 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -324,7 +324,7 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
> return true;
> }
>
> -static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
> +static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
> {
> struct xdp_mem_allocator *xae, *xa = NULL;
> struct rhashtable_iter iter;
> @@ -336,7 +336,6 @@ static struct xdp_mem_allocator *xdp_allocator_get(void *allocator)
>
> while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
> if (xae->allocator == allocator) {
> - xae->refcnt++;
> xa = xae;
> break;
> }
> @@ -386,9 +385,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> }
> }
>
> - xdp_alloc = xdp_allocator_get(allocator);
> + xdp_alloc = xdp_allocator_find(allocator);
> if (xdp_alloc) {
> + if (xdp_alloc->queue_index != xdp_rxq->queue_index)
> + return -EINVAL;
> +
> xdp_rxq->mem.id = xdp_alloc->mem.id;
> + xdp_alloc->refcnt++;

This is now adjusted outside the lock; not good.

> return 0;
> }
>
> @@ -406,6 +409,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> xdp_alloc->mem = xdp_rxq->mem;
> xdp_alloc->allocator = allocator;
> xdp_alloc->refcnt = 1;
> + xdp_alloc->queue_index = xdp_rxq->queue_index;
>
> /* Insert allocator into ID lookup table */
> ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
>
> Jesper, are you Ok with this version?

Please see my other patch; it is based on our first refcnt attempt.
I think that other patch is a better way forward.

--
Best regards,
Jesper Dangaard Brouer

2019-07-02 14:54:35

by Ivan Khoronzhuk

Subject: Re: [PATCH v5 net-next 1/6] xdp: allow same allocator usage

On Tue, Jul 02, 2019 at 04:46:48PM +0200, Jesper Dangaard Brouer wrote:
>On Tue, 2 Jul 2019 13:27:07 +0300
>Ivan Khoronzhuk <[email protected]> wrote:
>
>> [...]
>> @@ -386,9 +385,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>> }
>> }
>>
>> - xdp_alloc = xdp_allocator_get(allocator);
>> + xdp_alloc = xdp_allocator_find(allocator);
>> if (xdp_alloc) {
>> + if (xdp_alloc->queue_index != xdp_rxq->queue_index)
>> + return -EINVAL;
>> +
>> xdp_rxq->mem.id = xdp_alloc->mem.id;
>> + xdp_alloc->refcnt++;
>
>This is now adjusted outside the lock; not good.

The final version looks like this:

From f43a0b85838f75814cc93e5a724c4c7e5615f936 Mon Sep 17 00:00:00 2001
From: Ivan Khoronzhuk <[email protected]>
Date: Fri, 28 Jun 2019 03:17:24 +0300
Subject: [PATCH] xdp: allow same allocator usage

XDP rxqs can be the same for ndevs running under the same rx napi softirq.
But there is no ability to register the same allocator for both rxqs,
although in fact it can be the same rxq, only with a different ndev as a
reference.

Due to recent changes, allocator destruction can be deferred until
all packets are recycled by the destination interface, after which the
allocator is freed. In order to schedule allocator destruction only after
all users are unregistered, add a refcnt to the allocator object and
schedule destruction only once it reaches 0.

Signed-off-by: Ivan Khoronzhuk <[email protected]>
---
include/net/xdp_priv.h | 2 ++
net/core/xdp.c | 52 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/include/net/xdp_priv.h b/include/net/xdp_priv.h
index 6a8cba6ea79a..9858a4057842 100644
--- a/include/net/xdp_priv.h
+++ b/include/net/xdp_priv.h
@@ -18,6 +18,8 @@ struct xdp_mem_allocator {
struct rcu_head rcu;
struct delayed_work defer_wq;
unsigned long defer_warn;
+ unsigned long refcnt;
+ u32 queue_index;
};

#endif /* __LINUX_NET_XDP_PRIV_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 829377cc83db..090f26e4f793 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -98,6 +98,18 @@ static bool __mem_id_disconnect(int id, bool force)
WARN(1, "Request remove non-existing id(%d), driver bug?", id);
return true;
}
+
+ /* to avoid calling hash lookup twice, decrement refcnt here till it
+ * reaches zero, then it can be called from workqueue afterwards.
+ */
+ if (xa->refcnt)
+ xa->refcnt--;
+
+ if (xa->refcnt) {
+ mutex_unlock(&mem_id_lock);
+ return true;
+ }
+
xa->disconnect_cnt++;

/* Detects in-flight packet-pages for page_pool */
@@ -312,6 +324,30 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
return true;
}

+static struct xdp_mem_allocator *xdp_allocator_find(void *allocator)
+{
+ struct xdp_mem_allocator *xae, *xa = NULL;
+ struct rhashtable_iter iter;
+
+ rhashtable_walk_enter(mem_id_ht, &iter);
+ do {
+ rhashtable_walk_start(&iter);
+
+ while ((xae = rhashtable_walk_next(&iter)) && !IS_ERR(xae)) {
+ if (xae->allocator == allocator) {
+ xa = xae;
+ break;
+ }
+ }
+
+ rhashtable_walk_stop(&iter);
+
+ } while (xae == ERR_PTR(-EAGAIN));
+ rhashtable_walk_exit(&iter);
+
+ return xa;
+}
+
int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
enum xdp_mem_type type, void *allocator)
{
@@ -347,6 +383,20 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
}
}

+ mutex_lock(&mem_id_lock);
+ xdp_alloc = xdp_allocator_find(allocator);
+ if (xdp_alloc) {
+ /* One allocator per queue is supposed only */
+ if (xdp_alloc->queue_index != xdp_rxq->queue_index)
+ return -EINVAL;
+
+ xdp_rxq->mem.id = xdp_alloc->mem.id;
+ xdp_alloc->refcnt++;
+ mutex_unlock(&mem_id_lock);
+ return 0;
+ }
+ mutex_unlock(&mem_id_lock);
+
xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
if (!xdp_alloc)
return -ENOMEM;
@@ -360,6 +410,8 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
xdp_rxq->mem.id = id;
xdp_alloc->mem = xdp_rxq->mem;
xdp_alloc->allocator = allocator;
+ xdp_alloc->refcnt = 1;
+ xdp_alloc->queue_index = xdp_rxq->queue_index;

/* Insert allocator into ID lookup table */
ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
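In the version above, the lookup and the refcnt increment both happen under mem_id_lock, which addresses the review comment about adjusting refcnt outside the lock. As posted, however, the `return -EINVAL` path returns with mem_id_lock still held. The following userspace sketch shows one way to keep lookup and increment under a single lock while releasing it on every exit path; it assumes a pthread mutex standing in for mem_id_lock, and find_and_get() and struct alloc_entry are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model: entries protected by a lock that stands in
 * for mem_id_lock. Not the kernel's definitions. */
struct alloc_entry {
	void *allocator;
	unsigned int queue_index;
	unsigned long refcnt; /* 0 means the slot is unused */
};

static pthread_mutex_t mem_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 and takes a reference, -EINVAL on queue_index mismatch,
 * or -ENOENT if the allocator is not registered. The lock is taken
 * once and released on a single exit path. */
int find_and_get(struct alloc_entry *entries, int n, void *allocator,
		 unsigned int queue_index)
{
	int ret = -ENOENT;

	assert(n >= 0);

	pthread_mutex_lock(&mem_lock);
	for (int i = 0; i < n; i++) {
		struct alloc_entry *e = &entries[i];

		if (e->refcnt == 0 || e->allocator != allocator)
			continue;
		if (e->queue_index != queue_index) {
			ret = -EINVAL; /* leave refcnt untouched */
			break;
		}
		e->refcnt++; /* lookup and increment under the same lock */
		ret = 0;
		break;
	}
	pthread_mutex_unlock(&mem_lock);
	return ret;
}
```

Funnelling every outcome through one unlock is a common way to avoid exactly this class of leak; a goto-based unlock label, as used elsewhere in net/core, would serve the same purpose.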

>> [...]
>> Jesper, are you Ok with this version?
>
>Please see my other patch; it is based on our first refcnt attempt.
>I think that other patch is a better way forward.
The XDP patch serves this better and can prevent not only object deletion
but also the pool flush. So I propose using two patches.

--
Regards,
Ivan Khoronzhuk