2017-06-28 20:35:44

by Jim Baxter

[permalink] [raw]
Subject: [PATCH V2 0/1] Reduce cdc_ncm memory use when kernel memory low

Problem
-------

We are using an ARM embedded platform and require 16KiB NTB's to allow for fast
data transfer. Unfortunately we have found that there are times after
running the kernel for a while and transferring a lot of data over the CDC-NCM
connection that it can become harder to find 16KiB pages of memory for
allocation.
This results in a disconnection of the NCM Gadget attached to the host platform.

We are running with reduced buffers to not cross over into the 32KiB page
boundary by setting the buffer sizes to:
tx_max=16000
rx_max=16000


Analysis
--------

We identified through investigation that the lack of 16KiB pages would be short
lived as the kernel would compact the buddy list soon after the failure which
results in pages being available within seconds.

Solution
--------

In order to avoid disconnections I implemented a patch that will attempt to
use a 2048 Byte minimum size NTB if the allocation of the maximum size NTB
fails.
This allows the connection to limp along until the memory has been recovered
which was usually between 1 and 4 NTB's on our heavy traffic system.
The algorithm will wait for an increasing number of small allocations each
time we have a failure to not burden a system short on memory.

---

V1: Sent to linux-usb for review.
V2: Added code to increase amount of time spent making small allocations to
reduce the burden on the system.

This is the diff between Version 1 and 2 of the patches.

-- File: drivers/net/usb/cdc_ncm.c
41c51,60
< @@ -1055,10 +1055,10 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct cdc_ncm_ctx *ctx, struct sk_
---
> @@ -89,6 +89,8 @@ struct cdc_ncm_stats {
> CDC_NCM_SIMPLE_STAT(rx_ntbs),
> };
>
> +#define CDC_NCM_LOW_MEM_MAX_CNT 10
> +
> static int cdc_ncm_get_sset_count(struct net_device __always_unused *netdev, int sset)
> {
> switch (sset) {
> @@ -1055,10 +1057,10 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct cdc_ncm_ctx *ctx, struct sk_
59,60c78,91
< + ctx->tx_curr_size = ctx->tx_max;
< + skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
---
> + if (ctx->tx_low_mem_val == 0) {
> + ctx->tx_curr_size = ctx->tx_max;
> + skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
> + /* If the memory allocation fails we will wait longer
> + * each time before attempting another full size
> + * allocation again to not overload the system
> + * further.
> + */
> + if (skb_out == NULL) {
> + ctx->tx_low_mem_max_cnt = min(ctx->tx_low_mem_max_cnt + 1,
> + (unsigned)CDC_NCM_LOW_MEM_MAX_CNT);
> + ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
> + }
> + }
84a116
> + ctx->tx_low_mem_val--;

-- File: include/linux/usb/cdc_ncm.h
130a163,164
> + u32 tx_low_mem_max_cnt;
> + u32 tx_low_mem_val;


Jim Baxter (1):
net: cdc_ncm: Reduce memory use when kernel memory low

drivers/net/usb/cdc_ncm.c | 54 +++++++++++++++++++++++++++++++++++----------
include/linux/usb/cdc_ncm.h | 3 +++
2 files changed, 45 insertions(+), 12 deletions(-)

--
1.9.1


2017-06-28 20:35:56

by Jim Baxter

[permalink] [raw]
Subject: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

The CDC-NCM driver can require large amounts of memory to create
skb's and this can be a problem when the memory becomes fragmented.

This especially affects embedded systems that have constrained
resources but wish to maximise the throughput of CDC-NCM with 16KiB
NTB's.

The issue is after running for a while the kernel memory can become
fragmented and it needs compacting.
If the NTB allocation is needed before the memory has been compacted
the atomic allocation can fail which can cause increased latency,
large re-transmissions or disconnections depending upon the data
being transmitted at the time.
This situation occurs for less than a second until the kernel has
compacted the memory but the failed devices can take a lot longer to
recover from the failed TX packets.

To ease this temporary situation I modified the CDC-NCM TX path to
temporarily switch into a reduced memory mode which allocates an NTB
that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
sized memory block and only transmit NTB's with a single network frame
until the memory situation is resolved.
Each time this issue occurs we wait for an increasing number of
reduced size allocations before requesting a full size one to not
put additional pressure on a low memory system.

Once the memory is compacted the CDC-NCM data can resume transmitting
at the normal tx_max rate once again.

Signed-off-by: Jim Baxter <[email protected]>

---

V1: Sent to linux-usb for review.
V2: Added code to increase amount of time spent making small allocations to
reduce the burden on the system.

drivers/net/usb/cdc_ncm.c | 54 +++++++++++++++++++++++++++++++++++----------
include/linux/usb/cdc_ncm.h | 3 +++
2 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index b5cec18..f9187d8 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -89,6 +89,8 @@ struct cdc_ncm_stats {
CDC_NCM_SIMPLE_STAT(rx_ntbs),
};

+#define CDC_NCM_LOW_MEM_MAX_CNT 10
+
static int cdc_ncm_get_sset_count(struct net_device __always_unused *netdev, int sset)
{
switch (sset) {
@@ -1055,10 +1057,10 @@ static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct cdc_ncm_ctx *ctx, struct sk_

/* align new NDP */
if (!(ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END))
- cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_max);
+ cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_curr_size);

/* verify that there is room for the NDP and the datagram (reserve) */
- if ((ctx->tx_max - skb->len - reserve) < ctx->max_ndp_size)
+ if ((ctx->tx_curr_size - skb->len - reserve) < ctx->max_ndp_size)
return NULL;

/* link to it */
@@ -1111,13 +1113,41 @@ struct sk_buff *

/* allocate a new OUT skb */
if (!skb_out) {
- skb_out = alloc_skb(ctx->tx_max, GFP_ATOMIC);
+ if (ctx->tx_low_mem_val == 0) {
+ ctx->tx_curr_size = ctx->tx_max;
+ skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+ /* If the memory allocation fails we will wait longer
+ * each time before attempting another full size
+ * allocation again to not overload the system
+ * further.
+ */
+ if (skb_out == NULL) {
+ ctx->tx_low_mem_max_cnt = min(ctx->tx_low_mem_max_cnt + 1,
+ (unsigned)CDC_NCM_LOW_MEM_MAX_CNT);
+ ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
+ }
+ }
if (skb_out == NULL) {
- if (skb != NULL) {
- dev_kfree_skb_any(skb);
- dev->net->stats.tx_dropped++;
+ /* See if a very small allocation is possible.
+ * We will send this packet immediately and hope
+ * that there is more memory available later.
+ */
+ if (skb)
+ ctx->tx_curr_size = max(skb->len,
+ (u32)USB_CDC_NCM_NTB_MIN_OUT_SIZE);
+ else
+ ctx->tx_curr_size = USB_CDC_NCM_NTB_MIN_OUT_SIZE;
+ skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+
+ /* No allocation possible so we will abort */
+ if (skb_out == NULL) {
+ if (skb != NULL) {
+ dev_kfree_skb_any(skb);
+ dev->net->stats.tx_dropped++;
+ }
+ goto exit_no_skb;
}
- goto exit_no_skb;
+ ctx->tx_low_mem_val--;
}
/* fill out the initial 16-bit NTB header */
nth16 = (struct usb_cdc_ncm_nth16 *)memset(skb_put(skb_out, sizeof(struct usb_cdc_ncm_nth16)), 0, sizeof(struct usb_cdc_ncm_nth16));
@@ -1148,10 +1178,10 @@ struct sk_buff *
ndp16 = cdc_ncm_ndp(ctx, skb_out, sign, skb->len + ctx->tx_modulus + ctx->tx_remainder);

/* align beginning of next frame */
- cdc_ncm_align_tail(skb_out, ctx->tx_modulus, ctx->tx_remainder, ctx->tx_max);
+ cdc_ncm_align_tail(skb_out, ctx->tx_modulus, ctx->tx_remainder, ctx->tx_curr_size);

/* check if we had enough room left for both NDP and frame */
- if (!ndp16 || skb_out->len + skb->len + delayed_ndp_size > ctx->tx_max) {
+ if (!ndp16 || skb_out->len + skb->len + delayed_ndp_size > ctx->tx_curr_size) {
if (n == 0) {
/* won't fit, MTU problem? */
dev_kfree_skb_any(skb);
@@ -1227,7 +1257,7 @@ struct sk_buff *
/* If requested, put NDP at end of frame. */
if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END) {
nth16 = (struct usb_cdc_ncm_nth16 *)skb_out->data;
- cdc_ncm_align_tail(skb_out, ctx->tx_ndp_modulus, 0, ctx->tx_max);
+ cdc_ncm_align_tail(skb_out, ctx->tx_ndp_modulus, 0, ctx->tx_curr_size);
nth16->wNdpIndex = cpu_to_le16(skb_out->len);
memcpy(skb_put(skb_out, ctx->max_ndp_size), ctx->delayed_ndp16, ctx->max_ndp_size);

@@ -1246,9 +1276,9 @@ struct sk_buff *
*/
if (!(dev->driver_info->flags & FLAG_SEND_ZLP) &&
skb_out->len > ctx->min_tx_pkt) {
- padding_count = ctx->tx_max - skb_out->len;
+ padding_count = ctx->tx_curr_size - skb_out->len;
memset(skb_put(skb_out, padding_count), 0, padding_count);
- } else if (skb_out->len < ctx->tx_max &&
+ } else if (skb_out->len < ctx->tx_curr_size &&
(skb_out->len % dev->maxpacket) == 0) {
*skb_put(skb_out, 1) = 0; /* force short packet */
}
diff --git a/include/linux/usb/cdc_ncm.h b/include/linux/usb/cdc_ncm.h
index 00d2324..021f7a8 100644
--- a/include/linux/usb/cdc_ncm.h
+++ b/include/linux/usb/cdc_ncm.h
@@ -117,6 +117,9 @@ struct cdc_ncm_ctx {
u32 tx_curr_frame_num;
u32 rx_max;
u32 tx_max;
+ u32 tx_curr_size;
+ u32 tx_low_mem_max_cnt;
+ u32 tx_low_mem_val;
u32 max_datagram_size;
u16 tx_max_datagrams;
u16 tx_remainder;
--
1.9.1

2017-06-30 16:59:57

by David Miller

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

From: Jim Baxter <[email protected]>
Date: Wed, 28 Jun 2017 21:35:29 +0100

> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
>
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.
>
> The issue is after running for a while the kernel memory can become
> fragmented and it needs compacting.
> If the NTB allocation is needed before the memory has been compacted
> the atomic allocation can fail which can cause increased latency,
> large re-transmissions or disconnections depending upon the data
> being transmitted at the time.
> This situation occurs for less than a second until the kernel has
> compacted the memory but the failed devices can take a lot longer to
> recover from the failed TX packets.
>
> To ease this temporary situation I modified the CDC-NCM TX path to
> temporarily switch into a reduced memory mode which allocates an NTB
> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
> sized memory block and only transmit NTB's with a single network frame
> until the memory situation is resolved.
> Each time this issue occurs we wait for an increasing number of
> reduced size allocations before requesting a full size one to not
> put additional pressure on a low memory system.
>
> Once the memory is compacted the CDC-NCM data can resume transmitting
> at the normal tx_max rate once again.
>
> Signed-off-by: Jim Baxter <[email protected]>

If someone could review this patch, I remember this issue being discussed
a while ago, I would really appreciate it.

2017-06-30 17:03:36

by Jim Baxter

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low


--------------------------------------------------------------------------------
From: David S. Miller ([email protected])
Sent: Fri, 30 Jun 2017 12:59:53 -0400
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low




> From: Jim Baxter <[email protected]>
> Date: Wed, 28 Jun 2017 21:35:29 +0100
>
>> The CDC-NCM driver can require large amounts of memory to create
>> skb's and this can be a problem when the memory becomes fragmented.
>>
>> This especially affects embedded systems that have constrained
>> resources but wish to maximise the throughput of CDC-NCM with 16KiB
>> NTB's.
>>
>> The issue is after running for a while the kernel memory can become
>> fragmented and it needs compacting.
>> If the NTB allocation is needed before the memory has been compacted
>> the atomic allocation can fail which can cause increased latency,
>> large re-transmissions or disconnections depending upon the data
>> being transmitted at the time.
>> This situation occurs for less than a second until the kernel has
>> compacted the memory but the failed devices can take a lot longer to
>> recover from the failed TX packets.
>>
>> To ease this temporary situation I modified the CDC-NCM TX path to
>> temporarily switch into a reduced memory mode which allocates an NTB
>> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
>> sized memory block and only transmit NTB's with a single network frame
>> until the memory situation is resolved.
>> Each time this issue occurs we wait for an increasing number of
>> reduced size allocations before requesting a full size one to not
>> put additional pressure on a low memory system.
>>
>> Once the memory is compacted the CDC-NCM data can resume transmitting
>> at the normal tx_max rate once again.
>>
>> Signed-off-by: Jim Baxter <[email protected]>
>
> If someone could review this patch, I remember this issue being discussed
> a while ago, I would really appreciate it.
>

Hello,

For reference this replaces the original discussion in
http://patchwork.ozlabs.org/patch/763100/ and
http://patchwork.ozlabs.org/patch/766181/

Best regards,
Jim

2017-06-30 17:38:55

by Bjørn Mork

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

Jim Baxter <[email protected]> writes:

> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
>
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.
>
> The issue is after running for a while the kernel memory can become
> fragmented and it needs compacting.
> If the NTB allocation is needed before the memory has been compacted
> the atomic allocation can fail which can cause increased latency,
> large re-transmissions or disconnections depending upon the data
> being transmitted at the time.
> This situation occurs for less than a second until the kernel has
> compacted the memory but the failed devices can take a lot longer to
> recover from the failed TX packets.
>
> To ease this temporary situation I modified the CDC-NCM TX path to
> temporarily switch into a reduced memory mode which allocates an NTB
> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
> sized memory block and only transmit NTB's with a single network frame
> until the memory situation is resolved.
> Each time this issue occurs we wait for an increasing number of
> reduced size allocations before requesting a full size one to not
> put additional pressure on a low memory system.
>
> Once the memory is compacted the CDC-NCM data can resume transmitting
> at the normal tx_max rate once again.
>
> Signed-off-by: Jim Baxter <[email protected]>

This looks good to me.

I would still be happier if we didn't need something like this, but I
understand that we do. And this patch looks as clean as it can get. I
haven't tested the patch under any sort of memory pressure, but I did a
basic runtime test on a single MBIM device. As expected, I did not
notice any changes with this patch applied.

But regarding noticable effects: The patch adds no printks, counters or
sysfs attributes which could tell the user that the initial buffer
allocation has failed. Maybe add some sort of debug helper(s) in a
followup patch? How did you verify the patch operation while testing it?

Anyway, that's no show stopper of course. So FWIW:

Reviewed-by: Bjørn Mork <[email protected]>


2017-06-30 17:54:03

by Jim Baxter

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low


> Jim Baxter <[email protected]> writes:
>
>> The CDC-NCM driver can require large amounts of memory to create
>> skb's and this can be a problem when the memory becomes fragmented.
>>
>> This especially affects embedded systems that have constrained
>> resources but wish to maximise the throughput of CDC-NCM with 16KiB
>> NTB's.
>>
>> The issue is after running for a while the kernel memory can become
>> fragmented and it needs compacting.
>> If the NTB allocation is needed before the memory has been compacted
>> the atomic allocation can fail which can cause increased latency,
>> large re-transmissions or disconnections depending upon the data
>> being transmitted at the time.
>> This situation occurs for less than a second until the kernel has
>> compacted the memory but the failed devices can take a lot longer to
>> recover from the failed TX packets.
>>
>> To ease this temporary situation I modified the CDC-NCM TX path to
>> temporarily switch into a reduced memory mode which allocates an NTB
>> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
>> sized memory block and only transmit NTB's with a single network frame
>> until the memory situation is resolved.
>> Each time this issue occurs we wait for an increasing number of
>> reduced size allocations before requesting a full size one to not
>> put additional pressure on a low memory system.
>>
>> Once the memory is compacted the CDC-NCM data can resume transmitting
>> at the normal tx_max rate once again.
>>
>> Signed-off-by: Jim Baxter <[email protected]>
>
> This looks good to me.
>
> I would still be happier if we didn't need something like this, but I
> understand that we do. And this patch looks as clean as it can get. I
> haven't tested the patch under any sort of memory pressure, but I did a
> basic runtime test on a single MBIM device. As expected, I did not
> notice any changes with this patch applied.
>
> But regarding noticable effects: The patch adds no printks, counters or
> sysfs attributes which could tell the user that the initial buffer
> allocation has failed. Maybe add some sort of debug helper(s) in a
> followup patch? How did you verify the patch operation while testing it?
>
> Anyway, that's no show stopper of course. So FWIW:
>
> Reviewed-by: Bjørn Mork <[email protected]>
>

Hello Bjørn,

I tested this with printk's to show when the low memory code was triggered
and the value of ctx->tx_low_mem_val and ctx->tx_low_mem_max_cnt.
I created a workqueue that slowly used up the atomic memory until the
code is triggered.

I could add debug prints, though I have noticed that cdc_ncm_fill_tx_frame()
does not currently have any debug prints do you think this is because it can be
called in an atomic context and I think debug messages if enabled could cause
too great a delay?

Regards,
Jim

2017-06-30 18:03:56

by Bjørn Mork

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

"Baxter, Jim" <[email protected]> writes:

> I tested this with printk's to show when the low memory code was triggered
> and the value of ctx->tx_low_mem_val and ctx->tx_low_mem_max_cnt.
> I created a workqueue that slowly used up the atomic memory until the
> code is triggered.
>
> I could add debug prints, though I have noticed that cdc_ncm_fill_tx_frame()
> does not currently have any debug prints do you think this is because it can be
> called in an atomic context and I think debug messages if enabled could cause
> too great a delay?

Yes, I guess you're right. Maybe count the number of failed allocations
and export it along with the other driver private counters? Or export
the tx_curr_size as a sysfs attribute? Or both?

Just an idea... I don't expect to see this code ever being hit on my
systems :-)



Bjørn

2017-07-03 08:51:17

by David Miller

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

From: Jim Baxter <[email protected]>
Date: Wed, 28 Jun 2017 21:35:29 +0100

> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
>
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.
>
> The issue is after running for a while the kernel memory can become
> fragmented and it needs compacting.
> If the NTB allocation is needed before the memory has been compacted
> the atomic allocation can fail which can cause increased latency,
> large re-transmissions or disconnections depending upon the data
> being transmitted at the time.
> This situation occurs for less than a second until the kernel has
> compacted the memory but the failed devices can take a lot longer to
> recover from the failed TX packets.
>
> To ease this temporary situation I modified the CDC-NCM TX path to
> temporarily switch into a reduced memory mode which allocates an NTB
> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
> sized memory block and only transmit NTB's with a single network frame
> until the memory situation is resolved.
> Each time this issue occurs we wait for an increasing number of
> reduced size allocations before requesting a full size one to not
> put additional pressure on a low memory system.
>
> Once the memory is compacted the CDC-NCM data can resume transmitting
> at the normal tx_max rate once again.
>
> Signed-off-by: Jim Baxter <[email protected]>

Patch applied, thanks.