This patch set addresses a couple of issues that I had pointed out to Jann
Horn in response to a recent patch submission.
The first issue is that I wanted to avoid the need to read/modify/write the
size value in order to generate the value for pagecnt_bias. Instead we can
just use a fixed constant, which avoids the memory reads and reduces the
overall number of instructions needed to update pagecnt_bias.
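To make the intent concrete, here is the gist in simplified form (not the
exact hunks, which follow below):

	/* before: size has to be reloaded and incremented on every refill */
	nc->pagecnt_bias = size + 1;

	/* after: a build-time constant that can be encoded as an immediate */
	nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;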
The other, and more important, issue is that apparently we were letting tun
access the napi_alloc_cache indirectly through netdev_alloc_frag and, as a
result, letting it create unaligned accesses via unaligned allocations. To
prevent this I have added a call to SKB_DATA_ALIGN for the fragsz value so
that the offset into the napi_alloc_cache stays SMP_CACHE_BYTES aligned.
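For reference, SKB_DATA_ALIGN() just rounds its argument up to a multiple of
SMP_CACHE_BYTES; the definition in include/linux/skbuff.h boils down to
something equivalent to:

	#define SKB_DATA_ALIGN(X)	ALIGN(X, SMP_CACHE_BYTES)

so any fragsz handed to the page fragment cache becomes a cache-line
multiple before the offset is computed.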
---
Alexander Duyck (2):
mm: Use fixed constant in page_frag_alloc instead of size + 1
net: Do not allocate page fragments that are not skb aligned
mm/page_alloc.c | 8 ++++----
net/core/skbuff.c | 4 ++++
2 files changed, 8 insertions(+), 4 deletions(-)
--
From: Alexander Duyck <[email protected]>
This patch replaces the "size + 1" value, introduced with the recent fix for
1-byte allocs, with a constant value.
The idea here is to reduce code overhead as the previous logic would have
to read size into a register, then increment it, and write it back to
whatever field was being used. By using a constant we can avoid those
memory reads and arithmetic operations in favor of just encoding the
maximum value into the operation itself.
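For reference, the constant used below comes from include/linux/mm_types.h
and is already a build-time constant:

	#define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
	#define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)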
Fixes: 2c2ade81741c ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
Signed-off-by: Alexander Duyck <[email protected]>
---
mm/page_alloc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ebb35e4d0d90..37ed14ad0b59 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4857,11 +4857,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
/* Even if we own the page, we do not use atomic_set().
* This would break get_page_unless_zero() users.
*/
- page_ref_add(page, size);
+ page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
/* reset page count bias and offset to start of new frag */
nc->pfmemalloc = page_is_pfmemalloc(page);
- nc->pagecnt_bias = size + 1;
+ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
nc->offset = size;
}
@@ -4877,10 +4877,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
size = nc->size;
#endif
/* OK, page count is 0, we can safely set it */
- set_page_count(page, size + 1);
+ set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
/* reset page count bias and offset to start of new frag */
- nc->pagecnt_bias = size + 1;
+ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
offset = size - fragsz;
}
From: Alexander Duyck <[email protected]>
This patch addresses the fact that there are drivers, specifically tun, that
will call into the network page fragment allocators with buffer sizes that
are not cache aligned. Doing this could result in data alignment and DMA
performance issues, as these fragment pools are also shared with the skb
allocator and any other callers of napi_alloc_frag or netdev_alloc_frag.
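As a rough illustration (numbers assume SMP_CACHE_BYTES == 64 and are not
taken from the patch):

	SKB_DATA_ALIGN(1234) == 1280	/* 20 * 64 */

	/* in page_frag_alloc(), starting from a 32768-byte cache: */
	offset = 32768 - 1280;		/* 31488, still a multiple of 64 */
	offset = 32768 - 1234;		/* 31534, would start mid cache line */

so rounding fragsz up keeps every buffer handed out from the shared cache on
a cache-line boundary.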
Fixes: ffde7328a36d ("net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag")
Reported-by: Jann Horn <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
---
net/core/skbuff.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 26d848484912..2415d9cb9b89 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -356,6 +356,8 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
*/
void *netdev_alloc_frag(unsigned int fragsz)
{
+ fragsz = SKB_DATA_ALIGN(fragsz);
+
return __netdev_alloc_frag(fragsz, GFP_ATOMIC);
}
EXPORT_SYMBOL(netdev_alloc_frag);
@@ -369,6 +371,8 @@ static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
void *napi_alloc_frag(unsigned int fragsz)
{
+ fragsz = SKB_DATA_ALIGN(fragsz);
+
return __napi_alloc_frag(fragsz, GFP_ATOMIC);
}
EXPORT_SYMBOL(napi_alloc_frag);
From: Alexander Duyck <[email protected]>
Date: Fri, 15 Feb 2019 14:44:05 -0800
> This patch set addresses a couple of issues that I had pointed out to Jann
> Horn in response to a recent patch submission.
>
> The first issue is that I wanted to avoid the need to read/modify/write the
> size value in order to generate the value for pagecnt_bias. Instead we can
> just use a fixed constant, which avoids the memory reads and reduces the
> overall number of instructions needed to update pagecnt_bias.
>
> The other, and more important, issue is that apparently we were letting tun
> access the napi_alloc_cache indirectly through netdev_alloc_frag and, as a
> result, letting it create unaligned accesses via unaligned allocations. To
> prevent this I have added a call to SKB_DATA_ALIGN for the fragsz value so
> that the offset into the napi_alloc_cache stays SMP_CACHE_BYTES aligned.
Series applied, thanks Alexander.
On 2/15/19 23:44, Alexander Duyck wrote:
> From: Alexander Duyck <[email protected]>
>
> This patch replaces the "size + 1" value, introduced with the recent fix for
> 1-byte allocs, with a constant value.
>
> The idea here is to reduce code overhead as the previous logic would have
> to read size into a register, then increment it, and write it back to
> whatever field was being used. By using a constant we can avoid those
> memory reads and arithmetic operations in favor of just encoding the
> maximum value into the operation itself.
>
> Fixes: 2c2ade81741c ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
> Signed-off-by: Alexander Duyck <[email protected]>
> ---
> mm/page_alloc.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ebb35e4d0d90..37ed14ad0b59 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4857,11 +4857,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> /* Even if we own the page, we do not use atomic_set().
> * This would break get_page_unless_zero() users.
> */
> - page_ref_add(page, size);
> + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
But this value can be theoretically too low when PAGE_SIZE >
PAGE_FRAG_CACHE_MAX_SIZE? Such as on architectures with 64kB page size,
while PAGE_FRAG_CACHE_MAX_SIZE is 32kB?
Maybe impossible to exploit in practice thanks to the minimum alignment, but
still IMHO we should be using the larger of PAGE_FRAG_CACHE_MAX_SIZE and
PAGE_SIZE, which should still be a build-time constant, so it would not
defeat the optimization.
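Something along these lines (untested sketch; PAGE_FRAG_CACHE_REF_BIAS is
just a placeholder name, not an existing macro):

	/* pick the larger of the two at build time so the immediate still works */
	#define PAGE_FRAG_CACHE_REF_BIAS				\
		(PAGE_FRAG_CACHE_MAX_SIZE > PAGE_SIZE ?			\
		 PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE)

	page_ref_add(page, PAGE_FRAG_CACHE_REF_BIAS);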
>
> /* reset page count bias and offset to start of new frag */
> nc->pfmemalloc = page_is_pfmemalloc(page);
> - nc->pagecnt_bias = size + 1;
> + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> nc->offset = size;
> }
>
> @@ -4877,10 +4877,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> size = nc->size;
> #endif
> /* OK, page count is 0, we can safely set it */
> - set_page_count(page, size + 1);
> + set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>
> /* reset page count bias and offset to start of new frag */
> - nc->pagecnt_bias = size + 1;
> + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> offset = size - fragsz;
> }
>
>
On 2/17/23 10:30, Vlastimil Babka wrote:
> On 2/15/19 23:44, Alexander Duyck wrote:
>> From: Alexander Duyck <[email protected]>
>>
>> This patch replaces the "size + 1" value, introduced with the recent fix for
>> 1-byte allocs, with a constant value.
>>
>> The idea here is to reduce code overhead as the previous logic would have
>> to read size into a register, then increment it, and write it back to
>> whatever field was being used. By using a constant we can avoid those
>> memory reads and arithmetic operations in favor of just encoding the
>> maximum value into the operation itself.
>>
>> Fixes: 2c2ade81741c ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
>> Signed-off-by: Alexander Duyck <[email protected]>
>> ---
>> mm/page_alloc.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ebb35e4d0d90..37ed14ad0b59 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4857,11 +4857,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>> /* Even if we own the page, we do not use atomic_set().
>> * This would break get_page_unless_zero() users.
>> */
>> - page_ref_add(page, size);
>> + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>
> But this value can be theoretically too low when PAGE_SIZE >
> PAGE_FRAG_CACHE_MAX_SIZE? Such as on architectures with 64kB page size,
> while PAGE_FRAG_CACHE_MAX_SIZE is 32kB?
Nevermind, PAGE_FRAG_CACHE_MAX_SIZE would be 64kB because
#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK)
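Expanding that with the generic __ALIGN_MASK(x, mask) == (((x) + (mask)) & ~(mask))
and a 64kB PAGE_SIZE (so ~PAGE_MASK == 0xffff):

	(32768 + 0xffff) & ~0xffff == 0x10000 == 65536

i.e. PAGE_FRAG_CACHE_MAX_SIZE grows to match PAGE_SIZE in that configuration.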
So all is fine, sorry for the noise.