Syzkaller reported the following issue:
=======================================
Too BIG xdp->frame_sz = 131072
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
...
Call Trace:
<TASK>
bpf_prog_4add87e5301a4105+0x1a/0x1c
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run_xdp include/linux/filter.h:775 [inline]
bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
netif_receive_generic_xdp net/core/dev.c:4807 [inline]
do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
call_write_iter include/linux/fs.h:1871 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x650/0xe40 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87
("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But
tun_get_user() still provides an execution path with do_xdp_generic()
and exceed XDP limits for packet size.
Using the syzkaller repro with reduced packet size it was also
discovered that XDP_PACKET_HEADROOM is not checked in
tun_can_build_skb(), although pad may be incremented in
tun_build_skb().
If we move the limit check from tun_can_build_skb() to tun_build_skb()
we will make xdp to be used only in tun_build_skb(), without falling
in tun_alloc_skb(), etc. And moreover we will drop the packet which
can't be processed in tun_build_skb().
Reported-and-tested-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/T/
Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f
Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
Signed-off-by: Andrew Kanner <[email protected]>
---
Notes:
V2 -> V3:
* attach the forgotten changelog
V1 -> V2:
* merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with
syzkaller repro and missing XDP_PACKET_HEADROOM in pad
* changed the title and description of the execution path, suggested
by Jason Wang <[email protected]>
* move the limit check from tun_can_build_skb() to tun_build_skb() to
remove duplication and locking issue, and also drop the packet in
case of a failed check - noted by Jason Wang <[email protected]>
drivers/net/tun.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d75456adc62a..7c2b05ce0421 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
if (zerocopy)
return false;
- if (SKB_DATA_ALIGN(len + TUN_RX_PAD) +
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
- return false;
-
return true;
}
@@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
buflen += SKB_DATA_ALIGN(len + pad);
rcu_read_unlock();
+ if (buflen > PAGE_SIZE)
+ return ERR_PTR(-EFAULT);
+
alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
return ERR_PTR(-ENOMEM);
--
2.39.3
On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner <[email protected]> wrote:
>
> Syzkaller reported the following issue:
> =======================================
> Too BIG xdp->frame_sz = 131072
> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
> ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
> bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
> ...
> Call Trace:
> <TASK>
> bpf_prog_4add87e5301a4105+0x1a/0x1c
> __bpf_prog_run include/linux/filter.h:600 [inline]
> bpf_prog_run_xdp include/linux/filter.h:775 [inline]
> bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
> netif_receive_generic_xdp net/core/dev.c:4807 [inline]
> do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
> tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
> call_write_iter include/linux/fs.h:1871 [inline]
> new_sync_write fs/read_write.c:491 [inline]
> vfs_write+0x650/0xe40 fs/read_write.c:584
> ksys_write+0x12f/0x250 fs/read_write.c:637
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87
> ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But
> tun_get_user() still provides an execution path with do_xdp_generic()
> and exceed XDP limits for packet size.
>
> Using the syzkaller repro with reduced packet size it was also
> discovered that XDP_PACKET_HEADROOM is not checked in
> tun_can_build_skb(), although pad may be incremented in
> tun_build_skb().
>
> If we move the limit check from tun_can_build_skb() to tun_build_skb()
> we will make xdp to be used only in tun_build_skb(), without falling
> in tun_alloc_skb(), etc. And moreover we will drop the packet which
> can't be processed in tun_build_skb().
>
> Reported-and-tested-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/T/
> Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f
> Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
> Signed-off-by: Andrew Kanner <[email protected]>
> ---
>
> Notes:
> V2 -> V3:
> * attach the forgotten changelog
> V1 -> V2:
> * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with
> syzkaller repro and missing XDP_PACKET_HEADROOM in pad
> * changed the title and description of the execution path, suggested
> by Jason Wang <[email protected]>
> * move the limit check from tun_can_build_skb() to tun_build_skb() to
> remove duplication and locking issue, and also drop the packet in
> case of a failed check - noted by Jason Wang <[email protected]>
Acked-by: Jason Wang <[email protected]>
Thanks
>
> drivers/net/tun.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d75456adc62a..7c2b05ce0421 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
> if (zerocopy)
> return false;
>
> - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) +
> - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
> - return false;
> -
> return true;
> }
>
> @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
> buflen += SKB_DATA_ALIGN(len + pad);
> rcu_read_unlock();
>
> + if (buflen > PAGE_SIZE)
> + return ERR_PTR(-EFAULT);
> +
> alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
> return ERR_PTR(-ENOMEM);
> --
> 2.39.3
>
On Wed, Jul 26, 2023 at 10:09:53AM +0800, Jason Wang wrote:
> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner <[email protected]> wrote:
> >
> > Syzkaller reported the following issue:
> > =======================================
> > Too BIG xdp->frame_sz = 131072
> > WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
> > ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
> > WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
> > bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
> > ...
> > Call Trace:
> > <TASK>
> > bpf_prog_4add87e5301a4105+0x1a/0x1c
> > __bpf_prog_run include/linux/filter.h:600 [inline]
> > bpf_prog_run_xdp include/linux/filter.h:775 [inline]
> > bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
> > netif_receive_generic_xdp net/core/dev.c:4807 [inline]
> > do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
> > tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
> > tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
> > call_write_iter include/linux/fs.h:1871 [inline]
> > new_sync_write fs/read_write.c:491 [inline]
> > vfs_write+0x650/0xe40 fs/read_write.c:584
> > ksys_write+0x12f/0x250 fs/read_write.c:637
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87
> > ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But
> > tun_get_user() still provides an execution path with do_xdp_generic()
> > and exceed XDP limits for packet size.
> >
> > Using the syzkaller repro with reduced packet size it was also
> > discovered that XDP_PACKET_HEADROOM is not checked in
> > tun_can_build_skb(), although pad may be incremented in
> > tun_build_skb().
> >
> > If we move the limit check from tun_can_build_skb() to tun_build_skb()
> > we will make xdp to be used only in tun_build_skb(), without falling
> > in tun_alloc_skb(), etc. And moreover we will drop the packet which
> > can't be processed in tun_build_skb().
> >
> > Reported-and-tested-by: [email protected]
> > Closes: https://lore.kernel.org/all/[email protected]/T/
> > Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f
> > Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
> > Signed-off-by: Andrew Kanner <[email protected]>
> > ---
> >
> > Notes:
> > V2 -> V3:
> > * attach the forgotten changelog
> > V1 -> V2:
> > * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with
> > syzkaller repro and missing XDP_PACKET_HEADROOM in pad
> > * changed the title and description of the execution path, suggested
> > by Jason Wang <[email protected]>
> > * move the limit check from tun_can_build_skb() to tun_build_skb() to
> > remove duplication and locking issue, and also drop the packet in
> > case of a failed check - noted by Jason Wang <[email protected]>
>
> Acked-by: Jason Wang <[email protected]>
>
> Thanks
>
> >
> > drivers/net/tun.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index d75456adc62a..7c2b05ce0421 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
> > if (zerocopy)
> > return false;
> >
> > - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) +
> > - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
> > - return false;
> > -
> > return true;
> > }
> >
> > @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
> > buflen += SKB_DATA_ALIGN(len + pad);
> > rcu_read_unlock();
> >
> > + if (buflen > PAGE_SIZE)
> > + return ERR_PTR(-EFAULT);
> > +
> > alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> > if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
> > return ERR_PTR(-ENOMEM);
> > --
> > 2.39.3
> >
>
Thanks, Jason.
Can anyone point me to some tests other than
tools/testing/selftests/net/tun.c?
This one shows:
PASSED: 5 / 5 tests passed.
I'm trying to figure out if we're dropping more packets than expected
with this patch. Not sure if the test above is enough.
--
Andrew Kanner
Cc. John and Ahern
On 26/07/2023 04.09, Jason Wang wrote:
> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner <[email protected]> wrote:
>>
>> Syzkaller reported the following issue:
>> =======================================
>> Too BIG xdp->frame_sz = 131072
Is this a contiguous physical memory allocation?
131072 bytes equal order 5 page.
Looking at tun.c code I cannot find a code path that could create
order-5 skb->data, but only SKB with order-0 fragments. But I guess it
is the netif_receive_generic_xdp() what will realloc to make this linear
(via skb_linearize())
>> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
>> ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
>> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
>> bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
>> ...
>> Call Trace:
>> <TASK>
>> bpf_prog_4add87e5301a4105+0x1a/0x1c
>> __bpf_prog_run include/linux/filter.h:600 [inline]
>> bpf_prog_run_xdp include/linux/filter.h:775 [inline]
>> bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
>> netif_receive_generic_xdp net/core/dev.c:4807 [inline]
>> do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
>> tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
>> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
>> call_write_iter include/linux/fs.h:1871 [inline]
>> new_sync_write fs/read_write.c:491 [inline]
>> vfs_write+0x650/0xe40 fs/read_write.c:584
>> ksys_write+0x12f/0x250 fs/read_write.c:637
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87
>> ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But
>> tun_get_user() still provides an execution path with do_xdp_generic()
>> and exceed XDP limits for packet size.
I added this check and maybe it is too strict. XDP can work on higher
order pages, as long as this is contiguous physical memory (e.g. a
page). And
An order 5 page (131072 bytes) seems excessive, but maybe TUN have a
use-case for having such large packets? (Question to Ahern?)
I'm considering we should change the size-limit to order-2 (16384) or
order-3 (32768).
Order-3 because netstack have:
#define SKB_FRAG_PAGE_ORDER get_order(32768)
And order-2 because netstack have: SKB_MAX_ALLOC (16KiB)
- See discussion in commit 6306c1189e77 ("bpf: Remove MTU check in
__bpf_skb_max_len").
- https://git.kernel.org/torvalds/c/6306c1189e77
>>
>> Using the syzkaller repro with reduced packet size it was also
>> discovered that XDP_PACKET_HEADROOM is not checked in
>> tun_can_build_skb(), although pad may be incremented in
>> tun_build_skb().
>>
>> If we move the limit check from tun_can_build_skb() to tun_build_skb()
>> we will make xdp to be used only in tun_build_skb(), without falling
>> in tun_alloc_skb(), etc. And moreover we will drop the packet which
>> can't be processed in tun_build_skb().
Looking at tun_build_skb() is uses the page_frag system, and can thus
create up-to SKB_FRAG_PAGE_ORDER (size 32768 / order-3).
>>
>> Reported-and-tested-by: [email protected]
>> Closes: https://lore.kernel.org/all/[email protected]/T/
>> Link: https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f
>> Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
>> Signed-off-by: Andrew Kanner <[email protected]>
>> ---
>>
>> Notes:
>> V2 -> V3:
>> * attach the forgotten changelog
>> V1 -> V2:
>> * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with
>> syzkaller repro and missing XDP_PACKET_HEADROOM in pad
>> * changed the title and description of the execution path, suggested
>> by Jason Wang <[email protected]>
>> * move the limit check from tun_can_build_skb() to tun_build_skb() to
>> remove duplication and locking issue, and also drop the packet in
>> case of a failed check - noted by Jason Wang <[email protected]>
>
> Acked-by: Jason Wang <[email protected]>
>
> Thanks
>
>>
>> drivers/net/tun.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d75456adc62a..7c2b05ce0421 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
>> if (zerocopy)
>> return false;
>>
>> - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) +
>> - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
>> - return false;
>> -
>> return true;
>> }
>>
>> @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>> buflen += SKB_DATA_ALIGN(len + pad);
>> rcu_read_unlock();
>>
>> + if (buflen > PAGE_SIZE)
>> + return ERR_PTR(-EFAULT);
Concretely I'm saying maybe use SKB_FRAG_PAGE_ORDER "size" here?
e.g. create SKB_FRAG_PAGE_SIZE define as below.
if (buflen > SKB_FRAG_PAGE_SIZE)
diff --git a/include/net/sock.h b/include/net/sock.h
index 656ea89f60ff..4c4b3c257b52 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2886,7 +2886,8 @@ extern int sysctl_optmem_max;
extern __u32 sysctl_wmem_default;
extern __u32 sysctl_rmem_default;
-#define SKB_FRAG_PAGE_ORDER get_order(32768)
+#define SKB_FRAG_PAGE_SIZE 32768
+#define SKB_FRAG_PAGE_ORDER get_order(SKB_FRAG_PAGE_SIZE)
DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>> +
>> alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
>> if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
>> return ERR_PTR(-ENOMEM);
--Jesper
On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
> Cc. John and Ahern
>
> On 26/07/2023 04.09, Jason Wang wrote:
>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner
>> <[email protected]> wrote:
>>>
>>> Syzkaller reported the following issue:
>>> =======================================
>>> Too BIG xdp->frame_sz = 131072
>
> Is this a contiguous physical memory allocation?
>
> 131072 bytes equal order 5 page.
>
> Looking at tun.c code I cannot find a code path that could create
> order-5 skb->data, but only SKB with order-0 fragments. But I guess it
> is the netif_receive_generic_xdp() what will realloc to make this linear
> (via skb_linearize())
get_tun_user is passed an iov_iter with a single segment of 65007
total_len. The alloc_skb path is hit with an align size of only 64. That
is insufficient for XDP so the netif_receive_generic_xdp hits the
pskb_expand_head path. Something is off in the math in
netif_receive_generic_xdp resulting in the skb markers being off. That
causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
>
>>> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
>>> ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
>>> WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
>>> bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
>>> ...
>>> Call Trace:
>>> <TASK>
>>> bpf_prog_4add87e5301a4105+0x1a/0x1c
>>> __bpf_prog_run include/linux/filter.h:600 [inline]
>>> bpf_prog_run_xdp include/linux/filter.h:775 [inline]
>>> bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
>>> netif_receive_generic_xdp net/core/dev.c:4807 [inline]
>>> do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
>>> tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
>>> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
>>> call_write_iter include/linux/fs.h:1871 [inline]
>>> new_sync_write fs/read_write.c:491 [inline]
>>> vfs_write+0x650/0xe40 fs/read_write.c:584
>>> ksys_write+0x12f/0x250 fs/read_write.c:637
>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>> xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87
>>> ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But
>>> tun_get_user() still provides an execution path with do_xdp_generic()
>>> and exceed XDP limits for packet size.
>
> I added this check and maybe it is too strict. XDP can work on higher
> order pages, as long as this is contiguous physical memory (e.g. a
> page). And
>
> An order 5 page (131072 bytes) seems excessive, but maybe TUN have a
> use-case for having such large packets? (Question to Ahern?)
>
> I'm considering we should change the size-limit to order-2 (16384) or
> order-3 (32768).
>
> Order-3 because netstack have:
> #define SKB_FRAG_PAGE_ORDER get_order(32768)
>
> And order-2 because netstack have: SKB_MAX_ALLOC (16KiB)
> - See discussion in commit 6306c1189e77 ("bpf: Remove MTU check in
> __bpf_skb_max_len").
> - https://git.kernel.org/torvalds/c/6306c1189e77
>
>
>>>
>>> Using the syzkaller repro with reduced packet size it was also
>>> discovered that XDP_PACKET_HEADROOM is not checked in
>>> tun_can_build_skb(), although pad may be incremented in
>>> tun_build_skb().
>>>
>>> If we move the limit check from tun_can_build_skb() to tun_build_skb()
>>> we will make xdp to be used only in tun_build_skb(), without falling
>>> in tun_alloc_skb(), etc. And moreover we will drop the packet which
>>> can't be processed in tun_build_skb().
>
> Looking at tun_build_skb() is uses the page_frag system, and can thus
> create up-to SKB_FRAG_PAGE_ORDER (size 32768 / order-3).
>
>>>
>>> Reported-and-tested-by:
>>> [email protected]
>>> Closes:
>>> https://lore.kernel.org/all/[email protected]/T/
>>> Link:
>>> https://syzkaller.appspot.com/bug?id=5335c7c62bfff89bbb1c8f14cdabebe91909060f
>>> Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
>>> Signed-off-by: Andrew Kanner <[email protected]>
>>> ---
>>>
>>> Notes:
>>> V2 -> V3:
>>> * attach the forgotten changelog
>>> V1 -> V2:
>>> * merged 2 patches in 1, fixing both issues: WARN_ON_ONCE with
>>> syzkaller repro and missing XDP_PACKET_HEADROOM in pad
>>> * changed the title and description of the execution path,
>>> suggested
>>> by Jason Wang <[email protected]>
>>> * move the limit check from tun_can_build_skb() to
>>> tun_build_skb() to
>>> remove duplication and locking issue, and also drop the packet in
>>> case of a failed check - noted by Jason Wang
>>> <[email protected]>
>>
>> Acked-by: Jason Wang <[email protected]>
>>
>> Thanks
>>
>>>
>>> drivers/net/tun.c | 7 +++----
>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index d75456adc62a..7c2b05ce0421 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -1594,10 +1594,6 @@ static bool tun_can_build_skb(struct
>>> tun_struct *tun, struct tun_file *tfile,
>>> if (zerocopy)
>>> return false;
>>>
>>> - if (SKB_DATA_ALIGN(len + TUN_RX_PAD) +
>>> - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
>>> - return false;
>>> -
>>> return true;
>>> }
>>>
>>> @@ -1673,6 +1669,9 @@ static struct sk_buff *tun_build_skb(struct
>>> tun_struct *tun,
>>> buflen += SKB_DATA_ALIGN(len + pad);
>>> rcu_read_unlock();
>>>
>>> + if (buflen > PAGE_SIZE)
>>> + return ERR_PTR(-EFAULT);
>
> Concretely I'm saying maybe use SKB_FRAG_PAGE_ORDER "size" here?
>
> e.g. create SKB_FRAG_PAGE_SIZE define as below.
> if (buflen > SKB_FRAG_PAGE_SIZE)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 656ea89f60ff..4c4b3c257b52 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2886,7 +2886,8 @@ extern int sysctl_optmem_max;
> extern __u32 sysctl_wmem_default;
> extern __u32 sysctl_rmem_default;
>
> -#define SKB_FRAG_PAGE_ORDER get_order(32768)
> +#define SKB_FRAG_PAGE_SIZE 32768
> +#define SKB_FRAG_PAGE_ORDER get_order(SKB_FRAG_PAGE_SIZE)
> DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>
>>> +
>>> alloc_frag->offset = ALIGN((u64)alloc_frag->offset,
>>> SMP_CACHE_BYTES);
>>> if (unlikely(!skb_page_frag_refill(buflen, alloc_frag,
>>> GFP_KERNEL)))
>>> return ERR_PTR(-ENOMEM);
>
> --Jesper
>
On 7/26/23 1:37 PM, David Ahern wrote:
> On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
>> Cc. John and Ahern
>>
>> On 26/07/2023 04.09, Jason Wang wrote:
>>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner
>>> <[email protected]> wrote:
>>>>
>>>> Syzkaller reported the following issue:
>>>> =======================================
>>>> Too BIG xdp->frame_sz = 131072
>>
>> Is this a contiguous physical memory allocation?
>>
>> 131072 bytes equal order 5 page.
>>
>> Looking at tun.c code I cannot find a code path that could create
>> order-5 skb->data, but only SKB with order-0 fragments. But I guess it
>> is the netif_receive_generic_xdp() what will realloc to make this linear
>> (via skb_linearize())
>
>
> get_tun_user is passed an iov_iter with a single segment of 65007
> total_len. The alloc_skb path is hit with an align size of only 64. That
> is insufficient for XDP so the netif_receive_generic_xdp hits the
> pskb_expand_head path. Something is off in the math in
> netif_receive_generic_xdp resulting in the skb markers being off. That
> causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB
allocation. But the 128kB part is not relevant to the "bug" here really.
The warn on getting tripped in bpf_xdp_adjust_tail is because xdp
generic path is skb based and can have a frame_sz > 4kB. That's what the
splat is about.
Perhaps the solution is to remove the WARN_ON.
On Thu, Jul 27, 2023 at 8:27 AM David Ahern <[email protected]> wrote:
>
> On 7/26/23 1:37 PM, David Ahern wrote:
> > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
> >> Cc. John and Ahern
> >>
> >> On 26/07/2023 04.09, Jason Wang wrote:
> >>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner
> >>> <[email protected]> wrote:
> >>>>
> >>>> Syzkaller reported the following issue:
> >>>> =======================================
> >>>> Too BIG xdp->frame_sz = 131072
> >>
> >> Is this a contiguous physical memory allocation?
> >>
> >> 131072 bytes equal order 5 page.
> >>
> >> Looking at tun.c code I cannot find a code path that could create
> >> order-5 skb->data, but only SKB with order-0 fragments. But I guess it
> >> is the netif_receive_generic_xdp() what will realloc to make this linear
> >> (via skb_linearize())
> >
> >
> > get_tun_user is passed an iov_iter with a single segment of 65007
> > total_len. The alloc_skb path is hit with an align size of only 64. That
> > is insufficient for XDP so the netif_receive_generic_xdp hits the
> > pskb_expand_head path. Something is off in the math in
> > netif_receive_generic_xdp resulting in the skb markers being off. That
> > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
>
>
> BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB
> allocation. But the 128kB part is not relevant to the "bug" here really.
>
> The warn on getting tripped in bpf_xdp_adjust_tail is because xdp
> generic path is skb based and can have a frame_sz > 4kB. That's what the
> splat is about.
Other possibility:
tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up
with producing a frame_sz which is greater than PAGE_SIZE as well in
tun_build_skb().
And rethink this patch, it looks wrong since it basically drops all
packets whose buflen is greater than PAGE_SIZE since it can't fall
back to tun_alloc_skb().
>
> Perhaps the solution is to remove the WARN_ON.
Yes, that is what I'm asking if this warning still makes sense in V1.
Thanks
>
>
On Thu, 2023-07-27 at 14:07 +0800, Jason Wang wrote:
> On Thu, Jul 27, 2023 at 8:27 AM David Ahern <[email protected]> wrote:
> >
> > On 7/26/23 1:37 PM, David Ahern wrote:
> > > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
> > > > Cc. John and Ahern
> > > >
> > > > On 26/07/2023 04.09, Jason Wang wrote:
> > > > > On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Syzkaller reported the following issue:
> > > > > > =======================================
> > > > > > Too BIG xdp->frame_sz = 131072
> > > >
> > > > Is this a contiguous physical memory allocation?
> > > >
> > > > 131072 bytes equal order 5 page.
> > > >
> > > > Looking at tun.c code I cannot find a code path that could create
> > > > order-5 skb->data, but only SKB with order-0 fragments. But I guess it
> > > > is the netif_receive_generic_xdp() what will realloc to make this linear
> > > > (via skb_linearize())
> > >
> > >
> > > get_tun_user is passed an iov_iter with a single segment of 65007
> > > total_len. The alloc_skb path is hit with an align size of only 64. That
> > > is insufficient for XDP so the netif_receive_generic_xdp hits the
> > > pskb_expand_head path. Something is off in the math in
> > > netif_receive_generic_xdp resulting in the skb markers being off. That
> > > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
> >
> >
> > BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB
> > allocation. But the 128kB part is not relevant to the "bug" here really.
> >
> > The warn on getting tripped in bpf_xdp_adjust_tail is because xdp
> > generic path is skb based and can have a frame_sz > 4kB. That's what the
> > splat is about.
>
> Other possibility:
>
> tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up
> with producing a frame_sz which is greater than PAGE_SIZE as well in
> tun_build_skb().
>
> And rethink this patch, it looks wrong since it basically drops all
> packets whose buflen is greater than PAGE_SIZE since it can't fall
> back to tun_alloc_skb().
>
> >
> > Perhaps the solution is to remove the WARN_ON.
>
> Yes, that is what I'm asking if this warning still makes sense in V1.
I understand the consensus is solving the issue by changing/removing
the WARN_ON() in XDP. I think it makes sense, I guess the same warn can
be reached via packet socket xmit on veth or similar topology.
Cheers,
Paolo
On 27/07/2023 11.30, Paolo Abeni wrote:
> On Thu, 2023-07-27 at 14:07 +0800, Jason Wang wrote:
>> On Thu, Jul 27, 2023 at 8:27 AM David Ahern <[email protected]> wrote:
>>>
>>> On 7/26/23 1:37 PM, David Ahern wrote:
>>>> On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
>>>>> Cc. John and Ahern
>>>>>
>>>>> On 26/07/2023 04.09, Jason Wang wrote:
>>>>>> On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Syzkaller reported the following issue:
>>>>>>> =======================================
>>>>>>> Too BIG xdp->frame_sz = 131072
>>>>>
>>>>> Is this a contiguous physical memory allocation?
>>>>>
>>>>> 131072 bytes equal order 5 page.
>>>>>
>>>>> Looking at tun.c code I cannot find a code path that could create
>>>>> order-5 skb->data, but only SKB with order-0 fragments. But I guess it
>>>>> is the netif_receive_generic_xdp() what will realloc to make this linear
>>>>> (via skb_linearize())
>>>>
>>>>
>>>> get_tun_user is passed an iov_iter with a single segment of 65007
>>>> total_len. The alloc_skb path is hit with an align size of only 64. That
>>>> is insufficient for XDP so the netif_receive_generic_xdp hits the
>>>> pskb_expand_head path. Something is off in the math in
>>>> netif_receive_generic_xdp resulting in the skb markers being off. That
>>>> causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
>>>
>>>
>>> BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB
>>> allocation. But the 128kB part is not relevant to the "bug" here really.
>>>
True, it is another "bug"/unexpected-behavior that SKB gets reallocated
to be 128KiB. We should likely solve this in another patch.
>>> The warn on getting tripped in bpf_xdp_adjust_tail is because xdp
>>> generic path is skb based and can have a frame_sz > 4kB. That's what the
>>> splat is about.
Agree, that the warn condition should be changed, even removed.
It is interesting that this warn caught this unexpected-behavior of
expanding to 128KiB.
>>
>> Other possibility:
>>
>> tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up
>> with producing a frame_sz which is greater than PAGE_SIZE as well in
>> tun_build_skb().
True, and the way I read the tun_build_skb() code, via
skb_page_frag_refill(),
it can produce an SKB with data size (buflen) upto order-3 = 32KiB
(SKB_FRAG_PAGE_ORDER).
Thus, the existing check in tun_can_build_skb() for PAGE_SIZE can/should
be relaxed?
(Please correct me as I don't fully understand tun_get_user() code)
>>
>> And rethink this patch, it looks wrong since it basically drops all
>> packets whose buflen is greater than PAGE_SIZE since it can't fall
>> back to tun_alloc_skb().
>>
I agree, this is why I reacted, as this version of the patch could
potentially cause issues and packet drops.
>>>
>>> Perhaps the solution is to remove the WARN_ON.
>>
>> Yes, that is what I'm asking if this warning still makes sense in V1.
>
> I understand the consensus is solving the issue by changing/removing
> the WARN_ON() in XDP. I think it makes sense, I guess the same warn can
> be reached via packet socket xmit on veth or similar topology.
>
Yes, we can completely remove this check. The original intend was to
catch cases where XDP drivers have not been updated to use xdp.frame_sz,
but that is not longer a concern (since xdp_init_buff).
It was added (by me) in commit:
- c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size")
- v5.8-rc1
- as part of merge 5cc5924d8315
I'm sure it is safe to remove since commit:
- 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine")
- v5.12-rc1
where we introduced xdp_init_buff() helper, which all XDP driver use today.
Question is what "Fixes:" tag should the patch have?
To Andrew, will you
(1) send a new patch that removes this check instead?
(2) have cycles to investigate why the unexpected-behavior of
expanding to 128KiB happens?
--Jesper
On 7/27/23 5:48 PM, Andrew Kanner wrote:
>
> Thanks, everyone.
>
> If we summarize the discussion - there are 3 issues here:
> 1. tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM (minor and
> most trivial)
> 2. WARN_ON_ONCE from net/core/filter.c, which may be too strict / not
> needed at all.
> 3. strange behaviour with reallocationg SKB (65007 -> 131072)
I believe that happens because of the current skb size and the need to
expand it to account for the XDP headroom makes the allocation go over
64kB. Since tun is given the packet via a write call there are no header
markers to allocate separate space for headers and data (e.g. like TCP
does with 32kB data segments).
>
> I can check these issues. I have to dive a little deeper with 2-3,
> most likely with kgdb and syzkaller repro. But seems this is not
> somewhat urgent and lives quite a long time without being noticed.
>
> BTW: Attached the ftrace logs using the original syzkaller repro
> (starting with tun_get_user()). They answer Jesper's question about
> contiguous physical memory allocation (kmem_cache_alloc_node() /
> kmalloc_reserve()). But I'll check it one more time before submitting
> a new PATCH V4 or another patch / patch series.
>
On Thu, Jul 27, 2023 at 01:13:10PM +0200, Jesper Dangaard Brouer wrote:
>
>
> On 27/07/2023 11.30, Paolo Abeni wrote:
> > On Thu, 2023-07-27 at 14:07 +0800, Jason Wang wrote:
> > > On Thu, Jul 27, 2023 at 8:27 AM David Ahern <[email protected]> wrote:
> > > >
> > > > On 7/26/23 1:37 PM, David Ahern wrote:
> > > > > On 7/26/23 3:02 AM, Jesper Dangaard Brouer wrote:
> > > > > > Cc. John and Ahern
> > > > > >
> > > > > > On 26/07/2023 04.09, Jason Wang wrote:
> > > > > > > On Tue, Jul 25, 2023 at 11:54 PM Andrew Kanner
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Syzkaller reported the following issue:
> > > > > > > > =======================================
> > > > > > > > Too BIG xdp->frame_sz = 131072
> > > > > >
> > > > > > Is this a contiguous physical memory allocation?
> > > > > >
> > > > > > 131072 bytes equal order 5 page.
> > > > > >
> > > > > > Looking at tun.c code I cannot find a code path that could create
> > > > > > order-5 skb->data, but only SKB with order-0 fragments. But I guess it
> > > > > > is the netif_receive_generic_xdp() what will realloc to make this linear
> > > > > > (via skb_linearize())
> > > > >
> > > > >
> > > > > get_tun_user is passed an iov_iter with a single segment of 65007
> > > > > total_len. The alloc_skb path is hit with an align size of only 64. That
> > > > > is insufficient for XDP so the netif_receive_generic_xdp hits the
> > > > > pskb_expand_head path. Something is off in the math in
> > > > > netif_receive_generic_xdp resulting in the skb markers being off. That
> > > > > causes bpf_prog_run_generic_xdp to compute the wrong frame_sz.
> > > >
> > > >
> > > > BTW, it is pskb_expand_head that turns it from a 64kB to a 128 kB
> > > > allocation. But the 128kB part is not relevant to the "bug" here really.
> > > >
>
> True, it is another "bug"/unexpected-behavior that SKB gets reallocated
> to be 128KiB. We should likely solve this in another patch.
>
> > > > The warn on getting tripped in bpf_xdp_adjust_tail is because xdp
> > > > generic path is skb based and can have a frame_sz > 4kB. That's what the
> > > > splat is about.
>
> Agree, that the warn condition should be changed, even removed.
> It is interesting that this warn caught this unexpected-behavior of
> expanding to 128KiB.
>
> > >
> > > Other possibility:
> > >
> > > tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM this may end up
> > > with producing a frame_sz which is greater than PAGE_SIZE as well in
> > > tun_build_skb().
>
> True, and the way I read the tun_build_skb() code, via
> skb_page_frag_refill(),
> it can produce an SKB with data size (buflen) upto order-3 = 32KiB
> (SKB_FRAG_PAGE_ORDER).
>
> Thus, the existing check in tun_can_build_skb() for PAGE_SIZE can/should be
> relaxed?
> (Please correct me as I don't fully understand tun_get_user() code)
>
> > >
> > > And rethink this patch, it looks wrong since it basically drops all
> > > packets whose buflen is greater than PAGE_SIZE since it can't fall
> > > back to tun_alloc_skb().
> > >
>
> I agree, this is why I reacted, as this version of the patch could
> potentially cause issues and packet drops.
>
> > > >
> > > > Perhaps the solution is to remove the WARN_ON.
> > >
> > > Yes, that is what I'm asking if this warning still makes sense in V1.
> >
> > I understand the consensus is solving the issue by changing/removing
> > the WARN_ON() in XDP. I think it makes sense, I guess the same warn can
> > be reached via packet socket xmit on veth or similar topology.
> >
>
> Yes, we can completely remove this check. The original intend was to
> catch cases where XDP drivers have not been updated to use xdp.frame_sz,
> but that is not longer a concern (since xdp_init_buff).
>
> It was added (by me) in commit:
> - c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size")
> - v5.8-rc1
> - as part of merge 5cc5924d8315
>
> I'm sure it is safe to remove since commit:
> - 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine")
> - v5.12-rc1
>
> where we introduced xdp_init_buff() helper, which all XDP driver use today.
> Question is what "Fixes:" tag should the patch have?
>
> To Andrew, will you
> (1) send a new patch that removes this check instead?
> (2) have cycles to investigate why the unexpected-behavior of
> expanding to 128KiB happens?
>
> --Jesper
>
Thanks, everyone.
If we summarize the discussion - there are 3 issues here:
1. tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM (minor and
most trivial)
2. WARN_ON_ONCE from net/core/filter.c, which may be too strict / not
needed at all.
3. strange behaviour with reallocationg SKB (65007 -> 131072)
I can check these issues. I have to dive a little deeper with 2-3,
most likely with kgdb and syzkaller repro. But seems this is not
somewhat urgent and lives quite a long time without being noticed.
BTW: Attached the ftrace logs using the original syzkaller repro
(starting with tun_get_user()). They answer Jesper's question about
contiguous physical memory allocation (kmem_cache_alloc_node() /
kmalloc_reserve()). But I'll check it one more time before submitting
a new PATCH V4 or another patch / patch series.
--
Andrew Kanner
On Thu, Jul 27, 2023 at 06:11:57PM -0600, David Ahern wrote:
> On 7/27/23 5:48 PM, Andrew Kanner wrote:
> >
> > Thanks, everyone.
> >
> > If we summarize the discussion - there are 3 issues here:
> > 1. tun_can_build_skb() doesn't count XDP_PACKET_HEADROOM (minor and
> > most trivial)
> > 2. WARN_ON_ONCE from net/core/filter.c, which may be too strict / not
> > needed at all.
> > 3. strange behaviour with reallocationg SKB (65007 -> 131072)
>
> I believe that happens because of the current skb size and the need to
> expand it to account for the XDP headroom makes the allocation go over
> 64kB. Since tun is given the packet via a write call there are no header
> markers to allocate separate space for headers and data (e.g. like TCP
> does with 32kB data segments).
Yes, this is exactly what you suspected. In pskb_expand_head() ->
kmalloc_reserve() I have these values initially:
(gdb) p *size
$13 = 65408
(gdb) p obj_size
$16 = 65728
and it will do:
data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
...
obj_size = SKB_HEAD_ALIGN(*size);
...
*size = obj_size = kmalloc_size_roundup(obj_size);
(gdb) p *size
$22 = 131072
So this is kmalloc_size_roundup() doing this math with the following:
/* Above the smaller buckets, size is a multiple of page size. */ │
if (size > KMALLOC_MAX_CACHE_SIZE) │
return PAGE_SIZE << get_order(size);
> >
> > I can check these issues. I have to dive a little deeper with 2-3,
> > most likely with kgdb and syzkaller repro. But seems this is not
> > somewhat urgent and lives quite a long time without being noticed.
> >
> > BTW: Attached the ftrace logs using the original syzkaller repro
> > (starting with tun_get_user()). They answer Jesper's question about
> > contiguous physical memory allocation (kmem_cache_alloc_node() /
> > kmalloc_reserve()). But I'll check it one more time before submitting
> > a new PATCH V4 or another patch / patch series.
> >
>
I see no other bugs in math, so not sure wether it should be fixed. Is
it ok and expected to roundup the memory allocation?
--
Andrew Kanner