luwei (O) wrote:
> yes, here is the vnet_hdr:
>
> flags: 3
> gso_type: 3
> hdr_len: 23
> gso_size: 58452
> csum_start: 5
> csum_offset: 16
>
> and the packet:
>
> | vnet_hdr | mac header | network header | data ... |
>
> memcpy((void*)0x20000200,
> "\x03\x03\x02\x00\x54\xe4\x05\x00\x10\x00\x80\x00\x00\x53\xcc\x9c\x2b"
> "\x19\x3b\x00\x00\x00\x89\x4f\x08\x03\x83\x81\x04",
> 29);
> *(uint16_t*)0x200000c0 = 0x11;
> *(uint16_t*)0x200000c2 = htobe16(0);
> *(uint32_t*)0x200000c4 = r[3];
> *(uint16_t*)0x200000c8 = 1;
> *(uint8_t*)0x200000ca = 0;
> *(uint8_t*)0x200000cb = 6;
> memset((void*)0x200000cc, 170, 5);
> *(uint8_t*)0x200000d1 = 0;
> memset((void*)0x200000d2, 0, 2);
> syscall(__NR_sendto, r[1], 0x20000200ul, 0xe45ful, 0ul, 0x200000c0ul, 0x14ul);
Thanks. So this can happen whenever a packet is injected into the tx
path with a virtio_net_hdr.
Even if we add bounds checking for the link layer header in pf_packet,
it can still point to the network header.
If packets are looped to the tx path, skb_pull is common if a packet
traverses tunnel devices. But csum_start does not directly matter in
the rx path (CHECKSUM_PARTIAL is just seen as CHECKSUM_UNNECESSARY).
Until it is forwarded again to the tx path.
So the question is which code calls skb_checksum_start_offset on the
tx path. Clearly, skb_checksum_help. Also a lot of drivers. Which
may cast the signed int return value to an unsigned. Even an u8 in
the first driver I spotted (alx).
skb_postpull_rcsum anticipates a negative return value, as do other
core functions. So it clearly allowed in certain cases. We cannot
just bound it.
Summary after a long story: an initial investigation, but I don't have
a good solution so far. Maybe others have a good suggestiong based on
this added context.
Slightly tangential: this check in gso_reset_checksum seems needlessly
indirect:
SKB_GSO_CB(skb)->csum_start = skb_checksum_start(skb) - skb->head;
> -----邮件原件-----
> 发件人: Willem de Bruijn [mailto:[email protected]]
> 发送时间: 2023年4月12日 9:44 PM
> 收件人: luwei (O) <[email protected]>; Eric Dumazet <[email protected]>
> 抄送: Willem de Bruijn <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> 主题: Re: [PATCH net] net: Add check for csum_start in skb_partial_csum_set()
>
> luwei (O) wrote:
> >
> > 在 2023/4/11 4:13 PM, Eric Dumazet 写道:
> > > On Tue, Apr 11, 2023 at 4:33 AM luwei (O) <[email protected]> wrote:
> > >>
> > >> 在 2023/4/11 1:30 AM, Willem de Bruijn 写道:
> > >>
> > >> Eric Dumazet wrote:
> > >>
> > >> On Mon, Apr 10, 2023 at 4:22 AM Lu Wei <[email protected]> wrote:
> > >>
> > >> If an AF_PACKET socket is used to send packets through a L3 mode
> > >> ipvlan and a vnet header is set via setsockopt() with the option
> > >> name of PACKET_VNET_HDR, the value of offset will be nagetive in
> > >> function
> > >> skb_checksum_help() and trigger the following warning:
> > >>
> > >> WARNING: CPU: 3 PID: 2023 at net/core/dev.c:3262
> > >> skb_checksum_help+0x2dc/0x390
> > >> ......
> > >> Call Trace:
> > >> <TASK>
> > >> ip_do_fragment+0x63d/0xd00
> > >> ip_fragment.constprop.0+0xd2/0x150
> > >> __ip_finish_output+0x154/0x1e0
> > >> ip_finish_output+0x36/0x1b0
> > >> ip_output+0x134/0x240
> > >> ip_local_out+0xba/0xe0
> > >> ipvlan_process_v4_outbound+0x26d/0x2b0
> > >> ipvlan_xmit_mode_l3+0x44b/0x480
> > >> ipvlan_queue_xmit+0xd6/0x1d0
> > >> ipvlan_start_xmit+0x32/0xa0
> > >> dev_hard_start_xmit+0xdf/0x3f0
> > >> packet_snd+0xa7d/0x1130
> > >> packet_sendmsg+0x7b/0xa0
> > >> sock_sendmsg+0x14f/0x160
> > >> __sys_sendto+0x209/0x2e0
> > >> __x64_sys_sendto+0x7d/0x90
> > >>
> > >> The root cause is:
> > >> 1. skb->csum_start is set in packet_snd() according vnet_hdr:
> > >> skb->csum_start = skb_headroom(skb) + (u32)start;
> > >>
> > >> 'start' is the offset from skb->data, and mac header has been
> > >> set at this moment.
> > >>
> > >> 2. when this skb arrives ipvlan_process_outbound(), the mac header
> > >> is unset and skb_pull is called to expand the skb headroom.
> > >>
> > >> 3. In function skb_checksum_help(), the variable offset is calculated
> > >> as:
> > >> offset = skb->csum_start - skb_headroom(skb);
> > >>
> > >> since skb headroom is expanded in step2, offset is nagetive, and it
> > >> is converted to an unsigned integer when compared with skb_headlen
> > >> and trigger the warning.
> > >>
> > >> Not sure why it is negative ? This seems like the real problem...
> > >>
> > >> csum_start is relative to skb->head, regardless of pull operations.
> > >>
> > >> whatever set csum_start to a too small value should be tracked and fixed.
> > >>
> > >> Right. The only way I could see it go negative is if something does
> > >> the equivalent of pskb_expand_head with positive nhead, and without
> > >> calling skb_headers_offset_update.
> > >>
> > >> Perhaps the cause can be found by instrumenting all the above
> > >> functions in the trace to report skb_headroom and csum_start.
> > >> And also virtio_net_hdr_to_skb.
> > >> .
> > >>
> > >> Hi, Eric and Willem, sorry for not describing this issue clearly enough. Here is the detailed data path:
> > >>
> > >> 1. Users call sendmsg() to send message with a AF_PACKET domain
> > >> and SOCK_RAW type socket. Since vnet_hdr
> > >>
> > >> is set, csum_start is calculated as:
> > >>
> > >> skb->csum_start = skb_headroom(skb) + (u32)start; // see the following code.
> > >>
> > >> the varible "start" it passed from user data, in my case it is 5 and skb_headroom is 2, so skb->csum_start is 7.
> > >>
> > > I think you are rephrasing, but you did not address my feedback.
> > >
> > > Namely, "csum_start < skb->network_header" does not look sensical to me.
> > >
> > > csum_start should be related to the transport header, not network header.
> >
> > csum_start is calculated in pakcet_snd() as:
> >
> > skb->csum_start = skb_headroom(skb) + (u32)start;
> >
> > the varible "start" it passed from user data via vnet_hdr as follows:
> >
> > packet_snd()
> > ...
> > if (po->has_vnet_hdr) {
> > err = packet_snd_vnet_parse(msg, &len, &vnet_hdr); // get vnet_hdr which includes start
> > if (err)
> > goto out_unlock;
> > has_vnet_hdr = true;
> > }
> > ...
> >
> > csum_start should be at the transport header but users may pass an incorrect value.
>
> Thanks for the clarification.
>
> So this is another bogus packet socket packet, with csum_start set somewhere in the L2 header, and that gets popped by ipvlan, correct?
>
> Do you have the exact packet and the virtio_net_hdr that caused this, perhaps?
>
> skb_partial_csum_set in virtio_net_hdr_to_skb has some basic bounds tests for csum_start, csum_off and csum_end. But that does not preclude an offset in the L2 header, from what I can tell.
>
> Conceivably this can be added, though it is a bit complex for devices with variable length link layer headers. And it would have to happen not only for packet sockets, but all users of virtio_net_hdr.
Willem de Bruijn wrote:
> luwei (O) wrote:
> > yes, here is the vnet_hdr:
> >
> > flags: 3
> > gso_type: 3
> > hdr_len: 23
> > gso_size: 58452
> > csum_start: 5
> > csum_offset: 16
> >
> > and the packet:
> >
> > | vnet_hdr | mac header | network header | data ... |
> >
> > memcpy((void*)0x20000200,
> > "\x03\x03\x02\x00\x54\xe4\x05\x00\x10\x00\x80\x00\x00\x53\xcc\x9c\x2b"
> > "\x19\x3b\x00\x00\x00\x89\x4f\x08\x03\x83\x81\x04",
> > 29);
> > *(uint16_t*)0x200000c0 = 0x11;
> > *(uint16_t*)0x200000c2 = htobe16(0);
> > *(uint32_t*)0x200000c4 = r[3];
> > *(uint16_t*)0x200000c8 = 1;
> > *(uint8_t*)0x200000ca = 0;
> > *(uint8_t*)0x200000cb = 6;
> > memset((void*)0x200000cc, 170, 5);
> > *(uint8_t*)0x200000d1 = 0;
> > memset((void*)0x200000d2, 0, 2);
> > syscall(__NR_sendto, r[1], 0x20000200ul, 0xe45ful, 0ul, 0x200000c0ul, 0x14ul);
>
> Thanks. So this can happen whenever a packet is injected into the tx
> path with a virtio_net_hdr.
>
> Even if we add bounds checking for the link layer header in pf_packet,
> it can still point to the network header.
>
> If packets are looped to the tx path, skb_pull is common if a packet
> traverses tunnel devices. But csum_start does not directly matter in
> the rx path (CHECKSUM_PARTIAL is just seen as CHECKSUM_UNNECESSARY).
> Until it is forwarded again to the tx path.
>
> So the question is which code calls skb_checksum_start_offset on the
> tx path. Clearly, skb_checksum_help. Also a lot of drivers. Which
> may cast the signed int return value to an unsigned. Even an u8 in
> the first driver I spotted (alx).
>
> skb_postpull_rcsum anticipates a negative return value, as do other
> core functions. So it clearly allowed in certain cases. We cannot
> just bound it.
>
> Summary after a long story: an initial investigation, but I don't have
> a good solution so far. Maybe others have a good suggestiong based on
> this added context.
Specific to skb_checksum_help, it appears that skb_checksum will
work with negative offset just fine.
Perhaps the only issue is that the WARN_ON_ONCE compares signed to
unsigned, and thus incorrectly interprets a negative offset as
>= skb_headlen(skb)