There is a problem with fragmented IP packets sent within 802.1Q tagged
ethernet frames through a bridge. The problem exists when conntrack is
enabled (i.e. the nf_conntrack_ipv4 module is loaded). In that case, such
packets are reassembled on the bridge device but are not fragmented again
when they are passed to the bridge-enslaved NIC. This causes the MTU to be
exceeded and, as a result, the packet is dropped.
The problem exists in kernel versions 2.6.17 to 2.6.21.3 inclusive.
A patch to fix it is below.
Regards.
--- linux-2.6.21.3.orig/net/bridge/br_netfilter.c 2007-05-25 09:56:15.000000000 +0200
+++ linux-2.6.21.3/net/bridge/br_netfilter.c 2007-05-25 10:11:42.000000000 +0200
@@ -731,7 +731,7 @@
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
-	if (skb->protocol == htons(ETH_P_IP) &&
+	if ((skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb)) &&
 	    skb->len > skb->dev->mtu &&
 	    !skb_is_gso(skb))
 		return ip_fragment(skb, br_dev_queue_push_xmit);
On Fri, 25 May 2007 10:17:50 +0200
Adam Osuchowski <[email protected]> wrote:
> There is a problem with fragmented IP packets sent within 802.1Q tagged
> ethernet frames through a bridge. The problem exists when conntrack is
> enabled (i.e. the nf_conntrack_ipv4 module is loaded). In that case, such
> packets are reassembled on the bridge device but are not fragmented again
> when they are passed to the bridge-enslaved NIC. This causes the MTU to be
> exceeded and, as a result, the packet is dropped.
>
> The problem exists in kernel versions 2.6.17 to 2.6.21.3 inclusive.
>
> A patch to fix it is below.
>
> Regards.
>
>
> --- linux-2.6.21.3.orig/net/bridge/br_netfilter.c 2007-05-25 09:56:15.000000000 +0200
> +++ linux-2.6.21.3/net/bridge/br_netfilter.c 2007-05-25 10:11:42.000000000 +0200
> @@ -731,7 +731,7 @@
>
> static int br_nf_dev_queue_xmit(struct sk_buff *skb)
> {
> - if (skb->protocol == htons(ETH_P_IP) &&
> + if ((skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb)) &&
> skb->len > skb->dev->mtu &&
> !skb_is_gso(skb))
> return ip_fragment(skb, br_dev_queue_push_xmit);
It would be better to account for the tag in the length check.
Something like
	if (skb->protocol == htons(ETH_P_IP) &&
	    skb->len > skb->dev->mtu - (IS_VLAN_IP(skb) ? VLAN_HLEN : 0) &&
	    !skb_is_gso(skb))
		return ip_fragment ...
--
Stephen Hemminger <[email protected]>
Stephen Hemminger wrote:
> It would be better to account for the tag in the length check.
> Something like
> if (skb->protocol == htons(ETH_P_IP) &&
> skb->len > skb->dev->mtu - (IS_VLAN_IP(skb) ? VLAN_HLEN : 0) &&
> !skb_is_gso(skb))
> return ip_fragment ...
It isn't a good solution because one of the necessary conditions of
IS_VLAN_IP() is

	skb->protocol == htons(ETH_P_8021Q)

which is, of course, mutually exclusive with

	skb->protocol == htons(ETH_P_IP)

from br_nf_dev_queue_xmit(). IMHO, one should check the length of ETH_P_IP
and ETH_P_8021Q frames separately:

	if (((skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu) ||
	     (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)) &&
	    !skb_is_gso(skb))
		return ip_fragment ...
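
To make the mutual-exclusivity argument concrete, here is a minimal userspace sketch of both conditions. It is not kernel code: `struct fake_skb`, `is_vlan_ip()` and the two `need_frag_*` helpers are hypothetical stand-ins for `struct sk_buff`, `IS_VLAN_IP()` and the checks discussed above, and the model keeps EtherTypes in host byte order instead of using `htons()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP    0x0800  /* IPv4 EtherType */
#define ETH_P_8021Q 0x8100  /* 802.1Q VLAN tag EtherType */
#define VLAN_HLEN   4       /* bytes added by one 802.1Q tag */

/* Hypothetical stand-in for the few sk_buff fields the check reads. */
struct fake_skb {
    uint16_t protocol;    /* outer EtherType (host byte order in this model) */
    uint16_t vlan_proto;  /* EtherType carried inside the VLAN tag */
    unsigned int len;     /* models skb->len */
    unsigned int mtu;     /* models skb->dev->mtu */
    bool gso;             /* models skb_is_gso(skb) */
};

/* Simplified model of IS_VLAN_IP(): outer tag is 802.1Q, payload is IPv4. */
static bool is_vlan_ip(const struct fake_skb *skb)
{
    return skb->protocol == ETH_P_8021Q && skb->vlan_proto == ETH_P_IP;
}

/*
 * Single check with a ternary: once protocol == ETH_P_IP holds,
 * is_vlan_ip() is always false, so VLAN_HLEN is never subtracted and
 * tagged frames never reach the length test at all.
 */
static bool need_frag_single_check(const struct fake_skb *skb)
{
    return skb->protocol == ETH_P_IP &&
           skb->len > skb->mtu - (is_vlan_ip(skb) ? VLAN_HLEN : 0) &&
           !skb->gso;
}

/* Separate length checks for plain IP and VLAN-tagged IP. */
static bool need_frag_separate_checks(const struct fake_skb *skb)
{
    return ((skb->protocol == ETH_P_IP && skb->len > skb->mtu) ||
            (is_vlan_ip(skb) && skb->len > skb->mtu - VLAN_HLEN)) &&
           !skb->gso;
}
```

In this model a 3000-byte VLAN-tagged IP packet on a 1500-byte MTU device is caught only by the second variant; plain IP packets are treated the same by both.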
Adam Osuchowski wrote:
> Stephen Hemminger wrote:
>
>>It would be better to account for the tag in the length check.
>>Something like
>> if (skb->protocol == htons(ETH_P_IP) &&
>> skb->len > skb->dev->mtu - (IS_VLAN_IP(skb) ? VLAN_HLEN : 0) &&
>> !skb_is_gso(skb))
>> return ip_fragment ...
>
>
> It isn't a good solution because one of the necessary conditions of IS_VLAN_IP() is
>
> skb->protocol == htons(ETH_P_8021Q)
>
> which is, of course, mutually exclusive with
>
> skb->protocol == htons(ETH_P_IP)
>
> from br_nf_dev_queue_xmit(). IMHO, one should check length of ETH_P_IP
> and ETH_P_8021Q frames separately:
>
> if (((skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu) ||
> (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)) &&
> !skb_is_gso(skb))
> return ip_fragment ...
net/8021q ignores the VLAN header overhead, so we should probably do the
same here for consistency. Using IS_VLAN_IP (and IS_PPPOE_IP for current
-rc) looks fine, additionally we should probably also check for
skb->nfct != NULL to make sure that at least without connection tracking
the bridge doesn't perform fragmentation.
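
As a rough illustration of this suggestion, the combined condition could look like the userspace sketch below. Everything here is a hypothetical model rather than the actual kernel patch: `struct fake_skb` stands in for `struct sk_buff`, `is_vlan_ip()` and `is_pppoe_ip()` stand in for `IS_VLAN_IP()` and `IS_PPPOE_IP()`, protocol values are kept in host byte order, and `need_refragment()` is an invented name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP      0x0800  /* IPv4 EtherType */
#define ETH_P_8021Q   0x8100  /* 802.1Q VLAN tag EtherType */
#define ETH_P_PPP_SES 0x8864  /* PPPoE session EtherType */
#define PPP_IP        0x0021  /* PPP protocol number for IPv4 */

/* Hypothetical stand-in for the sk_buff fields the check reads. */
struct fake_skb {
    uint16_t protocol;     /* outer EtherType (host byte order in this model) */
    uint16_t inner_proto;  /* protocol inside the VLAN tag or PPPoE session */
    unsigned int len;      /* models skb->len */
    unsigned int mtu;      /* models skb->dev->mtu */
    bool gso;              /* models skb_is_gso(skb) */
    const void *nfct;      /* non-NULL when conntrack touched the packet */
};

static bool is_vlan_ip(const struct fake_skb *skb)
{
    return skb->protocol == ETH_P_8021Q && skb->inner_proto == ETH_P_IP;
}

static bool is_pppoe_ip(const struct fake_skb *skb)
{
    return skb->protocol == ETH_P_PPP_SES && skb->inner_proto == PPP_IP;
}

/*
 * Refragment only packets that carry IPv4 (plain, VLAN or PPPoE), that
 * connection tracking has seen, and that exceed the device MTU. The
 * encapsulation overhead is deliberately ignored in the length check.
 */
static bool need_refragment(const struct fake_skb *skb)
{
    bool carries_ip = skb->protocol == ETH_P_IP ||
                      is_vlan_ip(skb) || is_pppoe_ip(skb);

    return carries_ip && skb->nfct != NULL &&
           skb->len > skb->mtu && !skb->gso;
}
```

The `nfct` test is what keeps the bridge from fragmenting oversized packets that conntrack never reassembled.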
On Saturday 26 May 2007, Patrick McHardy wrote:
> Adam Osuchowski wrote:
> > if (((skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu) ||
> > (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)) &&
> > !skb_is_gso(skb))
> > return ip_fragment ...
>
>
> net/8021q ignores the VLAN header overhead, so we should probably do the
> same here for consistency. Using IS_VLAN_IP (and IS_PPPOE_IP for current
> -rc) looks fine, additionally we should probably also check for
> skb->nfct != NULL to make sure that at least without connection tracking
> the bridge doesn't perform fragmentation.
And could we separate the conditions for that into a static helper function
that explains each of them? E.g. something like this:
static bool br_nf_need_fragment(struct sk_buff *skb)
{
	/* Plain IP packet does not fit in MTU */
	if (skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu)
		return true;

	/* VLAN encapsulated IP packet does not fit in MTU */
	if (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)
		return true;

	/* PPPoE encapsulated IP packet does not fit in MTU */
	if (IS_PPPOE_IP(skb) && skb->len > skb->dev->mtu - PPPOE_SES_HLEN)
		return true;

	return false;
}
and then br_nf_dev_queue_xmit() becomes:
static int br_nf_dev_queue_xmit(struct sk_buff *skb)
{
	if (br_nf_need_fragment(skb) && !skb_is_gso(skb))
		return ip_fragment(skb, br_dev_queue_push_xmit);
	else
		return br_dev_queue_push_xmit(skb);
}
which is much more readable, more documented and doesn't contain a condition monster :-)
@Patrick: Could you check whether the PPPoE case is correct?
What do you think? Should I submit a patch for that?
Best Regards
Ingo Oeser
Ingo Oeser wrote:
> On Saturday 26 May 2007, Patrick McHardy wrote:
>
>>net/8021q ignores the VLAN header overhead, so we should probably do the
>>same here for consistency. Using IS_VLAN_IP (and IS_PPPOE_IP for current
>>-rc) looks fine, additionally we should probably also check for
>>skb->nfct != NULL to make sure that at least without connection tracking
>>the bridge doesn't perform fragmentation.
>
>
> And could we separate the conditions for that into a static helper function
> that explains each of them? E.g. something like this:
The MTU checks are self-explanatory. Just a comment above the function
stating that it tries to find out whether a packet needs to be
refragmented because it was defragmented by IPv4 connection tracking
and exceeds the MTU should be enough.
> static bool br_nf_need_fragment(struct sk_buff *skb)
> {
> /* Plain IP packet does not fit in MTU */
> if (skb->protocol == htons(ETH_P_IP) && skb->len > skb->dev->mtu)
> return true;
>
> /* VLAN encapsulated IP packet does not fit in MTU */
> if (IS_VLAN_IP(skb) && skb->len > skb->dev->mtu - VLAN_HLEN)
> return true;
>
> /* PPPoE encapsulated IP packet does not fit in MTU */
> if (IS_PPPOE_IP(skb) && skb->len > skb->dev->mtu - PPPOE_SES_HLEN)
> return true;
>
> return false;
> }
As I said, I don't think we should account for the VLAN header overhead;
the VLAN code itself doesn't even do it. And we should exclude packets
that don't have a connection tracking reference attached, since we are
only undoing the damage connection tracking did by defragmenting them
and should avoid fragmenting other packets as far as possible.
> and then br_nf_dev_queue_xmit() becomes:
>
> static int br_nf_dev_queue_xmit(struct sk_buff *skb)
> {
> if (br_nf_need_fragment(skb) && !skb_is_gso(skb))
> return ip_fragment(skb, br_dev_queue_push_xmit);
> else
> return br_dev_queue_push_xmit(skb);
> }
>
> which is much more readable, more documented and doesn't contain a condition monster :-)
>
> @Patrick: Could you check whether the PPPoE case is correct?
It looks OK. But there is another problem: ip_fragment doesn't account
for the PPPoE overhead and produces a packet that will be too large
after restoring the PPPoE header. A second __fake_rtable that accounts
for the PPPoE overhead could probably fix that.
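
The size problem can be shown with a little arithmetic. The sketch below is a userspace model, not the kernel's ip_fragment(): `max_fragment_len()` is a hypothetical helper that only computes the largest IPv4 fragment for a given MTU, assuming fragment data must be a multiple of 8 bytes and that the PPPoE session encapsulation adds 8 bytes (6-byte PPPoE header plus 2-byte PPP protocol field).

```c
#define PPPOE_SES_HLEN 8  /* PPPoE header (6) + PPP protocol field (2) */

/*
 * Largest IPv4 fragment (IP header + data) that fits into mtu once
 * encap_overhead bytes are reserved for an outer header. Data in every
 * fragment except the last must be a multiple of 8 bytes, so the data
 * size is rounded down accordingly. Returns 0 if nothing fits.
 */
static unsigned int max_fragment_len(unsigned int mtu,
                                     unsigned int ip_hdr_len,
                                     unsigned int encap_overhead)
{
    unsigned int data;

    if (mtu < encap_overhead + ip_hdr_len + 8)
        return 0;

    data = (mtu - encap_overhead - ip_hdr_len) & ~7u;
    return ip_hdr_len + data;
}
```

With a 1500-byte MTU and a 20-byte IP header, fragmenting without reserving the overhead gives 1500-byte fragments, which grow to 1508 bytes once the PPPoE header is restored. Reserving `PPPOE_SES_HLEN` gives 1492-byte fragments, which come to exactly 1500 bytes with the header.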
> What do you think? Should I submit a patch for that?
Sure :)