Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756440AbbHFVgh (ORCPT ); Thu, 6 Aug 2015 17:36:37 -0400 Received: from mail-io0-f181.google.com ([209.85.223.181]:34107 "EHLO mail-io0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756291AbbHFVgf (ORCPT ); Thu, 6 Aug 2015 17:36:35 -0400 MIME-Version: 1.0 In-Reply-To: References: <1438750199-9061-1-git-send-email-joestringer@nicira.com> <1438750199-9061-6-git-send-email-joestringer@nicira.com> Date: Thu, 6 Aug 2015 14:36:34 -0700 Message-ID: Subject: Re: [PATCHv2 net-next 5/9] openvswitch: Add conntrack action From: Pravin Shelar To: Joe Stringer Cc: netdev , LKML , pablo , Patrick McHardy , Justin Pettit , Andy Zhou , Jesse Gross , Florian Westphal , Hannes Sowa , Thomas Graf Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6057 Lines: 141 On Thu, Aug 6, 2015 at 11:07 AM, Joe Stringer wrote: > On 5 August 2015 at 15:31, Pravin Shelar wrote: >> On Tue, Aug 4, 2015 at 9:49 PM, Joe Stringer wrote: >> I got sparse warning: >> >> net/openvswitch/actions.c:634:1: warning: symbol 'ovs_dst_get_mtu' was >> not declared. Should it be static? >> >> net/openvswitch/actions.c:703:17: warning: cast from restricted __be16 >> >> net/openvswitch/actions.c:703:17: warning: incorrect type in argument >> 1 (different base types) >> >> net/openvswitch/actions.c:703:17: expected unsigned short >> [unsigned] [usertype] val >> >> net/openvswitch/actions.c:703:17: got restricted __be16 [usertype] ethertype >> >> net/openvswitch/actions.c:703:17: warning: cast from restricted __be16 >> >> net/openvswitch/actions.c:703:17: warning: cast from restricted __be16 > > Hrm, thanks. I'll figure out why I wasn't getting any warnings and fix these up. > I used following sparse parameters: make C=1 CF="-Wsparse-all -D__CHECKER__ -D__CHECK_ENDIAN__ -Wbitwise" >>> -static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port) >>> +static int ovs_vport_output(struct sock *sock, struct sk_buff *skb) >>> +{ >>> + struct ovs_frag_data *data = this_cpu_ptr(&ovs_frag_data_storage); >>> + struct vport *vport = data->vport; >>> + >>> + if (skb_cow_head(skb, data->l2_len) < 0) >>> + return -ENOMEM; >>> + >> Need to free skb here. > > OK. > >>> + skb->_skb_refdst = data->dst; >> I think we need to clone dst if there are multiple fragments. >> Can you test this code with vxlan flow based tunnels? > > In prepare_frag(), we're currently doing: > data->dst = skb->_skb_refdst & SKB_DST_NOREF > > Which means that the fragments shouldn't need an additional reference. > However it also means that the original reference is never freed. I > think we just need to add the final free of the original reference > after fragmentation is finished, in ovs_fragment(). I'll do this. > Each fragment needs to have its reference on the dst-entry if it is reference counted dst entry. > I assume that's the same as just setting up OVS with a vxlan tunnel > and sending some traffic across it with different sizes. I have a test > case for that: > https://github.com/joestringer/openvswitch/blob/dev/ct_20150806/tests/kmod-traffic.at#L964 > ok > >>> +static void ovs_fragment(struct vport *vport, struct sk_buff *skb, >>> + unsigned int mru, __be16 ethertype) >>> +{ >>> + if (skb_network_offset(skb) > MAX_L2_LEN) { >>> + OVS_NLERR(1, "L2 header too long to fragment"); >>> + return; >>> + } >>> + >>> + if (ethertype == htons(ETH_P_IP)) { >>> + struct dst_entry ovs_dst; >>> + >>> + prepare_frag(vport, skb); >>> + dst_init(&ovs_dst, &ovs_dst_ops, NULL, 1, >>> + DST_OBSOLETE_NONE, DST_NOCOUNT); >>> + ovs_dst.dev = vport->dev; >>> + >>> + skb_dst_set_noref(skb, &ovs_dst); >>> + IPCB(skb)->frag_max_size = mru; >>> + >>> + ip_do_fragment(skb->sk, skb, ovs_vport_output); >>> + } else if (ethertype == htons(ETH_P_IPV6)) { >>> + const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops(); >>> + struct rt6_info ovs_rt; >>> + >>> + if (!v6ops) { >>> + kfree_skb(skb); >>> + return; >>> + } >>> + >>> + prepare_frag(vport, skb); >>> + memset(&ovs_rt, 0, sizeof(ovs_rt)); >>> + dst_init(&ovs_rt.dst, &ovs_dst_ops, NULL, 1, >>> + DST_OBSOLETE_NONE, DST_NOCOUNT); >>> + ovs_rt.dst.dev = vport->dev; >>> + >>> + skb_dst_set_noref(skb, &ovs_rt.dst); >>> + IP6CB(skb)->frag_max_size = mru; >>> + >>> + v6ops->fragment(skb->sk, skb, ovs_vport_output); >>> + } else { >>> + WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.", >>> + ovs_vport_name(vport), htons(ethertype), mru, >>> + vport->dev->mtu); >>> + kfree_skb(skb); >>> + } >>> +} >>> + >> We also need something similar of this packet is going to userspace so >> that we can send original packets to userspace. Otherwise we would >> send defragmented packet to userspace. > > OK, in that case we'll need to get an MTU from somewhere. I'll look at > using the MRU as the MTU for this path, since corner cases where the > MRU is greater than the netlink payload size seems pretty unlikely > (and the netlink sending code should already handle such cases). The > other concern I have is exactly how this should be presented to > userspace. Currently the conntrack action is treated an an implicit > reassembly, which will implicitly refragment on output. In between, it > remains defragmented. If we fragment on miss, then lookup will use the > key representing the defragmented packet (ie no OVS_FRAG_TYPE_* bits > set), so we should send the same up to userspace. I assume that > userspace would then re-parse the packets and see that they are > fragments for representing up to the higher layers like OpenFlow, but > for flow installation it would reuse the key passed up from the > kernel. Is that the model you have in mind? > Right, Reassembly is transparent to cantrack action, so it should be to userspace. But this means we will need to fragment in upcall and defrag the skb again when the packet reenter kernel module from packet execute code path if we the action need to look at entire packet. So lets just keep it based on MRU parameter and we can enhance it later if we need it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/