The goal of this series is to allow OVS to send packets through the Linux
kernel connection tracker, and subsequently match on fields populated by
conntrack.
This version addresses the feedback from v3, primarily shifting the masked
set of connmark and connlabel to nest under the conntrack action. This
simplifies the API from the perspective of the SET_FIELD actions by not
requiring special-casing for these fields. The fields are still maskable, but
must be set as part of conntrack action execution. Some related refactoring and
code simplifications have been rolled in due to these changes as well.
This functionality is enabled through the CONFIG_OPENVSWITCH_CONNTRACK option.
The branch below has been updated with the corresponding userspace pieces:
https://github.com/joestringer/ovs dev/ct_20150818
v4: Shift ability to set ct_mark/ct_label into ACTION_ATTR_CT.
Streamline fetching/setting conntrack fields.
Update some documentation.
Add acks from v2/v3 review.
v3: Better handling of skb->dst in output/fragmentation
Address cases involving l2 metadata and checksums
Make MRU types more consistent
Better cleanup in error paths
Fix sparse warnings
v2: Split out per-netns connlabel width setting functions
Simplify reference tracking in output path.
Handle output cases where flow key is invalidated by prior push/pop
Store entire L2 header to apply to fragments
Various bits of refactoring, comments, styles, log improvements
Defer patch to scrub skb
Rebase
v1: First non-RFC post.
Fragment handling.
Conntrack label support.
Joe Stringer (10):
openvswitch: Serialize acts with original netlink len
openvswitch: Move MASKED* macros to datapath.h
ipv6: Export nf_ct_frag6_gather()
dst: Add __skb_dst_copy() variation
openvswitch: Add conntrack action
openvswitch: Allow matching on conntrack mark
netfilter: Always export nf_connlabels_replace()
netfilter: connlabels: Export setting connlabel length
openvswitch: Allow matching on conntrack label
openvswitch: Allow attaching helpers to ct action
include/net/dst.h | 9 +-
include/net/netfilter/nf_conntrack_labels.h | 4 +
include/uapi/linux/openvswitch.h | 58 +++
net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
net/netfilter/nf_conntrack_labels.c | 34 +-
net/netfilter/xt_connlabel.c | 16 +-
net/openvswitch/Kconfig | 11 +
net/openvswitch/Makefile | 2 +
net/openvswitch/actions.c | 229 +++++++--
net/openvswitch/conntrack.c | 713 ++++++++++++++++++++++++++++
net/openvswitch/conntrack.h | 92 ++++
net/openvswitch/datapath.c | 74 ++-
net/openvswitch/datapath.h | 12 +
net/openvswitch/flow.c | 2 +
net/openvswitch/flow.h | 9 +
net/openvswitch/flow_netlink.c | 103 +++-
net/openvswitch/flow_netlink.h | 4 +-
net/openvswitch/vport.c | 1 +
18 files changed, 1293 insertions(+), 81 deletions(-)
create mode 100644 net/openvswitch/conntrack.c
create mode 100644 net/openvswitch/conntrack.h
--
2.1.4
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However,the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
---
v2: No change.
v3: Preserve original length across buffer resize.
v4: Add ack.
---
net/openvswitch/datapath.c | 2 +-
net/openvswitch/flow.h | 1 +
net/openvswitch/flow_netlink.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ffe984f..d5b5473 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -713,7 +713,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts,
/* OVS_FLOW_ATTR_ACTIONS */
if (should_fill_actions(ufid_flags))
- len += nla_total_size(acts->actions_len);
+ len += nla_total_size(acts->orig_len);
return len
+ nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index b62cdb3..082a87b 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -144,6 +144,7 @@ struct sw_flow_id {
struct sw_flow_actions {
struct rcu_head rcu;
+ size_t orig_len; /* From flow_cmd_new netlink actions size */
u32 actions_len;
struct nlattr actions[];
};
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index a6eb77a..96cad8c 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1619,6 +1619,7 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len);
acts->actions_len = (*sfa)->actions_len;
+ acts->orig_len = (*sfa)->orig_len;
kfree(*sfa);
*sfa = acts;
@@ -2223,6 +2224,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
if (IS_ERR(*sfa))
return PTR_ERR(*sfa);
+ (*sfa)->orig_len = nla_len(attr);
err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
key->eth.tci, log);
if (err)
--
2.1.4
This will allow the ovs-conntrack code to reuse these macros.
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
---
v4: Add ack.
---
net/openvswitch/actions.c | 52 ++++++++++++++++++++++------------------------
net/openvswitch/datapath.h | 4 ++++
2 files changed, 29 insertions(+), 27 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 14da52d..11643fa 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -185,10 +185,6 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
return 0;
}
-/* 'KEY' must not have any bits set outside of the 'MASK' */
-#define MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
-#define SET_MASKED(OLD, KEY, MASK) ((OLD) = MASKED(OLD, KEY, MASK))
-
static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
const __be32 *mpls_lse, const __be32 *mask)
{
@@ -201,7 +197,7 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
return err;
stack = (__be32 *)skb_mpls_header(skb);
- lse = MASKED(*stack, *mpls_lse, *mask);
+ lse = OVS_MASKED(*stack, *mpls_lse, *mask);
if (skb->ip_summed == CHECKSUM_COMPLETE) {
__be32 diff[] = { ~(*stack), lse };
@@ -244,9 +240,9 @@ static void ether_addr_copy_masked(u8 *dst_, const u8 *src_, const u8 *mask_)
const u16 *src = (const u16 *)src_;
const u16 *mask = (const u16 *)mask_;
- SET_MASKED(dst[0], src[0], mask[0]);
- SET_MASKED(dst[1], src[1], mask[1]);
- SET_MASKED(dst[2], src[2], mask[2]);
+ OVS_SET_MASKED(dst[0], src[0], mask[0]);
+ OVS_SET_MASKED(dst[1], src[1], mask[1]);
+ OVS_SET_MASKED(dst[2], src[2], mask[2]);
}
static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
@@ -338,10 +334,10 @@ static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto,
static void mask_ipv6_addr(const __be32 old[4], const __be32 addr[4],
const __be32 mask[4], __be32 masked[4])
{
- masked[0] = MASKED(old[0], addr[0], mask[0]);
- masked[1] = MASKED(old[1], addr[1], mask[1]);
- masked[2] = MASKED(old[2], addr[2], mask[2]);
- masked[3] = MASKED(old[3], addr[3], mask[3]);
+ masked[0] = OVS_MASKED(old[0], addr[0], mask[0]);
+ masked[1] = OVS_MASKED(old[1], addr[1], mask[1]);
+ masked[2] = OVS_MASKED(old[2], addr[2], mask[2]);
+ masked[3] = OVS_MASKED(old[3], addr[3], mask[3]);
}
static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
@@ -358,15 +354,15 @@ static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask)
{
/* Bits 21-24 are always unmasked, so this retains their values. */
- SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16));
- SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8));
- SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask);
+ OVS_SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16));
+ OVS_SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8));
+ OVS_SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask);
}
static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl,
u8 mask)
{
- new_ttl = MASKED(nh->ttl, new_ttl, mask);
+ new_ttl = OVS_MASKED(nh->ttl, new_ttl, mask);
csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8));
nh->ttl = new_ttl;
@@ -392,7 +388,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key,
* makes sense to check if the value actually changed.
*/
if (mask->ipv4_src) {
- new_addr = MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src);
+ new_addr = OVS_MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src);
if (unlikely(new_addr != nh->saddr)) {
set_ip_addr(skb, nh, &nh->saddr, new_addr);
@@ -400,7 +396,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key,
}
}
if (mask->ipv4_dst) {
- new_addr = MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst);
+ new_addr = OVS_MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst);
if (unlikely(new_addr != nh->daddr)) {
set_ip_addr(skb, nh, &nh->daddr, new_addr);
@@ -488,7 +484,8 @@ static int set_ipv6(struct sk_buff *skb, struct sw_flow_key *flow_key,
*(__be32 *)nh & htonl(IPV6_FLOWINFO_FLOWLABEL);
}
if (mask->ipv6_hlimit) {
- SET_MASKED(nh->hop_limit, key->ipv6_hlimit, mask->ipv6_hlimit);
+ OVS_SET_MASKED(nh->hop_limit, key->ipv6_hlimit,
+ mask->ipv6_hlimit);
flow_key->ip.ttl = nh->hop_limit;
}
return 0;
@@ -517,8 +514,8 @@ static int set_udp(struct sk_buff *skb, struct sw_flow_key *flow_key,
uh = udp_hdr(skb);
/* Either of the masks is non-zero, so do not bother checking them. */
- src = MASKED(uh->source, key->udp_src, mask->udp_src);
- dst = MASKED(uh->dest, key->udp_dst, mask->udp_dst);
+ src = OVS_MASKED(uh->source, key->udp_src, mask->udp_src);
+ dst = OVS_MASKED(uh->dest, key->udp_dst, mask->udp_dst);
if (uh->check && skb->ip_summed != CHECKSUM_PARTIAL) {
if (likely(src != uh->source)) {
@@ -558,12 +555,12 @@ static int set_tcp(struct sk_buff *skb, struct sw_flow_key *flow_key,
return err;
th = tcp_hdr(skb);
- src = MASKED(th->source, key->tcp_src, mask->tcp_src);
+ src = OVS_MASKED(th->source, key->tcp_src, mask->tcp_src);
if (likely(src != th->source)) {
set_tp_port(skb, &th->source, src, &th->check);
flow_key->tp.src = src;
}
- dst = MASKED(th->dest, key->tcp_dst, mask->tcp_dst);
+ dst = OVS_MASKED(th->dest, key->tcp_dst, mask->tcp_dst);
if (likely(dst != th->dest)) {
set_tp_port(skb, &th->dest, dst, &th->check);
flow_key->tp.dst = dst;
@@ -590,8 +587,8 @@ static int set_sctp(struct sk_buff *skb, struct sw_flow_key *flow_key,
old_csum = sh->checksum;
old_correct_csum = sctp_compute_cksum(skb, sctphoff);
- sh->source = MASKED(sh->source, key->sctp_src, mask->sctp_src);
- sh->dest = MASKED(sh->dest, key->sctp_dst, mask->sctp_dst);
+ sh->source = OVS_MASKED(sh->source, key->sctp_src, mask->sctp_src);
+ sh->dest = OVS_MASKED(sh->dest, key->sctp_dst, mask->sctp_dst);
new_csum = sctp_compute_cksum(skb, sctphoff);
@@ -770,12 +767,13 @@ static int execute_masked_set_action(struct sk_buff *skb,
switch (nla_type(a)) {
case OVS_KEY_ATTR_PRIORITY:
- SET_MASKED(skb->priority, nla_get_u32(a), *get_mask(a, u32 *));
+ OVS_SET_MASKED(skb->priority, nla_get_u32(a),
+ *get_mask(a, u32 *));
flow_key->phy.priority = skb->priority;
break;
case OVS_KEY_ATTR_SKB_MARK:
- SET_MASKED(skb->mark, nla_get_u32(a), *get_mask(a, u32 *));
+ OVS_SET_MASKED(skb->mark, nla_get_u32(a), *get_mask(a, u32 *));
flow_key->phy.skb_mark = skb->mark;
break;
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 6b28c5c..487a85f 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -200,6 +200,10 @@ void ovs_dp_notify_wq(struct work_struct *work);
int action_fifos_init(void);
void action_fifos_exit(void);
+/* 'KEY' must not have any bits set outside of the 'MASK' */
+#define OVS_MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
+#define OVS_SET_MASKED(OLD, KEY, MASK) ((OLD) = OVS_MASKED(OLD, KEY, MASK))
+
#define OVS_NLERR(logging_allowed, fmt, ...) \
do { \
if (logging_allowed && net_ratelimit()) \
--
2.1.4
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
---
v4: Add ack.
---
net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6d02498..701cd2b 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -633,6 +633,7 @@ ret_orig:
kfree_skb(clone);
return skb;
}
+EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
void nf_ct_frag6_consume_orig(struct sk_buff *skb)
{
--
2.1.4
This variation on skb_dst_copy() doesn't require two skbs.
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
---
v4: Add ack.
---
include/net/dst.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h
index 2578811..0539940 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -285,13 +285,18 @@ static inline void skb_dst_drop(struct sk_buff *skb)
}
}
-static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst)
{
- nskb->_skb_refdst = oskb->_skb_refdst;
+ nskb->_skb_refdst = refdst;
if (!(nskb->_skb_refdst & SKB_DST_NOREF))
dst_clone(skb_dst(nskb));
}
+static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+{
+ __skb_dst_copy(nskb, oskb->_skb_refdst);
+}
+
/**
* skb_dst_force - makes sure skb dst is refcounted
* @skb: buffer
--
2.1.4
Expose the kernel connection tracker via OVS. Userspace components can
make use of the "ct()" action, followed by "recirculate", to populate
the conntracking state in the OVS flow key, and subsequently match on
that state.
Example ODP flows allowing traffic from 1->2, only replies from 2->1:
in_port=1,tcp,action=ct(commit,zone=1),2
in_port=2,ct_state=-trk,tcp,action=ct(zone=1),recirc(1)
recirc_id=1,in_port=2,ct_state=+trk+est-new,tcp,action=1
IP fragments are handled by transparently assembling them as part of the
ct action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.
IP frag handling contributed by Andy Zhou.
Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Justin Pettit <[email protected]>
Signed-off-by: Andy Zhou <[email protected]>
---
This can be tested with the corresponding userspace component here:
https://www.github.com/justinpettit/openvswitch conntrack
v2: Don't take references to devs or dsts in output path.
Shift ovs_ct_init()/ovs_ct_exit() into this patch
Handle output case where flow key is invalidated
Store the entire L2 header to apply to fragments
Various minor simplifications
Improve comments/logs
Style fixes
Rebase
v3: Clone dst in output, free final dst reference properly.
Handle CHECKSUM_COMPLETE after fragmentation
Restore L2 skb metadata after fragmentation
Make MRU types more consistent
Better cleanup in error paths
Fix sparse warnings
v4: Reject set_field actions for ct_state,ct_zone
Combine key->ct update from skb->nfct into a single function.
Minor documentation tweaks.
Simplify some codepaths.
---
include/uapi/linux/openvswitch.h | 40 ++++
net/openvswitch/Kconfig | 11 +
net/openvswitch/Makefile | 2 +
net/openvswitch/actions.c | 175 ++++++++++++++-
net/openvswitch/conntrack.c | 466 +++++++++++++++++++++++++++++++++++++++
net/openvswitch/conntrack.h | 91 ++++++++
net/openvswitch/datapath.c | 72 ++++--
net/openvswitch/datapath.h | 8 +
net/openvswitch/flow.c | 2 +
net/openvswitch/flow.h | 6 +
net/openvswitch/flow_netlink.c | 72 ++++--
net/openvswitch/flow_netlink.h | 4 +-
net/openvswitch/vport.c | 1 +
13 files changed, 913 insertions(+), 37 deletions(-)
create mode 100644 net/openvswitch/conntrack.c
create mode 100644 net/openvswitch/conntrack.h
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d6b8854..55f5997 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -164,6 +164,9 @@ enum ovs_packet_cmd {
* %OVS_USERSPACE_ATTR_EGRESS_TUN_PORT attribute, which is sent only if the
* output port is actually a tunnel port. Contains the output tunnel key
* extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
+ * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
+ * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
+ * size.
*
* These attributes follow the &struct ovs_header within the Generic Netlink
* payload for %OVS_PACKET_* commands.
@@ -180,6 +183,7 @@ enum ovs_packet_attr {
OVS_PACKET_ATTR_UNUSED2,
OVS_PACKET_ATTR_PROBE, /* Packet operation is a feature probe,
error logging should be suppressed. */
+ OVS_PACKET_ATTR_MRU, /* Maximum received IP fragment size. */
__OVS_PACKET_ATTR_MAX
};
@@ -319,6 +323,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls.
* The implementation may restrict
* the accepted length of the array. */
+ OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */
+ OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */
#ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
@@ -431,6 +437,15 @@ struct ovs_key_nd {
__u8 nd_tll[ETH_ALEN];
};
+/* OVS_KEY_ATTR_CT_STATE flags */
+#define OVS_CS_F_NEW 0x01 /* Beginning of a new connection. */
+#define OVS_CS_F_ESTABLISHED 0x02 /* Part of an existing connection. */
+#define OVS_CS_F_RELATED 0x04 /* Related to an established
+ * connection. */
+#define OVS_CS_F_INVALID 0x20 /* Could not track connection. */
+#define OVS_CS_F_REPLY_DIR 0x40 /* Flow is in the reply direction. */
+#define OVS_CS_F_TRACKED 0x80 /* Conntrack has occurred. */
+
/**
* enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
* @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
@@ -595,6 +610,28 @@ struct ovs_action_hash {
};
/**
+ * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
+ * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
+ * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
+ */
+enum ovs_ct_attr {
+ OVS_CT_ATTR_UNSPEC,
+ OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */
+ OVS_CT_ATTR_ZONE, /* u16 zone id. */
+ __OVS_CT_ATTR_MAX
+};
+
+#define OVS_CT_ATTR_MAX (__OVS_CT_ATTR_MAX - 1)
+
+/*
+ * OVS_CT_ATTR_FLAGS flags - bitmask of %OVS_CT_F_*
+ * @OVS_CT_F_COMMIT: Commits the flow to the conntrack table. This allows
+ * future packets for the same connection to be identified as 'established'
+ * or 'related'.
+ */
+#define OVS_CT_F_COMMIT 0x01
+
+/**
* enum ovs_action_attr - Action types.
*
* @OVS_ACTION_ATTR_OUTPUT: Output packet to port.
@@ -623,6 +660,8 @@ struct ovs_action_hash {
* indicate the new packet contents. This could potentially still be
* %ETH_P_MPLS if the resulting MPLS label stack is not empty. If there
* is no MPLS label stack, as determined by ethertype, no action is taken.
+ * @OVS_ACTION_ATTR_CT: Track the connection. Populate the conntrack-related
+ * entries in the flow key.
*
* Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all
* fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -648,6 +687,7 @@ enum ovs_action_attr {
* data immediately followed by a mask.
* The data must be zero for the unmasked
* bits. */
+ OVS_ACTION_ATTR_CT, /* One nested OVS_CT_ATTR_* . */
__OVS_ACTION_ATTR_MAX, /* Nothing past this will be accepted
* from userspace. */
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 422dc05..98f343d 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -31,6 +31,17 @@ config OPENVSWITCH
If unsure, say N.
+config OPENVSWITCH_CONNTRACK
+ bool "Open vSwitch conntrack action support"
+ depends on OPENVSWITCH
+ depends on NF_CONNTRACK
+ default OPENVSWITCH
+ ---help---
+ If you say Y here, then Open vSwitch module will be able to pass
+ packets through conntrack.
+
+ Say N to exclude this support and reduce the binary size.
+
config OPENVSWITCH_GRE
tristate "Open vSwitch GRE tunneling support"
depends on OPENVSWITCH
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 6e1701d..5b5913b 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -15,6 +15,8 @@ openvswitch-y := \
vport-internal_dev.o \
vport-netdev.o
+openvswitch-$(CONFIG_OPENVSWITCH_CONNTRACK) += conntrack.o
+
obj-$(CONFIG_OPENVSWITCH_VXLAN)+= vport-vxlan.o
obj-$(CONFIG_OPENVSWITCH_GENEVE)+= vport-geneve.o
obj-$(CONFIG_OPENVSWITCH_GRE) += vport-gre.o
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 11643fa..5911a2a 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -22,6 +22,7 @@
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/openvswitch.h>
+#include <linux/netfilter_ipv6.h>
#include <linux/sctp.h>
#include <linux/tcp.h>
#include <linux/udp.h>
@@ -29,6 +30,7 @@
#include <linux/if_arp.h>
#include <linux/if_vlan.h>
+#include <net/dst.h>
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/checksum.h>
@@ -38,6 +40,7 @@
#include "datapath.h"
#include "flow.h"
+#include "conntrack.h"
#include "vport.h"
static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
@@ -52,6 +55,20 @@ struct deferred_action {
struct sw_flow_key pkt_key;
};
+#define MAX_L2_LEN (VLAN_ETH_HLEN + 3 * MPLS_HLEN)
+struct ovs_frag_data {
+ unsigned long dst;
+ struct vport *vport;
+ struct ovs_skb_cb cb;
+ __be16 inner_protocol;
+ __u16 vlan_tci;
+ __be16 vlan_proto;
+ unsigned int l2_len;
+ u8 l2_data[MAX_L2_LEN];
+};
+
+static DEFINE_PER_CPU(struct ovs_frag_data, ovs_frag_data_storage);
+
#define DEFERRED_ACTION_FIFO_SIZE 10
struct action_fifo {
int head;
@@ -602,14 +619,145 @@ static int set_sctp(struct sk_buff *skb, struct sw_flow_key *flow_key,
return 0;
}
-static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
+static int ovs_vport_output(struct sock *sock, struct sk_buff *skb)
+{
+ struct ovs_frag_data *data = this_cpu_ptr(&ovs_frag_data_storage);
+ struct vport *vport = data->vport;
+
+ if (skb_cow_head(skb, data->l2_len) < 0) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+
+ __skb_dst_copy(skb, data->dst);
+ *OVS_CB(skb) = data->cb;
+ skb->inner_protocol = data->inner_protocol;
+ skb->vlan_tci = data->vlan_tci;
+ skb->vlan_proto = data->vlan_proto;
+
+ /* Reconstruct the MAC header. */
+ skb_push(skb, data->l2_len);
+ memcpy(skb->data, &data->l2_data, data->l2_len);
+ ovs_skb_postpush_rcsum(skb, skb->data, data->l2_len);
+ skb_reset_mac_header(skb);
+
+ ovs_vport_send(vport, skb);
+ return 0;
+}
+
+static unsigned int
+ovs_dst_get_mtu(const struct dst_entry *dst)
+{
+ return dst->dev->mtu;
+}
+
+static struct dst_ops ovs_dst_ops = {
+ .family = AF_UNSPEC,
+ .mtu = ovs_dst_get_mtu,
+};
+
+/* prepare_frag() is called once per (larger-than-MTU) frame; its inverse is
+ * ovs_vport_output(), which is called once per fragmented packet.
+ */
+static void prepare_frag(struct vport *vport, struct sk_buff *skb)
+{
+ unsigned int hlen = skb_network_offset(skb);
+ struct ovs_frag_data *data;
+
+ data = this_cpu_ptr(&ovs_frag_data_storage);
+ data->dst = skb->_skb_refdst;
+ data->vport = vport;
+ data->cb = *OVS_CB(skb);
+ data->inner_protocol = skb->inner_protocol;
+ data->vlan_tci = skb->vlan_tci;
+ data->vlan_proto = skb->vlan_proto;
+ data->l2_len = hlen;
+ memcpy(&data->l2_data, skb->data, hlen);
+
+ memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+ skb_pull(skb, hlen);
+}
+
+static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru,
+ __be16 ethertype)
+{
+ if (skb_network_offset(skb) > MAX_L2_LEN) {
+ OVS_NLERR(1, "L2 header too long to fragment");
+ return;
+ }
+
+ if (ethertype == htons(ETH_P_IP)) {
+ struct dst_entry ovs_dst;
+ unsigned long orig_dst;
+
+ prepare_frag(vport, skb);
+ dst_init(&ovs_dst, &ovs_dst_ops, NULL, 1,
+ DST_OBSOLETE_NONE, DST_NOCOUNT);
+ ovs_dst.dev = vport->dev;
+
+ orig_dst = skb->_skb_refdst;
+ skb_dst_set_noref(skb, &ovs_dst);
+ IPCB(skb)->frag_max_size = mru;
+
+ ip_do_fragment(skb->sk, skb, ovs_vport_output);
+ refdst_drop(orig_dst);
+ } else if (ethertype == htons(ETH_P_IPV6)) {
+ const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
+ unsigned long orig_dst;
+ struct rt6_info ovs_rt;
+
+ if (!v6ops) {
+ kfree_skb(skb);
+ return;
+ }
+
+ prepare_frag(vport, skb);
+ memset(&ovs_rt, 0, sizeof(ovs_rt));
+ dst_init(&ovs_rt.dst, &ovs_dst_ops, NULL, 1,
+ DST_OBSOLETE_NONE, DST_NOCOUNT);
+ ovs_rt.dst.dev = vport->dev;
+
+ orig_dst = skb->_skb_refdst;
+ skb_dst_set_noref(skb, &ovs_rt.dst);
+ IP6CB(skb)->frag_max_size = mru;
+
+ v6ops->fragment(skb->sk, skb, ovs_vport_output);
+ refdst_drop(orig_dst);
+ } else {
+ WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
+ ovs_vport_name(vport), ntohs(ethertype), mru,
+ vport->dev->mtu);
+ kfree_skb(skb);
+ }
+}
+
+static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
+ struct sw_flow_key *key)
{
struct vport *vport = ovs_vport_rcu(dp, out_port);
- if (likely(vport))
- ovs_vport_send(vport, skb);
- else
+ if (likely(vport)) {
+ u16 mru = OVS_CB(skb)->mru;
+
+ if (likely(!mru || (skb->len <= mru + ETH_HLEN))) {
+ ovs_vport_send(vport, skb);
+ } else if (mru <= vport->dev->mtu) {
+ __be16 ethertype = key->eth.type;
+
+ if (!is_flow_key_valid(key)) {
+ if (eth_p_mpls(skb->protocol))
+ ethertype = skb->inner_protocol;
+ else
+ ethertype = vlan_get_protocol(skb);
+ }
+
+ ovs_fragment(vport, skb, mru, ethertype);
+ } else {
+ kfree_skb(skb);
+ }
+ } else {
kfree_skb(skb);
+ }
}
static int output_userspace(struct datapath *dp, struct sk_buff *skb,
@@ -623,6 +771,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
memset(&upcall, 0, sizeof(upcall));
upcall.cmd = OVS_PACKET_CMD_ACTION;
+ upcall.mru = OVS_CB(skb)->mru;
for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
a = nla_next(a, &rem)) {
@@ -816,6 +965,11 @@ static int execute_masked_set_action(struct sk_buff *skb,
err = set_mpls(skb, flow_key, nla_data(a), get_mask(a,
__be32 *));
break;
+
+ case OVS_KEY_ATTR_CT_STATE:
+ case OVS_KEY_ATTR_CT_ZONE:
+ err = -EINVAL;
+ break;
}
return err;
@@ -885,7 +1039,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
struct sk_buff *out_skb = skb_clone(skb, GFP_ATOMIC);
if (out_skb)
- do_output(dp, out_skb, prev_port);
+ do_output(dp, out_skb, prev_port, key);
prev_port = -1;
}
@@ -942,6 +1096,15 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
case OVS_ACTION_ATTR_SAMPLE:
err = sample(dp, skb, key, a, attr, len);
break;
+
+ case OVS_ACTION_ATTR_CT:
+ err = ovs_ct_execute(ovs_dp_get_net(dp), skb, key,
+ nla_data(a));
+
+ /* Hide stolen IP fragments from user space. */
+ if (err == -EINPROGRESS)
+ return 0;
+ break;
}
if (unlikely(err)) {
@@ -951,7 +1114,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
}
if (prev_port != -1)
- do_output(dp, skb, prev_port);
+ do_output(dp, skb, prev_port, key);
else
consume_skb(skb);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
new file mode 100644
index 0000000..601cd16
--- /dev/null
+++ b/net/openvswitch/conntrack.c
@@ -0,0 +1,466 @@
+/*
+ * Copyright (c) 2015 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/openvswitch.h>
+#include <net/ip.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
+
+#include "datapath.h"
+#include "conntrack.h"
+#include "flow.h"
+#include "flow_netlink.h"
+
+struct ovs_ct_len_tbl {
+ size_t maxlen;
+ size_t minlen;
+};
+
+/* Conntrack action context for execution. */
+struct ovs_conntrack_info {
+ struct nf_conn *ct;
+ u32 flags;
+ u16 zone;
+ u16 family;
+};
+
+static u16 key_to_nfproto(const struct sw_flow_key *key)
+{
+ switch (ntohs(key->eth.type)) {
+ case ETH_P_IP:
+ return NFPROTO_IPV4;
+ case ETH_P_IPV6:
+ return NFPROTO_IPV6;
+ default:
+ return NFPROTO_UNSPEC;
+ }
+}
+
+/* Map SKB connection state into the values used by flow definition. */
+static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
+{
+ u8 ct_state = OVS_CS_F_TRACKED;
+
+ switch (ctinfo) {
+ case IP_CT_ESTABLISHED_REPLY:
+ case IP_CT_RELATED_REPLY:
+ case IP_CT_NEW_REPLY:
+ ct_state |= OVS_CS_F_REPLY_DIR;
+ break;
+ default:
+ break;
+ }
+
+ switch (ctinfo) {
+ case IP_CT_ESTABLISHED:
+ case IP_CT_ESTABLISHED_REPLY:
+ ct_state |= OVS_CS_F_ESTABLISHED;
+ break;
+ case IP_CT_RELATED:
+ case IP_CT_RELATED_REPLY:
+ ct_state |= OVS_CS_F_RELATED;
+ break;
+ case IP_CT_NEW:
+ case IP_CT_NEW_REPLY:
+ ct_state |= OVS_CS_F_NEW;
+ break;
+ default:
+ break;
+ }
+
+ return ct_state;
+}
+
+static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, u16 zone)
+{
+ key->ct.state = state;
+ key->ct.zone = zone;
+}
+
+/* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has
+ * previously sent the packet to conntrack via the ct action.
+ */
+static void ovs_ct_update_key(const struct sk_buff *skb,
+ struct sw_flow_key *key, bool post_ct)
+{
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+ u8 state = 0;
+ u16 zone;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ if (ct) {
+ state = ovs_ct_get_state(ctinfo);
+ if (ct->master)
+ state |= OVS_CS_F_RELATED;
+ zone = nf_ct_zone(ct);
+ } else {
+ if (post_ct)
+ state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
+ zone = NF_CT_DEFAULT_ZONE;
+ }
+ __ovs_ct_update_key(key, state, zone);
+}
+
+void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
+{
+ ovs_ct_update_key(skb, key, false);
+}
+
+static bool __ovs_ct_state_valid(u8 state)
+{
+ return (state && !(state & OVS_CS_F_INVALID));
+}
+
+bool ovs_ct_state_valid(const struct sw_flow_key *key)
+{
+ return __ovs_ct_state_valid(key->ct.state);
+}
+
+static int handle_fragments(struct net *net, struct sw_flow_key *key,
+ u16 zone, struct sk_buff *skb)
+{
+ struct ovs_skb_cb ovs_cb = *OVS_CB(skb);
+
+ if (key->eth.type == htons(ETH_P_IP)) {
+ enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
+ int err;
+
+ memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+ err = ip_defrag(skb, user);
+ if (err)
+ return err;
+
+ ovs_cb.mru = IPCB(skb)->frag_max_size;
+ } else if (key->eth.type == htons(ETH_P_IPV6)) {
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+ enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
+ struct sk_buff *reasm;
+
+ memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
+ reasm = nf_ct_frag6_gather(skb, user);
+ if (!reasm)
+ return -EINPROGRESS;
+
+ if (skb == reasm)
+ return -EINVAL;
+
+ key->ip.proto = ipv6_hdr(reasm)->nexthdr;
+ skb_morph(skb, reasm);
+ consume_skb(reasm);
+ ovs_cb.mru = IP6CB(skb)->frag_max_size;
+#else
+ return -EPFNOSUPPORT;
+#endif
+ } else {
+ return -EPFNOSUPPORT;
+ }
+
+ key->ip.frag = OVS_FRAG_TYPE_NONE;
+ skb_clear_hash(skb);
+ skb->ignore_df = 1;
+ *OVS_CB(skb) = ovs_cb;
+
+ return 0;
+}
+
+static struct nf_conntrack_expect *
+ovs_ct_expect_find(struct net *net, u16 zone, u16 proto,
+ const struct sk_buff *skb)
+{
+ struct nf_conntrack_tuple tuple;
+
+ if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, &tuple))
+ return NULL;
+ return __nf_ct_expect_find(net, zone, &tuple);
+}
+
+/* Determine whether skb->nfct is equal to the result of conntrack lookup. */
+static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb,
+ const struct ovs_conntrack_info *info)
+{
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ if (!ct)
+ return false;
+ if (!net_eq(net, read_pnet(&ct->ct_net)))
+ return false;
+ if (info->zone != nf_ct_zone(ct))
+ return false;
+
+ return true;
+}
+
+static int __ovs_ct_lookup(struct net *net, const struct sw_flow_key *key,
+ const struct ovs_conntrack_info *info,
+ struct sk_buff *skb)
+{
+ /* If we are recirculating packets to match on conntrack fields and
+ * committing with a separate conntrack action, then we don't need to
+ * actually run the packet through conntrack twice unless it's for a
+ * different zone.
+ */
+ if (!skb_nfct_cached(net, skb, info)) {
+ struct nf_conn *tmpl = info->ct;
+
+ /* Associate skb with specified zone. */
+ if (tmpl) {
+ if (skb->nfct)
+ nf_conntrack_put(skb->nfct);
+ nf_conntrack_get(&tmpl->ct_general);
+ skb->nfct = &tmpl->ct_general;
+ skb->nfctinfo = IP_CT_NEW;
+ }
+
+ if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING,
+ skb) != NF_ACCEPT)
+ return -ENOENT;
+ }
+
+ return 0;
+}
+
+/* Lookup connection and read fields into key. */
+static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
+ const struct ovs_conntrack_info *info,
+ struct sk_buff *skb)
+{
+ struct nf_conntrack_expect *exp;
+
+ exp = ovs_ct_expect_find(net, info->zone, info->family, skb);
+ if (exp) {
+ u8 state;
+
+ state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
+ __ovs_ct_update_key(key, state, info->zone);
+ } else {
+ int err;
+
+ err = __ovs_ct_lookup(net, key, info, skb);
+ if (err)
+ return err;
+
+ ovs_ct_update_key(skb, key, true);
+ }
+
+ return 0;
+}
+
+/* Lookup connection and confirm if unconfirmed. */
+static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
+ const struct ovs_conntrack_info *info,
+ struct sk_buff *skb)
+{
+ u8 state;
+ int err;
+
+ state = key->ct.state;
+ if (key->ct.zone == info->zone &&
+ ((state & OVS_CS_F_TRACKED) && !(state & OVS_CS_F_NEW))) {
+ /* Previous lookup has shown that this connection is already
+ * tracked and committed. Skip committing.
+ */
+ return 0;
+ }
+
+ err = __ovs_ct_lookup(net, key, info, skb);
+ if (err)
+ return err;
+ if (nf_conntrack_confirm(skb) != NF_ACCEPT)
+ return -EINVAL;
+
+ ovs_ct_update_key(skb, key, true);
+
+ return 0;
+}
+
+int ovs_ct_execute(struct net *net, struct sk_buff *skb,
+ struct sw_flow_key *key,
+ const struct ovs_conntrack_info *info)
+{
+ int nh_ofs;
+ int err;
+
+ /* The conntrack module expects to be working at L3. */
+ nh_ofs = skb_network_offset(skb);
+ skb_pull(skb, nh_ofs);
+
+ if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
+ err = handle_fragments(net, key, info->zone, skb);
+ if (err)
+ return err;
+ }
+
+ if (info->flags & OVS_CT_F_COMMIT)
+ err = ovs_ct_commit(net, key, info, skb);
+ else
+ err = ovs_ct_lookup(net, key, info, skb);
+
+ skb_push(skb, nh_ofs);
+ return err;
+}
+
+static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
+ [OVS_CT_ATTR_FLAGS] = { .minlen = sizeof(u32),
+ .maxlen = sizeof(u32) },
+ [OVS_CT_ATTR_ZONE] = { .minlen = sizeof(u16),
+ .maxlen = sizeof(u16) },
+};
+
+static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
+ bool log)
+{
+ struct nlattr *a;
+ int rem;
+
+ nla_for_each_nested(a, attr, rem) {
+ int type = nla_type(a);
+ int maxlen = ovs_ct_attr_lens[type].maxlen;
+ int minlen = ovs_ct_attr_lens[type].minlen;
+
+ if (type > OVS_CT_ATTR_MAX) {
+ OVS_NLERR(log,
+ "Unknown conntrack attr (type=%d, max=%d)",
+ type, OVS_CT_ATTR_MAX);
+ return -EINVAL;
+ }
+ if (nla_len(a) < minlen || nla_len(a) > maxlen) {
+ OVS_NLERR(log,
+ "Conntrack attr type has unexpected length (type=%d, length=%d, expected=%d)",
+ type, nla_len(a), maxlen);
+ return -EINVAL;
+ }
+
+ switch (type) {
+ case OVS_CT_ATTR_FLAGS:
+ info->flags = nla_get_u32(a);
+ break;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+ case OVS_CT_ATTR_ZONE:
+ info->zone = nla_get_u16(a);
+ break;
+#endif
+ default:
+ OVS_NLERR(log, "Unknown conntrack attr (%d)",
+ type);
+ return -EINVAL;
+ }
+ }
+
+ if (rem > 0) {
+ OVS_NLERR(log, "Conntrack attr has %d unknown bytes", rem);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+bool ovs_ct_verify(enum ovs_key_attr attr)
+{
+ if (attr & OVS_KEY_ATTR_CT_STATE)
+ return true;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+ if (attr & OVS_KEY_ATTR_CT_ZONE)
+ return true;
+#endif
+
+ return false;
+}
+
+int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ struct sw_flow_actions **sfa, bool log)
+{
+ struct ovs_conntrack_info ct_info;
+ u16 family;
+ int err;
+
+ family = key_to_nfproto(key);
+ if (family == NFPROTO_UNSPEC) {
+ OVS_NLERR(log, "ct family unspecified");
+ return -EINVAL;
+ }
+
+ memset(&ct_info, 0, sizeof(ct_info));
+ ct_info.family = family;
+
+ err = parse_ct(attr, &ct_info, log);
+ if (err)
+ return err;
+
+ /* Set up template for tracking connections in specific zones. */
+ ct_info.ct = nf_ct_tmpl_alloc(net, ct_info.zone, GFP_KERNEL);
+ if (!ct_info.ct) {
+ OVS_NLERR(log, "Failed to allocate conntrack template");
+ return -ENOMEM;
+ }
+
+ err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, &ct_info,
+ sizeof(ct_info), log);
+ if (err)
+ goto err_free_ct;
+
+ __set_bit(IPS_CONFIRMED_BIT, &ct_info.ct->status);
+ nf_conntrack_get(&ct_info.ct->ct_general);
+ return 0;
+err_free_ct:
+ nf_conntrack_free(ct_info.ct);
+ return err;
+}
+
+int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info,
+ struct sk_buff *skb)
+{
+ struct nlattr *start;
+
+ start = nla_nest_start(skb, OVS_ACTION_ATTR_CT);
+ if (!start)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, OVS_CT_ATTR_FLAGS, ct_info->flags))
+ return -EMSGSIZE;
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+ if (nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone))
+ return -EMSGSIZE;
+#endif
+
+ nla_nest_end(skb, start);
+
+ return 0;
+}
+
+void ovs_ct_free_action(const struct nlattr *a)
+{
+ struct ovs_conntrack_info *ct_info = nla_data(a);
+
+ if (ct_info->ct)
+ nf_ct_put(ct_info->ct);
+}
+
+void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
+{
+ data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
+ data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
+}
+
+void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
+{
+ if (data->xt_v4)
+ nf_ct_l3proto_module_put(PF_INET);
+ if (data->xt_v6)
+ nf_ct_l3proto_module_put(PF_INET6);
+}
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
new file mode 100644
index 0000000..3d629ac
--- /dev/null
+++ b/net/openvswitch/conntrack.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) 2015 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef OVS_CONNTRACK_H
+#define OVS_CONNTRACK_H 1
+
+#include "flow.h"
+
+struct ovs_conntrack_info;
+enum ovs_key_attr;
+
+struct ovs_ct_perdp_data {
+ bool xt_v4;
+ bool xt_v6;
+};
+
+#if defined(CONFIG_OPENVSWITCH_CONNTRACK)
+void ovs_ct_init(struct net *, struct ovs_ct_perdp_data *data);
+void ovs_ct_exit(struct net *, struct ovs_ct_perdp_data *data);
+bool ovs_ct_verify(enum ovs_key_attr attr);
+int ovs_ct_copy_action(struct net *, const struct nlattr *,
+ const struct sw_flow_key *, struct sw_flow_actions **,
+ bool log);
+int ovs_ct_action_to_attr(const struct ovs_conntrack_info *, struct sk_buff *);
+
+int ovs_ct_execute(struct net *, struct sk_buff *, struct sw_flow_key *,
+ const struct ovs_conntrack_info *);
+
+void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
+bool ovs_ct_state_valid(const struct sw_flow_key *key);
+void ovs_ct_free_action(const struct nlattr *a);
+#else
+#include <linux/errno.h>
+
+static inline void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
+{
+}
+
+static inline void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
+{
+}
+
+static inline bool ovs_ct_verify(int attr)
+{
+ return false;
+}
+
+static inline int ovs_ct_copy_action(struct net *net, const struct nlattr *nla,
+ const struct sw_flow_key *key,
+ struct sw_flow_actions **acts, bool log)
+{
+ return -ENOTSUPP;
+}
+
+static inline int ovs_ct_action_to_attr(const struct ovs_conntrack_info *info,
+ struct sk_buff *skb)
+{
+ return -ENOTSUPP;
+}
+
+static inline int ovs_ct_execute(struct net *net, struct sk_buff *skb,
+ struct sw_flow_key *key,
+ const struct ovs_conntrack_info *info)
+{
+ return -ENOTSUPP;
+}
+
+static inline bool ovs_ct_state_valid(const struct sw_flow_key *key)
+{
+ return false;
+}
+
+void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
+{
+ key->ct.state = 0;
+ key->ct.zone = 0;
+}
+
+static inline void ovs_ct_free_action(const struct nlattr *a) { }
+#endif
+#endif /* ovs_conntrack.h */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d5b5473..7e8d964 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -275,6 +275,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
memset(&upcall, 0, sizeof(upcall));
upcall.cmd = OVS_PACKET_CMD_MISS;
upcall.portid = ovs_vport_find_upcall_portid(p, skb);
+ upcall.mru = OVS_CB(skb)->mru;
error = ovs_dp_upcall(dp, skb, key, &upcall);
if (unlikely(error))
kfree_skb(skb);
@@ -400,9 +401,23 @@ static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
if (upcall_info->actions_len)
size += nla_total_size(upcall_info->actions_len);
+ /* OVS_PACKET_ATTR_MRU */
+ if (upcall_info->mru)
+ size += nla_total_size(sizeof(upcall_info->mru));
+
return size;
}
+static void pad_packet(struct datapath *dp, struct sk_buff *skb)
+{
+ if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
+ size_t plen = NLA_ALIGN(skb->len) - skb->len;
+
+ if (plen > 0)
+ memset(skb_put(skb, plen), 0, plen);
+ }
+}
+
static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
const struct sw_flow_key *key,
const struct dp_upcall_info *upcall_info)
@@ -492,6 +507,16 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
nla_nest_cancel(user_skb, nla);
}
+ /* Add OVS_PACKET_ATTR_MRU */
+ if (upcall_info->mru) {
+ if (nla_put_u16(user_skb, OVS_PACKET_ATTR_MRU,
+ upcall_info->mru)) {
+ err = -ENOBUFS;
+ goto out;
+ }
+ pad_packet(dp, user_skb);
+ }
+
/* Only reserve room for attribute header, packet data is added
* in skb_zerocopy() */
if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0))) {
@@ -505,12 +530,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
goto out;
/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
- if (!(dp->user_features & OVS_DP_F_UNALIGNED)) {
- size_t plen = NLA_ALIGN(user_skb->len) - user_skb->len;
-
- if (plen > 0)
- memset(skb_put(user_skb, plen), 0, plen);
- }
+ pad_packet(dp, user_skb);
((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
@@ -527,6 +547,7 @@ out:
static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
{
struct ovs_header *ovs_header = info->userhdr;
+ struct net *net = sock_net(skb->sk);
struct nlattr **a = info->attrs;
struct sw_flow_actions *acts;
struct sk_buff *packet;
@@ -535,6 +556,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
struct datapath *dp;
struct ethhdr *eth;
struct vport *input_vport;
+ u16 mru = 0;
int len;
int err;
bool log = !a[OVS_PACKET_ATTR_PROBE];
@@ -564,6 +586,13 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
else
packet->protocol = htons(ETH_P_802_2);
+ /* Set packet's mru */
+ if (a[OVS_PACKET_ATTR_MRU]) {
+ mru = nla_get_u16(a[OVS_PACKET_ATTR_MRU]);
+ packet->ignore_df = 1;
+ }
+ OVS_CB(packet)->mru = mru;
+
/* Build an sw_flow for sending this packet. */
flow = ovs_flow_alloc();
err = PTR_ERR(flow);
@@ -575,7 +604,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
if (err)
goto err_flow_free;
- err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
+ err = ovs_nla_copy_actions(net, a[OVS_PACKET_ATTR_ACTIONS],
&flow->key, &acts, log);
if (err)
goto err_flow_free;
@@ -586,7 +615,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
packet->mark = flow->key.phy.skb_mark;
rcu_read_lock();
- dp = get_dp_rcu(sock_net(skb->sk), ovs_header->dp_ifindex);
+ dp = get_dp_rcu(net, ovs_header->dp_ifindex);
err = -ENODEV;
if (!dp)
goto err_unlock;
@@ -598,6 +627,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
if (!input_vport)
goto err_unlock;
+ packet->dev = input_vport->dev;
OVS_CB(packet)->input_vport = input_vport;
sf_acts = rcu_dereference(flow->sf_acts);
@@ -624,6 +654,7 @@ static const struct nla_policy packet_policy[OVS_PACKET_ATTR_MAX + 1] = {
[OVS_PACKET_ATTR_KEY] = { .type = NLA_NESTED },
[OVS_PACKET_ATTR_ACTIONS] = { .type = NLA_NESTED },
[OVS_PACKET_ATTR_PROBE] = { .type = NLA_FLAG },
+ [OVS_PACKET_ATTR_MRU] = { .type = NLA_U16 },
};
static const struct genl_ops dp_packet_genl_ops[] = {
@@ -880,6 +911,7 @@ static struct sk_buff *ovs_flow_cmd_build_info(const struct sw_flow *flow,
static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
{
+ struct net *net = sock_net(skb->sk);
struct nlattr **a = info->attrs;
struct ovs_header *ovs_header = info->userhdr;
struct sw_flow *flow = NULL, *new_flow;
@@ -929,8 +961,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
goto err_kfree_flow;
/* Validate actions. */
- error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key,
- &acts, log);
+ error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
+ &new_flow->key, &acts, log);
if (error) {
OVS_NLERR(log, "Flow actions may not be safe on all matching packets.");
goto err_kfree_flow;
@@ -944,7 +976,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
}
ovs_lock();
- dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+ dp = get_dp(net, ovs_header->dp_ifindex);
if (unlikely(!dp)) {
error = -ENODEV;
goto err_unlock_ovs;
@@ -1038,7 +1070,8 @@ error:
}
/* Factor out action copy to avoid "Wframe-larger-than=1024" warning. */
-static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
+static struct sw_flow_actions *get_flow_actions(struct net *net,
+ const struct nlattr *a,
const struct sw_flow_key *key,
const struct sw_flow_mask *mask,
bool log)
@@ -1048,7 +1081,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
int error;
ovs_flow_mask_key(&masked_key, key, mask);
- error = ovs_nla_copy_actions(a, &masked_key, &acts, log);
+ error = ovs_nla_copy_actions(net, a, &masked_key, &acts, log);
if (error) {
OVS_NLERR(log,
"Actions may not be safe on all matching packets");
@@ -1060,6 +1093,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
{
+ struct net *net = sock_net(skb->sk);
struct nlattr **a = info->attrs;
struct ovs_header *ovs_header = info->userhdr;
struct sw_flow_key key;
@@ -1091,8 +1125,8 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
/* Validate actions. */
if (a[OVS_FLOW_ATTR_ACTIONS]) {
- acts = get_flow_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, &mask,
- log);
+ acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], &key,
+ &mask, log);
if (IS_ERR(acts)) {
error = PTR_ERR(acts);
goto error;
@@ -1108,7 +1142,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
}
ovs_lock();
- dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+ dp = get_dp(net, ovs_header->dp_ifindex);
if (unlikely(!dp)) {
error = -ENODEV;
goto err_unlock_ovs;
@@ -1547,6 +1581,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
ovs_dp_change(dp, a);
+ /* Set up conntrack dependencies. */
+ ovs_ct_init(ovs_dp_get_net(dp), &dp->ct);
+
/* So far only local changes have been made, now need the lock. */
ovs_lock();
@@ -1583,6 +1620,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
err_destroy_ports_array:
ovs_unlock();
kfree(dp->ports);
+ ovs_ct_exit(ovs_dp_get_net(dp), &dp->ct);
err_destroy_percpu:
free_percpu(dp->stats_percpu);
err_destroy_table:
@@ -1616,6 +1654,8 @@ static void __dp_destroy(struct datapath *dp)
*/
ovs_dp_detach_port(ovs_vport_ovsl(dp, OVSP_LOCAL));
+ ovs_ct_exit(ovs_dp_get_net(dp), &dp->ct);
+
/* RCU destroy the flow table */
call_rcu(&dp->rcu, destroy_dp_rcu);
}
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 487a85f..5bf9594 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -27,6 +27,7 @@
#include <linux/u64_stats_sync.h>
#include <net/ip_tunnels.h>
+#include "conntrack.h"
#include "flow.h"
#include "flow_table.h"
#include "vport.h"
@@ -89,6 +90,8 @@ struct datapath {
possible_net_t net;
u32 user_features;
+
+ struct ovs_ct_perdp_data ct;
};
/**
@@ -97,10 +100,13 @@ struct datapath {
* NULL if the packet is not being tunneled.
* @input_vport: The original vport packet came in on. This value is cached
* when a packet is received by OVS.
+ * @mru: The maximum received fragement size; 0 if the packet is not
+ * fragmented.
*/
struct ovs_skb_cb {
struct ip_tunnel_info *egress_tun_info;
struct vport *input_vport;
+ u16 mru;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
@@ -113,6 +119,7 @@ struct ovs_skb_cb {
* then no packet is sent and the packet is accounted in the datapath's @n_lost
* counter.
* @egress_tun_info: If nonnull, becomes %OVS_PACKET_ATTR_EGRESS_TUN_KEY.
+ * @mru: If not zero, Maximum received IP fragment size.
*/
struct dp_upcall_info {
const struct ip_tunnel_info *egress_tun_info;
@@ -121,6 +128,7 @@ struct dp_upcall_info {
int actions_len;
u32 portid;
u8 cmd;
+ u16 mru;
};
/**
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 8db22ef..376ca87 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -49,6 +49,7 @@
#include "datapath.h"
#include "flow.h"
#include "flow_netlink.h"
+#include "conntrack.h"
u64 ovs_flow_used_time(unsigned long flow_jiffies)
{
@@ -707,6 +708,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
key->phy.priority = skb->priority;
key->phy.in_port = OVS_CB(skb)->input_vport->port_no;
key->phy.skb_mark = skb->mark;
+ ovs_ct_fill_key(skb, key);
key->ovs_flow_hash = 0;
key->recirc_id = 0;
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 082a87b..312c7d7 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -111,6 +111,12 @@ struct sw_flow_key {
} nd;
} ipv6;
};
+ struct {
+ /* Connection tracking fields. */
+ u16 zone;
+ u8 state;
+ } ct;
+
} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
struct sw_flow_key_range {
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 96cad8c..ec64463 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -281,7 +281,7 @@ size_t ovs_key_attr_size(void)
/* Whenever adding new OVS_KEY_ FIELDS, we should consider
* updating this function.
*/
- BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 22);
+ BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 24);
return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */
+ nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */
@@ -290,6 +290,8 @@ size_t ovs_key_attr_size(void)
+ nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */
+ nla_total_size(4) /* OVS_KEY_ATTR_DP_HASH */
+ nla_total_size(4) /* OVS_KEY_ATTR_RECIRC_ID */
+ + nla_total_size(1) /* OVS_KEY_ATTR_CT_STATE */
+ + nla_total_size(2) /* OVS_KEY_ATTR_CT_ZONE */
+ nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */
+ nla_total_size(2) /* OVS_KEY_ATTR_ETHERTYPE */
+ nla_total_size(4) /* OVS_KEY_ATTR_VLAN */
@@ -339,6 +341,8 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_TUNNEL] = { .len = OVS_ATTR_NESTED,
.next = ovs_tunnel_key_lens, },
[OVS_KEY_ATTR_MPLS] = { .len = sizeof(struct ovs_key_mpls) },
+ [OVS_KEY_ATTR_CT_STATE] = { .len = sizeof(u8) },
+ [OVS_KEY_ATTR_CT_ZONE] = { .len = sizeof(u16) },
};
static bool is_all_zero(const u8 *fp, size_t size)
@@ -768,6 +772,21 @@ static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs,
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL);
}
+
+ if (*attrs & (1 << OVS_KEY_ATTR_CT_STATE) &&
+ ovs_ct_verify(OVS_KEY_ATTR_CT_STATE)) {
+ u8 ct_state = nla_get_u8(a[OVS_KEY_ATTR_CT_STATE]);
+
+ SW_FLOW_KEY_PUT(match, ct.state, ct_state, is_mask);
+ *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_STATE);
+ }
+ if (*attrs & (1 << OVS_KEY_ATTR_CT_ZONE) &&
+ ovs_ct_verify(OVS_KEY_ATTR_CT_ZONE)) {
+ u16 ct_zone = nla_get_u16(a[OVS_KEY_ATTR_CT_ZONE]);
+
+ SW_FLOW_KEY_PUT(match, ct.zone, ct_zone, is_mask);
+ *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_ZONE);
+ }
return 0;
}
@@ -1266,6 +1285,7 @@ int ovs_nla_get_flow_metadata(const struct nlattr *attr,
memset(&match, 0, sizeof(match));
match.key = key;
+ memset(&key->ct, 0, sizeof(key->ct));
key->phy.in_port = DP_MAX_PORTS;
return metadata_from_nlattrs(&match, &attrs, a, false, log);
@@ -1314,6 +1334,12 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
if (nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, output->phy.skb_mark))
goto nla_put_failure;
+ if (nla_put_u8(skb, OVS_KEY_ATTR_CT_STATE, output->ct.state))
+ goto nla_put_failure;
+
+ if (nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
+ goto nla_put_failure;
+
nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
if (!nla)
goto nla_put_failure;
@@ -1574,6 +1600,9 @@ void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
case OVS_ACTION_ATTR_SET:
ovs_nla_free_set_action(a);
break;
+ case OVS_ACTION_ATTR_CT:
+ ovs_ct_free_action(a);
+ break;
}
}
@@ -1647,8 +1676,8 @@ static struct nlattr *__add_action(struct sw_flow_actions **sfa,
return a;
}
-static int add_action(struct sw_flow_actions **sfa, int attrtype,
- void *data, int len, bool log)
+int ovs_nla_add_action(struct sw_flow_actions **sfa, int attrtype, void *data,
+ int len, bool log)
{
struct nlattr *a;
@@ -1663,7 +1692,7 @@ static inline int add_nested_action_start(struct sw_flow_actions **sfa,
int used = (*sfa)->actions_len;
int err;
- err = add_action(sfa, attrtype, NULL, 0, log);
+ err = ovs_nla_add_action(sfa, attrtype, NULL, 0, log);
if (err)
return err;
@@ -1679,12 +1708,12 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
a->nla_len = sfa->actions_len - st_offset;
}
-static int __ovs_nla_copy_actions(const struct nlattr *attr,
+static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
int depth, struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci, bool log);
-static int validate_and_copy_sample(const struct nlattr *attr,
+static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key, int depth,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci, bool log)
@@ -1716,15 +1745,15 @@ static int validate_and_copy_sample(const struct nlattr *attr,
start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log);
if (start < 0)
return start;
- err = add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY,
- nla_data(probability), sizeof(u32), log);
+ err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY,
+ nla_data(probability), sizeof(u32), log);
if (err)
return err;
st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS, log);
if (st_acts < 0)
return st_acts;
- err = __ovs_nla_copy_actions(actions, key, depth + 1, sfa,
+ err = __ovs_nla_copy_actions(net, actions, key, depth + 1, sfa,
eth_type, vlan_tci, log);
if (err)
return err;
@@ -2058,7 +2087,7 @@ static int copy_action(const struct nlattr *from,
return 0;
}
-static int __ovs_nla_copy_actions(const struct nlattr *attr,
+static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
int depth, struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci, bool log)
@@ -2082,7 +2111,8 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
[OVS_ACTION_ATTR_SET] = (u32)-1,
[OVS_ACTION_ATTR_SET_MASKED] = (u32)-1,
[OVS_ACTION_ATTR_SAMPLE] = (u32)-1,
- [OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash)
+ [OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash),
+ [OVS_ACTION_ATTR_CT] = (u32)-1,
};
const struct ovs_action_push_vlan *vlan;
int type = nla_type(a);
@@ -2189,13 +2219,20 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
break;
case OVS_ACTION_ATTR_SAMPLE:
- err = validate_and_copy_sample(a, key, depth, sfa,
+ err = validate_and_copy_sample(net, a, key, depth, sfa,
eth_type, vlan_tci, log);
if (err)
return err;
skip_copy = true;
break;
+ case OVS_ACTION_ATTR_CT:
+ err = ovs_ct_copy_action(net, a, key, sfa, log);
+ if (err)
+ return err;
+ skip_copy = true;
+ break;
+
default:
OVS_NLERR(log, "Unknown Action type %d", type);
return -EINVAL;
@@ -2214,7 +2251,7 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
}
/* 'key' must be the masked key. */
-int ovs_nla_copy_actions(const struct nlattr *attr,
+int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa, bool log)
{
@@ -2225,7 +2262,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
return PTR_ERR(*sfa);
(*sfa)->orig_len = nla_len(attr);
- err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
+ err = __ovs_nla_copy_actions(net, attr, key, 0, sfa, key->eth.type,
key->eth.tci, log);
if (err)
ovs_nla_free_flow_actions(*sfa);
@@ -2350,6 +2387,13 @@ int ovs_nla_put_actions(const struct nlattr *attr, int len, struct sk_buff *skb)
if (err)
return err;
break;
+
+ case OVS_ACTION_ATTR_CT:
+ err = ovs_ct_action_to_attr(nla_data(a), skb);
+ if (err)
+ return err;
+ break;
+
default:
if (nla_put(skb, type, nla_len(a), nla_data(a)))
return -EMSGSIZE;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index acd0744..c0b484b 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -62,9 +62,11 @@ int ovs_nla_get_identifier(struct sw_flow_id *sfid, const struct nlattr *ufid,
const struct sw_flow_key *key, bool log);
u32 ovs_nla_get_ufid_flags(const struct nlattr *attr);
-int ovs_nla_copy_actions(const struct nlattr *attr,
+int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa, bool log);
+int ovs_nla_add_action(struct sw_flow_actions **sfa, int attrtype,
+ void *data, int len, bool log);
int ovs_nla_put_actions(const struct nlattr *attr,
int len, struct sk_buff *skb);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index d14f594..492357f 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -484,6 +484,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
OVS_CB(skb)->input_vport = vport;
OVS_CB(skb)->egress_tun_info = NULL;
+ OVS_CB(skb)->mru = 0;
/* Extract flow from 'skb' into 'key'. */
error = ovs_flow_key_extract(tun_info, skb, &key);
if (unlikely(error)) {
--
2.1.4
Allow matching and setting the conntrack mark field. As with conntrack
state and zone, these are populated when the CT action is executed,
and are made available for matching via RECIRC. To write to this field,
a value and optional mark can be passed as part of the conntrack action.
E.g.: actions:ct(zone=0),ct(zone=1,mark=1)
This will perform conntrack lookup in zone 0, then lookup in zone 1,
then modify the mark for the entry in zone 1. The conntrack entry itself
must be committed using the "commit" flag in the conntrack action flags
for this change to persist.
Signed-off-by: Justin Pettit <[email protected]>
Signed-off-by: Joe Stringer <[email protected]>
---
v1-v3: No change.
v4: Only allow setting conntrack mark via ct action.
Documentation tweaks.
---
include/uapi/linux/openvswitch.h | 5 ++++
net/openvswitch/actions.c | 1 +
net/openvswitch/conntrack.c | 61 ++++++++++++++++++++++++++++++++++++++--
net/openvswitch/conntrack.h | 1 +
net/openvswitch/flow.h | 1 +
net/openvswitch/flow_netlink.c | 15 +++++++++-
6 files changed, 80 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 55f5997..7a185b5 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -325,6 +325,7 @@ enum ovs_key_attr {
* the accepted length of the array. */
OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */
+ OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */
#ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
@@ -613,11 +614,15 @@ struct ovs_action_hash {
* enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
* @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
* @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
+ * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
+ * mask, the corresponding bit in the value is copied to the connection
+ * tracking mark field in the connection.
*/
enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE, /* u16 zone id. */
+ OVS_CT_ATTR_MARK, /* mark to associate with this connection. */
__OVS_CT_ATTR_MAX
};
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 5911a2a..083dcf9 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -968,6 +968,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
+ case OVS_KEY_ATTR_CT_MARK:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 601cd16..bdd1a28 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -28,12 +28,19 @@ struct ovs_ct_len_tbl {
size_t minlen;
};
+/* Metadata mark for masked write to conntrack mark */
+struct md_mark {
+ u32 value;
+ u32 mask;
+};
+
/* Conntrack action context for execution. */
struct ovs_conntrack_info {
struct nf_conn *ct;
u32 flags;
u16 zone;
u16 family;
+ struct md_mark mark;
};
static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -83,10 +90,12 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
return ct_state;
}
-static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, u16 zone)
+static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, u16 zone,
+ const struct nf_conn *ct)
{
key->ct.state = state;
key->ct.zone = zone;
+ key->ct.mark = ct ? ct->mark : 0;
}
/* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has
@@ -111,7 +120,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
zone = NF_CT_DEFAULT_ZONE;
}
- __ovs_ct_update_key(key, state, zone);
+ __ovs_ct_update_key(key, state, zone, ct);
}
void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
@@ -119,6 +128,32 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
ovs_ct_update_key(skb, key, false);
}
+static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+ u32 ct_mark, u32 mask)
+{
+#ifdef CONFIG_NF_CONNTRACK_MARK
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+ u32 new_mark;
+
+ /* The connection could be invalid, in which case set_mark is no-op. */
+ ct = nf_ct_get(skb, &ctinfo);
+ if (!ct)
+ return 0;
+
+ new_mark = ct_mark | (ct->mark & ~(mask));
+ if (ct->mark != new_mark) {
+ ct->mark = new_mark;
+ nf_conntrack_event_cache(IPCT_MARK, ct);
+ key->ct.mark = new_mark;
+ }
+
+ return 0;
+#else
+ return -ENOTSUPP;
+#endif
+}
+
static bool __ovs_ct_state_valid(u8 state)
{
return (state && !(state & OVS_CS_F_INVALID));
@@ -246,7 +281,7 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
u8 state;
state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
- __ovs_ct_update_key(key, state, info->zone);
+ __ovs_ct_update_key(key, state, info->zone, exp->master);
} else {
int err;
@@ -309,7 +344,13 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
err = ovs_ct_commit(net, key, info, skb);
else
err = ovs_ct_lookup(net, key, info, skb);
+ if (err)
+ goto err;
+ if (info->mark.mask)
+ err = ovs_ct_set_mark(skb, key, info->mark.value,
+ info->mark.mask);
+err:
skb_push(skb, nh_ofs);
return err;
}
@@ -319,6 +360,8 @@ static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
.maxlen = sizeof(u32) },
[OVS_CT_ATTR_ZONE] = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
+ [OVS_CT_ATTR_MARK] = { .minlen = sizeof(struct md_mark),
+ .maxlen = sizeof(struct md_mark) },
};
static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
@@ -354,6 +397,14 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
info->zone = nla_get_u16(a);
break;
#endif
+#ifdef CONFIG_NF_CONNTRACK_MARK
+ case OVS_CT_ATTR_MARK: {
+ struct md_mark *mark = nla_data(a);
+
+ info->mark = *mark;
+ break;
+ }
+#endif
default:
OVS_NLERR(log, "Unknown conntrack attr (%d)",
type);
@@ -377,6 +428,10 @@ bool ovs_ct_verify(enum ovs_key_attr attr)
if (attr & OVS_KEY_ATTR_CT_ZONE)
return true;
#endif
+#ifdef CONFIG_NF_CONNTRACK_MARK
+ if (attr & OVS_KEY_ATTR_CT_MARK)
+ return true;
+#endif
return false;
}
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 3d629ac..4cc35b7 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -84,6 +84,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
{
key->ct.state = 0;
key->ct.zone = 0;
+ key->ct.mark = 0;
}
static inline void ovs_ct_free_action(const struct nlattr *a) { }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 312c7d7..e05e697 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -114,6 +114,7 @@ struct sw_flow_key {
struct {
/* Connection tracking fields. */
u16 zone;
+ u32 mark;
u8 state;
} ct;
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ec64463..e54de9b 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -281,7 +281,7 @@ size_t ovs_key_attr_size(void)
/* Whenever adding new OVS_KEY_ FIELDS, we should consider
* updating this function.
*/
- BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 24);
+ BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 25);
return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */
+ nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */
@@ -292,6 +292,7 @@ size_t ovs_key_attr_size(void)
+ nla_total_size(4) /* OVS_KEY_ATTR_RECIRC_ID */
+ nla_total_size(1) /* OVS_KEY_ATTR_CT_STATE */
+ nla_total_size(2) /* OVS_KEY_ATTR_CT_ZONE */
+ + nla_total_size(4) /* OVS_KEY_ATTR_CT_MARK */
+ nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */
+ nla_total_size(2) /* OVS_KEY_ATTR_ETHERTYPE */
+ nla_total_size(4) /* OVS_KEY_ATTR_VLAN */
@@ -343,6 +344,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_MPLS] = { .len = sizeof(struct ovs_key_mpls) },
[OVS_KEY_ATTR_CT_STATE] = { .len = sizeof(u8) },
[OVS_KEY_ATTR_CT_ZONE] = { .len = sizeof(u16) },
+ [OVS_KEY_ATTR_CT_MARK] = { .len = sizeof(u32) },
};
static bool is_all_zero(const u8 *fp, size_t size)
@@ -787,6 +789,13 @@ static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs,
SW_FLOW_KEY_PUT(match, ct.zone, ct_zone, is_mask);
*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_ZONE);
}
+ if (*attrs & (1 << OVS_KEY_ATTR_CT_MARK) &&
+ ovs_ct_verify(OVS_KEY_ATTR_CT_MARK)) {
+ u32 mark = nla_get_u32(a[OVS_KEY_ATTR_CT_MARK]);
+
+ SW_FLOW_KEY_PUT(match, ct.mark, mark, is_mask);
+ *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_MARK);
+ }
return 0;
}
@@ -1340,6 +1349,9 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
if (nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, output->ct.zone))
goto nla_put_failure;
+ if (nla_put_u32(skb, OVS_KEY_ATTR_CT_MARK, output->ct.mark))
+ goto nla_put_failure;
+
nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
if (!nla)
goto nla_put_failure;
@@ -1922,6 +1934,7 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_PRIORITY:
case OVS_KEY_ATTR_SKB_MARK:
+ case OVS_KEY_ATTR_CT_MARK:
case OVS_KEY_ATTR_ETHERNET:
break;
--
2.1.4
The following patches will reuse this code from OVS.
Signed-off-by: Joe Stringer <[email protected]>
---
v2-v4: No change.
---
net/netfilter/nf_conntrack_labels.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index bb53f12..daa7c13 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -48,7 +48,6 @@ int nf_connlabel_set(struct nf_conn *ct, u16 bit)
}
EXPORT_SYMBOL_GPL(nf_connlabel_set);
-#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
static void replace_u32(u32 *address, u32 mask, u32 new)
{
u32 old, tmp;
@@ -89,7 +88,6 @@ int nf_connlabels_replace(struct nf_conn *ct,
return 0;
}
EXPORT_SYMBOL_GPL(nf_connlabels_replace);
-#endif
static struct nf_ct_ext_type labels_extend __read_mostly = {
.len = sizeof(struct nf_conn_labels),
--
2.1.4
Add functions to change connlabel length into nf_conntrack_labels.c so
they may be reused by other modules like OVS and nftables without
needing to jump through xt_match_check() hoops.
Suggested-by: Florian Westphal <[email protected]>
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Florian Westphal <[email protected]>
---
v2: Protect connlabel modification with spinlock.
Fix reference leak in error case.
Style fixups.
v3: No change.
v4: Add ack.
---
include/net/netfilter/nf_conntrack_labels.h | 4 ++++
net/netfilter/nf_conntrack_labels.c | 32 +++++++++++++++++++++++++++++
net/netfilter/xt_connlabel.c | 16 ++++-----------
3 files changed, 40 insertions(+), 12 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h
index dec6336..7e2b1d0 100644
--- a/include/net/netfilter/nf_conntrack_labels.h
+++ b/include/net/netfilter/nf_conntrack_labels.h
@@ -54,7 +54,11 @@ int nf_connlabels_replace(struct nf_conn *ct,
#ifdef CONFIG_NF_CONNTRACK_LABELS
int nf_conntrack_labels_init(void);
void nf_conntrack_labels_fini(void);
+int nf_connlabels_get(struct net *net, unsigned int n_bits);
+void nf_connlabels_put(struct net *net);
#else
static inline int nf_conntrack_labels_init(void) { return 0; }
static inline void nf_conntrack_labels_fini(void) {}
+static inline int nf_connlabels_get(struct net *net, unsigned int n_bits) { return 0; }
+static inline void nf_connlabels_put(struct net *net) {}
#endif
diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index daa7c13..3ce5c31 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -14,6 +14,8 @@
#include <net/netfilter/nf_conntrack_ecache.h>
#include <net/netfilter/nf_conntrack_labels.h>
+static spinlock_t nf_connlabels_lock;
+
static unsigned int label_bits(const struct nf_conn_labels *l)
{
unsigned int longs = l->words;
@@ -89,6 +91,35 @@ int nf_connlabels_replace(struct nf_conn *ct,
}
EXPORT_SYMBOL_GPL(nf_connlabels_replace);
+int nf_connlabels_get(struct net *net, unsigned int n_bits)
+{
+ size_t words;
+
+ if (n_bits > (NF_CT_LABELS_MAX_SIZE * BITS_PER_BYTE))
+ return -ERANGE;
+
+ words = BITS_TO_LONGS(n_bits);
+
+ spin_lock(&nf_connlabels_lock);
+ net->ct.labels_used++;
+ if (words > net->ct.label_words)
+ net->ct.label_words = words;
+ spin_unlock(&nf_connlabels_lock);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nf_connlabels_get);
+
+void nf_connlabels_put(struct net *net)
+{
+ spin_lock(&nf_connlabels_lock);
+ net->ct.labels_used--;
+ if (net->ct.labels_used == 0)
+ net->ct.label_words = 0;
+ spin_unlock(&nf_connlabels_lock);
+}
+EXPORT_SYMBOL_GPL(nf_connlabels_put);
+
static struct nf_ct_ext_type labels_extend __read_mostly = {
.len = sizeof(struct nf_conn_labels),
.align = __alignof__(struct nf_conn_labels),
@@ -97,6 +128,7 @@ static struct nf_ct_ext_type labels_extend __read_mostly = {
int nf_conntrack_labels_init(void)
{
+ spin_lock_init(&nf_connlabels_lock);
return nf_ct_extend_register(&labels_extend);
}
diff --git a/net/netfilter/xt_connlabel.c b/net/netfilter/xt_connlabel.c
index 9f8719d..bb9cbeb 100644
--- a/net/netfilter/xt_connlabel.c
+++ b/net/netfilter/xt_connlabel.c
@@ -42,10 +42,6 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par)
XT_CONNLABEL_OP_SET;
struct xt_connlabel_mtinfo *info = par->matchinfo;
int ret;
- size_t words;
-
- if (info->bit > XT_CONNLABEL_MAXBIT)
- return -ERANGE;
if (info->options & ~options) {
pr_err("Unknown options in mask %x\n", info->options);
@@ -59,19 +55,15 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par)
return ret;
}
- par->net->ct.labels_used++;
- words = BITS_TO_LONGS(info->bit+1);
- if (words > par->net->ct.label_words)
- par->net->ct.label_words = words;
-
+ ret = nf_connlabels_get(par->net, info->bit + 1);
+ if (ret < 0)
+ nf_ct_l3proto_module_put(par->family);
return ret;
}
static void connlabel_mt_destroy(const struct xt_mtdtor_param *par)
{
- par->net->ct.labels_used--;
- if (par->net->ct.labels_used == 0)
- par->net->ct.label_words = 0;
+ nf_connlabels_put(par->net);
nf_ct_l3proto_module_put(par->family);
}
--
2.1.4
Allow matching and setting the conntrack label field. As with ct_mark,
this is populated by executing the CT action, and is a writable field.
Specifying a label and optional mask allows the label to be modified,
which takes effect on the entry found by the lookup of the CT action.
E.g.: actions:ct(zone=1,label=1)
This will perform conntrack lookup in zone 1, then modify the label for
that entry. The conntrack entry itself must be committed using the
"commit" flag in the conntrack action flags for this change to persist.
Signed-off-by: Joe Stringer <[email protected]>
---
v2: Split out setting the connlabel size for the current namespace.
v3: No change.
v4: Only allow setting label via ct action.
Update documentation.
---
include/uapi/linux/openvswitch.h | 10 ++++
net/openvswitch/actions.c | 1 +
net/openvswitch/conntrack.c | 100 ++++++++++++++++++++++++++++++++++++++-
net/openvswitch/conntrack.h | 1 +
net/openvswitch/flow.h | 1 +
net/openvswitch/flow_netlink.c | 18 ++++++-
6 files changed, 129 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 7a185b5..9d52058 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -326,6 +326,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */
+ OVS_KEY_ATTR_CT_LABEL, /* 16-octet connection tracking label */
#ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
@@ -438,6 +439,11 @@ struct ovs_key_nd {
__u8 nd_tll[ETH_ALEN];
};
+#define OVS_CT_LABEL_LEN 16
+struct ovs_key_ct_label {
+ __u8 ct_label[OVS_CT_LABEL_LEN];
+};
+
/* OVS_KEY_ATTR_CT_STATE flags */
#define OVS_CS_F_NEW 0x01 /* Beginning of a new connection. */
#define OVS_CS_F_ESTABLISHED 0x02 /* Part of an existing connection. */
@@ -617,12 +623,16 @@ struct ovs_action_hash {
* @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
* mask, the corresponding bit in the value is copied to the connection
* tracking mark field in the connection.
+ * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN
+ * mask. For each bit set in the mask, the corresponding bit in the value is
+ * copied to the connection tracking label field in the connection.
*/
enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE, /* u16 zone id. */
OVS_CT_ATTR_MARK, /* mark to associate with this connection. */
+ OVS_CT_ATTR_LABEL, /* label to associate with this connection. */
__OVS_CT_ATTR_MAX
};
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 083dcf9..862a3d2 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -969,6 +969,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
+ case OVS_KEY_ATTR_CT_LABEL:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index bdd1a28..caa9a46 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -15,6 +15,7 @@
#include <linux/openvswitch.h>
#include <net/ip.h>
#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_labels.h>
#include <net/netfilter/nf_conntrack_zones.h>
#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
@@ -34,6 +35,12 @@ struct md_mark {
u32 mask;
};
+/* Metadata label for masked write to conntrack label. */
+struct md_label {
+ struct ovs_key_ct_label value;
+ struct ovs_key_ct_label mask;
+};
+
/* Conntrack action context for execution. */
struct ovs_conntrack_info {
struct nf_conn *ct;
@@ -41,6 +48,7 @@ struct ovs_conntrack_info {
u16 zone;
u16 family;
struct md_mark mark;
+ struct md_label label;
};
static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -90,12 +98,31 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
return ct_state;
}
+static void ovs_ct_get_label(const struct nf_conn *ct,
+ struct ovs_key_ct_label *label)
+{
+ struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
+
+ if (cl) {
+ size_t len = cl->words * sizeof(long);
+
+ if (len > OVS_CT_LABEL_LEN)
+ len = OVS_CT_LABEL_LEN;
+ else if (len < OVS_CT_LABEL_LEN)
+ memset(label, 0, OVS_CT_LABEL_LEN);
+ memcpy(label, cl->bits, len);
+ } else {
+ memset(label, 0, OVS_CT_LABEL_LEN);
+ }
+}
+
static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, u16 zone,
const struct nf_conn *ct)
{
key->ct.state = state;
key->ct.zone = zone;
key->ct.mark = ct ? ct->mark : 0;
+ ovs_ct_get_label(ct, &key->ct.label);
}
/* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has
@@ -154,6 +181,41 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
#endif
}
+static int ovs_ct_set_label(struct sk_buff *skb, struct sw_flow_key *key,
+ const struct ovs_key_ct_label *label,
+ const struct ovs_key_ct_label *mask)
+{
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn_labels *cl;
+ struct nf_conn *ct;
+ int err;
+
+ /* The connection could be invalid, in which case set_label is no-op.*/
+ ct = nf_ct_get(skb, &ctinfo);
+ if (!ct)
+ return 0;
+
+ cl = nf_ct_labels_find(ct);
+ if (!cl) {
+ nf_ct_labels_ext_add(ct);
+ cl = nf_ct_labels_find(ct);
+ }
+ if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
+ return -ENOSPC;
+
+ err = nf_connlabels_replace(ct, (u32 *)label, (u32 *)mask,
+ OVS_CT_LABEL_LEN / sizeof(u32));
+ if (err)
+ return err;
+
+ ovs_ct_get_label(ct, &key->ct.label);
+ return 0;
+#else
+ return -ENOTSUPP;
+#endif
+}
+
static bool __ovs_ct_state_valid(u8 state)
{
return (state && !(state & OVS_CS_F_INVALID));
@@ -323,6 +385,17 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
return 0;
}
+static bool label_nonzero(const struct ovs_key_ct_label *label)
+{
+ size_t i;
+
+ for (i = 0; i < sizeof(*label); i++)
+ if (label->ct_label[i])
+ return true;
+
+ return false;
+}
+
int ovs_ct_execute(struct net *net, struct sk_buff *skb,
struct sw_flow_key *key,
const struct ovs_conntrack_info *info)
@@ -347,9 +420,15 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
if (err)
goto err;
- if (info->mark.mask)
+ if (info->mark.mask) {
err = ovs_ct_set_mark(skb, key, info->mark.value,
info->mark.mask);
+ if (err)
+ goto err;
+ }
+ if (label_nonzero(&info->label.mask))
+ err = ovs_ct_set_label(skb, key, &info->label.value,
+ &info->label.mask);
err:
skb_push(skb, nh_ofs);
return err;
@@ -362,6 +441,8 @@ static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK] = { .minlen = sizeof(struct md_mark),
.maxlen = sizeof(struct md_mark) },
+ [OVS_CT_ATTR_LABEL] = { .minlen = sizeof(struct md_label),
+ .maxlen = sizeof(struct md_label) },
};
static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
@@ -405,6 +486,14 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
break;
}
#endif
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+ case OVS_CT_ATTR_LABEL: {
+ struct md_label *label = nla_data(a);
+
+ info->label = *label;
+ break;
+ }
+#endif
default:
OVS_NLERR(log, "Unknown conntrack attr (%d)",
type);
@@ -432,6 +521,10 @@ bool ovs_ct_verify(enum ovs_key_attr attr)
if (attr & OVS_KEY_ATTR_CT_MARK)
return true;
#endif
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+ if (attr & OVS_KEY_ATTR_CT_LABEL)
+ return true;
+#endif
return false;
}
@@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
{
+ unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
+
data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
+ if (nf_connlabels_get(net, n_bits);
+ OVS_NLERR(true, "Failed to set connlabel length");
}
void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
@@ -518,4 +615,5 @@ void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
nf_ct_l3proto_module_put(PF_INET);
if (data->xt_v6)
nf_ct_l3proto_module_put(PF_INET6);
+ nf_connlabels_put(net);
}
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 4cc35b7..21e1da1 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -85,6 +85,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
key->ct.state = 0;
key->ct.zone = 0;
key->ct.mark = 0;
+ memset(&key->ct.label, 0, sizeof(key->ct.label));
}
static inline void ovs_ct_free_action(const struct nlattr *a) { }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index e05e697..c57994b 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -116,6 +116,7 @@ struct sw_flow_key {
u16 zone;
u32 mark;
u8 state;
+ struct ovs_key_ct_label label;
} ct;
} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index e54de9b..0a8e626 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -281,7 +281,7 @@ size_t ovs_key_attr_size(void)
/* Whenever adding new OVS_KEY_ FIELDS, we should consider
* updating this function.
*/
- BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 25);
+ BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 26);
return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */
+ nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */
@@ -293,6 +293,7 @@ size_t ovs_key_attr_size(void)
+ nla_total_size(1) /* OVS_KEY_ATTR_CT_STATE */
+ nla_total_size(2) /* OVS_KEY_ATTR_CT_ZONE */
+ nla_total_size(4) /* OVS_KEY_ATTR_CT_MARK */
+ + nla_total_size(16) /* OVS_KEY_ATTR_CT_LABEL */
+ nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */
+ nla_total_size(2) /* OVS_KEY_ATTR_ETHERTYPE */
+ nla_total_size(4) /* OVS_KEY_ATTR_VLAN */
@@ -345,6 +346,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_CT_STATE] = { .len = sizeof(u8) },
[OVS_KEY_ATTR_CT_ZONE] = { .len = sizeof(u16) },
[OVS_KEY_ATTR_CT_MARK] = { .len = sizeof(u32) },
+ [OVS_KEY_ATTR_CT_LABEL] = { .len = sizeof(struct ovs_key_ct_label) },
};
static bool is_all_zero(const u8 *fp, size_t size)
@@ -796,6 +798,15 @@ static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs,
SW_FLOW_KEY_PUT(match, ct.mark, mark, is_mask);
*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_MARK);
}
+ if (*attrs & (1 << OVS_KEY_ATTR_CT_LABEL) &&
+ ovs_ct_verify(OVS_KEY_ATTR_CT_LABEL)) {
+ const struct ovs_key_ct_label *cl;
+
+ cl = nla_data(a[OVS_KEY_ATTR_CT_LABEL]);
+ SW_FLOW_KEY_MEMCPY(match, ct.label, cl->ct_label,
+ sizeof(*cl), is_mask);
+ *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABEL);
+ }
return 0;
}
@@ -1352,6 +1363,10 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
if (nla_put_u32(skb, OVS_KEY_ATTR_CT_MARK, output->ct.mark))
goto nla_put_failure;
+ if (nla_put(skb, OVS_KEY_ATTR_CT_LABEL,
+ sizeof(output->ct.label), &output->ct.label))
+ goto nla_put_failure;
+
nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
if (!nla)
goto nla_put_failure;
@@ -1935,6 +1950,7 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_PRIORITY:
case OVS_KEY_ATTR_SKB_MARK:
case OVS_KEY_ATTR_CT_MARK:
+ case OVS_KEY_ATTR_CT_LABEL:
case OVS_KEY_ATTR_ETHERNET:
break;
--
2.1.4
Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the ct action specifies a helper
to be used for this connection.
Example ODP flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(),recirc(1)
recirc_id=1,in_port=2,tcp,ct_state=+trk-new+est,action=1
recirc_id=1,in_port=2,tcp,ct_state=+trk+rel,action=1
Signed-off-by: Joe Stringer <[email protected]>
---
v2-v3: No change.
v4: Change error code for unknown helper ENOENT->EINVAL.
---
include/uapi/linux/openvswitch.h | 3 ++
net/openvswitch/conntrack.c | 109 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 110 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 9d52058..32e07d8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -626,6 +626,7 @@ struct ovs_action_hash {
* @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN
* mask. For each bit set in the mask, the corresponding bit in the value is
* copied to the connection tracking label field in the connection.
+ * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
*/
enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -633,6 +634,8 @@ enum ovs_ct_attr {
OVS_CT_ATTR_ZONE, /* u16 zone id. */
OVS_CT_ATTR_MARK, /* mark to associate with this connection. */
OVS_CT_ATTR_LABEL, /* label to associate with this connection. */
+ OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
+ related connections. */
__OVS_CT_ATTR_MAX
};
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index caa9a46..fac8b66 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -15,6 +15,7 @@
#include <linux/openvswitch.h>
#include <net/ip.h>
#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_helper.h>
#include <net/netfilter/nf_conntrack_labels.h>
#include <net/netfilter/nf_conntrack_zones.h>
#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
@@ -43,6 +44,7 @@ struct md_label {
/* Conntrack action context for execution. */
struct ovs_conntrack_info {
+ struct nf_conntrack_helper *helper;
struct nf_conn *ct;
u32 flags;
u16 zone;
@@ -226,6 +228,51 @@ bool ovs_ct_state_valid(const struct sw_flow_key *key)
return __ovs_ct_state_valid(key->ct.state);
}
+/* 'skb' should already be pulled to nh_ofs. */
+static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
+{
+ const struct nf_conntrack_helper *helper;
+ const struct nf_conn_help *help;
+ enum ip_conntrack_info ctinfo;
+ unsigned int protoff;
+ struct nf_conn *ct;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+ return NF_ACCEPT;
+
+ help = nfct_help(ct);
+ if (!help)
+ return NF_ACCEPT;
+
+ helper = rcu_dereference(help->helper);
+ if (!helper)
+ return NF_ACCEPT;
+
+ switch (proto) {
+ case NFPROTO_IPV4:
+ protoff = ip_hdrlen(skb);
+ break;
+ case NFPROTO_IPV6: {
+ u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+ __be16 frag_off;
+
+ protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
+ &nexthdr, &frag_off);
+ if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
+ pr_debug("proto header not found\n");
+ return NF_ACCEPT;
+ }
+ break;
+ }
+ default:
+ WARN_ONCE(1, "helper invoked on non-IP family!");
+ return NF_DROP;
+ }
+
+ return helper->help(skb, protoff, ct, ctinfo);
+}
+
static int handle_fragments(struct net *net, struct sw_flow_key *key,
u16 zone, struct sk_buff *skb)
{
@@ -298,6 +345,13 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb,
return false;
if (info->zone != nf_ct_zone(ct))
return false;
+ if (info->helper) {
+ struct nf_conn_help *help;
+
+ help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+ if (help && help->helper != info->helper)
+ return false;
+ }
return true;
}
@@ -326,6 +380,11 @@ static int __ovs_ct_lookup(struct net *net, const struct sw_flow_key *key,
if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING,
skb) != NF_ACCEPT)
return -ENOENT;
+
+ if (ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
+ WARN_ONCE(1, "helper rejected packet");
+ return -EINVAL;
+ }
}
return 0;
@@ -434,6 +493,30 @@ err:
return err;
}
+static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
+ const struct sw_flow_key *key, bool log)
+{
+ struct nf_conntrack_helper *helper;
+ struct nf_conn_help *help;
+
+ helper = nf_conntrack_helper_try_module_get(name, info->family,
+ key->ip.proto);
+ if (!helper) {
+ OVS_NLERR(log, "Unknown helper \"%s\"", name);
+ return -EINVAL;
+ }
+
+ help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL);
+ if (!help) {
+ module_put(helper->me);
+ return -ENOMEM;
+ }
+
+ help->helper = helper;
+ info->helper = helper;
+ return 0;
+}
+
static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_FLAGS] = { .minlen = sizeof(u32),
.maxlen = sizeof(u32) },
@@ -443,10 +526,12 @@ static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
.maxlen = sizeof(struct md_mark) },
[OVS_CT_ATTR_LABEL] = { .minlen = sizeof(struct md_label),
.maxlen = sizeof(struct md_label) },
+ [OVS_CT_ATTR_HELPER] = { .minlen = 1,
+ .maxlen = NF_CT_HELPER_NAME_LEN }
};
static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
- bool log)
+ const char **helper, bool log)
{
struct nlattr *a;
int rem;
@@ -494,6 +579,13 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
break;
}
#endif
+ case OVS_CT_ATTR_HELPER:
+ *helper = nla_data(a);
+ if (!memchr(*helper, '\0', nla_len(a))) {
+ OVS_NLERR(log, "Invalid conntrack helper");
+ return -EINVAL;
+ }
+ break;
default:
OVS_NLERR(log, "Unknown conntrack attr (%d)",
type);
@@ -534,6 +626,7 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
struct sw_flow_actions **sfa, bool log)
{
struct ovs_conntrack_info ct_info;
+ const char *helper = NULL;
u16 family;
int err;
@@ -546,7 +639,7 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
memset(&ct_info, 0, sizeof(ct_info));
ct_info.family = family;
- err = parse_ct(attr, &ct_info, log);
+ err = parse_ct(attr, &ct_info, &helper, log);
if (err)
return err;
@@ -556,6 +649,11 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
OVS_NLERR(log, "Failed to allocate conntrack template");
return -ENOMEM;
}
+ if (helper) {
+ err = ovs_ct_add_helper(&ct_info, helper, key, log);
+ if (err)
+ goto err_free_ct;
+ }
err = ovs_nla_add_action(sfa, OVS_ACTION_ATTR_CT, &ct_info,
sizeof(ct_info), log);
@@ -585,6 +683,11 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info,
if (nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone))
return -EMSGSIZE;
#endif
+ if (ct_info->helper) {
+ if (nla_put_string(skb, OVS_CT_ATTR_HELPER,
+ ct_info->helper->name))
+ return -EMSGSIZE;
+ }
nla_nest_end(skb, start);
@@ -595,6 +698,8 @@ void ovs_ct_free_action(const struct nlattr *a)
{
struct ovs_conntrack_info *ct_info = nla_data(a);
+ if (ct_info->helper)
+ module_put(ct_info->helper->me);
if (ct_info->ct)
nf_ct_put(ct_info->ct);
}
--
2.1.4
On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
> Expose the kernel connection tracker via OVS. Userspace components can
> make use of the "ct()" action, followed by "recirculate", to populate
> the conntracking state in the OVS flow key, and subsequently match on
> that state.
>
> Example ODP flows allowing traffic from 1->2, only replies from 2->1:
> in_port=1,tcp,action=ct(commit,zone=1),2
> in_port=2,ct_state=-trk,tcp,action=ct(zone=1),recirc(1)
> recirc_id=1,in_port=2,ct_state=+trk+est-new,tcp,action=1
>
> IP fragments are handled by transparently assembling them as part of the
> ct action. The maximum received unit (MRU) size is tracked so that
> refragmentation can occur during output.
>
> IP frag handling contributed by Andy Zhou.
>
> Signed-off-by: Joe Stringer <[email protected]>
> Signed-off-by: Justin Pettit <[email protected]>
> Signed-off-by: Andy Zhou <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
> Allow matching and setting the conntrack mark field. As with conntrack
> state and zone, these are populated when the CT action is executed,
> and are made available for matching via RECIRC. To write to this field,
> a value and optional mark can be passed as part of the conntrack action.
>
> E.g.: actions:ct(zone=0),ct(zone=1,mark=1)
>
> This will perform conntrack lookup in zone 0, then lookup in zone 1,
> then modify the mark for the entry in zone 1. The conntrack entry itself
> must be committed using the "commit" flag in the conntrack action flags
> for this change to persist.
>
> Signed-off-by: Justin Pettit <[email protected]>
> Signed-off-by: Joe Stringer <[email protected]>
> ---
> v1-v3: No change.
> v4: Only allow setting conntrack mark via ct action.
> Documentation tweaks.
> ---
> include/uapi/linux/openvswitch.h | 5 ++++
> net/openvswitch/actions.c | 1 +
> net/openvswitch/conntrack.c | 61 ++++++++++++++++++++++++++++++++++++++--
> net/openvswitch/conntrack.h | 1 +
> net/openvswitch/flow.h | 1 +
> net/openvswitch/flow_netlink.c | 15 +++++++++-
> 6 files changed, 80 insertions(+), 4 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 55f5997..7a185b5 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -325,6 +325,7 @@ enum ovs_key_attr {
> * the accepted length of the array. */
> OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */
> OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */
> + OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */
>
> #ifdef __KERNEL__
> OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
> @@ -613,11 +614,15 @@ struct ovs_action_hash {
> * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
> * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
> * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
> + * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
> + * mask, the corresponding bit in the value is copied to the connection
> + * tracking mark field in the connection.
> */
> enum ovs_ct_attr {
> OVS_CT_ATTR_UNSPEC,
> OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */
> OVS_CT_ATTR_ZONE, /* u16 zone id. */
> + OVS_CT_ATTR_MARK, /* mark to associate with this connection. */
> __OVS_CT_ATTR_MAX
> };
>
ovs_ct_action_to_attr() is not updated to return complete datapath
action back to userpsace. same issue exist in set label patch.
On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
> The following patches will reuse this code from OVS.
>
> Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
> Add functions to change connlabel length into nf_conntrack_labels.c so
> they may be reused by other modules like OVS and nftables without
> needing to jump through xt_match_check() hoops.
>
> Suggested-by: Florian Westphal <[email protected]>
> Signed-off-by: Joe Stringer <[email protected]>
> Acked-by: Florian Westphal <[email protected]>
> ---
> v2: Protect connlabel modification with spinlock.
> Fix reference leak in error case.
> Style fixups.
> v3: No change.
> v4: Add ack.
> ---
Looks good to me.
On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
> Allow matching and setting the conntrack label field. As with ct_mark,
> this is populated by executing the CT action, and is a writable field.
> Specifying a label and optional mask allows the label to be modified,
> which takes effect on the entry found by the lookup of the CT action.
>
> E.g.: actions:ct(zone=1,label=1)
>
> This will perform conntrack lookup in zone 1, then modify the label for
> that entry. The conntrack entry itself must be committed using the
> "commit" flag in the conntrack action flags for this change to persist.
>
> Signed-off-by: Joe Stringer <[email protected]>
I got compilation error after applying this patch:
net/openvswitch/conntrack.c: In function ‘ovs_ct_init’:
net/openvswitch/conntrack.c:713: error: expected ‘)’ before ‘;’ token
net/openvswitch/conntrack.c:715: error: expected expression before ‘}’ token
> ---
> v2: Split out setting the connlabel size for the current namespace.
> v3: No change.
> v4: Only allow setting label via ct action.
> Update documentation.
> ---
...
> type);
> @@ -432,6 +521,10 @@ bool ovs_ct_verify(enum ovs_key_attr attr)
> if (attr & OVS_KEY_ATTR_CT_MARK)
> return true;
> #endif
> +#ifdef CONFIG_NF_CONNTRACK_LABELS
> + if (attr & OVS_KEY_ATTR_CT_LABEL)
> + return true;
> +#endif
>
OVS_KEY_ATTR_CT_LABEL is not bit field so bitwise AND operation does
not work here. This applies to all check done in this function.
> return false;
> }
> @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
>
> void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
> {
> + unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
> +
> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
> + if (nf_connlabels_get(net, n_bits);
> + OVS_NLERR(true, "Failed to set connlabel length");
> }
>
In case of error should we reject conntrack label actions? Otherwise
user will never see any error. But action could drop packets.
> void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
> @@ -518,4 +615,5 @@ void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
> nf_ct_l3proto_module_put(PF_INET);
> if (data->xt_v6)
> nf_ct_l3proto_module_put(PF_INET6);
> + nf_connlabels_put(net);
> }
On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
> Add support for using conntrack helpers to assist protocol detection.
> The new OVS_CT_ATTR_HELPER attribute of the ct action specifies a helper
> to be used for this connection.
>
> Example ODP flows allowing FTP connections from ports 1->2:
> in_port=1,tcp,action=ct(helper=ftp,commit),2
> in_port=2,tcp,ct_state=-trk,action=ct(),recirc(1)
> recirc_id=1,in_port=2,tcp,ct_state=+trk-new+est,action=1
> recirc_id=1,in_port=2,tcp,ct_state=+trk+rel,action=1
>
> Signed-off-by: Joe Stringer <[email protected]>
> ---
> v2-v3: No change.
> v4: Change error code for unknown helper ENOENT->EINVAL.
I got following compilation warning :
net/openvswitch/conntrack.c:352:42: error: incompatible types in
comparison expression (different address spaces)
> ---
> include/uapi/linux/openvswitch.h | 3 ++
> net/openvswitch/conntrack.c | 109 ++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 110 insertions(+), 2 deletions(-)
>
...
> +static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
> + const struct sw_flow_key *key, bool log)
> +{
> + struct nf_conntrack_helper *helper;
> + struct nf_conn_help *help;
> +
> + helper = nf_conntrack_helper_try_module_get(name, info->family,
> + key->ip.proto);
> + if (!helper) {
> + OVS_NLERR(log, "Unknown helper \"%s\"", name);
> + return -EINVAL;
> + }
> +
> + help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL);
> + if (!help) {
> + module_put(helper->me);
> + return -ENOMEM;
> + }
> +
> + help->helper = helper;
helper is rcu pointer so need to use rcu API to set the value. I know
it is not required here, but it is still cleaner to use the API.
> + info->helper = helper;
> + return 0;
> +}
> +
Thanks for the review,
On 19 August 2015 at 14:24, Pravin Shelar <[email protected]> wrote:
> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>> Allow matching and setting the conntrack label field. As with ct_mark,
>> this is populated by executing the CT action, and is a writable field.
>> Specifying a label and optional mask allows the label to be modified,
>> which takes effect on the entry found by the lookup of the CT action.
>>
>> E.g.: actions:ct(zone=1,label=1)
>>
>> This will perform conntrack lookup in zone 1, then modify the label for
>> that entry. The conntrack entry itself must be committed using the
>> "commit" flag in the conntrack action flags for this change to persist.
>>
>> Signed-off-by: Joe Stringer <[email protected]>
> I got compilation error after applying this patch:
> net/openvswitch/conntrack.c: In function ‘ovs_ct_init’:
> net/openvswitch/conntrack.c:713: error: expected ‘)’ before ‘;’ token
> net/openvswitch/conntrack.c:715: error: expected expression before ‘}’ token
Sorry about that, missed it in my final refactoring round. I'll fix this.
>> type);
>> @@ -432,6 +521,10 @@ bool ovs_ct_verify(enum ovs_key_attr attr)
>> if (attr & OVS_KEY_ATTR_CT_MARK)
>> return true;
>> #endif
>> +#ifdef CONFIG_NF_CONNTRACK_LABELS
>> + if (attr & OVS_KEY_ATTR_CT_LABEL)
>> + return true;
>> +#endif
>>
> OVS_KEY_ATTR_CT_LABEL is not bit field so bitwise AND operation does
> not work here. This applies to all check done in this function.
Should be BIT(...), I'll fix this.
>> return false;
>> }
>> @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
>>
>> void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
>> {
>> + unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
>> +
>> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
>> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
>> + if (nf_connlabels_get(net, n_bits);
>> + OVS_NLERR(true, "Failed to set connlabel length");
>> }
>>
> In case of error should we reject conntrack label actions? Otherwise
> user will never see any error. But action could drop packets.
I suspect that currently errors would be seen from ovs_ct_set_label():
>.......if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
>.......>.......return -ENOSPC;
So, for cmd_execute, userspace would see this. For regular handling,
pipeline processing would stop (so, drop).
However, I agree it would be more friendly to have the attribute
rejected up-front. Just means we'll pass the datapath all the way
down:
ovs_nla_get_match()
--> ovs_key_from_nlattrs()
--> metadata_from_nlattrs()
--> ovs_ct_verify()
And rather than simply reporting the error in ovs_ct_init() there,
we'll store the success condition something like:
@@ -721,8 +721,12 @@ void ovs_ct_init(struct net *net, struct
ovs_ct_perdp_data *data)
data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
- if (nf_connlabels_get(net, n_bits));
+ if (nf_connlabels_get(net, n_bits)) {
+ data->xt_label = false;
OVS_NLERR(true, "Failed to set connlabel length");
+ } else {
+ data->xt_label = true;
+ }
}
void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
ovs_ct_exit() also needs to be updated to ensure that if this fails,
we don't try to put the connlabel use back.
On Wed, Aug 19, 2015 at 4:04 PM, Joe Stringer <[email protected]> wrote:
> Thanks for the review,
>
> On 19 August 2015 at 14:24, Pravin Shelar <[email protected]> wrote:
>> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>>> Allow matching and setting the conntrack label field. As with ct_mark,
>>> this is populated by executing the CT action, and is a writable field.
>>> Specifying a label and optional mask allows the label to be modified,
>>> which takes effect on the entry found by the lookup of the CT action.
>>>
>>> E.g.: actions:ct(zone=1,label=1)
>>>
>>> This will perform conntrack lookup in zone 1, then modify the label for
>>> that entry. The conntrack entry itself must be committed using the
>>> "commit" flag in the conntrack action flags for this change to persist.
>>>
>
>>> return false;
>>> }
>>> @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
>>>
>>> void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
>>> {
>>> + unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
>>> +
>>> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
>>> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
>>> + if (nf_connlabels_get(net, n_bits);
>>> + OVS_NLERR(true, "Failed to set connlabel length");
>>> }
>>>
>> In case of error should we reject conntrack label actions? Otherwise
>> user will never see any error. But action could drop packets.
>
> I suspect that currently errors would be seen from ovs_ct_set_label():
>
>>.......if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
>>.......>.......return -ENOSPC;
>
> So, for cmd_execute, userspace would see this. For regular handling,
> pipeline processing would stop (so, drop).
>
> However, I agree it would be more friendly to have the attribute
> rejected up-front. Just means we'll pass the datapath all the way
> down:
> ovs_nla_get_match()
> --> ovs_key_from_nlattrs()
> --> metadata_from_nlattrs()
> --> ovs_ct_verify()
>
> And rather than simply reporting the error in ovs_ct_init() there,
> we'll store the success condition something like:
>
> @@ -721,8 +721,12 @@ void ovs_ct_init(struct net *net, struct
> ovs_ct_perdp_data *data)
>
> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
> - if (nf_connlabels_get(net, n_bits));
> + if (nf_connlabels_get(net, n_bits)) {
> + data->xt_label = false;
> OVS_NLERR(true, "Failed to set connlabel length");
> + } else {
> + data->xt_label = true;
> + }
> }
>
> void ovs_ct_exit(struct net *net, struct ovs_ct_perdp_data *data)
>
> ovs_ct_exit() also needs to be updated to ensure that if this fails,
> we don't try to put the connlabel use back.
Looks good.
On 08/18/15 at 04:39pm, Joe Stringer wrote:
> Expose the kernel connection tracker via OVS. Userspace components can
> make use of the "ct()" action, followed by "recirculate", to populate
> the conntracking state in the OVS flow key, and subsequently match on
> that state.
>
> Example ODP flows allowing traffic from 1->2, only replies from 2->1:
> in_port=1,tcp,action=ct(commit,zone=1),2
> in_port=2,ct_state=-trk,tcp,action=ct(zone=1),recirc(1)
> recirc_id=1,in_port=2,ct_state=+trk+est-new,tcp,action=1
>
> IP fragments are handled by transparently assembling them as part of the
> ct action. The maximum received unit (MRU) size is tracked so that
> refragmentation can occur during output.
>
> IP frag handling contributed by Andy Zhou.
>
> Signed-off-by: Joe Stringer <[email protected]>
> Signed-off-by: Justin Pettit <[email protected]>
> Signed-off-by: Andy Zhou <[email protected]>
OVS bits look great.
Acked-by: Thomas Graf <[email protected]>
On 08/18/15 at 04:39pm, Joe Stringer wrote:
> The following patches will reuse this code from OVS.
>
> Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
On 08/18/15 at 04:39pm, Joe Stringer wrote:
> Add functions to change connlabel length into nf_conntrack_labels.c so
> they may be reused by other modules like OVS and nftables without
> needing to jump through xt_match_check() hoops.
>
> Suggested-by: Florian Westphal <[email protected]>
> Signed-off-by: Joe Stringer <[email protected]>
> Acked-by: Florian Westphal <[email protected]>
Acked-by: Thomas Graf <[email protected]>
On 20 August 2015 at 08:45, Pravin Shelar <[email protected]> wrote:
> On Wed, Aug 19, 2015 at 4:04 PM, Joe Stringer <[email protected]> wrote:
>> Thanks for the review,
>>
>> On 19 August 2015 at 14:24, Pravin Shelar <[email protected]> wrote:
>>> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>>>> Allow matching and setting the conntrack label field. As with ct_mark,
>>>> this is populated by executing the CT action, and is a writable field.
>>>> Specifying a label and optional mask allows the label to be modified,
>>>> which takes effect on the entry found by the lookup of the CT action.
>>>>
>>>> E.g.: actions:ct(zone=1,label=1)
>>>>
>>>> This will perform conntrack lookup in zone 1, then modify the label for
>>>> that entry. The conntrack entry itself must be committed using the
>>>> "commit" flag in the conntrack action flags for this change to persist.
>>>>
>
>>
>>>> return false;
>>>> }
>>>> @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
>>>>
>>>> void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
>>>> {
>>>> + unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
>>>> +
>>>> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
>>>> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
>>>> + if (nf_connlabels_get(net, n_bits);
>>>> + OVS_NLERR(true, "Failed to set connlabel length");
>>>> }
>>>>
>>> In case of error should we reject conntrack label actions? Otherwise
>>> user will never see any error. But action could drop packets.
>>
>> I suspect that currently errors would be seen from ovs_ct_set_label():
>>
>>>.......if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
>>>.......>.......return -ENOSPC;
>>
>> So, for cmd_execute, userspace would see this. For regular handling,
>> pipeline processing would stop (so, drop).
>>
>> However, I agree it would be more friendly to have the attribute
>> rejected up-front. Just means we'll pass the datapath all the way
>> down:
>> ovs_nla_get_match()
>> --> ovs_key_from_nlattrs()
>> --> metadata_from_nlattrs()
>> --> ovs_ct_verify()
Incidentally, we generally don't have the datapath by this point
(ovs_nla_get_match()). There'd need to be a bit of rearranging in the
ovs_flow_cmd_* functions, which would include holding the locks for
longer. Given that the two most common cases are that either A) The
kernel is configured with connlabel support, and built with support
for at least 128 bits of label, or B) the kernel is configured without
connlabel, and this is handled already in ovs_ct_verify(), I don't
think it's worth making this particular change.
On Thu, Aug 20, 2015 at 12:13 PM, Joe Stringer <[email protected]> wrote:
> On 20 August 2015 at 08:45, Pravin Shelar <[email protected]> wrote:
>> On Wed, Aug 19, 2015 at 4:04 PM, Joe Stringer <[email protected]> wrote:
>>> Thanks for the review,
>>>
>>> On 19 August 2015 at 14:24, Pravin Shelar <[email protected]> wrote:
>>>> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>>>>> Allow matching and setting the conntrack label field. As with ct_mark,
>>>>> this is populated by executing the CT action, and is a writable field.
>>>>> Specifying a label and optional mask allows the label to be modified,
>>>>> which takes effect on the entry found by the lookup of the CT action.
>>>>>
>>>>> E.g.: actions:ct(zone=1,label=1)
>>>>>
>>>>> This will perform conntrack lookup in zone 1, then modify the label for
>>>>> that entry. The conntrack entry itself must be committed using the
>>>>> "commit" flag in the conntrack action flags for this change to persist.
>>>>>
>>
>>>
>>>>> return false;
>>>>> }
>>>>> @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
>>>>>
>>>>> void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
>>>>> {
>>>>> + unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
>>>>> +
>>>>> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
>>>>> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
>>>>> + if (nf_connlabels_get(net, n_bits);
>>>>> + OVS_NLERR(true, "Failed to set connlabel length");
>>>>> }
>>>>>
>>>> In case of error should we reject conntrack label actions? Otherwise
>>>> user will never see any error. But action could drop packets.
>>>
>>> I suspect that currently errors would be seen from ovs_ct_set_label():
>>>
>>>>.......if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
>>>>.......>.......return -ENOSPC;
>>>
>>> So, for cmd_execute, userspace would see this. For regular handling,
>>> pipeline processing would stop (so, drop).
>>>
>>> However, I agree it would be more friendly to have the attribute
>>> rejected up-front. Just means we'll pass the datapath all the way
>>> down:
>>> ovs_nla_get_match()
>>> --> ovs_key_from_nlattrs()
>>> --> metadata_from_nlattrs()
>>> --> ovs_ct_verify()
>
> Incidentally, we generally don't have the datapath by this point
> (ovs_nla_get_match()). There'd need to be a bit of rearranging in the
> ovs_flow_cmd_* functions, which would include holding the locks for
> longer. Given that the two most common cases are that either A) The
> kernel is configured with connlabel support, and built with support
> for at least 128 bits of label, or B) the kernel is configured without
> connlabel, and this is handled already in ovs_ct_verify(), I don't
> think it's worth making this particular change.
Actually I do not see need for this to be per datapath property.
infact there is no need to have struct ovs_ct_perdp_data.
ovs_ct_init can be called from ovs-module init.
nf-connlabel bit length is per net-namespace property. So
nf_connlabels_get() should be called from ovs namespace init. This
way you can move xt_label flag to ovs_net. ovs_net can be accessed
from ovs_flow_cmd_* functions.
On 20 August 2015 at 14:01, Pravin Shelar <[email protected]> wrote:
> On Thu, Aug 20, 2015 at 12:13 PM, Joe Stringer <[email protected]> wrote:
>> On 20 August 2015 at 08:45, Pravin Shelar <[email protected]> wrote:
>>> On Wed, Aug 19, 2015 at 4:04 PM, Joe Stringer <[email protected]> wrote:
>>>> Thanks for the review,
>>>>
>>>> On 19 August 2015 at 14:24, Pravin Shelar <[email protected]> wrote:
>>>>> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>>>>>> Allow matching and setting the conntrack label field. As with ct_mark,
>>>>>> this is populated by executing the CT action, and is a writable field.
>>>>>> Specifying a label and optional mask allows the label to be modified,
>>>>>> which takes effect on the entry found by the lookup of the CT action.
>>>>>>
>>>>>> E.g.: actions:ct(zone=1,label=1)
>>>>>>
>>>>>> This will perform conntrack lookup in zone 1, then modify the label for
>>>>>> that entry. The conntrack entry itself must be committed using the
>>>>>> "commit" flag in the conntrack action flags for this change to persist.
>>>>>>
>>>
>>>>
>>>>>> return false;
>>>>>> }
>>>>>> @@ -508,8 +601,12 @@ void ovs_ct_free_action(const struct nlattr *a)
>>>>>>
>>>>>> void ovs_ct_init(struct net *net, struct ovs_ct_perdp_data *data)
>>>>>> {
>>>>>> + unsigned int n_bits = sizeof(struct ovs_key_ct_label) * BITS_PER_BYTE;
>>>>>> +
>>>>>> data->xt_v4 = !nf_ct_l3proto_try_module_get(PF_INET);
>>>>>> data->xt_v6 = !nf_ct_l3proto_try_module_get(PF_INET6);
>>>>>> + if (nf_connlabels_get(net, n_bits);
>>>>>> + OVS_NLERR(true, "Failed to set connlabel length");
>>>>>> }
>>>>>>
>>>>> In case of error should we reject conntrack label actions? Otherwise
>>>>> user will never see any error. But action could drop packets.
>>>>
>>>> I suspect that currently errors would be seen from ovs_ct_set_label():
>>>>
>>>>>.......if (!cl || cl->words * sizeof(long) < OVS_CT_LABEL_LEN)
>>>>>.......>.......return -ENOSPC;
>>>>
>>>> So, for cmd_execute, userspace would see this. For regular handling,
>>>> pipeline processing would stop (so, drop).
>>>>
>>>> However, I agree it would be more friendly to have the attribute
>>>> rejected up-front. Just means we'll pass the datapath all the way
>>>> down:
>>>> ovs_nla_get_match()
>>>> --> ovs_key_from_nlattrs()
>>>> --> metadata_from_nlattrs()
>>>> --> ovs_ct_verify()
>>
>> Incidentally, we generally don't have the datapath by this point
>> (ovs_nla_get_match()). There'd need to be a bit of rearranging in the
>> ovs_flow_cmd_* functions, which would include holding the locks for
>> longer. Given that the two most common cases are that either A) The
>> kernel is configured with connlabel support, and built with support
>> for at least 128 bits of label, or B) the kernel is configured without
>> connlabel, and this is handled already in ovs_ct_verify(), I don't
>> think it's worth making this particular change.
>
> Actually I do not see need for this to be per datapath property.
> infact there is no need to have struct ovs_ct_perdp_data.
> ovs_ct_init can be called from ovs-module init.
> nf-connlabel bit length is per net-namespace property. So
> nf_connlabels_get() should be called from ovs namespace init. This
> way you can move xt_label flag to ovs_net. ovs_net can be accessed
> from ovs_flow_cmd_* functions.
That sounds like a tidier approach, I'll roll it into the next version.
On 19 August 2015 at 13:47, Pravin Shelar <[email protected]> wrote:
> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>> Allow matching and setting the conntrack mark field. As with conntrack
>> state and zone, these are populated when the CT action is executed,
>> and are made available for matching via RECIRC. To write to this field,
>> a value and optional mark can be passed as part of the conntrack action.
>>
>> E.g.: actions:ct(zone=0),ct(zone=1,mark=1)
>>
>> This will perform conntrack lookup in zone 0, then lookup in zone 1,
>> then modify the mark for the entry in zone 1. The conntrack entry itself
>> must be committed using the "commit" flag in the conntrack action flags
>> for this change to persist.
>>
>> Signed-off-by: Justin Pettit <[email protected]>
>> Signed-off-by: Joe Stringer <[email protected]>
>> ---
>> v1-v3: No change.
>> v4: Only allow setting conntrack mark via ct action.
>> Documentation tweaks.
>> ---
>> include/uapi/linux/openvswitch.h | 5 ++++
>> net/openvswitch/actions.c | 1 +
>> net/openvswitch/conntrack.c | 61 ++++++++++++++++++++++++++++++++++++++--
>> net/openvswitch/conntrack.h | 1 +
>> net/openvswitch/flow.h | 1 +
>> net/openvswitch/flow_netlink.c | 15 +++++++++-
>> 6 files changed, 80 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
>> index 55f5997..7a185b5 100644
>> --- a/include/uapi/linux/openvswitch.h
>> +++ b/include/uapi/linux/openvswitch.h
>> @@ -325,6 +325,7 @@ enum ovs_key_attr {
>> * the accepted length of the array. */
>> OVS_KEY_ATTR_CT_STATE, /* u8 bitmask of OVS_CS_F_* */
>> OVS_KEY_ATTR_CT_ZONE, /* u16 connection tracking zone. */
>> + OVS_KEY_ATTR_CT_MARK, /* u32 connection tracking mark */
>>
>> #ifdef __KERNEL__
>> OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
>> @@ -613,11 +614,15 @@ struct ovs_action_hash {
>> * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
>> * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
>> * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
>> + * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
>> + * mask, the corresponding bit in the value is copied to the connection
>> + * tracking mark field in the connection.
>> */
>> enum ovs_ct_attr {
>> OVS_CT_ATTR_UNSPEC,
>> OVS_CT_ATTR_FLAGS, /* u8 bitmask of OVS_CT_F_*. */
>> OVS_CT_ATTR_ZONE, /* u16 zone id. */
>> + OVS_CT_ATTR_MARK, /* mark to associate with this connection. */
>> __OVS_CT_ATTR_MAX
>> };
>>
> ovs_ct_action_to_attr() is not updated to return complete datapath
> action back to userpsace. same issue exist in set label patch.
Fixed, thanks for the review.
On 19 August 2015 at 15:57, Pravin Shelar <[email protected]> wrote:
> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>> Add support for using conntrack helpers to assist protocol detection.
>> The new OVS_CT_ATTR_HELPER attribute of the ct action specifies a helper
>> to be used for this connection.
>>
>> Example ODP flows allowing FTP connections from ports 1->2:
>> in_port=1,tcp,action=ct(helper=ftp,commit),2
>> in_port=2,tcp,ct_state=-trk,action=ct(),recirc(1)
>> recirc_id=1,in_port=2,tcp,ct_state=+trk-new+est,action=1
>> recirc_id=1,in_port=2,tcp,ct_state=+trk+rel,action=1
>>
>> Signed-off-by: Joe Stringer <[email protected]>
>> ---
>> v2-v3: No change.
>> v4: Change error code for unknown helper ENOENT->EINVAL.
>
> I got following compilation warning :
>
> net/openvswitch/conntrack.c:352:42: error: incompatible types in
> comparison expression (different address spaces)
Is this made available via another sparse flag? It looks like it's
related to the __rcu as you've mentioned below, but I'm not seeing
this (latest sparse, gcc-4.9.2)
>> +static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
>> + const struct sw_flow_key *key, bool log)
>> +{
>> + struct nf_conntrack_helper *helper;
>> + struct nf_conn_help *help;
>> +
>> + helper = nf_conntrack_helper_try_module_get(name, info->family,
>> + key->ip.proto);
>> + if (!helper) {
>> + OVS_NLERR(log, "Unknown helper \"%s\"", name);
>> + return -EINVAL;
>> + }
>> +
>> + help = nf_ct_helper_ext_add(info->ct, helper, GFP_KERNEL);
>> + if (!help) {
>> + module_put(helper->me);
>> + return -ENOMEM;
>> + }
>> +
>> + help->helper = helper;
> helper is rcu pointer so need to use rcu API to set the value. I know
> it is not required here, but it is still cleaner to use the API.
Will update, thanks.
On Thu, Aug 20, 2015 at 5:47 PM, Joe Stringer <[email protected]> wrote:
> On 19 August 2015 at 15:57, Pravin Shelar <[email protected]> wrote:
>> On Tue, Aug 18, 2015 at 4:39 PM, Joe Stringer <[email protected]> wrote:
>>> Add support for using conntrack helpers to assist protocol detection.
>>> The new OVS_CT_ATTR_HELPER attribute of the ct action specifies a helper
>>> to be used for this connection.
>>>
>>> Example ODP flows allowing FTP connections from ports 1->2:
>>> in_port=1,tcp,action=ct(helper=ftp,commit),2
>>> in_port=2,tcp,ct_state=-trk,action=ct(),recirc(1)
>>> recirc_id=1,in_port=2,tcp,ct_state=+trk-new+est,action=1
>>> recirc_id=1,in_port=2,tcp,ct_state=+trk+rel,action=1
>>>
>>> Signed-off-by: Joe Stringer <[email protected]>
>>> ---
>>> v2-v3: No change.
>>> v4: Change error code for unknown helper ENOENT->EINVAL.
>>
>> I got following compilation warning :
>>
>> net/openvswitch/conntrack.c:352:42: error: incompatible types in
>> comparison expression (different address spaces)
>
> Is this made available via another sparse flag? It looks like it's
> related to the __rcu as you've mentioned below, but I'm not seeing
> this (latest sparse, gcc-4.9.2)
>
You need to enable RCU space checker in kernel config.