On Thu, Oct 29, 2015 at 09:51:16AM -0400, Dan Streetman wrote:
> Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
> templates; their dst_entries counters will never be used. Move the
> xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
> xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
> and dst_entries_destroy for each net namespace.
>
> The ipv4 and ipv6 xfrms each create a dst_ops template, and perform
> dst_entries_init on the templates. The template values are copied to each
> net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops
> pcpuc_entries field is a percpu counter and cannot be used correctly by
> simply copying it to another object.
>
> The result of this is a very subtle bug; changes to the dst entries
> counter from one net namespace may sometimes get applied to a different
> net namespace dst entries counter. This is because of how the percpu
> counter works; it has a main count field as well as a pointer to the
> percpu variables. Each net namespace maintains its own main count
> variable, but all point to one set of percpu variables. When any net
> namespace pushes one of the percpu variables outside its small batch
> range, the accumulated percpu value is moved to that net namespace's
> main count. So with multiple net namespaces operating concurrently, the
> dst_ops entries counter can stray from the actual value that it should
> be; if counts are consistently moved from one net namespace to another
> (which my testing showed is likely), then one net namespace winds up
> with a negative dst_ops count while another winds up with a continually
> increasing count, eventually reaching its gc_thresh limit, which causes
> all new traffic on the net namespace to fail with -ENOBUFS.
>
> Signed-off-by: Dan Streetman <[email protected]>
Applied to the ipsec tree, thanks Dan!
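
For readers following the thread, the drift described in the quoted commit message can be modelled in userspace. The sketch below is not kernel code: `struct pcpu_counter`, `counter_add()` and `simulate_shared_percpu()` are hypothetical names, a single CPU is assumed, and the batch size of 32 is picked for illustration. It only mimics the behaviour the commit message describes, where a main count is private to each copy while the percpu storage is shared because the struct was copied by value.

```c
#include <stdint.h>

#define BATCH 32 /* assumed batch size, for illustration only */

/* Simplified stand-in for a percpu counter (hypothetical, one CPU). */
struct pcpu_counter {
	int64_t count;   /* main count: private to each copy of the struct */
	int32_t *percpu; /* percpu delta: shared when the struct is copied */
};

/* Add to the percpu delta; fold it into the caller's main count
 * once it reaches the batch limit in either direction. */
static void counter_add(struct pcpu_counter *c, int32_t amount)
{
	int32_t v = *c->percpu + amount;

	if (v >= BATCH || v <= -BATCH) {
		c->count += v;
		*c->percpu = 0;
	} else {
		*c->percpu = v;
	}
}

/* Two "namespaces" copy one template and so share its percpu delta. */
static void simulate_shared_percpu(int64_t *ns_a_count, int64_t *ns_b_count)
{
	static int32_t shared_delta;
	struct pcpu_counter tmpl = { .count = 0, .percpu = &shared_delta };
	struct pcpu_counter ns_a, ns_b;
	int i;

	shared_delta = 0;
	ns_a = tmpl; /* struct copy: the percpu pointer is duplicated */
	ns_b = tmpl;

	for (i = 0; i < 31; i++)
		counter_add(&ns_a, 1);  /* A creates 31 entries */
	counter_add(&ns_b, 1);          /* B creates 1 and absorbs all 32 */
	counter_add(&ns_b, -1);         /* B frees its single entry */
	for (i = 0; i < 31; i++)
		counter_add(&ns_a, -1); /* A frees 31 and absorbs -32 */

	*ns_a_count = ns_a.count; /* -32, although A holds no entries */
	*ns_b_count = ns_b.count; /*  32, although B holds no entries */
}
```

Both namespaces end with zero live entries, yet in this model one main count is negative and the other is stuck at 32, which matches the "one goes negative, the other keeps growing" pattern reported above. Giving each namespace its own percpu storage, as the patch does by calling dst_entries_init per net namespace, removes the sharing.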