2014-02-13 19:45:39

by Rick Jones

Subject: Occasional lockups creating/manipulating namespaces in 3.13.0 and 3.14.0-rc2

This started as a (misdirected) netdev thread in which I reported a
non-trivial slowdown in the serial creation/manipulation of lots of
network namespaces in 3.13.0 compared to 3.12.9 and 3.5.0-44+ (the
latter a Canonical kernel with some further added commits). I've since
found that 3.13.0 appears to scale better with parallel streams of such
things (*), so I'm not as concerned about the serial slowdown. In the
meantime, though, I've also encountered some "hangs", three or four
times now but not consistently, when I have 16 streams of these
namespace creations/manipulations going on on a two-socket E5-2670
system.

I've seen the hang on both 3.13.0 and 3.14.0-rc2 now (kernel from a
linux-stable tree, using the config of the 3.5.0-44 kernel), and from
the 3.14.0-rc2 kernel I've gotten a crash dump via /proc/sysrq-trigger.
I lack the knowledge to debug the crash dump but can put it up on
netperf.org.

I also got the attached dmesg output from the 3.13.0 kernel (before
someone told me about sysrq-trigger).
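
For anyone trying to capture similar information, here is a minimal
sketch of driving the kernel's sysrq interface from a shell. The helper
name "sysrq" and its optional second argument are hypothetical and only
there for illustration; the real interface is simply a write of a single
key character to /proc/sysrq-trigger as root.

```shell
# Hypothetical helper (name and optional path argument are illustrative).
# Writes one sysrq key character to the trigger file; needs root when the
# target is the real /proc/sysrq-trigger.
sysrq() {
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    printf '%s\n' "$key" > "$trigger"
}
```

Run as root, "sysrq w" asks the kernel to log tasks in uninterruptible
(blocked) state, "sysrq l" logs backtraces of the active CPUs, and
"sysrq c" deliberately crashes the system, which yields a crash dump
only if kdump is already configured.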

These lockups seem to affect only the 16 streams creating/manipulating
the namespaces (all adds, no deletes); the rest of the system (as far as
I've tried it) keeps running.

I've seen the hang both with this "light" script:
# Assumed to be called as: add_fake_router <sudo>
SUDO=$1
# Unique namespace name for each invocation
j=`uuidgen`
$SUDO ip netns add bar-${j}
$SUDO ip netns exec bar-${j} ip link set lo up
$SUDO ip netns exec bar-${j} sysctl -w net.ipv4.ip_forward=1 > /dev/null
# First 11 bytes of the UUID, so the interface names fit in 15 characters
k=`echo $j | cut -b -11`
# Two veth pairs, each with one end moved into the new namespace
$SUDO /home/rjones2/iproute2_tot/ip/ip link add ro-${k} type veth peer \
    name ri-${k} netns bar-${j}
$SUDO /home/rjones2/iproute2_tot/ip/ip link add go-${k} type veth peer \
    name gi-${k} netns bar-${j}

as well as a "heavier" one which does many more ip netns exec commands
against each namespace.
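
The driver used to launch the parallel streams is not shown in this
post. As a rough sketch only, something along these lines would run
several streams side by side; the function name run_streams and the
counts in the usage note are assumptions, not the actual test harness.

```shell
# Hypothetical driver: start STREAMS background loops, each running the
# given command COUNT times in sequence, then wait for all of them.
# Usage: run_streams STREAMS COUNT command [args...]
run_streams() {
    streams=$1
    count=$2
    shift 2
    s=0
    while [ "$s" -lt "$streams" ]; do
        (
            i=0
            while [ "$i" -lt "$count" ]; do
                # Stop this stream if the command fails
                "$@" || exit 1
                i=$((i + 1))
            done
        ) &
        s=$((s + 1))
    done
    # Block until every background stream has finished
    wait
}
```

For example, "run_streams 16 250 ./add_fake_router sudo" would start 16
streams that each add 250 namespaces; both numbers are illustrative.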

rick jones

(*) Well, four streams of "heavy" on 3.13.0 still start out slower than
on 3.5.0-44+, but its "namespaces vs time" curve crosses above the
other's by around 2000 namespaces. 8 streams on 3.5.0-44 is the same as
4. 8 streams on 3.13.0 is faster than 4 all the way out to 4000
namespaces (the limit of my testing). 16 streams on 3.13.0 starts out
faster than 8 on 3.13.0, but its curve crosses under that of 8 on 3.13.0
at about 1300 or 1400 namespaces. But I digress, and can provide that
data/spreadsheet if another thread is warranted.

Start of the netdev thread -
http://marc.info/?l=linux-netdev&m=139154384317278&w=2


Attachments:
16_stream_lockup_dmesg.txt (159.53 kB)