We see this on l3.large and t2.micro instances running Ubuntu 14.04. As far as I know,
we have no special settings beyond the system defaults.
We send a large (2000-byte) datagram from a remote host, e.g. with this trivial Python app:
import socket
UDP_IP = "123.123.123.123" # remote host IP
UDP_PORT = 33333
MESSAGE = """
.....0010......0020......0030......0040......0050......0060......0070......0080......0090......0100
.....0110......0120......0130......0140......0150......0160......0170......0180......0190......0200
.....0210......0220......0230......0240......0250......0260......0270......0280......0290......0300
.....0310......0320......0330......0340......0350......0360......0370......0380......0390......0400
.....0410......0420......0430......0440......0450......0460......0470......0480......0490......0500
.....0510......0520......0530......0540......0550......0560......0570......0580......0590......0600
.....0610......0620......0630......0640......0650......0660......0670......0680......0690......0700
.....0710......0720......0730......0740......0750......0760......0770......0780......0790......0800
.....0810......0820......0830......0840......0850......0860......0870......0880......0890......0900
.....0910......0920......0930......0940......0950......0960......0970......0980......0990......1000
.....1010......1020......1030......1040......1050......1060......1070......1080......1090......1100
.....1110......1120......1130......1140......1150......1160......1170......1180......1190......1200
.....1210......1220......1230......1240......1250......1260......1270......1280......1290......1300
.....1310......1320......1330......1340......1350......1360......1370......1380......1390......1400
.....1410......1420......1430......1440......1450......1460......1470......1480......1490......1500
.....1510......1520......1530......1540......1550......1560......1570......1580......1590......1600
.....1610......1620......1630......1640......1650......1660......1670......1680......1690......1700
.....1710......1720......1730......1740......1750......1760......1770......1780......1790......1800
.....1810......1820......1830......1840......1850......1860......1870......1880......1890......1900
.....1910......1920......1930......1940......1950......1960......1970......1980......1990......2000"""
print("UDP target IP:", UDP_IP)
print("UDP target port:", UDP_PORT)
print("message:", MESSAGE)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(MESSAGE.encode("ascii"), (UDP_IP, UDP_PORT))  # Python 3 sockets send bytes
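The message is exactly 2000 bytes (each marker counts bytes, including the newline before each line), so the UDP datagram is 8 + 2000 = 2008 bytes, which matches the "ulen 2008" in the dmesg line quoted later. Assuming a standard 1500-byte MTU on the path (the actual path MTU is not stated here) and a 20-byte IPv4 header without options, that does not fit in one packet, so the sending kernel splits it into IP fragments. A minimal sketch of that arithmetic:

```python
IP_HEADER = 20   # assumption: IPv4 header without options
UDP_HEADER = 8

def ipv4_fragments(payload_len, mtu=1500):
    """Return (offset_in_8_byte_units, ip_payload_bytes, more_fragments) per fragment."""
    total = UDP_HEADER + payload_len          # the whole UDP datagram is the IP payload
    max_chunk = (mtu - IP_HEADER) // 8 * 8    # non-last fragments must be a multiple of 8 bytes
    frags = []
    offset = 0
    while offset < total:
        chunk = min(max_chunk, total - offset)
        more = offset + chunk < total         # MF flag is set on all but the last fragment
        frags.append((offset // 8, chunk, more))
        offset += chunk
    return frags

print(ipv4_fragments(2000))  # [(0, 1480, True), (185, 528, False)]
```

Only the first fragment carries the UDP header (and thus the port number); the second starts at offset 185 (1480 bytes) and contains the remaining 528 bytes.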
Then we listen on that port with this trivial Python app:
import socket
UDP_IP = "172.31.4.102" # local ip
UDP_PORT = 33333
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))
while True:
    data, addr = sock.recvfrom(0x10000)  # 64 KiB buffer, enough for any IPv4 UDP datagram
    print("received message:", data.decode("ascii", "replace"))
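As a sanity check that the two scripts themselves are fine, both ends can be run in one process over loopback, which mirrors the localhost-only case mentioned below. This is a sketch, not part of the original test: the port is picked by the kernel and the payload is hypothetical filler of the same 2000-byte size. Note that the loopback MTU is normally well above 2008 bytes, so this path does not exercise IP fragmentation at all.

```python
import socket

def loopback_roundtrip(payload):
    """Send one datagram to 127.0.0.1 and return what the receiving socket gets."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))   # port 0: let the kernel choose a free port
        rx.settimeout(2.0)          # raise socket.timeout instead of blocking forever
        tx.sendto(payload, rx.getsockname())
        data, _addr = rx.recvfrom(0x10000)
        return data
    finally:
        tx.close()
        rx.close()

message = b"x" * 2000  # hypothetical filler, same size as MESSAGE
print(len(loopback_roundtrip(message)))  # 2000
```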
Meanwhile, we monitor the traffic with ngrep, for example:
ngrep -t -e -d any -W byline -O large_udp.pcap port 33333 or '(ip[6:2]' '&' '0x1fff)' '!=' '0'
(the part after "or" catches the non-first fragments of fragmented packets, which carry no UDP header and so are not matched by "port 33333")
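The filter expression reads the 16-bit field at byte offset 6 of the IPv4 header; its top 3 bits are flags (including MF, "more fragments") and its low 13 bits are the fragment offset in 8-byte units. A non-zero offset marks a non-first fragment. The same test in Python, as an illustrative sketch on raw header bytes (the helper names and the synthetic headers are hypothetical):

```python
import struct

def flags_and_offset(ip_header):
    """Return the 16-bit field at bytes 6-7 of an IPv4 header (network byte order)."""
    (value,) = struct.unpack("!H", ip_header[6:8])
    return value

def fragment_offset(ip_header):
    """Equivalent of the BPF expression ip[6:2] & 0x1fff: offset in 8-byte units."""
    return flags_and_offset(ip_header) & 0x1FFF  # low 13 bits; top 3 bits are flags

def more_fragments(ip_header):
    """True if the MF (more fragments) flag, bit 0x2000, is set."""
    return bool(flags_and_offset(ip_header) & 0x2000)

def is_non_first_fragment(ip_header):
    """What the part after "or" in the ngrep filter matches."""
    return fragment_offset(ip_header) != 0

# Hypothetical 20-byte headers with only bytes 6-7 filled in.
first = bytes(6) + struct.pack("!H", 0x2000) + bytes(12)   # MF set, offset 0
second = bytes(6) + struct.pack("!H", 185) + bytes(12)     # MF clear, offset 185
print(is_non_first_fragment(first), is_non_first_fragment(second))  # False True
```

For a 2008-byte datagram over a 1500-byte MTU, the second fragment would have offset 185 (1480 bytes), which is exactly the kind of packet this clause is there to capture.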
About the host:
# uname -a
Linux hostname 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
The linux-image-3.13.0-36-generic and linux-image-3.13.0-45-generic kernels behave the same way.
What we see:
- ngrep shows the packets with the correct contents; all fragments are delivered.
- the application doesn't receive any data at all.
Occasionally dmesg shows messages like this:
[ 102.161679] UDP: bad checksum. From 123.123.123.124:56439 to 172.31.4.102:33333 ulen 2008
but it is logged very rarely, so this is surely not what happens on
every transmission.
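One way to check whether checksum failures happen more often than dmesg suggests is to compare the kernel's UDP counters before and after sending a single datagram. As far as I know, kernel messages like the one above are rate-limited, so their rarity may understate how often the error actually occurs. On kernels of this era the "Udp:" lines of /proc/net/snmp should include InCsumErrors alongside InErrors and RcvbufErrors. A small parsing sketch (the function names are hypothetical):

```python
def udp_counters(snmp_text):
    """Parse the two "Udp:" lines of /proc/net/snmp into {counter_name: value}."""
    # startswith("Udp:") skips the separate "UdpLite:" lines.
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Udp:")]
    names, values = rows[0][1:], rows[1][1:]   # drop the "Udp:" label on each row
    return {name: int(value) for name, value in zip(names, values)}

def read_udp_counters(path="/proc/net/snmp"):
    """Take a snapshot of the live counters (Linux only)."""
    with open(path) as f:
        return udp_counters(f.read())

def counter_delta(before, after):
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Usage: take a snapshot with read_udp_counters(), send one datagram from the remote host, take another snapshot, and look at counter_delta(before, after). If InCsumErrors and InErrors rise by one while InDatagrams does not, the datagram is being dropped on checksum verification every time, not just when dmesg happens to log it.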
The same test works fine on, e.g., the cheapest DigitalOcean VPS.
I am concerned about this issue because rtpengine has a UDP
interface. On Amazon hosts this interface works only within
localhost, so I cannot distribute the software across different nodes.
Any thoughts on what's wrong and how to fix it?
--
Andrey Utkin