2004-06-22 14:58:11

by Martin Zwickel

[permalink] [raw]
Subject: 2.6.7-rc2-mm2 udp multicast problem (sendto hangs)

Hi there!

I don't know where I should mail this, so maybe it fits to the LKML.

I have a problem with one of my programs which streams data out over udp as a
multicast.

the program creates a thread. in the thread it creates a socket and sets the
multicast stuff:

[code]
if((sendfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0) {
perror("socket(sendfd)");
return -3;
}

ret = 1472; // max send buffer for udp (disable fragmentation?)
if(setsockopt(sendfd, SOL_SOCKET, SO_SNDBUF, (char *)&ret, sizeof (ret))
!= 0) { perror("setsockopt(SO_SNDBUF)");
}

{
struct ip_mreq imr; //XXX zero out
imr.imr_multiaddr.s_addr = inet_addr(td->sendip);
imr.imr_interface.s_addr = 0;
if (setsockopt(sendfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &imr, sizeof(struct
ip_mreq)) < 0)
perror("IP_ADD_MEMBERSHIP");
}
[/code]

then it reads data from a file and with sendto sends it out:
[code]
ret = read(readfd, sendbuf, sizeof(sendbuf));

ret = sendto(sendfd, sendbuf, ret, 0, (struct sockaddr*)&td->udp,
sizeof(td->udp)); //send the stream out
if(ret < 0) {
perror("sendto(sendfd)");
} else {
sent_bytes += ret;
}
[/code]

on the other side is a client(I started it on the same machine) which reads data
from the connected socket and writes the data to a file, which is a pipe.

but sometimes the sendto hangs for a few seconds and longer. the strace output
of the thread:

# strace -p 23620
Process 23620 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
sendto(8, "\0\1\0\31z\30\243\206(\341\216)#\212I#\206H\341\226)%\212"..., 1316,
0, {sa_family=AF_INET, sin_port=htons(22333), sin_addr=inet_addr("224.0.0.1")},
16 <unfinished ...>
Process 23620 detached



and the kernel task output (sysrq + t):

streamserver S D6564C80 0 23995 23994 (NOTLB)
d1fcfbc8 00200082 00000000 d6564c80 c1574800 c0320214 00000001 dff9e200
00b3af60 df8d1c00 c3010010 000002a6 2764d1d8 00000c50 cefb09f8 d1fce000
7fffffff d1fcfc28 7fffffff c0378e02 dffef680 00000000 00000010 c04e3d50
Call Trace:
[<c0320214>] ip_local_deliver+0xdf/0x20b
[<c0378e02>] schedule_timeout+0xb1/0xb3
[<c031108e>] netif_receive_skb+0x167/0x194
[<c030c01f>] sock_wait_for_wmem+0xad/0xc5
[<c0115a27>] autoremove_wake_function+0x0/0x43
[<c0115a27>] autoremove_wake_function+0x0/0x43
[<c030c0bb>] sock_alloc_send_pskb+0x84/0x1cb
[<c030c21b>] sock_alloc_send_skb+0x19/0x21
[<c0324605>] ip_append_data+0x63d/0x6e5
[<c01870be>] do_get_write_access+0x246/0x5bc
[<c0323f23>] ip_generic_getfrag+0x0/0xa5
[<c033ff01>] udp_sendmsg+0x2d9/0x70a
[<c0181c8d>] __ext3_journal_stop+0x24/0x4a
[<c03476d2>] inet_sendmsg+0x4a/0x62
[<c03097cf>] sock_sendmsg+0x86/0xb2
[<c012dcb1>] __generic_file_aio_read+0x1cc/0x1fe
[<c012da1b>] file_read_actor+0x0/0xca
[<c025683a>] opost_block+0xc8/0x16e
[<c0227663>] copy_from_user+0x34/0x61
[<c030aa0b>] sys_sendto+0xc7/0xe2
[<c0157601>] __pollwait+0x0/0xc0
[<c0157c89>] sys_select+0x220/0x493
[<c030b1d0>] sys_socketcall+0x17e/0x249
[<c0103d63>] syscall_call+0x7/0xb



if I run "arp" while it hangs, the sendto continues...
and I also think that it continues if the kernel receives a network packet.

can this be a locking bug?

Regards,
Martin


ps.: I'm sorry if this is the wrong ml!

--
MyExcuse:
monitor VLF leakage

Martin Zwickel <[email protected]>
Research & Development

TechnoTrend AG <http://www.technotrend.de>


2004-06-23 09:56:24

by Martin Zwickel

[permalink] [raw]
Subject: Re: 2.6.7-rc2-mm2 udp multicast problem (sendto hangs)

if I use MSG_DONTWAIT with sendto, I get temporarily unavailable resources
(many!):

sendto(sendfd): Resource temporarily unavailable

but isn't udp supposed to not block?

Martin

--
MyExcuse:
astropneumatic oscillations in the water-cooling

Martin Zwickel <[email protected]>
Research & Development

TechnoTrend AG <http://www.technotrend.de>

2004-06-23 10:35:20

by Denis Vlasenko

[permalink] [raw]
Subject: Re: 2.6.7-rc2-mm2 udp multicast problem (sendto hangs)

On Wednesday 23 June 2004 12:56, Martin Zwickel wrote:
> if I use MSG_DONTWAIT with sendto, I get temporarily unavailable resources
> (many!):
>
> sendto(sendfd): Resource temporarily unavailable
>
> but isn't udp supposed to not block?

Think about what will happen if you will try to spew
udp packets continuously:

while(1)
sendto(...);

They will pile up in queue and eventually it will fill up.
Then kernel may either drop excess packets silently
or return you EAGAIN.
--
vda

2004-06-23 12:00:27

by Martin Zwickel

[permalink] [raw]
Subject: Re: 2.6.7-rc2-mm2 udp multicast problem (sendto hangs)

On Wed, 23 Jun 2004 13:34:57 +0300
Denis Vlasenko <[email protected]> bubbled:

> On Wednesday 23 June 2004 12:56, Martin Zwickel wrote:
> > if I use MSG_DONTWAIT with sendto, I get temporarily unavailable resources
> > (many!):
> >
> > sendto(sendfd): Resource temporarily unavailable
> >
> > but isn't udp supposed to not block?
>
> Think about what will happen if you will try to spew
> udp packets continuously:
>
> while(1)
> sendto(...);
>
> They will pile up in queue and eventually it will fill up.
> Then kernel may either drop excess packets silently
> or return you EAGAIN.

Yes, but why does the kernel not send out the queue?(I don't know if the queue
is empty or full when my sendto stops)
Without MSG_DONTWAIT, sendto waits endlessly. But on what?
Normally the kernel should put the queued packets on the line and accept new
ones, or did I misunderstand this?

My program sends out many udp packets, and sometimes it just stops until the
kernel receives a network packet or I access the local network(with arp
command).

So if I run arp in an endless loop(while :; do arp; done), sendto runs smooth.

For me it smells like a bug ;)

Martin

--
MyExcuse:
I'd love to help you -- it's just that the Boss won't let me near the computer.

Martin Zwickel <[email protected]>
Research & Development

TechnoTrend AG <http://www.technotrend.de>

2004-06-23 12:37:04

by Denis Vlasenko

[permalink] [raw]
Subject: Re: 2.6.7-rc2-mm2 udp multicast problem (sendto hangs)

On Wednesday 23 June 2004 15:00, Martin Zwickel wrote:
> On Wed, 23 Jun 2004 13:34:57 +0300
>
> Denis Vlasenko <[email protected]> bubbled:
> > On Wednesday 23 June 2004 12:56, Martin Zwickel wrote:
> > > if I use MSG_DONTWAIT with sendto, I get temporarily unavailable
> > > resources (many!):
> > >
> > > sendto(sendfd): Resource temporarily unavailable
> > >
> > > but isn't udp supposed to not block?
> >
> > Think about what will happen if you will try to spew
> > udp packets continuously:
> >
> > while(1)
> > sendto(...);
> >
> > They will pile up in queue and eventually it will fill up.
> > Then kernel may either drop excess packets silently
> > or return you EAGAIN.
>
> Yes, but why does the kernel not send out the queue?(I don't know if the
> queue is empty or full when my sendto stops)
> Without MSG_DONTWAIT, sendto waits endlessly. But on what?

strace, gdb and/or (SysRq-T with ksymoops) will tell you.

> Normally the kernel should put the queued packets on the line and accept
> new ones, or did I misunderstand this?

Hm, yes. What does tcpdump tell you?

> My program sends out many udp packets, and sometimes it just stops until
> the kernel receives a network packet or I access the local network(with arp
> command).

arp does not access network. I think it just prints current arp cache.

> So if I run arp in an endless loop(while :; do arp; done), sendto runs
> smooth.
>
> For me it smells like a bug ;)

Possible. We need more details. Also CC network folks :)
--
vda

2004-06-23 13:42:57

by Martin Zwickel

[permalink] [raw]
Subject: Re: 2.6.7-rc2-mm2 udp multicast problem (sendto hangs)

On Wed, 23 Jun 2004 15:36:10 +0300
Denis Vlasenko <[email protected]> bubbled:

> > Yes, but why does the kernel not send out the queue?(I don't know if the
> > queue is empty or full when my sendto stops)
> > Without MSG_DONTWAIT, sendto waits endlessly. But on what?
>
> strace, gdb and/or (SysRq-T with ksymoops) will tell you.

As I wrote in my first mail:

# strace -p 23620
Process 23620 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
sendto(8, "\0\1\0\31z\30\243\206(\341\216)#\212I#\206H\341\226)%\212"..., 1316,
0, {sa_family=AF_INET, sin_port=htons(22333), sin_addr=inet_addr("224.0.0.1")},
16 <unfinished ...>
Process 23620 detached


and the kernel task output (sysrq + t):

streamserver S D6564C80 0 23995 23994 (NOTLB)
d1fcfbc8 00200082 00000000 d6564c80 c1574800 c0320214 00000001 dff9e200
00b3af60 df8d1c00 c3010010 000002a6 2764d1d8 00000c50 cefb09f8 d1fce000
7fffffff d1fcfc28 7fffffff c0378e02 dffef680 00000000 00000010 c04e3d50
Call Trace:
[<c0320214>] ip_local_deliver+0xdf/0x20b
[<c0378e02>] schedule_timeout+0xb1/0xb3
[<c031108e>] netif_receive_skb+0x167/0x194
[<c030c01f>] sock_wait_for_wmem+0xad/0xc5
[<c0115a27>] autoremove_wake_function+0x0/0x43
[<c0115a27>] autoremove_wake_function+0x0/0x43
[<c030c0bb>] sock_alloc_send_pskb+0x84/0x1cb
[<c030c21b>] sock_alloc_send_skb+0x19/0x21
[<c0324605>] ip_append_data+0x63d/0x6e5
[<c01870be>] do_get_write_access+0x246/0x5bc
[<c0323f23>] ip_generic_getfrag+0x0/0xa5
[<c033ff01>] udp_sendmsg+0x2d9/0x70a
[<c0181c8d>] __ext3_journal_stop+0x24/0x4a
[<c03476d2>] inet_sendmsg+0x4a/0x62
[<c03097cf>] sock_sendmsg+0x86/0xb2
[<c012dcb1>] __generic_file_aio_read+0x1cc/0x1fe
[<c012da1b>] file_read_actor+0x0/0xca
[<c025683a>] opost_block+0xc8/0x16e
[<c0227663>] copy_from_user+0x34/0x61
[<c030aa0b>] sys_sendto+0xc7/0xe2
[<c0157601>] __pollwait+0x0/0xc0
[<c0157c89>] sys_select+0x220/0x493
[<c030b1d0>] sys_socketcall+0x17e/0x249
[<c0103d63>] syscall_call+0x7/0xb


>
> > Normally the kernel should put the queued packets on the line and accept
> > new ones, or did I misunderstand this?
>
> Hm, yes. What does tcpdump tell you?

15:33:14.571437 IP phoebee.32797 > ALL-SYSTEMS.MCAST.NET.22333: UDP, length:
1316
15:33:14.595429 IP phoebee.32797 > ALL-SYSTEMS.MCAST.NET.22333: UDP, length:
1316
15:33:14.596445 IP phoebee.32797 > ALL-SYSTEMS.MCAST.NET.22333: UDP, length:
1316
15:33:14.597445 IP phoebee.32797 > ALL-SYSTEMS.MCAST.NET.22333: UDP, length:
1316

>
> > My program sends out many udp packets, and sometimes it just stops until
> > the kernel receives a network packet or I access the local network(with arp
> > command).
>
> arp does not access network. I think it just prints current arp cache.

well, but it resolves the ip's to the hostnames. arp contacts my nameserver, so
it accesses the network :)

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0
.2")}, 28) = 0

>
> > So if I run arp in an endless loop(while :; do arp; done), sendto runs
> > smooth.
> >
> > For me it smells like a bug ;)
>
> Possible. We need more details. Also CC network folks :)

Hopefully not! But then I need a bugfix for my program...

Where can I find the address?

--
MyExcuse:
Boredom in the Kernel.

Martin Zwickel <[email protected]>
Research & Development

TechnoTrend AG <http://www.technotrend.de>