2002-02-06 20:24:43

by Chris Friesen

Subject: want opinions on possible glitch in 2.4 network error reporting


I've been looking around in the 2.4 networking stack, and I noticed that when
the tulip (and no doubt many other) driver cannot put any more outgoing packets
on the queue, it calls netif_stop_queue(). Then, in dev_queue_xmit() we check
this flag by calling netif_queue_stopped(). My concern is that if this flag is
true, we return -ENETDOWN. Is this really the proper return code for this? If
anything, the network is too active. It seems to me that it would make more
sense to have some kind of congestion return code rather than claiming that the
network is down.

I think it would make sense to return -ENOBUFS in this case, as it's already
listed in the sendto() man page, and the description matches the error because
the command could succeed if retried.

I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
was calling sendto() on 217000 packets/sec, even though the wire could only
handle about 127000 packets/sec. I got no errors at all in sendto, even though
over a third of the packets were not actually being sent.


--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]


2002-02-06 21:28:24

by Richard B. Johnson

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Wed, 6 Feb 2002, Chris Friesen wrote:

[SNIPPED...]


>
> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec. I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.
>

In principle, sendto() will always succeed unless you provided the
wrong parameters in the function call, or the machine crashes, at
which time your task won't be there to receive the error code anyway.

Hackers code sendto as:
sendto(s,...);
Professional programmers use:
(void)sendto(s,...);

checking the return value is useless.

Note that the man-page specifically states that ENOBUFS can't happen.

You cannot assume that any sendto() data actually gets on the wire, much
less to its destination. With any user-datagram-protocol, both ends,
sender and receiver, have to work out what they will do with missing
packets and packets received out-of-order.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.


2002-02-06 21:57:25

by Ben Greear

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

However, if you use non-blocking I/O you will get EAGAIN if
there is no buffer space. Blocking calls should always
block until there is buffer space.

Also, just because select/poll says the socket is writable, it
may not be (immediately), because you can send UDP packets
that are larger than 2048 bytes, and that is the cutoff that
tells select the socket is writable...

I've actually sent a patch to Dave Miller to make select/poll
wait until there is 64k of buffer space (the maximum size of
a UDP packet), but he is still reviewing the issue.


Enjoy,
Ben

Richard B. Johnson wrote:

> On Wed, 6 Feb 2002, Chris Friesen wrote:
>
> [SNIPPED...]
>
>
>
>>I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
>>was calling sendto() on 217000 packets/sec, even though the wire could only
>>handle about 127000 packets/sec. I got no errors at all in sendto, even though
>>over a third of the packets were not actually being sent.
>>
>>
>
> In principle, sendto() will always succeed unless you provided the
> wrong parameters in the function call, or the machine crashes, at
> which time your task won't be there to receive the error code anyway.
>
> Hackers code sendto as:
> sendto(s,...);
> Professional programmers use:
> (void)sendto(s,...);
>
> checking the return value is useless.
>
> Note that the man-page specifically states that ENOBUFS can't happen.
>
> You cannot assume that any sendto() data actually gets on the wire, much
> less to its destination. With any user-datagram-protocol, both ends,
> sender and receiver, have to work out what they will do with missing
> packets and packets received out-of-order.
>
>
> Cheers,
> Dick Johnson


--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear


2002-02-06 22:16:46

by Chris Friesen

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

"Richard B. Johnson" wrote:

[snip]
> Hackers code sendto as:
> sendto(s,...);
> Professional programmers use:
> (void)sendto(s,...);
>
> checking the return value is useless.
>
> Note that the man-page specifically states that ENOBUFS can't happen.

I don't know what your manpage says, but my manpage doesn't say anything about
ENOBUFS not being possible. From the man page:

"ENOBUFS The system was unable to allocate an internal memory block. The
operation may succeed when buffers become available."

> You cannot assume that any sendto() data actually gets on the wire, much
> less to its destination. With any user-datagram-protocol, both ends,
> sender and receiver, have to work out what they will do with missing
> packets and packets received out-of-order.

Hmm. I knew you couldn't assume it was delivered (the man page says so), but I
didn't know it doesn't guarantee it getting to the wire. The man page says that
"locally detected errors are indicated by a return value of -1". Furthermore,
it also says "When the message does not fit into the send buffer of the socket,
send normally blocks, unless the socket has been placed in non-blocking I/O
mode."

I would suggest that if the packet doesn't make it onto the wire, sendto()
should either a) block until it can send the packet (or return with EAGAIN, as
appropriate), or b) return an error.

Chris


--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-02-07 00:07:52

by Andi Kleen

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

Chris Friesen <[email protected]> writes:

> I've been looking around in the 2.4 networking stack, and I noticed that when
> the tulip (and no doubt many other) driver cannot put any more outgoing packets
> on the queue, it calls netif_stop_queue(). Then, in dev_queue_xmit() we check
> this flag by calling netif_queue_stopped(). My concern is that if this flag is
> true, we return -ENETDOWN. Is this really the proper return code for this? If
> anything, the network is too active. It seems to me that it would make more
> sense to have some kind of congestion return code rather than claiming that the
> network is down.

The ENETDOWN path you're seeing only applies to queueless devices (like
loopback or a tunnel device). These should only set the queue-stopped
flag when something is terribly wrong.

All real network devices have a queue and go through the qdisc.

>
> I think it would make sense to return -ENOBUFS in this case, as it's already
> listed in the sendto() man page, and the description matches the error because
> the command could succeed if retried.
>
> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec. I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

The qdisc queue acts like an IP network and drops excess packets.
There is no provision to block when it fills, because that would have
many side effects and complicate the stack a lot. There is a return
code, though, that is passed up when the queue fills (NET_XMIT_DROP or
NET_XMIT_CN), but it's currently only used by TCP and not passed to
user space for UDP/RAW. It could probably be done with a special
socket option if there is a clear need.

-Andi

2002-02-07 00:18:05

by Alan

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec. I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

That is correct UDP behaviour

2002-02-07 01:05:41

by Alan

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

> Hackers code sendto as:
> sendto(s,...);
> Professional programmers use:
> (void)sendto(s,...);

Remind me never to hire one of your professional programmers

> checking the return value is useless.

Not so. For a large number of situations its extremely informative.

2002-02-07 01:54:27

by Ion Badulescu

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Thu, 7 Feb 2002 00:26:20 +0000 (GMT), Alan Cox <[email protected]> wrote:
>> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
>> was calling sendto() on 217000 packets/sec, even though the wire could only
>> handle about 127000 packets/sec. I got no errors at all in sendto, even though
>> over a third of the packets were not actually being sent.
>
> That is correct UDP behaviour

This is totally untrue, unless the socket is doing non-blocking I/O -- and
even then you get -1 and EAGAIN from sendto.

Otherwise sendto is very much a blocking operation, which will block until
there is enough space in the socket buffer to store the data. From there,
there is no way to "lose" that data before it hits the wire, unless of
course the network driver is broken and doesn't stop the upper layers when
its TX queue is full.

Think about it: if what you said were true, NFS over UDP would be totally
useless. But it's not, so if UDP data gets lost before it hits the wire,
it's usually a bug in the network driver.

From the limited testing I just ran, it appears that starfire and 3c59x
handle this correctly, whereas tulip always loses a small number of
packets during a UDP storm. ttcp -us[rt] is very useful for such
testing...

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2002-02-07 01:55:37

by Alan

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

> > That is correct UDP behaviour
>
> This is totally untrue, unless the socket is doing non-blocking I/O -- and
> even then you get -1 and EAGAIN from sendto.

Not the case.

> there is no way to "lose" that data before it hits the wire, unless of
> course the network driver is broken and doesn't plug the upper layers when
> its TX queue is full.

UDP is not flow controlled.

> Think of it: if what you said were true, NFS over UDP would be totally
> useless. But it's not, so if UDP data gets lost before it hits the wire,
> it's usually a bug in the network driver.

NFS does UDP flow control of its own. If it didn't, it would indeed be
broken.

2002-02-07 02:09:50

by Ion Badulescu

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Thu, 7 Feb 2002, Alan Cox wrote:

> > there is no way to "lose" that data before it hits the wire, unless of
> > course the network driver is broken and doesn't plug the upper layers when
> > its TX queue is full.
>
> UDP is not flow controlled.

No, of course not, but this has *nothing* to do with UDP. The IP socket
itself is flow controlled, and so is the TX queue of the network driver.

Let me give you another example: ping -f. If what you said were true, ping -f
would send packets as fast as the CPU can generate into the black hole
called an IP raw socket, right? Well, that just doesn't happen, because
sendto/sendmsg will block until there is enough space in the TX queue of
the raw socket.

I'll state again: if data (UDP or otherwise) is lost after sendto()
returns success but before it hits the wire, something is BROKEN in that
IP stack.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2002-02-07 02:21:51

by Alan

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

> > UDP is not flow controlled.
>
> No, of course not, but this has *nothing* to do with UDP. The IP socket
> itself is flow controlled, and so is the TX queue of the network driver.

It is not flow controlled

> Let me give you another example: ping -f. If what you said were true, ping -f
> would send packets as fast as the CPU can generate into the black hole
> called an IP raw socket, right? Well, that just doesn't happen, because

Wrong. man ping. ping -f doesn't do what you apparently think it does.

Alan

2002-02-07 02:48:11

by Andi Kleen

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

Ion Badulescu <[email protected]> writes:
> I'll state again: if data (UDP or otherwise) is lost after sendto()
> returns success but before it hits the wire, something is BROKEN in that
> IP stack.

Your proposal would break select(). It would require UDP sendmsg to block
when the TX queue is full. Most applications using select do
not set the socket non-blocking. If they select for writing and the
kernel signals the socket writable, they expect not to block in the write.
As long as the only thing controlling the blocking is the per-socket
send buffer, that works out, provided the application is careful enough
not to fill its send buffer. If you put the TX queue into the
blocking equation too, this can no longer be guaranteed, because the TX queue
is shared between all local processes and even forwarding. You would
get random blocking in select-based applications, breaking them.

I BTW had a proposal for blocking the sender in TX some time ago but it was
luckily shot down by people who knew better than me.

-Andi

2002-02-07 02:55:15

by Ion Badulescu

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Thu, 7 Feb 2002, Alan Cox wrote:

> Wrong. man ping. ping -f doesn't do what you apparently think it does.

strace ping, you'll see it doing a

setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0

on its socket.

That's about the only way (aside from using a TBF queue or other non-FIFO
queues) that you can lose data from a socket's queue.

Getting back to the NFS/UDP example: yes, NFS has its own flow control,
but that's not the point. The reason NFS/UDP works so well with large NFS
packets over a fully-switched *local* subnet is precisely because NFS's
flow control is almost never exercised in that case. Data simply doesn't
get lost -- never in the UDP socket's queue, and very rarely on the wire.

But you don't need to believe me. Just run the ttcp -uts test and explain
how come all the data makes it to the other end (again, over a
fully-switched local subnet) if:
1. ttcp has no clue about the wire speed (which it obviously doesn't) so
it can't do rate limiting
2. the UDP socket simply discards data when some internal queue fills up,
without blocking sendto() and without returning an error.

Moreover: please strace -T that ttcp -uts test, and notice how the time
for the system call goes up by two orders of magnitude (i.e. it blocks) as
soon as the socket queue fills up.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.



2002-02-07 04:21:49

by Ben Greear

Subject: Re: want opinions on possible glitch in 2.4 network error reporting



Alan Cox wrote:

>>>That is correct UDP behaviour
>>>
>>This is totally untrue, unless the socket is doing non-blocking I/O -- and
>>even then you get -1 and EAGAIN from sendto.
>>
>
> Not the case.


Are you claiming that you will never see -1 and EAGAIN on a nonblocking
UDP socket with sendto? If so, I'll bet you a kernel patch that you are not
correct (I get to write the patch and you include it :) )


>
>
>>there is no way to "lose" that data before it hits the wire, unless of
>>course the network driver is broken and doesn't plug the upper layers when
>>its TX queue is full.
>>
>
> UDP is not flow controlled.


If it makes it through sendto, where can it be dropped before it
hits the wire? I doubt the socket buffers are anything other than FIFO,
and the same goes for the ethernet/device queue. Since we (can) know
at sendto whether or not the PDU was enqueued for transmit, it seems
trivial to notify user space of the success/failure of the local network
stack, and I believe this is what is done.

Now granted, it can be dropped anywhere outside of the machine, but
I can see no good reason to drop it inside the machine.

--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear


2002-02-07 04:23:49

by Ben Greear

Subject: Re: want opinions on possible glitch in 2.4 network error reporting




> From the limited testing I just ran, it appears that starfire and 3c59x
> handle this correctly, whereas tulip always loses a small number of
> packets during a UDP storm. ttcp -us[rt] is very useful for such
> testing...


It would be interesting to see which side is dropping. Have you
correlated ethernet driver counters with your sendto count?


>
> Ion
>
>


--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear


2002-02-07 04:38:34

by Ion Badulescu

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Wed, 6 Feb 2002, Ben Greear wrote:

> > From the limited testing I just ran, it appears that starfire and 3c59x
> > handle this correctly, whereas tulip always loses a small number of
> > packets during a UDP storm. ttcp -us[rt] is very useful for such
> > testing...
>
> It would be interesting to see which side is dropping. Have you
> correlated ethernet driver counters with your sendto count?

It's hard for me to do it right now, because I don't have them in
isolation (they do NFS and other stuff), and I don't have iptables support
compiled into the kernel running the tulip. However:

starfire -> 3c59x
3c59x -> starfire
tulip -> 3c59x
tulip -> starfire

never lose data on a quiescent network:

ttcp-t: 83886080 bytes in 7.04 real seconds = 11640.36 KB/sec +++
ttcp-r: 83886080 bytes in 7.04 real seconds = 11641.10 KB/sec +++

whereas

3c59x -> tulip
starfire -> tulip

*always* lose several packets:

ttcp-t: 16777216 bytes in 1.40 real seconds = 11717.40 KB/sec +++
ttcp-r: 16769024 bytes in 1.40 real seconds = 11679.39 KB/sec +++

and

ttcp-t: 33554432 bytes in 2.80 real seconds = 11714.81 KB/sec +++
ttcp-r: 33456128 bytes in 2.80 real seconds = 11660.28 KB/sec +++

and

ttcp-t: 83886080 bytes in 7.00 real seconds = 11704.40 KB/sec +++
ttcp-r: 83722240 bytes in 7.00 real seconds = 11674.67 KB/sec +++

So I would tend to blame it on the tulip -- but the Rx side of it, not the
Tx, which this discussion was about...

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2002-02-07 04:41:04

by David Miller

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

From: Ben Greear <[email protected]>
Date: Wed, 06 Feb 2002 21:21:09 -0700

Alan Cox wrote:

> UDP is not flow controlled.

If it makes it through sendto, where can it be dropped before it
hits the wire?

If the packet ends up being fragmented on the way out and the socket
cannot take on the allocation against its buffer space.

2002-02-07 04:56:55

by Ben Greear

Subject: Re: want opinions on possible glitch in 2.4 network error reporting



David S. Miller wrote:

> From: Ben Greear <[email protected]>
> Date: Wed, 06 Feb 2002 21:21:09 -0700
>
> Alan Cox wrote:
>
> > UDP is not flow controlled.
>
> If it makes it through sendto, where can it be dropped before it
> hits the wire?
>
> If the packet ends up being fragmented on the way out and the socket
> cannot take on the allocation against its buffer space.


In the fragmentation case (at least over 1500-MTU ethernet), the
headers are a relatively small portion of the total PDU, right?
So, if we reserved 10-15% (or whatever it works out to), that should
make it so we never drop the packet due to fragmentation, right? I can't
see any reason not to reserve this space, because sending a little later
is definitely better than going through the work of sending it sooner but
then having to drop it down in the local kernel. We may only want to
reserve the buffers when they are fairly large (i.e. not on your very
small and slow embedded devices where memory is very precious).


--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear


2002-02-07 06:18:31

by Chris Friesen

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

Andi Kleen wrote:
>
> Ion Badulescu <[email protected]> writes:
> > I'll state again: if data (UDP or otherwise) is lost after sendto()
> > returns success but before it hits the wire, something is BROKEN in that
> > IP stack.
>
> Your proposal would break select(). It would require UDP sendmsg to block
> when the TX queue is full. Most applications using select do
> not send the socket non blocking. If they select for writing and the
> kernel signals the socket writable they expect not to block in the write.
> As long as the only thing controlling the blocking is the per socket
> send buffer that works out as long as the application is careful enough
> not to fill its send buffer. If you would put the TX queue into the
> blocking equation too this cannot be guaranteed anymore because the TX queue
> is shared between all local processes and even forwarding. You would
> get random blocking on select based applications, breaking them.

I don't see the problem. So sendto() blocks if there is no room in the socket
buffer. Fine. So if there's room in the socket buffer we take the packet and
put it in the buffer, and sendto() returns.

Now, for each socket we've got a buffer of packets that want to get onto the
device driver tx queue. So we use some kind of algorithm to pick which packets
to move from the group of socket buffers to the device driver tx queue. If the
app calls sendto() before there is space in the socket buffer, then sendto()
blocks. select() should return whether or not there is space in the socket
buffer. Eventually, every packet that gets put into a socket buffer makes it
out onto the wire. Congestion is dealt with by leaving packets in the socket
buffers until they can be guaranteed a spot in the device tx queue. I assume we
would try to add a packet to the tx queue, and remove it from the socket buffer
only if the add succeeds.

I just don't see why sendto() would accept the packet and then drop it
later on.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-02-07 09:25:02

by Luis Garces

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

Alan Cox wrote:

>> I ran into a somewhat related issue on a 2.2.16 system, where I
>> had an app that was calling sendto() on 217000 packets/sec, even
>> though the wire could only handle about 127000 packets/sec. I
>> got no errors at all in sendto, even though over a third of the
>> packets were not actually being sent.
>>
>
> That is correct UDP behaviour -
>

Yes, TCP provides a reliable point-to-point path, and UDP doesn't. The
problem is deciding where this unreliability starts in the UDP
path. In Alan's opinion (I think) it starts at the very moment data is
passed to sendto() (i.e., it includes the kernel in the
unreliable UDP path). Perhaps it is a little sad to see the kernel as
something lossy, but I think it's the nature of UDP.

--
Luis
****

2002-02-07 10:59:21

by Alan

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

> > Wrong. man ping. ping -f doesn't do what you apparently think it does.
>
> strace ping, you'll see it doing a
> setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
>
> on its socket.

Read the ping manual page. Then when you understand what ping -f does
come back and have a useful conversation.

2002-02-07 13:43:10

by Richard B. Johnson

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Wed, 6 Feb 2002, Chris Friesen wrote:

> "Richard B. Johnson" wrote:
>
> [snip]
> > Hackers code sendto as:
> > sendto(s,...);
> > Professional programmers use:
> > (void)sendto(s,...);
> >
> > checking the return value is useless.
> >
> > Note that the man-page specifically states that ENOBUFS can't happen.
>
> I don't know what your manpage says, but my manpage doesn't say anything about
> ENOBUFS not being possible. From the man page:
>
> "ENOBUFS The system was unable to allocate an internal memory block. The
> operation may succeed when buffers become available."



ENOBUFS
The output queue for a network interface was full.
This generally indicates that the interface has
stopped sending, but may be caused by transient
congestion. (This cannot occur in Linux, packets
are just silently dropped when a device queue
overflows.)


Linux Man Page July 1999 1

Script done on Thu Feb 7 08:35:39 2002

Distributed with RedHat 7




Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.


2002-02-07 15:53:02

by Chris Friesen

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

Andi Kleen wrote:
>
> Chris Friesen <[email protected]> writes:
>
> > I've been looking around in the 2.4 networking stack, and I noticed that when
> > the tulip (and no doubt many other) driver cannot put any more outgoing packets
> > on the queue, it calls netif_stop_queue(). Then, in dev_queue_xmit() we check
> > this flag by calling netif_queue_stopped(). My concern is that if this flag is
> > true, we return -ENETDOWN. Is this really the proper return code for this? If
> > anything, the network is too active. It seems to me that it would make more
> > sense to have some kind of congestion return code rather than claiming that the
> > network is down.
>
> The ENETDOWN path you're seeing only applies to queueless devices (like
> loopback or a tunnel device). These should only set the queue-stopped
> flag when something is terribly wrong.
>
> All real network devices have a queue and go through the qdisc.

Okay, I must be missing something, so can you enlighten me? I can't figure out
where the qdisc is attached to the ethernet device.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-02-07 16:02:14

by Andi Kleen

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Thu, Feb 07, 2002 at 10:59:56AM -0500, Chris Friesen wrote:
> Okay, I must be missing something, so can you enlighten me? I can't figure out
> where the qdisc is attached to the ethernet device.

net/core/dev.c:dev_open -> dev_activate.

-Andi

2002-02-07 16:34:16

by Gerold Jury

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

This is off topic, but close.
UDP packets are not dropped when the UDP socket has been shutdown(socket, 0)
for receive, at least on 2.4.17.
Attached is a small proof.
Anyone with a hint or a fix?

Thanks
Gerold


On Thursday 07 February 2002 14:44, Richard B. Johnson wrote:
>
> ENOBUFS
> The output queue for a network interface was full.
> This generally indicates that the interface has
> stopped sending, but may be caused by transient
> congestion. (This cannot occur in Linux, packets
> are just silently dropped when a device queue
> overflows.)
>
>
> Linux Man Page July 1999 1
>


Attachments:
udpshutdown.c (1.25 kB)

2002-02-08 21:14:58

by Pavel Machek

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

Hi!

> > > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> >
> > strace ping, you'll see it doing a
> > setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> >
> > on its socket.
>
> Read the ping manual page. Then when you understand what ping -f does
> come back and have a useful conversation.

But I guess it *would* be useful to have a -F option saying "feed data
as fast as possible", right? And it would be nice if this option did
not eat 100% CPU when possible, right?

So what he is asking for is pretty useful behaviour.
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-02-08 21:40:36

by Ion Badulescu

Subject: Re: want opinions on possible glitch in 2.4 network error reporting

On Fri, 8 Feb 2002, Pavel Machek wrote:

> Hi!
>
> > > > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> > >
> > > strace ping, you'll see it doing a
> > > setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> > >
> > > on its socket.
> >
> > Read the ping manual page. Then when you understand what ping -f does
> > come back and have a useful conversation.
>
> But I guess it *would* be useful to have a -F option saying "feed data
> as fast as possible", right? And it would be nice if this option did
> not eat 100% CPU when possible, right?
>
> So what he is asking for is pretty useful behaviour.

I'm not asking for it. I'm saying this is what we already have. Too bad
people won't listen -- and yes, I know ping -f was a bad example. A
blocking sendto() *will* block (surprise, surprise), even though it *might*
throw the data away later on.

Indeed, as Davem stated, a UDP socket will lose data under memory
pressure. In real life this hardly ever happens, however, even with large
message sizes: I just tested with sizes up to 52000, which is just about
as large as you'll ever see in real environments.

Also: I'm just dying to be enlightened about how a dumb program like
ttcp -u, doing a totally dumb "while (1) sendto();", can manage to score
sending rates identical to the raw wire speed, if indeed sendto() never
blocks and simply throws away the data:

apollo:/# ttcp -utsl 53000 zeus
ttcp-t: buflen=53000, nbuf=2048, align=16384/0, port=5001 udp -> sybase2
ttcp-t: socket
ttcp-t: 108544000 bytes in 9.02 real seconds = 11745.26 KB/sec +++
ttcp-t: 2054 I/O calls, msec/call = 4.50, calls/sec = 227.59
ttcp-t: 0.0user 0.2sys 0:09real 2% 0i+0d 0maxrss 0+13pf 0+0csw
zeus:/var/lib/pgsql# ttcp -ursl 53000
ttcp-r: buflen=53000, nbuf=2048, align=16384/0, port=5001 udp
ttcp-r: socket
ttcp-r: 108544000 bytes in 9.03 real seconds = 11741.76 KB/sec +++
ttcp-r: 2050 I/O calls, msec/call = 4.51, calls/sec = 227.08
ttcp-r: 0.0user 0.1sys 0:09real 1% 0i+0d 0maxrss 0+12pf 0+0csw

11745KB/sec sounds suspiciously close to the 100Mb/sec wire speed.

and, for reference, just to make sure ttcp wasn't lying to me:

zeus:/var/lib/pgsql# iptables -L -n -v
Chain INPUT (policy ACCEPT 7217K packets, 3137M bytes)
pkts bytes target prot opt in out source destination
2051 108M udp -- * * 10.2.10.216 0.0.0.0/0 udp dpt:5001


But no, it's so much easier to incompletely quote a message and then claim
the other person has no idea about what he's talking about. Yes, Alan,
that's precisely what you did.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.