2005-09-20 14:25:53

by Peter Duellings

Subject: kernel error in system call accept() under kernel 2.6.8

----------------------------------------------------------------------
One line summary of the problem:
The Linux accept() system call does not always set errno when its
return value is negative.

----------------------------------------------------------------------
Full description of the problem/report:
Obviously there are some cases where the accept() system call does
not set the errno variable if the accept() system call returns
with a value less than zero:

Distribution : Linux Fedora Core 2 - but with kernel 2.6.8
Sample code:
--snip----------
....
//accept may return with a protocol error, simply try again
while( (n = accept(m_ListenFd, (struct sockaddr *) cliaddr, &len)) < 0)
{
    Log.Log("Error accept, fd=%d, addrlen=%d, len=%d, errno=%d, %s",
            m_ListenFd, m_AddrLen, len, errno,
            strerror_r(errno, l_strebuf, sizeof(l_strebuf)));
    if (errno == EPROTO || errno == ECONNABORTED) //connection already aborted
    {
        Log.Log("connection aborted, wait for next");
        continue; //next try
    } else if (errno == EINTR) { //signal
        if (CnThread::TestCancel(m_Log)) {
            Log.Log("thread is cancelled");
            delete[] cliaddr;
            return -1;
        } else {
            Log.Log("EINTR, try again");
            continue; //next try
        }
    }
    else
    {
        throw CnException(__FILE__, __LINE__, "accept error errno=%d, %s",
                          errno,
                          strerror_r(errno, l_strebuf, sizeof(l_strebuf)));
    }
}
...
--snip-----
output:
CnTcpServer::WaitClient Error accept, fd=19, addrlen=16, len=16, errno=0, Success


----------------------------------------------------------------------
Keywords (i.e., modules, networking, kernel):
network accept errno error return value
----------------------------------------------------------------------
Kernel version (from /proc/version):
cat /proc/version
Linux version 2.6.8-1.521 (root@wsa92_D2_FEDTest) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #3 Fri Jul 8 11:08:56 CEST 2005
----------------------------------------------------------------------
----------------------------------------------------------------------
Question:
Is there any update, information, or explanation available for this behaviour?



2005-09-20 15:05:54

by Alan

Subject: Re: kernel error in system call accept() under kernel 2.6.8

On Tue, 2005-09-20 at 16:25 +0200, Peter Duellings wrote:
> Obviously there are some cases where the accept() system call does
> not set the errno variable if the accept() system call returns
> with a value less than zero:

Not actually possible. The kernel returns a positive value, zero or a
negative value which is the errno code negated. It has no mechanism to
return a negative value and not error.

What does strace show for the failing case ?

2005-09-20 15:26:20

by Peter Duellings

Subject: Re: kernel error in system call accept() under kernel 2.6.8

Alan,


I will try to get an strace.


Thanx,

Peter



2005-09-20 15:26:21

by Peter Duellings

Subject: Re: kernel error in system call accept() under kernel 2.6.8

Hi Ben,

If Log.Log modified errno, the Log.Log debug output should
not be affected, since the value of errno (as I understand it)
is copied onto the stack *before* Log.Log is called.
Or did I miss something?



Thanx,


Peter


Benjamin LaHaise wrote:
> On Tue, Sep 20, 2005 at 04:25:08PM +0200, Peter Duellings wrote:
>
>>//accept may return with a protocol error, simply try again
>>while( (n = accept(m_ListenFd, (struct sockaddr *) cliaddr, &len)) < 0)
>>{
>> Log.Log("Error accept, fd=%d, addrlen=%d, len=%d, errno=%d, %s",
>>m_ListenFd,
>>m_AddrLen, len, errno, strerror_r(errno, l_strebuf, sizeof(l_strebuf)));
>> if (errno == EPROTO || errno == ECONNABORTED) //connection already
>
>
> Let's see here: what happens if Log.Log() performs a syscall to, say,
> write out the log message to a buffer?
>
> -ben

2005-09-20 15:29:48

by Benjamin LaHaise

Subject: Re: kernel error in system call accept() under kernel 2.6.8

On Tue, Sep 20, 2005 at 05:20:03PM +0200, Peter Duellings wrote:
> Hi Ben,
>
> If Log.Log modified errno, the Log.Log debug output should
> not be affected, since the value of errno (as I understand it)
> is copied onto the stack *before* Log.Log is called.
> Or did I miss something?

errno does not reside on the stack.

-ben
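
A minimal sketch of the point above, under stated assumptions: `noisy_log` is a hypothetical stand-in for a logger such as Log.Log that happens to make a failing system call internally. Because errno is one shared per-thread location rather than a stack variable, any such call made before errno is read again can replace its value.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical stand-in for a logger that performs a system call
// which fails as a side effect of writing the message.
void noisy_log() {
    const int rc = ::close(-1);  // closing an invalid descriptor fails
    assert(rc == -1);            // ... and leaves errno set to EBADF
    (void)rc;
}

// Sets errno to `original`, logs, and returns what errno holds afterwards.
// The original value has been overwritten by the call inside noisy_log().
int errno_after_logging(int original) {
    errno = original;
    noisy_log();
    return errno;
}
```

With this sketch, `errno_after_logging(EINTR)` yields EBADF, so a later test such as `errno == EINTR` would no longer match.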

2005-09-20 15:33:25

by Peter Duellings

Subject: Re: kernel error in system call accept() under kernel 2.6.8

Ben,


Right. But before Log.Log is called, the arguments of the method are
copied onto the stack. That means the current content of errno is
copied too, and "current" here means before the call to Log.Log
is performed (errno is passed by value, not by
reference).

-Peter




2005-09-20 15:36:18

by Benjamin LaHaise

Subject: Re: kernel error in system call accept() under kernel 2.6.8

On Tue, Sep 20, 2005 at 05:32:49PM +0200, Peter Duellings wrote:
> Right. But before Log.Log is called, the arguments of the method are
> copied onto the stack. That means the current content of errno is
> copied too, and "current" here means before the call to Log.Log
> is performed (errno is passed by value, not by
> reference).

And strerror_r() modifies errno. You can't rely on the order arguments
to functions are evaluated in.

-ben
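
One way to sidestep the unspecified evaluation order is to sequence the steps explicitly: copy errno into a local first, then build the message text from that copy. The sketch below is only an illustration; `ErrorSnapshot` and `snapshot_errno` are hypothetical names, and `std::generic_category().message()` from `<system_error>` is swapped in for strerror_r() to avoid depending on which strerror_r() variant the platform provides.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical value type holding an error code and its description.
struct ErrorSnapshot {
    int code;
    std::string text;
};

// Call this immediately after the failing system call, before anything
// else (logging, formatting, other library calls) can touch errno.
ErrorSnapshot snapshot_errno() {
    const int saved = errno;  // step 1: copy errno, nothing else yet
    // step 2: derive the text from the copy, never from errno itself
    std::string text = std::generic_category().message(saved);
    assert(!text.empty());
    return {saved, text};
}
```

A caller would then pass `snap.code` and `snap.text.c_str()` to the logger and test `snap.code` against EINTR or ECONNABORTED, so neither the log call nor the message lookup can influence the decision.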

2005-10-05 15:41:39

by Peter Duellings

Subject: Re: kernel error in system call accept() under kernel 2.6.8

Alan,


Meanwhile we were able to generate an strace for the problem.
However, I suspect that the strace does not give the desired
information (see the excerpt below).

Additionally, we added output of the return value of the
accept() system call to the program. The return value is -512
and the errno value is 0!
Normally the return value should be -1 and errno should
contain the error code as a positive value.


Any idea or comment on this ?

Thanks,


Peter Düllings

---------<strace>------------
2682 09:25:29.238663 accept(21, <unfinished ...>
2688 09:25:29.263486 accept(33, {sa_family=AF_INET, sin_port=htons(32811), sin_addr=inet_addr("127.0.0.1")}, [16]) = 40 <27.171270>
2688 09:25:56.589969 accept(33, 0x82fa7e0, [16]) = ? ERESTARTSYS (To be restarted) <0.385453>
2688 09:25:56.975563 --- SIGCHLD (Child exited) @ 0 (0) ---
2688 09:25:56.975676 accept(33, 0x82fa7e0, [16]) = ? ERESTARTSYS (To be restarted) <0.205963>
2688 09:25:57.181770 --- SIGCHLD (Child exited) @ 0 (0) ---
2688 09:25:57.181842 accept(33, <unfinished ...>
2682 09:25:57.231961 <... accept resumed> {sa_family=AF_INET, sin_port=htons(32882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 43 <27.993066>
2682 09:25:57.234320 accept(21, <unfinished ...>
2688 09:25:57.538314 <... accept resumed> 0x82fa7e0, [16]) = ? ERESTARTSYS (To be restarted) <0.356435>
2688 09:25:57.538429 --- SIGCHLD (Child exited) @ 0 (0) ---
2688 09:25:57.538488 accept(33, 0x82fa7e0, [16]) = ? ERESTARTSYS (To be restarted) <0.015688>
2688 09:25:57.554315 --- SIGCHLD (Child exited) @ 0 (0) ---
2688 09:25:57.554370 accept(33, 0x82fa7e0, [16]) = ? ERESTARTSYS (To be restarted) <0.192660>
2688 09:25:57.747151 --- SIGCHLD (Child exited) @ 0 (0) ---
2688 09:25:57.747236 accept(33, 0x82fa7e0, [16]) = ? ERESTARTSYS (To be restarted) <0.097813>
....
.
---------</strace>------------





2005-10-05 17:40:05

by linux-os (Dick Johnson)

Subject: Re: kernel error in system call accept() under kernel 2.6.8


On Wed, 5 Oct 2005, Peter Duellings wrote:

> Alan,
>
> Meanwhile we were able to generate an strace for the problem.
> However, I suspect that the strace does not give the desired
> information (see the excerpt below).
>
> Additionally, we added output of the return value of the
> accept() system call to the program. The return value is -512
> and the errno value is 0!
> Normally the return value should be -1 and errno should
> contain the error code as a positive value.
>
>
> Any idea or comment on this ?
>
> Thanks,
>

If you run an ordinary 'C' runtime library, not something
that you wrote to interface with the kernel, then a return
value of -512 with a 0 errno value is not possible unless
you have code that is trashing something in user-space.

The interface, whether a sys-call or an interrupt, takes
the return value, negates it, and puts it into errno if
it was negative, then it sets the return value to -1.
This is common code for all system-calls, therefore
nothing you can turn off for a particular system call.
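
The conversion described above can be sketched roughly as follows. This is not the actual library code: `wrap_raw_result` is a hypothetical name, and the sketch assumes the Linux convention that raw kernel results in the range -4095..-1 denote errors.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical model of what a libc system-call wrapper does with the
// raw value handed back by the kernel.
// Assumption: raw results from -4095 to -1 are negated error codes.
long wrap_raw_result(long raw) {
    if (raw < 0 && raw >= -4095) {
        errno = static_cast<int>(-raw);  // store the positive error code
        assert(errno > 0);
        return -1;                       // the caller only ever sees -1
    }
    return raw;  // success: e.g. a new file descriptor from accept()
}
```

Under this model a raw -512 from the kernel would reach the caller as -1 with errno set to 512, never as -512 with errno 0. The number 512 corresponds to the kernel-internal ERESTARTSYS code shown in the strace excerpt, which user-space programs are not normally expected to see at all.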

A possible problem may be that you are not using the
proper 'C' runtime library headers to define what
errno actually is. On many (most) runtime libraries
'errno' is NOT 'extern int errno'. Instead it is
actually a dereference of the return value of a
function that locates the proper variable for your
particular thread! You can't create your own global
errno and have it magically updated.
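
A small sketch to check this on a given toolchain, assuming a C++11 or later compiler where `<cerrno>` is required to define errno as a macro; the helper names `errno_is_macro` and `errno_location` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>

// Reports whether the header defines errno as a macro on this platform.
bool errno_is_macro() {
#ifdef errno
    return true;
#else
    return false;
#endif
}

// Returns the address that errno refers to for the calling thread.
// Taking the address works because errno expands to a modifiable lvalue.
int* errno_location() {
    int* location = &errno;
    assert(location != nullptr);
    return location;
}
```

Writing through the returned pointer changes what `errno` reads in the same thread, whereas a separately declared `int errno;` in a program that skips the header would be an unrelated variable that the library never updates.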


Cheers,
Dick Johnson
Penguin : Linux version 2.6.13 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.


2005-10-05 17:53:51

by Bernd Petrovitsch

Subject: Re: kernel error in system call accept() under kernel 2.6.8

On Wed, 2005-10-05 at 17:41 +0200, Peter Duellings wrote:
[....]
> Meanwhile we were able to generate an strace for the problem.
> However, I suspect that the strace does not give the desired
> information (see the excerpt below).

Yup. Where exactly does the kernel return -512 in the strace output?
I can see only 40 and 43 which are valid file descriptors.

> Additionally, we added output of the return value of the
> accept() system call to the program. The return value is -512
> and the errno value is 0!
>
> Normally the return value should be -1 and errno should
> contain the error code as a positive value.
>
>
> Any idea or comment on this ?

You are doing something wrong somewhere else, so please post a minimal
piece of source code that shows how you get there.
Or show the *failing* case in the strace output.

[...]

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services