Hi folks,
I often build Debian packages inside a chroot. Today I discovered a
failure during an "aptitude update", which is a command to download new
package lists for the package management. In strace, the lines around
the failure look like this:
99% [Working]) = 14 14
[pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
[pid 5986] gettimeofday({1197576353, 670510}, NULL) = 0
[pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
[pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
99% [Working]) = 14 14
[pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
[pid 5986] gettimeofday({1197576354, 173902}, NULL) = 0
[pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
[pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
99% [Working]) = 14 14
[pid 5986] select(6, [3 4 5], [], NULL, {0, 500000} <unfinished ...>
[pid 5988] <... select resumed> ) = 1 (in [3], left {105, 0})
[pid 5988] read(3, "", 56559) = 0
[pid 5988] fcntl64(-1, F_GETFL) = -1 EBADF (Bad file
descriptor)
[pid 5988] fcntl64(-1, F_SETFL,
O_ACCMODE|O_CREAT|O_EXCL|O_NOCTTY|O_TRUNC|O_APPEND|O_SYNC|O_ASYNC|O_DIRECT|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|0xfff8003c)
= -1 EBADF (Bad file descriptor)
[pid 5988] write(2, ""..., 41FATAL -> Could not set non-blocking flag
) = 41
[pid 5988] write(2, ""..., 19Bad file descriptor) = 19
[pid 5988] write(2, ""..., 1
) = 1
[pid 5988] exit_group(100) = ?
Process 5988 detached
This happened with a kernel after 2.6.24-rc5
(4af75653031c6d454b4ace47c1536f0d2e727e3e). I rebooted into 2.6.23.8
and it worked. Now I rebooted into 2.6.24-rc5 again and was able to
reproduce the failure, so it looks like a kernel issue to me.
Regards,
Tino
On Thursday, 13 of December 2007, Tino Keitel wrote:
> Hi folks,
>
> I often build Debian packages inside a chroot. Today I discovered a
> failure during an "aptitude update", which is a command to download new
> package lists for the package management. In strace, the lines around
> the failure look like this:
>
> 99% [Working]) = 14 14
> [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
> [pid 5986] gettimeofday({1197576353, 670510}, NULL) = 0
> [pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
> [pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 99% [Working]) = 14 14
> [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
> [pid 5986] gettimeofday({1197576354, 173902}, NULL) = 0
> [pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
> [pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 99% [Working]) = 14 14
> [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000} <unfinished ...>
> [pid 5988] <... select resumed> ) = 1 (in [3], left {105, 0})
> [pid 5988] read(3, "", 56559) = 0
> [pid 5988] fcntl64(-1, F_GETFL) = -1 EBADF (Bad file
> descriptor)
> [pid 5988] fcntl64(-1, F_SETFL,
> O_ACCMODE|O_CREAT|O_EXCL|O_NOCTTY|O_TRUNC|O_APPEND|O_SYNC|O_ASYNC|O_DIRECT|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|0xfff8003c)
> = -1 EBADF (Bad file descriptor)
> [pid 5988] write(2, ""..., 41FATAL -> Could not set non-blocking flag
> ) = 41
> [pid 5988] write(2, ""..., 19Bad file descriptor) = 19
> [pid 5988] write(2, ""..., 1
> ) = 1
> [pid 5988] exit_group(100) = ?
> Process 5988 detached
>
> This happened with a kernel after 2.6.24-rc5
> (4af75653031c6d454b4ace47c1536f0d2e727e3e). I rebooted into 2.6.23.8
> and it worked. Now I rebooted into 2.6.24-rc5 again and was able to
> reproduce the failure, so it looks like a kernel issue to me.
Yes, it does.
I'm not sure who to forward it to, though. Andrew, can you help, please?
Tino Keitel wrote:
> Hi folks,
>
> I often build Debian packages inside a chroot. Today I discovered a
> failure during an "aptitude update", which is a command to download new
> package lists for the package management. In strace, the lines around
> the failure look like this:
>
> 99% [Working]) = 14 14
> [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
> [pid 5986] gettimeofday({1197576353, 670510}, NULL) = 0
> [pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
> [pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 99% [Working]) = 14 14
> [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
> [pid 5986] gettimeofday({1197576354, 173902}, NULL) = 0
> [pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
> [pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 99% [Working]) = 14 14
> [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000} <unfinished ...>
> [pid 5988] <... select resumed> ) = 1 (in [3], left {105, 0})
> [pid 5988] read(3, "", 56559) = 0
> [pid 5988] fcntl64(-1, F_GETFL) = -1 EBADF (Bad file
> descriptor)
> [pid 5988] fcntl64(-1, F_SETFL,
> O_ACCMODE|O_CREAT|O_EXCL|O_NOCTTY|O_TRUNC|O_APPEND|O_SYNC|O_ASYNC|O_DIRECT|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|0xfff8003c)
> = -1 EBADF (Bad file descriptor)
> [pid 5988] write(2, ""..., 41FATAL -> Could not set non-blocking flag
> ) = 41
> [pid 5988] write(2, ""..., 19Bad file descriptor) = 19
> [pid 5988] write(2, ""..., 1
> ) = 1
> [pid 5988] exit_group(100) = ?
> Process 5988 detached
>
> This happened with a kernel after 2.6.24-rc5
> (4af75653031c6d454b4ace47c1536f0d2e727e3e). I rebooted into 2.6.23.8
> and it worked. Now I rebooted into 2.6.24-rc5 again and was able to
> reproduce the failure, so it looks like a kernel issue to me.
With this part of strace output it seems like an obvious userspace bug
(calling fcntl on a -1 file descriptor). Could be some other change in
behavior or timing difference is triggering the bug,however.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Fri, 14 Dec 2007 01:45:39 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
> On Thursday, 13 of December 2007, Tino Keitel wrote:
> > Hi folks,
> >
> > I often build Debian packages inside a chroot. Today I discovered a
> > failure during an "aptitude update", which is a command to download new
> > package lists for the package management. In strace, the lines around
> > the failure look like this:
> >
> > 99% [Working]) = 14 14
> > [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
> > [pid 5986] gettimeofday({1197576353, 670510}, NULL) = 0
> > [pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
> > [pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> > 99% [Working]) = 14 14
> > [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000}) = 0 (Timeout)
> > [pid 5986] gettimeofday({1197576354, 173902}, NULL) = 0
> > [pid 5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
> > [pid 5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> > 99% [Working]) = 14 14
> > [pid 5986] select(6, [3 4 5], [], NULL, {0, 500000} <unfinished ...>
> > [pid 5988] <... select resumed> ) = 1 (in [3], left {105, 0})
> > [pid 5988] read(3, "", 56559) = 0
> > [pid 5988] fcntl64(-1, F_GETFL) = -1 EBADF (Bad file
> > descriptor)
> > [pid 5988] fcntl64(-1, F_SETFL,
> > O_ACCMODE|O_CREAT|O_EXCL|O_NOCTTY|O_TRUNC|O_APPEND|O_SYNC|O_ASYNC|O_DIRECT|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|0xfff8003c)
> > = -1 EBADF (Bad file descriptor)
> > [pid 5988] write(2, ""..., 41FATAL -> Could not set non-blocking flag
> > ) = 41
> > [pid 5988] write(2, ""..., 19Bad file descriptor) = 19
> > [pid 5988] write(2, ""..., 1
> > ) = 1
> > [pid 5988] exit_group(100) = ?
> > Process 5988 detached
> >
> > This happened with a kernel after 2.6.24-rc5
> > (4af75653031c6d454b4ace47c1536f0d2e727e3e). I rebooted into 2.6.23.8
> > and it worked. Now I rebooted into 2.6.24-rc5 again and was able to
> > reproduce the failure, so it looks like a kernel issue to me.
>
> Yes, it does.
>
> I'm not sure who to forward it to, though. Andrew, can you help, please?
hm, I missed this email, sorry.
That application is passing an fd of -1 into the kernel's fcntl64(). Could
be an application bug, could be a kernel change which triggered an
application bug, could be a kernel bug.
Can we see more of the strace output please? Let's see if we can find
where the application got that -1 from.
On Thursday, 13 of December 2007, Tino Keitel wrote:
> Hi folks,
>
> I often build Debian packages inside a chroot. Today I discovered a
> failure during an "aptitude update", which is a command to download new
> package lists for the package management. In strace, the lines around
> the failure look like this:
Can you please have a look at http://bugzilla.kernel.org/show_bug.cgi?id=9557
and provide the information requested in there?
Thanks,
Rafael
On Thu, Dec 13, 2007 at 09:16:01PM +0100, Tino Keitel wrote:
> Hi folks,
>
> I often build Debian packages inside a chroot. Today I discovered a
> failure during an "aptitude update", which is a command to download new
> package lists for the package management. In strace, the lines around
> the failure look like this:
Are you using XFS? This looks a lot like the bug I introduced where
i_rdev gets a wrong value assigned after mknod. In that case please
try -rc7 as it has a fix for that particular problem.
On Sat, Jan 12, 2008 at 21:10:15 +0000, Christoph Hellwig wrote:
> On Thu, Dec 13, 2007 at 09:16:01PM +0100, Tino Keitel wrote:
> > Hi folks,
> >
> > I often build Debian packages inside a chroot. Today I discovered a
> > failure during an "aptitude update", which is a command to download new
> > package lists for the package management. In strace, the lines around
> > the failure look like this:
>
> Are you using XFS? This looks a lot like the bug I introduced where
Yes, I use XFS.
> i_rdev gets a wrong value assigned after mknod. In that case please
> try -rc7 as it has a fix for that particular problem.
OK, I'll try it.
Regards,
Tino
On Sun, Jan 13, 2008 at 09:56:55 +0100, Tino Keitel wrote:
> On Sat, Jan 12, 2008 at 21:10:15 +0000, Christoph Hellwig wrote:
> > On Thu, Dec 13, 2007 at 09:16:01PM +0100, Tino Keitel wrote:
> > > Hi folks,
> > >
> > > I often build Debian packages inside a chroot. Today I discovered a
> > > failure during an "aptitude update", which is a command to download new
> > > package lists for the package management. In strace, the lines around
> > > the failure look like this:
> >
> > Are you using XFS? This looks a lot like the bug I introduced where
>
> Yes, I use XFS.
>
> > i_rdev gets a wrong value assigned after mknod. In that case please
> > try -rc7 as it has a fix for that particular problem.
>
> OK, I'll try it.
I can not reproduce it anymore with 2.6.24-rc7.
Regards,
Tino