2012-10-15 21:24:29

by Pavel Roskin

[permalink] [raw]
Subject: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

Hello!

I tried the current mainline Linux on Fedora 16 x64_64 and found that I
cannot login in gdm. I'm using gdm with LXDE. After a few minutes,
kernel messages appear indicating a hung process gnome-settings-daemon.

Bisection found that the commit to blame is

commit 8280d16172243702ed43432f826ca6130edb4086
Author: Al Viro <[email protected]>
Date: Tue Aug 21 12:11:46 2012 -0400

new helper: replace_fd()

analog of dup2(), except that it takes struct file * as source.

Signed-off-by: Al Viro <[email protected]>

Reverting the part of the patch that changes fs/exec.c is sufficient to
fix the login behavior.

I was able to forward port the change to the current mainline Linux.
The call to replace_fd() was moved to fs/coredump.c, so the patch
changes it. Also, it copies __clear_close_on_exec() and
__set_open_fd() from fs/file.c

The .config file, the kernel log and the forward ported patch are
attached.

I'll try to figure out why replace_fd() is not good in
umh_pipe_setup(), but I'd like to report the problem as soon as
possible, as it's very visible to the users and I don't want others to
spend hours bisecting it.

--
Regards,
Pavel Roskin


Attachments:
(No filename) (1.16 kB)
dotconfig (76.38 kB)
kernel.log (189.97 kB)
revert-replace_fd.patch (2.60 kB)
Download all attachments

2012-10-15 21:51:32

by Al Viro

[permalink] [raw]
Subject: Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

On Mon, Oct 15, 2012 at 05:14:37PM -0400, Pavel Roskin wrote:
> Hello!
>
> I tried the current mainline Linux on Fedora 16 x64_64 and found that I
> cannot login in gdm. I'm using gdm with LXDE. After a few minutes,
> kernel messages appear indicating a hung process gnome-settings-daemon.

What is it hanging on?

> I'll try to figure out why replace_fd() is not good in
> umh_pipe_setup(), but I'd like to report the problem as soon as
> possible, as it's very visible to the users and I don't want others to
> spend hours bisecting it.
>
> --
> Regards,
> Pavel Roskin

[snip]

> [ 241.483499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 241.484888] gnome-settings- D 0000000000000000 6136 3155 1 0x00000000
> [ 241.486265] ffff8801298afcc8 0000000000000046 ffff88012993b420 0000000000000246
> [ 241.487652] ffff8801298affd8 ffff8801298affd8 ffff8801298affd8 ffff8801298affd8
> [ 241.489035] ffff88012993a2c0 ffff88012993b420 ffff880128cb0b80 ffff88012993b420
> [ 241.490421] Call Trace:
> [ 241.491765] [<ffffffff815d8064>] schedule+0x24/0x70
> [ 241.493116] [<ffffffff81043945>] exit_mm+0x85/0x120
> [ 241.494467] [<ffffffff8104628f>] do_exit+0x13f/0x580
> [ 241.495784] [<ffffffff81046957>] do_group_exit+0x47/0xb0
> [ 241.497090] [<ffffffff810597ef>] get_signal_to_deliver+0x21f/0x580
> [ 241.498384] [<ffffffff81002014>] do_signal+0x24/0x120
> [ 241.499673] [<ffffffff815da28c>] ? sysret_signal+0x5/0x47
> [ 241.500964] [<ffffffff81002195>] do_notify_resume+0x65/0xa0
> [ 241.502239] [<ffffffff8129342e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 241.503525] [<ffffffff815da51a>] int_signal+0x12/0x17
> [ 241.504813] no locks held by gnome-settings-/3155.
> [ 241.506095] INFO: task gnome-settings-:3156 blocked for more than 120 seconds.
> [ 241.507405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 241.508721] gnome-settings- D 00000000ffffffff 5104 3156 1 0x00000000
> [ 241.510056] ffff88011c80bcc8 0000000000000046 ffff88011c80bc78 ffffffff8109ad9d
> [ 241.511414] ffff88011c80bfd8 ffff88011c80bfd8 ffff88011c80bfd8 ffff88011c80bfd8
> [ 241.512780] ffff8801279fd6e0 ffff88012993a2c0 ffff880129809d28 ffff88012993a2c0
> [ 241.514134] Call Trace:
> [ 241.515480] [<ffffffff8109ad9d>] ? trace_hardirqs_on_caller+0x7d/0x120
> [ 241.516857] [<ffffffff815d8064>] schedule+0x24/0x70
> [ 241.518219] [<ffffffff81043945>] exit_mm+0x85/0x120
> [ 241.519592] [<ffffffff8104628f>] do_exit+0x13f/0x580
> [ 241.520952] [<ffffffff81046957>] do_group_exit+0x47/0xb0
> [ 241.522315] [<ffffffff810597ef>] get_signal_to_deliver+0x21f/0x580
> [ 241.523687] [<ffffffff81002014>] do_signal+0x24/0x120
> [ 241.525066] [<ffffffff8109b8a4>] ? __lock_is_held+0x54/0x70
> [ 241.526436] [<ffffffff815da28c>] ? sysret_signal+0x5/0x47
> [ 241.527841] [<ffffffff81002195>] do_notify_resume+0x65/0xa0
> [ 241.529237] [<ffffffff8129342e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 241.530625] [<ffffffff815da51a>] int_signal+0x12/0x17
> [ 241.532009] no locks held by gnome-settings-/3156.
> [ 241.533386] INFO: task gnome-settings-:3157 blocked for more than 120 seconds.
> [ 241.534778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 241.536196] gnome-settings- D 00000000ffffffff 5432 3157 1 0x00000000
> [ 241.537632] ffff8801279efcc8 0000000000000046 ffff88012993c580 0000000000000246
> [ 241.539069] ffff8801279effd8 ffff8801279effd8 ffff8801279effd8 ffff8801279effd8
> [ 241.540515] ffff88012661e840 ffff88012993c580 ffff880128cb0b80 ffff88012993c580
> [ 241.541961] Call Trace:
> [ 241.543377] [<ffffffff815d8064>] schedule+0x24/0x70
> [ 241.544802] [<ffffffff81043945>] exit_mm+0x85/0x120
> [ 241.546226] [<ffffffff8104628f>] do_exit+0x13f/0x580
> [ 241.547642] [<ffffffff81046957>] do_group_exit+0x47/0xb0
> [ 241.549053] [<ffffffff810597ef>] get_signal_to_deliver+0x21f/0x580
> [ 241.550468] [<ffffffff815d940b>] ? _raw_spin_unlock_irq+0x2b/0x50
> [ 241.551878] [<ffffffff81002014>] do_signal+0x24/0x120
> [ 241.553297] [<ffffffff81122e77>] ? might_fault+0x57/0xb0
> [ 241.554726] [<ffffffff815da28c>] ? sysret_signal+0x5/0x47
> [ 241.556148] [<ffffffff81002195>] do_notify_resume+0x65/0xa0
> [ 241.557588] [<ffffffff8129342e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 241.557590] [<ffffffff815da51a>] int_signal+0x12/0x17
> [ 241.557595] no locks held by gnome-settings-/3157.

> @@ -443,14 +453,23 @@ static void wait_for_dump_helpers(struct file *file)
> static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> {
> struct file *files[2];
> + struct fdtable *fdt;
> struct coredump_params *cp = (struct coredump_params *)info->data;
> + struct files_struct *cf = current->files;
> int err = create_pipe_files(files, 0);
> if (err)
> return err;
>
> cp->file = files[1];
>
> - replace_fd(0, files[0], 0);
> + sys_close(0);
> + fd_install(0, files[0]);
> + spin_lock(&cf->file_lock);
> + fdt = files_fdtable(cf);
> + __set_open_fd(0, fdt);
> + __clear_close_on_exec(0, fdt);
> + spin_unlock(&cf->file_lock);

Very interesting. Let's see what could be going on there. First of
all, create_pipe_files() has return 0. There's only one exit where
it can do so and it dereferences both files[0] and files[1] on the
way there. Since that hadn't oopsed, they are both non-NULL.

IOW, replace_fd(0, files[0], 0) is equivalent to
{
int err;
struct file *f = files[0];
struct files_struct *files = current->files;
if (0 >= rlimit(RLIMIT_NOFILE))
return -EMFILE;

spin_lock(&files->file_lock);
err = expand_files(files, 0);
if (unlikely(err < 0))
goto out_unlock;
return do_dup2(files, f, 0, 0);

out_unlock:
spin_unlock(&files->file_lock);
return err;
}

Aha. We have the first thing to test - is RLIMIT_NOFILE set to 0 for that
process, by any chance? We definitedly need to check replace_fd() return
value, BTW. OK, suppose it wasn't 0. So we'd done expand_files(files, 0).
expand_files() begins with
/* Do we need to expand? */
if (nr < fdt->max_fds)
return 0;
and ->max_fds is never going to be less than NR_OPEN_DEFAULT, which is
definitely positive. So expand_files(files, 0) is going to return 0
without doing anything. And we get to do_dup2():
fdt = files_fdtable(files);
tofree = fdt->fd[0];
if (!tofree && fd_is_open(0, fdt))
goto Ebusy;
that really shouldn't be possible to hit in this case - it can happen
only for shared descriptor table. Another thread would need to have
already reserved fd 0, but hadn't got around to installing a reference
to struct file into it. But we are called from the freshly created
kernel thread, about to exec some userland binary. Those are spawned
without CLONE_FILES in flags, so the table can't be shared. It's fresh
from dup_fd() and this bit
/*
* The fd may be claimed in the fd bitmap but not yet
* instantiated in the files array if a sibling thread
* is partway through open(). So make sure that this
* fd is available to the new process.
*/
__clear_open_fd(open_files - i, new_fdt);
in there would have prevented anything of that kind.

Now we have tofree equal to whatever we used to have in ->fd[0] (realistically
that's going to be NULL). Next we have
get_file(file);
rcu_assign_pointer(fdt->fd[0], file);
__set_open_fd(0, fdt);
if (0 & O_CLOEXEC)
__set_close_on_exec(fd, fdt);
else
__clear_close_on_exec(fd, fdt);
spin_unlock(&files->file_lock);

if (tofree)
filp_close(tofree, files);

Now, note that sys_close(0) would have done something very similar -
file = fdt->fd[0];
if (!file)
goto out_unlock;
rcu_assign_pointer(fdt->fd[0], NULL);
__clear_close_on_exec(0, fdt);
__put_unused_fd(files, 0);
spin_unlock(&files->file_lock);
filp_close(file, files);
'file' here is what we had in 'tofree'. We'd ripped it out of descriptor
table and done filp_close() on it. When we follow with
fd_install(0, files[0]);
...
__set_open_fd(0, fdt);
__clear_close_on_exec(0, fdt);
we have get_file() done on files[0], files[0] put into ->fd[0] (fd_install()),
result of __put_unused(files, 0); undone (by __set_open_fd()) and we come
to exactly the same situation as with replace_fd().

All right, so the next step in debugging is to print the damn
return value of replace_fd(). That should narrow the things down.
Another interesting question, of course, is what makes that gnome turd
coredump on each boot?

2012-10-15 23:11:13

by Pavel Roskin

[permalink] [raw]
Subject: Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

On Mon, 15 Oct 2012 22:51:27 +0100
Al Viro <[email protected]> wrote:

> On Mon, Oct 15, 2012 at 05:14:37PM -0400, Pavel Roskin wrote:
> > Hello!
> >
> > I tried the current mainline Linux on Fedora 16 x64_64 and found
> > that I cannot login in gdm. I'm using gdm with LXDE. After a few
> > minutes, kernel messages appear indicating a hung process
> > gnome-settings-daemon.
>
> What is it hanging on?

Sorry, I have no idea. I select my name, enter password and nothing
happens. I can still move the mouse and select some widgets in gdm. I
can use Ctrl-Alt-F2 to switch to the test console and login.

I attached gdb to the process and that's what I got:

(gdb) where
#0 0x0000003c19ae8803 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x0000003c1ba45448 in g_main_context_poll (n_fds=10, fds=0x2096dc0, priority=<optimized out>, timeout=41758, context=0x1e3e4a0) at gmain.c:3402
#2 g_main_context_iterate (context=0x1e3e4a0, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084
#3 0x0000003c1ba45c85 in g_main_loop_run (loop=0x1f35b70) at gmain.c:3297
#4 0x00000033cf75198d in gtk_main () at gtkmain.c:1362
#5 0x0000000000403d28 in main (argc=1, argv=0x7fffe57b3a78) at main.c:459
(gdb)

I'm not sure it's what you asked.

> Aha. We have the first thing to test - is RLIMIT_NOFILE set to 0 for
> that process, by any chance?

No. I made a copy of replace_fd() called by umh_pipe_setup() only, and
here's the output (debugging patch attached):

[ 15.792895] replace_fd_debug: rlimit(RLIMIT_NOFILE) = 1024
[ 15.792901] replace_fd_debug: file = ffff88011dd5db80
[ 15.792903] replace_fd_debug: expand_files returned 0
[ 15.792905] umh_pipe_setup: replace_fd_debug returned 0

> We definitedly need to check
> replace_fd() return value, BTW. OK, suppose it wasn't 0.

It's 0.

I made simplified functions replacing replace_fd() for umh_pipe_setup()
by substituting parameters and expanding do_dup2(). Here's the
combined function:

void replace_fd2(struct file *file)
{
struct fdtable *fdt;
struct files_struct *files = current->files;

/* old code */
sys_close(0);
fd_install(0, file);

spin_lock(&files->file_lock);
fdt = files_fdtable(files);

/* new code */
get_file(file);
rcu_assign_pointer(fdt->fd[0], file);

__set_open_fd(0, fdt);
__clear_close_on_exec(0, fdt);
spin_unlock(&files->file_lock);
}

If the "new code" (two lines below the comment) is commented out and the
"old code" is not, everything works. If the new code is used and the
old code is commented out, I have a hang. Enabling all code leads
to a hang too.

> All right, so the next step in debugging is to print the damn
> return value of replace_fd(). That should narrow the things down.
> Another interesting question, of course, is what makes that gnome turd
> coredump on each boot?

I guess some GNOME programs are "uncomfortable" when launched by LXDE.
Considering that I'm running Fedora 16 with GNOME 3.2, I don't think
anyone would care about my bug reports. But inability to login due to
kernel changes is bad and should not happen.

--
Regards,
Pavel Roskin


Attachments:
(No filename) (3.12 kB)
replacefd-debug.patch (1.90 kB)
Download all attachments

2012-10-15 23:27:08

by Al Viro

[permalink] [raw]
Subject: Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

I think I understand what's going on there. Add fput(files[0])
after that replace_fd(); we have a leak and while that's not the right
way to fix it, that'd verify if anything else is going on there.

2012-10-15 23:40:20

by Pavel Roskin

[permalink] [raw]
Subject: Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

On Tue, 16 Oct 2012 00:27:05 +0100
Al Viro <[email protected]> wrote:

> I think I understand what's going on there. Add
> fput(files[0]) after that replace_fd(); we have a leak and while
> that's not the right way to fix it, that'd verify if anything else is
> going on there.

Yes, it's working now! No more hang on login!

--
Regards,
Pavel Roskin

2012-10-15 23:48:55

by Al Viro

[permalink] [raw]
Subject: Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

On Mon, Oct 15, 2012 at 07:40:08PM -0400, Pavel Roskin wrote:
> On Tue, 16 Oct 2012 00:27:05 +0100
> Al Viro <[email protected]> wrote:
>
> > I think I understand what's going on there. Add
> > fput(files[0]) after that replace_fd(); we have a leak and while
> > that's not the right way to fix it, that'd verify if anything else is
> > going on there.
>
> Yes, it's working now! No more hang on login!

That should be closer to the final variant.

diff --git a/fs/coredump.c b/fs/coredump.c
index fd37fac..ce47379 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -450,11 +450,12 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)

cp->file = files[1];

- replace_fd(0, files[0], 0);
+ err = replace_fd(0, files[0], 0);
+ fput(files[0]);
/* and disallow core files too */
current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};

- return 0;
+ return err;
}

void do_coredump(siginfo_t *siginfo, struct pt_regs *regs)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 24ab414..61a5336 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2132,18 +2132,14 @@ static inline void flush_unauthorized_files(const struct cred *cred,
return;

devnull = dentry_open(&selinux_null, O_RDWR, cred);
- if (!IS_ERR(devnull)) {
- /* replace all the matching ones with this */
- do {
- replace_fd(n - 1, get_file(devnull), 0);
- } while ((n = iterate_fd(files, n, match_file, cred)) != 0);
+ if (IS_ERR(devnull))
+ devnull = NULL;
+ /* replace all the matching ones with this */
+ do {
+ replace_fd(n - 1, devnull, 0);
+ } while ((n = iterate_fd(files, n, match_file, cred)) != 0);
+ if (devnull)
fput(devnull);
- } else {
- /* just close all the matching ones */
- do {
- replace_fd(n - 1, NULL, 0);
- } while ((n = iterate_fd(files, n, match_file, cred)) != 0);
- }
}

/*

2012-10-16 15:02:59

by Pavel Roskin

[permalink] [raw]
Subject: Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused by replace_fd()

On Tue, 16 Oct 2012 00:48:53 +0100
Al Viro <[email protected]> wrote:

> On Mon, Oct 15, 2012 at 07:40:08PM -0400, Pavel Roskin wrote:
> > On Tue, 16 Oct 2012 00:27:05 +0100
> > Al Viro <[email protected]> wrote:
> >
> > > I think I understand what's going on there. Add
> > > fput(files[0]) after that replace_fd(); we have a leak and while
> > > that's not the right way to fix it, that'd verify if anything
> > > else is going on there.
> >
> > Yes, it's working now! No more hang on login!
>
> That should be closer to the final variant.

It works me, thank you! I don't use SELinux, so I could not really
test the second part.

--
Regards,
Pavel Roskin