I'm working with a (approximately) 2.4.17 kernel from the mips-linux
tree (oss.sgi.com).
I'd like to propose removing the "__init" designation from init_idle in
kernel/sched.c, since this is called from rest_init via cpu_idle.
Notice that rest_init isn't in an init section, and explicitly mentions
that it's avoiding a race with free_initmem. In my kernel (an SMP
kernel running on a system with only 1 available CPU), cpu_idle isn't
getting called until after free_initmem().
My CPU is MIPS, but it looks like x86 could experience the same problem.
Kip
Index: kernel/sched.c
===================================================================
RCS file:
/projects/bbp/cvsroot/systemsw/linux/src/kernel/kernel/sched.c,v
retrieving revision 1.10
diff -u -r1.10 sched.c
--- kernel/sched.c 2002/01/15 04:13:43 1.10
+++ kernel/sched.c 2002/03/05 19:40:14
@@ -1304,7 +1304,7 @@
extern unsigned long wait_init_idle;
-void __init init_idle(void)
+void init_idle(void)
{
struct schedule_data * sched_data;
sched_data = &aligned_data[smp_processor_id()].schedule_data;
> I'm working with a (approximately) 2.4.17 kernel from the mips-linux
> tree (oss.sgi.com).
>
> I'd like to propose removing the "__init" designation from init_idle in
> kernel/sched.c, since this is called from rest_init via cpu_idle.
> Notice that rest_init isn't in an init section, and explicitly mentions
> that it's avoiding a race with free_initmem. In my kernel (an SMP
> kernel running on a system with only 1 available CPU), cpu_idle isn't
> getting called until after free_initmem().
>
> My CPU is MIPS, but it looks like x86 could experience the same problem.
I fixed something in this area for x86, looks like the same code path
for MIPS unless I'm misreading.
smp_init spins waiting on wait_init_idle until every cpu has done
init_idle. rest_init() isn't called until smp_init returns, so I'm not sure
how you could hit this (possibly there's a minute window after init_idle
clears the bit, but before it returns?).
M.
I think you're right. To prevent the mistake from
happening again, I'd suggest you add a comment to your
patch for init_idle(), explaining that it can't be __init
because it is called by cpu_idle, which is called
by rest_init(), which ...
I wonder if an automated __init consistency checker
would be helpful.
--
Thomas Hood
"Martin J. Bligh" wrote:
>
> > I'm working with a (approximately) 2.4.17 kernel from the mips-linux
> > tree (oss.sgi.com).
> >
> > I'd like to propose removing the "__init" designation from init_idle in
> > kernel/sched.c, since this is called from rest_init via cpu_idle.
> > Notice that rest_init isn't in an init section, and explicitly mentions
> > that it's avoiding a race with free_initmem. In my kernel (an SMP
> > kernel running on a system with only 1 available CPU), cpu_idle isn't
> > getting called until after free_initmem().
> >
> > My CPU is MIPS, but it looks like x86 could experience the same problem.
>
> I fixed something in this area for x86, looks like the same code path
> for MIPS unless I'm misreading.
>
> smp_init spins waiting on wait_init_idle until every cpu has done
> init_idle. rest_init() isn't called until smp_init returns, so I'm not sure
> how you could hit this (possibly there's a minute window after init_idle
> clears the bit, but before it returns?).
This synchronization doesn't help: cpu0 (even in the multi-cpu case)
calls init_idle twice -- once from smp_init (through smp_boot_cpus), and
then again from cpu_idle. In my failing case (CONFIG_SMP=y, but only 1
cpu in the system) the second call, the one from cpu_idle, doesn't
happen until long after the init kernel thread has been running and has
freed the initmem.
Maybe a better fix is to avoid this double calling of init_idle for the
"master" CPU? From my reading the code, x86 seems to behave the same.
Kip
On Tue, 2002-03-05 at 17:15, Kip Walker wrote:
>
> Maybe a better fix is to avoid this double calling of init_idle for the
> "master" CPU? From my reading the code, x86 seems to behave the same.
>
Looks to me like the clean fix would be to call init_idle() from
rest_init() before the init() thread is spawned, and remove it from
cpu_idle(). It looks like a pretty straightforward race condition that
no one else has happened to trigger in a bad way. I'm no scheduler pro,
but I don't see any problems with calling init_idle() earlier.
That fix assumes that bringup of non-primary cpus on other architectures
call init_idle() explicitly before allowing smp_init() to return; this
is true of mips, but I can't vouch for any other arch's.
I'd submit a patch, but I'm sadly lacking in SMP machines for testing.
Anyone who wants to rectify that, I'm open to charity. :)
-Justin
On 05 Mar 2002 17:06:43 -0500,
Thomas Hood <[email protected]> wrote:
>I think you're right. To prevent the mistake from
>happening again, I'd suggest you add a comment to your
>patch for init_idle(), explaining that it can't be __init
>because it is called by cpu_idle, which is called
>by rest_init(), which ...
>
>I wonder if an automated __init consistency checker
>would be helpful.
Looks like it will, I found a lot of dubious code by running
reference_init.pl (below).
Unfortunately I had to exclude references from read only data to .init
sections, almost all of these are false positives, they are created by
gcc. The downside of excluding rodata is that there really are some
user references from rodata to init code, e.g. drivers/video/vgacon.c
const struct consw vga_con = {
con_startup: vgacon_startup,
where vgacon_startup is __init. If you want to wade through the false
positives, take out the check for rodata.
#!/usr/bin/perl -w
#
# reference_init.pl (C) Keith Owens 2002 <[email protected]>
#
# List references to vmlinux init sections from non-init sections.
use strict;
die($0 . " takes no arguments\n") if($#ARGV >= 0);
my %object;
my $object;
my $line;
my $ignore;
$| = 1;
printf("Finding objects, ");
open(OBJDUMP_LIST, "find . -name '*.o' | xargs objdump -h |") || die "getting objdump list failed";
while (defined($line = <OBJDUMP_LIST>)) {
chomp($line);
if ($line =~ /:\s+file format/) {
($object = $line) =~ s/:.*//;
$object{$object}->{'module'} = 0;
$object{$object}->{'size'} = 0;
$object{$object}->{'off'} = 0;
}
if ($line =~ /^\s*\d+\s+\.modinfo\s+/) {
$object{$object}->{'module'} = 1;
}
if ($line =~ /^\s*\d+\s+\.comment\s+/) {
($object{$object}->{'size'}, $object{$object}->{'off'}) = (split(' ', $line))[2,5];
}
}
close(OBJDUMP_LIST);
printf("%d objects, ", scalar keys(%object));
$ignore = 0;
foreach $object (keys(%object)) {
if ($object{$object}->{'module'}) {
++$ignore;
delete($object{$object});
}
}
printf("ignoring %d module(s)\n", $ignore);
# Ignore conglomerate objects, they have been built from multiple objects and we
# only care about the individual objects. If an object has more than one GCC:
# string in the comment section then it is conglomerate. This does not filter
# out conglomerates that consist of exactly one object, can't be helped.
printf("Finding conglomerates, ");
$ignore = 0;
foreach $object (keys(%object)) {
if (exists($object{$object}->{'off'})) {
my ($off, $size, $comment, $l);
$off = hex($object{$object}->{'off'});
$size = hex($object{$object}->{'size'});
open(OBJECT, "<$object") || die "cannot read $object";
seek(OBJECT, $off, 0) || die "seek to $off in $object failed";
$l = read(OBJECT, $comment, $size);
die "read $size bytes from $object .comment failed" if ($l != $size);
close(OBJECT);
if ($comment =~ /GCC\:.*GCC\:/m) {
++$ignore;
delete($object{$object});
}
}
}
printf("ignoring %d conglomerate(s)\n", $ignore);
printf("Scanning objects\n");
foreach $object (sort(keys(%object))) {
my $from;
open(OBJDUMP, "objdump -r $object|") || die "cannot objdump -r $object";
while (defined($line = <OBJDUMP>)) {
chomp($line);
if ($line =~ /RELOCATION RECORDS FOR /) {
($from = $line) =~ s/.*\[([^]]*).*/$1/;
}
if ($line =~ /\.init$/ &&
($from !~ /\.init$/ && $from !~ /\.stab$/ && $from !~ /\.rodata$/ && $from !~ /\.text\.lock$/)) {
printf("Error: %s %s refers to %s\n", $object, $from, $line);
}
}
close(OBJDUMP);
}
printf("Done\n");