LinuxLists.cc - Re: system gets stuck in a lock during boot

Subject: Re: system gets stuck in a lock during boot

On Mon, Sep 07, 2009 at 02:49:44PM -0700, Justin Mattock wrote:
> >>
> >> * Justin P. Mattock <[email protected]> wrote:
> >>
> >>
> >>>
> >>> Ingo Molnar wrote:
> >>>
> >>>>
> >>>> * Justin Mattock <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> O.K. I feel better, deleted
> >>>>> my system, and threw in a minimal built system
> >>>>> with only the bare essentials to boot.
> >>>>> (just to make sure things are correct).
> >>>>>
> >>>>> unfortunately after building rc6 I'm still hitting
> >>>>> this. really am not sure why this is happening.
> >>>>>
> >>>>>
> >>>>
> >>>> Could you please double-check the bisection result by doing this:
> >>>>
> >>>>   git revert af6af30c0f
> >>>>
> >>>> on the latest kernel and seeing whether that fixes the lockup?
> >>>>
> >>>> Bisections are very efficient and hence very sensitive as well to
> >>>> minimal errors. Just one small mistake near the end of a bisection
> >>>> can blame the wrong commit.
> >>>>
> >>>> So the best way to double-check such 100%-triggerable crashes is to
> >>>> do the revert. I tried the revert and it can be done fine here.
> >>>>
> >>>> [ _If_ that does not fix the bug then to save time you can
> >>>>   'backtrack' the bisection, instead of re-doing it completely.
> >>>>   I.e. you have your bisection log, re-check the final steps going
> >>>>   backwards. Once you find a discrepancy (i.e. a 'bad' point that
> >>>>   is 'good' or the other way around), redo the bisection log
> >>>>   commands up to that point and continue it up to the end. ]
> >>>>
> >>>>         Ingo
> >>>>
> >>>>
> >>>>
> >>>
> >>> shoot, I did not see your post here. when looking at my bisect
> >>> log, I guess after a git bisect reset it clears?
> >>>
> >>> Anyways after git bisect had finished I looked manually at the
> >>> commits that it had generated the one which I had sent in a post
> >>> previously, and this one:
> >>>
> >>>   9424edc2da097c8589fcc24a72552d33e54be161
> >>>
> >>
> >> (this commit has no effect on your kernel image, at all.)
> >>
> >>
> >
> > yep. but it was worth a try.
> >>>
> >>> At the time, looking at the commit, I took this to be more likely the
> >>> cause because it is related to ELF and so forth, but reverting
> >>> it on rc6 made no difference. (The previous commit
> >>> fixes this for me, on a regular tarball as well as in git.)
> >>>
> >>> I think at this point since this system is a fresh from scratch
> >>> build, I think something might be wrong that I'm doing (all the
> >>> CFLAGS, and such are in a previous post).
> >>>
> >>> At the moment I don't have a problem applying a patch to the
> >>> kernel for this. especially since I'm the only one that seems to
> >>> be hitting this, then if more and more reports of this happen then
> >>> we can go from there.
> >>>
> >>
> >> What would be nice is to verify your bisection end result, i.e. do
> >> what i suggested:
> >>
> >>
> >
> > > yeah I've done this on both kernels (three, to be exact), and all boot after
> > > reverting "Fix perf-tracepoint OOPS."
> >
> > As for my system, I'm still convinced that I might be doing something wrong
> > over here.
> >
> >>>> Could you please double-check the bisection result by doing this:
> >>>>
> >>>>   git revert af6af30c0f
> >>>>
> >>>> on the latest kernel and seeing whether that fixes the lockup?
> >>>>
> >>
> >> if this doesnt fix it on latest -git then this commit is not the
> >> cause of the lockup.
> >>
> >>         Ingo
> >>
> >>
> >
> > > Reverting this commit (Fix perf-tracepoint OOPS.) does fix my stuckage, but
> > > I'm left, as well as others, asking the question of why.
> > > In any case I still think I'm setting something wrong with either gcc, or
> > > something that might be causing this from userland.
> >
> > Justin P. Mattock
> >
>
> O.k. here's something awkward about this issue I was
> experiencing. At the moment I have two iMacs;
> here are the descriptions:
>
> imac A) the one with the problem
>
> OS: built from the clfs book
> x86_64 multilib with only lib64
>
> built everything with these flags:
> CFLAGS="-m64 -mtune=core2 -march=core2
> -mfpmath=both -O2 -pipe -fomit-frame-pointer
> -fstack-protection"
> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
> while compiling everything with
> gcc version: 4.5.0 20090730
>
>
> imac B) the one that works
>
> OS: clfs(just built a few days ago)
> x86_64 pure64 bit build
> (lib with a symlink to lib64)
> CFLAGS="-m64 -mtune=core2 -march=core2
> -O2 -pipe -fomit-frame-pointer"
> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
> gcc version: 4.4.1 (GCC for Cross-LFS 4.4.1.20090722)
>
> The only things I can think of are that either I hit something
> because of gcc, something goes wrong with the libraries,
> or there is something happening with either the option
> -mfpmath=both or stack protection.
>
> At this point, since the kernel seems to be running fine,
> my plan is to just trash the system that has this issue and leave
> it at: I was hitting some weird anomaly.
>

hi Justin,

I've been playing around with gcc '4.5' as well and hit a panic that
looks very similar to what you've seen with stock 2.6.31 - I haven't
seen it anywhere else. Anyways, it seems to be some sort of alignment
issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
compiler or kernel issue. But the following kernel patch fixes the issue
for me. It would be interesting to verify if the patch also resolves the
issue for you.

thanks,

-Jason

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 6ad76bf..0029af4 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -164,6 +164,7 @@
 	LIKELY_PROFILE() \
 	BRANCH_PROFILE() \
 	TRACE_PRINTKS() \
+	. = ALIGN(32); \
 	FTRACE_EVENTS() \
 	TRACE_SYSCALLS()

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index a81170d..43f9f1e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -124,7 +124,7 @@ struct ftrace_event_call {
 	atomic_t profile_count;
 	int (*profile_enable)(struct ftrace_event_call *);
 	void (*profile_disable)(struct ftrace_event_call *);
-};
+} __attribute__((aligned(32)));

 #define MAX_FILTER_PRED 32
 #define MAX_FILTER_STR_VAL 128
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f64fbaa..4697fb6 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -600,7 +600,7 @@ static int ftrace_raw_init_event_##call(void) \
 } \
 \
 static struct ftrace_event_call __used \
-__attribute__((__aligned__(4))) \
+__attribute__((__aligned__(32))) \
 __attribute__((section("_ftrace_events"))) event_##call = { \
 	.name = #call, \
 	.system = __stringify(TRACE_SYSTEM), \
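
A minimal sketch of one way an alignment mismatch like this could bite. It assumes (the thread does not establish this) that the kernel walks the _ftrace_events section as a packed array, stepping by sizeof(struct ftrace_event_call), while the toolchain places each object on a larger boundary. The struct layout and the helper names below are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_event_call; fields are illustrative.
 * On a typical x86_64 ABI this is 40 bytes with 8-byte natural alignment. */
struct event_plain {
	const char *name;
	int id;
	void *fn[3];
};

/* Same layout, but with the alignment forced on the type itself.
 * The compiler then pads sizeof up to a multiple of 32. */
struct event_aligned {
	const char *name;
	int id;
	void *fn[3];
} __attribute__((aligned(32)));

/* Distance between consecutive objects when each one is placed at the
 * next multiple of 'align' (what a linker does with separately emitted
 * variables in one section). */
static size_t placed_stride(size_t size, size_t align)
{
	return (size + align - 1) / align * align;
}

/* A pointer walk (p++) advances by sizeof. It only lands on the next
 * object if the placement stride equals sizeof. */
static int walk_is_safe(size_t size, size_t align)
{
	return placed_stride(size, align) == size;
}
```

With a 40-byte object placed on 32-byte boundaries the objects sit 64 bytes apart, so a walk that steps by 40 bytes would land in padding. Putting the alignment on the type, as the second hunk of the patch does, makes sizeof itself a multiple of 32, so the two strides agree again.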

2009-10-04 17:42:28

by Ingo Molnar

Subject: Re: system gets stuck in a lock during boot

* Jason Baron <[email protected]> wrote:

> hi Justin,
>
> I've been playing around with gcc '4.5' as well and hit a panic that
> looks very similar to what you've seen with stock 2.6.31 - I haven't
> seen it anywhere else. Anyways, it seems to be some sort of alignment
> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
> compiler or kernel issue. But the following kernel patch fixes the issue
> for me. It would be interesting to verify if the patch also resolves the
> issue for you.

Would be nice to know precisely what kind of problem is being hit here -
we'd like to fix either the kernel or GCC - depending on where the bug
lies.

Ingo

2009-10-05 00:11:34

Subject: Re: system gets stuck in a lock during boot

On Sun, Oct 4, 2009 at 10:41 AM, Ingo Molnar <[email protected]> wrote:
>
> * Jason Baron <[email protected]> wrote:
>
>> hi Justin,
>>
>> I've been playing around with gcc '4.5' as well and hit a panic that
>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>> compiler or kernel issue. But the following kernel patch fixes the issue
>> for me. It would be interesting to verify if the patch also resolves the
>> issue for you.
>
> Would be nice to know precisely what kind of problem is being hit here -
> we'd like to fix either the kernel or GCC - depending on where the bug
> lies.
>
>         Ingo
>

So I wasn't going crazy...
Anyway, I still have that system (clfs);
I can go ahead and
put it back on the machine and see if I hit this
again (keep in mind, I just got back from a 7hr drive,
so it might be tomorrow).

--
Justin P. Mattock

2009-10-06 01:01:05

Subject: Re: system gets stuck in a lock during boot

Justin Mattock wrote:
> On Sun, Oct 4, 2009 at 10:41 AM, Ingo Molnar<[email protected]> wrote:
>
>> * Jason Baron<[email protected]> wrote:
>>
>>
>>> hi Justin,
>>>
>>> I've been playing around with gcc '4.5' as well and hit a panic that
>>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>>> compiler or kernel issue. But the following kernel patch fixes the issue
>>> for me. It would be interesting to verify if the patch also resolves the
>>> issue for you.
>>>
>> Would be nice to know precisely what kind of problem is being hit here -
>> we'd like to fix either the kernel or GCC - depending on where the bug
>> lies.
>>
>> Ingo
>>
>>
>
> So I wasn't going crazy....
> Anyways that system(clfs)
> I still have, I can go ahead and
> put it back on the machine and see if I hit this
> again(keep in mind, just got back from a 7hr drive,
> so it might be tomorrow).
>
>
o.k. I put that system back on, and
hit the error. I added your patch to 2.6.31-rc6
and to the latest git (a few days old).
I am still hitting this, but with your patch
I'm able to see the beginning of this panic
(I'll write it out manually):

[ 2.523966] kernel panic - not syncing: No init found. try passing init= option to the kernel
[ 2.524394] Pid: 1, comm: swapper Not tainted 2.6.31-rc6 #6
[ 2.524633] Call Trace:
[ 2.524875] [<ffffffff813a5b72>] panic+0x75/0x120
[ 2.525119] [<ffffffff8100910f>] init_post+0xef/0xf5
[ 2.525357] [<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
[ 2.525600] [<ffffffff8102410a>] child_rip+0xa/0x20
[ 2.525842] [<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
[ 2.526084] [<ffffffff81024100>] ? child_rip+0x0/0x20

It seems I only hit this when using gcc 4.5.0 and compiling
sysvinit with SELinux support to load the policy at boot
(here's the patch I used:
http://readlist.com/lists/tycho.nsa.gov/selinux/3/15451.html).

Sounds like gcc is doing something (correct me if I'm
wrong), because the other systems I have are using the same
packages except for an older version of gcc.
Maybe I should update sysvinit with a better patch to load the policy.

Justin P. Mattock
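
The hand-copied trace entries follow the pattern [<address>] symbol+0xoffset/0xsize, so address minus offset gives the start of the function. Below is a small sketch in C for sanity-checking a copied entry this way. trace_entry_base is a hypothetical helper name, and the parsing assumes exactly this entry format.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse one call-trace entry of the form
 *   [<addr>] sym+0xoff/0xsize      or
 *   [<addr>] ? sym+0xoff/0xsize
 * and return the function start (addr - off).
 * Returns 0 if the entry does not match that format. */
static uint64_t trace_entry_base(const char *entry)
{
	unsigned long long addr, off, size;

	/* %*[^+] skips the symbol name (and an optional "? " marker)
	 * up to the '+' that introduces the offset. */
	if (sscanf(entry, " [<%llx>] %*[^+]+0x%llx/0x%llx",
		   &addr, &off, &size) != 3)
		return 0;
	if (off > addr)
		return 0;
	return (uint64_t)(addr - off);
}
```

For the entries above, child_rip+0xa at ffffffff8102410a and kernel_init+0x198 at ffffffff815f6cf0 give function starts of ffffffff81024100 and ffffffff815f6b58, which is what the two "?" entries (offset 0x0) should show.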

2009-10-06 01:20:58

Subject: Re: system gets stuck in a lock during boot

On Mon, 2009-10-05 at 18:00 -0700, Justin P. Mattock wrote:
> Justin Mattock wrote:

> o.k. I put back on that system, and
> hit the error. I add your patch to 2.6.31-rc6,
> and the latest git(a few days old).
> I still am hitting this, but with your patch
> I'm able to see the beginning of this panic:
> (Ill write it manually)
>
> [ 2.523966] kernel panic - not syncing: No init found. try passing
> init= option
> to the kernel
> [ 2.524394] Pid: 1, comm: swapper Not tainted 2.6.31-rc6 #6
> [ 2.524633] Call Trace:
> [ 2.524875] [<ffffffff813a5b72>] panic+0x75/0x120
> [ 2.525119] [<ffffffff8100910f>] init_post+0xef/0xf5

Strange. This panic is just telling you it could not find an "init" to
execute.

-- Steve

> [ 2.525357] [<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
> [ 2.525600] [<ffffffff8102410a>] child_rip+0xa/0x20
> [ 2.525842] [<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
> [ 2.526084] [<ffffffff81024100>] ? child_rip+0x0/0x20
>
> Seems I only hit this with using gcc 4.5.0 and compiling
> sysvinit with SELinux support to load the policy at boot.
> (here's the patch I used
> http://readlist.com/lists/tycho.nsa.gov/selinux/3/15451.html).
>
> Sound's like gcc is doing something(correct me if I'm
> wrong) because the other systems I have are using the same
> packages except for and older version of gcc.
> maybe I should update sysvinit with a better patch to load the policy.
>
> Justin P. Mattock
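
For context on the "No init found" panic: in kernels of this era, init_post() tries the init= argument (if one was given) and then a short list of fallback paths (/sbin/init, /etc/init, /bin/init, /bin/sh), and panics only when every exec attempt fails. A rough userspace sketch of that lookup order follows. It is an approximation under stated assumptions: pick_init is a hypothetical helper, and it checks access(..., X_OK) instead of actually exec'ing, so it cannot reproduce exec failures with other causes (for example a missing ELF interpreter).

```c
#include <stddef.h>
#include <unistd.h>

/* Return the first candidate that exists and is executable, or NULL
 * when none is (the case in which the kernel would panic).
 * 'requested' models the init= boot argument and may be NULL. */
static const char *pick_init(const char *requested,
			     const char *const *fallbacks, size_t n)
{
	size_t i;

	if (requested && access(requested, X_OK) == 0)
		return requested;

	/* Fall back to the built-in defaults, in order. */
	for (i = 0; i < n; i++)
		if (access(fallbacks[i], X_OK) == 0)
			return fallbacks[i];

	return NULL;
}
```

Called with the four default paths on the affected root filesystem, a NULL result would point at a missing or non-executable init binary. A non-NULL result would suggest the exec itself is failing for some other reason.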

2009-10-06 01:26:10

Subject: Re: system gets stuck in a lock during boot

On Fri, 2009-10-02 at 17:12 -0400, Jason Baron wrote:

> hi Justin,
>
> I've been playing around with gcc '4.5' as well and hit a panic that
> looks very similar to what you've seen with stock 2.6.31 - I haven't
> seen it anywhere else. Anyways, it seems to be some sort of alignment
> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
> compiler or kernel issue. But the following kernel patch fixes the issue
> for me. It would be interesting to verify if the patch also resolves the
> issue for you.
>
> thanks,
>
> -Jason
>
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 6ad76bf..0029af4 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -164,6 +164,7 @@
> LIKELY_PROFILE() \
> BRANCH_PROFILE() \
> TRACE_PRINTKS() \
> + . = ALIGN(32); \
> FTRACE_EVENTS() \
> TRACE_SYSCALLS()
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index a81170d..43f9f1e 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -124,7 +124,7 @@ struct ftrace_event_call {
> atomic_t profile_count;
> int (*profile_enable)(struct ftrace_event_call *);
> void (*profile_disable)(struct ftrace_event_call *);
> -};
> +} __attribute__((aligned(32)));
>
> #define MAX_FILTER_PRED 32
> #define MAX_FILTER_STR_VAL 128
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index f64fbaa..4697fb6 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -600,7 +600,7 @@ static int ftrace_raw_init_event_##call(void) \
> } \
> \
> static struct ftrace_event_call __used \
> -__attribute__((__aligned__(4))) \
> +__attribute__((__aligned__(32))) \
> __attribute__((section("_ftrace_events"))) event_##call = { \
> .name = #call, \
> .system = __stringify(TRACE_SYSTEM), \

Are all of the alignments needed? Or might adding just one help? Or
removing the one directly above?

-- Steve

2009-10-06 02:01:31

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Mon, 2009-10-05 at 18:00 -0700, Justin P. Mattock wrote:
>
>> Justin Mattock wrote:
>>
>
>
>> o.k. I put back on that system, and
>> hit the error. I add your patch to 2.6.31-rc6,
>> and the latest git(a few days old).
>> I still am hitting this, but with your patch
>> I'm able to see the beginning of this panic:
>> (Ill write it manually)
>>
>> [ 2.523966] kernel panic - not syncing: No init found. try passing
>> init= option
>> to the kernel
>> [ 2.524394] Pid: 1, comm: swapper Not tainted 2.6.31-rc6 #6
>> [ 2.524633] Call Trace:
>> [ 2.524875] [<ffffffff813a5b72>] panic+0x75/0x120
>> [ 2.525119] [<ffffffff8100910f>] init_post+0xef/0xf5
>>
>
> Strange. This panic is just telling you it could not find an "init" to
> execute.
>
>
It is strange. My only guess is that it's something gcc
is doing (as mentioned, the other systems I have run fine
while using gcc 4.4).
> -- Steve
>
>
>> [ 2.525357] [<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
>> [ 2.525600] [<ffffffff8102410a>] child_rip+0xa/0x20
>> [ 2.525842] [<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
>> [ 2.526084] [<ffffffff81024100>] ? child_rip+0x0/0x20
>>
>> Seems I only hit this with using gcc 4.5.0 and compiling
>> sysvinit with SELinux support to load the policy at boot.
>> (here's the patch I used
>> http://readlist.com/lists/tycho.nsa.gov/selinux/3/15451.html).
>>
>> Sound's like gcc is doing something(correct me if I'm
>> wrong) because the other systems I have are using the same
>> packages except for and older version of gcc.
>> maybe I should update sysvinit with a better patch to load the policy.
>>
>> Justin P. Mattock
>>
>
>
>
Justin P. Mattock

2009-10-06 14:33:16

Subject: Re: system gets stuck in a lock during boot

On Mon, Oct 05, 2009 at 06:00:41PM -0700, Justin P. Mattock wrote:
> Justin Mattock wrote:
>> On Sun, Oct 4, 2009 at 10:41 AM, Ingo Molnar<[email protected]> wrote:
>>
>>> * Jason Baron<[email protected]> wrote:
>>>
>>>
>>>> On Mon, Sep 07, 2009 at 02:49:44PM -0700, Justin Mattock wrote:
>>>>
>>>>>>> * Justin P. Mattock<[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Ingo Molnar wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> * Justin Mattock<[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> O.K. I feel better, deleted
>>>>>>>>>> my system, and threw in a minimal built system
>>>>>>>>>> with only the bare essentials to boot.
>>>>>>>>>> (just to make sure things are correct).
>>>>>>>>>>
>>>>>>>>>> unfortunately after building rc6 I'm still hitting
>>>>>>>>>> this. really am not sure why this is happening.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Could you please double-check the bisection result by doing this:
>>>>>>>>>
>>>>>>>>> git revert af6af30c0f
>>>>>>>>>
>>>>>>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>>>>>>
>>>>>>>>> Bisections are very efficient and hence very sensitive as well to
>>>>>>>>> minimal errors. Just one small mistake near the end of a bisection
>>>>>>>>> can blame the wrong commit.
>>>>>>>>>
>>>>>>>>> So the best way to double-check such 100%-triggerable crashes is to
>>>>>>>>> do the revert. I tried the revert and it can be done fine here.
>>>>>>>>>
>>>>>>>>> [ _If_ that does not fix the bug then to save time you can
>>>>>>>>> 'backtrack' the bisection, instead of re-doing it completely.
>>>>>>>>> I.e. you have your bisection log, re-check the final steps going
>>>>>>>>> backwards. Once you find a discrepancy (i.e. a 'bad' point that
>>>>>>>>> is 'good' or the other way around), redo the bisection log
>>>>>>>>> commands up to that point and continue it up to the end. ]
>>>>>>>>>
>>>>>>>>> Ingo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> shoot, I did not see your post here. when looking at my bisect
>>>>>>>> log, I guess after a git bisect reset it clears?
>>>>>>>>
>>>>>>>> Anyways after git bisect had finished I looked manually at the
>>>>>>>> commits that it had generated the one which I had sent in a post
>>>>>>>> previously, and this one:
>>>>>>>>
>>>>>>>> 9424edc2da097c8589fcc24a72552d33e54be161
>>>>>>>>
>>>>>>>>
>>>>>>> (this commit has no effect on your kernel image, at all.)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> yep. but it was worth a try.
>>>>>>
>>>>>>>> at the time looking at the commit, I see this to be more of the
>>>>>>>> cause because of it being related to elf as so forth, but as soon
>>>>>>>> as I reverted this on rc6 made no difference.(the previous commit
>>>>>>>> fixes this for me, on a regular tar.ball as well as in git.
>>>>>>>>
>>>>>>>> I think at this point since this system is a fresh from scratch
>>>>>>>> build, I think something might be wrong that I'm doing (all the
>>>>>>>> CFLAGS, and such are in a previous post).
>>>>>>>>
>>>>>>>> At the moment I don't have a problem applying a patch to the
>>>>>>>> kernel for this. especially since I'm the only one that seems to
>>>>>>>> be hitting this, then if more and more reports of this happen then
>>>>>>>> we can go from there.
>>>>>>>>
>>>>>>>>
>>>>>>> What would be nice is to verify your bisection end result, i.e. do
>>>>>>> what i suggested:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> yeah I've done this on both kernels three to be exact, and all boot after
>>>>>> reverting
>>>>>> Fix perf-tracepoint OOPS.
>>>>>>
>>>>>> As for my system, I'm still convinced that I might be doing something wrong
>>>>>> over here.
>>>>>>
>>>>>>
>>>>>>>>> Could you please double-check the bisection result by doing this:
>>>>>>>>>
>>>>>>>>> git revert af6af30c0f
>>>>>>>>>
>>>>>>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>>>>>>
>>>>>>>>>
>>>>>>> if this doesnt fix it on latest -git then this commit is not the
>>>>>>> cause of the lockup.
>>>>>>>
>>>>>>> Ingo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> This commit(Fix perf-tracepoint OOPS.)does fix my stuckage, but I'm left, as
>>>>>> well as others asking
>>>>>> the question of why.
>>>>>> In any case I still think I'm setting something wrong with either gcc, or
>>>>>> something
>>>>>> that might be causing this from userland.
>>>>>>
>>>>>> Justin P. Mattock
>>>>>>
>>>>>>
>>>>> O.k. here something awkward about this issue I was
>>>>> experiencing. at the moment I have two imac's
>>>>> here the descriptions:
>>>>>
>>>>> imac A) the one with the problem
>>>>>
>>>>> OS: built from the clfs book
>>>>> x86_64 multilib with only lib64
>>>>>
>>>>> built everything with these flags:
>>>>> CFLAGS="-m64 -mtune=core2 -march=core2
>>>>> -mfpmath=both -O2 -pipe -fomit-frame-pointer
>>>>> -fstack-protection"
>>>>> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>>>>> while compiling everything with
>>>>> gcc version: 4.5.0 20090730
>>>>>
>>>>>
>>>>> imac B) the one that works
>>>>>
>>>>> OS: clfs(just built a few days ago)
>>>>> x86_64 pure64 bit build
>>>>> (lib with a symlink to lib64)
>>>>> CFLAGS="-m64 -mtune=core2 -march=core2
>>>>> -O2 -pipe -fomit-frame-pointer"
>>>>> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>>>>> gcc version: 4.4.1 (GCC for Cross-LFS 4.4.1.20090722)
>>>>>
>>>>> The only things I can think of is either I hit something
>>>>> because of gcc, something goes wrong with the libraries,
>>>>> or there something happening with either the option
>>>>> of mfpmath=both or stackprotection.
>>>>>
>>>>> At this point since the kernel seems to be running fine,
>>>>> is to just trash the system that has this issue and just leave
>>>>> it at, I was hitting some weird anomaly.
>>>>>
>>>>>
>>>> hi Justin,
>>>>
>>>> I've been playing around with gcc '4.5' as well and hit a panic that
>>>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>>>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>>>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>>>> compiler or kernel issue. But the following kernel patch fixes the issue
>>>> for me. It would be interesting to verify if the patch also resolves the
>>>> issue for you.
>>>>
>>> Would be nice to know precisely what kind of problem is being hit here -
>>> we'd like to fix either the kernel or GCC - depending on where the bug
>>> lies.
>>>
>>> Ingo
>>>
>>>
>>
>> So I wasn't going crazy....
>> Anyways that system(clfs)
>> I still have, I can go ahead and
>> put it back on the machine and see if I hit this
>> again(keep in mind, just got back from a 7hr drive,
>> so it might be tomorrow).
>>
>>
> o.k. I put back on that system, and
> hit the error. I add your patch to 2.6.31-rc6,

OK. Is that error the same as the error below? The error below looks
completely different from the one posted previously. So it almost looks
like the patch fixed one problem, only to reveal another one. Is
that correct?

> and the latest git(a few days old).
> I still am hitting this, but with your patch
> I'm able to see the beginning of this panic:
> (Ill write it manually)
>
> [ 2.523966] kernel panic - not syncing: No init found. try passing
> init= option
> to the kernel
> [ 2.524394] Pid: 1, comm: swapper Not tainted 2.6.31-rc6 #6
> [ 2.524633] Call Trace:
> [ 2.524875] [<ffffffff813a5b72>] panic+0x75/0x120
> [ 2.525119] [<ffffffff8100910f>] init_post+0xef/0xf5
> [ 2.525357] [<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
> [ 2.525600] [<ffffffff8102410a>] child_rip+0xa/0x20
> [ 2.525842] [<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
> [ 2.526084] [<ffffffff81024100>] ? child_rip+0x0/0x20
>
> Seems I only hit this with using gcc 4.5.0 and compiling
> sysvinit with SELinux support to load the policy at boot.
> (here's the patch I used
> http://readlist.com/lists/tycho.nsa.gov/selinux/3/15451.html).
>
> Sound's like gcc is doing something(correct me if I'm
> wrong) because the other systems I have are using the same
> packages except for and older version of gcc.
> maybe I should update sysvinit with a better patch to load the policy.
>
> Justin P. Mattock

2009-10-06 15:13:08

Subject: Re: system gets stuck in a lock during boot

Jason Baron wrote:
> On Mon, Oct 05, 2009 at 06:00:41PM -0700, Justin P. Mattock wrote:
>
>> Justin Mattock wrote:
>>
>>> On Sun, Oct 4, 2009 at 10:41 AM, Ingo Molnar<[email protected]> wrote:
>>>
>>>
>>>> * Jason Baron<[email protected]> wrote:
>>>>
>>>>
>>>>
>>>>> On Mon, Sep 07, 2009 at 02:49:44PM -0700, Justin Mattock wrote:
>>>>>
>>>>>
>>>>>>>> * Justin P. Mattock<[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Ingo Molnar wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> * Justin Mattock<[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> O.K. I feel better, deleted
>>>>>>>>>>> my system, and threw in a minimal built system
>>>>>>>>>>> with only the bare essentials to boot.
>>>>>>>>>>> (just to make sure things are correct).
>>>>>>>>>>>
>>>>>>>>>>> unfortunately after building rc6 I'm still hitting
>>>>>>>>>>> this. really am not sure why this is happening.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Could you please double-check the bisection result by doing this:
>>>>>>>>>>
>>>>>>>>>> git revert af6af30c0f
>>>>>>>>>>
>>>>>>>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>>>>>>>
>>>>>>>>>> Bisections are very efficient and hence very sensitive as well to
>>>>>>>>>> minimal errors. Just one small mistake near the end of a bisection
>>>>>>>>>> can blame the wrong commit.
>>>>>>>>>>
>>>>>>>>>> So the best way to double-check such 100%-triggerable crashes is to
>>>>>>>>>> do the revert. I tried the revert and it can be done fine here.
>>>>>>>>>>
>>>>>>>>>> [ _If_ that does not fix the bug then to save time you can
>>>>>>>>>> 'backtrack' the bisection, instead of re-doing it completely.
>>>>>>>>>> I.e. you have your bisection log, re-check the final steps going
>>>>>>>>>> backwards. Once you find a discrepancy (i.e. a 'bad' point that
>>>>>>>>>> is 'good' or the other way around), redo the bisection log
>>>>>>>>>> commands up to that point and continue it up to the end. ]
>>>>>>>>>>
>>>>>>>>>> Ingo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> shoot, I did not see your post here. when looking at my bisect
>>>>>>>>> log, I guess after a git bisect reset it clears?
>>>>>>>>>
>>>>>>>>> Anyways after git bisect had finished I looked manually at the
>>>>>>>>> commits that it had generated the one which I had sent in a post
>>>>>>>>> previously, and this one:
>>>>>>>>>
>>>>>>>>> 9424edc2da097c8589fcc24a72552d33e54be161
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> (this commit has no effect on your kernel image, at all.)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> yep. but it was worth a try.
>>>>>>>
>>>>>>>
>>>>>>>>> at the time looking at the commit, I see this to be more of the
>>>>>>>>> cause because of it being related to elf as so forth, but as soon
>>>>>>>>> as I reverted this on rc6 made no difference.(the previous commit
>>>>>>>>> fixes this for me, on a regular tar.ball as well as in git.
>>>>>>>>>
>>>>>>>>> I think at this point since this system is a fresh from scratch
>>>>>>>>> build, I think something might be wrong that I'm doing (all the
>>>>>>>>> CFLAGS, and such are in a previous post).
>>>>>>>>>
>>>>>>>>> At the moment I don't have a problem applying a patch to the
>>>>>>>>> kernel for this. especially since I'm the only one that seems to
>>>>>>>>> be hitting this, then if more and more reports of this happen then
>>>>>>>>> we can go from there.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> What would be nice is to verify your bisection end result, i.e. do
>>>>>>>> what i suggested:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> yeah I've done this on both kernels three to be exact, and all boot after
>>>>>>> reverting
>>>>>>> Fix perf-tracepoint OOPS.
>>>>>>>
>>>>>>> As for my system, I'm still convinced that I might be doing something wrong
>>>>>>> over here.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>> Could you please double-check the bisection result by doing this:
>>>>>>>>>>
>>>>>>>>>> git revert af6af30c0f
>>>>>>>>>>
>>>>>>>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> if this doesnt fix it on latest -git then this commit is not the
>>>>>>>> cause of the lockup.
>>>>>>>>
>>>>>>>> Ingo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> This commit(Fix perf-tracepoint OOPS.)does fix my stuckage, but I'm left, as
>>>>>>> well as others asking
>>>>>>> the question of why.
>>>>>>> In any case I still think I'm setting something wrong with either gcc, or
>>>>>>> something
>>>>>>> that might be causing this from userland.
>>>>>>>
>>>>>>> Justin P. Mattock
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> O.k. here something awkward about this issue I was
>>>>>> experiencing. at the moment I have two imac's
>>>>>> here the descriptions:
>>>>>>
>>>>>> imac A) the one with the problem
>>>>>>
>>>>>> OS: built from the clfs book
>>>>>> x86_64 multilib with only lib64
>>>>>>
>>>>>> built everything with these flags:
>>>>>> CFLAGS="-m64 -mtune=core2 -march=core2
>>>>>> -mfpmath=both -O2 -pipe -fomit-frame-pointer
>>>>>> -fstack-protection"
>>>>>> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>>>>>> while compiling everything with
>>>>>> gcc version: 4.5.0 20090730
>>>>>>
>>>>>>
>>>>>> imac B) the one that works
>>>>>>
>>>>>> OS: clfs(just built a few days ago)
>>>>>> x86_64 pure64 bit build
>>>>>> (lib with a symlink to lib64)
>>>>>> CFLAGS="-m64 -mtune=core2 -march=core2
>>>>>> -O2 -pipe -fomit-frame-pointer"
>>>>>> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>>>>>> gcc version: 4.4.1 (GCC for Cross-LFS 4.4.1.20090722)
>>>>>>
>>>>>> The only things I can think of is either I hit something
>>>>>> because of gcc, something goes wrong with the libraries,
>>>>>> or there something happening with either the option
>>>>>> of mfpmath=both or stackprotection.
>>>>>>
>>>>>> At this point since the kernel seems to be running fine,
>>>>>> is to just trash the system that has this issue and just leave
>>>>>> it at, I was hitting some weird anomaly.
>>>>>>
>>>>>>
>>>>>>
>>>>> hi Justin,
>>>>>
>>>>> I've been playing around with gcc '4.5' as well and hit a panic that
>>>>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>>>>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>>>>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>>>>> compiler or kernel issue. But the following kernel patch fixes the issue
>>>>> for me. It would be interesting to verify if the patch also resolves the
>>>>> issue for you.
>>>>>
>>>>>
>>>> Would be nice to know precisely what kind of problem is being hit here -
>>>> we'd like to fix either the kernel or GCC - depending on where the bug
>>>> lies.
>>>>
>>>> Ingo
>>>>
>>>>
>>>>
>>> So I wasn't going crazy....
>>> Anyways that system(clfs)
>>> I still have, I can go ahead and
>>> put it back on the machine and see if I hit this
>>> again(keep in mind, just got back from a 7hr drive,
>>> so it might be tomorrow).
>>>
>>>
>>>
>> o.k. I put back on that system, and
>> hit the error. I add your patch to 2.6.31-rc6,
>>
>
> ok. is that error, the same as the error below? The error below looks
> completely different from the posted previously. So, it almost looks
> like you the patch fixed one problem, only to reveal another one. Is
> that correct?
>
>
Could be a different error. The problem I have is capturing this error, e.g.
I tried ieee1394_dma=early to capture it, but that mechanism
seems to error out (ssh is a no-go either, because this happens so early).
I think this is the top part of the error, because before adding your patch
the system would boot a little farther down the line (too fast to read
anything), and I did see something in there about a kernel panic.

If you have any ideas on how I can capture this early, they would be
appreciated (getting anything to log this early is a bit tricky).
>> and the latest git(a few days old).
>> I still am hitting this, but with your patch
>> I'm able to see the beginning of this panic:
>> (Ill write it manually)
>>
>> [ 2.523966] kernel panic - not syncing: No init found. try passing
>> init= option
>> to the kernel
>> [ 2.524394] Pid: 1, comm: swapper Not tainted 2.6.31-rc6 #6
>> [ 2.524633] Call Trace:
>> [ 2.524875] [<ffffffff813a5b72>] panic+0x75/0x120
>> [ 2.525119] [<ffffffff8100910f>] init_post+0xef/0xf5
>> [ 2.525357] [<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
>> [ 2.525600] [<ffffffff8102410a>] child_rip+0xa/0x20
>> [ 2.525842] [<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
>> [ 2.526084] [<ffffffff81024100>] ? child_rip+0x0/0x20
>>
>> Seems I only hit this with using gcc 4.5.0 and compiling
>> sysvinit with SELinux support to load the policy at boot.
>> (here's the patch I used
>> http://readlist.com/lists/tycho.nsa.gov/selinux/3/15451.html).
>>
>> Sound's like gcc is doing something(correct me if I'm
>> wrong) because the other systems I have are using the same
>> packages except for and older version of gcc.
>> maybe I should update sysvinit with a better patch to load the policy.
>>
>> Justin P. Mattock
>>
>
>
As a test I'll throw in a kernel that was compiled with gcc 4.4.0, just to
see if this is a compiler/kernel issue.

Justin P. Mattock

2009-10-06 20:34:01

Subject: Re: system gets stuck in a lock during boot

On Mon, Oct 05, 2009 at 09:24:09PM -0400, Steven Rostedt wrote:
> On Fri, 2009-10-02 at 17:12 -0400, Jason Baron wrote:
>
> > hi Justin,
> >
> > I've been playing around with gcc '4.5' as well and hit a panic that
> > looks very similar to what you've seen with stock 2.6.31 - I haven't
> > seen it anywhere else. Anyways, it seems to be some sort of alignment
> > issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
> > compiler or kernel issue. But the following kernel patch fixes the issue
> > for me. It would be interesting to verify if the patch also resolves the
> > issue for you.
> >
> > thanks,
> >
> > -Jason
> >
> >
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index 6ad76bf..0029af4 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -164,6 +164,7 @@
> > LIKELY_PROFILE() \
> > BRANCH_PROFILE() \
> > TRACE_PRINTKS() \
> > + . = ALIGN(32); \
> > FTRACE_EVENTS() \
> > TRACE_SYSCALLS()
> >
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index a81170d..43f9f1e 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -124,7 +124,7 @@ struct ftrace_event_call {
> > atomic_t profile_count;
> > int (*profile_enable)(struct ftrace_event_call *);
> > void (*profile_disable)(struct ftrace_event_call *);
> > -};
> > +} __attribute__((aligned(32)));
> >
> > #define MAX_FILTER_PRED 32
> > #define MAX_FILTER_STR_VAL 128
> > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > index f64fbaa..4697fb6 100644
> > --- a/include/trace/ftrace.h
> > +++ b/include/trace/ftrace.h
> > @@ -600,7 +600,7 @@ static int ftrace_raw_init_event_##call(void) \
> > } \
> > \
> > static struct ftrace_event_call __used \
> > -__attribute__((__aligned__(4))) \
> > +__attribute__((__aligned__(32))) \
> > __attribute__((section("_ftrace_events"))) event_##call = { \
> > .name = #call, \
> > .system = __stringify(TRACE_SYSTEM), \
>
> Are all alignments needed? Or just adding one might help. Or removing
> the one directly above?
>
> -- Steve
>

So the problem I'm seeing is an oops on boot caused by the call->system pointer
dereference in event_create_dir(). The 'call' variable points to a 'struct
ftrace_event_call'.

What's going on is that 'struct ftrace_event_call' is 168 bytes in size
(sizeof(struct ftrace_event_call) = 168 = 0xA8). However, in memory the
structures are 16-byte aligned. Thus, the stride for walking through the
section needs to be 176 (0xB0), but instead it is 168, which causes the oops.
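
The arithmetic above can be sketched in a few lines of C. This is only an
illustration, not kernel code: the helper names are hypothetical, and it
assumes the section start is itself aligned and the entries are packed with
nothing but alignment padding between them.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'size' up to the next multiple of 'align' (a power of two). */
static size_t round_up_pow2(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Offset of entry i when every entry is placed on an 'align' boundary. */
static size_t placed_offset(size_t size, size_t align, size_t i)
{
    return i * round_up_pow2(size, align);
}

/* Offset of entry i as computed by pointer arithmetic, i.e. i * sizeof. */
static size_t walked_offset(size_t size, size_t i)
{
    return i * size;
}
```

With the numbers from this thread, round_up_pow2(168, 16) is 176, so the
second entry sits 8 bytes past where the walker looks, and the error grows
by another 8 bytes for every further entry.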

I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
vanilla 2.6.31 kernel.

That said, I'm not sure the compiler is doing the wrong thing here. The
'struct ftrace_event_call' contains an embedded 'struct list_head' which
is 16 bytes. According to the gcc docs, the aligned attribute 'specifies a
minimum alignment for the variable or structure field, measured in bytes'.
Thus, at least according to the docs, gcc can increase the alignment of the
'struct ftrace_event_call' from its original specification of 4 to 16. Even
in the case where we are working correctly the structures are 8-byte aligned.

Thus, I would recommend the patch below as a preventive measure. It's
the minimal patch I've found to resolve this issue. In general, if we
are going to walk data structures embedded in a special ELF section, I
think the general rule needs to be to set the alignment to the power of
two which is greater than or equal to the size of the largest item in the
structure.
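
As a rough sketch of why putting the attribute on the type helps: once the
type carries the alignment, sizeof is padded to a multiple of it, so the
stride a walker gets from pointer arithmetic agrees with the spacing of
objects placed on that boundary. The two structs below are hypothetical
stand-ins with a 168-byte payload, not the real ftrace_event_call, and the
sketch assumes the compiler does not raise the alignment of the individual
variables any further.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 168-byte payload, standing in for the real structure. */
struct plain_event {
    uint64_t words[21];            /* 21 * 8 = 168 bytes */
};

/* Same payload, with the alignment requested on the type itself. */
struct aligned_event {
    uint64_t words[21];
} __attribute__((aligned(16)));

/* With the attribute on the type, sizeof is a multiple of the alignment. */
static_assert(sizeof(struct aligned_event) % 16 == 0,
              "aligned type must have a size that is a multiple of 16");

/* Stride a walker gets from pointer arithmetic on each type. */
static size_t plain_stride(void)   { return sizeof(struct plain_event); }
static size_t aligned_stride(void) { return sizeof(struct aligned_event); }
```

Here plain_stride() stays at 168 while aligned_stride() becomes 176, which
matches the 0xA8 and 0xB0 figures quoted above.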

thanks,

-Jason

Signed-off-by: Jason Baron <[email protected]>

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index a81170d..7182f03 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -124,7 +124,10 @@ struct ftrace_event_call {
atomic_t profile_count;
int (*profile_enable)(struct ftrace_event_call *);
void (*profile_disable)(struct ftrace_event_call *);
-};
+} __attribute__((aligned(16)));
+
+/* Align to the largest field in the data structure:
+ * sizeof(struct list_head) = 16 */

#define MAX_FILTER_PRED 32
#define MAX_FILTER_STR_VAL 128
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f64fbaa..e344e81 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -600,7 +600,6 @@ static int ftrace_raw_init_event_##call(void) \
} \
\
static struct ftrace_event_call __used \
-__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) event_##call = { \
.name = #call, \
.system = __stringify(TRACE_SYSTEM), \

2009-10-06 22:31:02

Subject: Re: system gets stuck in a lock during boot

Jason Baron wrote:
> On Mon, Oct 05, 2009 at 09:24:09PM -0400, Steven Rostedt wrote:
>
>> On Fri, 2009-10-02 at 17:12 -0400, Jason Baron wrote:
>>
>>
>>> hi Justin,
>>>
>>> I've been playing around with gcc '4.5' as well and hit a panic that
>>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>>> compiler or kernel issue. But the following kernel patch fixes the issue
>>> for me. It would be interesting to verify if the patch also resolves the
>>> issue for you.
>>>
>>> thanks,
>>>
>>> -Jason
>>>
>>>
>>> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>>> index 6ad76bf..0029af4 100644
>>> --- a/include/asm-generic/vmlinux.lds.h
>>> +++ b/include/asm-generic/vmlinux.lds.h
>>> @@ -164,6 +164,7 @@
>>> LIKELY_PROFILE() \
>>> BRANCH_PROFILE() \
>>> TRACE_PRINTKS() \
>>> + . = ALIGN(32); \
>>> FTRACE_EVENTS() \
>>> TRACE_SYSCALLS()
>>>
>>> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
>>> index a81170d..43f9f1e 100644
>>> --- a/include/linux/ftrace_event.h
>>> +++ b/include/linux/ftrace_event.h
>>> @@ -124,7 +124,7 @@ struct ftrace_event_call {
>>> atomic_t profile_count;
>>> int (*profile_enable)(struct ftrace_event_call *);
>>> void (*profile_disable)(struct ftrace_event_call *);
>>> -};
>>> +} __attribute__((aligned(32)));
>>>
>>> #define MAX_FILTER_PRED 32
>>> #define MAX_FILTER_STR_VAL 128
>>> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
>>> index f64fbaa..4697fb6 100644
>>> --- a/include/trace/ftrace.h
>>> +++ b/include/trace/ftrace.h
>>> @@ -600,7 +600,7 @@ static int ftrace_raw_init_event_##call(void) \
>>> } \
>>> \
>>> static struct ftrace_event_call __used \
>>> -__attribute__((__aligned__(4))) \
>>> +__attribute__((__aligned__(32))) \
>>> __attribute__((section("_ftrace_events"))) event_##call = { \
>>> .name = #call, \
>>> .system = __stringify(TRACE_SYSTEM), \
>>>
>> Are all alignments needed? Or just adding one might help. Or removing
>> the one directly above?
>>
>> -- Steve
>>
>>
>
> So the problem I'm seeing is an oops on boot caused by the call->system pointer
> dereference in event_create_dir(). The 'call' variable points to a 'struct
> ftrace_event_call'.
>
> What's going on is that 'struct ftrace_event_call' is 168 bytes in size
> (sizeof(struct ftrace_event_call) = 168 = 0xA8). However, in memory the
> structures are 16-byte aligned. Thus, the stride for walking through the
> section needs to be 176 (0xB0), but instead it is 168, which causes the oops.
>
> I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
> vanilla 2.6.31 kernel.
>
> That said, I'm not sure the compiler is doing the wrong thing here. The
> 'struct ftrace_event_call' contains an embedded 'struct list_head' which
> is 16 bytes. According to the gcc docs, the aligned attribute 'specifies a
> minimum alignment for the variable or structure field, measured in bytes'.
> Thus, at least according to the docs, gcc can increase the alignment of the
> 'struct ftrace_event_call' from its original specification of 4 to 16. Even
> in the case where we are working correctly the structures are 8-byte aligned.
>
> Thus, I would recommend the patch below as a preventive measure. It's
> the minimal patch I've found to resolve this issue. In general, if we
> are going to walk data structures embedded in a special ELF section, I
> think the general rule needs to be to set the alignment to the power of
> two which is greater than or equal to the size of the largest item in the
> structure.
>
> thanks,
>
> -Jason
>
> Signed-off-by: Jason Baron<[email protected]>
>
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index a81170d..7182f03 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -124,7 +124,10 @@ struct ftrace_event_call {
> atomic_t profile_count;
> int (*profile_enable)(struct ftrace_event_call *);
> void (*profile_disable)(struct ftrace_event_call *);
> -};
> +} __attribute__((aligned(16)));
> +
> +/* Align to the largest field in the data structure:
> + * sizeof(struct list_head) = 16 */
>
> #define MAX_FILTER_PRED 32
> #define MAX_FILTER_STR_VAL 128
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index f64fbaa..e344e81 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -600,7 +600,6 @@ static int ftrace_raw_init_event_##call(void) \
> } \
> \
> static struct ftrace_event_call __used \
> -__attribute__((__aligned__(4))) \
> __attribute__((section("_ftrace_events"))) event_##call = { \
> .name = #call, \
> .system = __stringify(TRACE_SYSTEM), \
>
>
>
>
>
Shoot, I don't know why I'm still hitting this.
I tried both patches and it still happens.
As of now the only thing I can think of, besides looking
at the kernel/compiler, is the patch for sysvinit to load
the policy (maybe something in there is old/outdated).

(BTW: not sure if it means anything, but this system is x86_64
built from the multilib clfs, but with no 32-bit libs, pretty much
how Fedora 11 has their system built.)

Justin P. Mattock

2009-10-07 02:04:01

Subject: Re: system gets stuck in a lock during boot

On Tue, 2009-10-06 at 16:32 -0400, Jason Baron wrote:

> So the problem I'm seeing is an oops on boot caused by the call->system pointer
> dereference in event_create_dir(). The 'call' variable points to a 'struct
> ftrace_event_call'.
>
> What's going on is that 'struct ftrace_event_call' is 168 bytes in size
> (sizeof(struct ftrace_event_call) = 168 = 0xA8). However, in memory the
> structures are 16-byte aligned. Thus, the stride for walking through the
> section needs to be 176 (0xB0), but instead it is 168, which causes the oops.
>
> I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
> vanilla 2.6.31 kernel.
>
> That said, I'm not sure the compiler is doing the wrong thing here. The
> 'struct ftrace_event_call' contains an embedded 'struct list_head' which
> is 16 bytes. According to the gcc docs, the aligned attribute 'specifies a
> minimum alignment for the variable or structure field, measured in bytes'.
> Thus, at least according to the docs, gcc can increase the alignment of the
> 'struct ftrace_event_call' from its original specification of 4 to 16. Even
> in the case where we are working correctly the structures are 8-byte aligned.
>
> Thus, I would recommend the patch below as a preventive measure. It's
> the minimal patch I've found to resolve this issue. In general, if we
> are going to walk data structures embedded in a special ELF section, I
> think the general rule needs to be to set the alignment to the power of
> two which is greater than or equal to the size of the largest item in the
> structure.
>
> thanks,
>
> -Jason
>
> Signed-off-by: Jason Baron <[email protected]>
>
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index a81170d..7182f03 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -124,7 +124,10 @@ struct ftrace_event_call {
> atomic_t profile_count;
> int (*profile_enable)(struct ftrace_event_call *);
> void (*profile_disable)(struct ftrace_event_call *);
> -};
> +} __attribute__((aligned(16)));
> +
> +/* Align to the largest field in the data structure:
> + * sizeof(struct list_head) = 16 */

Is this true for i386?

I just tried this patch and it seems to work. Can you give it a try?

Signed-off-by: Steven Rostedt <[email protected]>

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4ec5e67..044b70d 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -133,7 +133,7 @@ struct ftrace_event_call {
atomic_t profile_count;
int (*profile_enable)(void);
void (*profile_disable)(void);
-};
+} __attribute__((aligned(sizeof(struct list_head))));

#define FTRACE_MAX_PROFILE_SIZE 2048

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index cc0d966..31e7637 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -501,7 +501,6 @@ static void ftrace_profile_disable_##call(void) \
* }
*
* static struct ftrace_event_call __used
- * __attribute__((__aligned__(4)))
* __attribute__((section("_ftrace_events"))) event_<call> = {
* .name = "<call>",
* .system = "<system>",
@@ -619,7 +618,6 @@ static int ftrace_raw_init_event_##call(void) \
} \
\
static struct ftrace_event_call __used \
-__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) event_##call = { \
.name = #call, \
.system = __stringify(TRACE_SYSTEM), \
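
On the i386 question above: 'struct list_head' is two pointers, so its size
follows the pointer width, 16 bytes on x86_64 and 8 bytes on i386. A small
sketch of what aligned(sizeof(struct list_head)) does, using a local
stand-in for list_head and a hypothetical 'struct demo_event' rather than
the real kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's list node: two pointers. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical event record, aligned the same way as in the patch above. */
struct demo_event {
    struct list_head list;
    const char *name;
} __attribute__((aligned(sizeof(struct list_head))));

/* Two pointers: 16 bytes on LP64 targets, 8 bytes on ILP32 such as i386. */
static_assert(sizeof(struct list_head) == 2 * sizeof(void *),
              "list_head is expected to be exactly two pointers");

/* Alignment and stride the compiler actually uses for the demo type. */
static size_t demo_alignment(void) { return _Alignof(struct demo_event); }
static size_t demo_stride(void)    { return sizeof(struct demo_event); }
```

Because the alignment is derived from sizeof rather than hard-coded, the
same source gives 16-byte alignment on 64-bit builds and 8-byte alignment
on 32-bit builds, and in both cases the stride is a multiple of it.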

2009-10-07 02:42:58

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Tue, 2009-10-06 at 16:32 -0400, Jason Baron wrote:
>
>
>> So the problem I'm seeing is an oops on boot caused by the call->system pointer
>> dereference in event_create_dir(). The 'call' variable points to a 'struct
>> ftrace_event_call'.
>>
>> What's going on is that 'struct ftrace_event_call' is 168 bytes in size
>> (sizeof(struct ftrace_event_call) = 168 = 0xA8). However, in memory the
>> structures are 16-byte aligned. Thus, the stride for walking through the
>> section needs to be 176 (0xB0), but instead it is 168, which causes the oops.
>>
>> I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
>> vanilla 2.6.31 kernel.
>>
>> That said, I'm not sure the compiler is doing the wrong thing here. The
>> 'struct ftrace_event_call' contains an embedded 'struct list_head' which
>> is 16 bytes. According to the gcc docs, the aligned attribute 'specifies a
>> minimum alignment for the variable or structure field, measured in bytes'.
>> Thus, at least according to the docs, gcc can increase the alignment of the
>> 'struct ftrace_event_call' from its original specification of 4 to 16. Even
>> in the case where we are working correctly the structures are 8-byte aligned.
>>
>> Thus, I would recommend the patch below as a preventive measure. It's
>> the minimal patch I've found to resolve this issue. In general, if we
>> are going to walk data structures embedded in a special ELF section, I
>> think the general rule needs to be to set the alignment to the power of
>> two which is greater than or equal to the size of the largest item in the
>> structure.
>>
>> thanks,
>>
>> -Jason
>>
>> Signed-off-by: Jason Baron<[email protected]>
>>
>>
>> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
>> index a81170d..7182f03 100644
>> --- a/include/linux/ftrace_event.h
>> +++ b/include/linux/ftrace_event.h
>> @@ -124,7 +124,10 @@ struct ftrace_event_call {
>> atomic_t profile_count;
>> int (*profile_enable)(struct ftrace_event_call *);
>> void (*profile_disable)(struct ftrace_event_call *);
>> -};
>> +} __attribute__((aligned(16)));
>> +
>> +/* Align to the largest field in the data structure:
>> + * sizeof(struct list_head) = 16 */
>>
>
> Is this true for i386?
>
> I just tried this patch and it seems to work. Can you give it a try.
>
> Signed-off-by: Steven Rostedt<[email protected]>
>
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 4ec5e67..044b70d 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -133,7 +133,7 @@ struct ftrace_event_call {
> atomic_t profile_count;
> int (*profile_enable)(void);
> void (*profile_disable)(void);
> -};
> +} __attribute__((aligned(sizeof(struct list_head))));
>
> #define FTRACE_MAX_PROFILE_SIZE 2048
>
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index cc0d966..31e7637 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -501,7 +501,6 @@ static void ftrace_profile_disable_##call(void) \
> * }
> *
> * static struct ftrace_event_call __used
> - * __attribute__((__aligned__(4)))
> * __attribute__((section("_ftrace_events"))) event_<call> = {
> * .name = "<call>",
> * .system = "<system>",
> @@ -619,7 +618,6 @@ static int ftrace_raw_init_event_##call(void) \
> } \
> \
> static struct ftrace_event_call __used \
> -__attribute__((__aligned__(4))) \
> __attribute__((section("_ftrace_events"))) event_##call = { \
> .name = #call, \
> .system = __stringify(TRACE_SYSTEM), \
>
>
>
>
O.K., I applied your patch, but unfortunately
I am still hitting this kernel panic.

I must admit I have no idea why this is happening
(but I am willing to sit through this, because sooner
or later I will hit this anyway if I update gcc).

Justin P. Mattock

2009-10-07 13:01:18

Subject: Re: system gets stuck in a lock during boot

On Tue, 2009-10-06 at 19:42 -0700, Justin P. Mattock wrote:

> >
> o.k. applied your patch, but unfortunantly
> I still am hitting this kernel panic.
>
> must admit I have no idea why this is doing this.
> (but am willing to sit through this, because eventually
> sooner or later will hit this if I update gcc).

But the panic you showed was that it could not find an init to execute.
Which looks like a setup issue and not a kernel bug.

-- Steve

2009-10-07 14:32:24

Subject: Re: system gets stuck in a lock during boot

On Tue, Oct 06, 2009 at 10:02:01PM -0400, Steven Rostedt wrote:
> > So the problem I'm seeing is an oops on boot caused by the call->system pointer
> > deference in event_create_dir(). The 'call' variable is of type 'struct
> > ftrace_event_call'.
> >
> > What's going on is that the 'struct ftrace_event_call' is of size 168 bytes
> > (sizeof(struct ftrace_event_call)) = 168 = 0xA8. However, in memory the
> > structures are 16-byte aligned. Thus, the stride for walking through the
> > pointers needs to be 176 (0xB0), but instead its 168 causing the oops.
> >
> > I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
> > vanilla 2.6.31 kernel.
> >
> > That said, I'm not sure the compiler is doing the wrong thing here. The
> > 'struct ftrace_event_call' contains an embedded 'struct list_head' which
> > is 16 bytes. According to the gcc docs, the aligned attribute, 'specifies a
> > minimum alignment for the variable or structure field, measured in bytes'.
> > Thus, at least according to the docs, gcc can increase the alignment of the
> > 'struct ftrace_event_call', from its original specification of 4, to 16. Even
> > in the case where we are working corectly the structures are 8-byte aligned.
> >
> > Thus, I would reccommend the patch below as a preventive measure. Its
> > the minimal patch I've found to resolve this issue. In general, if we
> > are going to walk data structures embedded in a special elf section, I
> > think the general rules needs to be to set the alignment to the power of
> > two which is greater than or equal to the largest item in the structure.
> >
> > thanks,
> >
> > -Jason
> >
> > Signed-off-by: Jason Baron <[email protected]>
> >
> >
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index a81170d..7182f03 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -124,7 +124,10 @@ struct ftrace_event_call {
> > atomic_t profile_count;
> > int (*profile_enable)(struct ftrace_event_call *);
> > void (*profile_disable)(struct ftrace_event_call *);
> > -};
> > +} __attribute__((aligned(16)));
> > +
> > +/* Align to the largest field in the data structure:
> > + * sizeof(struct list_head) = 16 */
>
> Is this true for i386?
>
> I just tried this patch and it seems to work. Can you give it a try.
>
> Signed-off-by: Steven Rostedt <[email protected]>
>
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 4ec5e67..044b70d 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -133,7 +133,7 @@ struct ftrace_event_call {
> atomic_t profile_count;
> int (*profile_enable)(void);
> void (*profile_disable)(void);
> -};
> +} __attribute__((aligned(sizeof(struct list_head))));
>
> #define FTRACE_MAX_PROFILE_SIZE 2048
>
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index cc0d966..31e7637 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -501,7 +501,6 @@ static void ftrace_profile_disable_##call(void) \
> * }
> *
> * static struct ftrace_event_call __used
> - * __attribute__((__aligned__(4)))
> * __attribute__((section("_ftrace_events"))) event_<call> = {
> * .name = "<call>",
> * .system = "<system>",
> @@ -619,7 +618,6 @@ static int ftrace_raw_init_event_##call(void) \
> } \
> \
> static struct ftrace_event_call __used \
> -__attribute__((__aligned__(4))) \
> __attribute__((section("_ftrace_events"))) event_##call = { \
> .name = #call, \
> .system = __stringify(TRACE_SYSTEM), \
>
>

Indeed, your patch works for me as well, and it's much cleaner!

However, I want to make sure this fix is sufficient and is the best way to
address this type of issue in general. For example, I know tracepoints use
the aligned attribute in all three places: definition, usage, and linker
alignment (adding Mathieu to the cc list). Is the 'aligned' attribute on the
definition alone sufficient? Also, once we find a method for solving these
issues in general, we need to review all users of this kind of technique to
make sure they are consistent. I also think your patch above needs a comment
saying what it's doing.
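
A minimal sketch of the question above, assuming GCC or Clang on a typical
Linux ABI. The struct and function names are hypothetical stand-ins, not the
kernel's types. It illustrates that putting 'aligned' on the type pads
sizeof(), and therefore the stride used when walking an array, whereas an
attribute on an individual variable leaves sizeof() unchanged.

```c
#include <stddef.h>

/* Hypothetical stand-in for a structure whose natural size is not a
 * multiple of 16 (24 bytes on LP64, 12 bytes on ILP32). */
struct demo_plain {
	void *next;
	void *prev;
	long  count;
};

/* Same fields, but the alignment is part of the type, so the compiler
 * pads sizeof() up to a multiple of 16. */
struct demo_aligned {
	void *next;
	void *prev;
	long  count;
} __attribute__((aligned(16)));

/* Round size up to a multiple of align (align must be a power of two). */
static size_t demo_round_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Stride a walker must use when objects of the given size are placed on
 * 'placement_align' boundaries. */
static size_t demo_stride(size_t size, size_t placement_align)
{
	return demo_round_up(size, placement_align);
}
```

With the numbers from the original report, demo_stride(168, 16) gives 176,
which is the stride the walker needs once the objects sit on 16-byte
boundaries.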

thanks,

-Jason

2009-10-07 14:41:39

by Mathieu Desnoyers

Subject: Re: system gets stuck in a lock during boot

* Jason Baron ([email protected]) wrote:
> On Tue, Oct 06, 2009 at 10:02:01PM -0400, Steven Rostedt wrote:
> > > So the problem I'm seeing is an oops on boot caused by the call->system pointer
> > > deference in event_create_dir(). The 'call' variable is of type 'struct
> > > ftrace_event_call'.
> > >
> > > What's going on is that the 'struct ftrace_event_call' is of size 168 bytes
> > > (sizeof(struct ftrace_event_call)) = 168 = 0xA8. However, in memory the
> > > structures are 16-byte aligned. Thus, the stride for walking through the
> > > pointers needs to be 176 (0xB0), but instead its 168 causing the oops.
> > >
> > > I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
> > > vanilla 2.6.31 kernel.
> > >
> > > That said, I'm not sure the compiler is doing the wrong thing here. The
> > > 'struct ftrace_event_call' contains an embedded 'struct list_head' which
> > > is 16 bytes. According to the gcc docs, the aligned attribute, 'specifies a
> > > minimum alignment for the variable or structure field, measured in bytes'.
> > > Thus, at least according to the docs, gcc can increase the alignment of the
> > > 'struct ftrace_event_call', from its original specification of 4, to 16. Even
> > > in the case where we are working corectly the structures are 8-byte aligned.
> > >
> > > Thus, I would reccommend the patch below as a preventive measure. Its
> > > the minimal patch I've found to resolve this issue. In general, if we
> > > are going to walk data structures embedded in a special elf section, I
> > > think the general rules needs to be to set the alignment to the power of
> > > two which is greater than or equal to the largest item in the structure.
> > >
> > > thanks,
> > >
> > > -Jason
> > >
> > > Signed-off-by: Jason Baron <[email protected]>
> > >
> > >
> > > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > > index a81170d..7182f03 100644
> > > --- a/include/linux/ftrace_event.h
> > > +++ b/include/linux/ftrace_event.h
> > > @@ -124,7 +124,10 @@ struct ftrace_event_call {
> > > atomic_t profile_count;
> > > int (*profile_enable)(struct ftrace_event_call *);
> > > void (*profile_disable)(struct ftrace_event_call *);
> > > -};
> > > +} __attribute__((aligned(16)));
> > > +
> > > +/* Align to the largest field in the data structure:
> > > + * sizeof(struct list_head) = 16 */
> >
> > Is this true for i386?
> >
> > I just tried this patch and it seems to work. Can you give it a try.
> >
> > Signed-off-by: Steven Rostedt <[email protected]>
> >
> >
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index 4ec5e67..044b70d 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -133,7 +133,7 @@ struct ftrace_event_call {
> > atomic_t profile_count;
> > int (*profile_enable)(void);
> > void (*profile_disable)(void);
> > -};
> > +} __attribute__((aligned(sizeof(struct list_head))));

I don't like that.

Basically, the vmlinux.lds.h linker script must have alignment
statements before each section that match the alignment of the
structures in that section. Failing to do so would put padding at the
beginning of the section, which definitely does not work. I don't see how
we can automatically pass sizeof(struct list_head) to a linker script :/
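
The constraint described above can be sketched in userspace, assuming an ELF
target with GCC or Clang and a GNU-compatible linker, which defines
__start_<name> and __stop_<name> for any section whose name is a valid C
identifier. Everything prefixed demo_ or DEMO_ is hypothetical. The same
constant is applied to the type and to each variable so that the section
start, the object placement, and sizeof() all agree.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGN 16

/* Hypothetical record type; the alignment is part of the type so that
 * sizeof() is a multiple of DEMO_ALIGN. */
struct demo_event {
	const char *name;
	int id;
} __attribute__((aligned(DEMO_ALIGN)));

/* Place each record in the "demo_events" section with the same alignment. */
#define DEMO_EVENT(sym, nm, num)                                        \
	static struct demo_event sym                                    \
	__attribute__((used, aligned(DEMO_ALIGN),                       \
		       section("demo_events"))) = { (nm), (num) }

DEMO_EVENT(demo_ev_a, "sched", 1);
DEMO_EVENT(demo_ev_b, "irq", 2);
DEMO_EVENT(demo_ev_c, "timer", 3);

/* Provided by the linker because "demo_events" is a valid C identifier. */
extern struct demo_event __start_demo_events[];
extern struct demo_event __stop_demo_events[];

/* Number of records, computed from the section bounds and sizeof(). */
static size_t demo_event_count(void)
{
	uintptr_t begin = (uintptr_t)__start_demo_events;
	uintptr_t end = (uintptr_t)__stop_demo_events;

	return (end - begin) / sizeof(struct demo_event);
}

/* Walk the section as an array; this only works if the placement
 * stride equals sizeof(struct demo_event). */
static int demo_event_id_sum(void)
{
	size_t i, n = demo_event_count();
	int sum = 0;

	for (i = 0; i < n; i++)
		sum += __start_demo_events[i].id;
	return sum;
}
```

In the kernel the section bounds come from vmlinux.lds.h rather than from
automatically generated symbols, which is why the alignment there has to be
written as something the linker script can evaluate.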

Mathieu

> >
> > #define FTRACE_MAX_PROFILE_SIZE 2048
> >
> > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > index cc0d966..31e7637 100644
> > --- a/include/trace/ftrace.h
> > +++ b/include/trace/ftrace.h
> > @@ -501,7 +501,6 @@ static void ftrace_profile_disable_##call(void) \
> > * }
> > *
> > * static struct ftrace_event_call __used
> > - * __attribute__((__aligned__(4)))
> > * __attribute__((section("_ftrace_events"))) event_<call> = {
> > * .name = "<call>",
> > * .system = "<system>",
> > @@ -619,7 +618,6 @@ static int ftrace_raw_init_event_##call(void) \
> > } \
> > \
> > static struct ftrace_event_call __used \
> > -__attribute__((__aligned__(4))) \
> > __attribute__((section("_ftrace_events"))) event_##call = { \
> > .name = #call, \
> > .system = __stringify(TRACE_SYSTEM), \
> >
> >
>
> indeed your patch works as well for me, its much cleaner!
>
> However, I want to make sure this fix is sufficient and is the best way to
> address this type of issue in general. For example, I know tracepoints are
> using the aligned attribute in all 3 places -> definition, usage, and linker
> alignment. (adding Mathieu to 'cc list). Is just the definition 'aligned'
> sufficient? Also, once we find a method for solving these issues in general,
> we need to review all users of this kind of technique to make sure they are
> consistent. I also think your patch above needs to add a comment to say what
> its doing.
>
> thanks,
>
> -Jason
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-10-07 14:53:27

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Tue, 2009-10-06 at 19:42 -0700, Justin P. Mattock wrote:
>
>
>>>
>>>
>> o.k. applied your patch, but unfortunantly
>> I still am hitting this kernel panic.
>>
>> must admit I have no idea why this is doing this.
>> (but am willing to sit through this, because eventually
>> sooner or later will hit this if I update gcc).
>>
>
> But the panic you showed was that it could not find an init to execute.
> Which looks like a setup issue and not a kernel bug.
>
> -- Steve
>
>
>
>
That's what's getting me, i.e. if I compile
sysvinit normally without adding an SELinux patch
the system boots; as soon as I compile sysvinit with
SELinux support to load the policy, bam... I hit this.

I can send a post to SELinux and see what they think,
and then go from there.

Justin P. Mattock

2009-10-07 14:57:31

Subject: Re: system gets stuck in a lock during boot

On Wed, 2009-10-07 at 10:40 -0400, Mathieu Desnoyers wrote:
> * Jason Baron ([email protected]) wrote:
> > On Tue, Oct 06, 2009 at 10:02:01PM -0400, Steven Rostedt wrote:
> > > > So the problem I'm seeing is an oops on boot caused by the call->system pointer
> > > > deference in event_create_dir(). The 'call' variable is of type 'struct
> > > > ftrace_event_call'.
> > > >
> > > > What's going on is that the 'struct ftrace_event_call' is of size 168 bytes
> > > > (sizeof(struct ftrace_event_call)) = 168 = 0xA8. However, in memory the
> > > > structures are 16-byte aligned. Thus, the stride for walking through the
> > > > pointers needs to be 176 (0xB0), but instead its 168 causing the oops.
> > > >
> > > > I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
> > > > vanilla 2.6.31 kernel.
> > > >
> > > > That said, I'm not sure the compiler is doing the wrong thing here. The
> > > > 'struct ftrace_event_call' contains an embedded 'struct list_head' which
> > > > is 16 bytes. According to the gcc docs, the aligned attribute, 'specifies a
> > > > minimum alignment for the variable or structure field, measured in bytes'.
> > > > Thus, at least according to the docs, gcc can increase the alignment of the
> > > > 'struct ftrace_event_call', from its original specification of 4, to 16. Even
> > > > in the case where we are working corectly the structures are 8-byte aligned.
> > > >
> > > > Thus, I would reccommend the patch below as a preventive measure. Its
> > > > the minimal patch I've found to resolve this issue. In general, if we
> > > > are going to walk data structures embedded in a special elf section, I
> > > > think the general rules needs to be to set the alignment to the power of
> > > > two which is greater than or equal to the largest item in the structure.
> > > >
> > > > thanks,
> > > >
> > > > -Jason
> > > >
> > > > Signed-off-by: Jason Baron <[email protected]>
> > > >
> > > >
> > > > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > > > index a81170d..7182f03 100644
> > > > --- a/include/linux/ftrace_event.h
> > > > +++ b/include/linux/ftrace_event.h
> > > > @@ -124,7 +124,10 @@ struct ftrace_event_call {
> > > > atomic_t profile_count;
> > > > int (*profile_enable)(struct ftrace_event_call *);
> > > > void (*profile_disable)(struct ftrace_event_call *);
> > > > -};
> > > > +} __attribute__((aligned(16)));
> > > > +
> > > > +/* Align to the largest field in the data structure:
> > > > + * sizeof(struct list_head) = 16 */
> > >
> > > Is this true for i386?
> > >
> > > I just tried this patch and it seems to work. Can you give it a try.
> > >
> > > Signed-off-by: Steven Rostedt <[email protected]>
> > >
> > >
> > > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > > index 4ec5e67..044b70d 100644
> > > --- a/include/linux/ftrace_event.h
> > > +++ b/include/linux/ftrace_event.h
> > > @@ -133,7 +133,7 @@ struct ftrace_event_call {
> > > atomic_t profile_count;
> > > int (*profile_enable)(void);
> > > void (*profile_disable)(void);
> > > -};
> > > +} __attribute__((aligned(sizeof(struct list_head))));
>
> I don't like that.
>
> Basically, the vmlinux.lds.h linker script must have alignment
> statements before each section, which match the alignment of the section
> structures. Failure to do so would put padding at the beginning of the
> section, which is definitely not working at all. I don't see how we can
> automatically pass sizeof(struct list_head) to a linker script :/

OK, what about __attribute__((aligned((BITS_PER_LONG/8)*2)))

That should work in the linker script as well.

With the added comment:

/*
* We must align by the largest item in the structure. This happens
* to be the list_head, which consists of two pointers.
*/
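
A small sketch of why that expression is a reasonable stand-in for
sizeof(struct list_head), assuming GCC or Clang on an ABI where long and
pointers have the same width (true for Linux on i386 and x86_64). The
DEMO_/demo_ names are hypothetical, and DEMO_BITS_PER_LONG only imitates
the kernel's BITS_PER_LONG.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG (assumption: the
 * compiler predefines __SIZEOF_LONG__, as GCC and Clang do). */
#define DEMO_BITS_PER_LONG (__SIZEOF_LONG__ * 8)

/* Plain integer arithmetic, so the same expression can be evaluated by
 * the preprocessor when a linker script is generated. */
#define DEMO_EVENT_ALIGN ((DEMO_BITS_PER_LONG / 8) * 2)

/* Stand-in for the kernel's struct list_head: two pointers. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

struct demo_call {
	struct demo_list_head list;
	const char *name;
} __attribute__((aligned(DEMO_EVENT_ALIGN)));

static struct demo_call demo_sample = { { NULL, NULL }, "sample" };

/* Non-zero when the alignment constant equals the size of the list head,
 * which holds whenever sizeof(long) == sizeof(void *). */
static int demo_align_matches_list_head(void)
{
	return (size_t)DEMO_EVENT_ALIGN == sizeof(struct demo_list_head);
}

/* Non-zero when the object's address is a multiple of DEMO_EVENT_ALIGN. */
static int demo_is_aligned(const struct demo_call *call)
{
	return ((uintptr_t)call % DEMO_EVENT_ALIGN) == 0;
}
```

The constant works out to 8 on a 32-bit build and 16 on a 64-bit build,
without the linker script ever needing to know about struct list_head.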

>
> Mathieu
>
> > >
> > > #define FTRACE_MAX_PROFILE_SIZE 2048
> > >
> > > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > > index cc0d966..31e7637 100644
> > > --- a/include/trace/ftrace.h
> > > +++ b/include/trace/ftrace.h
> > > @@ -501,7 +501,6 @@ static void ftrace_profile_disable_##call(void) \
> > > * }
> > > *
> > > * static struct ftrace_event_call __used
> > > - * __attribute__((__aligned__(4)))
> > > * __attribute__((section("_ftrace_events"))) event_<call> = {
> > > * .name = "<call>",
> > > * .system = "<system>",
> > > @@ -619,7 +618,6 @@ static int ftrace_raw_init_event_##call(void) \
> > > } \
> > > \
> > > static struct ftrace_event_call __used \
> > > -__attribute__((__aligned__(4))) \
> > > __attribute__((section("_ftrace_events"))) event_##call = { \
> > > .name = #call, \
> > > .system = __stringify(TRACE_SYSTEM), \
> > >
> > >
> >
> > indeed your patch works as well for me, its much cleaner!
> >
> > However, I want to make sure this fix is sufficient and is the best way to
> > address this type of issue in general. For example, I know tracepoints are
> > using the aligned attribute in all 3 places -> definition, usage, and linker
> > alignment. (adding Mathieu to 'cc list). Is just the definition 'aligned'
> > sufficient? Also, once we find a method for solving these issues in general,
> > we need to review all users of this kind of technique to make sure they are
> > consistent. I also think your patch above needs to add a comment to say what
> > its doing.

Yes, I forgot to add the comment. One really does belong there.

-- Steve

> >
> > thanks,
> >
> > -Jason
> >
> >
>

2009-10-07 15:05:50

by Mathieu Desnoyers

Subject: Re: system gets stuck in a lock during boot

* Steven Rostedt ([email protected]) wrote:
> On Wed, 2009-10-07 at 10:40 -0400, Mathieu Desnoyers wrote:
> > * Jason Baron ([email protected]) wrote:
> > > On Tue, Oct 06, 2009 at 10:02:01PM -0400, Steven Rostedt wrote:
> > > > > So the problem I'm seeing is an oops on boot caused by the call->system pointer
> > > > > deference in event_create_dir(). The 'call' variable is of type 'struct
> > > > > ftrace_event_call'.
> > > > >
> > > > > What's going on is that the 'struct ftrace_event_call' is of size 168 bytes
> > > > > (sizeof(struct ftrace_event_call)) = 168 = 0xA8. However, in memory the
> > > > > structures are 16-byte aligned. Thus, the stride for walking through the
> > > > > pointers needs to be 176 (0xB0), but instead its 168 causing the oops.
> > > > >
> > > > > I've only seen this issue while using gcc (GCC) 4.5.0 20090916, on a
> > > > > vanilla 2.6.31 kernel.
> > > > >
> > > > > That said, I'm not sure the compiler is doing the wrong thing here. The
> > > > > 'struct ftrace_event_call' contains an embedded 'struct list_head' which
> > > > > is 16 bytes. According to the gcc docs, the aligned attribute, 'specifies a
> > > > > minimum alignment for the variable or structure field, measured in bytes'.
> > > > > Thus, at least according to the docs, gcc can increase the alignment of the
> > > > > 'struct ftrace_event_call', from its original specification of 4, to 16. Even
> > > > > in the case where we are working corectly the structures are 8-byte aligned.
> > > > >
> > > > > Thus, I would reccommend the patch below as a preventive measure. Its
> > > > > the minimal patch I've found to resolve this issue. In general, if we
> > > > > are going to walk data structures embedded in a special elf section, I
> > > > > think the general rules needs to be to set the alignment to the power of
> > > > > two which is greater than or equal to the largest item in the structure.
> > > > >
> > > > > thanks,
> > > > >
> > > > > -Jason
> > > > >
> > > > > Signed-off-by: Jason Baron <[email protected]>
> > > > >
> > > > >
> > > > > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > > > > index a81170d..7182f03 100644
> > > > > --- a/include/linux/ftrace_event.h
> > > > > +++ b/include/linux/ftrace_event.h
> > > > > @@ -124,7 +124,10 @@ struct ftrace_event_call {
> > > > > atomic_t profile_count;
> > > > > int (*profile_enable)(struct ftrace_event_call *);
> > > > > void (*profile_disable)(struct ftrace_event_call *);
> > > > > -};
> > > > > +} __attribute__((aligned(16)));
> > > > > +
> > > > > +/* Align to the largest field in the data structure:
> > > > > + * sizeof(struct list_head) = 16 */
> > > >
> > > > Is this true for i386?
> > > >
> > > > I just tried this patch and it seems to work. Can you give it a try.
> > > >
> > > > Signed-off-by: Steven Rostedt <[email protected]>
> > > >
> > > >
> > > > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > > > index 4ec5e67..044b70d 100644
> > > > --- a/include/linux/ftrace_event.h
> > > > +++ b/include/linux/ftrace_event.h
> > > > @@ -133,7 +133,7 @@ struct ftrace_event_call {
> > > > atomic_t profile_count;
> > > > int (*profile_enable)(void);
> > > > void (*profile_disable)(void);
> > > > -};
> > > > +} __attribute__((aligned(sizeof(struct list_head))));
> >
> > I don't like that.
> >
> > Basically, the vmlinux.lds.h linker script must have alignment
> > statements before each section, which match the alignment of the section
> > structures. Failure to do so would put padding at the beginning of the
> > section, which is definitely not working at all. I don't see how we can
> > automatically pass sizeof(struct list_head) to a linker script :/
>
> OK, what about __attribute__((aligned((BITS_PER_LONG/8)*2)))
>
> That should also work in the linker script as well.
>
> With the added comment:
>
> /*
> * We must aligned by the largest item in the structure. This happens
> * to be the list_head, which consists of two pointers.
> */
>

Yep, sounds good. Oddly we have to keep these in sync manually. I'd also
add a comment in the C code to tell whoever wants to change the size of
the structure to also check the linker script.

Also adding a BUILD_BUG_ON() that checks the structure sizeof() would be
a nice safety-net (this should probably be added to tracepoints too
eventually).
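
One possible shape for such a safety net, as a sketch only: the macro below
is a local imitation of the classic negative-array-size trick, not the
kernel's BUILD_BUG_ON() itself, and the struct and constant names are
hypothetical. DEMO_SECTION_ALIGN is assumed to mirror whatever ALIGN() value
the linker script uses.

```c
#include <stddef.h>

/* Fails to compile when cond is true, because the array size becomes
 * negative. A local stand-in, similar in spirit to BUILD_BUG_ON(). */
#define DEMO_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Assumed to mirror the ALIGN() used for the section in the linker script. */
#define DEMO_SECTION_ALIGN 16

struct demo_event_call {
	void *list_next;
	void *list_prev;
	const char *name;
	const char *system;
	int id;
} __attribute__((aligned(DEMO_SECTION_ALIGN)));

/* Returns the walking stride; the build breaks if the structure size or
 * alignment ever drifts away from DEMO_SECTION_ALIGN. */
static size_t demo_check_event_layout(void)
{
	DEMO_BUILD_BUG_ON(sizeof(struct demo_event_call) % DEMO_SECTION_ALIGN);
	DEMO_BUILD_BUG_ON(__alignof__(struct demo_event_call) != DEMO_SECTION_ALIGN);
	return sizeof(struct demo_event_call);
}
```

Because the check is an expression, it has to sit inside a function body,
so it needs a home in some C file that is always built.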

Mathieu

> >
> > Mathieu
> >
> > > >
> > > > #define FTRACE_MAX_PROFILE_SIZE 2048
> > > >
> > > > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > > > index cc0d966..31e7637 100644
> > > > --- a/include/trace/ftrace.h
> > > > +++ b/include/trace/ftrace.h
> > > > @@ -501,7 +501,6 @@ static void ftrace_profile_disable_##call(void) \
> > > > * }
> > > > *
> > > > * static struct ftrace_event_call __used
> > > > - * __attribute__((__aligned__(4)))
> > > > * __attribute__((section("_ftrace_events"))) event_<call> = {
> > > > * .name = "<call>",
> > > > * .system = "<system>",
> > > > @@ -619,7 +618,6 @@ static int ftrace_raw_init_event_##call(void) \
> > > > } \
> > > > \
> > > > static struct ftrace_event_call __used \
> > > > -__attribute__((__aligned__(4))) \
> > > > __attribute__((section("_ftrace_events"))) event_##call = { \
> > > > .name = #call, \
> > > > .system = __stringify(TRACE_SYSTEM), \
> > > >
> > > >
> > >
> > > indeed your patch works as well for me, its much cleaner!
> > >
> > > However, I want to make sure this fix is sufficient and is the best way to
> > > address this type of issue in general. For example, I know tracepoints are
> > > using the aligned attribute in all 3 places -> definition, usage, and linker
> > > alignment. (adding Mathieu to 'cc list). Is just the definition 'aligned'
> > > sufficient? Also, once we find a method for solving these issues in general,
> > > we need to review all users of this kind of technique to make sure they are
> > > consistent. I also think your patch above needs to add a comment to say what
> > > its doing.
>
> Yes, I forgot to add the comment. One really does belong there.
>
> -- Steve
>
> > >
> > > thanks,
> > >
> > > -Jason
> > >
> > >
> >
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-10-07 15:13:39

Subject: Re: system gets stuck in a lock during boot

On Wed, 2009-10-07 at 07:53 -0700, Justin P. Mattock wrote:
> Steven Rostedt wrote:
> > On Tue, 2009-10-06 at 19:42 -0700, Justin P. Mattock wrote:
> >
> >
> >>>
> >>>
> >> o.k. applied your patch, but unfortunantly
> >> I still am hitting this kernel panic.
> >>
> >> must admit I have no idea why this is doing this.
> >> (but am willing to sit through this, because eventually
> >> sooner or later will hit this if I update gcc).
> >>
> >
> > But the panic you showed was that it could not find an init to execute.
> > Which looks like a setup issue and not a kernel bug.
> >
> > -- Steve
> >
> >
> >
> >
> That's whats getting me, i.g. if I compile
> sysvinit normally without adding an SELinux patch
> the system boots, as soon as I compile sysvinit with
> SELinux support to load the policy, bam... I hit this.
>
> I can send a post to SELinux and see what they think,
> and the go from there.

Oh! It's an SELinux thing. It probably prevents you from executing
init ;-)

-- Steve

2009-10-07 15:16:23

Subject: Re: system gets stuck in a lock during boot

On Wed, 2009-10-07 at 11:05 -0400, Mathieu Desnoyers wrote:

> Also adding a BUILD_BUG_ON() that checks the structure sizeof() would be
> a nice safety-net (this should probably be added to tracepoints too
> eventually).

Yeah, I was thinking the same thing. But that would have to be added
somewhere in a C file.

-- Steve

2009-10-07 15:53:51

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Wed, 2009-10-07 at 07:53 -0700, Justin P. Mattock wrote:
>
>> Steven Rostedt wrote:
>>
>>> On Tue, 2009-10-06 at 19:42 -0700, Justin P. Mattock wrote:
>>>
>>>
>>>
>>>>>
>>>>>
>>>> o.k. applied your patch, but unfortunantly
>>>> I still am hitting this kernel panic.
>>>>
>>>> must admit I have no idea why this is doing this.
>>>> (but am willing to sit through this, because eventually
>>>> sooner or later will hit this if I update gcc).
>>>>
>>>>
>>> But the panic you showed was that it could not find an init to execute.
>>> Which looks like a setup issue and not a kernel bug.
>>>
>>> -- Steve
>>>
>>>
>>>
>>>
>>>
>> That's whats getting me, i.g. if I compile
>> sysvinit normally without adding an SELinux patch
>> the system boots, as soon as I compile sysvinit with
>> SELinux support to load the policy, bam... I hit this.
>>
>> I can send a post to SELinux and see what they think,
>> and the go from there.
>>
>
> Oh! It's an SELinux thing. It probably prevents you from executing
> init ;-)
>
> -- Steve
>
>
>
>
What I think is happening is that while building
sysvinit with the SELinux patch, sysvinit is looking
in /lib for libselinux (but I could be wrong), while libselinux is in
/lib64.
(Here's my excuse as a newbie:
part of the pain when building an x86_64 multilib system is building
everything to point to lib64; pure64 with the soft link
lib64 -> lib makes life so much easier.)

I'm going to look at that patch and see how it tells
-lselinux -lsepol to find the libs.
(If this is the case then I must admit I am a real
newbie, and apologize for the confusion.)

Justin P. Mattock

2009-10-07 16:08:23

Subject: Re: system gets stuck in a lock during boot

On Wed, 2009-10-07 at 08:52 -0700, Justin P. Mattock wrote:

> >
> What I think is happening is while building
> sysvinit with the SELinux patch, sysvinit is looking
> in /lib for libselinux(but could be wrong) but libselinux is in
> /lib64.
> (here's my excuse as a newbie:)
> part of a pain when building an x86_64 multilib is building
> everything to point to lib64(pure64 with the soft link
> lib64 -> lib makes life so much easier).
>
> I'm going to look at that patch and see how it tells
> -lselinux -lsepol to find the libs.
> (if this is the case then I must admit I am a real
> newbie, and apologize for the confusion).

Justin,

Just being able to monkey around with the init code on top of SELinux
already qualifies you to be well beyond a newbie ;-)

-- Steve

2009-10-07 17:47:33

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Wed, 2009-10-07 at 08:52 -0700, Justin P. Mattock wrote:
>
>
>>>
>>>
>> What I think is happening is while building
>> sysvinit with the SELinux patch, sysvinit is looking
>> in /lib for libselinux(but could be wrong) but libselinux is in
>> /lib64.
>> (here's my excuse as a newbie:)
>> part of a pain when building an x86_64 multilib is building
>> everything to point to lib64(pure64 with the soft link
>> lib64 -> lib makes life so much easier).
>>
>> I'm going to look at that patch and see how it tells
>> -lselinux -lsepol to find the libs.
>> (if this is the case then I must admit I am a real
>> newbie, and apologize for the confusion).
>>
>
> Justin,
>
> Just being able to monkey around with the init code on top of SELinux
> already qualifies you to be well beyond a newbie ;-)
>
> -- Steve
>
>
>
>
Thanks man!! (I really appreciate that.)
I'm looking at sysvinit right now.
I'm thinking all it really needs is
LIBDIR/LDFLAGS=-L/lib64 -lselinux -lsepol
in the Makefile or something in that area.

Justin P. Mattock

2009-10-07 18:46:44

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Wed, 2009-10-07 at 08:52 -0700, Justin P. Mattock wrote:
>
>
>>>
>>>
>> What I think is happening is while building
>> sysvinit with the SELinux patch, sysvinit is looking
>> in /lib for libselinux(but could be wrong) but libselinux is in
>> /lib64.
>> (here's my excuse as a newbie:)
>> part of a pain when building an x86_64 multilib is building
>> everything to point to lib64(pure64 with the soft link
>> lib64 -> lib makes life so much easier).
>>
>> I'm going to look at that patch and see how it tells
>> -lselinux -lsepol to find the libs.
>> (if this is the case then I must admit I am a real
>> newbie, and apologize for the confusion).
>>
>
> Justin,
>
> Just being able to monkey around with the init code on top of SELinux
> already qualifies you to be well beyond a newbie ;-)
>
> -- Steve
>
>
>
>
Yep, like I said, I owe you guys an apology for
my mistake (thanks for taking the time).

From what it looks like, init was looking for libselinux
in /lib instead of /lib64.
After adjusting the Makefile and adding:
LDFLAGS = -s -L/lib64 -lselinux -lsepol
LIBDIR=/lib64
the system boots right up.
(sh*t, I built everything to point to lib64 (xserver and all), and
forgot to modify init.c to do the same).

Justin P. Mattock

2009-10-07 18:57:31

Subject: Re: system gets stuck in a lock during boot

On Wed, 2009-10-07 at 11:45 -0700, Justin P. Mattock wrote:

> >
> yep like I said I owe you guys an apology, for
> my mistake(thanks for taking the time).

Well, thank you for reporting it. If nobody reports any problems, we
just assume everything is OK, and as people curse us out and look for
alternatives, we will be sipping our mai-tais in ignorant bliss.

-- Steve

>
> From what it looks like init was looking for libselinux
> in /lib instead of /lib64.
> after adjusting the Makefile and adding:
> LDFLAGS = -s -L/lib64 -lselinux -lsepol
> LIBDIR=/lib64
> the system boots right up.
> (sh*t built everything to point to lib64(xserver and all).., and
> forgot the modify init.c to do the same).
>
> Justin P. Mattock
>

2009-10-07 19:08:48

Subject: Re: system gets stuck in a lock during boot

Steven Rostedt wrote:
> On Wed, 2009-10-07 at 11:45 -0700, Justin P. Mattock wrote:
>
>
>>>
>>>
>> yep like I said I owe you guys an apology, for
>> my mistake(thanks for taking the time).
>>
>
> Well, thank you for reporting it. If nobody reports any problems, we
> just assume everything is OK, and as people curse us out and look for
> alternatives, we will be sipping our mai-tais in ignorant bliss.
>
> -- Steve
>
>
Agreed.
My main concern when reporting is to make sure
that it's a real issue, and not a mistake like this one.
Anyways, a mai-tai sure does sound good
(sign me up!!)
>> From what it looks like init was looking for libselinux
>> in /lib instead of /lib64.
>> after adjusting the Makefile and adding:
>> LDFLAGS = -s -L/lib64 -lselinux -lsepol
>> LIBDIR=/lib64
>> the system boots right up.
>> (sh*t built everything to point to lib64(xserver and all).., and
>> forgot the modify init.c to do the same).
>>
>> Justin P. Mattock
>>
>>
>
>
>
Justin P. Mattock

2009-10-12 10:18:49

by Ingo Molnar

Subject: Re: system gets stuck in a lock during boot

* Justin P. Mattock <[email protected]> wrote:

> Steven Rostedt wrote:
>> On Wed, 2009-10-07 at 11:45 -0700, Justin P. Mattock wrote:
>>
>>
>>>>
>>>>
>>> yep like I said I owe you guys an apology, for
>>> my mistake(thanks for taking the time).
>>>
>>
>> Well, thank you for reporting it. If nobody reports any problems, we
>> just assume everything is OK, and as people curse us out and look for
>> alternatives, we will be sipping our mai-tais in ignorant bliss.
>>
>> -- Steve
>>
>>
>
> agree, my main concern when reporting is to make sure that it's a real
> issue, and not a mistake like this one.

Well, it can be really hard to filter out whether a problem is somehow
self-inflicted or caused by the kernel proper - thus we generally prefer
over-reporting to under-reporting.

Even if it's self-inflicted it's useful on a meta level: if many people
report it then we might end up adding more safety guards to disable a
particular common pattern of shoot-foot-self incidents.

Ingo

2009-10-12 18:17:10