Vivek Goyal wrote:
>
> > Ok. The point here is to know which task/cpu called panic, rather
> > than to get the task_struct. That makes a lot of sense, and is
> > cheap to get. Any note on the crashing cpu that is not captured
> > by another cpu will give us that information.
> >
>
> That makes sense. Sheer presence of a particular note can identify the
> crashing cpu and no need to store "current". Only crashing cpu is going
> to store that note and that too after respective NT_PRSTATUS.
>
> > My primary concern is while the concept of a task_struct is pretty
> > stable who is to know how the kernel will change in the future. So
> > if we don't need to export a task_struct pointer and merely need
> > to flag the cpu that called panic we can do that much more reliably.
Just flagging the cpu, and then mapping that to the stack pointer found in
the associated NT_PRSTATUS register set should work OK too. It gets
a little muddy if it crashed while running on an IRQ stack, but it still can be
tracked back from there as well. (although not if the crashing task overflowed
the IRQ stack)
The task_struct would be ideal though -- if the kernel's use of task_structs
changes in the future, well, then crash is going to need a serious re-write
anyway... FWIW, netdump and diskdump use the NT_TASKSTRUCT note
note to store just the "current" pointer, and not the whole task_struct itself,
which would just be a waste of space in the ELF header for crash's purposes.
And looking at the gdb sources, it appears to be totally ignored. Who
uses the NT_TASKSTRUCT note anyway?
>
> > We do need a way that we can test to see if a core dump
> > actually matches the vmlinux we are looking at. Probably
> > this is capturing all of the information captured by linux/version.h
> > and linux/compile.h both at runtime and at compile time and
> > checking to see if they match.
> >
>
> I quickly browsed through "crash" code and looks like it is already doing
> similiar check (kernel.c, verify_version()). It seems to be retrieving
> "linux_banner" from core image and also retrieving banner string from vmlinux
> and trying to match. So if banner information can be directly read from the
> core image, probably there is no need to export it through notes.
>
That's correct.
Dave
Dave Anderson <[email protected]> writes:
> Just flagging the cpu, and then mapping that to the stack pointer found in
> the associated NT_PRSTATUS register set should work OK too. It gets
> a little muddy if it crashed while running on an IRQ stack, but it still can be
> tracked back from there as well. (although not if the crashing task overflowed
> the IRQ stack)
You can't track it back from the crashing cpu if the IRQ stack overflows
either. So I would rather have crash confused when trying to find the
task_struct. Then to have the kernel fail avoidably while attempting
to capture a core dump.
Even if you overflow the stack wit a bit of detective work it should still
be possible to show the stack overflowed and correct for it when analyzing
the crash dump. Doing anything like that from a crashing cpu (in a
reliable way) is very hard.
> The task_struct would be ideal though -- if the kernel's use of task_structs
> changes in the future, well, then crash is going to need a serious re-write
> anyway... FWIW, netdump and diskdump use the NT_TASKSTRUCT note
> note to store just the "current" pointer, and not the whole task_struct itself,
> which would just be a waste of space in the ELF header for crash's purposes.
> And looking at the gdb sources, it appears to be totally ignored. Who
> uses the NT_TASKSTRUCT note anyway?
Good question, especially as the kernel exports whatever we have for
a task struct today in the ELF note. No ABI compatibility is
maintained.
Given all of that I recommend an empty NT_TASKSTRUCT to flag the
crashing cpu, for now.
Eric
Sorry. Reposting since it did not made to LKML due to my stupid mistake;
Also posting Dave Anderson's reponse:
Eric W. Biederman wrote:
>Dave Anderson <[email protected]> writes:
>
>
>
>>Just flagging the cpu, and then mapping that to the stack pointer found in
>>the associated NT_PRSTATUS register set should work OK too. It gets
>>a little muddy if it crashed while running on an IRQ stack, but it still can be
>>tracked back from there as well. (although not if the crashing task overflowed
>>the IRQ stack)
>>
>>
>
>You can't track it back from the crashing cpu if the IRQ stack overflows
>either. So I would rather have crash confused when trying to find the
>task_struct. Then to have the kernel fail avoidably while attempting
>to capture a core dump.
>
>Even if you overflow the stack wit a bit of detective work it should still
>be possible to show the stack overflowed and correct for it when analyzing
>the crash dump. Doing anything like that from a crashing cpu (in a
>reliable way) is very hard.
>
>
>
>>The task_struct would be ideal though -- if the kernel's use of task_structs
>>changes in the future, well, then crash is going to need a serious re-write
>>anyway... FWIW, netdump and diskdump use the NT_TASKSTRUCT note
>>note to store just the "current" pointer, and not the whole task_struct itself,
>>which would just be a waste of space in the ELF header for crash's purposes.
>>And looking at the gdb sources, it appears to be totally ignored. Who
>>uses the NT_TASKSTRUCT note anyway?
>>
>>
>
>Good question, especially as the kernel exports whatever we have for
>a task struct today in the ELF note. No ABI compatibility is
>maintained.
>
>Given all of that I recommend an empty NT_TASKSTRUCT to flag the
>crashing cpu, for now.
>
>
At present /proc/kcore writes the complete task structure for
NT_TASKSTRUCT note section. Thought it is the standard. Hence created
separate note section. The other option is the crash tool can directly
read "crashing_cpu variable" from the vmcore to determine the panic cpu.
Similarly, we can define panic_task variable in the kernel.
Dave Anderson ([email protected]) reponse:
" So does elf_core_dump() as well, but to gdb it's useless AFAICT...
Hey -- I wasn't even aware of the "crashing_cpu" variable.
That would work just fine.
Still a "panic_task", and perhaps even a "crash_page_size" variable
would be nice as well. No additional notes required...
Dave "
Basically, we can use some global structure in the kernel and dump any
needed information which we do not need to invoke any analysis tools
(crash, gdb). Dumping CPU control registers can also be done this way
without creating separate note section.
Thanks
Haren
>Eric
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>fastboot mailing list
>[email protected]
>https://lists.osdl.org/mailman/listinfo/fastboot
>
>
On Thu, Sep 22, 2005 at 10:31:52AM -0600, Eric W. Biederman wrote:
> Dave Anderson <[email protected]> writes:
>
> > Just flagging the cpu, and then mapping that to the stack pointer found in
> > the associated NT_PRSTATUS register set should work OK too. It gets
> > a little muddy if it crashed while running on an IRQ stack, but it still can be
> > tracked back from there as well. (although not if the crashing task overflowed
> > the IRQ stack)
>
> You can't track it back from the crashing cpu if the IRQ stack overflows
> either. So I would rather have crash confused when trying to find the
> task_struct. Then to have the kernel fail avoidably while attempting
> to capture a core dump.
>
> Even if you overflow the stack wit a bit of detective work it should still
> be possible to show the stack overflowed and correct for it when analyzing
> the crash dump. Doing anything like that from a crashing cpu (in a
> reliable way) is very hard.
>
> > The task_struct would be ideal though -- if the kernel's use of task_structs
> > changes in the future, well, then crash is going to need a serious re-write
> > anyway... FWIW, netdump and diskdump use the NT_TASKSTRUCT note
> > note to store just the "current" pointer, and not the whole task_struct itself,
> > which would just be a waste of space in the ELF header for crash's purposes.
> > And looking at the gdb sources, it appears to be totally ignored. Who
> > uses the NT_TASKSTRUCT note anyway?
>
> Good question, especially as the kernel exports whatever we have for
> a task struct today in the ELF note. No ABI compatibility is
> maintained.
>
> Given all of that I recommend an empty NT_TASKSTRUCT to flag the
> crashing cpu, for now.
>
I got a concern here. Are we not breaking the convention. NT_TASKSTRUCT note
type represents that task_sturct is stored in note data. elf_core_dump()
and /proc/kcore already do that (That's a different story that it might not
be needed at all). Now if we store a null NT_TASKSTURCT, same note type will
carry two meanings.
IMHO, introducing a null NT_KDUMPINFO will help in that sense, at least
there are no two interpretations of same note type. Also it provides the
scope to add more elements to it if need be.
Thanks
Vivek
Vivek Goyal <[email protected]> writes:
> I got a concern here. Are we not breaking the convention. NT_TASKSTRUCT note
> type represents that task_sturct is stored in note data. elf_core_dump()
> and /proc/kcore already do that (That's a different story that it might not
> be needed at all). Now if we store a null NT_TASKSTURCT, same note type will
> carry two meanings.
My impression is that NT_TASKSTRUCT has failed to define what actually appears
in the note, as the kernel task_struct is not designed to be exported to
user space, and can change without warning.
Hmm. NT_TASKSTRUCT feels like an information leak although I haven't
a clue what kernel information you could leak that would be interesting
enough that you could compromise the kernel. It looks like we should
export NT_TASKSTRUCT until we can decide what goes there.
> IMHO, introducing a null NT_KDUMPINFO will help in that sense, at least
> there are no two interpretations of same note type. Also it provides the
> scope to add more elements to it if need be.
Agreed. I was trying too hard to reuse what was already there.
Eric
task_struct is an internal structure to the kernel with
a lot of good information, that is probably interesting in
core dumps. However there is no way for user space to know
what format that information is in making it useless.
I grepped the GDB 6.3 source code and NT_TASKSTRUCT while
defined is not used anywhere else. So I would be surprised
if anyone notices it is missing.
In addition exporting kernel pointers to all the interesting
kernel data structures sounds like the very definition of an
information leak. I haven't a clue what someone with evil
intentions could do with that information, but in any attack
against the kernel it looks like this is the perfect tool for
aiming that attack.
So since NT_TASKSTRUCT is useless as currently defined and
is potentially dangerous, let's just not export it.
Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/mips/kernel/irixelf.c | 17 ++++++-----------
fs/binfmt_elf.c | 4 +---
2 files changed, 7 insertions(+), 14 deletions(-)
06d116247caa2fef10cb73c5d6042a5ff658b677
diff --git a/arch/mips/kernel/irixelf.c b/arch/mips/kernel/irixelf.c
--- a/arch/mips/kernel/irixelf.c
+++ b/arch/mips/kernel/irixelf.c
@@ -1064,8 +1064,8 @@ static int irix_core_dump(long signr, st
struct elfhdr elf;
off_t offset = 0, dataoff;
int limit = current->signal->rlim[RLIMIT_CORE].rlim_cur;
- int numnote = 4;
- struct memelfnote notes[4];
+ int numnote = 3;
+ struct memelfnote notes[3];
struct elf_prstatus prstatus; /* NT_PRSTATUS */
elf_fpregset_t fpu; /* NT_PRFPREG */
struct elf_prpsinfo psinfo; /* NT_PRPSINFO */
@@ -1198,20 +1198,15 @@ static int irix_core_dump(long signr, st
}
strlcpy(psinfo.pr_fname, current->comm, sizeof(psinfo.pr_fname));
- notes[2].name = "CORE";
- notes[2].type = NT_TASKSTRUCT;
- notes[2].datasz = sizeof(*current);
- notes[2].data = current;
-
/* Try to dump the FPU. */
prstatus.pr_fpvalid = dump_fpu (regs, &fpu);
if (!prstatus.pr_fpvalid) {
numnote--;
} else {
- notes[3].name = "CORE";
- notes[3].type = NT_PRFPREG;
- notes[3].datasz = sizeof(fpu);
- notes[3].data = &fpu;
+ notes[2].name = "CORE";
+ notes[2].type = NT_PRFPREG;
+ notes[2].datasz = sizeof(fpu);
+ notes[2].data = &fpu;
}
/* Write notes phdr entry. */
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1503,9 +1503,7 @@ static int elf_core_dump(long signr, str
fill_psinfo(psinfo, current->group_leader, current->mm);
fill_note(notes +1, "CORE", NT_PRPSINFO, sizeof(*psinfo), psinfo);
- fill_note(notes +2, "CORE", NT_TASKSTRUCT, sizeof(*current), current);
-
- numnote = 3;
+ numnote = 2;
auxv = (elf_addr_t *) current->mm->saved_auxv;