On Saturday 28 Feb 2004 5:35 am, George Anzinger wrote:
> Amit S. Kale wrote:
> > On Friday 06 Feb 2004 6:54 pm, Andi Kleen wrote:
> >>On Fri, 6 Feb 2004 18:35:16 +0530
> >>
> >>"Amit S. Kale" <[email protected]> wrote:
> >>>On Friday 06 Feb 2004 5:46 pm, Andi Kleen wrote:
> >>>>On Fri, 6 Feb 2004 17:28:36 +0530
> >>>>
> >>>>"Amit S. Kale" <[email protected]> wrote:
> >>>>>On Friday 06 Feb 2004 7:50 am, Andi Kleen wrote:
> >>>>>>On Thu, 5 Feb 2004 23:20:04 +0530
> >>>>>>
> >>>>>>"Amit S. Kale" <[email protected]> wrote:
> >>>>>>>On Thursday 05 Feb 2004 8:41 am, Andi Kleen wrote:
> >>>>>>>>Andrew Morton <[email protected]> writes:
> >>>>>>>>>need to take a look at such things and really convince
> >>>>>>>>>ourselves that they're worthwhile. Personally, I'd only be
> >>>>>>>>>interested in the basic stub.
> >>>>>>>>
> >>>>>>>>What I found always extremely ugly in the i386 stub was that it
> >>>>>>>>uses magic globals to talk to the page fault handler. For the
> >>>>>>>>x86-64 version I replaced that by just using __get/__put_user
> >>>>>>>>in the memory accesses, which is much cleaner. I would suggest
> >>>>>>>>doing that for i386 too.
> >>>>>>>
> >>>>>>>Maybe I am missing something obvious. When debugging a page
> >>>>>>>fault handler if kgdb accesses a swapped-out user page doesn't
> >>>>>>>it deadlock when trying to hold mm semaphore?
> >>>>>>
> >>>>>>Modern i386 kernels don't grab the mm semaphore when the access
> >>>>>>is >= TASK_SIZE and the access came from kernel space (actually I see
> >>>>>>x86-64 still does, but that's a bug, will fix). You could only see
> >>>>>>a deadlock when using user addresses and you already hold the mm
> >>>>>>semaphore for writing (normal read lock is ok). Just don't do that.
> >>>>>
> >>>>>OK. It doesn't deadlock when kgdb accesses kernel addresses.
> >>>>>
> >>>>>When a user space address is accessed through kgdb, won't the kernel
> >>>>>attempt to fault in the user page? We don't want that to happen
> >>>>>inside kgdb.
> >>>>
> >>>>Yes, it will. But I don't think it's a bad thing. If the user doesn't
> >>>>want that they should not follow user addresses. After all kgdb is for
> >>>>people who know what they are doing.
> >>>
> >>>Let kgdb refuse to access any addresses below TASK_SIZE. That's better
> >>>than accidentally typing something and getting lost.
> >>
> >>That's fine. But can you perhaps add a magic command that enables it
> >> again?
> >
> > Yes. This sounds good.
>
> This could be a flag in the kgdb_info structure. See -mm kgdb. Does not
> require any new commands as it is just a global the user can change.
Having all user-modifiable variables in one place is definitely a good idea.
I need to do this sometime.
-Amit
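The policy agreed on above (refuse addresses below TASK_SIZE unless the user flips a global flag) could be sketched roughly as follows. This is a user-space illustration, not code from any kgdb patch: `KGDB_TASK_SIZE`, `kgdb_allow_user_access` and `kgdb_may_access` are hypothetical names, and the 0xC0000000 value assumes the common i386 3G/1G split.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the usual i386 3G/1G split; a real stub would use TASK_SIZE. */
#define KGDB_TASK_SIZE 0xC0000000u

/* Hypothetical global that the gdb user can change, e.g. with "set var". */
static int kgdb_allow_user_access = 0;

/* Return true if the stub should even attempt to touch [addr, addr+len). */
static bool kgdb_may_access(uint32_t addr, uint32_t len)
{
    if (len == 0)
        return false;
    /* Reject ranges that wrap around the top of the address space. */
    if ((uint32_t)(addr + len - 1) < addr)
        return false;
    /* Kernel addresses are always allowed. */
    if (addr >= KGDB_TASK_SIZE)
        return true;
    /* User addresses only when explicitly enabled by the user. */
    return kgdb_allow_user_access != 0;
}
```

Because the override is a plain global, no new gdb command is needed; the user would just change the variable from the debugger.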
Amit S. Kale wrote:
> On Saturday 28 Feb 2004 5:35 am, George Anzinger wrote:
>
>>Amit S. Kale wrote:
>>
>>>On Friday 06 Feb 2004 6:54 pm, Andi Kleen wrote:
>>>
>>>>On Fri, 6 Feb 2004 18:35:16 +0530
>>>>
>>>>"Amit S. Kale" <[email protected]> wrote:
>>>>
>>>>>On Friday 06 Feb 2004 5:46 pm, Andi Kleen wrote:
>>>>>
>>>>>>On Fri, 6 Feb 2004 17:28:36 +0530
>>>>>>
>>>>>>"Amit S. Kale" <[email protected]> wrote:
>>>>>>
>>>>>>>On Friday 06 Feb 2004 7:50 am, Andi Kleen wrote:
>>>>>>>
>>>>>>>>On Thu, 5 Feb 2004 23:20:04 +0530
>>>>>>>>
>>>>>>>>"Amit S. Kale" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>On Thursday 05 Feb 2004 8:41 am, Andi Kleen wrote:
>>>>>>>>>
>>>>>>>>>>Andrew Morton <[email protected]> writes:
>>>>>>>>>>
>>>>>>>>>>>need to take a look at such things and really convice
>>>>>>>>>>>ourselves that they're worthwhile. Personally, I'd only be
>>>>>>>>>>>interested in the basic stub.
>>>>>>>>>>
>>>>>>>>>>What I found always extremly ugly in the i386 stub was that it
>>>>>>>>>>uses magic globals to talk to the page fault handler. For the
>>>>>>>>>>x86-64 version I replaced that by just using __get/__put_user
>>>>>>>>>>in the memory accesses, which is much cleaner. I would suggest
>>>>>>>>>>doing that for i386 too.
>>>>>>>>>
>>>>>>>>>May be I am missing something obvious. When debugging a page
>>>>>>>>>fault handler if kgdb accesses an swapped-out user page doesn't
>>>>>>>>>it deadlock when trying to hold mm semaphore?
>>>>>>>>
>>>>>>>>Modern i386 kernels don't grab the mm semaphore when the access
>>>>>>>>is >= TASK_SIZE and the access came from kernel space (actually I see
>>>>>>>>x86-64 still does, but that's a bug, will fix). You could only see
>>>>>>>>a deadlock when using user addresses and you already hold the mm
>>>>>>>>semaphore for writing (normal read lock is ok). Just don't do that.
>>>>>>>
>>>>>>>OK. It don't deadlock when kgdb accesses kernel addresses.
>>>>>>>
>>>>>>>When a user space address is accessed through kgdb, won't the kernel
>>>>>>>attempt to fault in the user page? We don't want that to happen
>>>>>>>inside kgdb.
>>>>>>
>>>>>>Yes, it will. But I don't think it's a bad thing. If the users doesn't
>>>>>>want that they should not follow user addresses. After all kgdb is for
>>>>>>people who know what they are doing.
>>>>>
>>>>>Let kgdb refuse to access any addresses below TASK_SIZE. That's better
>>>>>than accidentally typing something and getting lost.
>>>>
>>>>That's fine. But can you perhaps add a magic command that enables it
>>>>again?
>>>
>>>Yes. This sounds good.
>>
>>This could be a flag in the kgdb_info structure. See -mm kgdb. Does not
>>require any new commands as it is just a global the user can change.
>
>
> Having all user modifiable variables in one place is definitely a good idea.
> Need to do this sometime.
I also put the discovered status of each cpu other than the current one here.
The structure is explained in the documentation part of the -mm patch. Another
thing I put here and have found to be very useful is the address that the stub
was called from. Often it is not clear just why we are in the stub, given that
we trap such things as kernel page faults, NMI watchdog, BUG macros and such. I
have found, on occasion, that knowing this info has been key to understanding
what was wrong.
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
George Anzinger <[email protected]> wrote:
>
> Often it is not clear just why we are in the stub, given that
> we trap such things as kernel page faults, NMI watchdog, BUG macros and such.
Yes, that can be confusing. A little printk on the console prior to
entering the debugger would be nice.
Andrew Morton wrote:
> George Anzinger <[email protected]> wrote:
>
>> Often it is not clear just why we are in the stub, given that
>>we trap such things as kernel page faults, NMI watchdog, BUG macros and such.
>
>
> Yes, that can be confusing. A little printk on the console prior to
> entering the debugger would be nice.
That assumes that one can do a printk and not run into a lock. Far better
IMNSHO is to provide a simple way to get it from gdb. One can then even provide
a gdb macro to print the relevant source line and its surrounds. In my lighter
moments I call this the comefrom macro :) In my kgdb it would look like:
l * kgdb_info.called_from
--
George Anzinger [email protected]
On Wednesday 03 Mar 2004 5:22 am, George Anzinger wrote:
> Andrew Morton wrote:
> > George Anzinger <[email protected]> wrote:
> >> Often it is not clear just why we are in the stub, given that
> >>we trap such things as kernel page faults, NMI watchdog, BUG macros and
> >> such.
> >
> > Yes, that can be confusing. A little printk on the console prior to
> > entering the debugger would be nice.
>
> That assumes that one can do a printk and not run into a lock. Far better
> IMNSHO is to provide a simple way to get it from gdb. One can then even
> provide a gdb macro to print the relevant source line and its surrounds. I
> my lighter moments I call this the comefrom macro :) In my kgdb it would
> look like:
>
> l * kgdb_info.called_from
How about echoing "Waiting for gdb connection" straight into the serial line
without any encoding? Since gdb won't be connected to the other end, and
minicom is often running there instead, it'll give the user an
indication that kgdb is ready.
--
Amit Kale
EmSysSoft (http://www.emsyssoft.com)
KGDB: Linux Kernel Source Level Debugger (http://kgdb.sourceforge.net)
On Tue, Mar 02, 2004 at 01:27:51PM -0800, Andrew Morton wrote:
> George Anzinger <[email protected]> wrote:
> >
> > Often it is not clear just why we are in the stub, given that
> > we trap such things as kernel page faults, NMI watchdog, BUG macros and such.
>
> Yes, that can be confusing. A little printk on the console prior to
> entering the debugger would be nice.
What I did for kdb and panic some time ago was to flash the keyboard
lights. If you use a unique frequency (different from kdb
and from panic) it works quite nicely.
-Andi
On Wed, Mar 03, 2004 at 10:38:39AM +0530, Amit S. Kale wrote:
> On Wednesday 03 Mar 2004 5:22 am, George Anzinger wrote:
> > Andrew Morton wrote:
> > > George Anzinger <[email protected]> wrote:
> > >> Often it is not clear just why we are in the stub, given that
> > >>we trap such things as kernel page faults, NMI watchdog, BUG macros and
> > >> such.
> > >
> > > Yes, that can be confusing. A little printk on the console prior to
> > > entering the debugger would be nice.
> >
> > That assumes that one can do a printk and not run into a lock. Far better
> > IMNSHO is to provide a simple way to get it from gdb. One can then even
> > provide a gdb macro to print the relevant source line and its surrounds. I
> > my lighter moments I call this the comefrom macro :) In my kgdb it would
> > look like:
> >
> > l * kgdb_info.called_from
>
> How about echoing "Waiting for gdb connection" stright into the serial line
> without any encoding? Since gdb won't be connected to the other end, and many
> a times a minicom could be running at the other end, it'll give a user an
> indication of kgdb being ready.
It's not "GDB is ready", it's "GDB is ready now because ..."
--
Tom Rini
http://gate.crashing.org/~trini/
Andi Kleen wrote:
> On Tue, Mar 02, 2004 at 01:27:51PM -0800, Andrew Morton wrote:
>
>>George Anzinger <[email protected]> wrote:
>>
>>> Often it is not clear just why we are in the stub, given that
>>>we trap such things as kernel page faults, NMI watchdog, BUG macros and such.
>>
>>Yes, that can be confusing. A little printk on the console prior to
>>entering the debugger would be nice.
>
>
> What I did for kdb and panic some time ago was to flash the keyboard
> lights. If you use a unique frequency (different from kdb
> and from panic) it works quite nicely.
Assuming a keyboard and a clear (no spin locks) path to it. Still, it only says
we are in kgdb, not why.
--
George Anzinger [email protected]
Amit S. Kale wrote:
> On Wednesday 03 Mar 2004 5:22 am, George Anzinger wrote:
>
>>Andrew Morton wrote:
>>
>>>George Anzinger <[email protected]> wrote:
>>>
>>>>Often it is not clear just why we are in the stub, given that
>>>>we trap such things as kernel page faults, NMI watchdog, BUG macros and
>>>>such.
>>>
>>>Yes, that can be confusing. A little printk on the console prior to
>>>entering the debugger would be nice.
>>
>>That assumes that one can do a printk and not run into a lock. Far better
>>IMNSHO is to provide a simple way to get it from gdb. One can then even
>>provide a gdb macro to print the relevant source line and its surrounds. I
>>my lighter moments I call this the comefrom macro :) In my kgdb it would
>>look like:
>>
>>l * kgdb_info.called_from
>
>
> How about echoing "Waiting for gdb connection" stright into the serial line
> without any encoding? Since gdb won't be connected to the other end, and many
> a times a minicom could be running at the other end, it'll give a user an
> indication of kgdb being ready.
Uh, different solution for a different problem. The above command to gdb causes
the source code around the location "kgdb_info.called_from" to be displayed. In
the -mm version, this location is filled in by kgdb with the return address
of "kgdb_handle_exception()". This allows you to see just why you are in kgdb.
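As a rough illustration of how a stub entry point can record where it was called from, here is a minimal sketch using the GCC/Clang builtin `__builtin_return_address`. The structure and function names carry a `_sketch` suffix because they are stand-ins; only the `called_from` field name comes from the description above, and the real -mm code may capture the address differently.

```c
#include <stddef.h>

/* Hypothetical stand-in for the -mm kgdb_info structure. */
struct kgdb_info_sketch {
    void *called_from;   /* address the stub entry will return to */
};

static struct kgdb_info_sketch kgdb_info_sketch;

/*
 * noinline keeps this a real call, so the recorded return address
 * belongs to the code that invoked the stub (trap handler, BUG, ...).
 */
__attribute__((noinline))
void kgdb_handle_exception_sketch(void)
{
    /* GCC/Clang builtin: return address of the current function. */
    kgdb_info_sketch.called_from = __builtin_return_address(0);

    /* ... the rest of the stub would run here ... */
}
```

With such a field filled in, `l * kgdb_info.called_from` in gdb lists the source around the call site.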
--
George Anzinger [email protected]
On Wed, 03 Mar 2004 16:43:47 -0800
George Anzinger <[email protected]> wrote:
> Andi Kleen wrote:
> > On Tue, Mar 02, 2004 at 01:27:51PM -0800, Andrew Morton wrote:
> >
> >>George Anzinger <[email protected]> wrote:
> >>
> >>> Often it is not clear just why we are in the stub, given that
> >>>we trap such things as kernel page faults, NMI watchdog, BUG macros and such.
> >>
> >>Yes, that can be confusing. A little printk on the console prior to
> >>entering the debugger would be nice.
> >
> >
> > What I did for kdb and panic some time ago was to flash the keyboard
> > lights. If you use a unique frequency (different from kdb
> > and from panic) it works quite nicely.
>
> Assuming a key board and a clear (no spin locks) path to it. Still it only says
I think it's reasonable to just write to the keyboard without any locking.
The keyboard driver will recover.
> we are in kgdb, now why.
The big advantage is that it works even when you are in X (like most people),
where printks are often not visible.
-Andi
On Thursday 04 Mar 2004 6:20 am, Andi Kleen wrote:
> On Wed, 03 Mar 2004 16:43:47 -0800
>
> George Anzinger <[email protected]> wrote:
> > Andi Kleen wrote:
> > > On Tue, Mar 02, 2004 at 01:27:51PM -0800, Andrew Morton wrote:
> > >>George Anzinger <[email protected]> wrote:
> > >>> Often it is not clear just why we are in the stub, given that
> > >>>we trap such things as kernel page faults, NMI watchdog, BUG macros
> > >>> and such.
> > >>
> > >>Yes, that can be confusing. A little printk on the console prior to
> > >>entering the debugger would be nice.
> > >
> > > What I did for kdb and panic some time ago was to flash the keyboard
> > > lights. If you use a unique frequency (different from kdb
> > > and from panic) it works quite nicely.
Flashing keyboard lights is far simpler than a printk. Printk is too
heavy. Once a system is unstable, it's more important to get into kgdb code
asap. Calling printk and co may be too much.
> >
> > Assuming a key board and a clear (no spin locks) path to it. Still it
> > only says
>
> I think it's reasonable to just write to the keyboard without any locking.
> The keyboard driver will recover.
Flashing keyboard lights is easy on x86 and x86_64 platforms. Is that easy on
ppc workstations/servers? Embedded boards don't have keyboards.
>
> > we are in kgdb, now why.
>
> The big advantage is that it works even when you are in X (like most
> people) printks are often not visible.
Yep.
--
Amit Kale
"Amit S. Kale" <[email protected]> wrote:
>
> Flashing keyboard lights is easy on x86 and x86_64 platforms.
Please, no keyboards. Some people want to be able to use kgdboe
to find out why machine number 324 down the corridor just died.
How about just doing
    char *why_i_crashed;

    {
        ...
        if (expr1)
            why_i_crashed = "hit a BUG";
        else if (expr2)
            why_i_crashed = "divide by zero";
        else ...
    }
then provide a gdb macro which prints out the string at *why_i_crashed?
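A slightly fuller version of this idea might look like the sketch below. The enum and the function name `kgdb_record_reason` are hypothetical, and only the first two strings come from the snippet above; the others are illustrative guesses based on the trap sources mentioned earlier in the thread. The point is that the assignment needs no locks and no allocation, so it is safe even when the machine is in bad shape.

```c
#include <stddef.h>

/* Hypothetical reason codes; the thread only names the first two. */
enum kgdb_entry_reason {
    KGDB_REASON_BUG,
    KGDB_REASON_DIVIDE,
    KGDB_REASON_PAGE_FAULT,
    KGDB_REASON_NMI_WATCHDOG,
    KGDB_REASON_UNKNOWN
};

/* Points at a static string; readable from gdb at any later time. */
const char *why_i_crashed = NULL;

/* Record why the debugger was entered. No locking, no allocation. */
void kgdb_record_reason(enum kgdb_entry_reason reason)
{
    switch (reason) {
    case KGDB_REASON_BUG:
        why_i_crashed = "hit a BUG";
        break;
    case KGDB_REASON_DIVIDE:
        why_i_crashed = "divide by zero";
        break;
    case KGDB_REASON_PAGE_FAULT:
        why_i_crashed = "kernel page fault";
        break;
    case KGDB_REASON_NMI_WATCHDOG:
        why_i_crashed = "NMI watchdog";
        break;
    default:
        why_i_crashed = "unknown";
        break;
    }
}
```

From gdb, `p why_i_crashed` or `x/s why_i_crashed` would then show the string, including when gdb attaches long after the event.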
On Thursday 04 Mar 2004 10:48 am, Andrew Morton wrote:
> "Amit S. Kale" <[email protected]> wrote:
> > Flashing keyboard lights is easy on x86 and x86_64 platforms.
>
> Please, no keyboards. Some people want to be able to use kgdboe
> to find out why machine number 324 down the corridor just died.
>
> How about just doing
>
>
> char *why_i_crashed;
>
>
> {
> ...
> if (expr1)
> why_i_crashed = "hit a BUG";
> else if (expr2)
> why_i_crashed = "divide by zero";
> else ...
> }
>
> then provide a gdb macro which prints out the string at *why_i_crashed?
If we can afford to do this (in terms of actions that can be done with the
machine being unstable) we can certainly print a console message through gdb.
A stub is free to send console messages to gdb at any time. We can send a
"'O'hex(Page fault at 0x1234)" packet to gdb regardless of whether
CONFIG_KGDB_CONSOLE is configured in. This way kgdb will send this packet to
gdb and then immediately report a segfault/trap. To a user it'll appear as a
message printed from gdb "Page fault at 0x1234" followed by gdb showing a
SIGSEGV etc. The gdb console message should carry information beyond the
signal number.
-Amit
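For reference, the 'O' packet in the gdb remote serial protocol is the letter O followed by the hex encoding of the text, framed as `$<data>#<checksum>`, where the checksum is the sum of the data bytes modulo 256 written as two hex digits. A minimal sketch of building one is below; `kgdb_build_o_packet` is a hypothetical helper name, not a function from any kgdb tree.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build "$O<hex of msg>#<checksum>" into out.
 * Returns the packet length (excluding the NUL), or -1 if out is too small.
 */
int kgdb_build_o_packet(const char *msg, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t n = strlen(msg);
    size_t need = 2 + 2 * n + 3 + 1;   /* "$O" + hex + "#cc" + NUL */
    unsigned char sum = 'O';           /* checksum covers 'O' and the hex */
    size_t pos = 0;
    size_t i;

    if (outlen < need)
        return -1;

    out[pos++] = '$';
    out[pos++] = 'O';
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)msg[i];
        char hi = hex[c >> 4];
        char lo = hex[c & 0x0f];

        out[pos++] = hi;
        out[pos++] = lo;
        sum = (unsigned char)(sum + hi + lo);
    }
    out[pos++] = '#';
    out[pos++] = hex[sum >> 4];
    out[pos++] = hex[sum & 0x0f];
    out[pos] = '\0';
    return (int)pos;
}
```

The stub would still have to wait for gdb's `+` acknowledgement after sending the packet; that part is left out here.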
"Amit S. Kale" <[email protected]> wrote:
>
> To a user it'll appear as a
> message printed from gdb "Page fault at 0x1234" followed by gdb showing a
> SIGSEGV etc.
Well that would be nice. Bear in mind that one usage scenario is to say
"hey, machine 342 has stopped responding" and to then fire up gdb and
connect to that machine with kgdboe.
In other words: if we rely on a gdb instance being connected at the time of
the exception, that message will be lost. There needs to be a way of
retrieving the message post-facto.
On Wed, 3 Mar 2004 21:18:50 -0800
Andrew Morton <[email protected]> wrote:
> "Amit S. Kale" <[email protected]> wrote:
> >
> > Flashing keyboard lights is easy on x86 and x86_64 platforms.
>
> Please, no keyboards. Some people want to be able to use kgdboe
> to find out why machine number 324 down the corridor just died.
Not as the only indication, I agree. But for machines running X that
are actually used by people I think it's important to always give some kind
of visual feedback when the X server freezes. And kgdb will make the X server
freeze. You could actually make it a notifier list to register several
ways to do this, e.g. the cluster people could add something that makes
it flash a warning light. For a standard box I think flashing the keyboard
is a good default for now.
(OK, there are USB keyboards too; for those there will need to be a different
solution.)
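The notifier-list idea can be illustrated with a small fixed-size table of callbacks, as in the sketch below. All names are hypothetical, and the example hook only counts calls instead of touching any hardware; in the kernel one would more likely build on the existing notifier chain facility in <linux/notifier.h> than on a hand-rolled table.

```c
#include <stddef.h>

#define KGDB_MAX_ENTRY_HOOKS 8

/* Hypothetical hook type: called once when the debugger is entered. */
typedef void (*kgdb_entry_hook_t)(const char *reason);

static kgdb_entry_hook_t kgdb_entry_hooks[KGDB_MAX_ENTRY_HOOKS];
static size_t kgdb_entry_hook_count;

/* Register a hook; returns 0 on success, -1 if invalid or table full. */
int kgdb_register_entry_hook(kgdb_entry_hook_t hook)
{
    if (hook == NULL || kgdb_entry_hook_count >= KGDB_MAX_ENTRY_HOOKS)
        return -1;
    kgdb_entry_hooks[kgdb_entry_hook_count++] = hook;
    return 0;
}

/* Call every registered hook in registration order. */
void kgdb_notify_entry(const char *reason)
{
    size_t i;

    for (i = 0; i < kgdb_entry_hook_count; i++)
        kgdb_entry_hooks[i](reason);
}

/* Example hook: stands in for "flash the keyboard LEDs". */
static int blink_requests;

static void blink_keyboard_hook(const char *reason)
{
    (void)reason;        /* a real hook might choose a pattern per reason */
    blink_requests++;
}
```

A cluster setup could register a different hook (for example, one that drives a warning light) without changing the stub itself.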
>
> char *why_i_crashed;
>
>
> {
> ...
> if (expr1)
> why_i_crashed = "hit a BUG";
> else if (expr2)
> why_i_crashed = "divide by zero";
> else ...
> }
>
> then provide a gdb macro which prints out the string at *why_i_crashed?
That doesn't tell the user at all why his X server just froze.
But it may be a good addition.
-Andi
Amit S. Kale wrote:
> On Thursday 04 Mar 2004 10:48 am, Andrew Morton wrote:
>
>>"Amit S. Kale" <[email protected]> wrote:
>>
>>>Flashing keyboard lights is easy on x86 and x86_64 platforms.
>>
>>Please, no keyboards. Some people want to be able to use kgdboe
>>to find out why machine number 324 down the corridor just died.
>>
>>How about just doing
>>
>>
>>char *why_i_crashed;
>>
>>
>>{
>> ...
>> if (expr1)
>> why_i_crashed = "hit a BUG";
>> else if (expr2)
>> why_i_crashed = "divide by zero";
>> else ...
>>}
>>
>>then provide a gdb macro which prints out the string at *why_i_crashed?
>
>
> If we can afford to do this (in terms of actions that can be done with the
> machine being unstable) we can certainly print a console message through gdb.
Not once you are connected to gdb. The "O" packet can only be sent if the
program (i.e. kernel) is running as far as gdb knows. So you could precede a
connection with this, but could not use it after gdb knows the kernel is stopped.
>
> A stub is free to send console messages to gdb at any time. We can send a
> "'O'hex(Page fault at 0x1234)" packet to gdb regardless of whether
> CONFIG_KGDB_CONSOLE is configured in. This way kgdb will send this packet to
> gdb and then immediately report a segfault/trap. To a user it'll appear as a
> message printed from gdb "Page fault at 0x1234" followed by gdb showing a
> SIGSEGV etc. The gdb console message should print information other than a
> signal number.
>
> -Amit
>
--
George Anzinger [email protected]
On Thu, Mar 04, 2004 at 12:54:12PM -0800, George Anzinger wrote:
> Amit S. Kale wrote:
> >On Thursday 04 Mar 2004 10:48 am, Andrew Morton wrote:
> >
> >>"Amit S. Kale" <[email protected]> wrote:
> >>
> >>>Flashing keyboard lights is easy on x86 and x86_64 platforms.
> >>
> >>Please, no keyboards. Some people want to be able to use kgdboe
> >>to find out why machine number 324 down the corridor just died.
> >>
> >>How about just doing
> >>
> >>
> >>char *why_i_crashed;
> >>
> >>
> >>{
> >> ...
> >> if (expr1)
> >> why_i_crashed = "hit a BUG";
> >> else if (expr2)
> >> why_i_crashed = "divide by zero";
> >> else ...
> >>}
> >>
> >>then provide a gdb macro which prints out the string at *why_i_crashed?
> >
> >
> >If we can afford to do this (in terms of actions that can be done with the
> >machine being unstable) we can certainly print a console message through
> >gdb.
>
> Not once you are connected to gdb. The "O" packet can only be sent if the
> program (i.e. kernel) is running as far as gdb knows. So you could preceed
> a connection with this, but could not used it after gdb knows the kernel is
> stopped.
If GDB is already connected and sitting by waiting, you can send the O
packet. If it is not, you could delay sending the O packet until you
know that GDB has now connected.
This isn't an unworkable idea, but it's probably better to just set
*why_i_crashed (think work work work, oh wait, what caused this again?)
and provide some handy macros (which should be in the docs anyhow).
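The "delay until GDB has connected" variant could be sketched as below: the message is stored at exception time and handed out once a connection is known to exist, while the buffer itself stays readable afterwards. Every name here is hypothetical, and how the stub actually detects a connection is outside the scope of the sketch.

```c
#include <stddef.h>
#include <string.h>

#define KGDB_PENDING_MAX 128

/* Last event message; stays in place so it can be read post-facto. */
static char kgdb_pending_msg[KGDB_PENDING_MAX];
static int kgdb_pending_valid;   /* message not yet delivered */
static int kgdb_connected;       /* assumption: set by the I/O layer */

/* Called at exception time: remember the message, send nothing yet. */
void kgdb_note_event(const char *msg)
{
    strncpy(kgdb_pending_msg, msg, KGDB_PENDING_MAX - 1);
    kgdb_pending_msg[KGDB_PENDING_MAX - 1] = '\0';
    kgdb_pending_valid = 1;
}

/* Called by the I/O layer when gdb attaches or detaches. */
void kgdb_set_connected(int connected)
{
    kgdb_connected = connected;
}

/*
 * Returns the message to send now, or NULL if gdb is not connected
 * or the message has already been handed out.
 */
const char *kgdb_take_pending(void)
{
    if (!kgdb_connected || !kgdb_pending_valid)
        return NULL;
    kgdb_pending_valid = 0;
    return kgdb_pending_msg;
}
```

Since the text is never cleared, a gdb macro can still print `kgdb_pending_msg` later, which covers the "what caused this again?" case.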
--
Tom Rini
Tom Rini wrote:
> On Thu, Mar 04, 2004 at 12:54:12PM -0800, George Anzinger wrote:
>
>>Amit S. Kale wrote:
>>
>>>On Thursday 04 Mar 2004 10:48 am, Andrew Morton wrote:
>>>
>>>
>>>>"Amit S. Kale" <[email protected]> wrote:
>>>>
>>>>
>>>>>Flashing keyboard lights is easy on x86 and x86_64 platforms.
>>>>
>>>>Please, no keyboards. Some people want to be able to use kgdboe
>>>>to find out why machine number 324 down the corridor just died.
>>>>
>>>>How about just doing
>>>>
>>>>
>>>>char *why_i_crashed;
>>>>
>>>>
>>>>{
>>>> ...
>>>> if (expr1)
>>>> why_i_crashed = "hit a BUG";
>>>> else if (expr2)
>>>> why_i_crashed = "divide by zero";
>>>> else ...
>>>>}
>>>>
>>>>then provide a gdb macro which prints out the string at *why_i_crashed?
>>>
>>>
>>>If we can afford to do this (in terms of actions that can be done with the
>>>machine being unstable) we can certainly print a console message through
>>>gdb.
>>
>>Not once you are connected to gdb. The "O" packet can only be sent if the
>>program (i.e. kernel) is running as far as gdb knows. So you could preceed
>>a connection with this, but could not used it after gdb knows the kernel is
>>stopped.
>
>
> If GDB is already connected and sitting by waiting, you can send the O
> packet. If it is not, you could delay sending the O packet until you
> know that GDB has now connected.
>
> This isn't an unworkable idea, but it's probably better to just set
> *why_i_crashed (think work work work, oh wait, what caused this again?)
> and provide some handy macros (which we should be in the docs anyhow).
Well, I did provide a "come_from" macro, but more definition could be had...
On the subject of macros, I am being convinced by the gdb folks that the way to
do the info threads thing is with macros. We get almost all we want this way
without messing with gdb or lying to it the way the -mm kgdb does.
--
George Anzinger [email protected]