2000-11-22 13:50:43

by Andries E. Brouwer

[permalink] [raw]
Subject: [PATCH] isofs/inode.c

Here the second patch in the isofs series.
inode.c:isofs_read_super() dereferences the variable pri
that need not be set in case of a Joliet CD, causing an Oops.
Patch below.

Andries

[While editing the diff, I left a fix for aha1542.c,
maybe you got it already. I also left something else
that always annoyed me: valuable screen space (on a 24x80 vt)
is lost by these silly [< >] around addresses in an Oops.
They provide no information at all, but on the other hand
cause loss of information because these lines no longer
fit in 80 columns causing line wrap and the loss of the
top of the Oops.]

diff -u --recursive --new-file ../linux-2.4.0-test11/linux/arch/i386/kernel/traps.c ./linux/arch/i386/kernel/traps.c
--- ../linux-2.4.0-test11/linux/arch/i386/kernel/traps.c Tue Nov 21 21:44:05 2000
+++ ./linux/arch/i386/kernel/traps.c Mon Nov 20 12:17:11 2000
@@ -146,7 +146,7 @@
((addr >= module_start) && (addr <= module_end))) {
if (i && ((i % 8) == 0))
printk("\n ");
- printk("[<%08lx>] ", addr);
+ printk("%08lx ", addr); /* not [<...>] */
i++;
}
}
@@ -166,7 +166,7 @@
esp = regs->esp;
ss = regs->xss & 0xffff;
}
- printk("CPU: %d\nEIP: %04x:[<%08lx>]\nEFLAGS: %08lx\n",
+ printk("CPU: %d\nEIP: %04x:%08lx\nEFLAGS: %08lx\n",
smp_processor_id(), 0xffff & regs->xcs, regs->eip, regs->eflags);
printk("eax: %08lx ebx: %08lx ecx: %08lx edx: %08lx\n",
regs->eax, regs->ebx, regs->ecx, regs->edx);
diff -u --recursive --new-file ../linux-2.4.0-test11/linux/drivers/scsi/aha1542.c ./linux/drivers/scsi/aha1542.c
--- ../linux-2.4.0-test11/linux/drivers/scsi/aha1542.c Tue Nov 21 21:44:10 2000
+++ ./linux/drivers/scsi/aha1542.c Tue Nov 21 18:01:57 2000
@@ -1416,6 +1416,7 @@
SCtmp = HOSTDATA(SCpnt->host)->SCint[i];
if (SCtmp->host_scribble) {
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
}
HOSTDATA(SCpnt->host)->SCint[i] = NULL;
HOSTDATA(SCpnt->host)->mb[i].status = 0;
@@ -1478,6 +1479,7 @@
}
if (SCtmp->host_scribble) {
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
}
HOSTDATA(SCpnt->host)->SCint[i] = NULL;
HOSTDATA(SCpnt->host)->mb[i].status = 0;
@@ -1546,6 +1548,7 @@
}
if (SCtmp->host_scribble) {
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
}
HOSTDATA(SCpnt->host)->SCint[i] = NULL;
HOSTDATA(SCpnt->host)->mb[i].status = 0;
@@ -1681,8 +1684,10 @@
Scsi_Cmnd *SCtmp;
SCtmp = HOSTDATA(SCpnt->host)->SCint[i];
SCtmp->result = DID_RESET << 16;
- if (SCtmp->host_scribble)
+ if (SCtmp->host_scribble) {
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
+ }
printk(KERN_WARNING "Sending DID_RESET for target %d\n", SCpnt->target);
SCtmp->scsi_done(SCpnt);

@@ -1725,8 +1730,10 @@
Scsi_Cmnd *SCtmp;
SCtmp = HOSTDATA(SCpnt->host)->SCint[i];
SCtmp->result = DID_RESET << 16;
- if (SCtmp->host_scribble)
+ if (SCtmp->host_scribble) {
scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = NULL;
+ }
printk(KERN_WARNING "Sending DID_RESET for target %d\n", SCpnt->target);
SCtmp->scsi_done(SCpnt);

diff -u --recursive --new-file ../linux-2.4.0-test11/linux/fs/isofs/inode.c ./linux/fs/isofs/inode.c
--- ../linux-2.4.0-test11/linux/fs/isofs/inode.c Tue Nov 21 21:44:14 2000
+++ ./linux/fs/isofs/inode.c Wed Nov 22 13:22:09 2000
@@ -500,15 +500,13 @@
* that value.
*/
blocksize = get_hardblocksize(dev);
- if( (blocksize != 0)
- && (blocksize > opt.blocksize) )
- {
+ if(blocksize > opt.blocksize) {
/*
* Force the blocksize we are going to use to be the
* hardware blocksize.
*/
opt.blocksize = blocksize;
- }
+ }

blocksize_bits = 0;
{
@@ -594,6 +592,7 @@
brelse(bh);
bh = NULL;
}
+
/*
* If we fall through, either no volume descriptor was found,
* or else we passed a primary descriptor looking for others.
@@ -605,9 +604,7 @@
pri_bh = NULL;

root_found:
- brelse(pri_bh);
-
- if (joliet_level && opt.rock == 'n') {
+ if (joliet_level && (pri == NULL || opt.rock == 'n')) {
/* This is the case of Joliet with the norock mount flag.
* A disc with both Joliet and Rock Ridge is handled later
*/
@@ -704,6 +701,7 @@
* We're all done using the volume descriptor, and may need
* to change the device blocksize, so release the buffer now.
*/
+ brelse(pri_bh);
brelse(bh);

/*
@@ -873,8 +871,8 @@
/* Life is simpler than for other filesystem since we never
* have to create a new block, only find an existing one.
*/
-int isofs_get_block(struct inode *inode, long iblock,
- struct buffer_head *bh_result, int create)
+static int isofs_get_block(struct inode *inode, long iblock,
+ struct buffer_head *bh_result, int create)
{
unsigned long b_off;
unsigned offset, sect_size;
diff -u --recursive --new-file ../linux-2.4.0-test11/linux/include/linux/iso_fs.h ./linux/include/linux/iso_fs.h
--- ../linux-2.4.0-test11/linux/include/linux/iso_fs.h Tue Nov 21 21:44:16 2000
+++ ./linux/include/linux/iso_fs.h Tue Nov 21 13:39:44 2000
@@ -185,7 +185,6 @@
int get_acorn_filename(struct iso_directory_record *, char *, struct inode *);

extern struct dentry *isofs_lookup(struct inode *, struct dentry *);
-extern int isofs_get_block(struct inode *, long, struct buffer_head *, int);
extern int isofs_bmap(struct inode *, int);
extern struct buffer_head *isofs_bread(struct inode *, unsigned int, unsigned int);


2000-11-22 14:13:18

by Christian Gennerat

[permalink] [raw]
Subject: silly [< >] and other excess

[email protected] a ?crit :

> I also left something else
> that always annoyed me: valuable screen space (on a 24x80 vt)
> is lost by these silly [< >] around addresses in an Oops.
> They provide no information at all, but on the other hand
> cause loss of information because these lines no longer
> fit in 80 columns causing line wrap and the loss of the
> top of the Oops.]
>

What a good idea!
Moreover, there is another problem in Oops:
the dumped stack is limited to 3 or 4 lines to prevent loss of information
but the call trace is unlimited and can loose all information,
and sometimes is printing forever!
--- arch/i386/kernel/traps.c.orig Mon Oct 2 20:57:01 2000
+++ arch/i386/kernel/traps.c Sun Nov 5 14:33:52 2000
@@ -142,11 +142,12 @@
* out the call path that was taken.
*/
if (((addr >= (unsigned long) &_stext) &&
+ (i<32) &&
(addr <= (unsigned long) &_etext)) ||
((addr >= module_start) && (addr <= module_end))) {
if (i && ((i % 8) == 0))
printk("\n ");
- printk("[<%08lx>] ", addr);
+ printk("%08lx ", addr);
i++;
}
}

And do not scroll the screen after the last printed line!

--- kernel/panic.c.orig Tue Jun 20 23:32:27 2000
+++ kernel/panic.c Sun Nov 5 07:53:04 2000
@@ -56,7 +56,7 @@
va_end(args);
printk(KERN_EMERG "Kernel panic: %s\n",buf);
if (in_interrupt())
- printk(KERN_EMERG "In interrupt handler - not syncing\n");
+ printk(KERN_EMERG "In interrupt handler - not syncing");
else if (!current->pid)
printk(KERN_EMERG "In idle task - not syncing\n");
else


2000-11-22 18:51:26

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Christian Gennerat writes:
> [email protected] a =E9crit :
> > I also left something else
> > that always annoyed me: valuable screen space (on a 24x80 vt)
> > is lost by these silly [< >] around addresses in an Oops.
> > They provide no information at all, but on the other hand
> > cause loss of information because these lines no longer
> > fit in 80 columns causing line wrap and the loss of the
> > top of the Oops.]

They provide no information to the human reader, but they tell klogd
(and other tools) that the enclosed value is a kernel address that
should be looked up in the System.map file and decoded into name +
offset.

> What a good idea!

What a bad idea to remove them.

> --- arch/i386/kernel/traps.c.orig Mon Oct 2 20:57:01 2000
> +++ arch/i386/kernel/traps.c Sun Nov 5 14:33:52 2000
> @@ -142,11 +142,12 @@
> * out the call path that was taken.
> */
> if (((addr >=3D (unsigned long) &_stext) &&
> + (i<32) &&
> (addr <=3D (unsigned long) &_etext)) ||
> ((addr >=3D module_start) && (addr <=3D module_end))) {
> if (i && ((i % 8) =3D=3D 0))
> printk("\n ");
> - printk("[<%08lx>] ", addr);
> + printk("%08lx ", addr);
> i++;
> }
> }

What happened to the tabs? It looks like you're using the deadly evil
quoted-printable mime encoding format which seems to h ave spannered the
patch.

> --- kernel/panic.c.orig Tue Jun 20 23:32:27 2000
> +++ kernel/panic.c Sun Nov 5 07:53:04 2000
> @@ -56,7 +56,7 @@
> va_end(args);
> printk(KERN_EMERG "Kernel panic: %s\n",buf);
> if (in_interrupt())
> - printk(KERN_EMERG "In interrupt handler - not syncing\n");
> + printk(KERN_EMERG "In interrupt handler - not syncing");
> else if (!current->pid)
> printk(KERN_EMERG "In idle task - not syncing\n");
> else

IMHO, panic here is a better idea; I've had many a time when I get oops
after oops. Take for instance "scheduling in interrupt". This causes
a NULL pointer de-reference, which then calls die() which goes on to call
do_exit() which then calls schedule() which then causes a NULL pointer
de-reference, which then calls die() which goes on to call do_exit() which
then calls schedule() which then causes a NULL pointer de-reference etc.

End result is a constant stream of oopsen fast scrolling by on screen until
such time something gets corrupted which breaks the loop.

PS, if you want to catch an oops which locks the machine, use a serial
console to log it. If its a non-locking oops, examine the message log.
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-22 19:31:19

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: silly [< >] and other excess

From [email protected] Wed Nov 22 19:20:52 2000

> [email protected] a ?crit :
> > I also left something else
> > that always annoyed me: valuable screen space (on a 24x80 vt)
> > is lost by these silly [< >] around addresses in an Oops.
> > They provide no information at all, but on the other hand
> > cause loss of information because these lines no longer
> > fit in 80 columns causing line wrap and the loss of the
> > top of the Oops.]

They provide no information to the human reader, but they tell klogd
(and other tools) that the enclosed value is a kernel address that
should be looked up in the System.map file and decoded into name +
offset.

Of course. But since there is no information in [< >]
(their presence is syntactically determined, not semantically)
the tools you mention can be trivially patched to work without
this [< >] kludge. On the other hand, when the system panics
often klogd and similar nice programs do not run at all, and
hence are unable to do any good. All information available
is the information on the screen, and it is really a pity
to lose EIP and get a few parentheses instead.

Andries

2000-11-22 22:52:58

by Keith Owens

[permalink] [raw]
Subject: Re: silly [< >] and other excess

On Wed, 22 Nov 2000 14:42:05 +0100,
Christian Gennerat <[email protected]> wrote:
>[email protected] a ?crit :
>
>> I also left something else
>> that always annoyed me: valuable screen space (on a 24x80 vt)
>> is lost by these silly [< >] around addresses in an Oops.
>> They provide no information at all, but on the other hand
>> cause loss of information because these lines no longer
>> fit in 80 columns causing line wrap and the loss of the
>> top of the Oops.]

You just broke ksymoops. Removing the [< >] is a bad idea, they are
one of the few things that identifies the addresses in the log,
otherwise they just look like hex numbers. ksymoops has to scan log
files which can contain anything and somehow pick out the interesting
lines, you need some identifier on the lines.

>Moreover, there is another problem in Oops:
>the dumped stack is limited to 3 or 4 lines to prevent loss of information
>but the call trace is unlimited and can loose all information,
>and sometimes is printing forever!
>--- arch/i386/kernel/traps.c.orig Mon Oct 2 20:57:01 2000
>+++ arch/i386/kernel/traps.c Sun Nov 5 14:33:52 2000
>@@ -142,11 +142,12 @@
> * out the call path that was taken.
> */
> if (((addr >= (unsigned long) &_stext) &&
>+ (i<32) &&
> (addr <= (unsigned long) &_etext)) ||
> ((addr >= module_start) && (addr <= module_end))) {
> if (i && ((i % 8) == 0))
> printk("\n ");
>- printk("[<%08lx>] ", addr);
>+ printk("%08lx ", addr);
> i++;
> }
> }

There should be no need to restrict the number of lines printed, it is
limited by the top of the kernel stack. If there are more than 32
trace entries on the stack then they should be printed.

2000-11-22 23:47:38

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

[email protected] writes:
> From [email protected] Wed Nov 22 19:20:52 2000
> They provide no information to the human reader, but they tell klogd
> (and other tools) that the enclosed value is a kernel address that
> should be looked up in the System.map file and decoded into name +
> offset.
>
> Of course. But since there is no information in [< >]
> (their presence is syntactically determined, not semantically)
> the tools you mention can be trivially patched to work without
> this [< >] kludge. On the other hand, when the system panics
> often klogd and similar nice programs do not run at all, and
> hence are unable to do any good. All information available
> is the information on the screen, and it is really a pity
> to lose EIP and get a few parentheses instead.

Well, in my experience, values of PC (or EIP is x86 speak) rarely
appear over column 50 on the screen. Therefore, removing them is
only going to save width, not height.

Also, have you considered that not every oops is formatted exactly
the same way on every architecture? In fact, an oops on ARM looks
like:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c1e90000
*pgd = 01e94001, *pmd = 01e94001, *pte = 0000308b, *ppte = 0000300a
Internal error: Oops: 0
CPU: 0
pc : [<c280007c>] lr : [<c00251f8>]
sp : c1e97f08 ip : c1e97ec0 fp : c1e97f14
r10: c1e96000 r9 : 00000004 r8 : ffffffea
r7 : 02029220 r6 : c1e37000 r5 : c2800000 r4 : 00000000
r3 : ef9f0000 r2 : 00000000 r1 : 00000000 r0 : 000001db
Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user
Control: 1E9117D Table: 01E9117D DAC: 00000015
Process insmod (pid: 9, stackpage=c1e97000)
Stack:
c1e97ee0: c00251f8 c280007c
c1e97f00: 60000013 ffffffff c1e97fac c1e97f18 c0026194 c280006c c1e37000 c1e38000
c1e97f20: c01f3680 00000060 c011d48c c2800060 00005e00 00000000 00000000 00000000
c1e97f40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
c1e97f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
c1e97f80: 00000000 bfffec64 02021928 60000010 c2800000 c00169e4 00000080 0201bbe8
c1e97fa0: 00000000 c1e97fb0 c0016860 c0025acc bfffec64 c001bb54 0201bbe8 02029220
c1e97fc0: 00000000 00000038 bfffec64 02021928 02019f10 c2800000 00005e00 0201bbe8
c1e97fe0: 0201bbe8 bffffe60 400e4c90 bfffec28 020031e8 400e4c9c 60000010 0201bbe8
Backtrace:
Function entered at [<c2800060>] from [<c0026194>]
Function entered at [<c0025ac0>] from [<c0016860>]
Code: e51f2024 e5923000 (e5813000) e3a00000 e51f3030

where each number in [< >] should be looked up in System.map. Do you
propose to teach klogd and ksymoops every single oops format style?

If no, then don't make this change to the kernel. If yes, then you've
got a big job on your hands.
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-23 00:04:27

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Keith Owens writes:
> Christian Gennerat <[email protected]> wrote:
>> [email protected] a =E9crit :

>>> I also left something else
>>> that always annoyed me: valuable screen space (on a 24x80 vt)
>>> is lost by these silly [< >] around addresses in an Oops.
>>> They provide no information at all, but on the other hand
>>> cause loss of information because these lines no longer
>>> fit in 80 columns causing line wrap and the loss of the
>>> top of the Oops.]

> You just broke ksymoops.

You can fix it. Keeping useful info on the screen is more important.

> Removing the [< >] is a bad idea, they are
> one of the few things that identifies the addresses in the log,
> otherwise they just look like hex numbers. ksymoops has to scan log
> files which can contain anything and somehow pick out the interesting
> lines, you need some identifier on the lines.

If you see register names followed by hex numbers, you have
some debug data. Scan forward and backward 25 lines, grabbing
all 8-digit and 16-digit hex numbers. Sort the numbers, then
look up all of them.

Crude solutions don't break as often as fancy solutions.

> There should be no need to restrict the number of lines printed, it is
> limited by the top of the kernel stack. If there are more than 32
> trace entries on the stack then they should be printed.

It could fill the screen. There is an expansion of 4-to-13 when
using the silly brackets, and a PC stack can be 6 or 7 kB long,
or perhaps many megabytes due to stack overflow. The standard
VGA screen only allows 4000 bytes of data.

2000-11-23 00:24:41

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Russell King writes:
> [email protected] writes:

>> Of course. But since there is no information in [< >]
>> (their presence is syntactically determined, not semantically)
>> the tools you mention can be trivially patched to work without
>> this [< >] kludge. On the other hand, when the system panics
>> often klogd and similar nice programs do not run at all, and
>> hence are unable to do any good. All information available
>> is the information on the screen, and it is really a pity
>> to lose EIP and get a few parentheses instead.
>
> Well, in my experience, values of PC (or EIP is x86 speak) rarely
> appear over column 50 on the screen. Therefore, removing them is
> only going to save width, not height.

Lots of useful data goes over column 80, causing even the innocent
little lines to scroll.

> Also, have you considered that not every oops is formatted exactly
> the same way on every architecture? In fact, an oops on ARM looks
> like:

Hey, no problem.

> Unable to handle kernel NULL pointer dereference at virtual address 00000000

That 00000000 is an 8-digit hex number within 25 lines of things
that would suggest a problem, so it gets looked up. The text itself
counts as a hint too. ("kernel NULL pointer")

> pgd = c1e90000

Same thing. The "pgd" counts as a hint.

> *pgd = 01e94001, *pmd = 01e94001, *pte = 0000308b, *ppte = 0000300a

All 4 get looked up. Sure, they might not be symbols, but it is
better to err on the side of looking up extra junk.

> Internal error: Oops: 0

The "0" is only 1 digit, so don't look it up. The "Oops" is an
obvious hint to the userspace tool.

> CPU: 0
> pc : [<c280007c>] lr : [<c00251f8>]
> sp : c1e97f08 ip : c1e97ec0 fp : c1e97f14
> r10: c1e96000 r9 : 00000004 r8 : ffffffea
> r7 : 02029220 r6 : c1e37000 r5 : c2800000 r4 : 00000000
> r3 : ef9f0000 r2 : 00000000 r1 : 00000000 r0 : 000001db

Gee, obvious register names and plenty of 8-digit hex numbers.

> Flags: nZCv IRQs on FIQs on Mode SVC_32 Segment user
> Control: 1E9117D Table: 01E9117D DAC: 00000015
> Process insmod (pid: 9, stackpage=c1e97000)

When you see "stackpage", you have a hint.

> Stack:

Oh look, another hint.

> c1e97ee0: c00251f8 c280007c
> c1e97f00: 60000013 ffffffff c1e97fac c1e97f18 c0026194 c280006c c1e37000 c1e38000

[ --- CHOP --- ]

All these numbers get looked up.

> Backtrace:

That is another hint.

> Function entered at [<c2800060>] from [<c0026194>]
> Function entered at [<c0025ac0>] from [<c0016860>]
> Code: e51f2024 e5923000 (e5813000) e3a00000 e51f3030

All those numbers get looked up. Keep going for another 25 lines too.

> where each number in [< >] should be looked up in System.map. Do you
> propose to teach klogd and ksymoops every single oops format style?

That isn't needed. When in doubt, look up a hex address.
Of course any decent tool will retain the unmolested oops
and just add a list of symbol names afterward.

2000-11-23 00:43:28

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Albert D. Cahalan writes:
> > c1e97ee0: c00251f8 c280007c
> > c1e97f00: 60000013 ffffffff c1e97fac c1e97f18 c0026194 c280006c c1e37000 c1e38000
>
> [ --- CHOP --- ]
>
> All these numbers get looked up.

These numbers should NOT get looked up - if they are, then very
useful information will be lost; they are not only references to
kernel functions, but also kernel data and read only data within
the kernel text segment. The result will be a totally undeciperal
garbage.

Again, care to put the effort into klogd/ksymoops to handle the
architecture special cases?
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-23 00:57:10

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Albert D. Cahalan writes:
> > Function entered at [<c2800060>] from [<c0026194>]
> > Function entered at [<c0025ac0>] from [<c0016860>]
> > Code: e51f2024 e5923000 (e5813000) e3a00000 e51f3030
>
> All those numbers get looked up. Keep going for another 25 lines too.

Oh, missed this one. Here you're wrong again. The numbers in [< >]
should be looked up, and no others. The code can look exactly like
a kernel address. In this case you definitely do NOT want to have
them converted.
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-23 01:09:26

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: silly [< >] and other excess

From [email protected] Thu Nov 23 00:17:21 2000

Well, in my experience, values of PC (or EIP is x86 speak) rarely
appear over column 50 on the screen. Therefore, removing them is
only going to save width, not height.

The EIP is not pushed out of sight horizontally, but vertically.
(Maybe you never saw a i386 oops. With [<>] the call trace takes
twice as many lines (on a 25x80 screen) as without.)

Also, have you considered that not every oops is formatted exactly
the same way on every architecture?

Do you propose to teach klogd and ksymoops every single oops format style?

It is a triviality.
Besides, the patch only modified i386.

Andries

2000-11-23 01:21:31

by Alan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

> The EIP is not pushed out of sight horizontally, but vertically.
> (Maybe you never saw a i386 oops. With [<>] the call trace takes
> twice as many lines (on a 25x80 screen) as without.)

Thats because too many things get put on a line then. And because we
do [<foo>] [<bar>] not [<foo>][<bar>] ?

>

2000-11-23 02:54:22

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: silly [< >] and other excess

> Thats because too many things get put on a line then.
> And because we do [<foo>] [<bar>] not [<foo>][<bar>] ?

In the good old times we had foo bar for a total of 8*(8+1) = 72
positions. Now we have [<foo>] [<bar>] which takes 8*(8+1+4) = 104
positions. If you turned this into 6 items per line instead of 8,
it would certainly improve matters a bit.
Still..

Andries

2000-11-23 03:25:11

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Russell King writes:
> Albert D. Cahalan writes:

>> All these numbers get looked up.
>
> These numbers should NOT get looked up - if they are, then very
> useful information will be lost;

WOAH, STOP!!! You say "lost"???

Under NO circumstances should klogd or ksymoops mangle the
original oops. The raw oops data MUST be completely preserved.
It is a serious bug that this is not what currently happens.

> they are not only references to
> kernel functions, but also kernel data and read only data within
> the kernel text segment.

1. this is harmless
2. this is useful (you might get a variable's name)

> The result will be a totally undeciperal
> garbage.

Nope. You get the unmolested oops and some symbol data.
If there isn't any symbol for 0x424a5149, so what? It is
no big deal to look up a few opcodes in the symbol table
by accident.

> Again, care to put the effort into klogd/ksymoops to handle the
> architecture special cases?

That would be trading one design flaw for another.

The hard part of klogd/ksymoops is decoding the code bytes AFAIK.
The rest is a just a cross between grep and ps -- you search and
you do symbol lookups. I could throw it together in a few hours,
minus the disassembly part.

Hey, anybody ever think about splitting the kernel message buffer
to be per-CPU or keeping interrupt context separate from process
context? Not that I've looked at it, but locking might be reduced.

2000-11-23 03:34:12

by Keith Owens

[permalink] [raw]
Subject: Re: silly [< >] and other excess

On Wed, 22 Nov 2000 21:54:48 -0500 (EST),
"Albert D. Cahalan" <[email protected]> wrote:
>Under NO circumstances should klogd or ksymoops mangle the
>original oops. The raw oops data MUST be completely preserved.
>It is a serious bug that this is not what currently happens.

ksymoops prints the original data followed by the decode, it is clean.

<rant> klogd only prints the decoded data, often gets it wrong and
leaves garbage for ksymoops. I did a patch to klogd a couple
of years ago and sent it to the maintainer but neither the sysklogd
maintainer nor the distributors seem to care. </rant>

>The hard part of klogd/ksymoops is decoding the code bytes AFAIK.
>The rest is a just a cross between grep and ps -- you search and
>you do symbol lookups. I could throw it together in a few hours,
>minus the disassembly part.

Take a look at the code in ksymoops oops.c before you make rash
statements like that. It has to handle _all_ architecture messages,
including cross arch debugging.

2000-11-23 03:35:12

by Ragnar Hojland Espinosa

[permalink] [raw]
Subject: Re: silly [< >] and other excess

On Thu, Nov 23, 2000 at 12:26:30AM +0000, Russell King wrote:
> Albert D. Cahalan writes:
> > > Function entered at [<c2800060>] from [<c0026194>]
> > > Function entered at [<c0025ac0>] from [<c0016860>]
> > > Code: e51f2024 e5923000 (e5813000) e3a00000 e51f3030
> >
> > All those numbers get looked up. Keep going for another 25 lines too.
>
> Oh, missed this one. Here you're wrong again. The numbers in [< >]
> should be looked up, and no others. The code can look exactly like
> a kernel address. In this case you definitely do NOT want to have
> them converted.

Okay. How about just using some prefix to the hex number, such as '>'?
It'll still save plenty of space, and would be trivial changes for the
tools.

--
____/| Ragnar H?jland Freedom - Linux - OpenGL Fingerprint 94C4B
\ o.O| 2F0D27DE025BE2302C
=(_)= "Thou shalt not follow the NULL pointer for 104B78C56 B72F0822
U chaos and madness await thee at its end." hkp://keys.pgp.com

Handle via comment channels only.

2000-11-23 13:09:48

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Keith Owens writes:
> "Albert D. Cahalan" <[email protected]> wrote:

>> The hard part of klogd/ksymoops is decoding the code bytes AFAIK.
>> The rest is a just a cross between grep and ps -- you search and
>> you do symbol lookups. I could throw it together in a few hours,
>> minus the disassembly part.
>
> Take a look at the code in ksymoops oops.c before you make rash
> statements like that. It has to handle _all_ architecture messages,
> including cross arch debugging.

I looked. I'm sure that was hard to write, but I don't agree
that it is needed. If you miss one of the zillions of kernel
data formats, then you can't properly handle the data.

Also, cross-arch debugging is done by people who don't need tools
like ksymoops anyway. Most likely they have half the opcodes
memorized already, and they have the CPU manual open on their desk.
Tools are needed so that regular users don't have to send the
whole System.map file to linux-kernel.

I threw together a semi-working prototype in a few hours.
It is the worst code I ever wrote in my life, not even
excluding stuff I wrote in Atari BASIC. It slurps down log
files pretty well though, and proves "[<>]" is unneeded.

Nasty source:
http://www.cs.uml.edu/~acahalan/linux/ogrep.tar.gz

Compile:
gcc -O2 -DOGREP -o ogrep *.c

Example usage:
ogrep -v 2.2.11 parser-god-sacrifice your-log-file

Um, yeah, you have to sacrifice an argument to the parser gods.
The built-in usage message is thus wrong. The ksyms parser is
disabled, you get some extra blank lines, and invalid symbols
print as "?" (the ps WCHAN behavior) instead of being suppressed.
Non-i386 ought to work, provided that kernel pointers are not
wider than usespace pointers. Performance needs work too.

While the code is crap, it does prove that you don't need
kernel code to put silly [<>] brackets around anything.

2000-11-23 13:18:09

by Alan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

> Also, cross-arch debugging is done by people who don't need tools
> like ksymoops anyway. Most likely they have half the opcodes

Nope

> memorized already, and they have the CPU manual open on their desk.

Nope

2000-11-23 14:59:36

by Charles Cazabon

[permalink] [raw]
Subject: Re: silly [< >] and other excess

[email protected] <[email protected]> wrote:
> > Thats because too many things get put on a line then.
> > And because we do [<foo>] [<bar>] not [<foo>][<bar>] ?
>
> In the good old times we had foo bar for a total of 8*(8+1) = 72
> positions. Now we have [<foo>] [<bar>] which takes 8*(8+1+4) = 104
> positions. If you turned this into 6 items per line instead of 8,
> it would certainly improve matters a bit.

The original poster complained the output lines were too wide for the screen
on his PC. Perhaps he should change his console mode to 132 characters wide
(via SVGATextMode or such) -- voila, no more problem, no broken kernel patches.

Charles
--
--------------------------------------------------------------
Charles Cazabon <[email protected]>
My opinions do not necessarily represent those of my employer.
--------------------------------------------------------------

2000-11-23 16:45:14

by Peter Samuelson

[permalink] [raw]
Subject: [PATCH] silly [< >] and other excess


[Alan Cox]
> > Thats because too many things get put on a line then.
> > And because we do [<foo>] [<bar>] not [<foo>][<bar>] ?

[Andries Brouwer]
> In the good old times we had foo bar for a total of 8*(8+1) = 72
> positions. Now we have [<foo>] [<bar>] which takes 8*(8+1+4) = 104

I've got it! Put multiple addresses within one set of [< >]. ksymoops
and klogd will require a small adjustment, of course.

The following show_stack() is 8 addrs per line, 79 columns. Comments?

Peter

--- arch/i386/kernel/traps.c.orig Mon Nov 13 01:44:02 2000
+++ arch/i386/kernel/traps.c Thu Nov 23 10:10:06 2000
@@ -126,7 +126,6 @@
printk("%08lx ", *stack++);
}

- printk("\nCall Trace: ");
stack = esp;
i = 1;
module_start = VMALLOC_START;
@@ -144,12 +143,17 @@
if (((addr >= (unsigned long) &_stext) &&
(addr <= (unsigned long) &_etext)) ||
((addr >= module_start) && (addr <= module_end))) {
- if (i && ((i % 8) == 0))
- printk("\n ");
- printk("[<%08lx>] ", addr);
+ if (i==1)
+ printk("\nCall Trace: [<");
+ else if ((i % 8)==0)
+ printk(">]\n [<");
+ else
+ printk(" ");
+ printk("%08lx", addr);
i++;
}
}
+ printk(">]\n");
}

static void show_registers(struct pt_regs *regs)

2000-11-23 18:51:56

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Ragnar Hojland Espinosa writes:
> On Thu, Nov 23, 2000 at 12:26:30AM +0000, Russell King wrote:
> > Oh, missed this one. Here you're wrong again. The numbers in [< >]
> > should be looked up, and no others. The code can look exactly like
> > a kernel address. In this case you definitely do NOT want to have
> > them converted.
>
> Okay. How about just using some prefix to the hex number, such as '>'?
> It'll still save plenty of space, and would be trivial changes for the
> tools.

That is more a question for Keith Owens, not me. Keith is the maintainer
of ksymoops. He has to be happy with the address highlighting. I just
have to be happy that we don't loose ksymoops.
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-23 18:51:57

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Albert D. Cahalan writes:
> > they are not only references to
> > kernel functions, but also kernel data and read only data within
> > the kernel text segment.
>
> 1. this is harmless
> 2. this is useful (you might get a variable's name)

Wrong. Op-codes on this machine are organised such that bits 31-28
indicate the "condition" that the instruction executes. All 16
combinations are meaningful. This means that any 32-bit value can
appear to be an instruction OR data. It takes human intellect to
decide which it is. No machine can tell you that.

> Nope. You get the unmolested oops and some symbol data.
> If there isn't any symbol for 0x424a5149, so what? It is
> no big deal to look up a few opcodes in the symbol table
> by accident.

But there could well be a symbol for 0xc0023004, but it also
corresponds to the instruction:

andgt r3, r2, r4

"Perform a logical AND operation between r2 and r4 and place the result
in r3 if the condition codes indicate the `Greater Than' condition"

In addition, the kernel may not be compiled to run at address 0xC.......
but at address 0x6....... or maybe even 0xe....... Guess what 0xE means
in the high nibble of the op-code? "Always", or "Unconditional".

> That would be trading one design flaw for another.
>
> The hard part of klogd/ksymoops is decoding the code bytes AFAIK.
> The rest is a just a cross between grep and ps -- you search and
> you do symbol lookups. I could throw it together in a few hours,
> minus the disassembly part.

As far as I am concerned, you are the people who propose to break
something, and therefore you are the people who should provide the
effort to fix what you will be breaking when the brokenness is
highlighted.
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-23 20:47:15

by Tuomas Heino

[permalink] [raw]
Subject: Re: silly [< >] and other excess

On Thu, 23 Nov 2000, Charles Cazabon wrote:

> [email protected] <[email protected]> wrote:

> > > Thats because too many things get put on a line then.
> > > And because we do [<foo>] [<bar>] not [<foo>][<bar>] ?
> >
> > In the good old times we had foo bar for a total of 8*(8+1) = 72
> > positions. Now we have [<foo>] [<bar>] which takes 8*(8+1+4) = 104
> > positions. If you turned this into 6 items per line instead of 8,
> > it would certainly improve matters a bit.
>
> The original poster complained the output lines were too wide for the screen
> on his PC. Perhaps he should change his console mode to 132 characters wide
> (via SVGATextMode or such) -- voila, no more problem, no broken kernel patches.

Well that 72 characters is also known as the "safe" email line length...
... and besides some people still use the old VT330s and alike ;)

But in practice even those 79 chars would be better than 104.

--
No .sig here... especially no 128x48 .sig here - nor 132xYuch...

2000-11-23 20:57:41

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Albert D. Cahalan writes:
> Also, cross-arch debugging is done by people who don't need tools
> like ksymoops anyway. Most likely they have half the opcodes
> memorized already, and they have the CPU manual open on their desk.

I certainly don't have each of the 4 billion opcode combinations on
the ARM memorised, and I've been hacking ARM code for over 12 years
now.

> I threw together a semi-working prototype in a few hours.
> It is the worst code I ever wrote in my life, not even
> excluding stuff I wrote in Atari BASIC. It slurps down log
> files pretty well though, and proves "[<>]" is unneeded.

Oh, how have you proven it? Have you proven it with that ARM oops
that appeared on this list? How do you know that it has produced
the right output for the developers?
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-25 05:04:05

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Russell King writes:
> Albert D. Cahalan writes:

>>> they are not only references to
>>> kernel functions, but also kernel data and read only data within
>>> the kernel text segment.
>>
>> 1. this is harmless
>> 2. this is useful (you might get a variable's name)
>
> Wrong. Op-codes on this machine are organised such that bits 31-28
> indicate the "condition" that the instruction executes. All 16
> combinations are meaningful. This means that any 32-bit value can
> appear to be an instruction OR data. It takes human intellect to
> decide which it is. No machine can tell you that.

You haven't shown me to be wrong. How exactly is it harmful to
err on the side of doing symbol lookups that aren't required?
Maybe you'd like an example; I'll provide one below.

>> Nope. You get the unmolested oops and some symbol data.
>> If there isn't any symbol for 0x424a5149, so what? It is
>> no big deal to look up a few opcodes in the symbol table
>> by accident.
>
> But there could well be a symbol for 0xc0023004, but it also
> corresponds to the instruction:
>
> andgt r3, r2, r4

Yes. So what?

> "Perform a logical AND operation between r2 and r4 and place the result
> in r3 if the condition codes indicate the `Greater Than' condition"
>
> In addition, the kernel may not be compiled to run at address 0xC.......
> but at address 0x6....... or maybe even 0xe....... Guess what 0xE means
> in the high nibble of the op-code? "Always", or "Unconditional".

Again, not a problem. Here is the example:

---- example crash ----
kernel NULL (0000002c) accessed from c01a4b98
GPRs c0294041 00000000 c01a4600 0000000a 00000200 c0258000 ffffffff
OSRs 00000000 00403100 c0100000 00000002
RAR c0105344 SP c0294080 FCR 00000000 FSR 00000000
Stack: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
Trace: bad stack frame
Code: 8a739052 c000000a 41310001 87052926
-----------------------

Then it gets run through a simple tool that never fails to look
up a symbol that needs to be looked up:

---- example report of the above crash ----
kernel NULL (0000002c) accessed from c01a4b98
GPRs c0294041 00000000 c01a4600 0000000a 00000200 c0258000 ffffffff
OSRs 00000000 00403100 c0100000 00000002
RAR c0105344 SP c0294080 FCR 00000000 FSR 00000000
Stack: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000
Trace: bad stack frame
Code: 8a739052 c000000a 41310001 87052926

Symbols:
c000000a __start
c0105344 qfs_frob_directory
c01a4600 qfs_cleaner
c01a4b98 qfs_hash_file_record
-------------------------------------------

Well, that first symbol (__start) was really "jump +10", but the
extra noise doesn't hurt anyone. You get what you need, no matter
how mangled the oops is. It can be word-wrapped, missing chunks...
The tool doesn't need to care.

Code disassembly is useful too, and again it is best to err on
the side of decoding more than is required. Remember that users
tend to mangle oops data. If the FCR register get disassembled
as a breakpoint instruction... hey, just ignore the extra noise.

When tools really try to understand an oops, they screw up.
They ignore data that should be looked up as symbols.

2000-11-25 09:51:02

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Albert D. Cahalan writes:
> ---- example report of the above crash ----
> kernel NULL (0000002c) accessed from c01a4b98
> GPRs c0294041 00000000 c01a4600 0000000a 00000200 c0258000 ffffffff
> OSRs 00000000 00403100 c0100000 00000002
> RAR c0105344 SP c0294080 FCR 00000000 FSR 00000000
> Stack: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000
> Trace: bad stack frame
> Code: 8a739052 c000000a 41310001 87052926
>
> Symbols:
> c000000a __start
> c0105344 qfs_frob_directory
> c01a4600 qfs_cleaner
> c01a4b98 qfs_hash_file_record
> -------------------------------------------
>
> Well, that first symbol (__start) was really "jump +10", but the
> extra noise doesn't hurt anyone. You get what you need, no matter
> how mangled the oops is. It can be word-wrapped, missing chunks...
> The tool doesn't need to care.

However, now rather than just reading the dump as you can with ksymoops
or whatever, you have to look at the raw data and try to match it up with
a symbol in the list.

So, with that ARM dump I gave you, we'd potentially end up with about 100
lines of symbols, where about 90 of them are useless. That would be a
backwards step in the development of Linux IMHO.

PS, you're not going to convince me unless you can come up with something
that produces ksymoops-like output, so there's no point continuing.
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-25 10:56:52

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Russell King writes:
> Albert D. Cahalan writes:

>> Symbols:
>> c000000a __start
>> c0105344 qfs_frob_directory
>> c01a4600 qfs_cleaner
>> c01a4b98 qfs_hash_file_record
...
>> Well, that first symbol (__start) was really "jump +10", but the
>> extra noise doesn't hurt anyone. You get what you need, no matter
>> how mangled the oops is. It can be word-wrapped, missing chunks...
>> The tool doesn't need to care.
>
> However, now rather than just reading the dump as you can with
> ksymoops or whatever, you have to look at the raw data and try
> to match it up with a symbol in the list.

Yes. Don't you look at the raw data anyway?

Um, maybe you just don't work the way I do. In response to one
of your other messages, most of the time I have two PowerPC books
open on my desk. I hadn't thought that was so odd.

If you are going to rely on a tool, you might as well add some
error correction bits to the oops, base-32 encode it, and chop
it up into 5-character words for ease of paper-and-pencil copy.

> So, with that ARM dump I gave you, we'd potentially end up with
> about 100 lines of symbols, where about 90 of them are useless.

In theory yes, but in practice no. Your kernel isn't a significant
portion of your address space, so the chance of random data being
looked up successfully is very low. Maybe a 1% chance on 32-bit
hardware, and far less on 64-bit hardware.

> That would be a backwards step in the development of Linux IMHO.
>
> PS, you're not going to convince me unless you can come up with something
> that produces ksymoops-like output, so there's no point continuing.

Damn.

Somebody else posted a reasonable hack for the [<>] problem.
His proposal involved letting multiple values share the same
markers, something like this:

[<c19a5cb4 c180234c c1801134 c1706550 c1800248 c1603310 c1934878 c1840324>]

2000-11-25 11:38:16

by Keith Owens

[permalink] [raw]
Subject: Re: silly [< >] and other excess

On Sat, 25 Nov 2000 05:26:20 -0500 (EST),
"Albert D. Cahalan" <[email protected]> wrote:
>Somebody else posted a reasonable hack for the [<>] problem.
>His proposal involved letting multiple values share the same
>markers, something like this:
>
>[<c19a5cb4 c180234c c1801134 c1706550 c1800248 c1603310 c1934878 c1840324>]

What happens if the line is wrapped before being fed to ksymoops? That
happens quite often. What happens with the IKD patch which adds values
between the addresses from stack? Why am I even bothering to reply to
these messages?

Trying to shrink the oops log to fit on a screen is an exercise in
futility. How do you propose to make this IA64 oops fit on 80x24?

Unable to handle kernel paging request at virtual address 0000000000000000
modprobe[301]: Oops 8804682956800
psr : 0000101008026030 ifs : 8000000000000184 ip : [<a000000000018100>]
unat: 0000000000000000 pfs : 0000000000000184 rsc : 0000000000000003
rnat: e00000003c727d60 bsps: e00000003c720000 pr : 00000000000ae693
ldrs: 0000000000000000 ccv : 0000000000000001 fpsr: 0009804c0270033f
b0 : a0000000000180f0 b6 : e0000000008487a0 b7 : e000000000521030
f6 : 1003e00000000000000a0 f7 : 1003e00000000000000a0
f8 : 1003e0000000000000005 f9 : 10002a000000000000000
r1 : a000000000018358 r2 : 0000000000000917 r3 : 00000000000000ff
r8 : 000000000000000d r9 : 6000000000024d30 r10 : 6000000000028d30
r11 : c00000000000040a r12 : e00000003c727da0 r13 : e00000003c720000
r14 : 0000000000000000 r15 : e00000003c720028 r16 : e00000003c720000
r17 : 0000000000000064 r18 : e000000000990240 r19 : e00000003c720098
r20 : 000000000000ff00 r21 : 0000000000000fff r22 : e000000000aeff20
r23 : 0000001008022030 r24 : a000000000018240 r25 : a000000000018398
r26 : 6000000000024e88 r27 : a000000000018398 r28 : 00000000000002d8
r29 : a000000000018398 r30 : 6000000000024e88 r31 : 0000000000000917
r32 : 0000000000000000 r33 : 0000000000000000 r34 : 0000000000000000
r35 : 0000000000000000
Call Trace: [<e0000000005267a0>] [<e000000000526f60>] [<e000000000531ae0>] [<e00000000054a4a0>]
[<e000000000521520>] [<a000000000018100>] [<a0000000000180f0>]
[<a0000000000180f0>] [<a0000000000180f0>] [<a0000000000180f0>]
[<a0000000000180f0>] [<a0000000000180f0>] [<a0000000000180f0>]
[<a0000000000180f0>] [<a0000000000180f0>] [<a0000000000180f0>]
[<a0000000000180f0>] [<a0000000000180f0>] [<a0000000000180f0>]
[<a0000000000180f0>]

Or this one from arm? Even without markers it is 82x47.

Unable to handle kernel paging request at virtual address 42062dd4
current->tss.memmap = 039DC000
*pgd = 00000000, *pmd = 00000000
Internal error: Oops: 0
CPU: 0
pc : [<c00160c8>] lr : [<c0016354>]
sp : c3813e88 ip : c3813ebc fp : c3813eb8
r10: 00000000 r9 : 00000004 r8 : 00000002
r7 : c3813ed4 r6 : e5832000 r5 : 00000000 r4 : 42062dd3
r3 : 00000000 r2 : 00000000 r1 : 00000003 r0 : c3813ed4
Flags: nZcv IRQs on FIQs on Mode SVC_32 Segment user
Control: 39DD17F Table: 039DD17F DAC: 00000015
Process insmod (pid: 267, stackpage=c3813000)
Stack:
c3813e60: c0016354 c00160c8
c3813e80: 40000013 ffffffff 00000004 c3813e98 00000003 c3813ed4 00000003 00000000
c3813ea0: ffffffea c4824094 42062dd3 c3813ed0 c3813ebc c0016354 c0015df4 ffffffff
c3813ec0: c3813f08 c3813f30 c3813ed4 c000e7ec c0016288 c01ccd80 c0134800 00000000
c3813ee0: 42062dd3 00000000 c4824000 00000000 ffffffea c4824094 42062dd3 00000000
c3813f00: c3813f30 c3813f34 c3813f1c c0020524 c482406c a0000013 ffffffff c4824094
c3813f20: 42062dd3 c3813fb0 c3813f34 c0020524 c4824058 00000005 c3813f3c c32c8000
c3813f40: c32d6000 00000048 c4815000 c4824048 00000094 00000000 00000000 00000000
c3813f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
c3813f80: 00000000 00000000 00000000 0205ce98 c3812000 00000080 60000013 c3813ff4
c3813fa0: 00000000 00000000 c3813fb4 c000ec0c c001ff24 0205ce98 0205ce98 02062d40
c3813fc0: 00000000 00000000 0205ce98 bffffd94 020000c0 0204ab08 020001ac 00000000
c3813fe0: 00000000 bffffc74 02062d40 bffffc58 02003a54 020080b8 60000010 0205ce98
Backtrace:
Function entered at [<c0015de8>] from [<c0016354>]
r9 = 42062dd3
r8 = c4824094
r7 = ffffffea
r6 = 00000000
r5 = 00000003
r4 = c3813ed4
Function entered at [<c001627c>] from [<c000e7ec>]
r5 = c3813f08
r4 = ffffffff
Function entered at [<c482404c>] from [<c0020524>]
Function entered at [<c001ff18>] from [<c000ec0c>]
r9 = 00000000
r8 = c3813ff4
r7 = 60000013
r6 = 00000080
r5 = c3812000
r4 = 0205ce98
Code: e7973108 e1a02423 (e5c42001) e5c43000 e1a02823

Or this one from sparc, 106x26? No, you cannot merge the callers onto
multiple lines, on some traces the caller address is followed by the
parameters to the function.

Warning: kfree_skb passed an skb still on a list (from f00eb568).
Unable to handle kernel NULL pointer dereference<1>Unable to handle kernel NULL pointer dereference<1>tsk->mm->context = 00000172
tsk->mm->pgd = f8569400
rpc.nfsd(374): Oops
PSR: 40400fc1 PC: f00929f4 NPC: f00929f8 Y: 00000000
g0: 40000fc3 g1: 40400fe2 g2: f0196000 g3: fd002010 g4: 00000000 g5: 000186b5 g6: f8562000 g7: 00000050
o0: 00000001 o1: 00000000 o2: 0000000e o3: ee7794b2 o4: 00000100 o5: ee7795f4 sp: f8563918 o7: f00929cc
l0: f014c204 l1: f012de68 l2: 40400fe2 l3: f0187910 l4: 00000000 l5: f0145664 l6: f8562000 l7: 00000002
i0: e56df2c0 i1: f0151d54 i2: 000000b0 i3: ff000000 i4: 00000078 i5: f011e388 fp: f8563980 i7: f00eb86c
Caller[f00eb86c]
Caller[f001a74c]
Caller[f001654c]
Caller[f00167d0]
Caller[f0034988]
Caller[f0027448]
Caller[f0027814]
Caller[f0016fcc]
Caller[f00984f0]
Caller[f00985a0]
Caller[f0092aec]
Caller[f003b334]
Caller[f002fd4c]
Caller[f001806c]
Caller[00019444]
Instruction DUMP: e6260000 d2262004 f024e004 <f0224000> 4002389f 90100010 818ca000 01000000 01000000
Aiee, killing interrupt handler

If anybody really worries about the ix86 call trace going past column
80, just patch your kernel to print 5 fields per line instead of 8. Do
not change the format. But hand copying an oops from an 80x24 screen
is not going to work in the long term, see above. Fiddling with the
output format is a waste of time, instead work out how to capture the
oops without relying on hand copying or a limited screen size. Fix the
problem, not the symptom.

2000-11-25 12:49:27

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Keith Owens writes:
> "Albert D. Cahalan" <[email protected]> wrote:

>> Somebody else posted a reasonable hack for the [<>] problem.
>> His proposal involved letting multiple values share the same
>> markers, something like this:
>>
>> [<c19a5cb4 c180234c c1801134 c1706550 c1800248 c1603310 c1934878 c1840324>]
>
> What happens if the line is wrapped before being fed to ksymoops? That
> happens quite often.

You code handles far worse already, plus getting rid of all the
extra junk will make word wrapping far less common.

Robust tools don't fail just because the data isn't in one of 42
different hard-coded formats, and don't fail because of word-wrap.

> What happens with the IKD patch which adds values
> between the addresses from stack?

So use a different marker... those users with IKD are quite a bit
more likely to have a serial console anyway, so it matters little.
The current over-size format would be fine.

> Why am I even bothering to reply to
> these messages?

You are guilty of causing data loss via scrolling, and you know it.
It is not good that your choice of regular expressions should dictate
that we get stuck with oopses that don't fit on the standard screen.

> Trying to shrink the oops log to fit on a screen is an exercise in
> futility. How do you propose to make this IA64 oops fit on 80x24?

I didn't have much trouble, even while keeping it somewhat readable.
The actual data is worth much more than register names that are
exactly the same for every oops, so it is obvious what to chop out.

> Or this one from arm? Even without markers it is 82x47.

If the standard ARM console can hold 82x47 then that is OK.
If nearly all ARM systems are embedded devices with a serial
console, then again there is no problem. If VGA is common,
then this is a bug.

> Or this one from sparc, 106x26? No, you cannot merge the callers
> onto multiple lines, on some traces the caller address is followed
> by the parameters to the function.

Oh yes I can merge them, using a '/' as a separator. There are plenty
of other options... but isn't the SPARC console usually 128-wide anyway?

> But hand copying an oops from an 80x24 screen
> is not going to work in the long term, see above. Fiddling with the
> output format is a waste of time, instead work out how to capture the
> oops without relying on hand copying or a limited screen size. Fix the
> problem, not the symptom.

The 80x25 screen really must work. Normal users don't have serial
consoles, JTAG connections, ROM-based debuggers, or anything else
reliable that can be used to capture a crash message.

2000-11-25 12:51:07

by Russell King

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Albert D. Cahalan writes:
> Yes. Don't you look at the raw data anyway?

I look at the raw stack data from time to time, but mostly I want
the backtrace, PC and LR converted into something more meaningful,
and I don't want the extra clutter of that particular raw data.

> In theory yes, but in practice no. Your kernel isn't a significant
> portion of your address space, so the chance of random data being
> looked up successfully is very low. Maybe a 1% chance on 32-bit
> hardware, and far less on 64-bit hardware.

Not so. This is my point; on the ARM, when you get stuff like stack
and registers dumped, a lot of the hex numbers can look very much like
addresses in kernel space; most of them are data object symbols and
the like. There can be a lot of these, and suddenly you'd end up with
most of the System.map being output because something in the dump
somewhere looks like its a symbol.

> Somebody else posted a reasonable hack for the [<>] problem.
> His proposal involved letting multiple values share the same
> markers, something like this:

Yep, now that is one idea I like!
_____
|_____| ------------------------------------------------- ---+---+-
| | Russell King [email protected] --- ---
| | | | http://www.arm.linux.org.uk/personal/aboutme.html / / |
| +-+-+ --- -+-
/ | THE developer of ARM Linux |+| /|\
/ | | | --- |
+-+-+ ------------------------------------------------- /\\\ |

2000-11-25 18:51:10

by Guest section DW

[permalink] [raw]
Subject: Re: silly [< >]

On Sat, Nov 25, 2000 at 10:07:44PM +1100, Keith Owens wrote:

> If anybody really worries about the ix86 call trace going past column
> 80, just patch your kernel to print 5 fields per line instead of 8. Do
> not change the format. But hand copying an oops from an 80x24 screen
> is not going to work in the long term, see above. Fiddling with the
> output format is a waste of time, instead work out how to capture the
> oops without relying on hand copying or a limited screen size. Fix the
> problem, not the symptom.

Maybe you do not understand. This is a step towards capturing an oops.
People do different things - invent schemes to write an oops to swap space,
or to a floppy, or put it in memory in a place that might survive a reboot,
or log it over the ethernet, or attach a serial line to some terminal or
logging computer, they even use a high speed camera to capture an oops
as it is flying by.

These are all useful. Not every computer has a floppy drive, not every
computer has a hard disk, not every computer has a monitor screen,
not every computer can feasibly be connected to something else,
so more than a single scheme is needed. The scheme: write info to the screen
is very simple and is useful in a large number of cases.
This screen has a very finite capacity (known to the kernel) so it is
important not to waste this capacity.

At present we waste a lot of space inserting meaningless parentheses.
The place of these parentheses is syntactically predictable, so
ksymoops can insert them just as well as the kernel can.
(I am talking about i386 here, have not looked at other architectures.)

And what is reality? I work under X and the system crashes. Boom. Dead.
No reaction to anything, no log, zero information. Ach.
Six weeks later, I work under X and the system crashes. Boom. Dead.
Nothing to do, zero information.
Four weeks later, I am on the console and the system crashes. A panic!
The message scrolls over the screen but the call trace is too long
and wastes 50% of the space so that only the end is visible - but that
is the part with the most worthless information. I do not want EIP to scroll
off-screen, I want the first few items of the Call Trace, not the last few.
Again the system is dead, and no useful information was obtained.

So, each time I miss important information because of these stupid
parentheses, I patch that kernel. And it regularly happens that I am
rewarded with a complete message that would otherwise have missed
the most important part. I think this change is useful for everybody,
not just for me.

Andries

2000-11-26 23:14:17

by Pavel Machek

[permalink] [raw]
Subject: Re: silly [< >]

Hi!

You complain that most important information is on begging of OOps. So what
about moving most important info to the end? I.e. stack trace first, EIP next;
or even print EIP twice. Than s to [<, ksymoops will not break ;-)
Pavel

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2000-11-27 22:33:40

by Peter Samuelson

[permalink] [raw]
Subject: Re: silly [< >] and other excess


[Albert D. Cahalan]
> > Somebody else posted a reasonable hack for the [<>] problem. His
> > proposal involved letting multiple values share the same markers,
> > something like this:

[Russell King]
> Yep, now that is one idea I like!

Me too. (: Keith posed two objections:

1. The >] could get word-wrapped so that it doesn't appear on the same
line as the [<. I *do not* see what makes this hard to parse
reliably.

2. Someone (i.e. kernel debugger) could insert extra text. Well, same
culprits can mangle oopsen already -- see klogd. These evil tools,
whichever ones they may be, should learn to use /* */ or something.
That way it is relatively easy to ignore their output.

Peter

PS. Should we be using KERN_* here?

--- arch/i386/kernel/traps.c.orig Mon Nov 13 01:44:02 2000
+++ arch/i386/kernel/traps.c Thu Nov 23 10:10:06 2000
@@ -126,7 +126,6 @@
printk("%08lx ", *stack++);
}

- printk("\nCall Trace: ");
stack = esp;
i = 1;
module_start = VMALLOC_START;
@@ -144,12 +143,17 @@
if (((addr >= (unsigned long) &_stext) &&
(addr <= (unsigned long) &_etext)) ||
((addr >= module_start) && (addr <= module_end))) {
- if (i && ((i % 8) == 0))
- printk("\n ");
- printk("[<%08lx>] ", addr);
+ if (i==1)
+ printk("\nCall Trace: [<");
+ else if ((i % 8)==0)
+ printk(">]\n [<");
+ else
+ printk(" ");
+ printk("%08lx", addr);
i++;
}
}
+ printk(">]\n");
}

static void show_registers(struct pt_regs *regs)

2000-11-27 23:06:22

by Keith Owens

[permalink] [raw]
Subject: Re: silly [< >] and other excess

On Mon, 27 Nov 2000 16:02:13 -0600,
Peter Samuelson <[email protected]> wrote:
>
> [Albert D. Cahalan]
>> > Somebody else posted a reasonable hack for the [<>] problem. His
>> > proposal involved letting multiple values share the same markers,
>> > something like this:
>Me too. (: Keith posed two objections:
>
>1. The >] could get word-wrapped so that it doesn't appear on the same
> line as the [<. I *do not* see what makes this hard to parse
> reliably.

People seem to have forgotten that reading an oops from the screen is
not the only source of data. Many oops are read from syslog which
contains lots of different lines, most of which have no identification.
ksymoops has to pick out oops text from a syslog and ignore all the
non-oops lines.

If the oops text is just a hex number with no identifying characters
then it is very difficult to pick out oops text from all the other
noise in syslog. ksymoops already gets false positives and prints some
non-oops text, this confuses users who think that these lines are
related to the oops.

Removing [< >] increases the already high level of ambiguity and false
positives in oops reporting from syslog. The presence of the marker
characters makes the output more robust when line wrapped, without the
markers a line wrapped trace is just a hex number.

2000-11-27 23:39:16

by Nerijus Baliūnas

[permalink] [raw]
Subject: bread in fat_access failed

Hello,

I get a lot of such messages:

Nov 28 01:00:28 sargis kernel: bread in fat_access failed
Nov 28 01:00:28 sargis kernel: attempt to access beyond end of device
Nov 28 01:00:28 sargis kernel: 02:00: rw=0, want=5, limit=4
Nov 28 01:00:28 sargis kernel: dev 02:00 blksize=512 blocknr=9 sector=9 size=512
count=1

What do they mean? Problems with hdd?

2000-11-27 23:48:56

by Nerijus Baliūnas

[permalink] [raw]
Subject: RE: bread in fat_access failed

> Nov 28 01:00:28 sargis kernel: bread in fat_access failed
> Nov 28 01:00:28 sargis kernel: attempt to access beyond end of device
> Nov 28 01:00:28 sargis kernel: 02:00: rw=0, want=5, limit=4
> Nov 28 01:00:28 sargis kernel: dev 02:00 blksize=512 blocknr=9
> sector=9 size=512
> count=1
>
> What do they mean? Problems with hdd?

Sorry, it seems floppy was removed when still mounted.

2000-11-28 09:47:58

by Christian Gennerat

[permalink] [raw]
Subject: Re: silly [< >] and other excess

Keith Owens a ?crit :

> On Mon, 27 Nov 2000 16:02:13 -0600,
> > [Albert D. Cahalan]
> >> > Somebody else posted a reasonable hack for the [<>] problem. His
> >> > proposal involved letting multiple values share the same markers,
> >> > something like this:
> >Me too. (: Keith posed two objections:
> >
> >1. The >] could get word-wrapped so that it doesn't appear on the same
> > line as the [<. I *do not* see what makes this hard to parse
> > reliably.
>
> People seem to have forgotten that reading an oops from the screen is
> not the only source of data. Many oops are read from syslog which
> contains lots of different lines, most of which have no identification.
> ksymoops has to pick out oops text from a syslog and ignore all the
> non-oops lines.
>

When the oops is inside an interrupt, ther in no sync, and the only information
is on the 24 lines of the screen (not 25, because the oops ends with a "\n"
that kills one line at the top.

When you have a minor oops, you can add all information you want.
when you have a major oops that stops the machine,
implying a restart with fsck, you have only 25x80 characters.

So, 5 chars between 2 words is TOO MUCH !
why do not use only ONE special character as "~" "!" or ";"
instead of ">] [<"


>
> If the oops text is just a hex number with no identifying characters
> then it is very difficult to pick out oops text from all the other
> noise in syslog. ksymoops already gets false positives and prints some
> non-oops text, this confuses users who think that these lines are
> related to the oops.
>
> Removing [< >] increases the already high level of ambiguity and false
> positives in oops reporting from syslog. The presence of the marker
> characters makes the output more robust when line wrapped, without the
> markers a line wrapped trace is just a hex number.

yes, but use a marker made of only ONE character !

>
>