2003-03-08 13:37:54

by Ludootje

Subject: what's an OOPS

Hi,

I've been reading LKML for a few weeks now to understand Linux
development better, and there's one thing I just can't understand:
what's an OOPS? What does it stand for, what is it?
I searched for it on google.com/linux, but found only one result and
didn't understand it very well... I hope one of you can help me out
here?

Sorry for the newbie question :o

Thanks,
Ludootje

--

The Grasshoppers' Linux Journal - a free, online distributed magazine about GNU/Linux / Open Source / ... oriented towards newbies. Check it out @ http://ghj.sunsite.dk !


2003-03-08 13:55:24

by bert hubert

Subject: Re: what's an OOPS

On Sat, Mar 08, 2003 at 02:47:10PM +0100, Ludootje wrote:
> Hi,
>
> I've been reading LKML for a few weeks now to understand Linux
> development better, and there's one thing I just can't understand:
> what's an OOPS? What does it stand for, what is it?

An oops is a lot like a segmentation fault for a userspace program. It
indicates the kernel tried to access memory that doesn't exist, for example.

Regards,

bert

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
http://netherlabs.nl Consulting

2003-03-08 14:07:54

by John Bradford

Subject: Re: what's an OOPS

> I've been reading LKML for a few weeks now to understand Linux
> development better, and there's one thing I just can't understand:
> what's an OOPS? What does it stand for, what is it?

It's a report of a bug in the kernel, for example, if the kernel tried
to access an invalid memory location. It doesn't necessarily indicate
a programming error - faulty hardware can cause an OOPS as well.

The following explanation may not be 100% accurate; hopefully
somebody else will post a better one, but here goes:

As far as I know it doesn't stand for anything, and the name is a
kind-of joke, (as in, "oops, we've found a bug in the kernel").

On X86, an OOPS contains information such as:

Text description - something like "Unable to handle NULL pointer
dereference". This tells you what sort of error it is.

The number of the oops, (I.E. whether it was the first, second, third,
etc, starting with 0000).

The CPU it occurred on, (0 on a single processor machine). Note, I
think that on a multi processor machine, there isn't a physical
relationship between CPU and number, I.E. CPUs are assigned numbers on
boot, in a semi-random fashion.

The contents of the CPU's registers.

A stack backtrace.

The code the CPU was executing.

A call trace, which is, basically, a list of functions that the
process was in at the moment of the OOPS. The actual numeric values
are almost completely useless[1], because they depend on your
particular kernel. Only somebody who has access to the corresponding
symbol map for that kernel can identify the actual names of the
functions, and this is why there are often posts by developers on this
list asking people to decode an OOPS they have posted.

[1] Without it being decoded, you can still check, for example,
whether the CPU was executing data, but it's mostly speculation.
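
To illustrate what "decoding" a call trace involves, here is a minimal
sketch in Python. It assumes a symbol map in the System.map layout
(hex address, type, symbol name per line); the addresses and symbol
names are made up for illustration, and load_symbols() and resolve()
are hypothetical helpers, not part of any real tool. A real decoder
such as ksymoops handles far more than this.

```python
import bisect

# Hypothetical excerpt in System.map layout: "<hex address> <type> <symbol>".
# These addresses and names are invented; a real map comes from your own kernel build.
SYSTEM_MAP = """\
c0105000 T do_example_a
c0105400 T do_example_b
c0105a00 t example_helper
"""

def load_symbols(text):
    """Parse System.map-style text into a list of (address, name), sorted by address."""
    symbols = []
    for line in text.splitlines():
        addr, _type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, address):
    """Return 'symbol+0xoffset' for the closest symbol at or below address.

    Returns None if the address lies below the first symbol. Note that this
    sketch has no upper bound: anything above the last symbol is attributed
    to that last symbol.
    """
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return "%s+0x%x" % (name, address - start)

if __name__ == "__main__":
    table = load_symbols(SYSTEM_MAP)
    print(resolve(table, 0xC0105410))
```

The lookup only gives meaningful names when the map matches the exact
kernel that produced the OOPS, which is why the raw numbers on their
own say so little.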

John.

2003-03-08 15:59:41

by Szabolcs Szakacsits

Subject: Re: what's an OOPS


On Sat, 8 Mar 2003, John Bradford wrote:

> The number of the oops, (I.E. whether it was the first, second, third,
> etc, starting with 0000).

Urban myth (at least on i386). The "Oops:" part can be decoded on i386 as,

* bit 0 == 0 means no page found, 1 means protection fault
* bit 1 == 0 means read, 1 means write
* bit 2 == 0 means kernel, 1 means user-mode
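
The three bits above can be read off mechanically. A minimal sketch in
Python follows; the function names are hypothetical, and it assumes
the value after "Oops:" is printed in hexadecimal and comes from a
page fault on i386.

```python
def decode_oops_code(code):
    """Decode the low three bits of an i386 page-fault 'Oops:' value."""
    return {
        # bit 0: 0 = no page found, 1 = protection fault
        "cause": "protection fault" if code & 1 else "no page found",
        # bit 1: 0 = read, 1 = write
        "access": "write" if code & 2 else "read",
        # bit 2: 0 = kernel, 1 = user-mode
        "mode": "user-mode" if code & 4 else "kernel",
    }

def parse_oops_line(line):
    """Extract the numeric code from a line such as 'Oops: 0002'.

    Assumption: the value is hexadecimal.
    """
    prefix, _, value = line.partition(":")
    if prefix.strip() != "Oops":
        raise ValueError("not an Oops line: %r" % line)
    return int(value.strip(), 16)

if __name__ == "__main__":
    print(decode_oops_code(parse_oops_line("Oops: 0002")))
```

So "Oops: 0002" would mean a write, from kernel mode, to an address
with no page behind it, rather than "the third oops".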

Szaka

2003-03-08 16:42:23

by John Bradford

Subject: Re: what's an OOPS

> > The number of the oops, (I.E. whether it was the first, second, third,
> > etc, starting with 0000).
>
> Urban myth (at least on i386). The "Oops:" part can be decoded on i386 as,
>
> * bit 0 == 0 means no page found, 1 means protection fault
> * bit 1 == 0 means read, 1 means write
> * bit 2 == 0 means kernel, 1 means user-mode

Interesting - I wasn't aware of that.

Maybe we should note this in Documentation/oops-tracing.txt?

In fact, overall there must be quite a lot that isn't documented at
all, except in this mailing list's archives - I think an overhaul of
Documentation/* is more than slightly overdue...

John.

2003-03-08 18:28:30

by Ludootje

Subject: Re: what's an OOPS

On Sat 08-03-2003, at 17:54, John Bradford wrote:
> > > The number of the oops, (I.E. whether it was the first, second, third,
> > > etc, starting with 0000).
> >
> > Urban myth (at least on i386). The "Oops:" part can be decoded on i386 as,
> >
> > * bit 0 == 0 means no page found, 1 means protection fault
> > * bit 1 == 0 means read, 1 means write
> > * bit 2 == 0 means kernel, 1 means user-mode
>
> Interesting - I wasn't aware of that.
>
> Maybe we should note this in Documentation/oops-tracing.txt?
>
> In fact, overall there must be quite a lot that isn't documented at
> all, except in this mailing list's archives - I think an overhaul of
> Documentation/* is more than slightly overdue...
>
> John.

Thanks a lot for the very good explanations, everyone. I really
appreciate it!

Thanks,
Ludootje
