2001-04-23 18:10:49

by Jeff V. Merkey

Subject: filp_open() in 2.2.19 causes memory corruption



I am now using the filp_open() call in the kernel to scan for tape
devices in lieu of chrdev_open()/blkdev_open(), but I have
discovered that calling this API with non-existent devices
appears to result in memory corruption and a nasty oops.

I have attached the code fragment and the oops generated by calling
ScanTapeDevices(). This basically works the way Al Viro described,
and I like the auto-probing of the tape device via calls to
filp_open(), which is really slick. If I can just get past
the oops, I think it's there.

Jeff


Attachments:
trace.txt (1.77 kB)
tape.c (3.00 kB)
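
For readers without the attachment: filp_open() reports failure by returning an error value encoded in the pointer, not NULL, so a probe loop over device names that may not exist has to test the result with IS_ERR() before touching it. The following is a minimal userspace sketch of that pattern, not a reconstruction of tape.c. The ERR_PTR()/IS_ERR()/PTR_ERR() helpers are simplified stand-ins for the kernel's, and fake_filp_open(), struct fake_file and scan_tape_devices() are hypothetical names invented for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical replacement for struct file in this sketch. */
struct fake_file { const char *name; };

/* Assumption: only "/dev/st0" exists on this imaginary system. */
static struct fake_file present = { "/dev/st0" };

/* Hypothetical stand-in for filp_open(): error pointer on failure. */
static struct fake_file *fake_filp_open(const char *name)
{
    if (strcmp(name, present.name) == 0)
        return &present;
    return ERR_PTR(-ENOENT);
}

/* Probe each candidate name; count only opens that really succeeded. */
static int scan_tape_devices(const char *const *names, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        struct fake_file *f = fake_filp_open(names[i]);

        if (IS_ERR(f))
            continue; /* never dereference or close an error pointer */
        found++;
    }
    return found;
}
```

In real kernel code the successful branch would also release the file again once the probe is done; that step is left out here because the stand-in allocates nothing.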

2001-04-23 20:25:49

by Manfred Spraul

Subject: Re: filp_open() in 2.2.19 causes memory corruption

Are you sure the trace is decoded correctly?

> CPU: 0
> EIP: 0010:[sys_mremap+31/884]
> EFLAGS: 00010206

> Code: ac ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 75 d9
ac ae is
lodsb
scasb

Could you run
#objdump --disassemble-all --reloc linux/mm/mremap.o | less

and check that the code is really at offset 31 of sys_mremap?

And is it correct that only 64 MB memory is installed/enabled?

--
Manfred
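
Decoding the rest of the "Code:" bytes by hand, assuming 32-bit x86 encodings, gives lodsb, scasb, jne, test %al,%al, jne (backwards), xor %eax,%eax, jmp, sbb %eax,%eax, or $1,%al, then test %eax,%eax and another jne. That reads like an inlined byte-by-byte string compare, which would be a surprising thing to find inside sys_mremap and supports the doubt about the symbol decoding. A rough C rendering of the loop, with a hypothetical function name:

```c
/*
 * C rendering of the byte sequence from the oops "Code:" line.
 * si/di mirror the %esi/%edi registers used by lodsb/scasb.
 */
static int lodsb_scasb_compare(const char *si, const char *di)
{
    for (;;) {
        unsigned char al = (unsigned char)*si++; /* ac: lodsb */
        unsigned char m = (unsigned char)*di++;  /* ae: scasb */

        if (al != m)                 /* 75 08: jne to the sbb */
            return al < m ? -1 : 1;  /* 19 c0, 0c 01: sbb; or $1 */
        if (al == 0)                 /* 84 c0, 75 f8: loop until NUL */
            return 0;                /* 31 c0: xor %eax,%eax */
    }
}
```

The trailing 85 c0 75 d9 would then be the caller testing the result and branching when the strings differ.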


2001-04-23 20:51:10

by Jeff V. Merkey

Subject: Re: filp_open() in 2.2.19 causes memory corruption

On Mon, Apr 23, 2001 at 10:24:55PM +0200, Manfred Spraul wrote:
> Are you sure the trace is decoded correctly?
>
> > CPU: 0
> > EIP: 0010:[sys_mremap+31/884]
> > EFLAGS: 00010206
>
> > Code: ac ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 75 d9
> ac ae is
> lodsb
> scasb
>
> Could you run
> #objdump --disassemble-all --reloc linux/mm/mremap.o | less
>
> and check that the code is really at offset 31 of sys_mremap?
>
> And is it correct that only 64 MB memory is installed/enabled?
>
> --
> Manfred


Manfred,

This is what's being reported when I produce the oops. I think we have
memory corruption somewhere, which explains the funky code offsets. It's
easy to reproduce. Call filp_open() with the handle table I gave you
on a single IDE system with **NO** tape drive in the system, and it
crashes right after the module is loaded the first time, then unloaded,
and reloaded a second time. The oops happens on the second insmod
of the module. I can provide you the actual module itself built with
all the code if you want to reproduce it.

It's 100% reproducible.

Jeff


2001-04-23 22:04:21

by David Woodhouse

Subject: Re: filp_open() in 2.2.19 causes memory corruption


Manfred Spraul said:
> Are you sure the trace is decoded correctly?

> > CPU: 0
> > EIP: 0010:[sys_mremap+31/884]

Probably not. It looks like it was munged by klogd. Some distributions are
still shipping with klogd configured to destroy the original information on
the way to the log, without even making it do a sanity check that the
System.map it's using actually matches the current kernel.

Jeff, please disable the broken klogd symbol munging and reproduce it,
running the oops through ksymoops manually. Ksymoops should have built-in
sanity checks on the System.map it tries to use.

Also, please make sure you report this as a serious bug with the vendor of
whatever distribution you're running on this box.

--
dwmw2
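
The kind of sanity check meant here can be illustrated with a small sketch: before trusting a System.map, compare the address it gives for an exported symbol with the address the running kernel reports in /proc/ksyms. This is not how klogd or ksymoops is actually implemented. The function name is hypothetical, and the line formats ("address type name" for System.map, "address name" for /proc/ksyms, with no version suffix or module tag) are simplifying assumptions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Compare one System.map line against one /proc/ksyms line.
 * Returns 1 if both name the same symbol at the same address,
 * 0 if the symbol matches but the address differs (stale map),
 * -1 if a line cannot be parsed or the symbols differ.
 */
static int map_entry_matches(const char *map_line, const char *ksyms_line)
{
    unsigned long map_addr, ksym_addr;
    char type, map_name[128], ksym_name[128];

    if (sscanf(map_line, "%lx %c %127s", &map_addr, &type, map_name) != 3)
        return -1;
    if (sscanf(ksyms_line, "%lx %127s", &ksym_addr, ksym_name) != 2)
        return -1;
    if (strcmp(map_name, ksym_name) != 0)
        return -1;
    return map_addr == ksym_addr;
}
```

A decoder that sees a 0 for any exported symbol should refuse to rewrite addresses with that map and pass the raw oops through untouched.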


2001-04-23 22:42:55

by Jeff V. Merkey

Subject: Re: filp_open() in 2.2.19 causes memory corruption

On Mon, Apr 23, 2001 at 11:03:48PM +0100, David Woodhouse wrote:
>
> Manfred Spraul said:
> > Are you sure the trace is decoded correctly?
>
> > > CPU: 0
> > > EIP: 0010:[sys_mremap+31/884]
>
> Probably not. It looks like it was munged by klogd. Some distributions are
> still shipping with klogd configured to destroy the original information on
> the way to the log, without even making it do a sanity check that the
> System.map it's using actually matches the current kernel.
>
> Jeff, please disable the broken klogd symbol munging and reproduce it,
> running the oops through ksymoops manually. Ksymoops should have built-in
> sanity checks on the System.map it tries to use.
>
> Also, please make sure you report this as a serious bug with the vendor of
> whatever distribution you're running on this box.
>


David,

I will comply and repost the oops.

Jeff

> --
> dwmw2
>

2001-04-25 23:39:13

by Jeff V. Merkey

Subject: Re: filp_open() in 2.2.19 causes memory corruption

On Mon, Apr 23, 2001 at 11:47:27PM +0100, David Woodhouse wrote:

David/LKML,

I've gotten to the bottom of this problem, and you are correct that klogd
is trashing the messages file for the oops. As for the oops, it was related
to the use of ll_rw_block() instead of submit_bh() in 2.4.3, which was causing
memory corruption in Linus' buffer cache code. In NetWare, we used to
create a signature field for I/O and other structures that were submitted
by modules other than the media manager.

It would be useful for the buffer cache to carry a signature field as well,
so that if it ever gets back a buffer head that is not its own, it can
drop it with a noisy message rather than suffer memory corruption
and other side effects that take days to track down.

Jeff
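
A minimal userspace sketch of the signature idea described above, under stated assumptions: the structure, the magic value and the function names are all hypothetical and are not taken from the kernel's struct buffer_head or from NetWare. The owner stamps every buffer head it hands out, and the completion path refuses anything that does not carry the stamp.

```c
#include <stdint.h>
#include <stdio.h>

#define BCACHE_SIGNATURE 0x42434844u /* hypothetical magic value */

/* Hypothetical, heavily reduced stand-in for a buffer head. */
struct sketch_bh {
    uint32_t signature;    /* set only by the owning cache */
    unsigned long blocknr;
    int uptodate;
};

/* The cache stamps each buffer head it creates. */
static void sketch_bh_init(struct sketch_bh *bh, unsigned long blocknr)
{
    bh->signature = BCACHE_SIGNATURE;
    bh->blocknr = blocknr;
    bh->uptodate = 0;
}

/*
 * Completion path: returns 0 if the buffer head was accepted,
 * -1 if it was dropped because it does not belong to the cache.
 */
static int sketch_end_io(struct sketch_bh *bh, int uptodate)
{
    if (bh->signature != BCACHE_SIGNATURE) {
        fprintf(stderr,
                "buffer cache: dropping foreign buffer head %p (signature %#x)\n",
                (void *)bh, (unsigned)bh->signature);
        return -1;
    }
    bh->uptodate = uptodate;
    return 0;
}
```

The check costs one compare per completion and only catches structures that were never stamped; it does not detect a stamped buffer head that was later corrupted elsewhere.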


2001-04-27 11:51:33

by David Woodhouse

Subject: Re: filp_open() in 2.2.19 causes memory corruption


Jeff V. Merkey said:
> I've gotten to the bottom of this problem, and you are correct that
> klogd is trashing the messages file for the oops.

Oh dear. That's quite a serious bug in klogd. It should never destroy the
original information, _especially_ if the System.map it's looking at
blatantly doesn't match /proc/ksyms.

Have you reported it to your distribution vendor yet?

--
dwmw2