Hello again,
If I start a system that has 1 GB of memory with mem=500m,
the value of the kernel's num_physpages is 0x20000 as would
be expected. If I multiply that by PAGE_SIZE, I get 0x20000000,
also as expected. If I observe that memory region, I note
that somebody has written something there!
This is not good. The kernel touches RAM it doesn't own. I have
booted the system with only the internal floppy controller
and no other modules installed. I see the same thing.
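The arithmetic above can be checked with a short sketch. It assumes 4 KiB pages (the i386 default); the helper names are illustrative and are not kernel functions. Note that 0x20000 pages is exactly 512 MiB, whereas an exact 500 MiB limit would be 0x1F400 pages. The message does not say how the figure was rounded.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the default on i386. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* First physical address past the RAM the kernel was told about. */
uint64_t end_of_kernel_ram(uint64_t num_physpages)
{
    return num_physpages << PAGE_SHIFT;
}

/* Number of whole pages in a memory limit given in MiB. */
uint64_t pages_for_mib(uint64_t mib)
{
    assert(mib < (1ULL << 44));   /* keep mib << 20 within 64 bits */
    return (mib << 20) >> PAGE_SHIFT;
}
```

Calling `end_of_kernel_ram(0x20000)` yields 0x20000000, the address dumped in the monitor session.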
Script started on Fri Apr 16 11:33:39 2004
# monitor
TMD Platinum(tm) Control System Version 2.0
Copyright(c) 1999-2003, Analogic Corporation
Enter "help" for commands
PLATINUM> dump=20000000
20000000 78 56 34 12 21 43 65 87-FF FF FF FF FF FF FF FF xV4.!Ce.........
20000010 FF FF FF FF FF FF FF FD-FF FF FF FF FF FF FF FF ................
20000020 FF FF FF FF FF FE FF FF-FF FF FE FF FF FF FF FF ................
[SNIPPED...]
My temporary workaround for the kernel's destroying a
precious DMA buffer is to start one page higher. However,
whoever is writing to that RAM is likely also writing to other
places where it doesn't belong. This could lead to some
very interesting bugs.
Note that the value written there is 0x12345678, twice, once
in little endian and another in swap-nibble big endian, like
a mirror. This is evil.
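The two encodings described above can be reproduced with a minimal sketch. The function names are made up for illustration, and nothing here claims to be the code that actually wrote the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v least-significant byte first (little endian). */
void put_le32(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Store v most-significant byte first, swapping the two nibbles of each byte. */
void put_be32_nibble_swapped(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(v >> (24 - 8 * i));
        out[i] = (uint8_t)((b << 4) | (b >> 4));
    }
}
```

Applied to 0x12345678, the first helper gives the bytes 78 56 34 12 and the second gives 21 43 65 87, matching the first eight bytes of the dump.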
Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (5596.77 BogoMips).
Note 96.31% of all statistics are fiction.
mem= isn't there to tell the kernel what ram it owns and what ram it
doesn't own. It's there to tell the kernel what ram is in the system.
Since you told the system it only has 500m, it assumes the rest of
the 3.5G of address space is available for things like memory mapped
i/o. If you cat /proc/iomem, you'll probably see something has
reserved the memory range in question.
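The /proc/iomem check suggested above could be automated roughly as follows. This is a sketch that assumes the usual "start-end : name" line layout with hexadecimal addresses; the function names are illustrative and not from the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one /proc/iomem line ("start-end : name", possibly indented).
 * Returns 1 if addr lies inside the range, 0 if not, -1 if the line
 * does not parse. On success the resource name is copied into name.
 */
int iomem_line_covers(const char *line, unsigned long long addr, char name[64])
{
    unsigned long long start, end;

    assert(line != NULL);
    if (sscanf(line, " %llx-%llx : %63[^\n]", &start, &end, name) != 3)
        return -1;
    return addr >= start && addr <= end;
}

/* Print every entry of an open iomem-style stream that covers addr. */
int report_owner(FILE *fp, unsigned long long addr)
{
    char line[256], name[64];
    int hits = 0;

    while (fgets(line, sizeof line, fp)) {
        if (iomem_line_covers(line, addr, name) == 1) {
            printf("%s", line);
            hits++;
        }
    }
    return hits;   /* number of covering entries found */
}
```

A caller would open /proc/iomem with `fopen` and pass the stream together with the address of interest, for example 0x20000000.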
I added a hack to make the kernel assume that the unused address
space starts at the greater of the mem= value and what is passed in
from the BIOS via the e820 maps. It seems to eliminate such problems.
Ross
On Fri, 16 Apr 2004 11:55:28 -0400 (EDT), Richard B. Johnson
<[email protected]> wrote:
>
>
> Hello again,
>
> If I start a system that has 1 GB of memory with mem=500m,
> the value of the kernel's num_physpages is 0x20000 as would
> be expected. If I multiply that by PAGE_SIZE, I get 0x20000000,
> also as expected. If I observe that memory region, I note
> that somebody has written something there!
>
> This is not good. The kernel touches RAM it doesn't own. I have
> booted the system with only the internal floppy controller
> and no other modules installed. I see the same thing.
>
> Script started on Fri Apr 16 11:33:39 2004
> # monitor
> TMD Platinum(tm) Control System Version 2.0
> Copyright(c) 1999-2003, Analogic Corporation
>
> Enter "help" for commands
>
> PLATINUM> dump=20000000
> 20000000 78 56 34 12 21 43 65 87-FF FF FF FF FF FF FF FF xV4.!Ce.........
> 20000010 FF FF FF FF FF FF FF FD-FF FF FF FF FF FF FF FF ................
> 20000020 FF FF FF FF FF FE FF FF-FF FF FE FF FF FF FF FF ................
> [SNIPPED...]
>
> My temporary workaround for the kernel's destroying a
> precious DMA buffer is to start one page higher. However,
> whoever is writing to that RAM is likely also writing to other
> places where it doesn't belong. This could lead to some
> very interesting bugs.
>
> Note that the value written there is 0x12345678, twice, once
> in little endian and another in swap-nibble big endian, like
> a mirror. This is evil.
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.4.24 on an i686 machine (5596.77 BogoMips).
> Note 96.31% of all statistics are fiction.
On Fri, 16 Apr 2004, Ross Biro wrote:
> mem= isn't there to tell the kernel what ram it owns and what ram it
> doesn't own. It's there to tell the kernel what ram is in the system.
> Since you told the system it only has 500m, it assumes the rest of
> the 3.5G of address space is available for things like memory mapped
> i/o. If you cat /proc/iomem, you'll probably see something has
> reserved the memory range in question.
>
No! This is address space, not RAM. Whether or not a PCI device
or whatever has internal RAM that's mapped makes no difference.
I told the kernel that it has 500m of RAM. It better not assume
I don't know what I'm talking about. I might have reserved that
RAM because it's bad or I may have something else important to
do with that RAM (which I do).
> I added a hack to make the kernel assume that the unused address
> space starts at the greater of the mem= value and what is passed in
> from the BIOS via the e820 maps. It seems to eliminate such problems.
>
> Ross
Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (5596.77 BogoMips).
Note 96.31% of all statistics are fiction.
On Fri, 16 Apr 2004 12:55:33 -0400 (EDT), Richard B. Johnson
<[email protected]> wrote:
>
> On Fri, 16 Apr 2004, Ross Biro wrote:
>
> > mem= isn't there to tell the kernel what ram it owns and what ram it
> > doesn't own. It's there to tell the kernel what ram is in the system.
> > Since you told the system it only has 500m, it assumes the rest of
> > the 3.5G of address space is available for things like memory mapped
> > i/o. If you cat /proc/iomem, you'll probably see something has
> > reserved the memory range in question.
> >
>
> No! This is address space, not RAM. Whether or not a PCI device
> or whatever has internal RAM that's mapped makes no difference.
>
> I told the kernel that it has 500m of RAM. It better not assume
> I don't know what I'm talking about. I might have reserved that
> RAM because it's bad or I may have something else important to
> do with that RAM (which I do).
The problem is that the kernel does assume you know what you are
talking about, and you don't. You are abusing the mem= parameter.
That's fine, but then you have to tell the kernel what you really
mean. What you really want to say is there is memory above 500M and I
don't want you to touch it. There may be a way to do that via the
fancy mem=@ parameters.
What mem= tells the kernel is that there is RAM in a certain spot and
nowhere else. Since you told the kernel there is no RAM above 500M,
that means that address space is free to be used for memory mapped
I/O. Since the kernel trusts you, it started using the memory above
500m for memory mapped i/o. Since you LIED to the kernel, you are
getting results you do not like. The solution I settled on was to
tell the kernel that people LIE to it and only use memory for I/O if
both the BIOS and the USER agree that it's available. You have to
find a way to tell the kernel the TRUTH, or you will never get the
results you want.
On Fri, 16 Apr 2004, Ross Biro wrote:
> On Fri, 16 Apr 2004, Richard B. Johnson wrote:
> >
> > On Fri, 16 Apr 2004, Ross Biro wrote:
> >
> > > mem= isn't there to tell the kernel what ram it owns and what ram it
> > > doesn't own. It's there to tell the kernel what ram is in the system.
> > > Since you told the system it only has 500m, it assumes the rest of
> > > the 3.5G of address space is available for things like memory mapped
> > > i/o. If you cat /proc/iomem, you'll probably see something has
> > > reserved the memory range in question.
> > >
> >
> > No! This is address space, not RAM. Whether or not a PCI device
> > or whatever has internal RAM that's mapped makes no difference.
> >
> > I told the kernel that it has 500m of RAM. It better not assume
> > I don't know what I'm talking about. I might have reserved that
> > RAM because it's bad or I may have something else important to
> > do with that RAM (which I do).
> The problem is that the kernel does assume you know what you are
> talking about, and you don't. You are abusing the mem= parameter.
> That's fine, but then you have to tell the kernel what you really
> mean. What you really want to say is there is memory above 500M and I
> don't want you to touch it. There may be a way to do that via the
> fancy mem=@ parameters.
> What mem= tells the kernel is that there is RAM in a certain spot and
> nowhere else. Since you told the kernel there is no RAM above 500M,
> that means that address space is free to be used for memory mapped
> I/O. Since the kernel trusts you, it started using the memory above
> 500m for memory mapped i/o. Since you LIED to the kernel, you are
> getting results you do not like. The solution I settled on was to
> tell the kernel that people LIE to it and only use memory for I/O if
> both the BIOS and the USER agree that it's available. You have to
> find a way to tell the kernel the TRUTH, or you will never get the
> results you want.
This is all most enlightening. If I understand correctly, then every
device driver whose author tells users to pass a "mem=" option to
reserve memory for the driver's use at the upper part of physical
memory is stuffed by design.
I thought it was a valid technique? I never questioned it because there is
a history of its use. I think the early bttv driver was written this way.
I have been debugging an oops on a system which uses the open source
driver for the Matrox MeteorII multichannel, available from
http://www.emlix.com/index.php?id=158
This driver uses the technique and I am getting a corrupted slab free list.
Ross B, could I please have details of your mem BIOS hack so I can try
it as a workaround?
Regards
Ross Dickson
On Sat, 17 Apr 2004 14:40:18 +1000, Ross Dickson
<[email protected]> wrote:
> This is all most enlightening. If I understand correctly, then every
> device driver whose author tells users to pass a "mem=" option to
> reserve memory for the driver's use at the upper part of physical
> memory is stuffed by design.
The problem is really that Linux doesn't trust the BARs assigned by
the PCI bios because some BIOSes do it incorrectly. So it reprograms
them based on the memory map it got from the BIOS. However, before it
does that the mem= parameter overrides the memory map from the BIOS.
I believe what I did was to save a copy of the e820 maps for later,
and then take the first free address as the max of the first
free address from the user-supplied map and the BIOS-supplied map.
I'll send out a patch on Tuesday.
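The idea described above can be sketched in user-space C. This is not the patch itself: the struct, the function names, and the choice to consider only RAM-type entries are assumptions made for illustration. The value 1 for usable RAM is the standard e820 type code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1   /* e820 type code for usable RAM */

/* Hypothetical stand-in for one memory-map entry. */
struct mem_region {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Highest end address of any RAM entry in a map. */
uint64_t top_of_ram(const struct mem_region *map, size_t n)
{
    uint64_t top = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].addr + map[i].size;
        if (map[i].type == E820_RAM && end > top)
            top = end;
    }
    return top;
}

/*
 * Address space is treated as free for memory-mapped I/O only above
 * the RAM reported by BOTH the saved BIOS map and the user's mem= map.
 */
uint64_t first_free_address(const struct mem_region *bios, size_t nbios,
                            const struct mem_region *user, size_t nuser)
{
    uint64_t b = top_of_ram(bios, nbios);
    uint64_t u = top_of_ram(user, nuser);
    return b > u ? b : u;
}
```

With a BIOS map describing 1 GB of RAM and a user map truncated at 0x20000000, the result is 0x40000000, so nothing would be mapped into the RAM the user withheld from the kernel.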
Ross
Followup to: <[email protected]>
By author: Ross Dickson <[email protected]>
In newsgroup: linux.dev.kernel
>
> This is all most enlightening. If I understand correctly, then every
> device driver whose author tells users to pass a "mem=" option to
> reserve memory for the driver's use at the upper part of physical
> memory is stuffed by design.
>
Yup.
-hpa
> I believe what I did was to save a copy of the e820 maps for later,
> and then take the first free address as the max of the first
> free address from the user-supplied map and the BIOS-supplied map.
> I'll send out a patch on Tuesday.
Here are the changes to setup.c. I haven't checked whether the patch
is complete, but I made the diff from a working 2.4.18 kernel. You need
to apply this patch in arch/i386/kernel.