Hi,
I want to mmap a device in an application, so I do:
base = mmap(NULL, DEV_LENGTH, myprot, flags, kmem, dev_base);
Turns out that some BIOSs put my device at an address like
0xdffffc00
whereas others put it at 0xfa000000. In the latter case, mmap works
as expected. However, in the first case I get EINVAL: the base is
not page-aligned.
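You can reproduce the EINVAL without the device. The following is a minimal sketch, assuming a POSIX system. It maps an ordinary temporary file instead of the kmem descriptor, and mmap_errno is a hypothetical helper name. The generic mmap code rejects any file offset that is not a multiple of the page size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to map `len` bytes of `fd` at `offset` and
 * return the errno mmap() reports (0 on success). Any mapping that
 * succeeds is released again immediately. */
static int mmap_errno(int fd, off_t offset, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return errno;          /* EINVAL for an unaligned offset */
    munmap(p, len);
    return 0;
}
```

With 4K pages, offset 0x1000 maps fine, while offset 0xc00 (the low bits of 0xdffffc00) fails with EINVAL.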
However, in the latter case I get my requested 1k of memory, and the
following 3k for free. In the first case I'd want "3k for free,
followed by the 1k I requested".
Effectively, provided "start" equals NULL, the kernel IMHO should do:
offset = dev_base & ~PAGE_MASK;
return mmap (NULL, length + offset, prot, flags, fd, dev_base - offset) + offset;
Comments?
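The adjustment proposed above can also be done in userspace. Below is a minimal sketch, assuming POSIX mmap() and sysconf(); map_unaligned is a hypothetical helper name. The mask for the in-page offset is written as `page - 1` rather than the kernel's PAGE_MASK, which on Linux is ~(PAGE_SIZE - 1).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of `fd` starting at a possibly
 * unaligned `offset`. The mapping itself starts at the page boundary
 * below `offset`; the returned pointer is advanced by the remainder so
 * that it addresses exactly the byte at `offset`. */
static void *map_unaligned(int fd, off_t offset, size_t len,
                           int prot, int flags)
{
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t delta = offset & (page - 1);          /* bytes in front of the window */
    char *base = mmap(NULL, len + (size_t)delta, prot, flags,
                      fd, offset - delta);      /* page-aligned file offset */
    if (base == MAP_FAILED)
        return MAP_FAILED;
    return base + delta;
}
```

The real mapping begins `delta` bytes before the returned pointer. Whoever releases it must pass that rounded-down address to munmap(), not the pointer itself.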
The "failure" was observed on 2.4.14 and/or 2.4.9.
Roger.
P.S. I haven't been able to follow linux-kernel closely lately, so
CCs to me are appreciated.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.
Followup to: <[email protected]>
By author: [email protected] (Rogier Wolff)
In newsgroup: linux.dev.kernel
Sorry, but what you're asking for requires the TLB to do something it
simply cannot do: there is no way for the TLB to remap the bottom 12
bits, since it doesn't control them.
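This constraint can be written down directly. The sketch below assumes 4K pages (12 offset bits, as on i386), and can_translate is a hypothetical name. A virtual address ending in 0xc00 satisfies the constraint for a device at 0xdffffc00, which is the kind of pointer the base-plus-offset proposal earlier in the thread would hand back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12   /* assumption: 4 KiB pages */
#define IN_PAGE_MASK ((UINT64_C(1) << PAGE_BITS) - 1)

/* Hypothetical check: the MMU translates only the page number, and the
 * low PAGE_BITS bits pass through unchanged. A virtual address can
 * therefore refer to a physical address only if their in-page offsets
 * are identical. */
static bool can_translate(uint64_t virt, uint64_t phys)
{
    return ((virt ^ phys) & IN_PAGE_MASK) == 0;
}
```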
What you'd have to do is to make your device driver move the device to
a different address.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
H. Peter Anvin wrote:
> Rogier Wolff wrote:
>
> >
> > I know about TLBs. I know how they work, and I think I've explained it
> > well enough that rereading my message should allow you to understand
> > what I'm saying. Still, let me try to picture it...
> >
> > Situation A:
> >
> > physical map. XX is the interesting part, | is a page boundary, = is
> > "uninteresting stuff".
> >
> > |========|XX======|========|========|========|
> >
> > virtual map:
> >
> > |--------|XX======|--------|--------|--------|
> >           ^
> >           | This is the pointer that mmap returns.
> >
> > - is "unmapped".
> >
> >
> > Situation B:
> >
> > |========|======XX|========|========|========|
> >
> > virtual map:
> >
> > |--------|======XX|--------|--------|--------|
> >                 ^
> >                 | This is the pointer that mmap returns.
> >
> > In Situation A I get the 1K I wanted mapped, plus 3K more because
> > the MMU can't NOT give me access to that. Situation B is exactly the
> > same, except that I get those extra 3K in front of the pointer
> > returned by mmap.
> >
>
>
> Just make the adjustment in userspace, if your application really can
> handle it. This is never going to fly generically (and therefore not
> get integrated into anything), because the PCI BIOS will typically map
> multiple things into that 4K chunk, and thus you have opened up your
> system to messing with a completely "innocent" device.
>
> Since the only way to avoid this involves moving your device to its
> own 4K chunk of I/O space anyway, you don't really have a choice.
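The two situations can be quantified by counting the page bytes that become reachable along with the 1K window. This sketch assumes 4K pages and a window that does not cross a page boundary. page_exposure and struct exposure are hypothetical names. For 0xfa000000 (situation A) the extra 3072 bytes lie behind the window; for 0xdffffc00 (situation B) they lie in front of it. Either way, another device sharing that page is exposed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: bytes of the surrounding page that become
 * accessible together with a device window. */
struct exposure {
    size_t before;   /* accessible bytes in front of the window */
    size_t after;    /* accessible bytes behind the window */
};

/* Assumes `page` is a power of two and phys_in_page + len <= page,
 * i.e. the window lies entirely within one page. */
static struct exposure page_exposure(uint64_t phys, size_t len, size_t page)
{
    struct exposure e;
    e.before = (size_t)(phys & (page - 1));
    e.after = page - e.before - len;
    return e;
}
```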
There is an application, written in '91-'93, that works in
situation A and not in situation B. It follows the rules of "mmap"(*),
but the kernel just doesn't do the obvious thing.
If I address something before my 1k window in situation A, I'll get a
segfault. If I address something beyond my 1k window in situation B,
I'll get a segfault.
If I address something after my 1k window in situation A, I'll access
an innocent other device. Same if I address something before my window
in situation B.
Now, in practice, I agree that in situation B it is more likely that
something is actually mapped there.
I'm not sure whether the kernel has been wrong all the time or if
something changed recently. I posted the "workaround" the first time
through, which also works from userspace. I can change my application.
I can modify my libc.
However, I'd rather have "mmap" fixed, as that fixes it for all other
applications too, not just for mine on my system.
The SGI manpage says:
All implementations interpret an addr value of
zero as granting the system complete freedom in selecting pa, subject to
constraints described below. A non-zero value of addr is taken to be a
suggestion of a process address near which the mapping should be placed.
which hints at a possible non-alignment. It also mentions that
"offset" should be page-aligned, which I disagree with here:
everything has been set up to "do the right thing" when the mapping is
possible with an unaligned offset.
Roger.
(*) Allow mmap to choose the address, to give mmap the maximum
flexibility in mapping your object.
On Sat, 17 Nov 2001, Rogier Wolff wrote:
> The SGI manpage says:
>
> All implementations interpret an addr value of
> zero as granting the system complete freedom in selecting pa, subject to
> constraints described below. A non-zero value of addr is taken to be a
> suggestion of a process address near which the mapping should be placed.
  ^^^^^^^^^^
The key word here is "suggestion". There is absolutely no
requirement that the OS actually uses this address.
> which hints at a possible non-alignment. It also mentions that
> "offset" should be page-aligned, which I disagree with here:
> everything has been set up to "do the right thing" when the mapping is
> possible with an unaligned offset.
I don't know what MMU your machine has, but on most (if not
all) machines an mmap() is only possible when it's page-aligned.
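Both points can be observed directly. If you pass an unaligned addr without MAP_FIXED, the kernel treats it only as a hint and still returns a page-aligned mapping. The sketch below assumes Linux (for MAP_ANONYMOUS). map_with_hint is a hypothetical helper, and 0x10000123 is an arbitrary example hint.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page, passing `hint` as the
 * addr argument without MAP_FIXED, so the kernel may place the mapping
 * wherever it likes. The result is always page-aligned. */
static void *map_with_hint(void *hint)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return mmap(hint, page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```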
regards,
Rik
--
Shortwave goes a long way: irc.starchat.net #swl
http://www.surriel.com/ http://distro.conectiva.com/
[email protected] (Rogier Wolff) writes:
> I'm not sure whether the kernel has been wrong all the time or if
> something changed recently. I posted the "workaround" the first time
> through, which also works from userspace. I can change my application.
> I can modify my libc.
>
> However, I'd rather have "mmap" fixed, as that fixes it for all other
> applications too. Not just for mine on my system.
>
> The SGI manpage says:
>
> All implementations interpret an addr value of
> zero as granting the system complete freedom in selecting pa, subject to
> constraints described below. A non-zero value of addr is taken to be a
> suggestion of a process address near which the mapping should be placed.
>
> which hints at a possible non-alignment. It also mentions that
> "offset" should be page-aligned, which I disagree with here:
> everything has been set up to "do the right thing" when the mapping is
> possible with an unaligned offset.
Except that there is no way to give you enough information to munmap
the page, as the address passed to munmap must be page-aligned.
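In concrete terms, munmap() refuses a pointer that is not page-aligned, so a caller holding base + offset has to undo the adjustment itself. The following sketch assumes POSIX. unmap_unaligned is a hypothetical name, and it is only correct if `p` and `len` describe a region that was mapped as one unit, starting at the page boundary below `p`.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round the pointer down to its page boundary and
 * grow the length by the same amount, so that munmap() receives the
 * page-aligned address it requires. */
static int unmap_unaligned(void *p, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = addr & ~(page - 1);
    return munmap((void *)base, len + (size_t)(addr - base));
}
```

Calling munmap() on the unaligned pointer directly fails with EINVAL.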
The current policy appears to be to make an application think about
page alignment as early as possible when talking to mmap, without being
overly harsh: we do silently round the length up to a multiple of
PAGE_SIZE.
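The silent length rounding is the usual power-of-two round-up. In this sketch the page size is passed in rather than assumed, and mapped_length is a hypothetical name. With 4K pages, a 1K request occupies one full page, which is where the "3k for free" earlier in the thread comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the length a mapping of `len` bytes actually
 * covers once rounded up to whole pages. `page` must be a power of two. */
static size_t mapped_length(size_t len, size_t page)
{
    return (len + page - 1) & ~(page - 1);
}
```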
Beyond this, the internal Linux implementation of mmap does not even
see the low bits of the offset. Instead, the most recent syscall entry
point takes an argument saying which page of the device you want to
mmap. This allows much larger devices to be mmapped while still using
32-bit arithmetic.
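The entry point described here is presumably the mmap2 system call. On 32-bit platforms such as i386, it takes its offset in 4096-byte units. The sketch below shows the conversion under that assumption; to_page_index and MMAP2_UNIT_SHIFT are hypothetical names. It also shows why an offset like 0xdffffc00 cannot even be expressed through such an interface, while a 32-bit page index reaches up to 2^44 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MMAP2_UNIT_SHIFT 12   /* assumption: offset unit of 4096 bytes */

/* Hypothetical helper: express a 64-bit byte offset as a 32-bit page
 * index. It fails if the offset has low bits that the index cannot
 * carry, or if the index does not fit in 32 bits. */
static bool to_page_index(uint64_t byte_off, uint32_t *pgoff)
{
    if (byte_off & ((UINT64_C(1) << MMAP2_UNIT_SHIFT) - 1))
        return false;                    /* low bits would be lost */
    uint64_t idx = byte_off >> MMAP2_UNIT_SHIFT;
    if (idx > UINT32_MAX)
        return false;                    /* beyond 2^44 bytes (16 TiB) */
    *pgoff = (uint32_t)idx;
    return true;
}
```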
So I don't see that it is either easy or even desirable to "fix" mmap.
Eric