I have a driver memory mapping issue that I'm unsure how to
handle. Basically I've written an i810 framebuffer driver that uses
only stolen-memory modes (mostly for embedded customers). This driver
currently can only work when compiled into the kernel because I need
zap_page_range(). Is there an acceptable way for me to get equivalent
functionality in a module so that this will be more useful to the
general public?
Some background info:
The "stolen memory" is the 1mb that the bios takes from the system
before OS load. The i810 maps this in 64k banks to 0xa0000. I can
use any video modes <1MB in size by accessing the memory via these
64k banks and swapping banks when needed.
For the fb driver I allow memory mapping of a 1MB area on the fb device
file and install a zero_page fault handler. When a page is faulted I
map the 64k region that contains the page the client needs with
remap_page_range() and switch the memory bank. I then need to drop
any old 64k ranges so that I will get another zero_page fault when
they are accessed. This way the client sees 1MB of linear memory and
I bank-flip behind the scenes.
So I'm using zap_page_range() to drop the pages for the "old" memory
bank. This, of course, is not exported to modules. Is there some
existing way to get this functionality in a module? Is there any
chance to export zap_page_range()?
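For reference, the mmap file operation I have in mind looks roughly like
this - just a sketch against the 2.4 interfaces, with made-up names
(i810fb_mmap, i810fb_vm_ops) and no error handling:

/* Sketch only: install a vm_ops with a nopage handler and do NOT
 * pre-map anything, so every first touch of a page faults back into
 * the driver, which can then pick the right bank. */
static int i810fb_mmap(struct file *file, struct vm_area_struct *vma)
{
        if (vma->vm_end - vma->vm_start > 1024 * 1024)
                return -EINVAL;                 /* only the 1MB aperture */

        vma->vm_flags |= VM_RESERVED;           /* keep the VM's hands off */
        vma->vm_ops = &i810fb_vm_ops;           /* supplies the nopage handler */
        vma->vm_private_data = file->private_data;
        return 0;
}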
please cc this address in replies
-Matt
On Fri, Dec 14, 2001 at 01:26:29PM -0800, Sottek, Matthew J wrote:
> currently can only work when compiled into the kernel because I need
> zap_page_range(). Is there an acceptable way for me to get equivalent
> functionality in a module so that this will be more useful to the
> general public?
The vm does zap_page_range for you if you're implementing an mmap operation,
otherwise vmalloc/vfree/vremap will take care of the details for you. How
is your code using zap_page_range? It really shouldn't be.
-ben
>On Fri, Dec 14, 2001 at 01:26:29PM -0800, Sottek, Matthew J wrote:
>> currently can only work when compiled into the kernel because I need
>> zap_page_range(). Is there an acceptable way for me to get equivalent
>> functionality in a module so that this will be more useful to the
>> general public?
>The vm does zap_page_range for you if you're implementing an
>mmap operation,
It only does zap_page_range() when the memory map is being
removed right?
>otherwise vmalloc/vfree/vremap will take care of the details for
>you. How is your code using zap_page_range? It really shouldn't be.
I will try to explain it again in another way.
I have a 64k sliding "window" into a 1MB region. You can only access
64k at a time then you have to switch the "bank" to access the next
64k. Address 0xa0000-0xaffff is the 64k window. The actual 1MB of
memory is above the top of memory and not directly addressable by the
CPU, you have to go through the banks.
My driver implements the mmap file operation and does NOT do a
remap_page_range(). I also install a zero_page fault handler.
The client application then memory maps a 1MB region on the device
file. When the client tries to access the first page, my fault
handler is called and I remap_page_range() the 64k window
and set the hardware such that the first 64k of memory is what
can be viewed through the window.
When the client gets to 64k + 1 my fault handler is triggered again.
At this time I change the window to view the second 64k and do
another remap_page_range() of the window to the second 64k in the
vma. HERE is the problem: I need to get rid of the previously mapped
64k range so that when the client reads from the first page my fault
handler is triggered again. zap_page_range() works, but only from within the
kernel.
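Concretely, the only piece a module cannot do today is the "drop the
old window" step. What I would like to be able to write is roughly the
following (sketch only, 2.4-style call; the bank numbering and the 64k
window size are just my driver's bookkeeping):

/* Sketch: invalidate the 64k of the client's vma that currently maps
 * the old bank, so that the next access to it faults back into the
 * driver.  Caller holds mm->mmap_sem for writing. */
static void drop_old_bank(struct vm_area_struct *vma, int old_bank)
{
        unsigned long start = vma->vm_start +
                              ((unsigned long)old_bank << 16);

        /* zap_page_range() is not exported, hence the whole problem */
        zap_page_range(vma->vm_mm, start, 64 * 1024);
}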
This seems like something that would have lots of uses, so I assume
there is a way to do it that I just haven't discovered.
Is there no driver doing something like this to give mutual exclusion
to a memory mapped resource?
-Matt
On Fri, Dec 14, 2001 at 06:10:52PM -0800, Sottek, Matthew J wrote:
> >The vm does zap_page_range for you if you're implementing an
> >mmap operation,
>
> It only does zap_page_range() when the memory map is being
> removed right?
Right.
> I have a 64k sliding "window" into a 1MB region. You can only access
> 64k at a time then you have to switch the "bank" to access the next
> 64k. Address 0xa0000-0xaffff is the 64k window. The actual 1MB of
> memory is above the top of memory and not directly addressable by the
> CPU, you have to go through the banks.
Stop right there. You can't do that. The code will deadlock on page
faults for certain usage patterns. It's slow, inefficient and a waste
of effort.
-ben
--
Fish.
On Fri, 14 Dec 2001, Benjamin LaHaise wrote:
> > I have a 64k sliding "window" into a 1MB region. You can only access
> > 64k at a time then you have to switch the "bank" to access the next
> > 64k. Address 0xa0000-0xaffff is the 64k window. The actual 1MB of
> > memory is above the top of memory and not directly addressable by the
> > CPU, you have to go through the banks.
>
> Stop right there. You can't do that. The code will deadlock on page
> faults for certain usage patterns. It's slow, inefficient and a waste
> of effort.
Would you mind giving a hint as to how the predicted deadlock path would
look, or what the usage pattern might be, please?
I'm asking because I'm happily doing something very similar to what
Matthew describes without ever running into trouble - and this operates
at major page fault rates up to 1000/sec here. What I'm doing is:
in fops->mmap(vma), serialized with other file operations:
drv->vaddr = vma->vm_start;
drv->vlen = vma->vm_end - vma->vm_start;
vma->vm_flags |= VM_RESERVED;
vma->vm_ops = &my_vm_ops;
vma->vm_ops->nopage() is my overloaded page fault handler which maps
_selectable_ kmalloc'ed kernel pages to the userland vma.
in fops->ioctl(), again serialized with other file operations:
down_write(&current->mm->mmap_sem);
zap_page_range(current->mm, drv->vaddr, drv->vlen);
up_write(&current->mm->mmap_sem);
note that this is pretty much the same as what sys_munmap() does - with
one important difference: the mmap'ed vma isn't freed, it just remains
unchanged and a major page fault is triggered on the next access.
Finally, let me point out that performance is not an issue here - and IMHO
simple creation and destruction of ptes pointing to pre-kmalloc'ed
pages shouldn't be that slow anyway. OTOH, the ability to use the page
fault handler to control which page gets mapped into this vma (including
none, i.e. forcing SIGBUS) is an issue.
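For completeness, the nopage handler mentioned above has essentially
this shape - a trimmed sketch with made-up names and bookkeeping
(my_drv, active_buf, nr_pages); the real thing just selects among
several pre-allocated buffers:

/* Sketch: map a driver-selected, pre-kmalloc'ed page into the userland
 * vma, or return NOPAGE_SIGBUS to force SIGBUS while nothing should be
 * mapped.  2.4-style nopage signature. */
static struct page *my_vm_nopage(struct vm_area_struct *vma,
                                 unsigned long address, int unused)
{
        struct my_drv *drv = vma->vm_private_data;
        unsigned long pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
        struct page *page;

        if (!drv->active_buf || pgoff >= drv->nr_pages)
                return NOPAGE_SIGBUS;   /* forces SIGBUS, as noted above */

        page = virt_to_page(drv->active_buf + (pgoff << PAGE_SHIFT));
        get_page(page);                 /* reference consumed by the VM */
        return page;
}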
Regards,
Martin
On Fri, 14 Dec 2001, Benjamin LaHaise wrote:
> On Fri, Dec 14, 2001 at 01:26:29PM -0800, Sottek, Matthew J wrote:
> > currently can only work when compiled into the kernel because I need
> > zap_page_range(). Is there an acceptable way for me to get equivalent
> > functionality in a module so that this will be more useful to the
> > general public?
>
> The vm does zap_page_range for you if you're implementing an mmap operation,
> otherwise vmalloc/vfree/vremap will take care of the details for you. How
> is your code using zap_page_range? It really shouldn't be.
True, but IMHO only for standard mmap semantics.
Well, the background is slightly different here, but very much the same
problem: I'd like to get rid of some page(s) which are mapped into a
userland vma. At certain points I need to force a page fault on them, so
that the overloaded vma->nopage() gets called and can do the right thing.
zap_page_range() does exactly what I want. IMHO zap_page_range() is some
kind of symmetric buddy of remap_page_range() - it's somewhat surprising
to find one exported but not the other. And, AFAICS, there is no
technical reason not to use it either - at least for me it's working
perfectly fine. Of course it needs proper mm serialization provided by
down_write(&mm->mmap_sem).
Martin
Martin Diehl wrote:
>
> On Fri, 14 Dec 2001, Benjamin LaHaise wrote:
>
> > > I have a 64k sliding "window" into a 1MB region. You can only access
> > > 64k at a time then you have to switch the "bank" to access the next
> > > 64k. Address 0xa0000-0xaffff is the 64k window. The actual 1MB of
> > > memory is above the top of memory and not directly addressable by the
> > > CPU, you have to go through the banks.
> >
> > Stop right there. You can't do that. The code will deadlock on page
> > faults for certain usage patterns. It's slow, inefficient and a waste
> > of effort.
>
> Would you mind giving a hint as to how the predicted deadlock path would
> look, or what the usage pattern might be, please?
>
> I'm asking because I'm happily doing something very similar to what
> Matthew describes without ever running into trouble - and this operates
> at major page fault rates up to 1000/sec here. What I'm doing is:
Some processors have instructions that require 2 or more pages
present simultaneously to execute. That _will_ fail
spectacularly if the two pages belong to different banks
in the above scenario, as only one bank can be present at a time.
Some examples for x86 processors:
1. The string move/compare instructions. Fine for copying blocks of
memory around. The above case is a framebuffer, and using
"movsd" to copy from one location to another isn't
all that uncommon. The two locations might be in different banks.
2. An unaligned read or write, such as writing a 32-bit quantity
to the last even address in the first bank. Then the rest hits
the first part of the next bank. (A 16-bit quantity written to
the last odd address does the same thing.) See the short worked
example at the end of this mail.
3. An instruction that crosses a bank boundary, or lives in one
bank and accesses data in another. Of course you don't usually
store instructions in a frame buffer. :-)
4. Processor-specific structures (page tables, interrupt
vectors... stored so that they cross a bank). Not applicable
to framebuffers, but there might be strange machines with
bank-switched main memory around.
In any of these cases, the following happens:
1. You get a page fault for the page in the missing bank.
2. The page fault handler switches banks.
3. The instruction is restarted as the page fault handler returns.
4. You get a page fault for the now-missing page in the bank
that was switched off.
5. The page fault handler switches banks.
6. The instruction is restarted. Repeat from 1 in
an endless loop. Your machine is now deadlocked. Perhaps
you're so lucky that some other process still gets
scheduled - let's hope none of them needs the bank-switched
memory _at all_.
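To make case 2 concrete, a single offending access from the client side
could be as simple as this (illustrative user-space fragment, nothing
driver-specific about it):

#include <stdint.h>

/* One 32-bit store that straddles the 64k bank boundary.  fb is the
 * start of the client's 1MB mapping.  Bytes 0xfffe-0xffff live in
 * bank 0, bytes 0x10000-0x10001 in bank 1; the CPU needs all four
 * mapped at once to complete the store, which the bank-flipping
 * scheme can never provide, so the fault ping-pongs between the two
 * banks forever. */
static void cross_bank_store(volatile uint8_t *fb)
{
        *(volatile uint32_t *)(fb + 0xfffe) = 0xdeadbeef;
}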
Helge Hafting