2007-10-22 15:12:29

by Stefani Seibold

[permalink] [raw]
Subject: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

Hi,

i have a problem with vmalloc() and vm_ops.page_mkwrite().

ReadOnly access works, but on a write access the VM will
endless invoke the vm_ops.page_mkwrite() handler.

I tracked down the problem to the
struct page.mapping pointer,
which is NULL.

The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
which will be in an extra thread handled, to perform the IO and clean and
write protect all pages with page_clean().

I am not sure if the is a feature of the new rmap code or a bug.

Is there an way to get a similar functionality? Currently, i have no
idea
how to get the ptep from a page alloced with vmalloc().

Greetings,
Stefani

Here is a small sample driver:

#include <linux/module.h>
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>

#define DEVICE_NAME "mydrv"
#define DEVICE_MAJOR 240

static u8 *mydrv_memory;
static const u_long mydrv_memory_size=1024*1024; // 1 Megabyte

#ifdef MODULE
static u_long vmas=0;
#endif

static void mydrv_vma_open(struct vm_area_struct *vma)
{
#ifdef MODULE
if (vmas++==0)
try_module_get(THIS_MODULE);
#endif
}

static void mydrv_vma_close(struct vm_area_struct *vma)
{
#ifdef MODULE
if (--vmas==0)
module_put(THIS_MODULE);
#endif
}

struct page *mydrv_vma_nopage(struct vm_area_struct *vma,unsigned long
address,int *type)
{
unsigned long offset;
struct page *page;

offset = (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
printk("--> mydrv_vma_nopage:%lu(%08lx)\n",offset,offset);

if (offset >= mydrv_memory_size)
return NOPAGE_SIGBUS;

page = vmalloc_to_page(mydrv_memory + offset);
if (!page)
return NOPAGE_OOM;

get_page(page);
if (type)
*type = VM_FAULT_MINOR;
return page;
}

int mydrv_mkwrite(struct vm_area_struct *vma,struct page *page)
{
printk("--> mydrv_mkwrite:%p\n",page->mapping);

return 0;
}

struct vm_operations_struct mydrv_vm_ops = {
.open = mydrv_vma_open,
.close = mydrv_vma_close,
.nopage = mydrv_vma_nopage,
.page_mkwrite = mydrv_mkwrite,
};

static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
{
vma->vm_ops = &mydrv_vm_ops;
vma->vm_flags |= ( VM_IO | VM_RESERVED | VM_DONTEXPAND );
vma->vm_private_data = 0;
mydrv_vma_open(vma);
return 0;
}

int mydrv_open(struct inode *inode, struct file *file)
{
int minor;

minor=MINOR(file->f_dentry->d_inode->i_rdev);
printk("--> mydrv_open called for minor: %d\n", minor);

if (minor>0) {
printk("--> mydrv_open minor %d failed\n",minor);
return -ENODEV;
}

printk("--> mydrv_open o.K.\n");
return 0;
}


int mydrv_close(struct inode *inode, struct file *file)
{
int minor;

minor=MINOR(file->f_dentry->d_inode->i_rdev);
printk("--> mydrv_close for minor: %d\n", minor);
return 0;
}

struct file_operations mydrv_fops = {
.owner=THIS_MODULE,
.mmap=mydrv_mmap,
.open=mydrv_open,
.release=mydrv_close,
};

int init_module(void)
{
printk("--> init_module called\n");

if (!(mydrv_memory = vmalloc(mydrv_memory_size)))
return -ENOMEM;

if (register_chrdev(DEVICE_MAJOR, DEVICE_NAME, &mydrv_fops) < 0) {
printk("--> Error registering driver.\n");
return -ENODEV;
}

memset(mydrv_memory, 0, mydrv_memory_size);
printk("--> init_module done\n");
return 0;
}


void cleanup_module(void)
{
printk("--> cleanup_module called\n");
unregister_chrdev(DEVICE_MAJOR,DEVICE_NAME);
vfree(mydrv_memory);
printk("--> cleanup_module done\n");
}

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Stefan Seibold <[email protected]>");
MODULE_DESCRIPTION("Test Driver");



2007-10-29 07:40:49

by Andrew Morton

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold <[email protected]> wrote:

> Hi,
>
> i have a problem with vmalloc() and vm_ops.page_mkwrite().
>
> ReadOnly access works, but on a write access the VM will
> endless invoke the vm_ops.page_mkwrite() handler.
>
> I tracked down the problem to the
> struct page.mapping pointer,
> which is NULL.
>
> The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
> This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
> which will be in an extra thread handled, to perform the IO and clean and
> write protect all pages with page_clean().
>
> I am not sure if the is a feature of the new rmap code or a bug.
>
> Is there an way to get a similar functionality? Currently, i have no
> idea
> how to get the ptep from a page alloced with vmalloc().
>
> Greetings,
> Stefani
>
> Here is a small sample driver:
>
> #include <linux/module.h>
> #include <linux/errno.h>
> #include <linux/string.h>
> #include <linux/mm.h>
> #include <linux/vmalloc.h>
> #include <linux/fs.h>
>
> #define DEVICE_NAME "mydrv"
> #define DEVICE_MAJOR 240
>
> static u8 *mydrv_memory;
> static const u_long mydrv_memory_size=1024*1024; // 1 Megabyte
>
> #ifdef MODULE
> static u_long vmas=0;
> #endif
>
> static void mydrv_vma_open(struct vm_area_struct *vma)
> {
> #ifdef MODULE
> if (vmas++==0)
> try_module_get(THIS_MODULE);
> #endif
> }
>
> static void mydrv_vma_close(struct vm_area_struct *vma)
> {
> #ifdef MODULE
> if (--vmas==0)
> module_put(THIS_MODULE);
> #endif
> }
>
> struct page *mydrv_vma_nopage(struct vm_area_struct *vma,unsigned long
> address,int *type)
> {
> unsigned long offset;
> struct page *page;
>
> offset = (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
> printk("--> mydrv_vma_nopage:%lu(%08lx)\n",offset,offset);
>
> if (offset >= mydrv_memory_size)
> return NOPAGE_SIGBUS;
>
> page = vmalloc_to_page(mydrv_memory + offset);
> if (!page)
> return NOPAGE_OOM;
>
> get_page(page);
> if (type)
> *type = VM_FAULT_MINOR;
> return page;
> }
>
> int mydrv_mkwrite(struct vm_area_struct *vma,struct page *page)
> {
> printk("--> mydrv_mkwrite:%p\n",page->mapping);
>
> return 0;
> }
>
> struct vm_operations_struct mydrv_vm_ops = {
> .open = mydrv_vma_open,
> .close = mydrv_vma_close,
> .nopage = mydrv_vma_nopage,
> .page_mkwrite = mydrv_mkwrite,
> };
>
> static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
> {
> vma->vm_ops = &mydrv_vm_ops;
> vma->vm_flags |= ( VM_IO | VM_RESERVED | VM_DONTEXPAND );
> vma->vm_private_data = 0;
> mydrv_vma_open(vma);
> return 0;
> }
>
> int mydrv_open(struct inode *inode, struct file *file)
> {
> int minor;
>
> minor=MINOR(file->f_dentry->d_inode->i_rdev);
> printk("--> mydrv_open called for minor: %d\n", minor);
>
> if (minor>0) {
> printk("--> mydrv_open minor %d failed\n",minor);
> return -ENODEV;
> }
>
> printk("--> mydrv_open o.K.\n");
> return 0;
> }
>
>
> int mydrv_close(struct inode *inode, struct file *file)
> {
> int minor;
>
> minor=MINOR(file->f_dentry->d_inode->i_rdev);
> printk("--> mydrv_close for minor: %d\n", minor);
> return 0;
> }
>
> struct file_operations mydrv_fops = {
> .owner=THIS_MODULE,
> .mmap=mydrv_mmap,
> .open=mydrv_open,
> .release=mydrv_close,
> };
>
> int init_module(void)
> {
> printk("--> init_module called\n");
>
> if (!(mydrv_memory = vmalloc(mydrv_memory_size)))
> return -ENOMEM;
>
> if (register_chrdev(DEVICE_MAJOR, DEVICE_NAME, &mydrv_fops) < 0) {
> printk("--> Error registering driver.\n");
> return -ENODEV;
> }
>
> memset(mydrv_memory, 0, mydrv_memory_size);
> printk("--> init_module done\n");
> return 0;
> }
>
>
> void cleanup_module(void)
> {
> printk("--> cleanup_module called\n");
> unregister_chrdev(DEVICE_MAJOR,DEVICE_NAME);
> vfree(mydrv_memory);
> printk("--> cleanup_module done\n");
> }
>
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Stefan Seibold <[email protected]>");
> MODULE_DESCRIPTION("Test Driver");
>

(cc's added)

2007-10-29 08:17:48

by Jaya Kumar

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On 10/29/07, Andrew Morton <[email protected]> wrote:
> On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold <[email protected]> wrote:
> >
> > The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
> > This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
> > which will be in an extra thread handled, to perform the IO and clean and
> > write protect all pages with page_clean().
> >

Hi,

An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.

I understood from the thread that PeterZ is looking into page_mkclean
changes which I guess went into 2.6.23. I'm also happy to help in any
way if the way we're doing fb_defio needs to change.

Thanks,
jaya

2007-10-29 10:12:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> On 10/29/07, Andrew Morton <[email protected]> wrote:
> > On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold <[email protected]> wrote:
> > >
> > > The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
> > > This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
> > > which will be in an extra thread handled, to perform the IO and clean and
> > > write protect all pages with page_clean().
> > >
>
> Hi,
>
> An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
>
> I understood from the thread that PeterZ is looking into page_mkclean
> changes which I guess went into 2.6.23. I'm also happy to help in any
> way if the way we're doing fb_defio needs to change.

Yeah, its the truncate race stuff introduced by Nick in
d0217ac04ca6591841e5665f518e38064f4e65bd

I'm a bit at a loss on how to go around fixing this. One ugly idea I had
was to check page->mapping before going into page_mkwrite() and when
that is null, don't bother with the truncate check.



2007-10-29 12:35:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Mon, 2007-10-29 at 11:11 +0100, Peter Zijlstra wrote:
> On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> > On 10/29/07, Andrew Morton <[email protected]> wrote:
> > > On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold <[email protected]> wrote:
> > > >
> > > > The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
> > > > This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
> > > > which will be in an extra thread handled, to perform the IO and clean and
> > > > write protect all pages with page_clean().
> > > >
> >
> > Hi,
> >
> > An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
> >
> > I understood from the thread that PeterZ is looking into page_mkclean
> > changes which I guess went into 2.6.23. I'm also happy to help in any
> > way if the way we're doing fb_defio needs to change.
>
> Yeah, its the truncate race stuff introduced by Nick in
> d0217ac04ca6591841e5665f518e38064f4e65bd
>
> I'm a bit at a loss on how to go around fixing this. One ugly idea I had
> was to check page->mapping before going into page_mkwrite() and when
> that is null, don't bother with the truncate check.

Something like this

---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2300,6 +2300,8 @@ static int __do_fault(struct mm_struct *
* to become writable
*/
if (vma->vm_ops->page_mkwrite) {
+ struct address_space *mapping = page->mapping;
+
unlock_page(page);
if (vma->vm_ops->page_mkwrite(vma, page) < 0) {
ret = VM_FAULT_SIGBUS;
@@ -2314,7 +2316,7 @@ static int __do_fault(struct mm_struct *
* reworking page_mkwrite locking API, which
* is better done later.
*/
- if (!page->mapping) {
+ if (mapping != page->mapping) {
ret = 0;
anon = 1; /* no anon but release vmf.page */
goto out;


2007-10-29 16:22:52

by Nick Piggin

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Monday 29 October 2007 23:35, Peter Zijlstra wrote:
> On Mon, 2007-10-29 at 11:11 +0100, Peter Zijlstra wrote:
> > On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> > > On 10/29/07, Andrew Morton <[email protected]> wrote:
> > > > On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold
<[email protected]> wrote:
> > > > > The problem original occurs with the fb_defio driver
> > > > > (driver/video/fb_defio.c). This driver use the
> > > > > vm_ops.page_mkwrite() handler for tracking the modified pages,
> > > > > which will be in an extra thread handled, to perform the IO and
> > > > > clean and write protect all pages with page_clean().
> > >
> > > Hi,
> > >
> > > An aside, I just tested that deferred IO works fine on
> > > 2.6.22.10/pxa255.
> > >
> > > I understood from the thread that PeterZ is looking into page_mkclean
> > > changes which I guess went into 2.6.23. I'm also happy to help in any
> > > way if the way we're doing fb_defio needs to change.
> >
> > Yeah, its the truncate race stuff introduced by Nick in
> > d0217ac04ca6591841e5665f518e38064f4e65bd
> >
> > I'm a bit at a loss on how to go around fixing this. One ugly idea I had
> > was to check page->mapping before going into page_mkwrite() and when
> > that is null, don't bother with the truncate check.
>
> Something like this


I think it's a fine minimal patch. Maybe add a comment to say exactly
what we're doing here (pagecache generally just uses !mapping to test
for truncate).

Otherwise, Acked-by: Nick Piggin <[email protected]>, thanks!

> ---
> mm/memory.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -2300,6 +2300,8 @@ static int __do_fault(struct mm_struct *
> * to become writable
> */
> if (vma->vm_ops->page_mkwrite) {
> + struct address_space *mapping = page->mapping;
> +
> unlock_page(page);
> if (vma->vm_ops->page_mkwrite(vma, page) < 0) {
> ret = VM_FAULT_SIGBUS;
> @@ -2314,7 +2316,7 @@ static int __do_fault(struct mm_struct *
> * reworking page_mkwrite locking API, which
> * is better done later.
> */
> - if (!page->mapping) {
> + if (mapping != page->mapping) {
> ret = 0;
> anon = 1; /* no anon but release vmf.page */
> goto out;

2007-10-29 17:01:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> On 10/29/07, Andrew Morton <[email protected]> wrote:
> > On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold <[email protected]> wrote:
> > >
> > > The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
> > > This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
> > > which will be in an extra thread handled, to perform the IO and clean and
> > > write protect all pages with page_clean().
> > >

> An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
>
> I understood from the thread that PeterZ is looking into page_mkclean
> changes which I guess went into 2.6.23. I'm also happy to help in any
> way if the way we're doing fb_defio needs to change.

OK, seems I can't read. Or at least, I missed a large part of the
problem.

page_mkclean() hasn't changed, it was ->page_mkwrite() that changed. And
looking at the fb_defio code, I'm not sure I understand how its
page_mkclean() use could ever have worked.

The proposed patch [1] only fixes the issue of ->page_mkwrite() on
vmalloc()'ed memory. Not page_mkclean(), and that has never worked from
what I can make of it.

Jaya, could you shed some light on this? I presume you had your display
working.


[1] which I will clean up and resend after this issue is cleared up -
and preferably tested by someone who has this hardware.

2007-10-29 17:51:58

by Jaya Kumar

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On 10/29/07, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> > An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
> >
> > I understood from the thread that PeterZ is looking into page_mkclean
> > changes which I guess went into 2.6.23. I'm also happy to help in any
> > way if the way we're doing fb_defio needs to change.
>
> OK, seems I can't read. Or at least, I missed a large part of the
> problem.
>
> page_mkclean() hasn't changed, it was ->page_mkwrite() that changed. And
> looking at the fb_defio code, I'm not sure I understand how its
> page_mkclean() use could ever have worked.
>
> The proposed patch [1] only fixes the issue of ->page_mkwrite() on
> vmalloc()'ed memory. Not page_mkclean(), and that has never worked from
> what I can make of it.
>
> Jaya, could you shed some light on this? I presume you had your display
> working.
>

I thought I had it working. I saw the display update after each
mmap/write sequence to the framebuffer. I need to check if there's an
munmap or anything else going on in between write sequences that would
cause it to behave like page_mkclean was working.

Is it correct to assume that page_mkclean should mark the pages
read-only so that the next write would again trigger mkwrite? Even if
the page was from a vmalloc_to_page()?

Thanks,
jaya

2007-10-29 18:17:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Mon, 2007-10-29 at 13:51 -0400, Jaya Kumar wrote:
> On 10/29/07, Peter Zijlstra <[email protected]> wrote:
> > On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> > > An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
> > >
> > > I understood from the thread that PeterZ is looking into page_mkclean
> > > changes which I guess went into 2.6.23. I'm also happy to help in any
> > > way if the way we're doing fb_defio needs to change.
> >
> > OK, seems I can't read. Or at least, I missed a large part of the
> > problem.
> >
> > page_mkclean() hasn't changed, it was ->page_mkwrite() that changed. And
> > looking at the fb_defio code, I'm not sure I understand how its
> > page_mkclean() use could ever have worked.
> >
> > The proposed patch [1] only fixes the issue of ->page_mkwrite() on
> > vmalloc()'ed memory. Not page_mkclean(), and that has never worked from
> > what I can make of it.
> >
> > Jaya, could you shed some light on this? I presume you had your display
> > working.
> >
>
> I thought I had it working. I saw the display update after each
> mmap/write sequence to the framebuffer. I need to check if there's an
> munmap or anything else going on in between write sequences that would
> cause it to behave like page_mkclean was working.
>
> Is it correct to assume that page_mkclean should mark the pages
> read-only so that the next write would again trigger mkwrite?

Well, yes, that is the intended behaviour.

> Even if the page was from a vmalloc_to_page()?

That is the crux, I only ever implemented it for file pages.



2007-10-29 22:17:46

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23


On Mon, 2007-10-29 at 19:17 +0100, Peter Zijlstra wrote:
> On Mon, 2007-10-29 at 13:51 -0400, Jaya Kumar wrote:
> > On 10/29/07, Peter Zijlstra <[email protected]> wrote:
> > > On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote:
> > > > An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255.
> > > >
> > > > I understood from the thread that PeterZ is looking into page_mkclean
> > > > changes which I guess went into 2.6.23. I'm also happy to help in any
> > > > way if the way we're doing fb_defio needs to change.
> > >
> > > OK, seems I can't read. Or at least, I missed a large part of the
> > > problem.
> > >
> > > page_mkclean() hasn't changed, it was ->page_mkwrite() that changed. And
> > > looking at the fb_defio code, I'm not sure I understand how its
> > > page_mkclean() use could ever have worked.
> > >
> > > The proposed patch [1] only fixes the issue of ->page_mkwrite() on
> > > vmalloc()'ed memory. Not page_mkclean(), and that has never worked from
> > > what I can make of it.
> > >
> > > Jaya, could you shed some light on this? I presume you had your display
> > > working.
> > >
> >
> > I thought I had it working. I saw the display update after each
> > mmap/write sequence to the framebuffer. I need to check if there's an
> > munmap or anything else going on in between write sequences that would
> > cause it to behave like page_mkclean was working.
> >
> > Is it correct to assume that page_mkclean should mark the pages
> > read-only so that the next write would again trigger mkwrite?
>
> Well, yes, that is the intended behaviour.
>
> > Even if the page was from a vmalloc_to_page()?
>
> That is the crux, I only ever implemented it for file pages.

Hmm, so these vmalloc pages are mapped into user-space with
remap_pfn_range(), which doesn't have any form of rmap. That is, given a
pfn there is no way to obtain all ptes for it. So the interface to
page_mkclean() could never work for these (as it only provides a struct
page *).

[ also, remap_vmalloc_range() suffers similar issues, only file and anon
have proper rmap ]

I'm not sure we want full rmap for remap_pfn/vmalloc_range, but perhaps
we could assist drivers in maintaining and using vma lists.

I think page_mkclean_one() would work if you'd manually set page->index
and iterate the vmas yourself. Although atm I'm not sure of anything so
don't pin me on it.

2007-10-30 01:22:37

by Jaya Kumar

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On 10/29/07, Peter Zijlstra <[email protected]> wrote:
>
> [ also, remap_vmalloc_range() suffers similar issues, only file and anon
> have proper rmap ]
>
> I'm not sure we want full rmap for remap_pfn/vmalloc_range, but perhaps
> we could assist drivers in maintaining and using vma lists.
>
> I think page_mkclean_one() would work if you'd manually set page->index
> and iterate the vmas yourself. Although atm I'm not sure of anything so
> don't pin me on it.

:-) If it's anybody's fault, it's mine for not testing properly. My bad.

In the case of defio, I think it's no trouble to build a list of vmas
at mmap time and then to iterate through them when it's ready for
mkclean time as you suggested. I don't fully understand page->index
yet. I had thought it was only used by swap cache or file map.

On an unrelated note, I was looking for somewhere to stuff a 16 bit
offset (so that I have a cheap way to know which struct page
corresponds to which framebuffer block or offset) for another driver.
I had thought page->index was it but I think I am wrong now.

Thanks,
jaya

2007-10-30 09:56:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Mon, 2007-10-29 at 21:22 -0400, Jaya Kumar wrote:
> On 10/29/07, Peter Zijlstra <[email protected]> wrote:
> >
> > [ also, remap_vmalloc_range() suffers similar issues, only file and anon
> > have proper rmap ]
> >
> > I'm not sure we want full rmap for remap_pfn/vmalloc_range, but perhaps
> > we could assist drivers in maintaining and using vma lists.
> >
> > I think page_mkclean_one() would work if you'd manually set page->index
> > and iterate the vmas yourself. Although atm I'm not sure of anything so
> > don't pin me on it.
>
> :-) If it's anybody's fault, it's mine for not testing properly. My bad.
>
> In the case of defio, I think it's no trouble to build a list of vmas
> at mmap time and then to iterate through them when it's ready for
> mkclean time as you suggested. I don't fully understand page->index
> yet. I had thought it was only used by swap cache or file map.
>
> On an unrelated note, I was looking for somewhere to stuff a 16 bit
> offset (so that I have a cheap way to know which struct page
> corresponds to which framebuffer block or offset) for another driver.
> I had thought page->index was it but I think I am wrong now.

Yeah, page->index is used along with vma->vmpgoff and vma->vm_start to
determine the address of the page in the given vma:

address = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);

and from that address the pte can be found by walking the vma->vm_mm
page tables.

So page->index does what you want it to, identify which part of the
framebuffer this particular page belongs to.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-30 10:49:37

by Stefani Seibold

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

Hi,

the question is how can i get all pte's from a vmalloc'ed memory. Due to
the zeroed mapping pointer i dont see how to do this?


Am Dienstag, den 30.10.2007, 10:56 +0100 schrieb Peter Zijlstra:
> On Mon, 2007-10-29 at 21:22 -0400, Jaya Kumar wrote:
> > On 10/29/07, Peter Zijlstra <[email protected]> wrote:
> > >
> > > [ also, remap_vmalloc_range() suffers similar issues, only file and anon
> > > have proper rmap ]
> > >
> > > I'm not sure we want full rmap for remap_pfn/vmalloc_range, but perhaps
> > > we could assist drivers in maintaining and using vma lists.
> > >
> > > I think page_mkclean_one() would work if you'd manually set page->index
> > > and iterate the vmas yourself. Although atm I'm not sure of anything so
> > > don't pin me on it.
> >
> > :-) If it's anybody's fault, it's mine for not testing properly. My bad.
> >
> > In the case of defio, I think it's no trouble to build a list of vmas
> > at mmap time and then to iterate through them when it's ready for
> > mkclean time as you suggested. I don't fully understand page->index
> > yet. I had thought it was only used by swap cache or file map.
> >
> > On an unrelated note, I was looking for somewhere to stuff a 16 bit
> > offset (so that I have a cheap way to know which struct page
> > corresponds to which framebuffer block or offset) for another driver.
> > I had thought page->index was it but I think I am wrong now.
> Yeah, page->index is used along with vma->vmpgoff and vma->vm_start to
> determine the address of the page in the given vma:
>
> address = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
>
> and from that address the pte can be found by walking the vma->vm_mm
> page tables.
>
> So page->index does what you want it to, identify which part of the
> framebuffer this particular page belongs to.

2007-10-30 12:54:23

by Hugh Dickins

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Tue, 30 Oct 2007, Stefani Seibold wrote:
>
> the question is how can i get all pte's from a vmalloc'ed memory. Due to
> the zeroed mapping pointer i dont see how to do this?

The mapping pointer is zeroed because you've done nothing to set it.
Below is how I answered you a week ago. But this is new territory
(extending page_mkclean to work on more than just pagecache pages),
I'm still unsure what would be the safest way to do it.

On Mon, 22 Oct 2007, Stefani Seibold wrote:
>
> i have a problem with vmalloc() and vm_ops.page_mkwrite().
>
> ReadOnly access works, but on a write access the VM will
> endless invoke the vm_ops.page_mkwrite() handler.
>
> I tracked down the problem to the
> struct page.mapping pointer,
> which is NULL.
>
> The problem original occurs with the fb_defio driver (driver/video/fb_defio.c).
> This driver use the vm_ops.page_mkwrite() handler for tracking the modified pages,
> which will be in an extra thread handled, to perform the IO and clean and
> write protect all pages with page_clean().

Interesting. You need to ask Jaya (CC'ed) since he's the one
who put that code into fb_defio.c, exported page_mkclean, and
should have tested it.

>
> I am not sure if the is a feature of the new rmap code or a bug.

page_mkclean was written in the belief that it was being used on
pagecache pages. I'm not sure how deeply engrained that belief is.

If it can easily and safely be used on something else, that may be
nice: though there's a danger we'll keep breaking and re-breaking
it if there's only one driver using it in an unusual way. CC'ed
Peter since he's the one who most needs to be aware of this.

>
> Is there an way to get a similar functionality? Currently, i have no
> idea
> how to get the ptep from a page alloced with vmalloc().

A pagecache page would have page->mapping initialized to point to
the struct address_space of the vma, and page->index to the offset
(in PAGE_SIZE units): see mm/filemap.c:add_to_page_cache. Without
page->mapping set, page_mkclean_file won't be able to find the vmas
in which the page might appear; and without page->index set, it
won't be able to find where the page should be in those vmas.

If such a driver does not put its pages into the page cache (the
safer course? I'm unsure), then it needs to set page->mapping and
page->index appropriately (and reset page->mapping to NULL before
freeing).

Hugh

2007-10-30 13:12:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Tue, 2007-10-30 at 12:39 +0000, Hugh Dickins wrote:
> On Tue, 30 Oct 2007, Stefani Seibold wrote:
> >
> > the question is how can i get all pte's from a vmalloc'ed memory. Due to
> > the zeroed mapping pointer i dont see how to do this?
>
> The mapping pointer is zeroed because you've done nothing to set it.
> Below is how I answered you a week ago. But this is new territory
> (extending page_mkclean to work on more than just pagecache pages),
> I'm still unsure what would be the safest way to do it.

Quite, I think manual usage of page_mkclean_one() on the vma gotten from
mmap() along with properly setting page->index is the simplest solution
to make work.

Making page_mkclean(struct page *) work for remap_pfn/vmalloc_range()
style mmaps would require extending rmap to work with those, which
includes setting page->mapping to point to a anon_vma like object.

But that sounds like a lot of work, and I'm not sure its worth the
overhead, because so far all users of remap_pfn/vmalloc_range() have
survived without.



Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-30 13:16:51

by Jaya Kumar

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On 10/30/07, Peter Zijlstra <[email protected]> wrote:
> So page->index does what you want it to, identify which part of the
> framebuffer this particular page belongs to.

Ok. I'm attempting to walk the code sequence. Here's what I think:

- driver loads
- driver vmalloc()s its fb
- this creates the necessary pte entries
then...
- app mmap(/dev/fb0)
- vma is created
- defio mmap adds this vma to private list (equivalent of
address_space or anon_vma)
- app touches base + pixel(128,128) = base + 16k
- page fault
- defio nopage gets called
- defio nopage does vmalloc_to_page(base+16k)
- that finds the correct struct page corresponding to that vaddr.
page->index has not been set by anyone so far, right?
* ah... i see, you are suggesting that this is where I could set the
index since i know the offset i want it to represent. right?
- defio mkwrite get called. defio adds page to its list. schedules delayed work
- app keeps writing the page
- delayed work occurs
- foreach vma { foreach page { page_mkclean_one(page, vma) }
- cycle repeats...

Thanks,
jaya

2007-10-30 13:26:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Tue, 2007-10-30 at 09:16 -0400, Jaya Kumar wrote:
> On 10/30/07, Peter Zijlstra <[email protected]> wrote:
> > So page->index does what you want it to, identify which part of the
> > framebuffer this particular page belongs to.
>
> Ok. I'm attempting to walk the code sequence. Here's what I think:
>
> - driver loads
> - driver vmalloc()s its fb
> - this creates the necessary pte entries

well, one set thereof, the kernel mappings, which for this purpose are
the least interesting.

> then...
> - app mmap(/dev/fb0)
> - vma is created
> - defio mmap adds this vma to private list (equivalent of
> address_space or anon_vma)

> - app touches base + pixel(128,128) = base + 16k
> - page fault
> - defio nopage gets called
> - defio nopage does vmalloc_to_page(base+16k)

this installs a user space page table entry for your page; this is the
interesting one as it carries the user-dirty state.

> - that finds the correct struct page corresponding to that vaddr.
> page->index has not been set by anyone so far, right?
> * ah... i see, you are suggesting that this is where I could set the
> index since i know the offset i want it to represent. right?

Not quite, you would set that right after vmallocing, just set an
increasing page->index starting with 0 for the first page.

Then ensure your vma->vm_pgoff is 0 (which should be the case since
userspace will most likely mmap the whole thing, and if not it still
gets what it expects).

> - defio mkwrite get called. defio adds page to its list. schedules delayed work
> - app keeps writing the page
> - delayed work occurs
> - foreach vma { foreach page { page_mkclean_one(page, vma) }

Yeah, page_mkclean_one(page, vma) will use vma_address() to obtain an
user-space address for the page in this vma using page->index and the
formula from the last email, this address is then used to walk the page
tables and obtain a pte.

This will be the user-space pte installed by your nopfn handler. Not the
kernel vmap pte resulting from the vmalloc() call.

> - cycle repeats...



Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-10-30 15:48:37

by Hugh Dickins

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Tue, 30 Oct 2007, Peter Zijlstra wrote:
> On Tue, 2007-10-30 at 09:16 -0400, Jaya Kumar wrote:
...
> > - defio mmap adds this vma to private list (equivalent of
> > address_space or anon_vma)
...
> > - foreach vma { foreach page { page_mkclean_one(page, vma) }
>
> Yeah, page_mkclean_one(page, vma) will use vma_address() to obtain an
> user-space address for the page in this vma using page->index and the
> formula from the last email, this address is then used to walk the page
> tables and obtain a pte.

I don't understand why you suggested an anon_vma, nor why Jaya is
suggesting a private list. All vmas mapping /dev/fb0 will be kept
in the prio_tree rooted in its struct address_space (__vma_link_file
in mm/mmap.c). And page_mkclean gets page_mkclean_file to walk that
very tree. The missing part is just the setting of page->mapping to
point to that struct address_space (and clearing it before finally
freeing the pages), and the setting of page->index as you described.
Isn't it?

Hugh

2007-10-30 15:51:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Tue, 2007-10-30 at 15:47 +0000, Hugh Dickins wrote:
> On Tue, 30 Oct 2007, Peter Zijlstra wrote:
> > On Tue, 2007-10-30 at 09:16 -0400, Jaya Kumar wrote:
> ....
> > > - defio mmap adds this vma to private list (equivalent of
> > > address_space or anon_vma)
> ....
> > > - foreach vma { foreach page { page_mkclean_one(page, vma) }
> >
> > Yeah, page_mkclean_one(page, vma) will use vma_address() to obtain an
> > user-space address for the page in this vma using page->index and the
> > formula from the last email, this address is then used to walk the page
> > tables and obtain a pte.
>
> I don't understand why you suggested an anon_vma, nor why Jaya is
> suggesting a private list. All vmas mapping /dev/fb0 will be kept
> in the prio_tree rooted in its struct address_space (__vma_link_file
> in mm/mmap.c). And page_mkclean gets page_mkclean_file to walk that
> very tree. The missing part is just the setting of page->mapping to
> point to that struct address_space (and clearing it before finally
> freeing the pages), and the setting of page->index as you described.
> Isn't it?

Hmm, there is a thought. I had not considered that mapping a chardev
would have that effect.

I'd have to have a look at the actual code, but yeah, that might very
well work out. How silly of me.

Thanks!


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-11-01 08:02:59

by Jaya Kumar

[permalink] [raw]
Subject: Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

On Oct 30, 2007 11:47 AM, Hugh Dickins <[email protected]> wrote:
>
> I don't understand why you suggested an anon_vma, nor why Jaya is
> suggesting a private list. All vmas mapping /dev/fb0 will be kept
> in the prio_tree rooted in its struct address_space (__vma_link_file
> in mm/mmap.c). And page_mkclean gets page_mkclean_file to walk that
> very tree. The missing part is just the setting of page->mapping to
> point to that struct address_space (and clearing it before finally
> freeing the pages), and the setting of page->index as you described.
> Isn't it?

Oops, sorry that I missed that. Now I understand. I think:

page->mapping = vma->vm_file->f_mapping
page->index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff

at nopage time and then before the driver vfrees, I'll clear mapping
for all those pages.

Thanks,
jaya