LinuxLists.cc - remap_file_pages() broken in 2.6.23?

2007-11-29 19:45:53

Subject: remap_file_pages() broken in 2.6.23?

Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201

The test case below, taken from the LTP test code, prints -1 (as
expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range
page. Proposed patch follows the program. Bug was apparently caused by
commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7.

/*
* originally remap_file_pages02.c, from LTP
* - creates the file 'cache'; no cleanup
*
* Copyright (C) Ricardo Salveti de Araujo, 2007
*
* GPL v2
*/

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <syscall.h>
#include <stdlib.h>
#include <linux/unistd.h>

/* Test case defines */
#define WINDOW_START 0x48000000

size_t page_sz;
size_t page_words;
size_t cache_pages;
size_t cache_sz;
size_t window_pages;
size_t window_sz;

static char *cache_contents;
int fd; /* File descriptor used at the test */
char *data = NULL;
char *data01 = NULL;

int
main(int ac, char **av)
{
int i, j, ret;

page_sz = getpagesize();
page_words = (page_sz/sizeof(char));

/* Set the cache size */
cache_pages = 32;
cache_sz = cache_pages*page_sz;
cache_contents = (char *) malloc(cache_sz * sizeof(char));

for (i = 0; i < cache_pages; i++) {
char *page = cache_contents + i*page_sz;

for (j = 0; j < page_words; j++)
page[j] = i;
}

if ((fd = open("cache", O_RDWR|O_CREAT|O_TRUNC,S_IRWXU)) < 0)
perror("open"), exit(1);

if (write(fd, cache_contents, cache_sz) != cache_sz)
perror("write"), exit(1);

data = mmap((void *)WINDOW_START,
cache_sz,
PROT_READ|PROT_WRITE,
MAP_FIXED | MAP_SHARED,
fd, 0);

if (data == MAP_FAILED)
perror("mmap"), exit(1);

ret = remap_file_pages(data, page_sz, 0, cache_pages * 2, 0);

printf("%d\n", ret);

exit(0);
}

Patch:

Signed-off-by: Supriya Kannery <[email protected]>

--- linux-2.6.23/mm/fremap.c.orig 2007-11-22 00:56:09.000000000 -0600
+++ linux-2.6.23/mm/fremap.c 2007-11-26 03:08:55.000000000 -0600
@@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns
struct vm_area_struct *vma;
int err = -EINVAL;
int has_write_lock = 0;
+ unsigned long f_size = 0;

if (__prot)
return err;
@@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns
goto retry;
}
mapping = vma->vm_file->f_mapping;
+
+ f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
+ f_size = f_size >> PAGE_CACHE_SHIFT;
+ if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) {
+ err = -EINVAL;
+ goto out;
+ }
+
/*
* page_mkclean doesn't work on nonlinear vmas, so if
* dirty pages need to be accounted, emulate with linear

2007-11-29 23:31:16

by Nick Piggin

[permalink] [raw]

Subject: Re: remap_file_pages() broken in 2.6.23?

On Thu, Nov 29, 2007 at 02:45:23PM -0500, Chuck Ebbert wrote:
> Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201
>
> The test case below, taken from the LTP test code, prints -1 (as
> expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range
> page. Proposed patch follows the program. Bug was apparently caused by
> commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7.

Ah, that's not such good behaviour anyway. mmap is allowed to map
outside the file offset, so you're telling me that remap_file_pages
just magically should not be allowed to remap these...?

> Patch:
>
> Signed-off-by: Supriya Kannery <[email protected]>
>
> --- linux-2.6.23/mm/fremap.c.orig 2007-11-22 00:56:09.000000000 -0600
> +++ linux-2.6.23/mm/fremap.c 2007-11-26 03:08:55.000000000 -0600
> @@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns
> struct vm_area_struct *vma;
> int err = -EINVAL;
> int has_write_lock = 0;
> + unsigned long f_size = 0;
>
> if (__prot)
> return err;
> @@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns
> goto retry;
> }
> mapping = vma->vm_file->f_mapping;
> +
> + f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
> + f_size = f_size >> PAGE_CACHE_SHIFT;
> + if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> /*
> * page_mkclean doesn't work on nonlinear vmas, so if
> * dirty pages need to be accounted, emulate with linear

I don't think there is anything preventing truncate races here. Theoretically
we could do it by taking i_mutex around here, but anyway then a subsequent
truncate is just going to be able to cause the mapping to be out of bounds
anyway.

If it were any other syscall than remap_file_pages, I'd be much more
hesitant to say this: I propose we change the test case instead. I
also changed other elements of the API, and we had the result tested
and verified by Oracle...

2007-12-03 12:29:32

by Supriya Kannery

[permalink] [raw]

Subject: Re: remap_file_pages() broken in 2.6.23?

Nick Piggin wrote:
> On Thu, Nov 29, 2007 at 02:45:23PM -0500, Chuck Ebbert wrote:
>
>> Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201
>>
>> The test case below, taken from the LTP test code, prints -1 (as
>> expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range
>> page. Proposed patch follows the program. Bug was apparently caused by
>> commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7.
>>
>
> Ah, that's not such good behaviour anyway. mmap is allowed to map
> outside the file offset, so you're telling me that remap_file_pages
> just magically should not be allowed to remap these...?
>
>
Validation check for pgoff was there in populate() in earlier
kernels.When populate() got removed and populate_range() was added,
during the specified commit, validation for pgoff also got removed. This
symantic would break existing apps that expects an error from
remap_file_pages when a large value for pgoff is given. Though the
change is error handling related, it breaks ABI from previous kernel
versions.

For validation, we check whether the pgoff + size exceeds the file size,
all in page units. And while calculating file size in page units, one
additional page unit is taken into account to get the exact number of
pages that contain the file size in bytes.
f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
<---- file size in bytes -------> <--- helps in rounding to next page
unit -->

mmap() will be mapping the minimum number of pages that can contain a
file. So offset cannot be a large value compared to file size. mmap() is
also supposed to return EINVAL when the offset is a large/invalid value
as man page mandates.
>
>> Patch:
>>
>> Signed-off-by: Supriya Kannery <[email protected]>
>>
>> --- linux-2.6.23/mm/fremap.c.orig 2007-11-22 00:56:09.000000000 -0600
>> +++ linux-2.6.23/mm/fremap.c 2007-11-26 03:08:55.000000000 -0600
>> @@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns
>> struct vm_area_struct *vma;
>> int err = -EINVAL;
>> int has_write_lock = 0;
>> + unsigned long f_size = 0;
>>
>> if (__prot)
>> return err;
>> @@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns
>> goto retry;
>> }
>> mapping = vma->vm_file->f_mapping;
>> +
>> + f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
>> + f_size = f_size >> PAGE_CACHE_SHIFT;
>> + if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) {
>> + err = -EINVAL;
>> + goto out;
>> + }
>> +
>> /*
>> * page_mkclean doesn't work on nonlinear vmas, so if
>> * dirty pages need to be accounted, emulate with linear
>>
>
>
> I don't think there is anything preventing truncate races here. Theoretically
> we could do it by taking i_mutex around here, but anyway then a subsequent
> truncate is just going to be able to cause the mapping to be out of bounds
> anyway.
>
>
i_size_read() is taking care of syncing between the writes/truncations
in SMP/ pre-emtable kernel. For SMP, it specifically takes care to get
the value again if any changes happen to the source.
> If it were any other syscall than remap_file_pages, I'd be much more
> hesitant to say this: I propose we change the test case instead. I
> also changed other elements of the API, and we had the result tested
> and verified by Oracle...
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
Thanks, Supriya
Linux Technology Centre.

2007-12-03 23:04:37

by Nick Piggin

[permalink] [raw]

Subject: Re: remap_file_pages() broken in 2.6.23?

On Mon, Dec 03, 2007 at 06:01:40PM +0530, Supriya Kannery wrote:
> Nick Piggin wrote:
> >On Thu, Nov 29, 2007 at 02:45:23PM -0500, Chuck Ebbert wrote:
> >
> >>Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201
> >>
> >>The test case below, taken from the LTP test code, prints -1 (as
> >>expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range
> >>page. Proposed patch follows the program. Bug was apparently caused by
> >>commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7.
> >>
> >
> >Ah, that's not such good behaviour anyway. mmap is allowed to map
> >outside the file offset, so you're telling me that remap_file_pages
> >just magically should not be allowed to remap these...?
> >
> >
> Validation check for pgoff was there in populate() in earlier
> kernels.When populate() got removed and populate_range() was added,
> during the specified commit, validation for pgoff also got removed. This
> symantic would break existing apps that expects an error from
> remap_file_pages when a large value for pgoff is given. Though the
> change is error handling related, it breaks ABI from previous kernel
> versions.

But only Oracle uses it AFAIK, and they don't require this behaviour.

> For validation, we check whether the pgoff + size exceeds the file size,
> all in page units. And while calculating file size in page units, one
> additional page unit is taken into account to get the exact number of
> pages that contain the file size in bytes.
> f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
> <---- file size in bytes -------> <--- helps in rounding to next page
> unit -->
>
> mmap() will be mapping the minimum number of pages that can contain a
> file. So offset cannot be a large value compared to file size. mmap() is
> also supposed to return EINVAL when the offset is a large/invalid value
> as man page mandates.

I don't think it is required that mmap must fail if it maps past i_size.
I don't think Linux fails in this case.

> >
> >>Patch:
> >>
> >>Signed-off-by: Supriya Kannery <[email protected]>
> >>
> >>--- linux-2.6.23/mm/fremap.c.orig 2007-11-22 00:56:09.000000000 -0600
> >>+++ linux-2.6.23/mm/fremap.c 2007-11-26 03:08:55.000000000 -0600
> >>@@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns
> >> struct vm_area_struct *vma;
> >> int err = -EINVAL;
> >> int has_write_lock = 0;
> >>+ unsigned long f_size = 0;
> >>
> >> if (__prot)
> >> return err;
> >>@@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns
> >> goto retry;
> >> }
> >> mapping = vma->vm_file->f_mapping;
> >>+
> >>+ f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
> >>+ f_size = f_size >> PAGE_CACHE_SHIFT;
> >>+ if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) {
> >>+ err = -EINVAL;
> >>+ goto out;
> >>+ }
> >>+
> >> /*
> >> * page_mkclean doesn't work on nonlinear vmas, so if
> >> * dirty pages need to be accounted, emulate with linear
> >>
> >
> >
> >I don't think there is anything preventing truncate races here.
> >Theoretically
> >we could do it by taking i_mutex around here, but anyway then a subsequent
> >truncate is just going to be able to cause the mapping to be out of bounds
> >anyway.
> >
> >
> i_size_read() is taking care of syncing between the writes/truncations
> in SMP/ pre-emtable kernel. For SMP, it specifically takes care to get
> the value again if any changes happen to the source.

And then right afterwards, the file gets truncated, and you hav eremapped
past i_size. So what's the point of preventing it? We have SIGBUS for
that.

Thanks,
Nick