2009-03-22 18:35:27

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH next] sysfs: fix some bin_vm_ops errors

Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
"sysfs: don't block indefinitely for unmapped files" in linux-next
crashes the PowerMac G5 when X starts up. It's caught out by the way
powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
a new vma->vm_file whose private_data no longer points to the bin_buffer
(substitution done because some versions of X crash if that mmap fails).

The fix to this is straightforward: the original vm_file is fput() in
that case, so this mmap won't block sysfs at all, so just don't switch
over to bin_vm_ops if vm_file has changed.

But more fixes made before realizing that was the problem:-

It should not be an error if bin_page_mkwrite() finds no underlying
page_mkwrite().

Check that a file already mmap'ed has the same underlying vm_ops
_before_ pointing vma->vm_ops at bin_vm_ops.

If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
on CONFIG_NUMA=y, just because that has a set_policy and get_policy.
They probably won't be called, and it's not a disaster if they don't
work on sysfs. vm_ops->migrate? I couldn't find any example. Just
delete CONFIG_NUMA tests, easy enough to add stubs later if required.

Signed-off-by: Hugh Dickins <[email protected]>
---
I have only tested that X now starts up fine on the G5: I don't think
I've messed up the intent of your patch, but not tested it still works.

fs/sysfs/bin.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

--- 2.6.29-rc8-next/fs/sysfs/bin.c 2009-03-20 12:41:45.000000000 +0000
+++ linux/fs/sysfs/bin.c 2009-03-21 18:01:26.000000000 +0000
@@ -241,9 +241,12 @@ static int bin_page_mkwrite(struct vm_ar
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
int ret;

- if (!bb->vm_ops || !bb->vm_ops->page_mkwrite)
+ if (!bb->vm_ops)
return -EINVAL;

+ if (!bb->vm_ops->page_mkwrite)
+ return 0;
+
if (!sysfs_get_active_two(attr_sd))
return -EINVAL;

@@ -287,7 +290,6 @@ static int mmap(struct file *file, struc
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
- struct vm_operations_struct *vm_ops;
int rc;

mutex_lock(&bb->mutex);
@@ -302,24 +304,20 @@ static int mmap(struct file *file, struc
goto out_put;

rc = attr->mmap(kobj, attr, vma);
- vm_ops = vma->vm_ops;
- vma->vm_ops = &bin_vm_ops;
if (rc)
goto out_put;

- rc = -EINVAL;
- if (bb->mmapped && bb->vm_ops != vma->vm_ops)
+ if (vma->vm_file != file)
goto out_put;

-#ifdef CONFIG_NUMA
rc = -EINVAL;
- if (vm_ops && ((vm_ops->set_policy || vm_ops->get_policy || vm_ops->migrate)))
+ if (bb->mmapped && bb->vm_ops != vma->vm_ops)
goto out_put;
-#endif

rc = 0;
bb->mmapped = 1;
- bb->vm_ops = vm_ops;
+ bb->vm_ops = vma->vm_ops;
+ vma->vm_ops = &bin_vm_ops;
out_put:
sysfs_put_active_two(attr_sd);
out_unlock:


2009-03-22 20:56:50

by Eric Biederman

[permalink] [raw]
Subject: Re: [PATCH next] sysfs: fix some bin_vm_ops errors

On Sun, Mar 22, 2009 at 11:33 AM, Hugh Dickins <[email protected]> wrote:
> Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
> "sysfs: don't block indefinitely for unmapped files" in linux-next
> crashes the PowerMac G5 when X starts up. ?It's caught out by the way
> powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
> a new vma->vm_file whose private_data no longer points to the bin_buffer
> (substitution done because some versions of X crash if that mmap fails).
>
> The fix to this is straightforward: the original vm_file is fput() in
> that case, so this mmap won't block sysfs at all, so just don't switch
> over to bin_vm_ops if vm_file has changed.

That part is really weird but it looks reasonable.

> But more fixes made before realizing that was the problem:-
>
> It should not be an error if bin_page_mkwrite() finds no underlying
> page_mkwrite().

Likely. The goal is to get the same behaviour as if mkwrite had not been
implemented. Looking at the code clearly returning an error in
the version I am looking at does not meet the goal, and since I unmap
everything during a hot-unplug and rely on the fault handler I have
no problem with that change.

> Check that a file already mmap'ed has the same underlying vm_ops
> _before_ pointing vma->vm_ops at bin_vm_ops.

Checking that the underlying file is the same is good. I know my original
reasoning was that using the original vm_ops would likely mess up error
handling if I used them.

> If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
> on CONFIG_NUMA=y, just because that has a set_policy and get_policy.
> They probably won't be called, and it's not a disaster if they don't
> work on sysfs. ?vm_ops->migrate? ?I couldn't find any example. ?Just
> delete CONFIG_NUMA tests, easy enough to add stubs later if required.

I agree about it being easy enough to add stubs later if required. In my case
I deliberately added an error check instead so that the code would fail fast
instead of failing subtlely if anyone ever did that. And that check appears to
have worked in your case so I would like to retain that check.

> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> I have only tested that X now starts up fine on the G5: I don't think
> I've messed up the intent of your patch, but not tested it still works.
>
> ?fs/sysfs/bin.c | ? 18 ++++++++----------
> ?1 file changed, 8 insertions(+), 10 deletions(-)
>
> --- 2.6.29-rc8-next/fs/sysfs/bin.c ? ? ?2009-03-20 12:41:45.000000000 +0000
> +++ linux/fs/sysfs/bin.c ? ? ? ?2009-03-21 18:01:26.000000000 +0000
> @@ -241,9 +241,12 @@ static int bin_page_mkwrite(struct vm_ar
> ? ? ? ?struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
> ? ? ? ?int ret;
>
> - ? ? ? if (!bb->vm_ops || !bb->vm_ops->page_mkwrite)
> + ? ? ? if (!bb->vm_ops)
> ? ? ? ? ? ? ? ?return -EINVAL;
>
> + ? ? ? if (!bb->vm_ops->page_mkwrite)
> + ? ? ? ? ? ? ? return 0;
> +
> ? ? ? ?if (!sysfs_get_active_two(attr_sd))
> ? ? ? ? ? ? ? ?return -EINVAL;
>
> @@ -287,7 +290,6 @@ static int mmap(struct file *file, struc
> ? ? ? ?struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
> ? ? ? ?struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
> ? ? ? ?struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
> - ? ? ? struct vm_operations_struct *vm_ops;
> ? ? ? ?int rc;
>
> ? ? ? ?mutex_lock(&bb->mutex);
> @@ -302,24 +304,20 @@ static int mmap(struct file *file, struc
> ? ? ? ? ? ? ? ?goto out_put;
>
> ? ? ? ?rc = attr->mmap(kobj, attr, vma);
> - ? ? ? vm_ops = vma->vm_ops;
> - ? ? ? vma->vm_ops = &bin_vm_ops;
> ? ? ? ?if (rc)
> ? ? ? ? ? ? ? ?goto out_put;
>
> - ? ? ? rc = -EINVAL;
> - ? ? ? if (bb->mmapped && bb->vm_ops != vma->vm_ops)
> + ? ? ? if (vma->vm_file != file)
> ? ? ? ? ? ? ? ?goto out_put;
>
> -#ifdef CONFIG_NUMA
> ? ? ? ?rc = -EINVAL;
> - ? ? ? if (vm_ops && ((vm_ops->set_policy || vm_ops->get_policy || vm_ops->migrate)))
> + ? ? ? if (bb->mmapped && bb->vm_ops != vma->vm_ops)
> ? ? ? ? ? ? ? ?goto out_put;
> -#endif
>
> ? ? ? ?rc = 0;
> ? ? ? ?bb->mmapped = 1;
> - ? ? ? bb->vm_ops = vm_ops;
> + ? ? ? bb->vm_ops = vma->vm_ops;
> + ? ? ? vma->vm_ops = &bin_vm_ops;
> ?out_put:
> ? ? ? ?sysfs_put_active_two(attr_sd);
> ?out_unlock:
>

2009-03-23 00:15:35

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH next] sysfs: fix some bin_vm_ops errors

On Sun, 2009-03-22 at 18:33 +0000, Hugh Dickins wrote:
> Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
> "sysfs: don't block indefinitely for unmapped files" in linux-next
> crashes the PowerMac G5 when X starts up. It's caught out by the way
> powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
> a new vma->vm_file whose private_data no longer points to the bin_buffer
> (substitution done because some versions of X crash if that mmap fails).
>
> The fix to this is straightforward: the original vm_file is fput() in
> that case, so this mmap won't block sysfs at all, so just don't switch
> over to bin_vm_ops if vm_file has changed.

Looks good, though we should probably also add a comment to your code
to make it clear why the test is here.

The fix should probably go into .29...

Cheers,
Ben.

2009-03-23 01:44:13

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH next] sysfs: fix some bin_vm_ops errors

On Sun, 22 Mar 2009, Eric Biederman wrote:
> On Sun, Mar 22, 2009 at 11:33 AM, Hugh Dickins <[email protected]> wrote:
>
> > If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
> > on CONFIG_NUMA=y, just because that has a set_policy and get_policy.
> > They probably won't be called, and it's not a disaster if they don't
> > work on sysfs.  vm_ops->migrate?  I couldn't find any example.  Just
> > delete CONFIG_NUMA tests, easy enough to add stubs later if required.
>
> I agree about it being easy enough to add stubs later if required. In my case
> I deliberately added an error check instead so that the code would fail fast
> instead of failing subtlely if anyone ever did that. And that check appears
> to have worked in your case so I would like to retain that check.

No, I didn't really hit that case (I didn't even have CONFIG_NUMA=y),
it was just one of the several misguesses I made before hitting on the
right answer.

I can understand you preferring to keep a check on this, but I don't
believe you really want to retain _that_ check: it just causes some
mmaps to fail on CONFIG_NUMA=y configurations, which succeed on
non-NUMA configurations, and work fine on non-NUMA configurations,
so long as you don't try set_policy, get_policy or migrate.

Here's a replacement version of the patch which adds those methods,
and adds a comment on the substituted vm_file as Ben suggests.


[PATCH next] sysfs: fix some bin_vm_ops errors

Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
"sysfs: don't block indefinitely for unmapped files" in linux-next
crashes the PowerMac G5 when X starts up. It's caught out by the way
powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
a new vma->vm_file whose private_data no longer points to the bin_buffer
(substitution done because some versions of X crash if that mmap fails).

The fix to this is straightforward: the original vm_file is fput() in
that case, so this mmap won't block sysfs at all, so just don't switch
over to bin_vm_ops if vm_file has changed.

But more fixes made before realizing that was the problem:-

It should not be an error if bin_page_mkwrite() finds no underlying
page_mkwrite().

Check that a file already mmap'ed has the same underlying vm_ops
_before_ pointing vma->vm_ops at bin_vm_ops.

If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
on CONFIG_NUMA=y, just because that has a set_policy and get_policy:
provide bin_set_policy, bin_get_policy and bin_migrate.

Signed-off-by: Hugh Dickins <[email protected]>
---
I have only tested that X now starts up fine on the G5: I don't think
I've messed up the intent of your patch, but not tested it still works.

fs/sysfs/bin.c | 89 +++++++++++++++++++++++++++++++++++++++++------
1 file changed, 79 insertions(+), 10 deletions(-)

--- 2.6.29-rc8-next/fs/sysfs/bin.c 2009-03-20 12:41:45.000000000 +0000
+++ linux/fs/sysfs/bin.c 2009-03-23 01:24:48.000000000 +0000
@@ -241,9 +241,12 @@ static int bin_page_mkwrite(struct vm_ar
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
int ret;

- if (!bb->vm_ops || !bb->vm_ops->page_mkwrite)
+ if (!bb->vm_ops)
return -EINVAL;

+ if (!bb->vm_ops->page_mkwrite)
+ return 0;
+
if (!sysfs_get_active_two(attr_sd))
return -EINVAL;

@@ -273,12 +276,78 @@ static int bin_access(struct vm_area_str
return ret;
}

+#ifdef CONFIG_NUMA
+static int bin_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
+{
+ struct file *file = vma->vm_file;
+ struct bin_buffer *bb = file->private_data;
+ struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
+ int ret;
+
+ if (!bb->vm_ops || !bb->vm_ops->set_policy)
+ return 0;
+
+ if (!sysfs_get_active_two(attr_sd))
+ return -EINVAL;
+
+ ret = bb->vm_ops->set_policy(vma, new);
+
+ sysfs_put_active_two(attr_sd);
+ return ret;
+}
+
+static struct mempolicy *bin_get_policy(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ struct file *file = vma->vm_file;
+ struct bin_buffer *bb = file->private_data;
+ struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
+ struct mempolicy *pol;
+
+ if (!bb->vm_ops || !bb->vm_ops->get_policy)
+ return vma->vm_policy;
+
+ if (!sysfs_get_active_two(attr_sd))
+ return vma->vm_policy;
+
+ pol = bb->vm_ops->get_policy(vma, addr);
+
+ sysfs_put_active_two(attr_sd);
+ return pol;
+}
+
+static int bin_migrate(struct vm_area_struct *vma, const nodemask_t *from,
+ const nodemask_t *to, unsigned long flags)
+{
+ struct file *file = vma->vm_file;
+ struct bin_buffer *bb = file->private_data;
+ struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
+ int ret;
+
+ if (!bb->vm_ops || !bb->vm_ops->migrate)
+ return 0;
+
+ if (!sysfs_get_active_two(attr_sd))
+ return 0;
+
+ ret = bb->vm_ops->migrate(vma, from, to, flags);
+
+ sysfs_put_active_two(attr_sd);
+ return ret;
+}
+#endif
+
static struct vm_operations_struct bin_vm_ops = {
.open = bin_vma_open,
.close = bin_vma_close,
.fault = bin_fault,
.page_mkwrite = bin_page_mkwrite,
.access = bin_access,
+#ifdef CONFIG_NUMA
+ .set_policy = bin_set_policy,
+ .get_policy = bin_get_policy,
+ .migrate = bin_migrate,
+#endif
};

static int mmap(struct file *file, struct vm_area_struct *vma)
@@ -287,7 +356,6 @@ static int mmap(struct file *file, struc
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
- struct vm_operations_struct *vm_ops;
int rc;

mutex_lock(&bb->mutex);
@@ -302,24 +370,25 @@ static int mmap(struct file *file, struc
goto out_put;

rc = attr->mmap(kobj, attr, vma);
- vm_ops = vma->vm_ops;
- vma->vm_ops = &bin_vm_ops;
if (rc)
goto out_put;

- rc = -EINVAL;
- if (bb->mmapped && bb->vm_ops != vma->vm_ops)
+ /*
+ * PowerPC's pci_mmap of legacy_mem uses shmem_zero_setup()
+ * to satisfy versions of X which crash if the mmap fails: that
+ * substitutes a new vm_file, and we don't then want bin_vm_ops.
+ */
+ if (vma->vm_file != file)
goto out_put;

-#ifdef CONFIG_NUMA
rc = -EINVAL;
- if (vm_ops && ((vm_ops->set_policy || vm_ops->get_policy || vm_ops->migrate)))
+ if (bb->mmapped && bb->vm_ops != vma->vm_ops)
goto out_put;
-#endif

rc = 0;
bb->mmapped = 1;
- bb->vm_ops = vm_ops;
+ bb->vm_ops = vma->vm_ops;
+ vma->vm_ops = &bin_vm_ops;
out_put:
sysfs_put_active_two(attr_sd);
out_unlock:

2009-03-23 01:47:14

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH next] sysfs: fix some bin_vm_ops errors

On Mon, 23 Mar 2009, Benjamin Herrenschmidt wrote:
> On Sun, 2009-03-22 at 18:33 +0000, Hugh Dickins wrote:
> > Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
> > "sysfs: don't block indefinitely for unmapped files" in linux-next
> > crashes the PowerMac G5 when X starts up. It's caught out by the way
> > powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
> > a new vma->vm_file whose private_data no longer points to the bin_buffer
> > (substitution done because some versions of X crash if that mmap fails).
> >
> > The fix to this is straightforward: the original vm_file is fput() in
> > that case, so this mmap won't block sysfs at all, so just don't switch
> > over to bin_vm_ops if vm_file has changed.
>
> Looks good, though we should probably also add a comment to your code
> to make it clear why the test is here.

Right, done.

>
> The fix should probably go into .29...

No, this is just a linux-next or mmotm thing. (I probably misled
you by quoting the commit id: though helpful, that is confusing.)

Hugh

2009-03-23 03:47:12

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH next] sysfs: fix some bin_vm_ops errors


> No, this is just a linux-next or mmotm thing. (I probably misled
> you by quoting the commit id: though helpful, that is confusing.)

Ok, cool, thanks !

Cheers,
Ben.

2009-03-23 03:54:28

by Eric Biederman

[permalink] [raw]
Subject: Re: [PATCH next] sysfs: fix some bin_vm_ops errors

On Sun, Mar 22, 2009 at 6:41 PM, Hugh Dickins <[email protected]> wrote:
> On Sun, 22 Mar 2009, Eric Biederman wrote:
>> On Sun, Mar 22, 2009 at 11:33 AM, Hugh Dickins <[email protected]> wrote:
>>
>> > If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
>> > on CONFIG_NUMA=y, just because that has a set_policy and get_policy.
>> > They probably won't be called, and it's not a disaster if they don't
>> > work on sysfs. ?vm_ops->migrate? ?I couldn't find any example. ?Just
>> > delete CONFIG_NUMA tests, easy enough to add stubs later if required.
>>
>> I agree about it being easy enough to add stubs later if required. ?In my case
>> I deliberately added an error check instead so that the code would fail fast
>> instead of failing subtlely if anyone ever did that. ?And that check appears
>> to have worked in your case so I would like to retain that check.
>
> No, I didn't really hit that case (I didn't even have CONFIG_NUMA=y),
> it was just one of the several misguesses I made before hitting on the
> right answer.
>
> I can understand you preferring to keep a check on this, but I don't
> believe you really want to retain _that_ check: it just causes some
> mmaps to fail on CONFIG_NUMA=y configurations, which succeed on
> non-NUMA configurations, and work fine on non-NUMA configurations,
> so long as you don't try set_policy, get_policy or migrate.
>
> Here's a replacement version of the patch which adds those methods,
> and adds a comment on the substituted vm_file as Ben suggests.
>
>
> [PATCH next] sysfs: fix some bin_vm_ops errors
>
> Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
> "sysfs: don't block indefinitely for unmapped files" in linux-next
> crashes the PowerMac G5 when X starts up. ?It's caught out by the way
> powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
> a new vma->vm_file whose private_data no longer points to the bin_buffer
> (substitution done because some versions of X crash if that mmap fails).
>
> The fix to this is straightforward: the original vm_file is fput() in
> that case, so this mmap won't block sysfs at all, so just don't switch
> over to bin_vm_ops if vm_file has changed.
>
> But more fixes made before realizing that was the problem:-
>
> It should not be an error if bin_page_mkwrite() finds no underlying
> page_mkwrite().
>
> Check that a file already mmap'ed has the same underlying vm_ops
> _before_ pointing vma->vm_ops at bin_vm_ops.
>
> If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
> on CONFIG_NUMA=y, just because that has a set_policy and get_policy:
> provide bin_set_policy, bin_get_policy and bin_migrate.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> I have only tested that X now starts up fine on the G5: I don't think
> I've messed up the intent of your patch, but not tested it still works.

Looks reasonable to me.
Acked-by: Eric Biederman <[email protected]>

Eric