2010-11-23 10:22:58

by Nick Piggin

Subject: [patch 1/2] fs: fix d_validate (again)

Again, I'd like to please fix d_validate. It can be trivially fixed
without delving into filesystem code, and it does not prevent the proper
careful removal of d_validate and untrusted dentry caches from
filesystems.

The right way to merge is to fix the bugs in core code now, and remove it
completely if and when filesystems stop using it. But it's no longer a
liability or problem to maintain, so it can stay as long as it likes.

So, we need to revert 3825bdb7ed920845961f32f364454bee5f469abb, and then
apply this patch (which also should apply to .stable kernels and distros
going back a long way).


fs: d_validate fixes

d_validate has been broken for a long time.

kmem_ptr_validate does not guarantee that a pointer can be dereferenced
if it can go away at any time. Even rcu_read_lock doesn't help, because
the pointer might be queued in RCU callbacks but not executed yet.

So the parent cannot be checked, nor the name hashed. The dentry pointer
cannot be touched until it can be verified under lock. Hashing simply
cannot be used.

Instead, verify the parent/child relationship by traversing parent's
d_child list. It's slow, but only ncpfs and the destaged smbfs care
about it, at this point.

Signed-off-by: Nick Piggin <[email protected]>

---
fs/dcache.c | 25 +++++++------------------
1 file changed, 7 insertions(+), 18 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c 2010-11-16 17:20:40.000000000 +1100
+++ linux-2.6/fs/dcache.c 2010-11-16 17:20:40.000000000 +1100
@@ -1483,41 +1483,30 @@ struct dentry *d_hash_and_lookup(struct
}

/**
- * d_validate - verify dentry provided from insecure source
+ * d_validate - verify dentry provided from insecure source (deprecated)
* @dentry: The dentry alleged to be valid child of @dparent
* @dparent: The parent dentry (known to be valid)
*
* An insecure source has sent us a dentry, here we verify it and dget() it.
* This is used by ncpfs in its readdir implementation.
* Zero is returned in the dentry is invalid.
+ *
+ * This function is slow for big directories, and deprecated, do not use it.
*/
-
int d_validate(struct dentry *dentry, struct dentry *dparent)
{
- struct hlist_head *base;
- struct hlist_node *lhp;
-
- /* Check whether the ptr might be valid at all.. */
- if (!kmem_ptr_validate(dentry_cache, dentry))
- goto out;
-
- if (dentry->d_parent != dparent)
- goto out;
+ struct dentry *child;

spin_lock(&dcache_lock);
- base = d_hash(dparent, dentry->d_name.hash);
- hlist_for_each(lhp,base) {
- /* hlist_for_each_entry_rcu() not required for d_hash list
- * as it is parsed under dcache_lock
- */
- if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
+ list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
+ if (dentry == child) {
__dget_locked(dentry);
spin_unlock(&dcache_lock);
return 1;
}
}
spin_unlock(&dcache_lock);
-out:
+
return 0;
}
EXPORT_SYMBOL(d_validate);
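
The idea behind the patch above is that the untrusted pointer is only ever compared by value against the parent's known children while the lock is held, and is dereferenced only after a match. A minimal user-space sketch of that idea follows. It is not kernel code: struct node, tree_lock, node_add_child() and node_validate() are hypothetical stand-ins for struct dentry, dcache_lock and d_validate(), and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dentry; not the kernel layout. */
struct node {
	struct node *parent;
	struct node *first_child;	/* head of this node's child list */
	struct node *next_sibling;	/* link in the parent's child list */
	int refcount;
};

/* Hypothetical stand-in for dcache_lock. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link a trusted child into a trusted parent's child list. */
static void node_add_child(struct node *parent, struct node *child)
{
	assert(parent != NULL && child != NULL);
	pthread_mutex_lock(&tree_lock);
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
	pthread_mutex_unlock(&tree_lock);
}

/*
 * Return 1 and take a reference if 'untrusted' is currently a child of
 * 'parent', otherwise return 0. 'untrusted' is compared by value only;
 * the reference is taken through 'child', which came from the list.
 */
static int node_validate(struct node *untrusted, struct node *parent)
{
	struct node *child;
	int found = 0;

	pthread_mutex_lock(&tree_lock);
	for (child = parent->first_child; child; child = child->next_sibling) {
		if (child == untrusted) {
			child->refcount++;
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&tree_lock);
	return found;
}
```

The cost is a linear walk of the parent's children per call, which is the slowness the changelog mentions.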


2010-11-23 10:26:49

by Nick Piggin

Subject: [patch 2/2] mm: remove (kmem|kern)_ptr_validate

This is a nasty and error prone API. It is no longer used, remove it.

Signed-off-by: Nick Piggin <[email protected]>

---
include/linux/slab.h | 2 --
mm/slab.c | 32 +-------------------------------
mm/slob.c | 5 -----
mm/slub.c | 40 ----------------------------------------
mm/util.c | 21 ---------------------
5 files changed, 1 insertion(+), 99 deletions(-)

Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h 2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/include/linux/slab.h 2010-11-17 00:11:49.000000000 +1100
@@ -106,8 +106,6 @@ int kmem_cache_shrink(struct kmem_cache
void kmem_cache_free(struct kmem_cache *, void *);
unsigned int kmem_cache_size(struct kmem_cache *);
const char *kmem_cache_name(struct kmem_cache *);
-int kern_ptr_validate(const void *ptr, unsigned long size);
-int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);

/*
* Please use this macro to create slab caches. Simply specify the
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c 2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/slab.c 2010-11-17 00:11:49.000000000 +1100
@@ -2781,7 +2781,7 @@ static void slab_put_obj(struct kmem_cac
/*
* Map pages beginning at addr to the given cache and slab. This is required
* for the slab allocator to be able to lookup the cache and slab of a
- * virtual address for kfree, ksize, kmem_ptr_validate, and slab debugging.
+ * virtual address for kfree, ksize, and slab debugging.
*/
static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
void *addr)
@@ -3660,36 +3660,6 @@ void *kmem_cache_alloc_notrace(struct km
EXPORT_SYMBOL(kmem_cache_alloc_notrace);
#endif

-/**
- * kmem_ptr_validate - check if an untrusted pointer might be a slab entry.
- * @cachep: the cache we're checking against
- * @ptr: pointer to validate
- *
- * This verifies that the untrusted pointer looks sane;
- * it is _not_ a guarantee that the pointer is actually
- * part of the slab cache in question, but it at least
- * validates that the pointer can be dereferenced and
- * looks half-way sane.
- *
- * Currently only used for dentry validation.
- */
-int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr)
-{
- unsigned long size = cachep->buffer_size;
- struct page *page;
-
- if (unlikely(!kern_ptr_validate(ptr, size)))
- goto out;
- page = virt_to_page(ptr);
- if (unlikely(!PageSlab(page)))
- goto out;
- if (unlikely(page_get_cache(page) != cachep))
- goto out;
- return 1;
-out:
- return 0;
-}
-
#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/slub.c 2010-11-17 00:24:15.000000000 +1100
@@ -1917,17 +1917,6 @@ void kmem_cache_free(struct kmem_cache *
}
EXPORT_SYMBOL(kmem_cache_free);

-/* Figure out on which slab page the object resides */
-static struct page *get_object_page(const void *x)
-{
- struct page *page = virt_to_head_page(x);
-
- if (!PageSlab(page))
- return NULL;
-
- return page;
-}
-
/*
* Object placement in a slab is made very easy because we always start at
* offset 0. If we tune the size of the object to the alignment then we can
@@ -2386,35 +2375,6 @@ static int kmem_cache_open(struct kmem_c
}

/*
- * Check if a given pointer is valid
- */
-int kmem_ptr_validate(struct kmem_cache *s, const void *object)
-{
- struct page *page;
-
- if (!kern_ptr_validate(object, s->size))
- return 0;
-
- page = get_object_page(object);
-
- if (!page || s != page->slab)
- /* No slab or wrong slab */
- return 0;
-
- if (!check_valid_pointer(s, page, object))
- return 0;
-
- /*
- * We could also check if the object is on the slabs freelist.
- * But this would be too expensive and it seems that the main
- * purpose of kmem_ptr_valid() is to check if the object belongs
- * to a certain slab.
- */
- return 1;
-}
-EXPORT_SYMBOL(kmem_ptr_validate);
-
-/*
* Determine the size of a slab object
*/
unsigned int kmem_cache_size(struct kmem_cache *s)
Index: linux-2.6/mm/util.c
===================================================================
--- linux-2.6.orig/mm/util.c 2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/util.c 2010-11-17 00:11:49.000000000 +1100
@@ -186,27 +186,6 @@ void kzfree(const void *p)
}
EXPORT_SYMBOL(kzfree);

-int kern_ptr_validate(const void *ptr, unsigned long size)
-{
- unsigned long addr = (unsigned long)ptr;
- unsigned long min_addr = PAGE_OFFSET;
- unsigned long align_mask = sizeof(void *) - 1;
-
- if (unlikely(addr < min_addr))
- goto out;
- if (unlikely(addr > (unsigned long)high_memory - size))
- goto out;
- if (unlikely(addr & align_mask))
- goto out;
- if (unlikely(!kern_addr_valid(addr)))
- goto out;
- if (unlikely(!kern_addr_valid(addr + size - 1)))
- goto out;
- return 1;
-out:
- return 0;
-}
-
/*
* strndup_user - duplicate an existing string from user space
* @s: The string to duplicate
Index: linux-2.6/mm/slob.c
===================================================================
--- linux-2.6.orig/mm/slob.c 2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/slob.c 2010-11-17 00:11:49.000000000 +1100
@@ -678,11 +678,6 @@ int kmem_cache_shrink(struct kmem_cache
}
EXPORT_SYMBOL(kmem_cache_shrink);

-int kmem_ptr_validate(struct kmem_cache *a, const void *b)
-{
- return 0;
-}
-
static unsigned int slob_ready __read_mostly;

int slab_is_available(void)

2010-12-15 12:53:22

by Al Viro

Subject: Re: [patch 1/2] fs: fix d_validate (again)

On Tue, Nov 23, 2010 at 09:22:41PM +1100, Nick Piggin wrote:
> Again, I'd like to please fix d_validate. It can be trivially fixed
> without delving into filesystem code, and it does not prevent the proper
> careful removal of d_validate and untrusted dentry caches from
> filesystems.
>
> The right way to merge is to fix the bugs in core code now, and remove it
> completely if and when filesystems stop using it. But it's no longer a
> liability or problem to maintain, so it can stay as long as it likes.

TBH, I would prefer to kill the damn thing completely. Never liked it,
and I think I see how to get rid of it.

Look: what happens is that ncpfs reads the entire directory from server
and since it gets all metadata for free anyway, it creates all dentries
as it goes (assigning i_ino, etc.) and sticks references to them into
page cache for that directory. readdir() goes through those pages and
as long as all dentries it sees are still alive (and pages still present),
it uses them. If something's gone or the cache is old enough, we just
evict all those pages and reread. If some dentries are around during
reread (or initial read, for that matter), we stick references to those
back into the repopulated page cache.

The order of entries matches that in the metadata we'd got from server.

OK. So let's do that:
* stick a reference back to the page in dentry->d_fsdata (the current
uses are becoming obsolete with the new scheme)
* have d_iput() remove the reference to dentry from page under
spinlock
* have page eviction remove the references from corresponding dentries
under the same spinlock
* instead of d_validate() crap, grab the spinlock, lock dentry,
check if it's still alive and do dget_locked() on it if it is (and unlock,
of course).

Voila - d_validate() is no more. IIRC, smbfs was essentially the same
as ncpfs in that respect. Comments?
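
The four steps above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not ncpfs code: struct dir_page, struct entry, cache_lock and all four functions are hypothetical names, a pthread mutex stands in for the spinlock, the backref field plays the role of d_fsdata, and "still alive" is reduced to "the slot has not been cleared".

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SLOTS_PER_PAGE 4

struct entry;

/* Hypothetical stand-in for one page of the directory cache. */
struct dir_page {
	struct entry *slot[SLOTS_PER_PAGE];
};

/* Hypothetical stand-in for a dentry. */
struct entry {
	struct entry **backref;	/* role of d_fsdata: points at our slot */
	int refcount;
};

/* One lock covers both the slots and the back-references. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Populate: record e in the page and remember where it was recorded. */
static void cache_store(struct dir_page *pg, size_t i, struct entry *e)
{
	assert(i < SLOTS_PER_PAGE);
	pthread_mutex_lock(&cache_lock);
	if (pg->slot[i])
		pg->slot[i]->backref = NULL;	/* displace previous occupant */
	if (e->backref)
		*e->backref = NULL;		/* e was cached elsewhere */
	pg->slot[i] = e;
	e->backref = &pg->slot[i];
	pthread_mutex_unlock(&cache_lock);
}

/* Entry is going away (the d_iput() step): clear the page's reference. */
static void entry_release(struct entry *e)
{
	pthread_mutex_lock(&cache_lock);
	if (e->backref) {
		*e->backref = NULL;
		e->backref = NULL;
	}
	pthread_mutex_unlock(&cache_lock);
}

/* Page eviction: detach every entry that still points into the page. */
static void page_evict(struct dir_page *pg)
{
	size_t i;

	pthread_mutex_lock(&cache_lock);
	for (i = 0; i < SLOTS_PER_PAGE; i++) {
		if (pg->slot[i]) {
			pg->slot[i]->backref = NULL;
			pg->slot[i] = NULL;
		}
	}
	pthread_mutex_unlock(&cache_lock);
}

/* readdir side: return the entry with a reference taken, or NULL. */
static struct entry *cache_get(struct dir_page *pg, size_t i)
{
	struct entry *e;

	assert(i < SLOTS_PER_PAGE);
	pthread_mutex_lock(&cache_lock);
	e = pg->slot[i];
	if (e)
		e->refcount++;
	pthread_mutex_unlock(&cache_lock);
	return e;
}
```

Because a slot is cleared under the lock before its entry can disappear, any non-NULL pointer read from a slot under that lock is safe to dereference, so no pointer validation is needed.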

2010-12-16 02:47:03

by Nick Piggin

Subject: Re: [patch 1/2] fs: fix d_validate (again)

On Wed, Dec 15, 2010 at 12:53:15PM +0000, Al Viro wrote:
> On Tue, Nov 23, 2010 at 09:22:41PM +1100, Nick Piggin wrote:
> > Again, I'd like to please fix d_validate. It can be trivially fixed
> > without delving into filesystem code, and it does not prevent the proper
> > careful removal of d_validate and untrusted dentry caches from
> > filesystems.
> >
> > The right way to merge is to fix the bugs in core code now, and remove it
> > completely if and when filesystems stop using it. But it's no longer a
> > liability or problem to maintain, so it can stay as long as it likes.
>
> TBH, I would prefer to kill the damn thing completely. Never liked it,
> and I think I see how to get rid of it.

Yes that's the logical next step.


> Look: what happens is that ncpfs reads the entire directory from server
> and since it gets all metadata for free anyway, it creates all dentries
> as it goes (assigning i_ino, etc.) and sticks references to them into
> page cache for that directory. readdir() goes through those pages and
> as long as all dentries it sees are still alive (and pages still present),
> it uses them. If something's gone or the cache is old enough, we just
> evict all those pages and reread. If some dentries are around during
> reread (or initial read, for that matter), we stick references to those
> back into the repopulated page cache.
>
> The order of entries matches that in the metadata we'd got from server.
>
> OK. So let's do that:
> * stick a reference back to the page in dentry->d_fsdata (the current
> uses are becoming obsolete with the new scheme)
> * have d_iput() remove the reference to dentry from page under
> spinlock
> * have page eviction remove the references from corresponding dentries
> under the same spinlock
> * instead of d_validate() crap, grab the spinlock, lock dentry,
> check if it's still alive and do dget_locked() on it if it is (and unlock,
> of course).
>
> Voila - d_validate() is no more. IIRC, smbfs was essentially the same
> as ncpfs in that respect. Comments?

TBH, it sounds plausible, but I have no way to test ncpfs, so I just
tried to stay away from such a big change after one look at trying to
rip the cache out.

It would be great if someone can do it, but can we not couple old
filesystem changes with this patch? I.e. just merge the bugfixes now
upstream and in -stable (nobody will care about performance, I'd bet on
it). Then d_validate can go away naturally after the filesystem changes
are merged and it has no callers left.