2011-05-31 00:33:44

by Hugh Dickins

Subject: [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm

Here's a patchset for mmotm, based on 3.0-rc1. Nothing exciting,
mostly cleanup, preparation for what will probably be two more
patchsets coming over the next few weeks, first simplifying tmpfs
by getting rid of its ->readpage (give it a splice instead), then
getting rid of its peculiar swap index (use radix_tree instead).

The ordering here is somewhat illogical, arranged in the hope that
at least the first four can get into 3.0-rc, which would simplify
the dependencies between linux-next and mmotm.

The first is just an independent fix (I think) noticed on the way.
2,3,4 affect the interface between tmpfs and drm very slightly.
Once they're in 3.0-rc, drm maintainers could take 5,6,7,8 out of
mmotm and into linux-next (or even 3.0-rc).

1/14 mm: invalidate_mapping_pages flush cleancache
2/14 mm: move vmtruncate_range to truncate.c
3/14 tmpfs: take control of its truncate_range
4/14 tmpfs: add shmem_read_mapping_page_gfp
5/14 drm/ttm: use shmem_read_mapping_page
6/14 drm/i915: use shmem_read_mapping_page
7/14 drm/i915: adjust to new truncate_range
8/14 drm/i915: more struct_mutex locking
9/14 mm: cleanup descriptions of filler arg
10/14 mm: truncate functions are in truncate.c
11/14 mm: tidy vmtruncate_range and related functions
12/14 mm: consistent truncate and invalidate loops
13/14 mm: pincer in truncate_inode_pages_range
14/14 tmpfs: no need to use i_lock

drivers/gpu/drm/i915/i915_dma.c | 3
drivers/gpu/drm/i915/i915_gem.c | 36 ++---
drivers/gpu/drm/i915/intel_overlay.c | 5
drivers/gpu/drm/ttm/ttm_tt.c | 4
include/linux/mm.h | 3
include/linux/pagemap.h | 22 ++-
mm/filemap.c | 14 +-
mm/memory.c | 24 ---
mm/shmem.c | 79 +++++++-----
mm/truncate.c | 161 +++++++++++++------------
10 files changed, 185 insertions(+), 166 deletions(-)

Hugh


2011-05-31 00:35:51

by Hugh Dickins

Subject: [PATCH 1/14] mm: invalidate_mapping_pages flush cleancache

truncate_inode_pages_range() and invalidate_inode_pages2_range()
call cleancache_flush_inode(mapping) before and after: shouldn't
invalidate_mapping_pages() be doing the same?

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Dan Magenheimer <[email protected]>
---
mm/truncate.c | 2 ++
1 file changed, 2 insertions(+)

--- linux.orig/mm/truncate.c 2011-05-30 13:56:10.416798124 -0700
+++ linux/mm/truncate.c 2011-05-30 14:08:46.612547848 -0700
@@ -333,6 +333,7 @@ unsigned long invalidate_mapping_pages(s
unsigned long count = 0;
int i;

+ cleancache_flush_inode(mapping);
pagevec_init(&pvec, 0);
while (next <= end &&
pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
@@ -373,6 +374,7 @@ unsigned long invalidate_mapping_pages(s
mem_cgroup_uncharge_end();
cond_resched();
}
+ cleancache_flush_inode(mapping);
return count;
}
EXPORT_SYMBOL(invalidate_mapping_pages);

2011-05-31 00:37:01

by Hugh Dickins

Subject: [PATCH 2/14] mm: move vmtruncate_range to truncate.c

You would expect to find vmtruncate_range() next to vmtruncate()
in mm/truncate.c: move it there.

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/memory.c | 24 ------------------------
mm/truncate.c | 24 ++++++++++++++++++++++++
2 files changed, 24 insertions(+), 24 deletions(-)

--- linux.orig/mm/memory.c 2011-05-30 13:56:10.416798124 -0700
+++ linux/mm/memory.c 2011-05-30 14:09:52.908876549 -0700
@@ -2796,30 +2796,6 @@ void unmap_mapping_range(struct address_
}
EXPORT_SYMBOL(unmap_mapping_range);

-int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
-{
- struct address_space *mapping = inode->i_mapping;
-
- /*
- * If the underlying filesystem is not going to provide
- * a way to truncate a range of blocks (punch a hole) -
- * we should return failure right now.
- */
- if (!inode->i_op->truncate_range)
- return -ENOSYS;
-
- mutex_lock(&inode->i_mutex);
- down_write(&inode->i_alloc_sem);
- unmap_mapping_range(mapping, offset, (end - offset), 1);
- truncate_inode_pages_range(mapping, offset, end);
- unmap_mapping_range(mapping, offset, (end - offset), 1);
- inode->i_op->truncate_range(inode, offset, end);
- up_write(&inode->i_alloc_sem);
- mutex_unlock(&inode->i_mutex);
-
- return 0;
-}
-
/*
* We enter with non-exclusive mmap_sem (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
--- linux.orig/mm/truncate.c 2011-05-30 14:08:46.612547848 -0700
+++ linux/mm/truncate.c 2011-05-30 14:09:52.912876640 -0700
@@ -605,3 +605,27 @@ int vmtruncate(struct inode *inode, loff
return 0;
}
EXPORT_SYMBOL(vmtruncate);
+
+int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
+{
+ struct address_space *mapping = inode->i_mapping;
+
+ /*
+ * If the underlying filesystem is not going to provide
+ * a way to truncate a range of blocks (punch a hole) -
+ * we should return failure right now.
+ */
+ if (!inode->i_op->truncate_range)
+ return -ENOSYS;
+
+ mutex_lock(&inode->i_mutex);
+ down_write(&inode->i_alloc_sem);
+ unmap_mapping_range(mapping, offset, (end - offset), 1);
+ truncate_inode_pages_range(mapping, offset, end);
+ unmap_mapping_range(mapping, offset, (end - offset), 1);
+ inode->i_op->truncate_range(inode, offset, end);
+ up_write(&inode->i_alloc_sem);
+ mutex_unlock(&inode->i_mutex);
+
+ return 0;
+}

2011-05-31 00:39:07

by Hugh Dickins

Subject: [PATCH 3/14] tmpfs: take control of its truncate_range

2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.

Slightly redefine the ->truncate_range interface, so far implemented
only by tmpfs to support madvise(,,MADV_REMOVE). Let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its ->setattr.

Let's rename shmem_notify_change() to shmem_setattr(). Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform
its own variant of truncate_inode_pages_range().

Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after ->truncate_range,
there's no need to call it a third time.

Note that drivers/gpu/drm/i915/i915_gem.c i915_gem_object_truncate()
calls the tmpfs ->truncate_range directly: update that in a separate
patch later, for now just let it duplicate the truncate_inode_pages().
Because i915 handles unmap_mapping_range() itself at a different stage,
we have chosen not to bundle that into ->truncate_range.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
I notice that ext4 is now joining ocfs2 and xfs in supporting fallocate
FALLOC_FL_PUNCH_HOLE: perhaps they should support truncate_range, and
tmpfs should support fallocate? But worry about that another time...

mm/shmem.c | 42 +++++++++++++++++++++---------------------
mm/truncate.c | 4 ++--
2 files changed, 23 insertions(+), 23 deletions(-)

--- linux.orig/mm/shmem.c 2011-05-30 13:56:10.000000000 -0700
+++ linux/mm/shmem.c 2011-05-30 14:13:03.569821995 -0700
@@ -562,6 +562,8 @@ static void shmem_truncate_range(struct
spinlock_t *punch_lock;
unsigned long upper_limit;

+ truncate_inode_pages_range(inode->i_mapping, start, end);
+
inode->i_ctime = inode->i_mtime = CURRENT_TIME;
idx = (start + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
if (idx >= info->next_index)
@@ -738,16 +740,8 @@ done2:
* lowered next_index. Also, though shmem_getpage checks
* i_size before adding to cache, no recheck after: so fix the
* narrow window there too.
- *
- * Recalling truncate_inode_pages_range and unmap_mapping_range
- * every time for punch_hole (which never got a chance to clear
- * SHMEM_PAGEIN at the start of vmtruncate_range) is expensive,
- * yet hardly ever necessary: try to optimize them out later.
*/
truncate_inode_pages_range(inode->i_mapping, start, end);
- if (punch_hole)
- unmap_mapping_range(inode->i_mapping, start,
- end - start, 1);
}

spin_lock(&info->lock);
@@ -767,21 +761,21 @@ done2:
}
}

-static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
+static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = dentry->d_inode;
- loff_t newsize = attr->ia_size;
int error;

error = inode_change_ok(inode, attr);
if (error)
return error;

- if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)
- && newsize != inode->i_size) {
+ if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
+ loff_t oldsize = inode->i_size;
+ loff_t newsize = attr->ia_size;
struct page *page = NULL;

- if (newsize < inode->i_size) {
+ if (newsize < oldsize) {
/*
* If truncating down to a partial page, then
* if that page is already allocated, hold it
@@ -810,12 +804,19 @@ static int shmem_notify_change(struct de
spin_unlock(&info->lock);
}
}
-
- /* XXX(truncate): truncate_setsize should be called last */
- truncate_setsize(inode, newsize);
+ if (newsize != oldsize) {
+ i_size_write(inode, newsize);
+ inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+ }
+ if (newsize < oldsize) {
+ loff_t holebegin = round_up(newsize, PAGE_SIZE);
+ unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
+ shmem_truncate_range(inode, newsize, (loff_t)-1);
+ /* unmap again to remove racily COWed private pages */
+ unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
+ }
if (page)
page_cache_release(page);
- shmem_truncate_range(inode, newsize, (loff_t)-1);
}

setattr_copy(inode, attr);
@@ -832,7 +833,6 @@ static void shmem_evict_inode(struct ino
struct shmem_xattr *xattr, *nxattr;

if (inode->i_mapping->a_ops == &shmem_aops) {
- truncate_inode_pages(inode->i_mapping, 0);
shmem_unacct_size(info->flags, inode->i_size);
inode->i_size = 0;
shmem_truncate_range(inode, 0, (loff_t)-1);
@@ -2706,7 +2706,7 @@ static const struct file_operations shme
};

static const struct inode_operations shmem_inode_operations = {
- .setattr = shmem_notify_change,
+ .setattr = shmem_setattr,
.truncate_range = shmem_truncate_range,
#ifdef CONFIG_TMPFS_XATTR
.setxattr = shmem_setxattr,
@@ -2739,7 +2739,7 @@ static const struct inode_operations shm
.removexattr = shmem_removexattr,
#endif
#ifdef CONFIG_TMPFS_POSIX_ACL
- .setattr = shmem_notify_change,
+ .setattr = shmem_setattr,
.check_acl = generic_check_acl,
#endif
};
@@ -2752,7 +2752,7 @@ static const struct inode_operations shm
.removexattr = shmem_removexattr,
#endif
#ifdef CONFIG_TMPFS_POSIX_ACL
- .setattr = shmem_notify_change,
+ .setattr = shmem_setattr,
.check_acl = generic_check_acl,
#endif
};
--- linux.orig/mm/truncate.c 2011-05-30 14:09:52.000000000 -0700
+++ linux/mm/truncate.c 2011-05-30 14:15:29.814546645 -0700
@@ -621,9 +621,9 @@ int vmtruncate_range(struct inode *inode
mutex_lock(&inode->i_mutex);
down_write(&inode->i_alloc_sem);
unmap_mapping_range(mapping, offset, (end - offset), 1);
- truncate_inode_pages_range(mapping, offset, end);
- unmap_mapping_range(mapping, offset, (end - offset), 1);
inode->i_op->truncate_range(inode, offset, end);
+ /* unmap again to remove racily COWed private pages */
+ unmap_mapping_range(mapping, offset, (end - offset), 1);
up_write(&inode->i_alloc_sem);
mutex_unlock(&inode->i_mutex);

2011-05-31 00:40:54

by Hugh Dickins

Subject: [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp

Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
is unsuited to tmpfs, because it inserts a page into pagecache before
calling the filesystem's ->readpage: tmpfs may have pages in swapcache
which only it knows how to locate and switch to filecache.

At present tmpfs provides a ->readpage method, and copes with this by
copying pages; but soon we can simplify it by removing its ->readpage.
Provide a shmem_read_mapping_page_gfp() now, ready for that transition,
and a shmem_read_mapping_page() inline for its common mapping_gfp case.

(shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally
the read_mapping_page functions use the mapping's ->readpage, and the
read_cache_page functions use the supplied filler, so I think
read_cache_page_gfp was slightly misnamed.)

Tidy up the nearby declarations in pagemap.h.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
include/linux/pagemap.h | 22 +++++++++++++++-------
mm/shmem.c | 23 +++++++++++++++++++++++
2 files changed, 38 insertions(+), 7 deletions(-)

--- linux.orig/include/linux/pagemap.h 2011-05-30 13:56:10.212797101 -0700
+++ linux/include/linux/pagemap.h 2011-05-30 14:25:32.665536626 -0700
@@ -255,31 +255,39 @@ static inline struct page *grab_cache_pa
extern struct page * grab_cache_page_nowait(struct address_space *mapping,
pgoff_t index);
extern struct page * read_cache_page_async(struct address_space *mapping,
- pgoff_t index, filler_t *filler,
- void *data);
+ pgoff_t index, filler_t *filler, void *data);
extern struct page * read_cache_page(struct address_space *mapping,
- pgoff_t index, filler_t *filler,
- void *data);
+ pgoff_t index, filler_t *filler, void *data);
extern struct page * read_cache_page_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp_mask);
extern int read_cache_pages(struct address_space *mapping,
struct list_head *pages, filler_t *filler, void *data);

static inline struct page *read_mapping_page_async(
- struct address_space *mapping,
- pgoff_t index, void *data)
+ struct address_space *mapping,
+ pgoff_t index, void *data)
{
filler_t *filler = (filler_t *)mapping->a_ops->readpage;
return read_cache_page_async(mapping, index, filler, data);
}

static inline struct page *read_mapping_page(struct address_space *mapping,
- pgoff_t index, void *data)
+ pgoff_t index, void *data)
{
filler_t *filler = (filler_t *)mapping->a_ops->readpage;
return read_cache_page(mapping, index, filler, data);
}

+extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
+ pgoff_t index, gfp_t gfp_mask);
+
+static inline struct page *shmem_read_mapping_page(
+ struct address_space *mapping, pgoff_t index)
+{
+ return shmem_read_mapping_page_gfp(mapping, index,
+ mapping_gfp_mask(mapping));
+}
+
/*
* Return byte-offset into filesystem object for page.
*/
--- linux.orig/mm/shmem.c 2011-05-30 14:13:03.569821995 -0700
+++ linux/mm/shmem.c 2011-05-30 14:25:32.665536626 -0700
@@ -3028,3 +3028,26 @@ int shmem_zero_setup(struct vm_area_stru
vma->vm_flags |= VM_CAN_NONLINEAR;
return 0;
}
+
+/**
+ * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
+ * @mapping: the page's address_space
+ * @index: the page index
+ * @gfp: the page allocator flags to use if allocating
+ *
+ * This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
+ * with any new page allocations done using the specified allocation flags.
+ * But read_cache_page_gfp() uses the ->readpage() method: which does not
+ * suit tmpfs, since it may have pages in swapcache, and needs to find those
+ * for itself; although drivers/gpu/drm i915 and ttm rely upon this support.
+ *
+ * Provide a stub for those callers to start using now, then later
+ * flesh it out to call shmem_getpage() with additional gfp mask, when
+ * shmem_file_splice_read() is added and shmem_readpage() is removed.
+ */
+struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
+ pgoff_t index, gfp_t gfp)
+{
+ return read_cache_page_gfp(mapping, index, gfp);
+}
+EXPORT_SYMBOL_GPL(shmem_read_mapping_page_gfp);

2011-05-31 00:42:21

by Hugh Dickins

Subject: [PATCH 5/14] drm/ttm: use shmem_read_mapping_page

Soon tmpfs will stop supporting ->readpage and read_mapping_page():
once "tmpfs: add shmem_read_mapping_page_gfp" has been applied,
this patch can be applied to ease the transition.

ttm_tt_swapin() and ttm_tt_swapout() use shmem_read_mapping_page()
in place of read_mapping_page(), since their swap_space has been
created with shmem_file_setup().

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Thomas Hellstrom <[email protected]>
Cc: Dave Airlie <[email protected]>
---
drivers/gpu/drm/ttm/ttm_tt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- linux.orig/drivers/gpu/drm/ttm/ttm_tt.c 2011-05-30 13:56:10.112796608 -0700
+++ linux/drivers/gpu/drm/ttm/ttm_tt.c 2011-05-30 14:25:59.641670407 -0700
@@ -484,7 +484,7 @@ static int ttm_tt_swapin(struct ttm_tt *
swap_space = swap_storage->f_path.dentry->d_inode->i_mapping;

for (i = 0; i < ttm->num_pages; ++i) {
- from_page = read_mapping_page(swap_space, i, NULL);
+ from_page = shmem_read_mapping_page(swap_space, i);
if (IS_ERR(from_page)) {
ret = PTR_ERR(from_page);
goto out_err;
@@ -557,7 +557,7 @@ int ttm_tt_swapout(struct ttm_tt *ttm, s
from_page = ttm->pages[i];
if (unlikely(from_page == NULL))
continue;
- to_page = read_mapping_page(swap_space, i, NULL);
+ to_page = shmem_read_mapping_page(swap_space, i);
if (unlikely(IS_ERR(to_page))) {
ret = PTR_ERR(to_page);
goto out_err;

2011-05-31 00:43:49

by Hugh Dickins

Subject: [PATCH 6/14] drm/i915: use shmem_read_mapping_page

Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp():
once "tmpfs: add shmem_read_mapping_page_gfp" has been applied,
this patch can be applied to ease the transition.

In i915_gem_object_get_pages_gtt(), use shmem_read_mapping_page_gfp()
in the one place it's needed; elsewhere use shmem_read_mapping_page(),
with the mapping's gfp_mask properly initialized.

Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Keith Packard <[email protected]>
Cc: Dave Airlie <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)

--- linux.orig/drivers/gpu/drm/i915/i915_gem.c 2011-05-30 13:56:10.068796399 -0700
+++ linux/drivers/gpu/drm/i915/i915_gem.c 2011-05-30 14:26:13.121737248 -0700
@@ -359,8 +359,7 @@ i915_gem_shmem_pread_fast(struct drm_dev
if ((page_offset + remain) > PAGE_SIZE)
page_length = PAGE_SIZE - page_offset;

- page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
- GFP_HIGHUSER | __GFP_RECLAIMABLE);
+ page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
if (IS_ERR(page))
return PTR_ERR(page);

@@ -463,8 +462,7 @@ i915_gem_shmem_pread_slow(struct drm_dev
if ((data_page_offset + page_length) > PAGE_SIZE)
page_length = PAGE_SIZE - data_page_offset;

- page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
- GFP_HIGHUSER | __GFP_RECLAIMABLE);
+ page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
if (IS_ERR(page))
return PTR_ERR(page);

@@ -796,8 +794,7 @@ i915_gem_shmem_pwrite_fast(struct drm_de
if ((page_offset + remain) > PAGE_SIZE)
page_length = PAGE_SIZE - page_offset;

- page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
- GFP_HIGHUSER | __GFP_RECLAIMABLE);
+ page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
if (IS_ERR(page))
return PTR_ERR(page);

@@ -906,8 +903,7 @@ i915_gem_shmem_pwrite_slow(struct drm_de
if ((data_page_offset + page_length) > PAGE_SIZE)
page_length = PAGE_SIZE - data_page_offset;

- page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
- GFP_HIGHUSER | __GFP_RECLAIMABLE);
+ page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
if (IS_ERR(page)) {
ret = PTR_ERR(page);
goto out;
@@ -1556,12 +1552,10 @@ i915_gem_object_get_pages_gtt(struct drm

inode = obj->base.filp->f_path.dentry->d_inode;
mapping = inode->i_mapping;
+ gfpmask |= mapping_gfp_mask(mapping);
+
for (i = 0; i < page_count; i++) {
- page = read_cache_page_gfp(mapping, i,
- GFP_HIGHUSER |
- __GFP_COLD |
- __GFP_RECLAIMABLE |
- gfpmask);
+ page = shmem_read_mapping_page_gfp(mapping, i, gfpmask);
if (IS_ERR(page))
goto err_pages;

@@ -3565,6 +3559,7 @@ struct drm_i915_gem_object *i915_gem_all
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_i915_gem_object *obj;
+ struct address_space *mapping;

obj = kzalloc(sizeof(*obj), GFP_KERNEL);
if (obj == NULL)
@@ -3575,6 +3570,9 @@ struct drm_i915_gem_object *i915_gem_all
return NULL;
}

+ mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
+ mapping_set_gfp_mask(mapping, GFP_HIGHUSER | __GFP_RECLAIMABLE);
+
i915_gem_info_add_obj(dev_priv, size);

obj->base.write_domain = I915_GEM_DOMAIN_CPU;
@@ -3950,8 +3948,7 @@ void i915_gem_detach_phys_object(struct

page_count = obj->base.size / PAGE_SIZE;
for (i = 0; i < page_count; i++) {
- struct page *page = read_cache_page_gfp(mapping, i,
- GFP_HIGHUSER | __GFP_RECLAIMABLE);
+ struct page *page = shmem_read_mapping_page(mapping, i);
if (!IS_ERR(page)) {
char *dst = kmap_atomic(page);
memcpy(dst, vaddr + i*PAGE_SIZE, PAGE_SIZE);
@@ -4012,8 +4009,7 @@ i915_gem_attach_phys_object(struct drm_d
struct page *page;
char *dst, *src;

- page = read_cache_page_gfp(mapping, i,
- GFP_HIGHUSER | __GFP_RECLAIMABLE);
+ page = shmem_read_mapping_page(mapping, i);
if (IS_ERR(page))
return PTR_ERR(page);

2011-05-31 00:45:14

by Hugh Dickins

Subject: [PATCH 7/14] drm/i915: adjust to new truncate_range

The interface to ->truncate_range is changing very slightly:
once "tmpfs: take control of its truncate_range" has been applied,
this can be applied. For now it's only a slight inefficiency while
this remains unapplied, but soon it will become essential.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Keith Packard <[email protected]>
Cc: Dave Airlie <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- linux.orig/drivers/gpu/drm/i915/i915_gem.c 2011-05-30 14:26:13.121737248 -0700
+++ linux/drivers/gpu/drm/i915/i915_gem.c 2011-05-30 14:26:20.861775625 -0700
@@ -1693,13 +1693,13 @@ i915_gem_object_truncate(struct drm_i915
/* Our goal here is to return as much of the memory as
* is possible back to the system as we are called from OOM.
* To do this we must instruct the shmfs to drop all of its
- * backing pages, *now*. Here we mirror the actions taken
- * when by shmem_delete_inode() to release the backing store.
+ * backing pages, *now*.
*/
inode = obj->base.filp->f_path.dentry->d_inode;
- truncate_inode_pages(inode->i_mapping, 0);
if (inode->i_op->truncate_range)
inode->i_op->truncate_range(inode, 0, (loff_t)-1);
+ else
+ truncate_inode_pages(inode->i_mapping, 0);

obj->madv = __I915_MADV_PURGED;
}

2011-05-31 00:47:05

by Hugh Dickins

Subject: [PATCH 8/14] drm/i915: more struct_mutex locking

When auditing the locking in i915_gem.c (for a prospective change which
I then abandoned), I noticed two places where struct_mutex is not held
across GEM object manipulations that would usually require it. Since
one is in initial setup and the other in driver unload, I'm guessing
the mutex is not required for either; but post a patch in case it is.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Keith Packard <[email protected]>
---
drivers/gpu/drm/i915/i915_dma.c | 3 +--
drivers/gpu/drm/i915/intel_overlay.c | 5 +++++
2 files changed, 6 insertions(+), 2 deletions(-)

--- linux.orig/drivers/gpu/drm/i915/i915_dma.c 2011-05-30 13:56:09.972795920 -0700
+++ linux/drivers/gpu/drm/i915/i915_dma.c 2011-05-30 14:26:33.445838032 -0700
@@ -2182,9 +2182,8 @@ int i915_driver_unload(struct drm_device
/* Flush any outstanding unpin_work. */
flush_workqueue(dev_priv->wq);

- i915_gem_free_all_phys_object(dev);
-
mutex_lock(&dev->struct_mutex);
+ i915_gem_free_all_phys_object(dev);
i915_gem_cleanup_ringbuffer(dev);
mutex_unlock(&dev->struct_mutex);
if (I915_HAS_FBC(dev) && i915_powersave)
--- linux.orig/drivers/gpu/drm/i915/intel_overlay.c 2011-05-30 13:56:09.972795920 -0700
+++ linux/drivers/gpu/drm/i915/intel_overlay.c 2011-05-30 14:26:33.449838050 -0700
@@ -1416,6 +1416,8 @@ void intel_setup_overlay(struct drm_devi
goto out_free;
overlay->reg_bo = reg_bo;

+ mutex_lock(&dev->struct_mutex);
+
if (OVERLAY_NEEDS_PHYSICAL(dev)) {
ret = i915_gem_attach_phys_object(dev, reg_bo,
I915_GEM_PHYS_OVERLAY_REGS,
@@ -1440,6 +1442,8 @@ void intel_setup_overlay(struct drm_devi
}
}

+ mutex_unlock(&dev->struct_mutex);
+
/* init all values */
overlay->color_key = 0x0101fe;
overlay->brightness = -19;
@@ -1463,6 +1467,7 @@ void intel_setup_overlay(struct drm_devi
out_unpin_bo:
i915_gem_object_unpin(reg_bo);
out_free_bo:
+ mutex_unlock(&dev->struct_mutex);
drm_gem_object_unreference(&reg_bo->base);
out_free:
kfree(overlay);

2011-05-31 00:48:16

by Hugh Dickins

Subject: [PATCH 9/14] mm: cleanup descriptions of filler arg

The often-NULL data arg to read_cache_page() and read_mapping_page()
functions is misdescribed as "destination for read data": no, it's the
first arg to the filler function, often struct file * to ->readpage().
And satisfy checkpatch.pl on those filler prototypes.

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/filemap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

--- linux.orig/mm/filemap.c 2011-05-30 13:56:10.260797344 -0700
+++ linux/mm/filemap.c 2011-05-30 14:26:54.097940438 -0700
@@ -1795,7 +1795,7 @@ EXPORT_SYMBOL(generic_file_readonly_mmap

static struct page *__read_cache_page(struct address_space *mapping,
pgoff_t index,
- int (*filler)(void *,struct page*),
+ int (*filler)(void *, struct page *),
void *data,
gfp_t gfp)
{
@@ -1826,7 +1826,7 @@ repeat:

static struct page *do_read_cache_page(struct address_space *mapping,
pgoff_t index,
- int (*filler)(void *,struct page*),
+ int (*filler)(void *, struct page *),
void *data,
gfp_t gfp)

@@ -1866,7 +1866,7 @@ out:
* @mapping: the page's address_space
* @index: the page index
* @filler: function to perform the read
- * @data: destination for read data
+ * @data: first arg to filler(data, page) function, often left as NULL
*
* Same as read_cache_page, but don't wait for page to become unlocked
* after submitting it to the filler.
@@ -1878,7 +1878,7 @@ out:
*/
struct page *read_cache_page_async(struct address_space *mapping,
pgoff_t index,
- int (*filler)(void *,struct page*),
+ int (*filler)(void *, struct page *),
void *data)
{
return do_read_cache_page(mapping, index, filler, data, mapping_gfp_mask(mapping));
@@ -1926,7 +1926,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
* @mapping: the page's address_space
* @index: the page index
* @filler: function to perform the read
- * @data: destination for read data
+ * @data: first arg to filler(data, page) function, often left as NULL
*
* Read into the page cache. If a page already exists, and PageUptodate() is
* not set, try to fill the page then wait for it to become unlocked.
@@ -1935,7 +1935,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
*/
struct page *read_cache_page(struct address_space *mapping,
pgoff_t index,
- int (*filler)(void *,struct page*),
+ int (*filler)(void *, struct page *),
void *data)
{
return wait_on_page_read(read_cache_page_async(mapping, index, filler, data));

2011-05-31 00:49:55

by Hugh Dickins

Subject: [PATCH 10/14] mm: truncate functions are in truncate.c

Correct comment on truncate_inode_pages*() in linux/mm.h; and remove
declaration of page_unuse(), it didn't exist even in 2.2.26 or 2.4.0!

Signed-off-by: Hugh Dickins <[email protected]>
---
include/linux/mm.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- linux.orig/include/linux/mm.h 2011-05-30 13:56:10.444798255 -0700
+++ linux/include/linux/mm.h 2011-05-30 14:43:05.642758070 -0700
@@ -1445,8 +1445,7 @@ extern int do_munmap(struct mm_struct *,

extern unsigned long do_brk(unsigned long, unsigned long);

-/* filemap.c */
-extern unsigned long page_unuse(struct page *);
+/* truncate.c */
extern void truncate_inode_pages(struct address_space *, loff_t);
extern void truncate_inode_pages_range(struct address_space *,
loff_t lstart, loff_t lend);

2011-05-31 00:51:08

by Hugh Dickins

Subject: [PATCH 11/14] mm: tidy vmtruncate_range and related functions

Use consistent variable names in truncate_pagecache(), truncate_setsize(),
vmtruncate() and vmtruncate_range().

unmap_mapping_range() and vmtruncate_range() have mismatched interfaces:
don't change either, but make the vmtruncates more precise about what
they expect unmap_mapping_range() to do.

vmtruncate_range() is currently called only with page-aligned start and
end+1: can handle unaligned start, but unaligned end+1 would hit BUG_ON
in truncate_inode_pages_range() (lacks partial clearing of the end page).
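
For example (my arithmetic, assuming 4K pages, not part of the patch):
with lstart == 1000 and lend == 8191, vmtruncate_range() now computes

	loff_t holebegin = round_up(1000, PAGE_SIZE);	/* == 4096 */
	loff_t holelen = 1 + 8191 - holebegin;		/* == 4096 */

so unmap_mapping_range() sees only whole pages within the hole, while
->truncate_range still gets the original lstart and lend.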

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/truncate.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)

--- linux.orig/mm/truncate.c 2011-05-30 14:15:29.000000000 -0700
+++ linux/mm/truncate.c 2011-05-30 14:51:03.553127951 -0700
@@ -528,8 +528,8 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
/**
* truncate_pagecache - unmap and remove pagecache that has been truncated
* @inode: inode
- * @old: old file offset
- * @new: new file offset
+ * @oldsize: old file size
+ * @newsize: new file size
*
* inode's new i_size must already be written before truncate_pagecache
* is called.
@@ -541,9 +541,10 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
* situations such as writepage being called for a page that has already
* had its underlying blocks deallocated.
*/
-void truncate_pagecache(struct inode *inode, loff_t old, loff_t new)
+void truncate_pagecache(struct inode *inode, loff_t oldsize, loff_t newsize)
{
struct address_space *mapping = inode->i_mapping;
+ loff_t holebegin = round_up(newsize, PAGE_SIZE);

/*
* unmap_mapping_range is called twice, first simply for
@@ -554,9 +555,9 @@ void truncate_pagecache(struct inode *in
* truncate_inode_pages finishes, hence the second
* unmap_mapping_range call must be made for correctness.
*/
- unmap_mapping_range(mapping, new + PAGE_SIZE - 1, 0, 1);
- truncate_inode_pages(mapping, new);
- unmap_mapping_range(mapping, new + PAGE_SIZE - 1, 0, 1);
+ unmap_mapping_range(mapping, holebegin, 0, 1);
+ truncate_inode_pages(mapping, newsize);
+ unmap_mapping_range(mapping, holebegin, 0, 1);
}
EXPORT_SYMBOL(truncate_pagecache);

@@ -586,29 +587,31 @@ EXPORT_SYMBOL(truncate_setsize);
/**
* vmtruncate - unmap mappings "freed" by truncate() syscall
* @inode: inode of the file used
- * @offset: file offset to start truncating
+ * @newsize: file offset to start truncating
*
* This function is deprecated and truncate_setsize or truncate_pagecache
* should be used instead, together with filesystem specific block truncation.
*/
-int vmtruncate(struct inode *inode, loff_t offset)
+int vmtruncate(struct inode *inode, loff_t newsize)
{
int error;

- error = inode_newsize_ok(inode, offset);
+ error = inode_newsize_ok(inode, newsize);
if (error)
return error;

- truncate_setsize(inode, offset);
+ truncate_setsize(inode, newsize);
if (inode->i_op->truncate)
inode->i_op->truncate(inode);
return 0;
}
EXPORT_SYMBOL(vmtruncate);

-int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
+int vmtruncate_range(struct inode *inode, loff_t lstart, loff_t lend)
{
struct address_space *mapping = inode->i_mapping;
+ loff_t holebegin = round_up(lstart, PAGE_SIZE);
+ loff_t holelen = 1 + lend - holebegin;

/*
* If the underlying filesystem is not going to provide
@@ -620,10 +623,10 @@ int vmtruncate_range(struct inode *inode

mutex_lock(&inode->i_mutex);
down_write(&inode->i_alloc_sem);
- unmap_mapping_range(mapping, offset, (end - offset), 1);
- inode->i_op->truncate_range(inode, offset, end);
+ unmap_mapping_range(mapping, holebegin, holelen, 1);
+ inode->i_op->truncate_range(inode, lstart, lend);
/* unmap again to remove racily COWed private pages */
- unmap_mapping_range(mapping, offset, (end - offset), 1);
+ unmap_mapping_range(mapping, holebegin, holelen, 1);
up_write(&inode->i_alloc_sem);
mutex_unlock(&inode->i_mutex);

2011-05-31 00:52:40

by Hugh Dickins

Subject: [PATCH 12/14] mm: consistent truncate and invalidate loops

Make the pagevec_lookup loops in truncate_inode_pages_range(),
invalidate_mapping_pages() and invalidate_inode_pages2_range()
more consistent with each other.

They were relying upon page->index of an unlocked page, but apologizing
for it: accept it, embrace it, add comments and WARN_ONs, and simplify
the index handling.

invalidate_inode_pages2_range() had special handling for a wrapped
page->index + 1 = 0 case; but MAX_LFS_FILESIZE doesn't let us anywhere
near there, and a corrupt page->index in the radix_tree could cause
more trouble than that would catch. Remove that wrapped handling.

invalidate_inode_pages2_range() uses min() to limit the pagevec_lookup
when near the end of the range: copy that into the other two, although
it's less useful than you might think (it limits the use of the buffer,
rather than the indices looked up).
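
As a worked example of that limit (my numbers, assuming PAGEVEC_SIZE
is 14 as now): with index == 5 and end == 7,

	min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1 == 3

so pagevec_lookup() fills at most 3 pagevec slots; but because the
lookup skips unpopulated slots, those pages may still lie beyond end,
which is why each loop still checks index > end on the pages returned.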

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/filemap.c | 2
mm/truncate.c | 110 ++++++++++++++++++++----------------------------
2 files changed, 49 insertions(+), 63 deletions(-)

--- linux.orig/mm/filemap.c 2011-05-30 14:26:54.000000000 -0700
+++ linux/mm/filemap.c 2011-05-30 14:57:02.130905988 -0700
@@ -131,6 +131,7 @@ void __delete_from_page_cache(struct pag

radix_tree_delete(&mapping->page_tree, page->index);
page->mapping = NULL;
+ /* Leave page->index set: truncation lookup relies upon it */
mapping->nrpages--;
__dec_zone_page_state(page, NR_FILE_PAGES);
if (PageSwapBacked(page))
@@ -486,6 +487,7 @@ int add_to_page_cache_locked(struct page
spin_unlock_irq(&mapping->tree_lock);
} else {
page->mapping = NULL;
+ /* Leave page->index set: truncation relies upon it */
spin_unlock_irq(&mapping->tree_lock);
mem_cgroup_uncharge_cache_page(page);
page_cache_release(page);
--- linux.orig/mm/truncate.c 2011-05-30 14:51:03.000000000 -0700
+++ linux/mm/truncate.c 2011-05-30 15:01:01.660093602 -0700
@@ -199,9 +199,6 @@ int invalidate_inode_page(struct page *p
* The first pass will remove most pages, so the search cost of the second pass
* is low.
*
- * When looking at page->index outside the page lock we need to be careful to
- * copy it into a local to avoid races (it could change at any time).
- *
* We pass down the cache-hot hint to the page freeing code. Even if the
* mapping is large, it is probably the case that the final pages are the most
* recently touched, and freeing happens in ascending file offset order.
@@ -210,10 +207,10 @@ void truncate_inode_pages_range(struct a
loff_t lstart, loff_t lend)
{
const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
- pgoff_t end;
const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
struct pagevec pvec;
- pgoff_t next;
+ pgoff_t index;
+ pgoff_t end;
int i;

cleancache_flush_inode(mapping);
@@ -224,24 +221,21 @@ void truncate_inode_pages_range(struct a
end = (lend >> PAGE_CACHE_SHIFT);

pagevec_init(&pvec, 0);
- next = start;
- while (next <= end &&
- pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+ index = start;
+ while (index <= end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
- pgoff_t page_index = page->index;

- if (page_index > end) {
- next = page_index;
+ /* We rely upon deletion not changing page->index */
+ index = page->index;
+ if (index > end)
break;
- }

- if (page_index > next)
- next = page_index;
- next++;
if (!trylock_page(page))
continue;
+ WARN_ON(page->index != index);
if (PageWriteback(page)) {
unlock_page(page);
continue;
@@ -252,6 +246,7 @@ void truncate_inode_pages_range(struct a
pagevec_release(&pvec);
mem_cgroup_uncharge_end();
cond_resched();
+ index++;
}

if (partial) {
@@ -264,13 +259,14 @@ void truncate_inode_pages_range(struct a
}
}

- next = start;
+ index = start;
for ( ; ; ) {
cond_resched();
- if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
- if (next == start)
+ if (!pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ if (index == start)
break;
- next = start;
+ index = start;
continue;
}
if (pvec.pages[0]->index > end) {
@@ -281,18 +277,20 @@ void truncate_inode_pages_range(struct a
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];

- if (page->index > end)
+ /* We rely upon deletion not changing page->index */
+ index = page->index;
+ if (index > end)
break;
+
lock_page(page);
+ WARN_ON(page->index != index);
wait_on_page_writeback(page);
truncate_inode_page(mapping, page);
- if (page->index > next)
- next = page->index;
- next++;
unlock_page(page);
}
pagevec_release(&pvec);
mem_cgroup_uncharge_end();
+ index++;
}
cleancache_flush_inode(mapping);
}
@@ -328,36 +326,27 @@ unsigned long invalidate_mapping_pages(s
pgoff_t start, pgoff_t end)
{
struct pagevec pvec;
- pgoff_t next = start;
+ pgoff_t index = start;
unsigned long ret;
unsigned long count = 0;
int i;

cleancache_flush_inode(mapping);
pagevec_init(&pvec, 0);
- while (next <= end &&
- pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+ while (index <= end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
- pgoff_t index;
- int lock_failed;
-
- lock_failed = !trylock_page(page);

- /*
- * We really shouldn't be looking at the ->index of an
- * unlocked page. But we're not allowed to lock these
- * pages. So we rely upon nobody altering the ->index
- * of this (pinned-by-us) page.
- */
+ /* We rely upon deletion not changing page->index */
index = page->index;
- if (index > next)
- next = index;
- next++;
- if (lock_failed)
- continue;
+ if (index > end)
+ break;

+ if (!trylock_page(page))
+ continue;
+ WARN_ON(page->index != index);
ret = invalidate_inode_page(page);
unlock_page(page);
/*
@@ -367,12 +356,11 @@ unsigned long invalidate_mapping_pages(s
if (!ret)
deactivate_page(page);
count += ret;
- if (next > end)
- break;
}
pagevec_release(&pvec);
mem_cgroup_uncharge_end();
cond_resched();
+ index++;
}
cleancache_flush_inode(mapping);
return count;
@@ -439,37 +427,32 @@ int invalidate_inode_pages2_range(struct
pgoff_t start, pgoff_t end)
{
struct pagevec pvec;
- pgoff_t next;
+ pgoff_t index;
int i;
int ret = 0;
int ret2 = 0;
int did_range_unmap = 0;
- int wrapped = 0;

cleancache_flush_inode(mapping);
pagevec_init(&pvec, 0);
- next = start;
- while (next <= end && !wrapped &&
- pagevec_lookup(&pvec, mapping, next,
- min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ index = start;
+ while (index <= end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
- pgoff_t page_index;
+
+ /* We rely upon deletion not changing page->index */
+ index = page->index;
+ if (index > end)
+ break;

lock_page(page);
+ WARN_ON(page->index != index);
if (page->mapping != mapping) {
unlock_page(page);
continue;
}
- page_index = page->index;
- next = page_index + 1;
- if (next == 0)
- wrapped = 1;
- if (page_index > end) {
- unlock_page(page);
- break;
- }
wait_on_page_writeback(page);
if (page_mapped(page)) {
if (!did_range_unmap) {
@@ -477,9 +460,9 @@ int invalidate_inode_pages2_range(struct
* Zap the rest of the file in one hit.
*/
unmap_mapping_range(mapping,
- (loff_t)page_index<<PAGE_CACHE_SHIFT,
- (loff_t)(end - page_index + 1)
- << PAGE_CACHE_SHIFT,
+ (loff_t)index << PAGE_CACHE_SHIFT,
+ (loff_t)(1 + end - index)
+ << PAGE_CACHE_SHIFT,
0);
did_range_unmap = 1;
} else {
@@ -487,8 +470,8 @@ int invalidate_inode_pages2_range(struct
* Just zap this page
*/
unmap_mapping_range(mapping,
- (loff_t)page_index<<PAGE_CACHE_SHIFT,
- PAGE_CACHE_SIZE, 0);
+ (loff_t)index << PAGE_CACHE_SHIFT,
+ PAGE_CACHE_SIZE, 0);
}
}
BUG_ON(page_mapped(page));
@@ -504,6 +487,7 @@ int invalidate_inode_pages2_range(struct
pagevec_release(&pvec);
mem_cgroup_uncharge_end();
cond_resched();
+ index++;
}
cleancache_flush_inode(mapping);
return ret;

2011-05-31 00:54:07

by Hugh Dickins

Subject: [PATCH 13/14] mm: pincer in truncate_inode_pages_range

truncate_inode_pages_range()'s final loop has a nice pincer property,
bringing start and end together, squeezing out the last pages. But
the range handling missed out on that, just sliding up the range,
perhaps letting pages come in behind it. Add one more test to give
it the same pincer effect.
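
Spelled out, the final loop then behaves like this (a sketch of the
resulting code, not a further change):

	index = start;
	for ( ; ; ) {
		cond_resched();
		if (!pagevec_lookup(&pvec, mapping, index,
			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
			if (index == start)
				break;		/* whole range is empty */
			index = start;		/* pincer: squeeze up from the bottom */
			continue;
		}
		if (index == start && pvec.pages[0]->index > end) {
			/* a restart from start found only pages beyond
			 * end: only now is the range truly empty */
			pagevec_release(&pvec);
			break;
		}
		/* ... lock, wait on writeback, truncate pages, as above ... */
		index++;
	}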

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/truncate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- linux.orig/mm/truncate.c 2011-05-30 15:01:01.660093602 -0700
+++ linux/mm/truncate.c 2011-05-30 15:03:28.688822856 -0700
@@ -269,7 +269,7 @@ void truncate_inode_pages_range(struct a
index = start;
continue;
}
- if (pvec.pages[0]->index > end) {
+ if (index == start && pvec.pages[0]->index > end) {
pagevec_release(&pvec);
break;
}

2011-05-31 00:55:51

by Hugh Dickins

Subject: [PATCH 14/14] tmpfs: no need to use i_lock

2.6.36's 7e496299d4d2 to make tmpfs scalable with percpu_counter used
inode->i_lock in place of sbinfo->stat_lock around i_blocks updates;
but that was adverse to scalability, and unnecessary, since info->lock
is already held there in the fast paths.

Remove those uses of i_lock, and add info->lock in the three error
paths where it's then needed across shmem_free_blocks(). It's not
actually needed across shmem_unacct_blocks(), but they're so often
paired that it looks wrong to split them apart.
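
In sketch form, the discipline after this patch (both fragments as
they then stand): i_blocks is only ever updated under info->lock.

	/* fast path: info->lock already held */
	percpu_counter_inc(&sbinfo->used_blocks);
	inode->i_blocks += BLOCKS_PER_PAGE;

	/* error path: now takes info->lock around the pair */
	spin_lock(&info->lock);
	shmem_unacct_blocks(info->flags, 1);
	shmem_free_blocks(inode, 1);
	spin_unlock(&info->lock);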

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Tim Chen <[email protected]>
---
mm/shmem.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

--- linux.orig/mm/shmem.c 2011-05-30 14:25:32.665536626 -0700
+++ linux/mm/shmem.c 2011-05-30 15:03:41.680887254 -0700
@@ -241,9 +241,7 @@ static void shmem_free_blocks(struct ino
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
if (sbinfo->max_blocks) {
percpu_counter_add(&sbinfo->used_blocks, -pages);
- spin_lock(&inode->i_lock);
inode->i_blocks -= pages*BLOCKS_PER_PAGE;
- spin_unlock(&inode->i_lock);
}
}

@@ -432,9 +430,7 @@ static swp_entry_t *shmem_swp_alloc(stru
sbinfo->max_blocks - 1) >= 0)
return ERR_PTR(-ENOSPC);
percpu_counter_inc(&sbinfo->used_blocks);
- spin_lock(&inode->i_lock);
inode->i_blocks += BLOCKS_PER_PAGE;
- spin_unlock(&inode->i_lock);
}

spin_unlock(&info->lock);
@@ -1420,9 +1416,7 @@ repeat:
shmem_acct_block(info->flags))
goto nospace;
percpu_counter_inc(&sbinfo->used_blocks);
- spin_lock(&inode->i_lock);
inode->i_blocks += BLOCKS_PER_PAGE;
- spin_unlock(&inode->i_lock);
} else if (shmem_acct_block(info->flags))
goto nospace;

@@ -1433,8 +1427,10 @@ repeat:
spin_unlock(&info->lock);
filepage = shmem_alloc_page(gfp, info, idx);
if (!filepage) {
+ spin_lock(&info->lock);
shmem_unacct_blocks(info->flags, 1);
shmem_free_blocks(inode, 1);
+ spin_unlock(&info->lock);
error = -ENOMEM;
goto failed;
}
@@ -1448,8 +1444,10 @@ repeat:
current->mm, GFP_KERNEL);
if (error) {
page_cache_release(filepage);
+ spin_lock(&info->lock);
shmem_unacct_blocks(info->flags, 1);
shmem_free_blocks(inode, 1);
+ spin_unlock(&info->lock);
filepage = NULL;
goto failed;
}
@@ -1479,10 +1477,10 @@ repeat:
* be done automatically.
*/
if (ret) {
- spin_unlock(&info->lock);
- page_cache_release(filepage);
shmem_unacct_blocks(info->flags, 1);
shmem_free_blocks(inode, 1);
+ spin_unlock(&info->lock);
+ page_cache_release(filepage);
filepage = NULL;
if (error)
goto failed;

2011-05-31 15:53:38

by Dan Magenheimer

Subject: RE: [PATCH 1/14] mm: invalidate_mapping_pages flush cleancache

> From: Hugh Dickins [mailto:[email protected]]
> Sent: Monday, May 30, 2011 6:36 PM
> To: Andrew Morton
> Cc: Dan Magenheimer; [email protected]; [email protected]
> Subject: [PATCH 1/14] mm: invalidate_mapping_pages flush cleancache
>
> truncate_inode_pages_range() and invalidate_inode_pages2_range()
> call cleancache_flush_inode(mapping) before and after: shouldn't
> invalidate_mapping_pages() be doing the same?
>
> Signed-off-by: Hugh Dickins <[email protected]>
> Cc: Dan Magenheimer <[email protected]>
> ---
> mm/truncate.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> --- linux.orig/mm/truncate.c 2011-05-30 13:56:10.416798124 -0700
> +++ linux/mm/truncate.c 2011-05-30 14:08:46.612547848 -0700
> @@ -333,6 +333,7 @@ unsigned long invalidate_mapping_pages(s
> unsigned long count = 0;
> int i;
>
> + cleancache_flush_inode(mapping);
> pagevec_init(&pvec, 0);
> while (next <= end &&
> pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
> @@ -373,6 +374,7 @@ unsigned long invalidate_mapping_pages(s
> mem_cgroup_uncharge_end();
> cond_resched();
> }
> + cleancache_flush_inode(mapping);
> return count;
> }
> EXPORT_SYMBOL(invalidate_mapping_pages);

Hi Hugh --

I don't claim to be an expert on VFS, and so I have cc'ed
Chris Mason who originally placed the cleancache hooks
in VFS, but I think this patch is unnecessary. Instead
of flushing ALL of the cleancache pages belonging to
the inode with cleancache_flush_inode, the existing code
eventually calls __delete_from_page_cache on EACH page
that is being invalidated. And since __delete_from_page_cache
calls cleancache_flush_page, only that subset of pages
in the mapping that invalidate_mapping_pages() would
invalidate (which, as the comment above the routine
indicates, is only *unlocked* pages) is removed from
cleancache.
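
From memory, the merged hook looks roughly like this (abbreviated
sketch of mm/filemap.c, omitting the usual assertions):

	void __delete_from_page_cache(struct page *page)
	{
		struct address_space *mapping = page->mapping;

		/* put to cleancache if uptodate, else flush any stale copy */
		if (PageUptodate(page) && PageMappedToDisk(page))
			cleancache_put_page(page);
		else
			cleancache_flush_page(mapping, page);

		radix_tree_delete(&mapping->page_tree, page->index);
		page->mapping = NULL;
		mapping->nrpages--;
		__dec_zone_page_state(page, NR_FILE_PAGES);
	}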

However, there may be some path through VFS I am missing
or something else subtle I am misunderstanding so please
clarify if either is true.

Thanks,
Dan
---
Thanks... for the memory!
I really could use more / my throughput's on the floor
The balloon is flat / my swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to Bob Hope)

2011-05-31 16:10:32

by Tim Chen

Subject: Re: [PATCH 14/14] tmpfs: no need to use i_lock

On Mon, 2011-05-30 at 17:55 -0700, Hugh Dickins wrote:
> 2.6.36's 7e496299d4d2 to make tmpfs scalable with percpu_counter used
> inode->i_lock in place of sbinfo->stat_lock around i_blocks updates;
> but that was adverse to scalability, and unnecessary, since info->lock
> is already held there in the fast paths.
>
> Remove those uses of i_lock, and add info->lock in the three error
> paths where it's then needed across shmem_free_blocks(). It's not
> actually needed across shmem_unacct_blocks(), but they're so often
> paired that it looks wrong to split them apart.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> Cc: Tim Chen <[email protected]>
> ---

Acked.

Tim Chen


2011-05-31 17:05:54

by Hugh Dickins

Subject: RE: [PATCH 1/14] mm: invalidate_mapping_pages flush cleancache

On Tue, 31 May 2011, Dan Magenheimer wrote:
> >
> > truncate_inode_pages_range() and invalidate_inode_pages2_range()
> > call cleancache_flush_inode(mapping) before and after: shouldn't
> > invalidate_mapping_pages() be doing the same?
>
> I don't claim to be an expert on VFS, and so I have cc'ed
> Chris Mason who originally placed the cleancache hooks
> in VFS, but I think this patch is unnecessary. Instead
> of flushing ALL of the cleancache pages belonging to
> the inode with cleancache_flush_inode, the existing code
> eventually calls __delete_from_page_cache on EACH page
> that is being invalidated.

On each one that's in pagecache (and satisfies the other "can we
do it easily?" conditions peculiar to invalidate_mapping_pages()).
But there may be other slots in the range that don't reach
__delete_from_page_cache() e.g. because not currently in pagecache,
but whose cleancache ought to be flushed. I think that's what a
caller of invalidate_mapping_pages(), e.g. drop caches, expects.

> And since __delete_from_page_cache
> calls cleancache_flush_page, only that subset of pages
> in the mapping that invalidate_mapping_pages() would
> invalidate (which, from the comment above the routine
> indicates, is only *unlocked* pages) is removed from
> cleancache.

It's nice to target the particular range asked for, rather than
throwing away all the cleancache for the whole mapping, I can
see that (though that's a defect in the cleancache_flush_inode()
interface). But then why do truncate_inode_pages_range() and
invalidate_inode_pages2_range() throw it all away, despite
going down to __delete_from_page_cache on individual pages found?

Maybe the right patch is to remove cleancache_flush_inode() from
the two instead of adding it to the one? But I think not.

Hugh

2011-05-31 21:08:42

by Chris Mason

Subject: RE: [PATCH 1/14] mm: invalidate_mapping_pages flush cleancache

Excerpts from Hugh Dickins's message of 2011-05-31 13:05:27 -0400:
> On Tue, 31 May 2011, Dan Magenheimer wrote:
> > >
> > > truncate_inode_pages_range() and invalidate_inode_pages2_range()
> > > call cleancache_flush_inode(mapping) before and after: shouldn't
> > > invalidate_mapping_pages() be doing the same?
> >
> > I don't claim to be an expert on VFS, and so I have cc'ed
> > Chris Mason who originally placed the cleancache hooks
> > in VFS, but I think this patch is unnecessary. Instead
> > of flushing ALL of the cleancache pages belonging to
> > the inode with cleancache_flush_inode, the existing code
> > eventually calls __delete_from_page_cache on EACH page
> > that is being invalidated.
>
> On each one that's in pagecache (and satisfies the other "can we
> do it easily?" conditions peculiar to invalidate_mapping_pages()).
> But there may be other slots in the range that don't reach
> __delete_from_page_cache() e.g. because not currently in pagecache,
> but whose cleancache ought to be flushed. I think that's what a
> caller of invalidate_mapping_pages(), e.g. drop caches, expects.

We call invalidate_mapping_pages from prune_icache, so if we drop the
cleancache there we lose the cache entries any time the inode is dropped
from ram.

Is there a specific case you're thinking of where we want to drop the
cleancache but don't have the pages?

O_DIRECT perhaps?

-chris

2011-05-31 22:01:49

by Hugh Dickins

Subject: RE: [PATCH 1/14] mm: invalidate_mapping_pages flush cleancache

On Tue, 31 May 2011, Chris Mason wrote:
> Excerpts from Hugh Dickins's message of 2011-05-31 13:05:27 -0400:
> > On Tue, 31 May 2011, Dan Magenheimer wrote:
> > > >
> > > > truncate_inode_pages_range() and invalidate_inode_pages2_range()
> > > > call cleancache_flush_inode(mapping) before and after: shouldn't
> > > > invalidate_mapping_pages() be doing the same?
> > >
> > > I don't claim to be an expert on VFS, and so I have cc'ed
> > > Chris Mason who originally placed the cleancache hooks
> > > in VFS, but I think this patch is unnecessary. Instead
> > > of flushing ALL of the cleancache pages belonging to
> > > the inode with cleancache_flush_inode, the existing code
> > > eventually calls __delete_from_page_cache on EACH page
> > > that is being invalidated.
> >
> > On each one that's in pagecache (and satisfies the other "can we
> > do it easily?" conditions peculiar to invalidate_mapping_pages()).
> > But there may be other slots in the range that don't reach
> > __delete_from_page_cache() e.g. because not currently in pagecache,
> > but whose cleancache ought to be flushed. I think that's what a
> > caller of invalidate_mapping_pages(), e.g. drop caches, expects.
>
> We call invalidate_mapping_pages from prune_icache, so if we drop the
> cleancache there we lose the cache entries any time the inode is dropped
> from ram.

I hadn't noticed that use of invalidate_mapping_pages(). Right,
I can understand that you wouldn't want to drop the cleancache there.

I was more conscious of the dispose_list() at the end of prune_icache(),
which would call truncate_inode_pages(), which would flush cleancache.

Ah, but inode only gets on the freeable list if can_unuse(inode),
and one of the reasons to retain the inode is if nrpages is non-0.

All rather odd, and what it adds up to, I think, is that if that
invalidate_mapping_pages() succeeds in removing all the pages from
the page cache (but leaving some in cleancache), then the inode may
advance to truncate_inode_pages(), and meet cleancache_flush_inode()
there. (It happens to be called before the mapping->nrpages test.)

All rather odd, both the pruning decisions and the cleancache decisions.

>
> Is there a specific case you're thinking of where we want to drop the
> cleancache but don't have the pages?

The case I was thinking of, where I'd met invalidate_mapping_pages()
before, is fs/drop_caches.c ... which is about, er, dropping caches.

(But in general, why would you care to keep the cleancache when you
do have the pages? I thought its use was for the pages we don't have.)

>
> O_DIRECT perhaps?

Hadn't given it a thought. But now you mention it, yes,
that one looks like a worry too.

Hugh

2011-06-01 00:36:40

by Christoph Hellwig

Subject: Re: [PATCH 2/14] mm: move vmtruncate_range to truncate.c

On Mon, May 30, 2011 at 05:36:57PM -0700, Hugh Dickins wrote:
> You would expect to find vmtruncate_range() next to vmtruncate()
> in mm/truncate.c: move it there.

Sounds fine to me.

2011-06-01 00:39:44

by Christoph Hellwig

Subject: Re: [PATCH 3/14] tmpfs: take control of its truncate_range

> Note that drivers/gpu/drm/i915/i915_gem.c i915_gem_object_truncate()
> calls the tmpfs ->truncate_range directly: update that in a separate
> patch later, for now just let it duplicate the truncate_inode_pages().
> Because i915 handles unmap_mapping_range() itself at a different stage,
> we have chosen not to bundle that into ->truncate_range.

In your next series that makes it call the readpage replacement directly
it might be nice to also call directly into shmem for hole punching.

> I notice that ext4 is now joining ocfs2 and xfs in supporting fallocate
> FALLOC_FL_PUNCH_HOLE: perhaps they should support truncate_range, and
> tmpfs should support fallocate? But worry about that another time...

No, truncate_range and the madvise interface are pretty sad hacks that
should never have been added in the first place. Adding
FALLOC_FL_PUNCH_HOLE support for shmem on the other hand might make
some sense.

2011-06-01 00:42:29

by Christoph Hellwig

Subject: Re: [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp

> (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally
> the read_mapping_page functions use the mapping's ->readpage, and the
> read_cache_page functions use the supplied filler, so I think
> read_cache_page_gfp was slightly misnamed.)

What about just shmem_read_page? It's not using the pagecache, so
no need for the mapping or cache, and the _gfp really is just a hack
because the old pagecache APIs didn't allow to pass the gfp flags.
For a new API there's no need for that.

> +static inline struct page *shmem_read_mapping_page(
> + struct address_space *mapping, pgoff_t index)
> +{
> + return shmem_read_mapping_page_gfp(mapping, index,
> + mapping_gfp_mask(mapping));
> +}

This really shouldn't be in pagemap.h. For now probably in shmem_fs.h

2011-06-01 00:43:28

by Christoph Hellwig

Subject: Re: [PATCH 7/14] drm/i915: adjust to new truncate_range

> /* Our goal here is to return as much of the memory as
> * is possible back to the system as we are called from OOM.
> * To do this we must instruct the shmfs to drop all of its
> - * backing pages, *now*. Here we mirror the actions taken
> - * when by shmem_delete_inode() to release the backing store.
> + * backing pages, *now*.
> */
> inode = obj->base.filp->f_path.dentry->d_inode;
> - truncate_inode_pages(inode->i_mapping, 0);
> if (inode->i_op->truncate_range)
> inode->i_op->truncate_range(inode, 0, (loff_t)-1);
> + else
> + truncate_inode_pages(inode->i_mapping, 0);

Given that it relies on beeing on shmemfs it should just call it
directly.

2011-06-01 16:58:32

by Hugh Dickins

Subject: Re: [PATCH 3/14] tmpfs: take control of its truncate_range

Thanks a lot for looking at these.

On Tue, 31 May 2011, Christoph Hellwig wrote:
> > Note that drivers/gpu/drm/i915/i915_gem.c i915_gem_object_truncate()
> > calls the tmpfs ->truncate_range directly: update that in a separate
> > patch later, for now just let it duplicate the truncate_inode_pages().
> > Because i915 handles unmap_mapping_range() itself at a different stage,
> > we have chosen not to bundle that into ->truncate_range.
>
> In your next series that makes it call the readpae replacement directly
> it might be nice to also call directly into shmem for hole punching.

(i915 isn't really doing hole-punching there, I think it just found it
a useful interface to remove the page-and-swapcache without touching
i_size. Parentheses because it makes no difference to your point.)

Okay, I'd better do a v2 (probably not before the weekend), and change
that around to go explicitly to shmem there as well: I'd rather settle
the interfaces to other subsystems in this series, than mix it with the
implementation in the next series.

When I say "shmem", I am including the !SHMEM-was-TINY_SHMEM case too,
which goes to ramfs. Currently i915 has been configured to disable that
possibility, though we insisted on it originally: there may or may not be
good reason for disabling it - it may just be a side-effect of the
rather twisted, unintuitive SHMEM/TMPFS dependencies.

>
> > I notice that ext4 is now joining ocfs2 and xfs in supporting fallocate
> > FALLOC_FL_PUNCH_HOLE: perhaps they should support truncate_range, and
> > tmpfs should support fallocate? But worry about that another time...
>
> No, truncate_range and the madvice interface are pretty sad hacks that
> should never have been added in the first place. Adding
> FALLOC_FL_PUNCH_HOLE support for shmem on the other hand might make
> some sense.

Fine, I'll add tmpfs PUNCH_HOLE later on. And wire up madvise MADV_REMOVE
to fallocate PUNCH_HOLE, yes?
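
For illustration, the madvise side might then reduce to something
like this sketch, assuming madvise_remove() can reach the
do_fallocate() helper:

	/* mm/madvise.c sketch: punch the hole via fallocate */
	error = do_fallocate(file,
			FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			offset, end - offset);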

Would you like me to remove the ->truncate_range method from
inode_operations completely? I can do that now, hack directly to tmpfs
in the interim, in the same way as for i915.

Hugh

2011-06-01 17:02:37

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp

On Tue, 31 May 2011, Christoph Hellwig wrote:
> > (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally
> > the read_mapping_page functions use the mapping's ->readpage, and the
> > read_cache_page functions use the supplied filler, so I think
> > read_cache_page_gfp was slightly misnamed.)
>
> What about just shmem_read_page? It's not using the pagecache, so
> no need for the mapping or cache, and the _gfp really is just a hack
> because the old pagecache APIs didn't allow passing the gfp flags.
> For a new API there's no need for that.

I am trying to mirror what's already there, and it's conceivable that
we'll find others who want converting from read_mapping_page() to
shmem_read_mapping_page(), so I'd rather keep the similar naming.

The _gfp() thing I'm not very keen on, but can't avoid it for i915.
If anyone else needs something here, it's likely to be without _gfp().

>
> > +static inline struct page *shmem_read_mapping_page(
> > + struct address_space *mapping, pgoff_t index)
> > +{
> > + return shmem_read_mapping_page_gfp(mapping, index,
> > + mapping_gfp_mask(mapping));
> > +}
>
> This really shouldn't be in pagemap.h. For now probably in shmem_fs.h

Okay, I can move it there, if I do the same for those shmem_...()
interfaces currently in linux/mm.h too.

Hugh

2011-06-01 17:04:08

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH 7/14] drm/i915: adjust to new truncate_range

On Tue, 31 May 2011, Christoph Hellwig wrote:

> > /* Our goal here is to return as much of the memory as
> > * is possible back to the system as we are called from OOM.
> > * To do this we must instruct the shmfs to drop all of its
> > - * backing pages, *now*. Here we mirror the actions taken
> > - * when by shmem_delete_inode() to release the backing store.
> > + * backing pages, *now*.
> > */
> > inode = obj->base.filp->f_path.dentry->d_inode;
> > - truncate_inode_pages(inode->i_mapping, 0);
> > if (inode->i_op->truncate_range)
> > inode->i_op->truncate_range(inode, 0, (loff_t)-1);
> > + else
> > + truncate_inode_pages(inode->i_mapping, 0);
>
> Given that it relies on being on shmemfs it should just call it
> directly.

As agreed in other mail, I'll do a v2 series in a few days,
making that change - thanks.

Hugh

2011-06-03 05:16:13

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/14] tmpfs: take control of its truncate_range

On Wed, Jun 01, 2011 at 09:58:18AM -0700, Hugh Dickins wrote:
> (i915 isn't really doing hole-punching there, I think it just found it
> a useful interface to remove the page-and-swapcache without touching
> i_size. Parentheses because it makes no difference to your point.)

Keeping i_size while removing pages on tmpfs fits the definition of hole
punching for me. Not that it matters anyway.

> When I say "shmem", I am including the !SHMEM-was-TINY_SHMEM case too,
> which goes to ramfs. Currently i915 has been configured to disable that
> possibility, though we insisted on it originally: there may or may not be
> good reason for disabling it - may just be a side-effect of the rather
> twisted unintuitive SHMEM/TMPFS dependencies.

Hmm, the two different implementations make everything harder,
especially since we don't even implement hole punching in the !SHMEM
tmpfs.

> Fine, I'll add tmpfs PUNCH_HOLE later on. And wire up madvise MADV_REMOVE
> to fallocate PUNCH_HOLE, yes?

Yeah. One thing I've noticed is that the hole punching doesn't seem
to do the unmap_mapping_range. It might be worth auditing that from the
VM point of view.
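
That is, one might expect the punch path to gain a call along these
lines (a sketch only; the even_cows choice is exactly what the
replies below debate):

	/* unmap the punched byte range from all mappings first */
	unmap_mapping_range(inode->i_mapping, offset, len, 0);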

> Would you like me to remove the ->truncate_range method from
> inode_operations completely?

Doing that would be nice. Do we always have the required file struct
for ->fallocate in the callers?

2011-06-06 05:37:22

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH 3/14] tmpfs: take control of its truncate_range

On Fri, 3 Jun 2011, Christoph Hellwig wrote:
> On Wed, Jun 01, 2011 at 09:58:18AM -0700, Hugh Dickins wrote:
>
> > Fine, I'll add tmpfs PUNCH_HOLE later on. And wire up madvise MADV_REMOVE
> > to fallocate PUNCH_HOLE, yes?
>
> Yeah. One thing I've noticed is that the hole punching doesn't seem
> to do the unmap_mapping_range. It might be worth auditing that from the
> VM point of view.

I'd noticed that recently too.

At first I was alarmed, but it's actually an inefficiency rather than
a danger, because at some stage a safety unmap_mapping_range() call
was added into truncate_inode_page(). I don't know what case that was
originally for, but it will cover fallocate() for now.
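
For reference, that safety net is approximately this (mm/truncate.c
of this era, quoted from memory, so treat it as a sketch):

static int truncate_inode_page(struct address_space *mapping,
			       struct page *page)
{
	if (page_mapped(page)) {
		unmap_mapping_range(mapping,
				   (loff_t)page->index << PAGE_CACHE_SHIFT,
				   PAGE_CACHE_SIZE, 0);
	}
	return truncate_complete_page(mapping, page);
}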

This is a call to unmap_mapping_range() with 0 for the even_cows arg,
i.e. it will not remove COWed copies of the file page from private
mappings. I think that's good semantics for hole punching (and it's
difficult to enforce the alternative, because we've neither i_size nor
page lock to prevent races); but it does differ from the (odd) POSIX
truncation behaviour, to unmap even the private COWs.

What do you think? If you think we should unmap COWs, then it ought
to be corrected sooner. Otherwise I was inclined not to rush (I'm
also wondering about cleancache in truncation: that should be another
mail thread, but might call for passing down a flag useful here too).

You might notice that the alternate hole-punching's vmtruncate_range()
is passing even_cows 1: doesn't matter in practice, since that one has
been restricted to operating on shared writable mappings. Oh, I
suppose there could also be a parallel private writable mapping,
whose COWs would get unmapped. Hmm, worth worrying about if we
choose the opposite with fallocate hole punching?

>
> > Would you like me to remove the ->truncate_range method from
> > inode_operations completely?
>
> Doing that would be nice. Do we always have the required file struct
> for ->fallocate in the callers?

Good point, but yes, no problem.

I'm carrying on using ->truncate_range for the moment, partly because
I don't want to get diverted by testing the ->fallocate alternative yet,
but also because removing ->truncate_range now would force an immediate
change on drm/i915: better use shmem_truncate_range() for the transition.
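
For illustration, the transitional interface amounts to little more
than exporting (sketch):

extern void shmem_truncate_range(struct inode *inode,
				 loff_t start, loff_t end);

with both tmpfs's ->truncate_range and the i915 call going through
it, until the method can finally be dropped.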

Hugh