Commit bde05d1ccd51 ("shmem: replace page if mapping excludes its zone")
is not at all likely to break for anyone, but it was an earlier version
from before review feedback was incorporated. Fix that up now.
* shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
* Expand comment on why shmem_unuse_inode needs page_swapcount [akpm]
* Remove excess of VM_BUG_ONs from shmem_replace_page [wangcong]
* Check page_private matches swap before calling shmem_replace_page [hughd]
* shmem_replace_page allow for unexpected race in radix_tree lookup [hughd]
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Stephane Marchesin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Rob Clark <[email protected]>
---
Andrew, many thanks for sending those patches in for v3.5. You'll find
a [PATCH v2 1/10] languishing in your mailbox from 18 May. Since that
didn't make it to Linus, please delete it, and send this incremental
instead. I don't see much point in renaming shmem_replace_page now,
but if you still feel that shmem_substitute_page is preferable, we
can send another patch to make that change. Thanks - Hugh
mm/shmem.c | 57 +++++++++++++++++++++++++++++++++------------------
1 file changed, 37 insertions(+), 20 deletions(-)
--- 3.4.0+/mm/shmem.c 2012-05-30 08:17:19.404008281 -0700
+++ linux/mm/shmem.c 2012-05-31 12:26:28.920160948 -0700
@@ -683,10 +683,21 @@ static int shmem_unuse_inode(struct shme
mutex_lock(&shmem_swaplist_mutex);
/*
* We needed to drop mutex to make that restrictive page
- * allocation; but the inode might already be freed by now,
- * and we cannot refer to inode or mapping or info to check.
- * However, we do hold page lock on the PageSwapCache page,
- * so can check if that still has our reference remaining.
+ * allocation, but the inode might have been freed while we
+ * dropped it: although a racing shmem_evict_inode() cannot
+ * complete without emptying the radix_tree, our page lock
+ * on this swapcache page is not enough to prevent that -
+ * free_swap_and_cache() of our swap entry will only
+ * trylock_page(), removing swap from radix_tree whatever.
+ *
+ * We must not proceed to shmem_add_to_page_cache() if the
+ * inode has been freed, but of course we cannot rely on
+ * inode or mapping or info to check that. However, we can
+ * safely check if our swap entry is still in use (and here
+ * it can't have got reused for another page): if it's still
+ * in use, then the inode cannot have been freed yet, and we
+ * can safely proceed (if it's no longer in use, that tells
+ * nothing about the inode, but we don't need to unuse swap).
*/
if (!page_swapcount(*pagep))
error = -ENOENT;
@@ -730,9 +741,9 @@ int shmem_unuse(swp_entry_t swap, struct
/*
* There's a faint possibility that swap page was replaced before
- * caller locked it: it will come back later with the right page.
+ * caller locked it: caller will come back later with the right page.
*/
- if (unlikely(!PageSwapCache(page)))
+ if (unlikely(!PageSwapCache(page) || page_private(page) != swap.val))
goto out;
/*
@@ -995,21 +1006,15 @@ static int shmem_replace_page(struct pag
newpage = shmem_alloc_page(gfp, info, index);
if (!newpage)
return -ENOMEM;
- VM_BUG_ON(shmem_should_replace_page(newpage, gfp));
- *pagep = newpage;
page_cache_get(newpage);
copy_highpage(newpage, oldpage);
+ flush_dcache_page(newpage);
- VM_BUG_ON(!PageLocked(oldpage));
__set_page_locked(newpage);
- VM_BUG_ON(!PageUptodate(oldpage));
SetPageUptodate(newpage);
- VM_BUG_ON(!PageSwapBacked(oldpage));
SetPageSwapBacked(newpage);
- VM_BUG_ON(!swap_index);
set_page_private(newpage, swap_index);
- VM_BUG_ON(!PageSwapCache(oldpage));
SetPageSwapCache(newpage);
/*
@@ -1019,13 +1024,24 @@ static int shmem_replace_page(struct pag
spin_lock_irq(&swap_mapping->tree_lock);
error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
newpage);
- __inc_zone_page_state(newpage, NR_FILE_PAGES);
- __dec_zone_page_state(oldpage, NR_FILE_PAGES);
+ if (!error) {
+ __inc_zone_page_state(newpage, NR_FILE_PAGES);
+ __dec_zone_page_state(oldpage, NR_FILE_PAGES);
+ }
spin_unlock_irq(&swap_mapping->tree_lock);
- BUG_ON(error);
- mem_cgroup_replace_page_cache(oldpage, newpage);
- lru_cache_add_anon(newpage);
+ if (unlikely(error)) {
+ /*
+ * Is this possible? I think not, now that our callers check
+ * both PageSwapCache and page_private after getting page lock;
+ * but be defensive. Reverse old to newpage for clear and free.
+ */
+ oldpage = newpage;
+ } else {
+ mem_cgroup_replace_page_cache(oldpage, newpage);
+ lru_cache_add_anon(newpage);
+ *pagep = newpage;
+ }
ClearPageSwapCache(oldpage);
set_page_private(oldpage, 0);
@@ -1033,7 +1049,7 @@ static int shmem_replace_page(struct pag
unlock_page(oldpage);
page_cache_release(oldpage);
page_cache_release(oldpage);
- return 0;
+ return error;
}
/*
@@ -1107,7 +1123,8 @@ repeat:
/* We have to do this with page locked to prevent races */
lock_page(page);
- if (!PageSwapCache(page) || page->mapping) {
+ if (!PageSwapCache(page) || page_private(page) != swap.val ||
+ page->mapping) {
error = -EEXIST; /* try again */
goto failed;
}
Hi Hugh,
On Thu, May 31, 2012 at 03:31:27PM -0700, Hugh Dickins wrote:
> * shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
>
> - *pagep = newpage;
> page_cache_get(newpage);
> copy_highpage(newpage, oldpage);
> + flush_dcache_page(newpage);
>
Couldn't we use the lighter flush_kernel_dcache_page() here (like in
fs/exec.c copy_strings())? If I got this correctly, the page is
copied via the kernel mapping and thus, only the kernel mapping needs
to be flushed.
- Simon
On Fri, 8 Jun 2012, Simon Baatz wrote:
> On Thu, May 31, 2012 at 03:31:27PM -0700, Hugh Dickins wrote:
> > * shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
>
> >
> > - *pagep = newpage;
> > page_cache_get(newpage);
> > copy_highpage(newpage, oldpage);
> > + flush_dcache_page(newpage);
> >
>
> Couldn't we use the lighter flush_kernel_dcache_page() here (like in
> fs/exec.c copy_strings())? If I got this correctly, the page is
> copied via the kernel mapping and thus, only the kernel mapping needs
> to be flushed.
Sorry for being so slow to respond, I had to focus on something else.
That's an interesting question you raise: I think you are almost right.
You are correct that it's copied via kernel mapping; and this page
cannot yet be visible to userspace.
But I have to say that you're "almost" right, because when I look up
flush_kernel_dcache_page(), I notice that it's supposed to be called
while the page is still kmapped (if kmap() or kmap_atomic() were
necessary). So I would have to pull apart copy_highpage() and
do it inside there.
There are four uses of flush_dcache_page() in mm/shmem.c. One of
those (in do_shmem_file_read()) should certainly not be converted
to flush_kernel_dcache_page(), but the rest could be.
However, I'm reluctant to do so myself, since I don't test on any
architecture which has a non-default flush_kernel_dcache_page(),
and I'm not at all familiar with it either.
fs/exec.c is rather an exception to be using it. I believe it was
introduced to solve a coherency issue at the block or driver level,
and generally nothing in fs or mm has been using it.
You may well be right that savings (on arm, mips, parisc) could be
made in various places by using it in place of flush_dcache_page().
But I'd rather leave that exercise to someone who understands better
what they're doing, and can see the results if they get it wrong.
Hugh