Hi,
At the moment, the kernel frees the following things at
exit() / exec() time:
1) memory in the process page tables + swapcache belonging
to these pages
2) swap space used by the process (refcount decrement)
However, it does NOT free:
3) swap cache and space belonging to the process, where the
page is in RAM (and the swap cache) but NOT in the process'
page tables
The attached patch fixes the problem by simply looking up the
address in the swap cache and freeing the page if it's there.
Comments ?
regards,
Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)
http://www.surriel.com/ http://distro.conectiva.com/
--- linux-2.4.12-ac3/mm/memory.c.freemore Thu Oct 18 01:20:48 2001
+++ linux-2.4.12-ac3/mm/memory.c Thu Oct 18 01:37:32 2001
@@ -302,7 +302,7 @@
/* This will eventually call __free_pte on the pte. */
tlb_remove_page(tlb, ptep, address + offset);
} else {
- swap_free(pte_to_swp_entry(pte));
+ free_swap_and_swap_cache(pte_to_swp_entry(pte));
pte_clear(ptep);
}
}
--- linux-2.4.12-ac3/mm/swap_state.c.freemore Thu Oct 18 01:20:59 2001
+++ linux-2.4.12-ac3/mm/swap_state.c Thu Oct 18 02:32:47 2001
@@ -147,6 +147,29 @@
}
/*
+ * Like the above, but used to clean up the non-resident pages of
+ * a process. If the page exists but we couldn't get the trylock,
+ * the pageout code will remove the page later.
+ */
+void free_swap_and_swap_cache(swp_entry_t entry)
+{
+ struct page * page;
+
+ /* Free our own reference to the swap space */
+ swap_free(entry);
+
+ /* If the swapcache is the only remaining user, free that too. */
+ page = find_trylock_page(&swapper_space, entry.val);
+ if (page) {
+ if (exclusive_swap_page(page)) {
+ delete_from_swap_cache(page);
+ }
+ UnlockPage(page);
+ page_cache_release(page);
+ }
+}
+
+/*
* Lookup a swap entry in the swap cache. A found page will be returned
* unlocked and with its refcount incremented - we rely on the kernel
* lock getting page table operations atomic even if we drop the page
--- linux-2.4.12-ac3/mm/filemap.c.freemore Thu Oct 18 01:21:04 2001
+++ linux-2.4.12-ac3/mm/filemap.c Thu Oct 18 02:03:03 2001
@@ -740,12 +740,8 @@
* the hash-list needs a held write-lock.
*/
repeat:
- spin_lock(&pagecache_lock);
- page = __find_page_nolock(mapping, offset, *hash);
+ page = __find_get_page(mapping, offset, hash);
if (page) {
- page_cache_get(page);
- spin_unlock(&pagecache_lock);
-
lock_page(page);
/* Is the page still hashed? Ok, good.. */
@@ -757,7 +753,36 @@
page_cache_release(page);
goto repeat;
}
- spin_unlock(&pagecache_lock);
+ return NULL;
+}
+
+/*
+ * Find a page in the page cache and return it to us locked and
+ * with the page count incremented, but only if nobody else has
+ * it locked already. Used by the VM to opportunistically grab
+ * a page in places where we don't want to sleep.
+ */
+struct page * find_trylock_page (struct address_space *mapping,
+ unsigned long offset)
+{
+ struct page *page;
+
+ page = __find_get_page(mapping, offset, page_hash(mapping, offset));
+ if (page) {
+ /* If we cannot get the page, drop it and return NULL. */
+ if (TryLockPage(page)) {
+ page_cache_release(page);
+ return NULL;
+ }
+
+ /* The page didn't get removed/remapped behind our backs ? */
+ if (page->mapping == mapping && page->index == offset)
+ return page;
+
+ /* Hrrrm, it did; release the page and return NULL. */
+ UnlockPage(page);
+ page_cache_release(page);
+ }
return NULL;
}
--- linux-2.4.12-ac3/include/linux/swap.h.freemore Thu Oct 18 01:31:15 2001
+++ linux-2.4.12-ac3/include/linux/swap.h Thu Oct 18 01:34:57 2001
@@ -141,6 +141,7 @@
extern void __delete_from_swap_cache(struct page *page);
extern void delete_from_swap_cache(struct page *page);
extern void free_page_and_swap_cache(struct page *page);
+extern void free_swap_and_swap_cache(swp_entry_t);
extern struct page * lookup_swap_cache(swp_entry_t);
extern struct page * read_swap_cache_async(swp_entry_t);
--- linux-2.4.12-ac3/include/linux/pagemap.h.freemore Thu Oct 18 01:31:35 2001
+++ linux-2.4.12-ac3/include/linux/pagemap.h Thu Oct 18 01:45:35 2001
@@ -77,6 +77,7 @@
__find_get_page(mapping, index, page_hash(mapping, index))
extern struct page * __find_lock_page (struct address_space * mapping,
unsigned long index, struct page **hash);
+extern struct page * find_trylock_page (struct address_space *, unsigned long);
extern void lock_page(struct page *page);
#define find_lock_page(mapping, index) \
__find_lock_page(mapping, index, page_hash(mapping, index))
On Thu, 18 Oct 2001, Rik van Riel wrote:
>
> At the moment, the kernel frees the following things at
> exit() / exec() time:
> 1) memory in the process page tables + swapcache belonging
> to these pages
> 2) swap space used by the process (refcount decrement)
Yes, it has the information readily to hand to do so.
We did go through a phase of not freeing swap+cache there,
but Linus got Marcelo to reinstate it - because swapoff
wasted so much time on them then, or other reasons?
> However, it does NOT free:
> 3) swap cache and space belonging to the process, where the
> page is in RAM (and the swap cache) but NOT in the process'
> page tables
Yes, which is asymmetric, but it doesn't currently do so for fear
of the repeated hash lookup overhead on a large address space?
and Linus' preference for letting them be dealt with by regular
memory recycling mechanisms in the course of time. Similar
patches have been rejected or ignored in the past.
> The attached patch fixes the problem by simply looking up the
> address in the swap cache and freeing the page if it's there.
The pages (and their swap) will be found and freed later by vmscan
when memory pressure demands. And vm_enough_memory (over-)allows
for the interim. Does out_of_memory allow for it? I _think_ the
answer is currently "yes" in -linus, but "no" in -ac; but both
might benefit from a proper count of such pages.
Am I right in thinking "the problem" you allude to is that these
stale swap cache pages clog up the lists, and useful cache pages
get reclaimed for other use meanwhile, before the stale swap cache
pages get discovered further down the list? And that this cannot
be solved by keeping counts of such pages: which would be lighter
than hash lookups, and good for knowing we have a glut of swap
stales, but no use for actually finding what's best to free?
Your patch looks good to me, if measurements confirm it's a win.
I wanted to find some obscure race there, but failed. What I would
like is a comment in find_swap_and_swap_cache, between the swap_free
and the find_trylock_page, making it explicit that the page found by
find_trylock_page may not be "ours" at all: once that swap_free has
been done, something else (usually vmscanning, maybe swapoff, maybe
another) could come in and do the delete_from_swap_cache on what was
"our" page and swap, and either or both be reallocated to other use.
If the new use of the swap entry passes the exclusive_swap_cache test
(unlikely), then it's still okay for us to delete_from_swap_cache.
And I'd prefer you not to add find_trylock_page at all, nor rewrite
__find_lock_page. The latter rewrite just makes the point that we
already have far too many variants on __find_get_grab_lock_page in
filemap.c (and undoes some inlining): adding yet another variant
for just a single use doesn't make anything clearer for me. Its
page->mapping check is unnecessary: exclusive_swap_page will check
PageSwapCache. And its page->index check is unnecessary: if you
end up catching some other exclusive swap page here, it too is
fair game for delete_from_swap_cache. Please just find_get_page
and TryLockPage within free_swap_and_swap_cache?
Hugh
On Fri, 19 Oct 2001, Hugh Dickins wrote:
> Please just find_get_page and TryLockPage within
> free_swap_and_swap_cache?
I really don't like the idea of sprinkling the magic all
around the VM subsystem, but prefer to keep the code
easier to maintain instead.
About the "undoes some inlining", I guess we might as
well mark __find_get_page() inline then so it gets
included into __find_lock_page(), after all it's the
equivalent code so it should end up the same as before.
regards,
Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)
http://www.surriel.com/ http://distro.conectiva.com/