The last known bugs were addressed in this latest version of the swap
prefetching patch. Thanks to the testers out there who helped it get this
far.
-Prefetched pages weren't handled properly by the lru lists.
-Prefetch groups are now 10 times larger when laptop_mode is enabled thus
decreasing the amount of time spent prefetching and thus the disk spinning.
-Documentation as suggested by Ingo Oeser
Incremental patches and latest available here:
http://ck.kolivas.org/patches/swap-prefetch/
Cheers,
Con
---
On Fri, 7 Oct 2005 12:01 am, Con Kolivas wrote:
> The last known bugs were addressed in this latest version of the swap
> prefetching patch. Thanks to the testers out there who helped it get this
> far.
>
> -Prefetched pages weren't handled properly by the lru lists.
> -Prefetch groups are now 10 times larger when laptop_mode is enabled thus
> decreasing the amount of time spent prefetching and thus the disk spinning.
> -Documentation as suggested by Ingo Oeser
>
> Incremental patches and latest available here:
> http://ck.kolivas.org/patches/swap-prefetch/
And the docs...
Cheers,
Con
---
Hi Con,
A teeny-weeny nitpick:
On 10/6/05, Con Kolivas <[email protected]> wrote:
> +struct swapped_root_t {
[snip]
> +struct swapped_entry_t {
[snip]
Since these are not typedefs, please drop the _t postfix.
Pekka
On Fri, 7 Oct 2005 20:03, Pekka Enberg wrote:
> Hi Con,
>
> A teeny-weeny nitpick:
>
> On 10/6/05, Con Kolivas <[email protected]> wrote:
> > +struct swapped_root_t {
>
> [snip]
>
> > +struct swapped_entry_t {
>
> [snip]
>
> Since these are not typedefs, please drop the _t postfix.
Good point, thanks! Any and all feedback is appreciated.
Cheers,
Con
Hi,
On 10/7/05, Con Kolivas <[email protected]> wrote:
> Good point, thanks! Any and all feedback is appreciated.
Well, since you asked :-)
> +/*
> + * How many pages to prefetch at a time. We prefetch SWAP_CLUSTER_MAX *
> + * swap_prefetch per PREFETCH_INTERVAL, but prefetch ten times as much at a
> + * time in laptop_mode to minimise the time we keep the disk spinning.
> + */
> +#define PREFETCH_PAGES() (SWAP_CLUSTER_MAX * swap_prefetch * \
> + (1 + 9 * laptop_mode))
This looks strange. Please either drop the parenthesis from PREFETCH_PAGES or
make it a real static inline function.
> +/*
> + * Find the zone with the most free pages, recheck the watermarks and
> + * then directly allocate the ram. We don't want prefetch to use
> + * __alloc_pages and go calling on reclaim.
> + */
> +static struct page *prefetch_get_page(void)
> +{
Should this be put in mm/page_alloc.c? It is, after all, a special-purpose
page allocator. That way you wouldn't have to export zone_statistics and
buffered_rmqueue.
> +/*
> + * trickle_swap is the main function that initiates the swap prefetching. It
> + * first checks to see if the busy flag is set, and does not prefetch if it
> + * is, as the flag implied we are low on memory or swapping in currently.
> + * Otherwise it runs till PREFETCH_PAGES() are prefetched.
> + * This function returns 1 if it succeeds in a cycle of prefetching, 0 if it
> + * is interrupted or -1 if there is nothing left to prefetch.
> + */
> +static int trickle_swap(void)
> +{
This could perhaps use a three-state enum as return value. I find return value
checks in kprefetchd() slightly confusing.
Pekka
On 10/6/05, Con Kolivas <[email protected]> wrote:
> The last known bugs were addressed in this latest version of the swap
> prefetching patch. Thanks to the testers out there who helped it get this
> far.
>
> -Prefetched pages weren't handled properly by the lru lists.
> -Prefetch groups are now 10 times larger when laptop_mode is enabled thus
> decreasing the amount of time spent prefetching and thus the disk spinning.
> -Documentation as suggested by Ingo Oeser
>
> Incremental patches and latest available here:
> http://ck.kolivas.org/patches/swap-prefetch/
>
Ciao Con,
i downloading right now kernel 2.6.14-rc3 and your latest patch (v15),
in the weekend I'll update my Ubuntu platform and I'll like to compare
performance of vanilla vs vm-swap_prefetch.
Any hint about what kind of instrumentation I could use in order to
get interesting and useful numbers ?
Thanks!
Regards,
--
Paolo
http://technologynews.altervista.org
On Fri, 7 Oct 2005 21:31, Pekka Enberg wrote:
> Hi,
>
> On 10/7/05, Con Kolivas <[email protected]> wrote:
> > Good point, thanks! Any and all feedback is appreciated.
>
> Well, since you asked :-)
>
> > +/*
> > + * How many pages to prefetch at a time. We prefetch SWAP_CLUSTER_MAX *
> > + * swap_prefetch per PREFETCH_INTERVAL, but prefetch ten times as much
> > at a + * time in laptop_mode to minimise the time we keep the disk
> > spinning. + */
> > +#define PREFETCH_PAGES() (SWAP_CLUSTER_MAX * swap_prefetch * \
> > + (1 + 9 * laptop_mode))
>
> This looks strange. Please either drop the parenthesis from PREFETCH_PAGES
> or make it a real static inline function.
I have seen this sort of macro style before in the kernel where () just makes
it clear that it is a function but a real static inline is ok with me.
> > +/*
> > + * Find the zone with the most free pages, recheck the watermarks and
> > + * then directly allocate the ram. We don't want prefetch to use
> > + * __alloc_pages and go calling on reclaim.
> > + */
> > +static struct page *prefetch_get_page(void)
> > +{
>
> Should this be put in mm/page_alloc.c? It is, after all, a special-purpose
> page allocator. That way you wouldn't have to export zone_statistics and
> buffered_rmqueue.
Makes sense but it is only used in the CONFIG_SWAP_PREFETCH case so it would
end up as a static inline in swap.h to avoid ending being #ifdefed in
page_alloc.c. Do you think that's preferable to having it in
swap_prefetch.c ?
>
> > +/*
> > + * trickle_swap is the main function that initiates the swap
> > prefetching. It + * first checks to see if the busy flag is set, and does
> > not prefetch if it + * is, as the flag implied we are low on memory or
> > swapping in currently. + * Otherwise it runs till PREFETCH_PAGES() are
> > prefetched.
> > + * This function returns 1 if it succeeds in a cycle of prefetching, 0
> > if it + * is interrupted or -1 if there is nothing left to prefetch.
> > + */
> > +static int trickle_swap(void)
> > +{
>
> This could perhaps use a three-state enum as return value. I find return
> value checks in kprefetchd() slightly confusing.
Good idea.
Thanks!
Con
On Fri, 7 Oct 2005 21:46, Paolo Ciarrocchi wrote:
> On 10/6/05, Con Kolivas <[email protected]> wrote:
> > The last known bugs were addressed in this latest version of the swap
> > prefetching patch. Thanks to the testers out there who helped it get this
> > far.
> >
> > -Prefetched pages weren't handled properly by the lru lists.
> > -Prefetch groups are now 10 times larger when laptop_mode is enabled thus
> > decreasing the amount of time spent prefetching and thus the disk
> > spinning. -Documentation as suggested by Ingo Oeser
> >
> > Incremental patches and latest available here:
> > http://ck.kolivas.org/patches/swap-prefetch/
>
> Ciao Con,
> i downloading right now kernel 2.6.14-rc3 and your latest patch (v15),
> in the weekend I'll update my Ubuntu platform and I'll like to compare
> performance of vanilla vs vm-swap_prefetch.
Great!
> Any hint about what kind of instrumentation I could use in order to
> get interesting and useful numbers ?
To get some useful advantage from it you have to have a workload that swaps on
your hardware in the first place. I have a simple mechanism to induce it
reliably where I open a few large applications concurrently - browser and
office suite come to mind, then create my swap load:
tail -f /dev/zero
works real nice. Either let it run to completion and hope that tail gets
oom-killed or ctrl-c it after you've used up half your swapspace.
Then I usually let vmstat 1 run in the background. Leave the machine for 5 or
10 minutes and do simple wallclock time from the moment you click on the
application till it is available. Try changing swap_prefetch from 2 to 0
in /proc/sys/vm/swap_prefetch to compare the difference. This is a very
coarse and somewhat contrived example, but even in real world settings where
you hit swapspace during normal usage it is slowly trickling in in the
background and I find it makes a noticeable difference.
If you watch vmstat with swap_prefetch enabled, you'll notice it doing SI of
256KB (at default setting of swap_prefetch==2) every second when the machine
is very idle. Then after all the pages have been prefetched, when you click
on the application you shouldn't see anything in SI at all implying no swap
in. You can see how many entries are currently in the prefetch swaplist
easily enough with
cat /proc/slabinfo | grep swapped_entry
where the first numeric column of active objects shows the number of pages in
the list (not all of them will result in a swapped in page).
Cheers,
Con
On Fri, 7 Oct 2005, Con Kolivas wrote:
> Makes sense but it is only used in the CONFIG_SWAP_PREFETCH case so it would
> end up as a static inline in swap.h to avoid ending being #ifdefed in
> page_alloc.c. Do you think that's preferable to having it in
> swap_prefetch.c ?
But then you would still have to open up buffered_rmqueue() and
zone_statistics() to everyone, no? How about you implement a new gfp flag
__GFP_NEVER_RECLAIM similar to __GFP_NORECLAIM instead so you don't have
to duplicate __page_alloc()?
Pekka
On Fri, 7 Oct 2005 22:26, Pekka J Enberg wrote:
> On Fri, 7 Oct 2005, Con Kolivas wrote:
> > Makes sense but it is only used in the CONFIG_SWAP_PREFETCH case so it
> > would end up as a static inline in swap.h to avoid ending being #ifdefed
> > in page_alloc.c. Do you think that's preferable to having it in
> > swap_prefetch.c ?
>
> But then you would still have to open up buffered_rmqueue() and
> zone_statistics() to everyone, no?
bah of course.. /me slaps forehead
> How about you implement a new gfp flag
> __GFP_NEVER_RECLAIM similar to __GFP_NORECLAIM instead so you don't have
> to duplicate __page_alloc()?
That will end up being far more intrusive than this version and __alloc_pages
would need more tests that affect every call to __alloc_pages which seems
much more expensive to me than exporting buffered_rmqueue and
zone_statistics, and the modified __alloc_pages will still be a much more
complicated function than prefetch_get_page.
Thanks,
Con
On Fri, 7 Oct 2005, Con Kolivas wrote:
> That will end up being far more intrusive than this version and __alloc_pages
> would need more tests that affect every call to __alloc_pages which seems
> much more expensive to me than exporting buffered_rmqueue and
> zone_statistics, and the modified __alloc_pages will still be a much more
> complicated function than prefetch_get_page.
Short-term, perhaps. However, what you are doing is inventing your own
page allocator which, I suspect, is more expensive in the long term.
Up to you of course and I am probably the wrong person to talk to about
this. Never the less, here's a totally untested patch to do it.
Pekka
Index: 2.6/include/linux/gfp.h
===================================================================
--- 2.6.orig/include/linux/gfp.h
+++ 2.6/include/linux/gfp.h
@@ -41,6 +41,7 @@ struct vm_area_struct;
#define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
#define __GFP_NORECLAIM 0x20000u /* No realy zone reclaim during allocation */
#define __GFP_HARDWALL 0x40000u /* Enforce hardwall cpuset memory allocs */
+#define __GFP_NEVER_RECLAIM 0x80000u /* Never attempt to reclaim */
#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
Index: 2.6/mm/page_alloc.c
===================================================================
--- 2.6.orig/mm/page_alloc.c
+++ 2.6/mm/page_alloc.c
@@ -778,6 +778,7 @@ __alloc_pages(unsigned int __nocast gfp_
struct zonelist *zonelist)
{
const int wait = gfp_mask & __GFP_WAIT;
+ const int can_reclaim = !(gfp_mask & __GFP_NEVER_RECLAIM);
struct zone **zones, *z;
struct page *page;
struct reclaim_state reclaim_state;
@@ -812,7 +813,7 @@ restart:
* See also cpuset_zone_allowed() comment in kernel/cpuset.c.
*/
for (i = 0; (z = zones[i]) != NULL; i++) {
- int do_reclaim = should_reclaim_zone(z, gfp_mask);
+ int do_reclaim = can_reclaim && should_reclaim_zone(z, gfp_mask);
if (!cpuset_zone_allowed(z, __GFP_HARDWALL))
continue;
@@ -840,6 +841,9 @@ zone_reclaim_retry:
goto got_pg;
}
+ if (unlikely(!can_reclaim))
+ goto out;
+
for (i = 0; (z = zones[i]) != NULL; i++)
wakeup_kswapd(z, order);
@@ -966,6 +970,7 @@ nopage:
dump_stack();
show_mem();
}
+out:
return NULL;
got_pg:
zone_statistics(zonelist, z);
On 10/7/05, Pekka Enberg <[email protected]> wrote:
> Hi,
>
> On 10/7/05, Con Kolivas <[email protected]> wrote:
> > Good point, thanks! Any and all feedback is appreciated.
>
> Well, since you asked :-)
>
> > +/*
> > + * How many pages to prefetch at a time. We prefetch SWAP_CLUSTER_MAX *
> > + * swap_prefetch per PREFETCH_INTERVAL, but prefetch ten times as much at a
> > + * time in laptop_mode to minimise the time we keep the disk spinning.
> > + */
> > +#define PREFETCH_PAGES() (SWAP_CLUSTER_MAX * swap_prefetch * \
> > + (1 + 9 * laptop_mode))
>
> This looks strange. Please either drop the parenthesis from PREFETCH_PAGES or
> make it a real static inline function.
Or make it a "const static" variable, so compiler will check types and
everything, but the symbol will not be present in the binary, causing
no overhead. So it could be:
const unsigned PREFETCH_PAGES = (SWAP_CLUSTER_MAX * swap_prefetch * \
(1 + 9 * laptop_mode));
--
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: [email protected]
ICQ#: 17249123
MSN: [email protected]
Skype: gsbarbieri
GPG: 0xB640E1A2 @ wwwkeys.pgp.net
> Or make it a "const static" variable, so compiler will check types and
> everything, but the symbol will not be present in the binary, causing
> no overhead. So it could be:
>
> const unsigned PREFETCH_PAGES = (SWAP_CLUSTER_MAX * swap_prefetch * \
> (1 + 9 * laptop_mode));
This won't work, AFAICT. swap_prefetch and laptop_mode are variables,
but with the code above, they would be evaluated only once. And maybe
the compiler will reject that code immediately...
Rudo.
On 10/7/05, Rudo Thomas <[email protected]> wrote:
> > Or make it a "const static" variable, so compiler will check types and
> > everything, but the symbol will not be present in the binary, causing
> > no overhead. So it could be:
> >
> > const unsigned PREFETCH_PAGES = (SWAP_CLUSTER_MAX * swap_prefetch * \
> > (1 + 9 * laptop_mode));
>
> This won't work, AFAICT. swap_prefetch and laptop_mode are variables,
> but with the code above, they would be evaluated only once. And maybe
> the compiler will reject that code immediately...
Ah... yes, you're right! These may change in runtime.
--
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: [email protected]
ICQ#: 17249123
MSN: [email protected]
Skype: gsbarbieri
GPG: 0xB640E1A2 @ wwwkeys.pgp.net