2006-02-08 09:58:21

by Samuel Thibault

[permalink] [raw]
Subject: Direct Migration and "Affinity on next touch" ?

Hi,

Direct Migration support is quite great, but some "migration on next
touch" (aka affinity on next touch) would be quite useful too.

A bunch of parallel applications have an sequential part that
initializes all data. Then threads are launched for achieving the actual
computation in parallel. However, with a simple affinity on first touch
policy, all data is allocated on the node which initialization ran
on. Manually migrating data where threads will eventually run on may
really not be easy.

A "simple" (from the userland point of view) solution is to have
an "affinity on next touch" policy that would be set _after_
initialization, for instance the application would:
- initialize data, which gets allocated on some node ;
- call mbind(data, size, MPOL_DEFAULT, NULL, 0, MPOL_MF_NEXTTOUCH), that
records the new policy and invalidates pages ;
- run threads ;
- threads start computing, hence they touch data pages; the page fault
handler migrates these pages to the node on which the fault occured,
i.e. hopefully the node on which it will be mostly used (this is
generally true with such applications) ;
- after very little time, data pages are distributed as appropriate, and
then the computation runs fast.

The cost of page fault + migration is quickly compensated by the
resulting better data distribution. Being able to ask for bigger page
sizes would also reduce page fault cot.

Solaris implements this solution through madvise(data, size,
MADV_ACCESS_LWP); (see Solaris' madvise() manpage
http://docs.sun.com/app/docs/doc/817-0677/6mgf9b66i?a=view ). Using this
facility can bring quite interesting performance improvements:
(for instance "affinity-on-next-touch: increasing the
performance of an industrial PDE solver on a cc-NUMA system":
http://portal.acm.org/ft_gateway.cfm%3Fid=1088201%26type=pdf )

Could such facility be implemented?

Regards,
Samuel