From: Zi Yan <[email protected]>
migrate_pages() requires at least down_read(mmap_sem) to protect
related page tables and VMAs from changing. Let's do it in
do_page_moves() for both do_move_pages_to_node() and
add_page_for_migration().
Also add this lock requirement in the comment of migrate_pages().
Signed-off-by: Zi Yan <[email protected]>
---
mm/migrate.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 5d0dc7b85f90..52d029953c32 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1354,6 +1354,9 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
* or free list only if ret != 0.
*
* Returns the number of pages that were not migrated, or an error code.
+ *
+ * The caller must hold at least down_read(mmap_sem) for to-be-migrated pages
+ * to protect related page tables and VMAs from changing.
*/
int migrate_pages(struct list_head *from, new_page_t get_new_page,
free_page_t put_new_page, unsigned long private,
@@ -1457,6 +1460,12 @@ static int store_status(int __user *status, int start, int value, int nr)
return 0;
}
+/*
+ * Migrates the pages from pagelist and put back those not migrated.
+ *
+ * The caller must at least hold down_read(mmap_sem), which is required
+ * for migrate_pages()
+ */
static int do_move_pages_to_node(struct mm_struct *mm,
struct list_head *pagelist, int node)
{
@@ -1487,7 +1496,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
unsigned int follflags;
int err;
- down_read(&mm->mmap_sem);
err = -EFAULT;
vma = find_vma(mm, addr);
if (!vma || addr < vma->vm_start || !vma_migratable(vma))
@@ -1540,7 +1548,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
*/
put_page(page);
out:
- up_read(&mm->mmap_sem);
return err;
}
@@ -1561,6 +1568,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
migrate_prep();
+ down_read(&mm->mmap_sem);
for (i = start = 0; i < nr_pages; i++) {
const void __user *p;
unsigned long addr;
@@ -1628,6 +1636,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (!err)
err = err1;
out:
+ up_read(&mm->mmap_sem);
return err;
}
--
2.15.1
On Mon, 29 Jan 2018, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> migrate_pages() requires at least down_read(mmap_sem) to protect
> related page tables and VMAs from changing. Let's do it in
Page tables are protected by their locks. VMAs may change while
migration is active on them, but does that need locking against?
> do_page_moves() for both do_move_pages_to_node() and
> add_page_for_migration().
>
> Also add this lock requirement in the comment of migrate_pages().
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/migrate.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5d0dc7b85f90..52d029953c32 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1354,6 +1354,9 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> * or free list only if ret != 0.
> *
> * Returns the number of pages that were not migrated, or an error code.
> + *
> + * The caller must hold at least down_read(mmap_sem) for to-be-migrated pages
> + * to protect related page tables and VMAs from changing.
I have not been keeping up with Michal's recent migration changes,
but migrate_pages() never used to need mmap_sem held (despite being
called with an mmap_sem held from some of its callsites), and it
would be a backward step to require that now.
There is not even an mm argument to migrate_pages(), so which
mm->mmap_sem do you think would be required for it? There may be
particular cases in which it is required (when the new_page function
involves the old_page's vma - is that so below?), but in general not.
Hugh
> */
> int migrate_pages(struct list_head *from, new_page_t get_new_page,
> free_page_t put_new_page, unsigned long private,
> @@ -1457,6 +1460,12 @@ static int store_status(int __user *status, int start, int value, int nr)
> return 0;
> }
>
> +/*
> + * Migrates the pages from pagelist and put back those not migrated.
> + *
> + * The caller must at least hold down_read(mmap_sem), which is required
> + * for migrate_pages()
> + */
> static int do_move_pages_to_node(struct mm_struct *mm,
> struct list_head *pagelist, int node)
> {
> @@ -1487,7 +1496,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> unsigned int follflags;
> int err;
>
> - down_read(&mm->mmap_sem);
> err = -EFAULT;
> vma = find_vma(mm, addr);
> if (!vma || addr < vma->vm_start || !vma_migratable(vma))
> @@ -1540,7 +1548,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> */
> put_page(page);
> out:
> - up_read(&mm->mmap_sem);
> return err;
> }
>
> @@ -1561,6 +1568,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>
> migrate_prep();
>
> + down_read(&mm->mmap_sem);
> for (i = start = 0; i < nr_pages; i++) {
> const void __user *p;
> unsigned long addr;
> @@ -1628,6 +1636,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> if (!err)
> err = err1;
> out:
> + up_read(&mm->mmap_sem);
> return err;
> }
>
> --
> 2.15.1
On Mon 29-01-18 22:00:11, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> migrate_pages() requires at least down_read(mmap_sem) to protect
> related page tables and VMAs from changing. Let's do it in
> do_page_moves() for both do_move_pages_to_node() and
> add_page_for_migration().
>
> Also add this lock requirement in the comment of migrate_pages().
This doesn't make much sense to me, to be honest. We are holding
mmap_sem for _read_ so we allow parallel updates like page faults
or unmaps. Therefore we are isolating pages prior to the migration.
The sole purpose of the mmap_sem in add_page_for_migration is to protect
from vma going away _while_ need it to get the proper page.
Moving the lock up is just wrong because it allows caller to hold the
lock for way too long if a lot of pages is migrated. Not only that,
it is even incorrect because we are doing get_user() (aka page fault)
and while read lock recursion is OK, we might block and deadlock when
there is a writer pending. I haven't checked the current implementation
of semaphores but I believe we do not allow recursive locking.
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/migrate.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5d0dc7b85f90..52d029953c32 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1354,6 +1354,9 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> * or free list only if ret != 0.
> *
> * Returns the number of pages that were not migrated, or an error code.
> + *
> + * The caller must hold at least down_read(mmap_sem) for to-be-migrated pages
> + * to protect related page tables and VMAs from changing.
> */
> int migrate_pages(struct list_head *from, new_page_t get_new_page,
> free_page_t put_new_page, unsigned long private,
> @@ -1457,6 +1460,12 @@ static int store_status(int __user *status, int start, int value, int nr)
> return 0;
> }
>
> +/*
> + * Migrates the pages from pagelist and put back those not migrated.
> + *
> + * The caller must at least hold down_read(mmap_sem), which is required
> + * for migrate_pages()
> + */
> static int do_move_pages_to_node(struct mm_struct *mm,
> struct list_head *pagelist, int node)
> {
> @@ -1487,7 +1496,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> unsigned int follflags;
> int err;
>
> - down_read(&mm->mmap_sem);
> err = -EFAULT;
> vma = find_vma(mm, addr);
> if (!vma || addr < vma->vm_start || !vma_migratable(vma))
> @@ -1540,7 +1548,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> */
> put_page(page);
> out:
> - up_read(&mm->mmap_sem);
> return err;
> }
>
> @@ -1561,6 +1568,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>
> migrate_prep();
>
> + down_read(&mm->mmap_sem);
> for (i = start = 0; i < nr_pages; i++) {
> const void __user *p;
> unsigned long addr;
> @@ -1628,6 +1636,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> if (!err)
> err = err1;
> out:
> + up_read(&mm->mmap_sem);
> return err;
> }
>
> --
> 2.15.1
--
Michal Hocko
SUSE Labs
Hugh Dickins wrote:
> On Mon, 29 Jan 2018, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> migrate_pages() requires at least down_read(mmap_sem) to protect
>> related page tables and VMAs from changing. Let's do it in
>
> Page tables are protected by their locks. VMAs may change while
> migration is active on them, but does that need locking against?
>
>> do_page_moves() for both do_move_pages_to_node() and
>> add_page_for_migration().
>>
>> Also add this lock requirement in the comment of migrate_pages().
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/migrate.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 5d0dc7b85f90..52d029953c32 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1354,6 +1354,9 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>> * or free list only if ret != 0.
>> *
>> * Returns the number of pages that were not migrated, or an error code.
>> + *
>> + * The caller must hold at least down_read(mmap_sem) for to-be-migrated pages
>> + * to protect related page tables and VMAs from changing.
>
> I have not been keeping up with Michal's recent migration changes,
> but migrate_pages() never used to need mmap_sem held (despite being
> called with an mmap_sem held from some of its callsites), and it
> would be a backward step to require that now.
>
> There is not even an mm argument to migrate_pages(), so which
> mm->mmap_sem do you think would be required for it? There may be
> particular cases in which it is required (when the new_page function
> involves the old_page's vma - is that so below?), but in general not.
mmap_sem is held during migrate_pages() in current implementation.
http://elixir.free-electrons.com/linux/latest/source/mm/migrate.c#L1576
--
Best Regards,
Yan Zi
Michal Hocko wrote:
> On Mon 29-01-18 22:00:11, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> migrate_pages() requires at least down_read(mmap_sem) to protect
>> related page tables and VMAs from changing. Let's do it in
>> do_page_moves() for both do_move_pages_to_node() and
>> add_page_for_migration().
>>
>> Also add this lock requirement in the comment of migrate_pages().
>
> This doesn't make much sense to me, to be honest. We are holding
> mmap_sem for _read_ so we allow parallel updates like page faults
> or unmaps. Therefore we are isolating pages prior to the migration.
>
> The sole purpose of the mmap_sem in add_page_for_migration is to protect
> from vma going away _while_ need it to get the proper page.
Then, I am wondering why we are holding mmap_sem when calling
migrate_pages() in existing code.
http://elixir.free-electrons.com/linux/latest/source/mm/migrate.c#L1576
>
> Moving the lock up is just wrong because it allows caller to hold the
> lock for way too long if a lot of pages is migrated. Not only that,
> it is even incorrect because we are doing get_user() (aka page fault)
> and while read lock recursion is OK, we might block and deadlock when
> there is a writer pending. I haven't checked the current implementation
> of semaphores but I believe we do not allow recursive locking.
>
Sorry, I missed that. If mmap_sem is not needed for migrate_pages(),
please ignore this patch.
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/migrate.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 5d0dc7b85f90..52d029953c32 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1354,6 +1354,9 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>> * or free list only if ret != 0.
>> *
>> * Returns the number of pages that were not migrated, or an error code.
>> + *
>> + * The caller must hold at least down_read(mmap_sem) for to-be-migrated pages
>> + * to protect related page tables and VMAs from changing.
>> */
>> int migrate_pages(struct list_head *from, new_page_t get_new_page,
>> free_page_t put_new_page, unsigned long private,
>> @@ -1457,6 +1460,12 @@ static int store_status(int __user *status, int start, int value, int nr)
>> return 0;
>> }
>>
>> +/*
>> + * Migrates the pages from pagelist and put back those not migrated.
>> + *
>> + * The caller must at least hold down_read(mmap_sem), which is required
>> + * for migrate_pages()
>> + */
>> static int do_move_pages_to_node(struct mm_struct *mm,
>> struct list_head *pagelist, int node)
>> {
>> @@ -1487,7 +1496,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>> unsigned int follflags;
>> int err;
>>
>> - down_read(&mm->mmap_sem);
>> err = -EFAULT;
>> vma = find_vma(mm, addr);
>> if (!vma || addr < vma->vm_start || !vma_migratable(vma))
>> @@ -1540,7 +1548,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>> */
>> put_page(page);
>> out:
>> - up_read(&mm->mmap_sem);
>> return err;
>> }
>>
>> @@ -1561,6 +1568,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>
>> migrate_prep();
>>
>> + down_read(&mm->mmap_sem);
>> for (i = start = 0; i < nr_pages; i++) {
>> const void __user *p;
>> unsigned long addr;
>> @@ -1628,6 +1636,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> if (!err)
>> err = err1;
>> out:
>> + up_read(&mm->mmap_sem);
>> return err;
>> }
>>
>> --
>> 2.15.1
>
--
Best Regards,
Yan Zi
On Tue 30-01-18 10:52:58, Zi Yan wrote:
>
>
> Michal Hocko wrote:
> > On Mon 29-01-18 22:00:11, Zi Yan wrote:
> >> From: Zi Yan <[email protected]>
> >>
> >> migrate_pages() requires at least down_read(mmap_sem) to protect
> >> related page tables and VMAs from changing. Let's do it in
> >> do_page_moves() for both do_move_pages_to_node() and
> >> add_page_for_migration().
> >>
> >> Also add this lock requirement in the comment of migrate_pages().
> >
> > This doesn't make much sense to me, to be honest. We are holding
> > mmap_sem for _read_ so we allow parallel updates like page faults
> > or unmaps. Therefore we are isolating pages prior to the migration.
> >
> > The sole purpose of the mmap_sem in add_page_for_migration is to protect
> > from vma going away _while_ need it to get the proper page.
>
> Then, I am wondering why we are holding mmap_sem when calling
> migrate_pages() in existing code.
> http://elixir.free-electrons.com/linux/latest/source/mm/migrate.c#L1576
You mean in the original code? I strongly suspect this was to not take
it for each page.
--
Michal Hocko
SUSE Labs
On 30 Jan 2018, at 11:10, Michal Hocko wrote:
> On Tue 30-01-18 10:52:58, Zi Yan wrote:
>>
>>
>> Michal Hocko wrote:
>>> On Mon 29-01-18 22:00:11, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> migrate_pages() requires at least down_read(mmap_sem) to protect
>>>> related page tables and VMAs from changing. Let's do it in
>>>> do_page_moves() for both do_move_pages_to_node() and
>>>> add_page_for_migration().
>>>>
>>>> Also add this lock requirement in the comment of migrate_pages().
>>>
>>> This doesn't make much sense to me, to be honest. We are holding
>>> mmap_sem for _read_ so we allow parallel updates like page faults
>>> or unmaps. Therefore we are isolating pages prior to the migration.
>>>
>>> The sole purpose of the mmap_sem in add_page_for_migration is to protect
>>> from vma going away _while_ need it to get the proper page.
>>
>> Then, I am wondering why we are holding mmap_sem when calling
>> migrate_pages() in existing code.
>> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Felixir.free-electrons.com%2Flinux%2Flatest%2Fsource%2Fmm%2Fmigrate.c%23L1576&data=02%7C01%7Czi.yan%40cs.rutgers.edu%7C855d86d83cff4669d25f08d567fbfb8d%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C636529254319323899&sdata=Ba8F7IHIjxDRV%2FeGg7883wlRBmDHQW6pbZubAWZDNLs%3D&reserved=0
>
> You mean in the original code? I strongly suspect this was to not take
> it for each page.
Right. The original code gathers 169 pages, whose information (struct page_to_node, 24bytes)
fits into a 4KB page, then migrates them at a time. So mmap_sem is not held for long
in the original code, because of this design.
I think the question is whether we need to hold mmap_sem for migrate_pages(). Hugh
also agrees it is not necessary on a separate email. But it is held in the original code.
--
Best Regards
Yan Zi
On Tue 30-01-18 14:12:28, Zi Yan wrote:
> On 30 Jan 2018, at 11:10, Michal Hocko wrote:
[...]
> I think the question is whether we need to hold mmap_sem for
> migrate_pages(). Hugh also agrees it is not necessary on a separate
> email. But it is held in the original code.
I would be really surprised if we really needed the lock. If we do,
however, then we really need a very good explanation why. The code used
to do so is not a valid reason.
--
Michal Hocko
SUSE Labs