2019-06-09 10:10:15

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 0/6] mm: use down_read_killable for locking mmap_sem in proc

v1:
https://lore.kernel.org/lkml/155790967258.1319.11531787078240675602.stgit@buzz/

v1 "mm: use down_read_killable for locking mmap_sem in access_remote_vm"
https://lore.kernel.org/lkml/155790847881.2798.7160461383704600177.stgit@buzz/

changes since v1:
* update comments and collect acks/reviews.

---

Konstantin Khlebnikov (6):
proc: use down_read_killable mmap_sem for /proc/pid/maps
proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup
proc: use down_read_killable mmap_sem for /proc/pid/pagemap
proc: use down_read_killable mmap_sem for /proc/pid/clear_refs
proc: use down_read_killable mmap_sem for /proc/pid/map_files
mm: use down_read_killable for locking mmap_sem in access_remote_vm


fs/proc/base.c | 27 +++++++++++++++++++++------
fs/proc/task_mmu.c | 23 ++++++++++++++++++-----
fs/proc/task_nommu.c | 6 +++++-
mm/memory.c | 4 +++-
mm/nommu.c | 3 ++-
5 files changed, 49 insertions(+), 14 deletions(-)

--
Signature


2019-06-09 10:10:26

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 3/6] proc: use down_read_killable mmap_sem for /proc/pid/pagemap

Do not stuck forever if something wrong.
Killable lock allows to cleanup stuck tasks and simplifies investigation.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
fs/proc/task_mmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 781879a91e3b..78bed6adc62d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1547,7 +1547,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
/* overflow ? */
if (end < start_vaddr || end > end_vaddr)
end = end_vaddr;
- down_read(&mm->mmap_sem);
+ ret = down_read_killable(&mm->mmap_sem);
+ if (ret)
+ goto out_free;
ret = walk_page_range(start_vaddr, end, &pagemap_walk);
up_read(&mm->mmap_sem);
start_vaddr = end;

2019-06-09 10:10:31

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 5/6] proc: use down_read_killable mmap_sem for /proc/pid/map_files

Do not stuck forever if something wrong.
Killable lock allows to cleanup stuck tasks and simplifies investigation.

It seems ->d_revalidate() could return any error (except ECHILD) to
abort validation and pass error as result of lookup sequence.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
fs/proc/base.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9c8ca6cd3ce4..515ab29c2adf 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1962,9 +1962,12 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
goto out;

if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
- down_read(&mm->mmap_sem);
- exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
- up_read(&mm->mmap_sem);
+ status = down_read_killable(&mm->mmap_sem);
+ if (!status) {
+ exact_vma_exists = !!find_exact_vma(mm, vm_start,
+ vm_end);
+ up_read(&mm->mmap_sem);
+ }
}

mmput(mm);
@@ -2010,8 +2013,11 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
if (rc)
goto out_mmput;

+ rc = down_read_killable(&mm->mmap_sem);
+ if (rc)
+ goto out_mmput;
+
rc = -ENOENT;
- down_read(&mm->mmap_sem);
vma = find_exact_vma(mm, vm_start, vm_end);
if (vma && vma->vm_file) {
*path = vma->vm_file->f_path;
@@ -2107,7 +2113,10 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
if (!mm)
goto out_put_task;

- down_read(&mm->mmap_sem);
+ result = ERR_PTR(-EINTR);
+ if (down_read_killable(&mm->mmap_sem))
+ goto out_put_mm;
+
vma = find_exact_vma(mm, vm_start, vm_end);
if (!vma)
goto out_no_vma;
@@ -2118,6 +2127,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,

out_no_vma:
up_read(&mm->mmap_sem);
+out_put_mm:
mmput(mm);
out_put_task:
put_task_struct(task);
@@ -2160,7 +2170,12 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
mm = get_task_mm(task);
if (!mm)
goto out_put_task;
- down_read(&mm->mmap_sem);
+
+ ret = down_read_killable(&mm->mmap_sem);
+ if (ret) {
+ mmput(mm);
+ goto out_put_task;
+ }

nr_files = 0;


2019-06-09 10:11:45

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 2/6] proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup

Do not stuck forever if something wrong.
Killable lock allows to cleanup stuck tasks and simplifies investigation.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
fs/proc/task_mmu.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2bf210229daf..781879a91e3b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -832,7 +832,10 @@ static int show_smaps_rollup(struct seq_file *m, void *v)

memset(&mss, 0, sizeof(mss));

- down_read(&mm->mmap_sem);
+ ret = down_read_killable(&mm->mmap_sem);
+ if (ret)
+ goto out_put_mm;
+
hold_task_mempolicy(priv);

for (vma = priv->mm->mmap; vma; vma = vma->vm_next) {
@@ -849,8 +852,9 @@ static int show_smaps_rollup(struct seq_file *m, void *v)

release_task_mempolicy(priv);
up_read(&mm->mmap_sem);
- mmput(mm);

+out_put_mm:
+ mmput(mm);
out_put_task:
put_task_struct(priv->task);
priv->task = NULL;

2019-06-09 10:12:12

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 4/6] proc: use down_read_killable mmap_sem for /proc/pid/clear_refs

Do not stuck forever if something wrong.
Killable lock allows to cleanup stuck tasks and simplifies investigation.

Replace the only unkillable mmap_sem lock in clear_refs_write.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
fs/proc/task_mmu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 78bed6adc62d..7f84d1477b5b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1140,7 +1140,10 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
goto out_mm;
}

- down_read(&mm->mmap_sem);
+ if (down_read_killable(&mm->mmap_sem)) {
+ count = -EINTR;
+ goto out_mm;
+ }
tlb_gather_mmu(&tlb, mm, 0, -1);
if (type == CLEAR_REFS_SOFT_DIRTY) {
for (vma = mm->mmap; vma; vma = vma->vm_next) {

2019-06-09 10:13:24

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 6/6] mm: use down_read_killable for locking mmap_sem in access_remote_vm

This function is used by ptrace and proc files like /proc/pid/cmdline and
/proc/pid/environ.

Access_remote_vm never returns error codes, all errors are ignored and
only size of successfully read data is returned. So, if current task was
killed we'll simply return 0 (bytes read).

Mmap_sem could be locked for a long time or forever if something wrong.
Killable lock allows to cleanup stuck tasks and simplifies investigation.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
mm/memory.c | 4 +++-
mm/nommu.c | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ddf20bd0c317..9a4401d21e94 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4349,7 +4349,9 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
void *old_buf = buf;
int write = gup_flags & FOLL_WRITE;

- down_read(&mm->mmap_sem);
+ if (down_read_killable(&mm->mmap_sem))
+ return 0;
+
/* ignore errors, just check how much was successfully transferred */
while (len) {
int bytes, ret, offset;
diff --git a/mm/nommu.c b/mm/nommu.c
index d8c02fbe03b5..b2823519f8cd 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1792,7 +1792,8 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
int write = gup_flags & FOLL_WRITE;

- down_read(&mm->mmap_sem);
+ if (down_read_killable(&mm->mmap_sem))
+ return 0;

/* the access must start within one of the target process's mappings */
vma = find_vma(mm, addr);

2019-06-09 10:25:12

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH v2 1/6] proc: use down_read_killable mmap_sem for /proc/pid/maps

Do not stuck forever if something wrong.
Killable lock allows to cleanup stuck tasks and simplifies investigation.

This function also used for /proc/pid/smaps.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
fs/proc/task_mmu.c | 6 +++++-
fs/proc/task_nommu.c | 6 +++++-
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 01d4eb0e6bd1..2bf210229daf 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -166,7 +166,11 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
if (!mm || !mmget_not_zero(mm))
return NULL;

- down_read(&mm->mmap_sem);
+ if (down_read_killable(&mm->mmap_sem)) {
+ mmput(mm);
+ return ERR_PTR(-EINTR);
+ }
+
hold_task_mempolicy(priv);
priv->tail_vma = get_gate_vma(mm);

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 36bf0f2e102e..7907e6419e57 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -211,7 +211,11 @@ static void *m_start(struct seq_file *m, loff_t *pos)
if (!mm || !mmget_not_zero(mm))
return NULL;

- down_read(&mm->mmap_sem);
+ if (down_read_killable(&mm->mmap_sem)) {
+ mmput(mm);
+ return ERR_PTR(-EINTR);
+ }
+
/* start from the Nth VMA */
for (p = rb_first(&mm->mm_rb); p; p = rb_next(p))
if (n-- == 0)

2019-06-13 16:34:03

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] proc: use down_read_killable mmap_sem for /proc/pid/map_files

On 13.06.2019 2:14, Andrei Vagin wrote:
> On Sun, Jun 09, 2019 at 01:09:00PM +0300, Konstantin Khlebnikov wrote:
>> Do not stuck forever if something wrong.
>> Killable lock allows to cleanup stuck tasks and simplifies investigation.
>
> This patch breaks the CRIU project, because stat() returns EINTR instead
> of ENOENT:
>
> [root@fc24 criu]# stat /proc/self/map_files/0-0
> stat: cannot stat '/proc/self/map_files/0-0': Interrupted system call

Good catch.

It seems CRIU tests has good coverage for darkest corners of kernel API.
Kernel CI projects should use it. I suppose you know how to promote this. =)

>
> Here is one inline comment with the fix for this issue.
>
>>
>> It seems ->d_revalidate() could return any error (except ECHILD) to
>> abort validation and pass error as result of lookup sequence.
>>
>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>> Reviewed-by: Roman Gushchin <[email protected]>
>> Reviewed-by: Cyrill Gorcunov <[email protected]>
>> Reviewed-by: Kirill Tkhai <[email protected]>
>
> It was nice to see all four of you in one place :).
>
>> Acked-by: Michal Hocko <[email protected]>
>> ---
>> fs/proc/base.c | 27 +++++++++++++++++++++------
>> 1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> index 9c8ca6cd3ce4..515ab29c2adf 100644
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -1962,9 +1962,12 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
>> goto out;
>>
>> if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
>> - down_read(&mm->mmap_sem);
>> - exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
>> - up_read(&mm->mmap_sem);
>> + status = down_read_killable(&mm->mmap_sem);
>> + if (!status) {
>> + exact_vma_exists = !!find_exact_vma(mm, vm_start,
>> + vm_end);
>> + up_read(&mm->mmap_sem);
>> + }
>> }
>>
>> mmput(mm);
>> @@ -2010,8 +2013,11 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
>> if (rc)
>> goto out_mmput;
>>
>> + rc = down_read_killable(&mm->mmap_sem);
>> + if (rc)
>> + goto out_mmput;
>> +
>> rc = -ENOENT;
>> - down_read(&mm->mmap_sem);
>> vma = find_exact_vma(mm, vm_start, vm_end);
>> if (vma && vma->vm_file) {
>> *path = vma->vm_file->f_path;
>> @@ -2107,7 +2113,10 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
>> if (!mm)
>> goto out_put_task;
>>
>> - down_read(&mm->mmap_sem);
>> + result = ERR_PTR(-EINTR);
>> + if (down_read_killable(&mm->mmap_sem))
>> + goto out_put_mm;
>> +
>
> result = ERR_PTR(-ENOENT);
>
>> vma = find_exact_vma(mm, vm_start, vm_end);
>> if (!vma)
>> goto out_no_vma;
>> @@ -2118,6 +2127,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
>>
>> out_no_vma:
>> up_read(&mm->mmap_sem);
>> +out_put_mm:
>> mmput(mm);
>> out_put_task:
>> put_task_struct(task);
>> @@ -2160,7 +2170,12 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
>> mm = get_task_mm(task);
>> if (!mm)
>> goto out_put_task;
>> - down_read(&mm->mmap_sem);
>> +
>> + ret = down_read_killable(&mm->mmap_sem);
>> + if (ret) {
>> + mmput(mm);
>> + goto out_put_task;
>> + }
>>
>> nr_files = 0;
>>

2019-06-13 16:42:55

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] proc: use down_read_killable mmap_sem for /proc/pid/map_files

On Wed, Jun 12, 2019 at 04:14:28PM -0700, Andrei Vagin wrote:
> On Sun, Jun 09, 2019 at 01:09:00PM +0300, Konstantin Khlebnikov wrote:
> > Do not stuck forever if something wrong.
> > Killable lock allows to cleanup stuck tasks and simplifies investigation.
>
> This patch breaks the CRIU project, because stat() returns EINTR instead
> of ENOENT:
>
> [root@fc24 criu]# stat /proc/self/map_files/0-0
> stat: cannot stat '/proc/self/map_files/0-0': Interrupted system call
>
> Here is one inline comment with the fix for this issue.
>
> >
> > It seems ->d_revalidate() could return any error (except ECHILD) to
> > abort validation and pass error as result of lookup sequence.
> >
> > Signed-off-by: Konstantin Khlebnikov <[email protected]>
> > Reviewed-by: Roman Gushchin <[email protected]>
> > Reviewed-by: Cyrill Gorcunov <[email protected]>
> > Reviewed-by: Kirill Tkhai <[email protected]>
>
> It was nice to see all four of you in one place :).

Holymoly ;) And we all managed to miss this error code.
Thanks, Andrew!

2019-06-13 17:00:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] proc: use down_read_killable mmap_sem for /proc/pid/map_files

On Wed, 12 Jun 2019 16:14:28 -0700 Andrei Vagin <[email protected]> wrote:

> On Sun, Jun 09, 2019 at 01:09:00PM +0300, Konstantin Khlebnikov wrote:
> > Do not stuck forever if something wrong.
> > Killable lock allows to cleanup stuck tasks and simplifies investigation.
>
> This patch breaks the CRIU project, because stat() returns EINTR instead
> of ENOENT:
>
> [root@fc24 criu]# stat /proc/self/map_files/0-0
> stat: cannot stat '/proc/self/map_files/0-0': Interrupted system call
>
> Here is one inline comment with the fix for this issue.
>
> > @@ -2107,7 +2113,10 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
> > if (!mm)
> > goto out_put_task;
> >
> > - down_read(&mm->mmap_sem);
> > + result = ERR_PTR(-EINTR);
> > + if (down_read_killable(&mm->mmap_sem))
> > + goto out_put_mm;
> > +
>
> result = ERR_PTR(-ENOENT);
>

yes, thanks.

--- a/fs/proc/base.c~proc-use-down_read_killable-mmap_sem-for-proc-pid-map_files-fix
+++ a/fs/proc/base.c
@@ -2117,6 +2117,7 @@ static struct dentry *proc_map_files_loo
if (down_read_killable(&mm->mmap_sem))
goto out_put_mm;

+ result = ERR_PTR(-ENOENT);
vma = find_exact_vma(mm, vm_start, vm_end);
if (!vma)
goto out_no_vma;



We started doing this trick of using

ret = -EFOO;
if (something)
goto out;

decades ago because it generated slightly better code. I rather doubt
whether that's still the case.

In fact this:

--- a/fs/proc/base.c~a
+++ a/fs/proc/base.c
@@ -2096,35 +2096,45 @@ static struct dentry *proc_map_files_loo
struct dentry *result;
struct mm_struct *mm;

- result = ERR_PTR(-ENOENT);
task = get_proc_task(dir);
- if (!task)
+ if (!task) {
+ result = ERR_PTR(-ENOENT);
goto out;
+ }

- result = ERR_PTR(-EACCES);
- if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+ if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
+ result = ERR_PTR(-EACCES);
goto out_put_task;
+ }

- result = ERR_PTR(-ENOENT);
- if (dname_to_vma_addr(dentry, &vm_start, &vm_end))
+ if (dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
+ result = ERR_PTR(-ENOENT);
goto out_put_task;
+ }

mm = get_task_mm(task);
- if (!mm)
+ if (!mm) {
+ result = ERR_PTR(-ENOENT);
goto out_put_task;
+ }

- result = ERR_PTR(-EINTR);
- if (down_read_killable(&mm->mmap_sem))
+ if (down_read_killable(&mm->mmap_sem)) {
+ result = ERR_PTR(-EINTR);
goto out_put_mm;
+ }

- result = ERR_PTR(-ENOENT);
vma = find_exact_vma(mm, vm_start, vm_end);
- if (!vma)
+ if (!vma) {
+ result = ERR_PTR(-ENOENT);
goto out_no_vma;
+ }

- if (vma->vm_file)
+ if (vma->vm_file) {
result = proc_map_files_instantiate(dentry, task,
(void *)(unsigned long)vma->vm_file->f_mode);
+ } else {
+ result = ERR_PTR(-ENOENT);
+ }

out_no_vma:
up_read(&mm->mmap_sem);

makes no change to the generated assembly with gcc-7.2.0.

And I do think that the above style is clearer and more reliable, as
this bug demonstrates.

2019-06-13 17:03:27

by Andrei Vagin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] proc: use down_read_killable mmap_sem for /proc/pid/map_files

On Sun, Jun 09, 2019 at 01:09:00PM +0300, Konstantin Khlebnikov wrote:
> Do not stuck forever if something wrong.
> Killable lock allows to cleanup stuck tasks and simplifies investigation.

This patch breaks the CRIU project, because stat() returns EINTR instead
of ENOENT:

[root@fc24 criu]# stat /proc/self/map_files/0-0
stat: cannot stat '/proc/self/map_files/0-0': Interrupted system call

Here is one inline comment with the fix for this issue.

>
> It seems ->d_revalidate() could return any error (except ECHILD) to
> abort validation and pass error as result of lookup sequence.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> Reviewed-by: Roman Gushchin <[email protected]>
> Reviewed-by: Cyrill Gorcunov <[email protected]>
> Reviewed-by: Kirill Tkhai <[email protected]>

It was nice to see all four of you in one place :).

> Acked-by: Michal Hocko <[email protected]>
> ---
> fs/proc/base.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 9c8ca6cd3ce4..515ab29c2adf 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1962,9 +1962,12 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
> goto out;
>
> if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
> - down_read(&mm->mmap_sem);
> - exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
> - up_read(&mm->mmap_sem);
> + status = down_read_killable(&mm->mmap_sem);
> + if (!status) {
> + exact_vma_exists = !!find_exact_vma(mm, vm_start,
> + vm_end);
> + up_read(&mm->mmap_sem);
> + }
> }
>
> mmput(mm);
> @@ -2010,8 +2013,11 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
> if (rc)
> goto out_mmput;
>
> + rc = down_read_killable(&mm->mmap_sem);
> + if (rc)
> + goto out_mmput;
> +
> rc = -ENOENT;
> - down_read(&mm->mmap_sem);
> vma = find_exact_vma(mm, vm_start, vm_end);
> if (vma && vma->vm_file) {
> *path = vma->vm_file->f_path;
> @@ -2107,7 +2113,10 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
> if (!mm)
> goto out_put_task;
>
> - down_read(&mm->mmap_sem);
> + result = ERR_PTR(-EINTR);
> + if (down_read_killable(&mm->mmap_sem))
> + goto out_put_mm;
> +

result = ERR_PTR(-ENOENT);

> vma = find_exact_vma(mm, vm_start, vm_end);
> if (!vma)
> goto out_no_vma;
> @@ -2118,6 +2127,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
>
> out_no_vma:
> up_read(&mm->mmap_sem);
> +out_put_mm:
> mmput(mm);
> out_put_task:
> put_task_struct(task);
> @@ -2160,7 +2170,12 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
> mm = get_task_mm(task);
> if (!mm)
> goto out_put_task;
> - down_read(&mm->mmap_sem);
> +
> + ret = down_read_killable(&mm->mmap_sem);
> + if (ret) {
> + mmput(mm);
> + goto out_put_task;
> + }
>
> nr_files = 0;
>

2019-06-13 20:33:06

by Andrei Vagin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] proc: use down_read_killable mmap_sem for /proc/pid/map_files

On Thu, Jun 13, 2019 at 1:15 AM Konstantin Khlebnikov
<[email protected]> wrote:
>
> On 13.06.2019 2:14, Andrei Vagin wrote:
> > On Sun, Jun 09, 2019 at 01:09:00PM +0300, Konstantin Khlebnikov wrote:
> >> Do not stuck forever if something wrong.
> >> Killable lock allows to cleanup stuck tasks and simplifies investigation.
> >
> > This patch breaks the CRIU project, because stat() returns EINTR instead
> > of ENOENT:
> >
> > [root@fc24 criu]# stat /proc/self/map_files/0-0
> > stat: cannot stat '/proc/self/map_files/0-0': Interrupted system call
>
> Good catch.
>
> It seems CRIU tests has good coverage for darkest corners of kernel API.
> Kernel CI projects should use it. I suppose you know how to promote this. =)

I remember Mike was trying to add the CRIU test suite into kernel-ci,
but it looks like this ended up with nothing.

The good thing here is that we have our own kernel-ci:
https://travis-ci.org/avagin/linux/builds

Travis-CI doesn't allow to replace the kernel, so we use CRIU to
dump/restore a ssh session and travis doesn't notice when we kexec a
new kernel.

>
> >
> > Here is one inline comment with the fix for this issue.
> >
> >>
> >> It seems ->d_revalidate() could return any error (except ECHILD) to
> >> abort validation and pass error as result of lookup sequence.
> >>
> >> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> >> Reviewed-by: Roman Gushchin <[email protected]>
> >> Reviewed-by: Cyrill Gorcunov <[email protected]>
> >> Reviewed-by: Kirill Tkhai <[email protected]>
> >
> > It was nice to see all four of you in one place :).
> >
> >> Acked-by: Michal Hocko <[email protected]>
> >> ---
> >> fs/proc/base.c | 27 +++++++++++++++++++++------
> >> 1 file changed, 21 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/proc/base.c b/fs/proc/base.c
> >> index 9c8ca6cd3ce4..515ab29c2adf 100644
> >> --- a/fs/proc/base.c
> >> +++ b/fs/proc/base.c
> >> @@ -1962,9 +1962,12 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
> >> goto out;
> >>
> >> if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
> >> - down_read(&mm->mmap_sem);
> >> - exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
> >> - up_read(&mm->mmap_sem);
> >> + status = down_read_killable(&mm->mmap_sem);
> >> + if (!status) {
> >> + exact_vma_exists = !!find_exact_vma(mm, vm_start,
> >> + vm_end);
> >> + up_read(&mm->mmap_sem);
> >> + }
> >> }
> >>
> >> mmput(mm);
> >> @@ -2010,8 +2013,11 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
> >> if (rc)
> >> goto out_mmput;
> >>
> >> + rc = down_read_killable(&mm->mmap_sem);
> >> + if (rc)
> >> + goto out_mmput;
> >> +
> >> rc = -ENOENT;
> >> - down_read(&mm->mmap_sem);
> >> vma = find_exact_vma(mm, vm_start, vm_end);
> >> if (vma && vma->vm_file) {
> >> *path = vma->vm_file->f_path;
> >> @@ -2107,7 +2113,10 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
> >> if (!mm)
> >> goto out_put_task;
> >>
> >> - down_read(&mm->mmap_sem);
> >> + result = ERR_PTR(-EINTR);
> >> + if (down_read_killable(&mm->mmap_sem))
> >> + goto out_put_mm;
> >> +
> >
> > result = ERR_PTR(-ENOENT);
> >
> >> vma = find_exact_vma(mm, vm_start, vm_end);
> >> if (!vma)
> >> goto out_no_vma;
> >> @@ -2118,6 +2127,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
> >>
> >> out_no_vma:
> >> up_read(&mm->mmap_sem);
> >> +out_put_mm:
> >> mmput(mm);
> >> out_put_task:
> >> put_task_struct(task);
> >> @@ -2160,7 +2170,12 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
> >> mm = get_task_mm(task);
> >> if (!mm)
> >> goto out_put_task;
> >> - down_read(&mm->mmap_sem);
> >> +
> >> + ret = down_read_killable(&mm->mmap_sem);
> >> + if (ret) {
> >> + mmput(mm);
> >> + goto out_put_task;
> >> + }
> >>
> >> nr_files = 0;
> >>