Date: Tue, 22 Jan 2019 10:56:16 +0100
From: Jan Kara
To: Davidlohr Bueso
Cc: akpm@linux-foundation.org, dledford@redhat.com, jgg@mellanox.com, jack@suse.de, ira.weiny@intel.com, linux-rdma@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso
Subject: Re: [PATCH 1/6] mm: make mm->pinned_vm an atomic64 counter
Message-ID: <20190122095616.GA13149@quack2.suse.cz>
References: <20190121174220.10583-1-dave@stgolabs.net> <20190121174220.10583-2-dave@stgolabs.net>
In-Reply-To: <20190121174220.10583-2-dave@stgolabs.net>

On Mon 21-01-19 09:42:15, Davidlohr Bueso wrote:
> Taking a sleeping lock to _only_ increment a variable is quite the
> overkill, and pretty much all users do this. Furthermore, some drivers
> (ie: infiniband and scif) that need pinned semantics can go to quite
> some trouble to actually delay via workqueue (un)accounting for pinned
> pages when not possible to acquire it.
>
> By making the counter atomic we no longer need to hold the mmap_sem
> and can simplify some code around it for pinned_vm users. The counter
> is 64-bit such that we need not worry about overflows such as rdma
> user input controlled from userspace.
>
> Signed-off-by: Davidlohr Bueso

The patch looks good to me. You can add:

Reviewed-by: Jan Kara

and I really like the cleanups allowed by this in the drivers :)

Honza

> ---
>  drivers/infiniband/core/umem.c             | 12 ++++++------
>  drivers/infiniband/hw/hfi1/user_pages.c    |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c |  4 ++--
>  drivers/infiniband/hw/usnic/usnic_uiom.c   |  8 ++++----
>  drivers/misc/mic/scif/scif_rma.c           |  6 +++---
>  fs/proc/task_mmu.c                         |  2 +-
>  include/linux/mm_types.h                   |  2 +-
>  kernel/events/core.c                       |  8 ++++----
>  kernel/fork.c                              |  2 +-
>  mm/debug.c                                 |  3 ++-
>  10 files changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 1efe0a74e06b..678abe1afcba 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -166,13 +166,13 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>
>  	down_write(&mm->mmap_sem);
> -	if (check_add_overflow(mm->pinned_vm, npages, &new_pinned) ||
> -	    (new_pinned > lock_limit && !capable(CAP_IPC_LOCK))) {
> +	new_pinned = atomic64_read(&mm->pinned_vm) + npages;
> +	if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) {
>  		up_write(&mm->mmap_sem);
>  		ret = -ENOMEM;
>  		goto out;
>  	}
> -	mm->pinned_vm = new_pinned;
> +	atomic64_set(&mm->pinned_vm, new_pinned);
>  	up_write(&mm->mmap_sem);
>
>  	cur_base = addr & PAGE_MASK;
> @@ -234,7 +234,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
>  	__ib_umem_release(context->device, umem, 0);
>  vma:
>  	down_write(&mm->mmap_sem);
> -	mm->pinned_vm -= ib_umem_num_pages(umem);
> +	atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm);
>  	up_write(&mm->mmap_sem);
>  out:
>  	if (vma_list)
> @@ -263,7 +263,7 @@ static void ib_umem_release_defer(struct work_struct *work)
>  	struct ib_umem *umem = container_of(work, struct ib_umem, work);
>
>  	down_write(&umem->owning_mm->mmap_sem);
> -	umem->owning_mm->pinned_vm -= ib_umem_num_pages(umem);
> +	atomic64_sub(ib_umem_num_pages(umem), &umem->owning_mm->pinned_vm);
>  	up_write(&umem->owning_mm->mmap_sem);
>
>  	__ib_umem_release_tail(umem);
> @@ -302,7 +302,7 @@ void ib_umem_release(struct ib_umem *umem)
>  	} else {
>  		down_write(&umem->owning_mm->mmap_sem);
>  	}
> -	umem->owning_mm->pinned_vm -= ib_umem_num_pages(umem);
> +	atomic64_sub(ib_umem_num_pages(umem), &umem->owning_mm->pinned_vm);
>  	up_write(&umem->owning_mm->mmap_sem);
>
>  	__ib_umem_release_tail(umem);
> diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
> index e341e6dcc388..40a6e434190f 100644
> --- a/drivers/infiniband/hw/hfi1/user_pages.c
> +++ b/drivers/infiniband/hw/hfi1/user_pages.c
> @@ -92,7 +92,7 @@ bool hfi1_can_pin_pages(struct hfi1_devdata *dd, struct mm_struct *mm,
>  	size = DIV_ROUND_UP(size, PAGE_SIZE);
>
>  	down_read(&mm->mmap_sem);
> -	pinned = mm->pinned_vm;
> +	pinned = atomic64_read(&mm->pinned_vm);
>  	up_read(&mm->mmap_sem);
>
>  	/* First, check the absolute limit against all pinned pages. */
> @@ -112,7 +112,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
>  		return ret;
>
>  	down_write(&mm->mmap_sem);
> -	mm->pinned_vm += ret;
> +	atomic64_add(ret, &mm->pinned_vm);
>  	up_write(&mm->mmap_sem);
>
>  	return ret;
> @@ -131,7 +131,7 @@ void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
>
>  	if (mm) { /* during close after signal, mm can be NULL */
>  		down_write(&mm->mmap_sem);
> -		mm->pinned_vm -= npages;
> +		atomic64_sub(npages, &mm->pinned_vm);
>  		up_write(&mm->mmap_sem);
>  	}
>  }
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index 16543d5e80c3..602387bf98e7 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -75,7 +75,7 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
>  		goto bail_release;
>  	}
>
> -	current->mm->pinned_vm += num_pages;
> +	atomic64_add(num_pages, &current->mm->pinned_vm);
>
>  	ret = 0;
>  	goto bail;
> @@ -156,7 +156,7 @@ void qib_release_user_pages(struct page **p, size_t num_pages)
>  	__qib_release_user_pages(p, num_pages, 1);
>
>  	if (current->mm) {
> -		current->mm->pinned_vm -= num_pages;
> +		atomic64_sub(num_pages, &current->mm->pinned_vm);
>  		up_write(&current->mm->mmap_sem);
>  	}
>  }
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index ce01a59fccc4..854436a2b437 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -129,7 +129,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
>  	uiomr->owning_mm = mm = current->mm;
>  	down_write(&mm->mmap_sem);
>
> -	locked = npages + current->mm->pinned_vm;
> +	locked = npages + atomic64_read(&current->mm->pinned_vm);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>
>  	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
> @@ -187,7 +187,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
>  	if (ret < 0)
>  		usnic_uiom_put_pages(chunk_list, 0);
>  	else {
> -		mm->pinned_vm = locked;
> +		atomic64_set(&mm->pinned_vm, locked);
>  		mmgrab(uiomr->owning_mm);
>  	}
>
> @@ -441,7 +441,7 @@ static void usnic_uiom_release_defer(struct work_struct *work)
>  		container_of(work, struct usnic_uiom_reg, work);
>
>  	down_write(&uiomr->owning_mm->mmap_sem);
> -	uiomr->owning_mm->pinned_vm -= usnic_uiom_num_pages(uiomr);
> +	atomic64_sub(usnic_uiom_num_pages(uiomr), &uiomr->owning_mm->pinned_vm);
>  	up_write(&uiomr->owning_mm->mmap_sem);
>
>  	__usnic_uiom_release_tail(uiomr);
> @@ -469,7 +469,7 @@ void usnic_uiom_reg_release(struct usnic_uiom_reg *uiomr,
>  	} else {
>  		down_write(&uiomr->owning_mm->mmap_sem);
>  	}
> -	uiomr->owning_mm->pinned_vm -= usnic_uiom_num_pages(uiomr);
> +	atomic64_sub(usnic_uiom_num_pages(uiomr), &uiomr->owning_mm->pinned_vm);
>  	up_write(&uiomr->owning_mm->mmap_sem);
>
>  	__usnic_uiom_release_tail(uiomr);
> diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
> index 749321eb91ae..2448368f181e 100644
> --- a/drivers/misc/mic/scif/scif_rma.c
> +++ b/drivers/misc/mic/scif/scif_rma.c
> @@ -285,7 +285,7 @@ __scif_dec_pinned_vm_lock(struct mm_struct *mm,
>  	} else {
>  		down_write(&mm->mmap_sem);
>  	}
> -	mm->pinned_vm -= nr_pages;
> +	atomic64_sub(nr_pages, &mm->pinned_vm);
>  	up_write(&mm->mmap_sem);
>  	return 0;
>  }
> @@ -299,7 +299,7 @@ static inline int __scif_check_inc_pinned_vm(struct mm_struct *mm,
>  		return 0;
>
>  	locked = nr_pages;
> -	locked += mm->pinned_vm;
> +	locked += atomic64_read(&mm->pinned_vm);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
>  		dev_err(scif_info.mdev.this_device,
> @@ -307,7 +307,7 @@ static inline int __scif_check_inc_pinned_vm(struct mm_struct *mm,
>  			locked, lock_limit);
>  		return -ENOMEM;
>  	}
> -	mm->pinned_vm = locked;
> +	atomic64_set(&mm->pinned_vm, locked);
>  	return 0;
>  }
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 6976e17dba68..640ae8a47c73 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -59,7 +59,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  	SEQ_PUT_DEC("VmPeak:\t", hiwater_vm);
>  	SEQ_PUT_DEC(" kB\nVmSize:\t", total_vm);
>  	SEQ_PUT_DEC(" kB\nVmLck:\t", mm->locked_vm);
> -	SEQ_PUT_DEC(" kB\nVmPin:\t", mm->pinned_vm);
> +	SEQ_PUT_DEC(" kB\nVmPin:\t", atomic64_read(&mm->pinned_vm));
>  	SEQ_PUT_DEC(" kB\nVmHWM:\t", hiwater_rss);
>  	SEQ_PUT_DEC(" kB\nVmRSS:\t", total_rss);
>  	SEQ_PUT_DEC(" kB\nRssAnon:\t", anon);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6312b02d65ed..0c8be6f9c92d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -404,7 +404,7 @@ struct mm_struct {
>
>  		unsigned long total_vm;	   /* Total pages mapped */
>  		unsigned long locked_vm;   /* Pages that have PG_mlocked set */
> -		unsigned long pinned_vm;   /* Refcount permanently increased */
> +		atomic64_t    pinned_vm;   /* Refcount permanently increased */
>  		unsigned long data_vm;	   /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
>  		unsigned long exec_vm;	   /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
>  		unsigned long stack_vm;	   /* VM_STACK */
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3cd13a30f732..8df0b77a4687 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5459,7 +5459,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>
>  		/* now it's safe to free the pages */
>  		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
> -		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
> +		atomic64_sub(rb->aux_mmap_locked, &vma->vm_mm->pinned_vm);
>
>  		/* this has to be the last one */
>  		rb_free_aux(rb);
> @@ -5532,7 +5532,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>  	 */
>
>  	atomic_long_sub((size >> PAGE_SHIFT) + 1, &mmap_user->locked_vm);
> -	vma->vm_mm->pinned_vm -= mmap_locked;
> +	atomic64_sub(mmap_locked, &vma->vm_mm->pinned_vm);
>  	free_uid(mmap_user);
>
>  out_put:
> @@ -5680,7 +5680,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
>
>  	lock_limit = rlimit(RLIMIT_MEMLOCK);
>  	lock_limit >>= PAGE_SHIFT;
> -	locked = vma->vm_mm->pinned_vm + extra;
> +	locked = atomic64_read(&vma->vm_mm->pinned_vm) + extra;
>
>  	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
>  		!capable(CAP_IPC_LOCK)) {
> @@ -5721,7 +5721,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
>  unlock:
>  	if (!ret) {
>  		atomic_long_add(user_extra, &user->locked_vm);
> -		vma->vm_mm->pinned_vm += extra;
> +		atomic64_add(extra, &vma->vm_mm->pinned_vm);
>
>  		atomic_inc(&event->mmap_count);
>  	} else if (rb) {
> diff --git a/kernel/fork.c b/kernel/fork.c
> index c48e9e244a89..a68de9032ced 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -981,7 +981,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>  	mm_pgtables_bytes_init(mm);
>  	mm->map_count = 0;
>  	mm->locked_vm = 0;
> -	mm->pinned_vm = 0;
> +	atomic64_set(&mm->pinned_vm, 0);
>  	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
>  	spin_lock_init(&mm->page_table_lock);
>  	spin_lock_init(&mm->arg_lock);
> diff --git a/mm/debug.c b/mm/debug.c
> index 0abb987dad9b..bcf70e365a77 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -166,7 +166,8 @@ void dump_mm(const struct mm_struct *mm)
>  		mm_pgtables_bytes(mm),
>  		mm->map_count,
>  		mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm,
> -		mm->pinned_vm, mm->data_vm, mm->exec_vm, mm->stack_vm,
> +		atomic64_read(&mm->pinned_vm),
> +		mm->data_vm, mm->exec_vm, mm->stack_vm,
>  		mm->start_code, mm->end_code, mm->start_data, mm->end_data,
>  		mm->start_brk, mm->brk, mm->start_stack,
>  		mm->arg_start, mm->arg_end, mm->env_start, mm->env_end,
> --
> 2.16.4
>
--
Jan Kara
SUSE Labs, CR
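
For readers following the limit check in ib_umem_get(): the quoted hunk does atomic64_read() followed later by atomic64_set(), which is a two-step update and stays correct here only because both steps still sit between down_write() and up_write() on mmap_sem. If the lock were dropped around such a site, one possible shape is to add first and roll back when the limit is exceeded. The following is a minimal userspace sketch of that idea using C11 atomics, not kernel code. The names struct fake_mm, try_pin() and unpin() are hypothetical and exist only for illustration, and the capable(CAP_IPC_LOCK) override is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pinned_vm field of mm_struct. */
struct fake_mm {
	_Atomic int64_t pinned_vm;
};

/*
 * Account npages against lock_limit without taking a lock.
 * The add is a single atomic read-modify-write, so two concurrent
 * callers cannot both pass the check based on the same stale value.
 * On failure the addition is undone and false is returned.
 */
static bool try_pin(struct fake_mm *mm, int64_t npages, int64_t lock_limit)
{
	/* atomic_fetch_add returns the old value; add npages for the new one. */
	int64_t new_pinned = atomic_fetch_add(&mm->pinned_vm, npages) + npages;

	if (new_pinned > lock_limit) {
		atomic_fetch_sub(&mm->pinned_vm, npages);
		return false;
	}
	return true;
}

/* Drop the accounting for npages previously accepted by try_pin(). */
static void unpin(struct fake_mm *mm, int64_t npages)
{
	atomic_fetch_sub(&mm->pinned_vm, npages);
}
```

One trade-off of this shape: between the add and the rollback, a concurrent reader can briefly observe a value above the limit, which may make another caller fail spuriously. Whether that is acceptable depends on the caller.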