Received: by 10.192.165.148 with SMTP id m20csp340102imm; Thu, 3 May 2018 21:49:56 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpFJt8n3kTCpjNys7MlXX5T81Xtnw0zQa4a2fu+IM0j9LMLsz9kE8+iZKvng8cUhmmzRQ3D X-Received: by 2002:a63:7c14:: with SMTP id x20-v6mr21435570pgc.161.1525409396613; Thu, 03 May 2018 21:49:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525409396; cv=none; d=google.com; s=arc-20160816; b=xEiENtRQVS0vGu/cCNAzKdcVl4es06l5ErqP0gxCKHUEXPh13Mkgu9YFxwCg8YGz6H GwYRI8zzdLt2WzuKyyJoSMaUIcFuvzbGCIr+YHUgmLY4PoIc3m96dKUV7UPtUzrjsRPg 1r/w5Jx6GpKIM3HRy8Le8NRz2dyQ4IXfEbtdEXZ/vBc2SxPaslhGy3oUXoO48ASoFd53 wPM9PkHAVl8QlwWk1vIaZcYRheuKo+JfRpbhDMXatDbAg9fNGKr91NTVoEezkJAZvOp9 0rnhvhYPCC7ff7lq266Sex8u94G5tG1v75P+1b3gzzl7tehgGbbWe1WpDLCC1HiAazRR 2JBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature:arc-authentication-results; bh=q8UhgIV4emsXe8DAKbmD/JhasBJqw8OAQyvDxr881Ao=; b=WLvmn2kGYmtIaOp1Sa/Zj/FtjUvewJ5N2vR+agI0unllC1Jene7xY4t1umWXMa4CUv GJHFhbAUcN+xsYWRn82mteKFA/FDXSpxl8+HygTvSEsrkVFcJNPTe1p8B/k/azAZsFDG zVpPfJ2frKrxZRPxqvwgq5vWS1kGT7ybjd18lZaQvIW4smhbaKDarRU/nnnNx2093+Qz xR7AM1qgecrgBVowaeNc+P+6Yx+tT2jyBXwVx/sjKIp5zsEimzGldUux3WgYTSbBqC4Y iSt95v5/ZhCTKMZeI8fdjgmSk6CILZwr1D7XhoaLtrDJ8RVC5sI2ZnvDfJtxYCGeEYXZ a7AA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=fbwb5b34; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u8-v6si15584770plm.134.2018.05.03.21.49.42; Thu, 03 May 2018 21:49:56 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=fbwb5b34; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751172AbeEDEsX (ORCPT + 99 others); Fri, 4 May 2018 00:48:23 -0400 Received: from mail.kernel.org ([198.145.29.99]:40554 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750774AbeEDEsW (ORCPT ); Fri, 4 May 2018 00:48:22 -0400 Received: from devbox (NE2965lan1.rev.em-net.ne.jp [210.141.244.193]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id D83BE2177F; Fri, 4 May 2018 04:48:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1525409301; bh=rcFcPRCqCjZvCEUzDvmt56fsA30YoDhKmy8g5Ld9UvY=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=fbwb5b34kBC1kkHYaDTXgpg4aO39V+ITo+0PUN2dM+xgIPI4A+eMs/F7CirEt3MeS plp1RoAVtdH81HjLa74coSsxUHbLhyOFWEG4XVNomx4M6CRwU9xLyS5dciL0LvrSjv A6ANGKaTV6GsEujGoxiAYSpfpWCf1R4ZHfdLF02w= Date: Fri, 4 May 2018 13:48:16 +0900 From: Masami Hiramatsu To: Ravi Bangoria Cc: oleg@redhat.com, peterz@infradead.org, srikar@linux.vnet.ibm.com, rostedt@goodmis.org, acme@kernel.org, ananth@linux.vnet.ibm.com, akpm@linux-foundation.org, alexander.shishkin@linux.intel.com, alexis.berlemont@gmail.com, corbet@lwn.net, dan.j.williams@intel.com, jolsa@redhat.com, kan.liang@intel.com, kjlx@templeofstupid.com, kstewart@linuxfoundation.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, milian.wolff@kdab.com, mingo@redhat.com, namhyung@kernel.org, naveen.n.rao@linux.vnet.ibm.com, pc@us.ibm.com, tglx@linutronix.de, yao.jin@linux.intel.com, fengguang.wu@intel.com, jglisse@redhat.com, Ravi Bangoria Subject: Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore) Message-Id: <20180504134816.8633a157dd036489d9b0f1db@kernel.org> In-Reply-To: <20180417043244.7501-7-ravi.bangoria@linux.vnet.ibm.com> References: <20180417043244.7501-1-ravi.bangoria@linux.vnet.ibm.com> <20180417043244.7501-7-ravi.bangoria@linux.vnet.ibm.com> X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Ravi, I have some comments, please see below. On Tue, 17 Apr 2018 10:02:41 +0530 Ravi Bangoria wrote:\ > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > index 7bd2760..2db3ed1 100644 > --- a/include/linux/uprobes.h > +++ b/include/linux/uprobes.h > @@ -122,6 +122,8 @@ struct uprobe_map_info { > unsigned long vaddr; > }; > > +extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma); > + > extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); > extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); > extern bool is_swbp_insn(uprobe_opcode_t *insn); > @@ -136,6 +138,8 @@ struct uprobe_map_info { > extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end); > extern void uprobe_start_dup_mmap(void); > extern void uprobe_end_dup_mmap(void); > +extern void uprobe_down_write_dup_mmap(void); > +extern void uprobe_up_write_dup_mmap(void); > extern void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm); > extern void uprobe_free_utask(struct task_struct *t); > extern void uprobe_copy_process(struct task_struct *t, unsigned long flags); > @@ -192,6 +196,12 @@ static inline void uprobe_start_dup_mmap(void) > static inline void uprobe_end_dup_mmap(void) > { > } > +static inline void uprobe_down_write_dup_mmap(void) > +{ > +} > +static inline void uprobe_up_write_dup_mmap(void) > +{ > +} > static inline void > uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm) > { > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 096d1e6..e26ad83 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -1044,6 +1044,9 @@ static void build_probe_list(struct inode *inode, > spin_unlock(&uprobes_treelock); > } > > +/* Rightnow the only user of this is trace_uprobe. */ > +void (*uprobe_mmap_callback)(struct vm_area_struct *vma); > + > /* > * Called from mmap_region/vma_adjust with mm->mmap_sem acquired. > * > @@ -1056,7 +1059,13 @@ int uprobe_mmap(struct vm_area_struct *vma) > struct uprobe *uprobe, *u; > struct inode *inode; > > - if (no_uprobe_events() || !valid_vma(vma, true)) > + if (no_uprobe_events()) > + return 0; > + > + if (uprobe_mmap_callback) > + uprobe_mmap_callback(vma); > + > + if (!valid_vma(vma, true)) > return 0; > > inode = file_inode(vma->vm_file); > @@ -1247,6 +1256,16 @@ void uprobe_end_dup_mmap(void) > percpu_up_read(&dup_mmap_sem); > } > > +void uprobe_down_write_dup_mmap(void) > +{ > + percpu_down_write(&dup_mmap_sem); > +} > + > +void uprobe_up_write_dup_mmap(void) > +{ > + percpu_up_write(&dup_mmap_sem); > +} > + I'm not sure why these hunks are not done in previous patch. If you separate "uprobe_map_info" export patch, this also should be separated. (Or both merged into this patch) > void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm) > { > if (test_bit(MMF_HAS_UPROBES, &oldmm->flags)) { > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c > index 0d450b4..1a48b04 100644 > --- a/kernel/trace/trace_uprobe.c > +++ b/kernel/trace/trace_uprobe.c > @@ -25,6 +25,8 @@ > #include > #include > #include > +#include > +#include > > #include "trace_probe.h" > > @@ -58,6 +60,7 @@ struct trace_uprobe { > struct inode *inode; > char *filename; > unsigned long offset; > + unsigned long ref_ctr_offset; > unsigned long nhit; > struct trace_probe tp; > }; > @@ -364,10 +367,10 @@ static int create_trace_uprobe(int argc, char **argv) > { > struct trace_uprobe *tu; > struct inode *inode; > - char *arg, *event, *group, *filename; > + char *arg, *event, *group, *filename, *rctr, *rctr_end; > char buf[MAX_EVENT_NAME_LEN]; > struct path path; > - unsigned long offset; > + unsigned long offset, ref_ctr_offset; > bool is_delete, is_return; > int i, ret; > > @@ -377,6 +380,7 @@ static int create_trace_uprobe(int argc, char **argv) > is_return = false; > event = NULL; > group = NULL; > + ref_ctr_offset = 0; > > /* argc must be >= 1 */ > if (argv[0][0] == '-') > @@ -456,6 +460,26 @@ static int create_trace_uprobe(int argc, char **argv) > goto fail_address_parse; > } > > + /* Parse reference counter offset if specified. */ > + rctr = strchr(arg, '('); > + if (rctr) { > + rctr_end = strchr(rctr, ')'); > + if (rctr > rctr_end || *(rctr_end + 1) != 0) { > + ret = -EINVAL; > + pr_info("Invalid reference counter offset.\n"); > + goto fail_address_parse; > + } > + > + *rctr++ = '\0'; > + *rctr_end = '\0'; > + ret = kstrtoul(rctr, 0, &ref_ctr_offset); > + if (ret) { > + pr_info("Invalid reference counter offset.\n"); > + goto fail_address_parse; > + } > + } > + > + /* Parse uprobe offset. */ > ret = kstrtoul(arg, 0, &offset); > if (ret) > goto fail_address_parse; > @@ -490,6 +514,7 @@ static int create_trace_uprobe(int argc, char **argv) > goto fail_address_parse; > } > tu->offset = offset; > + tu->ref_ctr_offset = ref_ctr_offset; > tu->inode = inode; > tu->filename = kstrdup(filename, GFP_KERNEL); > > @@ -622,6 +647,8 @@ static int probes_seq_show(struct seq_file *m, void *v) > break; > } > } > + if (tu->ref_ctr_offset) > + seq_printf(m, "(0x%lx)", tu->ref_ctr_offset); > > for (i = 0; i < tu->tp.nr_args; i++) > seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm); > @@ -896,6 +923,129 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func, > return trace_handle_return(s); > } > > +static bool sdt_valid_vma(struct trace_uprobe *tu, > + struct vm_area_struct *vma, > + unsigned long vaddr) > +{ > + return tu->ref_ctr_offset && > + vma->vm_file && > + file_inode(vma->vm_file) == tu->inode && > + vma->vm_flags & VM_WRITE && > + vma->vm_start <= vaddr && > + vma->vm_end > vaddr; > +} > + > +static struct vm_area_struct *sdt_find_vma(struct trace_uprobe *tu, > + struct mm_struct *mm, > + unsigned long vaddr) > +{ > + struct vm_area_struct *vma = find_vma(mm, vaddr); > + > + return (vma && sdt_valid_vma(tu, vma, vaddr)) ? vma : NULL; > +} > + > +/* > + * Reference counter gate the invocation of probe. If present, > + * by default reference counter is 0. One needs to increment > + * it before tracing the probe and decrement it when done. > + */ > +static int > +sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d) > +{ > + void *kaddr; > + struct page *page; > + struct vm_area_struct *vma; > + int ret = 0; > + unsigned short *ptr; > + > + if (vaddr == 0) > + return -EINVAL; > + > + ret = get_user_pages_remote(NULL, mm, vaddr, 1, > + FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL); > + if (ret <= 0) > + return ret; Hmm, get_user_pages_remote() said === If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns -errno. === And you've passed 1 for nr_pages, so it must be 1 or -errno. > + > + kaddr = kmap_atomic(page); > + ptr = kaddr + (vaddr & ~PAGE_MASK); > + *ptr += d; > + kunmap_atomic(kaddr); > + > + put_page(page); > + return 0; And obviously 0 means "success" for sdt_update_ref_ctr(). I think if get_user_pages_remote returns 0, this should return -EBUSY (*) or something else. * It seems that if faultin_page() in __get_user_pages() returns -EBUSY, get_user_pages_remote() can return 0. > +} > + > +static void sdt_increment_ref_ctr(struct trace_uprobe *tu) > +{ > + struct uprobe_map_info *info; > + > + uprobe_down_write_dup_mmap(); > + info = uprobe_build_map_info(tu->inode->i_mapping, > + tu->ref_ctr_offset, false); > + if (IS_ERR(info)) > + goto out; > + > + while (info) { > + down_write(&info->mm->mmap_sem); > + > + if (sdt_find_vma(tu, info->mm, info->vaddr)) > + sdt_update_ref_ctr(info->mm, info->vaddr, 1); Don't you have to handle the error to map pages here? > + > + up_write(&info->mm->mmap_sem); > + info = uprobe_free_map_info(info); > + } > + > +out: > + uprobe_up_write_dup_mmap(); > +} > + > +/* Called with down_write(&vma->vm_mm->mmap_sem) */ > +static void trace_uprobe_mmap(struct vm_area_struct *vma) > +{ > + struct trace_uprobe *tu; > + unsigned long vaddr; > + > + if (!(vma->vm_flags & VM_WRITE)) > + return; > + > + mutex_lock(&uprobe_lock); > + list_for_each_entry(tu, &uprobe_list, list) { > + if (!trace_probe_is_enabled(&tu->tp)) > + continue; > + > + vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); > + if (!sdt_valid_vma(tu, vma, vaddr)) > + continue; > + > + sdt_update_ref_ctr(vma->vm_mm, vaddr, 1); Same here. > + } > + mutex_unlock(&uprobe_lock); > +} > + > +static void sdt_decrement_ref_ctr(struct trace_uprobe *tu) > +{ > + struct uprobe_map_info *info; > + > + uprobe_down_write_dup_mmap(); > + info = uprobe_build_map_info(tu->inode->i_mapping, > + tu->ref_ctr_offset, false); > + if (IS_ERR(info)) > + goto out; > + > + while (info) { > + down_write(&info->mm->mmap_sem); > + > + if (sdt_find_vma(tu, info->mm, info->vaddr)) > + sdt_update_ref_ctr(info->mm, info->vaddr, -1); Ditto. Thank you, > + > + up_write(&info->mm->mmap_sem); > + info = uprobe_free_map_info(info); > + } > + > +out: > + uprobe_up_write_dup_mmap(); > +} > + > typedef bool (*filter_func_t)(struct uprobe_consumer *self, > enum uprobe_filter_ctx ctx, > struct mm_struct *mm); > @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self, > if (ret) > goto err_buffer; > > + if (tu->ref_ctr_offset) > + sdt_increment_ref_ctr(tu); > + > return 0; > > err_buffer: > @@ -981,6 +1134,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self, > > WARN_ON(!uprobe_filter_is_empty(&tu->filter)); > > + if (tu->ref_ctr_offset) > + sdt_decrement_ref_ctr(tu); > + > uprobe_unregister(tu->inode, tu->offset, &tu->consumer); > tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE; > > @@ -1425,6 +1581,8 @@ static __init int init_uprobe_trace(void) > /* Profile interface */ > trace_create_file("uprobe_profile", 0444, d_tracer, > NULL, &uprobe_profile_ops); > + > + uprobe_mmap_callback = trace_uprobe_mmap; > return 0; > } > > -- > 1.8.3.1 > -- Masami Hiramatsu