Date: Wed, 14 Mar 2018 23:15:40 +0900
From: Masami Hiramatsu
To: Ravi Bangoria
Cc: oleg@redhat.com, peterz@infradead.org, srikar@linux.vnet.ibm.com,
	acme@kernel.org, ananth@linux.vnet.ibm.com, akpm@linux-foundation.org,
	alexander.shishkin@linux.intel.com, alexis.berlemont@gmail.com,
	corbet@lwn.net, dan.j.williams@intel.com, gregkh@linuxfoundation.org,
	huawei.libin@huawei.com, hughd@google.com, jack@suse.cz,
	jglisse@redhat.com, jolsa@redhat.com, kan.liang@intel.com,
	kirill.shutemov@linux.intel.com, kjlx@templeofstupid.com,
	kstewart@linuxfoundation.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, mhocko@suse.com, milian.wolff@kdab.com,
	mingo@redhat.com, namhyung@kernel.org, naveen.n.rao@linux.vnet.ibm.com,
	pc@us.ibm.com, pombredanne@nexb.com, rostedt@goodmis.org,
	tglx@linutronix.de, tmricht@linux.vnet.ibm.com, willy@infradead.org,
	yao.jin@linux.intel.com, fengguang.wu@intel.com
Subject: Re: [PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
Message-Id: <20180314231540.b98c74a153255f59f54ebc46@kernel.org>
In-Reply-To: <20180313125603.19819-7-ravi.bangoria@linux.vnet.ibm.com>
References: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>
	<20180313125603.19819-7-ravi.bangoria@linux.vnet.ibm.com>
X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Mar 2018 18:26:01 +0530
Ravi Bangoria wrote:

> For tiny binaries/libraries, different mmap regions points to the
> same file portion. In such cases, we may increment reference counter
> multiple times. But while de-registration, reference counter will get
> decremented only by once leaving reference counter > 0 even if no one
> is tracing on that marker.
>
> Ensure increment and decrement happens in sync by keeping list of
> mms in trace_uprobe. Increment reference counter only if mm is not
> present in the list and decrement only if mm is present in the list.
>
> Example
>
>   # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events
>
> Before patch:
>
>   # perf stat -a -e sdt_tick:loop2
>   # /tmp/tick
>   # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
>   0000000: 02 .
>
>   # pkill perf
>   # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
>   0000000: 01 .
>
> After patch:
>
>   # perf stat -a -e sdt_tick:loop2
>   # /tmp/tick
>   # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
>   0000000: 01 .
>
>   # pkill perf
>   # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
>   0000000: 00 .
>
> Signed-off-by: Ravi Bangoria
> ---
>  kernel/trace/trace_uprobe.c | 105 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 103 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index b6c9b48..9bf3f7a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -50,6 +50,11 @@ struct trace_uprobe_filter {
>  	struct list_head perf_events;
>  };
>
> +struct sdt_mm_list {
> +	struct mm_struct *mm;
> +	struct sdt_mm_list *next;
> +};

Oh, please use struct list_head instead of defining your own pointer-chain :(

> +
>  /*
>   * uprobe event core functions
>   */
> @@ -61,6 +66,8 @@ struct trace_uprobe {
>  	char *filename;
>  	unsigned long offset;
>  	unsigned long ref_ctr_offset;
> +	struct sdt_mm_list *sml;
> +	struct rw_semaphore sml_rw_sem;

BTW, is there any reason to use rw_semaphore? (mutex doesn't fit?)

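To illustrate the list_head suggestion above, here is a rough userspace
sketch of how the per-probe mm tracking could look on a doubly linked
list. This is only an illustration under stated assumptions, not a
proposed replacement for the patch: the small list helpers stand in for
<linux/list.h>, calloc()/free() stand in for kzalloc()/kfree(), and the
names sdt_mm_entry, sdt_find_mm(), sdt_add_mm(), sdt_del_mm() and
sdt_flush_mm() are hypothetical. Locking is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever list it is on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mm_struct;	/* opaque here; only pointer identity matters */

/* Hypothetical replacement for struct sdt_mm_list. */
struct sdt_mm_entry {
	struct list_head list;
	struct mm_struct *mm;
};

/* Return the entry tracking mm, or NULL if mm is not on the list. */
static struct sdt_mm_entry *sdt_find_mm(struct list_head *head,
					struct mm_struct *mm)
{
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct sdt_mm_entry *e =
			list_entry(pos, struct sdt_mm_entry, list);

		if (e->mm == mm)
			return e;
	}
	return NULL;
}

/* Track mm; returns false if already tracked or allocation failed. */
static bool sdt_add_mm(struct list_head *head, struct mm_struct *mm)
{
	struct sdt_mm_entry *e;

	if (sdt_find_mm(head, mm))
		return false;

	e = calloc(1, sizeof(*e));
	if (!e)
		return false;

	e->mm = mm;
	list_add(&e->list, head);
	return true;
}

/* Stop tracking mm; returns false if it was not tracked. */
static bool sdt_del_mm(struct list_head *head, struct mm_struct *mm)
{
	struct sdt_mm_entry *e = sdt_find_mm(head, mm);

	if (!e)
		return false;

	list_del(&e->list);
	free(e);
	return true;
}

/* Drop every remaining entry, leaving an empty list. */
static void sdt_flush_mm(struct list_head *head)
{
	while (head->next != head) {
		struct sdt_mm_entry *e =
			list_entry(head->next, struct sdt_mm_entry, list);

		list_del(&e->list);
		free(e);
	}
}
```

With this shape the special-casing of the first element in the delete
path goes away, and in the kernel the open-coded loops would normally
become list_for_each_entry() / list_for_each_entry_safe().
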
Thank you,

>  	unsigned long nhit;
>  	struct trace_probe tp;
>  };
> @@ -274,6 +281,7 @@ static inline bool is_ret_probe(struct trace_uprobe *tu)
>  	if (is_ret)
>  		tu->consumer.ret_handler = uretprobe_dispatcher;
>  	init_trace_uprobe_filter(&tu->filter);
> +	init_rwsem(&tu->sml_rw_sem);
>  	return tu;
>
>  error:
> @@ -921,6 +929,74 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
>  	return trace_handle_return(s);
>  }
>
> +static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
> +{
> +	struct sdt_mm_list *tmp = tu->sml;
> +
> +	if (!tu->sml || !mm)
> +		return false;
> +
> +	while (tmp) {
> +		if (tmp->mm == mm)
> +			return true;
> +		tmp = tmp->next;
> +	}
> +
> +	return false;
> +}
> +
> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
> +{
> +	struct sdt_mm_list *tmp;
> +
> +	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
> +	if (!tmp)
> +		return;
> +
> +	tmp->mm = mm;
> +	tmp->next = tu->sml;
> +	tu->sml = tmp;
> +}
> +
> +static void sdt_del_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
> +{
> +	struct sdt_mm_list *prev, *curr;
> +
> +	if (!tu->sml)
> +		return;
> +
> +	if (tu->sml->mm == mm) {
> +		curr = tu->sml;
> +		tu->sml = tu->sml->next;
> +		kfree(curr);
> +		return;
> +	}
> +
> +	prev = tu->sml;
> +	curr = tu->sml->next;
> +	while (curr) {
> +		if (curr->mm == mm) {
> +			prev->next = curr->next;
> +			kfree(curr);
> +			return;
> +		}
> +		prev = curr;
> +		curr = curr->next;
> +	}
> +}
> +
> +static void sdt_flush_mm_list(struct trace_uprobe *tu)
> +{
> +	struct sdt_mm_list *next, *curr = tu->sml;
> +
> +	while (curr) {
> +		next = curr->next;
> +		kfree(curr);
> +		curr = next;
> +	}
> +	tu->sml = NULL;
> +}
> +
>  static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
>  {
>  	unsigned long vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
> @@ -989,17 +1065,25 @@ static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
>  	if (IS_ERR(info))
>  		goto out;
>
> +	down_write(&tu->sml_rw_sem);
>  	while (info) {
> +		if (sdt_check_mm_list(tu, info->mm))
> +			goto cont;
> +
>  		down_write(&info->mm->mmap_sem);
>
>  		vma = sdt_find_vma(info->mm, tu);
>  		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
> -		sdt_update_ref_ctr(info->mm, vaddr, 1);
> +		if (!sdt_update_ref_ctr(info->mm, vaddr, 1))
> +			sdt_add_mm_list(tu, info->mm);
>
>  		up_write(&info->mm->mmap_sem);
> +
> +cont:
>  		mmput(info->mm);
>  		info = uprobe_free_map_info(info);
>  	}
> +	up_write(&tu->sml_rw_sem);
>
>  out:
>  	uprobe_end_dup_mmap();
> @@ -1020,8 +1104,16 @@ void trace_uprobe_mmap_callback(struct vm_area_struct *vma)
>  		    !trace_probe_is_enabled(&tu->tp))
>  			continue;
>
> +		down_write(&tu->sml_rw_sem);
> +		if (sdt_check_mm_list(tu, vma->vm_mm))
> +			goto cont;
> +
>  		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
> -		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
> +		if (!sdt_update_ref_ctr(vma->vm_mm, vaddr, 1))
> +			sdt_add_mm_list(tu, vma->vm_mm);
> +
> +cont:
> +		up_write(&tu->sml_rw_sem);
>  	}
>  	mutex_unlock(&uprobe_lock);
>  }
> @@ -1038,7 +1130,11 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
>  	if (IS_ERR(info))
>  		goto out;
>
> +	down_write(&tu->sml_rw_sem);
>  	while (info) {
> +		if (!sdt_check_mm_list(tu, info->mm))
> +			goto cont;
> +
>  		down_write(&info->mm->mmap_sem);
>
>  		vma = sdt_find_vma(info->mm, tu);
> @@ -1046,9 +1142,14 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
>  		sdt_update_ref_ctr(info->mm, vaddr, -1);
>
>  		up_write(&info->mm->mmap_sem);
> +		sdt_del_mm_list(tu, info->mm);
> +
> +cont:
>  		mmput(info->mm);
>  		info = uprobe_free_map_info(info);
>  	}
> +	sdt_flush_mm_list(tu);
> +	up_write(&tu->sml_rw_sem);
>
>  out:
>  	uprobe_end_dup_mmap();
> --
> 1.8.3.1
>

--
Masami Hiramatsu