Subject: Re: [PATCH 1/2] perf: Add munmap callback
To: Stephane Eranian
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Borislav Petkov, Andi Kleen
References: <20181024151116.30935-1-kan.liang@linux.intel.com>
From: "Liang, Kan"
Date: Mon, 5 Nov 2018 10:43:41 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/5/2018 5:59 AM, Stephane Eranian wrote:
> Hi Kan,
>
> I built a small test case for you to demonstrate the issue for code and data.
> Compile the test program and then do:
> For text:
> $ perf record ./mmap
> $ perf report -D | fgrep MMAP2
>
> The test program mmaps 2 pages, unmaps the second, and remaps 1 page
> over the freed space.
> If you look at the MMAP2 records, you will not be able to reconstruct
> what happened, and perf will get confused should it try to symbolize
> from the address range.
>
> With text:
> PERF_RECORD_MMAP2 5937/5937: [0x400000(0x1000) @ 0 08:01 400938 824817672]: r-xp /home/eranian/mmap
> PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000 00:00 0 0]: rwxp //anon
> PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000 00:00 0 0]: rwxp //anon
>                              ^^^^^^^^^^^^^^^^^^^^^^^^ captures the whole VMA but not the mapping
>                              change in user space
>
> For data:
> $ perf record -d ./mmap
> $ perf report -D | fgrep MMAP2
>
> With data:
> PERF_RECORD_MMAP2 6430/6430: [0x400000(0x1000) @ 0 08:01 400938 3278843184]: r-xp /home/eranian/mmap
> PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000 00:00 0 0]: rw-p //anon
> PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000 00:00 0 0]: rw-p //anon
>
> Same test case with data.
> Perf will think the entire 2 pages have been replaced when in fact
> only the second has.
> I believe the problem is likely to impact data and jitted code cache
>
> #include
> #include
> #include
> #include
> #include
> #include
>
> int main(int argc, char **argv)
> {
>         void *addr1, *addr2;
>         size_t pgsz = sysconf(_SC_PAGESIZE);
>         int n = 2;
>         int ret;
>         int c, mode = 0;
>
>         while ((c = getopt(argc, argv, "hd")) != -1) {
>                 switch (c) {
>                 case 'h':
>                         printf("[-h]\tget this help\n");
>                         printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
>                         return 0;
>                 case 'd':
>                         mode = PROT_EXEC;
>                         break;
>                 default:
>                         errx(1, "unknown option");
>                 }
>         }
>         /* default to data */
>         if (mode == 0)
>                 mode = PROT_WRITE;
>
>         /*
>          * mmap 2 contiguous pages
>          */
>         addr1 = mmap(NULL, n * pgsz, PROT_READ| mode, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         if (addr1 == (void *)MAP_FAILED)
>                 err(1, "mmap 1 failed");
>
>         printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);
>
>         /*
>          * unmap only the second page
>          */
>         ret = munmap(addr1 + pgsz, pgsz);
>         if (ret == -1)
>                 err(1, "munmap failed");
>
>         /*
>          * mmap 1 page at the location of the unmap page (should reuse virtual space)
>          * This creates a continuous region built from two mmaps and potentially two different sources
>          * especially with jitted runtimes
>          */

The two mmaps are both anon. As I understand it, we cannot symbolize
from an anonymous address, can we? If we cannot, why do we have to
distinguish between them? I think we do not need to know their sources
for symbolization.

As I understand it, only --jit can inject an MMAP event, which tags an
anon mapping. Perf can symbolize the address after that. In that case
the unmap is needed.

Thanks,
Kan

>         addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
>                      MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>
>         printf("addr2=%p\n", addr2);
>
>         if (addr2 == (void *)MAP_FAILED)
>                 err(1, "mmap 2 failed");
>         if (addr2 != (addr1 + pgsz))
>                 errx(1, "wrong mmap2 address");
>
>         sleep(1);
>
>         return 0;
> }
>
> On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan wrote:
>>
>>
>>
>> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
>>> The need for this new record type extends beyond physical address conversions
>>> and PEBS. A long while ago, someone reported issues with symbolization related
>>> to perf lacking munmap tracking. It had to do with vma merging. I think the
>>> sequence of mmaps was as follows in the problematic case:
>>> 1. addr1 = mmap(8192);
>>> 2. munmap(addr1 + 4096, 4096)
>>> 3. addr2 = mmap(addr1+4096, 4096)
>>>
>>> If successful, that yields addr2 = addr1 + 4096 (could also get the
>>> same without forcing the address).
>>>
>>> In that case, if I recall correctly, the vma for the 1st mapping (now at
>>> 4k) and that of the 2nd mapping (4k) get merged into a single 8k vma,
>>> and this is what perf_events will record for PERF_RECORD_MMAP.
>>> On the perf tool side, it is assumed that if two timestamped mappings
>>> overlap, then the latter overrides the former. In this case, perf would
>>> lose the mapping of the first 4kb and assume all symbols come from the
>>> 2nd mapping. Hopefully I got the scenario right. If so, then you'd
>>> need PERF_RECORD_UNMAP to disambiguate, assuming the perf tool is
>>> modified accordingly.
>>>
>>
>> Hi Stephane and Peter,
>>
>> I went through the link (https://lkml.org/lkml/2017/1/27/452). I'm trying
>> to understand the problematic case.
>>
>> It looks like the issue can only be triggered by perf inject --jit,
>> because it can inject extra MMAP events.
>> As I understand it, the Linux kernel only tries to merge VMAs if they
>> are both anon or both from the same file. --jit breaks the rule, and
>> makes the merged VMA partly from anon, partly from file.
>> Now, there is a new MMAP event whose range covers the modified VMA.
>> Without the help of a MUNMAP event, the perf tool has no idea if the new
>> one is a newly merged VMA (modified VMA + a new VMA) or a brand new VMA.
>> The current code simply overwrites the modified VMAs. The VMA
>> information which --jit injected may be lost. The symbolization may be
>> lost as well.
>>
>> Except for --jit, the VMA information should be consistent between the
>> kernel and the perf tool. We shouldn't observe the problem. The MUNMAP
>> event is not needed.
>>
>> Is my understanding correct?
>>
>> Do you have a test case for the problem?
>>
>> Thanks,
>> Kan
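
A minimal sketch of the ambiguity discussed in this thread; it is not
perf's actual code. It assumes a toy model in which each of the two pages
from the test case is attributed to the id of the record that mapped it.
pt_mmap_override() follows the rule described in the quoted thread (a
later overlapping mapping overrides the former). pt_munmap() and
pt_mmap_fill() are hypothetical handlers for a tool that also receives
unmap records and attributes only currently unmapped pages to a new
record. All names here are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES   2      /* the test case maps two pages */
#define NO_OWNER (-1)   /* page not attributed to any record */

/* Toy model: owner[i] is the id of the record page i is attributed to. */
struct page_table {
	int owner[NPAGES];
};

void pt_init(struct page_table *pt)
{
	for (size_t i = 0; i < NPAGES; i++)
		pt->owner[i] = NO_OWNER;
}

/* Rule from the thread: a later overlapping mapping overrides the former. */
void pt_mmap_override(struct page_table *pt, size_t first, size_t count, int id)
{
	for (size_t i = first; i < first + count && i < NPAGES; i++)
		pt->owner[i] = id;
}

/* Hypothetical handler for an unmap record: forget the affected pages. */
void pt_munmap(struct page_table *pt, size_t first, size_t count)
{
	for (size_t i = first; i < first + count && i < NPAGES; i++)
		pt->owner[i] = NO_OWNER;
}

/*
 * Hypothetical policy made possible by unmap records (an assumption, not
 * an existing perf behavior): a new record claims only unattributed pages.
 */
void pt_mmap_fill(struct page_table *pt, size_t first, size_t count, int id)
{
	for (size_t i = first; i < first + count && i < NPAGES; i++)
		if (pt->owner[i] == NO_OWNER)
			pt->owner[i] = id;
}

/* Replay the two MMAP2 records from the test output; return a page's owner. */
int replay_without_unmap(size_t page)
{
	struct page_table pt;

	pt_init(&pt);
	pt_mmap_override(&pt, 0, 2, 1);  /* record 1: both pages */
	pt_mmap_override(&pt, 0, 2, 2);  /* record 2: merged VMA, both pages */
	return page < NPAGES ? pt.owner[page] : NO_OWNER;
}

/* Same stream with a hypothetical unmap record for the second page. */
int replay_with_unmap(size_t page)
{
	struct page_table pt;

	pt_init(&pt);
	pt_mmap_override(&pt, 0, 2, 1);  /* record 1: both pages */
	pt_munmap(&pt, 1, 1);            /* unmap record: second page only */
	pt_mmap_fill(&pt, 0, 2, 2);      /* record 2: merged VMA, both pages */
	return page < NPAGES ? pt.owner[page] : NO_OWNER;
}
```

With only the two overlapping MMAP2 records, replay_without_unmap(0)
attributes the first page to record 2. With the hypothetical unmap record
in between, replay_with_unmap(0) keeps the first page attributed to
record 1 and only the second page moves to record 2.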