From: Stephane Eranian
Date: Tue, 6 Nov 2018 07:00:13 -0800
Subject: Re: [PATCH 1/2] perf: Add munmap callback
To: "Liang, Kan"
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Borislav Petkov, Andi Kleen
References: <20181024151116.30935-1-kan.liang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 5, 2018 at 7:43 AM Liang, Kan wrote:
>
> On 11/5/2018 5:59 AM, Stephane Eranian wrote:
> > Hi Kan,
> >
> > I built a small test case for you to demonstrate the issue for code and data.
> > Compile the test program and then do:
> > For text:
> > $ perf record ./mmap
> > $ perf report -D | fgrep MMAP2
> >
> > The test program mmaps 2 pages, unmaps the second, and remaps 1 page
> > over the freed space.
> > If you look at the MMAP2 records, you will not be able to reconstruct
> > what happened, and perf will get confused should it try to symbolize
> > from the address range.
> >
> > With text:
> > PERF_RECORD_MMAP2 5937/5937: [0x400000(0x1000) @ 0 08:01 400938 824817672]: r-xp /home/eranian/mmap
> > PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000 00:00 0 0]: rwxp //anon
> > PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000 00:00 0 0]: rwxp //anon
> >                              ^^^^^^^^^^^^^^^^^^^^^^^^ captures the whole VMA but not the mapping
> >                              change in user space
> >
> > For data:
> > $ perf record -d ./mmap
> > $ perf report -D | fgrep MMAP2
> > With data:
> > PERF_RECORD_MMAP2 6430/6430: [0x400000(0x1000) @ 0 08:01 400938 3278843184]: r-xp /home/eranian/mmap
> > PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000 00:00 0 0]: rw-p //anon
> > PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000 00:00 0 0]: rw-p //anon
> >
> > Same test case with data.
> > Perf will think the entire 2 pages have been replaced when in fact
> > only the second has.
> > I believe the problem is likely to impact data and jitted code caches.
> >
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > #include <err.h>
> >
> > int main(int argc, char **argv)
> > {
> >     void *addr1, *addr2;
> >     size_t pgsz = sysconf(_SC_PAGESIZE);
> >     int n = 2;
> >     int ret;
> >     int c, mode = 0;
> >
> >     while ((c = getopt(argc, argv, "hd")) != -1) {
> >         switch (c) {
> >         case 'h':
> >             printf("[-h]\tget this help\n");
> >             printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
> >             return 0;
> >         case 'd':
> >             mode = PROT_EXEC;
> >             break;
> >         default:
> >             errx(1, "unknown option");
> >         }
> >     }
> >     /* default to data */
> >     if (mode == 0)
> >         mode = PROT_WRITE;
> >
> >     /*
> >      * mmap 2 contiguous pages
> >      */
> >     addr1 = mmap(NULL, n * pgsz, PROT_READ | mode, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> >     if (addr1 == (void *)MAP_FAILED)
> >         err(1, "mmap 1 failed");
> >
> >     printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);
> >
> >     /*
> >      * unmap only the second page
> >      */
> >     ret = munmap(addr1 + pgsz, pgsz);
> >     if (ret == -1)
> >         err(1, "munmap failed");
> >
> >     /*
> >      * mmap 1 page at the location of the unmapped page (should reuse virtual space).
> >      * This creates a contiguous region built from two mmaps and
> >      * potentially two different sources, especially with jitted runtimes.
> >      */
>
> The two mmaps are both anon. As I understand it, we cannot symbolize
> from an anonymous address, can we?

Can't we build the same test case using an actual file mapping (both
mmaps from the same file)?

> If we cannot, why do we have to distinguish between them? I think we do
> not need to know their sources for symbolization.
>
> As I understand it, only --jit can inject an MMAP event that tags an
> anon region. Perf can symbolize the address after that. Then the unmap
> is needed.
>

Yes, perf inject --jit injects timestamped MMAP2 records to cover the
jitted regions, which helps symbolize anons.
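
For the file-backed variant suggested above, a minimal sketch could look like the following. It has not been run under perf. The helper name file_backed_remap(), the temporary file under /tmp, and the use of MAP_FIXED to force the address are all assumptions made for illustration and are not part of the original test program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map two pages of a temporary
 * file, unmap the second page, then map the file's second page back at the
 * same address. Returns 0 if the remap landed at addr1 + pgsz, -1 otherwise.
 */
int file_backed_remap(int prot)
{
	char path[] = "/tmp/mmap-file-XXXXXX";	/* assumed scratch location */
	long pg = sysconf(_SC_PAGESIZE);
	size_t pgsz;
	char *addr1, *addr2;
	int fd, ret = -1;

	assert(pg > 0);
	pgsz = (size_t)pg;

	fd = mkstemp(path);
	if (fd == -1)
		return -1;

	/* make the file two pages long so both mappings are backed */
	if (ftruncate(fd, (off_t)(2 * pgsz)) == -1)
		goto out;

	/* 1. map both pages of the file */
	addr1 = mmap(NULL, 2 * pgsz, prot, MAP_PRIVATE, fd, 0);
	if (addr1 == MAP_FAILED)
		goto out;

	/* 2. unmap only the second page */
	if (munmap(addr1 + pgsz, pgsz) == -1) {
		munmap(addr1, 2 * pgsz);
		goto out;
	}

	/* 3. map the file's second page back over the freed range */
	addr2 = mmap(addr1 + pgsz, pgsz, prot, MAP_PRIVATE | MAP_FIXED,
		     fd, (off_t)pgsz);
	if (addr2 == MAP_FAILED) {
		munmap(addr1, pgsz);
		goto out;
	}
	if (addr2 == addr1 + pgsz)
		ret = 0;

	munmap(addr1, 2 * pgsz);
out:
	close(fd);
	unlink(path);
	return ret;
}
```

Calling file_backed_remap(PROT_READ | PROT_EXEC) from a small main() under perf record would correspond to the text case, provided /tmp is not mounted noexec. Whether the kernel then reports the two mappings as one merged MMAP2 record is exactly what such a test would have to show.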
> Thanks,
> Kan
> >     addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
> >                  MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> >
> >     printf("addr2=%p\n", addr2);
> >
> >     if (addr2 == (void *)MAP_FAILED)
> >         err(1, "mmap 2 failed");
> >     if (addr2 != (addr1 + pgsz))
> >         errx(1, "wrong mmap2 address");
> >
> >     sleep(1);
> >
> >     return 0;
> > }
> >
> > On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan wrote:
> >>
> >> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
> >>> The need for this new record type extends beyond physical address conversions
> >>> and PEBS. A long while ago, someone reported issues with symbolization related
> >>> to perf lacking munmap tracking. It had to do with vma merging. I think the
> >>> sequence of mmaps was as follows in the problematic case:
> >>> 1. addr1 = mmap(8192);
> >>> 2. munmap(addr1 + 4096, 4096)
> >>> 3. addr2 = mmap(addr1+4096, 4096)
> >>>
> >>> If successful, that yields addr2 = addr1 + 4096 (could also get the
> >>> same without forcing the address).
> >>>
> >>> In that case, if I recall correctly, the vma for the 1st mapping (now at
> >>> 4k) and that of the 2nd mapping (4k) get merged into a single 8k vma,
> >>> and this is what perf_events will record for PERF_RECORD_MMAP.
> >>> On the perf tool side, it is assumed that if two timestamped mappings
> >>> overlap, then the latter overrides the former. In this case, perf would
> >>> lose the mapping of the first 4kb and assume all symbols come from the
> >>> 2nd mapping. Hopefully I got the scenario right. If so, then you'd
> >>> need PERF_RECORD_UNMAP to disambiguate, assuming the perf tool is
> >>> modified accordingly.
> >>>
> >>
> >> Hi Stephane and Peter,
> >>
> >> I went through the link (https://lkml.org/lkml/2017/1/27/452). I'm trying
> >> to understand the problematic case.
> >>
> >> It looks like the issue can only be triggered by perf inject --jit,
> >> because it can inject extra MMAP events.
> >> As I understand it, the Linux kernel only tries to merge VMAs if they are
> >> both anon or both from the same file. --jit breaks that rule and makes
> >> the merged VMA partly anon, partly file-backed.
> >> Now there is a new MMAP event whose range covers the modified VMA.
> >> Without the help of a MUNMAP event, the perf tool has no idea whether the
> >> new one is a newly merged VMA (modified VMA + a new VMA) or a brand new VMA.
> >> The current code simply overwrites the modified VMAs. The VMA
> >> information that --jit injected may be lost, and the symbolization may
> >> be lost as well.
> >>
> >> Apart from --jit, the VMA information should be consistent between the
> >> kernel and the perf tool. We shouldn't observe the problem, and a MUNMAP
> >> event is not needed.
> >>
> >> Is my understanding correct?
> >>
> >> Do you have a test case for the problem?
> >>
> >> Thanks,
> >> Kan
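
The "latter overrides the former" rule discussed in the quoted messages can be illustrated with a toy model. This is only a sketch and not perf's actual map handling: the structures, the function names (table_map, table_unmap, table_lookup, replay), the src tags 1 and 2, and the policy applied when an unmap record is available are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPS 16
#define PAGE 4096ULL
#define BASE 0x7f7c01019000ULL	/* address taken from the MMAP2 output above */

struct map { uint64_t start, end; int src; };	/* covers [start, end) */
struct map_table { struct map maps[MAX_MAPS]; size_t n; };

/* Drop [start, end) from the table, trimming or splitting overlapping maps. */
void table_unmap(struct map_table *t, uint64_t start, uint64_t end)
{
	struct map out[MAX_MAPS + 1];
	size_t n = 0;

	for (size_t i = 0; i < t->n; i++) {
		struct map m = t->maps[i];

		if (m.end <= start || m.start >= end) {
			out[n++] = m;		/* no overlap: keep as is */
			continue;
		}
		if (m.start < start)		/* keep the part before the hole */
			out[n++] = (struct map){ m.start, start, m.src };
		if (m.end > end)		/* keep the part after the hole */
			out[n++] = (struct map){ end, m.end, m.src };
	}
	assert(n <= MAX_MAPS);
	for (size_t i = 0; i < n; i++)
		t->maps[i] = out[i];
	t->n = n;
}

/* A later mapping overrides whatever it overlaps. */
void table_map(struct map_table *t, uint64_t start, uint64_t end, int src)
{
	table_unmap(t, start, end);
	assert(t->n < MAX_MAPS);
	t->maps[t->n++] = (struct map){ start, end, src };
}

/* Return the source tag of the map containing addr, or -1 if none. */
int table_lookup(const struct map_table *t, uint64_t addr)
{
	for (size_t i = 0; i < t->n; i++)
		if (addr >= t->maps[i].start && addr < t->maps[i].end)
			return t->maps[i].src;
	return -1;
}

/* Replay the mmap/munmap/mmap sequence and report who owns addr. */
int replay(int have_unmap_record, uint64_t addr)
{
	struct map_table t = { .n = 0 };
	uint64_t new_start = BASE, new_end = BASE + 2 * PAGE;

	table_map(&t, BASE, BASE + 2 * PAGE, 1);	/* 1. mmap of 2 pages */
	if (have_unmap_record) {
		/* 2. the munmap of the second page is visible to the tool */
		table_unmap(&t, BASE + PAGE, BASE + 2 * PAGE);
		/* assumed policy: only the freed range can hold new content */
		new_start = BASE + PAGE;
	}
	/* 3. the record for the second mmap describes the merged 2-page VMA */
	table_map(&t, new_start, new_end, 2);
	return table_lookup(&t, addr);
}
```

In this model replay(0, BASE) returns 2, so the first page is wrongly attributed to the second mapping, while replay(1, BASE) returns 1 and replay(1, BASE + PAGE) returns 2. Clipping the new record to the freed range is just one conceivable way a modified tool could use an unmap record.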