Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754809Ab2BDCY5 (ORCPT ); Fri, 3 Feb 2012 21:24:57 -0500 Received: from mail3.caviumnetworks.com ([12.108.191.235]:15420 "EHLO mail3.caviumnetworks.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753381Ab2BDCY4 (ORCPT ); Fri, 3 Feb 2012 21:24:56 -0500 Message-ID: <4F2C96F7.9070307@cavium.com> Date: Fri, 03 Feb 2012 18:24:55 -0800 From: David Daney User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10 MIME-Version: 1.0 To: Dan McGee CC: linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo Subject: Re: [perf] perf top segfaulting References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 04 Feb 2012 02:24:55.0952 (UTC) FILETIME=[2F37F900:01CCE2E4] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7637 Lines: 159 On 02/03/2012 04:45 PM, Dan McGee wrote: > On i686, version 3.2-2, but looks like annotate.c hasn't changed much > since. It sometimes happens within 5 seconds of starting perf, > sometimes much later, but almost always if I leave it running I well > come back to it having segfaulted. When ran with gdb here it took > about 3 minutes; I had a 5 second segfault and a 5 minute segfault > before and after this run as well. I'm not sure what triggers it other > than it isn't user input, as I can start `perf top`, not touch it, and > it will eventually segfault. > I have seen the same thing (basically the same stack trace), so I think what I see is probably closely related. My failures however are on mips64 based systems. My debugging suggests that this happens when the ABIs used by the running processes are heterogeneous (A mixture of 32-bit and 64-bit processes). What I see is that all processes use a library with a common name, but differing in paths (/lib32/libc-2.11.3.so and /lib64/libc-2.11.3.so for example). It looks like perf is confusing the offsets it caches from one library to look up information in the other and since the symbols are in different locations, the resulting erroneous address calculations result in accesses to unmapped portions of perf's address space and you get SIGSEGV. I haven't dug into the code enough to suggest a fix, but I think that at a high hand-waving level, this is what is happening. I have never observed the failure when using only a single ABI on the system David Daney > -Dan > > Program received signal SIGSEGV, Segmentation fault. > symbol__inc_addr_samples (sym=0x91ca9e8, map=0x9153010, evidx=0, > addr=385376) at util/annotate.c:73 > 73 util/annotate.c: No such file or directory. > (gdb) bt > #0 symbol__inc_addr_samples (sym=0x91ca9e8, map=0x9153010, evidx=0, > addr=385376) at util/annotate.c:73 > #1 0x08066593 in record_precise_ip (ip=, counter=0, > he=) at builtin-top.c:223 > #2 perf_event__process_sample (session=0x827bf50, sample=0xbffff9f4, > evsel=, event=) at builtin-top.c:801 > #3 perf_session__mmap_read_idx (self=0x827bf50, idx=0) at builtin-top.c:825 > #4 0x08068489 in perf_session__mmap_read (self=0x827bf50) at builtin-top.c:839 > #5 __cmd_top () at builtin-top.c:1003 > #6 cmd_top (argc=, argv=0xbffffd38, prefix=0x0) at > builtin-top.c:1274 > #7 0x08055e90 in run_builtin (p=0x80f2624, argc=1, argv=0xbffffd38) > at perf.c:286 > #8 0x08055651 in handle_internal_command (argv=0xbffffd38, argc=1) at > perf.c:358 > #9 run_argv (argv=0xbffffbd8, argcp=0xbffffbdc) at perf.c:402 > #10 main (argc=1, argv=0xbffffd38) at perf.c:512 > (gdb) bt full > #0 symbol__inc_addr_samples (sym=0x91ca9e8, map=0x9153010, evidx=0, > addr=385376) at util/annotate.c:73 > offset = 4294557696 > h = 0xb6a8101c > __func__ = "symbol__inc_addr_samples" > #1 0x08066593 in record_precise_ip (ip=, counter=0, > he=) at builtin-top.c:223 > sym = 0x91ca9e8 > #2 perf_event__process_sample (session=0x827bf50, sample=0xbffff9f4, > evsel=, event=) at builtin-top.c:801 > he = 0xfff9c000 > parent = 0x0 > machine = 0x0 > err = > origin = > ip = 3076591968 > al = {thread = 0x91fa360, map = 0x91fa650, sym = 0x91ca9e8, > addr = 794976, level = 46 '.', filtered = false, cpumode = 2 '\002', > cpu = -1} > #3 perf_session__mmap_read_idx (self=0x827bf50, idx=0) at builtin-top.c:825 > sample = {ip = 3076591968, pid = 16947, tid = 16947, time = > 18446744073709551615, addr = 0, id = 18446744073709551615, stream_id = > 18446744073709551615, period = 1057945, > cpu = 4294967295, raw_size = 4294967295, raw_data = 0x0, > callchain = 0x8} > evsel = > event = > ret = -1230499812 > __PRETTY_FUNCTION__ = "perf_session__mmap_read_idx" > #4 0x08068489 in perf_session__mmap_read (self=0x827bf50) at builtin-top.c:839 > i = > #5 __cmd_top () at builtin-top.c:1003 > hits = > thread = 3075533632 > #6 cmd_top (argc=, argv=0xbffffd38, prefix=0x0) at > builtin-top.c:1274 > pos = > status = -12 > #7 0x08055e90 in run_builtin (p=0x80f2624, argc=1, argv=0xbffffd38) > at perf.c:286 > status = > st = {st_dev = 13258589575124611012, __pad1 = 49030, __st_ino > = 3087003588, st_mode = 3086926771, st_nlink = 3087006384, st_uid = > 3079435296, st_gid = 1, st_rdev = 1, > __pad2 = 256, st_size = -4611691086352719864, st_blksize = > -1073742952, st_blocks = 577833303592140800, st_atim = {tv_sec = > 135210216, tv_nsec = 1}, st_mtim = { > tv_sec = 135105751, tv_nsec = -1}, st_ctim = {tv_sec = > -1213274024, tv_nsec = 0}, st_ino = 135811744} > #8 0x08055651 in handle_internal_command (argv=0xbffffd38, argc=1) at > perf.c:358 > p = 0x80f2624 > cmd = 0xbffffe6b "top" > i = > commands = {{cmd = 0x80d20f7 "buildid-cache", fn = 0x805e8a0 > , option = 0}, {cmd = 0x80ce581 "buildid-list", fn > = 0x805e320, > option = 0}, {cmd = 0x80ce58e "diff", fn = 0x8058630 > , option = 0}, {cmd = 0x80ce593 "evlist", fn = 0x8058a60 > , option = 0}, { > cmd = 0x80ce3ea "help", fn = 0x8059420, option > = 0}, {cmd = 0x80d13b1 "list", fn = 0x805e920, option = 0}, > {cmd = 0x80d6b7c "record", > fn = 0x8060360, option = 0}, {cmd = 0x80d5a46 > "report", fn = 0x8060d90, option = 0}, {cmd = 0x80ce59a > "bench", fn = 0x8056850, > option = 0}, {cmd = 0x80ce5a0 "stat", fn = 0x8062950 > , option = 0}, {cmd = 0x80ce5a5 "timechart", fn = 0x8065550 > , option = 0}, { > cmd = 0x80ce5af "top", fn = 0x8067de0, option = > 0}, {cmd = 0x80ceb3b "annotate", fn = 0x8056350, option > = 0}, {cmd = 0x80ce3f1 "version", > fn = 0x807c780, option = 0}, {cmd = > 0x80d5b37 "script", fn = 0x806a5f0, option = 0}, {cmd = > 0x80ce5b3 "sched", fn = 0x805c5d0, > option = 0}, {cmd = 0x80ce5b9 "probe", fn = 0x806b740 > , option = 0}, {cmd = 0x80ce5bf "kmem", fn = 0x806cf80 > , option = 0}, { > cmd = 0x80d9295 "lock", fn = 0x806e180, option > = 0}, {cmd = 0x80ce5c4 "kvm", fn = 0x806eaf0, option = 0}, > {cmd = 0x80ce5c8 "test", > fn = 0x8071880, option = 0}, {cmd = 0x80ce5cd > "inject", fn = 0x8071ed0, option = 0}} > #9 run_argv (argv=0xbffffbd8, argcp=0xbffffbdc) at perf.c:402 > done_alias = 0 > #10 main (argc=1, argv=0xbffffd38) at perf.c:512 > done_help = 0 > was_alias = 0 > cmd = 0xbffffe6b "top" > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/