Date: Tue, 3 Jan 2023 15:50:17 -0300
From: Arnaldo Carvalho de Melo
To: Namhyung Kim
Cc: Thomas Richter, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, svens@linux.ibm.com, gor@linux.ibm.com, sumanthk@linux.ibm.com, hca@linux.ibm.com
Subject: Re: [PATCH] perf lock: Fix core dump in command perf lock contention
References: <20221230102627.2410847-1-tmricht@linux.ibm.com>

On Tue, Jan 03, 2023 at 08:45:34AM -0800, Namhyung Kim wrote:
> Hi Arnaldo,
>
> On Mon, Jan 2, 2023 at 7:33 AM Arnaldo Carvalho de Melo wrote:
> >
> > On Fri, Dec 30, 2022 at 11:26:27AM +0100, Thomas Richter wrote:
> > > The test case perf lock contention dumps core on s390.
> > > Run the following commands:
> > >  # ./perf lock record -- ./perf bench sched messaging
> > >  # Running 'sched/messaging' benchmark:
> > >  # 20 sender and receiver processes per group
> > >  # 10 groups == 400 processes run
> > >
> > >      Total time: 2.799 [sec]
> > >  [ perf record: Woken up 1 times to write data ]
> > >  [ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
> > >  #
> > >  # ./perf lock contention
> > >  Segmentation fault (core dumped)
> > >  #
> > >
> > > The function call stack is lengthy, here are the top 5 functions:
> > >  # gdb ./perf core.24048
> > >  GNU gdb (GDB) Fedora Linux 12.1-6.fc37
> > >  Copyright (C) 2022 Free Software Foundation, Inc.
> > >  Core was generated by `./perf lock contention'.
> > >  Program terminated with signal SIGSEGV, Segmentation fault.
> > >  #0  0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,
> > >      addr=1789230) at util/machine.c:3356
> > >  3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
> > >
> > >  (gdb) where
> > >  #0  0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28,\
> > >      addr=1789230) at util/machine.c:3356
> > >  #1  0x000000000109f244 in callchain_id (evsel=0x30313e0,\
> > >      sample=0x3ffea4f77d0) at builtin-lock.c:957
> > >  #2  0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290,\
> > >      addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) \
> > >      at builtin-lock.c:586
> > >  #3  0x000000000109f4d0 in report_lock_contention_begin_event \
> > >      (evsel=0x30313e0, sample=0x3ffea4f77d0)
> > >      at builtin-lock.c:1004
> > >  #4  0x00000000010a00ae in evsel__process_contention_begin \
> > >      (evsel=0x30313e0, sample=0x3ffea4f77d0)
> > >      at builtin-lock.c:1254
> > >  #5  0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, \
> > >      event=0x3ff85601ef8, sample=0x3ffea4f77d0,
> > >      evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
> > >      sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) \
> > >      at util/session.c:1523
> > >  .....
> > >
> > > The issue is in function machine__is_lock_function() in file
> > > ./util/machine.c line 3355:
> > >     /* should not fail from here */
> > >     sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end",
> > >                                               &kmap);
> > >     machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
> > >
> > > On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
> > > resulting pointer sym is set to NULL. Evaluating sym->start is then a NULL
> > > pointer access and generates the core dump.
> > >
> > > The reason why __sched_text_end is not in the symbol list on s390 is
> > > simple:
> > > The symbol list is created at perf start up with these function calls:
> > >   dso__load
> > >   +--> dso__load_vmlinux_path
> > >        +--> dso__load_vmlinux
> > >             +--> dso__load_sym
> > >                  +--> dso__load_sym_internal (reads kernel symbols)
> > >                       +--> symbols__fixup_end
> > >                       +--> symbols__fixup_duplicate
> > >
> > > The issue is in function symbols__fixup_duplicate(). It deletes all
> > > symbols that have the same address. On s390
> > >  # nm -g ~/linux/vmlinux| fgrep c68390
> > >  0000000000c68390 T __cpuidle_text_start
> > >  0000000000c68390 T __sched_text_end
> > >  #
> > > two symbols have identical addresses and __sched_text_end is considered
> > > a duplicate (in ascending sort order) and removed from the symbol list.
> > > Therefore it is missing and an invalid pointer reference occurs.
> > > The code checks for symbol __sched_text_start and, when it exists, assumes
> > > symbol __sched_text_end is also in the symbol table. However, this is
> > > not the case on s390.
> > >
> > > The same situation exists for symbol __lock_text_start:
> > >  0000000000c68770 T __cpuidle_text_end
> > >  0000000000c68770 T __lock_text_start
> > > This symbol is also removed from the symbol table but used in function
> > > machine__is_lock_function().
> > >
> > > To fix this and keep duplicate symbols in the symbol table, set
> > > symbol_conf.allow_aliases to true.
> > > This prevents the removal of duplicate
> > > symbols in function symbols__fixup_duplicate().
> > >
> > > Output After:
> > >  # ./perf lock contention
> > >   contended   total wait     max wait     avg wait         type   caller
> > >
> > >          48    124.39 ms    123.99 ms      2.59 ms      rwsem:W   unlink_anon_vmas+0x24a
> > >          47     83.68 ms     83.26 ms      1.78 ms      rwsem:W   free_pgtables+0x132
> > >           5     41.22 us     10.55 us      8.24 us      rwsem:W   free_pgtables+0x140
> > >           4     40.12 us     20.55 us     10.03 us      rwsem:W   copy_process+0x1ac8
> > >  #
> > >
> > > Fixes: cc2367eebb0c ("machine: Adopt is_lock_function() from builtin-lock.c")
> >
> > Humm, is that really the cset that introduces the problem? It just moves
> > things around; the cset that introduced the is_lock_function() function,
> > which assumed that __sched_text_end was always available, was:
> >
> > commit 0d2997f750d1de394231bc22768dab94a5b5db2f
> > Author: Namhyung Kim
> > Date:   Wed Jun 15 09:32:22 2022 -0700
> >
> >     perf lock: Look up callchain for the contended locks
> >
> > ---
> >
> > Right? Namhyung? Can you spot any problem in enabling duplicates as a
> > fix?
>
> Yep, I think that's the cset that introduced the problem.
> I'm fine with the fix.
>
> Acked-by: Namhyung Kim

Thanks, applied.

- Arnaldo
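
For readers following the thread, here is a minimal standalone sketch of the failure mode described in the patch: symbols sharing an address are dropped from a sorted table, so a later lookup by name returns NULL. This is not the perf code. struct sym, fixup_duplicate() and find_by_name() are hypothetical names made up for illustration, and the "keep the first entry per address" rule is a simplifying assumption that only mirrors the behaviour reported above for s390.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a symbol table entry. */
struct sym {
	unsigned long start;	/* symbol address */
	const char *name;
};

/*
 * Compact syms[] in place, dropping every entry whose address equals
 * that of the previously kept entry, unless allow_aliases is true.
 * syms[] must be sorted by ascending address. Returns the new count.
 */
size_t fixup_duplicate(struct sym *syms, size_t n, bool allow_aliases)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		/* Precondition: ascending address order. */
		assert(kept == 0 || syms[kept - 1].start <= syms[i].start);

		if (!allow_aliases && kept > 0 &&
		    syms[kept - 1].start == syms[i].start)
			continue;	/* alias of the entry kept before it */
		syms[kept++] = syms[i];
	}
	return kept;
}

/* Linear lookup by name; returns NULL when the symbol is absent. */
const struct sym *find_by_name(const struct sym *syms, size_t n,
			       const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(syms[i].name, name) == 0)
			return &syms[i];
	return NULL;
}
```

Fed the four symbols from the nm output above (two at c68390, two at c68770), fixup_duplicate() with allow_aliases set to false keeps two entries, and find_by_name() returns NULL for both __sched_text_end and __lock_text_start. A caller that dereferences that result unchecked crashes as in the backtrace. With allow_aliases set to true all four entries survive and both lookups succeed.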