Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752933AbaDRPXq (ORCPT ); Fri, 18 Apr 2014 11:23:46 -0400 Received: from merlin.infradead.org ([205.233.59.134]:59408 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752527AbaDRPXf (ORCPT ); Fri, 18 Apr 2014 11:23:35 -0400 Date: Fri, 18 Apr 2014 17:23:14 +0200 From: Peter Zijlstra To: Vince Weaver Cc: Ingo Molnar , linux-kernel@vger.kernel.org, Thomas Gleixner , Steven Rostedt Subject: Re: [perf] more perf_fuzzer memory corruption Message-ID: <20140418152314.GY11182@twins.programming.kicks-ass.net> References: <20140416141514.GS11182@twins.programming.kicks-ass.net> <20140417094815.GA9348@gmail.com> <20140417114533.GJ11096@twins.programming.kicks-ass.net> <20140417142213.GA29338@gmail.com> <20140417145418.GM11096@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 18, 2014 at 10:45:47AM -0400, Vince Weaver wrote: > > OK, since the slab corruption was happening to event->hlist_entry->pprev > I added a WARN() call to every modifier of pprev under > include/linux/*list*.h to see what was stomping over freed memory. > > This is what came up: > > Apr 18 10:36:11 haswell kernel: [ 998.316177] ------------[ cut here ]------------ > Apr 18 10:36:11 haswell kernel: [ 998.321188] WARNING: CPU: 3 PID: 20717 at include/linux/rculist.h:410 perf_trace_add+0xc1/0x100() > Apr 18 10:36:11 haswell kernel: [ 998.330681] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel iTCO_wdt snd_hda_controller i915 snd_hda_codec evdev crct10dif_pclmul drm_kms_helper iTCO_vendor_support snd_hwdep snd_pcm drm crc32_pclmul snd_seq mei_me parport_pc parport lpc_ich mfd_core psmouse ghash_clmulni_intel snd_timer snd_seq_device mei snd aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore pcspkr serio_raw i2c_i801 processor video i2c_algo_bit i2c_core wmi button battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci xhci_hcd e1000e libata ehci_hcd ptp scsi_mod crc32c_intel usbcore pps_core usb_common fan thermal thermal_sys > Apr 18 10:36:11 haswell kernel: [ 998.405736] CPU: 3 PID: 20717 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #63 > Apr 18 10:36:11 haswell kernel: [ 998.413162] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014 > Apr 18 10:36:11 haswell kernel: [ 998.420987] 0000000000000009 ffff880117923c70 ffffffff8164f753 0000000000000000 > Apr 18 10:36:11 haswell kernel: [ 998.429016] ffff880117923ca8 ffffffff810647cd ffffe8ffffcc30d8 ffff8800ce29f040 > Apr 18 10:36:11 haswell kernel: [ 998.437121] ffffffff81c1ba40 ffff8800cd3d9040 000000da35b18e50 ffff880117923cb8 > Apr 18 10:36:11 haswell kernel: [ 998.445246] Call Trace: > Apr 18 10:36:11 haswell kernel: [ 998.447910] [] dump_stack+0x45/0x56 > Apr 18 10:36:11 haswell kernel: [ 998.453451] [] warn_slowpath_common+0x7d/0xa0 > Apr 18 10:36:11 haswell kernel: [ 998.459871] [] warn_slowpath_null+0x1a/0x20 > Apr 18 10:36:11 haswell kernel: [ 998.466143] [] perf_trace_add+0xc1/0x100 > Apr 18 10:36:11 haswell kernel: [ 998.472160] [] event_sched_in.isra.76+0x90/0x1e0 > Apr 18 10:36:11 haswell kernel: [ 998.478849] [] group_sched_in+0x69/0x1e0 > Apr 18 10:36:11 haswell kernel: [ 998.484812] [] __perf_event_enable+0x255/0x260 > Apr 18 10:36:11 haswell kernel: [ 998.491370] [] remote_function+0x40/0x50 > Apr 18 10:36:11 haswell kernel: [ 998.497311] [] generic_exec_single+0x126/0x170 > Apr 18 10:36:11 haswell kernel: [ 998.503764] [] ? task_clock_event_add+0x40/0x40 > Apr 18 10:36:11 haswell kernel: [ 998.510432] [] smp_call_function_single+0x67/0xa0 > Apr 18 10:36:11 haswell kernel: [ 998.517299] [] task_function_call+0x44/0x50 > Apr 18 10:36:11 haswell kernel: [ 998.523539] [] ? perf_event_sched_in+0x90/0x90 > Apr 18 10:36:11 haswell kernel: [ 998.530085] [] perf_event_enable+0x90/0xf0 > Apr 18 10:36:11 haswell kernel: [ 998.536308] [] ? task_function_call+0x50/0x50 > Apr 18 10:36:11 haswell kernel: [ 998.542761] [] perf_event_for_each_child+0x3a/0xa0 > Apr 18 10:36:11 haswell kernel: [ 998.551512] [] perf_event_task_enable+0x4f/0x80 > Apr 18 10:36:11 haswell kernel: [ 998.560080] [] SyS_prctl+0x255/0x4b0 > Apr 18 10:36:11 haswell kernel: [ 998.567605] [] ? lockdep_sys_exit_thunk+0x35/0x67 > Apr 18 10:36:11 haswell kernel: [ 998.576333] [] system_call_fastpath+0x1a/0x1f > Apr 18 10:36:11 haswell kernel: [ 998.584698] ---[ end trace b175966afd57a174 ]--- > Apr 18 10:36:12 haswell kernel: [ 998.910691] ------------[ cut here ]------------ OK, that's a good clue. That looks like we're freeing events that still are on the owner list, which would indicate we're freeing events that have a refcount. I added a WARN in free_event() to check the refcount, along with a number of false positives (through the perf_event_open() fail path) I do appear to be getting actual fails here. At least I can 'reproduce' this. Earlier attempts, even based on your .config only got me very mysterious lockups -- I suspect the corruption happens on a slightly different spot or so and completely messes up the machine. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/