Date: Thu, 19 May 2011 18:24:04 -0400
From: Vivek Goyal
To: "Paul E. McKenney"
Cc: Jens Axboe, Paul Bolle, linux kernel mailing list
Subject: Mysterious CFQ crash and RCU
Message-ID: <20110519222404.GG12600@redhat.com>

Hi Paul,

A few people have reported a hard-to-reproduce CFQ crash from time to time. Most of them could not reproduce the issue consistently, so we could not debug it further. Now, for the first time, Paul Bolle can reproduce the issue reliably on his machine, so I am hopeful that we can get to the bottom of this long-pending mysterious issue.

There might be multiple bugzillas tracking this issue, but one of them is 577968 at bugzilla.redhat.com.

BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [] call_for_each_cic+0x29/0x44
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full
Modules linked in: bnep bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_netbios_ns ip6table_filter nf_conntrack_broadcast ip6_tables snd_intel8x0m snd_intel8x0 arc4 snd_ac97_codec ath5k ac97_bus snd_seq ath snd_seq_device mac80211 snd_pcm ppdev thinkpad_acpi cfg80211 microcode snd_timer parport_pc e1000 snd rfkill parport iTCO_wdt soundcore snd_page_alloc iTCO_vendor_support joydev i2c_i801 pcspkr uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]

Pid: 11397, comm: mandb Not tainted 2.6.39-0.rc5.git0.0.local.fc16.i686 #1 IBM /
EIP: 0060:[] EFLAGS: 00010202 CPU: 0
EIP is at call_for_each_cic+0x29/0x44
EAX: 00000001 EBX: 6b6b6b6b ECX: 00000246 EDX: c0ad4a98
ESI: ecfd0980 EDI: c05e88a4 EBP: c670bf18 ESP: c670bf0c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process mandb (pid: 11397, ti=c670a000 task=d40769f0 task.ti=c670a000)
Stack:
 ecfd0980 d40769f0 d4076dc0 c670bf20 c05e85ca c670bf2c c05ddc86 ecfd0980
 c670bf40 c05ddd16 d40769f0 00000012 c670bf74 c670bf8c c0440813 f2a182e8
 d401ac08 c670bf70 c04f48da 00000001 d4076bf4 d40769e8 d4076bf4 00000001
Call Trace:
 [] cfq_free_io_context+0x12/0x14
 [] put_io_context+0x34/0x5c
 [] exit_io_context+0x68/0x6d
 [] do_exit+0x63e/0x661
 [] ? fsnotify_modify+0x64/0x6c
 [] do_group_exit+0x63/0x86
 [] sys_exit_group+0x18/0x18
 [] sysenter_do_call+0x12/0x38
Code: 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 89 c6 89 d7 e8 3a ec ff ff 8b 5e 4c e8 00 b1 e6 ff 85 c0 74 05 e8 b7 ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd e8 49 ec ff
EIP: [] call_for_each_cic+0x29/0x44 SS:ESP 0068:c670bf0c
CR2: 000000006b6b6b6b
---[ end trace 70dea08ef41fda48 ]---

In summary, once in a while people notice a CFQ crash. Debugging shows that we have an rcu-protected hlist of elements of type cfq_io_context. The head of the list is at ioc->cic_list, and we crash while traversing ioc->cic_list under rcu.
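For reference, the reader side here is just an rcu_read_lock()-protected hlist walk. A minimal, paraphrased sketch of what call_for_each_cic() does (based on the 2.6.39-era block/cfq-iosched.c; field and helper names are approximate, and the struct definitions come from the kernel headers):

#include <linux/rcupdate.h>	/* rcu_read_lock(), rcu_read_unlock() */
#include <linux/rculist.h>	/* hlist_for_each_entry_rcu() */

/* Paraphrased sketch of the reader side, not the exact in-tree code. */
static void call_for_each_cic(struct io_context *ioc,
			      void (*func)(struct io_context *,
					   struct cfq_io_context *))
{
	struct cfq_io_context *cic;
	struct hlist_node *n;

	rcu_read_lock();
	/*
	 * Walk ioc->cic_list under RCU.  The oops above faults while
	 * loading the ->next pointer of a cic that has already been
	 * freed and slab-poisoned with 0x6b6b6b6b.
	 */
	hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list)
		func(ioc, cic);
	rcu_read_unlock();
}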
It looks like the element from which we were trying to fetch the next pointer got freed back to the slab and poisoned with 0x6b6b6b6b, so we ended up dereferencing a freed object and CFQ crashes. The function in question here is call_for_each_cic() in block/cfq-iosched.c.

We free the cfq_io_context object using call_rcu(). So on the surface it looks like we decouple a cfq_io_context object from the hash list and schedule a call_rcu() so that it is freed only after an rcu grace period, but somehow the object got freed earlier, released back to the slab and poisoned. Is that possible?

We have looked at the code many times and we think the rcu locking around it is fine. Is it possible for a call_rcu() callback to fire before the rcu grace period is over?

I put a debug patch in CFQ (details are in the bugzilla) and I can see that, after decoupling the object from the hash list, it got freed while we were still under rcu_read_lock().

Is there any known issue here, or any quick tip on how I can go about debugging this further from the rcu point of view?

Thanks
Vivek
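P.S. For completeness, the free path follows the usual unlink-then-call_rcu() pattern: the cic is removed from ioc->cic_list under ioc->lock with hlist_del_rcu(), and the actual kmem_cache_free() is deferred via call_rcu(). A paraphrased sketch (again 2.6.39-era block/cfq-iosched.c from memory; names and locking details are approximate):

/* Paraphrased sketch of the free path, not the exact in-tree code. */
static void cfq_cic_free_rcu(struct rcu_head *head)
{
	struct cfq_io_context *cic =
		container_of(head, struct cfq_io_context, rcu_head);

	/* Runs after a grace period; only then may the object be poisoned. */
	kmem_cache_free(cfq_ioc_pool, cic);
}

static void cfq_cic_free(struct cfq_io_context *cic)
{
	/* Defer the actual free until all current RCU readers are done. */
	call_rcu(&cic->rcu_head, cfq_cic_free_rcu);
}

static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
{
	unsigned long flags;

	/* Unpublish the cic before scheduling the RCU-deferred free. */
	spin_lock_irqsave(&ioc->lock, flags);
	hlist_del_rcu(&cic->cic_list);
	spin_unlock_irqrestore(&ioc->lock, flags);

	cfq_cic_free(cic);
}

Given that pattern, a reader that entered rcu_read_lock() before the hlist_del_rcu() should never see the object freed under it, which is why the debug trace above is so puzzling.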