Return-path: Received: from mtiwmhc13.worldnet.att.net ([204.127.131.117]:42174 "EHLO mtiwmhc13.worldnet.att.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751146AbYHVUwU (ORCPT ); Fri, 22 Aug 2008 16:52:20 -0400 Message-ID: <48AF2709.6040707@lwfinger.net> (sfid-20080822_225224_144118_52274BE2) Date: Fri, 22 Aug 2008 15:52:25 -0500 From: Larry Finger MIME-Version: 1.0 To: Chr CC: John W Linville , Johannes Berg , linux-wireless@vger.kernel.org Subject: Re: [PATCH] p54: Fix potential concurrent access to private data References: <4896381c.abQmtcnBFB3TpLyf%Larry.Finger@lwfinger.net> <200808062321.53120.chunkeey@web.de> <489A4829.5080704@lwfinger.net> <200808222157.55851.chunkeey@web.de> In-Reply-To: <200808222157.55851.chunkeey@web.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Sender: linux-wireless-owner@vger.kernel.org List-ID: Chr wrote: > On Thursday 07 August 2008 02:56:09 Larry Finger wrote: >> Chr wrote: >>> Well the reason why there isn't any bug report about it is maybe >>> because unlike most other devices we don't program one single >>> setting per time, but always a "group of similar ones" at once so >>> the device should always be in a sane state in theory. >>> >>> So as long as mac80211 provides at least some kind of protection >>> against it's own concurrency, we are in save waters even without >>> any fancy kind of locking. >>> >>> Regards, >>> Chr >> For the config callback, mac80211 does not protect against concurrency. We >> learned that with rtl8187, where this routine ran somewhat longer. After >> the power setting routine was added to the wext interface in mac80211, the >> driver broke due to concurrent calls to the config routine and was fixed by >> just this kind of mutex, which is why I added this protection for p54. >> > I just got a reply from an anonymous p54pci (2.6.27-rc4) user, with the > following problem: > > INFO: task prism54pci:10881 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > prism54pci D 0000000000000001 0 10881 2 > ffff8800780dbde0 0000000000000046 0000000000000000 ffffffff802375cb > ffffffff80778740 0000000000000286 ffff88007c632f38 ffff88007c5d8b40 > ffffffff806c2330 ffff88007c5d8d70 0000000001018740 ffff88007c5d8d70 > Call Trace: > [] __mod_timer+0xb7/0xc5 > [] dequeue_task_fair+0x4d/0xce > [] __mutex_lock_slowpath+0x6a/0xa0 > [] mutex_lock+0x1a/0x1e > [] p54_config+0x1a/0x46 [p54common] > [] ieee80211_sta_scan_work+0xf8/0x1b8 [mac80211] > [] ieee80211_sta_scan_work+0x0/0x1b8 [mac80211] > [] run_workqueue+0x79/0xfe > [] worker_thread+0x96/0xa5 > [] autoremove_wake_function+0x0/0x2e > [] worker_thread+0x0/0xa5 > [] kthread+0x47/0x73 > [] schedule_tail+0x27/0x5f > [] child_rip+0xa/0x11 > [] kthread+0x0/0x73 > [] child_rip+0x0/0x11 > > I have no idea going on here since locking is so simple here that it > shouldn't misbehave at all!?! > > Regards, > Chr > With only one spinlock that is set in p54_assign_addresses() and one mutex that is set in p54_config() and in p54_config_interface(), there are not too many possibilities. I don't see any paths that miss an unlock. The only explanation I can see is the following: There has to be an entry to p54_config() with the spinlock already set. This call locks the mutex and proceeds to call p54_set_freq(), which in turn calls p54_assign_addresses(). Execution would be held at the spin_lock. If p54_config is called again, the mutex is already locked, which would lead to the bug reported. We know that mac80211 can call the config routine before completion of the previous entry. I don't understand why p54_assign_addresses() does not run to completion and clear the spinlock. I'm sure this explanation is wrong, so please shoot it down. Larry