Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751449AbaDUIli (ORCPT ); Mon, 21 Apr 2014 04:41:38 -0400 Received: from rtits2.realtek.com ([60.250.210.242]:58182 "EHLO rtits2.realtek.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750902AbaDUIlD (ORCPT ); Mon, 21 Apr 2014 04:41:03 -0400 X-SpamFilter-By: BOX Solutions SpamTrap 5.39 with qID s3L8dpVb013517, This message is accepted by code: ctloc85258 Message-ID: <5354DA4A.8060503@realsil.com.cn> Date: Mon, 21 Apr 2014 16:43:54 +0800 From: micky User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Peter Wu , CC: , , , , , , , , , , Borislav Petkov Subject: Re: Locking issues in "mmc: rtsx: add support for pre_req and post_req" References: <1397612325-4968-1-git-send-email-micky_ching@realsil.com.cn> <29134783.v67ScDLFcC@al> <201404200220.s3K2KC4h007515@rtits1.realtek.com> In-Reply-To: <201404200220.s3K2KC4h007515@rtits1.realtek.com> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [172.29.41.103] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Peter, I'm considering not using spinlock to get it work and will resend the patch later. Thank you. Best Regards. micky On 04/19/2014 07:13 AM, Peter Wu wrote: > So, I reverted c42deffd5b53c9e583d83c7964854ede2f12410d ("mmc: rtsx: add > support for pre_req and post_req") on top of v3.15-rc1-49-g10ec34f and the > hang issue went away. > > There is something that is possibly problematic. All three tasklets (cmd, > data, finish) try to spinlock on host->lock. According to the tasklets > documentation[1], they always run at interrupts (i.e., at any time, possibly > right after tasklet_schedule if I got this right?). > > These tasklets however do get scheduled *under* the host->lock which will > cause a deadlock. This proposed patch ("mmc: rtsx: fix possible circular > locking dependency") fixes the issue for sd_isr_done_transfer, but there are > others: > > 1. sd_request_timeout: > 125 spin_lock_irqsave(&host->lock, flags); > ... > 139 out: > 140 tasklet_schedule(&host->finish_tasklet); // <-- > 141 spin_unlock_irqrestore(&host->lock, flags); > > 2. sd_get_rsp (cmd_tasklet!): > 429 spin_lock_irqsave(&host->lock, flags); > ... > 506 tasklet_schedule(&host->finish_tasklet); // <-- > 507 spin_unlock_irqrestore(&host->lock, flags); > > 3. sd_finish_multi_rw (data_tasklet!): > 657 spin_lock_irqsave(&host->lock, flags); > ... > 684 tasklet_schedule(&host->finish_tasklet); // <-- > 685 spin_unlock_irqrestore(&host->lock, flags); > > 4. sdmmc_request: > 921 mutex_lock(&pcr->pcr_mutex); > 922 spin_lock_irqsave(&host->lock, flags); > ... > 967 tasklet_schedule(&host->finish_tasklet); // <-- > 968 spin_unlock_irqrestore(&host->lock, flags); > > 5. rtsx_pci_sdmmc_drv_remove: > 1526 spin_lock_irqsave(&host->lock, flags); > 1527 if (host->mrq) { > ... > 1541 tasklet_schedule(&host->finish_tasklet); // <-- > 1542 } > 1543 spin_unlock_irqrestore(&host->lock, flags); > 1544 > 1545 del_timer_sync(&host->timer); > 1546 tasklet_kill(&host->cmd_tasklet); > 1547 tasklet_kill(&host->data_tasklet); > 1548 tasklet_kill(&host->finish_tasklet); // <-- > > pcr_mutex (un)locking: > - gets locked in sdmmc_request > - gets unlocked in sd_finish_request (finish_tasklet). > - gets locked/unlocked in sdmmc_set_ios > - gets locked/unlocked in sdmmc_get_ro > - gets locked/unlocked in sdmmc_get_cd > - gets locked/unlocked in sdmmc_switch_voltage > - gets locked/unlocked in sdmmc_execute_tuning > > finish_tasklet (sd_finish_request()) gets scheduled in: > - sd_request_timeout (under host->lock; called on timer expiration) > - error path of sd_send_cmd > - get_rsp (cmd_tasklet; under host->lock) > - error path of sd_start_multi_rw (after mod timer) > - sd_finish_multi_rw (under host->lock) > - sdmmc_request (under host->lock in error path; without host->lock > elsewhere) > - rtsx_pci_sdmmc_drv_remove (under host->lock) > > sd_request_timeout (timer) related: > - deleted in sd_finish_request under host->lock (also assumes pcr_mutex) > - sd_send_cmd (timer set with 100ms timeout, not called in error path) > - sd_start_multi_rw (timeout set to 10 HZ, called before error path is > checked in which finish_tasklet gets scheduled) > Note: sd_request_timeout claims+releases host->lock. > > If I understand it correctly, host->lock is responsible for protecting the > realtek_pci_sdmmc structure. Why is tasklet_schedule() on the same lock? > > Shouldn't the above five tasklet_schedule calls be moved outside the lock? > Will it be problematic if the same tasklet gets executed multiple times (if > that is possible?). > > Does it really need that much locking? dw_mmc.c also implements pre_req, but > uses tasklets without needing to lock anything. > > Kind regards, > Peter > > [1]: http://www.makelinux.net/ldd3/chp-7-sect-5 > > On Friday 18 April 2014 16:00:53 Peter Wu wrote: >> On Wednesday 16 April 2014 09:38:44 micky_ching@realsil.com.cn wrote: >>> From: Micky Ching >>> >>> To avoid dead lock, we need make sure host->lock is always acquire >>> before pcr->lock. But in irq handler, we acquired pcr->lock in rtsx mfd >>> driver, and sd_isr_done_transfer() is called during pcr->lock already >>> acquired. Since in sd_isr_done_transfer() the only work we do is schdule >>> tasklet, the cmd_tasklet and data_tasklet never conflict, so it is safe >>> to remove spin_lock() here. >>> >>> Signed-off-by: Micky Ching >>> --- >>> >>> drivers/mmc/host/rtsx_pci_sdmmc.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >> This patch came from >> https://lkml.kernel.org/r/534DE1D7.3000308@realsil.com.cn ("Re: >> rtsx_pci_sdmmc lockdep splat"). >> >> With v3.15-rc1-49-g10ec34f, I have a hung machine when inserting a SD card. >> lockdep was not enabled for the kernel, I have not bisected yet. >> This patch on top of that kernel version does not help (tested by >> rmmod rtsx_pci_sdmmc and insmod the patched one). >> >> Console (as typed over from a picture, sorry for any typos): >> WARNING: CPU: 1 PID: 0 at kernel/locking/mutex.c:698 >> DEBUG_LOCKS_WARN_ON(in_interrupt()) >> Modules linked in: ... >> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15-rc1-custom-000049-g10ec34f #5 >> Hardware name: Shuttle Inc. XS36V/XS36V, BIOS 1.11 12/18/2012 >> Call trace: >> dump_stack >> warn_slowpath_common >> warn_slowpath_fmt >> __mutex_unlock_slowpath >> mutex_unlock >> sd_finish_request [rtsx_pci_sdmmc] >> tasklet_action >> __do_softirq >> irq_exit >> smp_apic_timer_interrupt >> apic_timer_interrupt >> ? cpuidle_enter_state >> cpuidle_enter >> cpu_startup_entry >> start_secondary >> ---[end trace ...]--- >> (60 seconds later:) >> INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=18004 >> jiffies, g=3264, c=3263, q=2) INFO: Stall ended before state dump start >> >> I also managed to get this trace about 106 seconds later when switching TTY: >> INFO: task kworker/... blocked for more than 120 seconds >> Workqueue: kmmcd mm_rescan [mmc_core] >> Call trace: >> ? update_rq_clock.part80 >> ? internal_add_timer >> schedule >> schedule_preempt >> __mutex_lock_slowpath >> mutex_lock >> sdmmc_request [rtsx_pci_sdmmc] >> mmc_start_request [mmc_core] >> __mmc_start_req [mmc_core] >> mmc_wait_for_cmd [mmc_core] >> ? mmc_release_host [mmc_core] >> mmc_io_rw_direct_host [mmc_core] >> >> I'll try to get a lockdep kernel and text logs later, but perhaps you >> already know the issue? >> >> Peter >> >>> diff --git a/drivers/mmc/host/rtsx_pci_sdmmc.c >>> b/drivers/mmc/host/rtsx_pci_sdmmc.c index 453e1d4..40695e0 100644 >>> --- a/drivers/mmc/host/rtsx_pci_sdmmc.c >>> +++ b/drivers/mmc/host/rtsx_pci_sdmmc.c >>> @@ -108,12 +108,10 @@ static void sd_isr_done_transfer(struct >>> platform_device *pdev) { >>> >>> struct realtek_pci_sdmmc *host = platform_get_drvdata(pdev); >>> >>> - spin_lock(&host->lock); >>> >>> if (host->cmd) >>> >>> tasklet_schedule(&host->cmd_tasklet); >>> >>> - if (host->data) >>> + else if (host->data) >>> >>> tasklet_schedule(&host->data_tasklet); >>> >>> - spin_unlock(&host->lock); >>> >>> } >>> >>> static void sd_request_timeout(unsigned long host_addr) > . > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/