Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752790AbaLTANm (ORCPT ); Fri, 19 Dec 2014 19:13:42 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:15466 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751414AbaLTANk (ORCPT ); Fri, 19 Dec 2014 19:13:40 -0500 Date: Fri, 19 Dec 2014 19:12:49 -0500 From: Chris Mason Subject: Re: frequent lockups in 3.18rc4 To: Thomas Gleixner CC: Linus Torvalds , Dave Jones , Mike Galbraith , Ingo Molnar , Peter Zijlstra , =?iso-8859-1?q?D=E2niel?= Fraga , Sasha Levin , "Paul E. McKenney" , Linux Kernel Mailing List , Suresh Siddha , Oleg Nesterov , Peter Anvin Message-ID: <1419034369.13012.8@mail.thefacebook.com> In-Reply-To: References: <20141218051327.GA31988@redhat.com> <1418918059.17358.6@mail.thefacebook.com> <20141218161230.GA6042@redhat.com> <20141219024549.GB1671@redhat.com> <20141219035859.GA20022@redhat.com> <20141219040308.GB20022@redhat.com> <20141219145528.GC13404@redhat.com> <20141219203135.GA1200@ret.masoncoding.com> X-Mailer: geary/0.8.2 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed X-Originating-IP: [192.168.16.4] X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.13.68,1.0.33,0.0.0000 definitions=2014-12-19_05:2014-12-19,2014-12-19,1970-01-01 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 kscore.is_bulkscore=0 kscore.compositescore=0 circleOfTrustscore=92.9824445108128 compositescore=0.12498974524102 urlsuspect_oldscore=0.12498974524102 suspectscore=0 recipient_domain_to_sender_totalscore=0 phishscore=0 bulkscore=0 kscore.is_spamscore=0 recipient_to_sender_totalscore=0 recipient_domain_to_sender_domain_totalscore=2524143 rbsscore=0.12498974524102 spamscore=0 recipient_to_sender_domain_totalscore=8 urlsuspectscore=0.9 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1412200001 X-FB-Internal: deliver Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Dec 19, 2014 at 6:22 PM, Thomas Gleixner wrote: > On Fri, 19 Dec 2014, Chris Mason wrote: >> On Fri, Dec 19, 2014 at 11:15:21AM -0800, Linus Torvalds wrote: >> > Here's another pattern. In your latest thing, every single time >> that >> > CPU1 is waiting for some other CPU to pick up the IPI, we have >> CPU0 >> > doing this: >> > >> > [24998.060963] NMI backtrace for cpu 0 >> > [24998.061989] CPU: 0 PID: 2940 Comm: trinity-c150 Not tainted >> 3.18.0+ #108 >> > [24998.064073] task: ffff8801bf3536b0 ti: ffff880197e0c000 >> task.ti: >> > ffff880197e0c000 >> > [24998.065137] RIP: 0010:[] >> [] >> > read_hpet+0x16/0x20 >> > [24998.083577] [] ktime_get+0x3e/0xa0 >> > [24998.084450] [] tick_sched_timer+0x23/0x160 >> > [24998.085315] [] __run_hrtimer+0x76/0x1f0 >> > [24998.086173] [] ? tick_init_highres+0x20/0x20 >> > [24998.087025] [] hrtimer_interrupt+0x107/0x260 >> > [24998.087877] [] >> local_apic_timer_interrupt+0x3b/0x70 >> > [24998.088732] [] >> smp_apic_timer_interrupt+0x45/0x60 >> > [24998.089583] [] >> apic_timer_interrupt+0x6f/0x80 >> > [24998.090435] >> > [24998.091279] [] ? __remove_hrtimer+0x4e/0xa0 >> > [24998.092118] [] ? ipcget+0x8a/0x1e0 >> > [24998.092951] [] ? ipcget+0x7c/0x1e0 >> > [24998.093779] [] SyS_msgget+0x4d/0x70 >> > >> > and I think that's the smoking gun. The reason CPU0 isn't picking >> up >> > any IPI's is because it is in some endless loop around >> read_hpet(). >> > >> > There is even time information in the register dump: >> > >> > RAX: 0000000061fece8a RBX: 0000000000510792 RCX: 0000000000000000 >> > RAX: 0000000079e588fc RBX: 0000000000511d6e RCX: 0000000000000000 >> > RAX: 0000000091ca7f65 RBX: 0000000000513346 RCX: 0000000000000000 >> > RAX: 00000000a9afbd0d RBX: 000000000051491e RCX: 0000000000000000 >> > RAX: 00000000cbd1340c RBX: 000000000051684a RCX: 0000000000000000 >> > RAX: 00000000fb9d303f RBX: 00000000005193fc RCX: 0000000000000000 >> > RAX: 000000002b67efe4 RBX: 000000000051c224 RCX: 0000000000000004 >> > >> > That RAX value is the value we just read from the HPET, and RBX >> seems >> > to be monotonically increasing too, so it's likely the sequence >> > counter in ktime_get(). >> > >> > So it's not stuck *inside* read_hpet(), and it's almost certainly >> not >> > the loop over the sequence counter in ktime_get() either (it's not >> > increasing *that* quickly). But some basically infinite >> __run_hrtimer >> > thing or something? >> >> Really interesting. >> >> So, we're calling __run_hrtimer in a loop: >> >> while ((node = timerqueue_getnext(&base->active))) { >> ... >> __run_hrtimer(timer, &basenow); >> ... >> } >> >> The easy question is how often does trinity call nanosleep? >> >> Looking at __run_hrtimer(), it drops the lock and runs the function >> and then >> takes the lock again, maybe enqueueing us again right away. >> >> timer->state is supposed to protect us from other CPUs jumping in >> and doing >> something else with the timer, but it feels racey wrt >> remove_hrtimer(). >> Something like this, but I'm not sure how often >> __hrtimer_start_range_ns gets >> called >> >> CPU 0 CPU 1 >> __run_hrtimer() >> timer->state = HRTIMER_STATE_CALLBACK >> removed from list >> unlock cpu_base->lock >> restrt = fn(timer) >> __hrtimer_start_range_ns() >> base = lock_hrtimer_base() >> ret = remove_hrtimer() >> finds timer->state = HRTIMER_STATE_CALLBACK >> does nothing >> new_base = switch_hrtimer_base() >> now we're on a different base, different lock >> lock(cpu_base->lock) >> enqueue the timer >> enqueue the timer > > But at the very end this would be detected by the runtime check of the > hrtimer interrupt, which does not trigger. And it would trigger at > some point as ALL cpus including CPU0 in that trace dump make > progress. I'll admit that at some point we should be hitting one of the WARN or BUG_ON, but it's possible to thread that needle and corrupt the timer list, without hitting a warning (CPU 1 in my example has to enqueue last). Once the rbtree is hosed, it can go forever. Probably not the bug we're looking for, but still suspect in general. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/