Date: Tue, 14 Apr 2020 22:42:08 +0200
From: Peter Zijlstra
To: Barret Rhoden
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-kernel@vger.kernel.org,
	syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com
Subject: Re: perf: add cond_resched() to task_function_call()
Message-ID: <20200414204208.GI2483@worktop.programming.kicks-ass.net>
References: <20200414190351.16893-1-brho@google.com>
In-Reply-To: <20200414190351.16893-1-brho@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 14, 2020 at 03:03:51PM -0400, Barret Rhoden wrote:
> Under rare circumstances, task_function_call() can repeatedly fail and
> cause a soft lockup.
>
> There is a slight race where the process is no longer running on the cpu
> we targeted by the time remote_function() runs. The code will simply
> try again. If we are very unlucky, this will continue to fail, until a
> watchdog fires. This can happen in a heavily loaded, multi-core virtual
> machine.

Sigh,.. virt again :/

> Reported-by: syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com
> Signed-off-by: Barret Rhoden
> ---
>  kernel/events/core.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 55e44417f66d..65c2c05e24c2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -99,7 +99,7 @@ static void remote_function(void *data)
>   *
>   * returns: @func return value, or
>   *	    -ESRCH  - when the process isn't running
> - *	    -EAGAIN - when the process moved away
> + *	    -ENXIO  - when the cpu the process was on has gone offline
>   */

Hurm..
I don't think that was actually intended behaviour. As long as the
task lives we ought to retry. Luckily I don't think the current code
cares much, it'll loop again on the caller side. With the exception of
perf_cgroup_attach() that is, that might actually be broken because of
this.

>  static int
>  task_function_call(struct task_struct *p, remote_function_f func, void *info)
> @@ -112,11 +112,15 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
>  	};
>  	int ret;
>
> -	do {
> -		ret = smp_call_function_single(task_cpu(p), remote_function, &data, 1);
> +	while (1) {
> +		ret = smp_call_function_single(task_cpu(p), remote_function,
> +					       &data, 1);
>  		if (!ret)
>  			ret = data.ret;
> -	} while (ret == -EAGAIN);
> +		if (ret != -EAGAIN)
> +			break;
> +		cond_resched();
> +	}

So how about we make that:

	for (;;) {
		ret = smp_call_function_single(task_cpu(p), remote_function, &data, 1);
		ret = !ret ? data.ret : -EAGAIN;
		if (ret != -EAGAIN)
			break;
		cond_resched();
	}

Or something like that, hmmm?
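
For illustration only, the loop shape suggested above can be sketched in
userspace. This is not kernel code and every name in it is an assumption
made for the example: struct fake_call and try_call() are hypothetical
stand-ins for smp_call_function_single() plus remote_function(), and
sched_yield() stands in for cond_resched(). It only shows the control flow:
a failed call (for example -ENXIO) is folded into -EAGAIN, so both "the call
itself failed" and "the task moved away" lead to another attempt, with a
yield between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical model of the remote call; none of this exists in the kernel. */
struct fake_call {
	int call_failures;	/* initial attempts where the call itself fails */
	int misses;		/* later attempts where the task "moved away" */
	int func_ret;		/* final result of the remote function */
	int attempts;		/* number of attempts made so far */
};

/*
 * Stand-in for smp_call_function_single() + remote_function().
 * Returns nonzero if the call itself failed; otherwise returns 0 and
 * stores the function result (or -EAGAIN on a miss) in *out.
 */
static int try_call(struct fake_call *c, int *out)
{
	c->attempts++;
	if (c->call_failures > 0) {
		c->call_failures--;
		return -ENXIO;
	}
	if (c->misses > 0) {
		c->misses--;
		*out = -EAGAIN;
		return 0;
	}
	*out = c->func_ret;
	return 0;
}

/* Same shape as the for (;;) loop proposed in the mail. */
static int retry_call(struct fake_call *c)
{
	int data_ret = -EAGAIN;
	int ret;

	assert(c != NULL);

	for (;;) {
		ret = try_call(c, &data_ret);
		/* A failed call is treated like a miss: retry either way. */
		ret = !ret ? data_ret : -EAGAIN;
		if (ret != -EAGAIN)
			break;
		sched_yield();	/* userspace stand-in for cond_resched() */
	}
	return ret;
}
```

With call_failures = 2 and misses = 3, retry_call() makes six attempts
before returning func_ret. Any result other than -EAGAIN, -ESRCH included,
ends the loop on the attempt that produced it.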