Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp182790ybz; Wed, 15 Apr 2020 06:51:46 -0700 (PDT) X-Google-Smtp-Source: APiQypKDyOFJhT7fHAM52+RpnvpMdJMGD4RjvDkXGghWzkRRvY8NFXyh/yxke0zeifBCH0mSHS/8 X-Received: by 2002:aa7:d457:: with SMTP id q23mr25661782edr.66.1586958706296; Wed, 15 Apr 2020 06:51:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586958706; cv=none; d=google.com; s=arc-20160816; b=VqTFEHIDDVWuPa5bGZijIWaFHoW1xPioemOz8F3ylazu9dJ8AtXEknJPMFy4f0uRS4 FNzJFQaoEXmSUW4Op11cegnZ+o/+XblN5b5tKMKygplqwHNZzrkEhgicKqj8jNzH5pR8 qNAKzHaEXHcblqdPGn6VzKWEmiepGKZKCQKDU4WdsheCD+NPt4rnqPPkwzpAG2magY5v vppp74+pShV6SQ3ylpMfa26NFrxfyKGL0LsNHGK1PFUvBh002ctcAy2VeLQJY2Ag22ha HjYFe2o42BFxaqd5B2yqxJ7w6YWdetOjiu4IeZ8byyKj9Z8gSP6q75JYGKNV3Apl6pXB snoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=qfEBpoByh/EF6RzNjsQpd80C4dsY79aLZ3TtEdxZIzU=; b=DDxxEcatviA9QZUV4xJ9+0rnyXZ7ZDJWkxEmzddig/pEUhpCYzd8R/FZfANaDeBe2n dGtSJxQayG+qZ8Gek3Xf7gXKTbQPKL+bX4mkI148mhJ9U88rin68HDHKAb7qGHQppxvS Xva6+M96pqrRAg/q7Wx2E2Ki2e14gytfrPGnh6TmztQ3abFjps/Eoc8tI33AbgsYO0Sf A6g3qRimE82zzu7LLzmdOkf0bn0RK5GFu6vUs6DCC+E7GGPCvPDzoBCXUamiiGK5nO1y fiSBAZPWgGBxOWI249ARg4lbB70hC8aQ/KuEVt4K3w+aJJQLAkOddhDA0hf+NT7Ul8xs ZwkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=ljkywbh0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h11si4697549ejc.416.2020.04.15.06.51.22; Wed, 15 Apr 2020 06:51:46 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=ljkywbh0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391788AbgDNVQL (ORCPT + 99 others); Tue, 14 Apr 2020 17:16:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1732367AbgDNVQI (ORCPT ); Tue, 14 Apr 2020 17:16:08 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3657C061A0C for ; Tue, 14 Apr 2020 14:16:08 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id l25so15064698qkk.3 for ; Tue, 14 Apr 2020 14:16:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=qfEBpoByh/EF6RzNjsQpd80C4dsY79aLZ3TtEdxZIzU=; b=ljkywbh0N5mD20pRb2kUxBzu2PtcjtaGV+YjeyV+vUzyoOJm561CktJY0po2GRGSer soycIhyrbtNaExpkhTfnC+OmqlJHZ1vGDSzYhf8+DHqFcYFMV6w6PJ/CzquVGAUE8aLD W9Fq4oNNev7idGdFll9Hm9cogo7BTNnqtWFQ/bovYEhfWROBi6LVhVebi7uh6uPcifuV Cxyaf0AH0/bevZyh2XRGYiihntTWOVU7RzXrYybhEvRVhhhtEybs6aQNzmnFIo8Pr05R rpBBbxBEMwhM0rRHTRCYaIZrKhNMlOW9uS2rcBN563SCIeKU38OYx8xEdEsmUp90WY5h itow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=qfEBpoByh/EF6RzNjsQpd80C4dsY79aLZ3TtEdxZIzU=; b=FkBcChodGkHN5Sv5Y1LPF8znCc2t48T1FFheqBATjH6Ehf7FFe9mEG2yt70Ijf6/hH 0VWDSAy/YBdPk2tIPD4XIg8egOM7nQ9OjSKUppLOAIJnWYJT1EbKo9OeAslsP1gHUDNK AT9WXtzXosrcbhX7nNg9IhCU9lvMOOlWypMb6MUHvz1v6N0IyM1XVru2UBBcTylMn6Ac ZkBdaqAyXywWPRRk2Is5+fgzobTgYBe6nOrXdBD+SPLwaHUmGSkaQ2hlo8TwUhO5u9rt izoJtuypEXvcaGGvHz9B1q84Z9S8EYffiEt7YEEqPxzXHH0OYF0dhs45ZXhc59pg8DxZ G/wA== X-Gm-Message-State: AGi0PuZj3wl/R7gekEcaY1mkzR7/uai9I4T/Oo6ug+QRP09q6Ep+FmfI ZBmbrHfph5+f0DXAB3HbngffqA== X-Received: by 2002:a37:dd5:: with SMTP id 204mr13977379qkn.321.1586898967524; Tue, 14 Apr 2020 14:16:07 -0700 (PDT) Received: from [192.168.1.10] (c-66-30-119-151.hsd1.ma.comcast.net. [66.30.119.151]) by smtp.gmail.com with ESMTPSA id x18sm3518598qkn.107.2020.04.14.14.16.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 14 Apr 2020 14:16:06 -0700 (PDT) Subject: Re: perf: add cond_resched() to task_function_call() To: Peter Zijlstra Cc: Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , linux-kernel@vger.kernel.org, syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com References: <20200414190351.16893-1-brho@google.com> <20200414204208.GI2483@worktop.programming.kicks-ass.net> From: Barret Rhoden Message-ID: Date: Tue, 14 Apr 2020 17:16:05 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0 MIME-Version: 1.0 In-Reply-To: <20200414204208.GI2483@worktop.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/14/20 4:42 PM, Peter Zijlstra wrote: > On Tue, Apr 14, 2020 at 03:03:51PM -0400, Barret Rhoden wrote: >> Under rare circumstances, task_function_call() can repeatedly fail and >> cause a soft lockup. >> >> There is a slight race where the process is no longer running on the cpu >> we targeted by the time remote_function() runs. The code will simply >> try again. If we are very unlucky, this will continue to fail, until a >> watchdog fires. This can happen in a heavily loaded, multi-core virtual >> machine. > > Sigh,.. virt again :/ > >> Reported-by: syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com >> Signed-off-by: Barret Rhoden >> --- >> kernel/events/core.c | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index 55e44417f66d..65c2c05e24c2 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -99,7 +99,7 @@ static void remote_function(void *data) >> * >> * returns: @func return value, or >> * -ESRCH - when the process isn't running >> - * -EAGAIN - when the process moved away >> + * -ENXIO - when the cpu the process was on has gone offline >> */ > > Hurm.. I don't think that was actually intended behaviour. Yeah, I noticed we never return EAGAIN, so I figured I'd look for any other potential return values. > As long as > the task lives we ought to retry. Luckily I don't think the current code > cares much, it'll loop again on the caller side. > > With the exception of perf_cgroup_attach() that is, that might actually > be broken because of this. > >> static int >> task_function_call(struct task_struct *p, remote_function_f func, void *info) >> @@ -112,11 +112,15 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info) >> }; >> int ret; >> >> - do { >> - ret = smp_call_function_single(task_cpu(p), remote_function, &data, 1); >> + while (1) { >> + ret = smp_call_function_single(task_cpu(p), remote_function, >> + &data, 1); >> if (!ret) >> ret = data.ret; >> - } while (ret == -EAGAIN); >> + if (ret != -EAGAIN) >> + break; >> + cond_resched(); >> + } > > So how about we make that: > > for (;;) { > ret = smp_call_function_single(task_cpu(p), remote_function, &data, 1); > ret = !ret ? data.ret : -EAGAIN; > > if (ret != -EAGAIN) > break; > > cond_resched(); > } > > Or something like that, hmmm? Can do. I can put a comment in too. /* Retry due to any failure in smp_call_function */ or something. Thanks, Barret