From: Greg Kroah-Hartman
To:
linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Ben Segall, Shakeel Butt, Alexander Viro, Linus Torvalds, Eric Dumazet, Roman Penyaev, Jason Baron, Khazhismel Kumykov, Heiher, stable@kernel.org, Andrew Morton
Subject: [PATCH 5.18 0134/1095] epoll: autoremove wakers even more aggressively
Date: Mon, 15 Aug 2022 19:52:13 +0200
Message-Id: <20220815180435.143346938@linuxfoundation.org>
In-Reply-To: <20220815180429.240518113@linuxfoundation.org>
References: <20220815180429.240518113@linuxfoundation.org>

From: Benjamin Segall

commit a16ceb13961068f7209e34d7984f8e42d2c06159 upstream.

If a process is killed or otherwise exits while having active network
connections and many threads waiting on epoll_wait, the threads will all
be woken immediately, but not removed from ep->wq.  Then when network
traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will
not remove the entries from the list.

This means that the cost of the wakeup attempt is far higher than usual,
does not decrease, and this also competes with the dying threads trying to
actually make progress and remove themselves from the wq.

Handle this by removing visited epoll wq entries unconditionally, rather
than only when the wakeup succeeds - the structure of ep_poll means that
the only potential loss is the timed_out->eavail heuristic, which now can
race and result in a redundant ep_send_events attempt.  (But only when
incoming data and a timeout actually race, not on every timeout)

Shakeel added:

: We are seeing this issue in production with real workloads and it has
: caused hard lockups.  Particularly network heavy workloads with a lot
: of threads in epoll_wait() can easily trigger this issue if they get
: killed (oom-killed in our case).

Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com
Signed-off-by: Ben Segall
Tested-by: Shakeel Butt
Cc: Alexander Viro
Cc: Linus Torvalds
Cc: Shakeel Butt
Cc: Eric Dumazet
Cc: Roman Penyaev
Cc: Jason Baron
Cc: Khazhismel Kumykov
Cc: Heiher
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 fs/eventpoll.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1747,6 +1747,21 @@ static struct timespec64 *ep_timeout_to_
 	return to;
 }
 
+/*
+ * autoremove_wake_function, but remove even on failure to wake up, because we
+ * know that default_wake_function/ttwu will only fail if the thread is already
+ * woken, and in that case the ep_poll loop will remove the entry anyways, not
+ * try to reuse it.
+ */
+static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
+				       unsigned int mode, int sync, void *key)
+{
+	int ret = default_wake_function(wq_entry, mode, sync, key);
+
+	list_del_init(&wq_entry->entry);
+	return ret;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  * event buffer.
@@ -1828,8 +1843,15 @@ static int ep_poll(struct eventpoll *ep,
 	 * normal wakeup path no need to call __remove_wait_queue()
 	 * explicitly, thus ep->lock is not taken, which halts the
 	 * event delivery.
+	 *
+	 * In fact, we now use an even more aggressive function that
+	 * unconditionally removes, because we don't reuse the wait
+	 * entry between loop iterations. This lets us also avoid the
+	 * performance issue if a process is killed, causing all of its
+	 * threads to wake up without being removed normally.
 	 */
 	init_wait(&wait);
+	wait.func = ep_autoremove_wake_function;
 
 	write_lock_irq(&ep->lock);
 	/*
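
The cost argument in the changelog can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct waiter`, `try_wake()`, `scan_queue()`, `total_visits()` and `MAX_WAITERS` are hypothetical names invented for the illustration, the queue is a plain array rather than the kernel's linked wait queue, and the model ignores the dying threads eventually removing themselves. It counts how many queue entries get visited when every waiter has already been woken (the killed-process case) but is still queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4096

/* Hypothetical stand-in for a wait queue entry; not the kernel's struct. */
struct waiter {
	bool asleep;	/* false once the thread has been woken, e.g. by a kill */
	bool queued;	/* true while the entry is still on the queue */
};

/* Models a wakeup attempt: it succeeds only if the thread was still asleep. */
static bool try_wake(struct waiter *w)
{
	if (!w->asleep)
		return false;
	w->asleep = false;
	return true;
}

/*
 * One scan of the queue, as a single incoming event would trigger.
 * Returns how many entries had to be visited.
 */
static size_t scan_queue(struct waiter *q, size_t n, bool remove_always)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (!q[i].queued)
			continue;
		visited++;
		/* Old policy: unlink only on success. New policy: unlink regardless. */
		if (try_wake(&q[i]) || remove_always)
			q[i].queued = false;
	}
	return visited;
}

/*
 * Total entries visited over n_events scans when every waiter has already
 * been woken by something else but is still sitting on the queue.
 */
size_t total_visits(size_t n_waiters, size_t n_events, bool remove_always)
{
	static struct waiter q[MAX_WAITERS];
	size_t total = 0;

	assert(n_waiters <= MAX_WAITERS);
	for (size_t i = 0; i < n_waiters; i++)
		q[i] = (struct waiter){ .asleep = false, .queued = true };
	for (size_t e = 0; e < n_events; e++)
		total += scan_queue(q, n_waiters, remove_always);
	return total;
}
```

In this model, `total_visits(1000, 100, false)` returns 100000, because every scan walks all 1000 stale entries and removes none of them, while `total_visits(1000, 100, true)` returns 1000, because the first scan removes every entry and later scans find the queue empty.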