From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Shuai Xue, Tony Luck,
    "Rafael J. Wysocki", Sasha Levin
Subject: [PATCH 6.0 654/862] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
Date: Wed, 19 Oct 2022 10:32:21 +0200
Message-Id: <20221019083318.839392843@linuxfoundation.org>
In-Reply-To: <20221019083249.951566199@linuxfoundation.org>
References: <20221019083249.951566199@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Shuai Xue

[ Upstream commit 415fed694fe11395df56e05022d6e7cee1d39dd3 ]

If an error is detected as a result of a user-space process accessing a
corrupt memory location, the CPU may take an abort. The platform
firmware then notifies the kernel via NMI-like notifications, e.g.
NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc.

For NMI-like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keeps track of whether
memory_failure() work was queued, and makes task_work pending to flush
out the queue so that the work is processed before returning to user
space.

The code uses init_mm to check whether the error occurred in user space:

    if (current->mm != &init_mm)

The condition is always true, because _nobody_ ever has "init_mm" as a
real VM any more.

In addition to aborts, errors can also be signaled as asynchronous
exceptions, such as interrupts and SError. In such a case, the
interrupted current process could be any kind of thread.
When a kernel thread is interrupted, the work ghes_kick_task_work
deferred to task_work will never be processed, because entry_handler
returns by calling ret_to_kernel() instead of ret_to_user().
Consequently, the estatus_node allocated from ghes_estatus_pool in
ghes_in_nmi_queue_one_entry() will not be freed. After around 200
allocations on our platform, the ghes_estatus_pool will run out of
memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a result,
the event fails to be processed.

    sdei: event 805 on CPU 113 failed with error: -2

Finally, a large number of unhandled events may cause the platform
firmware to exceed some threshold and reboot.

The condition should generally just do

    if (current->mm)

as described in the active_mm.rst documentation. Then, if an
asynchronous error is detected while a kernel thread is running (e.g.
when detected by a background scrubber), do not add task_work to it, as
the original patch intended.

Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue
Reviewed-by: Tony Luck
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Sasha Levin
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..80ad530583c9 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -985,7 +985,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 			ghes_estatus_cache_add(generic, estatus);
 		}
 
-		if (task_work_pending && current->mm != &init_mm) {
+		if (task_work_pending && current->mm) {
 			estatus_node->task_work.func = ghes_kick_task_work;
 			estatus_node->task_work_cpu = smp_processor_id();
 			ret = task_work_add(current, &estatus_node->task_work,
-- 
2.35.1
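
For readers who want to see the failure mode in isolation, here is a minimal user-space sketch of the reasoning in the changelog. It is not kernel code: struct task, struct pool, old_check(), new_check() and handle_events() are hypothetical stand-ins invented for illustration, and the model assumes only what the changelog states, namely that a kernel thread has a NULL mm and never returns to user space, so task_work queued on it never runs and its node is never returned to the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; not the real definitions. */
struct mm_struct { int unused; };
struct task { struct mm_struct *mm; };	/* models task_struct::mm */

static struct mm_struct init_mm;	/* models the kernel's init_mm */

/* Old test: true for every task, since a kernel thread's NULL mm
 * also differs from &init_mm. */
static bool old_check(const struct task *t) { return t->mm != &init_mm; }

/* New test: true only for tasks that own a user address space. */
static bool new_check(const struct task *t) { return t->mm != NULL; }

/* Hypothetical fixed-size pool standing in for ghes_estatus_pool. */
struct pool { int capacity; int in_use; };

static bool pool_alloc(struct pool *p)
{
	if (p->in_use == p->capacity)
		return false;		/* models the ENOMEM case */
	p->in_use++;
	return true;
}

static void pool_free(struct pool *p)
{
	assert(p->in_use > 0);
	p->in_use--;
}

/*
 * Process `events` errors that interrupt task `t`, using `check` to decide
 * whether freeing the node is deferred to task_work. Returns the number of
 * events dropped because the pool was exhausted.
 */
static int handle_events(struct pool *p, const struct task *t, int events,
			 bool (*check)(const struct task *))
{
	int failed = 0;

	for (int i = 0; i < events; i++) {
		if (!pool_alloc(p)) {
			failed++;
			continue;
		}
		bool deferred = check(t);
		/* Deferred work only runs on return to user space, which a
		 * kernel thread (mm == NULL) never does. */
		bool work_runs = deferred && t->mm != NULL;

		if (!deferred || work_runs)
			pool_free(p);	/* node goes back to the pool */
		/* otherwise the node is leaked */
	}
	return failed;
}
```

In this model, feeding events that interrupt a kernel thread through handle_events() with old_check fills the pool and then drops every further event, while new_check frees each node immediately and the pool stays empty. The pool capacity is a free parameter of the sketch; the figure of around 200 allocations in the changelog is specific to the reporter's platform.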