Message-ID: <20220315153313.952151848@fedora.localdomain>
User-Agent: quilt/0.66
Date: Tue, 15 Mar 2022 12:31:37 -0300
From: Marcelo Tosatti
To: linux-kernel@vger.kernel.org
Cc: Nitesh Lal, Nicolas Saenz Julienne, Frederic Weisbecker,
    Christoph Lameter, Juri Lelli, Peter Zijlstra, Alex Belits,
    Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira,
    Oscar Shiang, Marcelo Tosatti
Subject: [patch v12 05/13] task isolation: sync vmstats on return to userspace
References: <20220315153132.717153751@fedora.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

The logic that disables the vmstat worker thread when entering
nohz_full does not cover all scenarios. For example, the following
sequence is possible:

1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop

Since refresh_cpu_vm_stats from the nohz_full logic can happen _before_
the mlock, the vmstat shepherd can restart the vmstat worker thread on
the CPU in question.

To fix this, use the task isolation prctl interface to quiesce
deferred actions when returning to userspace.

This patch adds hooks to fork and exit code paths.

Signed-off-by: Marcelo Tosatti

---
v12: set TIF_TASK_ISOL only when necessary (Frederic)
v11: fold patch to add task_isol_exit hooks (Frederic)
     Use _TIF_TASK_ISOL bit on thread flags (Frederic)
v6: modify exit_to_user_mode_loop to cover exceptions and interrupts
v5: no changes
v4: add oneshot mode support

 include/linux/task_isolation.h |   16 ++++++++++++++++
 include/linux/vmstat.h         |    8 ++++++++
 kernel/entry/common.c          |   15 +++++++++++----
 kernel/task_isolation.c        |   21 +++++++++++++++++++++
 mm/vmstat.c                    |   21 +++++++++++++++++++++
 5 files changed, 77 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/task_isolation.h
===================================================================
--- linux-2.6.orig/include/linux/task_isolation.h
+++ linux-2.6/include/linux/task_isolation.h
@@ -27,6 +27,13 @@ static inline void task_isol_free(struct
 		__task_isol_free(tsk);
 }
 
+void __task_isol_exit(struct task_struct *tsk);
+static inline void task_isol_exit(struct task_struct *tsk)
+{
+	if (tsk->task_isol_info)
+		__task_isol_exit(tsk);
+}
+
 int prctl_task_isol_feat_get(unsigned long arg2, unsigned long arg3,
 			     unsigned long arg4, unsigned long arg5);
 int prctl_task_isol_cfg_get(unsigned long arg2, unsigned long arg3,
@@ -40,12 +47,22 @@ int prctl_task_isol_activate_set(unsigne
 
 int __copy_task_isol(struct task_struct *tsk);
 
+void task_isol_exit_to_user_mode(void);
+
 #else
 
+static inline void task_isol_exit_to_user_mode(void)
+{
+}
+
 static inline void task_isol_free(struct task_struct *tsk)
 {
 }
 
+static inline void task_isol_exit(struct task_struct *tsk)
+{
+}
+
 static inline int prctl_task_isol_feat_get(unsigned long arg2,
 					   unsigned long arg3,
 					   unsigned long arg4,
Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h
+++ linux-2.6/include/linux/vmstat.h
@@ -21,6 +21,14 @@ int sysctl_vm_numa_stat_handler(struct c
 		void *buffer, size_t *length, loff_t *ppos);
 #endif
 
+#if defined(CONFIG_SMP) && defined(CONFIG_TASK_ISOLATION)
+void sync_vmstat(void);
+#else
+static inline void sync_vmstat(void)
+{
+}
+#endif
+
 struct reclaim_stat {
 	unsigned nr_dirty;
 	unsigned nr_unqueued_dirty;
Index: linux-2.6/kernel/entry/common.c
===================================================================
--- linux-2.6.orig/kernel/entry/common.c
+++ linux-2.6/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 #include "common.h"
 
@@ -174,6 +175,9 @@ static unsigned long exit_to_user_mode_l
 		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(regs);
 
+		if (ti_work & _TIF_TASK_ISOL)
+			task_isol_exit_to_user_mode();
+
 		/* Architecture specific TIF work */
 		arch_exit_to_user_mode_work(regs, ti_work);
 
Index: linux-2.6/kernel/task_isolation.c
===================================================================
--- linux-2.6.orig/kernel/task_isolation.c
+++ linux-2.6/kernel/task_isolation.c
@@ -18,6 +18,12 @@
 #include
 #include
 #include
+#include
+#include
+
+void __task_isol_exit(struct task_struct *tsk)
+{
+}
 
 void __task_isol_free(struct task_struct *tsk)
 {
@@ -251,6 +257,11 @@ static int cfg_feat_quiesce_set(unsigned
 	info->quiesce_mask = i_qctrl->quiesce_mask;
 	info->oneshot_mask = i_qctrl->quiesce_oneshot_mask;
 	info->conf_mask |= ISOL_F_QUIESCE;
+
+	if ((info->active_mask & ISOL_F_QUIESCE) &&
+	    (info->quiesce_mask & ISOL_F_QUIESCE_VMSTATS))
+		set_thread_flag(TIF_TASK_ISOL);
+
 	ret = 0;
 
 out_free:
@@ -303,6 +314,7 @@ int __copy_task_isol(struct task_struct
 	new_info->active_mask = info->active_mask;
 
 	tsk->task_isol_info = new_info;
+	set_ti_thread_flag(task_thread_info(tsk), TIF_TASK_ISOL);
 
 	return 0;
 }
@@ -330,6 +342,10 @@ int prctl_task_isol_activate_set(unsigne
 	info->active_mask = active_mask;
 	ret = 0;
 
+	if ((info->active_mask & ISOL_F_QUIESCE) &&
+	    (info->quiesce_mask & ISOL_F_QUIESCE_VMSTATS))
+		set_thread_flag(TIF_TASK_ISOL);
+
 out:
 	return ret;
 }
@@ -349,3 +365,24 @@ int prctl_task_isol_activate_get(unsigne
 
 	return 0;
 }
+
+void task_isol_exit_to_user_mode(void)
+{
+	struct task_isol_info *i;
+
+	clear_thread_flag(TIF_TASK_ISOL);
+
+	i = current->task_isol_info;
+	if (!i)
+		return;
+
+	if (i->active_mask != ISOL_F_QUIESCE)
+		return;
+
+	if (i->quiesce_mask & ISOL_F_QUIESCE_VMSTATS) {
+		sync_vmstat();
+		if (i->oneshot_mask & ISOL_F_QUIESCE_VMSTATS)
+			i->quiesce_mask &= ~ISOL_F_QUIESCE_VMSTATS;
+	}
+}
+EXPORT_SYMBOL_GPL(task_isol_exit_to_user_mode);
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -2018,6 +2018,29 @@ static void vmstat_shepherd(struct work_
 		round_jiffies_relative(sysctl_stat_interval));
 }
 
+#ifdef CONFIG_TASK_ISOLATION
+void sync_vmstat(void)
+{
+	int cpu;
+
+	cpu = get_cpu();
+
+	refresh_cpu_vm_stats(false);
+	put_cpu();
+
+	/*
+	 * If task is migrated to another CPU between put_cpu
+	 * and cancel_delayed_work_sync, the code below might
+	 * cancel vmstat_update work for a different cpu
+	 * (than the one from which the vmstats were flushed).
+	 *
+	 * However, vmstat shepherd will re-enable it later,
+	 * so it's harmless.
+	 */
+	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+}
+#endif
+
 static void __init start_shepherd_timer(void)
 {
 	int cpu;
Index: linux-2.6/include/linux/entry-common.h
===================================================================
--- linux-2.6.orig/include/linux/entry-common.h
+++ linux-2.6/include/linux/entry-common.h
@@ -60,7 +60,7 @@
 #define EXIT_TO_USER_MODE_WORK						\
 	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
 	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |	\
-	 ARCH_EXIT_TO_USER_MODE_WORK)
+	 _TIF_TASK_ISOL | ARCH_EXIT_TO_USER_MODE_WORK)
 
 /**
  * arch_check_user_regs - Architecture specific sanity check for user mode regs
Index: linux-2.6/kernel/exit.c
===================================================================
--- linux-2.6.orig/kernel/exit.c
+++ linux-2.6/kernel/exit.c
@@ -64,6 +64,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -759,6 +760,7 @@ void __noreturn do_exit(long code)
 	validate_creds_for_do_exit(tsk);
 
 	io_uring_files_cancel();
+	task_isol_exit(tsk);
 	exit_signals(tsk);  /* sets PF_EXITING */
 
 	/* sync mm's RSS info before statistics gathering */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -2427,6 +2427,7 @@ bad_fork_free_pid:
 	if (pid != &init_struct_pid)
 		free_pid(pid);
 bad_fork_cleanup_task_isol:
+	task_isol_exit(p);
 	task_isol_free(p);
 bad_fork_cleanup_thread:
 	exit_thread(p);