Date: Mon, 5 Jun 2023 11:53:56 -0300
From: Marcelo Tosatti
To: Michal Hocko
Cc: Christoph Lameter, Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka
Subject: Re: [PATCH v2 2/3] vmstat: skip periodic vmstat update for nohz full CPUs
References: <20230602185757.110910188@redhat.com> <20230602190115.521067386@redhat.com>

On Mon, Jun 05, 2023 at 09:55:57AM +0200, Michal Hocko wrote:
> On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> > The interruption caused by vmstat_update is undesirable
> > for certain applications:
> >
> >   oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> >   oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> >   oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> >   kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> >
> > The example above shows an additional 7us for the
> >
> >        oslat -> kworker -> oslat
> >
> > switches. In the case of a virtualized CPU, and the vmstat_update
> > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > observed in the guest is higher than 50us, violating the acceptable
> > latency threshold.
>
> I personally find the above problem description insufficient. I have
> asked several times and only got piece by piece information each time.
> Maybe there is a reason to be secretive but it would be great to get at
> least some basic expectations described and what they are based on.

There is no reason to be secretive.

>
> E.g. workloads are running on isolated cpus with nohz full mode to
> shield off any kernel interruption. Yet there are operations that update
> counters (like mlock, but not mlock alone) that update per cpu counters
> that will eventually get flushed and that will cause some interference.
> Now the host/guest transition and interference. How that happens when
> the guest is running on an isolated and dedicated cpu?

The updated changelog follows. Does it contain the requested
information?

----

Performance details for the kworker interruption:

Consider workloads running on isolated CPUs in nohz full mode, which is
meant to shield them from any kernel interruption, for example a VM
running a time-sensitive application with a 50us maximum acceptable
interruption (use case: soft PLC).

  oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
  oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
  oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
  kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...

The example above shows an additional 7us for the

       oslat -> kworker -> oslat

switches.
In the case of a virtualized CPU, where the vmstat_update interruption
happens in the host (interrupting a qemu-kvm vcpu), the latency penalty
observed in the guest is higher than 50us, violating the acceptable
latency threshold.

The isolated vCPU can perform operations that modify per-CPU page
counters, for example to complete I/O operations:

      CPU 11/KVM-9540    [001] dNh1.  2314.248584: mod_zone_page_state <-__folio_end_writeback
      CPU 11/KVM-9540    [001] dNh1.  2314.248585:
 => 0xffffffffc042b083
 => mod_zone_page_state
 => __folio_end_writeback
 => folio_end_writeback
 => iomap_finish_ioend
 => blk_mq_end_request_batch
 => nvme_irq
 => __handle_irq_event_percpu
 => handle_irq_event
 => handle_edge_irq
 => __common_interrupt
 => common_interrupt
 => asm_common_interrupt
 => vmx_do_interrupt_nmi_irqoff
 => vmx_handle_exit_irqoff
 => vcpu_enter_guest
 => vcpu_run
 => kvm_arch_vcpu_ioctl_run
 => kvm_vcpu_ioctl
 => __x64_sys_ioctl
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

> > Skip periodic updates for nohz full CPUs. Any callers who
> > need precise values should use a snapshot of the per-CPU
> > counters, or use the global counters with measures to
> > handle errors up to thresholds (see calculate_normal_threshold).
>
> I would rephrase this paragraph.
> In kernel users of vmstat counters either require the precise value and
> they are using zone_page_state_snapshot interface or they can live with
> an imprecision as the regular flushing can happen at arbitrary time and
> cumulative error can grow (see calculate_normal_threshold).
> From that POV the regular flushing can be postponed for CPUs that have
> been isolated from the kernel interference without critical
> infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
> for all isolated CPUs to avoid interference with the isolated workload.
>
> > Suggested by Michal Hocko.
> >
> > Signed-off-by: Marcelo Tosatti
>
> Acked-by: Michal Hocko

OK, updated comment, thanks.
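
For anyone following the thread who is less familiar with the counters
being discussed, below is a rough userspace model (Python, not kernel
code) of the behaviour described above: per-CPU deltas accumulate
locally and are folded into the global value only once they exceed a
threshold or are flushed. The class and method names (ModelCounter, mod,
read_global, read_snapshot, flush) are made up for illustration, and the
threshold of 32 is an arbitrary assumption rather than what
calculate_normal_threshold would return. The sketch only loosely mirrors
the roles of mod_zone_page_state and zone_page_state_snapshot.

```python
class ModelCounter:
    """Toy model: one global value plus one pending delta per CPU.

    Hypothetical names throughout; this is an illustration, not the
    kernel implementation.
    """

    def __init__(self, nr_cpus, threshold):
        self.global_value = 0
        self.threshold = threshold
        self.pending = [0] * nr_cpus  # per-CPU deltas not yet folded

    def mod(self, cpu, delta):
        """Update the per-CPU delta; fold it once it exceeds the threshold."""
        self.pending[cpu] += delta
        if abs(self.pending[cpu]) > self.threshold:
            self.flush(cpu)

    def flush(self, cpu):
        """Fold one CPU's pending delta into the global value."""
        self.global_value += self.pending[cpu]
        self.pending[cpu] = 0

    def read_global(self):
        """Cheap read: ignores pending per-CPU deltas, so it can be stale."""
        return self.global_value

    def read_snapshot(self):
        """Precise read: global value plus every CPU's pending delta."""
        return self.global_value + sum(self.pending)


if __name__ == "__main__":
    counter = ModelCounter(nr_cpus=4, threshold=32)
    # CPU 1 plays the isolated CPU: it updates the counter but nothing
    # flushes it periodically.
    for _ in range(20):
        counter.mod(cpu=1, delta=1)
    print("global:", counter.read_global())
    print("snapshot:", counter.read_snapshot())
```

In this model the global read stays at 0 after the 20 updates while the
snapshot read returns 20. The error seen by a global reader is bounded
by the threshold on each CPU, which is the imprecision that such callers
are already expected to tolerate.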