Date: Mon, 16 Jan 2023 13:11:40 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Frederic Weisbecker, atomlin@atomlin.com, tglx@linutronix.de,
    mingo@kernel.org, peterz@infradead.org, pauld@redhat.com,
    neelx@redhat.com, oleksandr@natalenko.name,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v13 2/6] mm/vmstat: Use vmstat_dirty to track
    CPU-specific vmstat discrepancies
In-Reply-To: <24ca2aad-54b2-2c3a-70b5-49a33c9a33@gentwo.de>

On Mon, Jan 16, 2023 at 10:51:40AM +0100, Christoph Lameter wrote:
> On Wed, 11 Jan 2023, Marcelo Tosatti wrote:
>
> > OK, can replace this_cpu operations with this_cpu_ptr + standard C operators
> > (and in fact can do that for interrupt disabled functions as well, that
> > is CONFIG_HAVE_CMPXCHG_LOCAL not defined).
> >
> > Is that it?
>
> No that was hypothetical.
>
> I do not know how to get out of this dilemma. We surely want to keep fast
> vmstat operations working.

Honestly, to me, there is no dilemma:

* [1] There is a requirement from applications to be uninterrupted
by operating system activities. Examples include radio access
network software and software-defined PLCs for industrial automation.

* There exist VM statistics counters (which count the number of pages
in different states, for example, number of free pages, locked pages,
pages under writeback, pagetable pages, file pages, etc).
To reduce the number of accesses to the global counters, each CPU
maintains its own delta relative to the global VM counters
(which can be cached in the local processor cache, and is therefore fast).

The per-CPU deltas are synchronized to the global counters:
	1) If the per-CPU counters exceed a given threshold.
	2) Periodically, with a low frequency compared to
	   CPU events (every few seconds).
	3) Upon an event that requires accurate counters.

* The periodic synchronization interrupts a given CPU, in case
it contains a counter delta relative to the global counters.

To avoid this interruption, due to [1], the proposed patchset
synchronizes any pending per-CPU deltas to the global counters,
for nohz_full= CPUs, when returning to userspace
(which is a very fast path).

Since return to userspace is a very fast path, checking for pending
deltas by reading every per-CPU counter is undesirable.

Therefore a single bit is introduced to condense the following
information: does this CPU contain any delta relative to the
global counters that should be written back?

This bit is set when a per-CPU delta is modified.
This bit is cleared when the per-CPU deltas are written back to
the global counters.

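To make the scheme above concrete, here is a minimal userspace model of it.
This is only a sketch, not the patchset code: NR_CPUS, NR_ITEMS, THRESHOLD
and all function names are made up for illustration, and "CPUs" are plain
array indices. It shows per-CPU deltas, the write-through on threshold, and
the single dirty flag that the return-to-userspace path checks instead of
scanning every counter.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   4	/* hypothetical, for illustration only */
#define NR_ITEMS  3	/* hypothetical number of counters */
#define THRESHOLD 32	/* hypothetical per-CPU delta threshold */

static long global_stat[NR_ITEMS];		/* global counters */
static int cpu_delta[NR_CPUS][NR_ITEMS];	/* per-CPU deltas */
static bool cpu_dirty[NR_CPUS];			/* "this CPU has a delta" */

/* Write all deltas of one CPU back to the global counters. */
static void fold_cpu(int cpu)
{
	for (int i = 0; i < NR_ITEMS; i++) {
		global_stat[i] += cpu_delta[cpu][i];
		cpu_delta[cpu][i] = 0;
	}
	cpu_dirty[cpu] = false;
}

/* Modify counter 'item' by 'y' on behalf of 'cpu'. */
static void mod_state(int cpu, int item, int y)
{
	assert(cpu >= 0 && cpu < NR_CPUS && item >= 0 && item < NR_ITEMS);

	int v = cpu_delta[cpu][item] + y;

	if (v > THRESHOLD || v < -THRESHOLD) {
		/* Case 1: threshold exceeded, write through. */
		global_stat[item] += v;
		v = 0;
	}
	cpu_delta[cpu][item] = v;
	/* Set conservatively on every modification. */
	cpu_dirty[cpu] = true;
}

/*
 * Model of the return-to-userspace hook: one flag read in the
 * common case. Returns true if a write-back was performed.
 */
static bool sync_on_user_return(int cpu)
{
	if (!cpu_dirty[cpu])
		return false;
	fold_cpu(cpu);
	return true;
}
```

In this model, a CPU that never touched a counter pays for a single flag
read in sync_on_user_return(), which is the point of the dirty bit.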
Since for the following two operations:

	modify per-CPU delta (for current CPU) of counter X by Y
	set bit (for current CPU) indicating the per-CPU delta exists

"current CPU" can change between them, it is necessary to disable
preemption while executing the pair of operations.

vmstat operations still achieve their main goal, which is to
keep accesses local to the CPU when incrementing the counters
(for most counter modifications).
The preempt_disable/enable pair also operates on a per-CPU variable.

Now you are objecting to this patchset because:

It increases the number of cycles to execute the function that
modifies the counters by 6. Can you mention any benchmark where this
increase is significant?

Searching for callers of mod_zone_page_state/mod_node_page_state shows
the following: the codepaths that call them touch multiple pages and
other data structures, so the preempt_disable/preempt_enable pair
should be a very small contribution (in terms of percentage) to any
meaningful benchmark.

> The fundamental issue that causes the vmstat discrepancies is likely that
> the fast this_cpu ops can increment the counter on any random cpu and that
> this is the reason you get vmstat discrepancies.

Yes.

> Give up the assumption that an increment of a this_cpu counter on a
> specific cpu means that something occurred on that specific cpu. Maybe
> that will get you on a path to resolve the issues you are seeing.

But it can't. To condense the information "does a delta exist
on this CPU" from a number of cacheline reads into a single cacheline
read, one bit can be used. And the write to that bit and to the
counters is not atomic.

Alternatives:

	1) Disable periodic synchronization for nohz_full CPUs.
	2) Processor instructions which can modify more than
	   one address in memory.
	3) Synchronize the per-CPU stats remotely (which
	   increases per-CPU and per-node accesses).
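
For illustration, the non-atomic "modify delta, then set bit" pair
discussed above can be modeled in userspace as follows. This is a sketch
under stated assumptions, not kernel code: current_cpu, migrate_to() and
both mod_*() helpers are hypothetical names, and migration is triggered
explicitly so the interleaving is deterministic. The protected variant
models the effect of preempt_disable() by only allowing migration after
the pair has completed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2	/* hypothetical, for illustration only */

static int delta[NR_CPUS];	/* per-CPU delta of one counter */
static bool dirty[NR_CPUS];	/* per-CPU "delta exists" bit */
static int current_cpu;		/* CPU the modeled task runs on */

/* Hypothetical hook: models the scheduler migrating the task. */
static void migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
}

/*
 * Unprotected pair: the task may be preempted and migrated between
 * the two steps (migrate_cpu >= 0 requests that interleaving).
 */
static void mod_unprotected(int y, int migrate_cpu)
{
	delta[current_cpu] += y;	/* step 1 runs on CPU A */
	if (migrate_cpu >= 0)
		migrate_to(migrate_cpu);
	dirty[current_cpu] = true;	/* step 2 may run on CPU B */
}

/*
 * Protected pair: models preemption being disabled, so a pending
 * migration only takes effect after both steps are done.
 */
static void mod_protected(int y, int migrate_cpu)
{
	int cpu = current_cpu;		/* stable inside the section */

	delta[cpu] += y;
	dirty[cpu] = true;
	if (migrate_cpu >= 0)
		migrate_to(migrate_cpu);
}

/* A delta that no return-to-userspace check would ever notice. */
static bool stale_delta(int cpu)
{
	return delta[cpu] != 0 && !dirty[cpu];
}
```

With the unprotected pair and a migration from CPU 0 to CPU 1 in between,
CPU 0 ends up with a delta but no bit, and CPU 1 with a bit but no delta.
With the protected pair, both writes land on CPU 0.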