Date: Wed, 2 Feb 2022 18:06:51 -0500
From: Rafael Aquini
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
    Rasmus Villemoes, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org, Ira Weiny, Mike Rapoport, David Rientjes,
    Roman Gushchin
Subject: Re: [PATCH v4 0/4] mm/page_owner: Extend page_owner to show memcg information
References: <20220131192308.608837-5-longman@redhat.com>
 <20220202203036.744010-1-longman@redhat.com>
In-Reply-To: <20220202203036.744010-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 02, 2022 at 03:30:32PM -0500, Waiman Long wrote:
> v4:
>  - Take rcu_read_lock() when memcg is being accessed as suggested by
>    Michal.
>  - Make print_page_owner_memcg() return the new offset into the buffer
>    and put CONFIG_MEMCG block inside as suggested by Mike.
>  - Directly use TASK_COMM_LEN as length of name buffer as suggested by
>    Roman.
>
> v3:
>  - Add unlikely() to patch 1 and clarify that -1 will not be returned.
>  - Use a helper function to print out memcg information in patch 3.
>  - Add a new patch 4 to store task command name in page_owner
>    structure.
>
> v2:
>  - Remove the SNPRINTF() macro as suggested by Ira and use scnprintf()
>    instead to remove some buffer overrun checks.
>  - Add a patch to optimize vscnprintf with a size parameter of 0.
>
> While debugging the constant increase in percpu memory consumption on
> a system that spawned a large number of containers, it was found that a
> lot of offline mem_cgroup structures remained in place without being
> freed.
> Further investigation indicated that those mem_cgroup structures
> were pinned by some pages.
>
> In order to find out what those pages are, the existing page_owner
> debugging tool is extended to show memory cgroup information and whether
> those memcgs are offline or not. With the enhanced page_owner tool,
> the following is a typical page that pinned the mem_cgroup structure
> in my test case:
>
>   Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns
>   PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
>    prep_new_page+0xac/0xe0
>    get_page_from_freelist+0x1327/0x14d0
>    __alloc_pages+0x191/0x340
>    alloc_pages_vma+0x84/0x250
>    shmem_alloc_page+0x3f/0x90
>    shmem_alloc_and_acct_page+0x76/0x1c0
>    shmem_getpage_gfp+0x281/0x940
>    shmem_write_begin+0x36/0xe0
>    generic_perform_write+0xed/0x1d0
>    __generic_file_write_iter+0xdc/0x1b0
>    generic_file_write_iter+0x5d/0xb0
>    new_sync_write+0x11f/0x1b0
>    vfs_write+0x1ba/0x2a0
>    ksys_write+0x59/0xd0
>    do_syscall_64+0x37/0x80
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>   Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8.
>
> So the page was not freed because it was part of a shmem segment. That
> is useful information that can help users diagnose similar problems.
>
> With cgroup v1, /proc/cgroups can be read to find out the total number
> of memory cgroups (online + offline). With cgroup v2, the cgroup.stat of
> the root cgroup can be read to find the number of dying cgroups (most
> likely pinned by dying memcgs).
>
> The page_owner feature is not supposed to be enabled on production
> systems due to its memory overhead.
> However, if it is suspected that dying memcgs are increasing over time,
> a test environment with page_owner enabled can then be set up with an
> appropriate workload for further analysis of what may be causing the
> increasing number of dying memcgs.
>
> Waiman Long (4):
>   lib/vsprintf: Avoid redundant work with 0 size
>   mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
>   mm/page_owner: Print memcg information
>   mm/page_owner: Record task command name
>
>  lib/vsprintf.c  |  8 +++---
>  mm/page_owner.c | 70 ++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 60 insertions(+), 18 deletions(-)
>
> --
> 2.27.0
>

Thank you, Waiman.

Acked-by: Rafael Aquini
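A user-space sketch (not the kernel code) of the scnprintf() property the v2 notes rely on: snprintf() returns the length the output would have had, while scnprintf() returns the number of characters actually stored. Chained `ret += scnprintf(kbuf + ret, count - ret, ...)` calls therefore can never push `ret` past `count - 1`, which is why some per-call overrun checks can go. sketch_scnprintf(), sketch_print_memcg() and sketch_report() are hypothetical names; the size == 0 early return only mirrors what patch 1 describes, and the offset-returning helper imitates the shape the v4 notes give print_page_owner_memcg().

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the kernel's scnprintf():
 * returns the number of characters actually stored in buf (excluding
 * the trailing NUL), never the would-be length that snprintf() reports.
 */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;	/* nothing can be stored, so skip formatting */

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (i < 0)
		return 0;			/* encoding error: nothing usable */
	if ((size_t)i >= size)
		return (int)(size - 1);		/* output was truncated */
	return i;
}

/*
 * Hypothetical helper shaped like the v4 description: it takes the
 * current offset and returns the new one, so callers can chain it.
 */
static int sketch_print_memcg(char *kbuf, size_t count, int ret,
			      const char *name, int online)
{
	return ret + sketch_scnprintf(kbuf + ret, count - ret,
				      "Charged to %smemcg %s\n",
				      online ? "" : "offline ", name);
}

/* Builds a two-line report; ret stays <= count - 1 without extra checks. */
static int sketch_report(char *kbuf, size_t count, int pid, const char *comm,
			 const char *memcg, int online)
{
	int ret = 0;

	ret += sketch_scnprintf(kbuf + ret, count - ret, "pid %d (%s)\n",
				pid, comm);
	ret = sketch_print_memcg(kbuf, count, ret, memcg, online);
	return ret;
}
```

With a 10-byte buffer the report is simply truncated to "pid 16297" and the returned offset is 9; no intermediate call ever sees a negative remaining size.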
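The cover letter notes that, with cgroup v2, the root cgroup's cgroup.stat reports the number of dying cgroups. A minimal C sketch for sampling that number, assuming cgroup v2 is mounted at /sys/fs/cgroup and that the file contains "key value" lines including nr_dying_descendants; parse_nr_dying() and read_nr_dying() are hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the nr_dying_descendants value from the text of a cgroup v2
 * cgroup.stat file, or -1 if the key is not present.
 */
static long parse_nr_dying(const char *stat_text)
{
	const char *line = stat_text;

	while (line && *line) {
		char key[64];
		long val;

		if (sscanf(line, "%63s %ld", key, &val) == 2 &&
		    strcmp(key, "nr_dying_descendants") == 0)
			return val;
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}

/*
 * Read a cgroup.stat file (e.g. the root one, assumed to be at
 * /sys/fs/cgroup/cgroup.stat) and return its dying-descendant count,
 * or -1 if the file cannot be opened or lacks the key.
 */
static long read_nr_dying(const char *path)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_nr_dying(buf);
}
```

Sampling read_nr_dying("/sys/fs/cgroup/cgroup.stat") periodically and watching for steady growth is one way to decide whether a page_owner-enabled test environment is worth setting up.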