From: Oscar Salvador
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Alexandre Ghiti, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com
Subject: [PATCH v4 2/4] mm,page_owner: Fix refcount imbalance
Date: Thu, 4 Apr 2024 09:07:00 +0200
Message-ID: <20240404070702.2744-3-osalvador@suse.de>
X-Mailer: git-send-email 2.44.0
In-Reply-To:
 <20240404070702.2744-1-osalvador@suse.de>
References: <20240404070702.2744-1-osalvador@suse.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The current code does not account for scenarios where an allocation and
a free operation on the same pages do not cover the same number of pages
at once.
One example is alloc_pages_exact(), which allocates a page of high enough
order to satisfy the size request and then frees the unused remainder
right away.

In that example, we increment the stack_record refcount only once, but we
decrement it as many times as there are unused pages to free.
This leads to a warning because of the refcount imbalance.

Fix this by recording the number of base pages in the refcount field.
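To make the imbalance concrete, here is a minimal userspace sketch of the two accounting schemes. It is not kernel code: a plain int stands in for refcount_t, the function names are made up for illustration, and the baseline of 1 is an assumption inferred from the "- 1" in stack_print().

```c
#include <stdio.h>

/*
 * Userspace sketch only: a plain int stands in for the kernel's refcount_t.
 * BASELINE = 1 is an assumption taken from the "- 1" in stack_print().
 */
#define BASELINE 1

/* Old accounting: +1 per allocation, -1 per free call; the order is ignored. */
int old_accounting(unsigned int alloc_order, unsigned int nr_order0_frees)
{
	int count = BASELINE;
	unsigned int i;

	(void)alloc_order;		/* the allocation order played no role */
	count += 1;			/* one increment for the whole allocation */
	for (i = 0; i < nr_order0_frees; i++)
		count -= 1;		/* one decrement per freed page */
	return count;
}

/* New accounting: add and subtract (1 << order) base pages. */
int new_accounting(unsigned int alloc_order, unsigned int nr_order0_frees)
{
	int count = BASELINE;
	unsigned int i;

	count += 1 << alloc_order;	/* base pages covered by the allocation */
	for (i = 0; i < nr_order0_frees; i++)
		count -= 1 << 0;	/* each order-0 free returns one base page */
	return count;
}

/* Print both results for an order-2 allocation freed as four single pages. */
void show_example(void)
{
	printf("old: %d\n", old_accounting(2, 4));
	printf("new: %d\n", new_accounting(2, 4));
}
```

For an order-2 allocation whose four base pages are freed one at a time, old_accounting(2, 4) ends at -2, while new_accounting(2, 4) returns to the baseline of 1. The real refcount_t does not go negative. It saturates and warns, which is the warning described above.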

Reported-by: syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/00000000000090e8ff0613eda0e5@google.com
Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
Signed-off-by: Oscar Salvador
Reviewed-by: Vlastimil Babka
Tested-by: Alexandre Ghiti
---
 Documentation/mm/page_owner.rst | 73 +++++++++++++++++----------------
 mm/page_owner.c                 | 34 ++++++++-------
 2 files changed, 58 insertions(+), 49 deletions(-)

diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 0d0334cd5179..3a45a20fc05a 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -24,10 +24,10 @@ fragmentation statistics can be obtained through gfp flag information of
 each page. It is already implemented and activated if page owner is
 enabled. Other usages are more than welcome.
 
-It can also be used to show all the stacks and their outstanding
-allocations, which gives us a quick overview of where the memory is going
-without the need to screen through all the pages and match the allocation
-and free operation.
+It can also be used to show all the stacks and their current number of
+allocated base pages, which gives us a quick overview of where the memory
+is going without the need to screen through all the pages and match the
+allocation and free operation.
 
 page owner is disabled by default. So, if you'd like to use it, you need
 to add "page_owner=on" to your boot cmdline. If the kernel is built
@@ -75,42 +75,45 @@ Usage
 
 	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
 	cat stacks.txt
-	 prep_new_page+0xa9/0x120
-	 get_page_from_freelist+0x7e6/0x2140
-	 __alloc_pages+0x18a/0x370
-	 new_slab+0xc8/0x580
-	 ___slab_alloc+0x1f2/0xaf0
-	 __slab_alloc.isra.86+0x22/0x40
-	 kmem_cache_alloc+0x31b/0x350
-	 __khugepaged_enter+0x39/0x100
-	 dup_mmap+0x1c7/0x5ce
-	 copy_process+0x1afe/0x1c90
-	 kernel_clone+0x9a/0x3c0
-	 __do_sys_clone+0x66/0x90
-	 do_syscall_64+0x7f/0x160
-	 entry_SYSCALL_64_after_hwframe+0x6c/0x74
-	stack_count: 234
+	 post_alloc_hook+0x177/0x1a0
+	 get_page_from_freelist+0xd01/0xd80
+	 __alloc_pages+0x39e/0x7e0
+	 allocate_slab+0xbc/0x3f0
+	 ___slab_alloc+0x528/0x8a0
+	 kmem_cache_alloc+0x224/0x3b0
+	 sk_prot_alloc+0x58/0x1a0
+	 sk_alloc+0x32/0x4f0
+	 inet_create+0x427/0xb50
+	 __sock_create+0x2e4/0x650
+	 inet_ctl_sock_create+0x30/0x180
+	 igmp_net_init+0xc1/0x130
+	 ops_init+0x167/0x410
+	 setup_net+0x304/0xa60
+	 copy_net_ns+0x29b/0x4a0
+	 create_new_namespaces+0x4a1/0x820
+	nr_base_pages: 16
 	...
 	...
 	echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
 	cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt
 	cat stacks_7000.txt
-	 prep_new_page+0xa9/0x120
-	 get_page_from_freelist+0x7e6/0x2140
-	 __alloc_pages+0x18a/0x370
-	 alloc_pages_mpol+0xdf/0x1e0
-	 folio_alloc+0x14/0x50
-	 filemap_alloc_folio+0xb0/0x100
-	 page_cache_ra_unbounded+0x97/0x180
-	 filemap_fault+0x4b4/0x1200
-	 __do_fault+0x2d/0x110
-	 do_pte_missing+0x4b0/0xa30
-	 __handle_mm_fault+0x7fa/0xb70
-	 handle_mm_fault+0x125/0x300
-	 do_user_addr_fault+0x3c9/0x840
-	 exc_page_fault+0x68/0x150
-	 asm_exc_page_fault+0x22/0x30
-	stack_count: 8248
+	 post_alloc_hook+0x177/0x1a0
+	 get_page_from_freelist+0xd01/0xd80
+	 __alloc_pages+0x39e/0x7e0
+	 alloc_pages_mpol+0x22e/0x490
+	 folio_alloc+0xd5/0x110
+	 filemap_alloc_folio+0x78/0x230
+	 page_cache_ra_order+0x287/0x6f0
+	 filemap_get_pages+0x517/0x1160
+	 filemap_read+0x304/0x9f0
+	 xfs_file_buffered_read+0xe6/0x1d0 [xfs]
+	 xfs_file_read_iter+0x1f0/0x380 [xfs]
+	 __kernel_read+0x3b9/0x730
+	 kernel_read_file+0x309/0x4d0
+	 __do_sys_finit_module+0x381/0x730
+	 do_syscall_64+0x8d/0x150
+	 entry_SYSCALL_64_after_hwframe+0x62/0x6a
+	nr_base_pages: 20824
 	...
 
 	cat /sys/kernel/debug/page_owner > page_owner_full.txt
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 52d1ced0b57f..5df0d6892bdc 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -196,7 +196,8 @@ static void add_stack_record_to_list(struct stack_record *stack_record,
 	spin_unlock_irqrestore(&stack_list_lock, flags);
 }
 
-static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask)
+static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask,
+				   int nr_base_pages)
 {
 	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
 
@@ -217,15 +218,20 @@ static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask)
 			/* Add the new stack_record to our list */
 			add_stack_record_to_list(stack_record, gfp_mask);
 	}
-	refcount_inc(&stack_record->count);
+	refcount_add(nr_base_pages, &stack_record->count);
 }
 
-static void dec_stack_record_count(depot_stack_handle_t handle)
+static void dec_stack_record_count(depot_stack_handle_t handle,
+				   int nr_base_pages)
 {
 	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
 
-	if (stack_record)
-		refcount_dec(&stack_record->count);
+	if (!stack_record)
+		return;
+
+	if (refcount_sub_and_test(nr_base_pages, &stack_record->count))
+		pr_warn("%s: refcount went to 0 for %u handle\n", __func__,
+			handle);
 }
 
 static inline void __update_page_owner_handle(struct page_ext *page_ext,
@@ -306,7 +312,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
 		 * the machinery is not ready yet, we cannot decrement
 		 * their refcount either.
 		 */
-		dec_stack_record_count(alloc_handle);
+		dec_stack_record_count(alloc_handle, 1 << order);
 }
 
 noinline void __set_page_owner(struct page *page, unsigned short order,
@@ -325,7 +331,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 				   current->pid, current->tgid, ts_nsec,
 				   current->comm);
 	page_ext_put(page_ext);
-	inc_stack_record_count(handle, gfp_mask);
+	inc_stack_record_count(handle, gfp_mask, 1 << order);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -872,11 +878,11 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
 	return stack;
 }
 
-static unsigned long page_owner_stack_threshold;
+static unsigned long page_owner_pages_threshold;
 
 static int stack_print(struct seq_file *m, void *v)
 {
-	int i, stack_count;
+	int i, nr_base_pages;
 	struct stack *stack = v;
 	unsigned long *entries;
 	unsigned long nr_entries;
@@ -887,14 +893,14 @@ static int stack_print(struct seq_file *m, void *v)
 
 	nr_entries = stack_record->size;
 	entries = stack_record->entries;
-	stack_count = refcount_read(&stack_record->count) - 1;
+	nr_base_pages = refcount_read(&stack_record->count) - 1;
 
-	if (stack_count < 1 || stack_count < page_owner_stack_threshold)
+	if (nr_base_pages < 1 || nr_base_pages < page_owner_pages_threshold)
 		return 0;
 
 	for (i = 0; i < nr_entries; i++)
 		seq_printf(m, " %pS\n", (void *)entries[i]);
-	seq_printf(m, "stack_count: %d\n\n", stack_count);
+	seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages);
 
 	return 0;
 }
@@ -924,13 +930,13 @@ static const struct file_operations page_owner_stack_operations = {
 
 static int page_owner_threshold_get(void *data, u64 *val)
 {
-	*val = READ_ONCE(page_owner_stack_threshold);
+	*val = READ_ONCE(page_owner_pages_threshold);
 	return 0;
 }
 
 static int page_owner_threshold_set(void *data, u64 val)
 {
-	WRITE_ONCE(page_owner_stack_threshold, val);
+	WRITE_ONCE(page_owner_pages_threshold, val);
 	return 0;
 }
 
-- 
2.44.0