Message-ID: <205231d0-2b4e-7d93-1028-2d501c1cbf74@redhat.com>
Date: Fri, 28 Jan 2022 09:41:16 +0100
To: Yang Shi
Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
    Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard,
    Jason Gunthorpe, Mike Kravetz, Mike Rapoport, "Kirill A . Shutemov",
    Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
    Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
    Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
    Liang Zhang, Linux MM
References: <20220126095557.32392-1-david@redhat.com>
    <20220126095557.32392-7-david@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage
X-Mailing-List: linux-kernel@vger.kernel.org

On 27.01.22 22:23, Yang Shi wrote:
> On Wed, Jan 26, 2022 at 2:00 AM David Hildenbrand wrote:
>>
>> reuse_swap_page() currently indicates if we can write to an anon page
>> without COW. A COW is required if the page is shared by multiple
>> processes (either already mapped or via swap entries) or if there is
>> concurrent writeback that cannot tolerate concurrent page modifications.
>>
>> reuse_swap_page() doesn't check for pending references from other
>> processes that already unmapped the page, however,
>> is_refcount_suitable() essentially does the same thing in the context of
>> khugepaged. khugepaged is the last remaining user of reuse_swap_page() and
>> we want to remove that function.
>>
>> In the context of khugepaged, we are not actually going to write to the
>> page and we don't really care about other processes mapping the page:
>> for example, without swap, we don't care about shared pages at all.
>>
>> The current logic seems to be:
>> * Writable: -> Not shared, but might be in the swapcache. Nobody can
>>   fault it in from the swapcache as there are no other swap entries.
>> * Readable and not in the swapcache: Might be shared (but nobody can
>>   fault it in from the swapcache).
>> * Readable and in the swapcache: Might be shared and someone might be
>>   able to fault it in from the swapcache. Make sure we're the exclusive
>>   owner via reuse_swap_page().
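
To make the three cases in the list above easier to compare, here is a
toy restatement in Python. It is only a sketch of the logic as guessed in
that list, not kernel code: PteState, may_collapse() and the
exclusive_owner flag (a stand-in for the answer reuse_swap_page() would
give) are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PteState:
    """Hypothetical summary of one PTE and the page it maps."""
    writable: bool         # the PTE maps the page writable
    in_swapcache: bool     # the page currently sits in the swapcache
    exclusive_owner: bool  # assumed stand-in for reuse_swap_page()'s answer


def may_collapse(pte: PteState) -> bool:
    """Return True if the toy model would let khugepaged take this page."""
    if pte.writable:
        # Case 1: writable, so not shared; no other swap entries exist.
        return True
    if not pte.in_swapcache:
        # Case 2: readable, maybe shared, but nobody can fault it in
        # from the swapcache.
        return True
    # Case 3: readable and in the swapcache; only accept the page if we
    # are the exclusive owner.
    return pte.exclusive_owner


if __name__ == "__main__":
    for state in (
        PteState(writable=True, in_swapcache=True, exclusive_owner=False),
        PteState(writable=False, in_swapcache=False, exclusive_owner=False),
        PteState(writable=False, in_swapcache=True, exclusive_owner=False),
        PteState(writable=False, in_swapcache=True, exclusive_owner=True),
    ):
        print(state, "->", may_collapse(state))
```

In this model only the readable-and-in-swapcache case ever consults the
exclusivity check.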
>>
>> Having to guess due to lack of comments and documentation, the current
>> logic really only wants to make sure that a page that might be shared
>> cannot be faulted in from the swapcache while khugepaged is active.
>> It's hard to guess why that is that case and if it's really still required,
>> but let's try keeping that logic unmodified.
>
> I don't think it could be faulted in while khugepaged is active since
> khugepaged does hold mmap_lock in write mode IIUC. So page fault is
> serialized against khugepaged.

It could get faulted in by another process sharing the page, because we
only synchronize against the current process.

>
> My wild guess is that collapsing shared pages was not supported before
> v5.8, so we need reuse_swap_page() to tell us if the page in swap
> cache is shared or not. But it is not true anymore. And khugepaged
> just allocates a THP then copy the data from base pages to huge page
> then replace PTEs to PMD, it doesn't change the content of the page,
> so I failed to see a problem by collapsing a shared page in swap
> cache. But I'm really not entirely sure, I may miss something...

Looking more closely at where this logic originates, it was introduced in:

commit 10359213d05acf804558bda7cc9b8422a828d1cd
Author: Ebru Akagunduz
Date:   Wed Feb 11 15:28:28 2015 -0800

    mm: incorporate read-only pages into transparent huge pages

    This patch aims to improve THP collapse rates, by allowing THP collapse in
    the presence of read-only ptes, like those left in place by do_swap_page
    after a read fault.

    Currently THP can collapse 4kB pages into a THP when there are up to
    khugepaged_max_ptes_none pte_none ptes in a 2MB range.  This patch applies
    the same limit for read-only ptes.

The change essentially results in a read-only mapped PTE page getting
copied and mapped writable via a new PMD-mapped THP.
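
As a rough illustration of why the copy matters, the following Python
sketch models collapse as "copy 512 base pages of 4kB into one new 2MB
buffer" (the sizes quoted in the commit message). Everything here is a
simplification for illustration: collapse() and the bytearray-based
"pages" are hypothetical and say nothing about how the kernel actually
tracks pages or page tables.

```python
PAGE_SIZE = 4096      # 4kB base page, as in the commit message
PAGES_PER_THP = 512   # 2MB / 4kB


def collapse(base_pages):
    """Copy the base pages into a freshly allocated huge buffer.

    The base pages themselves are only read, never written.
    """
    if len(base_pages) != PAGES_PER_THP:
        raise ValueError("expected exactly one THP worth of base pages")
    huge = bytearray(PAGE_SIZE * PAGES_PER_THP)
    for i, page in enumerate(base_pages):
        if len(page) != PAGE_SIZE:
            raise ValueError("base page has unexpected size")
        # Slice assignment copies the bytes; 'huge' does not alias 'page'.
        huge[i * PAGE_SIZE:(i + 1) * PAGE_SIZE] = page
    return huge


if __name__ == "__main__":
    # Pretend these base pages might be shared with another process.
    pages = [bytearray([i % 256]) * PAGE_SIZE for i in range(PAGES_PER_THP)]
    thp = collapse(pages)
    thp[0] = 0xFF                 # write through the new, writable copy
    print(pages[0][0], thp[0])    # the original base page is unchanged
```

In this model a later write only ever lands in the copy, so whoever else
still references the original base page would not see it.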

The commit message mentions do_swap_page(), so I assume it just tried to
do what do_swap_page() would do when trying to map a page swapped in from
the page cache writable immediately.

But we differ from do_swap_page() in that we're not actually going to map
the page writable: we're going to copy the page
(__collapse_huge_page_copy()) and map the copy writable.

I assume we can remove that logic.

-- 
Thanks,

David / dhildenb