Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
From: Kirill Tkhai
To: Jan Kara
Cc: Peter Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    "Maya B. Gokhale", Linus Torvalds, Yang Shi, Marty Mcfadden,
    Kirill Shutemov, Oleg Nesterov, Jann Horn, Andrea Arcangeli,
    Christoph Hellwig, Andrew Morton
Date: Mon, 24 Aug 2020 18:37:53 +0300
In-Reply-To: <20200824143010.GG24877@quack2.suse.cz>
References: <20200821234958.7896-1-peterx@redhat.com>
 <20200821234958.7896-2-peterx@redhat.com>
 <42bc9a68-ef9e-2542-0b21-392a7f47bd74@virtuozzo.com>
 <20200824143010.GG24877@quack2.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 24.08.2020 17:30, Jan Kara wrote:
> On Mon 24-08-20 11:36:22, Kirill Tkhai wrote:
>> On 22.08.2020 02:49, Peter Xu wrote:
>>> From: Linus Torvalds
>>>
>>> How about we just make sure we're the only possible valid user of the
>>> page before we bother to reuse it?
>>>
>>> Simplify, simplify, simplify.
>>>
>>> And get rid of the nasty serialization on the page lock at the same time.
>>>
>>> Signed-off-by: Linus Torvalds
>>> [peterx: add subject prefix]
>>> Signed-off-by: Peter Xu
>>> ---
>>>  mm/memory.c | 59 +++++++++++++++--------------------------------------
>>>  1 file changed, 17 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 602f4283122f..cb9006189d22 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -2927,50 +2927,25 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>>  	 * not dirty accountable.
>>>  	 */
>>>  	if (PageAnon(vmf->page)) {
>>> -		int total_map_swapcount;
>>> -		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
>>> -					   page_count(vmf->page) != 1))
>>> +		struct page *page = vmf->page;
>>> +
>>> +		/* PageKsm() doesn't necessarily raise the page refcount */
>>
>> No, this is wrong. PageKSM() always raises refcount.
> 
> OK, then I'm confused. The comment before get_ksm_page() states:
> 
>  * get_ksm_page: checks if the page indicated by the stable node
>  * is still its ksm page, despite having held no reference to it.
>  * In which case we can trust the content of the page, and it
>  * returns the gotten page; but if the page has now been zapped,
>  * remove the stale node from the stable tree and return NULL.
> ...
>  * You would expect the stable_node to hold a reference to the ksm page.
>  * But if it increments the page's count, swapping out has to wait for
>  * ksmd to come around again before it can free the page, which may take
>  * seconds or even minutes: much too unresponsive. So instead we use a
>  * "keyhole reference": access to the ksm page from the stable node peeps
>  * out through its keyhole to see if that page still holds the right key,
>  * pointing back to this stable node.
> 
> So this all seems to indicate that KSM doesn't hold a proper page reference
> and relies on anyone making the page writeable to change page->mapping so
> that KSM notices this and doesn't use the page anymore... Am I missing
> something?

Sure, KSM does not increment the page counter when a page becomes
PageKsm(). Is the patch comment about that? Even if so, I don't understand
what the comment is for: "PageKsm() does not take an additional reference"
is not the reason the page can't be reused there. The reason is that
readers of this page may increase the counter without taking the lock, so
observing page_count() == 1 under the lock does not guarantee anything.

>> There was another
>> problem: KSM may raise the refcount without lock_page(), and only then
>> it takes the lock. See get_ksm_page(GET_KSM_PAGE_NOLOCK) for the details.
>>
>> So, reliable protection against parallel access requires freezing the
>> page counter, which is done in reuse_ksm_page().
> 
> OK, this as well.
> 
> 								Honza
> 
>>
>>> +		if (PageKsm(page) || page_count(page) != 1)
>>> +			goto copy;
>>> +		if (!trylock_page(page))
>>> +			goto copy;
>>> +		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
>>> +			unlock_page(page);
>>>  			goto copy;
>>> -		if (!trylock_page(vmf->page)) {
>>> -			get_page(vmf->page);
>>> -			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>> -			lock_page(vmf->page);
>>> -			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>>> -					vmf->address, &vmf->ptl);
>>> -			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
>>> -				update_mmu_tlb(vma, vmf->address, vmf->pte);
>>> -				unlock_page(vmf->page);
>>> -				pte_unmap_unlock(vmf->pte, vmf->ptl);
>>> -				put_page(vmf->page);
>>> -				return 0;
>>> -			}
>>> -			put_page(vmf->page);
>>> -		}
>>> -		if (PageKsm(vmf->page)) {
>>> -			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
>>> -						     vmf->address);
>>> -			unlock_page(vmf->page);
>>> -			if (!reused)
>>> -				goto copy;
>>> -			wp_page_reuse(vmf);
>>> -			return VM_FAULT_WRITE;
>>> -		}
>>> -		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
>>> -			if (total_map_swapcount == 1) {
>>> -				/*
>>> -				 * The page is all ours. Move it to
>>> -				 * our anon_vma so the rmap code will
>>> -				 * not search our parent or siblings.
>>> -				 * Protected against the rmap code by
>>> -				 * the page lock.
>>> -				 */
>>> -				page_move_anon_rmap(vmf->page, vma);
>>> -			}
>>> -			unlock_page(vmf->page);
>>> -			wp_page_reuse(vmf);
>>> -			return VM_FAULT_WRITE;
>>>  		}
>>> -		unlock_page(vmf->page);
>>> +		/*
>>> +		 * Ok, we've got the only map reference, and the only
>>> +		 * page count reference, and the page is locked,
>>> +		 * it's dark out, and we're wearing sunglasses. Hit it.
>>> +		 */
>>> +		wp_page_reuse(vmf);
>>> +		unlock_page(page);
>>> +		return VM_FAULT_WRITE;
>>>  	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
>>> 					(VM_WRITE|VM_SHARED))) {
>>>  		return wp_page_shared(vmf);
>>>
>>