From: Barry Song <21cnbao@gmail.com>
Date: Tue, 16 Apr 2024 14:52:24 +1200
Subject: Re: [PATCH v2 4/5] mm: swap: entirely map large folios found in swapcache
To: "Huang, Ying"
Cc: Khalid Aziz, akpm@linux-foundation.org, linux-mm@kvack.org,
        baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
        hanchuanhua@oppo.com, hannes@cmpxchg.org, hughd@google.com,
        kasong@tencent.com, ryan.roberts@arm.com, surenb@google.com,
        v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org,
        yosryahmed@google.com, yuzhao@google.com, ziy@nvidia.com,
        linux-kernel@vger.kernel.org
In-Reply-To: <87y19e115l.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240409082631.187483-1-21cnbao@gmail.com>
        <20240409082631.187483-5-21cnbao@gmail.com>
        <87frvn2f89.fsf@yhuang6-desk2.ccr.corp.intel.com>
        <8734rm2gdj.fsf@yhuang6-desk2.ccr.corp.intel.com>
        <87y19e115l.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Tue, Apr 16, 2024 at 2:41 PM Huang, Ying wrote:
>
> Barry Song <21cnbao@gmail.com> writes:
>
> > On Tue, Apr 16, 2024 at 2:27 PM Huang, Ying wrote:
> >>
> >> Added Khalid for arch_do_swap_page().
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >> > On Mon, Apr 15, 2024 at 8:39 PM Huang, Ying wrote:
> >> >>
> >> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >> [snip]
> >>
> >> >> > +       bool any_swap_shared = false;
> >> >> >
> >> >> >         if (!pte_unmap_same(vmf))
> >> >> >                 goto out;
> >> >> > @@ -4137,6 +4141,35 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >> >> >          */
> >> >> >         vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> >> >> >                         &vmf->ptl);
> >> >>
> >> >> We should move the pte check here.  That is,
> >> >>
> >> >>         if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
> >> >>                 goto out_nomap;
> >> >>
> >> >> This will simplify the situation for large folios.
> >> >
> >> > the plan is to move the whole code block
> >> >
> >> > if (start_pte && folio_test_large(folio) && folio_test_swapcache(folio))
> >> >
> >> > after
> >> >
> >> > if (unlikely(!folio_test_uptodate(folio))) {
> >> >         ret = VM_FAULT_SIGBUS;
> >> >         goto out_nomap;
> >> > }
> >> >
> >> > we can't actually be !folio_test_uptodate(folio) when we have hit the
> >> > swapcache, but it seems logically better for future use.
> >>
> >> LGTM, Thanks!
> >>
> >> >>
> >> >> > +
> >> >> > +       /* We hit large folios in swapcache */
> >> >>
> >> >> The comment seems unnecessary because the code already tells that.
> >> >>
> >> >> > +       if (start_pte && folio_test_large(folio) && folio_test_swapcache(folio)) {
> >> >> > +               int nr = folio_nr_pages(folio);
> >> >> > +               int idx = folio_page_idx(folio, page);
> >> >> > +               unsigned long folio_start = vmf->address - idx * PAGE_SIZE;
> >> >> > +               unsigned long folio_end = folio_start + nr * PAGE_SIZE;
> >> >> > +               pte_t *folio_ptep;
> >> >> > +               pte_t folio_pte;
> >> >> > +
> >> >> > +               if (unlikely(folio_start < max(vmf->address & PMD_MASK, vma->vm_start)))
> >> >> > +                       goto check_pte;
> >> >> > +               if (unlikely(folio_end > pmd_addr_end(vmf->address, vma->vm_end)))
> >> >> > +                       goto check_pte;
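
(As an aside, for anyone following along: the two checks above just clamp
the folio's virtual range to the current PTE table and the VMA -- batching
is only attempted when all nr PTEs live under one PMD and inside the VMA.
A compilable userspace sketch of the arithmetic, assuming 4K pages and 2M
PMDs, with max_ul()/pmd_addr_end() as simplified stand-ins for the kernel
helpers:)

#include <stdio.h>

#define PAGE_SIZE  4096UL
#define PMD_SIZE   (512 * PAGE_SIZE)    /* 2 MiB with 4K pages */
#define PMD_MASK   (~(PMD_SIZE - 1))

static unsigned long max_ul(unsigned long a, unsigned long b)
{
        return a > b ? a : b;
}

/* Simplified stand-in for the kernel's pmd_addr_end(). */
static unsigned long pmd_addr_end(unsigned long addr, unsigned long end)
{
        unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

        return boundary < end ? boundary : end;
}

int main(void)
{
        /* A fault at 'address' hits subpage 'idx' of an nr-page folio
         * mapped somewhere inside [vm_start, vm_end). */
        unsigned long vm_start = 0x400000, vm_end = 0x800000;
        unsigned long address = 0x403000;
        int idx = 3, nr = 4;

        unsigned long folio_start = address - idx * PAGE_SIZE;
        unsigned long folio_end = folio_start + nr * PAGE_SIZE;

        if (folio_start < max_ul(address & PMD_MASK, vm_start) ||
            folio_end > pmd_addr_end(address, vm_end))
                printf("range [%#lx, %#lx) crosses a boundary: fall back\n",
                       folio_start, folio_end);
        else
                printf("range [%#lx, %#lx) is safe to batch\n",
                       folio_start, folio_end);
        return 0;
}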
> >> >> > +
> >> >> > +               folio_ptep = vmf->pte - idx;
> >> >> > +               folio_pte = ptep_get(folio_ptep);
> >> >>
> >> >> It's better to construct the pte based on the fault PTE by
> >> >> generalizing pte_next_swp_offset() (maybe as pte_move_swp_offset()).
> >> >> Then we can find inconsistent PTEs quicker.
> >> >
> >> > it seems your point is getting the pte of page0 by pte_next_swp_offset(),
> >> > but unfortunately pte_next_swp_offset() can't go backwards. on the other
> >> > hand, we have to check the real pte value of the 0th entry right now,
> >> > because swap_pte_batch() only really reads ptes from the 1st entry
> >> > onwards; it assumes its pte argument is the real value of the 0th
> >> > pte entry.
> >> >
> >> > static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> >> > {
> >> >         pte_t expected_pte = pte_next_swp_offset(pte);
> >> >         const pte_t *end_ptep = start_ptep + max_nr;
> >> >         pte_t *ptep = start_ptep + 1;
> >> >
> >> >         VM_WARN_ON(max_nr < 1);
> >> >         VM_WARN_ON(!is_swap_pte(pte));
> >> >         VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));
> >> >
> >> >         while (ptep < end_ptep) {
> >> >                 pte = ptep_get(ptep);
> >> >
> >> >                 if (!pte_same(pte, expected_pte))
> >> >                         break;
> >> >
> >> >                 expected_pte = pte_next_swp_offset(expected_pte);
> >> >                 ptep++;
> >> >         }
> >> >
> >> >         return ptep - start_ptep;
> >> > }
> >>
> >> Yes. You are right.
> >>
> >> But we may check whether the pte of page0 is the same as
> >> "vmf->orig_pte - folio_page_idx()" (fake code).
> >
> > right, that is why we are reading and checking PTE0 before calling
> > swap_pte_batch() right now.
> >
> >         folio_ptep = vmf->pte - idx;
> >         folio_pte = ptep_get(folio_ptep);
> >         if (!is_swap_pte(folio_pte) || non_swap_entry(pte_to_swp_entry(folio_pte)) ||
> >             swap_pte_batch(folio_ptep, nr, folio_pte, &any_swap_shared) != nr)
> >                 goto check_pte;
> >
> > So, if I understand correctly, you're proposing that we should directly
> > check PTE0 in swap_pte_batch(). Personally, I don't have any objections
> > to this idea. However, I'd also like to hear the feedback from Ryan and
> > David :-)
>
> I mean that we can replace
>
>         !is_swap_pte(folio_pte) || non_swap_entry(pte_to_swp_entry(folio_pte))
>
> in the above code with pte_same() against a constructed expected first pte.

Got it. It could be quite tricky, especially with considerations like
pte_swp_soft_dirty, pte_swp_exclusive, and pte_swp_uffd_wp. We might need
a helper similar to pte_next_swp_offset() but capable of moving both
forwards and backwards, for instance:

pte_move_swp_offset(pte_t pte, long delta)

pte_next_swp_offset() could then simply call it as:

pte_move_swp_offset(pte, 1);

Is this what you are proposing?
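
i.e. roughly like the following compilable userspace toy (everything here,
including toy_pte and the bit handling, is invented purely for illustration;
only the pte_move_swp_offset()/pte_next_swp_offset() naming follows the
proposal above, and the real kernel helpers would of course operate on
pte_t and the arch swap-entry encoding):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a swap PTE: a swap offset plus the sticky software
 * bits that a real helper would have to preserve. */
struct toy_pte {
        unsigned long swp_offset;
        bool soft_dirty, exclusive, uffd_wp;
};

static bool toy_pte_same(struct toy_pte a, struct toy_pte b)
{
        return a.swp_offset == b.swp_offset &&
               a.soft_dirty == b.soft_dirty &&
               a.exclusive == b.exclusive &&
               a.uffd_wp == b.uffd_wp;
}

/* The proposed helper: move the swap offset by 'delta' in either
 * direction while leaving the software bits untouched. */
static struct toy_pte pte_move_swp_offset(struct toy_pte pte, long delta)
{
        pte.swp_offset += delta;
        return pte;
}

/* pte_next_swp_offset() then degenerates to delta == 1. */
static struct toy_pte pte_next_swp_offset(struct toy_pte pte)
{
        return pte_move_swp_offset(pte, 1);
}

int main(void)
{
        /* Fault hits subpage 'idx' of a 4-page folio; ptes[] stands in
         * for the page-table slice covering the whole folio. */
        int idx = 2, nr = 4;
        struct toy_pte ptes[4];

        for (int i = 0; i < nr; i++)
                ptes[i] = (struct toy_pte){ .swp_offset = 100 + i,
                                            .exclusive = true };

        struct toy_pte orig_pte = ptes[idx];

        /* Instead of re-validating PTE0 with is_swap_pte() and
         * non_swap_entry(), construct the expected first pte from the
         * faulting pte and compare, as suggested above. */
        struct toy_pte expected0 = pte_move_swp_offset(orig_pte, -idx);
        printf("PTE0 matches constructed pte: %s\n",
               toy_pte_same(ptes[0], expected0) ? "yes" : "no");

        /* The forward walk stays what swap_pte_batch() does today. */
        struct toy_pte expected = pte_next_swp_offset(expected0);
        int n = 1;
        while (n < nr && toy_pte_same(ptes[n], expected)) {
                expected = pte_next_swp_offset(expected);
                n++;
        }
        printf("batched %d of %d entries\n", n, nr);
        return 0;
}

The tricky part in the real helper is exactly carrying
pte_swp_soft_dirty/pte_swp_exclusive/pte_swp_uffd_wp along unchanged,
which the toy merely models as booleans.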
>
> >>
> >> You need to check the pte of page 0 anyway.
> >>
> >> >>
> >> >> > +               if (!is_swap_pte(folio_pte) || non_swap_entry(pte_to_swp_entry(folio_pte)) ||
> >> >> > +                   swap_pte_batch(folio_ptep, nr, folio_pte, &any_swap_shared) != nr)
> >> >> > +                       goto check_pte;
> >> >> > +
> >> >> > +               start_address = folio_start;
> >> >> > +               start_pte = folio_ptep;
> >> >> > +               nr_pages = nr;
> >> >> > +               entry = folio->swap;
> >> >> > +               page = &folio->page;
> >> >> > +       }
> >> >> > +
> >> >> > +check_pte:
> >> >> >         if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
> >> >> >                 goto out_nomap;
> >> >> >
> >> >> > @@ -4190,6 +4223,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >> >> >                          */
> >> >> >                         exclusive = false;
> >> >> >                 }
> >> >> > +
> >> >> > +               /* Reuse the whole large folio iff all entries are exclusive */
> >> >> > +               if (nr_pages > 1 && any_swap_shared)
> >> >> > +                       exclusive = false;
> >> >> >         }
> >> >> >
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying