From: Barry Song <21cnbao@gmail.com>
Date: Fri, 8 Dec 2023 13:00:15 +1300
Subject: Re: [RFC V3 PATCH] arm64: mm: swap: save and restore mte tags for large folios
In-Reply-To: <1dcd6985-aa29-4df7-a7cb-ef57ae658861@redhat.com>
To: David Hildenbrand
Cc: Ryan Roberts, Steven Price, akpm@linux-foundation.org, catalin.marinas@arm.com, will@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com, shy828301@gmail.com, v-songbaohua@oppo.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yuzhao@google.com

On Thu, Dec 7, 2023 at 11:04 PM David Hildenbrand wrote:
>
> >>
> >>> not per-folio? I'm also not sure what it buys us - instead of reading a per-page
> >>> flag we now have to read 128 bytes of tag for each page and check it's zero.
> >>
> >> My point is, if that is the corner case, we might not care about that.
> >
> > Hi David,
>
> Hi!
>
> > my understanding is that this is NOT a corner case. Rather, it is
> > really a common case.
>
> If it happens with < 1% of all large folios on swapout/swapin, it's not
> the common case. Even if some scenarios you point out below can and will
> happen.
>

Fair enough. If we define "corner case" based on the percentage of those
folios which can get partial MTE tags set or get partial MTE tags
invalidated, I agree this is a corner case. I thought that a corner case
was a case which could rarely happen.

> >
> > 1. a large folio can be partially unmapped when it is in swapcache and
> > after it is swapped out; in all cases, its tags can be partially
> > invalidated. I don't think this is a corner case. As long as
> > userspace is still working at the granularity of base pages, this is
> > always going to happen. For example, a userspace allocator such as
> > jemalloc can identify PAGESIZE and use madvise(DONTNEED) to return
> > memory to the kernel. Heap management is still working at the
> > granularity of the base page.
> >
> > 2. mprotect on a part of a large folio, as Steven pointed out.
> >
> > 3. long term, we are working to swap in large folios as a whole[1],
> > just like swapping out large folios as a whole. For those PTEs which
> > are still contiguous swap entries, I mean, which are not unmapped by
> > userspace after the large folios are swapped out to swap devices, we
> > have a chance to swap in a whole large folio, and we do have a chance
> > to restore tags for the large folio without early exit. But we still
> > have a good chance of falling back to base pages if we fail to
> > allocate a large folio; in this case, do_swap_page() still works at
> > the granularity of the base page, and do_swap_page() will call
> > swap_free(entry), so tags of this particular page can be invalidated
> > as a result.
>
> I don't immediately see how that relates. You get a fresh small folio
> and simply load that tag from the internal datastructure. No messing
> with large folios required, because you don't have a large folio. So no
> considerations about large folio batch MTE tag restore apply.

Right. I was thinking the original large folio was partially swapped in
and forgot that the newly allocated page was actually one folio with only
one page :-) Indeed, in that case, it is still restoring the MTE tag for
the whole folio, which has one page.

> >
> > 4. too many early exits might hurt performance.
> >
> >
> > So I am thinking that in the future, we need two helpers:
> >
> > 1. void __arch_swap_restore(swp_entry_t entry, struct page *page);
> > this is always needed to support page-level tag restore.
> >
> > 2. void arch_swap_restore(swp_entry_t entry, struct folio *folio);
> > this can be a helper when we are able to swap in a whole folio. Two
> > conditions must be met:
> > (a). PTE entries are still contiguous swap entries, just as when the
> > large folios were swapped out.
> > (b). we succeed in the allocation of a large folio in do_swap_page.
> >
> > For this moment, we only need 1; we will add 2 in the swap-in large
> > folio series.
> >
> > What do you think?
>
> I agree that it's better to keep it simple for now.
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
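
For readers of the archive, the two-helper split proposed in the message above can be sketched roughly as follows. This is a userspace model only, not kernel code and not the patch under discussion. The two function signatures come from the message; the struct layouts, tag_store, mock_save_tags(), mock_invalidate_tags() and NR_SWAP_SLOTS are hypothetical stand-ins invented for the illustration. The 128 bytes of tag per page is the figure quoted in the thread. The sketch assumes the folio's swap entries are contiguous, which is condition (a) in the message.

```c
/* Userspace model only: these types are NOT the kernel's definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MTE_TAG_BYTES_PER_PAGE 128 /* figure quoted in the thread */
#define NR_SWAP_SLOTS 16           /* hypothetical mock swap device size */

typedef struct { unsigned long val; } swp_entry_t; /* mock: a slot index */

struct page { unsigned char tags[MTE_TAG_BYTES_PER_PAGE]; bool tagged; };
struct folio { struct page *pages; size_t nr_pages; };

/* Hypothetical saved-tag store: at most one tag buffer per swap slot. */
static struct {
    bool valid;
    unsigned char tags[MTE_TAG_BYTES_PER_PAGE];
} tag_store[NR_SWAP_SLOTS];

/* Hypothetical helper: remember a page's tags at swap-out time. */
void mock_save_tags(swp_entry_t entry, const struct page *page)
{
    assert(entry.val < NR_SWAP_SLOTS);
    memcpy(tag_store[entry.val].tags, page->tags, MTE_TAG_BYTES_PER_PAGE);
    tag_store[entry.val].valid = true;
}

/* Hypothetical helper: drop one slot's tags, e.g. after a partial unmap. */
void mock_invalidate_tags(swp_entry_t entry)
{
    assert(entry.val < NR_SWAP_SLOTS);
    tag_store[entry.val].valid = false;
}

/* Helper 1: page-level restore. A slot without saved tags is skipped. */
void __arch_swap_restore(swp_entry_t entry, struct page *page)
{
    assert(entry.val < NR_SWAP_SLOTS);
    if (!tag_store[entry.val].valid)
        return;
    memcpy(page->tags, tag_store[entry.val].tags, MTE_TAG_BYTES_PER_PAGE);
    page->tagged = true;
}

/* Helper 2: folio-level restore, assuming contiguous swap entries. */
void arch_swap_restore(swp_entry_t entry, struct folio *folio)
{
    for (size_t i = 0; i < folio->nr_pages; i++) {
        swp_entry_t e = { entry.val + i };
        __arch_swap_restore(e, &folio->pages[i]);
    }
}
```

In this model the folio-level helper simply walks the per-page helper, so a page whose tags were invalidated while the folio was swapped out is left untagged and its neighbours are still restored.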