From: Yosry Ahmed
Date: Wed, 23 Nov 2022 00:11:24 -0800
Subject: Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
To: Sergey Senozhatsky
Cc: Johannes Weiner, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, minchan@kernel.org, ngupta@vflare.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com
References: <20221119001536.2086599-1-nphamcs@gmail.com> <20221119001536.2086599-5-nphamcs@gmail.com>
On Wed, Nov 23, 2022 at 12:02 AM Yosry Ahmed wrote:
>
> On Tue, Nov 22, 2022 at 7:50 PM Sergey Senozhatsky wrote:
> >
> > On (22/11/22 12:42), Johannes Weiner wrote:
> > > On Tue, Nov 22, 2022 at 10:52:58AM +0900, Sergey Senozhatsky wrote:
> > > > On (22/11/18 16:15), Nhat Pham wrote:
> > > > [..]
> > > > > @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> > > > >  	obj_to_location(obj, &page, &obj_idx);
> > > > >  	zspage = get_zspage(page);
> > > > >
> > > > > +#ifdef CONFIG_ZPOOL
> > > > > +	/* Move the zspage to front of pool's LRU */
> > > > > +	if (mm == ZS_MM_WO) {
> > > > > +		if (!list_empty(&zspage->lru))
> > > > > +			list_del(&zspage->lru);
> > > > > +		list_add(&zspage->lru, &pool->lru);
> > > > > +	}
> > > > > +#endif
> > > >
> > > > Do we consider pages that were mapped for MM_RO/MM_RW as cold?
> > > > I wonder why; we use them, so technically they are not exactly
> > > > "least recently used".
> > >
> > > This is a swap LRU. By definition there are no ongoing accesses to
> > > the memory while the page is swapped out that would make it "hot".
> >
> > Hmm. Not arguing, just trying to understand some things.
> >
> > There are no accesses to swapped-out pages, yes, but a zspage holds
> > multiple objects, which in this particular case are compressed
> > swapped-out pages. For example, a zspage in class size 176 (bytes)
> > can hold 93 objects, that is, 93 compressed swapped-out pages.
> > Consider a ZS_FULL zspage at the tail of the LRU list. Suppose we
> > page-faulted 20 times and read 20 objects from that zspage; IOW the
> > zspage has been in use 20 times very recently, while writeback still
> > considers it "not used" and will evict it.
> >
> > So if this works for you then I'm fine. But we probably, like you
> > suggested, can document a couple of things here - namely why WRITE
> > access to a zspage counts as "zspage is in use" but READ access to
> > the same zspage does not.
>
> I guess the key here is that we have an LRU of zspages, when we really
> want an LRU of compressed objects. In some cases, we may end up
> reclaiming the wrong pages.
>
> Assume we have 2 zspages, Z1 and Z2, and 4 physical pages that we
> compress over time, P1 -> P4.
>
> Let's assume P1 -> P4 get compressed in order (P4 is the hottest
> page), and they get assigned to zspages as follows:
> Z1: P1, P3
> Z2: P2, P4
>
> In this case, the zspage LRU would be Z2 -> Z1, because Z2 was touched
> last, when we compressed P4. Now if we want to write back, we will
> look at Z1, and we might end up reclaiming P3, depending on the order
> in which the pages are stored.
>
> A worst-case scenario: a large number of pages, say 1000, P1 -> P1000
> (where P1000 is the hottest), all going into Z1 and Z2 like this:
> Z1: P1 -> P499, P1000
> Z2: P500 -> P999
>
> In this case, Z1 contains 499 cold pages, but it got P1000 at the end,
> which caused us to put it at the front of the LRU. Now writeback will
> consistently use Z2. This is bad. I have no idea how practical this
> is, but it seems fairly random, based on the compressed sizes of the
> pages and on access patterns.
>
> Does this mean we should move zspages to the front of the LRU when we
> write back from them? No, I wouldn't say so. The exact same scenario
> can happen because of this. Imagine the following assignment of the
> 1000 pages:
> Z1: (P1, P3, ..., P999)
> Z2: (P2, P4, ..., P1000)
>
> Z2 is at the front of the LRU because it has P1000, so the first time
> we do writeback we will start at Z1. Once we reclaim one object from
> Z1, we will start writeback from Z2 next time, and we will keep
> alternating. If we are really unlucky, we can end up reclaiming in the
> order P999, P1000, P997, P998, ... So I don't think putting zspages at
> the front of the LRU when we write back is the answer. I would even
> say it's completely orthogonal to the problem, because writing back an
> object from the zspage at the end of the LRU gives us zero information
> about the state of the other objects in the same zspage.
>
> Ideally, we would have an LRU of objects instead, but that would be
> very complicated with the current form of writeback. It would be much
> easier if we had an LRU of zswap entries, which is something I am
> looking into; it is a much bigger surgery and should be separate from
> this work. Today zswap inverts LRU priorities anyway by sending hot
> pages to the swapfile when zswap is full, while colder pages stay in
> zswap, so I wouldn't really worry about this now :)
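To make the worst-case example above concrete, here is a minimal
userspace C sketch (purely illustrative; the names are invented and
this is not the kernel code) of the write-only promotion policy: a
single hot store into an otherwise cold zspage pulls the whole zspage
to the LRU head.

	#include <stdio.h>

	/* Toy model: a "zspage" is a bucket of objects plus an LRU stamp. */
	struct toy_zspage {
		const char *name;
		int nr_cold;		/* cold objects stored in this zspage */
		unsigned long stamp;	/* bumped on every store; higher = nearer LRU head */
	};

	static unsigned long clock_tick;

	/* Mirrors the ZS_MM_WO branch: any store promotes the whole zspage. */
	static void promote_on_write(struct toy_zspage *z)
	{
		z->stamp = ++clock_tick;
	}

	int main(void)
	{
		struct toy_zspage z1 = { "Z1", 0, 0 }, z2 = { "Z2", 0, 0 };
		int p;

		/* Cold P1..P499 land in Z1, warmer P500..P999 in Z2... */
		for (p = 1; p <= 499; p++) { z1.nr_cold++; promote_on_write(&z1); }
		for (p = 500; p <= 999; p++) { z2.nr_cold++; promote_on_write(&z2); }
		/* ...then the hottest page, P1000, lands in Z1. */
		promote_on_write(&z1);

		/* Writeback evicts from the LRU tail, i.e. the lowest stamp. */
		printf("LRU head: %s (%d cold objects)\n",
		       z1.stamp > z2.stamp ? z1.name : z2.name,
		       z1.stamp > z2.stamp ? z1.nr_cold : z2.nr_cold);
		return 0;
	}

This prints Z1, with its 499 cold objects, as the LRU head: tail-first
writeback would then keep draining Z2's warmer pages, which is the
inversion described in the example above.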
Oh, I didn't realize we reclaim all the objects in the zspage at the
end of the LRU. That makes all the examples above wrong in detail, but
the concept still stands: the problem is that we have an LRU of
zspages, not an LRU of objects. Nonetheless, the fact that we
refaulted an object in a zspage does not necessarily mean that the
other objects in the same zspage are hotter than objects in other
zspages, IIUC.
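For contrast, the object-granularity LRU floated above could
conceptually hang the list node off each compressed entry rather than
off the zspage. A rough kernel-style sketch (struct and function names
are invented; this is not the actual zswap or zsmalloc code):

	#include <linux/list.h>
	#include <linux/spinlock.h>

	struct obj_lru {
		struct list_head list;	/* head = most recently used entry */
		spinlock_t lock;
	};

	struct compressed_entry {
		struct list_head lru;	/* one LRU node per object, not per zspage */
		unsigned long handle;	/* zsmalloc handle for the compressed object */
	};

	/* Promote a single entry on store *or* load, so a refault on one
	 * object says nothing about its zspage neighbours. */
	static void entry_lru_promote(struct obj_lru *lru,
				      struct compressed_entry *e)
	{
		spin_lock(&lru->lock);
		list_move(&e->lru, &lru->list);
		spin_unlock(&lru->lock);
	}

	/* Writeback picks the coldest object, wherever its zspage is. */
	static struct compressed_entry *entry_lru_pop_coldest(struct obj_lru *lru)
	{
		struct compressed_entry *e = NULL;

		spin_lock(&lru->lock);
		if (!list_empty(&lru->list)) {
			e = list_last_entry(&lru->list,
					    struct compressed_entry, lru);
			list_del_init(&e->lru);
		}
		spin_unlock(&lru->lock);
		return e;
	}

The trade-off is the one already noted: writeback currently operates
per zspage, so moving the LRU to entry granularity implies reworking
how writeback selects and frees objects - the "bigger surgery"
mentioned above.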