2010-01-29 20:54:31

by Christoph Lameter

Subject: dentries: dentry defragmentation

The dentry pruning for unused entries works in a straightforward way. It
could be made more aggressive by actually moving dentries instead of
just reclaiming them.

Cc: Alexander Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>

---
fs/dcache.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 100 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c 2009-12-18 13:13:24.000000000 -0600
+++ linux-2.6/fs/dcache.c 2010-01-29 12:10:37.000000000 -0600
@@ -33,6 +33,7 @@
#include <linux/bootmem.h>
#include <linux/fs_struct.h>
#include <linux/hardirq.h>
+#include <linux/backing-dev.h>
#include "internal.h"

int sysctl_vfs_cache_pressure __read_mostly = 100;
@@ -173,7 +174,10 @@ static struct dentry *d_kill(struct dent

list_del(&dentry->d_u.d_child);
dentry_stat.nr_dentry--; /* For d_free, below */
- /*drops the locks, at that point nobody can reach this dentry */
+ /*
+ * drops the locks, at that point nobody (aside from defrag)
+ * can reach this dentry
+ */
dentry_iput(dentry);
if (IS_ROOT(dentry))
parent = NULL;
@@ -2263,6 +2267,100 @@ static void __init dcache_init_early(voi
INIT_HLIST_HEAD(&dentry_hashtable[loop]);
}

+/*
+ * The slab allocator is holding off frees. We can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *get_dentries(struct kmem_cache *s, int nr, void **v)
+{
+ struct dentry *dentry;
+ int i;
+
+ spin_lock(&dcache_lock);
+ for (i = 0; i < nr; i++) {
+ dentry = v[i];
+
+ /*
+ * Three sorts of dentries cannot be reclaimed:
+ *
+ * 1. dentries that are in the process of being allocated
+ * or being freed. In that case the dentry is neither
+ * on the LRU nor hashed.
+ *
+ * 2. Fake hashed entries as used for anonymous dentries
+ * and pipe I/O. The fake hashed entries have d_flags
+ * set to indicate a hashed entry. However, the
+ * d_hash field indicates that the entry is not hashed.
+ *
+ * 3. dentries that have a backing store that is not
+ * writable. This is true for tmpfs and other in-
+ * memory filesystems. Removing dentries from them
+ * would lose dentries for good.
+ */
+ if ((d_unhashed(dentry) && list_empty(&dentry->d_lru)) ||
+ (!d_unhashed(dentry) && hlist_unhashed(&dentry->d_hash)) ||
+ (dentry->d_inode &&
+ !mapping_cap_writeback_dirty(dentry->d_inode->i_mapping)))
+ /* Ignore this dentry */
+ v[i] = NULL;
+ else
+ /* dget_locked will remove the dentry from the LRU */
+ dget_locked(dentry);
+ }
+ spin_unlock(&dcache_lock);
+ return NULL;
+}
+
+/*
+ * Slab has dropped all the locks. Get rid of the refcount obtained
+ * earlier and also free the object.
+ */
+static void kick_dentries(struct kmem_cache *s,
+ int nr, void **v, void *private)
+{
+ struct dentry *dentry;
+ int i;
+
+ /*
+ * First invalidate the dentries without holding the dcache lock
+ */
+ for (i = 0; i < nr; i++) {
+ dentry = v[i];
+
+ if (dentry)
+ d_invalidate(dentry);
+ }
+
+ /*
+ * If we are the last one holding a reference then the dentries can
+ * be freed. We need the dcache_lock.
+ */
+ spin_lock(&dcache_lock);
+ for (i = 0; i < nr; i++) {
+ dentry = v[i];
+ if (!dentry)
+ continue;
+
+ spin_lock(&dentry->d_lock);
+ if (atomic_read(&dentry->d_count) > 1) {
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_lock);
+ dput(dentry);
+ spin_lock(&dcache_lock);
+ continue;
+ }
+
+ prune_one_dentry(dentry);
+ }
+ spin_unlock(&dcache_lock);
+
+ /*
+ * dentries are freed using RCU so we need to wait until RCU
+ * operations are complete.
+ */
+ synchronize_rcu();
+}
+
static void __init dcache_init(void)
{
int loop;
@@ -2276,6 +2374,7 @@ static void __init dcache_init(void)
SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);

register_shrinker(&dcache_shrinker);
+ kmem_cache_setup_defrag(dentry_cache, get_dentries, kick_dentries);

/* Hash may have been set up in dcache_init_early */
if (!hashdist)

--


2010-01-29 22:01:12

by Al Viro

Subject: Re: dentries: dentry defragmentation

On Fri, Jan 29, 2010 at 02:49:48PM -0600, Christoph Lameter wrote:
> + if ((d_unhashed(dentry) && list_empty(&dentry->d_lru)) ||
> + (!d_unhashed(dentry) && hlist_unhashed(&dentry->d_hash)) ||
> + (dentry->d_inode &&
> + !mapping_cap_writeback_dirty(dentry->d_inode->i_mapping)))
> + /* Ignore this dentry */
> + v[i] = NULL;
> + else
> + /* dget_locked will remove the dentry from the LRU */
> + dget_locked(dentry);
> + }
> + spin_unlock(&dcache_lock);
> + return NULL;
> +}

No. As the matter of fact - fuck, no. For one thing, it's going to race
with umount. For another, kicking busy dentry out of hash is worse than
useless - you are just asking to get more and more copies of that sucker
in dcache. This is fundamentally bogus, especially since there is a 100%
safe time for killing dentry - when dput() drives the refcount to 0 and
you *are* doing dput() on the references you've acquired. If anything, I'd
suggest setting a flag that would trigger immediate freeing on the final
dput().
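
The "free on final dput" idea could be sketched roughly like this: a toy
userspace model, where the struct, the D_DEFRAG_KILL flag and toy_dput()
are all invented for illustration and are not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the suggestion above: instead of unhashing a busy
 * dentry, mark it so the final put frees it immediately rather than
 * parking it on the LRU. All names here are hypothetical.
 */
#define D_DEFRAG_KILL 0x1

struct toy_dentry {
	int d_count;        /* reference count */
	unsigned d_flags;   /* D_DEFRAG_KILL: free on last put */
	bool on_lru;        /* would the object go back to the LRU? */
	bool freed;
};

static void toy_dput(struct toy_dentry *d)
{
	if (--d->d_count > 0)
		return;
	if (d->d_flags & D_DEFRAG_KILL) {
		d->freed = true;    /* defrag asked: free right away */
		d->on_lru = false;
	} else {
		d->on_lru = true;   /* normal path: park on the LRU */
	}
}
```

The point of the design is that the decision happens at the one moment the
refcount provably hits zero, so no extra copies of the dentry can be created
in the meantime.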

And that does not cover the umount races. You *can't* go around grabbing
dentries without making sure that superblock won't be shut down under
you. And no, I don't know how to deal with that cleanly - simply bumping
superblock ->s_count under sb_lock is enough to make sure it's not freed
under you, but what you want is more than that. An active reference would
be enough, except that you'd get sudden "oh, sorry, now there's no way
to make sure that superblock is shut down at umount(2), no matter what kind
of setup you have". So you really need to get ->s_umount held shared,
which is not particularly locking-order-friendly, to put it mildly.

2010-02-01 08:27:13

by Nick Piggin

Subject: Re: dentries: dentry defragmentation

On Fri, Jan 29, 2010 at 10:00:44PM +0000, Al Viro wrote:
> On Fri, Jan 29, 2010 at 02:49:48PM -0600, Christoph Lameter wrote:
> > + if ((d_unhashed(dentry) && list_empty(&dentry->d_lru)) ||
> > + (!d_unhashed(dentry) && hlist_unhashed(&dentry->d_hash)) ||
> > + (dentry->d_inode &&
> > + !mapping_cap_writeback_dirty(dentry->d_inode->i_mapping)))
> > + /* Ignore this dentry */
> > + v[i] = NULL;
> > + else
> > + /* dget_locked will remove the dentry from the LRU */
> > + dget_locked(dentry);
> > + }
> > + spin_unlock(&dcache_lock);
> > + return NULL;
> > +}
>
> No. As the matter of fact - fuck, no. For one thing, it's going to race
> with umount. For another, kicking busy dentry out of hash is worse than
> useless - you are just asking to get more and more copies of that sucker
> in dcache. This is fundamentally bogus, especially since there is a 100%
> safe time for killing dentry - when dput() drives the refcount to 0 and
> you *are* doing dput() on the references you've acquired. If anything, I'd
> suggest setting a flag that would trigger immediate freeing on the final
> dput().
>
> And that does not cover the umount races. You *can't* go around grabbing
> dentries without making sure that superblock won't be shut down under
> you. And no, I don't know how to deal with that cleanly - simply bumping
> superblock ->s_count under sb_lock is enough to make sure it's not freed
> under you, but what you want is more than that. An active reference would
> be enough, except that you'd get sudden "oh, sorry, now there's no way
> to make sure that superblock is shut down at umount(2), no matter what kind
> of setup you have". So you really need to get ->s_umount held shared,
> which is not particularly locking-order-friendly, to put it mildly.

I always preferred to do defrag in the opposite way. Ie. query the
slab allocator from existing shrinkers rather than the other way
around. This lets you reuse more of the locking and refcounting etc.

So you have a pin on the object somehow via the normal shrinker path,
and therefore you get a pin on the underlying slab. I would just like
to see even performance of a real simple approach that just asks
whether we are in this slab defrag mode, and if so, whether the slab
is very sparse. If yes, then reclaim aggressively.
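
A minimal sketch of that heuristic, assuming the shrinker could ask the
slab allocator how full an object's slab page is (both function names and
the 25% threshold below are invented for illustration, not existing APIs):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy version of "if we are in defrag mode and the object's slab is
 * very sparse, reclaim aggressively". In a real implementation the
 * inuse/total numbers would come from a slab-side query for the page
 * holding the object; here they are plain parameters.
 */
static bool slab_is_sparse(unsigned inuse, unsigned total_objects)
{
	/* hypothetical threshold: under 25% of slots in use */
	return inuse * 4 < total_objects;
}

static bool should_reclaim(bool defrag_mode, unsigned inuse,
			   unsigned total_objects, bool referenced)
{
	if (defrag_mode && slab_is_sparse(inuse, total_objects))
		return true;	/* aggressive: ignore the referenced bit */
	return !referenced;	/* normal LRU shrinker behaviour */
}
```

The attraction of this shape is that all the pinning and locking stays in
the existing shrinker path; the slab is only consulted, never driving the
reclaim itself.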

If that doesn't perform well enough and you have to go further and
discover objects on the same slab, then it does get a bit more
tricky because:
- you need the pin on the first object in order to discover more
- discovered objects may not be expected in the existing shrinker
code that just picks objects off LRUs

However, your code already has to handle the 2nd case anyway, and for
the 1st case it is probably not too hard to do with dcache/icache. And
in either case you seem to avoid the worst of the sleeping, lock
ordering, and slab inversion problems of your ->get approach.

But I'm really interested to see numbers, and especially numbers of
the simpler approaches before adding this complexity.

2010-02-01 10:10:19

by Andi Kleen

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 06:08:35PM +1100, Nick Piggin wrote:
> I always preferred to do defrag in the opposite way. Ie. query the
> slab allocator from existing shrinkers rather than opposite way
> around. This lets you reuse more of the locking and refcounting etc.

I looked at this for hwpoison soft offline.

But it works really badly because the LRU list ordering
has nothing to do with the actual ordering inside the slab pages.

Christoph's basic approach is more efficient.

> So you have a pin on the object somehow via the normal shrinker path,
> and therefore you get a pin on the underlying slab. I would just like
> to see even performance of a real simple approach that just asks
> whether we are in this slab defrag mode, and if so, whether the slab
> is very sparse. If yes, then reclaim aggressively.

The typical result is that you need to get through most of the LRU
list (and prune them all) just to free the page.

>
> If that doesn't perform well enough and you have to go further and

It doesn't.

-Andi
--
[email protected] -- Speaking for myself only.

2010-02-01 10:16:52

by Nick Piggin

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 11:10:13AM +0100, Andi Kleen wrote:
> On Mon, Feb 01, 2010 at 06:08:35PM +1100, Nick Piggin wrote:
> > I always preferred to do defrag in the opposite way. Ie. query the
> > slab allocator from existing shrinkers rather than opposite way
> > around. This lets you reuse more of the locking and refcounting etc.
>
> I looked at this for hwpoison soft offline.
>
> But it works really badly because the LRU list ordering
> has nothing to do with the actual ordering inside the slab pages.

No, you don't *have* to follow LRU order. The most important thing,
if you followed what I wrote, is to get a pin on the objects and
the slabs via the regular shrinker path first, and then query slab,
rather than calling into all these subsystems from an atomic,
non-slab-reentrant path.

Following LRU order would just be the first and simplest cut at
this.


> Christoph's basic approach is more efficient.

I want to see numbers because it is also the far more complex
approach.


> > So you have a pin on the object somehow via the normal shrinker path,
> > and therefore you get a pin on the underlying slab. I would just like
> > to see even performance of a real simple approach that just asks
> > whether we are in this slab defrag mode, and if so, whether the slab
> > is very sparse. If yes, then reclaim aggressively.
>
> The typical result is that you need to get through most of the LRU
> list (and prune them all) just to free the page.

Really? If you have a large proportion of slabs which are quite
internally fragmented, then I would have thought it would give a
significant improvement (aggressive reclaim, that is).


> > If that doesn't perform well enough and you have to go further and
>
> It doesn't.

Can we see your numbers? And the patches you tried?

Thanks,
Nick

2010-02-01 10:23:00

by Andi Kleen

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 09:16:45PM +1100, Nick Piggin wrote:
> On Mon, Feb 01, 2010 at 11:10:13AM +0100, Andi Kleen wrote:
> > On Mon, Feb 01, 2010 at 06:08:35PM +1100, Nick Piggin wrote:
> > > I always preferred to do defrag in the opposite way. Ie. query the
> > > slab allocator from existing shrinkers rather than opposite way
> > > around. This lets you reuse more of the locking and refcounting etc.
> >
> > I looked at this for hwpoison soft offline.
> >
> > But it works really badly because the LRU list ordering
> > has nothing to do with the actual ordering inside the slab pages.
>
> No, you don't *have* to follow LRU order. The most important thing

What list would you follow then?

There's the LRU, there's the hash (which is just as random) and there's
the slab itself. The only one that is guaranteed to match the physical
layout in memory is the slab. That is what this patchkit is attempting
to exploit.

> is if you followed what I wrote is to get a pin on the objects and

Which objects? You first need to collect all that belong to a page.
How else would you do that?

> > > whether we are in this slab defrag mode, and if so, whether the slab
> > > is very sparse. If yes, then reclaim aggressively.
> >
> > The typical result is that you need to get through most of the LRU
> > list (and prune them all) just to free the page.
>
> Really? If you have a large proportion of slabs which are quite
> internally fragmented, then I would have thought it would give a
> significant improvement (aggressive reclaim, that is)


You wrote the same as me?


>
>
> > > If that doesn't perform well enough and you have to go further and
> >
> > It doesn't.
>
> Can we see your numbers? And the patches you tried?

What I tried (in some dirty patches you probably don't want to see)
was to just implement slab shrinking for a single page for soft hwpoison.
But it didn't work too well because it couldn't free the objects
still actually in the dcache.

Then I called the shrinker and tried to pass in the page as a hint
and drop only objects on that page, but I realized that it's terribly
inefficient to do it this way.

Now soft hwpoison doesn't care about a little inefficiency, but I still
didn't like to be terribly inefficient.

That is why I asked Christoph to repost his old patchkit, which can
do the shrink from the slab side (which is the right order here).
BTW the other potential user for this would be defragmentation
for large page allocation.

-Andi


--
[email protected] -- Speaking for myself only.

2010-02-01 10:35:32

by Nick Piggin

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 11:22:53AM +0100, Andi Kleen wrote:
> On Mon, Feb 01, 2010 at 09:16:45PM +1100, Nick Piggin wrote:
> > On Mon, Feb 01, 2010 at 11:10:13AM +0100, Andi Kleen wrote:
> > > On Mon, Feb 01, 2010 at 06:08:35PM +1100, Nick Piggin wrote:
> > > > I always preferred to do defrag in the opposite way. Ie. query the
> > > > slab allocator from existing shrinkers rather than opposite way
> > > > around. This lets you reuse more of the locking and refcounting etc.
> > >
> > > I looked at this for hwpoison soft offline.
> > >
> > > But it works really badly because the LRU list ordering
> > > has nothing to do with the actual ordering inside the slab pages.
> >
> > No, you don't *have* to follow LRU order. The most important thing
>
> What list would you follow then?

You can follow the slab, as I said in the first mail.

> There's LRU, there's hash (which is as random) and there's slab
> itself. The only one who is guaranteed to match the physical
> layout in memory is slab. That is what this patchkit is trying
> to attempt.
>
> > is if you followed what I wrote is to get a pin on the objects and
>
> Which objects? You first need to collect all that belong to a page.
> How else would you do that?

Objects that you're interested in reclaiming, I guess. I don't
understand the question.


> > > > whether we are in this slab defrag mode, and if so, whether the slab
> > > > is very sparse. If yes, then reclaim aggressively.
> > >
> > > The typical result is that you need to get through most of the LRU
> > > list (and prune them all) just to free the page.
> >
> > Really? If you have a large proportion of slabs which are quite
> > internally fragmented, then I would have thought it would give a
> > significant improvement (aggressive reclaim, that is)
>
>
> You wrote the same as me?

Aggressive reclaim: as-in, ignoring referenced bit on the LRU,
*possibly* even trying to actively invalidate the dentry.


> > > > If that doesn't perform well enough and you have to go further and
> > >
> > > It doesn't.
> >
> > Can we see your numbers? And the patches you tried?
>
> What I tried (in some dirty patches you probably don't want to see)
> was to just implement slab shrinking for a single page for soft hwpoison.
> But it didn't work too well because it couldn't free the objects
> still actually in the dcache.
>
> Then I called the shrinker and tried to pass in the page as a hint
> and drop only objects on that page, but I realized that it's terribly
> inefficient to do it this way.
>
> Now soft hwpoison doesn't care about a little inefficiency, but I still
> didn't like to be terribly inefficient.
>
> That is why I asked Christoph to repost his old patchkit that can
> do the shrink from the slab side (which is the right order here)

Right, but as you can see it is complex to do it this way. And I
think for reclaim-driven targeted reclaim it needn't be so
inefficient, because you aren't restricted to just one page, but
can work on any page which is heavily fragmented (and by definition
there should be a lot of them in the system).

Hwpoison I don't think adds much weight, frankly. Just panic and
reboot if you get an unrecoverable error. We have everything to handle
that, so I can't see how it's worth adding much complexity to the
kernel for.

2010-02-01 10:45:46

by Andi Kleen

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 09:35:26PM +1100, Nick Piggin wrote:
> > > > > I always preferred to do defrag in the opposite way. Ie. query the
> > > > > slab allocator from existing shrinkers rather than opposite way
> > > > > around. This lets you reuse more of the locking and refcounting etc.
> > > >
> > > > I looked at this for hwpoison soft offline.
> > > >
> > > > But it works really badly because the LRU list ordering
> > > > has nothing to do with the actual ordering inside the slab pages.
> > >
> > > No, you don't *have* to follow LRU order. The most important thing
> >
> > What list would you follow then?
>
> You can follow the slab, as I said in the first mail.

That's pretty much what Christoph's patchkit is about (with, yes, some
details improved).

>
> > There's LRU, there's hash (which is as random) and there's slab
> > itself. The only one who is guaranteed to match the physical
> > layout in memory is slab. That is what this patchkit is trying
> > to attempt.
> >
> > > is if you followed what I wrote is to get a pin on the objects and
> >
> > Which objects? You first need to collect all that belong to a page.
> > How else would you do that?
>
> Objects that you're interested in reclaiming, I guess. I don't
> understand the question.

Objects that are in the same page.

There are really two different cases here:
- Running out of memory: in this case I just want to find all the objects
of any page, ideally of not-recently-used pages.
- Being very fragmented and wanting a specific page freed to get a 2MB
region back, or for hwpoison: same, but do it for a specific page.


> Right, but as you can see it is complex to do it this way. And I
> think for reclaim driven targetted reclaim, then it needn't be so
> inefficient because you aren't restricted to just one page, but
> in any page which is heavily fragmented (and by definition there
> should be a lot of them in the system).

Assuming you can identify them quickly.

>
> Hwpoison I don't think adds much weight, frankly. Just panic and
> reboot if you get unrecoverable error. We have everything to handle

This is for soft hwpoison: offlining pages that might go bad
in the future.

But soft hwpoison isn't the only user. The other big one would
be for large pages or other large page allocations.

-Andi
--
[email protected] -- Speaking for myself only.

2010-02-01 10:56:43

by Nick Piggin

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 11:45:44AM +0100, Andi Kleen wrote:
> On Mon, Feb 01, 2010 at 09:35:26PM +1100, Nick Piggin wrote:
> > > > > > I always preferred to do defrag in the opposite way. Ie. query the
> > > > > > slab allocator from existing shrinkers rather than opposite way
> > > > > > around. This lets you reuse more of the locking and refcounting etc.
> > > > >
> > > > > I looked at this for hwpoison soft offline.
> > > > >
> > > > > But it works really badly because the LRU list ordering
> > > > > has nothing to do with the actual ordering inside the slab pages.
> > > >
> > > > No, you don't *have* to follow LRU order. The most important thing
> > >
> > > What list would you follow then?
> >
> > You can follow the slab, as I said in the first mail.
>
> That's pretty much what Christoph's patchkit is about (with yes some details
> improved)

I know what the patch is about. Can you re-read my first mail?


> > > There's LRU, there's hast (which is as random) and there's slab
> > > itself. The only one who is guaranteed to match the physical
> > > layout in memory is slab. That is what this patchkit is trying
> > > to attempt.
> > >
> > > > is if you followed what I wrote is to get a pin on the objects and
> > >
> > > Which objects? You first need to collect all that belong to a page.
> > > How else would you do that?
> >
> > Objects that you're interested in reclaiming, I guess. I don't
> > understand the question.
>
> Objects that are in the same page

OK, well you can pin an object, and from there you can find other
objects in the same page.

This is totally different to how Christoph's patch has to pin the
slab, then (in a restrictive context) pin the objects, then go to
a more relaxed context to reclaim the objects. This is where much
of the complexity comes from.
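
With a fixed object size and page-aligned slabs, going from one pinned
object to its neighbours on the same page is plain pointer arithmetic. A
toy illustration (sizes and names are made up, and a real SLUB slab would
also need the page's metadata to know which slots are actually live):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical geometry: a 4 KiB slab page holding fixed-size objects.
 * Given a pointer into the page, the page base and the other slots
 * follow by masking and indexing.
 */
#define TOY_PAGE_SIZE 4096u
#define TOY_OBJ_SIZE  256u
#define TOY_OBJS_PER_PAGE (TOY_PAGE_SIZE / TOY_OBJ_SIZE)

/* base address of the slab page containing obj */
static uintptr_t toy_page_base(uintptr_t obj)
{
	return obj & ~((uintptr_t)TOY_PAGE_SIZE - 1);
}

/* slot index of obj within its page */
static unsigned toy_obj_index(uintptr_t obj)
{
	return (unsigned)((obj - toy_page_base(obj)) / TOY_OBJ_SIZE);
}

/* address of slot i on the same page as obj */
static uintptr_t toy_neighbour(uintptr_t obj, unsigned i)
{
	return toy_page_base(obj) + (uintptr_t)i * TOY_OBJ_SIZE;
}
```

The hard part, as the thread notes, is not this arithmetic but holding the
objects stable (pinned, not mid-free) while you look at them.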


> There are really two different cases here:
> - Run out of memory: in this case i just want to find all the objects
> of any page, ideally of not that recently used pages.
> - I am very fragmented and want a specific page freed to get a 2MB
> region back or for hwpoison: same, but do it for a specific page.
>
>
> > Right, but as you can see it is complex to do it this way. And I
> > think for reclaim driven targetted reclaim, then it needn't be so
> > inefficient because you aren't restricted to just one page, but
> > in any page which is heavily fragmented (and by definition there
> > should be a lot of them in the system).
>
> Assuming you can identify them quickly.

Well, because there are a large number of them, you are likely
to encounter one very quickly just off the LRU list.


> > Hwpoison I don't think adds much weight, frankly. Just panic and
> > reboot if you get unrecoverable error. We have everything to handle
>
> This is for soft hwpoison: offlining pages that might go bad
> in the future.

I still don't think it adds much weight. Especially if you can just
try an inefficient scan.


> But soft hwpoison isn't the only user. The other big one would
> be for large pages or other large page allocations.

2010-02-01 13:25:35

by Andi Kleen

Subject: Re: dentries: dentry defragmentation

>
> > > Right, but as you can see it is complex to do it this way. And I
> > > think for reclaim driven targetted reclaim, then it needn't be so
> > > inefficient because you aren't restricted to just one page, but
> > > in any page which is heavily fragmented (and by definition there
> > > should be a lot of them in the system).
> >
> > Assuming you can identify them quickly.
>
> Well because there are a large number of them, then you are likely
> to encounter one very quickly just off the LRU list.

There were some cases in the past where this wasn't the case.
But yes, some up-to-date numbers on this would be good.

Also it doesn't address the second case, quoted here again:

> > There are really two different cases here:
> > - Run out of memory: in this case i just want to find all the objects
> > of any page, ideally of not that recently used pages.
> > - I am very fragmented and want a specific page freed to get a 2MB
> > region back or for hwpoison: same, but do it for a specific page.
> >
>
>
> I still don't think it adds much weight. Especially if you can just
> try an inefficient scan.

Also see second point below.
>
>
> > But soft hwpoison isn't the only user. The other big one would
> > be for large pages or other large page allocations.


-Andi

--
[email protected] -- Speaking for myself only.

2010-02-01 13:36:15

by Nick Piggin

Subject: Re: dentries: dentry defragmentation

On Mon, Feb 01, 2010 at 02:25:27PM +0100, Andi Kleen wrote:
> >
> > > > Right, but as you can see it is complex to do it this way. And I
> > > > think for reclaim driven targetted reclaim, then it needn't be so
> > > > inefficient because you aren't restricted to just one page, but
> > > > in any page which is heavily fragmented (and by definition there
> > > > should be a lot of them in the system).
> > >
> > > Assuming you can identify them quickly.
> >
> > Well because there are a large number of them, then you are likely
> > to encounter one very quickly just off the LRU list.
>
> There were some cases in the past where this wasn't the case.
> But yes, some up-to-date numbers on this would be good.
>
> Also it doesn't address the second case here quoted again.
>
> > > There are really two different cases here:
> > > - Run out of memory: in this case i just want to find all the objects
> > > of any page, ideally of not that recently used pages.
> > > - I am very fragmented and want a specific page freed to get a 2MB
> > > region back or for hwpoison: same, but do it for a specific page.
> > >
> >
> >
> > I still don't think it adds much weight. Especially if you can just
> > try an inefficient scan.
>
> Also see second point below.
> >
> >
> > > But soft hwpoison isn't the only user. The other big one would
> > > be for large pages or other large page allocations.

Well yes it's possible that it could help there.

But it is always possible to do the same reclaim work via the LRU; in
the worst case it just requires reclaiming most of the objects. So it
probably doesn't fundamentally enable something we can't do already.
It's more a matter of performance, so again, numbers are needed.