From: Eric Biggers <[email protected]>

To install a buffer_head into the cpu's LRU queue, bh_lru_install()
would construct a new copy of the queue and then memcpy it over the real
queue. But it's easily possible to do the update in-place, which is
faster and simpler. Some work can also be skipped if the buffer_head
was already in the queue.

As a microbenchmark I timed how long it takes to run sb_getblk()
10,000,000 times alternating between BH_LRU_SIZE + 1 blocks.
Effectively, this benchmarks looking up buffer_heads that are in the
page cache but not in the LRU:

	Before this patch: 1.758s
	After this patch: 1.653s

This patch also removes about 350 bytes of compiled code (on x86_64),
partly due to removal of the memcpy() which was being inlined+unrolled.

Signed-off-by: Eric Biggers <[email protected]>
---
fs/buffer.c | 43 +++++++++++++++----------------------------
1 file changed, 15 insertions(+), 28 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index d21771fcf7d3..282ca52517bf 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1273,44 +1273,31 @@ static inline void check_irqs_on(void)
 }
 
 /*
- * The LRU management algorithm is dopey-but-simple. Sorry.
+ * Install a buffer_head into this cpu's LRU. If not already in the LRU, it is
+ * inserted at the front, and the buffer_head at the back if any is evicted.
+ * Or, if already in the LRU it is moved to the front.
  */
 static void bh_lru_install(struct buffer_head *bh)
 {
-	struct buffer_head *evictee = NULL;
+	struct buffer_head *evictee = bh;
+	struct bh_lru *b;
+	int i;
 
 	check_irqs_on();
 	bh_lru_lock();
 
-	if (__this_cpu_read(bh_lrus.bhs[0]) != bh) {
-		struct buffer_head *bhs[BH_LRU_SIZE];
-		int in;
-		int out = 0;
-
-		get_bh(bh);
-		bhs[out++] = bh;
-		for (in = 0; in < BH_LRU_SIZE; in++) {
-			struct buffer_head *bh2 =
-				__this_cpu_read(bh_lrus.bhs[in]);
-			if (bh2 == bh) {
-				__brelse(bh2);
-			} else {
-				if (out >= BH_LRU_SIZE) {
-					BUG_ON(evictee != NULL);
-					evictee = bh2;
-				} else {
-					bhs[out++] = bh2;
-				}
-			}
+	b = this_cpu_ptr(&bh_lrus);
+	for (i = 0; i < BH_LRU_SIZE; i++) {
+		swap(evictee, b->bhs[i]);
+		if (evictee == bh) {
+			bh_lru_unlock();
+			return;
 		}
-		while (out < BH_LRU_SIZE)
-			bhs[out++] = NULL;
-		memcpy(this_cpu_ptr(&bh_lrus.bhs), bhs, sizeof(bhs));
 	}
-	bh_lru_unlock();
 
-	if (evictee)
-		__brelse(evictee);
+	get_bh(bh);
+	bh_lru_unlock();
+	brelse(evictee);
 }
 
 /*
--
2.11.0
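
For anyone who wants to experiment with the in-place update outside the kernel, here is a rough userspace sketch of the same swap-through idea. It is not the kernel code: LRU_SIZE, struct item, struct lru and lru_install() are hypothetical names made up for illustration, a plain integer stands in for the buffer_head reference count, and there is no locking or per-cpu handling.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_SIZE 4	/* hypothetical; stands in for BH_LRU_SIZE */

/* Hypothetical stand-in for struct buffer_head: just a reference count. */
struct item {
	int refcount;
};

struct lru {
	struct item *slots[LRU_SIZE];	/* slots[0] is the most recently used */
};

/*
 * Put 'it' at the front of 'lru', updating the array in place.
 *
 * 'carry' starts out as the new item and is swapped through the slots
 * from front to back, so every existing entry shifts down by one.  If
 * the item that falls out of a slot is 'it' itself, it was already in
 * the LRU and has now been moved to the front, so we can stop early
 * without touching any reference count.
 *
 * Returns the evicted item (with the LRU's reference already dropped),
 * or NULL if nothing was evicted.
 */
static struct item *lru_install(struct lru *lru, struct item *it)
{
	struct item *carry = it;
	int i;

	for (i = 0; i < LRU_SIZE; i++) {
		struct item *tmp = lru->slots[i];

		lru->slots[i] = carry;
		carry = tmp;
		if (carry == it)
			return NULL;	/* already present: moved to front */
	}

	it->refcount++;			/* the LRU now holds a reference */
	if (carry)
		carry->refcount--;	/* drop the LRU's reference on the evictee */
	return carry;
}
```

Starting from an empty LRU, installing four distinct items fills the slots newest-first; installing one of them again only rotates the entries in front of its old slot, and installing a fifth item pushes out the oldest one.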
On Thu, Dec 29, 2016 at 01:34:45PM -0600, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> To install a buffer_head into the cpu's LRU queue, bh_lru_install()
> would construct a new copy of the queue and then memcpy it over the real
> queue. But it's easily possible to do the update in-place, which is
> faster and simpler. Some work can also be skipped if the buffer_head
> was already in the queue.
>
> As a microbenchmark I timed how long it takes to run sb_getblk()
> 10,000,000 times alternating between BH_LRU_SIZE + 1 blocks.
> Effectively, this benchmarks looking up buffer_heads that are in the
> page cache but not in the LRU:
>
> Before this patch: 1.758s
> After this patch: 1.653s
>
> This patch also removes about 350 bytes of compiled code (on x86_64),
> partly due to removal of the memcpy() which was being inlined+unrolled.
>
> Signed-off-by: Eric Biggers <[email protected]>
Ping? Al, do you have any interest in taking this patch?
- Eric
On Sat, Mar 25, 2017 at 08:34:30PM -0700, Eric Biggers wrote:
> Ping? Al, do you have any interest in taking this patch?
>
> - Eric
IMO, it is a great patch.
Could you please share the steps for your microbenchmark?
I'd like to take a closer look at it.