v1->v2:
1. Replace the variable active_refault with mmap_miss. Now mmap_miss will
   not be decreased if the folio was active prior to eviction.
2. Jan has given me two other patches which aim to let mmap_miss be
   properly increased when the page is not ready. But in my scenario, the
   problem is that the page will be reclaimed immediately. These two
   patches have no logic conflict with Jan's patches[3].
Recently, when installing a package in a docker container that had almost
reached its memory limit, the installer was unresponsive for more than 15
minutes. During this period, I/O stayed high (~1G/s) and affected the
whole machine. I've constructed a test case as follows:
1. create a docker:
$ cat test.sh
#!/bin/bash
docker rm centos7 --force
docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init
docker start centos7
sleep 1
docker cp ./alloc_page centos7:/
docker cp ./reproduce.sh centos7:/
docker exec -it centos7 /bin/bash
2. try reproduce the problem in docker:
$ cat reproduce.sh
#!/bin/bash
while true; do
	flag=$(ps -ef | grep -v grep | grep alloc_page | wc -l)
	if [ "$flag" -eq 0 ]; then
		/alloc_page &
	fi
	sleep 30
	start_time=$(date +%s)
	yum install -y expect > /dev/null 2>&1
	end_time=$(date +%s)
	elapsed_time=$((end_time - start_time))
	echo "$elapsed_time seconds"
	yum remove -y expect > /dev/null 2>&1
done
$ cat alloc_page.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define SIZE 1*1024*1024 //1M

int main()
{
	void *addr = NULL;
	int i;

	/* Allocate and touch just under the container's 6G memory+swap
	 * limit, leaving ~50M of headroom. */
	for (i = 0; i < 1024 * 6 - 50; i++) {
		addr = (void *)malloc(SIZE);
		if (!addr)
			return -1;
		memset(addr, 0, SIZE);
	}
	sleep(99999);
	return 0;
}
We found that this problem is caused by a lot of meaningless read-ahead.
Since the docker container has almost reached its memory limit, pages are
reclaimed immediately after read-ahead and then read ahead again right
away. The program executes slowly and wastes a lot of I/O resources.
These two patches aim to break the read-ahead loop in the above scenario.
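For reference, the thrashing pattern can also be observed with a simple
mmap reader; below is a minimal sketch (the file path is hypothetical and
this is illustrative, not part of the reproducer above). Each major fault
goes through do_sync_mmap_readahead(), and under the memcg limit the
read-ahead pages are reclaimed before they are ever mapped, so the same
I/O is issued over and over:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	struct stat st;
	long sum = 0;
	off_t i;
	char *p;
	int fd = open("/some_big_file", O_RDONLY);	/* hypothetical path */

	if (fd < 0 || fstat(fd, &st) < 0)
		return -1;
	p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	/* Touch one byte per page to fault the whole file in. */
	for (i = 0; i < st.st_size; i += 4096)
		sum += p[i];
	printf("%ld\n", sum);
	return 0;
}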
[1] https://lore.kernel.org/linux-mm/[email protected]/T/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/
Liu Shixin (2):
mm/readahead: break read-ahead loop if filemap_add_folio return
-ENOMEM
mm/readahead: increase mmap_miss when folio in workingset
include/linux/pagemap.h | 2 ++
mm/filemap.c | 7 ++++---
mm/readahead.c | 15 +++++++++++++--
3 files changed, 19 insertions(+), 5 deletions(-)
--
2.25.1
If there are too many folios that were recently evicted from a file, then
they will probably continue to be evicted. In such a situation, there is
no benefit to read-ahead on this file, since it is only a waste of IO.

mmap_miss is increased in do_sync_mmap_readahead() and decreased in both
do_async_mmap_readahead() and filemap_map_pages(). In order to skip
read-ahead in the above scenario, mmap_miss has to increase beyond
MMAP_LOTSAMISS. This can be done by not decreasing mmap_miss when the
folio has the workingset flag. The async path is not a concern because,
in the above scenario, it is hard to run into the async path.
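For context, this is the gate the patch relies on: the sync fault path
bumps mmap_miss and skips read-ahead entirely once it exceeds
MMAP_LOTSAMISS (100). A simplified sketch of that logic in
do_sync_mmap_readahead() (see mm/filemap.c for the exact code):

	/* Avoid banging the cache line if not needed. */
	mmap_miss = READ_ONCE(ra->mmap_miss);
	if (mmap_miss < MMAP_LOTSAMISS * 10)
		WRITE_ONCE(ra->mmap_miss, ++mmap_miss);

	/*
	 * Do we miss much more than hit in this file? If so,
	 * stop bothering with read-ahead. It will only hurt.
	 */
	if (mmap_miss > MMAP_LOTSAMISS)
		return fpin;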
Signed-off-by: Liu Shixin <[email protected]>
---
mm/filemap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 8df4797c5287..753771310127 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3439,7 +3439,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
if (PageHWPoison(page + count))
goto skip;
- (*mmap_miss)++;
+ if (!folio_test_workingset(folio))
+ (*mmap_miss)++;
/*
* NOTE: If there're PTE markers, we'll leave them to be
@@ -3488,7 +3489,8 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
if (PageHWPoison(page))
return ret;
- (*mmap_miss)++;
+ if (!folio_test_workingset(folio))
+ (*mmap_miss)++;
/*
* NOTE: If there're PTE markers, we'll leave them to be
--
2.25.1
When filemap_add_folio() returns -ENOMEM, break the read-ahead loop, as
is already done when filemap_alloc_folio() fails.
Signed-off-by: Liu Shixin <[email protected]>
Signed-off-by: Jinjiang Tu <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
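For context, the -ENOMEM here typically comes from the memcg charge at
the entry of filemap_add_folio(); once it fires, further read-ahead
allocations in the same cgroup are likely to keep failing, so retrying is
pointless. A simplified sketch of that entry point (see mm/filemap.c for
the exact code):

int filemap_add_folio(struct address_space *mapping, struct folio *folio,
		      pgoff_t index, gfp_t gfp)
{
	int ret;

	ret = mem_cgroup_charge(folio, NULL, gfp);
	if (ret)
		return ret;	/* -ENOMEM when the memcg is at its limit */
	...
}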
---
mm/readahead.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 2648ec4f0494..a8513ffbb17f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -228,6 +228,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
*/
for (i = 0; i < nr_to_read; i++) {
struct folio *folio = xa_load(&mapping->i_pages, index + i);
+ int ret;
if (folio && !xa_is_value(folio)) {
/*
@@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
folio = filemap_alloc_folio(gfp_mask, 0);
if (!folio)
break;
- if (filemap_add_folio(mapping, folio, index + i,
- gfp_mask) < 0) {
+
+ ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
+ if (ret < 0) {
folio_put(folio);
+ if (ret == -ENOMEM)
+ break;
read_pages(ractl);
ractl->_index++;
i = ractl->_index + ractl->_nr_pages - index - 1;
--
2.25.1
On Fri 22-03-24 17:35:55, Liu Shixin wrote:
> If there are too many folios that were recently evicted from a file, then
> they will probably continue to be evicted. In such a situation, there is
> no benefit to read-ahead on this file, since it is only a waste of IO.
>
> mmap_miss is increased in do_sync_mmap_readahead() and decreased in both
> do_async_mmap_readahead() and filemap_map_pages(). In order to skip
> read-ahead in the above scenario, mmap_miss has to increase beyond
> MMAP_LOTSAMISS. This can be done by not decreasing mmap_miss when the
> folio has the workingset flag. The async path is not a concern because,
> in the above scenario, it is hard to run into the async path.
>
> Signed-off-by: Liu Shixin <[email protected]>
..
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8df4797c5287..753771310127 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3439,7 +3439,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
> if (PageHWPoison(page + count))
> goto skip;
>
> - (*mmap_miss)++;
> + if (!folio_test_workingset(folio))
> + (*mmap_miss)++;
Hum, so this means we consider this a 'hit' if the page is completely new
in the page cache or was evicted a long time ago. OK, makes sense. It
would be nice to add a comment along these lines to explain the
condition. Frankly, the whole mmap_miss accounting is broken as I've
outlined in my patch series. But I guess this works as a fixup for your
immediate problem and we can make mmap_miss accounting sensible later. So
for now feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
If there are too many folios that were recently evicted from a file, then
they will probably continue to be evicted. In such a situation, there is
no benefit to read-ahead on this file, since it is only a waste of IO.

mmap_miss is increased in do_sync_mmap_readahead() and decreased in both
do_async_mmap_readahead() and filemap_map_pages(). In order to skip
read-ahead in the above scenario, mmap_miss has to increase beyond
MMAP_LOTSAMISS. This can be done by not decreasing mmap_miss when the
folio has the workingset flag. The async path is not a concern because,
in the above scenario, it is hard to run into the async path.
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
v2->v3: Update the title and comment, and add Reviewed-by from Jan.
Andrew, please update patch[2] with this new patch, thanks.
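For context, the flag checked by folio_test_workingset() is PG_workingset,
which the refault path sets when a folio that was part of the working set
comes back shortly after being evicted. A heavily simplified sketch of
that relationship (see mm/workingset.c for the exact code):

void workingset_refault(struct folio *folio, void *shadow)
{
	bool workingset;
	...
	/* Folio was active prior to eviction. */
	if (workingset)
		folio_set_workingset(folio);
	...
}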
mm/filemap.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 8df4797c5287..780aad026b26 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3439,7 +3439,15 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
if (PageHWPoison(page + count))
goto skip;
- (*mmap_miss)++;
+ /*
+ * If there are too many folios that are recently evicted
+ * in a file, they will probably continue to be evicted.
+ * In such situation, read-ahead is only a waste of IO.
+ * Don't decrease mmap_miss in this scenario to make sure
+ * we can stop read-ahead.
+ */
+ if (!folio_test_workingset(folio))
+ (*mmap_miss)++;
/*
* NOTE: If there're PTE markers, we'll leave them to be
@@ -3488,7 +3496,9 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
if (PageHWPoison(page))
return ret;
- (*mmap_miss)++;
+ /* See comment of filemap_map_folio_range() */
+ if (!folio_test_workingset(folio))
+ (*mmap_miss)++;
/*
* NOTE: If there're PTE markers, we'll leave them to be
--
2.25.1