This patch series implements the POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
advice for shmem files. This can be helpful for drivers that want to manage
the pages of shmem files on their own, for example files that are created
through shmem_file_setup[_with_mnt]().
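
For illustration only, a minimal userspace sketch of the two calls this
series gives meaning to on shmem. It assumes a shmem-backed descriptor
obtained from memfd_create(), which is my choice for the example and not
something the series itself uses; the helper name advise_cycle() is
hypothetical. On kernels without this series, both calls would be expected
to succeed without touching the pages.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a shmem-backed file of 'size' bytes, fill it
 * with 0xAA, then advise DONTNEED followed by WILLNEED on the whole range.
 * Returns 0 on success, otherwise a positive error number.
 */
int advise_cycle(size_t size)
{
	unsigned char *data;
	int fd, ret;

	fd = memfd_create("fadvise-demo", 0);
	if (fd < 0)
		return errno;
	if (ftruncate(fd, (off_t)size)) {
		ret = errno;
		goto out;
	}
	data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED) {
		ret = errno;
		goto out;
	}
	memset(data, 0xAA, size);

	/* posix_fadvise() returns the error number directly; errno is not set. */
	ret = posix_fadvise(fd, 0, (off_t)size, POSIX_FADV_DONTNEED);
	if (!ret)
		ret = posix_fadvise(fd, 0, (off_t)size, POSIX_FADV_WILLNEED);

	/* Advice must never change the file contents. */
	if (!ret && (data[0] != 0xAA || data[size - 1] != 0xAA))
		ret = EIO;
	munmap(data, size);
out:
	close(fd);
	return ret;
}
```

Whether the pages actually went to swap is not visible from the return
value; that has to be observed separately, e.g. through /proc/vmstat.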

This patchset was unit tested with the following setup:
(a) Qemu x86_64 with 4 SMP cores, 2GB RAM and 1GB swap mounted
    on a zram block device.
(b) A tmpfs file of size 100MB is created.
(c) Initially this file is filled with a poison value of 0xAA.
(d) POSIX_FADV_[WILL|DONT]NEED is called on this file for a randomly
    generated range. This is repeated for 10K iterations.
(e) The poison value is checked at the end of the test. The program exits
    with a failure if it has changed.

Use the script below:
val1=$(grep -w pswpout /proc/vmstat | awk '{print $2}')
str=$(./a.out <tmpfs file>)
res=$(echo "$str" | awk '{print $1}')
val2=$(grep -w pswpout /proc/vmstat | awk '{print $2}')

if [[ $res == "FAIL" ]]; then
	echo "$str"
elif [[ $val1 == "$val2" ]]; then
	# No pages were swapped out during the run.
	echo "FAIL. Is swap set up?"
else
	echo "$res"
fi
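
The pswpout check done by the script can also be sketched in C, in case
the comparison is wanted inside a test program. The helper name
vmstat_counter() is hypothetical, and the sketch assumes the usual
"<name> <value>" per-line layout of /proc/vmstat.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of counter 'name' from a
 * /proc/vmstat-style stream ("<name> <value>" per line), or -1 if the
 * counter is not present.
 */
long long vmstat_counter(FILE *fp, const char *name)
{
	char key[64];
	long long val;

	while (fscanf(fp, "%63s %lld", key, &val) == 2) {
		if (!strcmp(key, name))
			return val;
	}
	return -1;
}
```

Calling it on fopen("/proc/vmstat", "r") with "pswpout" before and after
the run yields the two values that the script compares.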
------------------------------------------------------------------------
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define NR_ITER (10000)

/* Report on stdout so that the wrapper script sees "FAIL" as the first word. */
static void fail(int code, const char *what, int err)
{
	printf("FAIL : %s: %s\n", what, strerror(err));
	exit(code);
}

/* Advise the kernel to drop the range, then to bring it back in. */
static void run_tests(int fd, off_t start, off_t end)
{
	int ret;

	/* posix_fadvise() takes a length and returns the error number itself. */
	ret = posix_fadvise(fd, start, end - start, POSIX_FADV_DONTNEED);
	if (ret)
		fail(6, "fadvise(DONTNEED)", ret);

	ret = posix_fadvise(fd, start, end - start, POSIX_FADV_WILLNEED);
	if (ret)
		fail(6, "fadvise(WILLNEED)", ret);
}

/* Pick a random range inside the file; fall back to the whole file. */
static void get_rand_range(off_t size, off_t *start, off_t *end)
{
	*start = rand() % (size >> 1);
	*end = *start + (rand() % size);

	if (*end > size)
		*start = *end = 0;	/* a length of 0 means "to end of file" */
}

static void fill_pattern(int fd, size_t size)
{
	char *data;

	data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		fail(5, "mmap()", errno);

	memset(data, 0xAA, size);
	munmap(data, size);
}

static void check_pattern(int fd, size_t size)
{
	unsigned char *data;
	size_t i;

	data = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		fail(4, "mmap()", errno);

	for (i = 0; i < size; ++i) {
		if (data[i] != 0xAA) {
			printf("FAIL : Testcase failed. Seeing %x rather than 0xAA\n", data[i]);
			exit(4);
		}
	}
	munmap(data, size);
}

int main(int argc, char **argv)
{
	struct stat buf = {0};
	off_t start, end;
	int fd, i;

	if (argc < 2) {
		printf("FAIL : usage: a.out <tmpfs filename>\n");
		exit(1);
	}

	if (stat(argv[1], &buf))
		fail(2, "stat()", errno);

	/* get_rand_range() divides by size / 2, so reject tiny files. */
	if (buf.st_size < 2)
		fail(2, "file too small", EINVAL);

	fd = open(argv[1], O_RDWR);
	if (fd < 0)
		fail(3, "open()", errno);

	fill_pattern(fd, buf.st_size);
	for (i = NR_ITER; i > 0; --i) {
		get_rand_range(buf.st_size, &start, &end);
		run_tests(fd, start, end);
	}
	check_pattern(fd, buf.st_size);

	close(fd);

	printf("PASS \n");

	return 0;
}
------------------------------------------------------------------------
Changes in V8:
-- Addressed the comments and fixed the bug caught by Hugh.
-- Updated the commit message for POSIX_FADV_WILLNEED, as asked by Minchan.
Changes in V7:
-- Use folio based interface, shmem_read_folio(), for FADV_WILLNEED.
-- Don't swap the SHM_LOCK'ed pages.
-- https://lore.kernel.org/all/[email protected]/
Changes in V6:
-- Replaced pages with folios in the shmem changes.
-- https://lore.kernel.org/all/[email protected]/
Changes in V5:
-- Moved the 'endbyte' calculations to a header function for use by shmem_fadvise().
-- Addressed comments from Suren.
-- No changes in resend. Retested on the latest tip.
-- https://lore.kernel.org/all/[email protected]/
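
For readers unfamiliar with the 'endbyte' computation mentioned above:
fadvise turns (offset, len) into an inclusive last byte, treating len == 0
as "as much as possible" and guarding against overflow. Below is a
userspace sketch of that logic. The function name calc_endbyte() is
hypothetical, int64_t stands in for the kernel's loff_t, and non-negative
inputs are assumed; the helper actually added to mm/internal.h may differ
in detail.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: inclusive last byte of the range [offset, offset + len).
 * Assumes offset >= 0 and len >= 0.
 */
int64_t calc_endbyte(int64_t offset, int64_t len)
{
	/* len == 0 means "to the end"; an overflowing sum is clamped as well. */
	if (!len || len > INT64_MAX - offset)
		return INT64_MAX;

	return offset + len - 1;	/* inclusive */
}
```

For example, a 4096-byte range starting at offset 0 ends at byte 4095,
while any range with len == 0 ends at INT64_MAX.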
Changes in V4:
-- Changed the code to use reclaim_pages() to write out the shmem pages to swap and then reclaim them.
-- Addressed comments from Mark Hemment and Matthew.
-- fadvise() on shmem file may even unmap a page.
-- https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/
Changes in V3:
-- Considered THP pages while doing FADVISE_[DONT|WILL]NEED, identified by Matthew.
-- xarray used properly, as identified by Matthew.
-- Excluded mapped pages, as handling them requires unmapping and the fadvise man pages don't mention them.
-- RESEND: Fixed the compilation issue when CONFIG_TMPFS is not defined.
-- https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/
Changes in V2:
-- Rearranged the code so that it does not sleep with rcu_lock held while using the xas_() functionality.
-- Addressed the comments from Suren.
-- https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/
Changes in V1:
-- Created the interface for fadvise(2) to work on shmem files.
-- https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/

Charan Teja Kalla (2):
  mm: fadvise: move 'endbyte' calculations to helper function
  mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem

 mm/fadvise.c  |  11 +----
 mm/internal.h |  21 ++++++++++
 mm/shmem.c    | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+), 10 deletions(-)
--
2.7.4
Just a gentle ping to get your valuable input on this series.

On 4/28/2023 8:32 PM, Charan Teja Kalla wrote:
> This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
> advices to shmem files which can be helpful for the drivers who may want
> to manage the pages of shmem files on their own, like, that are created
> through shmem_file_setup[_with_mnt]().
>
> [...]