We have a system with 4G of memory that processes streams of information
on the order of a terabyte. We worked hard to make it so that we never
have more than ~3G of data in memory at any one time, so we would not go
into swap. We tested our code, and it began to swap anyway. I downloaded
2.4.14-pre8, and it still swaps. But if I turn off our swap partitions
it works well, so swap was not needed.
With the 2.4.14-pre6aa1 kernel, with swap turned on, we have no
problem.
In the beginning I thought it was the same bug as the google bug, but
after following the thread and asking stupid questions I realized it
wasn't. The following program, which approximates on a small scale what
our processing does, can demonstrate the bug. You need the chunk files
as used in the google bug test code. The comments in the code show a
fast way to make the chunk files.
Any possibility of this being fixed in the stable kernel? Any way to
raise the priority of reclaiming cached memory over swapping out
pages? I'd like to try to stay away from development kernels on what
are meant to be stable systems, but for now it's aa kernels for me.
Sven
#define _GNU_SOURCE   /* needed for the asprintf() prototype */
#include <stdio.h>
#include <stdlib.h>

// Use the following to create the needed files for reading:
// for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19; do perl -e "open(FILE, '>chunk' . $i); seek (FILE, 6682555 * 64, 0); print FILE NULL;" ; done
// This should match the number of fake data files you have.
static const int INPUT_COUNT = 19;
static const char* INPUT_PATTERN = "chunk%d";
static const int OUTPUT_COUNT = 25;
static const char* OUTPUT_PATTERN = "out%d";
static const size_t MAX_BLOCK_SIZE = 1 << 12;
static const size_t TICK_EVERY = 1 << 14;
FILE*
check_fopen(const char* path, const char* mode)
{
    FILE* fp;

    fp = fopen(path, mode);
    if (fp == NULL)
    {
        perror(path);
        abort();
    }
    return fp;
}

int
main()
{
    FILE* output[OUTPUT_COUNT];
    register int a_loop;

    // Open up all the output files.
    for (a_loop = 0; a_loop < OUTPUT_COUNT; a_loop++)
    {
        char* filename;

        asprintf(&filename, OUTPUT_PATTERN, a_loop);
        output[a_loop] = check_fopen(filename, "w");
        free(filename);
    }

    // Loop through all the input files, one at a time.
    for (a_loop = 0; a_loop < INPUT_COUNT; a_loop++)
    {
        FILE* input;
        char* filename;
        size_t block_size;
        int buffer[MAX_BLOCK_SIZE];
        size_t tick = 0;

        asprintf(&filename, INPUT_PATTERN, a_loop);
        input = check_fopen(filename, "r");
        printf("Reading input: %s\n", filename);
        free(filename);

        // Pick random output files to put random chunks of data into.
        while (!feof(input))
        {
            int random_output = rand() % OUTPUT_COUNT;

            block_size = rand() % MAX_BLOCK_SIZE;
            fread(buffer, block_size, 1, input);
            fwrite(buffer, block_size, 1, output[random_output]);

            // Just a little output to slow things down a bit.
            tick += block_size;
            if (tick >= (TICK_EVERY * MAX_BLOCK_SIZE))
            {
                fputs("tick ", stdout);
                fflush(stdout);
                tick = 0;
            }
        }
        fputc('\n', stdout);
        fclose(input);
    }

    // Close all the output files.
    for (a_loop = 0; a_loop < OUTPUT_COUNT; a_loop++)
    {
        fclose(output[a_loop]);
    }
    return 0;
}
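If you want to try it yourself: assuming the program above is saved as
swap-test.c (that file name is just my placeholder, call it whatever you
like), there is nothing special about building or running it:

    gcc -Wall -O2 -o swap-test swap-test.c
    ./swap-test

Watch vmstat or free in another terminal while it runs and you can see
whether the box starts dipping into swap instead of just dropping cache.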
On Mon, 5 Nov 2001, Sven Heinicke wrote:
>
> We have a system with 4G of memory that processes streams of information
> on the order of a terabyte. We worked hard to make it so that we never
> have more than ~3G of data in memory at any one time, so we would not go
> into swap. We tested our code, and it began to swap anyway. I downloaded
> 2.4.14-pre8, and it still swaps. But if I turn off our swap partitions
> it works well, so swap was not needed.
>
> With the 2.4.14-pre6aa1 kernel, with swap turned on, we have no
> problem.
>
> In the beginning I thought it was the same bug as the google bug, but
> after following the thread and asking stupid questions I realized it
> wasn't. The following program, which approximates on a small scale what
> our processing does, can demonstrate the bug. You need the chunk files
> as used in the google bug test code. The comments in the code show a
> fast way to make the chunk files.
>
> Any possibility of this being fixed in the stable kernel? Any way to
> raise the priority of reclaiming cached memory over swapping out
> pages?
Yes. Currently, when there is memory pressure, the kernel "tries" to keep
the LRU list at roughly 90% mapped pages and 10% cache pages.
By looking at mm/vmscan.c::shrink_cache() you can see:
    int max_scan = nr_inactive_pages / priority;
    int max_mapped = nr_pages << (9 - priority);
"max_scan" is the max. number of pages the kernel will scan on the
inactive list (the inactive list is usually 1/3 of the total amount of
pages) each time "shrink_cache()" is called.
"max_mapped" is the (roughly) maximum number of non-cache (anonymous)
pages the kernel will scan _before_ it tries to search for data to map to
swap. So basically right now we will scan
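Just to put rough numbers on that for a 4G box like yours (this assumes
4k pages and that the scan starts at the default priority of 6, which is
what I believe DEF_PRIORITY is in the 2.4 tree):

    4G of RAM                        ~ 1,048,576 4k pages
    inactive list (~1/3 of those)    ~   350,000 pages
    max_scan   = 350,000 / 6         ~    58,000 pages scanned per call
    max_mapped = nr_pages << (9 - 6) =         8 * nr_pages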
You should try increasing "max_mapped" to "nr_pages << (10 - priority)",
and so on, until you find a tuning that works well for your workload.
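To be concrete, the change being suggested is a one-liner in
mm/vmscan.c::shrink_cache(), something along these lines (untested, and
the exact context may differ a bit between -pre releases):

    -	int max_mapped = nr_pages << (9 - priority);
    +	int max_mapped = nr_pages << (10 - priority);

Each extra bit of shift doubles max_mapped, i.e. the kernel will walk
past twice as many mapped pages on the inactive list before it falls
back to swapping out, so bump it one step at a time and see how the box
behaves under your load.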
> I'd like to try to stay away from development kernels on what
> are meant to be stable systems, but for now it's aa kernels for me.