2009-03-15 15:29:16

by Oleksij Rempel

Subject: smart cache. is it possible?

Hello all.
I have discovered for myself how great the cache in Linux is. If I read
a file from disk once, I don't need to hit the disk a second time; the
cache does the job and speeds things up greatly. But I found it doesn't
work with really big files: I have 4 GB of RAM, so if I read a file of
about 4.6 GB, the cache won't help.
Is it possible to have some sort of smart cache which would read, for
example, 1 GB from disk and the rest from the cache?

Here is a simple test:
=====================cache not working==========================
dd if=dvd.iso of=/dev/null
9017680+0 records in
9017680+0 records out
4617052160 bytes (4.6 GB) copied, 90.7817 s, *50.9 MB/s*

dd if=dvd.iso of=/dev/null
9017680+0 records in
9017680+0 records out
4617052160 bytes (4.6 GB) copied, 90.7817 s, *50.9 MB/s*
===============================================================


=====================cache working============================
dd if=film.avi of=/dev/null
1432600+0 records in
1432600+0 records out
733491200 bytes (733 MB) copied, 15.5108 s, *47.3 MB/s*

dd if=film.avi of=/dev/null
1432600+0 records in
1432600+0 records out
733491200 bytes (733 MB) copied, 0.941367 s, *779 MB/s*
==============================================================


2009-03-15 18:13:52

by Sitsofe Wheeler

Subject: Re: smart cache. is it possible?

Hi,

On Sun, Mar 15, 2009 at 04:28:55PM +0100, Alexey Fisher wrote:
> I have discovered for myself how great the cache in Linux is. If I read
> a file from disk once, I don't need to hit the disk a second time; the
> cache does the job and speeds things up greatly. But I found it doesn't
> work with really big files: I have 4 GB of RAM, so if I read a file of
> about 4.6 GB, the cache won't help.

Watch out - if you are doing cache tests you really want to be using
drop_caches ( http://linux-mm.org/Drop_Caches ) before your "cold" runs
so you can be sure that the cache really was empty before you started...
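
For scripted tests the same thing from C looks roughly like this
(untested sketch, needs root; it just mirrors
"sync; echo 3 > /proc/sys/vm/drop_caches"):

/* Untested sketch: flush dirty data, then ask the kernel to drop the
 * clean page cache plus dentries/inodes. Needs root. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    sync();  /* write dirty pages back first, so more can be dropped */

    FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
    if (!f) {
        perror("drop_caches");
        return 1;
    }
    fputs("3\n", f);  /* 1=pagecache, 2=dentries+inodes, 3=both */
    fclose(f);
    return 0;
}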

--
Sitsofe | http://sucs.org/~sits/

2009-03-15 22:06:48

by Oleksij Rempel

Subject: Re: smart cache. is it possible?

Hi,

Sitsofe Wheeler wrote:
> Hi,
>
> On Sun, Mar 15, 2009 at 04:28:55PM +0100, Alexey Fisher wrote:
>> I have discovered for myself how great the cache in Linux is. If I read
>> a file from disk once, I don't need to hit the disk a second time; the
>> cache does the job and speeds things up greatly. But I found it doesn't
>> work with really big files: I have 4 GB of RAM, so if I read a file of
>> about 4.6 GB, the cache won't help.
>
> Watch out - if you are doing cache tests you really want to be using
> drop_caches ( http://linux-mm.org/Drop_Caches ) before your "cold" runs
> so you can be sure that the cache really was empty before you started...
>

That is not what I mean. I know how to clear the cache, but that is
exactly what I do not want. I want to use the cache, and it works
perfectly with small files.
But there is a problem with big files: for example, I have 4 GB of RAM,
and if I read a 4.6 GB file the cache is useless. The question is: is
there any way to work around this, other than more RAM?

2009-03-15 22:23:21

by Dave Chinner

Subject: Re: smart cache. is it possible?

On Sun, Mar 15, 2009 at 04:28:55PM +0100, Alexey Fisher wrote:
> Hello all.
> I have discovered for myself how great the cache in Linux is. If I read
> a file from disk once, I don't need to hit the disk a second time; the
> cache does the job and speeds things up greatly. But I found it doesn't
> work with really big files: I have 4 GB of RAM, so if I read a file of
> about 4.6 GB, the cache won't help.
> Is it possible to have some sort of smart cache which would read, for
> example, 1 GB from disk and the rest from the cache?

You're asking for a cache algorithm that can predict the future,
i.e. which bit of a file that is larger than memory is going
to be reused in 10 seconds' time?

If you find an algorithm that can tell you this, please let me
know so I can use it to build a flux capacitor. ;)

Cheers,

Dave.
--
Dave Chinner
[email protected]

2009-03-15 23:26:27

by Sitsofe Wheeler

Subject: Re: smart cache. is it possible?

On Sun, Mar 15, 2009 at 11:06:34PM +0100, Alexey Fisher wrote:
>
> That is not what I mean. I know how to clear the cache, but that is
> exactly what I do not want. I want to use the cache, and it works
> perfectly with small files.

I meant for the timings on the small files - otherwise how do you know
exactly which pages were floating around in the cache?

> But there is a problem with big files: for example, I have 4 GB of RAM,
> and if I read a 4.6 GB file the cache is useless. The question is: is
> there any way to work around this, other than more RAM?

I suspect what is happening is that you are cycling the cache. Because
you can't hold everything and you are reading the file sequentially, you
will have cleared the cache of the start of the file by the time you
start again (so the first bit gets evicted by the time the last bit is
read, etc.). If you use dd bs=1000M count=1 I think you will find that
the kernel CAN cache pieces of files but, as pointed out elsewhere,
without knowing the future what do you decide to keep when your cache
is full?
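
To make the cycling concrete, here is a toy simulation I knocked up
(strict LRU with 5 slots - NOT what the kernel really does - reading a
10-block file twice). Every access misses on both passes, because each
block is evicted just before it would have been reused:

/* Toy illustration only: a strict LRU cache of 5 slots reading a
 * 10-block "file" twice sequentially. */
#include <stdio.h>

#define SLOTS  5
#define BLOCKS 10

static int cache[SLOTS];
static int used;                     /* occupied slots */

static int lookup(int block)
{
    for (int i = 0; i < used; i++) {
        if (cache[i] == block) {     /* hit: move to MRU position */
            for (int j = i; j < used - 1; j++)
                cache[j] = cache[j + 1];
            cache[used - 1] = block;
            return 1;
        }
    }
    if (used == SLOTS) {             /* miss on a full cache: evict LRU */
        for (int j = 0; j < SLOTS - 1; j++)
            cache[j] = cache[j + 1];
        used--;
    }
    cache[used++] = block;           /* insert as MRU */
    return 0;
}

int main(void)
{
    int hits = 0, misses = 0;
    for (int pass = 1; pass <= 2; pass++)
        for (int b = 1; b <= BLOCKS; b++)
            lookup(b) ? hits++ : misses++;
    printf("hits=%d misses=%d\n", hits, misses); /* hits=0 misses=20 */
    return 0;
}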

At a guess, you either need to provide a hint (e.g. bypassing the cache
for some of the file so it doesn't become full, or locking specific
pages into RAM) or create a bigger cache somehow (e.g. by buying more
RAM).

--
Sitsofe | http://sucs.org/~sits/

2009-03-16 07:35:22

by Oleksij Rempel

Subject: Re: smart cache. is it possible?



Sitsofe Wheeler wrote:
> On Sun, Mar 15, 2009 at 11:06:34PM +0100, Alexey Fisher wrote:
>> That is not what I mean. I know how to clear the cache, but that is
>> exactly what I do not want. I want to use the cache, and it works
>> perfectly with small files.
>
> I meant for the timings on the small files - otherwise how do you know
> exactly which pages were floating around in the cache?
>
>> But there is a problem with big files: for example, I have 4 GB of RAM,
>> and if I read a 4.6 GB file the cache is useless. The question is: is
>> there any way to work around this, other than more RAM?
>
> I suspect what is happening is that you are cycling the cache. Because
> you can't hold everything and you are reading the file sequentially, you
> will have cleared the cache of the start of the file by the time you
> start again (so the first bit gets evicted by the time the last bit is
> read, etc.). If you use dd bs=1000M count=1 I think you will find that
> the kernel CAN cache pieces of files but, as pointed out elsewhere,
> without knowing the future what do you decide to keep when your cache
> is full?
>
> At a guess, you either need to provide a hint (e.g. bypassing the cache
> for some of the file so it doesn't become full, or locking specific
> pages into RAM) or create a bigger cache somehow (e.g. by buying more
> RAM).

Just to make sure I understand you, here is an example:
I have enough RAM to cache 5 blocks from the hard drive, and it starts
out empty:
|0|0|0|0|0|
I read a file of 10 blocks ( dd if=somefile ). At the beginning of the
read it will cache the first 5 blocks ( |1|2|3|4|5| ), and once no
space is left in the cache it will replace the old blocks with new
ones ( |6|7|8|9|10| ).
If I read the same somefile a second time, normally the same thing will
happen. It tries to read block 1, which is not in the cache, so block 1
gets cached and pushes block 6 out ( |1|7|8|9|10| ). So it will again
replace the entire cache.
Alternatively, I could read somefile but only blocks 6-10, so that I
benefit from the cache.
Or the OS would need to get the list of blocks belonging to somefile
and the list of cached blocks, and check whether some of them are
cached. If they are, it should lock the cache, read blocks 1-5 without
caching them, read blocks 6-10 from the cache, and afterwards unlock
the cache. But this is not possible because the operation is too
expensive.
Is this what you mean?

Thank you.
Alexey.

2009-03-16 13:43:49

by Pádraig Brady

Subject: Re: smart cache. is it possible?

Alexey Fisher wrote:
> Hello all.
> I have discovered for myself how great the cache in Linux is. If I read
> a file from disk once, I don't need to hit the disk a second time; the
> cache does the job and speeds things up greatly. But I found it doesn't
> work with really big files: I have 4 GB of RAM, so if I read a file of
> about 4.6 GB, the cache won't help.
> Is it possible to have some sort of smart cache which would read, for
> example, 1 GB from disk and the rest from the cache?
>
> here is some simple test:
> =====================cache not working==========================
> dd if=dvd.iso of=/dev/null
> 9017680+0 records in
> 9017680+0 records out
> 4617052160 bytes (4.6 GB) copied, 90.7817 s, *50.9 MB/s*
>
> dd if=dvd.iso of=/dev/null
> 9017680+0 records in
> 9017680+0 records out
> 4617052160 bytes (4.6 GB) copied, 90.7817 s, *50.9 MB/s*
> ===============================================================

Right. The cache is being cycled,
i.e. the block you want is never in the cache
as it has previously been clobbered by the data you're reading in.
That's just a consequence of preferring to cache blocks
you have recently read from the file over older blocks.
This is usually the right thing to do, but it is exactly
the wrong thing to do in your case.

So you would need to provide more info to the
kernel for it to behave as you want,
i.e. never evict a block belonging to the same file
as the block you're trying to insert.

I wonder whether posix_fadvise(...POSIX_FADV_SEQUENTIAL)
would do what you want. I don't think it does at present.
Note that dd doesn't use posix_fadvise() yet, and it probably
should (at least for POSIX_FADV_DONTNEED).
It's a pity there is no interface like Robert Love's old O_STREAM
patch to just specify the intent for an fd rather than
worrying about ranges.
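
For the record, that kind of hinting looks something like this from an
application (untested sketch; as far as I know SEQUENTIAL mainly grows
the readahead window today, while DONTNEED really does drop the pages
you have finished with):

/* Untested sketch: read a file sequentially while telling the kernel
 * not to keep the consumed pages, so the rest of the cache survives. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* declare front-to-back access for the whole file (len 0 = to EOF) */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    static char buf[1 << 20];        /* 1 MiB reads */
    off_t done = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        done += n;
        /* drop everything we have already consumed from the cache */
        posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
    }

    close(fd);
    return 0;
}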

cheers,
Pádraig.

2009-03-16 15:15:38

by Paulo Marques

Subject: Re: smart cache. is it possible?

Alexey Fisher wrote:
> Hello all.

Hi,

> I have discovered for myself how great the cache in Linux is. If I read
> a file from disk once, I don't need to hit the disk a second time; the
> cache does the job and speeds things up greatly. But I found it doesn't
> work with really big files: I have 4 GB of RAM, so if I read a file of
> about 4.6 GB, the cache won't help.
> Is it possible to have some sort of smart cache which would read, for
> example, 1 GB from disk and the rest from the cache?

This depends on the page replacement algorithm. A different algorithm
might do better here, and there has been some work on this in the past.

Check, for instance, this paper on Clock-Pro:

http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf

ISTR that Rik van Riel was doing an implementation of this algorithm for
the 2.6 kernel, but I don't remember how that ended up...

--
Paulo Marques - http://www.grupopie.com

"God is real, unless declared integer."

2009-03-16 17:36:21

by Sitsofe Wheeler

Subject: Re: smart cache. is it possible?

On Mon, Mar 16, 2009 at 08:34:53AM +0100, Alexey Fisher wrote:
>
> If I read the same somefile a second time, normally the same thing will
> happen. It tries to read block 1, which is not in the cache, so block 1
> gets cached and pushes block 6 out ( |1|7|8|9|10| ). So it will again
> replace the entire cache.

Yup, that's my understanding (I'm no pro on this stuff though).
Different cache replacement algorithms will have different behaviour,
though (I believe Linux uses some variation on LRU -
http://linux-mm.org/AdvancedPageReplacement ).

> Alternatively, I could read somefile but only blocks 6-10, so that I
> benefit from the cache.

Yup, you could do this, but your code would have to have exactly this in
mind (e.g. skip the first 7 MBytes THEN read 10 MBytes). There's nothing
automatic happening in this scenario from the cache's perspective -
you're just reading less so it (hopefully) doesn't become full and evict
what you want it to keep.
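
In C that would be something like this (sketch using the made-up
offsets from above - skip the first 7 MBytes, read the next 10):

/* Untested sketch: cache only the tail of "somefile" by seeking past
 * the part we don't want and reading a fixed amount after it. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("somefile", O_RDONLY);   /* name from the example */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    lseek(fd, 7 * 1024 * 1024, SEEK_SET);  /* skip the first 7 MBytes */

    static char buf[64 * 1024];
    long long remaining = 10LL * 1024 * 1024;  /* read 10 MBytes */
    while (remaining > 0) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        remaining -= n;
    }

    close(fd);
    return 0;
}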

> Or the OS would need to get the list of blocks belonging to somefile
> and the list of cached blocks, and check whether some of them are
> cached. If they are, it should lock the cache, read blocks 1-5 without
> caching them, read blocks 6-10 from the cache, and afterwards unlock
> the cache. But this is not possible because the operation is too
> expensive.
> Is this what you mean?

Kinda. More like: when you use memory mapped I/O (mmap) you can "lock"
pieces of the file into RAM (and thus have those pieces always in the
cache). See mlock -
http://opengroup.org/onlinepubs/007908799/xsh/mlock.html . There are
usually restrictions on just how much memory you can lock, etc.
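
A rough, untested sketch of that approach ("somefile" and the 16 MiB
figure are made up):

/* Map the first part of a file and pin those pages in RAM so the page
 * cache cannot evict them. mlock() typically fails with EPERM/ENOMEM
 * for ordinary users unless RLIMIT_MEMLOCK is raised. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("somefile", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return 1;
    }

    /* pin at most the first 16 MiB of the file */
    size_t len = st.st_size < (16 << 20) ? (size_t)st.st_size : (16 << 20);

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    if (mlock(p, len) != 0)   /* locked pages stay resident */
        perror("mlock");

    /* ... use the mapping; locked pieces are always "in cache" ... */

    munlock(p, len);
    munmap(p, len);
    close(fd);
    return 0;
}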

--
Sitsofe | http://sucs.org/~sits/