2014-07-27 20:47:55

by Nicholas Krause

Subject: Multi Core Support for compression in compression.c

This may be a bad idea, but compression in btrfs seems to use only one
core to compress. Depending on the CPU and the number of cores it has,
we could make this much faster with multiple cores. This seems bad by my
reading, so for the write path I would recommend a function that picks
the number of cores to use based on the system's CPU load, never using
more than 75% of the system's CPU resources; my system when idle has
never needed more than one core of my i5 2500k, even while opening
Eclipse. For the read path, decompression on one core seems fine to me,
as testing other compression software shows reads are far less CPU
intensive.
Cheers Nick
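
For illustration, here is a rough userspace sketch of the load-based
heuristic described above. The function name and the 75% cap are only
placeholders, and nothing here is btrfs code; an in-kernel version would
need a different source of load information, but the shape of the
decision would be the same.

/* pick a number of compression workers from the current load,
 * capped at roughly 75% of the available cores */
#include <stdio.h>
#include <stdlib.h>
#include <sys/sysinfo.h>

static int pick_compression_workers(void)
{
	int cores = get_nprocs();           /* online CPUs */
	int cap = (cores * 3) / 4;          /* never use more than ~75% */
	double load[1];
	int idle;

	if (cap < 1)
		cap = 1;
	if (getloadavg(load, 1) < 1)
		return 1;                   /* no load info, stay conservative */

	/* estimate how many cores are roughly idle right now */
	idle = cores - (int)(load[0] + 0.5);
	if (idle < 1)
		idle = 1;

	return idle < cap ? idle : cap;
}

int main(void)
{
	printf("would use %d compression worker(s)\n",
	       pick_compression_workers());
	return 0;
}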


2014-07-28 02:56:16

by Austin S Hemmelgarn

Subject: Re: Multi Core Support for compression in compression.c

On 07/27/2014 04:47 PM, Nick Krause wrote:
> This may be a bad idea, but compression in btrfs seems to use only one
> core to compress. [...]
We would probably get a bigger benefit from taking an approach like the
one SquashFS recently added, that is, allowing multi-threaded
decompression for reads, and decompressing directly into the page cache.
Such an approach would likely make zlib compression much more scalable
on large systems.
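
As a hedged illustration of that idea, here is a minimal userspace
sketch in which each thread decompresses one independently compressed
chunk straight into its own slice of a shared output buffer, standing in
for the page cache. It uses plain zlib and pthreads and is not SquashFS
or btrfs code; build with gcc -lz -lpthread.

/* several threads decompress independently compressed chunks in parallel */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define NCHUNKS   4
#define CHUNK_SZ  4096

static unsigned char in[NCHUNKS][CHUNK_SZ];        /* original data         */
static unsigned char zbuf[NCHUNKS][CHUNK_SZ * 2];  /* compressed chunks     */
static uLong zlen[NCHUNKS];
static unsigned char out[NCHUNKS * CHUNK_SZ];      /* "page cache" stand-in */

static void *decompress_chunk(void *arg)
{
	long i = (long)arg;
	uLongf dlen = CHUNK_SZ;

	/* each worker writes only to its own slice of the output buffer */
	uncompress(out + i * CHUNK_SZ, &dlen, zbuf[i], zlen[i]);
	return NULL;
}

int main(void)
{
	pthread_t tid[NCHUNKS];
	long i;

	/* build some compressible input and compress it chunk by chunk */
	for (i = 0; i < NCHUNKS; i++) {
		memset(in[i], 'a' + i, CHUNK_SZ);
		zlen[i] = sizeof(zbuf[i]);
		compress(zbuf[i], &zlen[i], in[i], CHUNK_SZ);
	}

	/* decompress every chunk concurrently */
	for (i = 0; i < NCHUNKS; i++)
		pthread_create(&tid[i], NULL, decompress_chunk, (void *)i);
	for (i = 0; i < NCHUNKS; i++)
		pthread_join(tid[i], NULL);

	printf("chunks match original: %s\n",
	       memcmp(out, in, sizeof(out)) == 0 ? "yes" : "no");
	return 0;
}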




2014-07-28 03:21:57

by Nicholas Krause

Subject: Re: Multi Core Support for compression in compression.c

On Sun, Jul 27, 2014 at 10:56 PM, Austin S Hemmelgarn
<[email protected]> wrote:
> We would probably get a bigger benefit from taking an approach like the
> one SquashFS recently added, that is, allowing multi-threaded
> decompression for reads, and decompressing directly into the page cache.
> Such an approach would likely make zlib compression much more scalable
> on large systems.
>
>

Austin,
That seems better than my idea, and you are clearly more up to date on
btrfs development than I am. If you and the other btrfs developers are
interested in adding this as a feature, please let me know, as I would
like to help improve btrfs; the file system is a great idea, it just
seems to need a lot of work :).
Nick

2014-07-28 10:02:18

by Hugo Mills

Subject: Re: Multi Core Support for compression in compression.c

On Sun, Jul 27, 2014 at 11:21:53PM -0400, Nick Krause wrote:
> Austin,
> That seems better than my idea, and you are clearly more up to date on
> btrfs development than I am. If you and the other btrfs developers are
> interested in adding this as a feature, please let me know, as I would
> like to help improve btrfs; the file system is a great idea, it just
> seems to need a lot of work :).

Yes, it probably does need a lot of work. This is (at least one
reason) why it's not been done yet. If you want to work on doing this,
then please do. However, don't expect anyone else to give you a
detailed plan of what code to write. Don't expect anyone else to write
the code for you. You will have to come up with your own ideas as to
how to implement it, and actually do it yourself, including building
it, and testing it.

That's not to say that you are on your own, though. People will
help -- provided that you aren't asking them to do all the work. You
are not an empty vessel to be filled with the wisdom of the ancients.
This means that *you* have to take action. You have to take yourself
as far as you can in learning how things work. When you get stuck,
work out what it is that you don't know, and then ask about that one
thing. This makes it easier to answer, it shows that you're putting in
effort on your side, and it means that you *actually learn things*.
Questions like "what function should I be modifying?", or "how do you
want me to do this?" show that you haven't put in even the smallest
piece of effort, and will be ignored (if you're lucky). Questions like
"I'm trying to implement a crumble filter, but in the mix_breadcrumbs
function, how does it take account of the prestressed_yoghurt field?"
show that you've read and understood at least some of the code, and
have thought about what it's doing.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Alert status mauve ocelot: Slight chance of brimstone. Be ---
prepared to make a nice cup of tea.



2014-07-28 10:10:24

by Austin S Hemmelgarn

Subject: Re: Multi Core Support for compression in compression.c

On 07/27/2014 11:21 PM, Nick Krause wrote:
> Austin,
> That seems better than my idea, and you are clearly more up to date on
> btrfs development than I am. If you and the other btrfs developers are
> interested in adding this as a feature, please let me know, as I would
> like to help improve btrfs; the file system is a great idea, it just
> seems to need a lot of work :).
> Nick
I wouldn't say that I am a BTRFS developer (power user maybe?), but I
would definitely say that parallelizing compression on writes would be a
good idea too (especially for things like lz4, which IIRC is either in
3.16 or in the queue for 3.17). Both options would be a lot of work,
but almost any performance optimization would. I would almost say that
it would provide a bigger performance improvement to get BTRFS to
intelligently stripe reads and writes (at the moment, any given worker
thread only dispatches one write or read to a single device at a time,
and any given write() or read() syscall gets handled by only one worker).
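
As a similarly hedged sketch of the write-side idea, the chunks of a
single large write can be compressed by one thread each and then
submitted in order; again this is plain zlib and pthreads in userspace,
purely illustrative and not btrfs code.

/* compress independent chunks of a write in parallel, one thread per chunk */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define NCHUNKS  4
#define CHUNK_SZ 8192

struct chunk {
	unsigned char in[CHUNK_SZ];
	unsigned char out[CHUNK_SZ * 2];
	uLong out_len;
};

static struct chunk chunks[NCHUNKS];

static void *compress_chunk(void *arg)
{
	struct chunk *c = arg;

	c->out_len = sizeof(c->out);
	/* level 3 as a stand-in for a fast, filesystem-style setting */
	compress2(c->out, &c->out_len, c->in, CHUNK_SZ, 3);
	return NULL;
}

int main(void)
{
	pthread_t tid[NCHUNKS];
	int i;

	for (i = 0; i < NCHUNKS; i++)
		memset(chunks[i].in, 'x', CHUNK_SZ);    /* fake dirty data */

	for (i = 0; i < NCHUNKS; i++)
		pthread_create(&tid[i], NULL, compress_chunk, &chunks[i]);
	for (i = 0; i < NCHUNKS; i++) {
		pthread_join(tid[i], NULL);
		printf("chunk %d: %d -> %lu bytes\n", i, CHUNK_SZ,
		       (unsigned long)chunks[i].out_len);
	}
	return 0;
}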



2014-07-28 15:13:32

by Nicholas Krause

Subject: Re: Multi Core Support for compression in compression.c

On Mon, Jul 28, 2014 at 6:10 AM, Austin S Hemmelgarn
<[email protected]> wrote:
> I wouldn't say that I am a BTRFS developer (power user maybe?), but I
> would definitely say that parallelizing compression on writes would be a
> good idea too (especially for things like lz4, which IIRC is either in
> 3.16 or in the queue for 3.17). Both options would be a lot of work,
> but almost any performance optimization would. I would almost say that
> it would provide a bigger performance improvement to get BTRFS to
> intelligently stripe reads and writes (at the moment, any given worker
> thread only dispatches one write or read to a single device at a time,
> and any given write() or read() syscall gets handled by only one worker).
>

I will look into this idea and see if I can do this for writes.
Regards Nick

2014-07-28 15:57:54

by Nicholas Krause

Subject: Re: Multi Core Support for compression in compression.c

On Mon, Jul 28, 2014 at 11:13 AM, Nick Krause <[email protected]> wrote:
> I will look into this idea and see if I can do this for writes.
> Regards Nick

Austin,
It seems we would not want to release the cached pages for these inodes
if we are going to use the page cache to speed up writes, yet that
appears to be what we do for writes in end_compressed_bio_write once a
compressed write completes. If we want to keep written pages cached, why
are we removing them? It seems like that would need to change as a first
step.
Regards Nick

2014-07-28 16:20:11

by Austin S Hemmelgarn

Subject: Re: Multi Core Support for compression in compression.c

On 2014-07-28 11:57, Nick Krause wrote:
> Austin,
> It seems we would not want to release the cached pages for these inodes
> if we are going to use the page cache to speed up writes, yet that
> appears to be what we do for writes in end_compressed_bio_write once a
> compressed write completes. If we want to keep written pages cached, why
> are we removing them? It seems like that would need to change as a first
> step.
> Regards Nick
>
I'm not entirely sure; it's been a while since I went exploring in the
page-cache code. My guess is that there is some reason, which you and I
aren't seeing, that we are aiming for write-around semantics; maybe one
of the people who originally wrote this code could weigh in? Part of
this might be because normal page-cache semantics don't always work as
expected with COW filesystems (since a write goes to a different block
on the device than a read before the write would have gone to). It
might be easier to parallelize reads first, and then work from that
(and most workloads would probably benefit more from the parallelized
reads).



2014-07-28 18:36:27

by Nicholas Krause

Subject: Re: Multi Core Support for compression in compression.c

On Mon, Jul 28, 2014 at 12:19 PM, Austin S Hemmelgarn
<[email protected]> wrote:
> I'm not entirely sure; it's been a while since I went exploring in the
> page-cache code. My guess is that there is some reason, which you and I
> aren't seeing, that we are aiming for write-around semantics; maybe one
> of the people who originally wrote this code could weigh in? Part of
> this might be because normal page-cache semantics don't always work as
> expected with COW filesystems (since a write goes to a different block
> on the device than a read before the write would have gone to). It
> might be easier to parallelize reads first, and then work from that
> (and most workloads would probably benefit more from the parallelized
> reads).
>
I will look into this later today and work on it then.
Regards Nick

2014-07-29 17:08:25

by Nicholas Krause

Subject: Re: Multi Core Support for compression in compression.c

On Mon, Jul 28, 2014 at 2:36 PM, Nick Krause <[email protected]> wrote:
> I will look into this later today and work on it then.
> Regards Nick

It seems the best way to do this is to create a kernel thread per core,
as NFS does, and use those threads according to the load on the system.
Regards Nick
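
As a hedged sketch of that scheme, here is a tiny stand-alone module
that starts one kthread per online core, pinned with kthread_bind(); in
a real implementation each thread would pull compression work off a
queue instead of sleeping. All of the names are made up for illustration
and none of this is btrfs code.

/* one pinned worker kthread per online CPU, nfsd-style */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/cpumask.h>
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/slab.h>

static struct task_struct **workers;

static int compress_worker_fn(void *data)
{
	while (!kthread_should_stop()) {
		/* a real worker would dequeue and compress a chunk here */
		msleep_interruptible(1000);
	}
	return 0;
}

static int __init per_cpu_workers_init(void)
{
	unsigned int cpu;

	workers = kcalloc(nr_cpu_ids, sizeof(*workers), GFP_KERNEL);
	if (!workers)
		return -ENOMEM;

	for_each_online_cpu(cpu) {
		struct task_struct *t;

		t = kthread_create(compress_worker_fn, NULL,
				   "example_compress/%u", cpu);
		if (IS_ERR(t))
			continue;		/* skip this CPU in the sketch */
		kthread_bind(t, cpu);		/* pin the worker to its core */
		wake_up_process(t);
		workers[cpu] = t;
	}
	return 0;
}

static void __exit per_cpu_workers_exit(void)
{
	unsigned int cpu;

	for_each_online_cpu(cpu)
		if (workers[cpu])
			kthread_stop(workers[cpu]);
	kfree(workers);
}

module_init(per_cpu_workers_init);
module_exit(per_cpu_workers_exit);
MODULE_LICENSE("GPL");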

2014-07-29 17:14:47

by Austin S Hemmelgarn

Subject: Re: Multi Core Support for compression in compression.c

On 2014-07-29 13:08, Nick Krause wrote:
> It seems the best way to do this is to create a kernel thread per core,
> as NFS does, and use those threads according to the load on the system.
> Regards Nick
>
It might be more work now, but it would probably be better in the long
run to do it using kernel workqueues, as they would provide better
support for suspend/hibernate/resume, and then you wouldn't need to
worry about scheduling or how many CPU cores are in the system.
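
For comparison, here is an equally hedged sketch of the workqueue route:
an unbound, freezable workqueue lets the scheduler spread work items
over whatever CPUs are available and participates in the freezer across
suspend/resume, with no per-core bookkeeping in the filesystem. The
names are again illustrative only, not btrfs code.

/* dispatch compression jobs to an unbound, freezable workqueue */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

static struct workqueue_struct *compress_wq;

struct compress_job {
	struct work_struct work;
	/* a real job would carry the pages/extent to compress */
};

static void compress_job_fn(struct work_struct *work)
{
	struct compress_job *job = container_of(work, struct compress_job, work);

	/* a real handler would compress the chunk described by the job here */
	kfree(job);
}

static int __init wq_example_init(void)
{
	struct compress_job *job;

	/* WQ_UNBOUND: don't tie work items to the submitting CPU */
	compress_wq = alloc_workqueue("example_compress",
				      WQ_UNBOUND | WQ_FREEZABLE, 0);
	if (!compress_wq)
		return -ENOMEM;

	job = kzalloc(sizeof(*job), GFP_KERNEL);
	if (job) {
		INIT_WORK(&job->work, compress_job_fn);
		queue_work(compress_wq, &job->work);
	}
	return 0;
}

static void __exit wq_example_exit(void)
{
	flush_workqueue(compress_wq);
	destroy_workqueue(compress_wq);
}

module_init(wq_example_init);
module_exit(wq_example_exit);
MODULE_LICENSE("GPL");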



2014-07-29 17:38:11

by Nicholas Krause

Subject: Re: Multi Core Support for compression in compression.c

On Tue, Jul 29, 2014 at 1:14 PM, Austin S Hemmelgarn
<[email protected]> wrote:
> It might be more work now, but it would probably be better in the long
> run to do it using kernel workqueues, as they would provide better
> support for suspend/hibernate/resume, and then you wouldn't need to
> worry about scheduling or how many CPU cores are in the system.
>

That seems better than my ideas; I will need to work on this later, as
right now I have some reading to do on the Linux networking stack.
Regards Nick