The swap subsystem frees swap slots lazily, expecting the page to be
swapped out again soon, so that an unnecessary write can be avoided.
But the problem with in-memory swap (e.g. zram) is that it consumes
memory space until the vm_swap_full() condition (i.e. half of all swap
slots are used) is met. That can be bad if we use multiple swap devices
(a small in-memory swap plus a big storage swap) or in-memory swap alone.
This patch makes the swap subsystem free the swap slot as soon as the
swap read completes, and marks the swapcache page dirty so that the page
will be written back out to the swap device when it is reclaimed.
That way we never lose the data.
I tested this patch with a kernel compile workload.
1. before
compile time : 9882.42
zram max wasted space by fragmentation: 13471881 bytes
memory space consumed by zram: 174227456 bytes
the number of slot free notify: 206684
2. after
compile time : 9653.90
zram max wasted space by fragmentation: 11805932 bytes
memory space consumed by zram: 154001408 bytes
the number of slot free notify: 426972
Cc: Hugh Dickins <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Shaohua Li <[email protected]>
Signed-off-by: Dan Magenheimer <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
The fragmentation ratio is almost the same, but memory consumption and
compile time are better. I am working on adding a defragmentation
function to zsmalloc.
mm/page_io.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/mm/page_io.c b/mm/page_io.c
index 78eee32..644900a 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -20,6 +20,7 @@
#include <linux/buffer_head.h>
#include <linux/writeback.h>
#include <linux/frontswap.h>
+#include <linux/blkdev.h>
#include <asm/pgtable.h>
static struct bio *get_swap_bio(gfp_t gfp_flags,
@@ -81,8 +82,30 @@ void end_swap_bio_read(struct bio *bio, int err)
iminor(bio->bi_bdev->bd_inode),
(unsigned long long)bio->bi_sector);
} else {
+ /*
+ * There is no reason to keep both uncompressed data and
+ * compressed data in memory.
+ */
+ struct swap_info_struct *sis;
+
SetPageUptodate(page);
+ sis = page_swap_info(page);
+ if (sis->flags & SWP_BLKDEV) {
+ struct gendisk *disk = sis->bdev->bd_disk;
+ if (disk->fops->swap_slot_free_notify) {
+ swp_entry_t entry;
+ unsigned long offset;
+
+ entry.val = page_private(page);
+ offset = swp_offset(entry);
+
+ SetPageDirty(page);
+ disk->fops->swap_slot_free_notify(sis->bdev,
+ offset);
+ }
+ }
}
+
unlock_page(page);
bio_put(bio);
}
--
1.8.2
> From: Minchan Kim [mailto:[email protected]]
> Sent: Monday, April 08, 2013 12:01 AM
> Subject: [PATCH] mm: remove compressed copy from zram in-memory
(patch removed)
> Fragment ratio is almost same but memory consumption and compile time
> is better. I am working to add defragment function of zsmalloc.
Hi Minchan --
I would be very interested in your design thoughts on
how you plan to add defragmentation for zsmalloc. In
particular, I am wondering if your design will also
handle the requirements for zcache (especially for
cleancache pages) and perhaps also for ramster.
In https://lkml.org/lkml/2013/3/27/501 I suggested it
would be good to work together on a common design, but
you didn't reply. Are you thinking that zsmalloc
improvements should focus only on zram, in which case
we may -- and possibly should -- end up with a different
allocator for frontswap-based/cleancache-based compression
in zcache (and possibly zswap)?
I'm just trying to determine if I should proceed separately
with my design (with Bob Liu, who expressed interest) or if
it would be beneficial to work together.
Thanks,
Dan
On Mon, 8 Apr 2013 15:01:02 +0900 Minchan Kim <[email protected]> wrote:
> Swap subsystem does lazy swap slot free with expecting the page
> would be swapped out again so we can avoid unnecessary write.
Is that correct? How can it save a write?
> But the problem in in-memory swap(ex, zram) is that it consumes
> memory space until vm_swap_full(ie, used half of all of swap device)
> condition meet. It could be bad if we use multiple swap device,
> small in-memory swap and big storage swap or in-memory swap alone.
>
> This patch makes swap subsystem free swap slot as soon as swap-read
> is completed and make the swapcache page dirty so the page should
> be written out the swap device to reclaim it.
> It means we never lose it.
From my reading of the patch, that isn't how it works? It changed
end_swap_bio_read() to call zram_slot_free_notify(), which appears to
free the underlying compressed page. I have a feeling I'm hopelessly
confused.
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -20,6 +20,7 @@
> #include <linux/buffer_head.h>
> #include <linux/writeback.h>
> #include <linux/frontswap.h>
> +#include <linux/blkdev.h>
> #include <asm/pgtable.h>
>
> static struct bio *get_swap_bio(gfp_t gfp_flags,
> @@ -81,8 +82,30 @@ void end_swap_bio_read(struct bio *bio, int err)
> iminor(bio->bi_bdev->bd_inode),
> (unsigned long long)bio->bi_sector);
> } else {
> + /*
> + * There is no reason to keep both uncompressed data and
> + * compressed data in memory.
> + */
> + struct swap_info_struct *sis;
> +
> SetPageUptodate(page);
> + sis = page_swap_info(page);
> + if (sis->flags & SWP_BLKDEV) {
> + struct gendisk *disk = sis->bdev->bd_disk;
> + if (disk->fops->swap_slot_free_notify) {
> + swp_entry_t entry;
> + unsigned long offset;
> +
> + entry.val = page_private(page);
> + offset = swp_offset(entry);
> +
> + SetPageDirty(page);
> + disk->fops->swap_slot_free_notify(sis->bdev,
> + offset);
> + }
> + }
> }
> +
> unlock_page(page);
> bio_put(bio);
The new code is wasted space if CONFIG_BLOCK=n, yes?
Also, what's up with the SWP_BLKDEV test? zram doesn't support
SWP_FILE? Why on earth not?
Putting swap_slot_free_notify() into block_device_operations seems
rather wrong. It precludes zram-over-swapfiles for all time and means
that other subsystems cannot get notifications for swap slot freeing
for swapfile-backed swap.
Hi Andrew,
On Mon, Apr 08, 2013 at 02:17:10PM -0700, Andrew Morton wrote:
> On Mon, 8 Apr 2013 15:01:02 +0900 Minchan Kim <[email protected]> wrote:
>
> > Swap subsystem does lazy swap slot free with expecting the page
> > would be swapped out again so we can avoid unnecessary write.
>
> Is that correct? How can it save a write?
Correct.
add_to_swap() makes the page dirty, and we page out a page only if it is
dirty. If an anon page has already been added to the swapcache, we skip
writing it out in shrink_page_list() and just remove it from the
swapcache and free it via __remove_mapping().
I have received the same question multiple times, so it would be a good
idea to write this down somewhere in vmscan.c.
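Schematically, the reclaim decision looks like this (a condensed sketch
of the shrink_page_list()/__remove_mapping() logic in mm/vmscan.c, not
the literal kernel code; pageout_to_swap() is an illustrative stand-in):

	/* Sketch only: the real logic lives in mm/vmscan.c. */
	static int try_reclaim_anon_page(struct page *page)
	{
		if (!PageSwapCache(page)) {
			/* First swapout: get a slot; page becomes dirty. */
			if (!add_to_swap(page))
				return 0;
		}

		if (PageDirty(page))
			/* Memory differs from slot contents: must write. */
			return pageout_to_swap(page);	/* hypothetical */

		/*
		 * Clean page still in swapcache: the slot already holds
		 * the data, so drop the page with no new write. This is
		 * the write that lazy slot freeing saves.
		 */
		return __remove_mapping(page_mapping(page), page);
	}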
>
> > But the problem in in-memory swap(ex, zram) is that it consumes
> > memory space until vm_swap_full(ie, used half of all of swap device)
> > condition meet. It could be bad if we use multiple swap device,
> > small in-memory swap and big storage swap or in-memory swap alone.
> >
> > This patch makes swap subsystem free swap slot as soon as swap-read
> > is completed and make the swapcache page dirty so the page should
> > be written out the swap device to reclaim it.
> > It means we never lose it.
>
> >From my reading of the patch, that isn't how it works? It changed
> end_swap_bio_read() to call zram_slot_free_notify(), which appears to
> free the underlying compressed page. I have a feeling I'm hopelessly
> confused.
You understood it exactly right.
Talking about "selecting the swap slot" in my description was a mistake.
I need to rewrite the description.
>
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -20,6 +20,7 @@
> > #include <linux/buffer_head.h>
> > #include <linux/writeback.h>
> > #include <linux/frontswap.h>
> > +#include <linux/blkdev.h>
> > #include <asm/pgtable.h>
> >
> > static struct bio *get_swap_bio(gfp_t gfp_flags,
> > @@ -81,8 +82,30 @@ void end_swap_bio_read(struct bio *bio, int err)
> > iminor(bio->bi_bdev->bd_inode),
> > (unsigned long long)bio->bi_sector);
> > } else {
> > + /*
> > + * There is no reason to keep both uncompressed data and
> > + * compressed data in memory.
> > + */
> > + struct swap_info_struct *sis;
> > +
> > SetPageUptodate(page);
> > + sis = page_swap_info(page);
> > + if (sis->flags & SWP_BLKDEV) {
> > + struct gendisk *disk = sis->bdev->bd_disk;
> > + if (disk->fops->swap_slot_free_notify) {
> > + swp_entry_t entry;
> > + unsigned long offset;
> > +
> > + entry.val = page_private(page);
> > + offset = swp_offset(entry);
> > +
> > + SetPageDirty(page);
> > + disk->fops->swap_slot_free_notify(sis->bdev,
> > + offset);
> > + }
> > + }
> > }
> > +
> > unlock_page(page);
> > bio_put(bio);
>
> The new code is wasted space if CONFIG_BLOCK=n, yes?
CONFIG_SWAP is already dependent on CONFIG_BLOCK.
>
> Also, what's up with the SWP_BLKDEV test? zram doesn't support
> SWP_FILE? Why on earth not?
>
> Putting swap_slot_free_notify() into block_device_operations seems
> rather wrong. It precludes zram-over-swapfiles for all time and means
> that other subsystems cannot get notifications for swap slot freeing
> for swapfile-backed swap.
Zram is just a pseudo block device, so anyone can format it with any
filesystem and swapon a file on it. In that case they can't get the
benefit of swap_slot_free_notify. But I don't think it's a severe
problem, because there is no reason to use file-backed swap on zram.
If anyone wants to do that, I'd like to know the reason. If it's
reasonable, we would have to rethink the design, but that's another
story, IMHO.
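For reference, the zram side of this hook looks roughly like the
following (a simplified sketch of the staging driver of that era;
locking and the exact stat helpers are elided):

	static void zram_slot_free_notify(struct block_device *bdev,
					  unsigned long index)
	{
		struct zram *zram = bdev->bd_disk->private_data;

		/* Drop the compressed copy backing the freed swap slot. */
		zram_free_page(zram, index);
		zram_stat64_inc(zram, &zram->stats.notify_free);
	}

	static const struct block_device_operations zram_devops = {
		.swap_slot_free_notify	= zram_slot_free_notify,
		.owner			= THIS_MODULE
	};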
--
Kind regards,
Minchan Kim
Hi Dan,
On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:[email protected]]
> > Sent: Monday, April 08, 2013 12:01 AM
> > Subject: [PATCH] mm: remove compressed copy from zram in-memory
>
> (patch removed)
>
> > Fragment ratio is almost same but memory consumption and compile time
> > is better. I am working to add defragment function of zsmalloc.
>
> Hi Minchan --
>
> I would be very interested in your design thoughts on
> how you plan to add defragmentation for zsmalloc. In
All I can say right now is a single word: "compaction".
As you know, zsmalloc has a transparent handle, so we can do whatever we
want underneath the user. Of course, there is a tradeoff between
performance and memory efficiency; I'm biased toward the latter for the
embedded use case.
And I might post it because as you know well, zsmalloc
> particular, I am wondering if your design will also
> handle the requirements for zcache (especially for
> cleancache pages) and perhaps also for ramster.
I don't know the requirements for cleancache pages, but compaction is
general, as you know well, so I expect you can benefit from it if you
are concerned about memory efficiency. I'm not sure it's worthwhile to
compact cleancache pages just to get more slots in RAM, though.
Sometimes just discarding them would be much better, IMHO.
>
> In https://lkml.org/lkml/2013/3/27/501 I suggested it
> would be good to work together on a common design, but
> you didn't reply. Are you thinking that zsmalloc
I saw the thread, but does explicit agreement really matter?
I believe everybody wants it even though they didn't reply. :)
You can write up the design and post it, or prototype it and post that.
If it conflicts with something in my head,
I will be happy to give feedback. :)
Anyway, I think my statement "compaction" above should be enough to
express my current thinking, avoid duplicated work, and let you catch up.
I will get around to it after LSF/MM.
> improvements should focus only on zram, in which case
Just focusing on zsmalloc.
> we may -- and possibly should -- end up with a different
> allocator for frontswap-based/cleancache-based compression
> in zcache (and possibly zswap)?
>
> I'm just trying to determine if I should proceed separately
> with my design (with Bob Liu, who expressed interest) or if
> it would be beneficial to work together.
Just keep posting, and if something affects zsmalloc/zram/zswap and goes
in a direction I don't want, I will join the discussion, because our
product uses zram heavily and we are considering zswap, too.
I really appreciate your enthusiastic collaboration model for finding
the optimal solution!
>
> Thanks,
> Dan
--
Kind regards,
Minchan Kim
On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> Hi Dan,
>
> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > From: Minchan Kim [mailto:[email protected]]
> > > Sent: Monday, April 08, 2013 12:01 AM
> > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> >
> > (patch removed)
> >
> > > Fragment ratio is almost same but memory consumption and compile time
> > > is better. I am working to add defragment function of zsmalloc.
> >
> > Hi Minchan --
> >
> > I would be very interested in your design thoughts on
> > how you plan to add defragmentation for zsmalloc. In
>
> What I can say now about is only just a word "Compaction".
> As you know, zsmalloc has a transparent handle so we can do whatever
> under user. Of course, there is a tradeoff between performance
> and memory efficiency. I'm biased to latter for embedded usecase.
>
> And I might post it because as you know well, zsmalloc
That was an incomplete sentence; it should have read:
I might not post it until zsmalloc is promoted because, as you know
well, all new zsmalloc/zram work is blocked in the staging tree.
Even if we could add it to staging, staging is, as you know well, a place
every mm developer ignores, so we would end up needing another round to
promote it. Sigh.
I hope things get better after LSF/MM.
--
Kind regards,
Minchan Kim
Hi Minchan,
On 04/09/2013 09:02 AM, Minchan Kim wrote:
> Hi Andrew,
>
> On Mon, Apr 08, 2013 at 02:17:10PM -0700, Andrew Morton wrote:
>> On Mon, 8 Apr 2013 15:01:02 +0900 Minchan Kim <[email protected]> wrote:
>>
>>> Swap subsystem does lazy swap slot free with expecting the page
>>> would be swapped out again so we can avoid unnecessary write.
>> Is that correct? How can it save a write?
> Correct.
>
> The add_to_swap makes the page dirty and we must pageout only if the page is
> dirty. If a anon page is already charged into swapcache, we skip writeout
> the page in shrink_page_list, then just remove the page from swapcache and
> free it by __remove_mapping.
>
> I did received same question multiple time so it would be good idea to
> write down it in vmscan.c somewhere.
>
>>> But the problem in in-memory swap(ex, zram) is that it consumes
>>> memory space until vm_swap_full(ie, used half of all of swap device)
>>> condition meet. It could be bad if we use multiple swap device,
>>> small in-memory swap and big storage swap or in-memory swap alone.
>>>
>>> This patch makes swap subsystem free swap slot as soon as swap-read
>>> is completed and make the swapcache page dirty so the page should
>>> be written out the swap device to reclaim it.
>>> It means we never lose it.
>> >From my reading of the patch, that isn't how it works? It changed
>> end_swap_bio_read() to call zram_slot_free_notify(), which appears to
>> free the underlying compressed page. I have a feeling I'm hopelessly
>> confused.
> You understand right totally.
> Selecting swap slot in my description was totally miss.
> Need to rewrite the description.
Freeing the swap slot and freeing the compressed page are the same thing, aren't they?
>
>>> --- a/mm/page_io.c
>>> +++ b/mm/page_io.c
>>> @@ -20,6 +20,7 @@
>>> #include <linux/buffer_head.h>
>>> #include <linux/writeback.h>
>>> #include <linux/frontswap.h>
>>> +#include <linux/blkdev.h>
>>> #include <asm/pgtable.h>
>>>
>>> static struct bio *get_swap_bio(gfp_t gfp_flags,
>>> @@ -81,8 +82,30 @@ void end_swap_bio_read(struct bio *bio, int err)
>>> iminor(bio->bi_bdev->bd_inode),
>>> (unsigned long long)bio->bi_sector);
>>> } else {
>>> + /*
>>> + * There is no reason to keep both uncompressed data and
>>> + * compressed data in memory.
>>> + */
>>> + struct swap_info_struct *sis;
>>> +
>>> SetPageUptodate(page);
>>> + sis = page_swap_info(page);
>>> + if (sis->flags & SWP_BLKDEV) {
>>> + struct gendisk *disk = sis->bdev->bd_disk;
>>> + if (disk->fops->swap_slot_free_notify) {
>>> + swp_entry_t entry;
>>> + unsigned long offset;
>>> +
>>> + entry.val = page_private(page);
>>> + offset = swp_offset(entry);
>>> +
>>> + SetPageDirty(page);
>>> + disk->fops->swap_slot_free_notify(sis->bdev,
>>> + offset);
>>> + }
>>> + }
>>> }
>>> +
>>> unlock_page(page);
>>> bio_put(bio);
>> The new code is wasted space if CONFIG_BLOCK=n, yes?
> CONFIG_SWAP is already dependent on CONFIG_BLOCK.
>
>> Also, what's up with the SWP_BLKDEV test? zram doesn't support
>> SWP_FILE? Why on earth not?
>>
>> Putting swap_slot_free_notify() into block_device_operations seems
>> rather wrong. It precludes zram-over-swapfiles for all time and means
>> that other subsystems cannot get notifications for swap slot freeing
>> for swapfile-backed swap.
> Zram is just pseudo-block device so anyone can format it with any FSes
> and swapon a file. In such case, he can't get a benefit from
> swap_slot_free_notify. But I think it's not a severe problem because
> there is no reason to use a file-swap on zram. If anyone want to use it,
> I'd like to know the reason. If it's reasonable, we have to rethink a
> wheel and it's another story, IMHO.
On Tue, 9 Apr 2013 10:02:31 +0900 Minchan Kim <[email protected]> wrote:
> > Also, what's up with the SWP_BLKDEV test? zram doesn't support
> > SWP_FILE? Why on earth not?
> >
> > Putting swap_slot_free_notify() into block_device_operations seems
> > rather wrong. It precludes zram-over-swapfiles for all time and means
> > that other subsystems cannot get notifications for swap slot freeing
> > for swapfile-backed swap.
>
> Zram is just pseudo-block device so anyone can format it with any FSes
> and swapon a file. In such case, he can't get a benefit from
> swap_slot_free_notify. But I think it's not a severe problem because
> there is no reason to use a file-swap on zram. If anyone want to use it,
> I'd like to know the reason. If it's reasonable, we have to rethink a
> wheel and it's another story, IMHO.
My point is that making the swap_slot_free_notify() callback a
blockdev-specific thing was restrictive. What happens if someone wants
to use it for swapfile-backed swap? This has nothing to do with zram.
> From: Minchan Kim [mailto:[email protected]]
> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>
> Hi Dan,
>
> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > From: Minchan Kim [mailto:[email protected]]
> > > Sent: Monday, April 08, 2013 12:01 AM
> > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> >
> > (patch removed)
> >
> > > Fragment ratio is almost same but memory consumption and compile time
> > > is better. I am working to add defragment function of zsmalloc.
> >
> > Hi Minchan --
> >
> > I would be very interested in your design thoughts on
> > how you plan to add defragmentation for zsmalloc. In
>
> What I can say now about is only just a word "Compaction".
> As you know, zsmalloc has a transparent handle so we can do whatever
> under user. Of course, there is a tradeoff between performance
> and memory efficiency. I'm biased to latter for embedded usecase.
Have you designed or implemented this yet? I have a couple
of concerns:
1) The handle is transparent to the "user", but it is still a form
of a "pointer" to a zpage. Are you planning on walking zram's
tables and changing those pointers? That may be OK for zram
but for more complex data structures than tables (as in zswap
and zcache) it may not be as easy, due to races, or as efficient
because you will have to walk potentially very large trees.
2) Compaction in the kernel is heavily dependent on page migration
and page migration is dependent on using flags in the struct page.
There's a lot of code in those two code modules and there
are going to be a lot of implementation differences between
compacting pages vs compacting zpages.
I'm also wondering if you will be implementing "variable length
zspages". Without that, I'm not sure compaction will help
enough. (And that is a good example of the difference between
the kernel page compaction design/code and zspage compaction.)
> > particular, I am wondering if your design will also
> > handle the requirements for zcache (especially for
> > cleancache pages) and perhaps also for ramster.
>
> I don't know requirements for cleancache pages but compaction is
> general as you know well so I expect you can get a benefit from it
> if you are concern on memory efficiency but not sure it's valuable
> to compact cleancache pages for getting more slot in RAM.
> Sometime, just discarding would be much better, IMHO.
Zcache has page reclaim. Zswap has zpage reclaim. I am
concerned that these continue to work in the presence of
compaction. With no reclaim at all, zram is a simpler use
case but if you implement compaction in a way that can't be
used by either zcache or zswap, then zsmalloc is essentially
forking.
> > In https://lkml.org/lkml/2013/3/27/501 I suggested it
> > would be good to work together on a common design, but
> > you didn't reply. Are you thinking that zsmalloc
>
> I saw the thread but explicit agreement is really matter?
> I believe everybody want it although they didn't reply. :)
>
> You can make the design/post it or prototyping/post it.
> If there are some conflit with something in my brain,
> I will be happy to feedback. :)
>
> Anyway, I think my above statement "COMPACTION" would be enough to
> express my current thought to avoid duplicated work and you can catch up.
>
> I will get around to it after LSF/MM.
>
> > improvements should focus only on zram, in which case
>
> Just focusing zsmalloc.
Right. Again, I am asking if you are changing zsmalloc in
a way that helps zram but hurts zswap and makes it impossible
for zcache to ever use the improvements to zsmalloc.
If so, that's fine, but please make it clear that is your goal.
> > we may -- and possibly should -- end up with a different
> > allocator for frontswap-based/cleancache-based compression
> > in zcache (and possibly zswap)?
>
> > I'm just trying to determine if I should proceed separately
> > with my design (with Bob Liu, who expressed interest) or if
> > it would be beneficial to work together.
>
> Just posting and if it affects zsmalloc/zram/zswap and goes the way
> I don't want, I will involve the discussion because our product uses
> zram heavily and consider zswap, too.
>
> I really appreciate your enthusiastic collaboration model to find
> optimal solution!
My goal is to have compression be an integral part of Linux
memory management. It may be tied to a config option, but
the goal is that distros turn it on by default. I don't think
zsmalloc meets that objective yet, but it may be fine for
your needs. If so it would be good to understand exactly why
it doesn't meet the other zproject needs.
> From: Minchan Kim [mailto:[email protected]]
> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>
> On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> > Hi Dan,
> >
> > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > From: Minchan Kim [mailto:[email protected]]
> > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > >
> > > (patch removed)
> > >
> > > > Fragment ratio is almost same but memory consumption and compile time
> > > > is better. I am working to add defragment function of zsmalloc.
> > >
> > > Hi Minchan --
> > >
> > > I would be very interested in your design thoughts on
> > > how you plan to add defragmentation for zsmalloc. In
> >
> > What I can say now about is only just a word "Compaction".
> > As you know, zsmalloc has a transparent handle so we can do whatever
> > under user. Of course, there is a tradeoff between performance
> > and memory efficiency. I'm biased to latter for embedded usecase.
> >
> > And I might post it because as you know well, zsmalloc
>
> Incomplete sentense,
>
> I might not post it until promoting zsmalloc because as you know well,
> zsmalloc/zram's all new stuffs are blocked into staging tree.
> Even if we could add it into staging, as you know well, staging is where
> every mm guys ignore so we end up needing another round to promote it. sigh.
>
> I hope it gets better after LSF/MM.
If zsmalloc is moving in the direction of supporting only zram,
why should it be promoted into mm, or even lib? Why not promote
zram into drivers and put zsmalloc.c in the same directory?
On 04/08/2013 08:36 PM, Minchan Kim wrote:
> On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
>> Hi Dan,
>>
>> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
>>>> From: Minchan Kim [mailto:[email protected]]
>>>> Sent: Monday, April 08, 2013 12:01 AM
>>>> Subject: [PATCH] mm: remove compressed copy from zram in-memory
>>>
>>> (patch removed)
>>>
>>>> Fragment ratio is almost same but memory consumption and compile time
>>>> is better. I am working to add defragment function of zsmalloc.
>>>
>>> Hi Minchan --
>>>
>>> I would be very interested in your design thoughts on
>>> how you plan to add defragmentation for zsmalloc. In
>>
>> What I can say now about is only just a word "Compaction".
>> As you know, zsmalloc has a transparent handle so we can do whatever
>> under user. Of course, there is a tradeoff between performance
>> and memory efficiency. I'm biased to latter for embedded usecase.
>>
>> And I might post it because as you know well, zsmalloc
>
> Incomplete sentense,
>
> I might not post it until promoting zsmalloc because as you know well,
> zsmalloc/zram's all new stuffs are blocked into staging tree.
> Even if we could add it into staging, as you know well, staging is where
> every mm guys ignore so we end up needing another round to promote it. sigh.
Yes. The lack of compaction/defragmentation support in zsmalloc has not
been raised as an obstacle to mainline acceptance so I think we should
wait to add new features to a yet-to-be accepted codebase.
Also, I think this feature is more important to zram than it is to
zswap/zcache as they can do writeback to free zpages. In other words,
the fragmentation is a transient issue for zswap/zcache since writeback
to the swap device is possible.
Thanks,
Seth
On Tue, Apr 09, 2013 at 01:36:52PM +0800, Ric Mason wrote:
> Hi Minchan,
> On 04/09/2013 09:02 AM, Minchan Kim wrote:
> >Hi Andrew,
> >
> >On Mon, Apr 08, 2013 at 02:17:10PM -0700, Andrew Morton wrote:
> >>On Mon, 8 Apr 2013 15:01:02 +0900 Minchan Kim <[email protected]> wrote:
> >>
> >>>Swap subsystem does lazy swap slot free with expecting the page
> >>>would be swapped out again so we can avoid unnecessary write.
> >>Is that correct? How can it save a write?
> >Correct.
> >
> >The add_to_swap makes the page dirty and we must pageout only if the page is
> >dirty. If a anon page is already charged into swapcache, we skip writeout
> >the page in shrink_page_list, then just remove the page from swapcache and
> >free it by __remove_mapping.
> >
> >I did received same question multiple time so it would be good idea to
> >write down it in vmscan.c somewhere.
> >
> >>>But the problem in in-memory swap(ex, zram) is that it consumes
> >>>memory space until vm_swap_full(ie, used half of all of swap device)
> >>>condition meet. It could be bad if we use multiple swap device,
> >>>small in-memory swap and big storage swap or in-memory swap alone.
> >>>
> >>>This patch makes swap subsystem free swap slot as soon as swap-read
> >>>is completed and make the swapcache page dirty so the page should
> >>>be written out the swap device to reclaim it.
> >>>It means we never lose it.
> >>>From my reading of the patch, that isn't how it works? It changed
> >>end_swap_bio_read() to call zram_slot_free_notify(), which appears to
> >>free the underlying compressed page. I have a feeling I'm hopelessly
> >>confused.
> >You understand right totally.
> >Selecting swap slot in my description was totally miss.
> >Need to rewrite the description.
>
> free the swap slot and free compress page is the same, isn't it?
I think so.
I just wanted to make my description clearer by using more general terms. :)
Thanks.
--
Kind regards,
Minchan Kim
On Tue, Apr 09, 2013 at 12:54:23PM -0700, Andrew Morton wrote:
> On Tue, 9 Apr 2013 10:02:31 +0900 Minchan Kim <[email protected]> wrote:
>
> > > Also, what's up with the SWP_BLKDEV test? zram doesn't support
> > > SWP_FILE? Why on earth not?
> > >
> > > Putting swap_slot_free_notify() into block_device_operations seems
> > > rather wrong. It precludes zram-over-swapfiles for all time and means
> > > that other subsystems cannot get notifications for swap slot freeing
> > > for swapfile-backed swap.
> >
> > Zram is just pseudo-block device so anyone can format it with any FSes
> > and swapon a file. In such case, he can't get a benefit from
> > swap_slot_free_notify. But I think it's not a severe problem because
> > there is no reason to use a file-swap on zram. If anyone want to use it,
> > I'd like to know the reason. If it's reasonable, we have to rethink a
> > wheel and it's another story, IMHO.
>
> My point is that making the swap_slot_free_notify() callback a
> blockdev-specific thing was restrictive. What happens if someone wants
> to use it for swapfile-backed swap? This has nothing to do with zram.
I agree that it's not specific to zram, even though zram is the only
user at the moment.
IMHO, a more general approach would be to introduce SWP_INMEMORY along
with a QUEUE_FLAG_INMEMORY (or some such), so that swap_slot_free_notify
could be registered whenever the backing device is of that type.
Do you really want that work (or an alternative; I hope someone in this
thread suggests a better idea) done prior to this patch?
If so, I am happy to do it.
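Purely as a strawman, something like this (neither flag exists; every
name here is a placeholder):

	/* Placeholders only: neither of these flags exists today. */
	#define QUEUE_FLAG_INMEMORY	20		/* request_queue flag */
	#define SWP_INMEMORY		(1 << 10)	/* swap_info flag */

	/* At swapon time, tag swap areas whose backing queue is in-memory. */
	static void mark_inmemory_swap(struct swap_info_struct *sis,
				       struct request_queue *q)
	{
		if (test_bit(QUEUE_FLAG_INMEMORY, &q->queue_flags))
			sis->flags |= SWP_INMEMORY;
	}

Then end_swap_bio_read() could key off SWP_INMEMORY instead of
SWP_BLKDEV, and a swapfile-backed swap area could opt in as well.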
But my concern is that it would mean changing the core kernel for the
sake of a staging driver, which everyone ignores, and someone will
surely resist it, so it will get stuck again.
So I need an excuse to tell him: "Hey, akpm told me 'let's resolve this
issue prior to solving zram's duplicated copy problem, even though zram
is still in staging'". :)
Could you give me a license to kill?
--
Kind regards,
Minchan Kim
On Tue, Apr 09, 2013 at 01:25:45PM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:[email protected]]
> > Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
> >
> > Hi Dan,
> >
> > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > From: Minchan Kim [mailto:[email protected]]
> > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > >
> > > (patch removed)
> > >
> > > > Fragment ratio is almost same but memory consumption and compile time
> > > > is better. I am working to add defragment function of zsmalloc.
> > >
> > > Hi Minchan --
> > >
> > > I would be very interested in your design thoughts on
> > > how you plan to add defragmentation for zsmalloc. In
> >
> > What I can say now about is only just a word "Compaction".
> > As you know, zsmalloc has a transparent handle so we can do whatever
> > under user. Of course, there is a tradeoff between performance
> > and memory efficiency. I'm biased to latter for embedded usecase.
>
> Have you designed or implemented this yet? I have a couple
> of concerns:
Not implemented yet; I've only had time to think about it briefly.
There will surely be some obstacles, so I want to unveil the code and
the numbers after I build a prototype and test the performance.
Of course, if it turns out to have a severe problem, I will drop it
without wasting everyone's time.
>
> 1) The handle is transparent to the "user", but it is still a form
> of a "pointer" to a zpage. Are you planning on walking zram's
> tables and changing those pointers? That may be OK for zram
> but for more complex data structures than tables (as in zswap
> and zcache) it may not be as easy, due to races, or as efficient
> because you will have to walk potentially very large trees.
The rough concept is as follows.
I'm considering having zsmalloc return a transparent fake handle while
maintaining the mapping to the real one internally.
Since this can be done entirely inside zsmalloc, there is no race that
callers need to worry about.
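Very roughly, something like this (purely illustrative; none of these
names exist in zsmalloc today):

	/* Illustrative only: a private fake -> real handle table. */
	struct zs_handle_table {
		spinlock_t lock;
		unsigned long *real;	/* indexed by fake handle */
	};

	static unsigned long zs_resolve(struct zs_handle_table *tbl,
					unsigned long fake)
	{
		unsigned long real;

		spin_lock(&tbl->lock);
		real = tbl->real[fake];
		spin_unlock(&tbl->lock);
		return real;
	}

	/*
	 * Compaction rewrites only the table entry; the fake handle
	 * that callers hold stays valid across the move.
	 */
	static void zs_relocate(struct zs_handle_table *tbl,
				unsigned long fake, unsigned long new_real)
	{
		spin_lock(&tbl->lock);
		tbl->real[fake] = new_real;
		spin_unlock(&tbl->lock);
	}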
> 2) Compaction in the kernel is heavily dependent on page migration
> and page migration is dependent on using flags in the struct page.
> There's a lot of code in those two code modules and there
> are going to be a lot of implementation differences between
> compacting pages vs compacting zpages.
The kernel's compaction is not related to zsmalloc's at all.
>
> I'm also wondering if you will be implementing "variable length
> zspages". Without that, I'm not sure compaction will help
> enough. (And that is a good example of the difference between
Why do you think so?
Variable-length zspages could be a further improvement, but they are not
the only way to solve fragmentation.
> the kernel page compaction design/code and zspage compaction.)
>
> > > particular, I am wondering if your design will also
> > > handle the requirements for zcache (especially for
> > > cleancache pages) and perhaps also for ramster.
> >
> > I don't know requirements for cleancache pages but compaction is
> > general as you know well so I expect you can get a benefit from it
> > if you are concern on memory efficiency but not sure it's valuable
> > to compact cleancache pages for getting more slot in RAM.
> > Sometime, just discarding would be much better, IMHO.
>
> Zcache has page reclaim. Zswap has zpage reclaim. I am
> concerned that these continue to work in the presence of
> compaction. With no reclaim at all, zram is a simpler use
> case but if you implement compaction in a way that can't be
> used by either zcache or zswap, then zsmalloc is essentially
> forking.
Let's not get ahead of ourselves. If it's really a problem for zswap and
zcache, maybe we could add it as an option.
>
> > > In https://lkml.org/lkml/2013/3/27/501 I suggested it
> > > would be good to work together on a common design, but
> > > you didn't reply. Are you thinking that zsmalloc
> >
> > I saw the thread but explicit agreement is really matter?
> > I believe everybody want it although they didn't reply. :)
> >
> > You can make the design/post it or prototyping/post it.
> > If there are some conflit with something in my brain,
> > I will be happy to feedback. :)
> >
> > Anyway, I think my above statement "COMPACTION" would be enough to
> > express my current thought to avoid duplicated work and you can catch up.
> >
> > I will get around to it after LSF/MM.
> >
> > > improvements should focus only on zram, in which case
> >
> > Just focusing zsmalloc.
>
> Right. Again, I am asking if you are changing zsmalloc in
> a way that helps zram but hurts zswap and makes it impossible
> for zcache to ever use the improvements to zsmalloc.
As I said, I'm biased toward memory efficiency rather than performance.
Of course, a severe performance drop would be a disaster, but a small
drop should be acceptable on systems that care about memory efficiency.
>
> If so, that's fine, but please make it clear that is your goal.
Simple: help memory-hungry systems. :)
>
> > > we may -- and possibly should -- end up with a different
> > > allocator for frontswap-based/cleancache-based compression
> > > in zcache (and possibly zswap)?
> >
> > > I'm just trying to determine if I should proceed separately
> > > with my design (with Bob Liu, who expressed interest) or if
> > > it would be beneficial to work together.
> >
> > Just posting and if it affects zsmalloc/zram/zswap and goes the way
> > I don't want, I will involve the discussion because our product uses
> > zram heavily and consider zswap, too.
> >
> > I really appreciate your enthusiastic collaboration model to find
> > optimal solution!
>
> My goal is to have compression be an integral part of Linux
> memory management. It may be tied to a config option, but
> the goal is that distros turn it on by default. I don't think
> zsmalloc meets that objective yet, but it may be fine for
> your needs. If so it would be good to understand exactly why
> it doesn't meet the other zproject needs.
--
Kind regards,
Minchan Kim
On Tue, Apr 09, 2013 at 01:37:47PM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:[email protected]]
> > Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
> >
> > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> > > Hi Dan,
> > >
> > > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > > From: Minchan Kim [mailto:[email protected]]
> > > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > > >
> > > > (patch removed)
> > > >
> > > > > Fragment ratio is almost same but memory consumption and compile time
> > > > > is better. I am working to add defragment function of zsmalloc.
> > > >
> > > > Hi Minchan --
> > > >
> > > > I would be very interested in your design thoughts on
> > > > how you plan to add defragmentation for zsmalloc. In
> > >
> > > What I can say now about is only just a word "Compaction".
> > > As you know, zsmalloc has a transparent handle so we can do whatever
> > > under user. Of course, there is a tradeoff between performance
> > > and memory efficiency. I'm biased to latter for embedded usecase.
> > >
> > > And I might post it because as you know well, zsmalloc
> >
> > Incomplete sentense,
> >
> > I might not post it until promoting zsmalloc because as you know well,
> > zsmalloc/zram's all new stuffs are blocked into staging tree.
> > Even if we could add it into staging, as you know well, staging is where
> > every mm guys ignore so we end up needing another round to promote it. sigh.
> >
> > I hope it gets better after LSF/MM.
>
> If zsmalloc is moving in the direction of supporting only zram,
> why should it be promoted into mm, or even lib? Why not promote
> zram into drivers and put zsmalloc.c in the same directory?
I don't want to make zsmalloc zram-specific, and I will make my best
effort to generalize it for the whole z* family. If it proves hard to
reach agreement, then yes, forking could be an easy solution, as other
embedded product companies do, but I don't want that.
--
Kind regards,
Minchan Kim
Hi Seth,
On Tue, Apr 09, 2013 at 03:52:36PM -0500, Seth Jennings wrote:
> On 04/08/2013 08:36 PM, Minchan Kim wrote:
> > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> >> Hi Dan,
> >>
> >> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> >>>> From: Minchan Kim [mailto:[email protected]]
> >>>> Sent: Monday, April 08, 2013 12:01 AM
> >>>> Subject: [PATCH] mm: remove compressed copy from zram in-memory
> >>>
> >>> (patch removed)
> >>>
> >>>> Fragment ratio is almost same but memory consumption and compile time
> >>>> is better. I am working to add defragment function of zsmalloc.
> >>>
> >>> Hi Minchan --
> >>>
> >>> I would be very interested in your design thoughts on
> >>> how you plan to add defragmentation for zsmalloc. In
> >>
> >> What I can say now about is only just a word "Compaction".
> >> As you know, zsmalloc has a transparent handle so we can do whatever
> >> under user. Of course, there is a tradeoff between performance
> >> and memory efficiency. I'm biased to latter for embedded usecase.
> >>
> >> And I might post it because as you know well, zsmalloc
> >
> > Incomplete sentense,
> >
> > I might not post it until promoting zsmalloc because as you know well,
> > zsmalloc/zram's all new stuffs are blocked into staging tree.
> > Even if we could add it into staging, as you know well, staging is where
> > every mm guys ignore so we end up needing another round to promote it. sigh.
>
> Yes. The lack of compaction/defragmentation support in zsmalloc has not
> been raised as an obstacle to mainline acceptance so I think we should
> wait to add new features to a yet-to-be accepted codebase.
>
> Also, I think this feature is more important to zram than it is to
> zswap/zcache as they can do writeback to free zpages. In other words,
> the fragmentation is a transient issue for zswap/zcache since writeback
> to the swap device is possible.
Another benefit of the compaction work is that we could pick a zpage out
of a zspage and move it elsewhere. That means the core mm could control
the pages inside zsmalloc freely.
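In sketch form (every helper here is hypothetical):

	/*
	 * Hypothetical sketch of that step: migrate every object out of
	 * a sparsely used zspage so its backing pages can be handed back
	 * to the kernel.
	 */
	static void zs_evacuate_zspage(struct zs_pool *pool, void *zspage)
	{
		unsigned long fake, new_real;
		size_t size;

		for_each_live_object(zspage, fake, size) {	/* hypothetical */
			new_real = alloc_obj_elsewhere(pool, size);
			copy_obj(new_real, fake, size);
			update_handle_table(pool, fake, new_real);
		}
		release_zspage_pages(zspage);	/* pages back to the kernel */
	}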
>
> Thanks,
> Seth
--
Kind regards,
Minchan Kim
Hi Dan,
On 04/10/2013 04:25 AM, Dan Magenheimer wrote:
>> From: Minchan Kim [mailto:[email protected]]
>> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>>
>> Hi Dan,
>>
>> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
>>>> From: Minchan Kim [mailto:[email protected]]
>>>> Sent: Monday, April 08, 2013 12:01 AM
>>>> Subject: [PATCH] mm: remove compressed copy from zram in-memory
>>> (patch removed)
>>>
>>>> Fragment ratio is almost same but memory consumption and compile time
>>>> is better. I am working to add defragment function of zsmalloc.
>>> Hi Minchan --
>>>
>>> I would be very interested in your design thoughts on
>>> how you plan to add defragmentation for zsmalloc. In
>> What I can say now about is only just a word "Compaction".
>> As you know, zsmalloc has a transparent handle so we can do whatever
>> under user. Of course, there is a tradeoff between performance
>> and memory efficiency. I'm biased to latter for embedded usecase.
> Have you designed or implemented this yet? I have a couple
> of concerns:
>
> 1) The handle is transparent to the "user", but it is still a form
> of a "pointer" to a zpage. Are you planning on walking zram's
> tables and changing those pointers? That may be OK for zram
> but for more complex data structures than tables (as in zswap
> and zcache) it may not be as easy, due to races, or as efficient
> because you will have to walk potentially very large trees.
> 2) Compaction in the kernel is heavily dependent on page migration
> and page migration is dependent on using flags in the struct page.
Which flag?
> There's a lot of code in those two code modules and there
> are going to be a lot of implementation differences between
> compacting pages vs compacting zpages.
>
> I'm also wondering if you will be implementing "variable length
> zspages". Without that, I'm not sure compaction will help
> enough. (And that is a good example of the difference between
> the kernel page compaction design/code and zspage compaction.)
>
>>> particular, I am wondering if your design will also
>>> handle the requirements for zcache (especially for
>>> cleancache pages) and perhaps also for ramster.
>> I don't know requirements for cleancache pages but compaction is
>> general as you know well so I expect you can get a benefit from it
>> if you are concern on memory efficiency but not sure it's valuable
>> to compact cleancache pages for getting more slot in RAM.
>> Sometime, just discarding would be much better, IMHO.
> Zcache has page reclaim. Zswap has zpage reclaim. I am
> concerned that these continue to work in the presence of
> compaction. With no reclaim at all, zram is a simpler use
> case but if you implement compaction in a way that can't be
> used by either zcache or zswap, then zsmalloc is essentially
> forking.
I fail to understand "then zsmalloc is essentially forking"; could you
explain more?
>
>>> In https://lkml.org/lkml/2013/3/27/501 I suggested it
>>> would be good to work together on a common design, but
>>> you didn't reply. Are you thinking that zsmalloc
>> I saw the thread but explicit agreement is really matter?
>> I believe everybody want it although they didn't reply. :)
>>
>> You can make the design/post it or prototyping/post it.
>> If there are some conflit with something in my brain,
>> I will be happy to feedback. :)
>>
>> Anyway, I think my above statement "COMPACTION" would be enough to
>> express my current thought to avoid duplicated work and you can catch up.
>>
>> I will get around to it after LSF/MM.
>>
>>> improvements should focus only on zram, in which case
>> Just focusing zsmalloc.
> Right. Again, I am asking if you are changing zsmalloc in
> a way that helps zram but hurts zswap and makes it impossible
> for zcache to ever use the improvements to zsmalloc.
>
> If so, that's fine, but please make it clear that is your goal.
>
>>> we may -- and possibly should -- end up with a different
>>> allocator for frontswap-based/cleancache-based compression
>>> in zcache (and possibly zswap)?
>>> I'm just trying to determine if I should proceed separately
>>> with my design (with Bob Liu, who expressed interest) or if
>>> it would be beneficial to work together.
>> Just posting and if it affects zsmalloc/zram/zswap and goes the way
>> I don't want, I will involve the discussion because our product uses
>> zram heavily and consider zswap, too.
>>
>> I really appreciate your enthusiastic collaboration model to find
>> optimal solution!
> My goal is to have compression be an integral part of Linux
> memory management. It may be tied to a config option, but
> the goal is that distros turn it on by default. I don't think
> zsmalloc meets that objective yet, but it may be fine for
> your needs. If so it would be good to understand exactly why
> it doesn't meet the other zproject needs.
Hi Minchan,
On 04/10/2013 08:50 AM, Minchan Kim wrote:
> On Tue, Apr 09, 2013 at 01:25:45PM -0700, Dan Magenheimer wrote:
>>> From: Minchan Kim [mailto:[email protected]]
>>> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>>>
>>> Hi Dan,
>>>
>>> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
>>>>> From: Minchan Kim [mailto:[email protected]]
>>>>> Sent: Monday, April 08, 2013 12:01 AM
>>>>> Subject: [PATCH] mm: remove compressed copy from zram in-memory
>>>> (patch removed)
>>>>
>>>>> Fragment ratio is almost same but memory consumption and compile time
>>>>> is better. I am working to add defragment function of zsmalloc.
>>>> Hi Minchan --
>>>>
>>>> I would be very interested in your design thoughts on
>>>> how you plan to add defragmentation for zsmalloc. In
>>> What I can say now about is only just a word "Compaction".
>>> As you know, zsmalloc has a transparent handle so we can do whatever
>>> under user. Of course, there is a tradeoff between performance
>>> and memory efficiency. I'm biased to latter for embedded usecase.
>> Have you designed or implemented this yet? I have a couple
>> of concerns:
> Not yet implemented but just had a time to think about it, simply.
> So surely, there are some obstacle so I want to uncase the code and
> number after I make a prototype/test the performance.
> Of course, if it has a severe problem, will drop it without wasting
> many guys's time.
>
>> 1) The handle is transparent to the "user", but it is still a form
>> of a "pointer" to a zpage. Are you planning on walking zram's
>> tables and changing those pointers? That may be OK for zram
>> but for more complex data structures than tables (as in zswap
>> and zcache) it may not be as easy, due to races, or as efficient
>> because you will have to walk potentially very large trees.
> Rough concept is following as.
>
> I'm considering for zsmalloc to return transparent fake handle
> but we have to maintain it with real one.
> It could be done in zsmalloc internal so there isn't any race we should consider.
>
>
>> 2) Compaction in the kernel is heavily dependent on page migration
>> and page migration is dependent on using flags in the struct page.
>> There's a lot of code in those two code modules and there
>> are going to be a lot of implementation differences between
>> compacting pages vs compacting zpages.
> Compaction of kernel is never related to zsmalloc's one.
>
>> I'm also wondering if you will be implementing "variable length
>> zspages". Without that, I'm not sure compaction will help
>> enough. (And that is a good example of the difference between
> Why do you think so?
> variable lengh zspage could be further step to improve but it's not
> only a solution to solve fragmentation.
>
>> the kernel page compaction design/code and zspage compaction.)
>>>> particular, I am wondering if your design will also
>>>> handle the requirements for zcache (especially for
>>>> cleancache pages) and perhaps also for ramster.
>>> I don't know requirements for cleancache pages but compaction is
>>> general as you know well so I expect you can get a benefit from it
>>> if you are concern on memory efficiency but not sure it's valuable
>>> to compact cleancache pages for getting more slot in RAM.
>>> Sometime, just discarding would be much better, IMHO.
>> Zcache has page reclaim. Zswap has zpage reclaim. I am
>> concerned that these continue to work in the presence of
>> compaction. With no reclaim at all, zram is a simpler use
>> case but if you implement compaction in a way that can't be
>> used by either zcache or zswap, then zsmalloc is essentially
>> forking.
> Don't go too far. If it's really problem for zswap and zcache,
> maybe, we could add it optionally.
>
>>>> In https://lkml.org/lkml/2013/3/27/501 I suggested it
>>>> would be good to work together on a common design, but
>>>> you didn't reply. Are you thinking that zsmalloc
>>> I saw the thread but explicit agreement is really matter?
>>> I believe everybody want it although they didn't reply. :)
>>>
>>> You can make the design/post it or prototyping/post it.
>>> If there are some conflit with something in my brain,
>>> I will be happy to feedback. :)
>>>
>>> Anyway, I think my above statement "COMPACTION" would be enough to
>>> express my current thought to avoid duplicated work and you can catch up.
>>>
>>> I will get around to it after LSF/MM.
>>>
>>>> improvements should focus only on zram, in which case
>>> Just focusing zsmalloc.
>> Right. Again, I am asking if you are changing zsmalloc in
>> a way that helps zram but hurts zswap and makes it impossible
>> for zcache to ever use the improvements to zsmalloc.
> As I said, I'm biased to memory efficiency rather than performace.
> Of course, severe performance drop is disaster but small drop will
> be acceptable for memory-efficiency concerning systems.
>
>> If so, that's fine, but please make it clear that is your goal.
> Simple, help memory hungry system. :)
Which kinds of systems are memory hungry?
>
>>>> we may -- and possibly should -- end up with a different
>>>> allocator for frontswap-based/cleancache-based compression
>>>> in zcache (and possibly zswap)?
>>>> I'm just trying to determine if I should proceed separately
>>>> with my design (with Bob Liu, who expressed interest) or if
>>>> it would be beneficial to work together.
>>> Just posting and if it affects zsmalloc/zram/zswap and goes the way
>>> I don't want, I will involve the discussion because our product uses
>>> zram heavily and consider zswap, too.
>>>
>>> I really appreciate your enthusiastic collaboration model to find
>>> optimal solution!
>> My goal is to have compression be an integral part of Linux
>> memory management. It may be tied to a config option, but
>> the goal is that distros turn it on by default. I don't think
>> zsmalloc meets that objective yet, but it may be fine for
>> your needs. If so it would be good to understand exactly why
>> it doesn't meet the other zproject needs.
> From: Seth Jennings [mailto:[email protected]]
> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>
> On 04/08/2013 08:36 PM, Minchan Kim wrote:
> > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> >> Hi Dan,
> >>
> >> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> >>>> From: Minchan Kim [mailto:[email protected]]
> >>>> Sent: Monday, April 08, 2013 12:01 AM
> >>>> Subject: [PATCH] mm: remove compressed copy from zram in-memory
> >>>
> >>> (patch removed)
> >>>
> >>>> Fragment ratio is almost same but memory consumption and compile time
> >>>> is better. I am working to add defragment function of zsmalloc.
> >>>
> >>> Hi Minchan --
> >>>
> >>> I would be very interested in your design thoughts on
> >>> how you plan to add defragmentation for zsmalloc. In
> >>
> >> What I can say now about is only just a word "Compaction".
> >> As you know, zsmalloc has a transparent handle so we can do whatever
> >> under user. Of course, there is a tradeoff between performance
> >> and memory efficiency. I'm biased to latter for embedded usecase.
> >>
> >> And I might post it because as you know well, zsmalloc
> >
> > Incomplete sentense,
> >
> > I might not post it until promoting zsmalloc because as you know well,
> > zsmalloc/zram's all new stuffs are blocked into staging tree.
> > Even if we could add it into staging, as you know well, staging is where
> > every mm guys ignore so we end up needing another round to promote it. sigh.
>
> Yes. The lack of compaction/defragmentation support in zsmalloc has not
> been raised as an obstacle to mainline acceptance so I think we should
> wait to add new features to a yet-to-be accepted codebase.
Um, I did explicitly raise an obstacle: the greatly reduced density of
zsmalloc on active workloads and on zsize distributions that skew fat.
Understanding that more deeply, and hopefully fixing it, is an open
issue, and compaction/defragmentation is a step in that direction.
> Also, I think this feature is more important to zram than it is to
> zswap/zcache as they can do writeback to free zpages. In other words,
> the fragmentation is a transient issue for zswap/zcache since writeback
> to the swap device is possible.
Actually, I think I demonstrated that the zpage-based writeback in
zswap makes fragmentation worse. Zcache doesn't use zsmalloc in part
because zsmalloc doesn't support pageframe writeback. If zsmalloc
can fix this (and it may be easier to fix depending on the design
and implementation of compaction/defrag, which is why I'm asking
lots of questions), zcache may be able to make use of zsmalloc.
Lots of good discussion fodder for next week!
Dan
> From: Minchan Kim [mailto:[email protected]]
> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>
> On Tue, Apr 09, 2013 at 01:25:45PM -0700, Dan Magenheimer wrote:
> > > From: Minchan Kim [mailto:[email protected]]
> > > Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
> > >
> > > Hi Dan,
> > >
> > > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > > From: Minchan Kim [mailto:[email protected]]
> > > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > > >
> > > > (patch removed)
> > > >
> > > > > Fragment ratio is almost same but memory consumption and compile time
> > > > > is better. I am working to add defragment function of zsmalloc.
> > > >
> > > > Hi Minchan --
> > > >
> > > > I would be very interested in your design thoughts on
> > > > how you plan to add defragmentation for zsmalloc. In
> > >
> > > What I can say about it now is just one word: "Compaction".
> > > As you know, zsmalloc's handle is transparent to the user, so we can
> > > do whatever we want underneath. Of course, there is a tradeoff between
> > > performance and memory efficiency; I'm biased toward the latter for
> > > embedded use cases.
> >
> > Have you designed or implemented this yet? I have a couple
> > of concerns:
>
> Not implemented yet; I have only had time to think about it briefly.
> Surely there are some obstacles, so I want to show the code and
> numbers after I make a prototype and test the performance.
> Of course, if it has a severe problem, I will drop it without wasting
> everyone's time.
OK. I have some ideas that may be similar to or may be very different
from yours. Likely different, since I am coming at it from the
angle of zcache, which has some different requirements. So
I'm hoping that by discussing design we can incorporate some
of the zcache requirements before coding.
> > 1) The handle is transparent to the "user", but it is still a form
> > of a "pointer" to a zpage. Are you planning on walking zram's
> > tables and changing those pointers? That may be OK for zram
> > but for more complex data structures than tables (as in zswap
> > and zcache) it may not be as easy, due to races, or as efficient
> > because you will have to walk potentially very large trees.
>
> The rough concept is as follows.
>
> I'm considering having zsmalloc return a transparent fake handle
> while maintaining the mapping to the real one internally.
> It could be done inside zsmalloc itself, so there isn't any race we
> would need to consider.
That sounds very difficult because I think you will need
an extra level of indirection to translate each fake handle
to its real handle/pointer (like virtual-to-physical page tables).
Or do you have some more clever idea?
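To make sure we're talking about the same thing, here is a minimal
userspace sketch of the indirection I think you're describing. Every
name in it is invented for illustration; none of this is from a posted
patch. The user keeps only an index into a table the allocator owns,
and compaction fixes up a single table slot when it moves an object.

/* Hypothetical sketch of fake-handle indirection; not from any patch. */
#include <stdlib.h>

typedef unsigned long zs_fake_handle;   /* the only thing the user keeps */

struct handle_table {
        void **real;        /* real[i] = current location of object i */
        size_t size, used;
};

/* hand out a fake handle; the real pointer never escapes the allocator */
static zs_fake_handle ht_alloc(struct handle_table *ht, void *location)
{
        if (ht->used == ht->size) {
                size_t n = ht->size ? ht->size * 2 : 64;
                void **tmp = realloc(ht->real, n * sizeof(*tmp));

                if (!tmp)
                        abort();        /* sketch: no graceful OOM handling */
                ht->real = tmp;
                ht->size = n;
        }
        ht->real[ht->used] = location;
        return (zs_fake_handle)ht->used++;
}

/* every map/unmap pays one extra dependent load through the table */
static void *ht_deref(struct handle_table *ht, zs_fake_handle h)
{
        return ht->real[h];
}

/* compaction moved an object: fix one slot, user handles stay valid */
static void ht_relocate(struct handle_table *ht, zs_fake_handle h,
                        void *new_location)
{
        ht->real[h] = new_location;
}

If that's roughly the plan, my worry is the table itself: it is exactly
the virtual-to-physical style translation I mention above, so it costs
memory and one extra load per access, though relocation does become a
one-slot update instead of a walk of zram/zswap/zcache data structures.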
> > 2) Compaction in the kernel is heavily dependent on page migration
> > and page migration is dependent on using flags in the struct page.
> > There's a lot of code in those two code modules and there
> > are going to be a lot of implementation differences between
> > compacting pages vs compacting zpages.
>
> The kernel's compaction is not related to zsmalloc's.
OK. "Compaction" has a specific meaning in the kernel; I think "defrag"
is the usual term for what we are discussing here.
So I thought you might be planning to do exactly what
the kernel does in what it calls compaction.
> > I'm also wondering if you will be implementing "variable length
> > zspages". Without that, I'm not sure compaction will help
> > enough. (And that is a good example of the difference between
>
> Why do you think so?
> Variable-length zspages could be a further improvement, but they are
> not the only way to solve fragmentation.
In my partial-design-in-my-head, they are related, but I
think I understand what you mean. You are planning to
move zpages across zspage boundaries, and I am not. So
I think your solution will result in better density but
may be harder to implement.
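So that we argue about the same thing next week, here is a toy version
of the cross-zspage move as I understand it, reusing the invented
handle_table from my sketch above, with fixed-size slots and no size
classes, locking, or struct page chaining; a real implementation would
have to handle all three.

#include <string.h>

#define SLOT_SIZE        128    /* toy assumption: every zpage fits here */
#define SLOTS_PER_ZSPAGE 8

struct zspage {
        char data[SLOTS_PER_ZSPAGE][SLOT_SIZE];
        zs_fake_handle handle[SLOTS_PER_ZSPAGE];
        unsigned char used[SLOTS_PER_ZSPAGE];
        int inuse;
};

/* drain @src into free slots of @dst; return 1 if @src can be freed */
static int migrate_zspage(struct handle_table *ht,
                          struct zspage *src, struct zspage *dst)
{
        int s, d = 0;

        for (s = 0; s < SLOTS_PER_ZSPAGE && src->inuse; s++) {
                if (!src->used[s])
                        continue;
                while (d < SLOTS_PER_ZSPAGE && dst->used[d])
                        d++;
                if (d == SLOTS_PER_ZSPAGE)
                        break;          /* destination is full */
                memcpy(dst->data[d], src->data[s], SLOT_SIZE);
                dst->handle[d] = src->handle[s];
                dst->used[d] = 1;
                dst->inuse++;
                /* the user's fake handle survives the move untouched */
                ht_relocate(ht, src->handle[s], dst->data[d]);
                src->used[s] = 0;
                src->inuse--;
        }
        return src->inuse == 0;
}

The density win only appears when the drained source zspage is actually
freed back to the system, which is presumably why you expect better
density than an approach that never crosses the zspage boundary.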
> > > > particular, I am wondering if your design will also
> > > > handle the requirements for zcache (especially for
> > > > cleancache pages) and perhaps also for ramster.
> > >
> > > I don't know the requirements for cleancache pages, but compaction
> > > is general, as you know well, so I expect you can benefit from it
> > > if you are concerned about memory efficiency. I'm not sure it's
> > > valuable to compact cleancache pages just to get more slots in RAM,
> > > though; sometimes just discarding them would be much better, IMHO.
> >
> > Zcache has page reclaim. Zswap has zpage reclaim. I am
> > concerned that both must continue to work in the presence of
> > compaction. With no reclaim at all, zram is a simpler use
> > case, but if you implement compaction in a way that can't be
> > used by either zcache or zswap, then zsmalloc is essentially
> > forking.
>
> Don't go too far. If it's really a problem for zswap and zcache,
> maybe we could make it optional.
Good, I think it should be possible to do it optionally too.
> > > > In https://lkml.org/lkml/2013/3/27/501 I suggested it
> > > > would be good to work together on a common design, but
> > > > you didn't reply. Are you thinking that zsmalloc
> > >
> > > I saw the thread, but does explicit agreement really matter?
> > > I believe everybody wants it, even though they didn't reply. :)
> > >
> > > You can write up the design and post it, or prototype it and post that.
> > > If it conflicts with something in my head,
> > > I will be happy to give feedback. :)
> > >
> > > Anyway, I think my statement above, "COMPACTION", is enough to
> > > express my current thinking, avoid duplicated work, and let you catch up.
> > >
> > > I will get around to it after LSF/MM.
> > >
> > > > improvements should focus only on zram, in which case
> > >
> > > Just focusing on zsmalloc.
> >
> > Right. Again, I am asking if you are changing zsmalloc in
> > a way that helps zram but hurts zswap and makes it impossible
> > for zcache to ever use the improvements to zsmalloc.
>
> As I said, I'm biased toward memory efficiency rather than performance.
> Of course, a severe performance drop would be a disaster, but a small
> drop is acceptable on systems that care about memory efficiency.
> >
> > If so, that's fine, but please make it clear that is your goal.
>
> Simple: help memory-hungry systems. :)
One major difference I think is that you are focused on systems
where processes often get destroyed by OOMs (e.g. Android-like),
where I am focused on server systems where everything possible
must be done to avoid killing processes. So IMHO writeback and
better integration with the MM system are a requirement. I
think that's a key difference between zram and zcache that
is driving different design decisions.
Dan
> From: Minchan Kim [mailto:[email protected]]
> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>
> On Tue, Apr 09, 2013 at 01:37:47PM -0700, Dan Magenheimer wrote:
> > > From: Minchan Kim [mailto:[email protected]]
> > > Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
> > >
> > > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> > > > Hi Dan,
> > > >
> > > > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > > > From: Minchan Kim [mailto:[email protected]]
> > > > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > > > >
> > > > > (patch removed)
> > > > >
> > > > > > Fragment ratio is almost the same, but memory consumption and compile
> > > > > > time are better. I am working to add a defragment function to zsmalloc.
> > > > >
> > > > > Hi Minchan --
> > > > >
> > > > > I would be very interested in your design thoughts on
> > > > > how you plan to add defragmentation for zsmalloc. In
> > > >
> > > > What I can say about it now is just one word: "Compaction".
> > > > As you know, zsmalloc's handle is transparent to the user, so we can
> > > > do whatever we want underneath. Of course, there is a tradeoff between
> > > > performance and memory efficiency; I'm biased toward the latter for
> > > > embedded use cases.
> > > >
> > > > And I might post it because as you know well, zsmalloc
> > >
> > > Incomplete sentence;
> > >
> > > I might not post it until zsmalloc is promoted because, as you know well,
> > > all new zsmalloc/zram work is blocked in the staging tree.
> > > Even if we could add it to staging, staging is something every mm
> > > developer ignores, so we would end up needing another round to promote
> > > it. Sigh.
> > >
> > > I hope it gets better after LSF/MM.
> >
> > If zsmalloc is moving in the direction of supporting only zram,
> > why should it be promoted into mm, or even lib? Why not promote
> > zram into drivers and put zsmalloc.c in the same directory?
>
> I don't want to make zsmalloc zram-specific, and I will do my best
> to generalize it for the whole z* family.
I'm glad to hear that. You may not know/remember that the split between
"old zcache" and "new zcache" (and the fork to zswap) was started
because some people refused to accept changes to zsmalloc to
support a broader set of requirements.
> If it is hard to reach
> agreement, then yes, forking could be an easy solution, as at other
> embedded product companies, but I don't want that.
I don't want it either, so I think it is wise for us all to understand
each other's objectives to see if we can avoid a fork. Or if the
objectives are too different, then we have data to explain to other kernel
developers why a fork is necessary.
Thanks!
Dan
> From: Minchan Kim [mailto:[email protected]]
> Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)
>
> Hi Seth,
>
> On Tue, Apr 09, 2013 at 03:52:36PM -0500, Seth Jennings wrote:
> > On 04/08/2013 08:36 PM, Minchan Kim wrote:
> > > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> > >> Hi Dan,
> > >>
> > >> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > >>>> From: Minchan Kim [mailto:[email protected]]
> > >>>> Sent: Monday, April 08, 2013 12:01 AM
> > >>>> Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > >>>
> > >>> (patch removed)
> > >>>
> > >>>> Fragment ratio is almost the same, but memory consumption and compile
> > >>>> time are better. I am working to add a defragment function to zsmalloc.
> > >>>
> > >>> Hi Minchan --
> > >>>
> > >>> I would be very interested in your design thoughts on
> > >>> how you plan to add defragmentation for zsmalloc. In
> > >>
> > >> What I can say about it now is just one word: "Compaction".
> > >> As you know, zsmalloc's handle is transparent to the user, so we can
> > >> do whatever we want underneath. Of course, there is a tradeoff between
> > >> performance and memory efficiency; I'm biased toward the latter for
> > >> embedded use cases.
> > >>
> > >> And I might post it because as you know well, zsmalloc
> > >
> > > Incomplete sentence;
> > >
> > > I might not post it until zsmalloc is promoted because, as you know well,
> > > all new zsmalloc/zram work is blocked in the staging tree.
> > > Even if we could add it to staging, staging is something every mm
> > > developer ignores, so we would end up needing another round to promote
> > > it. Sigh.
> >
> > Yes. The lack of compaction/defragmentation support in zsmalloc has not
> > been raised as an obstacle to mainline acceptance, so I think we should
> > wait before adding new features to a yet-to-be-accepted codebase.
> >
> > Also, I think this feature is more important to zram than it is to
> > zswap/zcache as they can do writeback to free zpages. In other words,
> > the fragmentation is a transient issue for zswap/zcache since writeback
> > to the swap device is possible.
>
> Another benefit of the compaction work is that we can pick a zpage out
> of a zspage and move it somewhere else. It means core mm could control
> pages in zsmalloc freely.
I'm not sure I understand, which is why I'd like to learn more about
your proposed design. Are you suggesting that core mm would periodically
call zsmalloc-compaction and see what pages get freed? I'm hoping
for more control than that.
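For instance, an interface shaped like the following would give mm that
control. Every name below is invented purely for discussion; nothing
like it exists in zsmalloc today.

/* Hypothetical interface sketch only; zsmalloc exports no such API. */
struct zs_pool;

struct zs_compact_control {
        unsigned long nr_wanted;    /* mm: page frames it needs back */
        unsigned long nr_freed;     /* zsmalloc: page frames released */
};

/* how many whole page frames a full compaction pass could release */
unsigned long zs_compactable_pages(struct zs_pool *pool);

/* do only enough migration work to try to satisfy ctl->nr_wanted */
int zs_compact(struct zs_pool *pool, struct zs_compact_control *ctl);

That way mm could ask how much it would get back before paying for a
pass, and could bound the work, instead of polling a compactor and
seeing what falls out.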
More good discussion for next week!
Dan