2012-08-07 20:35:25

by Seth Jennings

Subject: Re: [PATCH 0/4] promote zcache from staging

On 07/27/2012 01:18 PM, Seth Jennings wrote:
> Some benchmarking numbers demonstrating the I/O saving that can be had
> with zcache:
>
> https://lkml.org/lkml/2012/3/22/383

There was concern that kernel changes external to zcache since v3.3 may
have mitigated the benefit of zcache. So I re-ran my kernel building
benchmark and confirmed that zcache is still providing I/O and runtime
savings.

Gentoo w/ kernel v3.5 (frontswap only, cleancache disabled)
Quad-core i5-2500 @ 3.3GHz
512MB DDR3 1600MHz (limited with mem=512m on boot)
Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
majflt are major page faults reported by the time command
pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
the make -jN
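
For reference, pswpin/pswpout were read straight out of /proc/vmstat;
a minimal reader like the one below (a sketch, not the exact tooling
used for these runs) captures them before and after each build:

#include <stdio.h>
#include <string.h>

/* Print the swap counters from /proc/vmstat.  Run before and after
 * "make -jN" and subtract the two samples to get the delta. */
int main(void)
{
        FILE *f = fopen("/proc/vmstat", "r");
        char name[64];
        unsigned long val;

        if (!f)
                return 1;
        while (fscanf(f, "%63s %lu", name, &val) == 2)
                if (!strcmp(name, "pswpin") || !strcmp(name, "pswpout"))
                        printf("%s %lu\n", name, val);
        fclose(f);
        return 0;
}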

Mind the 512MB RAM vs 1GB in my previous results. This just reduces
the number of threads required to create memory pressure and removes some
of the context switching noise from the results.

I'm also using a single HDD instead of the RAID0 in my previous results.

Each run started with:
swapoff -a
swapon -a
sync
echo 3 > /proc/sys/vm/drop_caches

I/O (in pages):
                normal                            zcache               change
 N  pswpin  pswpout  majflt  I/O sum   pswpin  pswpout  majflt  I/O sum  %I/O
 4       0        2    2116     2118        0        0    2125     2125    0%
 8       0      575    2244     2819        4        4    2219     2227   21%
12    2543     4038    3226     9807     1748     2519    3871     8138   17%
16   23926    47278    9426    80630     8252    15598    9372    33222   59%
20   50307   127797   15039   193143    20224    40634   17975    78833   59%

Runtime (in seconds):
 N  normal  zcache  %change
 4     126     127      -1%
 8     124     124       0%
12     131     133      -2%
16     189     156      17%
20     261     235      10%

%CPU utilization (out of 400% on 4 cpus)
 N  normal  zcache  %change
 4     254     253       0%
 8     261     263      -1%
12     250     248       1%
16     173     211     -22%
20     124     140     -13%

There is a sweet spot at 16 threads, where zcache is improving runtime by
17% and reducing I/O by 59% (185MB) using 22% more CPU.

Seth


2012-08-07 21:47:39

by Dan Magenheimer

Subject: RE: [PATCH 0/4] promote zcache from staging

> From: Seth Jennings [mailto:[email protected]]
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
> > Some benchmarking numbers demonstrating the I/O saving that can be had
> > with zcache:
> >
> > https://lkml.org/lkml/2012/3/22/383
>
> There was concern that kernel changes external to zcache since v3.3 may
> have mitigated the benefit of zcache. So I re-ran my kernel building
> benchmark and confirmed that zcache is still providing I/O and runtime
> savings.

Hi Seth --

Thanks for re-running your tests. I have a couple of concerns and
hope that you, and other interested parties, will read all the
way through my lengthy response.

The zcache issues I have seen in recent kernels arise when zcache
gets "full". I notice your original published benchmarks [1] include
N=24, N=28, and N=32, but these updated results do not. Are you planning
on completing the runs? Second, I now see the numbers I originally
published for what I thought was the same benchmark as yours are actually
an order of magnitude larger (in sec) than yours. I didn't notice
this in March because we were focused on the percent improvement, not
the raw measurements. Since the hardware is highly similar, I suspect
it is not a hardware difference but instead that you are compiling
a much smaller kernel. In other words, your test case is much
smaller, and so exercises zcache much less. My test case compiles
a full enterprise kernel... what is yours doing?

IMHO, any cache in computer science needs to be measured both
when it is not-yet-full and when it is full. The "demo" zcache in
staging works very well before it is full and I think our benchmarking
in March and your re-run benchmarks demonstrate that. At LSFMM, Andrea
Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
capabilities and, when it is full, it simply rejects further attempts
to put data in its cache. He said this is unacceptable for KVM and I
agreed that it was a flaw that needed to be fixed before zcache should
be promoted. When I tested zcache for this, I found that not only was
he right, but that zcache could not be fixed without a major rewrite.

This is one of the "fundamental flaws" of the "demo" zcache, but the new
code base allows for this to be fixed.
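
To make this concrete, the behavior described above amounts to roughly
the following (an illustrative sketch with invented names, not the
actual zcache or frontswap code):

#include <errno.h>
#include <stddef.h>

/* Illustrative only: a compressed-swap pool with a hard capacity. */
struct demo_pool {
        size_t pages_used;      /* pageframes backing compressed data now */
        size_t pages_limit;     /* hard cap on pool size */
};

/*
 * Store one swapped-out page.  Returns 0 on success, -ENOMEM when the
 * pool is full.  On failure the caller just writes the page to the
 * swap device; nothing already cached is written back to make room.
 */
static int demo_store(struct demo_pool *pool, const void *page_data)
{
        if (pool->pages_used >= pool->pages_limit)
                return -ENOMEM; /* reject: no eviction, no writeback */

        /* ... compress page_data and index it (omitted) ... */
        (void)page_data;
        pool->pages_used++;
        return 0;
}

Once the limit is reached, every further store fails until something
is explicitly invalidated; the cache never makes room on its own.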

A second flaw is that the "demo" zcache has no concept of LRU for
either cleancache or frontswap pages, or ability to reclaim pageframes
at all for frontswap pages. (And for cleancache, pageframe reclaim
is semi-random). As I've noted in other threads, this may be impossible
to implement/fix with zsmalloc, and zsmalloc's author Nitin Gupta has
agreed, but the new code base implements all of this with zbud. One
can argue that LRU is not a requirement for zcache, but a long history
of operating systems theory would suggest otherwise.
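
For illustration only (invented names, not zbud or zcache code),
LRU-ordered pageframe reclaim boils down to keeping frames on a
use-ordered list and always evicting from the tail:

#include <stddef.h>

/* Intrusive doubly-linked LRU list: newest at the head, oldest at the
 * tail.  The sentinel must be initialized to point at itself. */
struct lru_node { struct lru_node *prev, *next; };
struct lru_list { struct lru_node head; };

struct frame {                          /* one compressed-page frame */
        struct lru_node lru;
        /* ... compressed buddies, metadata ... */
};

static void lru_insert_head(struct lru_list *l, struct lru_node *n)
{
        n->next = l->head.next;
        n->prev = &l->head;
        l->head.next->prev = n;
        l->head.next = n;
}

/* Reclaim takes the least recently used frame from the tail. */
static struct frame *lru_evict_tail(struct lru_list *l)
{
        struct lru_node *n = l->head.prev;

        if (n == &l->head)
                return NULL;            /* nothing to reclaim */
        n->prev->next = &l->head;
        l->head.prev = n->prev;
        n->prev = n->next = NULL;
        return (struct frame *)((char *)n - offsetof(struct frame, lru));
}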

A third flaw is that the "demo" version has a very poor policy to
determine what pages are "admitted". The demo policy does take into
account the total RAM in the system, but not current memory load
conditions. The new code base IMHO does a better job but discussion
will be in a refereed presentation at the upcoming Plumber's meeting.
The fix for this flaw might be back-portable to the "demo" version
so is not a showstopper in the "demo" version, but fixing it is
not just a cosmetic fix.
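
As a rough illustration of the difference (the names and thresholds
are invented, not taken from either code base), compare a size-only
admission check with a load-aware one:

#include <stdbool.h>
#include <stddef.h>

/* Illustrative state; in the kernel these would come from the VM. */
static size_t pool_pages;       /* pageframes the pool holds now */
static size_t total_ram_pages;  /* total system RAM, in pages */
static size_t free_pages;       /* currently free pages */
static size_t min_free_target;  /* don't grow the pool below this */
static unsigned int max_pool_percent = 20;

/* "demo"-style policy: admit while under a fixed share of total RAM,
 * regardless of how much pressure the system is currently under. */
static bool admit_size_only(void)
{
        return pool_pages < total_ram_pages * max_pool_percent / 100;
}

/* Load-aware policy: also refuse when free memory is already low. */
static bool admit_load_aware(void)
{
        return admit_size_only() && free_pages > min_free_target;
}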

I can add more issues to the list, but will stop here. IMHO
the "demo" zcache is not suitable for promotion from staging,
which is why I spent over two months generating a new code base.
I, perhaps more than anyone else, would like to see zcache used,
by default, by real distros and customers, but I think it is
premature to promote it, especially the old "demo" code.

I do realize, however, that this decision is not mine alone, so I
defer to the community to decide.

Dan

[1] https://lkml.org/lkml/2012/3/22/383
[2] http://lkml.indiana.edu/hypermail/linux/kernel/1203.2/02842.html

2012-08-08 16:30:06

by Seth Jennings

Subject: Re: [PATCH 0/4] promote zcache from staging

On 08/07/2012 04:47 PM, Dan Magenheimer wrote:
> I notice your original published benchmarks [1] include
> N=24, N=28, and N=32, but these updated results do not. Are you planning
> on completing the runs? Second, I now see the numbers I originally
> published for what I thought was the same benchmark as yours are actually
> an order of magnitude larger (in sec) than yours. I didn't notice
> this in March because we were focused on the percent improvement, not
> the raw measurements. Since the hardware is highly similar, I suspect
> it is not a hardware difference but instead that you are compiling
> a much smaller kernel. In other words, your test case is much
> smaller, and so exercises zcache much less. My test case compiles
> a full enterprise kernel... what is yours doing?

I am doing a minimal kernel build for my local hardware
configuration.

With the reduction in RAM, 1GB to 512MB, I didn't need to do
test runs with >20 threads to find the peak of the benefit
curve at 16 threads. Past that, zcache is saturated and I'd
just be burning up my disk. I'm already swapping out about
500MB (i.e. RAM size) in the 20 thread non-zcache case.

Also, I provide the magnitude numbers (pages, seconds) just
to show my source data. The %change numbers are the real
results as they remove build size as a factor.

> At LSFMM, Andrea
> Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
> capabilities and, when it is full, it simply rejects further attempts
> to put data in its cache. He said this is unacceptable for KVM and I
> agreed that it was a flaw that needed to be fixed before zcache should
> be promoted.

KVM (in-tree) is not a current user of zcache. While the
use cases of possible future zcache users should be
considered, I don't think they can be used to prevent promotion.

> A second flaw is that the "demo" zcache has no concept of LRU for
> either cleancache or frontswap pages, or ability to reclaim pageframes
> at all for frontswap pages.
...
>
> A third flaw is that the "demo" version has a very poor policy to
> determine what pages are "admitted".
...
>
> I can add more issues to the list, but will stop here.

All of the flaws you list do not prevent zcache from being
beneficial right now, as my results demonstrate. Therefore,
the flaws listed are really potential improvements and can
be done in mainline after promotion. Even if large changes
are required to make these improvements, they can be made in
mainline in an incremental and public way.

Seth

2012-08-08 17:48:59

by Dan Magenheimer

Subject: RE: [PATCH 0/4] promote zcache from staging

> From: Seth Jennings [mailto:[email protected]]

Hi Seth --

Good discussion. Even though we disagree, I appreciate
your enthusiasm and your good work on the kernel!

> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 08/07/2012 04:47 PM, Dan Magenheimer wrote:
> > I notice your original published benchmarks [1] include
> > N=24, N=28, and N=32, but these updated results do not. Are you planning
> > on completing the runs? Second, I now see the numbers I originally
> > published for what I thought was the same benchmark as yours are actually
> > an order of magnitude larger (in sec) than yours. I didn't notice
> > this in March because we were focused on the percent improvement, not
> > the raw measurements. Since the hardware is highly similar, I suspect
> > it is not a hardware difference but instead that you are compiling
> > a much smaller kernel. In other words, your test case is much
> > smaller, and so exercises zcache much less. My test case compiles
> > a full enterprise kernel... what is yours doing?
>
> I am doing a minimal kernel build for my local hardware
> configuration.
>
> With the reduction in RAM, 1GB to 512MB, I didn't need to do
> test runs with >20 threads to find the peak of the benefit
> curve at 16 threads. Past that, zcache is saturated and I'd
> just be burning up my disk.

I think that's exactly what I said in a snippet of my response
that you deleted. A cache needs to work well both when it
is non-full and when it is full. You are only demonstrating
that it works well when it is non-full. When it is
"saturated", bad things can happen. Finding the "peak of the
benefit" is only half the work of benchmarking.

So it appears you are trying to prove your point by showing
the workloads that look good, while _not_ showing the workloads
that look bad, and then claiming you don't care about those
bad workloads anyway.

> Also, I provide the magnitude numbers (pages, seconds) just
> to show my source data. The %change numbers are the real
> results as they remove build size as a factor.

You'll have to explain what you mean because, if I understand
correctly, this is just not true. Different build sizes
definitely affect memory management differently, just as
different values of N (for make -jN) have an effect.

> > At LSFMM, Andrea
> > Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
> > capabilities and, when it is full, it simply rejects further attempts
> > to put data in its cache. He said this is unacceptable for KVM and I
> > agreed that it was a flaw that needed to be fixed before zcache should
> > be promoted.
>
> KVM (in-tree) is not a current user of zcache. While the
> use cases of possible future zcache users should be
> considered, I don't think they can be used to prevent promotion.

That wasn't my point. Andrea identified the flaw as an issue
of zcache.

> > A second flaw is that the "demo" zcache has no concept of LRU for
> > either cleancache or frontswap pages, or ability to reclaim pageframes
> > at all for frontswap pages.
> ...
> >
> > A third flaw is that the "demo" version has a very poor policy to
> > determine what pages are "admitted".
> ...
> >
> > I can add more issues to the list, but will stop here.
>
> All of the flaws you list do not prevent zcache from being
> beneficial right now, as my results demonstrate. Therefore,
> the flaws listed are really potential improvements and can
> be done in mainline after promotion. Even if large changes
> are required to make these improvements, they can be made in
> mainline in an incremental and public way.

Your results only demonstrate that zcache is beneficial on
the workloads that you chose to present. But using the same
workload with slightly different parameters (-jN or compiling
a larger kernel), zcache can be _detrimental_, and you've chosen
to not measure or present those cases, even though you did
measure and present some of those cases in your first benchmark
runs posted in March (on an earlier kernel).

I can only speak for myself, but this appears disingenuous to me.

Sorry, but FWIW my vote is still a NACK. IMHO zcache needs major
work before it should be promoted, and I think we should be spending
the time fixing the known flaws rather than arguing about promoting
"demo" code.

Dan

2012-08-09 18:50:54

by Seth Jennings

Subject: Re: [PATCH 0/4] promote zcache from staging

On 08/07/2012 03:23 PM, Seth Jennings wrote:
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
>> Some benchmarking numbers demonstrating the I/O saving that can be had
>> with zcache:
>>
>> https://lkml.org/lkml/2012/3/22/383
>
> There was concern that kernel changes external to zcache since v3.3 may
> have mitigated the benefit of zcache. So I re-ran my kernel building
> benchmark and confirmed that zcache is still providing I/O and runtime
> savings.

There was a request made to test with even greater memory pressure to
demonstrate that, at some unknown point, zcache doesn't have real
problems. So I continued out to 32 threads:

N=4..20 is the same data as before except for the pswpin values.
I found a mistake in the way I computed pswpin that changed those
values slightly. However, this didn't change the overall trend.

I also inverted the sign of the %change fields, since they express
change relative to the normal case.

I/O (in pages)
                 normal                               zcache               change
 N   pswpin   pswpout  majflt   I/O sum   pswpin  pswpout  majflt   I/O sum  %I/O
 4        0         2    2116      2118        0        0    2125      2125    0%
 8        0       575    2244      2819        0        4    2219      2223  -21%
12     1979      4038    3226      9243     1269     2519    3871      7659  -17%
16    21568     47278    9426     78272     7770    15598    9372     32740  -58%
20    50307    127797   15039    193143    20224    40634   17975     78833  -59%
24   186278    364809   45052    596139    47406    90489   30877    168772  -72%
28   274734    777815   53112   1105661   134981   307346   63480    505807  -54%
32   988530   2002087  168662   3159279   324801   723385  140288   1188474  -62%

Runtime (in seconds)
 N  normal  zcache  %change
 4     126     127       1%
 8     124     124       0%
12     131     133       2%
16     189     156     -17%
20     261     235     -10%
24     513     288     -44%
28     556     434     -22%
32    1463     745     -49%

%CPU utilization (out of 400% on 4 cpus)
 N  normal  zcache  %change
 4     254     253       0%
 8     261     263       1%
12     250     248      -1%
16     173     211      22%
20     124     140      13%
24      64     114      78%
28      59      76      29%
32      23      45      96%

The ~60% I/O savings holds even out to 32 threads, at which point the
non-zcache case has 12GB of I/O and is taking 12x longer to complete.
Additionally, the runtime savings increases significantly beyond 20
threads, even though the absolute runtime is suboptimal due to the
extreme memory pressure.

Seth

2012-08-09 20:21:50

by Dan Magenheimer

Subject: RE: [PATCH 0/4] promote zcache from staging

> From: Seth Jennings [mailto:[email protected]]
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 08/07/2012 03:23 PM, Seth Jennings wrote:
> > On 07/27/2012 01:18 PM, Seth Jennings wrote:
> >> Some benchmarking numbers demonstrating the I/O saving that can be had
> >> with zcache:
> >>
> >> https://lkml.org/lkml/2012/3/22/383
> >
> > There was concern that kernel changes external to zcache since v3.3 may
> > have mitigated the benefit of zcache. So I re-ran my kernel building
> > benchmark and confirmed that zcache is still providing I/O and runtime
> > savings.
>
> There was a request made to test with even greater memory pressure to
> demonstrate that, at some unknown point, zcache doesn't have real
> problems. So I continued out to 32 threads:

Hi Seth --

Thanks for continuing the benchmark runs out to 24-32 threads.

> Runtime (in seconds)
> N  normal  zcache  %change
>  4     126     127       1%

> threads, even though the absolute runtime is suboptimal due to the
> extreme memory pressure.

I am not in a position right now to reproduce your results or
mine (due to a house move which is limiting my time and access
to my test machines, plus two presentations later this month at
Linuxcon NA and Plumbers) but I still don't think you've really
saturated the cache, which is when the extreme memory pressure
issues will show up in zcache. I suspect that adding more threads
to a minimal kernel compile doesn't increase the memory pressure as
much as I was seeing, so you're not seeing what I was seeing:
the zcache numbers climb to as much as 150% WORSE than non-zcache.
In experiments with other variations of this workload, I have seen
four-fold degradations and worse.

My test case is a kernel compile using a full OL kernel config
file, which is roughly equivalent to a RHEL6 config. Compiling
this kernel, using similar hardware, I have never seen a runtime
less than ~800 seconds for any value of N. I suspect that my
test case, having much more source to compile, gives each of the N
threads in a "make -jN" more work to do in parallel.

Since your test harness is obviously all set up, would you be
willing to reproduce your/my non-zcache/zcache runs with a RHEL6
config file and publish the results (using a 3.5 zcache)?

IIRC, the really bad zcache results started showing up at N=24.
I also wonder if you have anything else unusual in your
test setup, such as a fast swap disk (mine is a partition
on the same rotating disk as source and target of the kernel build,
the default install for a RHEL6 system)? Or have you disabled
cleancache? Or have you changed any sysfs parameters or
other kernel files? Also, whether zcache or non-zcache,
I've noticed that the runtime of this workload when swapping
can vary by as much as 30-40%, so it would be wise to take at
least three samples to ensure a statistically valid comparison.
And are you using 512M of physical memory or relying on
kernel boot parameters to reduce visible memory... and
if the latter have you confirmed with /proc/meminfo?
Obviously, I'm baffled at the difference in our observations.

While I am always willing to admit that my numbers may be wrong,
I still can't imagine why you are in such a hurry to promote
zcache when these questions are looming. Would you care to
explain why? It seems reckless to me, and unlike the IBM
behavior I expect, so I really wonder about the motivation.

My goal is very simple: "First do no harm". I don't think
zcache should be enabled for distros (and users) until we can
reasonably demonstrate that running a workload with zcache
is never substantially worse than running the same workload
without zcache. If you can tell your customer: "Yes, always enable
zcache", great! But if you have to tell your customer: "It
depends on the workload, enable it if it works for you, disable
it otherwise", then zcache will get a bad reputation, and
will/should never be enabled in a reputable non-hobbyist distro.
I fear the "demo" zcache will get a bad reputation, so I prefer
to delay promotion while there is serious doubt about whether
"harm" may occur.

Last, you've never explained what problems zcache solves
for you that zram does not. With Minchan pushing for
the promotion of zram+zsmalloc, does zram solve your problem?
Another alternative might be to promote zcache as "demozcache"
(i.e. fork it for now).

It's hard to identify a reasonable compromise when you
are just saying "Gotta promote zcache NOW!" and not
explaining the problem you are trying to solve or motivations
behind it.

OK, Seth, I think all my cards are on the table. Where's yours?
(And, hello, is anyone else following this anyway? :-)

Thanks,
Dan

2012-08-10 18:15:15

by Seth Jennings

Subject: Re: [PATCH 0/4] promote zcache from staging

On 08/09/2012 03:20 PM, Dan Magenheimer wrote
> I also wonder if you have anything else unusual in your
> test setup, such as a fast swap disk (mine is a partition
> on the same rotating disk as source and target of the kernel build,
> the default install for a RHEL6 system)?

I'm using a normal SATA HDD with two partitions, one for
swap and the other an ext3 filesystem with the kernel source.

> Or have you disabled cleancache?

Yes, I _did_ disable cleancache. I could see where having
cleancache enabled could explain the difference in results.

> Or have you changed any sysfs parameters or
> other kernel files?

No.

> And are you using 512M of physical memory or relying on
> kernel boot parameters to reduce visible memory

Limited with mem=512M boot parameter.

> ... and
> if the latter have you confirmed with /proc/meminfo?

Yes, confirmed.

Seth

2012-08-15 10:42:08

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 0/4] promote zcache from staging

On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
>
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
>
> > Or have you disabled cleancache?
>
> Yes, I _did_ disable cleancache. I could see where having
> cleancache enabled could explain the difference in results.

Why did you disable cleancache? Having both cleancache (to compress
fs data) and frontswap (to compress swap data) is the goal, yet you
turned one of its sources off.

2012-08-15 14:33:16

by Seth Jennings

Subject: Re: [PATCH 0/4] promote zcache from staging

On 08/15/2012 04:38 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
>> On 08/09/2012 03:20 PM, Dan Magenheimer wrote
>>> I also wonder if you have anything else unusual in your
>>> test setup, such as a fast swap disk (mine is a partition
>>> on the same rotating disk as source and target of the kernel build,
>>> the default install for a RHEL6 system)?
>>
>> I'm using a normal SATA HDD with two partitions, one for
>> swap and the other an ext3 filesystem with the kernel source.
>>
>>> Or have you disabled cleancache?
>>
>> Yes, I _did_ disable cleancache. I could see where having
>> cleancache enabled could explain the difference in results.
>
> Why did you disable cleancache? Having both cleancache (to compress
> fs data) and frontswap (to compress swap data) is the goal, yet you
> turned one of its sources off.

I excluded cleancache to reduce interference/noise from the
benchmarking results. For this particular workload,
cleancache doesn't make a lot of sense since it will steal
pages that could otherwise be used for storing frontswap
pages to prevent swapin/swapout I/O.

In a test run with both enabled, I found that it didn't make
much difference under moderate to extreme memory pressure.
Both resulted in about 55% I/O reduction. However, under light
memory pressure with 8 and 12 threads, it lowered the I/O
reduction ability of zcache to roughly 0 compared to ~20%
I/O reduction without cleancache.

In short, cleancache only had the power to harm in this
case, so I didn't enable it.

Seth

2012-08-17 22:22:11

by Dan Magenheimer

Subject: RE: [PATCH 0/4] promote zcache from staging

> From: Seth Jennings [mailto:[email protected]]
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
>
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
>
> > Or have you disabled cleancache?
>
> Yes, I _did_ disable cleancache. I could see where having
> cleancache enabled could explain the difference in results.

Sorry to beat a dead horse, but I meant to report this
earlier in the week and got tied up by other things.

I finally got my test scaffold set up earlier this week
to try to reproduce my "bad" numbers with the RHEL6-ish
config file.

I found that with "make -j28" and "make -j32" I experienced
__DATA CORRUPTION__. This was repeatable.

The type of error led me to believe that the problem was
due to concurrency of cleancache reclaim. I did not try
with cleancache disabled to prove/support this theory
but it is consistent with the fact that you (Seth) have not
seen a similar problem and have disabled cleancache.

While this problem is most likely in my code and I am
suitably chagrined, it re-emphasizes the fact that
the current zcache in staging is 20-month old "demo"
code. The proposed new zcache codebase handles concurrency
much more effectively.

I'll be away from email for a few days now.

Dan

2012-08-17 23:33:35

by Seth Jennings

Subject: Re: [PATCH 0/4] promote zcache from staging

On 08/17/2012 05:21 PM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:[email protected]]
>> Subject: Re: [PATCH 0/4] promote zcache from staging
>>
>> On 08/09/2012 03:20 PM, Dan Magenheimer wrote
>>> I also wonder if you have anything else unusual in your
>>> test setup, such as a fast swap disk (mine is a partition
>>> on the same rotating disk as source and target of the kernel build,
>>> the default install for a RHEL6 system)?
>>
>> I'm using a normal SATA HDD with two partitions, one for
>> swap and the other an ext3 filesystem with the kernel source.
>>
>>> Or have you disabled cleancache?
>>
>> Yes, I _did_ disable cleancache. I could see where having
>> cleancache enabled could explain the difference in results.
>
> Sorry to beat a dead horse, but I meant to report this
> earlier in the week and got tied up by other things.
>
> I finally got my test scaffold set up earlier this week
> to try to reproduce my "bad" numbers with the RHEL6-ish
> config file.
>
> I found that with "make -j28" and "make -j32" I experienced
> __DATA CORRUPTION__. This was repeatable.

I actually hit this for the first time a few hours ago when
I was running performance tests on your rewrite. I didn't know
what to make of it yet. The 24-thread kernel build failed
when both frontswap and cleancache were enabled.

> The type of error led me to believe that the problem was
> due to concurrency of cleancache reclaim. I did not try
> with cleancache disabled to prove/support this theory
> but it is consistent with the fact that you (Seth) have not
> seen a similar problem and have disabled cleancache.
>
> While this problem is most likely in my code and I am
> suitably chagrined, it re-emphasizes the fact that
> the current zcache in staging is 20-month old "demo"
> code. The proposed new zcache codebase handles concurrency
> much more effectively.

I imagine this can be solved without rewriting the entire
codebase. If your new code contains a fix for this, can we
just pull it as a single patch?

Seth

2012-08-18 19:10:12

by Dan Magenheimer

Subject: RE: [PATCH 0/4] promote zcache from staging

> From: Seth Jennings [mailto:[email protected]]
> Sent: Friday, August 17, 2012 5:33 PM
> To: Dan Magenheimer
> Cc: Greg Kroah-Hartman; Andrew Morton; Nitin Gupta; Minchan Kim; Konrad Wilk; Robert Jennings; linux-
> [email protected]; [email protected]; [email protected]; Kurt Hackel
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> >
> > Sorry to beat a dead horse, but I meant to report this
> > earlier in the week and got tied up by other things.
> >
> > I finally got my test scaffold set up earlier this week
> > to try to reproduce my "bad" numbers with the RHEL6-ish
> > config file.
> >
> > I found that with "make -j28" and "make -j32" I experienced
> > __DATA CORRUPTION__. This was repeatable.
>
> I actually hit this for the first time a few hours ago when
> I was running performance tests on your rewrite. I didn't know
> what to make of it yet. The 24-thread kernel build failed
> when both frontswap and cleancache were enabled.
>
> > The type of error led me to believe that the problem was
> > due to concurrency of cleancache reclaim. I did not try
> > with cleancache disabled to prove/support this theory
> > but it is consistent with the fact that you (Seth) have not
> > seen a similar problem and have disabled cleancache.
> >
> > While this problem is most likely in my code and I am
> > suitably chagrined, it re-emphasizes the fact that
> > the current zcache in staging is 20-month old "demo"
> > code. The proposed new zcache codebase handles concurrency
> > much more effectively.
>
> I imagine this can be solved without rewriting the entire
> codebase. If your new code contains a fix for this, can we
> just pull it as a single patch?

Hi Seth --

I didn't even observe this before this week, let alone fix this
as an individual bug. The redesign handles LRU ordering and zombie
pageframes (which still hold valid pointers to the contained zbuds,
and possibly valid data, so they can't be recycled yet), and takes
races and concurrency carefully into account.
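
Roughly speaking (an illustrative sketch of the idea, with invented
names), a zombie frame is one that cannot be recycled until both of
its buddies are gone:

#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: a zbud-style frame packs up to two compressed
 * pages ("buddies") into one pageframe. */
struct zbud_frame {
        void *buddy[2];         /* compressed objects in this frame */
        int refcount[2];        /* outstanding references per buddy */
};

/* A frame may linger on the LRU as a "zombie": its pointers (and
 * possibly its data) are still valid, so it can only be recycled
 * once both buddies are freed and unreferenced. */
static bool frame_can_recycle(const struct zbud_frame *f)
{
        return !f->buddy[0] && !f->buddy[1] &&
               !f->refcount[0] && !f->refcount[1];
}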

The demo codebase is pretty dumb about concurrency, really
a hack that seemed to work. Given the above, I guess the
hack only works _most_ of the time... and when it doesn't,
data corruption can occur.

It would be an interesting challenge, but likely very
time-consuming, to fix this one bug while minimizing other
changes so that the fix could be delivered as a self-contained
incremental patch. I suspect if you try, you will learn why
the rewrite was preferable and necessary.

(Away from email for a few days very soon now.)
Dan