I missed Hugh.
Minchan Kim wrote:
> Long time ago, we counted zero page as file_rss.
> But after reinstanted zero page, we don't do it.
> It means rss of process would be smaller than old.
>
> It could chage OOM victim selection.
>
> Kame reported following as
> "Before starting zero-page works, I checked "questions" in lkml and
> found some reports that some applications start to go OOM after zero-page
> removal.
>
> For me, I know one of my customer's application depends on behavior of
> zero page (on RHEL5). So, I tried to add again it before RHEL6 because
> I think removal of zero-page corrupts compatibility."
>
> So how about adding zero page as file_rss again for compatibility?
>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/memory.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3743fb5..a4ba271 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1995,6 +1995,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> int reuse = 0, ret = 0;
> int page_mkwrite = 0;
> struct page *dirty_page = NULL;
> + int zero_pfn = 0;
>
> old_page = vm_normal_page(vma, address, orig_pte);
> if (!old_page) {
> @@ -2117,7 +2118,8 @@ gotten:
> if (unlikely(anon_vma_prepare(vma)))
> goto oom;
>
> - if (is_zero_pfn(pte_pfn(orig_pte))) {
> + zero_pfn = is_zero_pfn(pte_pfn(orig_pte));
> + if (zero_pfn) {
> new_page = alloc_zeroed_user_highpage_movable(vma, address);
> if (!new_page)
> goto oom;
> @@ -2147,7 +2149,7 @@ gotten:
> */
> page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> if (likely(pte_same(*page_table, orig_pte))) {
> - if (old_page) {
> + if (old_page || zero_pfn) {
> if (!PageAnon(old_page)) {
> dec_mm_counter(mm, file_rss);
> inc_mm_counter(mm, anon_rss);
> @@ -2650,6 +2652,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
> spin_lock(ptl);
> if (!pte_none(*page_table))
> goto unlock;
> + inc_mm_counter(mm, file_rss);
> goto setpte;
> }
>
On Mon, 28 Dec 2009, Minchan Kim wrote:
> I missed Hugh.
Thank you: it is sweet of you to say so :)
>
> Minchan Kim wrote:
> > Long time ago, we counted zero page as file_rss.
> > But after reinstanted zero page, we don't do it.
> > It means rss of process would be smaller than old.
> >
> > It could chage OOM victim selection.
Eh? We don't use rss for OOM victim selection, we use total_vm.
I know that's under discussion, and there are good arguments on
both sides (I incline to the rss side, but see David's point about
predictability); but here you seem to be making an argument for
back-compatibility, yet there is no such issue in OOM victim selection.
And if we do decide that rss is appropriate for OOM victim selection,
then we would prefer to keep the ZERO_PAGE out of rss, wouldn't we?
> >
> > Kame reported following as
> > "Before starting zero-page works, I checked "questions" in lkml and
> > found some reports that some applications start to go OOM after zero-page
> > removal.
> >
> > For me, I know one of my customer's application depends on behavior of
> > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
> > I think removal of zero-page corrupts compatibility."
> >
> > So how about adding zero page as file_rss again for compatibility?
I think not.
KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
wrong, but I don't think his customer's OOMs had anything to do with
whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
from the fact that many pages were being used up where just the one
ZERO_PAGE had been good before. Wouldn't he have complained if the
zero_pfn patches hadn't solved that problem?
You are right that I completely overlooked the issue of whether to
include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
it was just natural to leave it out); and I overlooked the fact that
it used to be counted into file_rss in the old days (being !PageAnon).
So I'm certainly at fault for that, and thank you for bringing the
issue to attention; but once considered, I can't actually see a good
reason why we should add code to count ZERO_PAGEs into file_rss now.
And if this patch falls, then 1/3 and 3/3 would fall also.
And the patch below would be incomplete anyway, wouldn't it?
There would need to be a matching change to zap_pte_range(),
but I don't see that.
We really don't want to be adding more and more ZERO_PAGE/zero_pfn
tests around the place if we can avoid them: KOSAKI-san has a strong
argument for adding such a test in kernel/futex.c, but I don't the
argument here.
Hugh
> >
> > Signed-off-by: Minchan Kim <[email protected]>
> > ---
> > mm/memory.c | 7 +++++--
> > 1 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 3743fb5..a4ba271 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1995,6 +1995,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > int reuse = 0, ret = 0;
> > int page_mkwrite = 0;
> > struct page *dirty_page = NULL;
> > + int zero_pfn = 0;
> >
> > old_page = vm_normal_page(vma, address, orig_pte);
> > if (!old_page) {
> > @@ -2117,7 +2118,8 @@ gotten:
> > if (unlikely(anon_vma_prepare(vma)))
> > goto oom;
> >
> > - if (is_zero_pfn(pte_pfn(orig_pte))) {
> > + zero_pfn = is_zero_pfn(pte_pfn(orig_pte));
> > + if (zero_pfn) {
> > new_page = alloc_zeroed_user_highpage_movable(vma, address);
> > if (!new_page)
> > goto oom;
> > @@ -2147,7 +2149,7 @@ gotten:
> > */
> > page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > if (likely(pte_same(*page_table, orig_pte))) {
> > - if (old_page) {
> > + if (old_page || zero_pfn) {
> > if (!PageAnon(old_page)) {
> > dec_mm_counter(mm, file_rss);
> > inc_mm_counter(mm, anon_rss);
> > @@ -2650,6 +2652,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > spin_lock(ptl);
> > if (!pte_none(*page_table))
> > goto unlock;
> > + inc_mm_counter(mm, file_rss);
> > goto setpte;
> > }
> >
On Thu, Dec 31, 2009 at 1:49 AM, Hugh Dickins
<[email protected]> wrote:
> On Mon, 28 Dec 2009, Minchan Kim wrote:
>> I missed Hugh.
>
> Thank you: it is sweet of you to say so :)
>
>>
>> Minchan Kim wrote:
>> > Long time ago, we counted zero page as file_rss.
>> > But after reinstanted zero page, we don't do it.
>> > It means rss of process would be smaller than old.
>> >
>> > It could chage OOM victim selection.
>
> Eh? We don't use rss for OOM victim selection, we use total_vm.
>
> I know that's under discussion, and there are good arguments on
> both sides (I incline to the rss side, but see David's point about
> predictability); but here you seem to be making an argument for
> back-compatibility, yet there is no such issue in OOM victim selection.
Sorry, I totally confused that.
>
> And if we do decide that rss is appropriate for OOM victim selection,
> then we would prefer to keep the ZERO_PAGE out of rss, wouldn't we?
If we start to use RSS, it's good that keep zero page out of rss in OOM aspect.
But I am not sure it's good in smap aspect.
Some smap user might want to know max memory usage in process.
Zero page has a possibility to change real rss.
>
>> >
>> > Kame reported following as
>> > "Before starting zero-page works, I checked "questions" in lkml and
>> > found some reports that some applications start to go OOM after zero-page
>> > removal.
>> >
>> > For me, I know one of my customer's application depends on behavior of
>> > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
>> > I think removal of zero-page corrupts compatibility."
>> >
>> > So how about adding zero page as file_rss again for compatibility?
>
> I think not.
At least, we had accounted zero page as file_rss until remove zero page.
That was my concern.
I think we have to fix this for above compatibility regardless of OOM issue.
>
> KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
> wrong, but I don't think his customer's OOMs had anything to do with
> whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
> from the fact that many pages were being used up where just the one
> ZERO_PAGE had been good before. Wouldn't he have complained if the
> zero_pfn patches hadn't solved that problem?
>
> You are right that I completely overlooked the issue of whether to
> include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
> it was just natural to leave it out); and I overlooked the fact that
> it used to be counted into file_rss in the old days (being !PageAnon).
>
> So I'm certainly at fault for that, and thank you for bringing the
> issue to attention; but once considered, I can't actually see a good
> reason why we should add code to count ZERO_PAGEs into file_rss now.
> And if this patch falls, then 1/3 and 3/3 would fall also.
>
> And the patch below would be incomplete anyway, wouldn't it?
> There would need to be a matching change to zap_pte_range(),
> but I don't see that.
Thanks.
If we think this patch is need, I will repost path with fix it.
What do you think?
--
Kind regards,
Minchan Kim
On Thu, 31 Dec 2009, Minchan Kim wrote:
> On Thu, Dec 31, 2009 at 1:49 AM, Hugh Dickins
> <[email protected]> wrote:
> >
> > You are right that I completely overlooked the issue of whether to
> > include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
> > it was just natural to leave it out); and I overlooked the fact that
> > it used to be counted into file_rss in the old days (being !PageAnon).
> >
> > So I'm certainly at fault for that, and thank you for bringing the
> > issue to attention; but once considered, I can't actually see a good
> > reason why we should add code to count ZERO_PAGEs into file_rss now.
> > And if this patch falls, then 1/3 and 3/3 would fall also.
> >
> > And the patch below would be incomplete anyway, wouldn't it?
> > There would need to be a matching change to zap_pte_range(),
> > but I don't see that.
>
> Thanks.
> If we think this patch is need, I will repost path with fix it.
>
> What do you think?
If someone comes up with a convincing case in which their system
is behaving significantly worse because the zero page is not being
counted in rss now, then we shall probably want your patch.
But I still don't yet see a reason to add code just to keep the
anon_rss+file_rss number looking the same as it was before, in this
exceptional case where there's a significant number of zero pages.
Hugh
On Wed, 30 Dec 2009 16:49:52 +0000 (GMT)
Hugh Dickins <[email protected]> wrote:
> > >
> > > Kame reported following as
> > > "Before starting zero-page works, I checked "questions" in lkml and
> > > found some reports that some applications start to go OOM after zero-page
> > > removal.
> > >
> > > For me, I know one of my customer's application depends on behavior of
> > > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
> > > I think removal of zero-page corrupts compatibility."
> > >
> > > So how about adding zero page as file_rss again for compatibility?
>
> I think not.
>
> KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
> wrong, but I don't think his customer's OOMs had anything to do with
> whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
> from the fact that many pages were being used up where just the one
> ZERO_PAGE had been good before. Wouldn't he have complained if the
> zero_pfn patches hadn't solved that problem?
>
> You are right that I completely overlooked the issue of whether to
> include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
> it was just natural to leave it out); and I overlooked the fact that
> it used to be counted into file_rss in the old days (being !PageAnon).
>
> So I'm certainly at fault for that, and thank you for bringing the
> issue to attention; but once considered, I can't actually see a good
> reason why we should add code to count ZERO_PAGEs into file_rss now.
> And if this patch falls, then 1/3 and 3/3 would fall also.
>
> And the patch below would be incomplete anyway, wouldn't it?
> There would need to be a matching change to zap_pte_range(),
> but I don't see that.
>
> We really don't want to be adding more and more ZERO_PAGE/zero_pfn
> tests around the place if we can avoid them: KOSAKI-san has a strong
> argument for adding such a test in kernel/futex.c, but I don't the
> argument here.
>
I agree that ZERO_PAGE shouldn't be counted as rss. Now, I feel that old
counting method(in old zero-page implementation) was bad.
Minchan-san, I'm sorry for noise.
Thanks,
-Kame
On Mon, Jan 4, 2010 at 8:43 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 30 Dec 2009 16:49:52 +0000 (GMT)
> Hugh Dickins <[email protected]> wrote:
>
>> > >
>> > > Kame reported following as
>> > > "Before starting zero-page works, I checked "questions" in lkml and
>> > > found some reports that some applications start to go OOM after zero-page
>> > > removal.
>> > >
>> > > For me, I know one of my customer's application depends on behavior of
>> > > zero page (on RHEL5). So, I tried to add again it before RHEL6 because
>> > > I think removal of zero-page corrupts compatibility."
>> > >
>> > > So how about adding zero page as file_rss again for compatibility?
>>
>> I think not.
>>
>> KAMEZAWA-san can correct me (when he returns in the New Year) if I'm
>> wrong, but I don't think his customer's OOMs had anything to do with
>> whether the ZERO_PAGE was counted in file_rss or not: the OOMs came
>> from the fact that many pages were being used up where just the one
>> ZERO_PAGE had been good before. Wouldn't he have complained if the
>> zero_pfn patches hadn't solved that problem?
>>
>> You are right that I completely overlooked the issue of whether to
>> include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
>> it was just natural to leave it out); and I overlooked the fact that
>> it used to be counted into file_rss in the old days (being !PageAnon).
>>
>> So I'm certainly at fault for that, and thank you for bringing the
>> issue to attention; but once considered, I can't actually see a good
>> reason why we should add code to count ZERO_PAGEs into file_rss now.
>> And if this patch falls, then 1/3 and 3/3 would fall also.
>>
>> And the patch below would be incomplete anyway, wouldn't it?
>> There would need to be a matching change to zap_pte_range(),
>> but I don't see that.
>>
>> We really don't want to be adding more and more ZERO_PAGE/zero_pfn
>> tests around the place if we can avoid them: KOSAKI-san has a strong
>> argument for adding such a test in kernel/futex.c, but I don't the
>> argument here.
>>
>
> I agree that ZERO_PAGE shouldn't be counted as rss. Now, I feel that old
> counting method(in old zero-page implementation) was bad.
>
> Minchan-san, I'm sorry for noise.
That's all right.
It was my mistake.
I will drop this and repost Matt and Hugh's ACK version.
Thanks for all. :)
>
> Thanks,
> -Kame
>
>
>
>
--
Kind regards,
Minchan Kim
On Thu, Dec 31, 2009 at 5:43 PM, Hugh Dickins
<[email protected]> wrote:
> On Thu, 31 Dec 2009, Minchan Kim wrote:
>> On Thu, Dec 31, 2009 at 1:49 AM, Hugh Dickins
>> <[email protected]> wrote:
>> >
>> > You are right that I completely overlooked the issue of whether to
>> > include the ZERO_PAGE in rss counts (now being a !vm_normal_page,
>> > it was just natural to leave it out); and I overlooked the fact that
>> > it used to be counted into file_rss in the old days (being !PageAnon).
>> >
>> > So I'm certainly at fault for that, and thank you for bringing the
>> > issue to attention; but once considered, I can't actually see a good
>> > reason why we should add code to count ZERO_PAGEs into file_rss now.
>> > And if this patch falls, then 1/3 and 3/3 would fall also.
>> >
>> > And the patch below would be incomplete anyway, wouldn't it?
>> > There would need to be a matching change to zap_pte_range(),
>> > but I don't see that.
>>
>> Thanks.
>> If we think this patch is need, I will repost path with fix it.
>>
>> What do you think?
>
> If someone comes up with a convincing case in which their system
> is behaving significantly worse because the zero page is not being
> counted in rss now, then we shall probably want your patch.
> But I still don't yet see a reason to add code just to keep the
> anon_rss+file_rss number looking the same as it was before, in this
> exceptional case where there's a significant number of zero pages.
Okay. I understand your point.
I will repost this patch when we need this. :)
Thanks for careful review, Hugh.
>
> Hugh
>
--
Kind regards,
Minchan Kim