2008-06-19 18:16:14

by Eric Paris

[permalink] [raw]
Subject: [PATCH] Hugetlb: stop race when reading proc meminfo

minor nit that hugetlb_report_{,node_}meminfo() does not lock the
reading of nr_huge_pages, free_huge_pages, and friends so this is not an
atomic set of information put into the buffer. If /proc/meminfo is read
while the number of hugetlb pages is in flux it is possible to get
incorrect output such as:

HugePages_Total: 7
HugePages_Free: 8
HugePages_Rsvd: 0
Hugepagesize: 4096 kB

(test available at https://bugzilla.redhat.com/attachment.cgi?id=309864)

With the patch we beat on a number of boxes for hours with the above
test and saw no inconsistencies in the meminfo output.

Signed-off-by: Eric Paris <[email protected]>

---

mm/hugetlb.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ab17127..fe074d1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -687,7 +687,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,

int hugetlb_report_meminfo(char *buf)
{
- return sprintf(buf,
+ int ret;
+ spin_lock(&hugetlb_lock);
+ ret = sprintf(buf,
"HugePages_Total: %5lu\n"
"HugePages_Free: %5lu\n"
"HugePages_Rsvd: %5lu\n"
@@ -698,17 +700,23 @@ int hugetlb_report_meminfo(char *buf)
resv_huge_pages,
surplus_huge_pages,
HPAGE_SIZE/1024);
+ spin_unlock(&hugetlb_lock);
+ return ret;
}

int hugetlb_report_node_meminfo(int nid, char *buf)
{
- return sprintf(buf,
+ int ret;
+ spin_lock(&hugetlb_lock);
+ ret = sprintf(buf,
"Node %d HugePages_Total: %5u\n"
"Node %d HugePages_Free: %5u\n"
"Node %d HugePages_Surp: %5u\n",
nid, nr_huge_pages_node[nid],
nid, free_huge_pages_node[nid],
nid, surplus_huge_pages_node[nid]);
+ spin_unlock(&hugetlb_lock);
+ return ret;
}

/* Return the number pages of memory we physically have, in PAGE_SIZE units. */


2008-07-02 22:51:55

by Nish Aravamudan

[permalink] [raw]
Subject: Re: [PATCH] Hugetlb: stop race when reading proc meminfo

On 6/19/08, Eric Paris <[email protected]> wrote:
> minor nit that hugetlb_report_{,node_}meminfo() does not lock the
> reading of nr_huge_pages, free_huge_pages, and friends so this is not an
> atomic set of information put into the buffer. If /proc/meminfo is read
> while the number of hugetlb pages is in flux it is possible to get
> incorrect output such as:
>
> HugePages_Total: 7
> HugePages_Free: 8
> HugePages_Rsvd: 0
> Hugepagesize: 4096 kB
>
> (test available at https://bugzilla.redhat.com/attachment.cgi?id=309864)
>
> With the patch we beat on a number of boxes for hours with the above
> test and saw no inconsistencies in the meminfo output.
>
> Signed-off-by: Eric Paris <[email protected]>

While I understand the spirit of this patch, I'm not sure it's really
necessary. We've never bothered locking the output in the proc file
before because the information is inherently racy. It only gives you a
view of what was...if any application tries to make a decision based
upon that information (unless it has global control of the system, or
is a testcase, arguably), then it is already racy -- that is, the
numbers you read at one point in time have no bearing on whether those
pages will actually be free/available, etc when you actually need
them.

That being said, I don't feel to strongly about it. Just seems like it
might be unnecessary for an interface we know not to be fully accurate
to begin with, but maybe consistency is worth a bit of extra locking.

That actually also leads me to wonder if maybe the locking is not
there currently to avoid letting readers of proc files from blocking
users of hugepages?

Thanks,
Nish

2008-07-03 14:14:46

by Adam Litke

[permalink] [raw]
Subject: Re: [PATCH] Hugetlb: stop race when reading proc meminfo

I agree with Nish on both of his points. While it is slightly
inconvenient for /proc/meminfo to occasionally display hugetlb pool
counters that, together, don't make sense, it's worth living with for
the two reasons he stated. These counters are for informational
purposes only and relying upon them will already get you into trouble
in many other ways. We also don't want to give any user the privilege
to mount a denial of service attack on hugetlb applications.

On Wed, Jul 2, 2008 at 5:51 PM, Nish Aravamudan
<[email protected]> wrote:
> On 6/19/08, Eric Paris <[email protected]> wrote:
>> minor nit that hugetlb_report_{,node_}meminfo() does not lock the
>> reading of nr_huge_pages, free_huge_pages, and friends so this is not an
>> atomic set of information put into the buffer. If /proc/meminfo is read
>> while the number of hugetlb pages is in flux it is possible to get
>> incorrect output such as:
>>
>> HugePages_Total: 7
>> HugePages_Free: 8
>> HugePages_Rsvd: 0
>> Hugepagesize: 4096 kB
>>
>> (test available at https://bugzilla.redhat.com/attachment.cgi?id=309864)
>>
>> With the patch we beat on a number of boxes for hours with the above
>> test and saw no inconsistencies in the meminfo output.
>>
>> Signed-off-by: Eric Paris <[email protected]>
>
> While I understand the spirit of this patch, I'm not sure it's really
> necessary. We've never bothered locking the output in the proc file
> before because the information is inherently racy. It only gives you a
> view of what was...if any application tries to make a decision based
> upon that information (unless it has global control of the system, or
> is a testcase, arguably), then it is already racy -- that is, the
> numbers you read at one point in time have no bearing on whether those
> pages will actually be free/available, etc when you actually need
> them.
>
> That being said, I don't feel to strongly about it. Just seems like it
> might be unnecessary for an interface we know not to be fully accurate
> to begin with, but maybe consistency is worth a bit of extra locking.
>
> That actually also leads me to wonder if maybe the locking is not
> there currently to avoid letting readers of proc files from blocking
> users of hugepages?
>
> Thanks,
> Nish
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>



--
--
Adam Litke ( agl at us.ibm.com )
IBM Linux Technology Center

2008-07-08 13:54:43

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH] Hugetlb: stop race when reading proc meminfo

On Thu, 2008-07-03 at 09:13 -0500, Adam Litke wrote:
> I agree with Nish on both of his points. While it is slightly
> inconvenient for /proc/meminfo to occasionally display hugetlb pool
> counters that, together, don't make sense, it's worth living with for
> the two reasons he stated. These counters are for informational
> purposes only and relying upon them will already get you into trouble
> in many other ways. We also don't want to give any user the privilege
> to mount a denial of service attack on hugetlb applications.

I'm fine with this assessment. Andrew feel free to throw this out of
-mm and forget about it forever.

-Eric


>
> On Wed, Jul 2, 2008 at 5:51 PM, Nish Aravamudan
> <[email protected]> wrote:
> > On 6/19/08, Eric Paris <[email protected]> wrote:
> >> minor nit that hugetlb_report_{,node_}meminfo() does not lock the
> >> reading of nr_huge_pages, free_huge_pages, and friends so this is not an
> >> atomic set of information put into the buffer. If /proc/meminfo is read
> >> while the number of hugetlb pages is in flux it is possible to get
> >> incorrect output such as:
> >>
> >> HugePages_Total: 7
> >> HugePages_Free: 8
> >> HugePages_Rsvd: 0
> >> Hugepagesize: 4096 kB
> >>
> >> (test available at https://bugzilla.redhat.com/attachment.cgi?id=309864)
> >>
> >> With the patch we beat on a number of boxes for hours with the above
> >> test and saw no inconsistencies in the meminfo output.
> >>
> >> Signed-off-by: Eric Paris <[email protected]>
> >
> > While I understand the spirit of this patch, I'm not sure it's really
> > necessary. We've never bothered locking the output in the proc file
> > before because the information is inherently racy. It only gives you a
> > view of what was...if any application tries to make a decision based
> > upon that information (unless it has global control of the system, or
> > is a testcase, arguably), then it is already racy -- that is, the
> > numbers you read at one point in time have no bearing on whether those
> > pages will actually be free/available, etc when you actually need
> > them.
> >
> > That being said, I don't feel to strongly about it. Just seems like it
> > might be unnecessary for an interface we know not to be fully accurate
> > to begin with, but maybe consistency is worth a bit of extra locking.
> >
> > That actually also leads me to wonder if maybe the locking is not
> > there currently to avoid letting readers of proc files from blocking
> > users of hugepages?
> >
> > Thanks,
> > Nish
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
>