2013-09-27 13:17:49

by Kirill A. Shutemov

Subject: [PATCHv4 00/10] split page table lock for PMD tables

Alex Thorlton noticed that some massively threaded workloads perform poorly
if THP is enabled. This patchset fixes that by introducing a split page
table lock for PMD tables. As of v4, hugetlbfs is converted to the new
locking as well.

This patchset is based on work by Naoya Horiguchi.

Please review and consider applying.

Changes:
v4:
- convert hugetlb to new locking;
v3:
- fix USE_SPLIT_PMD_PTLOCKS;
- fix warning in fs/proc/task_mmu.c;
v2:
- reuse CONFIG_SPLIT_PTLOCK_CPUS for PMD split lock;
- s/huge_pmd_lock/pmd_lock/g;
- assume pgtable_pmd_page_ctor() can fail;
- fix format line in task_mem() for VmPTE;

THP off, v3.12-rc2:
-------------------

Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

1037072.835207 task-clock # 57.426 CPUs utilized ( +- 3.59% )
95,093 context-switches # 0.092 K/sec ( +- 3.93% )
140 cpu-migrations # 0.000 K/sec ( +- 5.28% )
10,000,550 page-faults # 0.010 M/sec ( +- 0.00% )
2,455,210,400,261 cycles # 2.367 GHz ( +- 3.62% ) [83.33%]
2,429,281,882,056 stalled-cycles-frontend # 98.94% frontend cycles idle ( +- 3.67% ) [83.33%]
1,975,960,019,659 stalled-cycles-backend # 80.48% backend cycles idle ( +- 3.88% ) [66.68%]
46,503,296,013 instructions # 0.02 insns per cycle
# 52.24 stalled cycles per insn ( +- 3.21% ) [83.34%]
9,278,997,542 branches # 8.947 M/sec ( +- 4.00% ) [83.34%]
89,881,640 branch-misses # 0.97% of all branches ( +- 1.17% ) [83.33%]

18.059261877 seconds time elapsed ( +- 2.65% )

THP on, v3.12-rc2:
------------------

Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

3114745.395974 task-clock # 73.875 CPUs utilized ( +- 1.84% )
267,356 context-switches # 0.086 K/sec ( +- 1.84% )
99 cpu-migrations # 0.000 K/sec ( +- 1.40% )
58,313 page-faults # 0.019 K/sec ( +- 0.28% )
7,416,635,817,510 cycles # 2.381 GHz ( +- 1.83% ) [83.33%]
7,342,619,196,993 stalled-cycles-frontend # 99.00% frontend cycles idle ( +- 1.88% ) [83.33%]
6,267,671,641,967 stalled-cycles-backend # 84.51% backend cycles idle ( +- 2.03% ) [66.67%]
117,819,935,165 instructions # 0.02 insns per cycle
# 62.32 stalled cycles per insn ( +- 4.39% ) [83.34%]
28,899,314,777 branches # 9.278 M/sec ( +- 4.48% ) [83.34%]
71,787,032 branch-misses # 0.25% of all branches ( +- 1.03% ) [83.33%]

42.162306788 seconds time elapsed ( +- 1.73% )

HUGETLB, v3.12-rc2:
-------------------

Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):

2588052.787264 task-clock # 54.400 CPUs utilized ( +- 3.69% )
246,831 context-switches # 0.095 K/sec ( +- 4.15% )
138 cpu-migrations # 0.000 K/sec ( +- 5.30% )
21,027 page-faults # 0.008 K/sec ( +- 0.01% )
6,166,666,307,263 cycles # 2.383 GHz ( +- 3.68% ) [83.33%]
6,086,008,929,407 stalled-cycles-frontend # 98.69% frontend cycles idle ( +- 3.77% ) [83.33%]
5,087,874,435,481 stalled-cycles-backend # 82.51% backend cycles idle ( +- 4.41% ) [66.67%]
133,782,831,249 instructions # 0.02 insns per cycle
# 45.49 stalled cycles per insn ( +- 4.30% ) [83.34%]
34,026,870,541 branches # 13.148 M/sec ( +- 4.24% ) [83.34%]
68,670,942 branch-misses # 0.20% of all branches ( +- 3.26% ) [83.33%]

47.574936948 seconds time elapsed ( +- 2.09% )

THP off, patched:
-----------------

Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

943301.957892 task-clock # 56.256 CPUs utilized ( +- 3.01% )
86,218 context-switches # 0.091 K/sec ( +- 3.17% )
121 cpu-migrations # 0.000 K/sec ( +- 6.64% )
10,000,551 page-faults # 0.011 M/sec ( +- 0.00% )
2,230,462,457,654 cycles # 2.365 GHz ( +- 3.04% ) [83.32%]
2,204,616,385,805 stalled-cycles-frontend # 98.84% frontend cycles idle ( +- 3.09% ) [83.32%]
1,778,640,046,926 stalled-cycles-backend # 79.74% backend cycles idle ( +- 3.47% ) [66.69%]
45,995,472,617 instructions # 0.02 insns per cycle
# 47.93 stalled cycles per insn ( +- 2.51% ) [83.34%]
9,179,700,174 branches # 9.731 M/sec ( +- 3.04% ) [83.35%]
89,166,529 branch-misses # 0.97% of all branches ( +- 1.45% ) [83.33%]

16.768027318 seconds time elapsed ( +- 2.47% )

THP on, patched:
----------------

Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

458793.837905 task-clock # 54.632 CPUs utilized ( +- 0.79% )
41,831 context-switches # 0.091 K/sec ( +- 0.97% )
98 cpu-migrations # 0.000 K/sec ( +- 1.66% )
57,829 page-faults # 0.126 K/sec ( +- 0.62% )
1,077,543,336,716 cycles # 2.349 GHz ( +- 0.81% ) [83.33%]
1,067,403,802,964 stalled-cycles-frontend # 99.06% frontend cycles idle ( +- 0.87% ) [83.33%]
864,764,616,143 stalled-cycles-backend # 80.25% backend cycles idle ( +- 0.73% ) [66.68%]
16,129,177,440 instructions # 0.01 insns per cycle
# 66.18 stalled cycles per insn ( +- 7.94% ) [83.35%]
3,618,938,569 branches # 7.888 M/sec ( +- 8.46% ) [83.36%]
33,242,032 branch-misses # 0.92% of all branches ( +- 2.02% ) [83.32%]

8.397885779 seconds time elapsed ( +- 0.18% )

HUGETLB, patched:
-----------------

Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):

395353.076837 task-clock # 20.329 CPUs utilized ( +- 8.16% )
55,730 context-switches # 0.141 K/sec ( +- 5.31% )
138 cpu-migrations # 0.000 K/sec ( +- 4.24% )
21,027 page-faults # 0.053 K/sec ( +- 0.00% )
930,219,717,244 cycles # 2.353 GHz ( +- 8.21% ) [83.32%]
914,295,694,103 stalled-cycles-frontend # 98.29% frontend cycles idle ( +- 8.35% ) [83.33%]
704,137,950,187 stalled-cycles-backend # 75.70% backend cycles idle ( +- 9.16% ) [66.69%]
30,541,538,385 instructions # 0.03 insns per cycle
# 29.94 stalled cycles per insn ( +- 3.98% ) [83.35%]
8,415,376,631 branches # 21.286 M/sec ( +- 3.61% ) [83.36%]
32,645,478 branch-misses # 0.39% of all branches ( +- 3.41% ) [83.32%]

19.447481153 seconds time elapsed ( +- 2.00% )

Kirill A. Shutemov (10):
mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKS
mm: convert mm->nr_ptes to atomic_t
mm: introduce api for split page table lock for PMD level
mm, thp: change pmd_trans_huge_lock() to return taken lock
mm, thp: move ptl taking inside page_check_address_pmd()
mm, thp: do not access mm->pmd_huge_pte directly
mm, hugetlb: convert hugetlbfs to use split pmd lock
mm: convert the rest to new page table lock api
mm: implement split page table lock for PMD level
x86, mm: enable split page table lock for PMD level

arch/arm/mm/fault-armv.c | 6 +-
arch/s390/mm/pgtable.c | 12 +--
arch/sparc/mm/tlb.c | 12 +--
arch/x86/Kconfig | 4 +
arch/x86/include/asm/pgalloc.h | 11 ++-
arch/x86/xen/mmu.c | 6 +-
fs/proc/meminfo.c | 2 +-
fs/proc/task_mmu.c | 16 ++--
include/linux/huge_mm.h | 17 ++--
include/linux/hugetlb.h | 25 +++++
include/linux/mm.h | 52 ++++++++++-
include/linux/mm_types.h | 18 ++--
include/linux/swapops.h | 7 +-
kernel/fork.c | 6 +-
mm/Kconfig | 3 +
mm/huge_memory.c | 201 ++++++++++++++++++++++++-----------------
mm/hugetlb.c | 108 +++++++++++++---------
mm/memcontrol.c | 10 +-
mm/memory.c | 21 +++--
mm/mempolicy.c | 5 +-
mm/migrate.c | 14 +--
mm/mmap.c | 3 +-
mm/mprotect.c | 4 +-
mm/oom_kill.c | 6 +-
mm/pgtable-generic.c | 16 ++--
mm/rmap.c | 15 ++-
26 files changed, 379 insertions(+), 221 deletions(-)

--
1.8.4.rc3


2013-09-27 13:16:43

by Kirill A. Shutemov

Subject: [PATCHv4 01/10] mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKS

We're going to introduce a split page table lock for the PMD level.
Let's rename the existing split ptlock for the PTE level to avoid confusion.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
arch/arm/mm/fault-armv.c | 6 +++---
arch/x86/xen/mmu.c | 6 +++---
include/linux/mm.h | 6 +++---
include/linux/mm_types.h | 8 ++++----
4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2a5907b5c8..ff379ac115 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -65,7 +65,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
return ret;
}

-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
/*
* If we are using split PTE locks, then we need to take the page
* lock here. Otherwise we are using shared mm->page_table_lock
@@ -84,10 +84,10 @@ static inline void do_pte_unlock(spinlock_t *ptl)
{
spin_unlock(ptl);
}
-#else /* !USE_SPLIT_PTLOCKS */
+#else /* !USE_SPLIT_PTE_PTLOCKS */
static inline void do_pte_lock(spinlock_t *ptl) {}
static inline void do_pte_unlock(spinlock_t *ptl) {}
-#endif /* USE_SPLIT_PTLOCKS */
+#endif /* USE_SPLIT_PTE_PTLOCKS */

static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
unsigned long pfn)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba28ca..455c873ce0 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -796,7 +796,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct mm_struct *mm)
{
spinlock_t *ptl = NULL;

-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
ptl = __pte_lockptr(page);
spin_lock_nest_lock(ptl, &mm->page_table_lock);
#endif
@@ -1637,7 +1637,7 @@ static inline void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn,

__set_pfn_prot(pfn, PAGE_KERNEL_RO);

- if (level == PT_PTE && USE_SPLIT_PTLOCKS)
+ if (level == PT_PTE && USE_SPLIT_PTE_PTLOCKS)
__pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);

xen_mc_issue(PARAVIRT_LAZY_MMU);
@@ -1671,7 +1671,7 @@ static inline void xen_release_ptpage(unsigned long pfn, unsigned level)
if (!PageHighMem(page)) {
xen_mc_batch();

- if (level == PT_PTE && USE_SPLIT_PTLOCKS)
+ if (level == PT_PTE && USE_SPLIT_PTE_PTLOCKS)
__pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, pfn);

__set_pfn_prot(pfn, PAGE_KERNEL);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8b6e55ee88..6cf8ddb45b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1232,7 +1232,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */

-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
/*
* We tuck a spinlock to guard each pagetable page into its struct page,
* at page->private, with BUILD_BUG_ON to make sure that this will not
@@ -1245,14 +1245,14 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
} while (0)
#define pte_lock_deinit(page) ((page)->mapping = NULL)
#define pte_lockptr(mm, pmd) ({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
-#else /* !USE_SPLIT_PTLOCKS */
+#else /* !USE_SPLIT_PTE_PTLOCKS */
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
#define pte_lock_init(page) do {} while (0)
#define pte_lock_deinit(page) do {} while (0)
#define pte_lockptr(mm, pmd) ({(void)(pmd); &(mm)->page_table_lock;})
-#endif /* USE_SPLIT_PTLOCKS */
+#endif /* USE_SPLIT_PTE_PTLOCKS */

static inline void pgtable_page_ctor(struct page *page)
{
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d9851eeb6e..84e0c56e1e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,7 +23,7 @@

struct address_space;

-#define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
+#define USE_SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)

/*
* Each physical page in the system has a struct page associated with
@@ -141,7 +141,7 @@ struct page {
* indicates order in the buddy
* system if PG_buddy is set.
*/
-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
spinlock_t ptl;
#endif
struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
@@ -309,14 +309,14 @@ enum {
NR_MM_COUNTERS
};

-#if USE_SPLIT_PTLOCKS && defined(CONFIG_MMU)
+#if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU)
#define SPLIT_RSS_COUNTING
/* per-thread cached information, */
struct task_rss_stat {
int events; /* for synchronization threshold */
int count[NR_MM_COUNTERS];
};
-#endif /* USE_SPLIT_PTLOCKS */
+#endif /* USE_SPLIT_PTE_PTLOCKS */

struct mm_rss_stat {
atomic_long_t count[NR_MM_COUNTERS];
--
1.8.4.rc3

2013-09-27 13:16:54

by Kirill A. Shutemov

Subject: [PATCHv4 09/10] mm: implement split page table lock for PMD level

The basic idea is the same as at the PTE level: the lock is embedded into
the struct page of the table's page.

We can't use mm->pmd_huge_pte to store pgtables for THP, since we no longer
take mm->page_table_lock. Let's reuse page->lru of the table's page for
that instead.

pgtable_pmd_page_ctor() returns true if initialization is successful and
false otherwise. The current implementation never fails, but assuming that
the constructor can fail will help port it to -rt, where spinlock_t is
rather large and cannot be embedded into struct page -- dynamic allocation
would be required.
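
To illustrate the intended contract (a sketch only, not part of the patch;
the function name and GFP flags are illustrative), an architecture that
enables the split PMD lock allocates a PMD page roughly like this, calling
the constructor and handling its failure, and calling
pgtable_pmd_page_dtor() before freeing the page:

	static pmd_t *example_pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
	{
		struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);

		if (!page)
			return NULL;
		if (!pgtable_pmd_page_ctor(page)) {
			/* may fail once -rt allocates the spinlock dynamically */
			__free_pages(page, 0);
			return NULL;
		}
		return (pmd_t *)page_address(page);
	}

The x86 conversion in patch 10 follows this pattern.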

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++
include/linux/mm_types.h | 8 +++++++-
kernel/fork.c | 4 ++--
mm/Kconfig | 3 +++
4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b96aac9622..75735f6171 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1294,13 +1294,45 @@ static inline void pgtable_page_dtor(struct page *page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
NULL: pte_offset_kernel(pmd, address))

+#if USE_SPLIT_PMD_PTLOCKS
+
+static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return &virt_to_page(pmd)->ptl;
+}
+
+static inline bool pgtable_pmd_page_ctor(struct page *page)
+{
+ spin_lock_init(&page->ptl);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ page->pmd_huge_pte = NULL;
+#endif
+ return true;
+}
+
+static inline void pgtable_pmd_page_dtor(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ VM_BUG_ON(page->pmd_huge_pte);
+#endif
+}
+
+#define pmd_huge_pte(mm, pmd) (virt_to_page(pmd)->pmd_huge_pte)
+
+#else
+
static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}

+static inline bool pgtable_pmd_page_ctor(struct page *page) { return true; }
+static inline void pgtable_pmd_page_dtor(struct page *page) {}
+
#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)

+#endif
+
static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
{
spinlock_t *ptl = pmd_lockptr(mm, pmd);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99f19e850d..498a75b96f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -24,6 +24,9 @@
struct address_space;

#define USE_SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
+/* hugetlb hasn't converted to split locking yet */
+#define USE_SPLIT_PMD_PTLOCKS (USE_SPLIT_PTE_PTLOCKS && \
+ IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))

/*
* Each physical page in the system has a struct page associated with
@@ -130,6 +133,9 @@ struct page {

struct list_head list; /* slobs list of pages */
struct slab *slab_page; /* slab fields */
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+ pgtable_t pmd_huge_pte; /* protected by page->ptl */
+#endif
};

/* Remainder is not double word aligned */
@@ -406,7 +412,7 @@ struct mm_struct {
#ifdef CONFIG_MMU_NOTIFIER
struct mmu_notifier_mm *mmu_notifier_mm;
#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
pgtable_t pmd_huge_pte; /* protected by page_table_lock */
#endif
#ifdef CONFIG_CPUMASK_OFFSTACK
diff --git a/kernel/fork.c b/kernel/fork.c
index afe70530db..943051284f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -560,7 +560,7 @@ static void check_mm(struct mm_struct *mm)
"mm:%p idx:%d val:%ld\n", mm, i, x);
}

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
VM_BUG_ON(mm->pmd_huge_pte);
#endif
}
@@ -814,7 +814,7 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
memcpy(mm, oldmm, sizeof(*mm));
mm_init_cpumask(mm);

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
#endif
#ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/Kconfig b/mm/Kconfig
index 026771a9b0..89d56e31f9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -214,6 +214,9 @@ config SPLIT_PTLOCK_CPUS
default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
default "4"

+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+ boolean
+
#
# support for memory balloon compaction
config BALLOON_COMPACTION
--
1.8.4.rc3

2013-09-27 13:16:57

by Kirill A. Shutemov

Subject: [PATCHv4 05/10] mm, thp: move ptl taking inside page_check_address_pmd()

With a split page table lock we can't know which lock we need to take
before we have found the relevant pmd.

Let's move taking the lock inside the function.
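
For illustration, the new calling convention looks like this (a sketch
mirroring the __split_huge_page_splitting() conversion below); on success
the relevant page table lock is already held and must be dropped by the
caller:

	spinlock_t *ptl;
	pmd_t *pmd;

	pmd = page_check_address_pmd(page, mm, address,
			PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG, &ptl);
	if (pmd) {
		/* @page is mapped at @address; ptl is held here */
		spin_unlock(ptl);
	}
	/* on failure, NULL is returned and nothing is locked */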

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
include/linux/huge_mm.h | 3 ++-
mm/huge_memory.c | 43 +++++++++++++++++++++++++++----------------
mm/rmap.c | 13 +++++--------
3 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4aca0d8da1..91672e2dee 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -54,7 +54,8 @@ enum page_check_address_pmd_flag {
extern pmd_t *page_check_address_pmd(struct page *page,
struct mm_struct *mm,
unsigned long address,
- enum page_check_address_pmd_flag flag);
+ enum page_check_address_pmd_flag flag,
+ spinlock_t **ptl);

#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
#define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 59a1340f35..3a1f5c10b4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1500,23 +1500,33 @@ int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
return 0;
}

+/*
+ * This function checks whether a given @page is mapped onto the @address
+ * in the virtual space of @mm.
+ *
+ * If it is, the function returns the pmd and holds the page table lock,
+ * passing the lock back to the caller via @ptl.
+ * If it is not, it returns NULL without taking the page table lock.
+ */
pmd_t *page_check_address_pmd(struct page *page,
struct mm_struct *mm,
unsigned long address,
- enum page_check_address_pmd_flag flag)
+ enum page_check_address_pmd_flag flag,
+ spinlock_t **ptl)
{
- pmd_t *pmd, *ret = NULL;
+ pmd_t *pmd;

if (address & ~HPAGE_PMD_MASK)
- goto out;
+ return NULL;

pmd = mm_find_pmd(mm, address);
if (!pmd)
- goto out;
+ return NULL;
+ *ptl = pmd_lock(mm, pmd);
if (pmd_none(*pmd))
- goto out;
+ goto unlock;
if (pmd_page(*pmd) != page)
- goto out;
+ goto unlock;
/*
* split_vma() may create temporary aliased mappings. There is
* no risk as long as all huge pmd are found and have their
@@ -1526,14 +1536,15 @@ pmd_t *page_check_address_pmd(struct page *page,
*/
if (flag == PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG &&
pmd_trans_splitting(*pmd))
- goto out;
+ goto unlock;
if (pmd_trans_huge(*pmd)) {
VM_BUG_ON(flag == PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG &&
!pmd_trans_splitting(*pmd));
- ret = pmd;
+ return pmd;
}
-out:
- return ret;
+unlock:
+ spin_unlock(*ptl);
+ return NULL;
}

static int __split_huge_page_splitting(struct page *page,
@@ -1541,6 +1552,7 @@ static int __split_huge_page_splitting(struct page *page,
unsigned long address)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
pmd_t *pmd;
int ret = 0;
/* For mmu_notifiers */
@@ -1548,9 +1560,8 @@ static int __split_huge_page_splitting(struct page *page,
const unsigned long mmun_end = address + HPAGE_PMD_SIZE;

mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
- PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG);
+ PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG, &ptl);
if (pmd) {
/*
* We can't temporarily set the pmd to null in order
@@ -1561,8 +1572,8 @@ static int __split_huge_page_splitting(struct page *page,
*/
pmdp_splitting_flush(vma, address, pmd);
ret = 1;
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

return ret;
@@ -1693,14 +1704,14 @@ static int __split_huge_page_map(struct page *page,
unsigned long address)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
pmd_t *pmd, _pmd;
int ret = 0, i;
pgtable_t pgtable;
unsigned long haddr;

- spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
- PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
+ PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG, &ptl);
if (pmd) {
pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
@@ -1755,8 +1766,8 @@ static int __split_huge_page_map(struct page *page,
pmdp_invalidate(vma, address, pmd);
pmd_populate(mm, pmd, pgtable);
ret = 1;
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);

return ret;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index fd3ee7a54a..b59d741dcf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -665,25 +665,23 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
unsigned long *vm_flags)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
int referenced = 0;

if (unlikely(PageTransHuge(page))) {
pmd_t *pmd;

- spin_lock(&mm->page_table_lock);
/*
* rmap might return false positives; we must filter
* these out using page_check_address_pmd().
*/
pmd = page_check_address_pmd(page, mm, address,
- PAGE_CHECK_ADDRESS_PMD_FLAG);
- if (!pmd) {
- spin_unlock(&mm->page_table_lock);
+ PAGE_CHECK_ADDRESS_PMD_FLAG, &ptl);
+ if (!pmd)
goto out;
- }

if (vma->vm_flags & VM_LOCKED) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
*mapcount = 0; /* break early from loop */
*vm_flags |= VM_LOCKED;
goto out;
@@ -692,10 +690,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
/* go ahead even if the pmd is pmd_trans_splitting() */
if (pmdp_clear_flush_young_notify(vma, address, pmd))
referenced++;
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
} else {
pte_t *pte;
- spinlock_t *ptl;

/*
* rmap might return false positives; we must filter
--
1.8.4.rc3

2013-09-27 13:17:00

by Kirill A. Shutemov

Subject: [PATCHv4 08/10] mm: convert the rest to new page table lock api

Only trivial cases are left. Let's convert them all at once.
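
The conversion pattern is the same everywhere (sketch): take the lock via
pmd_lock() instead of mm->page_table_lock and release whatever it returned:

	-	spin_lock(&mm->page_table_lock);
	+	ptl = pmd_lock(mm, pmd);
		/* ... operate on *pmd ... */
	-	spin_unlock(&mm->page_table_lock);
	+	spin_unlock(ptl);

with a local "spinlock_t *ptl" declared where needed.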

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
mm/huge_memory.c | 108 ++++++++++++++++++++++++++++-----------------------
mm/memory.c | 17 ++++----
mm/migrate.c | 7 ++--
mm/mprotect.c | 4 +-
mm/pgtable-generic.c | 4 +-
5 files changed, 77 insertions(+), 63 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a1f5c10b4..0d85512ad4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -709,6 +709,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
struct page *page)
{
pgtable_t pgtable;
+ spinlock_t *ptl;

VM_BUG_ON(!PageCompound(page));
pgtable = pte_alloc_one(mm, haddr);
@@ -723,9 +724,9 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
*/
__SetPageUptodate(page);

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_none(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mem_cgroup_uncharge_page(page);
put_page(page);
pte_free(mm, pgtable);
@@ -738,7 +739,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
set_pmd_at(mm, haddr, pmd, entry);
add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
atomic_inc(&mm->nr_ptes);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}

return 0;
@@ -766,6 +767,7 @@ static inline struct page *alloc_hugepage(int defrag)
}
#endif

+/* Caller must hold page table lock. */
static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd,
struct page *zero_page)
@@ -797,6 +799,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
return VM_FAULT_OOM;
if (!(flags & FAULT_FLAG_WRITE) &&
transparent_hugepage_use_zero_page()) {
+ spinlock_t *ptl;
pgtable_t pgtable;
struct page *zero_page;
bool set;
@@ -809,10 +812,10 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
count_vm_event(THP_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
}
- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
zero_page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (!set) {
pte_free(mm, pgtable);
put_huge_zero_page();
@@ -845,6 +848,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
struct vm_area_struct *vma)
{
+ spinlock_t *dst_ptl, *src_ptl;
struct page *src_page;
pmd_t pmd;
pgtable_t pgtable;
@@ -855,8 +859,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
if (unlikely(!pgtable))
goto out;

- spin_lock(&dst_mm->page_table_lock);
- spin_lock_nested(&src_mm->page_table_lock, SINGLE_DEPTH_NESTING);
+ dst_ptl = pmd_lock(dst_mm, dst_pmd);
+ src_ptl = pmd_lockptr(src_mm, src_pmd);
+ spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);

ret = -EAGAIN;
pmd = *src_pmd;
@@ -865,7 +870,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
goto out_unlock;
}
/*
- * mm->page_table_lock is enough to be sure that huge zero pmd is not
+ * When page table lock is held, the huge zero pmd should not be
* under splitting since we don't split the page itself, only pmd to
* a page table.
*/
@@ -886,8 +891,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
if (unlikely(pmd_trans_splitting(pmd))) {
/* split huge page running from under us */
- spin_unlock(&src_mm->page_table_lock);
- spin_unlock(&dst_mm->page_table_lock);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
pte_free(dst_mm, pgtable);

wait_split_huge_page(vma->anon_vma, src_pmd); /* src_vma */
@@ -907,8 +912,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,

ret = 0;
out_unlock:
- spin_unlock(&src_mm->page_table_lock);
- spin_unlock(&dst_mm->page_table_lock);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
out:
return ret;
}
@@ -919,10 +924,11 @@ void huge_pmd_set_accessed(struct mm_struct *mm,
pmd_t *pmd, pmd_t orig_pmd,
int dirty)
{
+ spinlock_t *ptl;
pmd_t entry;
unsigned long haddr;

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto unlock;

@@ -932,13 +938,14 @@ void huge_pmd_set_accessed(struct mm_struct *mm,
update_mmu_cache_pmd(vma, address, pmd);

unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}

static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, pmd_t orig_pmd, unsigned long haddr)
{
+ spinlock_t *ptl;
pgtable_t pgtable;
pmd_t _pmd;
struct page *page;
@@ -965,7 +972,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_free_page;

@@ -992,7 +999,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
}
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
put_huge_zero_page();
inc_mm_counter(mm, MM_ANONPAGES);

@@ -1002,7 +1009,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
out:
return ret;
out_free_page:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
mem_cgroup_uncharge_page(page);
put_page(page);
@@ -1016,6 +1023,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
struct page *page,
unsigned long haddr)
{
+ spinlock_t *ptl;
pgtable_t pgtable;
pmd_t _pmd;
int ret = 0, i;
@@ -1062,7 +1070,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_free_pages;
VM_BUG_ON(!PageHead(page));
@@ -1088,7 +1096,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
page_remove_rmap(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);

mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

@@ -1099,7 +1107,7 @@ out:
return ret;

out_free_pages:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
mem_cgroup_uncharge_start();
for (i = 0; i < HPAGE_PMD_NR; i++) {
@@ -1114,17 +1122,19 @@ out_free_pages:
int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, pmd_t orig_pmd)
{
+ spinlock_t *ptl;
int ret = 0;
struct page *page = NULL, *new_page;
unsigned long haddr;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */

+ ptl = pmd_lockptr(mm, pmd);
VM_BUG_ON(!vma->anon_vma);
haddr = address & HPAGE_PMD_MASK;
if (is_huge_zero_pmd(orig_pmd))
goto alloc;
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_unlock;

@@ -1140,7 +1150,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
goto out_unlock;
}
get_page(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
alloc:
if (transparent_hugepage_enabled(vma) &&
!transparent_hugepage_debug_cow())
@@ -1187,11 +1197,11 @@ alloc:
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (page)
put_page(page);
if (unlikely(!pmd_same(*pmd, orig_pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mem_cgroup_uncharge_page(new_page);
put_page(new_page);
goto out_mn;
@@ -1213,13 +1223,13 @@ alloc:
}
ret |= VM_FAULT_WRITE;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
out_mn:
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
out:
return ret;
out_unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
return ret;
}

@@ -1231,7 +1241,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;

- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(pmd_lockptr(mm, pmd));

if (flags & FOLL_WRITE && !pmd_write(*pmd))
goto out;
@@ -1278,13 +1288,14 @@ out:
int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd, pmd_t *pmdp)
{
+ spinlock_t *ptl;
struct page *page;
unsigned long haddr = addr & HPAGE_PMD_MASK;
int target_nid;
int current_nid = -1;
bool migrated;

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmdp);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;

@@ -1302,17 +1313,17 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
}

/* Acquire the page lock to serialise THP migrations */
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
lock_page(page);

/* Confirm the PTE did not while locked */
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(pmd, *pmdp))) {
unlock_page(page);
put_page(page);
goto out_unlock;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);

/* Migrate the THP to the requested node */
migrated = migrate_misplaced_transhuge_page(mm, vma,
@@ -1324,7 +1335,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;

check_same:
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;
clear_pmdnuma:
@@ -1333,7 +1344,7 @@ clear_pmdnuma:
VM_BUG_ON(pmd_numa(*pmdp));
update_mmu_cache_pmd(vma, addr, pmdp);
out_unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (current_nid != -1)
task_numa_fault(current_nid, HPAGE_PMD_NR, false);
return 0;
@@ -2282,7 +2293,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pte_t *pte;
pgtable_t pgtable;
struct page *new_page;
- spinlock_t *ptl;
+ spinlock_t *pmd_ptl, *pte_ptl;
int isolated;
unsigned long hstart, hend;
unsigned long mmun_start; /* For mmu_notifiers */
@@ -2325,12 +2336,12 @@ static void collapse_huge_page(struct mm_struct *mm,
anon_vma_lock_write(vma->anon_vma);

pte = pte_offset_map(pmd, address);
- ptl = pte_lockptr(mm, pmd);
+ pte_ptl = pte_lockptr(mm, pmd);

mmun_start = address;
mmun_end = address + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock); /* probably unnecessary */
+ pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
/*
* After this gup_fast can't run anymore. This also removes
* any huge TLB entry from the CPU so we won't allow
@@ -2338,16 +2349,16 @@ static void collapse_huge_page(struct mm_struct *mm,
* to avoid the risk of CPU bugs in that area.
*/
_pmd = pmdp_clear_flush(vma, address, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

- spin_lock(ptl);
+ spin_lock(pte_ptl);
isolated = __collapse_huge_page_isolate(vma, address, pte);
- spin_unlock(ptl);
+ spin_unlock(pte_ptl);

if (unlikely(!isolated)) {
pte_unmap(pte);
- spin_lock(&mm->page_table_lock);
+ spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
/*
* We can only use set_pmd_at when establishing
@@ -2355,7 +2366,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* points to regular pagetables. Use pmd_populate for that
*/
pmd_populate(mm, pmd, pmd_pgtable(_pmd));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
anon_vma_unlock_write(vma->anon_vma);
goto out;
}
@@ -2366,7 +2377,7 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
anon_vma_unlock_write(vma->anon_vma);

- __collapse_huge_page_copy(pte, new_page, vma, address, ptl);
+ __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
pte_unmap(pte);
__SetPageUptodate(new_page);
pgtable = pmd_pgtable(_pmd);
@@ -2381,13 +2392,13 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
smp_wmb();

- spin_lock(&mm->page_table_lock);
+ spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);

*hpage = NULL;

@@ -2712,6 +2723,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd)
{
+ spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
unsigned long haddr = address & HPAGE_PMD_MASK;
@@ -2723,22 +2735,22 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
mmun_start = haddr;
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_trans_huge(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return;
}
if (is_huge_zero_pmd(*pmd)) {
__split_huge_zero_page_pmd(vma, haddr, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return;
}
page = pmd_page(*pmd);
VM_BUG_ON(!page_count(page));
get_page(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

split_huge_page(page);
diff --git a/mm/memory.c b/mm/memory.c
index 1046396adf..551b15e3c1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -552,6 +552,7 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long address)
{
+ spinlock_t *ptl;
pgtable_t new = pte_alloc_one(mm, address);
int wait_split_huge_page;
if (!new)
@@ -572,7 +573,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
*/
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
wait_split_huge_page = 0;
if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
atomic_inc(&mm->nr_ptes);
@@ -580,7 +581,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
new = NULL;
} else if (unlikely(pmd_trans_splitting(*pmd)))
wait_split_huge_page = 1;
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (new)
pte_free(mm, new);
if (wait_split_huge_page)
@@ -1516,20 +1517,20 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
split_huge_page_pmd(vma, address, pmd);
goto split_fallthrough;
}
- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (likely(pmd_trans_huge(*pmd))) {
if (unlikely(pmd_trans_splitting(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
wait_split_huge_page(vma->anon_vma, pmd);
} else {
page = follow_trans_huge_pmd(vma, address,
pmd, flags);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
*page_mask = HPAGE_PMD_NR - 1;
goto out;
}
} else
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* fall through */
}
split_fallthrough:
@@ -3602,13 +3603,13 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
bool numa = false;
int local_nid = numa_node_id();

- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmdp);
pmd = *pmdp;
if (pmd_numa(pmd)) {
set_pmd_at(mm, _addr, pmdp, pmd_mknonnuma(pmd));
numa = true;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);

if (!numa)
return 0;
diff --git a/mm/migrate.c b/mm/migrate.c
index 0ac0668a08..4cd63c2379 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1654,6 +1654,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
unsigned long address,
struct page *page, int node)
{
+ spinlock_t *ptl;
unsigned long haddr = address & HPAGE_PMD_MASK;
pg_data_t *pgdat = NODE_DATA(node);
int isolated = 0;
@@ -1700,9 +1701,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
WARN_ON(PageLRU(new_page));

/* Recheck the target PMD */
- spin_lock(&mm->page_table_lock);
+ ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, entry))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);

/* Reverse changes made by migrate_page_copy() */
if (TestClearPageActive(new_page))
@@ -1747,7 +1748,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
* before it's fully transferred to the new page.
*/
mem_cgroup_end_migration(memcg, page, new_page, true);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);

unlock_page(new_page);
unlock_page(page);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94722a4d6b..d01a5356b8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -116,9 +116,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
pmd_t *pmd)
{
- spin_lock(&mm->page_table_lock);
+ spinlock_t *ptl = pmd_lock(mm, pmd);
set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}
#else
static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 41fee3e5d5..cbb38545d9 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -151,7 +151,7 @@ void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable)
{
- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(pmd_lockptr(mm, pmdp));

/* FIFO */
if (!pmd_huge_pte(mm, pmdp))
@@ -170,7 +170,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
pgtable_t pgtable;

- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(pmd_lockptr(mm, pmdp));

/* FIFO */
pgtable = pmd_huge_pte(mm, pmdp);
--
1.8.4.rc3

2013-09-27 13:17:04

by Kirill A. Shutemov

Subject: [PATCHv4 10/10] x86, mm: enable split page table lock for PMD level

Enable PMD split page table lock for X86_64 and PAE.
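
For reference, another architecture would opt in the same way (a sketch;
the "depends on" condition here is purely illustrative): set the option
added in patch 09 from its Kconfig, e.g.

	config ARCH_ENABLE_SPLIT_PMD_PTLOCK
		def_bool y
		depends on 64BIT

and call pgtable_pmd_page_ctor()/pgtable_pmd_page_dtor() from its
pmd_alloc_one()/pmd_free(), as the pgalloc.h hunk below does for x86.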

Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
arch/x86/Kconfig | 4 ++++
arch/x86/include/asm/pgalloc.h | 11 ++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee2fb9d377..015d1362b7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1880,6 +1880,10 @@ config USE_PERCPU_NUMA_NODE_ID
def_bool y
depends on NUMA

+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+ def_bool y
+ depends on X86_64 || X86_PAE
+
menu "Power management and ACPI options"

config ARCH_HIBERNATION_HEADER
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index b4389a468f..e2fb2b6934 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -80,12 +80,21 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
#if PAGETABLE_LEVELS > 2
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ struct page *page;
+ page = alloc_pages(GFP_KERNEL | __GFP_REPEAT| __GFP_ZERO, 0);
+ if (!page)
+ return NULL;
+ if (!pgtable_pmd_page_ctor(page)) {
+ __free_pages(page, 0);
+ return NULL;
+ }
+ return (pmd_t *)page_address(page);
}

static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
{
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
+ pgtable_pmd_page_dtor(virt_to_page(pmd));
free_page((unsigned long)pmd);
}

--
1.8.4.rc3

2013-09-27 13:17:08

by Kirill A. Shutemov

Subject: [PATCHv4 06/10] mm, thp: do not access mm->pmd_huge_pte directly

Currently, mm->pmd_huge_pte is protected by the page table lock. That will
not work with a split lock: we need a per-pmd pmd_huge_pte for proper
access serialization.

For now, let's just introduce a wrapper to access mm->pmd_huge_pte.
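
The wrapper keeps callers unchanged when patch 09 later redirects the
storage to the pmd page's struct page; for now the access pattern simply
changes like this (sketch):

	-	pgtable = mm->pmd_huge_pte;
	+	pgtable = pmd_huge_pte(mm, pmdp);

	-	mm->pmd_huge_pte = NULL;
	+	pmd_huge_pte(mm, pmdp) = NULL;

while pmd_huge_pte(mm, pmd) still expands to (mm)->pmd_huge_pte.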

Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
arch/s390/mm/pgtable.c | 12 ++++++------
arch/sparc/mm/tlb.c | 12 ++++++------
include/linux/mm.h | 1 +
mm/pgtable-generic.c | 12 ++++++------
4 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index de8cbc30dc..c463e5cd3b 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1225,11 +1225,11 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
assert_spin_locked(&mm->page_table_lock);

/* FIFO */
- if (!mm->pmd_huge_pte)
+ if (!pmd_huge_pte(mm, pmdp))
INIT_LIST_HEAD(lh);
else
- list_add(lh, (struct list_head *) mm->pmd_huge_pte);
- mm->pmd_huge_pte = pgtable;
+ list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp));
+ pmd_huge_pte(mm, pmdp) = pgtable;
}

pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
@@ -1241,12 +1241,12 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
assert_spin_locked(&mm->page_table_lock);

/* FIFO */
- pgtable = mm->pmd_huge_pte;
+ pgtable = pmd_huge_pte(mm, pmdp);
lh = (struct list_head *) pgtable;
if (list_empty(lh))
- mm->pmd_huge_pte = NULL;
+ pmd_huge_pte(mm, pmdp) = NULL;
else {
- mm->pmd_huge_pte = (pgtable_t) lh->next;
+ pmd_huge_pte(mm, pmdp) = (pgtable_t) lh->next;
list_del(lh);
}
ptep = (pte_t *) pgtable;
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 7a91f288c7..656cc46a81 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -196,11 +196,11 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
assert_spin_locked(&mm->page_table_lock);

/* FIFO */
- if (!mm->pmd_huge_pte)
+ if (!pmd_huge_pte(mm, pmdp))
INIT_LIST_HEAD(lh);
else
- list_add(lh, (struct list_head *) mm->pmd_huge_pte);
- mm->pmd_huge_pte = pgtable;
+ list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp));
+ pmd_huge_pte(mm, pmdp) = pgtable;
}

pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
@@ -211,12 +211,12 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
assert_spin_locked(&mm->page_table_lock);

/* FIFO */
- pgtable = mm->pmd_huge_pte;
+ pgtable = pmd_huge_pte(mm, pmdp);
lh = (struct list_head *) pgtable;
if (list_empty(lh))
- mm->pmd_huge_pte = NULL;
+ pmd_huge_pte(mm, pmdp) = NULL;
else {
- mm->pmd_huge_pte = (pgtable_t) lh->next;
+ pmd_huge_pte(mm, pmdp) = (pgtable_t) lh->next;
list_del(lh);
}
pte_val(pgtable[0]) = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e3481c6b52..b96aac9622 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1299,6 +1299,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return &mm->page_table_lock;
}

+#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)

static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
{
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 3929a40bd6..41fee3e5d5 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -154,11 +154,11 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
assert_spin_locked(&mm->page_table_lock);

/* FIFO */
- if (!mm->pmd_huge_pte)
+ if (!pmd_huge_pte(mm, pmdp))
INIT_LIST_HEAD(&pgtable->lru);
else
- list_add(&pgtable->lru, &mm->pmd_huge_pte->lru);
- mm->pmd_huge_pte = pgtable;
+ list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
+ pmd_huge_pte(mm, pmdp) = pgtable;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif
@@ -173,11 +173,11 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
assert_spin_locked(&mm->page_table_lock);

/* FIFO */
- pgtable = mm->pmd_huge_pte;
+ pgtable = pmd_huge_pte(mm, pmdp);
if (list_empty(&pgtable->lru))
- mm->pmd_huge_pte = NULL;
+ pmd_huge_pte(mm, pmdp) = NULL;
else {
- mm->pmd_huge_pte = list_entry(pgtable->lru.next,
+ pmd_huge_pte(mm, pmdp) = list_entry(pgtable->lru.next,
struct page, lru);
list_del(&pgtable->lru);
}
--
1.8.4.rc3

2013-09-27 13:18:31

by Kirill A. Shutemov

Subject: [PATCHv4 07/10] mm, hugetlb: convert hugetlbfs to use split pmd lock

Hugetlb supports multiple page sizes. We use the split lock only for the
PMD level; larger (PUD-sized) pages keep using mm->page_table_lock.
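
The resulting caller pattern (a sketch, as used throughout the hunks below):

	spinlock_t *ptl;

	ptl = huge_pte_lock(h, mm, ptep);
	/* PMD-sized hugepages take the per-page split lock here;
	   other sizes fall back to mm->page_table_lock */
	/* ... operate on the hugetlb pte ... */
	spin_unlock(ptl);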

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
fs/proc/meminfo.c | 2 +-
include/linux/hugetlb.h | 25 +++++++++++
include/linux/swapops.h | 7 ++--
mm/hugetlb.c | 108 +++++++++++++++++++++++++++++-------------------
mm/mempolicy.c | 5 ++-
mm/migrate.c | 7 ++--
mm/rmap.c | 2 +-
7 files changed, 103 insertions(+), 53 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 59d85d6088..6d061f5359 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -1,8 +1,8 @@
#include <linux/fs.h>
-#include <linux/hugetlb.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/hugetlb.h>
#include <linux/mman.h>
#include <linux/mmzone.h>
#include <linux/proc_fs.h>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0393270466..2132532b02 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -392,6 +392,15 @@ static inline int hugepage_migration_support(struct hstate *h)
return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
}

+static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+ struct mm_struct *mm, pte_t *pte)
+{
+ if (huge_page_size(h) == PMD_SIZE)
+ return pmd_lockptr(mm, (pmd_t *) pte);
+ VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
+ return &mm->page_table_lock;
+}
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
@@ -401,6 +410,7 @@ struct hstate {};
#define hstate_sizelog(s) NULL
#define hstate_vma(v) NULL
#define hstate_inode(i) NULL
+#define page_hstate(page) NULL
#define huge_page_size(h) PAGE_SIZE
#define huge_page_mask(h) PAGE_MASK
#define vma_kernel_pagesize(v) PAGE_SIZE
@@ -421,6 +431,21 @@ static inline pgoff_t basepage_index(struct page *page)
#define dissolve_free_huge_pages(s, e) do {} while (0)
#define pmd_huge_support() 0
#define hugepage_migration_support(h) 0
+
+static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+ struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
#endif /* CONFIG_HUGETLB_PAGE */

+static inline spinlock_t *huge_pte_lock(struct hstate *h,
+ struct mm_struct *mm, pte_t *pte)
+{
+ spinlock_t *ptl;
+ ptl = huge_pte_lockptr(h, mm, pte);
+ spin_lock(ptl);
+ return ptl;
+}
+
#endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 8d4fa82bfb..c0f75261a7 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -139,7 +139,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry)

extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address);
-extern void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte);
+extern void migration_entry_wait_huge(struct vm_area_struct *vma,
+ struct mm_struct *mm, pte_t *pte);
#else

#define make_migration_entry(page, write) swp_entry(0, 0)
@@ -151,8 +152,8 @@ static inline int is_migration_entry(swp_entry_t swp)
static inline void make_migration_entry_read(swp_entry_t *entryp) { }
static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address) { }
-static inline void migration_entry_wait_huge(struct mm_struct *mm,
- pte_t *pte) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
+ struct mm_struct *mm, pte_t *pte) { }
static inline int is_write_migration_entry(swp_entry_t entry)
{
return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b49579c7f2..1c13a6f8d8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2361,6 +2361,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;

for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
+ spinlock_t *src_ptl, *dst_ptl;
src_pte = huge_pte_offset(src, addr);
if (!src_pte)
continue;
@@ -2372,8 +2373,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
if (dst_pte == src_pte)
continue;

- spin_lock(&dst->page_table_lock);
- spin_lock_nested(&src->page_table_lock, SINGLE_DEPTH_NESTING);
+ dst_ptl = huge_pte_lock(h, dst, dst_pte);
+ src_ptl = huge_pte_lockptr(h, src, src_pte);
+ spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
if (!huge_pte_none(huge_ptep_get(src_pte))) {
if (cow)
huge_ptep_set_wrprotect(src, addr, src_pte);
@@ -2383,8 +2385,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
page_dup_rmap(ptepage);
set_huge_pte_at(dst, addr, dst_pte, entry);
}
- spin_unlock(&src->page_table_lock);
- spin_unlock(&dst->page_table_lock);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
}
return 0;

@@ -2427,6 +2429,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long address;
pte_t *ptep;
pte_t pte;
+ spinlock_t *ptl;
struct page *page;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
@@ -2440,25 +2443,25 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
tlb_start_vma(tlb, vma);
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
again:
- spin_lock(&mm->page_table_lock);
for (address = start; address < end; address += sz) {
ptep = huge_pte_offset(mm, address);
if (!ptep)
continue;

+ ptl = huge_pte_lock(h, mm, ptep);
if (huge_pmd_unshare(mm, &address, ptep))
- continue;
+ goto unlock;

pte = huge_ptep_get(ptep);
if (huge_pte_none(pte))
- continue;
+ goto unlock;

/*
* HWPoisoned hugepage is already unmapped and dropped reference
*/
if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
huge_pte_clear(mm, address, ptep);
- continue;
+ goto unlock;
}

page = pte_page(pte);
@@ -2469,7 +2472,7 @@ again:
*/
if (ref_page) {
if (page != ref_page)
- continue;
+ goto unlock;

/*
* Mark the VMA as having unmapped its page so that
@@ -2486,13 +2489,18 @@ again:

page_remove_rmap(page);
force_flush = !__tlb_remove_page(tlb, page);
- if (force_flush)
+ if (force_flush) {
+ spin_unlock(ptl);
break;
+ }
/* Bail out after unmapping reference page if supplied */
- if (ref_page)
+ if (ref_page) {
+ spin_unlock(ptl);
break;
+ }
+unlock:
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);
/*
* mmu_gather ran out of room to batch pages, we break out of
* the PTE lock to avoid doing the potential expensive TLB invalidate
@@ -2598,7 +2606,7 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
*/
static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *ptep, pte_t pte,
- struct page *pagecache_page)
+ struct page *pagecache_page, spinlock_t *ptl)
{
struct hstate *h = hstate_vma(vma);
struct page *old_page, *new_page;
@@ -2632,8 +2640,8 @@ retry_avoidcopy:

page_cache_get(old_page);

- /* Drop page_table_lock as buddy allocator may be called */
- spin_unlock(&mm->page_table_lock);
+ /* Drop page table lock as buddy allocator may be called */
+ spin_unlock(ptl);
new_page = alloc_huge_page(vma, address, outside_reserve);

if (IS_ERR(new_page)) {
@@ -2651,12 +2659,12 @@ retry_avoidcopy:
BUG_ON(huge_pte_none(pte));
if (unmap_ref_private(mm, vma, old_page, address)) {
BUG_ON(huge_pte_none(pte));
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
ptep = huge_pte_offset(mm, address & huge_page_mask(h));
if (likely(pte_same(huge_ptep_get(ptep), pte)))
goto retry_avoidcopy;
/*
- * race occurs while re-acquiring page_table_lock, and
+ * race occurs while re-acquiring page table lock, and
* our job is done.
*/
return 0;
@@ -2665,7 +2673,7 @@ retry_avoidcopy:
}

/* Caller expects lock to be held */
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (err == -ENOMEM)
return VM_FAULT_OOM;
else
@@ -2680,7 +2688,7 @@ retry_avoidcopy:
page_cache_release(new_page);
page_cache_release(old_page);
/* Caller expects lock to be held */
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
return VM_FAULT_OOM;
}

@@ -2692,10 +2700,10 @@ retry_avoidcopy:
mmun_end = mmun_start + huge_page_size(h);
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
/*
- * Retake the page_table_lock to check for racing updates
+ * Retake the page table lock to check for racing updates
* before the page tables are altered
*/
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
ptep = huge_pte_offset(mm, address & huge_page_mask(h));
if (likely(pte_same(huge_ptep_get(ptep), pte))) {
ClearPagePrivate(new_page);
@@ -2709,13 +2717,13 @@ retry_avoidcopy:
/* Make the old page be freed below */
new_page = old_page;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
page_cache_release(new_page);
page_cache_release(old_page);

/* Caller expects lock to be held */
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
return 0;
}

@@ -2763,6 +2771,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *page;
struct address_space *mapping;
pte_t new_pte;
+ spinlock_t *ptl;

/*
* Currently, we are forced to kill the process in the event the
@@ -2849,7 +2858,8 @@ retry:
goto backout_unlocked;
}

- spin_lock(&mm->page_table_lock);
+ ptl = huge_pte_lockptr(h, mm, ptep);
+ spin_lock(ptl);
size = i_size_read(mapping->host) >> huge_page_shift(h);
if (idx >= size)
goto backout;
@@ -2870,16 +2880,16 @@ retry:

if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
- ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page);
+ ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page, ptl);
}

- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
unlock_page(page);
out:
return ret;

backout:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
backout_unlocked:
unlock_page(page);
put_page(page);
@@ -2891,6 +2901,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
{
pte_t *ptep;
pte_t entry;
+ spinlock_t *ptl;
int ret;
struct page *page = NULL;
struct page *pagecache_page = NULL;
@@ -2903,7 +2914,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (ptep) {
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
- migration_entry_wait_huge(mm, ptep);
+ migration_entry_wait_huge(vma, mm, ptep);
return 0;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
@@ -2959,17 +2970,18 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (page != pagecache_page)
lock_page(page);

- spin_lock(&mm->page_table_lock);
+ ptl = huge_pte_lockptr(h, mm, ptep);
+ spin_lock(ptl);
/* Check for a racing update before calling hugetlb_cow */
if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
- goto out_page_table_lock;
+ goto out_ptl;


if (flags & FAULT_FLAG_WRITE) {
if (!huge_pte_write(entry)) {
ret = hugetlb_cow(mm, vma, address, ptep, entry,
- pagecache_page);
- goto out_page_table_lock;
+ pagecache_page, ptl);
+ goto out_ptl;
}
entry = huge_pte_mkdirty(entry);
}
@@ -2978,8 +2990,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, address, ptep);

-out_page_table_lock:
- spin_unlock(&mm->page_table_lock);
+out_ptl:
+ spin_unlock(ptl);

if (pagecache_page) {
unlock_page(pagecache_page);
@@ -3005,9 +3017,9 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long remainder = *nr_pages;
struct hstate *h = hstate_vma(vma);

- spin_lock(&mm->page_table_lock);
while (vaddr < vma->vm_end && remainder) {
pte_t *pte;
+ spinlock_t *ptl = NULL;
int absent;
struct page *page;

@@ -3015,8 +3027,12 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
* Some archs (sparc64, sh*) have multiple pte_ts to
* each hugepage. We have to make sure we get the
* first, for the page indexing below to work.
+ *
+ * Note that page table lock is not held when pte is null.
*/
pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
+ if (pte)
+ ptl = huge_pte_lock(h, mm, pte);
absent = !pte || huge_pte_none(huge_ptep_get(pte));

/*
@@ -3028,6 +3044,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
if (absent && (flags & FOLL_DUMP) &&
!hugetlbfs_pagecache_present(h, vma, vaddr)) {
+ if (pte)
+ spin_unlock(ptl);
remainder = 0;
break;
}
@@ -3047,10 +3065,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
!huge_pte_write(huge_ptep_get(pte)))) {
int ret;

- spin_unlock(&mm->page_table_lock);
+ if (pte)
+ spin_unlock(ptl);
ret = hugetlb_fault(mm, vma, vaddr,
(flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
- spin_lock(&mm->page_table_lock);
if (!(ret & VM_FAULT_ERROR))
continue;

@@ -3081,8 +3099,8 @@ same_page:
*/
goto same_page;
}
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);
*nr_pages = remainder;
*position = vaddr;

@@ -3103,13 +3121,15 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
flush_cache_range(vma, address, end);

mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
- spin_lock(&mm->page_table_lock);
for (; address < end; address += huge_page_size(h)) {
+ spinlock_t *ptl;
ptep = huge_pte_offset(mm, address);
if (!ptep)
continue;
+ ptl = huge_pte_lock(h, mm, ptep);
if (huge_pmd_unshare(mm, &address, ptep)) {
pages++;
+ spin_unlock(ptl);
continue;
}
if (!huge_pte_none(huge_ptep_get(ptep))) {
@@ -3119,8 +3139,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
set_huge_pte_at(mm, address, ptep, pte);
pages++;
}
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);
/*
* Must flush TLB before releasing i_mmap_mutex: x86's huge_pmd_unshare
* may have cleared our pud entry and done put_page on the page table:
@@ -3283,6 +3303,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
unsigned long saddr;
pte_t *spte = NULL;
pte_t *pte;
+ spinlock_t *ptl;

if (!vma_shareable(vma, addr))
return (pte_t *)pmd_alloc(mm, pud, addr);
@@ -3305,13 +3326,14 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
if (!spte)
goto out;

- spin_lock(&mm->page_table_lock);
+ ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
+ spin_lock(ptl);
if (pud_none(*pud))
pud_populate(mm, pud,
(pmd_t *)((unsigned long)spte & PAGE_MASK));
else
put_page(virt_to_page(spte));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
out:
pte = (pte_t *)pmd_alloc(mm, pud, addr);
mutex_unlock(&mapping->i_mmap_mutex);
@@ -3325,7 +3347,7 @@ out:
* indicated by page_count > 1, unmap is achieved by clearing pud and
* decrementing the ref count. If count == 1, the pte page is not shared.
*
- * called with vma->vm_mm->page_table_lock held.
+ * called with page table lock held.
*
* returns: 1 successfully unmapped a shared pte page
* 0 the underlying pte page is not shared, or it is the last user
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 04729647f3..930a3e64bd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,8 +525,9 @@ static void queue_pages_hugetlb_pmd_range(struct vm_area_struct *vma,
#ifdef CONFIG_HUGETLB_PAGE
int nid;
struct page *page;
+ spinlock_t *ptl;

- spin_lock(&vma->vm_mm->page_table_lock);
+ ptl = huge_pte_lock(hstate_vma(vma), vma->vm_mm, (pte_t *)pmd);
page = pte_page(huge_ptep_get((pte_t *)pmd));
nid = page_to_nid(page);
if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
@@ -536,7 +537,7 @@ static void queue_pages_hugetlb_pmd_range(struct vm_area_struct *vma,
(flags & MPOL_MF_MOVE && page_mapcount(page) == 1))
isolate_huge_page(page, private);
unlock:
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
#else
BUG();
#endif
diff --git a/mm/migrate.c b/mm/migrate.c
index 9c8d5f59d3..0ac0668a08 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -130,7 +130,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
ptep = huge_pte_offset(mm, addr);
if (!ptep)
goto out;
- ptl = &mm->page_table_lock;
+ ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
} else {
pmd = mm_find_pmd(mm, addr);
if (!pmd)
@@ -247,9 +247,10 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
__migration_entry_wait(mm, ptep, ptl);
}

-void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte)
+void migration_entry_wait_huge(struct vm_area_struct *vma,
+ struct mm_struct *mm, pte_t *pte)
{
- spinlock_t *ptl = &(mm)->page_table_lock;
+ spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
__migration_entry_wait(mm, pte, ptl);
}

diff --git a/mm/rmap.c b/mm/rmap.c
index b59d741dcf..55c8b8dc9f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -601,7 +601,7 @@ pte_t *__page_check_address(struct page *page, struct mm_struct *mm,

if (unlikely(PageHuge(page))) {
pte = huge_pte_offset(mm, address);
- ptl = &mm->page_table_lock;
+ ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
goto check;
}

--
1.8.4.rc3

2013-09-27 13:18:51

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

With split page table lock for PMD level we can't hold
mm->page_table_lock while updating nr_ptes.

Let's convert it to atomic_t to avoid races.
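
The race, sketched (hypothetical snippet, not part of this patch; ptl_a
and ptl_b stand for two different split PMD locks): two faults in the
same mm can each hold their own per-table lock, so the plain counter
update is no longer serialized:

    /* thread A, holding the split lock of pmd A */
    spin_lock(ptl_a);
    mm->nr_ptes++;          /* unserialized read-modify-write */
    spin_unlock(ptl_a);

    /* thread B, holding the split lock of pmd B (a different spinlock) */
    spin_lock(ptl_b);
    mm->nr_ptes++;          /* one of the two increments can be lost */
    spin_unlock(ptl_b);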

Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
fs/proc/task_mmu.c | 3 ++-
include/linux/mm_types.h | 2 +-
kernel/fork.c | 2 +-
mm/huge_memory.c | 10 +++++-----
mm/memory.c | 4 ++--
mm/mmap.c | 3 ++-
mm/oom_kill.c | 6 +++---
7 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7366e9d63c..c52c597fbf 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -62,7 +62,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
total_rss << (PAGE_SHIFT-10),
data << (PAGE_SHIFT-10),
mm->stack_vm << (PAGE_SHIFT-10), text, lib,
- (PTRS_PER_PTE*sizeof(pte_t)*mm->nr_ptes) >> 10,
+ (unsigned long) (PTRS_PER_PTE * sizeof(pte_t) *
+ atomic_read(&mm->nr_ptes)) >> 10,
swap << (PAGE_SHIFT-10));
}

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 84e0c56e1e..99f19e850d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -339,6 +339,7 @@ struct mm_struct {
pgd_t * pgd;
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
+ atomic_t nr_ptes; /* Page table pages */
int map_count; /* number of VMAs */

spinlock_t page_table_lock; /* Protects page tables and some counters */
@@ -360,7 +361,6 @@ struct mm_struct {
unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
unsigned long stack_vm; /* VM_GROWSUP/DOWN */
unsigned long def_flags;
- unsigned long nr_ptes; /* Page table pages */
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
diff --git a/kernel/fork.c b/kernel/fork.c
index 086fe73ad6..afe70530db 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -532,7 +532,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
mm->flags = (current->mm) ?
(current->mm->flags & MMF_INIT_MASK) : default_dump_filter;
mm->core_state = NULL;
- mm->nr_ptes = 0;
+ atomic_set(&mm->nr_ptes, 0);
memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
spin_lock_init(&mm->page_table_lock);
mm_init_aio(mm);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7489884682..bbd41a2f49 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -737,7 +737,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, haddr, pmd, entry);
add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
- mm->nr_ptes++;
+ atomic_inc(&mm->nr_ptes);
spin_unlock(&mm->page_table_lock);
}

@@ -778,7 +778,7 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
entry = pmd_mkhuge(entry);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, haddr, pmd, entry);
- mm->nr_ptes++;
+ atomic_inc(&mm->nr_ptes);
return true;
}

@@ -903,7 +903,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd = pmd_mkold(pmd_wrprotect(pmd));
pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
set_pmd_at(dst_mm, addr, dst_pmd, pmd);
- dst_mm->nr_ptes++;
+ atomic_inc(&dst_mm->nr_ptes);

ret = 0;
out_unlock:
@@ -1358,7 +1358,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
if (is_huge_zero_pmd(orig_pmd)) {
- tlb->mm->nr_ptes--;
+ atomic_dec(&tlb->mm->nr_ptes);
spin_unlock(&tlb->mm->page_table_lock);
put_huge_zero_page();
} else {
@@ -1367,7 +1367,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
VM_BUG_ON(page_mapcount(page) < 0);
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
VM_BUG_ON(!PageHead(page));
- tlb->mm->nr_ptes--;
+ atomic_dec(&tlb->mm->nr_ptes);
spin_unlock(&tlb->mm->page_table_lock);
tlb_remove_page(tlb, page);
}
diff --git a/mm/memory.c b/mm/memory.c
index ca00039471..1046396adf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -382,7 +382,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
pgtable_t token = pmd_pgtable(*pmd);
pmd_clear(pmd);
pte_free_tlb(tlb, token, addr);
- tlb->mm->nr_ptes--;
+ atomic_dec(&tlb->mm->nr_ptes);
}

static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -575,7 +575,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
spin_lock(&mm->page_table_lock);
wait_split_huge_page = 0;
if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
- mm->nr_ptes++;
+ atomic_inc(&mm->nr_ptes);
pmd_populate(mm, pmd, new);
new = NULL;
} else if (unlikely(pmd_trans_splitting(*pmd)))
diff --git a/mm/mmap.c b/mm/mmap.c
index 9d548512ff..1d0efbc974 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2726,7 +2726,8 @@ void exit_mmap(struct mm_struct *mm)
}
vm_unacct_memory(nr_accounted);

- WARN_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
+ WARN_ON(atomic_read(&mm->nr_ptes) >
+ (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
}

/* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 314e9d2743..7ab394e811 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -161,7 +161,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
* The baseline for the badness score is the proportion of RAM that each
* task's rss, pagetable and swap space use.
*/
- points = get_mm_rss(p->mm) + p->mm->nr_ptes +
+ points = get_mm_rss(p->mm) + atomic_read(&p->mm->nr_ptes) +
get_mm_counter(p->mm, MM_SWAPENTS);
task_unlock(p);

@@ -364,10 +364,10 @@ static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemas
continue;
}

- pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu %5hd %s\n",
+ pr_info("[%5d] %5d %5d %8lu %8lu %7d %8lu %5hd %s\n",
task->pid, from_kuid(&init_user_ns, task_uid(task)),
task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
- task->mm->nr_ptes,
+ atomic_read(&task->mm->nr_ptes),
get_mm_counter(task->mm, MM_SWAPENTS),
task->signal->oom_score_adj, task->comm);
task_unlock(task);
--
1.8.4.rc3

2013-09-27 13:18:49

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv4 03/10] mm: introduce api for split page table lock for PMD level

Basic api, backed by mm->page_table_lock for now. Actual implementation
will be added later.
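
A minimal usage sketch (illustration only, not part of the patch): the
caller locks a given pmd and unlocks whatever spinlock is returned, so
call sites stay unchanged when the backing lock later becomes per-table:

    spinlock_t *ptl;

    ptl = pmd_lock(mm, pmd);    /* for now this is &mm->page_table_lock */
    /* ... examine or modify *pmd under the lock ... */
    spin_unlock(ptl);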

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
include/linux/mm.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6cf8ddb45b..e3481c6b52 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1294,6 +1294,19 @@ static inline void pgtable_page_dtor(struct page *page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
NULL: pte_offset_kernel(pmd, address))

+static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return &mm->page_table_lock;
+}
+
+
+static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
+{
+ spinlock_t *ptl = pmd_lockptr(mm, pmd);
+ spin_lock(ptl);
+ return ptl;
+}
+
extern void free_area_init(unsigned long * zones_size);
extern void free_area_init_node(int nid, unsigned long * zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
--
1.8.4.rc3

2013-09-27 13:18:47

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv4 04/10] mm, thp: change pmd_trans_huge_lock() to return taken lock

With split ptlock it's important to know which lock pmd_trans_huge_lock()
took. This patch adds one more parameter to the function to return the
lock.

In most places migration to the new api is trivial.
The exception is move_huge_pmd(): we need to take two locks if the pmd
tables are different.
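
Sketch of the new calling convention (the converted call sites below
follow this pattern):

    spinlock_t *ptl;

    if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
        /* *pmd is a stable huge pmd; ptl is whichever lock was taken */
        spin_unlock(ptl);       /* the caller, not the callee, drops it */
    }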

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
---
fs/proc/task_mmu.c | 13 +++++++------
include/linux/huge_mm.h | 14 +++++++-------
mm/huge_memory.c | 40 +++++++++++++++++++++++++++-------------
mm/memcontrol.c | 10 +++++-----
4 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c52c597fbf..d7df94069e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -506,9 +506,9 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte_t *pte;
spinlock_t *ptl;

- if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
smaps_pte_entry(*(pte_t *)pmd, addr, HPAGE_PMD_SIZE, walk);
- spin_unlock(&walk->mm->page_table_lock);
+ spin_unlock(ptl);
mss->anonymous_thp += HPAGE_PMD_SIZE;
return 0;
}
@@ -994,13 +994,14 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
{
struct vm_area_struct *vma;
struct pagemapread *pm = walk->private;
+ spinlock_t *ptl;
pte_t *pte;
int err = 0;
pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));

/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
- if (vma && pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (vma && pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
int pmd_flags2;

if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
@@ -1018,7 +1019,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
if (err)
break;
}
- spin_unlock(&walk->mm->page_table_lock);
+ spin_unlock(ptl);
return err;
}

@@ -1320,7 +1321,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,

md = walk->private;

- if (pmd_trans_huge_lock(pmd, md->vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, md->vma, &ptl) == 1) {
pte_t huge_pte = *(pte_t *)pmd;
struct page *page;

@@ -1328,7 +1329,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
if (page)
gather_stats(page, md, pte_dirty(huge_pte),
HPAGE_PMD_SIZE/PAGE_SIZE);
- spin_unlock(&walk->mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 3935428c57..4aca0d8da1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -129,15 +129,15 @@ extern void __vma_adjust_trans_huge(struct vm_area_struct *vma,
unsigned long start,
unsigned long end,
long adjust_next);
-extern int __pmd_trans_huge_lock(pmd_t *pmd,
- struct vm_area_struct *vma);
+extern int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl);
/* mmap_sem must be held on entry */
-static inline int pmd_trans_huge_lock(pmd_t *pmd,
- struct vm_area_struct *vma)
+static inline int pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl)
{
VM_BUG_ON(!rwsem_is_locked(&vma->vm_mm->mmap_sem));
if (pmd_trans_huge(*pmd))
- return __pmd_trans_huge_lock(pmd, vma);
+ return __pmd_trans_huge_lock(pmd, vma, ptl);
else
return 0;
}
@@ -215,8 +215,8 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
long adjust_next)
{
}
-static inline int pmd_trans_huge_lock(pmd_t *pmd,
- struct vm_area_struct *vma)
+static inline int pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl)
{
return 0;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbd41a2f49..59a1340f35 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1342,9 +1342,10 @@ out_unlock:
int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
+ spinlock_t *ptl;
int ret = 0;

- if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
struct page *page;
pgtable_t pgtable;
pmd_t orig_pmd;
@@ -1359,7 +1360,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
if (is_huge_zero_pmd(orig_pmd)) {
atomic_dec(&tlb->mm->nr_ptes);
- spin_unlock(&tlb->mm->page_table_lock);
+ spin_unlock(ptl);
put_huge_zero_page();
} else {
page = pmd_page(orig_pmd);
@@ -1368,7 +1369,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
VM_BUG_ON(!PageHead(page));
atomic_dec(&tlb->mm->nr_ptes);
- spin_unlock(&tlb->mm->page_table_lock);
+ spin_unlock(ptl);
tlb_remove_page(tlb, page);
}
pte_free(tlb->mm, pgtable);
@@ -1381,14 +1382,15 @@ int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
unsigned char *vec)
{
+ spinlock_t *ptl;
int ret = 0;

- if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
/*
* All logical pages in the range are present
* if backed by a huge page.
*/
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
memset(vec, 1, (end - addr) >> PAGE_SHIFT);
ret = 1;
}
@@ -1401,6 +1403,7 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long old_end,
pmd_t *old_pmd, pmd_t *new_pmd)
{
+ spinlock_t *old_ptl, *new_ptl;
int ret = 0;
pmd_t pmd;

@@ -1421,12 +1424,21 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
goto out;
}

- ret = __pmd_trans_huge_lock(old_pmd, vma);
+ /*
+ * We don't have to worry about the ordering of src and dst
+ * ptlocks because exclusive mmap_sem prevents deadlock.
+ */
+ ret = __pmd_trans_huge_lock(old_pmd, vma, &old_ptl);
if (ret == 1) {
+ new_ptl = pmd_lockptr(mm, new_pmd);
+ if (new_ptl != old_ptl)
+ spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
- spin_unlock(&mm->page_table_lock);
+ if (new_ptl != old_ptl)
+ spin_unlock(new_ptl);
+ spin_unlock(old_ptl);
}
out:
return ret;
@@ -1436,9 +1448,10 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, pgprot_t newprot, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
int ret = 0;

- if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
pmd_t entry;
entry = pmdp_get_and_clear(mm, addr, pmd);
if (!prot_numa) {
@@ -1454,7 +1467,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
}
}
set_pmd_at(mm, addr, pmd, entry);
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
ret = 1;
}

@@ -1468,12 +1481,13 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
* Note that if it returns 1, this routine returns without unlocking page
* table locks. So callers must unlock them.
*/
-int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
+int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl)
{
- spin_lock(&vma->vm_mm->page_table_lock);
+ *ptl = pmd_lock(vma->vm_mm, pmd);
if (likely(pmd_trans_huge(*pmd))) {
if (unlikely(pmd_trans_splitting(*pmd))) {
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(*ptl);
wait_split_huge_page(vma->anon_vma, pmd);
return -1;
} else {
@@ -1482,7 +1496,7 @@ int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
return 1;
}
}
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(*ptl);
return 0;
}

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d5ff3ce130..5f35b2a116 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6376,10 +6376,10 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
pte_t *pte;
spinlock_t *ptl;

- if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
mc.precharge += HPAGE_PMD_NR;
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}

@@ -6568,9 +6568,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
* to be unlocked in __split_huge_page_splitting(), where the main
* part of thp split is not executed yet.
*/
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
if (mc.precharge < HPAGE_PMD_NR) {
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}
target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
@@ -6587,7 +6587,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
}
put_page(page);
}
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}

--
1.8.4.rc3

2013-09-27 20:47:01

by Cody P Schafer

[permalink] [raw]
Subject: Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote:
> With split page table lock for PMD level we can't hold
> mm->page_table_lock while updating nr_ptes.
>
> Let's convert it to atomic_t to avoid races.
>

> ---

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 84e0c56e1e..99f19e850d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -339,6 +339,7 @@ struct mm_struct {
> pgd_t * pgd;
> atomic_t mm_users; /* How many users with user space? */
> atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
> + atomic_t nr_ptes; /* Page table pages */
> int map_count; /* number of VMAs */
>
> spinlock_t page_table_lock; /* Protects page tables and some counters */
> @@ -360,7 +361,6 @@ struct mm_struct {
> unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
> unsigned long stack_vm; /* VM_GROWSUP/DOWN */
> unsigned long def_flags;
> - unsigned long nr_ptes; /* Page table pages */
> unsigned long start_code, end_code, start_data, end_data;
> unsigned long start_brk, brk, start_stack;
> unsigned long arg_start, arg_end, env_start, env_end;

Will 32 bits always be enough here? Should atomic_long_t be used instead?

2013-09-27 21:01:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

On 09/27/2013 01:46 PM, Cody P Schafer wrote:
> On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote:
>> @@ -339,6 +339,7 @@ struct mm_struct {
>> pgd_t * pgd;
>> atomic_t mm_users; /* How many users with user space? */
>> atomic_t mm_count; /* How many references to "struct
>> mm_struct" (users count as 1) */
>> + atomic_t nr_ptes; /* Page table pages */
>> int map_count; /* number of VMAs */
...
>
> Will 32 bits always be enough here? Should atomic_long_t be used instead?

There are 48 bits of virtual address space on x86 today. 12 bits of
that is the address inside the page, so we've at *most* 2^36 pages. 2^9
(512) pages are mapped by a pte page, so that means the page tables only
hold 2^27 pte pages in a single process.

We've got 31 bits of usable space in the atomic_t, so that definitely
works _today_. If the virtual address space ever gets bigger, we might
have problems, though.

In practice, though, we steal a big chunk of that virtual address space
for the kernel, and that doesn't get accounted in mm->nr_ptes, so we've
got a _bit_ more wiggle room than just 4 bits. Also, anybody that's
mapping >4 petabytes of memory with 4k ptes is just off their rocker.
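
Spelling the arithmetic out (same assumptions: x86_64, 4k pages, 48-bit
virtual address space, 512 ptes per page table):

    2^(48 - 12) = 2^36   max pages mapped by a single process
    2^(36 -  9) = 2^27   max pte pages needed, vs. 2^31 - 1 countable in an atomic_t
    2^(31 + 21) = 2^52   bytes covered by 2^31 pte pages, i.e. the >4 petabytes above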

I'm also not sure what the virtual address limits are for the more
obscure architectures, so I guess it's also possible they'll hit this.
I guess it wouldn't hurt to stick an overflow check in there for VM
debugging purposes.

2013-09-28 00:14:13

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

On Sat, Sep 28, 2013 at 01:24:51AM +0300, Kirill A. Shutemov wrote:
> Cody P Schafer wrote:
> > On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote:
> > > With split page table lock for PMD level we can't hold
> > > mm->page_table_lock while updating nr_ptes.
> > >
> > > Let's convert it to atomic_t to avoid races.
> > >
> >
> > > ---
> >
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 84e0c56e1e..99f19e850d 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -339,6 +339,7 @@ struct mm_struct {
> > > pgd_t * pgd;
> > > atomic_t mm_users; /* How many users with user space? */
> > > atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
> > > + atomic_t nr_ptes; /* Page table pages */
> > > int map_count; /* number of VMAs */
> > >
> > > spinlock_t page_table_lock; /* Protects page tables and some counters */
> > > @@ -360,7 +361,6 @@ struct mm_struct {
> > > unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
> > > unsigned long stack_vm; /* VM_GROWSUP/DOWN */
> > > unsigned long def_flags;
> > > - unsigned long nr_ptes; /* Page table pages */
> > > unsigned long start_code, end_code, start_data, end_data;
> > > unsigned long start_brk, brk, start_stack;
> > > unsigned long arg_start, arg_end, env_start, env_end;
> >
> > Will 32 bits always be enough here? Should atomic_long_t be used instead?
>
> Good question!
>
> On x86_64 we need one table to cover 2M (512 entries by 4k, 21 bits) of
> virtual address space. Total size of virtual memory which can be covered
> by 31-bit (32 - sign) nr_ptes is 52 bits (31 + 21).
>
> Currently, on x86_64 with 4-level page tables we can use at most 48 bits of
> virtual address space (only half of it available for userspace), so we're
> pretty safe here.
>
> Although, it can be a potential problem, if (when) x86_64 will implement
> 5-level page tables -- 57-bits of virtual address space.
>
> Any thoughts?

I'd just go with atomic_long_t to avoid having to worry about this in
the first place. It's been ulong forever and I'm not aware of struct
mm_struct size being an urgent issue. Cutting this type in half and
adding overflow checks adds more problems than it solves.
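
With atomic_long_t the accounting would look like (sketch only):

    atomic_long_t nr_ptes;              /* in struct mm_struct */

    atomic_long_set(&mm->nr_ptes, 0);   /* at mm_init() time */
    atomic_long_inc(&mm->nr_ptes);      /* when a pte page is added */
    atomic_long_dec(&mm->nr_ptes);      /* when a pte page is freed */
    atomic_long_read(&mm->nr_ptes);     /* for reporting */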

2013-10-03 23:11:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCHv4 09/10] mm: implement split page table lock for PMD level

On Fri, 27 Sep 2013 16:16:26 +0300 "Kirill A. Shutemov" <[email protected]> wrote:

> The basic idea is the same as with PTE level: the lock is embedded into
> struct page of table's page.
>
> We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
> take mm->page_table_lock anymore. Let's reuse page->lru of table's page
> for that.
>
> pgtable_pmd_page_ctor() returns true if initialization is successful
> and false otherwise. The current implementation never fails, but the
> assumption that the constructor can fail will help to port it to -rt,
> where spinlock_t is rather huge and cannot be embedded into struct
> page -- dynamic allocation is required.

spinlock_t is rather large when lockdep is enabled. What happens?

2013-10-03 23:38:12

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv4 09/10] mm: implement split page table lock for PMD level

Andrew Morton wrote:
> On Fri, 27 Sep 2013 16:16:26 +0300 "Kirill A. Shutemov" <[email protected]> wrote:
>
> > The basic idea is the same as with PTE level: the lock is embedded into
> > struct page of table's page.
> >
> > We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
> > take mm->page_table_lock anymore. Let's reuse page->lru of table's page
> > for that.
> >
> > pgtable_pmd_page_ctor() returns true if initialization is successful
> > and false otherwise. The current implementation never fails, but the
> > assumption that the constructor can fail will help to port it to -rt,
> > where spinlock_t is rather huge and cannot be embedded into struct
> > page -- dynamic allocation is required.
>
> spinlock_t is rather large when lockdep is enabled. What happens?

The same as with PTE split lock: CONFIG_SPLIT_PTLOCK_CPUS set to 999999
if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC. It effectively blocks split locks
usage if spinlock_t is too big.
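
Roughly, in mm/Kconfig (sketch; the arch-specific defaults are omitted
here):

    config SPLIT_PTLOCK_CPUS
        int
        default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
        default "4"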

--
Kirill A. Shutemov

2013-10-04 07:21:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCHv4 09/10] mm: implement split page table lock for PMD level

On Thu, Oct 03, 2013 at 04:11:09PM -0700, Andrew Morton wrote:
> On Fri, 27 Sep 2013 16:16:26 +0300 "Kirill A. Shutemov" <[email protected]> wrote:
>
> > The basic idea is the same as with PTE level: the lock is embedded into
> > struct page of table's page.
> >
> > We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
> > take mm->page_table_lock anymore. Let's reuse page->lru of table's page
> > for that.
> >
> > pgtable_pmd_page_ctor() returns true if initialization is successful
> > and false otherwise. The current implementation never fails, but the
> > assumption that the constructor can fail will help to port it to -rt,
> > where spinlock_t is rather huge and cannot be embedded into struct
> > page -- dynamic allocation is required.
>
> spinlock_t is rather large when lockdep is enabled. What happens?

I could go fix all the arch code and pgtable ctor thingies and do the
same thing we do on -rt if anybody cares.

Hugh thought the single pagetable lock would catch the more interesting
locking scenarios, but it's of course sad to have an entire locking
scheme not covered by lockdep -- that's just waiting for a bug to sneak
in there.

2013-10-04 20:12:10

by Alex Thorlton

[permalink] [raw]
Subject: Re: [PATCHv4 00/10] split page table lock for PMD tables

Kirill,

I've pasted in my results for 512 cores below. Things are looking
really good here. I don't have a test for HUGETLBFS, but if you want to
pass me the one you used, I can run that too. I suppose I could write
one, but why reinvent the wheel? :)

Sorry for the delay on these results. I hit some strange issues with
running thp_memscale on systems with either of the following
combinations of configuration options set:

[thp off]
HUGETLBFS=y
HUGETLB_PAGE=y
NUMA_BALANCING=y
NUMA_BALANCING_DEFAULT_ENABLED=y

[thp on or off]
HUGETLBFS=n
HUGETLB_PAGE=n
NUMA_BALANCING=y
NUMA_BALANCING_DEFAULT_ENABLED=y

I'm getting segfaults intermittently, as well as some weird RCU sched
errors. This happens in vanilla 3.12-rc2, so it doesn't have anything
to do with your patches, but I thought I'd let you know. There didn't
used to be any issues with this test, so I think there's a subtle kernel
bug here. That's, of course, an entirely separate issue though.

As far as these patches go, I think everything looks good (save for the
bit of discussion you were having with Andrew earlier, which I think
you've worked out). My testing shows that the page fault rates are
actually better on this threaded test than in the non-threaded case!

- Alex

On Fri, Sep 27, 2013 at 04:16:17PM +0300, Kirill A. Shutemov wrote:
> Alex Thorlton noticed that some massively threaded workloads work poorly,
> if THP enabled. This patchset fixes this by introducing split page table
> lock for PMD tables. hugetlbfs is not covered yet.
>
> This patchset is based on work by Naoya Horiguchi.
>
> Please review and consider applying.
>
> Changes:
> v4:
> - convert hugetlb to new locking;
> v3:
> - fix USE_SPLIT_PMD_PTLOCKS;
> - fix warning in fs/proc/task_mmu.c;
> v2:
> - reuse CONFIG_SPLIT_PTLOCK_CPUS for PMD split lock;
> - s/huge_pmd_lock/pmd_lock/g;
> - assume pgtable_pmd_page_ctor() can fail;
> - fix format line in task_mem() for VmPTE;
>
> THP off, v3.12-rc2:
> -------------------
>
> Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
>
> 1037072.835207 task-clock # 57.426 CPUs utilized ( +- 3.59% )
> 95,093 context-switches # 0.092 K/sec ( +- 3.93% )
> 140 cpu-migrations # 0.000 K/sec ( +- 5.28% )
> 10,000,550 page-faults # 0.010 M/sec ( +- 0.00% )
> 2,455,210,400,261 cycles # 2.367 GHz ( +- 3.62% ) [83.33%]
> 2,429,281,882,056 stalled-cycles-frontend # 98.94% frontend cycles idle ( +- 3.67% ) [83.33%]
> 1,975,960,019,659 stalled-cycles-backend # 80.48% backend cycles idle ( +- 3.88% ) [66.68%]
> 46,503,296,013 instructions # 0.02 insns per cycle
> # 52.24 stalled cycles per insn ( +- 3.21% ) [83.34%]
> 9,278,997,542 branches # 8.947 M/sec ( +- 4.00% ) [83.34%]
> 89,881,640 branch-misses # 0.97% of all branches ( +- 1.17% ) [83.33%]
>
> 18.059261877 seconds time elapsed ( +- 2.65% )
>
> THP on, v3.12-rc2:
> ------------------
>
> Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
>
> 3114745.395974 task-clock # 73.875 CPUs utilized ( +- 1.84% )
> 267,356 context-switches # 0.086 K/sec ( +- 1.84% )
> 99 cpu-migrations # 0.000 K/sec ( +- 1.40% )
> 58,313 page-faults # 0.019 K/sec ( +- 0.28% )
> 7,416,635,817,510 cycles # 2.381 GHz ( +- 1.83% ) [83.33%]
> 7,342,619,196,993 stalled-cycles-frontend # 99.00% frontend cycles idle ( +- 1.88% ) [83.33%]
> 6,267,671,641,967 stalled-cycles-backend # 84.51% backend cycles idle ( +- 2.03% ) [66.67%]
> 117,819,935,165 instructions # 0.02 insns per cycle
> # 62.32 stalled cycles per insn ( +- 4.39% ) [83.34%]
> 28,899,314,777 branches # 9.278 M/sec ( +- 4.48% ) [83.34%]
> 71,787,032 branch-misses # 0.25% of all branches ( +- 1.03% ) [83.33%]
>
> 42.162306788 seconds time elapsed ( +- 1.73% )

THP on, v3.12-rc2:
------------------

Performance counter stats for './thp_memscale -C 0 -m 0 -c 512 -b 512m' (5 runs):

568668865.944994 task-clock # 528.547 CPUs utilized ( +- 0.21% ) [100.00%]
1,491,589 context-switches # 0.000 M/sec ( +- 0.25% ) [100.00%]
1,085 CPU-migrations # 0.000 M/sec ( +- 1.80% ) [100.00%]
400,822 page-faults # 0.000 M/sec ( +- 0.41% )
1,306,612,476,049,478 cycles # 2.298 GHz ( +- 0.23% ) [100.00%]
1,277,211,694,318,724 stalled-cycles-frontend # 97.75% frontend cycles idle ( +- 0.21% ) [100.00%]
1,163,736,844,232,064 stalled-cycles-backend # 89.07% backend cycles idle ( +- 0.20% ) [100.00%]
53,855,178,678,230 instructions # 0.04 insns per cycle
# 23.72 stalled cycles per insn ( +- 1.15% ) [100.00%]
21,041,661,816,782 branches # 37.002 M/sec ( +- 0.64% ) [100.00%]
606,665,092 branch-misses # 0.00% of all branches ( +- 0.63% )

1075.909782795 seconds time elapsed ( +- 0.21% )

> HUGETLB, v3.12-rc2:
> -------------------
>
> Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):
>
> 2588052.787264 task-clock # 54.400 CPUs utilized ( +- 3.69% )
> 246,831 context-switches # 0.095 K/sec ( +- 4.15% )
> 138 cpu-migrations # 0.000 K/sec ( +- 5.30% )
> 21,027 page-faults # 0.008 K/sec ( +- 0.01% )
> 6,166,666,307,263 cycles # 2.383 GHz ( +- 3.68% ) [83.33%]
> 6,086,008,929,407 stalled-cycles-frontend # 98.69% frontend cycles idle ( +- 3.77% ) [83.33%]
> 5,087,874,435,481 stalled-cycles-backend # 82.51% backend cycles idle ( +- 4.41% ) [66.67%]
> 133,782,831,249 instructions # 0.02 insns per cycle
> # 45.49 stalled cycles per insn ( +- 4.30% ) [83.34%]
> 34,026,870,541 branches # 13.148 M/sec ( +- 4.24% ) [83.34%]
> 68,670,942 branch-misses # 0.20% of all branches ( +- 3.26% ) [83.33%]
>
> 47.574936948 seconds time elapsed ( +- 2.09% )
>
> THP off, patched:
> -----------------
>
> Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
>
> 943301.957892 task-clock # 56.256 CPUs utilized ( +- 3.01% )
> 86,218 context-switches # 0.091 K/sec ( +- 3.17% )
> 121 cpu-migrations # 0.000 K/sec ( +- 6.64% )
> 10,000,551 page-faults # 0.011 M/sec ( +- 0.00% )
> 2,230,462,457,654 cycles # 2.365 GHz ( +- 3.04% ) [83.32%]
> 2,204,616,385,805 stalled-cycles-frontend # 98.84% frontend cycles idle ( +- 3.09% ) [83.32%]
> 1,778,640,046,926 stalled-cycles-backend # 79.74% backend cycles idle ( +- 3.47% ) [66.69%]
> 45,995,472,617 instructions # 0.02 insns per cycle
> # 47.93 stalled cycles per insn ( +- 2.51% ) [83.34%]
> 9,179,700,174 branches # 9.731 M/sec ( +- 3.04% ) [83.35%]
> 89,166,529 branch-misses # 0.97% of all branches ( +- 1.45% ) [83.33%]
>
> 16.768027318 seconds time elapsed ( +- 2.47% )
>
> THP on, patched:
> ----------------
>
> Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
>
> 458793.837905 task-clock # 54.632 CPUs utilized ( +- 0.79% )
> 41,831 context-switches # 0.091 K/sec ( +- 0.97% )
> 98 cpu-migrations # 0.000 K/sec ( +- 1.66% )
> 57,829 page-faults # 0.126 K/sec ( +- 0.62% )
> 1,077,543,336,716 cycles # 2.349 GHz ( +- 0.81% ) [83.33%]
> 1,067,403,802,964 stalled-cycles-frontend # 99.06% frontend cycles idle ( +- 0.87% ) [83.33%]
> 864,764,616,143 stalled-cycles-backend # 80.25% backend cycles idle ( +- 0.73% ) [66.68%]
> 16,129,177,440 instructions # 0.01 insns per cycle
> # 66.18 stalled cycles per insn ( +- 7.94% ) [83.35%]
> 3,618,938,569 branches # 7.888 M/sec ( +- 8.46% ) [83.36%]
> 33,242,032 branch-misses # 0.92% of all branches ( +- 2.02% ) [83.32%]
>
> 8.397885779 seconds time elapsed ( +- 0.18% )

THP on, patched:
----------------

Performance counter stats for './runt -t -c 512 -b 512m' (5 runs):

15836198.490485 task-clock # 533.304 CPUs utilized ( +- 0.95% ) [100.00%]
127,507 context-switches # 0.000 M/sec ( +- 1.65% ) [100.00%]
1,223 CPU-migrations # 0.000 M/sec ( +- 3.23% ) [100.00%]
302,080 page-faults # 0.000 M/sec ( +- 6.88% )
18,925,875,973,975 cycles # 1.195 GHz ( +- 0.43% ) [100.00%]
18,325,469,464,007 stalled-cycles-frontend # 96.83% frontend cycles idle ( +- 0.44% ) [100.00%]
17,522,272,147,141 stalled-cycles-backend # 92.58% backend cycles idle ( +- 0.49% ) [100.00%]
2,686,490,067,197 instructions # 0.14 insns per cycle
# 6.82 stalled cycles per insn ( +- 2.16% ) [100.00%]
944,712,646,402 branches # 59.655 M/sec ( +- 2.03% ) [100.00%]
145,956,565 branch-misses # 0.02% of all branches ( +- 0.88% )

29.694499652 seconds time elapsed ( +- 0.95% )

(these results are from the test suite that I ripped thp_memscale out
of, but it's the same test)

> HUGETLB, patched
> -----------------
>
> Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):
>
> 395353.076837 task-clock # 20.329 CPUs utilized ( +- 8.16% )
> 55,730 context-switches # 0.141 K/sec ( +- 5.31% )
> 138 cpu-migrations # 0.000 K/sec ( +- 4.24% )
> 21,027 page-faults # 0.053 K/sec ( +- 0.00% )
> 930,219,717,244 cycles # 2.353 GHz ( +- 8.21% ) [83.32%]
> 914,295,694,103 stalled-cycles-frontend # 98.29% frontend cycles idle ( +- 8.35% ) [83.33%]
> 704,137,950,187 stalled-cycles-backend # 75.70% backend cycles idle ( +- 9.16% ) [66.69%]
> 30,541,538,385 instructions # 0.03 insns per cycle
> # 29.94 stalled cycles per insn ( +- 3.98% ) [83.35%]
> 8,415,376,631 branches # 21.286 M/sec ( +- 3.61% ) [83.36%]
> 32,645,478 branch-misses # 0.39% of all branches ( +- 3.41% ) [83.32%]
>
> 19.447481153 seconds time elapsed ( +- 2.00% )
>
> Kirill A. Shutemov (10):
> mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKS
> mm: convert mm->nr_ptes to atomic_t
> mm: introduce api for split page table lock for PMD level
> mm, thp: change pmd_trans_huge_lock() to return taken lock
> mm, thp: move ptl taking inside page_check_address_pmd()
> mm, thp: do not access mm->pmd_huge_pte directly
> mm, hugetlb: convert hugetlbfs to use split pmd lock
> mm: convert the rest to new page table lock api
> mm: implement split page table lock for PMD level
> x86, mm: enable split page table lock for PMD level
>
> arch/arm/mm/fault-armv.c | 6 +-
> arch/s390/mm/pgtable.c | 12 +--
> arch/sparc/mm/tlb.c | 12 +--
> arch/x86/Kconfig | 4 +
> arch/x86/include/asm/pgalloc.h | 11 ++-
> arch/x86/xen/mmu.c | 6 +-
> fs/proc/meminfo.c | 2 +-
> fs/proc/task_mmu.c | 16 ++--
> include/linux/huge_mm.h | 17 ++--
> include/linux/hugetlb.h | 25 +++++
> include/linux/mm.h | 52 ++++++++++-
> include/linux/mm_types.h | 18 ++--
> include/linux/swapops.h | 7 +-
> kernel/fork.c | 6 +-
> mm/Kconfig | 3 +
> mm/huge_memory.c | 201 ++++++++++++++++++++++++-----------------
> mm/hugetlb.c | 108 +++++++++++++---------
> mm/memcontrol.c | 10 +-
> mm/memory.c | 21 +++--
> mm/mempolicy.c | 5 +-
> mm/migrate.c | 14 +--
> mm/mmap.c | 3 +-
> mm/mprotect.c | 4 +-
> mm/oom_kill.c | 6 +-
> mm/pgtable-generic.c | 16 ++--
> mm/rmap.c | 15 ++-
> 26 files changed, 379 insertions(+), 221 deletions(-)
>
> --
> 1.8.4.rc3
>

2013-10-04 20:26:15

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv4 00/10] split page table lock for PMD tables

Alex Thorlton wrote:
> Kirill,
>
> I've pasted in my results for 512 cores below. Things are looking
> really good here. I don't have a test for HUGETLBFS, but if you want to
> pass me the one you used, I can run that too. I suppose I could write
> one, but why reinvent the wheel? :)

Patch below.

> Sorry for the delay on these results. I hit some strange issues with
> running thp_memscale on systems with either of the following
> combinations of configuration options set:
>
> [thp off]
> HUGETLBFS=y
> HUGETLB_PAGE=y
> NUMA_BALANCING=y
> NUMA_BALANCING_DEFAULT_ENABLED=y
>
> [thp on or off]
> HUGETLBFS=n
> HUGETLB_PAGE=n
> NUMA_BALANCING=y
> NUMA_BALANCING_DEFAULT_ENABLED=y
>
> I'm getting segfaults intermittently, as well as some weird RCU sched
> errors. This happens in vanilla 3.12-rc2, so it doesn't have anything
> to do with your patches, but I thought I'd let you know. There didn't
> used to be any issues with this test, so I think there's a subtle kernel
> bug here. That's, of course, an entirely separate issue though.

I'll take a look next week, if nobody does it before.

>
> As far as these patches go, I think everything looks good (save for the
> bit of discussion you were having with Andrew earlier, which I think
> you've worked out). My testing shows that the page fault rates are
> actually better on this threaded test than in the non-threaded case!
>
> - Alex
>
> THP on, v3.12-rc2:
> ------------------
>
> Performance counter stats for './thp_memscale -C 0 -m 0 -c 512 -b 512m' (5 runs):
>
> 568668865.944994 task-clock # 528.547 CPUs utilized ( +- 0.21% ) [100.00%]
> 1,491,589 context-switches # 0.000 M/sec ( +- 0.25% ) [100.00%]
> 1,085 CPU-migrations # 0.000 M/sec ( +- 1.80% ) [100.00%]
> 400,822 page-faults # 0.000 M/sec ( +- 0.41% )
> 1,306,612,476,049,478 cycles # 2.298 GHz ( +- 0.23% ) [100.00%]
> 1,277,211,694,318,724 stalled-cycles-frontend # 97.75% frontend cycles idle ( +- 0.21% ) [100.00%]
> 1,163,736,844,232,064 stalled-cycles-backend # 89.07% backend cycles idle ( +- 0.20% ) [100.00%]
> 53,855,178,678,230 instructions # 0.04 insns per cycle
> # 23.72 stalled cycles per insn ( +- 1.15% ) [100.00%]
> 21,041,661,816,782 branches # 37.002 M/sec ( +- 0.64% ) [100.00%]
> 606,665,092 branch-misses # 0.00% of all branches ( +- 0.63% )
>
> 1075.909782795 seconds time elapsed ( +- 0.21% )
>
> THP on, patched:
> ----------------
>
> Performance counter stats for './runt -t -c 512 -b 512m' (5 runs):
>
> 15836198.490485 task-clock # 533.304 CPUs utilized ( +- 0.95% ) [100.00%]
> 127,507 context-switches # 0.000 M/sec ( +- 1.65% ) [100.00%]
> 1,223 CPU-migrations # 0.000 M/sec ( +- 3.23% ) [100.00%]
> 302,080 page-faults # 0.000 M/sec ( +- 6.88% )
> 18,925,875,973,975 cycles # 1.195 GHz ( +- 0.43% ) [100.00%]
> 18,325,469,464,007 stalled-cycles-frontend # 96.83% frontend cycles idle ( +- 0.44% ) [100.00%]
> 17,522,272,147,141 stalled-cycles-backend # 92.58% backend cycles idle ( +- 0.49% ) [100.00%]
> 2,686,490,067,197 instructions # 0.14 insns per cycle
> # 6.82 stalled cycles per insn ( +- 2.16% ) [100.00%]
> 944,712,646,402 branches # 59.655 M/sec ( +- 2.03% ) [100.00%]
> 145,956,565 branch-misses # 0.02% of all branches ( +- 0.88% )
>
> 29.694499652 seconds time elapsed ( +- 0.95% )
>
> (these results are from the test suite that I ripped thp_memscale out
> of, but it's the same test)

36 times faster. Not bad I think. ;)

Naive patch to use HUGETLB:

--- thp_memscale/thp_memscale.c 2013-09-23 23:44:21.000000000 +0300
+++ thp_memscale/thp_memscale.c 2013-09-26 17:45:47.878429885 +0300
@@ -191,7 +191,10 @@
int id, i, cnt;

id = (long)arg;
- p = malloc(bytes);
+ p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB, 0, 0);
+ if (p == MAP_FAILED)
+ perrorx("mmap failed");
ps = p;

if (runon(basecpu + id) < 0)
--
Kirill A. Shutemov

2013-10-04 20:31:41

by Alex Thorlton

[permalink] [raw]
Subject: Re: [PATCHv4 00/10] split page table lock for PMD tables

On Fri, Oct 04, 2013 at 11:26:02PM +0300, Kirill A. Shutemov wrote:
> Alex Thorlton wrote:
> > Kirill,
> >
> > I've pasted in my results for 512 cores below. Things are looking
> > really good here. I don't have a test for HUGETLBFS, but if you want to
> > pass me the one you used, I can run that too. I suppose I could write
> > one, but why reinvent the wheel? :)
>
> Patch below.

Good deal, thanks. I'll get some test results put up soon.

>
> > Sorry for the delay on these results. I hit some strange issues with
> > running thp_memscale on systems with either of the following
> > combinations of configuration options set:
> >
> > [thp off]
> > HUGETLBFS=y
> > HUGETLB_PAGE=y
> > NUMA_BALANCING=y
> > NUMA_BALANCING_DEFAULT_ENABLED=y
> >
> > [thp on or off]
> > HUGETLBFS=n
> > HUGETLB_PAGE=n
> > NUMA_BALANCING=y
> > NUMA_BALANCING_DEFAULT_ENABLED=y
> >
> > I'm getting segfaults intermittently, as well as some weird RCU sched
> > errors. This happens in vanilla 3.12-rc2, so it doesn't have anything
> > to do with your patches, but I thought I'd let you know. There didn't
> > used to be any issues with this test, so I think there's a subtle kernel
> > bug here. That's, of course, an entirely separate issue though.
>
> I'll take a look next week, if nobody does it before.

I'm starting a bisect now. Not sure how long it'll take, but I'll keep
you posted.

>
> >
> > As far as these patches go, I think everything looks good (save for the
> > bit of discussion you were having with Andrew earlier, which I think
> > you've worked out). My testing shows that the page fault rates are
> > actually better on this threaded test than in the non-threaded case!
> >
> > - Alex
> >
> > THP on, v3.12-rc2:
> > ------------------
> >
> > Performance counter stats for './thp_memscale -C 0 -m 0 -c 512 -b 512m' (5 runs):
> >
> > 568668865.944994 task-clock # 528.547 CPUs utilized ( +- 0.21% ) [100.00%]
> > 1,491,589 context-switches # 0.000 M/sec ( +- 0.25% ) [100.00%]
> > 1,085 CPU-migrations # 0.000 M/sec ( +- 1.80% ) [100.00%]
> > 400,822 page-faults # 0.000 M/sec ( +- 0.41% )
> > 1,306,612,476,049,478 cycles # 2.298 GHz ( +- 0.23% ) [100.00%]
> > 1,277,211,694,318,724 stalled-cycles-frontend # 97.75% frontend cycles idle ( +- 0.21% ) [100.00%]
> > 1,163,736,844,232,064 stalled-cycles-backend # 89.07% backend cycles idle ( +- 0.20% ) [100.00%]
> > 53,855,178,678,230 instructions # 0.04 insns per cycle
> > # 23.72 stalled cycles per insn ( +- 1.15% ) [100.00%]
> > 21,041,661,816,782 branches # 37.002 M/sec ( +- 0.64% ) [100.00%]
> > 606,665,092 branch-misses # 0.00% of all branches ( +- 0.63% )
> >
> > 1075.909782795 seconds time elapsed ( +- 0.21% )
> >
> > THP on, patched:
> > ----------------
> >
> > Performance counter stats for './runt -t -c 512 -b 512m' (5 runs):
> >
> > 15836198.490485 task-clock # 533.304 CPUs utilized ( +- 0.95% ) [100.00%]
> > 127,507 context-switches # 0.000 M/sec ( +- 1.65% ) [100.00%]
> > 1,223 CPU-migrations # 0.000 M/sec ( +- 3.23% ) [100.00%]
> > 302,080 page-faults # 0.000 M/sec ( +- 6.88% )
> > 18,925,875,973,975 cycles # 1.195 GHz ( +- 0.43% ) [100.00%]
> > 18,325,469,464,007 stalled-cycles-frontend # 96.83% frontend cycles idle ( +- 0.44% ) [100.00%]
> > 17,522,272,147,141 stalled-cycles-backend # 92.58% backend cycles idle ( +- 0.49% ) [100.00%]
> > 2,686,490,067,197 instructions # 0.14 insns per cycle
> > # 6.82 stalled cycles per insn ( +- 2.16% ) [100.00%]
> > 944,712,646,402 branches # 59.655 M/sec ( +- 2.03% ) [100.00%]
> > 145,956,565 branch-misses # 0.02% of all branches ( +- 0.88% )
> >
> > 29.694499652 seconds time elapsed ( +- 0.95% )
> >
> > (these results are from the test suite that I ripped thp_memscale out
> > of, but it's the same test)
>
> 36 times faster. Not bad I think. ;)
>
> Naive patch to use HUGETLB:
>
> --- thp_memscale/thp_memscale.c 2013-09-23 23:44:21.000000000 +0300
> +++ thp_memscale/thp_memscale.c 2013-09-26 17:45:47.878429885 +0300
> @@ -191,7 +191,10 @@
> int id, i, cnt;
>
> id = (long)arg;
> - p = malloc(bytes);
> + p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB, 0, 0);
> + if (p == MAP_FAILED)
> + perrorx("mmap failed");
> ps = p;
>
> if (runon(basecpu + id) < 0)
> --
> Kirill A. Shutemov

2013-10-07 09:48:29

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv4 00/10] split page table lock for PMD tables

Alex Thorlton wrote:
> > > Sorry for the delay on these results. I hit some strange issues with
> > > running thp_memscale on systems with either of the following
> > > combinations of configuration options set:
> > >
> > > [thp off]
> > > HUGETLBFS=y
> > > HUGETLB_PAGE=y
> > > NUMA_BALANCING=y
> > > NUMA_BALANCING_DEFAULT_ENABLED=y
> > >
> > > [thp on or off]
> > > HUGETLBFS=n
> > > HUGETLB_PAGE=n
> > > NUMA_BALANCING=y
> > > NUMA_BALANCING_DEFAULT_ENABLED=y
> > >
> > > I'm getting segfaults intermittently, as well as some weird RCU sched
> > > errors. This happens in vanilla 3.12-rc2, so it doesn't have anything
> > > to do with your patches, but I thought I'd let you know. There didn't
> > > used to be any issues with this test, so I think there's a subtle kernel
> > > bug here. That's, of course, an entirely separate issue though.
> >
> > I'll take a look next week, if nobody does it before.
>
> I'm starting a bisect now. Not sure how long it'll take, but I'll keep
> you posted.

I don't see the issue. Could you share your kernel config?

--
Kirill A. Shutemov

2013-10-08 21:47:19

by Alex Thorlton

[permalink] [raw]
Subject: Re: [PATCHv4 00/10] split page table lock for PMD tables

On Mon, Oct 07, 2013 at 12:48:20PM +0300, Kirill A. Shutemov wrote:
> Alex Thorlton wrote:
> > > > Sorry for the delay on these results. I hit some strange issues with
> > > > running thp_memscale on systems with either of the following
> > > > combinations of configuration options set:
> > > >
> > > > [thp off]
> > > > HUGETLBFS=y
> > > > HUGETLB_PAGE=y
> > > > NUMA_BALANCING=y
> > > > NUMA_BALANCING_DEFAULT_ENABLED=y
> > > >
> > > > [thp on or off]
> > > > HUGETLBFS=n
> > > > HUGETLB_PAGE=n
> > > > NUMA_BALANCING=y
> > > > NUMA_BALANCING_DEFAULT_ENABLED=y
> > > >
> > > > I'm getting segfaults intermittently, as well as some weird RCU sched
> > > > errors. This happens in vanilla 3.12-rc2, so it doesn't have anything
> > > > to do with your patches, but I thought I'd let you know. There didn't
> > > > used to be any issues with this test, so I think there's a subtle kernel
> > > > bug here. That's, of course, an entirely separate issue though.
> > >
> > > I'll take a look next week, if nobody does it before.
> >
> > I'm starting a bisect now. Not sure how long it'll take, but I'll keep
> > you posted.
>
> I don't see the issue. Could you share your kernel config?

I put my kernel config up on ftp at:

ftp://shell.sgi.com/collect/atconfig/config_bug

I've been investigating the issue today and the smallest run I've seen
the problem on was with 128 threads, so this might not be something that
most people will hit.

With the config I've shared here the problem appears to only be
intermittent at 128 threads. It happened on every run of the test when
I ran it with 512 threads.

Just for something to compare to, here's a config that seems to behave
just fine for any number of threads:

ftp://shell.sgi.com/collect/atconfig/config_good

It looks like this is a problem all the way back to the current 3.8
stable tree. I'm still working on tracing back to a kernel where this
problem doesn't show up.

- Alex