On Sun 31-10-10 23:40:12, Jan Kara wrote:
> On Sun 31-10-10 13:24:37, Jan Kara wrote:
> > On Mon 25-10-10 01:41:48, Jan Engelhardt wrote:
> > > On Sunday 2010-06-27 18:44, Jan Engelhardt wrote:
> > > >On Monday 2010-02-15 16:41, Jan Engelhardt wrote:
> > > >>On Monday 2010-02-15 15:49, Jan Kara wrote:
> > > >>>On Sat 13-02-10 13:58:19, Jan Engelhardt wrote:
> > > >>>> >>
> > > >>>> >> This fixes it by using the passed in page writeback count, instead of
> > > >>>> >> doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
> > > >>>> >> (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
> > > >>>> >> finish properly even when new pages are being dirtied.
> > > >>>> >
> > > >>>> >This seems broken.
> > > >>>>
> > > >>>> It seems so. Jens, Jan Kara, your patch does not entirely fix this.
> > > >>>> While there is no sync/fsync to be seen in these traces, I can
> > > >>>> tell there's a livelock, without Dirty decreasing at all.
> > > >
> > > >What ultimately became of the discussion and/or the patch?
> > > >
> > > >Your original ad-hoc patch certainly still does its job; had no need to
> > > >reboot in 86 days and still counting.
> > >
> > > I still observe this behavior on 2.6.36-rc8. This is starting to
> > > get frustrating, so I will happily follow akpm's advice to
> > > poke people.
> > Yes, that's a good way :)
> >
> > > Thread entrypoint: http://lkml.org/lkml/2010/2/12/41
> > >
> > > Previously, many concurrent extractions of tarballs and so on have been
> > > one way to trigger the issue; I now also have a rather small testcase
> > > (below) that freezes the box here (which has 24G RAM, so even though I
> > > don't call msync, I should be fine) sometime after memset finishes.
> > I've tried your test but didn't succeed in freezing my laptop.
> > Everything was running smoothly, the machine even felt reasonably responsive
> > although constantly reading and writing to disk. Also sync(1) finished in a
> > couple of seconds as one would expect in an optimistic case.
> > Needless to say that my laptop has only 1G of RAM so I had to downsize
> > the hash table from 16G to 1G to be able to run the test and the disk is
> > an Intel SSD so the performance of the backing storage compared to the amount
> > of needed IO is much in my favor.
> > OK, so I've taken a machine with a standard rotational drive and 28GB of
> > RAM and there I can see sync(1) hanging (but otherwise the machine looks
> > OK). Investigating further...
> So with the writeback tracing, I verified that indeed the trouble is that
> work queued by sync(1) gets queued behind the background writeback which is
> just running. And background writeback won't stop because your process is
> dirtying pages so aggressively. Actually, it would stop after writing
> LONG_MAX pages but that's effectively infinity. I have a patch
> (e.g. http://www.kerneltrap.com/mailarchive/linux-fsdevel/2010/8/3/6886244)
> to stop background writeback when other work is queued but it's kind
> of hacky so I can see why Christoph doesn't like it ;)
> So I'll have to code something different to fix this issue...
OK, so at Kernel Summit we agreed to fix the issue as I originally wanted
by patches
http://marc.info/?l=linux-fsdevel&m=128861735213143&w=2
and
http://marc.info/?l=linux-fsdevel&m=128861734813131&w=2
I needed one more patch to resolve the issue (attached) which I've just
posted for review and possible inclusion. I had a similar one a long time ago
but now I'm better able to explain why it works because of tracepoints.
Yay! ;). With those three patches I'm not able to trigger livelocks (but
sync still takes 15 minutes because the throughput to disk is about 4MB/s -
no big surprise given the random nature of the load)
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
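
To illustrate the livelock described above (sync work stuck behind a
background writeback pass that never terminates while pages keep getting
dirtied), here is a minimal userspace sketch in C. It is a toy model, not
kernel code: simulate_background(), struct sim_result and all parameters are
hypothetical names made up for this illustration, and max_steps exists only
to keep the simulation finite. The yield_to_queued_work flag models the idea
of stopping background writeback when other work is queued.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct sim_result {
	long background_pages;	/* pages written by the background work item */
	bool sync_started;	/* did the queued sync work ever get to run? */
};

/*
 * Simulate one background writeback work item.
 *
 * Each step writes up to `chunk` dirty pages, then the dirtier adds
 * `dirty_rate` new dirty pages. Background writeback finishes on its own
 * only when `dirty` drops to `background_thresh` or below.
 *
 * If `yield_to_queued_work` is set and other work (e.g. from sync) is
 * queued, the background item stops immediately so that work can run.
 */
static struct sim_result simulate_background(long dirty, long dirty_rate,
					     long chunk, long background_thresh,
					     bool other_work_queued,
					     bool yield_to_queued_work,
					     long max_steps)
{
	struct sim_result r = { 0, false };

	for (long step = 0; step < max_steps; step++) {
		if (dirty <= background_thresh) {
			/* Background work is done; queued work runs next. */
			r.sync_started = other_work_queued;
			return r;
		}
		if (yield_to_queued_work && other_work_queued) {
			/* Give way to the queued work right away. */
			r.sync_started = true;
			return r;
		}

		long written = chunk < dirty ? chunk : dirty;

		dirty -= written;
		r.background_pages += written;
		dirty += dirty_rate;	/* the process keeps dirtying pages */
	}

	/* Step budget exhausted: background never finished, sync never ran. */
	return r;
}
```

With a dirtier that redirties as fast as writeback cleans (dirty_rate equal
to chunk) and yield_to_queued_work false, the step budget runs out and
sync_started stays false. With the flag set, the background item stops at
once and the queued work gets its turn.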
On Fri 05-11-10 22:33:24, Jan Kara wrote:
> I needed one more patch to resolve the issue (attached) which I've just
> posted for review and possible inclusion. I had a similar one a long time ago
> but now I'm better able to explain why it works because of tracepoints.
> Yay! ;). With those three patches I'm not able to trigger livelocks (but
> sync still takes 15 minutes because the throughput to disk is about 4MB/s -
> no big surprise given the random nature of the load)
PS: And big thanks to you for providing the test case and your
persistence ;)
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri, Nov 5, 2010 at 2:34 PM, Jan Kara <[email protected]> wrote:
> On Fri 05-11-10 22:33:24, Jan Kara wrote:
>> I needed one more patch to resolve the issue (attached) which I've just
>> posted for review and possible inclusion. I had a similar one a long time ago
>> but now I'm better able to explain why it works because of tracepoints.
>> Yay! ;). With those three patches I'm not able to trigger livelocks (but
>> sync still takes 15 minutes because the throughput to disk is about 4MB/s -
>> no big surprise given the random nature of the load)
> PS: And big thanks to you for providing the test case and your
> persistence ;)
Ok, I'm inclined to just apply these three patches. Can we get a quick
ack/tested-by for them? I'm sure there's more work to be done, but
let's put this damn thing to rest if it really does fix the problem
Jan(E) sees.
Comments?
Linus
On Friday 2010-11-05 22:33, Jan Kara wrote:
> OK, so at Kernel Summit we agreed to fix the issue as I originally wanted
>by patches
>http://marc.info/?l=linux-fsdevel&m=128861735213143&w=2
>and
>http://marc.info/?l=linux-fsdevel&m=128861734813131&w=2
>
> I needed one more patch to resolve the issue (attached) which I've just
>posted for review and possible inclusion.
If you have a branch that has all three and is easy to fetch, I'll
get on it right away.
On Fri 05-11-10 23:03:31, Jan Engelhardt wrote:
>
> On Friday 2010-11-05 22:33, Jan Kara wrote:
> > OK, so at Kernel Summit we agreed to fix the issue as I originally wanted
> >by patches
> >http://marc.info/?l=linux-fsdevel&m=128861735213143&w=2
> >and
> >http://marc.info/?l=linux-fsdevel&m=128861734813131&w=2
> >
> > I needed one more patch to resolve the issue (attached) which I've just
> >posted for review and possible inclusion.
>
> If you have a branch that has all three and is easy to fetch, I'll
> get on it right away.
In my repo:
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6.git
I've created branch writeback_fixes which has 2.6.37-rc1 + the three
patches.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR