2011-02-01 17:55:00

by Dmitry Torokhov

Subject: 2.6.38-rc3: FUSE (sshfs) hangs under load

Hi,

After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
misbehave on me under load. It starts off fine but when I try to compile
a few modules against kernel sources residing on the other box the
processes go into 'D' state and just sit there doing nothing.

2.6.37 (plus input patches) works fine. I have not tried rc1 or rc2, nor
bisected yet.

Has anyone seen something similar?

Thanks!

--
Dmitry


2011-02-02 11:52:40

by Miklos Szeredi

Subject: Re: 2.6.38-rc3: FUSE (sshfs) hangs under load

On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> Hi,
>
> After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> misbehave on me under load. It starts off fine but when I try to compile
> a few modules against kernel sources residing on the other box the
> processes go into 'D' state and just sit there doing nothing.

Can you please post a stack trace from SysRq-T?

Thanks,
Miklos
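
Note for readers following the thread: the SysRq-T dump requested above can also be triggered from a shell. The following is only a sketch, assuming a Linux box with root access and SysRq enabled; the function names are made up for illustration, and the default file name simply mirrors the attachment posted later in the thread.

```shell
# Keep the header line plus any task whose state column starts with 'D'
# (uninterruptible sleep). Reads `ps -eo pid,stat,wchan:32,comm` style
# output on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Ask the kernel to dump every task's stack into the kernel log, then save
# the log. Needs root and SysRq enabled.
dump_tasks() {
    echo t > /proc/sysrq-trigger
    dmesg > "${1:-taskinfo.log}"
}

# Possible usage:
#   ps -eo pid,stat,wchan:32,comm | filter_d_state
#   dump_tasks taskinfo.log
```

The filter is kept separate from `ps` so it can be tried on saved output as well as on a live system.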

2011-02-02 16:52:48

by Dmitry Torokhov

Subject: Re: 2.6.38-rc3: FUSE (sshfs) hangs under load

On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
> On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> > Hi,
> >
> > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> > misbehave on me under load. It starts off fine but when I try to compile
> > a few modules against kernel sources residing on the other box the
> > processes go into 'D' state and just sit there doing nothing.
>
> Can you please post a stack trace from SysRq-T?
>

Will do tonight. In the meantime I tried bisecting, but failure is not
always triggered on the first attempt so results are iffy. The log so
far:

# bad: [7d44b0440147d83a65270205b22e7d365de28948] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect start '7d44b0440147d83a65270205b22e7d365de28948' 'v2.6.37'
# bad: [84b7290cca16c61a167c7e1912cd84a479852165] Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6
git bisect bad 84b7290cca16c61a167c7e1912cd84a479852165
# good: [fea9294c5f2902c45613681ad995ca27899d2016] pch_can: Optimize "if" condition in rx/tx processing
git bisect good fea9294c5f2902c45613681ad995ca27899d2016
# bad: [c96e96354a6c9456cdf1f150eca504e2ea35301e] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
git bisect bad c96e96354a6c9456cdf1f150eca504e2ea35301e
# good: [003ea98195eebdfcf476317b517e8c29a25b9d10] iwlwifi: remove reference to Gen2
git bisect good 003ea98195eebdfcf476317b517e8c29a25b9d10

The last good must have been also bad as sshfs got stuck while I was
installing next bisect step over it.

Thanks.

--
Dmitry
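
Note for readers following the thread: when a failure does not trigger on every attempt, a single passing run is weak evidence for "good". One possible approach, sketched here with a hypothetical test script name, is to repeat the test several times before marking a step good, and to replay an edited bisect log when a verdict turns out to be wrong.

```shell
# Run a command N times; succeed only if every run succeeds.
# $1: number of repetitions, remaining arguments: the command to run.
passes_n_times() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
}

# Possible usage in a bisect step (./sshfs_build_test is a placeholder
# for whatever reproduces the hang):
#   if passes_n_times 5 ./sshfs_build_test; then
#       git bisect good
#   else
#       git bisect bad
#   fi
#
# Dropping a mistaken verdict and resuming:
#   git bisect log > bisect.log
#   (edit bisect.log and delete the wrong "git bisect good ..." line)
#   git bisect reset
#   git bisect replay bisect.log
```

A "bad" verdict from one failing run is trustworthy; only "good" needs the repetition.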

2011-02-03 06:55:54

by Dmitry Torokhov

Subject: Re: 2.6.38-rc3: FUSE (sshfs) hangs under load

On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
> On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
> > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> > > Hi,
> > >
> > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> > > misbehave on me under load. It starts off fine but when I try to compile
> > > a few modules against kernel sources residing on the other box the
> > > processes go into 'D' state and just sit there doing nothing.
> >
> > Can you please post a stack trace from SysRq-T?
> >
>
> Will do tonight. In the meantime I tried bisecting, but failure is not
> always triggered on the first attempt so results are iffy. The log so
> far:
>
> # bad: [7d44b0440147d83a65270205b22e7d365de28948] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
> # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
> git bisect start '7d44b0440147d83a65270205b22e7d365de28948' 'v2.6.37'
> # bad: [84b7290cca16c61a167c7e1912cd84a479852165] Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6
> git bisect bad 84b7290cca16c61a167c7e1912cd84a479852165
> # good: [fea9294c5f2902c45613681ad995ca27899d2016] pch_can: Optimize "if" condition in rx/tx processing
> git bisect good fea9294c5f2902c45613681ad995ca27899d2016
> # bad: [c96e96354a6c9456cdf1f150eca504e2ea35301e] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
> git bisect bad c96e96354a6c9456cdf1f150eca504e2ea35301e
> # good: [003ea98195eebdfcf476317b517e8c29a25b9d10] iwlwifi: remove reference to Gen2
> git bisect good 003ea98195eebdfcf476317b517e8c29a25b9d10
>
> The last good must have been also bad as sshfs got stuck while I was
> installing next bisect step over it.
>

OK, so here are the stack traces you requested. First one is snapshot of
when compile got stuck, the 2nd one is when I interrupted make which
caused gcc to go to 'D' state.

Thanks.

--
Dmitry


Attachments:
(No filename) (1.97 kB)
taskinfo.log.bz2 (12.43 kB)

2011-02-03 11:13:28

by Miklos Szeredi

Subject: Re: 2.6.38-rc3: FUSE (sshfs) hangs under load

On Wed, 2 Feb 2011, Dmitry Torokhov wrote:
> On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
> > On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
> > > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> > > > Hi,
> > > >
> > > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> > > > misbehave on me under load. It starts off fine but when I try to compile
> > > > a few modules against kernel sources residing on the other box the
> > > > processes go into 'D' state and just sit there doing nothing.
> > >
> > > Can you please post a stack trace from SysRq-T?
> > >
> >
> > Will do tonight. In the meantime I tried bisecting, but failure is not
> > always triggered on the first attempt so results are iffy. The log so
> > far:
> >
> > # bad: [7d44b0440147d83a65270205b22e7d365de28948] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
> > # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
> > git bisect start '7d44b0440147d83a65270205b22e7d365de28948' 'v2.6.37'
> > # bad: [84b7290cca16c61a167c7e1912cd84a479852165] Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6
> > git bisect bad 84b7290cca16c61a167c7e1912cd84a479852165
> > # good: [fea9294c5f2902c45613681ad995ca27899d2016] pch_can: Optimize "if" condition in rx/tx processing
> > git bisect good fea9294c5f2902c45613681ad995ca27899d2016
> > # bad: [c96e96354a6c9456cdf1f150eca504e2ea35301e] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
> > git bisect bad c96e96354a6c9456cdf1f150eca504e2ea35301e
> > # good: [003ea98195eebdfcf476317b517e8c29a25b9d10] iwlwifi: remove reference to Gen2
> > git bisect good 003ea98195eebdfcf476317b517e8c29a25b9d10
> >
> > The last good must have been also bad as sshfs got stuck while I was
> > installing next bisect step over it.
> >
>
> OK, so here are the stack traces you requested. First one is snapshot of
> when compile got stuck, the 2nd one is when I interrupted make which
> caused gcc to go to 'D' state.

There doesn't appear to be anything abnormal there.

It's going into D state after it has received an interrupt and sent it
along to the userspace filesystem. Then it will go into
uninterruptible sleep until the answer is received.

So the hang is because the answer to an open request is not being
received. I can't tell where it got stuck, apparently not anywhere on
the local machine.

Can you please get a log from sshfs with "-odebug,sshfs_debug" and
redirect stderr to a file? That might tell a bit more about the
situation. Or it might not...

Thanks,
Miklos
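
Note for readers following the thread: the debug log requested above could be captured roughly as follows. This is a sketch only. The remote spec, mountpoint and log file are placeholders, and the `SSHFS` variable is a hypothetical hook for substituting the binary, not something sshfs itself defines.

```shell
# Mount with FUSE and sshfs debugging enabled, sending stderr to a log file.
# $1: remote spec (placeholder, e.g. user@otherbox:/path/to/sources)
# $2: local mountpoint
# $3: log file that receives the debug output
sshfs_debug_mount() {
    "${SSHFS:-sshfs}" -odebug,sshfs_debug "$1" "$2" 2> "$3"
}

# Possible usage:
#   sshfs_debug_mount user@otherbox:/path/to/sources /mnt/src sshfs.log
```

With the `debug` option sshfs stays in the foreground, so the call returns only once the filesystem is unmounted; the build that triggers the hang has to be started from another terminal.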

2011-02-03 19:41:26

by Dmitry Torokhov

Subject: Re: 2.6.38-rc3: FUSE (sshfs) hangs under load

On Thu, Feb 03, 2011 at 12:13:24PM +0100, Miklos Szeredi wrote:
> On Wed, 2 Feb 2011, Dmitry Torokhov wrote:
> > On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
> > > On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
> > > > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> > > > > Hi,
> > > > >
> > > > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> > > > > misbehave on me under load. It starts off fine but when I try to compile
> > > > > a few modules against kernel sources residing on the other box the
> > > > > processes go into 'D' state and just sit there doing nothing.
> > > >
> > > > Can you please post a stack trace from SysRq-T?
> > > >
> > >
> > > Will do tonight. In the meantime I tried bisecting, but failure is not
> > > always triggered on the first attempt so results are iffy. The log so
> > > far:
> > >
> > > # bad: [7d44b0440147d83a65270205b22e7d365de28948] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
> > > # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
> > > git bisect start '7d44b0440147d83a65270205b22e7d365de28948' 'v2.6.37'
> > > # bad: [84b7290cca16c61a167c7e1912cd84a479852165] Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6
> > > git bisect bad 84b7290cca16c61a167c7e1912cd84a479852165
> > > # good: [fea9294c5f2902c45613681ad995ca27899d2016] pch_can: Optimize "if" condition in rx/tx processing
> > > git bisect good fea9294c5f2902c45613681ad995ca27899d2016
> > > # bad: [c96e96354a6c9456cdf1f150eca504e2ea35301e] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
> > > git bisect bad c96e96354a6c9456cdf1f150eca504e2ea35301e
> > > # good: [003ea98195eebdfcf476317b517e8c29a25b9d10] iwlwifi: remove reference to Gen2
> > > git bisect good 003ea98195eebdfcf476317b517e8c29a25b9d10
> > >
> > > The last good must have been also bad as sshfs got stuck while I was
> > > installing next bisect step over it.
> > >
> >
> > OK, so here are the stack traces you requested. First one is snapshot of
> > when compile got stuck, the 2nd one is when I interrupted make which
> > caused gcc to go to 'D' state.
>
> There doesn't appear to be anything abnormal there.
>
> It's going into D state after it has received an interrupt and sent it
> along to the userspace filesystem. Then it will go into
> uninterruptible sleep until the answer is received.
>
> So the hang is because the answer to an open request is not being
> received. I can't tell where it got stuck, apparently not anywhere on
> the local machine.
>
> Can you please get a log from sshfs with "-odebug,sshfs_debug" and
> redirect stderr to a file? That might tell a bit more about the
> situation. Or it might not...

Hmm, it might be just the network itself, last night mutt in ssh session
froze on me as well. I guess I'll just have to finish my bisect
exercise.

Thanks.

--
Dmitry

2011-02-04 11:41:19

by Felix Fietkau

Subject: Re: Wireless regression (was 2.6.38-rc3: FUSE (sshfs) hangs under load)

On 2011-02-04 7:49 AM, Dmitry Torokhov wrote:
> On Thu, Feb 03, 2011 at 11:41:15AM -0800, Dmitry Torokhov wrote:
>> On Thu, Feb 03, 2011 at 12:13:24PM +0100, Miklos Szeredi wrote:
>> > On Wed, 2 Feb 2011, Dmitry Torokhov wrote:
>> > > On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
>> > > > On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
>> > > > > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
>> > > > > > Hi,
>> > > > > >
>> > > > > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
>> > > > > > misbehave on me under load. It starts off fine but when I try to compile
>> > > > > > a few modules against kernel sources residing on the other box the
>> > > > > > processes go into 'D' state and just sit there doing nothing.
>> > > > >
>> > > > > Can you please post a stack trace from SysRq-T?
>> > > > >
>> > > >
> ...
>> > >
>> > > OK, so here are the stack traces you requested. First one is snapshot of
>> > > when compile got stuck, the 2nd one is when I interrupted make which
>> > > caused gcc to go to 'D' state.
>> >
>> > There doesn't appear to be anything abnormal there.
>> >
>> > It's going into D state after it has received an interrupt and sent it
>> > along to the userspace filesystem. Then it will go into
>> > uninterruptible sleep until the answer is received.
>> >
>> > So the hang is because the answer to an open request is not being
>> > received. I can't tell where it got stuck, apparently not anywhere on
>> > the local machine.
>> >
>> > Can you please get a log from sshfs with "-odebug,sshfs_debug" and
>> > redirect stderr to a file? That might tell a bit more about the
>> > situation. Or it might not...
>>
>> Hmm, it might be just the network itself, last night mutt in ssh session
>> froze on me as well. I guess I'll just have to finish my bisect
>> exercise.
>>
>
> I finished bisecting and it turned out that the problematic commit
> happened to be in wireless (I have iwl3945):
>
> commit 4cd06a344db752f513437138953af191cbe9a691
> Author: Felix Fietkau <[email protected]>
> Date: Sat Dec 18 19:30:49 2010 +0100
>
> mac80211: skip unnecessary pskb_expand_head calls
>
> If the skb is not cloned and we don't need any extra headroom, there
> is no point in reallocating the skb head.
>
> Signed-off-by: Felix Fietkau <[email protected]>
> Signed-off-by: John W. Linville <[email protected]>
>
> With this commit reverted from 2.6.38-rc3 I can not reproduce sshfs
> getting stuck here.
I really don't see how this commit could be causing these issues, and
I'm not aware of any similar issues affecting other drivers.

- Felix

2011-02-04 12:05:49

by Felix Fietkau

Subject: Re: Wireless regression (was 2.6.38-rc3: FUSE (sshfs) hangs under load)

On 2011-02-04 12:41 PM, Felix Fietkau wrote:
> On 2011-02-04 7:49 AM, Dmitry Torokhov wrote:
>> On Thu, Feb 03, 2011 at 11:41:15AM -0800, Dmitry Torokhov wrote:
>>> On Thu, Feb 03, 2011 at 12:13:24PM +0100, Miklos Szeredi wrote:
>>> > On Wed, 2 Feb 2011, Dmitry Torokhov wrote:
>>> > > On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
>>> > > > On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
>>> > > > > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
>>> > > > > > misbehave on me under load. It starts off fine but when I try to compile
>>> > > > > > a few modules against kernel sources residing on the other box the
>>> > > > > > processes go into 'D' state and just sit there doing nothing.
>>> > > > >
>>> > > > > Can you please post a stack trace from SysRq-T?
>>> > > > >
>>> > > >
>> ...
>>> > >
>>> > > OK, so here are the stack traces you requested. First one is snapshot of
>>> > > when compile got stuck, the 2nd one is when I interrupted make which
>>> > > caused gcc to go to 'D' state.
>>> >
>>> > There doesn't appear to be anything abnormal there.
>>> >
>>> > It's going into D state after it has received an interrupt and sent it
>>> > along to the userspace filesystem. Then it will go into
>>> > uninterruptible sleep until the answer is received.
>>> >
>>> > So the hang is because the answer to an open request is not being
>>> > received. I can't tell where it got stuck, apparently not anywhere on
>>> > the local machine.
>>> >
>>> > Can you please get a log from sshfs with "-odebug,sshfs_debug" and
>>> > redirect stderr to a file? That might tell a bit more about the
>>> > situation. Or it might not...
>>>
>>> Hmm, it might be just the network itself, last night mutt in ssh session
>>> froze on me as well. I guess I'll just have to finish my bisect
>>> exercise.
>>>
>>
>> I finished bisecting and it turned out that the problematic commit
>> happened to be in wireless (I have iwl3945):
>>
>> commit 4cd06a344db752f513437138953af191cbe9a691
>> Author: Felix Fietkau <[email protected]>
>> Date: Sat Dec 18 19:30:49 2010 +0100
>>
>> mac80211: skip unnecessary pskb_expand_head calls
>>
>> If the skb is not cloned and we don't need any extra headroom, there
>> is no point in reallocating the skb head.
>>
>> Signed-off-by: Felix Fietkau <[email protected]>
>> Signed-off-by: John W. Linville <[email protected]>
>>
>> With this commit reverted from 2.6.38-rc3 I can not reproduce sshfs
>> getting stuck here.
> I really don't see how this commit could be causing these issues, and
> I'm not aware of any similar issues affecting other drivers.
Could you please try this patch to see if it fixes the issue as well?

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index ffc6749..3168eae 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1547,7 +1547,7 @@ static int ieee80211_skb_resize(struct ieee80211_local *local,
 		skb_orphan(skb);
 	}
 
-	if (skb_header_cloned(skb))
+	if (skb_cloned(skb))
 		I802_DEBUG_INC(local->tx_expand_skb_head_cloned);
 	else if (head_need || tail_need)
 		I802_DEBUG_INC(local->tx_expand_skb_head);

2011-02-07 08:06:44

by Dmitry Torokhov

Subject: Re: Wireless regression (was 2.6.38-rc3: FUSE (sshfs) hangs under load)

On Fri, Feb 04, 2011 at 01:05:45PM +0100, Felix Fietkau wrote:
> On 2011-02-04 12:41 PM, Felix Fietkau wrote:
> > On 2011-02-04 7:49 AM, Dmitry Torokhov wrote:
> >> On Thu, Feb 03, 2011 at 11:41:15AM -0800, Dmitry Torokhov wrote:
> >>> On Thu, Feb 03, 2011 at 12:13:24PM +0100, Miklos Szeredi wrote:
> >>> > On Wed, 2 Feb 2011, Dmitry Torokhov wrote:
> >>> > > On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
> >>> > > > On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
> >>> > > > > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> >>> > > > > > Hi,
> >>> > > > > >
> >>> > > > > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> >>> > > > > > misbehave on me under load. It starts off fine but when I try to compile
> >>> > > > > > a few modules against kernel sources residing on the other box the
> >>> > > > > > processes go into 'D' state and just sit there doing nothing.
> >>> > > > >
> >>> > > > > Can you please post a stack trace from SysRq-T?
> >>> > > > >
> >>> > > >
> >> ...
> >>> > >
> >>> > > OK, so here are the stack traces you requested. First one is snapshot of
> >>> > > when compile got stuck, the 2nd one is when I interrupted make which
> >>> > > caused gcc to go to 'D' state.
> >>> >
> >>> > There doesn't appear to be anything abnormal there.
> >>> >
> >>> > It's going into D state after it has received an interrupt and sent it
> >>> > along to the userspace filesystem. Then it will go into
> >>> > uninterruptible sleep until the answer is received.
> >>> >
> >>> > So the hang is because the answer to an open request is not being
> >>> > received. I can't tell where it got stuck, apparently not anywhere on
> >>> > the local machine.
> >>> >
> >>> > Can you please get a log from sshfs with "-odebug,sshfs_debug" and
> >>> > redirect stderr to a file? That might tell a bit more about the
> >>> > situation. Or it might not...
> >>>
> >>> Hmm, it might be just the network itself, last night mutt in ssh session
> >>> froze on me as well. I guess I'll just have to finish my bisect
> >>> exercise.
> >>>
> >>
> >> I finished bisecting and it turned out that the problematic commit
> >> happened to be in wireless (I have iwl3945):
> >>
> >> commit 4cd06a344db752f513437138953af191cbe9a691
> >> Author: Felix Fietkau <[email protected]>
> >> Date: Sat Dec 18 19:30:49 2010 +0100
> >>
> >> mac80211: skip unnecessary pskb_expand_head calls
> >>
> >> If the skb is not cloned and we don't need any extra headroom, there
> >> is no point in reallocating the skb head.
> >>
> >> Signed-off-by: Felix Fietkau <[email protected]>
> >> Signed-off-by: John W. Linville <[email protected]>
> >>
> >> With this commit reverted from 2.6.38-rc3 I can not reproduce sshfs
> >> getting stuck here.
> > I really don't see how this commit could be causing these issues, and
> > I'm not aware of any similar issues affecting other drivers.
> Could you please try this patch to see if it fixes the issue as well?
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index ffc6749..3168eae 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1547,7 +1547,7 @@ static int ieee80211_skb_resize(struct ieee80211_local *local,
>  		skb_orphan(skb);
>  	}
>
> -	if (skb_header_cloned(skb))
> +	if (skb_cloned(skb))
>  		I802_DEBUG_INC(local->tx_expand_skb_head_cloned);
>  	else if (head_need || tail_need)
>  		I802_DEBUG_INC(local->tx_expand_skb_head);

Yes, it does, thank you for fixing it.

--
Dmitry

2011-02-04 06:50:00

by Dmitry Torokhov

Subject: Wireless regression (was 2.6.38-rc3: FUSE (sshfs) hangs under load)

On Thu, Feb 03, 2011 at 11:41:15AM -0800, Dmitry Torokhov wrote:
> On Thu, Feb 03, 2011 at 12:13:24PM +0100, Miklos Szeredi wrote:
> > On Wed, 2 Feb 2011, Dmitry Torokhov wrote:
> > > On Wed, Feb 02, 2011 at 08:52:36AM -0800, Dmitry Torokhov wrote:
> > > > On Wed, Feb 02, 2011 at 12:52:36PM +0100, Miklos Szeredi wrote:
> > > > > On Tue, 1 Feb 2011, Dmitry Torokhov wrote:
> > > > > > Hi,
> > > > > >
> > > > > > After installing 2.6.38-rc3 (plus a few input patches) sshfs started to
> > > > > > misbehave on me under load. It starts off fine but when I try to compile
> > > > > > a few modules against kernel sources residing on the other box the
> > > > > > processes go into 'D' state and just sit there doing nothing.
> > > > >
> > > > > Can you please post a stack trace from SysRq-T?
> > > > >
> > > >
...
> > >
> > > OK, so here are the stack traces you requested. First one is snapshot of
> > > when compile got stuck, the 2nd one is when I interrupted make which
> > > caused gcc to go to 'D' state.
> >
> > There doesn't appear to be anything abnormal there.
> >
> > It's going into D state after it has received an interrupt and sent it
> > along to the userspace filesystem. Then it will go into
> > uninterruptible sleep until the answer is received.
> >
> > So the hang is because the answer to an open request is not being
> > received. I can't tell where it got stuck, apparently not anywhere on
> > the local machine.
> >
> > Can you please get a log from sshfs with "-odebug,sshfs_debug" and
> > redirect stderr to a file? That might tell a bit more about the
> > situation. Or it might not...
>
> Hmm, it might be just the network itself, last night mutt in ssh session
> froze on me as well. I guess I'll just have to finish my bisect
> exercise.
>

I finished bisecting and it turned out that the problematic commit
happened to be in wireless (I have iwl3945):

commit 4cd06a344db752f513437138953af191cbe9a691
Author: Felix Fietkau <[email protected]>
Date: Sat Dec 18 19:30:49 2010 +0100

mac80211: skip unnecessary pskb_expand_head calls

If the skb is not cloned and we don't need any extra headroom, there
is no point in reallocating the skb head.

Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

With this commit reverted from 2.6.38-rc3 I can not reproduce sshfs
getting stuck here.

Thanks.

--
Dmitry