Hello everybody,
with Linux 4.0.5 everything was perfectly fine, Linux 4.0.6 breaks the setup
on one of my systems: Only three of my logical volumes are available, systemd
reports:
lvm2-pvscan@254:3.service: State 'stop-sigterm' timed out. Killing.
Followed by a lot of failed dependencies. The setup looks like this:
NAME                 MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                      8:0  0 953,9G  0 disk
|-sda1                   8:1  0     1M  0 part
|-sda2                   8:2  0   256M  0 part  /boot/efi
|-sda3                   8:3  0   7,8G  0 part
| |-vg-iso             254:0  0     4G  0 lvm   /srv/iso
| |-vg-persist         254:1  0     2G  0 lvm   /srv/iso/persist
| `-vg-boot            254:2  0   128M  0 lvm   /boot
|-sda4                   8:4  0 913,9G  0 part
| `-cvg                254:3  0 913,9G  0 crypt
|   |-cvg-swap         254:4  0     4G  0 lvm   [SWAP]
|   |-cvg-root         254:5  0    40G  0 lvm   /
|   |-cvg-log          254:6  0     1G  0 lvm   /var/log
|   |-cvg-home         254:7  0   500G  0 lvm   /home
|   |-cvg-vbox_win7    254:8  0    32G  0 lvm
|   |-cvg-vbox_win8    254:9  0    32G  0 lvm
|   |-cvg-git         254:10  0    12G  0 lvm   /srv/git
|   `-cvg-chroots     254:11  0    16G  0 lvm   /var/lib/archbuild
`-sda5                   8:5  0    32G  0 part
Another system is just fine; the only difference is that the broken system has a
logical volume with btrfs (cvg-chroots). Possibly the btrfs fixes are involved?
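To compare a failed boot against the listing above, it can help to count the
`lvm` entries per volume group. This is a minimal sketch, assuming the raw
output format of `lsblk -rn -o NAME,TYPE` and assuming that the volume group
name is everything before the first hyphen (true for `vg` and `cvg` here, not
for names that contain hyphens themselves). The helper name `count_lvs` is
made up for illustration, and the sample input is transcribed from the listing
above.

```shell
#!/bin/sh
# Count logical volumes per volume group.
# Input: lines of "NAME TYPE", as printed by `lsblk -rn -o NAME,TYPE`.
count_lvs() {
    awk '$2 == "lvm" { split($1, part, "-"); n[part[1]]++ }
         END { for (vg in n) print vg, n[vg] }' | sort
}

# Sample input transcribed from the listing above (NAME and TYPE only).
sample='sda disk
sda1 part
sda2 part
sda3 part
vg-iso lvm
vg-persist lvm
vg-boot lvm
sda4 part
cvg crypt
cvg-swap lvm
cvg-root lvm
cvg-log lvm
cvg-home lvm
cvg-vbox_win7 lvm
cvg-vbox_win8 lvm
cvg-git lvm
cvg-chroots lvm
sda5 part'

result=$(printf '%s\n' "$sample" | count_lvs)
echo "$result"
```

On a live system the same helper could be fed directly with
`lsblk -rn -o NAME,TYPE | count_lvs`; a volume group that is missing from the
output, or shows a lower count, points at the volumes that were not activated.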
--
main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH"
"CX:;",b;for(a/* Chris get my mail address: */=0;b=c[a++];)
putchar(b-1/(/* gcc -o sig sig.c && ./sig */b/42*2-3)*42);}
Hello everybody,
I kind of nailed the issue. Adding CC to Greg for kdbus and to Herbert for
the bad commit. Details are below.
Christian Hesse <[email protected]> on Tue, 2015/06/23 10:14:
> with Linux 4.0.5 everything was perfectly fine, Linux 4.0.6 breaks the setup
> on one of my systems: Only three of my logical volumes are available,
> systemd reports:
>
> lvm2-pvscan@254:3.service: State 'stop-sigterm' timed out. Killing.
I am running Linux 4.0.x with kdbus from Greg's char-misc tree kdbus branch
[0] merged, last commit is b69af624a0 ("kdbus: optimize if statements in
kdbus_conn_disconnect()"). Everything works fine with Linux 4.0.5 but breaks
with 4.0.6 on one of my systems.
I bisected the problem and found this to be the bad commit [1]:
From cf8befcc1a5538b035d478424efcc2d50e66928e Mon Sep 17 00:00:00 2001
From: Herbert Xu <[email protected]>
Date: Sat, 16 May 2015 21:16:28 +0800
Subject: netlink: Disable insertions/removals during rehash
[ Upstream commit: Not applicable ]
The current rhashtable rehash code is buggy and can't deal with
parallel insertions/removals without corrupting the hash table.
This patch disables it by partially reverting
c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
nl_sk_hash_lock").
I can fix my system by booting with kdbus=0 to disable kdbus or by reverting
this single commit. Looks like something deadlocks... Any idea?
[0] https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/?h=kdbus
[1] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf8befcc
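The bisection mentioned above can in principle be reproduced with
`git bisect`. The sketch below is not the exact procedure used here: it runs
against a throwaway repository so that it is self-contained, and the tag names
`good-release` and `bad-release`, the `state` file and the test command are
all made up for illustration. In a linux-stable checkout the analogous start
would be `git bisect start v4.0.6 v4.0.5`, with every step tested by building
and booting the kernel by hand instead of `git bisect run`.

```shell
#!/bin/sh
# Sketch: find the first bad commit between two tags with git bisect.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect@example.invalid"
git config user.name "Bisect Demo"
git config commit.gpgsign false
git config tag.gpgsign false

# Six commits; commit 4 introduces the "regression" (state flips to "bad").
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then
        echo bad > state
    else
        echo good > state
    fi
    echo "$i" > counter          # guarantees every commit has a change
    git add state counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then
        git tag good-release     # stand-in for the last known good release
    fi
done
git tag bad-release              # stand-in for the first known bad release

# Bad revision first, good revision second.
git bisect start bad-release good-release >/dev/null 2>&1

# The test command exits 0 on good revisions and 1 on bad ones.
git bisect run sh -c 'grep -q good state' >/dev/null 2>&1

# After a finished bisection, refs/bisect/bad names the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

git bisect reset >/dev/null 2>&1
```

Once a candidate is found, reverting only that commit on top of the bad
release and retesting confirms that it alone triggers the problem.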
Hi
On Thu, Jun 25, 2015 at 9:05 AM, Christian Hesse <[email protected]> wrote:
> I can fix my system by booting with kdbus=0 to disable kdbus or by reverting
> this single commit. Looks like something deadlocks... Any idea?
Greg's kdbus tree does not work on 4.0. How exactly did you do the
back-merge? You need to revert these patches at least to make it work
on 4.0 (in this order):
  kdbus: no need to ref current->mm
  kdbus: use rcu to access exe file in metadata
  kdbus: pool: use __vfs_read()
Furthermore, we don't support kdbus on 4.0. So if this does not happen
on 4.1, I'd recommend staying with 4.1. It'd still be interesting to
see whether the netlink-locking back-port is indeed broken.
Regardless: It is highly unlikely that the netlink commit and kdbus
are in any way related. Either kdbus triggers some uncommon user-space
path, or you have a borked kdbus-merge.
Thanks
David
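Reverting the three patches in the stated order could look roughly like the
sketch below. It runs in a throwaway repository with stand-in commits that
only share the subject lines quoted above, so it says nothing about whether
the real reverts apply cleanly on 4.0. Looking the commits up by subject with
`git log --grep` is an assumption about the workflow, as is the commit order
in the toy history; with the real tree the commit hashes would be the safer
handle.

```shell
#!/bin/sh
# Sketch: revert a list of commits, identified by subject, in a fixed order.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "revert@example.invalid"
git config user.name "Revert Demo"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Stand-in commits, oldest first (assumed order). Each touches its own file
# so the reverts cannot conflict in this toy repository.
n=0
for subject in \
    "kdbus: pool: use __vfs_read()" \
    "kdbus: use rcu to access exe file in metadata" \
    "kdbus: no need to ref current->mm"
do
    n=$((n + 1))
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "$subject"
done

# Remember the tip so the lookup never matches the new "Revert ..." commits.
start=$(git rev-parse HEAD)

# Revert in the order given in the message.
for subject in \
    "kdbus: no need to ref current->mm" \
    "kdbus: use rcu to access exe file in metadata" \
    "kdbus: pool: use __vfs_read()"
do
    commit=$(git log -1 --format=%H --fixed-strings --grep="$subject" "$start")
    git revert --no-edit "$commit" >/dev/null
done

count=$(git rev-list --count "$start"..HEAD)
echo "reverted $count commits"
```

Resolving each hash against the saved `start` commit keeps the lookup stable
while new revert commits are being added on top.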
David Herrmann <[email protected]> on Thu, 2015/06/25 14:01:
> Greg's kdbus tree does not work on 4.0. How exactly did you do the
> back-merge? You need to revert these patches at least to make it work
> on 4.0 (in this order):
> kdbus: no need to ref current->mm
> kdbus: use rcu to access exe file in metadata
> kdbus: pool: use __vfs_read()
> Furthermore, we don't support kdbus on 4.0. So if this does not happen
> on 4.1, I'd recommend staying with 4.1. It'd still be interesting to
> see whether the netlink-locking back-port is indeed broken.
Probably I borked it with my back-merge then...
Everything else worked perfectly fine, though.
I am on 4.1.0 now, which has not had any issues so far.
> Regardless: It is highly unlikely that the netlink commit and kdbus
> are in any way related. Either kdbus triggers some uncommon user-space
> path, or you have a borked kdbus-merge.
I don't know... Possibly the latter. Let's ignore this as it is
unsupported. ;)
Thanks for your support!