2015-07-05 20:22:57

by Linus Torvalds

Subject: Linux 4.2-rc1

It's Sunday, two weeks have passed, and the merge window is closed. I
just pushed out the tag to the git trees, and tar-balls and patches
should be mirroring out too.

I thought this release would be one of the biggest ones ever, but it
turns out that it will depend on how you count. Just counting pure
commits, it is indeed one of the bigger rc1's in recent history, but
3.10-rc1 was almost as big, and then the final 3.10 grew from that
more than most. I doubt we'll match the 3.10 release, since we have
been getting progressively better at *not* merging tons of stuff after
-rc1.

And it turns out v3.15-rc1 had more commits than 4.2-rc1 does (by a
hair), so even there this isn't the biggest rc1 ever, if you count the
number of commits.

But it's certainly up there with the best of them. It's much too big
to post the shortlog, so as usual for rc1, appended is just my
"mergelog", with the people who are credited being the people I merge
from, which is not necessarily the same thing as the people who
actually authored the code. You'll need to go look at the details in
the git tree for that.

However, if you count the size in pure number of lines changed, this
really seems to be the biggest rc we've ever had, with over a million
lines added (and about a quarter million removed). That beats the
previous champion (3.11-rc1) that was huge mainly due to Lustre being
added to the staging tree.

The reason for that huge number of lines is largely a single source:
the bulk of this by far is from the new amd gpu register description
headers. In fact, just those register descriptor headers alone are
about 41% of the entire patch. The rest of the new amdgpu driver
itself is another 8% of the total, so we're in the somewhat odd
situation where a single driver is about half of the whole rc1 in
number of lines.

Aside from that anomaly, the rest looks fairly normal - mainly
drivers and architecture updates. The Renesas H8/300 architecture came
back in a newly cleaned-up form, so we have some new(ish) architecture
support, but that's tiny and the bulk is ARM (with x86 a distant
second). Interestingly, there were quite a few low-level x86
changes: both source code re-organization for x86 entry code and lots
of FPU handling cleanups. That's fairly unusual, with low-level x86
code being fairly stable and seldom seeing those kinds of big changes.

Outside of the "drivers and architectures", there's a fair amount of
filesystem stuff, including some fundamental changes and cleanups to
symlink handling by Al. And all the usual updates to various
filesystems, networking, crypto, tools, testing, you name it.

Linus

---

Al Viro (2):
vfs updates
more vfs updates

Alex Deucher (1):
radeon and amdgpu fixes

Alex Williamson (1):
VFIO updates

Alexandre Belloni (1):
RTC updates

Andrew Morton (3):
first patchbomb
second patchbomb
third patchbomb

Bjorn Helgaas (1):
PCI updates

Bob Peterson (1):
GFS2 updates

Borislav Petkov (2):
EDAC updates
EDAC fix

Brian Norris (1):
MTD updates

Bruce Fields (1):
nfsd updates

Bryan Wu (1):
LED subsystem updates

Catalin Marinas (2):
arm64 updates
arm64 fixes (and cleanups)

Chris Mason (1):
btrfs updates

Chris Metcalf (1):
arch/tile updates

Dan Williams (1):
libnvdimm subsystem

Daniel Vetter (1):
drm EDID fix

Darren Hart (2):
x86 platform driver updates
late x86 platform driver updates

Dave Airlie (1):
drm updates

David Miller (3):
networking updates
sparc fixes
networking fixes

David Vrabel (1):
xen updates

Dmitry Torokhov (2):
input subsystem updates
second round of input updates

Dominik Brodowski (1):
PCMCIA update

Doug Ledford (1):
rdma updates

Eric Biederman (1):
user namespace updates

Geert Uytterhoeven (1):
m68k update

Grant Likely (1):
devicetree updates

Greg KH (5):
char/misc driver updates
driver core updates
staging driver updates
tty/serial driver updates
USB updates

Greg Ungerer (1):
m68knommu updates

Guenter Roeck (2):
hwmon updates
hwmon fixes

Herbert Xu (3):
crypto update
crypto fixes
crypto fixes

Ingo Molnar (17):
RCU updates
locking updates
perf updates
perf fixes
scheduler updates
x86 cleanups
x86 CPU features
x86 debugging documentation updates
x86 EFI updates
x86 FPU updates
x86 kdump updates
x86 warning fixlet
x86 core updates
max log buf size increase
perf updates
scheduler fixes
x86 fixes

Jaegeuk Kim (1):
f2fs updates

James Bottomley (1):
SCSI updates

James Morris (1):
security subsystem updates

Jan Kara (1):
UDF fixes and cleanups

Jani Nikula (1):
intel drm fixes

Jassi Brar (1):
mailbox updates

Jean Delvare (2):
DMI updates
more hwmon updates

Jens Axboe (6):
core block IO update
block driver updates
asm/scatterlist.h removal
cgroup writeback support
more block layer patches
block fixes

Jiri Kosina (3):
HID updates
livepatching fixes
trivial tree updates

Joerg Roedel (1):
IOMMU updates

Jon Mason (1):
NTB updates

Jonathan Corbet (1):
documentation updates

Kevin Hilman (6):
ARM SoC cleanups
ARM SoC platform support updates
ARM SoC DT updates
ARM SoC driver updates
ARM SoC defconfig updates
ARM SoC late fixes and dependencies

Lee Jones (2):
MFD updates
backlight updates

Ley Foon Tan (1):
nios2 update

Linus Walleij (2):
gpio updates
pin control updates

Mark Brown (3):
regmap updates
spi updates
regulator updates

Martin Schwidefsky (2):
s390 updates
more s390 updates

Mauro Carvalho Chehab (2):
media updates
edac updates

Michael Ellerman (1):
powerpc updates

Michael Tsirkin (1):
virtio/vhost cross endian support

Michael Turquette (1):
clock framework updates

Michal Marek (2):
kconfig updates
kbuild updates

Michal Simek (1):
Microblaze updates

Mike Snitzer (2):
device mapper updates
device mapper fixes

Miklos Szeredi (2):
fuse updates
overlayfs updates

Neil Brown (1):
md updates

Nicholas Bellinger (1):
SCSI target updates

Ohad Ben-Cohen (2):
hwspinlock updates
remoteproc updates

Paolo Bonzini (2):
first batch of KVM updates
kvm fixes

Paul Gortmaker (6):
__cpuinit removal
implicit module.h fixes
module_init replacement part one
module_init replacement part two
module_platform_driver replacement
init.h/module.h fragility fixes

Paul Moore (1):
audit updates

Rafael Wysocki (3):
power management and ACPI updates
power management and ACPI fixes
ACPICA updates

Ralf Baechle (1):
MIPS updates

Richard Weinberger (2):
UBI/UBIFS updates
UML updates

Russell King (2):
clkdev updates
ARM updates

Rusty Russell (1):
module updates

Sage Weil (1):
Ceph updates

Sebastian Reichel (2):
HSI updates
power supply and reset updates

Shuah Khan (1):
kselftest update

Steve French (1):
CIFS/SMB3 updates

Steven Rostedt (2):
tracing fixes
tracing updates

Sumit Semwal (1):
dma-buf updates

Takashi Iwai (2):
sound updates
sound fixes

Ted Ts'o (1):
ext4 updates

Tejun Heo (3):
libata updates
cgroup updates
workqueue updates

Thierry Reding (1):
pwm updates

Thomas Gleixner (8):
timer updates
NOHZ updates
irq updates
locking updates
scheduler updates
irq fixes
timer fixes
irq update

Tomi Valkeinen (2):
fbdev updates
fbdev fix

Tony Luck (4):
ia64 paravirt removal
pstore updates
ia64 updates
ia64 boot noise reduction fix

Trond Myklebust (1):
NFS client updates

Ulf Hansson (1):
MMC updates

Vineet Gupta (1):
ARC architecture updates

Vinod Koul (1):
dmaengine updates

Wim Van Sebroeck (1):
watchdog updates

Wolfram Sang (1):
i2c updates

Yoshinori Sato (1):
Renesas H8/300 architecture re-introduction

Zhang Rui (1):
thermal management updates


2015-07-05 23:02:15

by Guenter Roeck

Subject: Re: Linux 4.2-rc1

On Sun, Jul 05, 2015 at 01:22:48PM -0700, Linus Torvalds wrote:
> It's Sunday, two weeks have passed, and the merge window is closed. I
> just pushed out the tag to the git trees, and tar-balls and patches
> should be mirroring out too.
>

Testing doesn't look bad for -rc1.

Build results:
total: 130 pass: 125 fail: 5
Failed builds:
alpha:allmodconfig
m68k:allmodconfig
mips:allmodconfig
powerpc:allmodconfig
s390:allmodconfig

Qemu tests:
total: 31 pass: 31 fail: 0

There are more new allmodconfig (and probably allyesconfig) build errors,
but those all boil down to two problems, and patches are available for both.

staging: make board support depend on OF_IRQ and CLKDEV_LOOKUP [1]
by Paul Gortmaker
staging:lustre: remove irq.h from socklnd.h [2]
by James Simmons

Both patches should be in Greg's patch queue.

Guenter

---
[1] https://lkml.org/lkml/2015/6/20/215
[2] https://lkml.org/lkml/2015/6/25/621

2015-07-07 05:50:33

by Stephen Rothwell

Subject: linux-next: stats (Was: Linux 4.2-rc1)

Hi all,

[These will be easier in the future as I have now scripted this message]

As usual, the executive friendly graph is at
http://neuling.org/linux-next-size.html :-)

(No merge commits counted, next-20150623 was the first linux-next after
the merge window opened.)

Commits in v4.2-rc1 (relative to v4.1): 12092
Commits in next-20150623: 11851
Commits with the same SHA1: 10679
Commits with the same patch_id: 615 (1)
Commits with the same subject line: 54 (1)

(1) not counting those in the lines above.

So commits in -rc1 that were in next-20150623: 11294 93%
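
The 11294 figure equals the sum of the SHA1 and patch_id matches above (10679 + 615); the subject-line matches do not appear to be counted in it. A minimal C sketch of that arithmetic follows. The struct and function names are hypothetical and are not taken from the script mentioned in this message; only the numbers come from the message itself.

```c
#include <assert.h>

/* Counts quoted in this message; the field names are illustrative only. */
struct next_stats {
	int rc1_commits;    /* commits in v4.2-rc1 relative to v4.1 */
	int next_commits;   /* commits in next-20150623 */
	int same_sha1;      /* identical SHA1 in both trees */
	int same_patch_id;  /* same patch_id, different SHA1 */
	int same_subject;   /* same subject line only */
};

/* Commits matched by SHA1 or patch_id. */
static int matched_strict(const struct next_stats *s)
{
	return s->same_sha1 + s->same_patch_id;
}

/* Commits matched by any of the three criteria. */
static int matched_any(const struct next_stats *s)
{
	return matched_strict(s) + s->same_subject;
}

/* Whole-number percentage of rc1 commits (truncated, not rounded). */
static int percent_of_rc1(const struct next_stats *s, int matched)
{
	return matched * 100 / s->rc1_commits;
}

/* Commits in linux-next with no counterpart in rc1 under any criterion. */
static int next_only(const struct next_stats *s)
{
	return s->next_commits - matched_any(s);
}
```

Filling the struct with { 12092, 11851, 10679, 615, 54 } gives 11294 strict matches and a truncated 93%. Under the looser matching, next_only() comes out at 503, which agrees with the count of linux-next commits that did not reach v4.2-rc1 given further down in this message.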

Some breakdown of the list of extra commits (relative to next-20150623)
in -rc1:

Top ten first word of commit summary:

54 perf
52 drm
46 net
35 kvm
25 btrfs
25 acpica
24 arcv2
22 ntb
22 ceph
13 nfs

Top ten authors:

32 [email protected]
29 [email protected]
26 [email protected]
24 [email protected]
23 [email protected]
15 [email protected]
15 [email protected]
14 [email protected]
14 [email protected]
13 [email protected]

Top ten committers:

122 [email protected]
53 [email protected]
40 [email protected]
36 [email protected]
34 [email protected]
32 [email protected]
30 [email protected]
30 [email protected]
29 [email protected]
26 [email protected]

There are also 503 commits in next-20150623 that didn't make it into
v4.2-rc1.

Top ten first word of commit summary:

51 kdbus
31 userfaultfd
31 arm
23 rcu
21 mm
19 ocfs2
17 documentation
17 coresight
16 cris
14 page-flags

Top ten authors:

37 [email protected]
30 [email protected]
29 [email protected]
21 [email protected]
20 [email protected]
20 [email protected]
19 [email protected]
16 [email protected]
14 [email protected]
13 [email protected]

Some of Andrew's patches are fixes for other patches in his tree (and
have been merged into those).

Top ten committers:

153 [email protected]
59 [email protected]
32 [email protected]
32 [email protected]
30 [email protected]
26 [email protected]
19 [email protected]
18 [email protected]
17 [email protected]
16 [email protected]

Those commits by [email protected] and me are from the quilt series
(mainly Andrew's mmotm tree).
--
Cheers,
Stephen Rothwell [email protected]

2015-07-07 12:56:50

by Mark Langsdorf

Subject: Build failure on ARM64 for Linux 4.2-rc1 was: Linux 4.2-rc1

On 07/05/2015 03:22 PM, Linus Torvalds wrote:
> It's Sunday, two weeks have passed, and the merge window is closed. I
> just pushed out the tag to the git trees, and tar-balls and patches
> should be mirroring out too.

I'm seeing a build regression on arm64 for tools/perf.

On linux-4.1, it builds fine.

On linux-4.2-rc1, it dies with this relevant message (skipping the
missing defines, etc):

/home/mlangsdorf/tmp/linux-4.2/include/linux/preempt.h: At top level:
/home/mlangsdorf/tmp/linux-4.2/include/linux/preempt.h:64:25: fatal error: asm/preempt.h: No such file or directory
#include <asm/preempt.h>

On both versions, arch/arm64/include/generated/asm/preempt.h
exists. I'm guessing something changed in the build system so
that it's not being picked up for 4.2-rc1, but I'm not sure where
to look.

--Mark Langsdorf

2015-07-08 16:32:30

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On Sun, Jul 5, 2015 at 2:22 PM, Linus Torvalds
<[email protected]> wrote:
> It's Sunday, two weeks have passed, and the merge window is closed. I
> just pushed out the tag to the git trees, and tar-balls and patches
> should be mirroring out too.
>
> I thought this release would be one of the biggest ones ever, but it
> turns out that it will depend on how you count. Just counting pure
> commits, it is indeed one of the bigger rc1's in recent history, but
> 3.10-rc1 was almost as big, and then the final 3.10 grew from that
> more than most. I doubt we'll match the 3.10 release, since we have
> been getting progressively better at *not* merging tons of stuff after
> -rc1.
>
>

I am seeing the following NULL pointer dereference on my test system:

[ 3.640599] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[ 3.640609] IP: [<ffffffff814f1463>] firmware_uevent+0x23/0x80
[ 3.640616] PGD 0
[ 3.640620] Oops: 0000 [#1] SMP
[ 3.640625] Modules linked in: iwlwifi snd_hda_intel snd_hda_codec eeepc_wmi asus_wmi sparse_keymap drm_kms_helper snd_hda_core drm crct10dif_pclmul snd_hwdep btusb crc32_pclmul btrtl ghash_clmulni_intel cfg80211 btbcm snd_pcm btintel bluetooth snd_seq_midi snd_seq_midi_event aesni_intel snd_rawmidi aes_x86_64 lrw snd_seq dm_multipath gf128mul glue_helper scsi_dh ablk_helper cryptd snd_seq_device snd_timer mei_me snd microcode mei serio_raw lpc_ich shpchp soundcore i2c_algo_bit video wmi mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq psmouse r8169 ahci dm_mirror libahci mii dm_region_hash dm_log
[ 3.640677] CPU: 6 PID: 362 Comm: systemd-udevd Not tainted 4.2.0-rc1+ #2
[ 3.640681] Hardware name: System76, Inc. Wild Dog Performance/H87-PLUS, BIOS 0705 12/05/2013
[ 3.640687] task: ffff8800d82d0000 ti: ffff88040cba0000 task.ti: ffff88040cba0000
[ 3.640692] RIP: 0010:[<ffffffff814f1463>] [<ffffffff814f1463>] firmware_uevent+0x23/0x80
[ 3.640699] RSP: 0018:ffff88040cba3c78 EFLAGS: 00010282
[ 3.640703] RAX: 0000000000000000 RBX: ffff8800d80a7000 RCX: 0000000000000000
[ 3.640707] RDX: 0000000000000000 RSI: ffffffff81ae290f RDI: ffff8800d80a7000
[ 3.640710] RBP: ffff88040cba3c88 R08: 0000000000019d80 R09: ffff8800d80a7000
[ 3.640714] R10: 0000000000000022 R11: 0000000000000246 R12: ffff8803f3681408
[ 3.640717] R13: ffff8800d80a7000 R14: ffff8803f3681408 R15: 0000000000000001
[ 3.640721] FS: 00007f3c3e7ba880(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
[ 3.640726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.640730] CR2: 0000000000000080 CR3: 00000003f29b3000 CR4: 00000000001407e0
[ 3.640734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.640738] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3.640741] Stack:
[ 3.640744] ffff8803f3681418 0000000000000000 ffff88040cba3cd8 ffffffff814dd876
[ 3.640750] ffff88040cba3dd8 0000000000000920 ffff8803f2c20e80 ffff88040f1ef960
[ 3.640757] 0000000000000000 ffff8800d80a4000 ffff8800d80a7000 ffff8803f3681418
[ 3.640763] Call Trace:
[ 3.640769] [<ffffffff814dd876>] dev_uevent+0xb6/0x290
[ 3.640773] [<ffffffff814dc6d4>] uevent_show+0x94/0x100
[ 3.640778] [<ffffffff814dbbe0>] dev_attr_show+0x20/0x50
[ 3.640783] [<ffffffff8176ffe6>] ? mutex_lock+0x16/0x40
[ 3.640788] [<ffffffff8124fe4a>] sysfs_kf_seq_show+0xaa/0x120
[ 3.640793] [<ffffffff8124e6e0>] kernfs_seq_show+0x20/0x30
[ 3.640798] [<ffffffff811fad4d>] seq_read+0xcd/0x370
[ 3.640803] [<ffffffff8124ee77>] kernfs_fop_read+0x107/0x160
[ 3.640808] [<ffffffff811d78b8>] __vfs_read+0x28/0xd0
[ 3.640813] [<ffffffff81316273>] ? security_file_permission+0xa3/0xc0
[ 3.640817] [<ffffffff811d7de3>] ? rw_verify_area+0x53/0xf0
[ 3.640821] [<ffffffff811d7f0a>] vfs_read+0x8a/0x130
[ 3.640825] [<ffffffff811d8d06>] SyS_read+0x46/0xa0
[ 3.640829] [<ffffffff81772172>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 3.640833] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 8b 87 c0 02 00 00 48 89 f3 49 89 fc 48 c7 c6 0f 29 ae 81 48 89 df <48> 8b 90 80 00 00 00 31 c0 e8 5f cf ea ff 85 c0 75 39 8b 15 ed
[ 3.640865] RIP [<ffffffff814f1463>] firmware_uevent+0x23/0x80
[ 3.640871] RSP <ffff88040cba3c78>
[ 3.640874] CR2: 0000000000000080
[ 3.640878] ---[ end trace 0df1af0431b2ec1a ]---
[ 3.703087] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC887-VD: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
[ 3.703092] snd_hda_codec_realtek hdaudioC1D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 3.703095] snd_hda_codec_realtek hdaudioC1D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[ 3.703098] snd_hda_codec_realtek hdaudioC1D0: mono: mono_out=0x0
[ 3.703100] snd_hda_codec_realtek hdaudioC1D0: dig-out=0x11/0x0
[ 3.703102] snd_hda_codec_realtek hdaudioC1D0: inputs:
[ 3.703104] snd_hda_codec_realtek hdaudioC1D0: Front Mic=0x19
[ 3.703106] snd_hda_codec_realtek hdaudioC1D0: Rear Mic=0x18
[ 3.703108] snd_hda_codec_realtek hdaudioC1D0: Line=0x1a

I can send config and full dmesg if you would like to see it.

thanks,
-- Shuah

2015-07-08 17:12:44

by Casey Schaufler

Subject: Re: Linux 4.2-rc1

On 7/8/2015 9:32 AM, Shuah Khan wrote:
> On Sun, Jul 5, 2015 at 2:22 PM, Linus Torvalds
> <[email protected]> wrote:
>> It's Sunday, two weeks have passed, and the merge window is closed. I
>> just pushed out the tag to the git trees, and tar-balls and patches
>> should be mirroring out too.
>>
>> I thought this release would be one of the biggest ones ever, but it
>> turns out that it will depend on how you count. Just counting pure
>> commits, it is indeed one of the bigger rc1's in recent history, but
>> 3.10-rc1 was almost as big, and then the final 3.10 grew from that
>> more than most. I doubt we'll match the 3.10 release, since we have
>> been getting progressively better at *not* merging tons of stuff after
>> -rc1.
>>
>>
> I am seeing the following NULL pointer dereference on my test system:

Can I get your config file, please? I am particularly interested
in seeing your security settings.

Thank you.


2015-07-08 17:17:12

by Linus Torvalds

Subject: Re: Linux 4.2-rc1

On Wed, Jul 8, 2015 at 9:32 AM, Shuah Khan <[email protected]> wrote:
>
> I am seeing the following NULL pointer dereference on my test system:
>
> [ 3.640599] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> [ 3.640609] IP: [<ffffffff814f1463>] firmware_uevent+0x23/0x80

Decoding the "Code:" line shows that this is the "->fw_id" dereference in

if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
return -ENOMEM;

and that "fw_priv->buf" pointer is NULL.
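
As a rough illustration of why a NULL "fw_priv->buf" faults at address 0x80 rather than at 0: the faulting instruction in the "Code:" line (48 8b 90 80 00 00 00) loads from offset 0x80 off RAX, and the register dump shows RAX as zero. The user-space sketch below uses a hypothetical struct whose padding is chosen only so that fw_id lands at that offset; it is not the kernel's real structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the firmware buffer structure. The padding
 * array is an assumption made so that fw_id sits at offset 0x80, which
 * is the offset seen in the faulting instruction and in CR2.
 */
struct fw_buf_sketch {
	unsigned char other_fields[0x80];
	const char *fw_id;
};

/* Address the CPU touches when it loads buf->fw_id for a given buf. */
static uintptr_t fw_id_fault_address(uintptr_t buf_addr)
{
	return buf_addr + offsetof(struct fw_buf_sketch, fw_id);
}
```

With a NULL buffer, fw_id_fault_address(0) is 0x80, matching the CR2 value reported in the oops.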

However, I don't see anything that looks like it should have changed
any of this since 4.1.

Adding the appropriate firmware people to the cc.

Linus

2015-07-08 17:29:47

by Linus Torvalds

Subject: Re: Linux 4.2-rc1

On Wed, Jul 8, 2015 at 10:17 AM, Linus Torvalds
<[email protected]> wrote:
>
> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>
> if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
> return -ENOMEM;
>
> and that "fw_priv->buf" pointer is NULL.
>
> However, I don't see anything that looks like it should have changed
> any of this since 4.1.

Looking at the other uses of "fw_priv->buf", they all check that
pointer for NULL. I see code like

fw_buf = fw_priv->buf;
if (!fw_buf)
goto out;

etc.

Also, it looks like you need to hold the "fw_lock" to even look at
that pointer, since the buffer can get reallocated etc.

So that uevent code really looks buggy. It just doesn't look like a
*new* bug to me. That code looks old, going back to 2012 and commit
1244691c73b2.
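
The pattern described above (take the lock, then check the pointer before using it) could look roughly like the user-space sketch below. Everything in it is a stand-in: the struct names, the counter that replaces the real "fw_lock" mutex, the snprintf() call that replaces add_uevent_var(), and the -ENODEV return value are all assumptions for illustration, not the actual kernel code or the eventual fix.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the firmware loader structures. */
struct fw_buf_sketch {
	const char *fw_id;
};

struct fw_priv_sketch {
	struct fw_buf_sketch *buf;   /* may be NULL once the buffer is gone */
};

/*
 * Stand-in for taking and dropping the fw_lock mutex. A counter is used
 * so the sketch can check that every path releases the lock.
 */
static int fw_lock_depth;
static void fw_lock_acquire(void) { fw_lock_depth++; }
static void fw_lock_release(void) { fw_lock_depth--; }

/*
 * Formats "FIRMWARE=<id>" into out. Returns 0 on success, -ENODEV
 * (an assumed error code) if the buffer is NULL, or -ENOMEM if out is
 * too small.
 */
static int firmware_uevent_sketch(const struct fw_priv_sketch *fw_priv,
				  char *out, size_t out_len)
{
	int ret = 0;
	int n;

	fw_lock_acquire();
	if (!fw_priv->buf) {
		/* Check the pointer under the lock before dereferencing. */
		ret = -ENODEV;
		goto out_unlock;
	}

	n = snprintf(out, out_len, "FIRMWARE=%s", fw_priv->buf->fw_id);
	if (n < 0 || (size_t)n >= out_len)
		ret = -ENOMEM;

out_unlock:
	fw_lock_release();
	return ret;
}
```

Calling firmware_uevent_sketch() on a structure whose buf is NULL returns an error instead of dereferencing the pointer, and the lock counter is back at zero on every return path.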

Ming Lei?

Linus

2015-07-08 17:33:28

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/08/2015 11:05 AM, Casey Schaufler wrote:
> On 7/8/2015 9:32 AM, Shuah Khan wrote:
>> On Sun, Jul 5, 2015 at 2:22 PM, Linus Torvalds
>> <[email protected]> wrote:
>>> It's Sunday, two weeks have passed, and the merge window is closed. I
>>> just pushed out the tag to the git trees, and tar-balls and patches
>>> should be mirroring out too.
>>>
>>> I thought this release would be one of the biggest ones ever, but it
>>> turns out that it will depend on how you count. Just counting pure
>>> commits, it is indeed one of the bigger rc1's in recent history, but
>>> 3.10-rc1 was almost as big, and then the final 3.10 grew from that
>>> more than most. I doubt we'll match the 3.10 release, since we have
>>> been getting progressively better at *not* merging tons of stuff after
>>> -rc1.
>>>
>>>
>> I am seeing the following NULL pointer dereference on my test system:
>
> Can I get your config file, please? I am particularly interested
> in seeing your security settings.
>

Please see the attached config file.

thanks,
-- Shuah


--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
[email protected] | (970) 217-8978


Attachments:
config-4.2.0-rc1+ (175.60 kB)

2015-07-08 17:47:47

by Casey Schaufler

Subject: Re: Linux 4.2-rc1

On 7/8/2015 10:29 AM, Linus Torvalds wrote:
> On Wed, Jul 8, 2015 at 10:17 AM, Linus Torvalds
> <[email protected]> wrote:
>> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>>
>> if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
>> return -ENOMEM;
>>
>> and that "fw_priv->buf" pointer is NULL.
>>
>> However, I don't see anything that looks like it should have changed
>> any of this since 4.1.
> Looking at the other uses of "fw_priv->buf", they all check that
> pointer for NULL. I see code like
>
> fw_buf = fw_priv->buf;
> if (!fw_buf)
> goto out;
>
> etc.
>
> Also, it looks like you need to hold the "fw_lock" to even look at
> that pointer, since the buffer can get reallocated etc.
>
> So that uevent code really looks buggy. It just doesn't look like a
> *new* bug to me. That code looks old, going back to 2012 and commit
> 1244691c73b2.

There have been SELinux changes to kernfs for 4.2. William,
you might want to have a look here.

>
> Ming Lei?
>
> Linus

2015-07-08 17:55:12

by Casey Schaufler

Subject: Re: Linux 4.2-rc1

On 7/8/2015 10:33 AM, Shuah Khan wrote:
> On 07/08/2015 11:05 AM, Casey Schaufler wrote:
>> On 7/8/2015 9:32 AM, Shuah Khan wrote:
>>> On Sun, Jul 5, 2015 at 2:22 PM, Linus Torvalds
>>> <[email protected]> wrote:
>>>> It's Sunday, two weeks have passed, and the merge window is closed. I
>>>> just pushed out the tag to the git trees, and tar-balls and patches
>>>> should be mirroring out too.
>>>>
>>>> I thought this release would be one of the biggest ones ever, but it
>>>> turns out that it will depend on how you count. Just counting pure
>>>> commits, it is indeed one of the bigger rc1's in recent history, but
>>>> 3.10-rc1 was almost as big, and then the final 3.10 grew from that
>>>> more than most. I doubt we'll match the 3.10 release, since we have
>>>> been getting progressively better at *not* merging tons of stuff after
>>>> -rc1.
>>>>
>>>>
>>> I am seeing the following NULL pointer dereference on my test system:
>> Can I get your config file, please? I am particularly interested
>> in seeing your security settings.
>>
> Please see the attached config file.

SELinux is not configured, AppArmor is. It's possible that
the recent kernfs changes for SELinux affect this case, although
I couldn't say how from what I see.

>
> thanks,
> -- Shuah
>
>

2015-07-08 18:04:13

by Stephen Smalley

Subject: Re: Linux 4.2-rc1

On 07/08/2015 01:47 PM, Casey Schaufler wrote:
> On 7/8/2015 10:29 AM, Linus Torvalds wrote:
>> On Wed, Jul 8, 2015 at 10:17 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>>>
>>> if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
>>> return -ENOMEM;
>>>
>>> and that "fw_priv->buf" pointer is NULL.
>>>
>>> However, I don't see anything that looks like it should have changed
>>> any of this since 4.1.
>> Looking at the other uses of "fw_priv->buf", they all check that
>> pointer for NULL. I see code like
>>
>> fw_buf = fw_priv->buf;
>> if (!fw_buf)
>> goto out;
>>
>> etc.
>>
>> Also, it looks like you need to hold the "fw_lock" to even look at
>> that pointer, since the buffer can get reallocated etc.
>>
>> So that uevent code really looks buggy. It just doesn't look like a
>> *new* bug to me. That code looks old, going back to 2012 and commit
>> 1244691c73b2.
>
> There have been SELinux changes to kernfs for 4.2. William,
> you might want to have a look here.

What change are you referring to? I see no SELinux-related changes to
kernfs in 4.2-rc1.


2015-07-08 19:00:43

by Ingo Molnar

Subject: Re: Linux 4.2-rc1


* Linus Torvalds <[email protected]> wrote:

> On Wed, Jul 8, 2015 at 9:32 AM, Shuah Khan <[email protected]> wrote:
> >
> > I am seeing the following NULL pointer dereference on my test system:
> >
> > [ 3.640599] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> > [ 3.640609] IP: [<ffffffff814f1463>] firmware_uevent+0x23/0x80
>
> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>
> if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
> return -ENOMEM;
>
> and that "fw_priv->buf" pointer is NULL.
>
> However, I don't see anything that looks like it should have changed
> any of this since 4.1.
>
> Adding the appropriate firmware people to the cc.

Btw., FWIW, a couple of days ago I started seeing a similar crash pattern when I
updated my randconfig testing system to v4.2-rc1:

cfg80211: Kicking the queue
cfg80211: Exceeded CRDA call max attempts. Not calling CRDA
BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffff81b5e978>] firmware_uevent+0x1a/0xae
PGD 0
Oops: 0000 [#1] SMP
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc1-01514-g4a704ed-dirty #411
Hardware name: System manufacturer System Product Name/A8N-E, BIOS ASUS A8N-E ACPI BIOS Revision 1008 08/22/2005
task: ffff88003d4f0000 ti: ffff88003d4f8000 task.ti: ffff88003d4f8000
RIP: 0010:[<ffffffff81b5e978>] [<ffffffff81b5e978>] firmware_uevent+0x1a/0xae
RSP: 0018:ffff88003d4fba38 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88003ac05668 RCX: 0000000000000003
RDX: 0000000000000001 RSI: ffffffff83822d1a RDI: ffff88003ac05668
RBP: ffff88003ae68008 R08: 000000003ac057f4 R09: 000000010013ffff
R10: ffffffffffffffff R11: ffffffff84f831e0 R12: ffff88003ae68018
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff838947a7
FS: 0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000080 CR3: 0000000003a50000 CR4: 00000000000006a0
Stack:
0000000000000003 ffff88003ac05668 ffff88003ae68008 ffffffff81b4a0fa
ffff88003d50e1a8 00000000fffffffe ffffffff838947a7 0000000000000002
000000003ac057e1 ffff88003ac05668 ffff88003ae68018 ffffffff831606f0
Call Trace:
[<ffffffff81b4a0fa>] ? dev_uevent+0x284/0x312
[<ffffffff81721d70>] ? kobject_uevent_env+0x304/0x54b
[<ffffffff810efa3d>] ? do_raw_spin_lock+0x30/0x5e
[<ffffffff81b49cc2>] ? device_del+0x287/0x2c5
[<ffffffff81b5fdda>] ? _request_firmware+0x71b/0xca2
[<ffffffff8197a961>] ? r100_cp_init+0x254/0x692
[<ffffffff8197ef71>] ? r300_startup.constprop.0+0x2da/0x36b
[<ffffffff8197f534>] ? r300_init+0x2e9/0x3a9
[<ffffffff8193a451>] ? radeon_device_init+0xbf1/0xe95
[<ffffffff8193cf1c>] ? radeon_driver_load_kms+0x10f/0x24c
[<ffffffff818e83fc>] ? drm_dev_register+0xec/0x19b
[<ffffffff818eae62>] ? drm_get_pci_dev+0x1d0/0x2d2
[<ffffffff81764fc7>] ? local_pci_probe+0x34/0xa2
[<ffffffff81765b4e>] ? pci_device_probe+0x131/0x187
[<ffffffff81b4e21a>] ? driver_probe_device+0x160/0x3a8
[<ffffffff81b4e500>] ? __driver_attach+0x9e/0xd4
[<ffffffff81b4e462>] ? driver_probe_device+0x3a8/0x3a8
[<ffffffff81b4c326>] ? bus_for_each_dev+0x89/0x9b
[<ffffffff81b4cdc9>] ? bus_add_driver+0x151/0x2ee
[<ffffffff81b4f24d>] ? driver_register+0xe8/0x147
[<ffffffff84e32e71>] ? r128_init+0x1f/0x1f
[<ffffffff84dc54f5>] ? do_one_initcall+0x11e/0x25b
[<ffffffff810cc1af>] ? parse_args+0x327/0x414
[<ffffffff84dc574c>] ? kernel_init_freeable+0x11a/0x1dc
[<ffffffff84dc4994>] ? initcall_blacklist+0xc1/0xc1
[<ffffffff82e90a01>] ? rest_init+0x75/0x75
[<ffffffff82e90a07>] ? kernel_init+0x6/0x14c
[<ffffffff82ecd1df>] ? ret_from_fork+0x3f/0x70
[<ffffffff82e90a01>] ? rest_init+0x75/0x75
Code: c7 c6 3d 7f 80 83 31 c0 e8 f3 c8 bc ff 5a 48 98 c3 55 48 89 fd 53 48 89 f3 48 c7 c6 1a 2d 82 83 51 48 8b 87 90 02 00 00 48 89 df <48> 8b 90 80 00 00 00 31 c0 e8 c9 2f bc ff 85 c0 0f 95 c0 0f b6
RIP [<ffffffff81b5e978>] firmware_uevent+0x1a/0xae
RSP <ffff88003d4fba38>
CR2: 0000000000000080
---[ end trace 3ab09bb9b953b39a ]---

Haven't had the time to dig into it yet.
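
As a reading aid for the oops above: the bytes marked `<48> 8b 90 80 00 00 00` in the Code: line decode to `mov 0x80(%rax),%rdx`, and RAX is 0, which matches CR2 = 0x80. In other words, a member at offset 0x80 was read through a NULL base pointer. The sketch below only reproduces that arithmetic. `struct demo_buf`, its padding, and `fault_address()` are assumptions made up for illustration, not the real kernel layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding exists only to place fw_id at 0x80. */
struct demo_buf {
	char pad[0x80];
	const char *fw_id;
};

/* Address the CPU touches when it reads base->fw_id. */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_buf, fw_id);
}
```

Calling `fault_address(0)` gives the member offset itself, which is why a small non-zero CR2 usually points at a field read through a NULL structure pointer rather than a plain NULL dereference.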

Thanks,

Ingo

2015-07-09 00:58:20

by Ming Lei

Subject: Re: Linux 4.2-rc1

On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Jul 8, 2015 at 10:17 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Decoding the "Code:" line shows that this is the "->fw_id" dereference in
>>
>> if (add_uevent_var(env, "FIRMWARE=%s", fw_priv->buf->fw_id))
>> return -ENOMEM;
>>
>> and that "fw_priv->buf" pointer is NULL.
>>
>> However, I don't see anything that looks like it should have changed
>> any of this since 4.1.
>
> Looking at the other uses of "fw_priv->buf", they all check that
> pointer for NULL. I see code like
>
> fw_buf = fw_priv->buf;
> if (!fw_buf)
> goto out;
>
> etc.
>
> Also, it looks like you need to hold the "fw_lock" to even look at
> that pointer, since the buffer can get reallocated etc.


Yes, the above code, with 'fw_lock' held, is the right fix for the issue, since
a sysfs read can happen at any time, and there is a race between firmware
request abort and reading the uevent from sysfs.


> So that uevent code really looks buggy. It just doesn't look like a
> *new* bug to me. That code looks old, going back to 2012 and commit
> 1244691c73b2.

Exactly.

Thanks,
Ming

2015-07-09 03:18:02

by Linus Torvalds

Subject: Re: Linux 4.2-rc1

On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
> <[email protected]> wrote:
>> Also, it looks like you need to hold the "fw_lock" to even look at
>> that pointer, since the buffer can get reallocated etc.
>
> Yes, the above code with holding 'fw_lock' is right fix for the issue since
> sysfs read can happen anytime, and there is one race between firmware
> request abort and reading uevent of sysfs.

So if fw_priv->buf is NULL, what should we do?

Should we skip the TIMEOUT= and ASYNC= fields too?

Something like the attached, perhaps?

Shuah, how reproducible is this? Does this (completely untested) patch
make any difference?

Linus


Attachments:
patch.diff (1.22 kB)

2015-07-09 05:00:11

by Ming Lei

Subject: Re: Linux 4.2-rc1

On Thu, Jul 9, 2015 at 11:17 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> Also, it looks like you need to hold the "fw_lock" to even look at
>>> that pointer, since the buffer can get reallocated etc.
>>
>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
>> sysfs read can happen anytime, and there is one race between firmware
>> request abort and reading uevent of sysfs.
>
> So if fw_priv->buf is NULL, what should we do?
>
> Should we skip the TIMEOUT= and ASYNC= fields too?

When the request is aborted, the firmware device will be removed,
so it is OK to skip the two fields.
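
The approach agreed on here (take the lock, check the buffer pointer, and emit none of the fields when it is NULL) can be sketched in userspace as follows. This is not the attached patch, which is not reproduced in this thread. `demo_priv`, `demo_uevent()`, `demo_abort()`, and the pthread mutex standing in for fw_lock are all illustrative assumptions.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct demo_buf {
	const char *fw_id;
};

struct demo_priv {
	struct demo_buf *buf;	/* becomes NULL once the request is aborted */
	int timeout;
	int nowait;
};

/* Stand-in for fw_lock: protects every access to demo_priv.buf. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Abort path: drops the buffer pointer while holding the lock. */
static void demo_abort(struct demo_priv *p)
{
	pthread_mutex_lock(&demo_lock);
	p->buf = NULL;
	pthread_mutex_unlock(&demo_lock);
}

/*
 * Reader path: returns the number of bytes written, 0 if the request was
 * aborted (all fields skipped), or -1 if the output buffer is too small.
 */
static int demo_uevent(struct demo_priv *p, char *out, size_t len)
{
	int n = 0;

	pthread_mutex_lock(&demo_lock);
	if (p->buf) {
		n = snprintf(out, len, "FIRMWARE=%s TIMEOUT=%d ASYNC=%d",
			     p->buf->fw_id, p->timeout, p->nowait);
		if (n < 0 || (size_t)n >= len)
			n = -1;
	} else if (len) {
		out[0] = '\0';	/* aborted: emit nothing at all */
	}
	pthread_mutex_unlock(&demo_lock);
	return n;
}
```

The point of the sketch is that the NULL check and the dereference sit inside the same critical section as the abort path's store, so the reader can never see the pointer change between the check and the use.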

>
> Something like the attached, perhaps?

It looks fine.

>
> Shuah, how reproducible is this? Does this (completely untested) patch
> make any difference?
>
> Linus

2015-07-09 13:10:28

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/08/2015 09:17 PM, Linus Torvalds wrote:
> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> Also, it looks like you need to hold the "fw_lock" to even look at
>>> that pointer, since the buffer can get reallocated etc.
>>
>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
>> sysfs read can happen anytime, and there is one race between firmware
>> request abort and reading uevent of sysfs.
>
> So if fw_priv->buf is NULL, what should we do?
>
> Should we skip the TIMEOUT= and ASYNC= fields too?
>
> Something like the attached, perhaps?
>
> Shuah, how reproducible is this? Does this (completely untested) patch
> make any difference?
>

Happened both times I booted 4.2-rc1 up, so I would say 100% so far.
I will test with your patch and report results.

-- Shuah


--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
[email protected] | (970) 217-8978

2015-07-09 13:44:11

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/09/2015 07:10 AM, Shuah Khan wrote:
> On 07/08/2015 09:17 PM, Linus Torvalds wrote:
>> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
>>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
>>> <[email protected]> wrote:
>>>> Also, it looks like you need to hold the "fw_lock" to even look at
>>>> that pointer, since the buffer can get reallocated etc.
>>>
>>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
>>> sysfs read can happen anytime, and there is one race between firmware
>>> request abort and reading uevent of sysfs.
>>
>> So if fw_priv->buf is NULL, what should we do?
>>
>> Should we skip the TIMEOUT= and ASYNC= fields too?
>>
>> Something like the attached, perhaps?
>>
>> Shuah, how reproducible is this? Does this (completely untested) patch
>> make any difference?
>>
>
> Happened both times I booted 4.2-rc1 up, so I would say 100% so far.
> I will test with your patch and report results.
>

Yes. This patch fixed the problem. I have been seeing another problem
that both poweroff and reboot hang. I will get more data on this later
on today.

thanks,
-- Shuah

2015-07-09 13:47:46

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/09/2015 07:44 AM, Shuah Khan wrote:
> On 07/09/2015 07:10 AM, Shuah Khan wrote:
>> On 07/08/2015 09:17 PM, Linus Torvalds wrote:
>>> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
>>>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
>>>> <[email protected]> wrote:
>>>>> Also, it looks like you need to hold the "fw_lock" to even look at
>>>>> that pointer, since the buffer can get reallocated etc.
>>>>
>>>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
>>>> sysfs read can happen anytime, and there is one race between firmware
>>>> request abort and reading uevent of sysfs.
>>>
>>> So if fw_priv->buf is NULL, what should we do?
>>>
>>> Should we skip the TIMEOUT= and ASYNC= fields too?
>>>
>>> Something like the attached, perhaps?
>>>
>>> Shuah, how reproducible is this? Does this (completely untested) patch
>>> make any difference?
>>>
>>
>> Happened both times I booted 4.2-rc1 up, so I would say 100% so far.
>> I will test with your patch and report results.
>>
>
> Yes. This patch fixed the problem. I have been seeing another problem
> that both poweroff and reboot hang. I will get more data on this later
> on today.
>

This patch also fixed the reboot and poweroff hang problem.

thanks,
-- Shuah



2015-07-09 14:51:24

by Greg Kroah-Hartman

Subject: Re: Linux 4.2-rc1

On Thu, Jul 09, 2015 at 07:44:04AM -0600, Shuah Khan wrote:
> On 07/09/2015 07:10 AM, Shuah Khan wrote:
> > On 07/08/2015 09:17 PM, Linus Torvalds wrote:
> >> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
> >>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
> >>> <[email protected]> wrote:
> >>>> Also, it looks like you need to hold the "fw_lock" to even look at
> >>>> that pointer, since the buffer can get reallocated etc.
> >>>
> >>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
> >>> sysfs read can happen anytime, and there is one race between firmware
> >>> request abort and reading uevent of sysfs.
> >>
> >> So if fw_priv->buf is NULL, what should we do?
> >>
> >> Should we skip the TIMEOUT= and ASYNC= fields too?
> >>
> >> Something like the attached, perhaps?
> >>
> >> Shuah, how reproducible is this? Does this (completely untested) patch
> >> make any difference?
> >>
> >
> > Happened both times I booted 4.2-rc1 up, so I would say 100% so far.
> > I will test with your patch and report results.
> >
>
> Yes. This patch fixed the problem.

That's great, but what changed recently to cause this problem to happen?
Any chance you can bisect to the problem commit?

thanks,

greg k-h

2015-07-09 16:46:14

by Mark Langsdorf

Subject: Re: Build failure on ARM64 for Linux 4.2-rc1 was: Linux 4.2-rc1

On 07/07/2015 07:56 AM, Mark Langsdorf wrote:
> On 07/05/2015 03:22 PM, Linus Torvalds wrote:
>> It's Sunday, two weeks have passed, and the merge window is closed. I
>> just pushed out the tag to the git trees, and tar-balls and patches
>> should be mirroring out too.
>
> I'm seeing a build regression on arm64 for tools/perf.
>
> On linux-4.1, it builds fine.
>
> On linux-4.2-rc1, it dies with this relevant message (skipping the
> missing defines, etc):
>
> /home/mlangsdorf/tmp/linux-4.2/include/linux/preempt.h: At top level:
> /home/mlangsdorf/tmp/linux-4.2/include/linux/preempt.h:64:25: fatal
> error: asm/preempt.h: No such file or directory
> #include <asm/preempt.h>
>
> On both versions, arch/arm64/include/generated/asm/preempt.h
> exists. I'm guessing something changed in the build system so
> that it's not being picked up for 4.2-rc1 but I'm not sure where
> to look.

Hi Peter,

I did a git bisect and it looks like the faulty patch is
d72da4a4d973d8a0a0d3c97e7cdebf287fbe3a99, "rbtree: Make lockless
searches non-fatal". I can't see why it causes my builds to fail,
but if I revert that patch and the related series, then I can
build the kernel and build tools/perf successfully.

Any insight into a less intensive way of fixing my build would
be appreciated.

--Mark Langsdorf

2015-07-09 17:00:13

by Peter Zijlstra

Subject: Re: Build failure on ARM64 for Linux 4.2-rc1 was: Linux 4.2-rc1

On Thu, Jul 09, 2015 at 11:46:04AM -0500, Mark Langsdorf wrote:
> I did a git bisect and it looks like the faulty patch is
> d72da4a4d973d8a0a0d3c97e7cdebf287fbe3a99, "rbtree: Make lockless
> searches non-fatal". I can't see why it causes my builds to fail,
> but if I revert that patch and the related series, then I can
> build the kernel and build tools/perf successfully.
>
> Any insight into a less intensive way of fixing my build would
> be appreciated.

This is tools/perf failing to build, right? Add tip/perf/urgent; it
appears to contain the required bits to make it go again.

The /urgent branches will get to Linus 'soon' I imagine.

2015-07-09 17:40:51

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/09/2015 08:45 AM, Greg Kroah-Hartman wrote:
> On Thu, Jul 09, 2015 at 07:44:04AM -0600, Shuah Khan wrote:
>> On 07/09/2015 07:10 AM, Shuah Khan wrote:
>>> On 07/08/2015 09:17 PM, Linus Torvalds wrote:
>>>> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
>>>>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
>>>>> <[email protected]> wrote:
>>>>>> Also, it looks like you need to hold the "fw_lock" to even look at
>>>>>> that pointer, since the buffer can get reallocated etc.
>>>>>
>>>>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
>>>>> sysfs read can happen anytime, and there is one race between firmware
>>>>> request abort and reading uevent of sysfs.
>>>>
>>>> So if fw_priv->buf is NULL, what should we do?
>>>>
>>>> Should we skip the TIMEOUT= and ASYNC= fields too?
>>>>
>>>> Something like the attached, perhaps?
>>>>
>>>> Shuah, how reproducible is this? Does this (completely untested) patch
>>>> make any difference?
>>>>
>>>
>>> Happened both times I booted 4.2-rc1 up, so I would say 100% so far.
>>> I will test with your patch and report results.
>>>
>>
>> Yes. This patch fixed the problem.
>
> That's great, but what changed recently to cause this problem to happen?
> Any chance you can bisect to the problem commit?
>

:) Starting bisect now. Thankfully I built 4.1.0 in this tree,
which is a good place to start. I will let you know in a bit.

-- Shuah



2015-07-10 16:58:22

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/09/2015 11:40 AM, Shuah Khan wrote:
> On 07/09/2015 08:45 AM, Greg Kroah-Hartman wrote:
>> On Thu, Jul 09, 2015 at 07:44:04AM -0600, Shuah Khan wrote:
>>> On 07/09/2015 07:10 AM, Shuah Khan wrote:
>>>> On 07/08/2015 09:17 PM, Linus Torvalds wrote:
>>>>> On Wed, Jul 8, 2015 at 5:58 PM, Ming Lei <[email protected]> wrote:
>>>>>> On Thu, Jul 9, 2015 at 1:29 AM, Linus Torvalds
>>>>>> <[email protected]> wrote:
>>>>>>> Also, it looks like you need to hold the "fw_lock" to even look at
>>>>>>> that pointer, since the buffer can get reallocated etc.
>>>>>>
>>>>>> Yes, the above code with holding 'fw_lock' is right fix for the issue since
>>>>>> sysfs read can happen anytime, and there is one race between firmware
>>>>>> request abort and reading uevent of sysfs.
>>>>>
>>>>> So if fw_priv->buf is NULL, what should we do?
>>>>>
>>>>> Should we skip the TIMEOUT= and ASYNC= fields too?
>>>>>
>>>>> Something like the attached, perhaps?
>>>>>
>>>>> Shuah, how reproducible is this? Does this (completely untested) patch
>>>>> make any difference?
>>>>>
>>>>
>>>> Happened both times I booted 4.2-rc1 up, so I would say 100% so far.
>>>> I will test with your patch and report results.
>>>>
>>>
>>> Yes. This patch fixed the problem.
>>
>> That's great, but what changed recently to cause this problem to happen?
>> Any chance you can bisect to the problem commit?
>>
>
> :) Starting bisect now. Thankfully I built 4.1.0 in this tree
> which is good place to start. Let you know in a bit
>

Ok here is the bisect result. Did I mention "I really love bisect
on the very first rc after the merge window" :)

I am not sure why this patch would cause the problem I am seeing.
This patch itself looks like a cleanup type patch and doesn't
really fix a bug. I am building with this patch reverted at the
moment to confirm. In the meantime:


eaa5cd926345f86e9df1eb6b0490da539f5ce7d0 is the first bad commit
commit eaa5cd926345f86e9df1eb6b0490da539f5ce7d0
Author: Vladimir Zapolskiy <[email protected]>
Date: Fri May 22 00:21:16 2015 +0300

fs: sysfs: don't pass count == 0 to bin file readers

If count == 0 bytes are requested by a reader, sysfs_kf_bin_read()
deliberately returns 0 without passing a potentially harmful value to
some externally defined underlying battr->read() function.

However in case of (pos == size && count) the next clause always sets
count to 0 and this value is handed over to battr->read().

The change intends to make obsolete (and remove later) a redundant
sanity check in battr->read(), if it is present, or add more
protection to struct bin_attribute users, who does not care about
input arguments.

Signed-off-by: Vladimir Zapolskiy <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

:040000 040000 aca6ce463db111e3b3ef3fe4ed799a670618b1c4
2aad70773e76e899561ef4a8851174ae87793fed M fs
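
Going only by the commit message above, the change in bounds handling can be sketched as follows. `clamp_read_count()` is a hypothetical helper, not the kernel's sysfs_kf_bin_read(), and the `fixed` flag is an assumption used to switch between the behaviour the message describes as old (a read at pos == size reaches the reader with count clamped to 0) and new (it returns early).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if the underlying reader would be called; *count then holds
 * the (possibly clamped) byte count it would receive.
 */
static bool clamp_read_count(size_t size, size_t pos, size_t *count, bool fixed)
{
	if (*count == 0)
		return false;	/* zero-length reads never reach the reader */

	if (size) {
		/* The old check let pos == size through; the new one does not. */
		if (fixed ? pos >= size : pos > size)
			return false;
		if (pos + *count > size)
			*count = size - pos;
	}
	return true;
}
```

With size 16, pos 16 and a non-zero count, the old path clamps count to 0 and still calls the reader, while the fixed path returns before the reader is reached.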


Complete bisect log:


git bisect start
# bad: [d6ac4ffc61ace6ed6f183e9fd7f207c0ddafb897] Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
git bisect bad d6ac4ffc61ace6ed6f183e9fd7f207c0ddafb897
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# good: [4570a37169d4b44d316f40b2ccc681dc93fedc7b] Merge tag 'sound-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 4570a37169d4b44d316f40b2ccc681dc93fedc7b
# bad: [8d7804a2f03dbd34940fcb426450c730adf29dae] Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect bad 8d7804a2f03dbd34940fcb426450c730adf29dae
# good: [3d9f96d850e4bbfae24dc9aee03033dd77c81596] Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 3d9f96d850e4bbfae24dc9aee03033dd77c81596
# good: [692a59e696afe1a4e777d0e4359325336ab0ad89] drm/amdgpu: remove AMDGPU_CTX_OP_STATE_RUNNING
git bisect good 692a59e696afe1a4e777d0e4359325336ab0ad89
# good: [099bfbfc7fbbe22356c02f0caf709ac32e1126ea] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect good 099bfbfc7fbbe22356c02f0caf709ac32e1126ea
# good: [19a4fb21f804dbd5a327eba7a1569b6b8e941a54] i2c-parport: define ports to connect
git bisect good 19a4fb21f804dbd5a327eba7a1569b6b8e941a54
# good: [3dc196eae1db548f05e53e5875ff87b8ff79f249] mei: me: wait for power gating exit confirmation
git bisect good 3dc196eae1db548f05e53e5875ff87b8ff79f249
# good: [e382608254e06c8109f40044f5e693f2e04f3899] Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect good e382608254e06c8109f40044f5e693f2e04f3899
# good: [f5727b05d221796baf69667ed5c891d4bd53711e] firmware: fix __getname() missing failure check
git bisect good f5727b05d221796baf69667ed5c891d4bd53711e
# good: [9ba8af66432cb8e82553f2e273eb11db0cec7d2d] base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
git bisect good 9ba8af66432cb8e82553f2e273eb11db0cec7d2d
# bad: [8b2dcebae330fb6dffc7717b740aa4b2c4d00451] Revert "base/platform: Remove code duplication"
git bisect bad 8b2dcebae330fb6dffc7717b740aa4b2c4d00451
# bad: [303cda0ea7c1c33701812ccb80d37083a4093c7c] firmware: add missing kfree for work on async call
git bisect bad 303cda0ea7c1c33701812ccb80d37083a4093c7c
# bad: [eaa5cd926345f86e9df1eb6b0490da539f5ce7d0] fs: sysfs: don't pass count == 0 to bin file readers
git bisect bad eaa5cd926345f86e9df1eb6b0490da539f5ce7d0
# first bad commit: [eaa5cd926345f86e9df1eb6b0490da539f5ce7d0] fs: sysfs: don't pass count == 0 to bin file readers

thanks,
-- Shuah


2015-07-10 17:11:34

by Linus Torvalds

Subject: Re: Linux 4.2-rc1

On Fri, Jul 10, 2015 at 9:58 AM, Shuah Khan <[email protected]> wrote:
>
> I am not sure why this patch would cause the problem I am seeing.
> This patch itself looks like a cleanup type patch and doesn't
> really fix a bug. I am building with this patch reverted at the
> moment to confirm.

Smells to me like it's just a timing issue, and that maybe the bisect
failed because it's not 100% repeatable. Or maybe it *was* 100%
repeatable, but simply because that commit changed the timing of the
bootup scripts etc.

But yes, trying it with the revert in place is a good idea just to
make sure. And perhaps checking that kernel more than a few times to
verify just how repeatable it is.
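
To illustrate why a bisect can land on the wrong commit when the failure is not 100% repeatable, here is a toy model. Everything in it is an assumption for illustration: commits are plain integers, `flaky_boot()` simulates a bug introduced at commit 5 that only crashes on every second boot, and `bisect_first_bad()` is an ordinary binary search, not how git bisect is implemented.

```c
#include <stdbool.h>

/* Global boot counter used by the simulated flaky test below. */
static int boots;

/* Simulated kernel: commits >= 5 carry the bug, but only odd boots crash. */
static bool flaky_boot(int commit)
{
	int n = boots++;

	return commit >= 5 && (n % 2 == 1);
}

/* A commit counts as bad if any of `trials` boots crashes. */
static bool crashes_in_any_trial(int commit, int trials, bool (*boot)(int))
{
	for (int i = 0; i < trials; i++)
		if (boot(commit))
			return true;
	return false;
}

/*
 * Binary search for the first bad commit. `good` must be known good and
 * `bad` known bad; each midpoint is booted `trials` times before judging it.
 */
static int bisect_first_bad(int good, int bad, int trials, bool (*boot)(int))
{
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (crashes_in_any_trial(mid, trials, boot))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

In this toy setup, starting from `boots = 0` with commit 0 good and commit 10 bad, one boot per step converges on commit 7, while two boots per step finds commit 5. A single missed crash marks a bad commit as good and sends the rest of the search into the wrong half.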

Linus

2015-07-10 21:57:16

by Shuah Khan

Subject: Re: Linux 4.2-rc1

On 07/10/2015 03:47 PM, Linus Torvalds wrote:
> But my patch (which is committed now) solves it all for you?
>
> I'm going to just assume it's timing, and there is no major real reason why
> it started triggering just now...
>
> Linus

Yes. With your patch I didn't see the problem.

thanks,
-- Shuah



2015-07-10 22:36:34

by Ming Lei

Subject: Re: Linux 4.2-rc1

On Sat, Jul 11, 2015 at 5:47 AM, Linus Torvalds
<[email protected]> wrote:
> But my patch (which is committed now) solves it all for you?
>
> I'm going to just assume it's timing, and there is no major real reason why
> it started triggering just now...

Now I see it: the issue is triggered when a firmware request times
out, and the crash Shuah reported is probably caused by commit
0cb64249 ("firmware_loader: abort request if wait_for_completion
is interrupted").

But your patch is correct for this issue too.

Thanks,
Ming

>
> Linus
>
> On Jul 10, 2015 2:33 PM, "Shuah Khan" <[email protected]> wrote:
>>
>> On 07/10/2015 11:11 AM, Linus Torvalds wrote:
>> > On Fri, Jul 10, 2015 at 9:58 AM, Shuah Khan <[email protected]>
>> > wrote:
>> >>
>> >> I am not sure why this patch would cause the problem I am seeing.
>> >> This patch itself looks like a cleanup type patch and doesn't
>> >> really fix a bug. I am building with this patch reverted at the
>> >> moment to confirm.
>> >
>> > Smells to me like it's just a timing issue, and that maybe the bisect
>> > failed because it's not 100% repeatable. Or maybe it *was* 100%
>> > repeatable, but simply because that commit changed the timing of the
>> > bootup scripts etc.
>> >
>> > But yes, trying it with the revert in place is a good idea just to
>> > make sure. And perhaps checking that kernel more than a few times to
>> > verify just how repeatable it is.
>> >
>>
>> Quick update. Reverting didn't help. I think I mentioned I am seeing
>> hangs during poweroff and reboot. I am seeing hangs during boot as well.
>> I think there is a timing problem that manifests into the following
>> 3 variations:
>>
>> 1. NULL pointer dereference alert, boots fine and runs fine - hangs
>> during poweroff and reboot
>> 2. Hangs during boot. When booted in recovery, it runs into repeated
>> errors which looks very much like the same call trace I see in the
>> alert.
>>
>> Please see attached images. These two are rolling failures repeated
>> during udev initialization. It is related to firmware loading it looks
>> like.
>>
>> thanks,
>> -- Shuah
>>
>> --
>> Shuah Khan
>> Sr. Linux Kernel Developer
>> Open Source Innovation Group
>> Samsung Research America (Silicon Valley)
>> [email protected] | (970) 217-8978

2015-07-13 16:55:58

by Mark Langsdorf

Subject: Re: Build failure on ARM64 for Linux 4.2-rc1 was: Linux 4.2-rc1

On 07/09/2015 11:59 AM, Peter Zijlstra wrote:
> On Thu, Jul 09, 2015 at 11:46:04AM -0500, Mark Langsdorf wrote:
>> I did a git bisect and it looks like the faulty patch is
>> d72da4a4d973d8a0a0d3c97e7cdebf287fbe3a99, "rbtree: Make lockless
>> searches non-fatal". I can't see why it causes my builds to fail,
>> but if I revert that patch and the related series, then I can
>> build the kernel and build tools/perf successfully.
>>
>> Any insight into a less intensive way of fixing my build would
>> be appreciated.
>
> This is tools/perf failing to build right? Add tip/perf/urgent, it
> appears to contain the required bits to make it go again.
>
> The /urgent branches will get to Linus 'soon' I imagine.

Thanks, that resolved it.

--Mark Langsdorf