2011-02-22 01:45:18

by Linus Torvalds

[permalink] [raw]
Subject: Linux 2.6.38-rc6

Hmm. Another week, another -rc. This one has the fix for the rather
annoying (but also very rare) memory corruption that we were battling
for a couple of weeks - but since I suspect only two people ever saw
it, I doubt most of you care. It was a big relief for _me_ to have it
resolved, though, so I'll mention it anyway. There's another patch for
a related issue pending, it will go in soon.

Diff-wise, the most noticeable thing here is removal of the /proc
interface from the target code (so that we don't make a release with
deprecated interfaces). But ignoring that (and some arm mach/map.h
cleanup patches), the diffs really are pretty small.

But what is probably actually noticeable is a lot of small fixups,
mostly in drivers. Nothing really exciting, I'm afraid. Or not afraid,
since excitement at this stage in the -rc series is a bad thing.

I do want to remind developers to please take a look at Rafael's
latest regression list (I particularly care for the "since 2.6.37"
one, but I feel compelled to ask people to look at the 36->37
regression list too).

And for the people who actually reported the regressions (or didn't,
but have found new ones) - please do update the status of the
regression. Some of them I'd really feel a lot happier about if
they had a note that "yes, I tested with -rc6, and the problem is
still there". We've fixed a fair number of problems, so it's always
good to have regression reporters ping us with something like "hey,
I'm still here, and you may have fixed all those other things, but you
didn't fix my issue".

Linus

---
Ahmed S. Darwish (2):
Documentation: complete crashkernel= parameter documentation
Documentation: explain [KMG] parameter suffix

Akinobu Mita (2):
sparc: use bitmap_set()
sparc: fix size argument to find_next_zero_bit()

Alex Deucher (1):
drm/radeon/kms: add missing frac fb div flag for dce4+

Amir Hanania (1):
ixgbe: work around for DDP last buffer size

Andrew Vasquez (1):
[SCSI] qla2xxx: Return DID_NO_CONNECT when FC device is lost.

Andy Gospodarek (1):
ixgbe: fix panic due to uninitialised pointer

Andy Whitcroft (1):
ecryptfs: read on a directory should return EISDIR if not supported

Axel Lin (1):
ARM: SAMSUNG: Drop exporting s3c24xx_ts_set_platdata

Bao Liang (1):
Bluetooth: Set conn state to BT_DISCONN to avoid multiple responses

Ben Skeggs (4):
drm/nouveau: fix non-EDIDful native mode selection
drm/nv40: fix tiling-related setup for a number of chipsets
drm/nouveau: flips/flipd need to always set 'evict' for
move_accel_cleanup()
drm/nouveau: fix suspend/resume on GPUs that don't have PM support

Casey Leedom (4):
cxgb4vf: Check driver parameters in the right place ...
cxgb4vf: Behave properly when CONFIG_DEBUG_FS isn't defined ...
cxgb4vf: Quiesce Virtual Interfaces on shutdown ...
cxgb4vf: Use defined Mailbox Timeout

Catalin Marinas (1):
ARM: 6676/1: Correct the cpu_architecture() function for ARMv7

Cho, Yu-Chen (1):
Bluetooth: add Atheros BT AR9285 fw supported

Chuck Ebbert (1):
block: revert block_dev read-only check

Clemens Ladisch (4):
hwmon: (jc42) fix type mismatch
hwmon: (jc42) more helpful documentation
hwmon: (jc42) do not allow writing to locked registers
hwmon: (k10temp) add support for AMD Family 12h/14h CPUs

Dan Carpenter (1):
[SCSI] target: iblock/pscsi claim checking for NULL instead of IS_ERR

Daniel Hellstrom (1):
sparc32: unaligned memory access (MNA) trap handler bug

Daniel Walker (1):
MAINTAINERS: email address change

Darrick J. Wong (1):
[SCSI] scsi_debug: Fix 32-bit overflow in do_device_access
causing memory corruption

David Henningsson (3):
ALSA: HDA: Add position_fix quirk for an Asus device
ALSA: HDA: Conexant auto: Handle multiple connections to ADC node
ALSA: HDA: Do not announce false surround in Conexant auto

David S. Miller (4):
hisax: Fix unchecked alloc_skb() return.
iwlwifi: Delete iwl3945_good_plcp_health.
isdn: hisax: Use l2headersize() instead of dup (and buggy) func.
sparc64: Fix NMI startup bug which also breaks perf.

Dmitry Torokhov (1):
module: explicitly align module_version_attribute structure

Eliad Peller (1):
mac80211: add missing locking in ieee80211_reconfig

Eric Dumazet (2):
net: provide default_advmss() methods to blackhole dst_ops
net: deinit automatic LIST_HEAD

Francisco Jerez (3):
drm/nv10: Fix crash when allocating a BO larger than half the
available VRAM.
drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS
native mode.
drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.

Fubo Chen (1):
[SCSI] target: fixed missing lock drop in error path

Giuseppe Cavallaro (1):
stmmac: enable wol via magic frame by default.

Guenter Roeck (1):
MAINTAINERS: Remove stale hwmon quilt tree

Heiko Carstens (2):
[S390] atomic: use ACCESS_ONCE() for atomic_read()
[S390] atomic: use inline asm

Herbert Xu (4):
crypto: sha-s390 - Reset index after processing partial block
bridge: Fix mglist corruption that leads to memory corruption
bridge: Fix timer typo that may render snooping less effective
bridge: Replace mp->mglist hlist with a bool

Hiroaki SHIMODA (1):
xfrm: avoid possible oopse in xfrm_alloc_dst

Horst Hartmann (1):
[S390] net: provide architecture specific NET_SKB_PAD

Ian Campbell (2):
arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
xen: suspend and resume system devices when running PVHVM

Indan Zupancic (1):
drm/i915: Do not handle backlight combination mode specially

Ivan Vecera (1):
drivers/net: Call netif_carrier_off at the end of the probe

James Bottomley (1):
[SCSI] qla2xxx: Fix race that could hang kthread_stop()

Jan Beulich (1):
hwmon: (lm85) extend to support EMC6D103 chips

Jeff Layton (1):
cifs: fix handling of scopeid in cifs_convert_address

Jesper Juhl (4):
Don't potentially dereference NULL in net/dcb/dcbnl.c:dcbnl_getapp()
USB Network driver infrastructure: Fix leak when
usb_autopm_get_interface() returns less than zero in kevent().
Net, USB, Option, hso: Do not dereference NULL pointer
ATM, Solos PCI ADSL2+: Don't deref NULL pointer if
net_ratelimit() and alloc_skb() interact badly.

Jesse Brandeburg (2):
e1000e: check down flag in tasks
e1000e: flush all writebacks before unload

John Fastabend (1):
net: dcb: application priority is per net_device

John Stultz (2):
RTC: Revert UIE emulation removal
RTC: Re-enable UIE timer/polling emulation

Kashyap, Desai (3):
[SCSI] mptfusion: mptctl_release is required in mptctl.c
[SCSI] mptfusion: Fix Incorrect return value in mptscsih_dev_reset
[SCSI] mptfusion: Bump version 03.04.18

Keng-Yu Lin (1):
dell-laptop: Toggle the unsupported hardware killswitch

Kukjin Kim (5):
ARM: S5PV310: Cleanup map.h file
ARM: S5PV210: Cleanup map.h file
ARM: S5PC100: Clenaup map.h file
ARM: S5P6442: Cleanup map.h file
ARM: S5P64X0: Cleanup map.h file

Kurt Van Dijck (1):
net/can/softing: make CAN_SOFTING_CS depend on CAN_SOFTING

Linus Torvalds (5):
vfs: fix BUG_ON() in fs/namei.c:1461
Expand CONFIG_DEBUG_LIST to several other list operations
net: dont leave active on stack LIST_HEAD
Revert "tpm_tis: Use timeouts returned from TPM"
Linux 2.6.38-rc6

Maciej Sosnowski (1):
RDMA/nes: Don't generate async events for unregistered devices

Madhuranath Iyengar (1):
[SCSI] qla2xxx: Change from irq to irqsave with host_lock

Marek Olšák (1):
drm/radeon/kms: do not reject X16 and Y16X16 floating-point
formats on r300

Marek Szyprowski (2):
ARM: S5PV210: Update max8998_platform_data
ARM: S5PV210: Fix regulator names

Martin Schwidefsky (1):
[S390] correct ipl parameter block safe guard

Matt Carlson (1):
tg3: Restrict phy ioctl access

Matthew Garrett (1):
acer-wmi: Fix capitalisation of GUID

Michal Marek (1):
fixdep: Do not record dependency on the source file itself

Mike Marciniszyn (2):
IB/qib: Fix double add_timer()
IB/qib: Prevent double completions after a timeout or RNR error

NeilBrown (1):
nfsd: correctly handle return value from nfsd_map_name_to_*

Nicholas Bellinger (6):
[SCSI] target/iblock: Fix failed bd claim NULL pointer dereference
[SCSI] target: Fix demo-mode MappedLUN shutdown UA/PR breakage
[SCSI] target: Fix top-level configfs_subsystem default_group
shutdown breakage
[SCSI] target: Fix SCF_SCSI_CONTROL_SG_IO_CDB breakage
[SCSI] target: Remove procfs based target_core_mib.c code
[SCSI] target: fix use after free detected by SLUB poison

Nicolas Pitre (2):
ARM: 6739/1: update .gitignore for boot/compressed
ARM: 6745/1: kprobes insn decoding fix

Patrick McHardy (1):
netfilter: nf_iterate: fix incorrect RCU usage

Pawel Moll (1):
ARM: 6740/1: Place correctly notes section in the linker script

Randy Dunlap (4):
net: fix ifenslave build flags
platform/x86: ideapad-laptop depends on INPUT
Documentation: log_buf_len uses [KMG] suffix
Docbook: add fs/eventfd.c and fix typos in it

Raymond Yau (1):
ALSA: au88x0 - Modify pointer callback to give accurate playback position

Russell King (4):
ARM: Ensure predictable endian state on signal handler entry
ARM: Keep exit text/data around for SMP_ON_UP
ARM: tlb: delay page freeing for SMP and ARMv7 CPUs
ARM: tlb: move noMMU tlb_flush() to asm/tlb.h

Sage Weil (3):
libceph: fix socket read error handling
libceph: fix socket write error handling
ceph: queue cap_snaps once per realm

Sebastian Andrzej Siewior (1):
spi/pxa2xx pci: fix the release - remove race

Seth Forshee (1):
thinkpad_acpi: Always report scancodes for hotkeys

Shiraz Hashim (1):
ARM: 6722/1: SPEAr: sp810: switch to slow mode before reset

Shirish Pargaonkar (1):
cifs: Fix regression in LANMAN (LM) auth code

Srinidhi Kasagar (1):
ARM: 6741/1: errata: pl310 cache sync operation may be faulty

Stanislaw Gruszka (2):
iwl3945: remove plcp check
PM / Hibernate: Return error code when alloc_image_page() fails

Stefan Haberland (1):
[S390] dasd: correct device table

Steffen Klassert (1):
ip_gre: Add IPPROTO_GRE to flowi in ipgre_tunnel_xmit

Steve French (1):
[CIFS] update cifs version

Takashi Iwai (1):
ALSA: caiaq - Fix possible string-buffer overflow

Tejun Heo (3):
workqueue: wake up a worker when a rescuer is leaving a gcwq
workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long

Thomas Abraham (1):
ARM: S5P: Fix end address in memory resource information for UART devices

Thomas Gleixner (4):
platform-drivers: x86: pmic: Fix up bogus irq hackery
platform-drivers: x86: Convert pmic to new irq_chip functions
platform-drivers: x86: pmic: Use irq_chip buslock mechanism
platform-drivers: x86: pmic: Use request_irq instead of chained handler

Timo Warns (1):
fs/partitions: Validate map_count in Mac partition tables

Toshiharu Okada (2):
pch_gbe: Fix the issue that the receiving data is not normal.
pch_gbe: Fix the MAC Address load issue.

Tyler Hicks (3):
eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata"
eCryptfs: Handle NULL nameidata pointers
eCryptfs: Copy up lower inode attrs in getattr

Uwe Kleine-König (1):
RTC: Release mutex in error path of rtc_alarm_irq_enable

Vasiliy Kulikov (3):
platform: x86: tc1100-wmi: world-writable sysfs wireless and jogdial files
platform: x86: asus_acpi: world-writable procfs files
platform: x86: acer-wmi: world-writable sysfs threeg file

Will Deacon (2):
ARM: 6742/1: pmu: avoid setting IRQ affinity on UP systems
ARM: 6743/1: errata: interrupted ICALLUIS may prevent completion
of broadcasted operation

Yehuda Sadeh (1):
ceph: keep reference to parent inode on ceph_dentry

viresh kumar (3):
ARM: 6720/1: SPEAr: Append UL to VMALLOC_END
ARM: 6712/1: SPEAr: replace readl(), writel() with relaxed
versions in uncompress.h
ARM: 6700/1: SPEAr: Correct SOC config base address for spear320


2011-02-22 14:03:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Mon, Feb 21, 2011 at 05:44:53PM -0800, Linus Torvalds wrote:
> We've fixed a fair number of problems, so it's always
> good to have regression reporters ping us with something like "hey,
> I'm still here, and you may have fixed all those other things, but you
> didn't fix my issue".

I don't know whether this one is relevant or not but it is in fs/namei.c
and it could mean a vfs regression. I got the oops below after resuming
from disk today, kernel is 38-rc5. However, it didn't happen yesterday
on resume so it either is a glitch or a bug which is hard to reproduce.
I'll run -rc6 to check.

Beware, I've typed the whole oops from the screen, and while I paid
attention and made sure I had enough coffee before starting, some typos
might've sneaked in. I doublechecked the "Code:" section though.

--
[19728.090341] ------------[ cut here ]------------
[19728.090511] kernel BUG at fs/namei.c:1416!
[19728.090635] invalid opcode: 0000 [#1] PREEMPT SMP
[19728.090807] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
[19728.091060] CPU 0
[19728.091060] Modules linked in: powernow_k8 mperf cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace sco bnep rfcomm l2cap crc16 loop btusb bluetooth snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel usbhid snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss usb_storage snd_kill snd_seq_oss snd_seq_midi snd_rawmidi option snd_seq_midi_event usb_wwan snd_seq usbserial snd_timer snd_seq_device ohci_hcd processor snd evdev kcore thermal video nvram usbcore i2c_piix4 snd_page_alloc nls_base battery thermal_sys edac_core ac button
[19728.091060]
[19728.091060] Pid: 1933, comm: conky Not tainted 2.6.38-rc5 #1 LENOVO 01972NG/INVALID
[19728.091060] RIP: 0010:[<ffffffff8111abb5>] [<ffffffff8111abb5>] link_path_walk+0x975/0xa60
[19728.091060] RSP: 0018:ffff880137a99ce8 EFLAGS: 00010282
[19728.091060] RAX: ffff8800ad836738 RBX: ffff88012bedc00a RCX: 0000000000000000
[19728.091060] RDX: ffffffff81622040 RSI: ffff8800ad80d0c0 RDI: ffff8800ad80d0c0
[19728.091060] RBP: ffff880137a99d78 R08: 0000000000000003 R09: ffff8800ad80d0fa
[19728.091060] R10: 0000000000272c85 R11: ffff880137a99ca4 R12: ffff880137a99e18
[19728.091060] R13: ffff880137af9fe0 R14: ffff880137a99d28 R15: ffff880137af9fe0
[19728.091060] FS: 00007fdf4ca19700(0000) GS:ffff8800afc00000(0000) knlGS:0000000000000000
[19728.091060] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19728.091060] CR2: 00007fdf48001388 CR3: 0000000135a0f000 CR4: 00000000000006f0
[19728.091060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19728.091060] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[19728.091060] Process conky (pid: 1933, threadinfo ffff880137a98000, task ffff880137af9fe0)
[19728.091060] Stack:
[19728.091060] 00000000ffffff9c ffff880137a99e18 ffff880137a99d18 ffff880137af9fe0
[19728.091060] ffff880137a99e38 0000014100000101 0000000300272c85 ffff88012bedc006
[19728.091060] ffff88013fe0ce00 ffff8800ad80d0c0 ffff880137a99d78 ffff8800ad836738
[19728.091060] Call Trace:
[19728.091060] [<ffffffff8111af17>] do_path_lookup+0x57/0xf0
[19728.091060] [<ffffffff8111bfcf>] do_filp_open+0x1ef/0x770
[19728.091060] [<ffffffff810e8971>] ? handle_mm_fault+0x191/0x250
[19728.091060] [<ffffffff81038c19>] ? sub_preempt_count+0xa9/0xe0
[19728.091060] [<ffffffff81127fba>] ? alloc_fd+0xfa/0x140
[19728.091060] [<ffffffff8110b549>] do_sys_open+0x69/0x110
[19728.091060] [<ffffffff8145dcc9>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[19728.091060] [<ffffffff8110b630>] sys_open+0x20/0x30
[19728.091060] [<ffffffff8100276b>] system_call_fastpath+0x16/0x1b
[19728.091060] Code: 4d 80 4c 89 ea 4c 89 e6 ff d0 8b 4d 80 e9 12 ff ff ff 0f 0b eb fe 4c 89 e7 e8 b8 cb ff ff 85 c0 0f 84 ce fd ff ff e9 19 fa ff ff 47 68 48 8b 40 28 f6 40 09 40 0f 84 e5 f9 ff
[19728.091060] RIP [<ffffffff8111abb5>] link_path_walk+0x975/0xa60
[19728.091060] RSP <ffff880137a99ce8>

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2011-02-22 14:50:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Tue, Feb 22, 2011 at 03:03:49PM +0100, Borislav Petkov wrote:
> I'll run -rc6 to check.

Yeah, forgive the noise, I should've found
3abb17e82f08628b59e20d8cbcb55e2204180f69.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2011-02-22 15:22:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Tue, Feb 22, 2011 at 6:03 AM, Borislav Petkov <[email protected]> wrote:
>
> I don't know whether this one is relevant or not but it is in fs/namei.c
> and it could mean vfs regression. I got the oops below after resuming
> from disk today, kernel is 38-rc5. However, it didn't happen yesterday
> on resume so it either is a glitch or a bug which is hard to reproduce.
> I'll run -rc6 to check.

It's relevant, and it's real, but it's fixed in -rc6 by commit 3abb17e82f08.

Linus

2011-02-23 05:42:14

by Anca Emanuel

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

General protection fault:
http://i.imgur.com/TBJ6y.jpg

2011-02-23 06:14:24

by Anca Emanuel

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Wed, Feb 23, 2011 at 7:42 AM, Anca Emanuel <[email protected]> wrote:
> General protection fault:
> http://i.imgur.com/TBJ6y.jpg

dmesg: http://pastebin.com/qD8pR8QH
config: http://pastebin.com/XEurtHWi

2011-02-23 09:43:36

by Jeff Chua

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

2011/2/22 Linus Torvalds <[email protected]>:
> And for the people who actually reported the regressions (or didn't,
> but have found new ones) - please do update the status of the
> regression. Some of them I'm really feel a lot more happy about if
> they had a note that "yes, I tested with -rc6, and the problem is
> still there". We've fixed a fair number of problems, so it's always
> good to have regression reporters ping us with something like "hey,
> I'm still here, and you may have fixed all those other things, but you
> didn't fix my issue".

I just encountered this reported bug using Bluetooth Intermec scanners
and discovered that the recent kernel has the same problem as reported
below.

https://bugzilla.kernel.org/show_bug.cgi?id=26182 ... the bluetooth
devices would just stop responding after a while.

[ 4533.361959] btusb 8-1:1.0: no reset_resume for driver btusb?
[ 4533.361964] btusb 8-1:1.1: no reset_resume for driver btusb?

It seems that Fedora doesn't feel this is a big problem, but for
production systems it's a big deal, as the scanners can't connect to
the server.

It seems to relate to this commit. The workaround is to set
usbcore.autosuspend=-1 on the kernel command line, but since autosuspend
is not working as intended, it should either be fixed or reverted.

Thanks,
Jeff.



commit 556ea928f78a390fe16ae584e6433dff304d3014
Author: Matthew Garrett <[email protected]>
Date: Thu Sep 16 13:58:15 2010 -0400

Bluetooth: Enable USB autosuspend by default on btusb

We've done this for a while in Fedora without any obvious problems other
than some interaction with input devices. Those should be fixed now, so
let's try this in mainline.

Signed-off-by: Matthew Garrett <[email protected]>
Acked-by: Marcel Holtmann <[email protected]>
Signed-off-by: Gustavo F. Padovan <[email protected]>

2011-02-23 16:33:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Tue, Feb 22, 2011 at 9:42 PM, Anca Emanuel <[email protected]> wrote:
> General protection fault:
> http://i.imgur.com/TBJ6y.jpg
>
> dmesg: http://pastebin.com/qD8pR8QH
> config: http://pastebin.com/XEurtHWi

That's drivers/video/fbmem.c: fb_release(), and the "Code:"
disassembly shows that it is

1b: e8 f7 c0 29 00 callq xyz
20: 48 8b 93 b8 03 00 00 mov 0x3b8(%rbx),%rdx
27:* 48 8b 42 10 mov 0x10(%rdx),%rax <-- trapping instruction

which corresponds to

mutex_lock(&info->lock);
if (info->fbops->fb_release)
info->fbops->fb_release(info,1);

so it looks like 'info->fbops' is invalid. It's in %rdx, and is
0x00d000ae00b500c2, which is definitely not a valid pointer. Looks
like some bad corruption (looks like a sequence of 16-bit numbers, but
it could be anything).

Looks like nouveaufb took over from vesafb. Did you do anything special
to trigger this?

Also, you do seem to have some extra patches (yama at the least). Anything else?

Linus

2011-02-23 17:16:10

by Anca Emanuel

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Wed, Feb 23, 2011 at 6:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Feb 22, 2011 at 9:42 PM, Anca Emanuel <[email protected]> wrote:
>> General protection fault:
>> http://i.imgur.com/TBJ6y.jpg
>>
>> dmesg: http://pastebin.com/qD8pR8QH
>> config: http://pastebin.com/XEurtHWi
>
> That's drivers/video/fbmem.c: fb_release(), and the "Code:"
> disassembly shows that it is
>
> 1b: e8 f7 c0 29 00 callq xyz
> 20: 48 8b 93 b8 03 00 00 mov 0x3b8(%rbx),%rdx
> 27:* 48 8b 42 10 mov 0x10(%rdx),%rax <-- trapping instruction
>
> which corresponds to
>
> mutex_lock(&info->lock);
> if (info->fbops->fb_release)
> info->fbops->fb_release(info,1);
>
> so it looks like 'info->fbops' is invalid. It's in %rdx, and is
> 0x00d000ae00b500c2, which is definitely not a valid pointer. Looks
> like some bad corruption (looks like a sequence of 16-bit numbers, but
> it could be anything).
>
> Looks like nouveafb took over from vesafb. Did you do anything special
> to trigger this?

No. Just boot the system.

>
> Also, you do seem to have some extra patches (yama at the least). Anything else?

I used git clone, nothing else.
The first time, 2.6.38-rc6 was working.
After an update from Ubuntu I get that error at boot.

The dmesg is from Ubuntu 11.04 with their kernel and is working fine.

>
> Linus

2011-02-24 00:29:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Wed, Feb 23, 2011 at 9:16 AM, Anca Emanuel <[email protected]> wrote:
>>
>> Looks like nouveafb took over from vesafb. Did you do anything special
>> to trigger this?
>
> No. Just boot the system.

Every boot?

And just out of interest, what happens if you don't have the vesafb
driver at all?

Linus

2011-02-24 00:43:43

by Dave Airlie

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 10:28 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Feb 23, 2011 at 9:16 AM, Anca Emanuel <[email protected]> wrote:
>>>
>>> Looks like nouveafb took over from vesafb. Did you do anything special
>>> to trigger this?
>>
>> No. Just boot the system.
>
> Every boot?
>
> And just out of interest, what happens if you don't have the vesafb
> driver at all?
>

I think this is a race condition somewhere with plymouth getting
access to vesafb before it gets kicked off the hw.

I'm assuming removing the vga= line from the command line will stop it.

Dave.

2011-02-24 13:20:22

by Anca Emanuel

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 2:28 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Feb 23, 2011 at 9:16 AM, Anca Emanuel <[email protected]> wrote:
>>>
>>> Looks like nouveafb took over from vesafb. Did you do anything special
>>> to trigger this?
>>
>> No. Just boot the system.
>
> Every boot?

Yes.

>
> And just out of interest, what happens if you don't have the vesafb
> driver at all?
>
> Linus
>

I used the 'e' option in grub, removed the 'set gfxpayload = $linux_gfx_mode'
line, and it works.

dmesg: http://pastebin.com/JAZsk4vD

2011-02-24 16:38:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 5:20 AM, Anca Emanuel <[email protected]> wrote:
>>
>> Every boot?
>
> Yes.
>
>> And just out of interest, what happens if you don't have the vesafb
>> driver at all?
>
> I used 'e' option from grub, removed the 'set gfxpayload = $linux_gfx_mode'
> and it works.
>
> dmesg: http://pastebin.com/JAZsk4vD

Hmm. So it definitely seems to be the hand-over.

Does this patch make any difference? When we unregister the old
framebuffer, we still leave it in the registered_fb[] array, which
looks wrong. But it would also be interesting to hear if setting
CONFIG_SLUB_DEBUG_ON or CONFIG_DEBUG_PAGEALLOC makes any difference
(they'd help detect accesses to free'd data structures).

Linus


Attachments:
patch.diff (497.00 B)
Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 08:37:11AM -0800, Linus Torvalds wrote:
> On Thu, Feb 24, 2011 at 5:20 AM, Anca Emanuel <[email protected]> wrote:
> >>
> >> Every boot?
> >
> > Yes.
> >
> >> And just out of interest, what happens if you don't have the vesafb
> >> driver at all?
> >
> > I used 'e' option from grub, removed the 'set gfxpayload = $linux_gfx_mode'
> > and it works.
> >
> > dmesg: http://pastebin.com/JAZsk4vD
>
> Hmm. So it definitely seems to be the hand-over.
>
> Does this patch make any difference? When we unregister the old
> framebuffer, we still leave it in the registered_fb[] array, which
> looks wrong. But it would also be interesting to hear if setting
> CONFIG_SLUB_DEBUG_ON or CONFIG_DEBUG_PAGEALLOC makes any difference
> (they'd help detect accesses to free'd data structures).

Hi Linus,

I opened a bug about this issue in January, while I was still working
with Mandriva and got a similar issue reported. Basically it's a race on
vesafb removal with i915 with modesetting enabled. And indeed you have
to use slub_debug to reproduce it reliably, since the use after free
of struct fb_info does not always trigger it. I posted a testcase and a
proposed patch at https://bugzilla.kernel.org/show_bug.cgi?id=26232

I remember posting the patch here on LKML too, but didn't get any
answers to it.

Andy Whitcroft fixed it too with a similar patch,
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f78e161d6a13853a7e3e6e1dfa429e69
I CC'd Andy, the author of the patch; he will push his version, which
looks more complete as it takes care of mm_lock in do_mmap too.

My bug report also has another test case and a fix for an inverse
locking problem; it would be good to take a look at that too.

In any case, none of these problems are recent regressions. The race
on framebuffer removal has existed at least since unregister_framebuffer
started to be used to remove the old framebuffer when a framebuffer from
a modesetting driver is loaded.

>
> Linus

> drivers/video/fbmem.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
> index e2bf953..e8f8925 100644
> --- a/drivers/video/fbmem.c
> +++ b/drivers/video/fbmem.c
> @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct apertures_struct *a,
> "%s vs %s - removing generic driver\n",
> name, registered_fb[i]->fix.id);
> unregister_framebuffer(registered_fb[i]);
> + registered_fb[i] = NULL;
> }
> }
> }


--
[]'s
Herton

Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 02:21:15PM -0300, Herton Ronaldo Krzesinski wrote:
> On Thu, Feb 24, 2011 at 08:37:11AM -0800, Linus Torvalds wrote:
> > On Thu, Feb 24, 2011 at 5:20 AM, Anca Emanuel <[email protected]> wrote:
> > >>
> > >> Every boot?
> > >
> > > Yes.
> > >
> > >> And just out of interest, what happens if you don't have the vesafb
> > >> driver at all?
> > >
> > > I used 'e' option from grub, removed the 'set gfxpayload = $linux_gfx_mode'
> > > and it works.
> > >
> > > dmesg: http://pastebin.com/JAZsk4vD
> >
> > Hmm. So it definitely seems to be the hand-over.
> >
> > Does this patch make any difference? When we unregister the old
> > framebuffer, we still leave it in the registered_fb[] array, which
> > looks wrong. But it would also be interesting to hear if setting
> > CONFIG_SLUB_DEBUG_ON or CONFIG_DEBUG_PAGEALLOC makes any difference
> > (they'd help detect accesses to free'd data structures).
>
> Hi Linus,
>
> I opened a bug about this issue in January, while I was still working
> with Mandriva and got a similar issue reported. Basically it's a race on
> vesafb removal with i915 with modesetting enabled. And indeed you have
> to use slub_debug to always reproduce it, sometimes the use after free
> of struct fb_info not always trigers it. I posted a testcase and a
> proposed patch at https://bugzilla.kernel.org/show_bug.cgi?id=26232

Sorry, I have a correction to this.

What I wrote here is confusing: the problem should happen with any
"firmware" framebuffer that gets replaced by any modesetting
framebuffer, like intelfb or nouveaufb, not only i915 as could be
understood from what I stated. The test case I made and the problem
reported were with i915, but the same holds for nouveaufb as reported here.

The first oops here happens because struct fb_info of vesafb is freed
while plymouthd has the fb open. In fb_open, we assign info to
file->private_data. So if the application opens it, and before it is
closed some framebuffer from drm (intelfb, nouveaufb...) replaces
vesafb, remove_conflicting_framebuffers removes the vesafb. Inside
remove_conflicting_framebuffers we call unregister_framebuffer, which in
the end will call fb_info->fbops->fb_destroy (vesafb_destroy) ->
framebuffer_release(info) -> kfree(info)

Then if the application closes its file descriptor after the drm
framebuffer has loaded, it still has the reference to struct fb_info of
vesafb in file->private_data, so we get the oops as it tries to
dereference the already freed info.

But there are also races in this framebuffer removal while it is being
unregistered, in the accesses to the registered_fb[] array, so I made
the testcase and attached it to the bug report to show them.

>
> I remember to have posted here on LKML the patch too, but didn't got
> answers to it.
>
> Andy Whitcroft fixed it too with a similar patch,
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f78e161d6a13853a7e3e6e1dfa429e69
> I CC'd Andy, the author of the patch, he will push his version, looks
> more complete as it takes care of mm_lock in do_mmap too.
>
> My bug report has also another test case and fix for a inverse locking
> problem, it would be good to take a look too.
>
> In any case, any of these problems are not recent regressions. The race
> on framebuffer removal at least exists since unregister_framebuffer
> started to be used to remove it while loading framebuffer from modesetting
> drivers.
>
> >
> > Linus
>
> > drivers/video/fbmem.c | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
> > index e2bf953..e8f8925 100644
> > --- a/drivers/video/fbmem.c
> > +++ b/drivers/video/fbmem.c
> > @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct apertures_struct *a,
> > "%s vs %s - removing generic driver\n",
> > name, registered_fb[i]->fix.id);
> > unregister_framebuffer(registered_fb[i]);
> > + registered_fb[i] = NULL;
> > }
> > }
> > }
>
>
> --
> []'s
> Herton

--
[]'s
Herton

2011-02-25 00:48:11

by Anca Emanuel

[permalink] [raw]
Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 6:37 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Feb 24, 2011 at 5:20 AM, Anca Emanuel <[email protected]> wrote:
>>>
>>> Every boot?
>>
>> Yes.
>>
>>> And just out of interest, what happens if you don't have the vesafb
>>> driver at all?
>>
>> I used 'e' option from grub, removed the 'set gfxpayload = $linux_gfx_mode'
>> and it works.
>>
>> dmesg: http://pastebin.com/JAZsk4vD
>
> Hmm. So it definitely seems to be the hand-over.
>
> Does this patch make any difference? When we unregister the old
> framebuffer, we still leave it in the registered_fb[] array, which
> looks wrong. But it would also be interesting to hear if setting
> CONFIG_SLUB_DEBUG_ON or CONFIG_DEBUG_PAGEALLOC makes any difference
> (they'd help detect accesses to free'd data structures).
>
> Linus
>
drivers/video/fbmem.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index e2bf953..e8f8925 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
apertures_struct *a,
"%s vs %s - removing generic driver\n",
name, registered_fb[i]->fix.id);
unregister_framebuffer(registered_fb[i]);
+ registered_fb[i] = NULL;
}
}
}


Tested the patch, and now I get this:
dmesg: http://pastebin.com/ieMNrA7C

[ 12.252328] BUG: unable to handle kernel NULL pointer dereference
at 00000000000003b8
[ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
[ 12.252354] PGD 78e6c067 PUD 78e6d067 PMD 0
[ 12.252360] Oops: 0000 [#1] SMP
[ 12.252364] last sysfs file: /sys/module/snd/initstate
[ 12.252370] CPU 0
[ 12.252372] Modules linked in: nouveau(+) snd ttm drm_kms_helper
psmouse serio_raw drm soundcore lp snd_page_alloc i2c_algo_bit video
parport pata_marvell ahci r8169 libahci
[ 12.252393]
[ 12.252397] Pid: 244, comm: plymouthd Not tainted
2.6.38-rc6-git3-patch-linus+ #2 MICRO-STAR INTERNATIONAL CO.,LTD
MS-7360/MS-7360
[ 12.252407] RIP: 0010:[<ffffffff81311178>] [<ffffffff81311178>]
fb_mmap+0x58/0x1d0
[ 12.252414] RSP: 0018:ffff880078e8fd88 EFLAGS: 00010293
[ 12.252418] RAX: 00000000ffffffea RBX: ffff88007047d228 RCX: 0000000000000000
[ 12.252423] RDX: 000fffffffffffff RSI: ffff88007047d228 RDI: ffff880078f5d840
[ 12.252428] RBP: ffff880078e8fdc8 R08: 0000000000000000 R09: ffff88007047d228
[ 12.252432] R10: ffff88006f9d9cf0 R11: ffff88006f9d9d28 R12: ffff880037363800
[ 12.252437] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88007047d228
[ 12.252442] FS: 00007fb5fbaa4720(0000) GS:ffff88007fc00000(0000)
knlGS:0000000000000000
[ 12.252448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.252453] CR2: 00000000000003b8 CR3: 0000000078e6b000 CR4: 00000000000006f0
[ 12.252458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 12.252463] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 12.252468] Process plymouthd (pid: 244, threadinfo
ffff880078e8e000, task ffff88003737ad80)
[ 12.252473] Stack:
[ 12.252476] ffff880037363800 00000000000000b8 ffff880078e8fdd8
ffffffffffffffea
[ 12.252484] ffff880037363800 00000000000006bb 00000000006bb000
ffff88007047d228
[ 12.252491] ffff880078e8fe98 ffffffff81130543 ffff880078f5d840
0000000000000000
[ 12.252499] Call Trace:
[ 12.252507] [<ffffffff81130543>] mmap_region+0x3c3/0x500
[ 12.252514] [<ffffffff81010d7e>] ?
arch_get_unmapped_area_topdown+0x1ce/0x2f0
[ 12.252521] [<ffffffff811309c4>] do_mmap_pgoff+0x344/0x380
[ 12.252528] [<ffffffff810524f1>] ? finish_task_switch+0x41/0xe0
[ 12.252535] [<ffffffff815ac0c3>] ? schedule+0x403/0xa00
[ 12.252541] [<ffffffff81130bfe>] sys_mmap_pgoff+0x1fe/0x230
[ 12.252546] [<ffffffff810108c9>] sys_mmap+0x29/0x30
[ 12.252551] [<ffffffff8100bf02>] system_call_fastpath+0x16/0x1b
[ 12.252556] Code: ba ff ff ff ff ff ff 0f 00 48 89 f3 48 8b 40 30
8b 80 b8 00 00 00 25 ff ff 0f 00 49 39 d6 4c 8b 2c c5 c0 cf aa 81 b8
ea ff ff ff <4d> 8b bd b8 03 00 00 76 1f 48 8b 5d d8 4c 8b 65 e0 4c 8b
6d e8
[ 12.252603] RIP [<ffffffff81311178>] fb_mmap+0x58/0x1d0
[ 12.252608] RSP <ffff880078e8fd88>
[ 12.252611] CR2: 00000000000003b8
[ 12.252616] ---[ end trace 381165bafe65d748 ]---
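
A note on reading the fault address in the oops above: the marked
instruction bytes in the Code: line, <4d> 8b bd b8 03 00 00, decode to a
64-bit load from R13 + 0x3b8, and the register dump shows R13 as zero.
So the pointer itself was NULL and the fault landed at the offset of
whatever member was being read. A minimal userspace sketch of that
arithmetic follows. struct fake_fb_info and its member names are
hypothetical stand-ins chosen only to reproduce the offset; they are not
the layout of the kernel's struct fb_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in structure: only the member offset matters here. */
struct fake_fb_info {
    unsigned char earlier_members[0x3b8]; /* placeholder for earlier fields */
    const void *member_at_3b8;            /* member read by the faulting load */
};

/* Compile-time check that the sketch really places the member at 0x3b8. */
static_assert(offsetof(struct fake_fb_info, member_at_3b8) == 0x3b8,
              "sketch layout must put the member at offset 0x3b8");

/*
 * Address that a load of ->member_at_3b8 touches for a given base
 * pointer value. Nothing is dereferenced; this is only the arithmetic
 * the CPU performs before the access.
 */
static uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct fake_fb_info, member_at_3b8);
}
```

With a base of zero, member_address() yields 0x3b8, which matches the
CR2 value reported above.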

2011-02-25 00:55:40

by Linus Torvalds

Subject: Re: Linux 2.6.38-rc6

On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel <[email protected]> wrote:
>
> diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
> index e2bf953..e8f8925 100644
> --- a/drivers/video/fbmem.c
> +++ b/drivers/video/fbmem.c
> @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
> apertures_struct *a,
> "%s vs %s - removing generic driver\n",
> name, registered_fb[i]->fix.id);
> unregister_framebuffer(registered_fb[i]);
> + registered_fb[i] = NULL;
>
> Tested the patch, and now I get this:
> dmesg: http://pastebin.com/ieMNrA7C
>
> [ 12.252328] BUG: unable to handle kernel NULL pointer dereference
> at 00000000000003b8
> [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0

Ok, goodie.

Or not so goodie, but it does make it clear that yeah, the fb code
seems to be using stale pointers from that registered_fb[] array, and
the whole unregistration process is just racing with people using it.

Herton had that much bigger patch, can you test it?

Linus
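
For readers following along, here is a minimal userspace sketch of one
common way to close this kind of race: clear the slot under a lock so no
new lookup can find the entry, and have users take a reference under the
same lock so the object outlives the slot. This is not the patch
discussed in this thread. Every name below (registry, registry_get and
so on) is hypothetical, and a pthread mutex stands in for whatever
locking the kernel code would actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_ENTRIES 32

/* Hypothetical registered object; refcount is protected by registry_lock. */
struct entry {
    int refcount;
    int id;
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *registry[MAX_ENTRIES];

/* Publish an entry in an empty slot; the registry holds one reference. */
static int registry_add(size_t idx, struct entry *e)
{
    int ret = -1;

    if (idx >= MAX_ENTRIES)
        return -1;
    pthread_mutex_lock(&registry_lock);
    if (!registry[idx]) {
        e->refcount = 1;
        registry[idx] = e;
        ret = 0;
    }
    pthread_mutex_unlock(&registry_lock);
    return ret;
}

/* Look up a slot and take a reference, or return NULL if it is empty. */
static struct entry *registry_get(size_t idx)
{
    struct entry *e;

    if (idx >= MAX_ENTRIES)
        return NULL;
    pthread_mutex_lock(&registry_lock);
    e = registry[idx];
    if (e)
        e->refcount++;
    pthread_mutex_unlock(&registry_lock);
    return e;
}

/* Drop a reference; returns 1 if this was the last one and e may be freed. */
static int registry_put(struct entry *e)
{
    int last;

    pthread_mutex_lock(&registry_lock);
    assert(e->refcount > 0);
    last = (--e->refcount == 0);
    pthread_mutex_unlock(&registry_lock);
    return last;
}

/*
 * Unpublish: clear the slot so later lookups see NULL, then drop the
 * registry's own reference. Returns 1 if no user still holds the entry.
 */
static int registry_remove(size_t idx)
{
    struct entry *e;
    int last = 0;

    if (idx >= MAX_ENTRIES)
        return 0;
    pthread_mutex_lock(&registry_lock);
    e = registry[idx];
    registry[idx] = NULL;
    if (e)
        last = (--e->refcount == 0);
    pthread_mutex_unlock(&registry_lock);
    return last;
}
```

A user that called registry_get() before the removal keeps a valid
object until its registry_put(); a user that looks up after the removal
gets NULL and has to handle that instead of dereferencing it.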

2011-02-25 01:15:39

by David Airlie

Subject: Re: Linux 2.6.38-rc6

On Thu, 2011-02-24 at 16:54 -0800, Linus Torvalds wrote:
> On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel <[email protected]> wrote:
> >
> > diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
> > index e2bf953..e8f8925 100644
> > --- a/drivers/video/fbmem.c
> > +++ b/drivers/video/fbmem.c
> > @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
> > apertures_struct *a,
> > "%s vs %s - removing generic driver\n",
> > name, registered_fb[i]->fix.id);
> > unregister_framebuffer(registered_fb[i]);
> > + registered_fb[i] = NULL;
> >
> > Tested the patch, and now I get this:
> > dmesg: http://pastebin.com/ieMNrA7C
> >
> > [ 12.252328] BUG: unable to handle kernel NULL pointer dereference
> > at 00000000000003b8
> > [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
>
> Ok, goodie.
>
> Or not so goodie, but it does make it clear that yeah, the fb code
> seems to be using stale pointers from that registered_fb[] array, and
> the whole unregistration process is just racing with people using it.
>
> Herton had that much bigger patch, can you test it?

I think Andy's patch worked. I'm not sure why it fell between the cracks;
it either didn't appear on lkml or never reached my inbox.

If we can get Herton to repost it properly with a Tested-by, I'm happy for
it to go in.

Dave.

2011-02-25 01:48:03

by Anca Emanuel

Subject: Re: Linux 2.6.38-rc6

On Fri, Feb 25, 2011 at 3:14 AM, Dave Airlie <[email protected]> wrote:
> On Thu, 2011-02-24 at 16:54 -0800, Linus Torvalds wrote:
>> On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel <[email protected]> wrote:
>> >
>> > diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
>> > index e2bf953..e8f8925 100644
>> > --- a/drivers/video/fbmem.c
>> > +++ b/drivers/video/fbmem.c
>> > @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
>> > apertures_struct *a,
>> > "%s vs %s - removing generic driver\n",
>> > name, registered_fb[i]->fix.id);
>> > unregister_framebuffer(registered_fb[i]);
>> > + registered_fb[i] = NULL;
>> >
>> > Tested the patch, and now I get this:
>> > dmesg: http://pastebin.com/ieMNrA7C
>> >
>> > [ 12.252328] BUG: unable to handle kernel NULL pointer dereference
>> > at 00000000000003b8
>> > [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
>>
>> Ok, goodie.
>>
>> Or not so goodie, but it does make it clear that yeah, the fb code
>> seems to be using stale pointers from that registered_fb[] array, and
>> the whole unregistration process is just racing with people using it.
>>
>> Herton had that much bigger patch, can you test it?
>
> I think Andy's patch worked, not sure why it fell between the cracks,
> either didn't appear on lkml or in my inbox at all.
>
> if we can get Herton to repost it properly + a tested by I'm happy for
> it to go in.
>
> Dave.
>
>

Tested Andy's patch and it works!
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f78e161d6a13853a7e3e6e1dfa429e69

Tested-by: Anca Emanuel <[email protected]>

2011-02-25 01:56:23

by Anca Emanuel

Subject: Re: Linux 2.6.38-rc6

On Fri, Feb 25, 2011 at 3:47 AM, Anca Emanuel <[email protected]> wrote:
> On Fri, Feb 25, 2011 at 3:14 AM, Dave Airlie <[email protected]> wrote:
>> On Thu, 2011-02-24 at 16:54 -0800, Linus Torvalds wrote:
>>> On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel <[email protected]> wrote:
>>> >
>>> > diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
>>> > index e2bf953..e8f8925 100644
>>> > --- a/drivers/video/fbmem.c
>>> > +++ b/drivers/video/fbmem.c
>>> > @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
>>> > apertures_struct *a,
>>> > "%s vs %s - removing generic driver\n",
>>> > name, registered_fb[i]->fix.id);
>>> > unregister_framebuffer(registered_fb[i]);
>>> > + registered_fb[i] = NULL;
>>> >
>>> > Tested the patch, and now I get this:
>>> > dmesg: http://pastebin.com/ieMNrA7C
>>> >
>>> > [ 12.252328] BUG: unable to handle kernel NULL pointer dereference
>>> > at 00000000000003b8
>>> > [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
>>>
>>> Ok, goodie.
>>>
>>> Or not so goodie, but it does make it clear that yeah, the fb code
>>> seems to be using stale pointers from that registered_fb[] array, and
>>> the whole unregistration process is just racing with people using it.
>>>
>>> Herton had that much bigger patch, can you test it?
>>
>> I think Andy's patch worked, not sure why it fell between the cracks,
>> either didn't appear on lkml or in my inbox at all.
>>
>> if we can get Herton to repost it properly + a tested by I'm happy for
>> it to go in.
>>
>> Dave.
>>
>>
>
> Tested Andy's patch and it works !
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f78e161d6a13853a7e3e6e1dfa429e69
>
> Tested-by: Anca Emanuel <[email protected]>
>

link to patch: http://is.gd/otIfGc


Attachments:
patch (9.70 kB)

by Herton Ronaldo Krzesinski

Subject: Re: Linux 2.6.38-rc6

On Fri, Feb 25, 2011 at 03:56:20AM +0200, Anca Emanuel wrote:
> On Fri, Feb 25, 2011 at 3:47 AM, Anca Emanuel <[email protected]> wrote:
> > On Fri, Feb 25, 2011 at 3:14 AM, Dave Airlie <[email protected]> wrote:
> >> On Thu, 2011-02-24 at 16:54 -0800, Linus Torvalds wrote:
> >>> On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel <[email protected]> wrote:
> >>> >
> >>> > diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
> >>> > index e2bf953..e8f8925 100644
> >>> > --- a/drivers/video/fbmem.c
> >>> > +++ b/drivers/video/fbmem.c
> >>> > @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
> >>> > apertures_struct *a,
> >>> > "%s vs %s - removing generic driver\n",
> >>> > name, registered_fb[i]->fix.id);
> >>> > unregister_framebuffer(registered_fb[i]);
> >>> > + registered_fb[i] = NULL;
> >>> >
> >>> > Tested the patch, and now I get this:
> >>> > dmesg: http://pastebin.com/ieMNrA7C
> >>> >
> >>> > [ 12.252328] BUG: unable to handle kernel NULL pointer dereference
> >>> > at 00000000000003b8
> >>> > [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
> >>>
> >>> Ok, goodie.
> >>>
> >>> Or not so goodie, but it does make it clear that yeah, the fb code
> >>> seems to be using stale pointers from that registered_fb[] array, and
> >>> the whole unregistration process is just racing with people using it.
> >>>
> >>> Herton had that much bigger patch, can you test it?
> >>
> >> I think Andy's patch worked, not sure why it fell between the cracks,
> >> either didn't appear on lkml or in my inbox at all.
> >>
> >> if we can get Herton to repost it properly + a tested by I'm happy for
> >> it to go in.
> >>
> >> Dave.
> >>
> >>
> >
> > Tested Andy's patch and it works !
> > http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f78e161d6a13853a7e3e6e1dfa429e69
> >
> > Tested-by: Anca Emanuel <[email protected]>
> >
>
> link to patch: http://is.gd/otIfGc

Adding Andy on CC (btw he is away today, so it may take him some time to answer).

Andy, can you repost the patch?

--
[]'s
Herton

2011-03-22 08:37:22

by Paul Mundt

Subject: Re: Linux 2.6.38-rc6

On Fri, Feb 25, 2011 at 11:49:21AM -0300, Herton Ronaldo Krzesinski wrote:
> On Fri, Feb 25, 2011 at 03:56:20AM +0200, Anca Emanuel wrote:
> > On Fri, Feb 25, 2011 at 3:47 AM, Anca Emanuel <[email protected]> wrote:
> > > On Fri, Feb 25, 2011 at 3:14 AM, Dave Airlie <[email protected]> wrote:
> > >> On Thu, 2011-02-24 at 16:54 -0800, Linus Torvalds wrote:
> > >>> On Thu, Feb 24, 2011 at 4:48 PM, Anca Emanuel <[email protected]> wrote:
> > >>> >
> > >>> > diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
> > >>> > index e2bf953..e8f8925 100644
> > >>> > --- a/drivers/video/fbmem.c
> > >>> > +++ b/drivers/video/fbmem.c
> > >>> > @@ -1511,6 +1511,7 @@ void remove_conflicting_framebuffers(struct
> > >>> > apertures_struct *a,
> > >>> > "%s vs %s - removing generic driver\n",
> > >>> > name, registered_fb[i]->fix.id);
> > >>> > unregister_framebuffer(registered_fb[i]);
> > >>> > + registered_fb[i] = NULL;
> > >>> >
> > >>> > Tested the patch, and now I get this:
> > >>> > dmesg: http://pastebin.com/ieMNrA7C
> > >>> >
> > >>> > [ 12.252328] BUG: unable to handle kernel NULL pointer dereference
> > >>> > at 00000000000003b8
> > >>> > [ 12.252342] IP: [<ffffffff81311178>] fb_mmap+0x58/0x1d0
> > >>>
> > >>> Ok, goodie.
> > >>>
> > >>> Or not so goodie, but it does make it clear that yeah, the fb code
> > >>> seems to be using stale pointers from that registered_fb[] array, and
> > >>> the whole unregistration process is just racing with people using it.
> > >>>
> > >>> Herton had that much bigger patch, can you test it?
> > >>
> > >> I think Andy's patch worked, not sure why it fell between the cracks,
> > >> either didn't appear on lkml or in my inbox at all.
> > >>
> > >> if we can get Herton to repost it properly + a tested by I'm happy for
> > >> it to go in.
> > >>
> > >> Dave.
> > >>
> > >>
> > >
> > > Tested Andy's patch and it works !
> > > http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-natty.git;a=commit;h=c5a742b5f78e161d6a13853a7e3e6e1dfa429e69
> > >
> > > Tested-by: Anca Emanuel <[email protected]>
> > >
> >
> > link to patch: http://is.gd/otIfGc
>
> Adding Andy on CC (btw he is away for today, may get some time to answer).
>
> Andy, can you repost the patch?
>
This is the first I've seen the patch as well, but fortunately patchwork
caught it on the Cc.

There's also an outstanding patch for fixing an AB-BA deadlock between
the fb_info lock and the console lock which this will clash with. I'm
happy to rework that patch on top of Andy's patch for Anca and/or Herton
to test, though.

I'll need to do some more testing locally as well.
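
For anyone unfamiliar with the term: an AB-BA deadlock happens when one
path takes lock A then lock B while another path takes B then A, so each
can end up holding one lock and waiting forever for the other. The usual
cure is to make every path take the two locks in the same order. Below
is a minimal userspace sketch of that rule, assuming pthread mutexes as
stand-ins. The names lock_a, lock_b and the two path functions are
hypothetical, and which kernel lock should come first is not something
this sketch tries to decide.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks involved. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int updates; /* state that needs both locks held */

/* The one place that defines the order: always A first, then B. */
static void lock_both(void)
{
    int err = pthread_mutex_lock(&lock_a);

    assert(err == 0);
    err = pthread_mutex_lock(&lock_b);
    assert(err == 0);
    (void)err; /* keep the build quiet when assertions are compiled out */
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

/* First path that needs both locks. */
static int path_one(void)
{
    int seen;

    lock_both();
    seen = ++updates;
    unlock_both();
    return seen;
}

/*
 * Second path. In the buggy pattern this one would take B before A;
 * here it goes through the same helper, so the order cannot differ.
 */
static int path_two(void)
{
    int seen;

    lock_both();
    seen = ++updates;
    unlock_both();
    return seen;
}
```

Because both paths go through lock_both(), no thread can hold lock_b
while waiting for lock_a, which is the half of the cycle an AB-BA
deadlock needs.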