2013-06-18 16:17:41

by Greg KH

[permalink] [raw]
Subject: [ 00/48] 3.9.7-stable review

From: Greg Kroah-Hartman <[email protected]>

This is the start of the stable review cycle for the 3.9.7 release.
There are 48 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.9.7-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 3.9.7-rc1

Nicolas Schichan <[email protected]>
ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().

Nithin Sujir <[email protected]>
tg3: Wait for boot code to finish after power on

Johan Hovold <[email protected]>
USB: spcp8x5: fix device initialisation at open

Johan Hovold <[email protected]>
USB: f81232: fix device initialisation at open

Johan Hovold <[email protected]>
USB: pl2303: fix device initialisation at open

Alexander Shishkin <[email protected]>
usb: chipidea: fix id change handling

Benjamin Herrenschmidt <[email protected]>
powerpc: Fix missing/delayed calls to irq_work

Paul Mackerras <[email protected]>
powerpc: Fix emulation of illegal instructions on PowerNV platform

Michael Ellerman <[email protected]>
powerpc: Fix stack overflow crash in resume_kernel when ftracing

Matthew Garrett <[email protected]>
Modify UEFI anti-bricking code

Sage Weil <[email protected]>
libceph: wrap auth methods in a mutex

Sage Weil <[email protected]>
libceph: wrap auth ops in wrapper functions

Sage Weil <[email protected]>
libceph: add update_authorizer auth method

Sage Weil <[email protected]>
libceph: fix authorizer invalidation

Sage Weil <[email protected]>
libceph: clear messenger auth_retry flag when we authenticate

Ben Skeggs <[email protected]>
drm/nv50/kms: use dac loadval from vbios, where it's available

Ben Skeggs <[email protected]>
drm/nv50/disp: force dac power state during load detect

Kees Cook <[email protected]>
x86: Fix typo in kexec register clearing

Yinghai Lu <[email protected]>
x86: Fix adjust_range_size_mask calling position

Naoya Horiguchi <[email protected]>
mm: migration: add migrate_entry_wait_huge()

Tomasz Stanislawski <[email protected]>
mm/page_alloc.c: fix watermark check in __zone_watermark_ok()

NeilBrown <[email protected]>
md/raid1,raid10: use freeze_array in place of raise_barrier in various places.

H. Peter Anvin <[email protected]>
md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place

Alex Lyakas <[email protected]>
md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.

Rafael Aquini <[email protected]>
swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion

Daniel Vetter <[email protected]>
drm/i915: prefer VBT modes for SVDO-LVDS over EDID

Luciano Coelho <[email protected]>
wl12xx: fix minimum required firmware version for wl127x multirole

Andrey Vagin <[email protected]>
memcg: don't initialize kmem-cache destroying work for root caches

Stephen M. Cameron <[email protected]>
cciss: fix broken mutex usage in ioctl

Kees Cook <[email protected]>
kmsg: honor dmesg_restrict sysctl on /dev/kmsg

Robin Holt <[email protected]>
reboot: rigrate shutdown/reboot to boot cpu

Srivatsa S. Bhat <[email protected]>
CPU hotplug: provide a generic helper to disable/enable CPU hotplug

Sujith Manoharan <[email protected]>
ath9k: Use minstrel rate control by default

Felix Fietkau <[email protected]>
Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"

Sujith Manoharan <[email protected]>
ath9k: Disable PowerSave by default

Ben Hutchings <[email protected]>
s390/pci: Implement IRQ functions if !PCI

Johan Hedberg <[email protected]>
Bluetooth: Fix mgmt handling of power on failures

Johan Hedberg <[email protected]>
Bluetooth: Fix missing length checks for L2CAP signalling PDUs

Patrik Jakobsson <[email protected]>
drm/gma500/cdv: Unpin framebuffer on crtc disable

Patrik Jakobsson <[email protected]>
drm/gma500/psb: Unpin framebuffer on crtc disable

Tony Lindgren <[email protected]>
drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree

Alex Elder <[email protected]>
rbd: don't destroy ceph_opts in rbd_add()

Jim Schutt <[email protected]>
ceph: ceph_pagelist_append might sleep while atomic

Jim Schutt <[email protected]>
ceph: add cpu_to_le32() calls when encoding a reconnect capability

Alex Elder <[email protected]>
libceph: must hold mutex for reset_changed_osds()

Rafael J. Wysocki <[email protected]>
ACPI / video: Do not bind to device objects with a scan handler

Kees Cook <[email protected]>
b43: stop format string leaking into error msgs

Oleg Nesterov <[email protected]>
audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE


-------------

Diffstat:

Makefile | 4 +-
arch/arm/mach-kirkwood/mpp.c | 5 +-
arch/powerpc/include/asm/exception-64s.h | 2 +-
arch/powerpc/kernel/exceptions-64s.S | 2 +-
arch/powerpc/kernel/irq.c | 2 +-
arch/powerpc/kernel/process.c | 4 +-
arch/powerpc/kernel/traps.c | 10 ++
arch/s390/kernel/irq.c | 64 +++++++
arch/s390/pci/pci.c | 33 ----
arch/x86/boot/compressed/eboot.c | 47 ------
arch/x86/include/asm/efi.h | 7 -
arch/x86/include/uapi/asm/bootparam.h | 1 -
arch/x86/kernel/relocate_kernel_64.S | 2 +-
arch/x86/mm/init.c | 6 +-
arch/x86/platform/efi/efi.c | 188 +++++++--------------
drivers/acpi/scan.c | 5 +-
drivers/acpi/video.c | 3 +
drivers/block/cciss.c | 32 ++--
drivers/block/rbd.c | 10 +-
drivers/gpu/drm/gma500/cdv_intel_display.c | 14 ++
drivers/gpu/drm/gma500/psb_intel_display.c | 14 ++
drivers/gpu/drm/i915/intel_sdvo.c | 10 +-
drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c | 4 +
drivers/gpu/drm/nouveau/core/include/core/class.h | 2 +-
drivers/gpu/drm/nouveau/nv50_display.c | 4 +-
drivers/md/raid1.c | 38 +++--
drivers/md/raid10.c | 29 ++--
drivers/md/raid5.c | 4 +-
drivers/net/ethernet/broadcom/tg3.c | 10 ++
drivers/net/wireless/ath/ath9k/Kconfig | 10 +-
drivers/net/wireless/ath/ath9k/Makefile | 2 +-
.../net/wireless/ath/ath9k/ar9003_2p2_initvals.h | 10 +-
drivers/net/wireless/ath/ath9k/init.c | 7 +-
drivers/net/wireless/ath/ath9k/rc.h | 2 +-
drivers/net/wireless/b43/main.c | 2 +-
drivers/net/wireless/ti/wl12xx/wl12xx.h | 2 +-
drivers/rtc/rtc-twl.c | 1 +
drivers/usb/chipidea/core.c | 3 +-
drivers/usb/serial/f81232.c | 8 +-
drivers/usb/serial/pl2303.c | 10 +-
drivers/usb/serial/spcp8x5.c | 10 +-
fs/ceph/locks.c | 73 +++++---
fs/ceph/mds_client.c | 90 +++++-----
fs/ceph/super.h | 9 +-
fs/proc/kmsg.c | 10 +-
include/linux/ceph/auth.h | 18 ++
include/linux/cpu.h | 4 +
include/linux/swapops.h | 3 +
include/linux/syslog.h | 4 +-
include/net/bluetooth/hci_core.h | 1 +
include/net/bluetooth/mgmt.h | 1 +
kernel/audit.c | 2 +-
kernel/cpu.c | 55 +++---
kernel/printk.c | 91 +++++-----
kernel/sys.c | 29 +++-
mm/hugetlb.c | 2 +-
mm/memcontrol.c | 2 -
mm/migrate.c | 23 ++-
mm/page_alloc.c | 6 +-
mm/swap_state.c | 18 +-
net/bluetooth/hci_core.c | 6 +-
net/bluetooth/l2cap_core.c | 70 ++++++--
net/bluetooth/mgmt.c | 21 +++
net/ceph/auth.c | 117 +++++++++++--
net/ceph/auth_x.c | 24 ++-
net/ceph/auth_x.h | 1 +
net/ceph/messenger.c | 3 +-
net/ceph/mon_client.c | 7 +-
net/ceph/osd_client.c | 29 ++--
69 files changed, 803 insertions(+), 539 deletions(-)


2013-06-18 16:17:52

by Greg KH

[permalink] [raw]
Subject: [ 17/48] CPU hotplug: provide a generic helper to disable/enable CPU hotplug

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Srivatsa S. Bhat" <[email protected]>

commit 16e53dbf10a2d7e228709a7286310e629ede5e45 upstream.

There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation. Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.

Restructure the code and make it generic enough to be useful for other
usecases too.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Robin Holt <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russell King <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Shawn Guo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/cpu.h | 4 +++
kernel/cpu.c | 55 +++++++++++++++++++++-------------------------------
2 files changed, 27 insertions(+), 32 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;

extern void get_online_cpus(void);
extern void put_online_cpus(void);
+extern void cpu_hotplug_disable(void);
+extern void cpu_hotplug_enable(void);
#define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri)
#define register_hotcpu_notifier(nb) register_cpu_notifier(nb)
#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
@@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_un

#define get_online_cpus() do { } while (0)
#define put_online_cpus() do { } while (0)
+#define cpu_hotplug_disable() do { } while (0)
+#define cpu_hotplug_enable() do { } while (0)
#define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
/* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -133,6 +133,27 @@ static void cpu_hotplug_done(void)
mutex_unlock(&cpu_hotplug.lock);
}

+/*
+ * Wait for currently running CPU hotplug operations to complete (if any) and
+ * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
+ * the 'cpu_hotplug_disabled' flag. The same lock is also acquired by the
+ * hotplug path before performing hotplug operations. So acquiring that lock
+ * guarantees mutual exclusion from any currently running hotplug operations.
+ */
+void cpu_hotplug_disable(void)
+{
+ cpu_maps_update_begin();
+ cpu_hotplug_disabled = 1;
+ cpu_maps_update_done();
+}
+
+void cpu_hotplug_enable(void)
+{
+ cpu_maps_update_begin();
+ cpu_hotplug_disabled = 0;
+ cpu_maps_update_done();
+}
+
#else /* #if CONFIG_HOTPLUG_CPU */
static void cpu_hotplug_begin(void) {}
static void cpu_hotplug_done(void) {}
@@ -541,36 +562,6 @@ static int __init alloc_frozen_cpus(void
core_initcall(alloc_frozen_cpus);

/*
- * Prevent regular CPU hotplug from racing with the freezer, by disabling CPU
- * hotplug when tasks are about to be frozen. Also, don't allow the freezer
- * to continue until any currently running CPU hotplug operation gets
- * completed.
- * To modify the 'cpu_hotplug_disabled' flag, we need to acquire the
- * 'cpu_add_remove_lock'. And this same lock is also taken by the regular
- * CPU hotplug path and released only after it is complete. Thus, we
- * (and hence the freezer) will block here until any currently running CPU
- * hotplug operation gets completed.
- */
-void cpu_hotplug_disable_before_freeze(void)
-{
- cpu_maps_update_begin();
- cpu_hotplug_disabled = 1;
- cpu_maps_update_done();
-}
-
-
-/*
- * When tasks have been thawed, re-enable regular CPU hotplug (which had been
- * disabled while beginning to freeze tasks).
- */
-void cpu_hotplug_enable_after_thaw(void)
-{
- cpu_maps_update_begin();
- cpu_hotplug_disabled = 0;
- cpu_maps_update_done();
-}
-
-/*
* When callbacks for CPU hotplug notifications are being executed, we must
* ensure that the state of the system with respect to the tasks being frozen
* or not, as reported by the notification, remains unchanged *throughout the
@@ -589,12 +580,12 @@ cpu_hotplug_pm_callback(struct notifier_

case PM_SUSPEND_PREPARE:
case PM_HIBERNATION_PREPARE:
- cpu_hotplug_disable_before_freeze();
+ cpu_hotplug_disable();
break;

case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
- cpu_hotplug_enable_after_thaw();
+ cpu_hotplug_enable();
break;

default:

2013-06-18 16:17:58

by Greg KH

[permalink] [raw]
Subject: [ 24/48] swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Rafael Aquini <[email protected]>

commit cbab0e4eec299e9059199ebe6daf48730be46d2b upstream.

read_swap_cache_async() can race against get_swap_page(), and stumble
across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
into the swapcache yet.

This transient swap_map state is expected to be transitory, but the
actual placement of discard at scan_swap_map() inserts a wait for I/O
completion thus making the thread at read_swap_cache_async() to loop
around its -EEXIST case, while the other end at get_swap_page() is
scheduled away at scan_swap_map(). This can leave the system deadlocked
if the I/O completion happens to be waiting on the CPU waitqueue where
read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.

This patch introduces a cond_resched() call to make the aforementioned
read_swap_cache_async() busy loop condition to bail out when necessary,
thus avoiding the subtle race window.

Signed-off-by: Rafael Aquini <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Shaohua Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/swap_state.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -336,8 +336,24 @@ struct page *read_swap_cache_async(swp_e
* Swap entry may have been freed since our caller observed it.
*/
err = swapcache_prepare(entry);
- if (err == -EEXIST) { /* seems racy */
+ if (err == -EEXIST) {
radix_tree_preload_end();
+ /*
+ * We might race against get_swap_page() and stumble
+ * across a SWAP_HAS_CACHE swap_map entry whose page
+ * has not been brought into the swapcache yet, while
+ * the other end is scheduled away waiting on discard
+ * I/O completion at scan_swap_map().
+ *
+ * In order to avoid turning this transitory state
+ * into a permanent loop around this -EEXIST case
+ * if !CONFIG_PREEMPT and the I/O completion happens
+ * to be waiting on the CPU waitqueue where we are now
+ * busy looping, we just conditionally invoke the
+ * scheduler here, if there are some more important
+ * tasks to run.
+ */
+ cond_resched();
continue;
}
if (err) { /* swp entry is obsolete ? */

2013-06-18 16:18:23

by Greg KH

[permalink] [raw]
Subject: [ 44/48] USB: pl2303: fix device initialisation at open

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 2d8f4447b58bba5f8cb895c07690434c02307eaf upstream.

Do not use uninitialised termios data to determine when to configure the
device at open.

This also prevents stack data from leaking to userspace in the OOM error
path.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/serial/pl2303.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -283,7 +283,7 @@ static void pl2303_set_termios(struct tt
serial settings even to the same values as before. Thus
we actually need to filter in this specific case */

- if (!tty_termios_hw_change(&tty->termios, old_termios))
+ if (old_termios && !tty_termios_hw_change(&tty->termios, old_termios))
return;

cflag = tty->termios.c_cflag;
@@ -292,7 +292,8 @@ static void pl2303_set_termios(struct tt
if (!buf) {
dev_err(&port->dev, "%s - out of memory.\n", __func__);
/* Report back no change occurred */
- tty->termios = *old_termios;
+ if (old_termios)
+ tty->termios = *old_termios;
return;
}

@@ -432,7 +433,7 @@ static void pl2303_set_termios(struct tt
control = priv->line_control;
if ((cflag & CBAUD) == B0)
priv->line_control &= ~(CONTROL_DTR | CONTROL_RTS);
- else if ((old_termios->c_cflag & CBAUD) == B0)
+ else if (old_termios && (old_termios->c_cflag & CBAUD) == B0)
priv->line_control |= (CONTROL_DTR | CONTROL_RTS);
if (control != priv->line_control) {
control = priv->line_control;
@@ -491,7 +492,6 @@ static void pl2303_close(struct usb_seri

static int pl2303_open(struct tty_struct *tty, struct usb_serial_port *port)
{
- struct ktermios tmp_termios;
struct usb_serial *serial = port->serial;
struct pl2303_serial_private *spriv = usb_get_serial_data(serial);
int result;
@@ -507,7 +507,7 @@ static int pl2303_open(struct tty_struct

/* Setup termios */
if (tty)
- pl2303_set_termios(tty, port, &tmp_termios);
+ pl2303_set_termios(tty, port, NULL);

result = usb_submit_urb(port->interrupt_in_urb, GFP_KERNEL);
if (result) {

2013-06-18 16:18:32

by Greg KH

[permalink] [raw]
Subject: [ 45/48] USB: f81232: fix device initialisation at open

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 21886725d58e92188159731c7c1aac803dd6b9dc upstream.

Do not use uninitialised termios data to determine when to configure the
device at open.

This also prevents stack data from leaking to userspace.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/serial/f81232.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -165,11 +165,12 @@ static void f81232_set_termios(struct tt
/* FIXME - Stubbed out for now */

/* Don't change anything if nothing has changed */
- if (!tty_termios_hw_change(&tty->termios, old_termios))
+ if (old_termios && !tty_termios_hw_change(&tty->termios, old_termios))
return;

/* Do the real work here... */
- tty_termios_copy_hw(&tty->termios, old_termios);
+ if (old_termios)
+ tty_termios_copy_hw(&tty->termios, old_termios);
}

static int f81232_tiocmget(struct tty_struct *tty)
@@ -187,12 +188,11 @@ static int f81232_tiocmset(struct tty_st

static int f81232_open(struct tty_struct *tty, struct usb_serial_port *port)
{
- struct ktermios tmp_termios;
int result;

/* Setup termios */
if (tty)
- f81232_set_termios(tty, port, &tmp_termios);
+ f81232_set_termios(tty, port, NULL);

result = usb_submit_urb(port->interrupt_in_urb, GFP_KERNEL);
if (result) {

2013-06-18 16:18:30

by Greg KH

[permalink] [raw]
Subject: [ 46/48] USB: spcp8x5: fix device initialisation at open

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 5e4211f1c47560c36a8b3d4544dfd866dcf7ccd0 upstream.

Do not use uninitialised termios data to determine when to configure the
device at open.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/serial/spcp8x5.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

--- a/drivers/usb/serial/spcp8x5.c
+++ b/drivers/usb/serial/spcp8x5.c
@@ -314,7 +314,6 @@ static void spcp8x5_set_termios(struct t
struct spcp8x5_private *priv = usb_get_serial_port_data(port);
unsigned long flags;
unsigned int cflag = tty->termios.c_cflag;
- unsigned int old_cflag = old_termios->c_cflag;
unsigned short uartdata;
unsigned char buf[2] = {0, 0};
int baud;
@@ -323,15 +322,15 @@ static void spcp8x5_set_termios(struct t


/* check that they really want us to change something */
- if (!tty_termios_hw_change(&tty->termios, old_termios))
+ if (old_termios && !tty_termios_hw_change(&tty->termios, old_termios))
return;

/* set DTR/RTS active */
spin_lock_irqsave(&priv->lock, flags);
control = priv->line_control;
- if ((old_cflag & CBAUD) == B0) {
+ if (old_termios && (old_termios->c_cflag & CBAUD) == B0) {
priv->line_control |= MCR_DTR;
- if (!(old_cflag & CRTSCTS))
+ if (!(old_termios->c_cflag & CRTSCTS))
priv->line_control |= MCR_RTS;
}
if (control != priv->line_control) {
@@ -421,7 +420,6 @@ static void spcp8x5_set_termios(struct t
* status of the device. */
static int spcp8x5_open(struct tty_struct *tty, struct usb_serial_port *port)
{
- struct ktermios tmp_termios;
struct usb_serial *serial = port->serial;
struct spcp8x5_private *priv = usb_get_serial_port_data(port);
int ret;
@@ -442,7 +440,7 @@ static int spcp8x5_open(struct tty_struc

/* Setup termios */
if (tty)
- spcp8x5_set_termios(tty, port, &tmp_termios);
+ spcp8x5_set_termios(tty, port, NULL);

spcp8x5_get_msr(serial->dev, &status, priv->type);


2013-06-18 16:19:45

by Greg KH

[permalink] [raw]
Subject: [ 40/48] powerpc: Fix stack overflow crash in resume_kernel when ftracing

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Ellerman <[email protected]>

commit 0e37739b1c96d65e6433998454985de994383019 upstream.

It's possible for us to crash when running with ftrace enabled, eg:

Bad kernel stack pointer bffffd12 at c00000000000a454
cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
pc: c00000000000a454: resume_kernel+0x34/0x60
lr: c00000000000335c: performance_monitor_common+0x15c/0x180
sp: bffffd12
msr: 8000000000001032
dar: bffffd12
dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

3:mon> t c0000002ecab0000
[c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
[c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
--- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
[c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
[c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
[c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
[c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
[c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
[c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
[c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
--- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
[c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
[c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
[c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
[c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
[c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
[c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
--- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
"powerpc: Rework runlatch code" by myself --BenH
]

Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/exception-64s.h | 2 +-
arch/powerpc/kernel/process.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -513,7 +513,7 @@ label##_common: \
*/
#define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr) \
EXCEPTION_COMMON(trap, label, hdlr, ret_from_except_lite, \
- FINISH_NAP;RUNLATCH_ON;DISABLE_INTS)
+ FINISH_NAP;DISABLE_INTS;RUNLATCH_ON)

/*
* When the idle code in power4_idle puts the CPU into NAP mode,
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1371,7 +1371,7 @@ EXPORT_SYMBOL(dump_stack);

#ifdef CONFIG_PPC64
/* Called with hard IRQs off */
-void __ppc64_runlatch_on(void)
+void notrace __ppc64_runlatch_on(void)
{
struct thread_info *ti = current_thread_info();
unsigned long ctrl;
@@ -1384,7 +1384,7 @@ void __ppc64_runlatch_on(void)
}

/* Called with hard IRQs off */
-void __ppc64_runlatch_off(void)
+void notrace __ppc64_runlatch_off(void)
{
struct thread_info *ti = current_thread_info();
unsigned long ctrl;

2013-06-18 16:19:46

by Greg KH

[permalink] [raw]
Subject: [ 48/48] ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicolas Schichan <[email protected]>

commit 4089fe95bfed295c8ad38251d5fe02b6b0ba684c upstream.

MPP_F6281_MASK would be previously be returned when on mv88f6282,
which would disallow some valid MPP configurations.

Commit 830f8b91 (arm: plat-orion: fix printing of "MPP config
unavailable on this hardware") made this problem visible as an invalid
MPP configuration is now correctly detected and not applied.

Signed-off-by: Nicolas Schichan <[email protected]>
Signed-off-by: Jason Cooper <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/mach-kirkwood/mpp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/arm/mach-kirkwood/mpp.c
+++ b/arch/arm/mach-kirkwood/mpp.c
@@ -22,9 +22,10 @@ static unsigned int __init kirkwood_vari

kirkwood_pcie_id(&dev, &rev);

- if ((dev == MV88F6281_DEV_ID && rev >= MV88F6281_REV_A0) ||
- (dev == MV88F6282_DEV_ID))
+ if (dev == MV88F6281_DEV_ID && rev >= MV88F6281_REV_A0)
return MPP_F6281_MASK;
+ if (dev == MV88F6282_DEV_ID)
+ return MPP_F6282_MASK;
if (dev == MV88F6192_DEV_ID && rev >= MV88F6192_REV_A0)
return MPP_F6192_MASK;
if (dev == MV88F6180_DEV_ID)

2013-06-18 16:20:27

by Greg KH

[permalink] [raw]
Subject: [ 47/48] tg3: Wait for boot code to finish after power on

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nithin Sujir <[email protected]>

commit df465abfe06f7dc4f33f4a96d17f096e9e8ac917 upstream.

Some systems that don't need wake-on-lan may choose to power down the
chip on system standby. Upon resume, the power on causes the boot code
to startup and initialize the hardware. On one new platform, this is
causing the device to go into a bad state due to a race between the
driver and boot code, once every several hundred resumes. The same race
exists on open since we come up from a power on.

This patch adds a wait for boot code signature at the beginning of
tg3_init_hw() which is common to both cases. If there has not been a
power-off or the boot code has already completed, the signature will be
present and poll_fw() returns immediately. Also return immediately if
the device does not have firmware.

Signed-off-by: Nithin Nayak Sujir <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/ethernet/broadcom/tg3.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -1799,6 +1799,9 @@ static int tg3_poll_fw(struct tg3 *tp)
int i;
u32 val;

+ if (tg3_flag(tp, NO_FWARE_REPORTED))
+ return 0;
+
if (tg3_flag(tp, IS_SSB_CORE)) {
/* We don't use firmware. */
return 0;
@@ -10016,6 +10019,13 @@ static int tg3_reset_hw(struct tg3 *tp,
*/
static int tg3_init_hw(struct tg3 *tp, int reset_phy)
{
+ /* Chip may have been just powered on. If so, the boot code may still
+ * be running initialization. Wait for it to finish to avoid races in
+ * accessing the hardware.
+ */
+ tg3_enable_register_access(tp);
+ tg3_poll_fw(tp);
+
tg3_switch_clocks(tp);

tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);

2013-06-18 16:18:19

by Greg KH

[permalink] [raw]
Subject: [ 42/48] powerpc: Fix missing/delayed calls to irq_work

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <[email protected]>

commit 230b3034793247f61e6a0b08c44cf415f6d92981 upstream.

When replaying interrupts (as a result of the interrupt occurring
while soft-disabled), in the case of the decrementer, we are exclusively
testing for a pending timer target. However we also use decrementer
interrupts to trigger the new "irq_work", which in this case would
be missed.

This change the logic to force a replay in both cases of a timer
boundary reached and a decrementer interrupt having actually occurred
while disabled. The former test is still useful to catch cases where
a CPU having been hard-disabled for a long time completely misses the
interrupt due to a decrementer rollover.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Tested-by: Steven Rostedt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/irq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -162,7 +162,7 @@ notrace unsigned int __check_irq_replay(
* in case we also had a rollover while hard disabled
*/
local_paca->irq_happened &= ~PACA_IRQ_DEC;
- if (decrementer_check_overflow())
+ if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
return 0x900;

/* Finally check if an external interrupt happened */

2013-06-18 16:18:17

by Greg KH

[permalink] [raw]
Subject: [ 35/48] libceph: fix authorizer invalidation

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sage Weil <[email protected]>

commit 4b8e8b5d78b8322351d44487c1b76f7e9d3412bc upstream.

We were invalidating the authorizer by removing the ticket handler
entirely. This was effective in inducing us to request a new authorizer,
but in the meantime it mean that any authorizer we generated would get a
new and initialized handler with secret_id=0, which would always be
rejected by the server side with a confusing error message:

auth: could not find secret_id=0
cephx: verify_authorizer could not get service secret for service osd secret_id=0

Instead, simply clear the validity field. This will still induce the auth
code to request a new secret, but will let us continue to use the old
ticket in the meantime. The messenger code will probably continue to fail,
but the exponential backoff will kick in, and eventually the we will get a
new (hopefully more valid) ticket from the mon and be able to continue.

Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ceph/auth_x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -630,7 +630,7 @@ static void ceph_x_invalidate_authorizer

th = get_ticket_handler(ac, peer_type);
if (!IS_ERR(th))
- remove_ticket_handler(ac, th);
+ memset(&th->validity, 0, sizeof(th->validity));
}



2013-06-18 16:18:16

by Greg KH

[permalink] [raw]
Subject: [ 41/48] powerpc: Fix emulation of illegal instructions on PowerNV platform

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paul Mackerras <[email protected]>

commit bf593907f7236e95698a76b7c7a2bbf8b1165327 upstream.

Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr). The emulation of unimplemented instructions is currently not
working on the PowerNV platform. The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt. This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt(). With this, old programs that use no-longer
implemented instructions such as dcba now work again.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/kernel/exceptions-64s.S | 2 +-
arch/powerpc/kernel/traps.c | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -707,7 +707,7 @@ machine_check_common:
STD_EXCEPTION_COMMON(0xb00, trap_0b, .unknown_exception)
STD_EXCEPTION_COMMON(0xd00, single_step, .single_step_exception)
STD_EXCEPTION_COMMON(0xe00, trap_0e, .unknown_exception)
- STD_EXCEPTION_COMMON(0xe40, emulation_assist, .program_check_exception)
+ STD_EXCEPTION_COMMON(0xe40, emulation_assist, .emulation_assist_interrupt)
STD_EXCEPTION_COMMON(0xe60, hmi_exception, .unknown_exception)
#ifdef CONFIG_PPC_DOORBELL
STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, .doorbell_exception)
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1142,6 +1142,16 @@ void __kprobes program_check_exception(s
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
}

+/*
+ * This occurs when running in hypervisor mode on POWER6 or later
+ * and an illegal instruction is encountered.
+ */
+void __kprobes emulation_assist_interrupt(struct pt_regs *regs)
+{
+ regs->msr |= REASON_ILLEGAL;
+ program_check_exception(regs);
+}
+
void alignment_exception(struct pt_regs *regs)
{
int sig, code, fixed = 0;

2013-06-18 16:18:14

by Greg KH

[permalink] [raw]
Subject: [ 36/48] libceph: add update_authorizer auth method

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sage Weil <[email protected]>

commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream.

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one. In the meantime, when
we rotate our service keys, the authorizer doesn't get updated. Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret. If it
is not, we will build a new one that does. This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ceph/mds_client.c | 7 ++++++-
include/linux/ceph/auth.h | 3 +++
net/ceph/auth_x.c | 23 +++++++++++++++++++++++
net/ceph/auth_x.h | 1 +
net/ceph/osd_client.c | 5 +++++
5 files changed, 38 insertions(+), 1 deletion(-)

--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3444,7 +3444,12 @@ static struct ceph_auth_handshake *get_a
}
if (!auth->authorizer && ac->ops && ac->ops->create_authorizer) {
int ret = ac->ops->create_authorizer(ac, CEPH_ENTITY_TYPE_MDS,
- auth);
+ auth);
+ if (ret)
+ return ERR_PTR(ret);
+ } else if (ac->ops && ac->ops_update_authorizer) {
+ int ret = ac->ops->update_authorizer(ac, CEPH_ENTITY_TYPE_MDS,
+ auth);
if (ret)
return ERR_PTR(ret);
}
--- a/include/linux/ceph/auth.h
+++ b/include/linux/ceph/auth.h
@@ -52,6 +52,9 @@ struct ceph_auth_client_ops {
*/
int (*create_authorizer)(struct ceph_auth_client *ac, int peer_type,
struct ceph_auth_handshake *auth);
+ /* ensure that an existing authorizer is up to date */
+ int (*update_authorizer)(struct ceph_auth_client *ac, int peer_type,
+ struct ceph_auth_handshake *auth);
int (*verify_authorizer_reply)(struct ceph_auth_client *ac,
struct ceph_authorizer *a, size_t len);
void (*destroy_authorizer)(struct ceph_auth_client *ac,
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -298,6 +298,7 @@ static int ceph_x_build_authorizer(struc
return -ENOMEM;
}
au->service = th->service;
+ au->secret_id = th->secret_id;

msg_a = au->buf->vec.iov_base;
msg_a->struct_v = 1;
@@ -555,6 +556,27 @@ static int ceph_x_create_authorizer(
return 0;
}

+static int ceph_x_update_authorizer(
+ struct ceph_auth_client *ac, int peer_type,
+ struct ceph_auth_handshake *auth)
+{
+ struct ceph_x_authorizer *au;
+ struct ceph_x_ticket_handler *th;
+ int ret;
+
+ th = get_ticket_handler(ac, peer_type);
+ if (IS_ERR(th))
+ return PTR_ERR(th);
+
+ au = (struct ceph_x_authorizer *)auth->authorizer;
+ if (au->secret_id < th->secret_id) {
+ dout("ceph_x_update_authorizer service %u secret %llu < %llu\n",
+ au->service, au->secret_id, th->secret_id);
+ return ceph_x_build_authorizer(ac, th, au);
+ }
+ return 0;
+}
+
static int ceph_x_verify_authorizer_reply(struct ceph_auth_client *ac,
struct ceph_authorizer *a, size_t len)
{
@@ -641,6 +663,7 @@ static const struct ceph_auth_client_ops
.build_request = ceph_x_build_request,
.handle_reply = ceph_x_handle_reply,
.create_authorizer = ceph_x_create_authorizer,
+ .update_authorizer = ceph_x_update_authorizer,
.verify_authorizer_reply = ceph_x_verify_authorizer_reply,
.destroy_authorizer = ceph_x_destroy_authorizer,
.invalidate_authorizer = ceph_x_invalidate_authorizer,
--- a/net/ceph/auth_x.h
+++ b/net/ceph/auth_x.h
@@ -29,6 +29,7 @@ struct ceph_x_authorizer {
struct ceph_buffer *buf;
unsigned int service;
u64 nonce;
+ u64 secret_id;
char reply_buf[128]; /* big enough for encrypted blob */
};

--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2177,6 +2177,11 @@ static struct ceph_auth_handshake *get_a
auth);
if (ret)
return ERR_PTR(ret);
+ } else if (ac->ops && ac->ops->update_authorizer) {
+ int ret = ac->ops->update_authorizer(ac, CEPH_ENTITY_TYPE_OSD,
+ auth);
+ if (ret)
+ return ERR_PTR(ret);
}
*proto = ac->protocol;


2013-06-18 16:18:12

by Greg KH

[permalink] [raw]
Subject: [ 28/48] mm/page_alloc.c: fix watermark check in __zone_watermark_ok()

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tomasz Stanislawski <[email protected]>

commit 026b08147923142e925a7d0aaa39038055ae0156 upstream.

The watermark check consists of two sub-checks. The first one is:

if (free_pages <= min + lowmem_reserve)
return false;

The check assures that there is minimal amount of RAM in the zone. If
CMA is used then the free_pages is reduced by the number of free pages
in CMA prior to the over-mentioned check.

if (!(alloc_flags & ALLOC_CMA))
free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

This prevents the zone from being drained from pages available for
non-movable allocations.

The second check prevents the zone from getting too fragmented.

for (o = 0; o < order; o++) {
free_pages -= z->free_area[o].nr_free << o;
min >>= 1;
if (free_pages <= min)
return false;
}

The field z->free_area[o].nr_free is equal to the number of free pages
including free CMA pages. Therefore the CMA pages are subtracted twice.
This may cause a false positive fail of __zone_watermark_ok() if the CMA
area gets strongly fragmented. In such a case there are many 0-order
free pages located in CMA. Those pages are subtracted twice therefore
they will quickly drain free_pages during the check against
fragmentation. The test fails even though there are many free non-cma
pages in the zone.

This patch fixes this issue by subtracting CMA pages only for a purpose of
(free_pages <= min + lowmem_reserve) check.

Laura said:

We were observing allocation failures of higher order pages (order 5 =
128K typically) under tight memory conditions resulting in driver
failure. The output from the page allocation failure showed plenty of
free pages of the appropriate order/type/zone and mostly CMA pages in
the lower orders.

For full disclosure, we still observed some page allocation failures
even after applying the patch but the number was drastically reduced and
those failures were attributed to fragmentation/other system issues.

Signed-off-by: Tomasz Stanislawski <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Tested-by: Laura Abbott <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1626,6 +1626,7 @@ static bool __zone_watermark_ok(struct z
long min = mark;
long lowmem_reserve = z->lowmem_reserve[classzone_idx];
int o;
+ long free_cma = 0;

free_pages -= (1 << order) - 1;
if (alloc_flags & ALLOC_HIGH)
@@ -1635,9 +1636,10 @@ static bool __zone_watermark_ok(struct z
#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
if (!(alloc_flags & ALLOC_CMA))
- free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
+ free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
- if (free_pages <= min + lowmem_reserve)
+
+ if (free_pages - free_cma <= min + lowmem_reserve)
return false;
for (o = 0; o < order; o++) {
/* At the next order, this order's pages become unavailable */

2013-06-18 16:18:10

by Greg KH

[permalink] [raw]
Subject: [ 37/48] libceph: wrap auth ops in wrapper functions

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sage Weil <[email protected]>

commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream.

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks. Simplifies the external
interface.

Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ceph/mds_client.c | 26 +++++++++++--------------
include/linux/ceph/auth.h | 13 ++++++++++++
net/ceph/auth.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
net/ceph/auth_x.c | 1
net/ceph/mon_client.c | 7 ++----
net/ceph/osd_client.c | 26 ++++++++-----------------
6 files changed, 84 insertions(+), 36 deletions(-)

--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -364,9 +364,9 @@ void ceph_put_mds_session(struct ceph_md
atomic_read(&s->s_ref), atomic_read(&s->s_ref)-1);
if (atomic_dec_and_test(&s->s_ref)) {
if (s->s_auth.authorizer)
- s->s_mdsc->fsc->client->monc.auth->ops->destroy_authorizer(
- s->s_mdsc->fsc->client->monc.auth,
- s->s_auth.authorizer);
+ ceph_auth_destroy_authorizer(
+ s->s_mdsc->fsc->client->monc.auth,
+ s->s_auth.authorizer);
kfree(s);
}
}
@@ -3438,18 +3438,17 @@ static struct ceph_auth_handshake *get_a
struct ceph_auth_handshake *auth = &s->s_auth;

if (force_new && auth->authorizer) {
- if (ac->ops && ac->ops->destroy_authorizer)
- ac->ops->destroy_authorizer(ac, auth->authorizer);
+ ceph_auth_destroy_authorizer(ac, auth->authorizer);
auth->authorizer = NULL;
}
- if (!auth->authorizer && ac->ops && ac->ops->create_authorizer) {
- int ret = ac->ops->create_authorizer(ac, CEPH_ENTITY_TYPE_MDS,
- auth);
+ if (!auth->authorizer) {
+ int ret = ceph_auth_create_authorizer(ac, CEPH_ENTITY_TYPE_MDS,
+ auth);
if (ret)
return ERR_PTR(ret);
- } else if (ac->ops && ac->ops_update_authorizer) {
- int ret = ac->ops->update_authorizer(ac, CEPH_ENTITY_TYPE_MDS,
- auth);
+ } else {
+ int ret = ceph_auth_update_authorizer(ac, CEPH_ENTITY_TYPE_MDS,
+ auth);
if (ret)
return ERR_PTR(ret);
}
@@ -3465,7 +3464,7 @@ static int verify_authorizer_reply(struc
struct ceph_mds_client *mdsc = s->s_mdsc;
struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;

- return ac->ops->verify_authorizer_reply(ac, s->s_auth.authorizer, len);
+ return ceph_auth_verify_authorizer_reply(ac, s->s_auth.authorizer, len);
}

static int invalidate_authorizer(struct ceph_connection *con)
@@ -3474,8 +3473,7 @@ static int invalidate_authorizer(struct
struct ceph_mds_client *mdsc = s->s_mdsc;
struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;

- if (ac->ops->invalidate_authorizer)
- ac->ops->invalidate_authorizer(ac, CEPH_ENTITY_TYPE_MDS);
+ ceph_auth_invalidate_authorizer(ac, CEPH_ENTITY_TYPE_MDS);

return ceph_monc_validate_auth(&mdsc->fsc->client->monc);
}
--- a/include/linux/ceph/auth.h
+++ b/include/linux/ceph/auth.h
@@ -97,5 +97,18 @@ extern int ceph_build_auth(struct ceph_a
void *msg_buf, size_t msg_len);

extern int ceph_auth_is_authenticated(struct ceph_auth_client *ac);
+extern int ceph_auth_create_authorizer(struct ceph_auth_client *ac,
+ int peer_type,
+ struct ceph_auth_handshake *auth);
+extern void ceph_auth_destroy_authorizer(struct ceph_auth_client *ac,
+ struct ceph_authorizer *a);
+extern int ceph_auth_update_authorizer(struct ceph_auth_client *ac,
+ int peer_type,
+ struct ceph_auth_handshake *a);
+extern int ceph_auth_verify_authorizer_reply(struct ceph_auth_client *ac,
+ struct ceph_authorizer *a,
+ size_t len);
+extern void ceph_auth_invalidate_authorizer(struct ceph_auth_client *ac,
+ int peer_type);

#endif
--- a/net/ceph/auth.c
+++ b/net/ceph/auth.c
@@ -257,3 +257,50 @@ int ceph_auth_is_authenticated(struct ce
return 0;
return ac->ops->is_authenticated(ac);
}
+EXPORT_SYMBOL(ceph_auth_is_authenticated);
+
+int ceph_auth_create_authorizer(struct ceph_auth_client *ac,
+ int peer_type,
+ struct ceph_auth_handshake *auth)
+{
+ if (ac->ops && ac->ops->create_authorizer)
+ return ac->ops->create_authorizer(ac, peer_type, auth);
+ return 0;
+}
+EXPORT_SYMBOL(ceph_auth_create_authorizer);
+
+void ceph_auth_destroy_authorizer(struct ceph_auth_client *ac,
+ struct ceph_authorizer *a)
+{
+ if (ac->ops && ac->ops->destroy_authorizer)
+ ac->ops->destroy_authorizer(ac, a);
+}
+EXPORT_SYMBOL(ceph_auth_destroy_authorizer);
+
+int ceph_auth_update_authorizer(struct ceph_auth_client *ac,
+ int peer_type,
+ struct ceph_auth_handshake *a)
+{
+ int ret = 0;
+
+ if (ac->ops && ac->ops->update_authorizer)
+ ret = ac->ops->update_authorizer(ac, peer_type, a);
+ return ret;
+}
+EXPORT_SYMBOL(ceph_auth_update_authorizer);
+
+int ceph_auth_verify_authorizer_reply(struct ceph_auth_client *ac,
+ struct ceph_authorizer *a, size_t len)
+{
+ if (ac->ops && ac->ops->verify_authorizer_reply)
+ return ac->ops->verify_authorizer_reply(ac, a, len);
+ return 0;
+}
+EXPORT_SYMBOL(ceph_auth_verify_authorizer_reply);
+
+void ceph_auth_invalidate_authorizer(struct ceph_auth_client *ac, int peer_type)
+{
+ if (ac->ops && ac->ops->invalidate_authorizer)
+ ac->ops->invalidate_authorizer(ac, peer_type);
+}
+EXPORT_SYMBOL(ceph_auth_invalidate_authorizer);
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -562,7 +562,6 @@ static int ceph_x_update_authorizer(
{
struct ceph_x_authorizer *au;
struct ceph_x_ticket_handler *th;
- int ret;

th = get_ticket_handler(ac, peer_type);
if (IS_ERR(th))
--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -737,7 +737,7 @@ static void delayed_work(struct work_str

__validate_auth(monc);

- if (monc->auth->ops->is_authenticated(monc->auth))
+ if (ceph_auth_is_authenticated(monc->auth))
__send_subscribe(monc);
}
__schedule_delayed(monc);
@@ -892,8 +892,7 @@ static void handle_auth_reply(struct cep

mutex_lock(&monc->mutex);
had_debugfs_info = have_debugfs_info(monc);
- if (monc->auth->ops)
- was_auth = monc->auth->ops->is_authenticated(monc->auth);
+ was_auth = ceph_auth_is_authenticated(monc->auth);
monc->pending_auth = 0;
ret = ceph_handle_auth_reply(monc->auth, msg->front.iov_base,
msg->front.iov_len,
@@ -904,7 +903,7 @@ static void handle_auth_reply(struct cep
wake_up_all(&monc->client->auth_wq);
} else if (ret > 0) {
__send_prepared_auth_request(monc, ret);
- } else if (!was_auth && monc->auth->ops->is_authenticated(monc->auth)) {
+ } else if (!was_auth && ceph_auth_is_authenticated(monc->auth)) {
dout("authenticated, starting session\n");

monc->client->msgr.inst.name.type = CEPH_ENTITY_TYPE_CLIENT;
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -654,8 +654,7 @@ static void put_osd(struct ceph_osd *osd
if (atomic_dec_and_test(&osd->o_ref) && osd->o_auth.authorizer) {
struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;

- if (ac->ops && ac->ops->destroy_authorizer)
- ac->ops->destroy_authorizer(ac, osd->o_auth.authorizer);
+ ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer);
kfree(osd);
}
}
@@ -2168,17 +2167,16 @@ static struct ceph_auth_handshake *get_a
struct ceph_auth_handshake *auth = &o->o_auth;

if (force_new && auth->authorizer) {
- if (ac->ops && ac->ops->destroy_authorizer)
- ac->ops->destroy_authorizer(ac, auth->authorizer);
+ ceph_auth_destroy_authorizer(ac, auth->authorizer);
auth->authorizer = NULL;
}
- if (!auth->authorizer && ac->ops && ac->ops->create_authorizer) {
- int ret = ac->ops->create_authorizer(ac, CEPH_ENTITY_TYPE_OSD,
- auth);
+ if (!auth->authorizer) {
+ int ret = ceph_auth_create_authorizer(ac, CEPH_ENTITY_TYPE_OSD,
+ auth);
if (ret)
return ERR_PTR(ret);
- } else if (ac->ops && ac->ops->update_authorizer) {
- int ret = ac->ops->update_authorizer(ac, CEPH_ENTITY_TYPE_OSD,
+ } else {
+ int ret = ceph_auth_update_authorizer(ac, CEPH_ENTITY_TYPE_OSD,
auth);
if (ret)
return ERR_PTR(ret);
@@ -2195,11 +2193,7 @@ static int verify_authorizer_reply(struc
struct ceph_osd_client *osdc = o->o_osdc;
struct ceph_auth_client *ac = osdc->client->monc.auth;

- /*
- * XXX If ac->ops or ac->ops->verify_authorizer_reply is null,
- * XXX which do we do: succeed or fail?
- */
- return ac->ops->verify_authorizer_reply(ac, o->o_auth.authorizer, len);
+ return ceph_auth_verify_authorizer_reply(ac, o->o_auth.authorizer, len);
}

static int invalidate_authorizer(struct ceph_connection *con)
@@ -2208,9 +2202,7 @@ static int invalidate_authorizer(struct
struct ceph_osd_client *osdc = o->o_osdc;
struct ceph_auth_client *ac = osdc->client->monc.auth;

- if (ac->ops && ac->ops->invalidate_authorizer)
- ac->ops->invalidate_authorizer(ac, CEPH_ENTITY_TYPE_OSD);
-
+ ceph_auth_invalidate_authorizer(ac, CEPH_ENTITY_TYPE_OSD);
return ceph_monc_validate_auth(&osdc->client->monc);
}


2013-06-18 16:22:09

by Greg KH

[permalink] [raw]
Subject: [ 43/48] usb: chipidea: fix id change handling

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Shishkin <[email protected]>

commit 0c3f3dc68bb6e6950e8cd7851e7778c550e8dfb4 upstream.

Re-enable chipidea irq even if there's no role changing to do. This is
a problem since b183c19f ("USB: chipidea: re-order irq handling to avoid
unhandled irqs"); when it manifests, chipidea irq gets disabled for good.

Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/chipidea/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/usb/chipidea/core.c
+++ b/drivers/usb/chipidea/core.c
@@ -279,8 +279,9 @@ static void ci_role_work(struct work_str

ci_role_stop(ci);
ci_role_start(ci, role);
- enable_irq(ci->irq);
}
+
+ enable_irq(ci->irq);
}

static ssize_t show_role(struct device *dev, struct device_attribute *attr,

2013-06-18 16:22:37

by Greg KH

[permalink] [raw]
Subject: [ 30/48] x86: Fix adjust_range_size_mask calling position

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yinghai Lu <[email protected]>

commit 7de3d66b1387ddf5a37d9689e5eb8510fb75c765 upstream.

Commit

8d57470d x86, mm: setup page table in top-down

causes a kernel panic while setting mem=2G.

[mem 0x00000000-0x000fffff] page 4k
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
[mem 0x00100000-0x001fffff] page 4k
[mem 0x00200000-0x7bffffff] page 2M

for last entry is not what we want, we should have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 1G

Actually we merge the continuous ranges with same page size too early.
in this case, before merging we have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
after merging them, will get
[mem 0x00200000-0x7bffffff] page 2M
even we can use 1G page to map
[mem 0x40000000-0x7bffffff]

that will cause problem, because we already map
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already.
During phys_pud_init() for [0x40000000-0x7bffffff], it will not
reuse existing that pud page, and allocate new one then try to use
2M page to map it instead, as page_size_mask does not include
PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop
in phys_pmd_init stop mapping at 0x7bffffff.

That is right behavoir, it maps exact range with exact page size that
we ask, and we should explicitly call it to map [7c000000-0x7fffffff]
before or after mapping 0x40000000-0x7bffffff.
Anyway we need to make sure ranges' page_size_mask correct and consistent
after split_mem_range for each range.

Fix that by calling adjust_range_size_mask before merging range
with same page size.

-v2: update change log.
-v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and
it causes panic.

Bisected-by: "Xie, ChanglongX" <[email protected]>
Bisected-by: Yuanhan Liu <[email protected]>
Reported-and-tested-by: Yuanhan Liu <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/init.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -277,6 +277,9 @@ static int __meminit split_mem_range(str
end_pfn = limit_pfn;
nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);

+ if (!after_bootmem)
+ adjust_range_page_size_mask(mr, nr_range);
+
/* try to merge same page size and continuous */
for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {
unsigned long old_start;
@@ -291,9 +294,6 @@ static int __meminit split_mem_range(str
nr_range--;
}

- if (!after_bootmem)
- adjust_range_page_size_mask(mr, nr_range);
-
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
mr[i].start, mr[i].end - 1,

2013-06-18 16:22:36

by Greg KH

[permalink] [raw]
Subject: [ 31/48] x86: Fix typo in kexec register clearing

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

commit c8a22d19dd238ede87aa0ac4f7dbea8da039b9c1 upstream.

Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.

Signed-off-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: PaX Team <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/relocate_kernel_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -160,7 +160,7 @@ identity_mapped:
xorq %rbp, %rbp
xorq %r8, %r8
xorq %r9, %r9
- xorq %r10, %r9
+ xorq %r10, %r10
xorq %r11, %r11
xorq %r12, %r12
xorq %r13, %r13

2013-06-18 16:22:35

by Greg KH

[permalink] [raw]
Subject: [ 32/48] drm/nv50/disp: force dac power state during load detect

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Skeggs <[email protected]>

commit ea9197cc323839ef3d5280c0453b2c622caa6bc7 upstream.

fdo#64904

Reported-by: Gerhard Bräunlich <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

---
drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c | 4 ++++
drivers/gpu/drm/nouveau/core/include/core/class.h | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c
+++ b/drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c
@@ -50,11 +50,15 @@ nv50_dac_sense(struct nv50_disp_priv *pr
{
const u32 doff = (or * 0x800);
int load = -EINVAL;
+ nv_mask(priv, 0x61a004 + doff, 0x807f0000, 0x80150000);
+ nv_wait(priv, 0x61a004 + doff, 0x80000000, 0x00000000);
nv_wr32(priv, 0x61a00c + doff, 0x00100000 | loadval);
udelay(9500);
nv_wr32(priv, 0x61a00c + doff, 0x80000000);
load = (nv_rd32(priv, 0x61a00c + doff) & 0x38000000) >> 27;
nv_wr32(priv, 0x61a00c + doff, 0x00000000);
+ nv_mask(priv, 0x61a004 + doff, 0x807f0000, 0x80550000);
+ nv_wait(priv, 0x61a004 + doff, 0x80000000, 0x00000000);
return load;
}

--- a/drivers/gpu/drm/nouveau/core/include/core/class.h
+++ b/drivers/gpu/drm/nouveau/core/include/core/class.h
@@ -216,7 +216,7 @@ struct nv04_display_class {
#define NV50_DISP_DAC_PWR_STATE 0x00000040
#define NV50_DISP_DAC_PWR_STATE_ON 0x00000000
#define NV50_DISP_DAC_PWR_STATE_OFF 0x00000040
-#define NV50_DISP_DAC_LOAD 0x0002000c
+#define NV50_DISP_DAC_LOAD 0x00020100
#define NV50_DISP_DAC_LOAD_VALUE 0x00000007

#define NV50_DISP_PIOR_MTHD 0x00030000

2013-06-18 16:22:33

by Greg KH

[permalink] [raw]
Subject: [ 34/48] libceph: clear messenger auth_retry flag when we authenticate

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sage Weil <[email protected]>

commit 20e55c4cc758e4dccdfd92ae8e9588dd624b2cd7 upstream.

We maintain a counter of failed auth attempts to allow us to retry once
before failing. However, if the second attempt succeeds, the flag isn't
cleared, which makes us think auth failed again later when the connection
resets for other reasons (like a socket error).

This is one part of the sorry sequence of events in bug

http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ceph/messenger.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1597,7 +1597,6 @@ static int process_connect(struct ceph_c
con->error_msg = "connect authorization failure";
return -1;
}
- con->auth_retry = 1;
con_out_kvec_reset(con);
ret = prepare_write_connect(con);
if (ret < 0)
@@ -1682,7 +1681,7 @@ static int process_connect(struct ceph_c

WARN_ON(con->state != CON_STATE_NEGOTIATING);
con->state = CON_STATE_OPEN;
-
+ con->auth_retry = 0; /* we authenticated; clear flag */
con->peer_global_seq = le32_to_cpu(con->in_reply.global_seq);
con->connect_seq++;
con->peer_features = server_feat;

2013-06-18 16:22:32

by Greg KH

[permalink] [raw]
Subject: [ 33/48] drm/nv50/kms: use dac loadval from vbios, where its available

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Skeggs <[email protected]>

commit d40ee48acde16894fb3b241d7e896d5fa84e0f10 upstream.

Regression from merging the old nv50/nvd9 code together, and may be
needed to fully fix fdo#64904.

The value is ignored completely by the hardware starting from nva3.

Reported-by: Emil Velikov <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

---
drivers/gpu/drm/nouveau/nv50_display.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1554,7 +1554,9 @@ nv50_dac_detect(struct drm_encoder *enco
{
struct nv50_disp *disp = nv50_disp(encoder->dev);
int ret, or = nouveau_encoder(encoder)->or;
- u32 load = 0;
+ u32 load = nouveau_drm(encoder->dev)->vbios.dactestval;
+ if (load == 0)
+ load = 340;

ret = nv_exec(disp->core, NV50_DISP_DAC_LOAD + or, &load, sizeof(load));
if (ret || load != 7)

2013-06-18 16:24:37

by Greg KH

[permalink] [raw]
Subject: [ 29/48] mm: migration: add migrate_entry_wait_huge()

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Naoya Horiguchi <[email protected]>

commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.

When we have a page fault for the address which is backed by a hugepage
under migration, the kernel can't wait correctly and do busy looping on
hugepage fault until the migration finishes. As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delay or soft lockup.

This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage. This patch introduces
migration_entry_wait_huge() to solve this.

Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/swapops.h | 3 +++
mm/hugetlb.c | 2 +-
mm/migrate.c | 23 ++++++++++++++++++-----
3 files changed, 22 insertions(+), 6 deletions(-)

--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -137,6 +137,7 @@ static inline void make_migration_entry_

extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address);
+extern void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte);
#else

#define make_migration_entry(page, write) swp_entry(0, 0)
@@ -148,6 +149,8 @@ static inline int is_migration_entry(swp
static inline void make_migration_entry_read(swp_entry_t *entryp) { }
static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
unsigned long address) { }
+static inline void migration_entry_wait_huge(struct mm_struct *mm,
+ pte_t *pte) { }
static inline int is_write_migration_entry(swp_entry_t entry)
{
return 0;
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm,
if (ptep) {
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
- migration_entry_wait(mm, (pmd_t *)ptep, address);
+ migration_entry_wait_huge(mm, ptep);
return 0;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -200,15 +200,14 @@ static void remove_migration_ptes(struct
* get to the page and wait until migration is finished.
* When we return from this function the fault will be retried.
*/
-void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
- unsigned long address)
+static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+ spinlock_t *ptl)
{
- pte_t *ptep, pte;
- spinlock_t *ptl;
+ pte_t pte;
swp_entry_t entry;
struct page *page;

- ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+ spin_lock(ptl);
pte = *ptep;
if (!is_swap_pte(pte))
goto out;
@@ -236,6 +235,20 @@ out:
pte_unmap_unlock(ptep, ptl);
}

+void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long address)
+{
+ spinlock_t *ptl = pte_lockptr(mm, pmd);
+ pte_t *ptep = pte_offset_map(pmd, address);
+ __migration_entry_wait(mm, ptep, ptl);
+}
+
+void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte)
+{
+ spinlock_t *ptl = &(mm)->page_table_lock;
+ __migration_entry_wait(mm, pte, ptl);
+}
+
#ifdef CONFIG_BLOCK
/* Returns true if all buffers are successfully locked */
static bool buffer_migrate_lock_buffers(struct buffer_head *head,

2013-06-18 16:24:36

by Greg KH

[permalink] [raw]
Subject: [ 39/48] Modify UEFI anti-bricking code

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Matthew Garrett <[email protected]>

commit f8b8404337de4e2466e2e1139ea68b1f8295974f upstream.

This patch reworks the UEFI anti-bricking code, including an effective
reversion of cc5a080c and 31ff2f20. It turns out that calling
QueryVariableInfo() from boot services results in some firmware
implementations jumping to physical addresses even after entering virtual
mode, so until we have 1:1 mappings for UEFI runtime space this isn't
going to work so well.

Reverting these gets us back to the situation where we'd refuse to create
variables on some systems because they classify deleted variables as "used"
until the firmware triggers a garbage collection run, which they won't do
until they reach a lower threshold. This results in it being impossible to
install a bootloader, which is unhelpful.

Feedback from Samsung indicates that the firmware doesn't need more than
5KB of storage space for its own purposes, so that seems like a reasonable
threshold. However, there's still no guarantee that a platform will attempt
garbage collection merely because it drops below this threshold. It seems
that this is often only triggered if an attempt to write generates a
genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to
create a variable larger than the remaining space. This should fail, but if
it somehow succeeds we can then immediately delete it.

I've tested this on the UEFI machines I have available, but I don't have
a Samsung and so can't verify that it avoids the bricking problem.

Signed-off-by: Matthew Garrett <[email protected]>
Signed-off-by: Lee, Chun-Y <[email protected]> [ dummy variable cleanup ]
Signed-off-by: Matt Fleming <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/boot/compressed/eboot.c | 47 --------
arch/x86/include/asm/efi.h | 7 -
arch/x86/include/uapi/asm/bootparam.h | 1
arch/x86/platform/efi/efi.c | 190 +++++++++++-----------------------
4 files changed, 66 insertions(+), 179 deletions(-)

--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -251,51 +251,6 @@ static void find_bits(unsigned long mask
*size = len;
}

-static efi_status_t setup_efi_vars(struct boot_params *params)
-{
- struct setup_data *data;
- struct efi_var_bootdata *efidata;
- u64 store_size, remaining_size, var_size;
- efi_status_t status;
-
- if (sys_table->runtime->hdr.revision < EFI_2_00_SYSTEM_TABLE_REVISION)
- return EFI_UNSUPPORTED;
-
- data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
- while (data && data->next)
- data = (struct setup_data *)(unsigned long)data->next;
-
- status = efi_call_phys4((void *)sys_table->runtime->query_variable_info,
- EFI_VARIABLE_NON_VOLATILE |
- EFI_VARIABLE_BOOTSERVICE_ACCESS |
- EFI_VARIABLE_RUNTIME_ACCESS, &store_size,
- &remaining_size, &var_size);
-
- if (status != EFI_SUCCESS)
- return status;
-
- status = efi_call_phys3(sys_table->boottime->allocate_pool,
- EFI_LOADER_DATA, sizeof(*efidata), &efidata);
-
- if (status != EFI_SUCCESS)
- return status;
-
- efidata->data.type = SETUP_EFI_VARS;
- efidata->data.len = sizeof(struct efi_var_bootdata) -
- sizeof(struct setup_data);
- efidata->data.next = 0;
- efidata->store_size = store_size;
- efidata->remaining_size = remaining_size;
- efidata->max_var_size = var_size;
-
- if (data)
- data->next = (unsigned long)efidata;
- else
- params->hdr.setup_data = (unsigned long)efidata;
-
-}
-
static efi_status_t setup_efi_pci(struct boot_params *params)
{
efi_pci_io_protocol *pci;
@@ -1202,8 +1157,6 @@ struct boot_params *efi_main(void *handl

setup_graphics(boot_params);

- setup_efi_vars(boot_params);
-
setup_efi_pci(boot_params);

status = efi_call_phys3(sys_table->boottime->allocate_pool,
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -102,13 +102,6 @@ extern void efi_call_phys_epilog(void);
extern void efi_unmap_memmap(void);
extern void efi_memory_uc(u64 addr, unsigned long size);

-struct efi_var_bootdata {
- struct setup_data data;
- u64 store_size;
- u64 remaining_size;
- u64 max_var_size;
-};
-
#ifdef CONFIG_EFI

static inline bool efi_is_native(void)
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -6,7 +6,6 @@
#define SETUP_E820_EXT 1
#define SETUP_DTB 2
#define SETUP_PCI 3
-#define SETUP_EFI_VARS 4

/* ram_size flags */
#define RAMDISK_IMAGE_START_MASK 0x07FF
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -41,7 +41,6 @@
#include <linux/io.h>
#include <linux/reboot.h>
#include <linux/bcd.h>
-#include <linux/ucs2_string.h>

#include <asm/setup.h>
#include <asm/efi.h>
@@ -52,12 +51,12 @@

#define EFI_DEBUG 1

-/*
- * There's some additional metadata associated with each
- * variable. Intel's reference implementation is 60 bytes - bump that
- * to account for potential alignment constraints
- */
-#define VAR_METADATA_SIZE 64
+#define EFI_MIN_RESERVE 5120
+
+#define EFI_DUMMY_GUID \
+ EFI_GUID(0x4424ac57, 0xbe4b, 0x47dd, 0x9e, 0x97, 0xed, 0x50, 0xf0, 0x9f, 0x92, 0xa9)
+
+static efi_char16_t efi_dummy_name[6] = { 'D', 'U', 'M', 'M', 'Y', 0 };

struct efi __read_mostly efi = {
.mps = EFI_INVALID_TABLE_ADDR,
@@ -77,13 +76,6 @@ struct efi_memory_map memmap;
static struct efi efi_phys __initdata;
static efi_system_table_t efi_systab __initdata;

-static u64 efi_var_store_size;
-static u64 efi_var_remaining_size;
-static u64 efi_var_max_var_size;
-static u64 boot_used_size;
-static u64 boot_var_size;
-static u64 active_size;
-
unsigned long x86_efi_facility;

/*
@@ -186,53 +178,8 @@ static efi_status_t virt_efi_get_next_va
efi_char16_t *name,
efi_guid_t *vendor)
{
- efi_status_t status;
- static bool finished = false;
- static u64 var_size;
-
- status = efi_call_virt3(get_next_variable,
- name_size, name, vendor);
-
- if (status == EFI_NOT_FOUND) {
- finished = true;
- if (var_size < boot_used_size) {
- boot_var_size = boot_used_size - var_size;
- active_size += boot_var_size;
- } else {
- printk(KERN_WARNING FW_BUG "efi: Inconsistent initial sizes\n");
- }
- }
-
- if (boot_used_size && !finished) {
- unsigned long size;
- u32 attr;
- efi_status_t s;
- void *tmp;
-
- s = virt_efi_get_variable(name, vendor, &attr, &size, NULL);
-
- if (s != EFI_BUFFER_TOO_SMALL || !size)
- return status;
-
- tmp = kmalloc(size, GFP_ATOMIC);
-
- if (!tmp)
- return status;
-
- s = virt_efi_get_variable(name, vendor, &attr, &size, tmp);
-
- if (s == EFI_SUCCESS && (attr & EFI_VARIABLE_NON_VOLATILE)) {
- var_size += size;
- var_size += ucs2_strsize(name, 1024);
- active_size += size;
- active_size += VAR_METADATA_SIZE;
- active_size += ucs2_strsize(name, 1024);
- }
-
- kfree(tmp);
- }
-
- return status;
+ return efi_call_virt3(get_next_variable,
+ name_size, name, vendor);
}

static efi_status_t virt_efi_set_variable(efi_char16_t *name,
@@ -241,34 +188,9 @@ static efi_status_t virt_efi_set_variabl
unsigned long data_size,
void *data)
{
- efi_status_t status;
- u32 orig_attr = 0;
- unsigned long orig_size = 0;
-
- status = virt_efi_get_variable(name, vendor, &orig_attr, &orig_size,
- NULL);
-
- if (status != EFI_BUFFER_TOO_SMALL)
- orig_size = 0;
-
- status = efi_call_virt5(set_variable,
- name, vendor, attr,
- data_size, data);
-
- if (status == EFI_SUCCESS) {
- if (orig_size) {
- active_size -= orig_size;
- active_size -= ucs2_strsize(name, 1024);
- active_size -= VAR_METADATA_SIZE;
- }
- if (data_size) {
- active_size += data_size;
- active_size += ucs2_strsize(name, 1024);
- active_size += VAR_METADATA_SIZE;
- }
- }
-
- return status;
+ return efi_call_virt5(set_variable,
+ name, vendor, attr,
+ data_size, data);
}

static efi_status_t virt_efi_query_variable_info(u32 attr,
@@ -776,9 +698,6 @@ void __init efi_init(void)
char vendor[100] = "unknown";
int i = 0;
void *tmp;
- struct setup_data *data;
- struct efi_var_bootdata *efi_var_data;
- u64 pa_data;

#ifdef CONFIG_X86_32
if (boot_params.efi_info.efi_systab_hi ||
@@ -796,22 +715,6 @@ void __init efi_init(void)
if (efi_systab_init(efi_phys.systab))
return;

- pa_data = boot_params.hdr.setup_data;
- while (pa_data) {
- data = early_ioremap(pa_data, sizeof(*efi_var_data));
- if (data->type == SETUP_EFI_VARS) {
- efi_var_data = (struct efi_var_bootdata *)data;
-
- efi_var_store_size = efi_var_data->store_size;
- efi_var_remaining_size = efi_var_data->remaining_size;
- efi_var_max_var_size = efi_var_data->max_var_size;
- }
- pa_data = data->next;
- early_iounmap(data, sizeof(*efi_var_data));
- }
-
- boot_used_size = efi_var_store_size - efi_var_remaining_size;
-
set_bit(EFI_SYSTEM_TABLES, &x86_efi_facility);

/*
@@ -1075,6 +978,13 @@ void __init efi_enter_virtual_mode(void)
runtime_code_page_mkexec();

kfree(new_memmap);
+
+ /* clean DUMMY object */
+ efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
+ EFI_VARIABLE_NON_VOLATILE |
+ EFI_VARIABLE_BOOTSERVICE_ACCESS |
+ EFI_VARIABLE_RUNTIME_ACCESS,
+ 0, NULL);
}

/*
@@ -1126,33 +1036,65 @@ efi_status_t efi_query_variable_store(u3
efi_status_t status;
u64 storage_size, remaining_size, max_size;

+ if (!(attributes & EFI_VARIABLE_NON_VOLATILE))
+ return 0;
+
status = efi.query_variable_info(attributes, &storage_size,
&remaining_size, &max_size);
if (status != EFI_SUCCESS)
return status;

- if (!max_size && remaining_size > size)
- printk_once(KERN_ERR FW_BUG "Broken EFI implementation"
- " is returning MaxVariableSize=0\n");
/*
* Some firmware implementations refuse to boot if there's insufficient
* space in the variable store. We account for that by refusing the
* write if permitting it would reduce the available space to under
- * 50%. However, some firmware won't reclaim variable space until
- * after the used (not merely the actively used) space drops below
- * a threshold. We can approximate that case with the value calculated
- * above. If both the firmware and our calculations indicate that the
- * available space would drop below 50%, refuse the write.
+ * 5KB. This figure was provided by Samsung, so should be safe.
*/
+ if ((remaining_size - size < EFI_MIN_RESERVE) &&
+ !efi_no_storage_paranoia) {

- if (!storage_size || size > remaining_size ||
- (max_size && size > max_size))
- return EFI_OUT_OF_RESOURCES;
-
- if (!efi_no_storage_paranoia &&
- ((active_size + size + VAR_METADATA_SIZE > storage_size / 2) &&
- (remaining_size - size < storage_size / 2)))
- return EFI_OUT_OF_RESOURCES;
+ /*
+ * Triggering garbage collection may require that the firmware
+ * generate a real EFI_OUT_OF_RESOURCES error. We can force
+ * that by attempting to use more space than is available.
+ */
+ unsigned long dummy_size = remaining_size + 1024;
+ void *dummy = kmalloc(dummy_size, GFP_ATOMIC);
+
+ status = efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
+ EFI_VARIABLE_NON_VOLATILE |
+ EFI_VARIABLE_BOOTSERVICE_ACCESS |
+ EFI_VARIABLE_RUNTIME_ACCESS,
+ dummy_size, dummy);
+
+ if (status == EFI_SUCCESS) {
+ /*
+ * This should have failed, so if it didn't make sure
+ * that we delete it...
+ */
+ efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
+ EFI_VARIABLE_NON_VOLATILE |
+ EFI_VARIABLE_BOOTSERVICE_ACCESS |
+ EFI_VARIABLE_RUNTIME_ACCESS,
+ 0, dummy);
+ }
+
+ /*
+ * The runtime code may now have triggered a garbage collection
+ * run, so check the variable info again
+ */
+ status = efi.query_variable_info(attributes, &storage_size,
+ &remaining_size, &max_size);
+
+ if (status != EFI_SUCCESS)
+ return status;
+
+ /*
+ * There still isn't enough room, so return an error
+ */
+ if (remaining_size - size < EFI_MIN_RESERVE)
+ return EFI_OUT_OF_RESOURCES;
+ }

return EFI_SUCCESS;
}

2013-06-18 16:25:11

by Greg KH

[permalink] [raw]
Subject: [ 38/48] libceph: wrap auth methods in a mutex

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sage Weil <[email protected]>

commit e9966076cdd952e19f2dd4854cd719be0d7cbebc upstream.

The auth code is called from a variety of contexts, include the mon_client
(protected by the monc's mutex) and the messenger callbacks (currently
protected by nothing). Avoid chaos by protecting all auth state with a
mutex. Nothing is blocking, so this should be simple and lightweight.

Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/ceph/auth.h | 2 +
net/ceph/auth.c | 78 +++++++++++++++++++++++++++++++++-------------
2 files changed, 58 insertions(+), 22 deletions(-)

--- a/include/linux/ceph/auth.h
+++ b/include/linux/ceph/auth.h
@@ -78,6 +78,8 @@ struct ceph_auth_client {
u64 global_id; /* our unique id in system */
const struct ceph_crypto_key *key; /* our secret key */
unsigned want_keys; /* which services we want */
+
+ struct mutex mutex;
};

extern struct ceph_auth_client *ceph_auth_init(const char *name,
--- a/net/ceph/auth.c
+++ b/net/ceph/auth.c
@@ -47,6 +47,7 @@ struct ceph_auth_client *ceph_auth_init(
if (!ac)
goto out;

+ mutex_init(&ac->mutex);
ac->negotiating = true;
if (name)
ac->name = name;
@@ -73,10 +74,12 @@ void ceph_auth_destroy(struct ceph_auth_
*/
void ceph_auth_reset(struct ceph_auth_client *ac)
{
+ mutex_lock(&ac->mutex);
dout("auth_reset %p\n", ac);
if (ac->ops && !ac->negotiating)
ac->ops->reset(ac);
ac->negotiating = true;
+ mutex_unlock(&ac->mutex);
}

int ceph_entity_name_encode(const char *name, void **p, void *end)
@@ -102,6 +105,7 @@ int ceph_auth_build_hello(struct ceph_au
int i, num;
int ret;

+ mutex_lock(&ac->mutex);
dout("auth_build_hello\n");
monhdr->have_version = 0;
monhdr->session_mon = cpu_to_le16(-1);
@@ -122,15 +126,19 @@ int ceph_auth_build_hello(struct ceph_au

ret = ceph_entity_name_encode(ac->name, &p, end);
if (ret < 0)
- return ret;
+ goto out;
ceph_decode_need(&p, end, sizeof(u64), bad);
ceph_encode_64(&p, ac->global_id);

ceph_encode_32(&lenp, p - lenp - sizeof(u32));
- return p - buf;
+ ret = p - buf;
+out:
+ mutex_unlock(&ac->mutex);
+ return ret;

bad:
- return -ERANGE;
+ ret = -ERANGE;
+ goto out;
}

static int ceph_build_auth_request(struct ceph_auth_client *ac,
@@ -151,11 +159,13 @@ static int ceph_build_auth_request(struc
if (ret < 0) {
pr_err("error %d building auth method %s request\n", ret,
ac->ops->name);
- return ret;
+ goto out;
}
dout(" built request %d bytes\n", ret);
ceph_encode_32(&p, ret);
- return p + ret - msg_buf;
+ ret = p + ret - msg_buf;
+out:
+ return ret;
}

/*
@@ -176,6 +186,7 @@ int ceph_handle_auth_reply(struct ceph_a
int result_msg_len;
int ret = -EINVAL;

+ mutex_lock(&ac->mutex);
dout("handle_auth_reply %p %p\n", p, end);
ceph_decode_need(&p, end, sizeof(u32) * 3 + sizeof(u64), bad);
protocol = ceph_decode_32(&p);
@@ -227,35 +238,44 @@ int ceph_handle_auth_reply(struct ceph_a

ret = ac->ops->handle_reply(ac, result, payload, payload_end);
if (ret == -EAGAIN) {
- return ceph_build_auth_request(ac, reply_buf, reply_len);
+ ret = ceph_build_auth_request(ac, reply_buf, reply_len);
} else if (ret) {
pr_err("auth method '%s' error %d\n", ac->ops->name, ret);
- return ret;
}
- return 0;

-bad:
- pr_err("failed to decode auth msg\n");
out:
+ mutex_unlock(&ac->mutex);
return ret;
+
+bad:
+ pr_err("failed to decode auth msg\n");
+ ret = -EINVAL;
+ goto out;
}

int ceph_build_auth(struct ceph_auth_client *ac,
void *msg_buf, size_t msg_len)
{
+ int ret = 0;
+
+ mutex_lock(&ac->mutex);
if (!ac->protocol)
- return ceph_auth_build_hello(ac, msg_buf, msg_len);
- BUG_ON(!ac->ops);
- if (ac->ops->should_authenticate(ac))
- return ceph_build_auth_request(ac, msg_buf, msg_len);
- return 0;
+ ret = ceph_auth_build_hello(ac, msg_buf, msg_len);
+ else if (ac->ops->should_authenticate(ac))
+ ret = ceph_build_auth_request(ac, msg_buf, msg_len);
+ mutex_unlock(&ac->mutex);
+ return ret;
}

int ceph_auth_is_authenticated(struct ceph_auth_client *ac)
{
- if (!ac->ops)
- return 0;
- return ac->ops->is_authenticated(ac);
+ int ret = 0;
+
+ mutex_lock(&ac->mutex);
+ if (ac->ops)
+ ret = ac->ops->is_authenticated(ac);
+ mutex_unlock(&ac->mutex);
+ return ret;
}
EXPORT_SYMBOL(ceph_auth_is_authenticated);

@@ -263,17 +283,23 @@ int ceph_auth_create_authorizer(struct c
int peer_type,
struct ceph_auth_handshake *auth)
{
+ int ret = 0;
+
+ mutex_lock(&ac->mutex);
if (ac->ops && ac->ops->create_authorizer)
- return ac->ops->create_authorizer(ac, peer_type, auth);
- return 0;
+ ret = ac->ops->create_authorizer(ac, peer_type, auth);
+ mutex_unlock(&ac->mutex);
+ return ret;
}
EXPORT_SYMBOL(ceph_auth_create_authorizer);

void ceph_auth_destroy_authorizer(struct ceph_auth_client *ac,
struct ceph_authorizer *a)
{
+ mutex_lock(&ac->mutex);
if (ac->ops && ac->ops->destroy_authorizer)
ac->ops->destroy_authorizer(ac, a);
+ mutex_unlock(&ac->mutex);
}
EXPORT_SYMBOL(ceph_auth_destroy_authorizer);

@@ -283,8 +309,10 @@ int ceph_auth_update_authorizer(struct c
{
int ret = 0;

+ mutex_lock(&ac->mutex);
if (ac->ops && ac->ops->update_authorizer)
ret = ac->ops->update_authorizer(ac, peer_type, a);
+ mutex_unlock(&ac->mutex);
return ret;
}
EXPORT_SYMBOL(ceph_auth_update_authorizer);
@@ -292,15 +320,21 @@ EXPORT_SYMBOL(ceph_auth_update_authorize
int ceph_auth_verify_authorizer_reply(struct ceph_auth_client *ac,
struct ceph_authorizer *a, size_t len)
{
+ int ret = 0;
+
+ mutex_lock(&ac->mutex);
if (ac->ops && ac->ops->verify_authorizer_reply)
- return ac->ops->verify_authorizer_reply(ac, a, len);
- return 0;
+ ret = ac->ops->verify_authorizer_reply(ac, a, len);
+ mutex_unlock(&ac->mutex);
+ return ret;
}
EXPORT_SYMBOL(ceph_auth_verify_authorizer_reply);

void ceph_auth_invalidate_authorizer(struct ceph_auth_client *ac, int peer_type)
{
+ mutex_lock(&ac->mutex);
if (ac->ops && ac->ops->invalidate_authorizer)
ac->ops->invalidate_authorizer(ac, peer_type);
+ mutex_unlock(&ac->mutex);
}
EXPORT_SYMBOL(ceph_auth_invalidate_authorizer);

2013-06-18 16:17:55

by Greg KH

[permalink] [raw]
Subject: [ 20/48] cciss: fix broken mutex usage in ioctl

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Stephen M. Cameron" <[email protected]>

commit 03f47e888daf56c8e9046c674719a0bcc644eed5 upstream.

If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked
(as is normal with the Array Configuration Utility) the process will
hang as below. It attempts to acquire the same mutex twice, once in
do_ioctl() and once in cciss_unlocked_open(). The BKL was recursive,
the mutex isn't.

Linux version 3.10.0-rc2 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013
[...]
acu D 0000000000000001 0 3246 3191 0x00000080
Call Trace:
schedule+0x29/0x70
schedule_preempt_disabled+0xe/0x10
__mutex_lock_slowpath+0x17b/0x220
mutex_lock+0x2b/0x50
cciss_unlocked_open+0x2f/0x110 [cciss]
__blkdev_get+0xd3/0x470
blkdev_get+0x5c/0x1e0
register_disk+0x182/0x1a0
add_disk+0x17c/0x310
cciss_add_disk+0x13a/0x170 [cciss]
cciss_update_drive_info+0x39b/0x480 [cciss]
rebuild_lun_table+0x258/0x370 [cciss]
cciss_ioctl+0x34f/0x470 [cciss]
do_ioctl+0x49/0x70 [cciss]
__blkdev_driver_ioctl+0x28/0x30
blkdev_ioctl+0x200/0x7b0
block_ioctl+0x3c/0x40
do_vfs_ioctl+0x89/0x350
SyS_ioctl+0xa1/0xb0
system_call_fastpath+0x16/0x1b

This mutex usage was added into the ioctl path when the big kernel lock
was removed. As it turns out, these paths are all thread safe anyway
(or can easily be made so) and we don't want ioctl() to be single
threaded in any case.

Signed-off-by: Stephen M. Cameron <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Mike Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/block/cciss.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -162,8 +162,6 @@ static irqreturn_t do_cciss_msix_intr(in
static int cciss_open(struct block_device *bdev, fmode_t mode);
static int cciss_unlocked_open(struct block_device *bdev, fmode_t mode);
static int cciss_release(struct gendisk *disk, fmode_t mode);
-static int do_ioctl(struct block_device *bdev, fmode_t mode,
- unsigned int cmd, unsigned long arg);
static int cciss_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg);
static int cciss_getgeo(struct block_device *bdev, struct hd_geometry *geo);
@@ -229,7 +227,7 @@ static const struct block_device_operati
.owner = THIS_MODULE,
.open = cciss_unlocked_open,
.release = cciss_release,
- .ioctl = do_ioctl,
+ .ioctl = cciss_ioctl,
.getgeo = cciss_getgeo,
#ifdef CONFIG_COMPAT
.compat_ioctl = cciss_compat_ioctl,
@@ -1138,16 +1136,6 @@ static int cciss_release(struct gendisk
return 0;
}

-static int do_ioctl(struct block_device *bdev, fmode_t mode,
- unsigned cmd, unsigned long arg)
-{
- int ret;
- mutex_lock(&cciss_mutex);
- ret = cciss_ioctl(bdev, mode, cmd, arg);
- mutex_unlock(&cciss_mutex);
- return ret;
-}
-
#ifdef CONFIG_COMPAT

static int cciss_ioctl32_passthru(struct block_device *bdev, fmode_t mode,
@@ -1174,7 +1162,7 @@ static int cciss_compat_ioctl(struct blo
case CCISS_REGNEWD:
case CCISS_RESCANDISK:
case CCISS_GETLUNINFO:
- return do_ioctl(bdev, mode, cmd, arg);
+ return cciss_ioctl(bdev, mode, cmd, arg);

case CCISS_PASSTHRU32:
return cciss_ioctl32_passthru(bdev, mode, cmd, arg);
@@ -1214,7 +1202,7 @@ static int cciss_ioctl32_passthru(struct
if (err)
return -EFAULT;

- err = do_ioctl(bdev, mode, CCISS_PASSTHRU, (unsigned long)p);
+ err = cciss_ioctl(bdev, mode, CCISS_PASSTHRU, (unsigned long)p);
if (err)
return err;
err |=
@@ -1256,7 +1244,7 @@ static int cciss_ioctl32_big_passthru(st
if (err)
return -EFAULT;

- err = do_ioctl(bdev, mode, CCISS_BIG_PASSTHRU, (unsigned long)p);
+ err = cciss_ioctl(bdev, mode, CCISS_BIG_PASSTHRU, (unsigned long)p);
if (err)
return err;
err |=
@@ -1306,11 +1294,14 @@ static int cciss_getpciinfo(ctlr_info_t
static int cciss_getintinfo(ctlr_info_t *h, void __user *argp)
{
cciss_coalint_struct intinfo;
+ unsigned long flags;

if (!argp)
return -EINVAL;
+ spin_lock_irqsave(&h->lock, flags);
intinfo.delay = readl(&h->cfgtable->HostWrite.CoalIntDelay);
intinfo.count = readl(&h->cfgtable->HostWrite.CoalIntCount);
+ spin_unlock_irqrestore(&h->lock, flags);
if (copy_to_user
(argp, &intinfo, sizeof(cciss_coalint_struct)))
return -EFAULT;
@@ -1351,12 +1342,15 @@ static int cciss_setintinfo(ctlr_info_t
static int cciss_getnodename(ctlr_info_t *h, void __user *argp)
{
NodeName_type NodeName;
+ unsigned long flags;
int i;

if (!argp)
return -EINVAL;
+ spin_lock_irqsave(&h->lock, flags);
for (i = 0; i < 16; i++)
NodeName[i] = readb(&h->cfgtable->ServerName[i]);
+ spin_unlock_irqrestore(&h->lock, flags);
if (copy_to_user(argp, NodeName, sizeof(NodeName_type)))
return -EFAULT;
return 0;
@@ -1393,10 +1387,13 @@ static int cciss_setnodename(ctlr_info_t
static int cciss_getheartbeat(ctlr_info_t *h, void __user *argp)
{
Heartbeat_type heartbeat;
+ unsigned long flags;

if (!argp)
return -EINVAL;
+ spin_lock_irqsave(&h->lock, flags);
heartbeat = readl(&h->cfgtable->HeartBeat);
+ spin_unlock_irqrestore(&h->lock, flags);
if (copy_to_user(argp, &heartbeat, sizeof(Heartbeat_type)))
return -EFAULT;
return 0;
@@ -1405,10 +1402,13 @@ static int cciss_getheartbeat(ctlr_info_
static int cciss_getbustypes(ctlr_info_t *h, void __user *argp)
{
BusTypes_type BusTypes;
+ unsigned long flags;

if (!argp)
return -EINVAL;
+ spin_lock_irqsave(&h->lock, flags);
BusTypes = readl(&h->cfgtable->BusTypes);
+ spin_unlock_irqrestore(&h->lock, flags);
if (copy_to_user(argp, &BusTypes, sizeof(BusTypes_type)))
return -EFAULT;
return 0;

2013-06-18 16:25:56

by Greg KH

[permalink] [raw]
Subject: [ 25/48] md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Lyakas <[email protected]>

commit 3056e3aec8d8ba61a0710fb78b2d562600aa2ea7 upstream.

Without that fix, the following scenario could happen:

- RAID1 with drives A and B; drive B was freshly-added and is rebuilding
- Drive A fails
- WRITE request arrives to the array. It is failed by drive A, so
r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
succeeds in writing it, so the same r1_bio is marked as
R1BIO_Uptodate.
- r1_bio arrives to handle_write_finished, badblocks are disabled,
md_error()->error() does nothing because we don't fail the last drive
of raid1
- raid_end_bio_io() calls call_bio_endio()
- As a result, in call_bio_endio():
if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
clear_bit(BIO_UPTODATE, &bio->bi_flags);
this code doesn't clear the BIO_UPTODATE flag, and the whole master
WRITE succeeds, back to the upper layer.

So we returned success to the upper layer, even though we had written
the data onto the rebuilding drive only. But when we want to read the
data back, we would not read from the rebuilding drive, so this data
is lost.

[neilb - applied identical change to raid10 as well]

This bug can result in lost data, so it is suitable for any
-stable kernel.

Signed-off-by: Alex Lyakas <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/raid1.c | 12 +++++++++++-
drivers/md/raid10.c | 12 +++++++++++-
2 files changed, 22 insertions(+), 2 deletions(-)

--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -427,7 +427,17 @@ static void raid1_end_write_request(stru

r1_bio->bios[mirror] = NULL;
to_put = bio;
- set_bit(R1BIO_Uptodate, &r1_bio->state);
+ /*
+ * Do not set R1BIO_Uptodate if the current device is
+ * rebuilding or Faulty. This is because we cannot use
+ * such device for properly reading the data back (we could
+ * potentially use it, if the current write would have felt
+ * before rdev->recovery_offset, but for simplicity we don't
+ * check this here.
+ */
+ if (test_bit(In_sync, &conf->mirrors[mirror].rdev->flags) &&
+ !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags))
+ set_bit(R1BIO_Uptodate, &r1_bio->state);

/* Maybe we can clear some bad blocks. */
if (is_badblock(conf->mirrors[mirror].rdev,
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -490,7 +490,17 @@ static void raid10_end_write_request(str
sector_t first_bad;
int bad_sectors;

- set_bit(R10BIO_Uptodate, &r10_bio->state);
+ /*
+ * Do not set R10BIO_Uptodate if the current device is
+ * rebuilding or Faulty. This is because we cannot use
+ * such device for properly reading the data back (we could
+ * potentially use it, if the current write would have felt
+ * before rdev->recovery_offset, but for simplicity we don't
+ * check this here.
+ */
+ if (test_bit(In_sync, &rdev->flags) &&
+ !test_bit(Faulty, &rdev->flags))
+ set_bit(R10BIO_Uptodate, &r10_bio->state);

/* Maybe we can clear some bad blocks. */
if (is_badblock(rdev,

2013-06-18 16:25:55

by Greg KH

[permalink] [raw]
Subject: [ 16/48] ath9k: Use minstrel rate control by default

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sujith Manoharan <[email protected]>

commit 5efac94999ff218e0101f67a059e44abb4b0b523 upstream.

The ath9k rate control algorithm has various architectural
issues that make it a poor fit in scenarios like congested
environments etc.

An example: https://bugzilla.redhat.com/show_bug.cgi?id=927191

Change the default to minstrel which is more robust in such cases.
The ath9k RC code is left in the driver for now, maybe it can
be removed altogether later on.

Signed-off-by: Sujith Manoharan <[email protected]>
Cc: Jouni Malinen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/ath/ath9k/Kconfig | 10 +++++++---
drivers/net/wireless/ath/ath9k/Makefile | 2 +-
drivers/net/wireless/ath/ath9k/init.c | 4 ----
drivers/net/wireless/ath/ath9k/rc.h | 2 +-
4 files changed, 9 insertions(+), 9 deletions(-)

--- a/drivers/net/wireless/ath/ath9k/Kconfig
+++ b/drivers/net/wireless/ath/ath9k/Kconfig
@@ -92,13 +92,17 @@ config ATH9K_MAC_DEBUG
This option enables collection of statistics for Rx/Tx status
data and some other MAC related statistics

-config ATH9K_RATE_CONTROL
+config ATH9K_LEGACY_RATE_CONTROL
bool "Atheros ath9k rate control"
depends on ATH9K
- default y
+ default n
---help---
Say Y, if you want to use the ath9k specific rate control
- module instead of minstrel_ht.
+ module instead of minstrel_ht. Be warned that there are various
+ issues with the ath9k RC and minstrel is a more robust algorithm.
+ Note that even if this option is selected, "ath9k_rate_control"
+ has to be passed to mac80211 using the module parameter,
+ ieee80211_default_rc_algo.

config ATH9K_HTC
tristate "Atheros HTC based wireless cards support"
--- a/drivers/net/wireless/ath/ath9k/Makefile
+++ b/drivers/net/wireless/ath/ath9k/Makefile
@@ -8,7 +8,7 @@ ath9k-y += beacon.o \
antenna.o

ath9k-$(CONFIG_ATH9K_BTCOEX_SUPPORT) += mci.o
-ath9k-$(CONFIG_ATH9K_RATE_CONTROL) += rc.o
+ath9k-$(CONFIG_ATH9K_LEGACY_RATE_CONTROL) += rc.o
ath9k-$(CONFIG_ATH9K_PCI) += pci.o
ath9k-$(CONFIG_ATH9K_AHB) += ahb.o
ath9k-$(CONFIG_ATH9K_DEBUGFS) += debug.o
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -808,10 +808,6 @@ void ath9k_set_hw_capab(struct ath_softc
sc->ant_rx = hw->wiphy->available_antennas_rx;
sc->ant_tx = hw->wiphy->available_antennas_tx;

-#ifdef CONFIG_ATH9K_RATE_CONTROL
- hw->rate_control_algorithm = "ath9k_rate_control";
-#endif
-
if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_2GHZ)
hw->wiphy->bands[IEEE80211_BAND_2GHZ] =
&sc->sbands[IEEE80211_BAND_2GHZ];
--- a/drivers/net/wireless/ath/ath9k/rc.h
+++ b/drivers/net/wireless/ath/ath9k/rc.h
@@ -231,7 +231,7 @@ static inline void ath_debug_stat_retrie
}
#endif

-#ifdef CONFIG_ATH9K_RATE_CONTROL
+#ifdef CONFIG_ATH9K_LEGACY_RATE_CONTROL
int ath_rate_control_register(void);
void ath_rate_control_unregister(void);
#else

2013-06-18 16:26:34

by Greg KH

[permalink] [raw]
Subject: [ 23/48] drm/i915: prefer VBT modes for SVDO-LVDS over EDID

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Vetter <[email protected]>

commit c3456fb3e4712d0448592af3c5d644c9472cd3c1 upstream.

In

commit 53d3b4d7778daf15900867336c85d3f8dd70600c
Author: Egbert Eich <[email protected]>
Date: Tue Jun 4 17:13:21 2013 +0200

drm/i915/sdvo: Use &intel_sdvo->ddc instead of intel_sdvo->i2c for DDC

Egbert Eich fixed a long-standing bug where we simply used a
non-working i2c controller to read the EDID for SDVO-LVDS panels.
Unfortunately some machines seem to not be able to cope with the mode
provided in the EDID. Specifically they seem to not be able to cope
with a 4x pixel mutliplier instead of a 2x one, which seems to have
been worked around by slightly changing the panels native mode in the
VBT so that the dotclock is just barely above 50MHz.

Since it took forever to notice the breakage it's fairly safe to
assume that at least for SDVO-LVDS panels the VBT contains fairly sane
data. So just switch around the order and use VBT modes first.

v2: Also add EDID modes just in case, and spell Egbert correctly.

v3: Elaborate a bit more about what's going on on Chris' machine.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65524
Reported-and-tested-by: Chris Wilson <[email protected]>
Cc: Egbert Eich <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/i915/intel_sdvo.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/i915/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/intel_sdvo.c
@@ -1771,10 +1771,13 @@ static void intel_sdvo_get_lvds_modes(st
* arranged in priority order.
*/
intel_ddc_get_modes(connector, &intel_sdvo->ddc);
- if (list_empty(&connector->probed_modes) == false)
- goto end;

- /* Fetch modes from VBT */
+ /*
+ * Fetch modes from VBT. For SDVO prefer the VBT mode since some
+ * SDVO->LVDS transcoders can't cope with the EDID mode. Since
+ * drm_mode_probed_add adds the mode at the head of the list we add it
+ * last.
+ */
if (dev_priv->sdvo_lvds_vbt_mode != NULL) {
newmode = drm_mode_duplicate(connector->dev,
dev_priv->sdvo_lvds_vbt_mode);
@@ -1786,7 +1789,6 @@ static void intel_sdvo_get_lvds_modes(st
}
}

-end:
list_for_each_entry(newmode, &connector->probed_modes, head) {
if (newmode->type & DRM_MODE_TYPE_PREFERRED) {
intel_sdvo->sdvo_lvds_fixed_mode =

2013-06-18 16:26:50

by Greg KH

[permalink] [raw]
Subject: [ 22/48] wl12xx: fix minimum required firmware version for wl127x multirole

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Luciano Coelho <[email protected]>

commit 60c28cf18f970e1c1bd40d615596eeab6efbd9d7 upstream.

There was a typo in commit 8675f9 (wlcore/wl12xx/wl18xx: verify
multi-role and single-role fw versions), which was causing the
multirole firmware for wl127x (WiLink6) to be rejected. The actual
minimum version needed for wl127x multirole is 6.5.7.0.42.

Reported-by: Levi Pearson <[email protected]>
Reported-by: Michael Scott <[email protected]>
Signed-off-by: Luciano Coelho <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/ti/wl12xx/wl12xx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/wireless/ti/wl12xx/wl12xx.h
+++ b/drivers/net/wireless/ti/wl12xx/wl12xx.h
@@ -41,7 +41,7 @@
#define WL127X_IFTYPE_MR_VER 5
#define WL127X_MAJOR_MR_VER 7
#define WL127X_SUBTYPE_MR_VER WLCORE_FW_VER_IGNORE
-#define WL127X_MINOR_MR_VER 115
+#define WL127X_MINOR_MR_VER 42

/* FW chip version for wl128x */
#define WL128X_CHIP_VER 7

2013-06-18 16:27:11

by Greg KH

[permalink] [raw]
Subject: [ 21/48] memcg: dont initialize kmem-cache destroying work for root caches

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andrey Vagin <[email protected]>

commit f101a9464bfbda42730b54a66f926d75ed2cd31e upstream.

struct memcg_cache_params has a union. Different parts of this union
are used for root and non-root caches. A part with destroying work is
used only for non-root caches.

BUG: unable to handle kernel paging request at 0000000fffffffe0
IP: kmem_cache_alloc+0x41/0x1f0
Modules linked in: netlink_diag af_packet_diag udp_diag tcp_diag inet_diag unix_diag ip6table_filter ip6_tables i2c_piix4 virtio_net virtio_balloon microcode i2c_core pcspkr floppy
CPU: 0 PID: 1929 Comm: lt-vzctl Tainted: G D 3.10.0-rc1+ #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: kmem_cache_alloc+0x41/0x1f0
Call Trace:
getname_flags.part.34+0x30/0x140
getname+0x38/0x60
do_sys_open+0xc5/0x1e0
SyS_open+0x22/0x30
system_call_fastpath+0x16/0x1b
Code: f4 53 48 83 ec 18 8b 05 8e 53 b7 00 4c 8b 4d 08 21 f0 a8 10 74 0d 4c 89 4d c0 e8 1b 76 4a 00 4c 8b 4d c0 e9 92 00 00 00 4d 89 f5 <4d> 8b 45 00 65 4c 03 04 25 48 cd 00 00 49 8b 50 08 4d 8b 38 49
RIP [<ffffffff8116b641>] kmem_cache_alloc+0x41/0x1f0

Signed-off-by: Andrey Vagin <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Li Zefan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3033,8 +3033,6 @@ int memcg_update_cache_size(struct kmem_
return -ENOMEM;
}

- INIT_WORK(&s->memcg_params->destroy,
- kmem_cache_destroy_work_func);
s->memcg_params->is_root_cache = true;

/*

2013-06-18 16:27:32

by Greg KH

[permalink] [raw]
Subject: [ 19/48] kmsg: honor dmesg_restrict sysctl on /dev/kmsg

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

commit 637241a900cbd982f744d44646b48a273d609b34 upstream.

The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.

To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:

- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.

- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).

The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.

AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.

Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.

To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.

- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

[[email protected]: use pr_warn_once()]
Signed-off-by: Kees Cook <[email protected]>
Reported-by: Christian Kujau <[email protected]>
Tested-by: Josh Boyer <[email protected]>
Cc: Kay Sievers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/proc/kmsg.c | 10 ++---
include/linux/syslog.h | 4 +-
kernel/printk.c | 91 ++++++++++++++++++++++++++-----------------------
3 files changed, 57 insertions(+), 48 deletions(-)

--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -21,12 +21,12 @@ extern wait_queue_head_t log_wait;

static int kmsg_open(struct inode * inode, struct file * file)
{
- return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_FILE);
+ return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
}

static int kmsg_release(struct inode * inode, struct file * file)
{
- (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_FILE);
+ (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC);
return 0;
}

@@ -34,15 +34,15 @@ static ssize_t kmsg_read(struct file *fi
size_t count, loff_t *ppos)
{
if ((file->f_flags & O_NONBLOCK) &&
- !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_FILE))
+ !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
return -EAGAIN;
- return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_FILE);
+ return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC);
}

static unsigned int kmsg_poll(struct file *file, poll_table *wait)
{
poll_wait(file, &log_wait, wait);
- if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_FILE))
+ if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
return POLLIN | POLLRDNORM;
return 0;
}
--- a/include/linux/syslog.h
+++ b/include/linux/syslog.h
@@ -44,8 +44,8 @@
/* Return size of the log buffer */
#define SYSLOG_ACTION_SIZE_BUFFER 10

-#define SYSLOG_FROM_CALL 0
-#define SYSLOG_FROM_FILE 1
+#define SYSLOG_FROM_READER 0
+#define SYSLOG_FROM_PROC 1

int do_syslog(int type, char __user *buf, int count, bool from_file);

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -368,6 +368,53 @@ static void log_store(int facility, int
log_next_seq++;
}

+#ifdef CONFIG_SECURITY_DMESG_RESTRICT
+int dmesg_restrict = 1;
+#else
+int dmesg_restrict;
+#endif
+
+static int syslog_action_restricted(int type)
+{
+ if (dmesg_restrict)
+ return 1;
+ /*
+ * Unless restricted, we allow "read all" and "get buffer size"
+ * for everybody.
+ */
+ return type != SYSLOG_ACTION_READ_ALL &&
+ type != SYSLOG_ACTION_SIZE_BUFFER;
+}
+
+static int check_syslog_permissions(int type, bool from_file)
+{
+ /*
+ * If this is from /proc/kmsg and we've already opened it, then we've
+ * already done the capabilities checks at open time.
+ */
+ if (from_file && type != SYSLOG_ACTION_OPEN)
+ return 0;
+
+ if (syslog_action_restricted(type)) {
+ if (capable(CAP_SYSLOG))
+ return 0;
+ /*
+ * For historical reasons, accept CAP_SYS_ADMIN too, with
+ * a warning.
+ */
+ if (capable(CAP_SYS_ADMIN)) {
+ pr_warn_once("%s (%d): Attempt to access syslog with "
+ "CAP_SYS_ADMIN but no CAP_SYSLOG "
+ "(deprecated).\n",
+ current->comm, task_pid_nr(current));
+ return 0;
+ }
+ return -EPERM;
+ }
+ return security_syslog(type);
+}
+
+
/* /dev/kmsg - userspace message inject/listen interface */
struct devkmsg_user {
u64 seq;
@@ -624,7 +671,8 @@ static int devkmsg_open(struct inode *in
if ((file->f_flags & O_ACCMODE) == O_WRONLY)
return 0;

- err = security_syslog(SYSLOG_ACTION_READ_ALL);
+ err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
+ SYSLOG_FROM_READER);
if (err)
return err;

@@ -817,45 +865,6 @@ static inline void boot_delay_msec(int l
}
#endif

-#ifdef CONFIG_SECURITY_DMESG_RESTRICT
-int dmesg_restrict = 1;
-#else
-int dmesg_restrict;
-#endif
-
-static int syslog_action_restricted(int type)
-{
- if (dmesg_restrict)
- return 1;
- /* Unless restricted, we allow "read all" and "get buffer size" for everybody */
- return type != SYSLOG_ACTION_READ_ALL && type != SYSLOG_ACTION_SIZE_BUFFER;
-}
-
-static int check_syslog_permissions(int type, bool from_file)
-{
- /*
- * If this is from /proc/kmsg and we've already opened it, then we've
- * already done the capabilities checks at open time.
- */
- if (from_file && type != SYSLOG_ACTION_OPEN)
- return 0;
-
- if (syslog_action_restricted(type)) {
- if (capable(CAP_SYSLOG))
- return 0;
- /* For historical reasons, accept CAP_SYS_ADMIN too, with a warning */
- if (capable(CAP_SYS_ADMIN)) {
- printk_once(KERN_WARNING "%s (%d): "
- "Attempt to access syslog with CAP_SYS_ADMIN "
- "but no CAP_SYSLOG (deprecated).\n",
- current->comm, task_pid_nr(current));
- return 0;
- }
- return -EPERM;
- }
- return 0;
-}
-
#if defined(CONFIG_PRINTK_TIME)
static bool printk_time = 1;
#else
@@ -1253,7 +1262,7 @@ out:

SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
{
- return do_syslog(type, buf, len, SYSLOG_FROM_CALL);
+ return do_syslog(type, buf, len, SYSLOG_FROM_READER);
}

/*

2013-06-18 16:27:49

by Greg KH

[permalink] [raw]
Subject: [ 18/48] reboot: rigrate shutdown/reboot to boot cpu

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Robin Holt <[email protected]>

commit cf7df378aa4ff7da3a44769b7ff6e9eef1a9f3db upstream.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f2dc63 ("kernel/sys.c: call disable_nonboot_cpus() in
kernel_restart()").

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
boot cpu and then continuing with shutdown/reboot.

This also has the effect of not breaking x86's command line parameter
for specifying the reboot cpu. Note, this code was shamelessly copied
from arch/x86/kernel/reboot.c with bits removed pertaining to the
reboot_cpu command line parameter.

Signed-off-by: Robin Holt <[email protected]>
Tested-by: Shawn Guo <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russell King <[email protected]>
Cc: Guan Xuetao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/sys.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)

--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -357,6 +357,29 @@ int unregister_reboot_notifier(struct no
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+/* Add backwards compatibility for stable trees. */
+#ifndef PF_NO_SETAFFINITY
+#define PF_NO_SETAFFINITY PF_THREAD_BOUND
+#endif
+
+static void migrate_to_reboot_cpu(void)
+{
+ /* The boot cpu is always logical cpu 0 */
+ int cpu = 0;
+
+ cpu_hotplug_disable();
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(cpu))
+ cpu = cpumask_first(cpu_online_mask);
+
+ /* Prevent races with other tasks migrating this task */
+ current->flags |= PF_NO_SETAFFINITY;
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(cpu));
+}
+
/**
* kernel_restart - reboot the system
* @cmd: pointer to buffer containing command to execute for restart
@@ -368,7 +391,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
printk(KERN_EMERG "Restarting system.\n");
@@ -395,7 +418,7 @@ static void kernel_shutdown_prepare(enum
void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
@@ -414,7 +437,7 @@ void kernel_power_off(void)
kernel_shutdown_prepare(SYSTEM_POWER_OFF);
if (pm_power_off_prepare)
pm_power_off_prepare();
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);

2013-06-18 16:17:49

by Greg KH

[permalink] [raw]
Subject: [ 14/48] ath9k: Disable PowerSave by default

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sujith Manoharan <[email protected]>

commit 531671cb17af07281e6f28c1425f754346e65c41 upstream.

Almost all the DMA issues which have plagued ath9k (in station mode)
for years are related to PS. Disabling PS usually "fixes" the user's
connection stablility. Reports of DMA problems are still trickling in
and are sitting in the kernel bugzilla. Until the PS code in ath9k is
given a thorough review, disbale it by default. The slight increase
in chip power consumption is a small price to pay for improved link
stability.

Signed-off-by: Sujith Manoharan <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/ath/ath9k/init.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -766,8 +766,7 @@ void ath9k_set_hw_capab(struct ath_softc
hw->wiphy->iface_combinations = &if_comb;
hw->wiphy->n_iface_combinations = 1;

- if (AR_SREV_5416(sc->sc_ah))
- hw->wiphy->flags &= ~WIPHY_FLAG_PS_ON_BY_DEFAULT;
+ hw->wiphy->flags &= ~WIPHY_FLAG_PS_ON_BY_DEFAULT;

hw->wiphy->flags |= WIPHY_FLAG_IBSS_RSN;
hw->wiphy->flags |= WIPHY_FLAG_SUPPORTS_TDLS;

2013-06-18 16:28:13

by Greg KH

[permalink] [raw]
Subject: [ 27/48] md/raid1,raid10: use freeze_array in place of raise_barrier in various places.

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: NeilBrown <[email protected]>

commit e2d59925221cd562e07fee38ec8839f7209ae603 upstream.

Various places in raid1 and raid10 are calling raise_barrier when they
really should call freeze_array.
The former is only intended to be called from "make_request".
The later has extra checks for 'nr_queued' and makes a call to
flush_pending_writes(), so it is safe to call it from within the
management thread.

Using raise_barrier will sometimes deadlock. Using freeze_array
should not.

As 'freeze_array' currently expects one request to be pending (in
handle_read_error - the only previous caller), we need to pass
it the number of pending requests (extra) to ignore.

The deadlock was made particularly noticeable by commits
050b66152f87c7 (raid10) and 6b740b8d79252f13 (raid1) which
appeared in 3.4, so the fix is appropriate for any -stable
kernel since then.

This patch probably won't apply directly to some early kernels and
will need to be applied by hand.

Reported-by: Alexander Lyakas <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/raid1.c | 22 +++++++++++-----------
drivers/md/raid10.c | 14 +++++++-------
2 files changed, 18 insertions(+), 18 deletions(-)

--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -890,17 +890,17 @@ static void allow_barrier(struct r1conf
wake_up(&conf->wait_barrier);
}

-static void freeze_array(struct r1conf *conf)
+static void freeze_array(struct r1conf *conf, int extra)
{
/* stop syncio and normal IO and wait for everything to
* go quite.
* We increment barrier and nr_waiting, and then
- * wait until nr_pending match nr_queued+1
+ * wait until nr_pending match nr_queued+extra
* This is called in the context of one normal IO request
* that has failed. Thus any sync request that might be pending
* will be blocked by nr_pending, and we need to wait for
* pending IO requests to complete or be queued for re-try.
- * Thus the number queued (nr_queued) plus this request (1)
+ * Thus the number queued (nr_queued) plus this request (extra)
* must match the number of pending IOs (nr_pending) before
* we continue.
*/
@@ -908,7 +908,7 @@ static void freeze_array(struct r1conf *
conf->barrier++;
conf->nr_waiting++;
wait_event_lock_irq_cmd(conf->wait_barrier,
- conf->nr_pending == conf->nr_queued+1,
+ conf->nr_pending == conf->nr_queued+extra,
conf->resync_lock,
flush_pending_writes(conf));
spin_unlock_irq(&conf->resync_lock);
@@ -1568,8 +1568,8 @@ static int raid1_add_disk(struct mddev *
* we wait for all outstanding requests to complete.
*/
synchronize_sched();
- raise_barrier(conf);
- lower_barrier(conf);
+ freeze_array(conf, 0);
+ unfreeze_array(conf);
clear_bit(Unmerged, &rdev->flags);
}
md_integrity_add_rdev(rdev, mddev);
@@ -1619,11 +1619,11 @@ static int raid1_remove_disk(struct mdde
*/
struct md_rdev *repl =
conf->mirrors[conf->raid_disks + number].rdev;
- raise_barrier(conf);
+ freeze_array(conf, 0);
clear_bit(Replacement, &repl->flags);
p->rdev = repl;
conf->mirrors[conf->raid_disks + number].rdev = NULL;
- lower_barrier(conf);
+ unfreeze_array(conf);
clear_bit(WantReplacement, &rdev->flags);
} else
clear_bit(WantReplacement, &rdev->flags);
@@ -2240,7 +2240,7 @@ static void handle_read_error(struct r1c
* frozen
*/
if (mddev->ro == 0) {
- freeze_array(conf);
+ freeze_array(conf, 1);
fix_read_error(conf, r1_bio->read_disk,
r1_bio->sector, r1_bio->sectors);
unfreeze_array(conf);
@@ -3019,7 +3019,7 @@ static int raid1_reshape(struct mddev *m
return -ENOMEM;
}

- raise_barrier(conf);
+ freeze_array(conf, 0);

/* ok, everything is stopped */
oldpool = conf->r1bio_pool;
@@ -3050,7 +3050,7 @@ static int raid1_reshape(struct mddev *m
conf->raid_disks = mddev->raid_disks = raid_disks;
mddev->delta_disks = 0;

- lower_barrier(conf);
+ unfreeze_array(conf);

set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1065,17 +1065,17 @@ static void allow_barrier(struct r10conf
wake_up(&conf->wait_barrier);
}

-static void freeze_array(struct r10conf *conf)
+static void freeze_array(struct r10conf *conf, int extra)
{
/* stop syncio and normal IO and wait for everything to
* go quiet.
* We increment barrier and nr_waiting, and then
- * wait until nr_pending match nr_queued+1
+ * wait until nr_pending match nr_queued+extra
* This is called in the context of one normal IO request
* that has failed. Thus any sync request that might be pending
* will be blocked by nr_pending, and we need to wait for
* pending IO requests to complete or be queued for re-try.
- * Thus the number queued (nr_queued) plus this request (1)
+ * Thus the number queued (nr_queued) plus this request (extra)
* must match the number of pending IOs (nr_pending) before
* we continue.
*/
@@ -1083,7 +1083,7 @@ static void freeze_array(struct r10conf
conf->barrier++;
conf->nr_waiting++;
wait_event_lock_irq_cmd(conf->wait_barrier,
- conf->nr_pending == conf->nr_queued+1,
+ conf->nr_pending == conf->nr_queued+extra,
conf->resync_lock,
flush_pending_writes(conf));

@@ -1849,8 +1849,8 @@ static int raid10_add_disk(struct mddev
* we wait for all outstanding requests to complete.
*/
synchronize_sched();
- raise_barrier(conf, 0);
- lower_barrier(conf);
+ freeze_array(conf, 0);
+ unfreeze_array(conf);
clear_bit(Unmerged, &rdev->flags);
}
md_integrity_add_rdev(rdev, mddev);
@@ -2646,7 +2646,7 @@ static void handle_read_error(struct mdd
r10_bio->devs[slot].bio = NULL;

if (mddev->ro == 0) {
- freeze_array(conf);
+ freeze_array(conf, 1);
fix_read_error(conf, mddev, r10_bio);
unfreeze_array(conf);
} else

2013-06-18 16:28:54

by Greg KH

[permalink] [raw]
Subject: [ 26/48] md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <[email protected]>

commit 5026d7a9b2f3eb1f9bda66c18ac6bc3036ec9020 upstream.

There are cases where the kernel will believe that the WRITE SAME
command is supported by a block device which does not, in fact,
support WRITE SAME. This currently happens for SATA drivers behind a
SAS controller, but there are probably a hundred other ways that can
happen, including drive firmware bugs.

After receiving an error for WRITE SAME the block layer will retry the
request as a plain write of zeroes, but mdraid will consider the
failure as fatal and consider the drive failed. This has the effect
that all the mirrors containing a specific set of data are each
offlined in very rapid succession resulting in data loss.

However, just bouncing the request back up to the block layer isn't
ideal either, because the whole initial request-retry sequence should
be inside the write bitmap fence, which probably means that md needs
to do its own conversion of WRITE SAME to write zero.

Until the failure scenario has been sorted out, disable WRITE SAME for
raid1, raid5, and raid10.

[neilb: added raid5]

This patch is appropriate for any -stable since 3.7 when write_same
support was added.

Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/raid1.c | 4 ++--
drivers/md/raid10.c | 3 +--
drivers/md/raid5.c | 4 +++-
3 files changed, 6 insertions(+), 5 deletions(-)

--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2837,8 +2837,8 @@ static int run(struct mddev *mddev)
return PTR_ERR(conf);

if (mddev->queue)
- blk_queue_max_write_same_sectors(mddev->queue,
- mddev->chunk_sectors);
+ blk_queue_max_write_same_sectors(mddev->queue, 0);
+
rdev_for_each(rdev, mddev) {
if (!mddev->gendisk)
continue;
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3635,8 +3635,7 @@ static int run(struct mddev *mddev)
if (mddev->queue) {
blk_queue_max_discard_sectors(mddev->queue,
mddev->chunk_sectors);
- blk_queue_max_write_same_sectors(mddev->queue,
- mddev->chunk_sectors);
+ blk_queue_max_write_same_sectors(mddev->queue, 0);
blk_queue_io_min(mddev->queue, chunk_size);
if (conf->geo.raid_disks % conf->geo.near_copies)
blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5457,7 +5457,7 @@ static int run(struct mddev *mddev)
if (mddev->major_version == 0 &&
mddev->minor_version > 90)
rdev->recovery_offset = reshape_offset;
-
+
if (rdev->recovery_offset < reshape_offset) {
/* We need to check old and new layout */
if (!only_parity(rdev->raid_disk,
@@ -5580,6 +5580,8 @@ static int run(struct mddev *mddev)
*/
mddev->queue->limits.discard_zeroes_data = 0;

+ blk_queue_max_write_same_sectors(mddev->queue, 0);
+
rdev_for_each(rdev, mddev) {
disk_stack_limits(mddev->gendisk, rdev->bdev,
rdev->data_offset << 9);

2013-06-18 16:29:37

by Greg KH

[permalink] [raw]
Subject: [ 15/48] Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Felix Fietkau <[email protected]>

commit 96005931785238e1a24febf65ffb5016273e8225 upstream.

This reverts commit 68d9e1fa24d9c7c2e527f49df8d18fb8cf0ec943

This change reduces rx sensitivity with no apparent extra benefit.
It looks like it was meant for testing in a specific scenario,
but it was never properly validated.

Signed-off-by: Felix Fietkau <[email protected]>
Cc: [email protected]
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h
+++ b/drivers/net/wireless/ath/ath9k/ar9003_2p2_initvals.h
@@ -958,11 +958,11 @@ static const u32 ar9300Common_rx_gain_ta
{0x0000a074, 0x00000000},
{0x0000a078, 0x00000000},
{0x0000a07c, 0x00000000},
- {0x0000a080, 0x1a1a1a1a},
- {0x0000a084, 0x1a1a1a1a},
- {0x0000a088, 0x1a1a1a1a},
- {0x0000a08c, 0x1a1a1a1a},
- {0x0000a090, 0x171a1a1a},
+ {0x0000a080, 0x22222229},
+ {0x0000a084, 0x1d1d1d1d},
+ {0x0000a088, 0x1d1d1d1d},
+ {0x0000a08c, 0x1d1d1d1d},
+ {0x0000a090, 0x171d1d1d},
{0x0000a094, 0x11111717},
{0x0000a098, 0x00030311},
{0x0000a09c, 0x00000000},

2013-06-18 16:17:47

by Greg KH

[permalink] [raw]
Subject: [ 06/48] ceph: ceph_pagelist_append might sleep while atomic

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jim Schutt <[email protected]>

commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream.

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep. Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
. . .
[13439.376225] Call Trace:
[13439.378757] [<ffffffff81076f4c>] __might_sleep+0xfc/0x110
[13439.384353] [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491] [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035] [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50
[13439.403775] [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20
[13439.409277] [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622] [<ffffffff81196748>] ? igrab+0x28/0x70
[13439.420610] [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584] [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499] [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646] [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363] [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250] [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph]
[13439.461534] [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432] [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10
[13439.473934] [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464] [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph]
[13439.486492] [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph]
[13439.493190] [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph]
. . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[[email protected]: abbreviated the stack info a bit.]

Signed-off-by: Jim Schutt <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ceph/locks.c | 76 +++++++++++++++++++++++++++++++--------------------
fs/ceph/mds_client.c | 63 ++++++++++++++++++++++--------------------
fs/ceph/super.h | 9 ++++--
3 files changed, 88 insertions(+), 60 deletions(-)

--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -191,29 +191,23 @@ void ceph_count_locks(struct inode *inod
}

/**
- * Encode the flock and fcntl locks for the given inode into the pagelist.
- * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
- * sequential flock locks.
- * Must be called with lock_flocks() already held.
- * If we encounter more of a specific lock type than expected,
- * we return the value 1.
+ * Encode the flock and fcntl locks for the given inode into the ceph_filelock
+ * array. Must be called with lock_flocks() already held.
+ * If we encounter more of a specific lock type than expected, return -ENOSPC.
*/
-int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
- int num_fcntl_locks, int num_flock_locks)
+int ceph_encode_locks_to_buffer(struct inode *inode,
+ struct ceph_filelock *flocks,
+ int num_fcntl_locks, int num_flock_locks)
{
struct file_lock *lock;
- struct ceph_filelock cephlock;
int err = 0;
int seen_fcntl = 0;
int seen_flock = 0;
- __le32 nlocks;
+ int l = 0;

dout("encoding %d flock and %d fcntl locks", num_flock_locks,
num_fcntl_locks);
- nlocks = cpu_to_le32(num_fcntl_locks);
- err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));
- if (err)
- goto fail;
+
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
if (lock->fl_flags & FL_POSIX) {
++seen_fcntl;
@@ -221,20 +215,12 @@ int ceph_encode_locks(struct inode *inod
err = -ENOSPC;
goto fail;
}
- err = lock_to_ceph_filelock(lock, &cephlock);
+ err = lock_to_ceph_filelock(lock, &flocks[l]);
if (err)
goto fail;
- err = ceph_pagelist_append(pagelist, &cephlock,
- sizeof(struct ceph_filelock));
+ ++l;
}
- if (err)
- goto fail;
}
-
- nlocks = cpu_to_le32(num_flock_locks);
- err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));
- if (err)
- goto fail;
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
if (lock->fl_flags & FL_FLOCK) {
++seen_flock;
@@ -242,19 +228,51 @@ int ceph_encode_locks(struct inode *inod
err = -ENOSPC;
goto fail;
}
- err = lock_to_ceph_filelock(lock, &cephlock);
+ err = lock_to_ceph_filelock(lock, &flocks[l]);
if (err)
goto fail;
- err = ceph_pagelist_append(pagelist, &cephlock,
- sizeof(struct ceph_filelock));
+ ++l;
}
- if (err)
- goto fail;
}
fail:
return err;
}

+/**
+ * Copy the encoded flock and fcntl locks into the pagelist.
+ * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
+ * sequential flock locks.
+ * Returns zero on success.
+ */
+int ceph_locks_to_pagelist(struct ceph_filelock *flocks,
+ struct ceph_pagelist *pagelist,
+ int num_fcntl_locks, int num_flock_locks)
+{
+ int err = 0;
+ __le32 nlocks;
+
+ nlocks = cpu_to_le32(num_fcntl_locks);
+ err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));
+ if (err)
+ goto out_fail;
+
+ err = ceph_pagelist_append(pagelist, flocks,
+ num_fcntl_locks * sizeof(*flocks));
+ if (err)
+ goto out_fail;
+
+ nlocks = cpu_to_le32(num_flock_locks);
+ err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));
+ if (err)
+ goto out_fail;
+
+ err = ceph_pagelist_append(pagelist,
+ &flocks[num_fcntl_locks],
+ num_flock_locks * sizeof(*flocks));
+out_fail:
+ return err;
+}
+
/*
* Given a pointer to a lock, convert it to a ceph filelock
*/
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2474,39 +2474,44 @@ static int encode_caps_cb(struct inode *

if (recon_state->flock) {
int num_fcntl_locks, num_flock_locks;
- struct ceph_pagelist_cursor trunc_point;
+ struct ceph_filelock *flocks;

- ceph_pagelist_set_cursor(pagelist, &trunc_point);
- do {
- lock_flocks();
- ceph_count_locks(inode, &num_fcntl_locks,
- &num_flock_locks);
- rec.v2.flock_len = cpu_to_le32(2*sizeof(u32) +
- (num_fcntl_locks+num_flock_locks) *
- sizeof(struct ceph_filelock));
- unlock_flocks();
-
- /* pre-alloc pagelist */
- ceph_pagelist_truncate(pagelist, &trunc_point);
- err = ceph_pagelist_append(pagelist, &rec, reclen);
- if (!err)
- err = ceph_pagelist_reserve(pagelist,
- rec.v2.flock_len);
-
- /* encode locks */
- if (!err) {
- lock_flocks();
- err = ceph_encode_locks(inode,
- pagelist,
- num_fcntl_locks,
- num_flock_locks);
- unlock_flocks();
- }
- } while (err == -ENOSPC);
+encode_again:
+ lock_flocks();
+ ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
+ unlock_flocks();
+ flocks = kmalloc((num_fcntl_locks+num_flock_locks) *
+ sizeof(struct ceph_filelock), GFP_NOFS);
+ if (!flocks) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+ lock_flocks();
+ err = ceph_encode_locks_to_buffer(inode, flocks,
+ num_fcntl_locks,
+ num_flock_locks);
+ unlock_flocks();
+ if (err) {
+ kfree(flocks);
+ if (err == -ENOSPC)
+ goto encode_again;
+ goto out_free;
+ }
+ /*
+ * number of encoded locks is stable, so copy to pagelist
+ */
+ rec.v2.flock_len = cpu_to_le32(2*sizeof(u32) +
+ (num_fcntl_locks+num_flock_locks) *
+ sizeof(struct ceph_filelock));
+ err = ceph_pagelist_append(pagelist, &rec, reclen);
+ if (!err)
+ err = ceph_locks_to_pagelist(flocks, pagelist,
+ num_fcntl_locks,
+ num_flock_locks);
+ kfree(flocks);
} else {
err = ceph_pagelist_append(pagelist, &rec, reclen);
}
-
out_free:
kfree(path);
out_dput:
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -841,8 +841,13 @@ extern const struct export_operations ce
extern int ceph_lock(struct file *file, int cmd, struct file_lock *fl);
extern int ceph_flock(struct file *file, int cmd, struct file_lock *fl);
extern void ceph_count_locks(struct inode *inode, int *p_num, int *f_num);
-extern int ceph_encode_locks(struct inode *i, struct ceph_pagelist *p,
- int p_locks, int f_locks);
+extern int ceph_encode_locks_to_buffer(struct inode *inode,
+ struct ceph_filelock *flocks,
+ int num_fcntl_locks,
+ int num_flock_locks);
+extern int ceph_locks_to_pagelist(struct ceph_filelock *flocks,
+ struct ceph_pagelist *pagelist,
+ int num_fcntl_locks, int num_flock_locks);
extern int lock_to_ceph_filelock(struct file_lock *fl, struct ceph_filelock *c);

/* debugfs.c */

2013-06-18 16:30:00

by Greg KH

[permalink] [raw]
Subject: [ 13/48] s390/pci: Implement IRQ functions if !PCI

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <[email protected]>

commit c46b54f7406780ec4cf9c9124d1cfb777674dc70 upstream.

All architectures must implement IRQ functions. Since various
dependencies on !S390 were removed, there are various drivers that can
be selected but will fail to link. Provide a dummy implementation of
these functions for the !PCI case.

Signed-off-by: Ben Hutchings <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/s390/kernel/irq.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
arch/s390/pci/pci.c | 33 -------------------------
2 files changed, 64 insertions(+), 33 deletions(-)

--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -313,3 +313,67 @@ void measurement_alert_subclass_unregist
spin_unlock(&ma_subclass_lock);
}
EXPORT_SYMBOL(measurement_alert_subclass_unregister);
+
+void synchronize_irq(unsigned int irq)
+{
+ /*
+ * Not needed, the handler is protected by a lock and IRQs that occur
+ * after the handler is deleted are just NOPs.
+ */
+}
+EXPORT_SYMBOL_GPL(synchronize_irq);
+
+#ifndef CONFIG_PCI
+
+/* Only PCI devices have dynamically-defined IRQ handlers */
+
+int request_irq(unsigned int irq, irq_handler_t handler,
+ unsigned long irqflags, const char *devname, void *dev_id)
+{
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(request_irq);
+
+void free_irq(unsigned int irq, void *dev_id)
+{
+ WARN_ON(1);
+}
+EXPORT_SYMBOL_GPL(free_irq);
+
+void enable_irq(unsigned int irq)
+{
+ WARN_ON(1);
+}
+EXPORT_SYMBOL_GPL(enable_irq);
+
+void disable_irq(unsigned int irq)
+{
+ WARN_ON(1);
+}
+EXPORT_SYMBOL_GPL(disable_irq);
+
+#endif /* !CONFIG_PCI */
+
+void disable_irq_nosync(unsigned int irq)
+{
+ disable_irq(irq);
+}
+EXPORT_SYMBOL_GPL(disable_irq_nosync);
+
+unsigned long probe_irq_on(void)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(probe_irq_on);
+
+int probe_irq_off(unsigned long val)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(probe_irq_off);
+
+unsigned int probe_irq_mask(unsigned long val)
+{
+ return val;
+}
+EXPORT_SYMBOL_GPL(probe_irq_mask);
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -306,15 +306,6 @@ static int zpci_cfg_store(struct zpci_de
return rc;
}

-void synchronize_irq(unsigned int irq)
-{
- /*
- * Not needed, the handler is protected by a lock and IRQs that occur
- * after the handler is deleted are just NOPs.
- */
-}
-EXPORT_SYMBOL_GPL(synchronize_irq);
-
void enable_irq(unsigned int irq)
{
struct msi_desc *msi = irq_get_msi_desc(irq);
@@ -331,30 +322,6 @@ void disable_irq(unsigned int irq)
}
EXPORT_SYMBOL_GPL(disable_irq);

-void disable_irq_nosync(unsigned int irq)
-{
- disable_irq(irq);
-}
-EXPORT_SYMBOL_GPL(disable_irq_nosync);
-
-unsigned long probe_irq_on(void)
-{
- return 0;
-}
-EXPORT_SYMBOL_GPL(probe_irq_on);
-
-int probe_irq_off(unsigned long val)
-{
- return 0;
-}
-EXPORT_SYMBOL_GPL(probe_irq_off);
-
-unsigned int probe_irq_mask(unsigned long val)
-{
- return val;
-}
-EXPORT_SYMBOL_GPL(probe_irq_mask);
-
void pcibios_fixup_bus(struct pci_bus *bus)
{
}

2013-06-18 16:17:45

by Greg KH

[permalink] [raw]
Subject: [ 01/48] audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <[email protected]>

commit f000cfdde5de4fc15dead5ccf524359c07eadf2b upstream.

audit_log_start() does wait_for_auditd() in a loop until
audit_backlog_wait_time passes or audit_skb_queue has a room.

If signal_pending() is true this becomes a busy-wait loop, schedule() in
TASK_INTERRUPTIBLE won't block.

Thanks to Guy for fully investigating and explaining the problem.

(akpm: that'll cause the system to lock up on a non-preemptible
uniprocessor kernel)

(Guy: "Our customer was in fact running a uniprocessor machine, and they
reported a system hang.")

Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Guy Streeter <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/audit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1107,7 +1107,7 @@ static inline void audit_get_stamp(struc
static void wait_for_auditd(unsigned long sleep_time)
{
DECLARE_WAITQUEUE(wait, current);
- set_current_state(TASK_INTERRUPTIBLE);
+ set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&audit_backlog_wait, &wait);

if (audit_backlog_limit &&

2013-06-18 16:31:19

by Greg KH

[permalink] [raw]
Subject: [ 12/48] Bluetooth: Fix mgmt handling of power on failures

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hedberg <[email protected]>

commit 96570ffcca0b872dc8626e97569d2697f374d868 upstream.

If hci_dev_open fails we need to ensure that the corresponding
mgmt_set_powered command gets an appropriate response. This patch fixes
the missing response by adding a new mgmt_set_powered_failed function
that's used to indicate a power on failure to mgmt. Since a situation
with the device being rfkilled may require special handling in user
space the patch uses a new dedicated mgmt status code for this.

Signed-off-by: Johan Hedberg <[email protected]>
Acked-by: Marcel Holtmann <[email protected]>
Signed-off-by: Gustavo Padovan <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/net/bluetooth/hci_core.h | 1 +
include/net/bluetooth/mgmt.h | 1 +
net/bluetooth/hci_core.c | 6 +++++-
net/bluetooth/mgmt.c | 21 +++++++++++++++++++++
4 files changed, 28 insertions(+), 1 deletion(-)

--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1065,6 +1065,7 @@ void hci_sock_dev_event(struct hci_dev *
int mgmt_control(struct sock *sk, struct msghdr *msg, size_t len);
int mgmt_index_added(struct hci_dev *hdev);
int mgmt_index_removed(struct hci_dev *hdev);
+int mgmt_set_powered_failed(struct hci_dev *hdev, int err);
int mgmt_powered(struct hci_dev *hdev, u8 powered);
int mgmt_discoverable(struct hci_dev *hdev, u8 discoverable);
int mgmt_connectable(struct hci_dev *hdev, u8 connectable);
--- a/include/net/bluetooth/mgmt.h
+++ b/include/net/bluetooth/mgmt.h
@@ -42,6 +42,7 @@
#define MGMT_STATUS_NOT_POWERED 0x0f
#define MGMT_STATUS_CANCELLED 0x10
#define MGMT_STATUS_INVALID_INDEX 0x11
+#define MGMT_STATUS_RFKILLED 0x12

struct mgmt_hdr {
__le16 opcode;
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1139,11 +1139,15 @@ static const struct rfkill_ops hci_rfkil
static void hci_power_on(struct work_struct *work)
{
struct hci_dev *hdev = container_of(work, struct hci_dev, power_on);
+ int err;

BT_DBG("%s", hdev->name);

- if (hci_dev_open(hdev->id) < 0)
+ err = hci_dev_open(hdev->id);
+ if (err < 0) {
+ mgmt_set_powered_failed(hdev, err);
return;
+ }

if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
queue_delayed_work(hdev->req_workqueue, &hdev->power_off,
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -3124,6 +3124,27 @@ int mgmt_powered(struct hci_dev *hdev, u
return err;
}

+int mgmt_set_powered_failed(struct hci_dev *hdev, int err)
+{
+ struct pending_cmd *cmd;
+ u8 status;
+
+ cmd = mgmt_pending_find(MGMT_OP_SET_POWERED, hdev);
+ if (!cmd)
+ return -ENOENT;
+
+ if (err == -ERFKILL)
+ status = MGMT_STATUS_RFKILLED;
+ else
+ status = MGMT_STATUS_FAILED;
+
+ err = cmd_status(cmd->sk, hdev->id, MGMT_OP_SET_POWERED, status);
+
+ mgmt_pending_remove(cmd);
+
+ return err;
+}
+
int mgmt_discoverable(struct hci_dev *hdev, u8 discoverable)
{
struct cmd_lookup match = { NULL, hdev };

2013-06-18 16:31:39

by Greg KH

[permalink] [raw]
Subject: [ 11/48] Bluetooth: Fix missing length checks for L2CAP signalling PDUs

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hedberg <[email protected]>

commit cb3b3152b2f5939d67005cff841a1ca748b19888 upstream.

There has been code in place to check that the L2CAP length header
matches the amount of data received, but many PDU handlers have not been
checking that the data received actually matches that expected by the
specific PDU. This patch adds passing the length header to the specific
handler functions and ensures that those functions fail cleanly in the
case of an incorrect amount of data.

Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Gustavo Padovan <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/bluetooth/l2cap_core.c | 70 +++++++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 18 deletions(-)

--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -3568,10 +3568,14 @@ static void l2cap_conf_rfc_get(struct l2
}

static inline int l2cap_command_rej(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_cmd_rej_unk *rej = (struct l2cap_cmd_rej_unk *) data;

+ if (cmd_len < sizeof(*rej))
+ return -EPROTO;
+
if (rej->reason != L2CAP_REJ_NOT_UNDERSTOOD)
return 0;

@@ -3720,11 +3724,14 @@ sendresp:
}

static int l2cap_connect_req(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len, u8 *data)
{
struct hci_dev *hdev = conn->hcon->hdev;
struct hci_conn *hcon = conn->hcon;

+ if (cmd_len < sizeof(struct l2cap_conn_req))
+ return -EPROTO;
+
hci_dev_lock(hdev);
if (test_bit(HCI_MGMT, &hdev->dev_flags) &&
!test_and_set_bit(HCI_CONN_MGMT_CONNECTED, &hcon->flags))
@@ -3738,7 +3745,8 @@ static int l2cap_connect_req(struct l2ca
}

static int l2cap_connect_create_rsp(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_conn_rsp *rsp = (struct l2cap_conn_rsp *) data;
u16 scid, dcid, result, status;
@@ -3746,6 +3754,9 @@ static int l2cap_connect_create_rsp(stru
u8 req[128];
int err;

+ if (cmd_len < sizeof(*rsp))
+ return -EPROTO;
+
scid = __le16_to_cpu(rsp->scid);
dcid = __le16_to_cpu(rsp->dcid);
result = __le16_to_cpu(rsp->result);
@@ -3843,6 +3854,9 @@ static inline int l2cap_config_req(struc
struct l2cap_chan *chan;
int len, err = 0;

+ if (cmd_len < sizeof(*req))
+ return -EPROTO;
+
dcid = __le16_to_cpu(req->dcid);
flags = __le16_to_cpu(req->flags);

@@ -3866,7 +3880,7 @@ static inline int l2cap_config_req(struc

/* Reject if config buffer is too small. */
len = cmd_len - sizeof(*req);
- if (len < 0 || chan->conf_len + len > sizeof(chan->conf_req)) {
+ if (chan->conf_len + len > sizeof(chan->conf_req)) {
l2cap_send_cmd(conn, cmd->ident, L2CAP_CONF_RSP,
l2cap_build_conf_rsp(chan, rsp,
L2CAP_CONF_REJECT, flags), rsp);
@@ -3944,14 +3958,18 @@ unlock:
}

static inline int l2cap_config_rsp(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_conf_rsp *rsp = (struct l2cap_conf_rsp *)data;
u16 scid, flags, result;
struct l2cap_chan *chan;
- int len = le16_to_cpu(cmd->len) - sizeof(*rsp);
+ int len = cmd_len - sizeof(*rsp);
int err = 0;

+ if (cmd_len < sizeof(*rsp))
+ return -EPROTO;
+
scid = __le16_to_cpu(rsp->scid);
flags = __le16_to_cpu(rsp->flags);
result = __le16_to_cpu(rsp->result);
@@ -4052,7 +4070,8 @@ done:
}

static inline int l2cap_disconnect_req(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_disconn_req *req = (struct l2cap_disconn_req *) data;
struct l2cap_disconn_rsp rsp;
@@ -4060,6 +4079,9 @@ static inline int l2cap_disconnect_req(s
struct l2cap_chan *chan;
struct sock *sk;

+ if (cmd_len != sizeof(*req))
+ return -EPROTO;
+
scid = __le16_to_cpu(req->scid);
dcid = __le16_to_cpu(req->dcid);

@@ -4099,12 +4121,16 @@ static inline int l2cap_disconnect_req(s
}

static inline int l2cap_disconnect_rsp(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_disconn_rsp *rsp = (struct l2cap_disconn_rsp *) data;
u16 dcid, scid;
struct l2cap_chan *chan;

+ if (cmd_len != sizeof(*rsp))
+ return -EPROTO;
+
scid = __le16_to_cpu(rsp->scid);
dcid = __le16_to_cpu(rsp->dcid);

@@ -4134,11 +4160,15 @@ static inline int l2cap_disconnect_rsp(s
}

static inline int l2cap_information_req(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_info_req *req = (struct l2cap_info_req *) data;
u16 type;

+ if (cmd_len != sizeof(*req))
+ return -EPROTO;
+
type = __le16_to_cpu(req->type);

BT_DBG("type 0x%4.4x", type);
@@ -4185,11 +4215,15 @@ static inline int l2cap_information_req(
}

static inline int l2cap_information_rsp(struct l2cap_conn *conn,
- struct l2cap_cmd_hdr *cmd, u8 *data)
+ struct l2cap_cmd_hdr *cmd, u16 cmd_len,
+ u8 *data)
{
struct l2cap_info_rsp *rsp = (struct l2cap_info_rsp *) data;
u16 type, result;

+ if (cmd_len != sizeof(*rsp))
+ return -EPROTO;
+
type = __le16_to_cpu(rsp->type);
result = __le16_to_cpu(rsp->result);

@@ -5055,16 +5089,16 @@ static inline int l2cap_bredr_sig_cmd(st

switch (cmd->code) {
case L2CAP_COMMAND_REJ:
- l2cap_command_rej(conn, cmd, data);
+ l2cap_command_rej(conn, cmd, cmd_len, data);
break;

case L2CAP_CONN_REQ:
- err = l2cap_connect_req(conn, cmd, data);
+ err = l2cap_connect_req(conn, cmd, cmd_len, data);
break;

case L2CAP_CONN_RSP:
case L2CAP_CREATE_CHAN_RSP:
- err = l2cap_connect_create_rsp(conn, cmd, data);
+ err = l2cap_connect_create_rsp(conn, cmd, cmd_len, data);
break;

case L2CAP_CONF_REQ:
@@ -5072,15 +5106,15 @@ static inline int l2cap_bredr_sig_cmd(st
break;

case L2CAP_CONF_RSP:
- err = l2cap_config_rsp(conn, cmd, data);
+ err = l2cap_config_rsp(conn, cmd, cmd_len, data);
break;

case L2CAP_DISCONN_REQ:
- err = l2cap_disconnect_req(conn, cmd, data);
+ err = l2cap_disconnect_req(conn, cmd, cmd_len, data);
break;

case L2CAP_DISCONN_RSP:
- err = l2cap_disconnect_rsp(conn, cmd, data);
+ err = l2cap_disconnect_rsp(conn, cmd, cmd_len, data);
break;

case L2CAP_ECHO_REQ:
@@ -5091,11 +5125,11 @@ static inline int l2cap_bredr_sig_cmd(st
break;

case L2CAP_INFO_REQ:
- err = l2cap_information_req(conn, cmd, data);
+ err = l2cap_information_req(conn, cmd, cmd_len, data);
break;

case L2CAP_INFO_RSP:
- err = l2cap_information_rsp(conn, cmd, data);
+ err = l2cap_information_rsp(conn, cmd, cmd_len, data);
break;

case L2CAP_CREATE_CHAN_REQ:

2013-06-18 16:31:38

by Greg KH

[permalink] [raw]
Subject: [ 09/48] drm/gma500/psb: Unpin framebuffer on crtc disable

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Patrik Jakobsson <[email protected]>

commit 820de86a90089ee607d7864538c98a23b503c846 upstream.

The framebuffer needs to be unpinned in the crtc->disable callback
because of previous pinning in psb_intel_pipe_set_base(). This will fix
a memory leak where the framebuffer was released but not unpinned
properly. This patch only affects Poulsbo.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=889511
Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=812113
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Patrik Jakobsson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/gma500/psb_intel_display.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

--- a/drivers/gpu/drm/gma500/psb_intel_display.c
+++ b/drivers/gpu/drm/gma500/psb_intel_display.c
@@ -1246,6 +1246,19 @@ void psb_intel_crtc_destroy(struct drm_c
kfree(psb_intel_crtc);
}

+static void psb_intel_crtc_disable(struct drm_crtc *crtc)
+{
+ struct gtt_range *gt;
+ struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private;
+
+ crtc_funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
+
+ if (crtc->fb) {
+ gt = to_psb_fb(crtc->fb)->gtt;
+ psb_gtt_unpin(gt);
+ }
+}
+
const struct drm_crtc_helper_funcs psb_intel_helper_funcs = {
.dpms = psb_intel_crtc_dpms,
.mode_fixup = psb_intel_crtc_mode_fixup,
@@ -1253,6 +1266,7 @@ const struct drm_crtc_helper_funcs psb_i
.mode_set_base = psb_intel_pipe_set_base,
.prepare = psb_intel_crtc_prepare,
.commit = psb_intel_crtc_commit,
+ .disable = psb_intel_crtc_disable,
};

const struct drm_crtc_funcs psb_intel_crtc_funcs = {

2013-06-18 16:17:43

by Greg KH

[permalink] [raw]
Subject: [ 02/48] b43: stop format string leaking into error msgs

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

commit e0e29b683d6784ef59bbc914eac85a04b650e63c upstream.

The module parameter "fwpostfix" is userspace controllable, unfiltered,
and is used to define the firmware filename. b43_do_request_fw() populates
ctx->errors[] on error, containing the firmware filename. b43err()
parses its arguments as a format string. For systems with b43 hardware,
this could lead to a uid-0 to ring-0 escalation.

CVE-2013-2852

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/wireless/b43/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/wireless/b43/main.c
+++ b/drivers/net/wireless/b43/main.c
@@ -2451,7 +2451,7 @@ static void b43_request_firmware(struct
for (i = 0; i < B43_NR_FWTYPES; i++) {
errmsg = ctx->errors[i];
if (strlen(errmsg))
- b43err(dev->wl, errmsg);
+ b43err(dev->wl, "%s", errmsg);
}
b43_print_fw_helptext(dev->wl, 1);
goto out;

2013-06-18 16:32:17

by Greg KH

[permalink] [raw]
Subject: [ 10/48] drm/gma500/cdv: Unpin framebuffer on crtc disable

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Patrik Jakobsson <[email protected]>

commit 22e7c385a80d771aaf3a15ae7ccea3b0686bbe10 upstream.

The framebuffer needs to be unpinned in the crtc->disable callback
because of previous pinning in psb_intel_pipe_set_base(). This will fix
a memory leak where the framebuffer was released but not unpinned
properly. This patch only affects Cedarview.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=889511
Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=812113
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Patrik Jakobsson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/gma500/cdv_intel_display.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

--- a/drivers/gpu/drm/gma500/cdv_intel_display.c
+++ b/drivers/gpu/drm/gma500/cdv_intel_display.c
@@ -1750,6 +1750,19 @@ static void cdv_intel_crtc_destroy(struc
kfree(psb_intel_crtc);
}

+static void cdv_intel_crtc_disable(struct drm_crtc *crtc)
+{
+ struct gtt_range *gt;
+ struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private;
+
+ crtc_funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
+
+ if (crtc->fb) {
+ gt = to_psb_fb(crtc->fb)->gtt;
+ psb_gtt_unpin(gt);
+ }
+}
+
const struct drm_crtc_helper_funcs cdv_intel_helper_funcs = {
.dpms = cdv_intel_crtc_dpms,
.mode_fixup = cdv_intel_crtc_mode_fixup,
@@ -1757,6 +1770,7 @@ const struct drm_crtc_helper_funcs cdv_i
.mode_set_base = cdv_intel_pipe_set_base,
.prepare = cdv_intel_crtc_prepare,
.commit = cdv_intel_crtc_commit,
+ .disable = cdv_intel_crtc_disable,
};

const struct drm_crtc_funcs cdv_intel_crtc_funcs = {

2013-06-18 16:32:55

by Greg KH

[permalink] [raw]
Subject: [ 08/48] drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tony Lindgren <[email protected]>

commit 24b8256a1fb28d357bc6fa09184ba29b4255ba5c upstream.

When booted in legacy mode device_init_wakeup() gets called by
drivers/mfd/twl-core.c when the children are initialized. However, when
booted using device tree, the children are created with
of_platform_populate() instead add_children().

This means that the RTC driver will not have device_init_wakeup() set,
and we need to call it from the driver probe like RTC drivers typically
do.

Without this we cannot test PM wake-up events on omaps for cases where
there may not be any physical wake-up event.

Signed-off-by: Tony Lindgren <[email protected]>
Reported-by: Kevin Hilman <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: Jingoo Han <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/rtc/rtc-twl.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/rtc/rtc-twl.c
+++ b/drivers/rtc/rtc-twl.c
@@ -524,6 +524,7 @@ static int twl_rtc_probe(struct platform
}

platform_set_drvdata(pdev, rtc);
+ device_init_wakeup(&pdev->dev, 1);
return 0;

out2:

2013-06-18 16:33:14

by Greg KH

[permalink] [raw]
Subject: [ 07/48] rbd: dont destroy ceph_opts in rbd_add()

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Elder <[email protected]>

commit 7262cfca430a1a0e0707149af29ae86bc0ded230 upstream.

Whether rbd_client_create() successfully creates a new client or
not, it takes responsibility for getting the ceph_opts structure
it's passed destroyed. If successful, the structure becomes
associated with the created client; if not, rbd_client_create()
will destroy it.

Previously, rbd_get_client() would call ceph_destroy_options()
if rbd_get_client() failed, and that meant it got called twice.
That led freeing various pointers more than once, which is never a
good idea.

This resolves:
http://tracker.ceph.com/issues/4559

Reported-by: Dan van der Ster <[email protected]>
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Josh Durgin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/block/rbd.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -435,8 +435,8 @@ static const struct block_device_operati
};

/*
- * Initialize an rbd client instance.
- * We own *ceph_opts.
+ * Initialize an rbd client instance. Success or not, this function
+ * consumes ceph_opts.
*/
static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts)
{
@@ -583,7 +583,8 @@ static int parse_rbd_opts_token(char *c,

/*
* Get a ceph client with specific addr and configuration, if one does
- * not exist create it.
+ * not exist create it. Either way, ceph_opts is consumed by this
+ * function.
*/
static struct rbd_client *rbd_get_client(struct ceph_options *ceph_opts)
{
@@ -4104,7 +4105,6 @@ static ssize_t rbd_add(struct bus_type *
rc = PTR_ERR(rbdc);
goto err_out_args;
}
- ceph_opts = NULL; /* rbd_dev client now owns this */

/* pick the pool */
osdc = &rbdc->client->osdc;
@@ -4140,8 +4140,6 @@ err_out_rbd_dev:
err_out_client:
rbd_put_client(rbdc);
err_out_args:
- if (ceph_opts)
- ceph_destroy_options(ceph_opts);
kfree(rbd_opts);
rbd_spec_put(spec);
err_out_module:

2013-06-18 16:33:38

by Greg KH

[permalink] [raw]
Subject: [ 05/48] ceph: add cpu_to_le32() calls when encoding a reconnect capability

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jim Schutt <[email protected]>

commit c420276a532a10ef59849adc2681f45306166b89 upstream.

In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Signed-off-by: Jim Schutt <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ceph/locks.c | 7 +++++--
fs/ceph/mds_client.c | 2 +-
2 files changed, 6 insertions(+), 3 deletions(-)

--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -206,10 +206,12 @@ int ceph_encode_locks(struct inode *inod
int err = 0;
int seen_fcntl = 0;
int seen_flock = 0;
+ __le32 nlocks;

dout("encoding %d flock and %d fcntl locks", num_flock_locks,
num_fcntl_locks);
- err = ceph_pagelist_append(pagelist, &num_fcntl_locks, sizeof(u32));
+ nlocks = cpu_to_le32(num_fcntl_locks);
+ err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));
if (err)
goto fail;
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
@@ -229,7 +231,8 @@ int ceph_encode_locks(struct inode *inod
goto fail;
}

- err = ceph_pagelist_append(pagelist, &num_flock_locks, sizeof(u32));
+ nlocks = cpu_to_le32(num_flock_locks);
+ err = ceph_pagelist_append(pagelist, &nlocks, sizeof(nlocks));
if (err)
goto fail;
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2481,7 +2481,7 @@ static int encode_caps_cb(struct inode *
lock_flocks();
ceph_count_locks(inode, &num_fcntl_locks,
&num_flock_locks);
- rec.v2.flock_len = (2*sizeof(u32) +
+ rec.v2.flock_len = cpu_to_le32(2*sizeof(u32) +
(num_fcntl_locks+num_flock_locks) *
sizeof(struct ceph_filelock));
unlock_flocks();

2013-06-18 16:33:54

by Greg KH

[permalink] [raw]
Subject: [ 04/48] libceph: must hold mutex for reset_changed_osds()

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Elder <[email protected]>

commit 14d2f38df67fadee34625fcbd282ee22514c4846 upstream.

An osd client has a red-black tree describing its osds, and
occasionally we would get crashes due to one of these trees tree
becoming corrupt somehow.

The problem turned out to be that reset_changed_osds() was being
called without protection of the osd client request mutex. That
function would call __reset_osd() for any osd that had changed, and
__reset_osd() would call __remove_osd() for any osd with no
outstanding requests, and finally __remove_osd() would remove the
corresponding entry from the red-black tree. Thus, the tree was
getting modified without having any lock protection, and was
vulnerable to problems due to concurrent updates.

This appears to be the only osd tree updating path that has this
problem. It can be fairly easily fixed by moving the call up
a few lines, to just before the request mutex gets dropped
in kick_requests().

This resolves:
http://tracker.ceph.com/issues/5043

Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/ceph/osd_client.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1399,13 +1399,13 @@ static void kick_requests(struct ceph_os
__register_request(osdc, req);
__unregister_linger_request(osdc, req);
}
+ reset_changed_osds(osdc);
mutex_unlock(&osdc->request_mutex);

if (needmap) {
dout("%d requests for down osds, need new map\n", needmap);
ceph_monc_request_next_osdmap(&osdc->client->monc);
}
- reset_changed_osds(osdc);
}



2013-06-18 16:34:37

by Greg KH

[permalink] [raw]
Subject: [ 03/48] ACPI / video: Do not bind to device objects with a scan handler

From: Greg Kroah-Hartman <[email protected]>

3.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Rafael J. Wysocki" <[email protected]>

commit 8c9b7a7b2fc2750af418ddc28e707c42e78aa0bf upstream.

With the introduction of ACPI scan handlers, ACPI device objects
with an ACPI scan handler attached to them must not be bound to
by ACPI drivers any more. Unfortunately, however, the ACPI video
driver attempts to do just that if there is a _ROM ACPI control
method defined under a device object with an ACPI scan handler.

Prevent that from happening by making the video driver's "add"
routine check if the device object already has an ACPI scan handler
attached to it and return an error code in that case.

That is not sufficient, though, because acpi_bus_driver_init() would
then clear the device object's driver_data that may be set by its
scan handler, so for the fix to work acpi_bus_driver_init() has to be
modified to leave driver_data as is on errors.

References: https://bugzilla.kernel.org/show_bug.cgi?id=58091
Bisected-and-tested-by: Dmitry S. Demin <[email protected]>
Reported-and-tested-by: Jason Cassell <[email protected]>
Tracked-down-by: Aaron Lu <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Aaron Lu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/acpi/scan.c | 5 +----
drivers/acpi/video.c | 3 +++
2 files changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -830,11 +830,8 @@ acpi_bus_driver_init(struct acpi_device
return -ENOSYS;

result = driver->ops.add(device);
- if (result) {
- device->driver = NULL;
- device->driver_data = NULL;
+ if (result)
return result;
- }

device->driver = driver;

--- a/drivers/acpi/video.c
+++ b/drivers/acpi/video.c
@@ -1646,6 +1646,9 @@ static int acpi_video_bus_add(struct acp
int error;
acpi_status status;

+ if (device->handler)
+ return -EINVAL;
+
status = acpi_walk_namespace(ACPI_TYPE_DEVICE,
device->parent->handle, 1,
acpi_video_bus_match, NULL,

2013-06-18 17:35:48

by Ben Hutchings

[permalink] [raw]
Subject: Re: [ 13/48] s390/pci: Implement IRQ functions if !PCI

On Tue, Jun 18, 2013 at 09:17:39AM -0700, Greg Kroah-Hartman wrote:
> From: Greg Kroah-Hartman <[email protected]>
>
> 3.9-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Ben Hutchings <[email protected]>
>
> commit c46b54f7406780ec4cf9c9124d1cfb777674dc70 upstream.
>
> All architectures must implement IRQ functions. Since various
> dependencies on !S390 were removed, there are various drivers that can
> be selected but will fail to link. Provide a dummy implementation of
> these functions for the !PCI case.
[...]

This breaks !SMP builds, so it's probably best to defer this until the
following fix is in mainline.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

2013-06-18 17:42:38

by Greg KH

[permalink] [raw]
Subject: Re: [ 13/48] s390/pci: Implement IRQ functions if !PCI

On Tue, Jun 18, 2013 at 06:35:40PM +0100, Ben Hutchings wrote:
> On Tue, Jun 18, 2013 at 09:17:39AM -0700, Greg Kroah-Hartman wrote:
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > 3.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Ben Hutchings <[email protected]>
> >
> > commit c46b54f7406780ec4cf9c9124d1cfb777674dc70 upstream.
> >
> > All architectures must implement IRQ functions. Since various
> > dependencies on !S390 were removed, there are various drivers that can
> > be selected but will fail to link. Provide a dummy implementation of
> > these functions for the !PCI case.
> [...]
>
> This breaks !SMP builds, so it's probably best to defer this until the
> following fix is in mainline.

Can you resend the needed git ids to me when they both go in, otherwise
I'm going to forget about this.

thanks,

greg k-h

2013-06-18 21:35:40

by Ben Hutchings

[permalink] [raw]
Subject: Re: [ 13/48] s390/pci: Implement IRQ functions if !PCI

On Tue, 2013-06-18 at 10:42 -0700, Greg Kroah-Hartman wrote:
> On Tue, Jun 18, 2013 at 06:35:40PM +0100, Ben Hutchings wrote:
> > On Tue, Jun 18, 2013 at 09:17:39AM -0700, Greg Kroah-Hartman wrote:
> > > From: Greg Kroah-Hartman <[email protected]>
> > >
> > > 3.9-stable review patch. If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Ben Hutchings <[email protected]>
> > >
> > > commit c46b54f7406780ec4cf9c9124d1cfb777674dc70 upstream.
> > >
> > > All architectures must implement IRQ functions. Since various
> > > dependencies on !S390 were removed, there are various drivers that can
> > > be selected but will fail to link. Provide a dummy implementation of
> > > these functions for the !PCI case.
> > [...]
> >
> > This breaks !SMP builds, so it's probably best to defer this until the
> > following fix is in mainline.
>
> Can you resend the needed git ids to me when they both go in, otherwise
> I'm going to forget about this.

If you forget then you'll send me a reminder saying FAILED, so I can
tell you then. :-)

Ben.

--
Ben Hutchings
Humans are not rational beings; they are rationalising beings.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2013-06-18 21:55:19

by Shuah Khan

[permalink] [raw]
Subject: Re: [ 00/48] 3.9.7-stable review

On 06/18/2013 11:24 AM, Greg Kroah-Hartman wrote:
> From: Greg Kroah-Hartman <[email protected]>
>
> This is the start of the stable review cycle for the 3.9.7 release.
> There are 48 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.9.7-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Patches applied cleanly to 3.0.82, 3.4.49, and 3.9.6

Compiled and booted on the following systems:

Samsung Series 9 900X4C Intel Corei5:
(3.4.50-rc1, and 3.9.7-rc1)
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics:
(3.0.83-rc1, 3.4.50-rc1, and 3.9.7-rc1)

dmesgs for all releases look good. No regressions compared to the
previous dmesgs for each of these releases.

Reviewed patches.

Cross-compile testing:
HP Compaq dc7700 SFF desktop: x86-64 Intel Core-i2:
(3.0.83-rc1, 3.4.50-rc1, and 3.9.7-rc1)

Cross-compile tests results:

alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.9.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y and 3.9.y
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

thanks,
-- Shuah

Shuah Khan, Linux Kernel Developer - Open Source Group Samsung Research
America (Silicon Valley) [email protected] | (970) 672-0658

2013-06-18 22:11:18

by Greg KH

[permalink] [raw]
Subject: Re: [ 00/48] 3.9.7-stable review

On Tue, Jun 18, 2013 at 09:55:13PM +0000, Shuah Khan wrote:
> On 06/18/2013 11:24 AM, Greg Kroah-Hartman wrote:
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > This is the start of the stable review cycle for the 3.9.7 release.
> > There are 48 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.9.7-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Patches applied cleanly to 3.0.82, 3.4.49, and 3.9.6
>
> Compiled and booted on the following systems:
>
> Samsung Series 9 900X4C Intel Corei5:
> (3.4.50-rc1, and 3.9.7-rc1)
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics:
> (3.0.83-rc1, 3.4.50-rc1, and 3.9.7-rc1)
>
> dmesgs for all releases look good. No regressions compared to the
> previous dmesgs for each of these releases.

Great, thanks for testing and letting me know.

greg k-h

2013-06-18 22:58:42

by Guenter Roeck

[permalink] [raw]
Subject: Re: [ 00/48] 3.9.7-stable review

On Tue, Jun 18, 2013 at 09:17:26AM -0700, Greg Kroah-Hartman wrote:
> From: Greg Kroah-Hartman <[email protected]>
>
> This is the start of the stable review cycle for the 3.9.7 release.
> There are 48 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.9.7-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Test build results:

Build reference: v3.9.6-48-g72559a8

Build x86_64:defconfig passed
Build x86_64:allyesconfig passed
Build x86_64:allmodconfig passed
Build x86_64:allnoconfig passed
Build x86_64:alldefconfig passed
Build i386:defconfig passed
Build i386:allyesconfig passed
Build i386:allmodconfig passed
Build i386:allnoconfig passed
Build i386:alldefconfig passed
Build mips:defconfig passed
Build mips:bcm47xx_defconfig passed
Build mips:bcm63xx_defconfig passed
Build mips:nlm_xlp_defconfig passed
Build mips:ath79_defconfig passed
Build mips:ar7_defconfig passed
Build mips:fuloong2e_defconfig passed
Build mips:e55_defconfig passed
Build mips:cavium_octeon_defconfig passed
Build powerpc:defconfig passed
Build powerpc:allyesconfig failed
Build powerpc:allmodconfig passed
Build powerpc:mpc85xx_defconfig passed
Build powerpc:mpc85xx_smp_defconfig passed
Build powerpc:tqm8xx_defconfig passed
Build powerpc:85xx/sbc8548_defconfig passed
Build powerpc:83xx/mpc834x_mds_defconfig passed
Build powerpc:86xx/sbc8641d_defconfig passed
Build arm:defconfig passed
Build arm:allyesconfig failed
Build arm:allmodconfig failed
Build arm:exynos4_defconfig passed
Build arm:kirkwood_defconfig passed
Build arm:omap2plus_defconfig passed
Build arm:tegra_defconfig passed
Build arm:u8500_defconfig passed
Build m68k:defconfig passed
Build m68k:apollo_defconfig passed
Build m68k:m5272c3_defconfig failed
Build m68k:m5307c3_defconfig passed
Build m68k:mac_defconfig passed
Build m68k:multi_defconfig passed
Build m68k:sun3_defconfig passed
Build m68k:sun3x_defconfig passed
Build m68k:mvme16x_defconfig passed
Build m68k:hp300_defconfig passed

-----------------------
Total builds: 46 Total build errors: 4

Results match the results for v3.9.6.

Guenter

2013-06-18 23:27:03

by Greg KH

[permalink] [raw]
Subject: Re: [ 00/48] 3.9.7-stable review

On Tue, Jun 18, 2013 at 03:58:37PM -0700, Guenter Roeck wrote:
> On Tue, Jun 18, 2013 at 09:17:26AM -0700, Greg Kroah-Hartman wrote:
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > This is the start of the stable review cycle for the 3.9.7 release.
> > There are 48 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.9.7-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
> Test build results:
>
> Build reference: v3.9.6-48-g72559a8
>
> Build x86_64:defconfig passed
> Build x86_64:allyesconfig passed
> Build x86_64:allmodconfig passed
> Build x86_64:allnoconfig passed
> Build x86_64:alldefconfig passed
> Build i386:defconfig passed
> Build i386:allyesconfig passed
> Build i386:allmodconfig passed
> Build i386:allnoconfig passed
> Build i386:alldefconfig passed
> Build mips:defconfig passed
> Build mips:bcm47xx_defconfig passed
> Build mips:bcm63xx_defconfig passed
> Build mips:nlm_xlp_defconfig passed
> Build mips:ath79_defconfig passed
> Build mips:ar7_defconfig passed
> Build mips:fuloong2e_defconfig passed
> Build mips:e55_defconfig passed
> Build mips:cavium_octeon_defconfig passed
> Build powerpc:defconfig passed
> Build powerpc:allyesconfig failed
> Build powerpc:allmodconfig passed
> Build powerpc:mpc85xx_defconfig passed
> Build powerpc:mpc85xx_smp_defconfig passed
> Build powerpc:tqm8xx_defconfig passed
> Build powerpc:85xx/sbc8548_defconfig passed
> Build powerpc:83xx/mpc834x_mds_defconfig passed
> Build powerpc:86xx/sbc8641d_defconfig passed
> Build arm:defconfig passed
> Build arm:allyesconfig failed
> Build arm:allmodconfig failed
> Build arm:exynos4_defconfig passed
> Build arm:kirkwood_defconfig passed
> Build arm:omap2plus_defconfig passed
> Build arm:tegra_defconfig passed
> Build arm:u8500_defconfig passed
> Build m68k:defconfig passed
> Build m68k:apollo_defconfig passed
> Build m68k:m5272c3_defconfig failed
> Build m68k:m5307c3_defconfig passed
> Build m68k:mac_defconfig passed
> Build m68k:multi_defconfig passed
> Build m68k:sun3_defconfig passed
> Build m68k:sun3x_defconfig passed
> Build m68k:mvme16x_defconfig passed
> Build m68k:hp300_defconfig passed
>
> -----------------------
> Total builds: 46 Total build errors: 4
>
> Results match the results for v3.9.6.

Thanks for testing and letting me know.

greg k-h

2013-06-19 07:09:10

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [ 13/48] s390/pci: Implement IRQ functions if !PCI

On Tue, 18 Jun 2013 18:35:40 +0100
Ben Hutchings <[email protected]> wrote:

> On Tue, Jun 18, 2013 at 09:17:39AM -0700, Greg Kroah-Hartman wrote:
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > 3.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Ben Hutchings <[email protected]>
> >
> > commit c46b54f7406780ec4cf9c9124d1cfb777674dc70 upstream.
> >
> > All architectures must implement IRQ functions. Since various
> > dependencies on !S390 were removed, there are various drivers that can
> > be selected but will fail to link. Provide a dummy implementation of
> > these functions for the !PCI case.
> [...]
>
> This breaks !SMP builds, so it's probably best to defer this until the
> following fix is in mainline.

I guess that all of the relevant kernels that are build for s390 are SMP
enabled. The patch fixes fallout in a number of PCI drivers and should
go in as is. The patch to fix the !SMP build will go on top.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2013-06-20 09:54:43

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

Hi Naoya,

At Tue, 18 Jun 2013 09:17:55 -0700,
Greg Kroah-Hartman wrote:
>
> From: Greg Kroah-Hartman <[email protected]>
>
> 3.9-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Naoya Horiguchi <[email protected]>
>
> commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
>
> When we have a page fault for the address which is backed by a hugepage
> under migration, the kernel can't wait correctly and do busy looping on
> hugepage fault until the migration finishes. As a result, users who try
> to kick hugepage migration (via soft offlining, for example) occasionally
> experience long delay or soft lockup.
>
> This is because pte_offset_map_lock() can't get a correct migration entry
> or a correct page table lock for hugepage. This patch introduces
> migration_entry_wait_huge() to solve this.

I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().

Correct me if I'm wrong.

Thanks,
Satoru

>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Reviewed-by: Rik van Riel <[email protected]>
> Reviewed-by: Wanpeng Li <[email protected]>
> Reviewed-by: Michal Hocko <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> include/linux/swapops.h | 3 +++
> mm/hugetlb.c | 2 +-
> mm/migrate.c | 23 ++++++++++++++++++-----
> 3 files changed, 22 insertions(+), 6 deletions(-)
>
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -137,6 +137,7 @@ static inline void make_migration_entry_
>
> extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> unsigned long address);
> +extern void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte);
> #else
>
> #define make_migration_entry(page, write) swp_entry(0, 0)
> @@ -148,6 +149,8 @@ static inline int is_migration_entry(swp
> static inline void make_migration_entry_read(swp_entry_t *entryp) { }
> static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> unsigned long address) { }
> +static inline void migration_entry_wait_huge(struct mm_struct *mm,
> + pte_t *pte) { }
> static inline int is_write_migration_entry(swp_entry_t entry)
> {
> return 0;
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm,
> if (ptep) {
> entry = huge_ptep_get(ptep);
> if (unlikely(is_hugetlb_entry_migration(entry))) {
> - migration_entry_wait(mm, (pmd_t *)ptep, address);
> + migration_entry_wait_huge(mm, ptep);
> return 0;
> } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> return VM_FAULT_HWPOISON_LARGE |
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -200,15 +200,14 @@ static void remove_migration_ptes(struct
> * get to the page and wait until migration is finished.
> * When we return from this function the fault will be retried.
> */
> -void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> - unsigned long address)
> +static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> + spinlock_t *ptl)
> {
> - pte_t *ptep, pte;
> - spinlock_t *ptl;
> + pte_t pte;
> swp_entry_t entry;
> struct page *page;
>
> - ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> + spin_lock(ptl);
> pte = *ptep;
> if (!is_swap_pte(pte))
> goto out;
> @@ -236,6 +235,20 @@ out:
> pte_unmap_unlock(ptep, ptl);
> }
>
> +void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> + unsigned long address)
> +{
> + spinlock_t *ptl = pte_lockptr(mm, pmd);
> + pte_t *ptep = pte_offset_map(pmd, address);
> + __migration_entry_wait(mm, ptep, ptl);
> +}
> +
> +void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte)
> +{
> + spinlock_t *ptl = &(mm)->page_table_lock;
> + __migration_entry_wait(mm, pte, ptl);
> +}
> +
> #ifdef CONFIG_BLOCK
> /* Returns true if all buffers are successfully locked */
> static bool buffer_migrate_lock_buffers(struct buffer_head *head,
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-06-20 10:02:50

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [ 00/48] 3.9.7-stable review

At Tue, 18 Jun 2013 09:17:26 -0700,
Greg Kroah-Hartman wrote:
>
> From: Greg Kroah-Hartman <[email protected]>
>
> This is the start of the stable review cycle for the 3.9.7 release.
> There are 48 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
> Anything received after that time might be too late.

This kernel can be built and boot without any problem.
Building a kernel with this kernel also works fine.

- Build Machine: debian jessy x86_64
CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz x 4
memory: 8GB

- Test machine: debian jessy x86_64(KVM guest on the Build Machine)
vCPU: x2
memory: 2GB

I reviewed the following patches.

The following patch seems to have a problem and I'm asking Naoya now.

> Naoya Horiguchi <[email protected]>
> mm: migration: add migrate_entry_wait_huge()

The following patches looks good to me.

> Kees Cook <[email protected]>
> x86: Fix typo in kexec register clearing
...
> Rafael Aquini <[email protected]>
> swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
...
> Oleg Nesterov <[email protected]>
> audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE

Thanks,
Satoru

2013-06-20 17:01:35

by Greg KH

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

On Thu, Jun 20, 2013 at 06:52:43PM +0900, Satoru Takeuchi wrote:
> Hi Naoya,
>
> At Tue, 18 Jun 2013 09:17:55 -0700,
> Greg Kroah-Hartman wrote:
> >
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > 3.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Naoya Horiguchi <[email protected]>
> >
> > commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
> >
> > When we have a page fault for the address which is backed by a hugepage
> > under migration, the kernel can't wait correctly and do busy looping on
> > hugepage fault until the migration finishes. As a result, users who try
> > to kick hugepage migration (via soft offlining, for example) occasionally
> > experience long delay or soft lockup.
> >
> > This is because pte_offset_map_lock() can't get a correct migration entry
> > or a correct page table lock for hugepage. This patch introduces
> > migration_entry_wait_huge() to solve this.
>
> I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
> If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
> this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
> If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().

Have you tried this?

Also, the same issue is still in 3.10-rc6, right?

thanks,

greg k-h

2013-06-20 19:22:01

by Greg KH

[permalink] [raw]
Subject: Re: [ 13/48] s390/pci: Implement IRQ functions if !PCI

On Tue, Jun 18, 2013 at 06:35:40PM +0100, Ben Hutchings wrote:
> On Tue, Jun 18, 2013 at 09:17:39AM -0700, Greg Kroah-Hartman wrote:
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > 3.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Ben Hutchings <[email protected]>
> >
> > commit c46b54f7406780ec4cf9c9124d1cfb777674dc70 upstream.
> >
> > All architectures must implement IRQ functions. Since various
> > dependencies on !S390 were removed, there are various drivers that can
> > be selected but will fail to link. Provide a dummy implementation of
> > these functions for the !PCI case.
> [...]
>
> This breaks !SMP builds, so it's probably best to defer this until the
> following fix is in mainline.

Now dropped, thanks.

greg k-h

2013-06-20 17:00:52

by Greg KH

[permalink] [raw]
Subject: Re: [ 00/48] 3.9.7-stable review

On Thu, Jun 20, 2013 at 07:02:42PM +0900, Satoru Takeuchi wrote:
> At Tue, 18 Jun 2013 09:17:26 -0700,
> Greg Kroah-Hartman wrote:
> >
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > This is the start of the stable review cycle for the 3.9.7 release.
> > There are 48 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Jun 20 16:15:42 UTC 2013.
> > Anything received after that time might be too late.
>
> This kernel can be built and boot without any problem.
> Building a kernel with this kernel also works fine.

Thanks for testing and letting us know.

greg k-h

2013-06-21 12:47:34

by Michal Hocko

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

On Thu 20-06-13 18:52:43, Satoru Takeuchi wrote:
> Hi Naoya,
>
> At Tue, 18 Jun 2013 09:17:55 -0700,
> Greg Kroah-Hartman wrote:
> >
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > 3.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Naoya Horiguchi <[email protected]>
> >
> > commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
> >
> > When we have a page fault for the address which is backed by a hugepage
> > under migration, the kernel can't wait correctly and do busy looping on
> > hugepage fault until the migration finishes. As a result, users who try
> > to kick hugepage migration (via soft offlining, for example) occasionally
> > experience long delay or soft lockup.
> >
> > This is because pte_offset_map_lock() can't get a correct migration entry
> > or a correct page table lock for hugepage. This patch introduces
> > migration_entry_wait_huge() to solve this.
>
> I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
> If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
> this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
> If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().
>
> Correct me if I'm wrong.

I haven't checked the code closer but the patch doesn't change anything
regarding pte_unmap_unlock. The only thing that it touches is the
_locking_. So whether there ever was a problem with kmap or not this
patch doesn't change it.

That being said, the patch is OK for the stable tree.

>
> Thanks,
> Satoru
>
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > Reviewed-by: Rik van Riel <[email protected]>
> > Reviewed-by: Wanpeng Li <[email protected]>
> > Reviewed-by: Michal Hocko <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Andi Kleen <[email protected]>
> > Cc: KOSAKI Motohiro <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > Signed-off-by: Linus Torvalds <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >
> > ---
> > include/linux/swapops.h | 3 +++
> > mm/hugetlb.c | 2 +-
> > mm/migrate.c | 23 ++++++++++++++++++-----
> > 3 files changed, 22 insertions(+), 6 deletions(-)
> >
> > --- a/include/linux/swapops.h
> > +++ b/include/linux/swapops.h
> > @@ -137,6 +137,7 @@ static inline void make_migration_entry_
> >
> > extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> > unsigned long address);
> > +extern void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte);
> > #else
> >
> > #define make_migration_entry(page, write) swp_entry(0, 0)
> > @@ -148,6 +149,8 @@ static inline int is_migration_entry(swp
> > static inline void make_migration_entry_read(swp_entry_t *entryp) { }
> > static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> > unsigned long address) { }
> > +static inline void migration_entry_wait_huge(struct mm_struct *mm,
> > + pte_t *pte) { }
> > static inline int is_write_migration_entry(swp_entry_t entry)
> > {
> > return 0;
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm,
> > if (ptep) {
> > entry = huge_ptep_get(ptep);
> > if (unlikely(is_hugetlb_entry_migration(entry))) {
> > - migration_entry_wait(mm, (pmd_t *)ptep, address);
> > + migration_entry_wait_huge(mm, ptep);
> > return 0;
> > } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> > return VM_FAULT_HWPOISON_LARGE |
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -200,15 +200,14 @@ static void remove_migration_ptes(struct
> > * get to the page and wait until migration is finished.
> > * When we return from this function the fault will be retried.
> > */
> > -void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> > - unsigned long address)
> > +static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> > + spinlock_t *ptl)
> > {
> > - pte_t *ptep, pte;
> > - spinlock_t *ptl;
> > + pte_t pte;
> > swp_entry_t entry;
> > struct page *page;
> >
> > - ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> > + spin_lock(ptl);
> > pte = *ptep;
> > if (!is_swap_pte(pte))
> > goto out;
> > @@ -236,6 +235,20 @@ out:
> > pte_unmap_unlock(ptep, ptl);
> > }
> >
> > +void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> > + unsigned long address)
> > +{
> > + spinlock_t *ptl = pte_lockptr(mm, pmd);
> > + pte_t *ptep = pte_offset_map(pmd, address);
> > + __migration_entry_wait(mm, ptep, ptl);
> > +}
> > +
> > +void migration_entry_wait_huge(struct mm_struct *mm, pte_t *pte)
> > +{
> > + spinlock_t *ptl = &(mm)->page_table_lock;
> > + __migration_entry_wait(mm, pte, ptl);
> > +}
> > +
> > #ifdef CONFIG_BLOCK
> > /* Returns true if all buffers are successfully locked */
> > static bool buffer_migrate_lock_buffers(struct buffer_head *head,
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe stable" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Michal Hocko
SUSE Labs

2013-06-21 22:38:49

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

At Thu, 20 Jun 2013 10:02:13 -0700,
Greg Kroah-Hartman wrote:
>
> On Thu, Jun 20, 2013 at 06:52:43PM +0900, Satoru Takeuchi wrote:
> > Hi Naoya,
> >
> > At Tue, 18 Jun 2013 09:17:55 -0700,
> > Greg Kroah-Hartman wrote:
> > >
> > > From: Greg Kroah-Hartman <[email protected]>
> > >
> > > 3.9-stable review patch. If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Naoya Horiguchi <[email protected]>
> > >
> > > commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
> > >
> > > When we have a page fault for the address which is backed by a hugepage
> > > under migration, the kernel can't wait correctly and do busy looping on
> > > hugepage fault until the migration finishes. As a result, users who try
> > > to kick hugepage migration (via soft offlining, for example) occasionally
> > > experience long delay or soft lockup.
> > >
> > > This is because pte_offset_map_lock() can't get a correct migration entry
> > > or a correct page table lock for hugepage. This patch introduces
> > > migration_entry_wait_huge() to solve this.
> >
> > I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
> > If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
> > this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
> > If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().
>
> Have you tried this?

Not yet. I'm now preparing the kernel to reproduce this problem.

>
> Also, the same issue is still in 3.10-rc6, right?

Yes.

Thanks,
Satoru

2013-06-21 22:56:35

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

At Fri, 21 Jun 2013 14:47:27 +0200,
Michal Hocko wrote:
>
> On Thu 20-06-13 18:52:43, Satoru Takeuchi wrote:
> > Hi Naoya,
> >
> > At Tue, 18 Jun 2013 09:17:55 -0700,
> > Greg Kroah-Hartman wrote:
> > >
> > > From: Greg Kroah-Hartman <[email protected]>
> > >
> > > 3.9-stable review patch. If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Naoya Horiguchi <[email protected]>
> > >
> > > commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
> > >
> > > When we have a page fault for the address which is backed by a hugepage
> > > under migration, the kernel can't wait correctly and do busy looping on
> > > hugepage fault until the migration finishes. As a result, users who try
> > > to kick hugepage migration (via soft offlining, for example) occasionally
> > > experience long delay or soft lockup.
> > >
> > > This is because pte_offset_map_lock() can't get a correct migration entry
> > > or a correct page table lock for hugepage. This patch introduces
> > > migration_entry_wait_huge() to solve this.
> >
> > I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
> > If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
> > this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
> > If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().
> >
> > Correct me if I'm wrong.
>
> I haven't checked the code closer but the patch doesn't change anything
> regarding pte_unmap_unlock. The only thing that it touches is the
> _locking_. So whether there ever was a problem with kmap or not this
> patch doesn't change it.
>
> That being said, the patch is OK for the stable tree.

OK, I see. This is a different problem from this patch. If I'll find a real problem
with kmap, I'll report it as the different problem.

Thanks,
Satoru

2013-06-22 12:29:03

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

> > > > From: Greg Kroah-Hartman <[email protected]>
> > > >
> > > > 3.9-stable review patch. If anyone has any objections, please let me know.
> > > >
> > > > ------------------
> > > >
> > > > From: Naoya Horiguchi <[email protected]>
> > > >
> > > > commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
> > > >
> > > > When we have a page fault for the address which is backed by a hugepage
> > > > under migration, the kernel can't wait correctly and do busy looping on
> > > > hugepage fault until the migration finishes. As a result, users who try
> > > > to kick hugepage migration (via soft offlining, for example) occasionally
> > > > experience long delay or soft lockup.
> > > >
> > > > This is because pte_offset_map_lock() can't get a correct migration entry
> > > > or a correct page table lock for hugepage. This patch introduces
> > > > migration_entry_wait_huge() to solve this.
> > >
> > > I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
> > > If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
> > > this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
> > > If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().
> > >
> > > Correct me if I'm wrong.
> >
> > I haven't checked the code closer but the patch doesn't change anything
> > regarding pte_unmap_unlock. The only thing that it touches is the
> > _locking_. So whether there ever was a problem with kmap or not this
> > patch doesn't change it.
> >
> > That being said, the patch is OK for the stable tree.
>
> OK, I see. This is a different problem from this patch. If I'll find a real problem
> with kmap, I'll report it as the different problem.

As the result of detailed code review and test, I confirmed there is no problem about
this code path. Sorry for the noise.

Thanks,
Satoru

2013-06-27 18:48:53

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [ 29/48] mm: migration: add migrate_entry_wait_huge()

Takeuchi-san,
Sorry for the late response, I was on vacation this 2 weeks.

And as your mentioned in the later discussion in this thread,
the problem you worry about does not come from this patch.
So this patch is ok as it is.

Thanks,
Naoya Horiguchi

On Thu, Jun 20, 2013 at 06:52:43PM +0900, Satoru Takeuchi wrote:
> Hi Naoya,
>
> At Tue, 18 Jun 2013 09:17:55 -0700,
> Greg Kroah-Hartman wrote:
> >
> > From: Greg Kroah-Hartman <[email protected]>
> >
> > 3.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Naoya Horiguchi <[email protected]>
> >
> > commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.
> >
> > When we have a page fault for the address which is backed by a hugepage
> > under migration, the kernel can't wait correctly and do busy looping on
> > hugepage fault until the migration finishes. As a result, users who try
> > to kick hugepage migration (via soft offlining, for example) occasionally
> > experience long delay or soft lockup.
> >
> > This is because pte_offset_map_lock() can't get a correct migration entry
> > or a correct page table lock for hugepage. This patch introduces
> > migration_entry_wait_huge() to solve this.
>
> I suspect that this code doesn't work correctly on i686 box with CONFIG_HIGHPTE.
> If we call hugetlb_fault() -> migration_entry_wait_huge() -> __migration_entry_wait(),
> this function tries to kunmap pte, in this case pte is not-kmapped pmd, via pte_unmap_unlock().
> If CONFIG_DEBUG_HIGHMEM is also enabled, it results in BUG_ON() at __kunmap_atomic().
>
> Correct me if I'm wrong.