2015-08-24 09:09:27

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 00/82] 3.12.47-stable review

This is the start of the stable review cycle for the 3.12.47 release.
There are 82 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Aug 26 11:08:59 CEST 2015.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
http://kernel.org/pub/linux/kernel/people/jirislaby/stable-review/patch-3.12.47-rc1.xz
and the diffstat can be found below.

thanks,
js

===============


Al Viro (2):
freeing unlinked file indefinitely delayed
path_openat(): fix double fput()

Alex Deucher (2):
drm/radeon/combios: add some validation of lvds values
drm/radeon: add new OLAND pci id

Alexey Brodkin (1):
ARC: make sure instruction_pointer() returns unsigned value

Amanieu d'Antras (3):
signalfd: fix information leak in signalfd_copyinfo
signal: fix information leak in copy_siginfo_to_user
signal: fix information leak in copy_siginfo_from_user32

Andy Lutomirski (6):
x86/xen: Probe target addresses in set_aliased_prot() before the
hypercall
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
x86/nmi/64: Remove asm code that saves CR2
x86/nmi/64: Switch stacks on userspace NMI entry
x86/ldt: Make modify_ldt synchronous
x86/ldt: Further fix FPU emulation

Arnd Bergmann (2):
3w-xxxx: fix mis-aligned struct accesses
ARM: realview: fix sparsemem build

Axel Lin (1):
ASoC: pcm1681: Fix setting de-emphasis sampling rate selection

Ben Hutchings (1):
hwrng: via-rng - Mark device ID table as __maybe_unused

Benjamin Randazzo (1):
md: use kzalloc() when bitmap is disabled

Bernhard Bender (1):
Input: usbtouchscreen - avoid unresponsive TSC-30 touch screen

Bob Liu (1):
xen-blkfront: don't add indirect pages to list when
!feature_persistent

Brian Campbell (1):
xhci: Calculate old endpoints correctly on device reset

Brian King (3):
ipr: Fix locking for unit attention handling
ipr: Fix incorrect trace indexing
ipr: Fix invalid array indexing for HRRQ

Brian Silverman (1):
futex: Fix a race condition between REQUEUE_PI and task death

Chris Metcalf (1):
tile: use free_bootmem_late() for initrd

Dan Carpenter (1):
ALSA: hda - fix cs4210_spdif_automute()

David Daney (1):
MIPS: Make set_pte() SMP safe.

David S. Miller (1):
sparc64: Fix userspace FPU register corruptions.

Dirk Behme (1):
USB: sierra: add 1199:68AB device ID

Dominic Sacré (1):
ALSA: usb-audio: Add MIDI support for Steinberg MI2/MI4

Felix Fietkau (1):
MIPS: Fix sched_getaffinity with MT FPAFF enabled

Fupan Li (1):
efi: fix 32bit kernel boot failed problem using efi

Herbert Xu (1):
crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer

Herton R. Krzesinski (2):
HID: usbhid: add Chicony/Pixart usb optical mouse that needs
QUIRK_ALWAYS_POLL
ipc,sem: fix use after free on IPC_RMID after a task using same
semaphore set exits

Ilya Dryomov (1):
rbd: fix copyup completion race

Jan Kara (1):
fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

Jingju Hou (1):
mmc: sdhci-pxav3: fix platform_data is not initialized

Joakim Tjernlund (1):
mmc: sdhci-esdhc: Make 8BIT bus work

Joe Thornber (1):
dm thin metadata: delete btrees when releasing metadata snapshot

Joseph Qi (1):
ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()

Juergen Gross (2):
x86/ldt: Correct LDT access in single stepping logic
x86/ldt: Correct FPU emulation access to LDT

Kirill A. Shutemov (1):
mm: avoid setting up anonymous pages into file mapping

Lior Amsalem (1):
ata: pmp: add quirk for Marvell 4140 SATA PMP

Manfred Spraul (1):
ipc/sem.c: update/correct memory barriers

Marc-André Lureau (1):
vhost: actually track log eventfd file

Marcus Gelderie (1):
ipc: modify message queue accounting to not take kernel data
structures into account

Marek Marczykowski-Górecki (1):
xen/gntdevt: Fix race condition in gntdev_release()

Martin Schwidefsky (1):
s390/sclp: clear upper register halves in _sclp_print_early

Mathias Nyman (1):
xhci: fix off by one error in TRB DMA address boundary check

Michael Walle (1):
EDAC, ppc4xx: Access mci->csrows array elements properly

Michal Hocko (1):
mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations

Mimi Zohar (2):
ima: add support for new "euid" policy condition
ima: extend "mask" policy matching support

NeilBrown (3):
md/raid1: fix test for 'was read error from last working device'.
md/raid1: extend spinlock to protect raid1_end_read_request against
inconsistencies
md/bitmap: return an error when bitmap superblock is corrupt.

Nicholas Bellinger (3):
iscsi-target: Fix use-after-free during TPG session shutdown
iscsi-target: Fix iser explicit logout TX kthread leak
iscsi-target: Fix iscsit_start_kthreads failure OOPs

Oliver Neukum (1):
usb-storage: ignore ZTE MF 823 card reader in mode 0x1225

Paul E. McKenney (1):
rcu: Provide counterpart to rcu_dereference() for non-RCU situations

Peter Zijlstra (3):
arch: Introduce smp_load_acquire(), smp_store_release()
rcu: Move lockless_dereference() out of rcupdate.h
perf: Fix fasync handling on inherited events

Richard Weinberger (1):
localmodconfig: Use Kbuild files too

Roger Quadros (1):
ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc

Seymour, Shane M (1):
st: null pointer dereference panic caused by use after kref_put by
st_open

Takashi Iwai (1):
ALSA: hda - Fix MacBook Pro 5,2 quirk

Tejun Heo (1):
blkcg: fix gendisk reference leak in blkg_conf_prep()

Thomas Gleixner (1):
genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD

Tom Hughes (1):
mac80211: clear subdir_stations when removing debugfs

Wanpeng Li (1):
mm/hwpoison: fix page refcount of unknown non LRU page

Wengang Wang (1):
rds: rds_ib_device.refcount overflow

Xie XiuQi (1):
ipmi: fix timeout calculation when bmc is disconnected

Yao-Wen Mao (1):
ALSA: usb-audio: add dB range mapping for some devices

Zhuang Jin Can (3):
xhci: report U3 when link is in resume state
xhci: prevent bus_suspend if SS port resuming in phase 1
xhci: do not report PLC when link is in internal resume state

Documentation/ABI/testing/ima_policy | 6 +-
arch/arc/include/asm/ptrace.h | 2 +-
arch/arm/include/asm/barrier.h | 15 ++
arch/arm/mach-omap2/omap_hwmod.c | 24 ++-
arch/arm/mach-realview/include/mach/memory.h | 2 +
arch/arm64/include/asm/barrier.h | 50 +++++
arch/arm64/kernel/signal32.c | 5 +-
arch/ia64/include/asm/barrier.h | 23 +++
arch/metag/include/asm/barrier.h | 15 ++
arch/mips/include/asm/barrier.h | 15 ++
arch/mips/include/asm/pgtable.h | 31 ++++
arch/mips/kernel/mips-mt-fpaff.c | 5 +-
arch/mips/kernel/signal32.c | 2 -
arch/powerpc/include/asm/barrier.h | 21 ++-
arch/powerpc/kernel/signal_32.c | 2 -
arch/s390/include/asm/barrier.h | 15 ++
arch/s390/kernel/sclp.S | 4 +
arch/sparc/include/asm/barrier_64.h | 15 ++
arch/sparc/include/asm/visasm.h | 16 +-
arch/sparc/lib/NG4memcpy.S | 5 +-
arch/sparc/lib/VISsave.S | 67 +------
arch/sparc/lib/ksyms.c | 4 -
arch/tile/kernel/setup.c | 2 +-
arch/x86/boot/compressed/head_32.S | 2 +-
arch/x86/include/asm/barrier.h | 43 ++++-
arch/x86/include/asm/desc.h | 15 --
arch/x86/include/asm/mmu.h | 3 +-
arch/x86/include/asm/mmu_context.h | 48 ++++-
arch/x86/kernel/cpu/common.c | 4 +-
arch/x86/kernel/cpu/perf_event.c | 13 +-
arch/x86/kernel/entry_64.S | 83 ++++++---
arch/x86/kernel/ldt.c | 262 +++++++++++++++------------
arch/x86/kernel/nmi.c | 123 ++++++-------
arch/x86/kernel/process_64.c | 4 +-
arch/x86/kernel/step.c | 8 +-
arch/x86/math-emu/fpu_entry.c | 3 +-
arch/x86/math-emu/fpu_system.h | 21 ++-
arch/x86/math-emu/get_address.c | 3 +-
arch/x86/power/cpu.c | 3 +-
arch/x86/xen/enlighten.c | 40 ++++
block/blk-cgroup.c | 6 +-
drivers/ata/libata-pmp.c | 7 +
drivers/block/rbd.c | 22 ++-
drivers/block/xen-blkfront.c | 6 +-
drivers/char/hw_random/via-rng.c | 2 +-
drivers/char/ipmi/ipmi_si_intf.c | 2 +-
drivers/crypto/ixp4xx_crypto.c | 1 -
drivers/edac/ppc4xx_edac.c | 2 +-
drivers/gpu/drm/radeon/radeon_combios.c | 7 +-
drivers/hid/hid-ids.h | 1 +
drivers/hid/usbhid/hid-quirks.c | 1 +
drivers/input/touchscreen/usbtouchscreen.c | 3 +
drivers/md/bitmap.c | 2 +
drivers/md/dm-thin-metadata.c | 4 +-
drivers/md/md.c | 3 +-
drivers/md/raid1.c | 12 +-
drivers/mmc/host/sdhci-esdhc.h | 2 +-
drivers/mmc/host/sdhci-pxav3.c | 1 +
drivers/scsi/3w-xxxx.h | 4 +-
drivers/scsi/ipr.c | 28 ++-
drivers/scsi/ipr.h | 1 +
drivers/scsi/st.c | 2 +-
drivers/target/iscsi/iscsi_target.c | 48 ++++-
drivers/target/iscsi/iscsi_target_core.h | 1 +
drivers/target/iscsi/iscsi_target_login.c | 43 ++---
drivers/target/iscsi/iscsi_target_login.h | 3 +-
drivers/target/iscsi/iscsi_target_nego.c | 34 +++-
drivers/usb/host/xhci-hub.c | 22 ++-
drivers/usb/host/xhci-ring.c | 5 +-
drivers/usb/host/xhci.c | 3 +
drivers/usb/host/xhci.h | 1 +
drivers/usb/serial/sierra.c | 1 +
drivers/usb/storage/unusual_devs.h | 12 ++
drivers/vhost/vhost.c | 1 +
drivers/xen/gntdev.c | 2 +
fs/dcache.c | 3 +
fs/namei.c | 3 +-
fs/notify/mark.c | 30 ++-
fs/ocfs2/dlmglue.c | 10 +-
fs/signalfd.c | 5 +-
include/asm-generic/barrier.h | 15 ++
include/drm/drm_pciids.h | 1 +
include/linux/compiler.h | 24 +++
include/linux/rcupdate.h | 1 -
ipc/mqueue.c | 5 -
ipc/sem.c | 41 ++++-
kernel/events/core.c | 12 +-
kernel/futex.c | 22 +--
kernel/irq/resend.c | 18 +-
kernel/signal.c | 7 +-
mm/memory-failure.c | 2 +
mm/memory.c | 13 +-
mm/vmscan.c | 14 +-
net/mac80211/debugfs_netdev.c | 1 +
net/rds/ib_rdma.c | 4 +-
scripts/kconfig/streamline_config.pl | 2 +-
security/integrity/ima/ima_policy.c | 47 ++++-
sound/pci/hda/patch_cirrus.c | 4 +-
sound/pci/hda/patch_realtek.c | 2 +-
sound/soc/codecs/pcm1681.c | 2 +-
sound/usb/mixer_maps.c | 24 +++
sound/usb/quirks-table.h | 68 +++++++
102 files changed, 1185 insertions(+), 514 deletions(-)

--
2.5.0


2015-08-24 09:09:48

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 01/82] efi: fix 32bit kernel boot failed problem using efi

From: Fupan Li <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

3.12 commit 065487a10a22a960bc4e41facb011d10692ef470 ("x86/efi:
Correct EFI boot stub use of code32_start"), upstream commit
7e8213c1f3acc064aef37813a39f13cbfe7c3ce7 imported a bug, which will
cause 32bit kernel boot to fail using EFI method. It should use the
label's address instead of the value stored in the label to calculate
the address of code32_start.

Signed-off-by: Fupan Li <[email protected]>
Reviewed-by: Matt Fleming <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index b1bd969e26aa..36ddc61182af 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -54,7 +54,7 @@ ENTRY(efi_pe_entry)
call reloc
reloc:
popl %ecx
- subl reloc, %ecx
+ subl $reloc, %ecx
movl %ecx, BP_code32_start(%eax)

sub $0x4, %esp
--
2.5.0

2015-08-24 09:38:01

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 02/82] futex: Fix a race condition between REQUEUE_PI and task death

From: Brian Silverman <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 30a6b8031fe14031ab27c1fa3483cb9780e7f63c upstream.

free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
exit_pi_state_list takes the hb lock first, and most callers of
free_pi_state do too. requeue_pi doesn't, which means free_pi_state
can free the pi_state out from under exit_pi_state_list. For example:

task A | task B
exit_pi_state_list |
pi_state = |
curr->pi_state_list->next |
| futex_requeue(requeue_pi=1)
| // pi_state is the same as
| // the one in task A
| free_pi_state(pi_state)
| list_del_init(&pi_state->list)
| kfree(pi_state)
list_del_init(&pi_state->list) |

Move the free_pi_state calls in requeue_pi to before it drops the hb
locks which it's already holding.

[ tglx: Removed a pointless free_pi_state() call and the hb->lock held
debugging. The latter comes via a seperate patch ]

Signed-off-by: Brian Silverman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Mike Galbraith <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/futex.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e4b9b60e25b1..bd0bc06772f6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -490,8 +490,14 @@ static struct futex_pi_state * alloc_pi_state(void)
return pi_state;
}

+/*
+ * Must be called with the hb lock held.
+ */
static void free_pi_state(struct futex_pi_state *pi_state)
{
+ if (!pi_state)
+ return;
+
if (!atomic_dec_and_test(&pi_state->refcount))
return;

@@ -1405,15 +1411,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
}

retry:
- if (pi_state != NULL) {
- /*
- * We will have to lookup the pi_state again, so free this one
- * to keep the accounting correct.
- */
- free_pi_state(pi_state);
- pi_state = NULL;
- }
-
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
if (unlikely(ret != 0))
goto out;
@@ -1501,6 +1498,8 @@ retry_private:
case 0:
break;
case -EFAULT:
+ free_pi_state(pi_state);
+ pi_state = NULL;
double_unlock_hb(hb1, hb2);
put_futex_key(&key2);
put_futex_key(&key1);
@@ -1510,6 +1509,8 @@ retry_private:
goto out;
case -EAGAIN:
/* The owner was exiting, try again. */
+ free_pi_state(pi_state);
+ pi_state = NULL;
double_unlock_hb(hb1, hb2);
put_futex_key(&key2);
put_futex_key(&key1);
@@ -1586,6 +1587,7 @@ retry_private:
}

out_unlock:
+ free_pi_state(pi_state);
double_unlock_hb(hb1, hb2);

/*
@@ -1602,8 +1604,6 @@ out_put_keys:
out_put_key1:
put_futex_key(&key1);
out:
- if (pi_state != NULL)
- free_pi_state(pi_state);
return ret ? ret : task_count;
}

--
2.5.0

2015-08-24 09:38:03

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 03/82] HID: usbhid: add Chicony/Pixart usb optical mouse that needs QUIRK_ALWAYS_POLL

From: "Herton R. Krzesinski" <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 7250dc3fee806eb2b7560ab7d6072302e7ae8cf8 upstream.

I received a report from an user of following mouse which needs this quirk:

usb 1-1.6: USB disconnect, device number 58
usb 1-1.6: new low speed USB device number 59 using ehci_hcd
usb 1-1.6: New USB device found, idVendor=04f2, idProduct=1053
usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.6: Product: USB Optical Mouse
usb 1-1.6: Manufacturer: PixArt
usb 1-1.6: configuration #1 chosen from 1 choice
input: PixArt USB Optical Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6:1.0/input/input5887
generic-usb 0003:04F2:1053.16FE: input,hidraw2: USB HID v1.11 Mouse [PixArt USB Optical Mouse] on usb-0000:00:1a.0-1.6/input0

The quirk was tested by the reporter and it fixed the frequent disconnections etc.

[[email protected]: reorder the position in hid-ids.h]
Signed-off-by: Herton R. Krzesinski <[email protected]>
Reviewed-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Cc: Oliver Neukum <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/hid/hid-ids.h | 1 +
drivers/hid/usbhid/hid-quirks.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 2e65d7791060..6da09931a987 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -218,6 +218,7 @@
#define USB_DEVICE_ID_CHICONY_TACTICAL_PAD 0x0418
#define USB_DEVICE_ID_CHICONY_MULTI_TOUCH 0xb19d
#define USB_DEVICE_ID_CHICONY_WIRELESS 0x0618
+#define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE 0x1053
#define USB_DEVICE_ID_CHICONY_WIRELESS2 0x1123
#define USB_DEVICE_ID_CHICONY_AK1D 0x1125

diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c
index 8f884a6a8a8f..7bc98db768eb 100644
--- a/drivers/hid/usbhid/hid-quirks.c
+++ b/drivers/hid/usbhid/hid-quirks.c
@@ -69,6 +69,7 @@ static const struct hid_blacklist {
{ USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_PRO_PEDALS, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_3AXIS_5BUTTON_STICK, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_CH, USB_DEVICE_ID_CH_AXIS_295, HID_QUIRK_NOGET },
+ { USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE, HID_QUIRK_ALWAYS_POLL },
{ USB_VENDOR_ID_DMI, USB_DEVICE_ID_DMI_ENC, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_ELAN, USB_DEVICE_ID_ELAN_TOUCHSCREEN, HID_QUIRK_ALWAYS_POLL },
{ USB_VENDOR_ID_ELAN, USB_DEVICE_ID_ELAN_TOUCHSCREEN_009B, HID_QUIRK_ALWAYS_POLL },
--
2.5.0

2015-08-24 09:34:10

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 04/82] mm: avoid setting up anonymous pages into file mapping

From: "Kirill A. Shutemov" <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 6b7339f4c31ad69c8e9c0b2859276e22cf72176d upstream.

Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.

For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
mm/memory.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 38617f049b9f..d0d84c36cd5c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3213,6 +3213,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,

pte_unmap(page_table);

+ /* File mapping without ->vm_ops ? */
+ if (vma->vm_flags & VM_SHARED)
+ return VM_FAULT_SIGBUS;
+
/* Check if we need to add a guard page to the stack */
if (check_stack_guard_page(vma, address) < 0)
return VM_FAULT_SIGSEGV;
@@ -3480,6 +3484,9 @@ static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;

pte_unmap(page_table);
+ /* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
+ if (!vma->vm_ops->fault)
+ return VM_FAULT_SIGBUS;
return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}

@@ -3691,11 +3698,9 @@ static int handle_pte_fault(struct mm_struct *mm,
entry = ACCESS_ONCE(*pte);
if (!pte_present(entry)) {
if (pte_none(entry)) {
- if (vma->vm_ops) {
- if (likely(vma->vm_ops->fault))
- return do_linear_fault(mm, vma, address,
+ if (vma->vm_ops)
+ return do_linear_fault(mm, vma, address,
pte, pmd, flags, entry);
- }
return do_anonymous_page(mm, vma, address,
pte, pmd, flags);
}
--
2.5.0

2015-08-24 09:10:05

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 05/82] freeing unlinked file indefinitely delayed

From: Al Viro <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 75a6f82a0d10ef8f13cd8fe7212911a0252ab99e upstream.

Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course). However, there's one case where that
does *not* happen. Namely, if you open it by fhandle with cold dcache,
then unlink() and close().

In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal. In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()). The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in. As the result, we have the final iput() delayed
indefinitely. It's trivial to reproduce -

void flush_dcache(void)
{
system("mount -o remount,rw /");
}

static char buf[20 * 1024 * 1024];

main()
{
int fd;
union {
struct file_handle f;
char buf[MAX_HANDLE_SZ];
} x;
int m;

x.f.handle_bytes = sizeof(x);
chdir("/root");
mkdir("foo", 0700);
fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
close(fd);
name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
flush_dcache();
fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
unlink("foo/bar");
write(fd, buf, sizeof(buf));
system("df ."); /* 20Mb eaten */
close(fd);
system("df ."); /* should've freed those 20Mb */
flush_dcache();
system("df ."); /* should be the same as #2 */
}

will spit out something like
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 283282 21692 93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).

Acked-by: J. Bruce Fields <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
fs/dcache.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 64cfe24cdd88..4c227f81051b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -622,6 +622,9 @@ repeat:
if (unlikely(d_unhashed(dentry)))
goto kill_it;

+ if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
+ goto kill_it;
+
if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
if (dentry->d_op->d_delete(dentry))
goto kill_it;
--
2.5.0

2015-08-24 09:37:21

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 06/82] s390/sclp: clear upper register halves in _sclp_print_early

From: Martin Schwidefsky <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f9c87a6f46d508eae0d9ae640be98d50f237f827 upstream.

If the kernel is compiled with gcc 5.1 and the XZ compression option
the decompress_kernel function calls _sclp_print_early in 64-bit mode
while the content of the upper register half of %r6 is non-zero.
This causes a specification exception on the servc instruction in
_sclp_servc.

The _sclp_print_early function saves and restores the upper registers
halves but it fails to clear them for the 31-bit code of the mini sclp
driver.

Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/s390/kernel/sclp.S | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/s390/kernel/sclp.S b/arch/s390/kernel/sclp.S
index 29bd7bec4176..1ecd47b5e250 100644
--- a/arch/s390/kernel/sclp.S
+++ b/arch/s390/kernel/sclp.S
@@ -276,6 +276,8 @@ ENTRY(_sclp_print_early)
jno .Lesa2
ahi %r15,-80
stmh %r6,%r15,96(%r15) # store upper register halves
+ basr %r13,0
+ lmh %r0,%r15,.Lzeroes-.(%r13) # clear upper register halves
.Lesa2:
#endif
lr %r10,%r2 # save string pointer
@@ -299,6 +301,8 @@ ENTRY(_sclp_print_early)
#endif
lm %r6,%r15,120(%r15) # restore registers
br %r14
+.Lzeroes:
+ .fill 64,4,0

.LwritedataS4:
.long 0x00760005 # SCLP command for write data
--
2.5.0

2015-08-24 09:09:59

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 07/82] ARC: make sure instruction_pointer() returns unsigned value

From: Alexey Brodkin <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f51e2f1911122879eefefa4c592dea8bf794b39c upstream.

Currently instruction_pointer() returns pt_regs->ret and so return value
is of type "long", which implicitly stands for "signed long".

While that's perfectly fine when dealing with 32-bit values if return
value of instruction_pointer() gets assigned to 64-bit variable sign
extension may happen.

And at least in one real use-case it happens already.
In perf_prepare_sample() return value of perf_instruction_pointer()
(which is an alias to instruction_pointer() in case of ARC) is assigned
to (struct perf_sample_data)->ip (which type is "u64").

And what we see if instuction pointer points to user-space application
that in case of ARC lays below 0x8000_0000 "ip" gets set properly with
leading 32 zeros. But if instruction pointer points to kernel address
space that starts from 0x8000_0000 then "ip" is set with 32 leadig
"f"-s. I.e. id instruction_pointer() returns 0x8100_0000, "ip" will be
assigned with 0xffff_ffff__8100_0000. Which is obviously wrong.

In particular that issuse broke output of perf, because perf was unable
to associate addresses like 0xffff_ffff__8100_0000 with anything from
/proc/kallsyms.

That's what we used to see:
----------->8----------
6.27% ls [unknown] [k] 0xffffffff8046c5cc
2.96% ls libuClibc-0.9.34-git.so [.] memcpy
2.25% ls libuClibc-0.9.34-git.so [.] memset
1.66% ls [unknown] [k] 0xffffffff80666536
1.54% ls libuClibc-0.9.34-git.so [.] 0x000224d6
1.18% ls libuClibc-0.9.34-git.so [.] 0x00022472
----------->8----------

With that change perf output looks much better now:
----------->8----------
8.21% ls [kernel.kallsyms] [k] memset
3.52% ls libuClibc-0.9.34-git.so [.] memcpy
2.11% ls libuClibc-0.9.34-git.so [.] malloc
1.88% ls libuClibc-0.9.34-git.so [.] memset
1.64% ls [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.41% ls [kernel.kallsyms] [k] __d_lookup_rcu
----------->8----------

Signed-off-by: Alexey Brodkin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/arc/include/asm/ptrace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
index 1bfeec2c0558..2a58af7a2e3a 100644
--- a/arch/arc/include/asm/ptrace.h
+++ b/arch/arc/include/asm/ptrace.h
@@ -63,7 +63,7 @@ struct callee_regs {
long r25, r24, r23, r22, r21, r20, r19, r18, r17, r16, r15, r14, r13;
};

-#define instruction_pointer(regs) ((regs)->ret)
+#define instruction_pointer(regs) (unsigned long)((regs)->ret)
#define profile_pc(regs) instruction_pointer(regs)

/* return 1 if user mode or 0 if kernel mode */
--
2.5.0

2015-08-24 09:09:57

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 08/82] genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD

From: Thomas Gleixner <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 75a06189fc508a2acf470b0b12710362ffb2c4b1 upstream.

The resend mechanism happily calls the interrupt handler of interrupts
which are marked IRQ_NESTED_THREAD from softirq context. This can
result in crashes because the interrupt handler is not the proper way
to invoke the device handlers. They must be invoked via
handle_nested_irq.

Prevent the resend even if the interrupt has no valid parent irq
set. Its better to have a lost interrupt than a crashing machine.

Reported-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/irq/resend.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
index 9065107f083e..7a5237a1bce5 100644
--- a/kernel/irq/resend.c
+++ b/kernel/irq/resend.c
@@ -75,13 +75,21 @@ void check_irq_resend(struct irq_desc *desc, unsigned int irq)
!desc->irq_data.chip->irq_retrigger(&desc->irq_data)) {
#ifdef CONFIG_HARDIRQS_SW_RESEND
/*
- * If the interrupt has a parent irq and runs
- * in the thread context of the parent irq,
- * retrigger the parent.
+ * If the interrupt is running in the thread
+ * context of the parent irq we need to be
+ * careful, because we cannot trigger it
+ * directly.
*/
- if (desc->parent_irq &&
- irq_settings_is_nested_thread(desc))
+ if (irq_settings_is_nested_thread(desc)) {
+ /*
+ * If the parent_irq is valid, we
+ * retrigger the parent, otherwise we
+ * do nothing.
+ */
+ if (!desc->parent_irq)
+ return;
irq = desc->parent_irq;
+ }
/* Set it pending and activate the softirq: */
set_bit(irq, irqs_resend);
tasklet_schedule(&resend_tasklet);
--
2.5.0

2015-08-24 09:09:54

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 09/82] ALSA: usb-audio: Add MIDI support for Steinberg MI2/MI4

From: Dominic Sacré <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 0689a86ae814f39af94a9736a0a5426dd82eb107 upstream.

The Steinberg MI2 and MI4 interfaces are compatible with the USB class
audio spec, but the MIDI part of the devices is reported as a vendor
specific interface.

This patch adds entries to quirks-table.h to recognize the MIDI
endpoints. Audio functionality was already working and is unaffected by
this change.

Signed-off-by: Dominic Sacré <[email protected]>
Signed-off-by: Albert Huitsing <[email protected]>
Acked-by: Clemens Ladisch <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
sound/usb/quirks-table.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h
index 5293b5ac8b9d..7c24088bcaa4 100644
--- a/sound/usb/quirks-table.h
+++ b/sound/usb/quirks-table.h
@@ -2516,6 +2516,74 @@ YAMAHA_DEVICE(0x7010, "UB99"),
}
},

+/* Steinberg devices */
+{
+ /* Steinberg MI2 */
+ USB_DEVICE_VENDOR_SPEC(0x0a4e, 0x2040),
+ .driver_info = (unsigned long) & (const struct snd_usb_audio_quirk) {
+ .ifnum = QUIRK_ANY_INTERFACE,
+ .type = QUIRK_COMPOSITE,
+ .data = & (const struct snd_usb_audio_quirk[]) {
+ {
+ .ifnum = 0,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 1,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 2,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 3,
+ .type = QUIRK_MIDI_FIXED_ENDPOINT,
+ .data = &(const struct snd_usb_midi_endpoint_info) {
+ .out_cables = 0x0001,
+ .in_cables = 0x0001
+ }
+ },
+ {
+ .ifnum = -1
+ }
+ }
+ }
+},
+{
+ /* Steinberg MI4 */
+ USB_DEVICE_VENDOR_SPEC(0x0a4e, 0x4040),
+ .driver_info = (unsigned long) & (const struct snd_usb_audio_quirk) {
+ .ifnum = QUIRK_ANY_INTERFACE,
+ .type = QUIRK_COMPOSITE,
+ .data = & (const struct snd_usb_audio_quirk[]) {
+ {
+ .ifnum = 0,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 1,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 2,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 3,
+ .type = QUIRK_MIDI_FIXED_ENDPOINT,
+ .data = &(const struct snd_usb_midi_endpoint_info) {
+ .out_cables = 0x0001,
+ .in_cables = 0x0001
+ }
+ },
+ {
+ .ifnum = -1
+ }
+ }
+ }
+},
+
/* TerraTec devices */
{
USB_DEVICE_VENDOR_SPEC(0x0ccd, 0x0012),
--
2.5.0

2015-08-24 09:10:03

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 10/82] ALSA: usb-audio: add dB range mapping for some devices

From: Yao-Wen Mao <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 2d1cb7f658fb9c3ba8f9dab8aca297d4dfdec835 upstream.

Add the correct dB ranges of Bose Companion 5 and Drangonfly DAC 1.2.

Signed-off-by: Yao-Wen Mao <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
sound/usb/mixer_maps.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/sound/usb/mixer_maps.c b/sound/usb/mixer_maps.c
index d06fbd9f7cbe..2d17f40fb16d 100644
--- a/sound/usb/mixer_maps.c
+++ b/sound/usb/mixer_maps.c
@@ -330,6 +330,20 @@ static const struct usbmix_name_map scms_usb3318_map[] = {
{ 0 }
};

+/* Bose companion 5, the dB conversion factor is 16 instead of 256 */
+static struct usbmix_dB_map bose_companion5_dB = {-5006, -6};
+static struct usbmix_name_map bose_companion5_map[] = {
+ { 3, NULL, .dB = &bose_companion5_dB },
+ { 0 } /* terminator */
+};
+
+/* Dragonfly DAC 1.2, the dB conversion factor is 1 instead of 256 */
+static struct usbmix_dB_map dragonfly_1_2_dB = {0, 5000};
+static struct usbmix_name_map dragonfly_1_2_map[] = {
+ { 7, NULL, .dB = &dragonfly_1_2_dB },
+ { 0 } /* terminator */
+};
+
/*
* Control map entries
*/
@@ -432,6 +446,16 @@ static struct usbmix_ctl_map usbmix_ctl_maps[] = {
.id = USB_ID(0x25c4, 0x0003),
.map = scms_usb3318_map,
},
+ {
+ /* Bose Companion 5 */
+ .id = USB_ID(0x05a7, 0x1020),
+ .map = bose_companion5_map,
+ },
+ {
+ /* Dragonfly DAC 1.2 */
+ .id = USB_ID(0x21b4, 0x0081),
+ .map = dragonfly_1_2_map,
+ },
{ 0 } /* terminator */
};

--
2.5.0

2015-08-24 09:34:06

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 11/82] ALSA: hda - Fix MacBook Pro 5,2 quirk

From: Takashi Iwai <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 649ccd08534ee26deb2e5b08509800d0e95167f5 upstream.

MacBook Pro 5,2 with ALC889 codec had already a fixup entry, but this
seems not working correctly, a fix for pin NID 0x15 is needed in
addition. It's equivalent with the fixup for MacBook Air 1,1, so use
this instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102131
Reported-and-tested-by: Jeffery Miller <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
sound/pci/hda/patch_realtek.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index a2e6f3ec7d26..f92057919273 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -2213,7 +2213,7 @@ static const struct snd_pci_quirk alc882_fixup_tbl[] = {
SND_PCI_QUIRK(0x106b, 0x4300, "iMac 9,1", ALC889_FIXUP_IMAC91_VREF),
SND_PCI_QUIRK(0x106b, 0x4600, "MacbookPro 5,2", ALC889_FIXUP_IMAC91_VREF),
SND_PCI_QUIRK(0x106b, 0x4900, "iMac 9,1 Aluminum", ALC889_FIXUP_IMAC91_VREF),
- SND_PCI_QUIRK(0x106b, 0x4a00, "Macbook 5,2", ALC889_FIXUP_IMAC91_VREF),
+ SND_PCI_QUIRK(0x106b, 0x4a00, "Macbook 5,2", ALC889_FIXUP_MBA11_VREF),

SND_PCI_QUIRK(0x1071, 0x8258, "Evesham Voyaeger", ALC882_FIXUP_EAPD),
SND_PCI_QUIRK(0x1462, 0x7350, "MSI-7350", ALC889_FIXUP_CD),
--
2.5.0

2015-08-24 09:34:08

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 12/82] st: null pointer dereference panic caused by use after kref_put by st_open

From: "Seymour, Shane M" <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit e7ac6c6666bec0a354758a1298d3231e4a635362 upstream.

Two SLES11 SP3 servers encountered similar crashes simultaneously
following some kind of SAN/tape target issue:

...
qla2xxx [0000:81:00.0]-801c:3: Abort command issued nexus=3:0:2 -- 1 2002.
qla2xxx [0000:81:00.0]-801c:3: Abort command issued nexus=3:0:2 -- 1 2002.
qla2xxx [0000:81:00.0]-8009:3: DEVICE RESET ISSUED nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800c:3: do_reset failed for cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800f:3: DEVICE RESET FAILED: Task management failed nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-8009:3: TARGET RESET ISSUED nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800c:3: do_reset failed for cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800f:3: TARGET RESET FAILED: Task management failed nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-8012:3: BUS RESET ISSUED nexus=3:0:2.
qla2xxx [0000:81:00.0]-802b:3: BUS RESET SUCCEEDED nexus=3:0:2.
qla2xxx [0000:81:00.0]-505f:3: Link is operational (8 Gbps).
qla2xxx [0000:81:00.0]-8018:3: ADAPTER RESET ISSUED nexus=3:0:2.
qla2xxx [0000:81:00.0]-00af:3: Performing ISP error recovery - ha=ffff88bf04d18000.
rport-3:0-0: blocked FC remote port time out: removing target and saving binding
qla2xxx [0000:81:00.0]-505f:3: Link is operational (8 Gbps).
qla2xxx [0000:81:00.0]-8017:3: ADAPTER RESET SUCCEEDED nexus=3:0:2.
rport-2:0-0: blocked FC remote port time out: removing target and saving binding
sg_rq_end_io: device detached
BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
IP: [<ffffffff8133b268>] __pm_runtime_idle+0x28/0x90
PGD 7e6586f067 PUD 7e5af06067 PMD 0 [1739975.390354] Oops: 0002 [#1] SMP
CPU 0
...
Supported: No, Proprietary modules are loaded [1739975.390463]
Pid: 27965, comm: ABCD Tainted: PF X 3.0.101-0.29-default #1 HP ProLiant DL580 Gen8
RIP: 0010:[<ffffffff8133b268>] [<ffffffff8133b268>] __pm_runtime_idle+0x28/0x90
RSP: 0018:ffff8839dc1e7c68 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff883f0592fc00 RCX: 0000000000000090
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000138
RBP: 0000000000000138 R08: 0000000000000010 R09: ffffffff81bd39d0
R10: 00000000000009c0 R11: ffffffff81025790 R12: 0000000000000001
R13: ffff883022212b80 R14: 0000000000000004 R15: ffff883022212b80
FS: 00007f8e54560720(0000) GS:ffff88407f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002a8 CR3: 0000007e6ced6000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ABCD (pid: 27965, threadinfo ffff8839dc1e6000, task ffff883592e0c640)
Stack:
ffff883f0592fc00 00000000fffffffa 0000000000000001 ffff883022212b80
ffff883eff772400 ffffffffa03fa309 0000000000000000 0000000000000000
ffffffffa04003a0 ffff883f063196c0 ffff887f0379a930 ffffffff8115ea1e
Call Trace:
[<ffffffffa03fa309>] st_open+0x129/0x240 [st]
[<ffffffff8115ea1e>] chrdev_open+0x13e/0x200
[<ffffffff811588a8>] __dentry_open+0x198/0x310
[<ffffffff81167d74>] do_last+0x1f4/0x800
[<ffffffff81168fe9>] path_openat+0xd9/0x420
[<ffffffff8116946c>] do_filp_open+0x4c/0xc0
[<ffffffff8115a00f>] do_sys_open+0x17f/0x250
[<ffffffff81468d92>] system_call_fastpath+0x16/0x1b
[<00007f8e4f617fd0>] 0x7f8e4f617fcf
Code: eb d3 90 48 83 ec 28 40 f6 c6 04 48 89 6c 24 08 4c 89 74 24 20 48 89 fd 48 89 1c 24 4c 89 64 24 10 41 89 f6 4c 89 6c 24 18 74 11 <f0> ff 8f 70 01 00 00 0f 94 c0 45 31 ed 84 c0 74 2b 4c 8d a5 a0
RIP [<ffffffff8133b268>] __pm_runtime_idle+0x28/0x90
RSP <ffff8839dc1e7c68>
CR2: 00000000000002a8

Analysis reveals the cause of the crash to be due to STp->device
being NULL. The pointer was NULLed via scsi_tape_put(STp) when it
calls scsi_tape_release(). In st_open() we jump to err_out after
scsi_block_when_processing_errors() completes and returns the
device as offline (sdev_state was SDEV_DEL):

1180 /* Open the device. Needs to take the BKL only because of incrementing the SCSI host
1181 module count. */
1182 static int st_open(struct inode *inode, struct file *filp)
1183 {
1184 int i, retval = (-EIO);
1185 int resumed = 0;
1186 struct scsi_tape *STp;
1187 struct st_partstat *STps;
1188 int dev = TAPE_NR(inode);
1189 char *name;
...
1217 if (scsi_autopm_get_device(STp->device) < 0) {
1218 retval = -EIO;
1219 goto err_out;
1220 }
1221 resumed = 1;
1222 if (!scsi_block_when_processing_errors(STp->device)) {
1223 retval = (-ENXIO);
1224 goto err_out;
1225 }
...
1264 err_out:
1265 normalize_buffer(STp->buffer);
1266 spin_lock(&st_use_lock);
1267 STp->in_use = 0;
1268 spin_unlock(&st_use_lock);
1269 scsi_tape_put(STp); <-- STp->device = 0 after this
1270 if (resumed)
1271 scsi_autopm_put_device(STp->device);
1272 return retval;

The ref count for the struct scsi_tape had already been reduced
to 1 when the .remove method of the st module had been called.
The kref_put() in scsi_tape_put() caused scsi_tape_release()
to be called:

0266 static void scsi_tape_put(struct scsi_tape *STp)
0267 {
0268 struct scsi_device *sdev = STp->device;
0269
0270 mutex_lock(&st_ref_mutex);
0271 kref_put(&STp->kref, scsi_tape_release); <-- calls this
0272 scsi_device_put(sdev);
0273 mutex_unlock(&st_ref_mutex);
0274 }

In scsi_tape_release() the struct scsi_device in the struct
scsi_tape gets set to NULL:

4273 static void scsi_tape_release(struct kref *kref)
4274 {
4275 struct scsi_tape *tpnt = to_scsi_tape(kref);
4276 struct gendisk *disk = tpnt->disk;
4277
4278 tpnt->device = NULL; <<<---- where the dev is nulled
4279
4280 if (tpnt->buffer) {
4281 normalize_buffer(tpnt->buffer);
4282 kfree(tpnt->buffer->reserved_pages);
4283 kfree(tpnt->buffer);
4284 }
4285
4286 disk->private_data = NULL;
4287 put_disk(disk);
4288 kfree(tpnt);
4289 return;
4290 }

Although the problem was reported on SLES11.3 the problem appears
in linux-next as well.

The crash is fixed by reordering the code so we no longer access
the struct scsi_tape after the kref_put() is done on it in st_open().

Signed-off-by: Shane Seymour <[email protected]>
Signed-off-by: Darren Lavender <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Kai Mäkisara <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/scsi/st.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index ff44b3c2cff2..9903f1d58d3e 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -1262,9 +1262,9 @@ static int st_open(struct inode *inode, struct file *filp)
spin_lock(&st_use_lock);
STp->in_use = 0;
spin_unlock(&st_use_lock);
- scsi_tape_put(STp);
if (resumed)
scsi_autopm_put_device(STp->device);
+ scsi_tape_put(STp);
return retval;

}
--
2.5.0

2015-08-24 09:34:05

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 13/82] mac80211: clear subdir_stations when removing debugfs

From: Tom Hughes <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4479004e6409087d1b4986881dc98c6c15dffb28 upstream.

If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:

BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]

Signed-off-by: Tom Hughes <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
net/mac80211/debugfs_netdev.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 8e41f0163c5a..92be2863b7db 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -698,6 +698,7 @@ void ieee80211_debugfs_remove_netdev(struct ieee80211_sub_if_data *sdata)

debugfs_remove_recursive(sdata->vif.debugfs_dir);
sdata->vif.debugfs_dir = NULL;
+ sdata->debugfs.subdir_stations = NULL;
}

void ieee80211_debugfs_rename_netdev(struct ieee80211_sub_if_data *sdata)
--
2.5.0

2015-08-24 09:34:03

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 14/82] mmc: sdhci-esdhc: Make 8BIT bus work

From: Joakim Tjernlund <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 8e91125ff3f57f15c6568e2a6d32743b3f7815e4 upstream.

Support for 8BIT bus with was added some time ago to sdhci-esdhc but
then missed to remove the 8BIT from the reserved bit mask which made
8BIT non functional.

Fixes: 66b50a00992d ("mmc: esdhc: Add support for 8-bit bus width and..")
Signed-off-by: Joakim Tjernlund <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/mmc/host/sdhci-esdhc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sdhci-esdhc.h b/drivers/mmc/host/sdhci-esdhc.h
index a2a06420e463..ebff71092743 100644
--- a/drivers/mmc/host/sdhci-esdhc.h
+++ b/drivers/mmc/host/sdhci-esdhc.h
@@ -47,7 +47,7 @@
#define ESDHC_DMA_SYSCTL 0x40c
#define ESDHC_DMA_SNOOP 0x00000040

-#define ESDHC_HOST_CONTROL_RES 0x05
+#define ESDHC_HOST_CONTROL_RES 0x01

static inline void esdhc_set_clock(struct sdhci_host *host, unsigned int clock,
unsigned int host_clock)
--
2.5.0

2015-08-24 09:33:59

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 15/82] mmc: sdhci-pxav3: fix platform_data is not initialized

From: Jingju Hou <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 9cd76049f0d90ae241f5ad80e311489824527000 upstream.

pdev->dev.platform_data is not initialized if match is true in function
sdhci_pxav3_probe. Just local variable pdata is assigned the return value
from function pxav3_get_mmc_pdata().

static int sdhci_pxav3_probe(struct platform_device *pdev) {

struct sdhci_pxa_platdata *pdata = pdev->dev.platform_data;
...
if (match) {
ret = mmc_of_parse(host->mmc);
if (ret)
goto err_of_parse;
sdhci_get_of_property(pdev);
pdata = pxav3_get_mmc_pdata(dev);
}
...
}

Signed-off-by: Jingju Hou <[email protected]>
Fixes: b650352dd3df("mmc: sdhci-pxa: Add device tree support")
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/mmc/host/sdhci-pxav3.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/host/sdhci-pxav3.c b/drivers/mmc/host/sdhci-pxav3.c
index 561c6b4907a1..b80766699249 100644
--- a/drivers/mmc/host/sdhci-pxav3.c
+++ b/drivers/mmc/host/sdhci-pxav3.c
@@ -257,6 +257,7 @@ static int sdhci_pxav3_probe(struct platform_device *pdev)
goto err_of_parse;
sdhci_get_of_property(pdev);
pdata = pxav3_get_mmc_pdata(dev);
+ pdev->dev.platform_data = pdata;
} else if (pdata) {
/* on-chip device */
if (pdata->flags & PXA_FLAG_CARD_PERMANENT)
--
2.5.0

2015-08-24 09:34:02

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 16/82] md/raid1: fix test for 'was read error from last working device'.

From: NeilBrown <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 34cab6f42003cb06f48f86a86652984dec338ae9 upstream.

When we get a read error from the last working device, we don't
try to repair it, and don't fail the device. We simple report a
read error to the caller.

However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device. However a spare which is rebuilding
would be non-faulty but so not the only working device.

So change the test from "!Faulty" to "In_sync". If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.

This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0

Reported-and-tested-by: Alexander Lyakas <[email protected]>
Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/md/raid1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 633b6e1e7d4d..14934ac112c3 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -327,7 +327,7 @@ static void raid1_end_read_request(struct bio *bio, int error)
spin_lock_irqsave(&conf->device_lock, flags);
if (r1_bio->mddev->degraded == conf->raid_disks ||
(r1_bio->mddev->degraded == conf->raid_disks-1 &&
- !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags)))
+ test_bit(In_sync, &conf->mirrors[mirror].rdev->flags)))
uptodate = 1;
spin_unlock_irqrestore(&conf->device_lock, flags);
}
--
2.5.0

2015-08-24 09:33:42

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 17/82] tile: use free_bootmem_late() for initrd

From: Chris Metcalf <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3f81d2447b37ac697b3c600039f2c6b628c06e21 upstream.

We were previously using free_bootmem() and just getting lucky
that nothing too bad happened.

Signed-off-by: Chris Metcalf <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/tile/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 74c91729a62a..bdb3ecf8e168 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1146,7 +1146,7 @@ static void __init load_hv_initrd(void)

void __init free_initrd_mem(unsigned long begin, unsigned long end)
{
- free_bootmem(__pa(begin), end - begin);
+ free_bootmem_late(__pa(begin), end - begin);
}

static int __init setup_initrd(char *str)
--
2.5.0

2015-08-24 09:32:55

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 18/82] Input: usbtouchscreen - avoid unresponsive TSC-30 touch screen

From: Bernhard Bender <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 968491709e5b1aaf429428814fff3d932fa90b60 upstream.

This patch fixes a problem in the usbtouchscreen driver for DMC TSC-30
touch screen. Due to a missing delay between the RESET and SET_RATE
commands, the touch screen may become unresponsive during system startup or
driver loading.

According to the DMC documentation, a delay is needed after the RESET
command to allow the chip to complete its internal initialization. As this
delay is not guaranteed, we had a system where the touch screen
occasionally did not send any touch data. There was no other indication of
the problem.

The patch fixes the problem by adding a 150ms delay between the RESET and
SET_RATE commands.

Suggested-by: Jakob Mustafa <[email protected]>
Signed-off-by: Bernhard Bender <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/input/touchscreen/usbtouchscreen.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/input/touchscreen/usbtouchscreen.c b/drivers/input/touchscreen/usbtouchscreen.c
index e565530e3596..5679cd9003cc 100644
--- a/drivers/input/touchscreen/usbtouchscreen.c
+++ b/drivers/input/touchscreen/usbtouchscreen.c
@@ -628,6 +628,9 @@ static int dmc_tsc10_init(struct usbtouch_usb *usbtouch)
goto err_out;
}

+ /* TSC-25 data sheet specifies a delay after the RESET command */
+ msleep(150);
+
/* set coordinate output rate */
buf[0] = buf[1] = 0xFF;
ret = usb_control_msg(dev, usb_rcvctrlpipe (dev, 0),
--
2.5.0

2015-08-24 09:32:54

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 19/82] blkcg: fix gendisk reference leak in blkg_conf_prep()

From: Tejun Heo <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 5f6c2d2b7dbb541c1e922538c49fa04c494ae3d7 upstream.

When a blkcg configuration is targeted to a partition rather than a
whole device, blkg_conf_prep fails with -EINVAL; unfortunately, it
forgets to put the gendisk ref in that case. Fix it.

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
block/blk-cgroup.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a573d4bd71d9..47bf1599aa2f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -703,8 +703,12 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
return -EINVAL;

disk = get_gendisk(MKDEV(major, minor), &part);
- if (!disk || part)
+ if (!disk)
return -EINVAL;
+ if (part) {
+ put_disk(disk);
+ return -EINVAL;
+ }

rcu_read_lock();
spin_lock_irq(disk->queue->queue_lock);
--
2.5.0

2015-08-24 09:32:52

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 20/82] ata: pmp: add quirk for Marvell 4140 SATA PMP

From: Lior Amsalem <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 945b47441d83d2392ac9f984e0267ad521f24268 upstream.

This commit adds the necessary quirk to make the Marvell 4140 SATA PMP
work properly. This PMP doesn't like SRST on port number 4 (the host
port) so this commit marks this port as not supporting SRST.

Signed-off-by: Lior Amsalem <[email protected]>
Reviewed-by: Nadav Haklai <[email protected]>
Signed-off-by: Thomas Petazzoni <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/ata/libata-pmp.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index 7ccc084bf1df..85aa76116a30 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -460,6 +460,13 @@ static void sata_pmp_quirks(struct ata_port *ap)
ATA_LFLAG_NO_SRST |
ATA_LFLAG_ASSUME_ATA;
}
+ } else if (vendor == 0x11ab && devid == 0x4140) {
+ /* Marvell 4140 quirks */
+ ata_for_each_link(link, ap, EDGE) {
+ /* port 4 is for SEMB device and it doesn't like SRST */
+ if (link->pmp == 4)
+ link->flags |= ATA_LFLAG_DISABLED;
+ }
}
}

--
2.5.0

2015-08-24 09:32:18

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 21/82] usb-storage: ignore ZTE MF 823 card reader in mode 0x1225

From: Oliver Neukum <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 5fb2c782f451a4fb9c19c076e2c442839faf0f76 upstream.

This device automatically switches itself to another mode (0x1405)
unless the specific access pattern of Windows is followed in its
initial mode. That makes a dirty unmount of the internal storage
devices inevitable if they are mounted. So the card reader of
such a device should be ignored, lest an unclean removal become
inevitable.

This replaces an earlier patch that ignored all LUNs of this device.
That patch was overly broad.

Signed-off-by: Oliver Neukum <[email protected]>
Reviewed-by: Lars Melin <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/storage/unusual_devs.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h
index 00b47646522b..ff273d527b6e 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -2045,6 +2045,18 @@ UNUSUAL_DEV( 0x1908, 0x3335, 0x0200, 0x0200,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_READ_DISC_INFO ),

+/* Reported by Oliver Neukum <[email protected]>
+ * This device morphes spontaneously into another device if the access
+ * pattern of Windows isn't followed. Thus writable media would be dirty
+ * if the initial instance is used. So the device is limited to its
+ * virtual CD.
+ * And yes, the concept that BCD goes up to 9 is not heeded */
+UNUSUAL_DEV( 0x19d2, 0x1225, 0x0000, 0xffff,
+ "ZTE,Incorporated",
+ "ZTE WCDMA Technologies MSM",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_SINGLE_LUN ),
+
/* Reported by Sven Geggus <[email protected]>
* This encrypted pen drive returns bogus data for the initial READ(10).
*/
--
2.5.0

2015-08-24 09:31:45

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 22/82] xhci: Calculate old endpoints correctly on device reset

From: Brian Campbell <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 326124a027abc9a7f43f72dc94f6f0f7a55b02b3 upstream.

When resetting a device the number of active TTs may need to be
corrected by xhci_update_tt_active_eps, but the number of old active
endpoints supplied to it was always zero, so the number of TTs and the
bandwidth reserved for them was not updated, and could rise
unnecessarily.

This affected systems using Intel's Patherpoint chipset, which rely on
software bandwidth checking. For example, a Lenovo X230 would lose the
ability to use ports on the docking station after enough suspend/resume
cycles because the bandwidth calculated would rise with every cycle when
a suitable device is attached.

The correct number of active endpoints is calculated in the same way as
in xhci_reserve_bandwidth.

Signed-off-by: Brian Campbell <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/host/xhci.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index e0ccc95c91e2..00686a8c4fa0 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3423,6 +3423,9 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, struct usb_device *udev)
return -EINVAL;
}

+ if (virt_dev->tt_info)
+ old_active_eps = virt_dev->tt_info->active_eps;
+
if (virt_dev->udev != udev) {
/* If the virt_dev and the udev does not match, this virt_dev
* may belong to another udev.
--
2.5.0

2015-08-24 09:31:28

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 23/82] xhci: report U3 when link is in resume state

From: Zhuang Jin Can <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 243292a2ad3dc365849b820a64868927168894ac upstream.

xhci_hub_report_usb3_link_state() returns pls as U0 when the link
is in resume state, and this causes usb core to think the link is in
U0 while actually it's in resume state. When usb core transfers
control request on the link, it fails with TRB error as the link
is not ready for transfer.

To fix the issue, report U3 when the link is in resume state, thus
usb core knows the link it's not ready for transfer.

Signed-off-by: Zhuang Jin Can <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/host/xhci-hub.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index abb36165515a..86cebec3c5d6 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -480,10 +480,13 @@ static void xhci_hub_report_usb3_link_state(struct xhci_hcd *xhci,
u32 pls = status_reg & PORT_PLS_MASK;

/* resume state is a xHCI internal state.
- * Do not report it to usb core.
+ * Do not report it to usb core, instead, pretend to be U3,
+ * thus usb core knows it's not ready for transfer
*/
- if (pls == XDEV_RESUME)
+ if (pls == XDEV_RESUME) {
+ *status |= USB_SS_PORT_LS_U3;
return;
+ }

/* When the CAS bit is set then warm reset
* should be performed on port
--
2.5.0

2015-08-24 09:31:47

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 24/82] xhci: prevent bus_suspend if SS port resuming in phase 1

From: Zhuang Jin Can <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit fac4271d1126c45ceaceb7f4a336317b771eb121 upstream.

When the link is just waken, it's in Resume state, and driver sets PLS to
U0. This refers to Phase 1. Phase 2 refers to when the link has completed
the transition from Resume state to U0.

With the fix of xhci: report U3 when link is in resume state, it also
exposes an issue that usb3 roothub and controller can suspend right
after phase 1, and this causes a hard hang in controller.

To fix the issue, we need to prevent usb3 bus suspend if any port is
resuming in phase 1.

[merge separate USB2 and USB3 port resume checking to one -Mathias]
Signed-off-by: Zhuang Jin Can <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/host/xhci-hub.c | 6 +++---
drivers/usb/host/xhci-ring.c | 3 +++
drivers/usb/host/xhci.h | 1 +
3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 86cebec3c5d6..4adc1be24b4a 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1120,10 +1120,10 @@ int xhci_bus_suspend(struct usb_hcd *hcd)
spin_lock_irqsave(&xhci->lock, flags);

if (hcd->self.root_hub->do_remote_wakeup) {
- if (bus_state->resuming_ports) {
+ if (bus_state->resuming_ports || /* USB2 */
+ bus_state->port_remote_wakeup) { /* USB3 */
spin_unlock_irqrestore(&xhci->lock, flags);
- xhci_dbg(xhci, "suspend failed because "
- "a port is resuming\n");
+ xhci_dbg(xhci, "suspend failed because a port is resuming\n");
return -EBUSY;
}
}
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 07aafa50f453..b16404723fc2 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1707,6 +1707,9 @@ static void handle_port_status(struct xhci_hcd *xhci,
usb_hcd_resume_root_hub(hcd);
}

+ if (hcd->speed == HCD_USB3 && (temp & PORT_PLS_MASK) == XDEV_INACTIVE)
+ bus_state->port_remote_wakeup &= ~(1 << faked_port_index);
+
if ((temp & PORT_PLC) && (temp & PORT_PLS_MASK) == XDEV_RESUME) {
xhci_dbg(xhci, "port resume event for port %d\n", port_id);

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 510e9c0efd18..8686a06d83d4 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -285,6 +285,7 @@ struct xhci_op_regs {
#define XDEV_U0 (0x0 << 5)
#define XDEV_U2 (0x2 << 5)
#define XDEV_U3 (0x3 << 5)
+#define XDEV_INACTIVE (0x6 << 5)
#define XDEV_RESUME (0xf << 5)
/* true: port has power (see HCC_PPC) */
#define PORT_POWER (1 << 9)
--
2.5.0

2015-08-24 09:30:53

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 25/82] xhci: do not report PLC when link is in internal resume state

From: Zhuang Jin Can <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit aca3a0489ac019b58cf32794d5362bb284cb9b94 upstream.

Port link change with port in resume state should not be
reported to usbcore, as this is an internal state to be
handled by xhci driver. Reporting PLC to usbcore may
cause usbcore clearing PLC first and port change event irq
won't be generated.

Signed-off-by: Zhuang Jin Can <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/host/xhci-hub.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 4adc1be24b4a..55b3aa33bc06 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -586,7 +586,14 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
status |= USB_PORT_STAT_C_RESET << 16;
/* USB3.0 only */
if (hcd->speed == HCD_USB3) {
- if ((raw_port_status & PORT_PLC))
+ /* Port link change with port in resume state should not be
+ * reported to usbcore, as this is an internal state to be
+ * handled by xhci driver. Reporting PLC to usbcore may
+ * cause usbcore clearing PLC first and port change event
+ * irq won't be generated.
+ */
+ if ((raw_port_status & PORT_PLC) &&
+ (raw_port_status & PORT_PLS_MASK) != XDEV_RESUME)
status |= USB_PORT_STAT_C_LINK_STATE << 16;
if ((raw_port_status & PORT_WRC))
status |= USB_PORT_STAT_C_BH_RESET << 16;
--
2.5.0

2015-08-24 09:30:52

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 26/82] rds: rds_ib_device.refcount overflow

From: Wengang Wang <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4fabb59449aa44a585b3603ffdadd4c5f4d0c033 upstream.

Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 if (atomic_dec_and_test(&rds_ibdev->refcount))
119 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.

Signed-off-by: Wengang Wang <[email protected]>
Reviewed-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
net/rds/ib_rdma.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index e8fdb172adbb..a985158d95d5 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
}

ibmr = rds_ib_alloc_fmr(rds_ibdev);
- if (IS_ERR(ibmr))
+ if (IS_ERR(ibmr)) {
+ rds_ib_dev_put(rds_ibdev);
return ibmr;
+ }

ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
if (ret == 0)
--
2.5.0

2015-08-24 09:29:46

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 27/82] vhost: actually track log eventfd file

From: Marc-André Lureau <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 7932c0bd7740f4cd2aa168d3ce0199e7af7d72d5 upstream.

While reviewing vhost log code, I found out that log_file is never
set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).

Signed-off-by: Marc-André Lureau <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/vhost/vhost.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 69068e0d8f31..384bcc8ed7ad 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -878,6 +878,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
}
if (eventfp != d->log_file) {
filep = d->log_file;
+ d->log_file = eventfp;
ctx = d->log_ctx;
d->log_ctx = eventfp ?
eventfd_ctx_fileget(eventfp) : NULL;
--
2.5.0

2015-08-24 09:29:44

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 28/82] iscsi-target: Fix use-after-free during TPG session shutdown

From: Nicholas Bellinger <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 417c20a9bdd1e876384127cf096d8ae8b559066c upstream.

This patch fixes a use-after-free bug in iscsit_release_sessions_for_tpg()
where se_portal_group->session_lock was incorrectly released/re-acquired
while walking the active se_portal_group->tpg_sess_list.

The can result in a NULL pointer dereference when iscsit_close_session()
shutdown happens in the normal path asynchronously to this code, causing
a bogus dereference of an already freed list entry to occur.

To address this bug, walk the session list checking for the same state
as before, but move entries to a local list to avoid dropping the lock
while walking the active list.

As before, signal using iscsi_session->session_restatement=1 for those
list entries to be released locally by iscsit_free_session() code.

Reported-by: Sunilkumar Nadumuttlu <[email protected]>
Cc: Sunilkumar Nadumuttlu <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/target/iscsi/iscsi_target.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 8ac1800eef06..13af385fc859 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4707,6 +4707,7 @@ int iscsit_release_sessions_for_tpg(struct iscsi_portal_group *tpg, int force)
struct iscsi_session *sess;
struct se_portal_group *se_tpg = &tpg->tpg_se_tpg;
struct se_session *se_sess, *se_sess_tmp;
+ LIST_HEAD(free_list);
int session_count = 0;

spin_lock_bh(&se_tpg->session_lock);
@@ -4728,14 +4729,17 @@ int iscsit_release_sessions_for_tpg(struct iscsi_portal_group *tpg, int force)
}
atomic_set(&sess->session_reinstatement, 1);
spin_unlock(&sess->conn_lock);
- spin_unlock_bh(&se_tpg->session_lock);

- iscsit_free_session(sess);
- spin_lock_bh(&se_tpg->session_lock);
+ list_move_tail(&se_sess->sess_list, &free_list);
+ }
+ spin_unlock_bh(&se_tpg->session_lock);

+ list_for_each_entry_safe(se_sess, se_sess_tmp, &free_list, sess_list) {
+ sess = (struct iscsi_session *)se_sess->fabric_sess_ptr;
+
+ iscsit_free_session(sess);
session_count++;
}
- spin_unlock_bh(&se_tpg->session_lock);

pr_debug("Released %d iSCSI Session(s) from Target Portal"
" Group: %hu\n", session_count, tpg->tpgt);
--
2.5.0

2015-08-24 09:28:59

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 29/82] iscsi-target: Fix iser explicit logout TX kthread leak

From: Nicholas Bellinger <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 007d038bdf95ccfe2491d0078be54040d110fd06 upstream.

This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where an explicit iser-target logout would result
in ->tx_thread_active being incorrectly cleared by the logout post
handler, and subsequent TX kthread leak:

commit 88dcd2dab5c23b1c9cfc396246d8f476c872f0ca
Author: Nicholas Bellinger <[email protected]>
Date: Thu Feb 26 22:19:15 2015 -0800

iscsi-target: Convert iscsi_thread_set usage to kthread.h

To address this bug, change iscsit_logout_post_handler_closesession()
and iscsit_logout_post_handler_samecid() to only cmpxchg() on
->tx_thread_active for traditional iscsi/tcp connections.

This is required because iscsi/tcp connections are invoking logout
post handler logic directly from TX kthread context, while iser
connections are invoking logout post handler logic from a seperate
workqueue context.

Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/target/iscsi/iscsi_target.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 13af385fc859..60e06fb049ef 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4474,7 +4474,18 @@ static void iscsit_logout_post_handler_closesession(
struct iscsi_conn *conn)
{
struct iscsi_session *sess = conn->sess;
- int sleep = cmpxchg(&conn->tx_thread_active, true, false);
+ int sleep = 1;
+ /*
+ * Traditional iscsi/tcp will invoke this logic from TX thread
+ * context during session logout, so clear tx_thread_active and
+ * sleep if iscsit_close_connection() has not already occured.
+ *
+ * Since iser-target invokes this logic from it's own workqueue,
+ * always sleep waiting for RX/TX thread shutdown to complete
+ * within iscsit_close_connection().
+ */
+ if (conn->conn_transport->transport_type == ISCSI_TCP)
+ sleep = cmpxchg(&conn->tx_thread_active, true, false);

atomic_set(&conn->conn_logout_remove, 0);
complete(&conn->conn_logout_comp);
@@ -4488,7 +4499,10 @@ static void iscsit_logout_post_handler_closesession(
static void iscsit_logout_post_handler_samecid(
struct iscsi_conn *conn)
{
- int sleep = cmpxchg(&conn->tx_thread_active, true, false);
+ int sleep = 1;
+
+ if (conn->conn_transport->transport_type == ISCSI_TCP)
+ sleep = cmpxchg(&conn->tx_thread_active, true, false);

atomic_set(&conn->conn_logout_remove, 0);
complete(&conn->conn_logout_comp);
--
2.5.0

2015-08-24 09:28:57

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 30/82] 3w-xxxx: fix mis-aligned struct accesses

From: Arnd Bergmann <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4bfaa5c4b99ddec00907e854d70453bd2aef39a0 upstream.

Building an allmodconfig ARM kernel, I get multiple such
warnings because of a spinlock contained in packed structure
in the 3w-xxxx driver:

../drivers/scsi/3w-xxxx.c: In function 'tw_chrdev_ioctl':
../drivers/scsi/3w-xxxx.c:1001:68: warning: mis-aligned access used for structure member [-fstrict-volatile-bitfields]
timeout = wait_event_timeout(tw_dev->ioctl_wqueue, tw_dev->chrdev_request_id == TW_IOCTL_CHRDEV_FREE, timeout);
^
../drivers/scsi/3w-xxxx.c:1001:68: note: when a volatile object spans multiple type-sized locations, the compiler must choose between using a single mis-aligned access to preserve the volatility, or using multiple aligned accesses to avoid runtime faults; this code may fail at runtime if the hardware does not allow this access

The same bug apparently was present in 3w-sas and 3w-9xxx, but has been
fixed in the past. This patch uses the same fix by moving the pragma
in front of the TW_Device_Extension definition, so it only covers
hardware structures.

Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Adam Radford <[email protected]>
Cc: Adam Radford <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/scsi/3w-xxxx.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/3w-xxxx.h b/drivers/scsi/3w-xxxx.h
index 1d31858766ce..6f65e663d393 100644
--- a/drivers/scsi/3w-xxxx.h
+++ b/drivers/scsi/3w-xxxx.h
@@ -387,6 +387,8 @@ typedef struct TAG_TW_Passthru
unsigned char padding[12];
} TW_Passthru;

+#pragma pack()
+
typedef struct TAG_TW_Device_Extension {
u32 base_addr;
unsigned long *alignment_virtual_address[TW_Q_LENGTH];
@@ -425,6 +427,4 @@ typedef struct TAG_TW_Device_Extension {
wait_queue_head_t ioctl_wqueue;
} TW_Device_Extension;

-#pragma pack()
-
#endif /* _3W_XXXX_H */
--
2.5.0

2015-08-24 09:27:51

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 31/82] hwrng: via-rng - Mark device ID table as __maybe_unused

From: Ben Hutchings <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit a44bc80e66b4014e462cb8be9d354a7bc4723b7e upstream.

It is only used in modular builds.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/char/hw_random/via-rng.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/via-rng.c b/drivers/char/hw_random/via-rng.c
index e737772ad69a..de5a6dcfb3e2 100644
--- a/drivers/char/hw_random/via-rng.c
+++ b/drivers/char/hw_random/via-rng.c
@@ -221,7 +221,7 @@ static void __exit mod_exit(void)
module_init(mod_init);
module_exit(mod_exit);

-static struct x86_cpu_id via_rng_cpu_id[] = {
+static struct x86_cpu_id __maybe_unused via_rng_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_XSTORE),
{}
};
--
2.5.0

2015-08-24 09:27:53

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 32/82] ARM: realview: fix sparsemem build

From: Arnd Bergmann <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit dd94d3558947756b102b1487911acd925224a38c upstream.

Commit b713aa0b15 "ARM: fix asm/memory.h build error" broke some
configurations on mach-realview with sparsemem enabled, which
is missing a definition of PHYS_OFFSET:

arch/arm/include/asm/memory.h:268:42: error: 'PHYS_OFFSET' undeclared (first use in this function)
#define PHYS_PFN_OFFSET ((unsigned long)(PHYS_OFFSET >> PAGE_SHIFT))
arch/arm/include/asm/dma-mapping.h:104:9: note: in expansion of macro 'PHYS_PFN_OFFSET'
return PHYS_PFN_OFFSET + dma_to_pfn(dev, *dev->dma_mask);

An easy workaround is for realview to define PHYS_OFFSET itself,
in the same way we define it for platforms that don't have a private
__virt_to_phys function.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Russell King <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/arm/mach-realview/include/mach/memory.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm/mach-realview/include/mach/memory.h b/arch/arm/mach-realview/include/mach/memory.h
index 2022e092f0ca..db09170e3832 100644
--- a/arch/arm/mach-realview/include/mach/memory.h
+++ b/arch/arm/mach-realview/include/mach/memory.h
@@ -56,6 +56,8 @@
#define PAGE_OFFSET1 (PAGE_OFFSET + 0x10000000)
#define PAGE_OFFSET2 (PAGE_OFFSET + 0x30000000)

+#define PHYS_OFFSET PLAT_PHYS_OFFSET
+
#define __phys_to_virt(phys) \
((phys) >= 0x80000000 ? (phys) - 0x80000000 + PAGE_OFFSET2 : \
(phys) >= 0x20000000 ? (phys) - 0x20000000 + PAGE_OFFSET1 : \
--
2.5.0

2015-08-24 09:27:49

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 33/82] MIPS: Fix sched_getaffinity with MT FPAFF enabled

From: Felix Fietkau <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 1d62d737555e1378eb62a8bba26644f7d97139d2 upstream.

p->thread.user_cpus_allowed is zero-initialized and is only filled on
the first sched_setaffinity call.

To avoid adding overhead in the task initialization codepath, simply OR
the returned mask in sched_getaffinity with p->cpus_allowed.

Signed-off-by: Felix Fietkau <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/10740/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/mips/kernel/mips-mt-fpaff.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index cb098628aee8..ca16964a2b5e 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -154,7 +154,7 @@ asmlinkage long mipsmt_sys_sched_getaffinity(pid_t pid, unsigned int len,
unsigned long __user *user_mask_ptr)
{
unsigned int real_len;
- cpumask_t mask;
+ cpumask_t allowed, mask;
int retval;
struct task_struct *p;

@@ -173,7 +173,8 @@ asmlinkage long mipsmt_sys_sched_getaffinity(pid_t pid, unsigned int len,
if (retval)
goto out_unlock;

- cpumask_and(&mask, &p->thread.user_cpus_allowed, cpu_possible_mask);
+ cpumask_or(&allowed, &p->thread.user_cpus_allowed, &p->cpus_allowed);
+ cpumask_and(&mask, &allowed, cpu_active_mask);

out_unlock:
read_unlock(&tasklist_lock);
--
2.5.0

2015-08-24 09:27:12

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 34/82] MIPS: Make set_pte() SMP safe.

From: David Daney <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 46011e6ea39235e4aca656673c500eac81a07a17 upstream.

On MIPS the GLOBAL bit of the PTE must have the same value in any
aligned pair of PTEs. These pairs of PTEs are referred to as
"buddies". In a SMP system is is possible for two CPUs to be calling
set_pte() on adjacent PTEs at the same time. There is a race between
setting the PTE and a different CPU setting the GLOBAL bit in its
buddy PTE.

This race can be observed when multiple CPUs are executing
vmap()/vfree() at the same time.

Make setting the buddy PTE's GLOBAL bit an atomic operation to close
the race condition.

The case of CONFIG_64BIT_PHYS_ADDR && CONFIG_CPU_MIPS32 is *not*
handled.

Signed-off-by: David Daney <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/10835/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/mips/include/asm/pgtable.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 008324d1c261..b15495367d5c 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -150,8 +150,39 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
* Make sure the buddy is global too (if it's !none,
* it better already be global)
*/
+#ifdef CONFIG_SMP
+ /*
+ * For SMP, multiple CPUs can race, so we need to do
+ * this atomically.
+ */
+#ifdef CONFIG_64BIT
+#define LL_INSN "lld"
+#define SC_INSN "scd"
+#else /* CONFIG_32BIT */
+#define LL_INSN "ll"
+#define SC_INSN "sc"
+#endif
+ unsigned long page_global = _PAGE_GLOBAL;
+ unsigned long tmp;
+
+ __asm__ __volatile__ (
+ " .set push\n"
+ " .set noreorder\n"
+ "1: " LL_INSN " %[tmp], %[buddy]\n"
+ " bnez %[tmp], 2f\n"
+ " or %[tmp], %[tmp], %[global]\n"
+ " " SC_INSN " %[tmp], %[buddy]\n"
+ " beqz %[tmp], 1b\n"
+ " nop\n"
+ "2:\n"
+ " .set pop"
+ : [buddy] "+m" (buddy->pte),
+ [tmp] "=&r" (tmp)
+ : [global] "r" (page_global));
+#else /* !CONFIG_SMP */
if (pte_none(*buddy))
pte_val(*buddy) = pte_val(*buddy) | _PAGE_GLOBAL;
+#endif /* CONFIG_SMP */
}
#endif
}
--
2.5.0

2015-08-24 09:27:10

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 35/82] fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

From: Jan Kara <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 8f2f3eb59dff4ec538de55f2e0592fec85966aab upstream.

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.

Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list. This method
is safe even when entries from the list can disappear once we drop the
lock.

Signed-off-by: Jan Kara <[email protected]>
Reported-by: Ashish Sangwan <[email protected]>
Reviewed-by: Ashish Sangwan <[email protected]>
Cc: Lino Sanfilippo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
fs/notify/mark.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 923fe4a5f503..6bffc3331df6 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -293,16 +293,36 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
unsigned int flags)
{
struct fsnotify_mark *lmark, *mark;
+ LIST_HEAD(to_free);

+ /*
+ * We have to be really careful here. Anytime we drop mark_mutex, e.g.
+ * fsnotify_clear_marks_by_inode() can come and free marks. Even in our
+ * to_free list so we have to use mark_mutex even when accessing that
+ * list. And freeing mark requires us to drop mark_mutex. So we can
+ * reliably free only the first mark in the list. That's why we first
+ * move marks to free to to_free list in one go and then free marks in
+ * to_free list one by one.
+ */
mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
- if (mark->flags & flags) {
- fsnotify_get_mark(mark);
- fsnotify_destroy_mark_locked(mark, group);
- fsnotify_put_mark(mark);
- }
+ if (mark->flags & flags)
+ list_move(&mark->g_list, &to_free);
}
mutex_unlock(&group->mark_mutex);
+
+ while (1) {
+ mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
+ if (list_empty(&to_free)) {
+ mutex_unlock(&group->mark_mutex);
+ break;
+ }
+ mark = list_first_entry(&to_free, struct fsnotify_mark, g_list);
+ fsnotify_get_mark(mark);
+ fsnotify_destroy_mark_locked(mark, group);
+ mutex_unlock(&group->mark_mutex);
+ fsnotify_put_mark(mark);
+ }
}

/*
--
2.5.0

2015-08-24 09:26:26

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 36/82] drm/radeon/combios: add some validation of lvds values

From: Alex Deucher <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 0a90a0cff9f429f886f423967ae053150dce9259 upstream.

Fixes a broken hsync start value uncovered by:
abc0b1447d4974963548777a5ba4a4457c82c426
(drm: Perform basic sanity checks on probed modes)

The driver handled the bad hsync start elsewhere, but
the above commit prevented it from getting added.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=91401

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/gpu/drm/radeon/radeon_combios.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c
index 68ce36056019..8cac69819054 100644
--- a/drivers/gpu/drm/radeon/radeon_combios.c
+++ b/drivers/gpu/drm/radeon/radeon_combios.c
@@ -1271,10 +1271,15 @@ struct radeon_encoder_lvds *radeon_combios_get_lvds_info(struct radeon_encoder

if ((RBIOS16(tmp) == lvds->native_mode.hdisplay) &&
(RBIOS16(tmp + 2) == lvds->native_mode.vdisplay)) {
+ u32 hss = (RBIOS16(tmp + 21) - RBIOS16(tmp + 19) - 1) * 8;
+
+ if (hss > lvds->native_mode.hdisplay)
+ hss = (10 - 1) * 8;
+
lvds->native_mode.htotal = lvds->native_mode.hdisplay +
(RBIOS16(tmp + 17) - RBIOS16(tmp + 19)) * 8;
lvds->native_mode.hsync_start = lvds->native_mode.hdisplay +
- (RBIOS16(tmp + 21) - RBIOS16(tmp + 19) - 1) * 8;
+ hss;
lvds->native_mode.hsync_end = lvds->native_mode.hsync_start +
(RBIOS8(tmp + 23) * 8);

--
2.5.0

2015-08-24 09:26:25

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 37/82] ipr: Fix locking for unit attention handling

From: Brian King <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 36b8e180e1e929e00b351c3b72aab3147fc14116 upstream.

Make sure we have the host lock held when calling scsi_report_bus_reset. Fixes
a crash seen as the __devices list in the scsi host was changing as we were
iterating through it.

Reviewed-by: Wen Xiong <[email protected]>
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Brian King <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/scsi/ipr.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 5f841652886e..5f06ca46e187 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6179,21 +6179,23 @@ static void ipr_scsi_done(struct ipr_cmnd *ipr_cmd)
struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
struct scsi_cmnd *scsi_cmd = ipr_cmd->scsi_cmd;
u32 ioasc = be32_to_cpu(ipr_cmd->s.ioasa.hdr.ioasc);
- unsigned long hrrq_flags;
+ unsigned long lock_flags;

scsi_set_resid(scsi_cmd, be32_to_cpu(ipr_cmd->s.ioasa.hdr.residual_data_len));

if (likely(IPR_IOASC_SENSE_KEY(ioasc) == 0)) {
scsi_dma_unmap(scsi_cmd);

- spin_lock_irqsave(ipr_cmd->hrrq->lock, hrrq_flags);
+ spin_lock_irqsave(ipr_cmd->hrrq->lock, lock_flags);
list_add_tail(&ipr_cmd->queue, &ipr_cmd->hrrq->hrrq_free_q);
scsi_cmd->scsi_done(scsi_cmd);
- spin_unlock_irqrestore(ipr_cmd->hrrq->lock, hrrq_flags);
+ spin_unlock_irqrestore(ipr_cmd->hrrq->lock, lock_flags);
} else {
- spin_lock_irqsave(ipr_cmd->hrrq->lock, hrrq_flags);
+ spin_lock_irqsave(ioa_cfg->host->host_lock, lock_flags);
+ spin_lock(&ipr_cmd->hrrq->_lock);
ipr_erp_start(ioa_cfg, ipr_cmd);
- spin_unlock_irqrestore(ipr_cmd->hrrq->lock, hrrq_flags);
+ spin_unlock(&ipr_cmd->hrrq->_lock);
+ spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
}
}

--
2.5.0

2015-08-24 09:26:24

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 38/82] ipr: Fix incorrect trace indexing

From: Brian King <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit bb7c54339e6a10ecce5c4961adf5e75b3cf0af30 upstream.

When ipr's internal driver trace was changed to an atomic, a signed/unsigned
bug slipped in which results in us indexing backwards in our memory buffer
writing on memory that does not belong to us. This patch fixes this by removing
the modulo and instead just mask off the low bits.

Tested-by: Wen Xiong <[email protected]>
Reviewed-by: Wen Xiong <[email protected]>
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Brian King <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/scsi/ipr.c | 5 +++--
drivers/scsi/ipr.h | 1 +
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 5f06ca46e187..fa55ad47cf65 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -592,9 +592,10 @@ static void ipr_trc_hook(struct ipr_cmnd *ipr_cmd,
{
struct ipr_trace_entry *trace_entry;
struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
+ unsigned int trace_index;

- trace_entry = &ioa_cfg->trace[atomic_add_return
- (1, &ioa_cfg->trace_index)%IPR_NUM_TRACE_ENTRIES];
+ trace_index = atomic_add_return(1, &ioa_cfg->trace_index) & IPR_TRACE_INDEX_MASK;
+ trace_entry = &ioa_cfg->trace[trace_index];
trace_entry->time = jiffies;
trace_entry->op_code = ipr_cmd->ioarcb.cmd_pkt.cdb[0];
trace_entry->type = type;
diff --git a/drivers/scsi/ipr.h b/drivers/scsi/ipr.h
index f6d379725a00..06b3b4bb2911 100644
--- a/drivers/scsi/ipr.h
+++ b/drivers/scsi/ipr.h
@@ -1462,6 +1462,7 @@ struct ipr_ioa_cfg {

#define IPR_NUM_TRACE_INDEX_BITS 8
#define IPR_NUM_TRACE_ENTRIES (1 << IPR_NUM_TRACE_INDEX_BITS)
+#define IPR_TRACE_INDEX_MASK (IPR_NUM_TRACE_ENTRIES - 1)
#define IPR_TRACE_SIZE (sizeof(struct ipr_trace_entry) * IPR_NUM_TRACE_ENTRIES)
char trace_start[8];
#define IPR_TRACE_START_LABEL "trace"
--
2.5.0

2015-08-24 09:25:26

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 39/82] ipr: Fix invalid array indexing for HRRQ

From: Brian King <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3f1c0581310d5d94bd72740231507e763a6252a4 upstream.

Fixes another signed / unsigned array indexing bug in the ipr driver.
Currently, when hrrq_index wraps, it becomes a negative number. We
do the modulo, but still have a negative number, so we end up indexing
backwards in the array. Given where the hrrq array is located in memory,
we probably won't actually reference memory we don't own, but nonetheless
ipr is still looking at data within struct ipr_ioa_cfg and interpreting it as
struct ipr_hrr_queue data, so bad things could certainly happen.

Each ipr adapter has anywhere from 1 to 16 HRRQs. By default, we use 2 on new
adapters. Let's take an example:

Assume ioa_cfg->hrrq_index=0x7fffffffe and ioa_cfg->hrrq_num=4:

The atomic_add_return will then return -1. We mod this with 3 and get -2, add
one and get -1 for an array index.

On adapters which support more than a single HRRQ, we dedicate HRRQ to adapter
initialization and error interrupts so that we can optimize the other queues
for fast path I/O. So all normal I/O uses HRRQ 1-15. So we want to spread the
I/O requests across those HRRQs.

With the default module parameter settings, this bug won't hit, only when
someone sets the ipr.number_of_msix parameter to a value larger than 3 is when
bad things start to happen.

Tested-by: Wen Xiong <[email protected]>
Reviewed-by: Wen Xiong <[email protected]>
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Brian King <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/scsi/ipr.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index fa55ad47cf65..0f6412db121c 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -1045,10 +1045,15 @@ static void ipr_send_blocking_cmd(struct ipr_cmnd *ipr_cmd,

static int ipr_get_hrrq_index(struct ipr_ioa_cfg *ioa_cfg)
{
+ unsigned int hrrq;
+
if (ioa_cfg->hrrq_num == 1)
- return 0;
- else
- return (atomic_add_return(1, &ioa_cfg->hrrq_index) % (ioa_cfg->hrrq_num - 1)) + 1;
+ hrrq = 0;
+ else {
+ hrrq = atomic_add_return(1, &ioa_cfg->hrrq_index);
+ hrrq = (hrrq % (ioa_cfg->hrrq_num - 1)) + 1;
+ }
+ return hrrq;
}

/**
--
2.5.0

2015-08-24 09:25:28

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 40/82] xhci: fix off by one error in TRB DMA address boundary check

From: Mathias Nyman <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 7895086afde2a05fa24a0e410d8e6b75ca7c8fdd upstream.

We need to check that a TRB is part of the current segment
before calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.

Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
didn't catch the case when a TRB was the first element of the next segment.

This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and
causes errors like:

[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

The trb-end address is one outside the end-seg address.

Tested-by: Arkadiusz Miśkiewicz <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/host/xhci-ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b16404723fc2..66deb0af258e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -86,7 +86,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
return 0;
/* offset in TRBs */
segment_offset = trb - seg->trbs;
- if (segment_offset > TRBS_PER_SEGMENT)
+ if (segment_offset >= TRBS_PER_SEGMENT)
return 0;
return seg->dma + (segment_offset * sizeof(*trb));
}
--
2.5.0

2015-08-24 09:24:58

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 41/82] USB: sierra: add 1199:68AB device ID

From: Dirk Behme <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 74472233233f577eaa0ca6d6e17d9017b6e53150 upstream.

Add support for the Sierra Wireless AR8550 device with
USB descriptor 0x1199, 0x68AB.

It is common with MC879x modules 1199:683c/683d which
also are composite devices with 7 interfaces (0..6)
and also MDM62xx based as the AR8550.

The major difference are only the interface attributes
02/02/01 on interfaces 3 and 4 on the AR8550. They are
vendor specific ff/ff/ff on MC879x modules.

lsusb reports:

Bus 001 Device 004: ID 1199:68ab Sierra Wireless, Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x1199 Sierra Wireless, Inc.
idProduct 0x68ab
bcdDevice 0.06
iManufacturer 3 Sierra Wireless, Incorporated
iProduct 2 AR8550
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 198
bNumInterfaces 7
bConfigurationValue 1
iConfiguration 1 Sierra Configuration
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 3
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x85 EP 5 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 4
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x87 EP 7 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x05 EP 5 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 5
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x88 EP 8 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x89 EP 9 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x06 EP 6 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 6
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8a EP 10 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8b EP 11 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x07 EP 7 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Device Qualifier (for other device speed):
bLength 10
bDescriptorType 6
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
bNumConfigurations 1
Device Status: 0x0001
Self Powered

Signed-off-by: Dirk Behme <[email protected]>
Cc: Lars Melin <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/usb/serial/sierra.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index d09a4e790892..f1b1f4b643e4 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -289,6 +289,7 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68AA, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
+ { USB_DEVICE(0x1199, 0x68AB) }, /* Sierra Wireless AR8550 */
/* AT&T Direct IP LTE modems */
{ USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68AA, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
--
2.5.0

2015-08-24 09:24:23

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 42/82] ima: add support for new "euid" policy condition

From: Mimi Zohar <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 139069eff7388407f19794384c42a534d618ccd7 upstream.

The new "euid" policy condition measures files with the specified
effective uid (euid). In addition, for CAP_SETUID files it measures
files with the specified uid or suid.

Changelog:
- fixed checkpatch.pl warnings
- fixed avc denied {setuid} messages - based on Roberto's feedback

Signed-off-by: Mimi Zohar <[email protected]>
Signed-off-by: Dr. Greg Wettstein <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
Documentation/ABI/testing/ima_policy | 3 ++-
security/integrity/ima/ima_policy.c | 27 +++++++++++++++++++++++----
2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index 4c3efe434806..84c6a9c1c531 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -20,7 +20,7 @@ Description:
action: measure | dont_measure | appraise | dont_appraise | audit
condition:= base | lsm [option]
base: [[func=] [mask=] [fsmagic=] [fsuuid=] [uid=]
- [fowner]]
+ [euid=] [fowner=]]
lsm: [[subj_user=] [subj_role=] [subj_type=]
[obj_user=] [obj_role=] [obj_type=]]
option: [[appraise_type=]] [permit_directio]
@@ -30,6 +30,7 @@ Description:
fsmagic:= hex value
fsuuid:= file system UUID (e.g 8bcbe394-4f13-4144-be8e-5aa9ea2ce2f6)
uid:= decimal value
+ euid:= decimal value
fowner:=decimal value
lsm: are LSM specific
option: appraise_type:= [imasig]
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 085c4964be99..bab08da010ee 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -27,6 +27,7 @@
#define IMA_UID 0x0008
#define IMA_FOWNER 0x0010
#define IMA_FSUUID 0x0020
+#define IMA_EUID 0x0080

#define UNKNOWN 0
#define MEASURE 0x0001 /* same as IMA_MEASURE */
@@ -179,6 +180,16 @@ static bool ima_match_rules(struct ima_rule_entry *rule,
return false;
if ((rule->flags & IMA_UID) && !uid_eq(rule->uid, cred->uid))
return false;
+ if (rule->flags & IMA_EUID) {
+ if (has_capability_noaudit(current, CAP_SETUID)) {
+ if (!uid_eq(rule->uid, cred->euid)
+ && !uid_eq(rule->uid, cred->suid)
+ && !uid_eq(rule->uid, cred->uid))
+ return false;
+ } else if (!uid_eq(rule->uid, cred->euid))
+ return false;
+ }
+
if ((rule->flags & IMA_FOWNER) && !uid_eq(rule->fowner, inode->i_uid))
return false;
for (i = 0; i < MAX_LSM_RULES; i++) {
@@ -350,7 +361,8 @@ enum {
Opt_audit,
Opt_obj_user, Opt_obj_role, Opt_obj_type,
Opt_subj_user, Opt_subj_role, Opt_subj_type,
- Opt_func, Opt_mask, Opt_fsmagic, Opt_uid, Opt_fowner,
+ Opt_func, Opt_mask, Opt_fsmagic,
+ Opt_uid, Opt_euid, Opt_fowner,
Opt_appraise_type, Opt_fsuuid, Opt_permit_directio
};

@@ -371,6 +383,7 @@ static match_table_t policy_tokens = {
{Opt_fsmagic, "fsmagic=%s"},
{Opt_fsuuid, "fsuuid=%s"},
{Opt_uid, "uid=%s"},
+ {Opt_euid, "euid=%s"},
{Opt_fowner, "fowner=%s"},
{Opt_appraise_type, "appraise_type=%s"},
{Opt_permit_directio, "permit_directio"},
@@ -542,6 +555,9 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
break;
case Opt_uid:
ima_log_string(ab, "uid", args[0].from);
+ case Opt_euid:
+ if (token == Opt_euid)
+ ima_log_string(ab, "euid", args[0].from);

if (uid_valid(entry->uid)) {
result = -EINVAL;
@@ -550,11 +566,14 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)

result = strict_strtoul(args[0].from, 10, &lnum);
if (!result) {
- entry->uid = make_kuid(current_user_ns(), (uid_t)lnum);
- if (!uid_valid(entry->uid) || (((uid_t)lnum) != lnum))
+ entry->uid = make_kuid(current_user_ns(),
+ (uid_t) lnum);
+ if (!uid_valid(entry->uid) ||
+ (uid_t)lnum != lnum)
result = -EINVAL;
else
- entry->flags |= IMA_UID;
+ entry->flags |= (token == Opt_uid)
+ ? IMA_UID : IMA_EUID;
}
break;
case Opt_fowner:
--
2.5.0

2015-08-24 09:24:24

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 43/82] ima: extend "mask" policy matching support

From: Mimi Zohar <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4351c294b8c1028077280f761e158d167b592974 upstream.

The current "mask" policy option matches files opened as MAY_READ,
MAY_WRITE, MAY_APPEND or MAY_EXEC. This patch extends the "mask"
option to match files opened containing one of these modes. For
example, "mask=^MAY_READ" would match files opened read-write.

Signed-off-by: Mimi Zohar <[email protected]>
Signed-off-by: Dr. Greg Wettstein <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
Documentation/ABI/testing/ima_policy | 3 ++-
security/integrity/ima/ima_policy.c | 20 +++++++++++++++-----
2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index 84c6a9c1c531..750ab970fa95 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -26,7 +26,8 @@ Description:
option: [[appraise_type=]] [permit_directio]

base: func:= [BPRM_CHECK][MMAP_CHECK][FILE_CHECK][MODULE_CHECK]
- mask:= [MAY_READ] [MAY_WRITE] [MAY_APPEND] [MAY_EXEC]
+ mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
+ [[^]MAY_EXEC]
fsmagic:= hex value
fsuuid:= file system UUID (e.g 8bcbe394-4f13-4144-be8e-5aa9ea2ce2f6)
uid:= decimal value
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index bab08da010ee..9d8e420a80d9 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -27,6 +27,7 @@
#define IMA_UID 0x0008
#define IMA_FOWNER 0x0010
#define IMA_FSUUID 0x0020
+#define IMA_INMASK 0x0040
#define IMA_EUID 0x0080

#define UNKNOWN 0
@@ -172,6 +173,9 @@ static bool ima_match_rules(struct ima_rule_entry *rule,
return false;
if ((rule->flags & IMA_MASK) && rule->mask != mask)
return false;
+ if ((rule->flags & IMA_INMASK) &&
+ (!(rule->mask & mask) && func != POST_SETATTR))
+ return false;
if ((rule->flags & IMA_FSMAGIC)
&& rule->fsmagic != inode->i_sb->s_magic)
return false;
@@ -425,6 +429,7 @@ static void ima_log_string(struct audit_buffer *ab, char *key, char *value)
static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
{
struct audit_buffer *ab;
+ char *from;
char *p;
int result = 0;

@@ -513,18 +518,23 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
if (entry->mask)
result = -EINVAL;

- if ((strcmp(args[0].from, "MAY_EXEC")) == 0)
+ from = args[0].from;
+ if (*from == '^')
+ from++;
+
+ if ((strcmp(from, "MAY_EXEC")) == 0)
entry->mask = MAY_EXEC;
- else if (strcmp(args[0].from, "MAY_WRITE") == 0)
+ else if (strcmp(from, "MAY_WRITE") == 0)
entry->mask = MAY_WRITE;
- else if (strcmp(args[0].from, "MAY_READ") == 0)
+ else if (strcmp(from, "MAY_READ") == 0)
entry->mask = MAY_READ;
- else if (strcmp(args[0].from, "MAY_APPEND") == 0)
+ else if (strcmp(from, "MAY_APPEND") == 0)
entry->mask = MAY_APPEND;
else
result = -EINVAL;
if (!result)
- entry->flags |= IMA_MASK;
+ entry->flags |= (*args[0].from == '^')
+ ? IMA_INMASK : IMA_MASK;
break;
case Opt_fsmagic:
ima_log_string(ab, "fsmagic", args[0].from);
--
2.5.0

2015-08-24 09:18:44

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 44/82] ipmi: fix timeout calculation when bmc is disconnected

From: Xie XiuQi <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit e21404dc0ac7ac971c1e36274b48bb460463f4e5 upstream.

Loading ipmi_si module while bmc is disconnected, we found the timeout
is longer than 5 secs. Actually it takes about 3 mins and 20
secs.(HZ=250)

error message as below:
Dec 12 19:08:59 linux kernel: IPMI BT: timeout in RD_WAIT [ ] 1 retries left
Dec 12 19:08:59 linux kernel: BT: write 4 bytes seq=0x01 03 18 00 01
[...]
Dec 12 19:12:19 linux kernel: IPMI BT: timeout in RD_WAIT [ ]
Dec 12 19:12:19 linux kernel: failed 2 retries, sending error response
Dec 12 19:12:19 linux kernel: IPMI: BT reset (takes 5 secs)
Dec 12 19:12:19 linux kernel: IPMI BT: flag reset [ ]

Function wait_for_msg_done() use schedule_timeout_uninterruptible(1) to
sleep 1 tick, so we should subtract jiffies_to_usecs(1) instead of 100
usecs from timeout.

Reported-by: Hu Shiyuan <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/char/ipmi/ipmi_si_intf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index e5bdd1a2f541..25ed69ffd8dd 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -2783,7 +2783,7 @@ static int wait_for_msg_done(struct smi_info *smi_info)
smi_result == SI_SM_CALL_WITH_TICK_DELAY) {
schedule_timeout_uninterruptible(1);
smi_result = smi_info->handlers->event(
- smi_info->si_sm, 100);
+ smi_info->si_sm, jiffies_to_usecs(1));
} else if (smi_result == SI_SM_CALL_WITHOUT_DELAY) {
smi_result = smi_info->handlers->event(
smi_info->si_sm, 0);
--
2.5.0

2015-08-24 09:10:16

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 45/82] sparc64: Fix userspace FPU register corruptions.

From: "David S. Miller" <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 44922150d87cef616fd183220d43d8fde4d41390 ]

If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:

ETRAP
ETRAP
VIS_ENTRY(fprs=0x4)
VIS_EXIT
RTRAP (kernel FPU restore with fpu_saved=0x4)
RTRAP

We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.

Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.

This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.

But this is not how trap returns from kernel to kernel operate.

The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.

Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.

Longer term we need to do something smarter to reinstate the partial
save optimizations. Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state. Instead,
the VISEntry et al. calls should be doing that work.

This bug is about two decades old.

Reported-by: James Y Knight <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/sparc/include/asm/visasm.h | 16 +++-------
arch/sparc/lib/NG4memcpy.S | 5 ++-
arch/sparc/lib/VISsave.S | 67 ++---------------------------------------
arch/sparc/lib/ksyms.c | 4 ---
4 files changed, 11 insertions(+), 81 deletions(-)

diff --git a/arch/sparc/include/asm/visasm.h b/arch/sparc/include/asm/visasm.h
index 11fdf0ef50bb..50d6f16a1513 100644
--- a/arch/sparc/include/asm/visasm.h
+++ b/arch/sparc/include/asm/visasm.h
@@ -28,16 +28,10 @@
* Must preserve %o5 between VISEntryHalf and VISExitHalf */

#define VISEntryHalf \
- rd %fprs, %o5; \
- andcc %o5, FPRS_FEF, %g0; \
- be,pt %icc, 297f; \
- sethi %hi(298f), %g7; \
- sethi %hi(VISenterhalf), %g1; \
- jmpl %g1 + %lo(VISenterhalf), %g0; \
- or %g7, %lo(298f), %g7; \
- clr %o5; \
-297: wr %o5, FPRS_FEF, %fprs; \
-298:
+ VISEntry
+
+#define VISExitHalf \
+ VISExit

#define VISEntryHalfFast(fail_label) \
rd %fprs, %o5; \
@@ -47,7 +41,7 @@
ba,a,pt %xcc, fail_label; \
297: wr %o5, FPRS_FEF, %fprs;

-#define VISExitHalf \
+#define VISExitHalfFast \
wr %o5, 0, %fprs;

#ifndef __ASSEMBLY__
diff --git a/arch/sparc/lib/NG4memcpy.S b/arch/sparc/lib/NG4memcpy.S
index 140527a20e7d..83aeeb1dffdb 100644
--- a/arch/sparc/lib/NG4memcpy.S
+++ b/arch/sparc/lib/NG4memcpy.S
@@ -240,8 +240,11 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
add %o0, 0x40, %o0
bne,pt %icc, 1b
LOAD(prefetch, %g1 + 0x200, #n_reads_strong)
+#ifdef NON_USER_COPY
+ VISExitHalfFast
+#else
VISExitHalf
-
+#endif
brz,pn %o2, .Lexit
cmp %o2, 19
ble,pn %icc, .Lsmall_unaligned
diff --git a/arch/sparc/lib/VISsave.S b/arch/sparc/lib/VISsave.S
index b320ae9e2e2e..a063d84336d6 100644
--- a/arch/sparc/lib/VISsave.S
+++ b/arch/sparc/lib/VISsave.S
@@ -44,9 +44,8 @@ vis1: ldub [%g6 + TI_FPSAVED], %g3

stx %g3, [%g6 + TI_GSR]
2: add %g6, %g1, %g3
- cmp %o5, FPRS_DU
- be,pn %icc, 6f
- sll %g1, 3, %g1
+ mov FPRS_DU | FPRS_DL | FPRS_FEF, %o5
+ sll %g1, 3, %g1
stb %o5, [%g3 + TI_FPSAVED]
rd %gsr, %g2
add %g6, %g1, %g3
@@ -80,65 +79,3 @@ vis1: ldub [%g6 + TI_FPSAVED], %g3
.align 32
80: jmpl %g7 + %g0, %g0
nop
-
-6: ldub [%g3 + TI_FPSAVED], %o5
- or %o5, FPRS_DU, %o5
- add %g6, TI_FPREGS+0x80, %g2
- stb %o5, [%g3 + TI_FPSAVED]
-
- sll %g1, 5, %g1
- add %g6, TI_FPREGS+0xc0, %g3
- wr %g0, FPRS_FEF, %fprs
- membar #Sync
- stda %f32, [%g2 + %g1] ASI_BLK_P
- stda %f48, [%g3 + %g1] ASI_BLK_P
- membar #Sync
- ba,pt %xcc, 80f
- nop
-
- .align 32
-80: jmpl %g7 + %g0, %g0
- nop
-
- .align 32
-VISenterhalf:
- ldub [%g6 + TI_FPDEPTH], %g1
- brnz,a,pn %g1, 1f
- cmp %g1, 1
- stb %g0, [%g6 + TI_FPSAVED]
- stx %fsr, [%g6 + TI_XFSR]
- clr %o5
- jmpl %g7 + %g0, %g0
- wr %g0, FPRS_FEF, %fprs
-
-1: bne,pn %icc, 2f
- srl %g1, 1, %g1
- ba,pt %xcc, vis1
- sub %g7, 8, %g7
-2: addcc %g6, %g1, %g3
- sll %g1, 3, %g1
- andn %o5, FPRS_DU, %g2
- stb %g2, [%g3 + TI_FPSAVED]
-
- rd %gsr, %g2
- add %g6, %g1, %g3
- stx %g2, [%g3 + TI_GSR]
- add %g6, %g1, %g2
- stx %fsr, [%g2 + TI_XFSR]
- sll %g1, 5, %g1
-3: andcc %o5, FPRS_DL, %g0
- be,pn %icc, 4f
- add %g6, TI_FPREGS, %g2
-
- add %g6, TI_FPREGS+0x40, %g3
- membar #Sync
- stda %f0, [%g2 + %g1] ASI_BLK_P
- stda %f16, [%g3 + %g1] ASI_BLK_P
- membar #Sync
- ba,pt %xcc, 4f
- nop
-
- .align 32
-4: and %o5, FPRS_DU, %o5
- jmpl %g7 + %g0, %g0
- wr %o5, FPRS_FEF, %fprs
diff --git a/arch/sparc/lib/ksyms.c b/arch/sparc/lib/ksyms.c
index 323335b9cd2b..ac094de28ccf 100644
--- a/arch/sparc/lib/ksyms.c
+++ b/arch/sparc/lib/ksyms.c
@@ -126,10 +126,6 @@ EXPORT_SYMBOL(copy_user_page);
void VISenter(void);
EXPORT_SYMBOL(VISenter);

-/* CRYPTO code needs this */
-void VISenterhalf(void);
-EXPORT_SYMBOL(VISenterhalf);
-
extern void xor_vis_2(unsigned long, unsigned long *, unsigned long *);
extern void xor_vis_3(unsigned long, unsigned long *, unsigned long *,
unsigned long *);
--
2.5.0

2015-08-24 09:10:13

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 46/82] md: use kzalloc() when bitmap is disabled

From: Benjamin Randazzo <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit b6878d9e03043695dbf3fa1caa6dfc09db225b16 upstream.

In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".

5769 file = kmalloc(sizeof(*file), GFP_NOIO);
5770 if (!file)
5771 return -ENOMEM;

This structure is copied to user space at the end of the function.

5786 if (err == 0 &&
5787 copy_to_user(arg, file, sizeof(*file)))
5788 err = -EFAULT

But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.

5775 /* bitmap disabled, zero the first byte and copy out */
5776 if (!mddev->bitmap_info.file)
5777 file->pathname[0] = '\0';

Signed-off-by: Benjamin Randazzo <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/md/md.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2394b5bbeab9..1c512dc1f17f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5642,8 +5642,7 @@ static int get_bitmap_file(struct mddev * mddev, void __user * arg)
char *ptr, *buf = NULL;
int err = -ENOMEM;

- file = kmalloc(sizeof(*file), GFP_NOIO);
-
+ file = kzalloc(sizeof(*file), GFP_NOIO);
if (!file)
goto out;

--
2.5.0

2015-08-24 09:21:44

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 47/82] ASoC: pcm1681: Fix setting de-emphasis sampling rate selection

From: Axel Lin <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit fa8173a3ef0570affde7da352de202190b3786c2 upstream.

The de-emphasis sampling rate selection is controlled by BIT[3:4] of
PCM1681_DEEMPH_CONTROL register. Do proper left shift to set it.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Marek Belisko <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
sound/soc/codecs/pcm1681.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/pcm1681.c b/sound/soc/codecs/pcm1681.c
index 0819fa2ff710..a7a34613c828 100644
--- a/sound/soc/codecs/pcm1681.c
+++ b/sound/soc/codecs/pcm1681.c
@@ -101,7 +101,7 @@ static int pcm1681_set_deemph(struct snd_soc_codec *codec)

if (val != -1) {
regmap_update_bits(priv->regmap, PCM1681_DEEMPH_CONTROL,
- PCM1681_DEEMPH_RATE_MASK, val);
+ PCM1681_DEEMPH_RATE_MASK, val << 3);
enable = 1;
} else
enable = 0;
--
2.5.0

2015-08-24 09:21:41

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 48/82] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

From: Andy Lutomirski <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected] <[email protected]>
Cc: xen-devel <[email protected]>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade76ef3f..2cbc2f2cf43e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -480,6 +480,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+ unsigned char dummy;

ptep = lookup_address((unsigned long)v, &level);
BUG_ON(ptep == NULL);
@@ -489,6 +490,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)

pte = pfn_pte(pfn, prot);

+ /*
+ * Careful: update_va_mapping() will fail if the virtual address
+ * we're poking isn't populated in the page tables. We don't
+ * need to worry about the direct map (that's always in the page
+ * tables), but we need to be careful about vmap space. In
+ * particular, the top level page table can lazily propagate
+ * entries between processes, so if we've switched mms since we
+ * vmapped the target in the first place, we might not have the
+ * top-level page table entry populated.
+ *
+ * We disable preemption because we want the same mm active when
+ * we probe the target and when we issue the hypercall. We'll
+ * have the same nominal mm, but if we're a kernel thread, lazy
+ * mm dropping could change our pgd.
+ *
+ * Out of an abundance of caution, this uses __get_user() to fault
+ * in the target address just in case there's some obscure case
+ * in which the target address isn't readable.
+ */
+
+ preempt_disable();
+
+ pagefault_disable(); /* Avoid warnings due to being atomic. */
+ __get_user(dummy, (unsigned char __user __force *)v);
+ pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();

@@ -500,6 +527,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+ preempt_enable();
}

static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -507,6 +536,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;

+ /*
+ * We need to mark the all aliases of the LDT pages RO. We
+ * don't need to call vm_flush_aliases(), though, since that's
+ * only responsible for flushing aliases out the TLBs, not the
+ * page tables, and Xen will flush the TLB for us if needed.
+ *
+ * To avoid confusing future readers: none of this is necessary
+ * to load the LDT. The hypervisor only checks this when the
+ * LDT is faulted in due to subsequent descriptor access.
+ */
+
for(i = 0; i < entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
}
--
2.5.0

2015-08-24 09:21:42

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 49/82] xen/gntdevt: Fix race condition in gntdev_release()

From: Marek Marczykowski-Górecki <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931 upstream.

While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.

Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/xen/gntdev.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index e41c79c986ea..f2ca8d0af55f 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -529,12 +529,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)

pr_debug("priv %p\n", priv);

+ mutex_lock(&priv->lock);
while (!list_empty(&priv->maps)) {
map = list_entry(priv->maps.next, struct grant_map, next);
list_del(&map->next);
gntdev_put_map(NULL /* already removed */, map);
}
WARN_ON(!list_empty(&priv->freeable_maps));
+ mutex_unlock(&priv->lock);

if (use_ptemod)
mmu_notifier_unregister(&priv->mn, priv->mm);
--
2.5.0

2015-08-24 09:21:43

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 50/82] crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer

From: Herbert Xu <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f898c522f0e9ac9f3177d0762b76e2ab2d2cf9c0 upstream.

This patch removes a bogus BUG_ON in the ablkcipher path that
triggers when the destination buffer is different from the source
buffer and is scattered.

Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/crypto/ixp4xx_crypto.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/crypto/ixp4xx_crypto.c b/drivers/crypto/ixp4xx_crypto.c
index 21180d6cad6e..7cb51b3bb79e 100644
--- a/drivers/crypto/ixp4xx_crypto.c
+++ b/drivers/crypto/ixp4xx_crypto.c
@@ -915,7 +915,6 @@ static int ablk_perform(struct ablkcipher_request *req, int encrypt)
crypt->mode |= NPE_OP_NOT_IN_PLACE;
/* This was never tested by Intel
* for more than one dst buffer, I think. */
- BUG_ON(req->dst->length < nbytes);
req_ctx->dst = NULL;
if (!chainup_buffers(dev, req->dst, nbytes, &dst_hook,
flags, DMA_FROM_DEVICE))
--
2.5.0

2015-08-24 09:21:39

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 51/82] ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc

From: Roger Quadros <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 9a258afa928b45e6dd2efcac46ccf7eea705d35a upstream.

For hwmods without sysc, _init_mpu_rt_base(oh) won't be called and so
_find_mpu_rt_port(oh) will return NULL thus preventing ready state check
on those modules after the module is enabled.

This can potentially cause a bus access error if the module is accessed
before the module is ready.

Fix this by unconditionally calling _init_mpu_rt_base() during hwmod
_init(). Do ioremap only if we need SYSC access.

Eventhough _wait_target_ready() check doesn't really need MPU RT port but
just the PRCM registers, we still mandate that the hwmod must have an
MPU RT port if ready state check needs to be done. Else it would mean that
the module is not accessible by MPU so there is no point in waiting
for target to be ready.

e.g. this fixes the below DCAN bus access error on AM437x-gp-evm.

[ 16.672978] ------------[ cut here ]------------
[ 16.677885] WARNING: CPU: 0 PID: 1580 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x234/0x35c()
[ 16.687946] 44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_PER_0 (Read): Data Access in User mode during Functional access
[ 16.700654] Modules linked in: xhci_hcd btwilink ti_vpfe dwc3 videobuf2_core ov2659 bluetooth v4l2_common videodev ti_am335x_adc kfifo_buf industrialio c_can_platform videobuf2_dma_contig media snd_soc_tlv320aic3x pixcir_i2c_ts c_can dc
[ 16.731144] CPU: 0 PID: 1580 Comm: rpc.statd Not tainted 3.14.26-02561-gf733aa036398 #180
[ 16.739747] Backtrace:
[ 16.742336] [<c0011108>] (dump_backtrace) from [<c00112a4>] (show_stack+0x18/0x1c)
[ 16.750285] r6:00000093 r5:00000009 r4:eab5b8a8 r3:00000000
[ 16.756252] [<c001128c>] (show_stack) from [<c05a4418>] (dump_stack+0x20/0x28)
[ 16.763870] [<c05a43f8>] (dump_stack) from [<c0037120>] (warn_slowpath_common+0x6c/0x8c)
[ 16.772408] [<c00370b4>] (warn_slowpath_common) from [<c00371e4>] (warn_slowpath_fmt+0x38/0x40)
[ 16.781550] r8:c05d1f90 r7:c0730844 r6:c0730448 r5:80080003 r4:ed0cd210
[ 16.788626] [<c00371b0>] (warn_slowpath_fmt) from [<c027fa94>] (l3_interrupt_handler+0x234/0x35c)
[ 16.797968] r3:ed0cd480 r2:c0730508
[ 16.801747] [<c027f860>] (l3_interrupt_handler) from [<c0063758>] (handle_irq_event_percpu+0x54/0x1bc)
[ 16.811533] r10:ed005600 r9:c084855b r8:0000002a r7:00000000 r6:00000000 r5:0000002a
[ 16.819780] r4:ed0e6d80
[ 16.822453] [<c0063704>] (handle_irq_event_percpu) from [<c00638f0>] (handle_irq_event+0x30/0x40)
[ 16.831789] r10:eb2b6938 r9:eb2b6960 r8:bf011420 r7:fa240100 r6:00000000 r5:0000002a
[ 16.840052] r4:ed005600
[ 16.842744] [<c00638c0>] (handle_irq_event) from [<c00661d8>] (handle_fasteoi_irq+0x74/0x128)
[ 16.851702] r4:ed005600 r3:00000000
[ 16.855479] [<c0066164>] (handle_fasteoi_irq) from [<c0063068>] (generic_handle_irq+0x28/0x38)
[ 16.864523] r4:0000002a r3:c0066164
[ 16.868294] [<c0063040>] (generic_handle_irq) from [<c000ef60>] (handle_IRQ+0x38/0x8c)
[ 16.876612] r4:c081c640 r3:00000202
[ 16.880380] [<c000ef28>] (handle_IRQ) from [<c00084f0>] (gic_handle_irq+0x30/0x5c)
[ 16.888328] r6:eab5ba38 r5:c0804460 r4:fa24010c r3:00000100
[ 16.894303] [<c00084c0>] (gic_handle_irq) from [<c05a8d80>] (__irq_svc+0x40/0x50)
[ 16.902193] Exception stack(0xeab5ba38 to 0xeab5ba80)
[ 16.907499] ba20: 00000000 00000006
[ 16.916108] ba40: fa1d0000 fa1d0008 ed3d3000 eab5bab4 ed3d3460 c0842af4 bf011420 eb2b6960
[ 16.924716] ba60: eb2b6938 eab5ba8c eab5ba90 eab5ba80 bf035220 bf07702c 600f0013 ffffffff
[ 16.933317] r7:eab5ba6c r6:ffffffff r5:600f0013 r4:bf07702c
[ 16.939317] [<bf077000>] (c_can_plat_read_reg_aligned_to_16bit [c_can_platform]) from [<bf035220>] (c_can_get_berr_counter+0x38/0x64 [c_can])
[ 16.952696] [<bf0351e8>] (c_can_get_berr_counter [c_can]) from [<bf010294>] (can_fill_info+0x124/0x15c [can_dev])
[ 16.963480] r5:ec8c9740 r4:ed3d3000
[ 16.967253] [<bf010170>] (can_fill_info [can_dev]) from [<c0502fa8>] (rtnl_fill_ifinfo+0x58c/0x8fc)
[ 16.976749] r6:ec8c9740 r5:ed3d3000 r4:eb2b6780
[ 16.981613] [<c0502a1c>] (rtnl_fill_ifinfo) from [<c0503408>] (rtnl_dump_ifinfo+0xf0/0x1dc)
[ 16.990401] r10:ec8c9740 r9:00000000 r8:00000000 r7:00000000 r6:ebd4d1b4 r5:ed3d3000
[ 16.998671] r4:00000000
[ 17.001342] [<c0503318>] (rtnl_dump_ifinfo) from [<c050e6e4>] (netlink_dump+0xa8/0x1e0)
[ 17.009772] r10:00000000 r9:00000000 r8:c0503318 r7:ebf3e6c0 r6:ebd4d1b4 r5:ec8c9740
[ 17.018050] r4:ebd4d000
[ 17.020714] [<c050e63c>] (netlink_dump) from [<c050ec10>] (__netlink_dump_start+0x104/0x154)
[ 17.029591] r6:eab5bd34 r5:ec8c9980 r4:ebd4d000
[ 17.034454] [<c050eb0c>] (__netlink_dump_start) from [<c0505604>] (rtnetlink_rcv_msg+0x110/0x1f4)
[ 17.043778] r7:00000000 r6:ec8c9980 r5:00000f40 r4:ebf3e6c0
[ 17.049743] [<c05054f4>] (rtnetlink_rcv_msg) from [<c05108e8>] (netlink_rcv_skb+0xb4/0xc8)
[ 17.058449] r8:eab5bdac r7:ec8c9980 r6:c05054f4 r5:ec8c9980 r4:ebf3e6c0
[ 17.065534] [<c0510834>] (netlink_rcv_skb) from [<c0504134>] (rtnetlink_rcv+0x24/0x2c)
[ 17.073854] r6:ebd4d000 r5:00000014 r4:ec8c9980 r3:c0504110
[ 17.079846] [<c0504110>] (rtnetlink_rcv) from [<c05102ac>] (netlink_unicast+0x180/0x1ec)
[ 17.088363] r4:ed0c6800 r3:c0504110
[ 17.092113] [<c051012c>] (netlink_unicast) from [<c0510670>] (netlink_sendmsg+0x2ac/0x380)
[ 17.100813] r10:00000000 r8:00000008 r7:ec8c9980 r6:ebd4d000 r5:eab5be70 r4:eab5bee4
[ 17.109083] [<c05103c4>] (netlink_sendmsg) from [<c04dfdb4>] (sock_sendmsg+0x90/0xb0)
[ 17.117305] r10:00000000 r9:eab5a000 r8:becdda3c r7:0000000c r6:ea978400 r5:eab5be70
[ 17.125563] r4:c05103c4
[ 17.128225] [<c04dfd24>] (sock_sendmsg) from [<c04e1c28>] (SyS_sendto+0xb8/0xdc)
[ 17.136001] r6:becdda5c r5:00000014 r4:ecd37040
[ 17.140876] [<c04e1b70>] (SyS_sendto) from [<c000e680>] (ret_fast_syscall+0x0/0x30)
[ 17.148923] r10:00000000 r8:c000e804 r7:00000122 r6:becdda5c r5:0000000c r4:becdda5c
[ 17.157169] ---[ end trace 2b71e15b38f58bad ]---

Fixes: 6423d6df1440 ("ARM: OMAP2+: hwmod: check for module address space during init")
Signed-off-by: Roger Quadros <[email protected]>
Signed-off-by: Paul Walmsley <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/arm/mach-omap2/omap_hwmod.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index 3d29fb972cd0..68a9bec32c9e 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -2402,6 +2402,9 @@ static struct device_node *of_dev_hwmod_lookup(struct device_node *np,
* registers. This address is needed early so the OCP registers that
* are part of the device's address space can be ioremapped properly.
*
+ * If SYSC access is not needed, the registers will not be remapped
+ * and non-availability of MPU access is not treated as an error.
+ *
* Returns 0 on success, -EINVAL if an invalid hwmod is passed, and
* -ENXIO on absent or invalid register target address space.
*/
@@ -2416,6 +2419,11 @@ static int __init _init_mpu_rt_base(struct omap_hwmod *oh, void *data)

_save_mpu_port_index(oh);

+ /* if we don't need sysc access we don't need to ioremap */
+ if (!oh->class->sysc)
+ return 0;
+
+ /* we can't continue without MPU PORT if we need sysc access */
if (oh->_int_flags & _HWMOD_NO_MPU_PORT)
return -ENXIO;

@@ -2425,8 +2433,10 @@ static int __init _init_mpu_rt_base(struct omap_hwmod *oh, void *data)
oh->name);

/* Extract the IO space from device tree blob */
- if (!of_have_populated_dt())
+ if (!of_have_populated_dt()) {
+ pr_err("omap_hwmod: %s: no dt node\n", oh->name);
return -ENXIO;
+ }

np = of_dev_hwmod_lookup(of_find_node_by_name(NULL, "ocp"), oh);
if (np)
@@ -2467,13 +2477,11 @@ static int __init _init(struct omap_hwmod *oh, void *data)
if (oh->_state != _HWMOD_STATE_REGISTERED)
return 0;

- if (oh->class->sysc) {
- r = _init_mpu_rt_base(oh, NULL);
- if (r < 0) {
- WARN(1, "omap_hwmod: %s: doesn't have mpu register target base\n",
- oh->name);
- return 0;
- }
+ r = _init_mpu_rt_base(oh, NULL);
+ if (r < 0) {
+ WARN(1, "omap_hwmod: %s: doesn't have mpu register target base\n",
+ oh->name);
+ return 0;
}

r = _init_clocks(oh, NULL);
--
2.5.0

2015-08-24 09:21:38

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 52/82] iscsi-target: Fix iscsit_start_kthreads failure OOPs

From: Nicholas Bellinger <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit e54198657b65625085834847ab6271087323ffea upstream.

This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where a iscsit_start_kthreads() failure triggers
a NULL pointer dereference OOPs:

commit 88dcd2dab5c23b1c9cfc396246d8f476c872f0ca
Author: Nicholas Bellinger <[email protected]>
Date: Thu Feb 26 22:19:15 2015 -0800

iscsi-target: Convert iscsi_thread_set usage to kthread.h

To address this bug, move iscsit_start_kthreads() immediately
preceeding the transmit of last login response, before signaling
a successful transition into full-feature-phase within existing
iscsi_target_do_tx_login_io() logic.

This ensures that no target-side resource allocation failures can
occur after the final login response has been successfully sent.

Also, it adds a iscsi_conn->rx_login_comp to allow the RX thread
to sleep to prevent other socket related failures until the final
iscsi_post_login_handler() call is able to complete.

Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/target/iscsi/iscsi_target.c | 18 ++++++++++---
drivers/target/iscsi/iscsi_target_core.h | 1 +
drivers/target/iscsi/iscsi_target_login.c | 43 ++++++++++++-------------------
drivers/target/iscsi/iscsi_target_login.h | 3 ++-
drivers/target/iscsi/iscsi_target_nego.c | 34 +++++++++++++++++++++++-
5 files changed, 67 insertions(+), 32 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 60e06fb049ef..6f3aa50699f1 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -3933,7 +3933,13 @@ get_immediate:
}

transport_err:
- iscsit_take_action_for_connection_exit(conn);
+ /*
+ * Avoid the normal connection failure code-path if this connection
+ * is still within LOGIN mode, and iscsi_np process context is
+ * responsible for cleaning up the early connection failure.
+ */
+ if (conn->conn_state != TARG_CONN_STATE_IN_LOGIN)
+ iscsit_take_action_for_connection_exit(conn);
out:
return 0;
}
@@ -4019,7 +4025,7 @@ reject:

int iscsi_target_rx_thread(void *arg)
{
- int ret;
+ int ret, rc;
u8 buffer[ISCSI_HDR_LEN], opcode;
u32 checksum = 0, digest = 0;
struct iscsi_conn *conn = arg;
@@ -4029,10 +4035,16 @@ int iscsi_target_rx_thread(void *arg)
* connection recovery / failure event can be triggered externally.
*/
allow_signal(SIGINT);
+ /*
+ * Wait for iscsi_post_login_handler() to complete before allowing
+ * incoming iscsi/tcp socket I/O, and/or failing the connection.
+ */
+ rc = wait_for_completion_interruptible(&conn->rx_login_comp);
+ if (rc < 0)
+ return 0;

if (conn->conn_transport->transport_type == ISCSI_INFINIBAND) {
struct completion comp;
- int rc;

init_completion(&comp);
rc = wait_for_completion_interruptible(&comp);
diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
index 1c232c509dae..cc0cb60eb7bd 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -604,6 +604,7 @@ struct iscsi_conn {
int bitmap_id;
int rx_thread_active;
struct task_struct *rx_thread;
+ struct completion rx_login_comp;
int tx_thread_active;
struct task_struct *tx_thread;
/* list_head for session connection list */
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 9d5762011413..899b756fe290 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -83,6 +83,7 @@ static struct iscsi_login *iscsi_login_init_conn(struct iscsi_conn *conn)
init_completion(&conn->conn_logout_comp);
init_completion(&conn->rx_half_close_comp);
init_completion(&conn->tx_half_close_comp);
+ init_completion(&conn->rx_login_comp);
spin_lock_init(&conn->cmd_lock);
spin_lock_init(&conn->conn_usage_lock);
spin_lock_init(&conn->immed_queue_lock);
@@ -717,6 +718,7 @@ int iscsit_start_kthreads(struct iscsi_conn *conn)

return 0;
out_tx:
+ send_sig(SIGINT, conn->tx_thread, 1);
kthread_stop(conn->tx_thread);
conn->tx_thread_active = false;
out_bitmap:
@@ -727,7 +729,7 @@ out_bitmap:
return ret;
}

-int iscsi_post_login_handler(
+void iscsi_post_login_handler(
struct iscsi_np *np,
struct iscsi_conn *conn,
u8 zero_tsih)
@@ -737,7 +739,6 @@ int iscsi_post_login_handler(
struct se_session *se_sess = sess->se_sess;
struct iscsi_portal_group *tpg = ISCSI_TPG_S(sess);
struct se_portal_group *se_tpg = &tpg->tpg_se_tpg;
- int rc;

iscsit_inc_conn_usage_count(conn);

@@ -778,10 +779,6 @@ int iscsi_post_login_handler(
sess->sess_ops->InitiatorName);
spin_unlock_bh(&sess->conn_lock);

- rc = iscsit_start_kthreads(conn);
- if (rc)
- return rc;
-
iscsi_post_login_start_timers(conn);
/*
* Determine CPU mask to ensure connection's RX and TX kthreads
@@ -790,15 +787,20 @@ int iscsi_post_login_handler(
iscsit_thread_get_cpumask(conn);
conn->conn_rx_reset_cpumask = 1;
conn->conn_tx_reset_cpumask = 1;
-
+ /*
+ * Wakeup the sleeping iscsi_target_rx_thread() now that
+ * iscsi_conn is in TARG_CONN_STATE_LOGGED_IN state.
+ */
+ complete(&conn->rx_login_comp);
iscsit_dec_conn_usage_count(conn);
+
if (stop_timer) {
spin_lock_bh(&se_tpg->session_lock);
iscsit_stop_time2retain_timer(sess);
spin_unlock_bh(&se_tpg->session_lock);
}
iscsit_dec_session_usage_count(sess);
- return 0;
+ return;
}

iscsi_set_session_parameters(sess->sess_ops, conn->param_list, 1);
@@ -839,10 +841,6 @@ int iscsi_post_login_handler(
" iSCSI Target Portal Group: %hu\n", tpg->nsessions, tpg->tpgt);
spin_unlock_bh(&se_tpg->session_lock);

- rc = iscsit_start_kthreads(conn);
- if (rc)
- return rc;
-
iscsi_post_login_start_timers(conn);
/*
* Determine CPU mask to ensure connection's RX and TX kthreads
@@ -851,10 +849,12 @@ int iscsi_post_login_handler(
iscsit_thread_get_cpumask(conn);
conn->conn_rx_reset_cpumask = 1;
conn->conn_tx_reset_cpumask = 1;
-
+ /*
+ * Wakeup the sleeping iscsi_target_rx_thread() now that
+ * iscsi_conn is in TARG_CONN_STATE_LOGGED_IN state.
+ */
+ complete(&conn->rx_login_comp);
iscsit_dec_conn_usage_count(conn);
-
- return 0;
}

static void iscsi_handle_login_thread_timeout(unsigned long data)
@@ -1419,23 +1419,12 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
if (ret < 0)
goto new_sess_out;

- if (!conn->sess) {
- pr_err("struct iscsi_conn session pointer is NULL!\n");
- goto new_sess_out;
- }
-
iscsi_stop_login_thread_timer(np);

- if (signal_pending(current))
- goto new_sess_out;
-
if (ret == 1) {
tpg_np = conn->tpg_np;

- ret = iscsi_post_login_handler(np, conn, zero_tsih);
- if (ret < 0)
- goto new_sess_out;
-
+ iscsi_post_login_handler(np, conn, zero_tsih);
iscsit_deaccess_np(np, tpg, tpg_np);
}

diff --git a/drivers/target/iscsi/iscsi_target_login.h b/drivers/target/iscsi/iscsi_target_login.h
index 29d098324b7f..55cbf4533544 100644
--- a/drivers/target/iscsi/iscsi_target_login.h
+++ b/drivers/target/iscsi/iscsi_target_login.h
@@ -12,7 +12,8 @@ extern int iscsit_accept_np(struct iscsi_np *, struct iscsi_conn *);
extern int iscsit_get_login_rx(struct iscsi_conn *, struct iscsi_login *);
extern int iscsit_put_login_tx(struct iscsi_conn *, struct iscsi_login *, u32);
extern void iscsit_free_conn(struct iscsi_np *, struct iscsi_conn *);
-extern int iscsi_post_login_handler(struct iscsi_np *, struct iscsi_conn *, u8);
+extern int iscsit_start_kthreads(struct iscsi_conn *);
+extern void iscsi_post_login_handler(struct iscsi_np *, struct iscsi_conn *, u8);
extern void iscsi_target_login_sess_out(struct iscsi_conn *, struct iscsi_np *,
bool, bool);
extern int iscsi_target_login_thread(void *);
diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index 76dc32fc5e1b..a801cad91742 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -17,6 +17,7 @@
******************************************************************************/

#include <linux/ctype.h>
+#include <linux/kthread.h>
#include <scsi/iscsi_proto.h>
#include <target/target_core_base.h>
#include <target/target_core_fabric.h>
@@ -361,10 +362,24 @@ static int iscsi_target_do_tx_login_io(struct iscsi_conn *conn, struct iscsi_log
ntohl(login_rsp->statsn), login->rsp_length);

padding = ((-login->rsp_length) & 3);
+ /*
+ * Before sending the last login response containing the transition
+ * bit for full-feature-phase, go ahead and start up TX/RX threads
+ * now to avoid potential resource allocation failures after the
+ * final login response has been sent.
+ */
+ if (login->login_complete) {
+ int rc = iscsit_start_kthreads(conn);
+ if (rc) {
+ iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
+ ISCSI_LOGIN_STATUS_NO_RESOURCES);
+ return -1;
+ }
+ }

if (conn->conn_transport->iscsit_put_login_tx(conn, login,
login->rsp_length + padding) < 0)
- return -1;
+ goto err;

login->rsp_length = 0;
mutex_lock(&sess->cmdsn_mutex);
@@ -373,6 +388,23 @@ static int iscsi_target_do_tx_login_io(struct iscsi_conn *conn, struct iscsi_log
mutex_unlock(&sess->cmdsn_mutex);

return 0;
+
+err:
+ if (login->login_complete) {
+ if (conn->rx_thread && conn->rx_thread_active) {
+ send_sig(SIGINT, conn->rx_thread, 1);
+ kthread_stop(conn->rx_thread);
+ }
+ if (conn->tx_thread && conn->tx_thread_active) {
+ send_sig(SIGINT, conn->tx_thread, 1);
+ kthread_stop(conn->tx_thread);
+ }
+ spin_lock(&iscsit_global->ts_bitmap_lock);
+ bitmap_release_region(iscsit_global->ts_bitmap, conn->bitmap_id,
+ get_order(1));
+ spin_unlock(&iscsit_global->ts_bitmap_lock);
+ }
+ return -1;
}

static void iscsi_target_sk_data_ready(struct sock *sk, int count)
--
2.5.0

2015-08-24 09:21:19

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 53/82] ALSA: hda - fix cs4210_spdif_automute()

From: Dan Carpenter <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 44008f0896ae205b02b0882dbf807f0de149efc4 upstream.

Smatch complains that we have nested checks for "spdif_present". It
turns out the current behavior isn't correct, we should remove the first
check and keep the second.

Fixes: 1077a024812d ('ALSA: hda - Use generic parser for Cirrus codec driver')
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
sound/pci/hda/patch_cirrus.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sound/pci/hda/patch_cirrus.c b/sound/pci/hda/patch_cirrus.c
index ab0d0a384c15..d54d218fe810 100644
--- a/sound/pci/hda/patch_cirrus.c
+++ b/sound/pci/hda/patch_cirrus.c
@@ -948,9 +948,7 @@ static void cs4210_spdif_automute(struct hda_codec *codec,

spec->spdif_present = spdif_present;
/* SPDIF TX on/off */
- if (spdif_present)
- snd_hda_set_pin_ctl(codec, spdif_pin,
- spdif_present ? PIN_OUT : 0);
+ snd_hda_set_pin_ctl(codec, spdif_pin, spdif_present ? PIN_OUT : 0);

cs_automute(codec);
}
--
2.5.0

2015-08-24 09:18:46

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 54/82] ipc: modify message queue accounting to not take kernel data structures into account

From: Marcus Gelderie <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit de54b9ac253787c366bbfb28d901a31954eb3511 upstream.

A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages, in commit
d6629859b36d ("ipc/mqueue: improve performance of send/recv").

That patch introducing the improved kernel handling of message queues
(using btrees) has, as a by-product, changed the meaning of the QSIZE
field in the pseudo-file created for the queue. Before, this field
reflected the size of the user-data in the queue. Since, it also takes
kernel data structures into account. For example, if 13 bytes of user
data are in the queue, on my machine the file reports a size of 61
bytes.

There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
Kerrisk gave the following background
(https://lkml.org/lkml/2015/6/16/74):

The pseudofiles in the mqueue filesystem (usually mounted at
/dev/mqueue) expose fields with metadata describing a message
queue. One of these fields, QSIZE, as originally implemented,
showed the total number of bytes of user data in all messages in
the message queue, and this feature was documented from the
beginning in the mq_overview(7) page. In 3.5, some other (useful)
work happened to break the user-space API in a couple of places,
including the value exposed via QSIZE, which now includes a measure
of kernel overhead bytes for the queue, a figure that renders QSIZE
useless for its original purpose, since there's no way to deduce
the number of overhead bytes consumed by the implementation.
(The other user-space breakage was subsequently fixed.)

This patch removes the accounting of kernel data structures in the
queue. Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above). Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.

It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation). Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.

Signed-off-by: Marcus Gelderie <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Acked-by: Michael Kerrisk <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: David Howells <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: John Duffy <[email protected]>
Cc: Arto Bendiken <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
ipc/mqueue.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index bb0248fc5187..82bb5e81ef57 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -143,7 +143,6 @@ static int msg_insert(struct msg_msg *msg, struct mqueue_inode_info *info)
if (!leaf)
return -ENOMEM;
INIT_LIST_HEAD(&leaf->msg_list);
- info->qsize += sizeof(*leaf);
}
leaf->priority = msg->m_type;
rb_link_node(&leaf->rb_node, parent, p);
@@ -188,7 +187,6 @@ try_again:
"lazy leaf delete!\n");
rb_erase(&leaf->rb_node, &info->msg_tree);
if (info->node_cache) {
- info->qsize -= sizeof(*leaf);
kfree(leaf);
} else {
info->node_cache = leaf;
@@ -201,7 +199,6 @@ try_again:
if (list_empty(&leaf->msg_list)) {
rb_erase(&leaf->rb_node, &info->msg_tree);
if (info->node_cache) {
- info->qsize -= sizeof(*leaf);
kfree(leaf);
} else {
info->node_cache = leaf;
@@ -1026,7 +1023,6 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
/* Save our speculative allocation into the cache */
INIT_LIST_HEAD(&new_leaf->msg_list);
info->node_cache = new_leaf;
- info->qsize += sizeof(*new_leaf);
new_leaf = NULL;
} else {
kfree(new_leaf);
@@ -1133,7 +1129,6 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
/* Save our speculative allocation into the cache */
INIT_LIST_HEAD(&new_leaf->msg_list);
info->node_cache = new_leaf;
- info->qsize += sizeof(*new_leaf);
} else {
kfree(new_leaf);
}
--
2.5.0

2015-08-24 09:18:42

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 55/82] ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()

From: Joseph Qi <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 209f7512d007980fd111a74a064d70a3656079cf upstream.

The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
ocfs2_downconvert_thread_do_work can be triggered in the following case:

ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
processed, and then processes the dentry lockres. During the dentry
put, it calls iput and then deletes rw, inode and open lockres from
blocked list in ocfs2_mark_lockres_freeing. And this causes the
variable `processed' to not reflect the number of blocked lockres to be
processed, which triggers the BUG.

Signed-off-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
fs/ocfs2/dlmglue.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 3988d0aeb72c..416a2ab68ac1 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -4009,9 +4009,13 @@ static void ocfs2_downconvert_thread_do_work(struct ocfs2_super *osb)
osb->dc_work_sequence = osb->dc_wake_sequence;

processed = osb->blocked_lock_count;
- while (processed) {
- BUG_ON(list_empty(&osb->blocked_lock_list));
-
+ /*
+ * blocked lock processing in this loop might call iput which can
+ * remove items off osb->blocked_lock_list. Downconvert up to
+ * 'processed' number of locks, but stop short if we had some
+ * removed in ocfs2_mark_lockres_freeing when downconverting.
+ */
+ while (processed && !list_empty(&osb->blocked_lock_list)) {
lockres = list_entry(osb->blocked_lock_list.next,
struct ocfs2_lock_res, l_blocked_list);
list_del_init(&lockres->l_blocked_list);
--
2.5.0

2015-08-24 09:18:41

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 56/82] md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

From: NeilBrown <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 423f04d63cf421ea436bcc5be02543d549ce4b28 upstream.

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
last working device'.")
as it addresses the same issue. It fixes the same bug and should go
to -stable for same reasons.

Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/md/raid1.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 14934ac112c3..1cb7642c1ba9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1382,6 +1382,7 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
{
char b[BDEVNAME_SIZE];
struct r1conf *conf = mddev->private;
+ unsigned long flags;

/*
* If it is not operational, then we have already marked it as dead
@@ -1401,14 +1402,13 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
return;
}
set_bit(Blocked, &rdev->flags);
+ spin_lock_irqsave(&conf->device_lock, flags);
if (test_and_clear_bit(In_sync, &rdev->flags)) {
- unsigned long flags;
- spin_lock_irqsave(&conf->device_lock, flags);
mddev->degraded++;
set_bit(Faulty, &rdev->flags);
- spin_unlock_irqrestore(&conf->device_lock, flags);
} else
set_bit(Faulty, &rdev->flags);
+ spin_unlock_irqrestore(&conf->device_lock, flags);
/*
* if recovery is running, make sure it aborts.
*/
@@ -1466,7 +1466,10 @@ static int raid1_spare_active(struct mddev *mddev)
* Find all failed disks within the RAID1 configuration
* and mark them readable.
* Called under mddev lock, so rcu protection not needed.
+ * device_lock used to avoid races with raid1_end_read_request
+ * which expects 'In_sync' flags and ->degraded to be consistent.
*/
+ spin_lock_irqsave(&conf->device_lock, flags);
for (i = 0; i < conf->raid_disks; i++) {
struct md_rdev *rdev = conf->mirrors[i].rdev;
struct md_rdev *repl = conf->mirrors[conf->raid_disks + i].rdev;
@@ -1496,7 +1499,6 @@ static int raid1_spare_active(struct mddev *mddev)
sysfs_notify_dirent_safe(rdev->sysfs_state);
}
}
- spin_lock_irqsave(&conf->device_lock, flags);
mddev->degraded -= count;
spin_unlock_irqrestore(&conf->device_lock, flags);

--
2.5.0

2015-08-24 09:17:40

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 57/82] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels

From: Andy Lutomirski <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream.

32-bit kernels handle nested NMIs in C. Enable the exact same
handling on 64-bit kernels as well. This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/nmi.c | 123 +++++++++++++++++++++-----------------------------
1 file changed, 52 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49ce50a1..b82e0fdc7edb 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -392,15 +392,15 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
}

/*
- * NMIs can hit breakpoints which will cause it to lose its
- * NMI context with the CPU when the breakpoint does an iret.
- */
-#ifdef CONFIG_X86_32
-/*
- * For i386, NMIs use the same stack as the kernel, and we can
- * add a workaround to the iret problem in C (preventing nested
- * NMIs if an NMI takes a trap). Simply have 3 states the NMI
- * can be in:
+ * NMIs can hit breakpoints which will cause it to lose its NMI context
+ * with the CPU when the breakpoint or page fault does an IRET.
+ *
+ * As a result, NMIs can nest if NMIs get unmasked due an IRET during
+ * NMI processing. On x86_64, the asm glue protects us from nested NMIs
+ * if the outer NMI came from kernel mode, but we can still nest if the
+ * outer NMI came from user mode.
+ *
+ * To handle these nested NMIs, we have three states:
*
* 1) not running
* 2) executing
@@ -414,15 +414,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
* (Note, the latch is binary, thus multiple NMIs triggering,
* when one is running, are ignored. Only one NMI is restarted.)
*
- * If an NMI hits a breakpoint that executes an iret, another
- * NMI can preempt it. We do not want to allow this new NMI
- * to run, but we want to execute it when the first one finishes.
- * We set the state to "latched", and the exit of the first NMI will
- * perform a dec_return, if the result is zero (NOT_RUNNING), then
- * it will simply exit the NMI handler. If not, the dec_return
- * would have set the state to NMI_EXECUTING (what we want it to
- * be when we are running). In this case, we simply jump back
- * to rerun the NMI handler again, and restart the 'latched' NMI.
+ * If an NMI executes an iret, another NMI can preempt it. We do not
+ * want to allow this new NMI to run, but we want to execute it when the
+ * first one finishes. We set the state to "latched", and the exit of
+ * the first NMI will perform a dec_return, if the result is zero
+ * (NOT_RUNNING), then it will simply exit the NMI handler. If not, the
+ * dec_return would have set the state to NMI_EXECUTING (what we want it
+ * to be when we are running). In this case, we simply jump back to
+ * rerun the NMI handler again, and restart the 'latched' NMI.
*
* No trap (breakpoint or page fault) should be hit before nmi_restart,
* thus there is no race between the first check of state for NOT_RUNNING
@@ -445,49 +444,36 @@ enum nmi_states {
static DEFINE_PER_CPU(enum nmi_states, nmi_state);
static DEFINE_PER_CPU(unsigned long, nmi_cr2);

-#define nmi_nesting_preprocess(regs) \
- do { \
- if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) { \
- this_cpu_write(nmi_state, NMI_LATCHED); \
- return; \
- } \
- this_cpu_write(nmi_state, NMI_EXECUTING); \
- this_cpu_write(nmi_cr2, read_cr2()); \
- } while (0); \
- nmi_restart:
-
-#define nmi_nesting_postprocess() \
- do { \
- if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \
- write_cr2(this_cpu_read(nmi_cr2)); \
- if (this_cpu_dec_return(nmi_state)) \
- goto nmi_restart; \
- } while (0)
-#else /* x86_64 */
+#ifdef CONFIG_X86_64
/*
- * In x86_64 things are a bit more difficult. This has the same problem
- * where an NMI hitting a breakpoint that calls iret will remove the
- * NMI context, allowing a nested NMI to enter. What makes this more
- * difficult is that both NMIs and breakpoints have their own stack.
- * When a new NMI or breakpoint is executed, the stack is set to a fixed
- * point. If an NMI is nested, it will have its stack set at that same
- * fixed address that the first NMI had, and will start corrupting the
- * stack. This is handled in entry_64.S, but the same problem exists with
- * the breakpoint stack.
+ * In x86_64, we need to handle breakpoint -> NMI -> breakpoint. Without
+ * some care, the inner breakpoint will clobber the outer breakpoint's
+ * stack.
*
- * If a breakpoint is being processed, and the debug stack is being used,
- * if an NMI comes in and also hits a breakpoint, the stack pointer
- * will be set to the same fixed address as the breakpoint that was
- * interrupted, causing that stack to be corrupted. To handle this case,
- * check if the stack that was interrupted is the debug stack, and if
- * so, change the IDT so that new breakpoints will use the current stack
- * and not switch to the fixed address. On return of the NMI, switch back
- * to the original IDT.
+ * If a breakpoint is being processed, and the debug stack is being
+ * used, if an NMI comes in and also hits a breakpoint, the stack
+ * pointer will be set to the same fixed address as the breakpoint that
+ * was interrupted, causing that stack to be corrupted. To handle this
+ * case, check if the stack that was interrupted is the debug stack, and
+ * if so, change the IDT so that new breakpoints will use the current
+ * stack and not switch to the fixed address. On return of the NMI,
+ * switch back to the original IDT.
*/
static DEFINE_PER_CPU(int, update_debug_stack);
+#endif

-static inline void nmi_nesting_preprocess(struct pt_regs *regs)
+dotraplinkage notrace __kprobes void
+do_nmi(struct pt_regs *regs, long error_code)
{
+ if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
+ this_cpu_write(nmi_state, NMI_LATCHED);
+ return;
+ }
+ this_cpu_write(nmi_state, NMI_EXECUTING);
+ this_cpu_write(nmi_cr2, read_cr2());
+nmi_restart:
+
+#ifdef CONFIG_X86_64
/*
* If we interrupted a breakpoint, it is possible that
* the nmi handler will have breakpoints too. We need to
@@ -498,22 +484,8 @@ static inline void nmi_nesting_preprocess(struct pt_regs *regs)
debug_stack_set_zero();
this_cpu_write(update_debug_stack, 1);
}
-}
-
-static inline void nmi_nesting_postprocess(void)
-{
- if (unlikely(this_cpu_read(update_debug_stack))) {
- debug_stack_reset();
- this_cpu_write(update_debug_stack, 0);
- }
-}
#endif

-dotraplinkage notrace __kprobes void
-do_nmi(struct pt_regs *regs, long error_code)
-{
- nmi_nesting_preprocess(regs);
-
nmi_enter();

inc_irq_stat(__nmi_count);
@@ -523,8 +495,17 @@ do_nmi(struct pt_regs *regs, long error_code)

nmi_exit();

- /* On i386, may loop back to preprocess */
- nmi_nesting_postprocess();
+#ifdef CONFIG_X86_64
+ if (unlikely(this_cpu_read(update_debug_stack))) {
+ debug_stack_reset();
+ this_cpu_write(update_debug_stack, 0);
+ }
+#endif
+
+ if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+ write_cr2(this_cpu_read(nmi_cr2));
+ if (this_cpu_dec_return(nmi_state))
+ goto nmi_restart;
}

void stop_nmi(void)
--
2.5.0

2015-08-24 09:17:38

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 58/82] x86/nmi/64: Remove asm code that saves CR2

From: Andy Lutomirski <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 0e181bb58143cb4a2e8f01c281b0816cd0e4798e upstream.

Now that do_nmi saves CR2, we don't need to save it in asm.

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/entry_64.S | 18 ------------------
1 file changed, 18 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7b22af265d12..fd27e5b86d7b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1876,29 +1876,11 @@ end_repeat_nmi:
call save_paranoid
DEFAULT_FRAME 0

- /*
- * Save off the CR2 register. If we take a page fault in the NMI then
- * it could corrupt the CR2 value. If the NMI preempts a page fault
- * handler before it was able to read the CR2 register, and then the
- * NMI itself takes a page fault, the page fault that was preempted
- * will read the information from the NMI page fault and not the
- * origin fault. Save it off and restore it if it changes.
- * Use the r12 callee-saved register.
- */
- movq %cr2, %r12
-
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
movq %rsp,%rdi
movq $-1,%rsi
call do_nmi

- /* Did the NMI take a page fault? Restore cr2 if it did */
- movq %cr2, %rcx
- cmpq %rcx, %r12
- je 1f
- movq %r12, %cr2
-1:
-
testl %ebx,%ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
--
2.5.0

2015-08-24 09:17:21

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 59/82] x86/nmi/64: Switch stacks on userspace NMI entry

From: Andy Lutomirski <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and
atomic IRET. Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.

This will make some nested NMIs visible to C code, but the C
code is okay with that.

As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/entry_64.S | 65 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index fd27e5b86d7b..691337073e1f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1706,19 +1706,76 @@ ENTRY(nmi)
* a nested NMI that updated the copy interrupt stack frame, a
* jump will be made to the repeat_nmi code that will handle the second
* NMI.
+ *
+ * However, espfix prevents us from directly returning to userspace
+ * with a single IRET instruction. Similarly, IRET to user mode
+ * can fault. We therefore handle NMIs from user space like
+ * other IST entries.
*/

/* Use %rdx as out temp variable throughout */
pushq_cfi %rdx
CFI_REL_OFFSET rdx, 0

+ testb $3, CS-RIP+8(%rsp)
+ jz .Lnmi_from_kernel
+
+ /*
+ * NMI from user mode. We need to run on the thread stack, but we
+ * can't go through the normal entry paths: NMIs are masked, and
+ * we don't want to enable interrupts, because then we'll end
+ * up in an awkward situation in which IRQs are on but NMIs
+ * are off.
+ */
+
+ SWAPGS
+ cld
+ movq %rsp, %rdx
+ movq PER_CPU_VAR(kernel_stack), %rsp
+ addq $KERNEL_STACK_OFFSET, %rsp
+ pushq 5*8(%rdx) /* pt_regs->ss */
+ pushq 4*8(%rdx) /* pt_regs->rsp */
+ pushq 3*8(%rdx) /* pt_regs->flags */
+ pushq 2*8(%rdx) /* pt_regs->cs */
+ pushq 1*8(%rdx) /* pt_regs->rip */
+ pushq $-1 /* pt_regs->orig_ax */
+ pushq %rdi /* pt_regs->di */
+ pushq %rsi /* pt_regs->si */
+ pushq (%rdx) /* pt_regs->dx */
+ pushq %rcx /* pt_regs->cx */
+ pushq %rax /* pt_regs->ax */
+ pushq %r8 /* pt_regs->r8 */
+ pushq %r9 /* pt_regs->r9 */
+ pushq %r10 /* pt_regs->r10 */
+ pushq %r11 /* pt_regs->r11 */
+ pushq %rbx /* pt_regs->rbx */
+ pushq %rbp /* pt_regs->rbp */
+ pushq %r12 /* pt_regs->r12 */
+ pushq %r13 /* pt_regs->r13 */
+ pushq %r14 /* pt_regs->r14 */
+ pushq %r15 /* pt_regs->r15 */
+
/*
- * If %cs was not the kernel segment, then the NMI triggered in user
- * space, which means it is definitely not nested.
+ * At this point we no longer need to worry about stack damage
+ * due to nesting -- we're on the normal thread stack and we're
+ * done with the NMI stack.
*/
- cmpl $__KERNEL_CS, 16(%rsp)
- jne first_nmi

+ movq %rsp, %rdi
+ movq $-1, %rsi
+ call do_nmi
+
+ /*
+ * Return back to user mode. We must *not* do the normal exit
+ * work, because we don't want to enable interrupts. Fortunately,
+ * do_nmi doesn't modify pt_regs.
+ */
+ SWAPGS
+
+ addq $6*8, %rsp /* skip bx, bp, and r12-r15 */
+ jmp restore_args
+
+.Lnmi_from_kernel:
/*
* Check the special variable on the stack to see if NMIs are
* executing.
--
2.5.0

2015-08-24 09:10:29

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 60/82] arch: Introduce smp_load_acquire(), smp_store_release()

From: Peter Zijlstra <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream.

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads. Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation. The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes. These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability. It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits. It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Russell King <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Victor Kaplansky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/arm/include/asm/barrier.h | 15 +++++++++++
arch/arm64/include/asm/barrier.h | 50 +++++++++++++++++++++++++++++++++++++
arch/ia64/include/asm/barrier.h | 23 +++++++++++++++++
arch/metag/include/asm/barrier.h | 15 +++++++++++
arch/mips/include/asm/barrier.h | 15 +++++++++++
arch/powerpc/include/asm/barrier.h | 21 +++++++++++++++-
arch/s390/include/asm/barrier.h | 15 +++++++++++
arch/sparc/include/asm/barrier_64.h | 15 +++++++++++
arch/x86/include/asm/barrier.h | 43 ++++++++++++++++++++++++++++++-
include/asm-generic/barrier.h | 15 +++++++++++
include/linux/compiler.h | 9 +++++++
11 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 60f15e274e6d..2f59f7443396 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
#define smp_wmb() dmb(ishst)
#endif

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#define read_barrier_depends() do { } while(0)
#define smp_read_barrier_depends() do { } while(0)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index d4a63338a53c..78e20ba8806b 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,60 @@
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#else
+
#define smp_mb() asm volatile("dmb ish" : : : "memory")
#define smp_rmb() asm volatile("dmb ishld" : : : "memory")
#define smp_wmb() asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ switch (sizeof(*p)) { \
+ case 4: \
+ asm volatile ("stlr %w1, %0" \
+ : "=Q" (*p) : "r" (v) : "memory"); \
+ break; \
+ case 8: \
+ asm volatile ("stlr %1, %0" \
+ : "=Q" (*p) : "r" (v) : "memory"); \
+ break; \
+ } \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1; \
+ compiletime_assert_atomic_type(*p); \
+ switch (sizeof(*p)) { \
+ case 4: \
+ asm volatile ("ldar %w0, %1" \
+ : "=r" (___p1) : "Q" (*p) : "memory"); \
+ break; \
+ case 8: \
+ asm volatile ("ldar %0, %1" \
+ : "=r" (___p1) : "Q" (*p) : "memory"); \
+ break; \
+ } \
+ ___p1; \
+})
+
#endif

#define read_barrier_depends() do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 60576e06b6fb..d0a69aa35e27 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -45,14 +45,37 @@
# define smp_rmb() rmb()
# define smp_wmb() wmb()
# define smp_read_barrier_depends() read_barrier_depends()
+
#else
+
# define smp_mb() barrier()
# define smp_rmb() barrier()
# define smp_wmb() barrier()
# define smp_read_barrier_depends() do { } while(0)
+
#endif

/*
+ * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
+ * need for asm trickery!
+ */
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
+/*
* XXX check on this ---I suspect what Linus really wants here is
* acquire vs release semantics but we can't discuss this stuff with
* Linus just yet. Grrr...
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index e355a4c10968..2d6f0de77325 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -85,4 +85,19 @@ static inline void fence(void)
#define smp_read_barrier_depends() do { } while (0)
#define set_mb(var, value) do { var = value; smp_mb(); } while (0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 314ab5532019..52c5b61d7aba 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,4 +180,19 @@
#define nudge_writes() mb()
#endif

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index ae782254e731..f89da808ce31 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -45,11 +45,15 @@
# define SMPWMB eieio
#endif

+#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
#define smp_mb() mb()
-#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define smp_rmb() __lwsync()
#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
#else
+#define __lwsync() barrier()
+
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
@@ -65,4 +69,19 @@
#define data_barrier(x) \
asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ __lwsync(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ __lwsync(); \
+ ___p1; \
+})
+
#endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 16760eeb79b0..578680f6207a 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -32,4 +32,19 @@

#define set_mb(var, value) do { var = value; mb(); } while (0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
#endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 95d45986f908..b5aad964558e 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,4 +53,19 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \

#define smp_read_barrier_depends() do { } while(0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
#endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index c6cd358a1eec..04a48903b2eb 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -92,12 +92,53 @@
#endif
#define smp_read_barrier_depends() read_barrier_depends()
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else
+#else /* !SMP */
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_read_barrier_depends() do { } while (0)
#define set_mb(var, value) do { var = value; barrier(); } while (0)
+#endif /* SMP */
+
+#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
+
+/*
+ * For either of these options x86 doesn't have a strong TSO memory
+ * model and we should fall back to full barriers.
+ */
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
+#else /* regular x86 TSO memory ordering */
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
#endif

/*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 639d7a4d033b..01613b382b0e 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -46,5 +46,20 @@
#define read_barrier_depends() do {} while (0)
#define smp_read_barrier_depends() do {} while (0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index a2329c5e6206..2472740d7ab2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -302,6 +302,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
#endif

+/* Is this type a native word size -- useful for atomic operations */
+#ifndef __native_word
+# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif
+
/* Compile time object size, -1 for unknown */
#ifndef __compiletime_object_size
# define __compiletime_object_size(obj) -1
@@ -341,6 +346,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
#define compiletime_assert(condition, msg) \
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)

+#define compiletime_assert_atomic_type(t) \
+ compiletime_assert(__native_word(t), \
+ "Need native word sized stores/loads for atomicity.")
+
/*
* Prevent the compiler from merging or refetching accesses. The compiler
* is also forbidden from reordering successive instances of ACCESS_ONCE(),
--
2.5.0

2015-08-24 09:15:43

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 61/82] rcu: Provide counterpart to rcu_dereference() for non-RCU situations

From: "Paul E. McKenney" <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 54ef6df3f3f1353d99c80c437259d317b2cd1cbd upstream.

Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying. This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/rcupdate.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f1f1bc39346b..4912c953cab7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -554,6 +554,20 @@ static inline void rcu_preempt_sleep_check(void)
(p) = (typeof(*v) __force space *)(v); \
} while (0)

+/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU. That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+ (_________p1); \
+})

/**
* rcu_access_pointer() - fetch RCU pointer with no dereferencing
--
2.5.0

2015-08-24 09:10:34

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 62/82] rcu: Move lockless_dereference() out of rcupdate.h

From: Peter Zijlstra <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 0a04b0166929405cd833c1cc40f99e862b965ddc upstream.

I want to use lockless_dereference() from seqlock.h, which would mean
including rcupdate.h from it, however rcupdate.h already includes
seqlock.h.

Avoid this by moving lockless_dereference() into compiler.h. This is
somewhat tricky since it uses smp_read_barrier_depends() which isn't
available there, but its a CPP macro so we can get away with it.

The alternative would be moving it into asm/barrier.h, but that would
be updating each arch (I can do if people feel that is more
appropriate).

Cc: Paul McKenney <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/compiler.h | 15 +++++++++++++++
include/linux/rcupdate.h | 15 ---------------
2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 2472740d7ab2..4a3caa61a002 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -364,6 +364,21 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
*/
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU. That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+ (_________p1); \
+})
+
/* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4912c953cab7..965725f957d9 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -555,21 +555,6 @@ static inline void rcu_preempt_sleep_check(void)
} while (0)

/**
- * lockless_dereference() - safely load a pointer for later dereference
- * @p: The pointer to load
- *
- * Similar to rcu_dereference(), but for situations where the pointed-to
- * object's lifetime is managed by something other than RCU. That
- * "something other" might be reference counting or simple immortality.
- */
-#define lockless_dereference(p) \
-({ \
- typeof(p) _________p1 = ACCESS_ONCE(p); \
- smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
- (_________p1); \
-})
-
-/**
* rcu_access_pointer() - fetch RCU pointer with no dereferencing
* @p: The pointer to read
*
--
2.5.0

2015-08-24 09:10:33

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 63/82] x86/ldt: Make modify_ldt synchronous

From: Andy Lutomirski <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

modify_ldt() has questionable locking and does not synchronize
threads. Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/include/asm/desc.h | 15 ---
arch/x86/include/asm/mmu.h | 3 +-
arch/x86/include/asm/mmu_context.h | 48 ++++++-
arch/x86/kernel/cpu/common.c | 4 +-
arch/x86/kernel/cpu/perf_event.c | 13 +-
arch/x86/kernel/ldt.c | 262 ++++++++++++++++++++-----------------
arch/x86/kernel/process_64.c | 4 +-
arch/x86/kernel/step.c | 6 +-
arch/x86/power/cpu.c | 3 +-
9 files changed, 208 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index f6aaf7d16571..cb69cd0ba8c7 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
}

-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
- set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
- preempt_disable();
- load_LDT_nolock(pc);
- preempt_enable();
-}
-
static inline unsigned long get_desc_base(const struct desc_struct *desc)
{
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5f55e6962769..926f67263287 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
* we put the segment information here.
*/
typedef struct {
- void *ldt;
- int size;
+ struct ldt_struct *ldt;

#ifdef CONFIG_X86_64
/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c534fd59..86fef96f4eca 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -16,6 +16,50 @@ static inline void paravirt_activate_mm(struct mm_struct *prev,
#endif /* !CONFIG_PARAVIRT */

/*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+ /*
+ * Xen requires page-aligned LDTs with special permissions. This is
+ * needed to prevent us from installing evil descriptors such as
+ * call gates. On native, we could merge the ldt_struct and LDT
+ * allocations, but it's not worth trying to optimize.
+ */
+ struct desc_struct *entries;
+ int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+ struct ldt_struct *ldt;
+
+ /* lockless_dereference synchronizes with smp_store_release */
+ ldt = lockless_dereference(mm->context.ldt);
+
+ /*
+ * Any change to mm->context.ldt is followed by an IPI to all
+ * CPUs with the mm active. The LDT will not be freed until
+ * after the IPI is handled by all such CPUs. This means that,
+ * if the ldt_struct changes before we return, the values we see
+ * will be safe, and the new values will be loaded before we run
+ * any user code.
+ *
+ * NB: don't try to convert this to use RCU without extreme care.
+ * We would still need IRQs off, because we don't want to change
+ * the local LDT after an IPI loaded a newer value than the one
+ * that we can see.
+ */
+
+ if (unlikely(ldt))
+ set_ldt(ldt->entries, ldt->size);
+ else
+ clear_LDT();
+
+ DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
* Used for LDT copy/destruction.
*/
int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -50,7 +94,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,

/* Load the LDT, if the LDT is different: */
if (unlikely(prev->context.ldt != next->context.ldt))
- load_LDT_nolock(&next->context);
+ load_mm_ldt(next);
}
#ifdef CONFIG_SMP
else {
@@ -71,7 +115,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
* to make sure to use no freed page tables.
*/
load_cr3(next->pgd);
- load_LDT_nolock(&next->context);
+ load_mm_ldt(next);
}
}
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 00cc6f79615d..6db4828574ef 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1309,7 +1309,7 @@ void cpu_init(void)
load_sp0(t, &current->thread);
set_tss_desc(cpu, t);
load_TR_desc();
- load_LDT(&init_mm.context);
+ load_mm_ldt(&init_mm);

clear_all_debug_regs();
dbg_restore_debug_regs();
@@ -1356,7 +1356,7 @@ void cpu_init(void)
load_sp0(t, thread);
set_tss_desc(cpu, t);
load_TR_desc();
- load_LDT(&init_mm.context);
+ load_mm_ldt(&init_mm);

t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c7106f116fb0..0271272d55d0 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,6 +31,7 @@
#include <asm/nmi.h>
#include <asm/smp.h>
#include <asm/alternative.h>
+#include <asm/mmu_context.h>
#include <asm/timer.h>
#include <asm/desc.h>
#include <asm/ldt.h>
@@ -1953,21 +1954,25 @@ static unsigned long get_segment_base(unsigned int segment)
int idx = segment >> 3;

if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+ struct ldt_struct *ldt;
+
if (idx > LDT_ENTRIES)
return 0;

- if (idx > current->active_mm->context.size)
+ /* IRQs are off, so this synchronizes with smp_store_release */
+ ldt = lockless_dereference(current->active_mm->context.ldt);
+ if (!ldt || idx > ldt->size)
return 0;

- desc = current->active_mm->context.ldt;
+ desc = &ldt->entries[idx];
} else {
if (idx > GDT_ENTRIES)
return 0;

- desc = __this_cpu_ptr(&gdt_page.gdt[0]);
+ desc = __this_cpu_ptr(&gdt_page.gdt[0]) + idx;
}

- return get_desc_base(desc + idx);
+ return get_desc_base(desc);
}

#ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c37886d759cc..2bcc0525f1c1 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -12,6 +12,7 @@
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/smp.h>
+#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>

@@ -20,82 +21,82 @@
#include <asm/mmu_context.h>
#include <asm/syscalls.h>

-#ifdef CONFIG_SMP
+/* context.lock is held for us, so we don't need any locking. */
static void flush_ldt(void *current_mm)
{
- if (current->active_mm == current_mm)
- load_LDT(&current->active_mm->context);
+ mm_context_t *pc;
+
+ if (current->active_mm != current_mm)
+ return;
+
+ pc = &current->active_mm->context;
+ set_ldt(pc->ldt->entries, pc->ldt->size);
}
-#endif

-static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
+/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
+static struct ldt_struct *alloc_ldt_struct(int size)
{
- void *oldldt, *newldt;
- int oldsize;
-
- if (mincount <= pc->size)
- return 0;
- oldsize = pc->size;
- mincount = (mincount + (PAGE_SIZE / LDT_ENTRY_SIZE - 1)) &
- (~(PAGE_SIZE / LDT_ENTRY_SIZE - 1));
- if (mincount * LDT_ENTRY_SIZE > PAGE_SIZE)
- newldt = vmalloc(mincount * LDT_ENTRY_SIZE);
+ struct ldt_struct *new_ldt;
+ int alloc_size;
+
+ if (size > LDT_ENTRIES)
+ return NULL;
+
+ new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
+ if (!new_ldt)
+ return NULL;
+
+ BUILD_BUG_ON(LDT_ENTRY_SIZE != sizeof(struct desc_struct));
+ alloc_size = size * LDT_ENTRY_SIZE;
+
+ /*
+ * Xen is very picky: it requires a page-aligned LDT that has no
+ * trailing nonzero bytes in any page that contains LDT descriptors.
+ * Keep it simple: zero the whole allocation and never allocate less
+ * than PAGE_SIZE.
+ */
+ if (alloc_size > PAGE_SIZE)
+ new_ldt->entries = vzalloc(alloc_size);
else
- newldt = (void *)__get_free_page(GFP_KERNEL);
-
- if (!newldt)
- return -ENOMEM;
+ new_ldt->entries = kzalloc(PAGE_SIZE, GFP_KERNEL);

- if (oldsize)
- memcpy(newldt, pc->ldt, oldsize * LDT_ENTRY_SIZE);
- oldldt = pc->ldt;
- memset(newldt + oldsize * LDT_ENTRY_SIZE, 0,
- (mincount - oldsize) * LDT_ENTRY_SIZE);
+ if (!new_ldt->entries) {
+ kfree(new_ldt);
+ return NULL;
+ }

- paravirt_alloc_ldt(newldt, mincount);
+ new_ldt->size = size;
+ return new_ldt;
+}

-#ifdef CONFIG_X86_64
- /* CHECKME: Do we really need this ? */
- wmb();
-#endif
- pc->ldt = newldt;
- wmb();
- pc->size = mincount;
- wmb();
-
- if (reload) {
-#ifdef CONFIG_SMP
- preempt_disable();
- load_LDT(pc);
- if (!cpumask_equal(mm_cpumask(current->mm),
- cpumask_of(smp_processor_id())))
- smp_call_function(flush_ldt, current->mm, 1);
- preempt_enable();
-#else
- load_LDT(pc);
-#endif
- }
- if (oldsize) {
- paravirt_free_ldt(oldldt, oldsize);
- if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
- vfree(oldldt);
- else
- put_page(virt_to_page(oldldt));
- }
- return 0;
+/* After calling this, the LDT is immutable. */
+static void finalize_ldt_struct(struct ldt_struct *ldt)
+{
+ paravirt_alloc_ldt(ldt->entries, ldt->size);
}

-static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
+/* context.lock is held */
+static void install_ldt(struct mm_struct *current_mm,
+ struct ldt_struct *ldt)
{
- int err = alloc_ldt(new, old->size, 0);
- int i;
+ /* Synchronizes with lockless_dereference in load_mm_ldt. */
+ smp_store_release(&current_mm->context.ldt, ldt);
+
+ /* Activate the LDT for all CPUs using current_mm. */
+ on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+}

- if (err < 0)
- return err;
+static void free_ldt_struct(struct ldt_struct *ldt)
+{
+ if (likely(!ldt))
+ return;

- for (i = 0; i < old->size; i++)
- write_ldt_entry(new->ldt, i, old->ldt + i * LDT_ENTRY_SIZE);
- return 0;
+ paravirt_free_ldt(ldt->entries, ldt->size);
+ if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+ vfree(ldt->entries);
+ else
+ kfree(ldt->entries);
+ kfree(ldt);
}

/*
@@ -104,17 +105,37 @@ static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
*/
int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
{
+ struct ldt_struct *new_ldt;
struct mm_struct *old_mm;
int retval = 0;

mutex_init(&mm->context.lock);
- mm->context.size = 0;
old_mm = current->mm;
- if (old_mm && old_mm->context.size > 0) {
- mutex_lock(&old_mm->context.lock);
- retval = copy_ldt(&mm->context, &old_mm->context);
- mutex_unlock(&old_mm->context.lock);
+ if (!old_mm) {
+ mm->context.ldt = NULL;
+ return 0;
}
+
+ mutex_lock(&old_mm->context.lock);
+ if (!old_mm->context.ldt) {
+ mm->context.ldt = NULL;
+ goto out_unlock;
+ }
+
+ new_ldt = alloc_ldt_struct(old_mm->context.ldt->size);
+ if (!new_ldt) {
+ retval = -ENOMEM;
+ goto out_unlock;
+ }
+
+ memcpy(new_ldt->entries, old_mm->context.ldt->entries,
+ new_ldt->size * LDT_ENTRY_SIZE);
+ finalize_ldt_struct(new_ldt);
+
+ mm->context.ldt = new_ldt;
+
+out_unlock:
+ mutex_unlock(&old_mm->context.lock);
return retval;
}

@@ -125,53 +146,47 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
*/
void destroy_context(struct mm_struct *mm)
{
- if (mm->context.size) {
-#ifdef CONFIG_X86_32
- /* CHECKME: Can this ever happen ? */
- if (mm == current->active_mm)
- clear_LDT();
-#endif
- paravirt_free_ldt(mm->context.ldt, mm->context.size);
- if (mm->context.size * LDT_ENTRY_SIZE > PAGE_SIZE)
- vfree(mm->context.ldt);
- else
- put_page(virt_to_page(mm->context.ldt));
- mm->context.size = 0;
- }
+ free_ldt_struct(mm->context.ldt);
+ mm->context.ldt = NULL;
}

static int read_ldt(void __user *ptr, unsigned long bytecount)
{
- int err;
+ int retval;
unsigned long size;
struct mm_struct *mm = current->mm;

- if (!mm->context.size)
- return 0;
+ mutex_lock(&mm->context.lock);
+
+ if (!mm->context.ldt) {
+ retval = 0;
+ goto out_unlock;
+ }
+
if (bytecount > LDT_ENTRY_SIZE * LDT_ENTRIES)
bytecount = LDT_ENTRY_SIZE * LDT_ENTRIES;

- mutex_lock(&mm->context.lock);
- size = mm->context.size * LDT_ENTRY_SIZE;
+ size = mm->context.ldt->size * LDT_ENTRY_SIZE;
if (size > bytecount)
size = bytecount;

- err = 0;
- if (copy_to_user(ptr, mm->context.ldt, size))
- err = -EFAULT;
- mutex_unlock(&mm->context.lock);
- if (err < 0)
- goto error_return;
+ if (copy_to_user(ptr, mm->context.ldt->entries, size)) {
+ retval = -EFAULT;
+ goto out_unlock;
+ }
+
if (size != bytecount) {
- /* zero-fill the rest */
- if (clear_user(ptr + size, bytecount - size) != 0) {
- err = -EFAULT;
- goto error_return;
+ /* Zero-fill the rest and pretend we read bytecount bytes. */
+ if (clear_user(ptr + size, bytecount - size)) {
+ retval = -EFAULT;
+ goto out_unlock;
}
}
- return bytecount;
-error_return:
- return err;
+ retval = bytecount;
+
+out_unlock:
+ mutex_unlock(&mm->context.lock);
+ return retval;
}

static int read_default_ldt(void __user *ptr, unsigned long bytecount)
@@ -195,6 +210,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
struct desc_struct ldt;
int error;
struct user_desc ldt_info;
+ int oldsize, newsize;
+ struct ldt_struct *new_ldt, *old_ldt;

error = -EINVAL;
if (bytecount != sizeof(ldt_info))
@@ -213,34 +230,39 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
goto out;
}

- mutex_lock(&mm->context.lock);
- if (ldt_info.entry_number >= mm->context.size) {
- error = alloc_ldt(&current->mm->context,
- ldt_info.entry_number + 1, 1);
- if (error < 0)
- goto out_unlock;
- }
-
- /* Allow LDTs to be cleared by the user. */
- if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
- if (oldmode || LDT_empty(&ldt_info)) {
- memset(&ldt, 0, sizeof(ldt));
- goto install;
+ if ((oldmode && !ldt_info.base_addr && !ldt_info.limit) ||
+ LDT_empty(&ldt_info)) {
+ /* The user wants to clear the entry. */
+ memset(&ldt, 0, sizeof(ldt));
+ } else {
+ if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+ error = -EINVAL;
+ goto out;
}
+
+ fill_ldt(&ldt, &ldt_info);
+ if (oldmode)
+ ldt.avl = 0;
}

- if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
- error = -EINVAL;
+ mutex_lock(&mm->context.lock);
+
+ old_ldt = mm->context.ldt;
+ oldsize = old_ldt ? old_ldt->size : 0;
+ newsize = max((int)(ldt_info.entry_number + 1), oldsize);
+
+ error = -ENOMEM;
+ new_ldt = alloc_ldt_struct(newsize);
+ if (!new_ldt)
goto out_unlock;
- }

- fill_ldt(&ldt, &ldt_info);
- if (oldmode)
- ldt.avl = 0;
+ if (old_ldt)
+ memcpy(new_ldt->entries, old_ldt->entries, oldsize * LDT_ENTRY_SIZE);
+ new_ldt->entries[ldt_info.entry_number] = ldt;
+ finalize_ldt_struct(new_ldt);

- /* Install the new entry ... */
-install:
- write_ldt_entry(mm->context.ldt, ldt_info.entry_number, &ldt);
+ install_ldt(mm, new_ldt);
+ free_ldt_struct(old_ldt);
error = 0;

out_unlock:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8e9fe8dfd37b..f99825ea4f96 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -122,11 +122,11 @@ void __show_regs(struct pt_regs *regs, int all)
void release_thread(struct task_struct *dead_task)
{
if (dead_task->mm) {
- if (dead_task->mm->context.size) {
+ if (dead_task->mm->context.ldt) {
pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
dead_task->comm,
dead_task->mm->context.ldt,
- dead_task->mm->context.size);
+ dead_task->mm->context.ldt->size);
BUG();
}
}
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 9b4d51d0c0d0..6273324186ac 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -5,6 +5,7 @@
#include <linux/mm.h>
#include <linux/ptrace.h>
#include <asm/desc.h>
+#include <asm/mmu_context.h>

unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs)
{
@@ -30,10 +31,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
seg &= ~7UL;

mutex_lock(&child->mm->context.lock);
- if (unlikely((seg >> 3) >= child->mm->context.size))
+ if (unlikely(!child->mm->context.ldt ||
+ (seg >> 3) >= child->mm->context.ldt->size))
addr = -1L; /* bogus selector, access would fault */
else {
- desc = child->mm->context.ldt + seg;
+ desc = &child->mm->context.ldt->entries[seg];
base = get_desc_base(desc);

/* 16-bit code segment? */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 424f4c97a44d..6c8e2f5ce056 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -23,6 +23,7 @@
#include <asm/debugreg.h>
#include <asm/fpu-internal.h> /* pcntxt_mask */
#include <asm/cpu.h>
+#include <asm/mmu_context.h>

#ifdef CONFIG_X86_32
__visible unsigned long saved_context_ebx;
@@ -157,7 +158,7 @@ static void fix_processor_context(void)
syscall_init(); /* This sets MSR_*STAR and related */
#endif
load_TR_desc(); /* This does ltr */
- load_LDT(&current->active_mm->context); /* This does lldt */
+ load_mm_ldt(current->active_mm); /* This does lldt */
}

/**
--
2.5.0

2015-08-24 09:10:36

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 64/82] x86/ldt: Correct LDT access in single stepping logic

From: Juergen Gross <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 136d9d83c07c5e30ac49fc83b27e8c4842f108fc upstream.

Commit 37868fe113ff ("x86/ldt: Make modify_ldt synchronous")
introduced a new struct ldt_struct anchored at mm->context.ldt.

convert_ip_to_linear() was changed to reflect this, but indexing
into the ldt has to be changed as the pointer is no longer void *.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/step.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 6273324186ac..0ccb53a9fcd9 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -28,11 +28,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
struct desc_struct *desc;
unsigned long base;

- seg &= ~7UL;
+ seg >>= 3;

mutex_lock(&child->mm->context.lock);
if (unlikely(!child->mm->context.ldt ||
- (seg >> 3) >= child->mm->context.ldt->size))
+ seg >= child->mm->context.ldt->size))
addr = -1L; /* bogus selector, access would fault */
else {
desc = &child->mm->context.ldt->entries[seg];
--
2.5.0

2015-08-24 09:15:40

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 65/82] x86/ldt: Correct FPU emulation access to LDT

From: Juergen Gross <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4809146b86c3d41ce588fdb767d021e2a80600dd upstream.

Commit 37868fe113ff ("x86/ldt: Make modify_ldt synchronous")
introduced a new struct ldt_struct anchored at mm->context.ldt.

Adapt the x86 fpu emulation code to use that new structure.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/math-emu/fpu_entry.c | 3 +--
arch/x86/math-emu/fpu_system.h | 21 ++++++++++++++++++---
arch/x86/math-emu/get_address.c | 3 +--
3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 9b868124128d..274a52b1183e 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -29,7 +29,6 @@

#include <asm/uaccess.h>
#include <asm/traps.h>
-#include <asm/desc.h>
#include <asm/user.h>
#include <asm/i387.h>

@@ -185,7 +184,7 @@ void math_emulate(struct math_emu_info *info)
math_abort(FPU_info, SIGILL);
}

- code_descriptor = LDT_DESCRIPTOR(FPU_CS);
+ code_descriptor = FPU_get_ldt_descriptor(FPU_CS);
if (SEG_D_SIZE(code_descriptor)) {
/* The above test may be wrong, the book is not clear */
/* Segmented 32 bit protected mode */
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index 2c614410a5f3..d342fce49447 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -16,9 +16,24 @@
#include <linux/kernel.h>
#include <linux/mm.h>

-/* s is always from a cpu register, and the cpu does bounds checking
- * during register load --> no further bounds checks needed */
-#define LDT_DESCRIPTOR(s) (((struct desc_struct *)current->mm->context.ldt)[(s) >> 3])
+#include <asm/desc.h>
+#include <asm/mmu_context.h>
+
+static inline struct desc_struct FPU_get_ldt_descriptor(unsigned seg)
+{
+ static struct desc_struct zero_desc;
+ struct desc_struct ret = zero_desc;
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ seg >>= 3;
+ mutex_lock(&current->mm->context.lock);
+ if (current->mm->context.ldt && seg < current->mm->context.ldt->size)
+ ret = current->mm->context.ldt->entries[seg];
+ mutex_unlock(&current->mm->context.lock);
+#endif
+ return ret;
+}
+
#define SEG_D_SIZE(x) ((x).b & (3 << 21))
#define SEG_G_BIT(x) ((x).b & (1 << 23))
#define SEG_GRANULARITY(x) (((x).b & (1 << 23)) ? 4096 : 1)
diff --git a/arch/x86/math-emu/get_address.c b/arch/x86/math-emu/get_address.c
index 6ef5e99380f9..d13cab2aec45 100644
--- a/arch/x86/math-emu/get_address.c
+++ b/arch/x86/math-emu/get_address.c
@@ -20,7 +20,6 @@
#include <linux/stddef.h>

#include <asm/uaccess.h>
-#include <asm/desc.h>

#include "fpu_system.h"
#include "exception.h"
@@ -158,7 +157,7 @@ static long pm_address(u_char FPU_modrm, u_char segment,
addr->selector = PM_REG_(segment);
}

- descriptor = LDT_DESCRIPTOR(PM_REG_(segment));
+ descriptor = FPU_get_ldt_descriptor(segment);
base_address = SEG_BASE_ADDR(descriptor);
address = base_address + offset;
limit = base_address
--
2.5.0

2015-08-24 09:15:39

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 66/82] x86/ldt: Further fix FPU emulation

From: Andy Lutomirski <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 12e244f4b550498bbaf654a52f93633f7dde2dc7 upstream.

The previous fix confused a selector with a segment prefix. Fix it.

Compile-tested only.

Cc: [email protected]
Cc: Juergen Gross <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Fixes: 4809146b86c3 ("x86/ldt: Correct FPU emulation access to LDT")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/math-emu/get_address.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/math-emu/get_address.c b/arch/x86/math-emu/get_address.c
index d13cab2aec45..8300db71c2a6 100644
--- a/arch/x86/math-emu/get_address.c
+++ b/arch/x86/math-emu/get_address.c
@@ -157,7 +157,7 @@ static long pm_address(u_char FPU_modrm, u_char segment,
addr->selector = PM_REG_(segment);
}

- descriptor = FPU_get_ldt_descriptor(segment);
+ descriptor = FPU_get_ldt_descriptor(addr->selector);
base_address = SEG_BASE_ADDR(descriptor);
address = base_address + offset;
limit = base_address
--
2.5.0

2015-08-24 09:15:35

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 67/82] signalfd: fix information leak in signalfd_copyinfo

From: Amanieu d'Antras <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3ead7c52bdb0ab44f4bb1feed505a8323cc12ba7 upstream.

This function may copy the si_addr_lsb field to user mode when it hasn't
been initialized, which can leak kernel stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
fs/signalfd.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 424b7b65321f..148f8e7af882 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -121,8 +121,9 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
* Other callers might not initialize the si_lsb field,
* so check explicitly for the right codes here.
*/
- if (kinfo->si_code == BUS_MCEERR_AR ||
- kinfo->si_code == BUS_MCEERR_AO)
+ if (kinfo->si_signo == SIGBUS &&
+ (kinfo->si_code == BUS_MCEERR_AR ||
+ kinfo->si_code == BUS_MCEERR_AO))
err |= __put_user((short) kinfo->si_addr_lsb,
&uinfo->ssi_addr_lsb);
#endif
--
2.5.0

2015-08-24 09:15:37

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 68/82] signal: fix information leak in copy_siginfo_to_user

From: Amanieu d'Antras <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 26135022f85105ad725cda103fa069e29e83bd16 upstream.

This function may copy the si_addr_lsb, si_lower and si_upper fields to
user mode when they haven't been initialized, which can leak kernel
stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/arm64/kernel/signal32.c | 3 ++-
kernel/signal.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index 3d478102b1c0..efd1dde7094b 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -193,7 +193,8 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
* Other callers might not initialize the si_lsb field,
* so check explicitely for the right codes here.
*/
- if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
+ if (from->si_signo == SIGBUS &&
+ (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO))
err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
#endif
break;
diff --git a/kernel/signal.c b/kernel/signal.c
index ded28b91fa53..5f558f1d3ab1 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2768,7 +2768,8 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
* Other callers might not initialize the si_lsb field,
* so check explicitly for the right codes here.
*/
- if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
+ if (from->si_signo == SIGBUS &&
+ (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO))
err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
#endif
break;
--
2.5.0

2015-08-24 09:10:38

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 69/82] signal: fix information leak in copy_siginfo_from_user32

From: Amanieu d'Antras <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream.

This function can leak kernel stack data when the user siginfo_t has a
positive si_code value. The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.

copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.

This fixes the following information leaks:
x86: 8 bytes leaked when sending a signal from a 32-bit process to
itself. This leak grows to 16 bytes if the process uses x32.
(si_code = __SI_CHLD)
x86: 100 bytes leaked when sending a signal from a 32-bit process to
a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
64-bit process. (si_code = any)

parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process. These bugs are also fixed for consistency.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/arm64/kernel/signal32.c | 2 --
arch/mips/kernel/signal32.c | 2 --
arch/powerpc/kernel/signal_32.c | 2 --
kernel/signal.c | 4 ++--
4 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index efd1dde7094b..b9564b8d6bab 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -221,8 +221,6 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)

int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
{
- memset(to, 0, sizeof *to);
-
if (copy_from_user(to, from, __ARCH_SI_PREAMBLE_SIZE) ||
copy_from_user(to->_sifields._pad,
from->_sifields._pad, SI_PAD_SIZE))
diff --git a/arch/mips/kernel/signal32.c b/arch/mips/kernel/signal32.c
index 57de8b751627..41f8708d21a8 100644
--- a/arch/mips/kernel/signal32.c
+++ b/arch/mips/kernel/signal32.c
@@ -368,8 +368,6 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)

int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
{
- memset(to, 0, sizeof *to);
-
if (copy_from_user(to, from, 3*sizeof(int)) ||
copy_from_user(to->_sifields._pad,
from->_sifields._pad, SI_PAD_SIZE32))
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 50606e4261a1..7fce77b89f6d 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -958,8 +958,6 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *d, siginfo_t *s)

int copy_siginfo_from_user32(siginfo_t *to, struct compat_siginfo __user *from)
{
- memset(to, 0, sizeof *to);
-
if (copy_from_user(to, from, 3*sizeof(int)) ||
copy_from_user(to->_sifields._pad,
from->_sifields._pad, SI_PAD_SIZE32))
diff --git a/kernel/signal.c b/kernel/signal.c
index 5f558f1d3ab1..fca2decd695e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3036,7 +3036,7 @@ COMPAT_SYSCALL_DEFINE3(rt_sigqueueinfo,
int, sig,
struct compat_siginfo __user *, uinfo)
{
- siginfo_t info;
+ siginfo_t info = {};
int ret = copy_siginfo_from_user32(&info, uinfo);
if (unlikely(ret))
return ret;
@@ -3082,7 +3082,7 @@ COMPAT_SYSCALL_DEFINE4(rt_tgsigqueueinfo,
int, sig,
struct compat_siginfo __user *, uinfo)
{
- siginfo_t info;
+ siginfo_t info = {};

if (copy_siginfo_from_user32(&info, uinfo))
return -EFAULT;
--
2.5.0

2015-08-24 09:14:38

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 70/82] path_openat(): fix double fput()

From: Al Viro <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit f15133df088ecadd141ea1907f2c96df67c729f0 upstream.

path_openat() jumps to the wrong place after do_tmpfile() - it has
already done path_cleanup() (as part of path_lookupat() called by
do_tmpfile()), so doing that again can lead to double fput().

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index c0c78e193e2a..097bbeac8c66 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3202,7 +3202,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,

if (unlikely(file->f_flags & __O_TMPFILE)) {
error = do_tmpfile(dfd, pathname, nd, flags, op, file, &opened);
- goto out;
+ goto out2;
}

error = path_init(dfd, pathname->name, flags | LOOKUP_PARENT, nd, &base);
@@ -3240,6 +3240,7 @@ out:
path_put(&nd->root);
if (base)
fput(base);
+out2:
if (!(opened & FILE_OPENED)) {
BUG_ON(!error);
put_filp(file);
--
2.5.0

2015-08-24 09:14:37

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 71/82] md/bitmap: return an error when bitmap superblock is corrupt.

From: NeilBrown <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit b97e92574c0bf335db1cd2ec491d8ff5cd5d0b49 upstream
Use separate bitmaps for each nodes in the cluster

bitmap_read_sb() validates the bitmap superblock that it reads in.
If it finds an inconsistency like a bad magic number or out-of-range
version number, it prints an error and returns, but it incorrectly
returns zero, so the array is still assembled with the (invalid) bitmap.

This means it could try to use a bitmap with a new version number which
it therefore does not understand.

This bug was introduced in 3.5 and fix as part of a larger patch in 4.1.
So the patch is suitable for any -stable kernel in that range.

Fixes: 27581e5ae01f ("md/bitmap: centralise allocation of bitmap file pages.")
Signed-off-by: NeilBrown <[email protected]>
Reported-by: GuoQing Jiang <[email protected]>
---
drivers/md/bitmap.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 03b2edd35e19..64c4d0c2ca80 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -564,6 +564,8 @@ static int bitmap_read_sb(struct bitmap *bitmap)
if (err)
return err;

+ err = -EINVAL;
+
sb = kmap_atomic(sb_page);

chunksize = le32_to_cpu(sb->chunksize);
--
2.5.0

2015-08-24 09:14:36

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 72/82] mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations

From: Michal Hocko <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit ecf5fc6e9654cd7a268c782a523f072b2f1959f9 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync"
#0 __schedule at ffffffff815ab152
#1 schedule at ffffffff815ab76e
#2 schedule_timeout at ffffffff815ae5e5
#3 io_schedule_timeout at ffffffff815aad6a
#4 bit_wait_io at ffffffff815abfc6
#5 __wait_on_bit at ffffffff815abda5
#6 wait_on_page_bit at ffffffff8111fd4f
#7 shrink_page_list at ffffffff81135445
#8 shrink_inactive_list at ffffffff81135845
#9 shrink_lruvec at ffffffff81135ead
#10 shrink_zone at ffffffff811360c3
#11 shrink_zones at ffffffff81136eff
#12 do_try_to_free_pages at ffffffff8113712f
#13 try_to_free_mem_cgroup_pages at ffffffff811372be
#14 try_charge at ffffffff81189423
#15 mem_cgroup_try_charge at ffffffff8118c6f5
#16 __add_to_page_cache_locked at ffffffff8112137d
#17 add_to_page_cache_lru at ffffffff81121618
#18 pagecache_get_page at ffffffff8112170b
#19 grow_dev_page at ffffffff811c8297
#20 __getblk_slow at ffffffff811c91d6
#21 __getblk_gfp at ffffffff811c92c1
#22 ext4_ext_grow_indepth at ffffffff8124565c
#23 ext4_ext_create_new_leaf at ffffffff81246ca8
#24 ext4_ext_insert_extent at ffffffff81246f09
#25 ext4_ext_map_blocks at ffffffff8124a848
#26 ext4_map_blocks at ffffffff8121a5b7
#27 mpage_map_one_extent at ffffffff8121b1fa
#28 mpage_map_and_submit_extent at ffffffff8121f07b
#29 ext4_writepages at ffffffff8121f6d5
#30 do_writepages at ffffffff8112c490
#31 __filemap_fdatawrite_range at ffffffff81120199
#32 filemap_flush at ffffffff8112041c
#33 ext4_alloc_da_blocks at ffffffff81219da1
#34 ext4_rename at ffffffff81229b91
#35 ext4_rename2 at ffffffff81229e32
#36 vfs_rename at ffffffff811a08a5
#37 SYSC_renameat2 at ffffffff811a3ffc
#38 sys_renameat2 at ffffffff811a408e
#39 sys_rename at ffffffff8119e51e
#40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384e9da8 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified. The code has been changed by c3b94f44fcb0 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code. But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio. Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback. The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem. Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[[email protected]: corrected the control flow]
Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
mm/vmscan.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ee8363f73cab..04c33d5fb079 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -871,21 +871,17 @@ static unsigned long shrink_page_list(struct list_head *page_list,
*
* 2) Global reclaim encounters a page, memcg encounters a
* page that is not marked for immediate reclaim or
- * the caller does not have __GFP_IO. In this case mark
+ * the caller does not have __GFP_FS (or __GFP_IO if it's
+ * simply going to swap, not to fs). In this case mark
* the page for immediate reclaim and continue scanning.
*
- * __GFP_IO is checked because a loop driver thread might
+ * Require may_enter_fs because we would wait on fs, which
+ * may not have submitted IO yet. And the loop driver might
* enter reclaim, and deadlock if it waits on a page for
* which it is needed to do the write (loop masks off
* __GFP_IO|__GFP_FS for this reason); but more thought
* would probably show more reasons.
*
- * Don't require __GFP_FS, since we're not going into the
- * FS, just waiting on its writeback completion. Worryingly,
- * ext4 gfs2 and xfs allocate pages with
- * grab_cache_page_write_begin(,,AOP_FLAG_NOFS), so testing
- * may_enter_fs here is liable to OOM on them.
- *
* 3) memcg encounters a page that is not already marked
* PageReclaim. memcg does not have any dirty pages
* throttling so we could easily OOM just because too many
@@ -902,7 +898,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,

/* Case 2 above */
} else if (global_reclaim(sc) ||
- !PageReclaim(page) || !(sc->gfp_mask & __GFP_IO)) {
+ !PageReclaim(page) || !may_enter_fs) {
/*
* This is slightly racy - end_page_writeback()
* might have just cleared PageReclaim, then
--
2.5.0

2015-08-24 09:13:49

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 73/82] ipc/sem.c: update/correct memory barriers

From: Manfred Spraul <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability). The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul <[email protected]>
Reported-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
ipc/sem.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index d8456ad6131c..e960e12b189c 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -253,6 +253,16 @@ static void sem_rcu_free(struct rcu_head *head)
}

/*
+ * spin_unlock_wait() and !spin_is_locked() are not memory barriers, they
+ * are only control barriers.
+ * The code must pair with spin_unlock(&sem->lock) or
+ * spin_unlock(&sem_perm.lock), thus just the control barrier is insufficient.
+ *
+ * smp_rmb() is sufficient, as writes cannot pass the control barrier.
+ */
+#define ipc_smp_acquire__after_spin_is_unlocked() smp_rmb()
+
+/*
* Wait until all currently ongoing simple ops have completed.
* Caller must own sem_perm.lock.
* New simple ops cannot start, because simple ops first check
@@ -275,6 +285,7 @@ static void sem_wait_array(struct sem_array *sma)
sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
}
+ ipc_smp_acquire__after_spin_is_unlocked();
}

/*
@@ -327,13 +338,12 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Then check that the global lock is free */
if (!spin_is_locked(&sma->sem_perm.lock)) {
/*
- * The ipc object lock check must be visible on all
- * cores before rechecking the complex count. Otherwise
- * we can race with another thread that does:
+ * We need a memory barrier with acquire semantics,
+ * otherwise we can race with another thread that does:
* complex_count++;
* spin_unlock(sem_perm.lock);
*/
- smp_rmb();
+ ipc_smp_acquire__after_spin_is_unlocked();

/*
* Now repeat the test of complex_count:
--
2.5.0

2015-08-24 09:13:48

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 74/82] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

From: "Herton R. Krzesinski" <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b kkkkkkkk.kkkkkkk
010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........
Prev obj: start=ffff88003b45c180, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff ...........7....
Next obj: start=ffff88003b45c200, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff ........h).<....
BUG: spinlock wrong CPU on CPU#2, test/18028
general protection fault: 0000 [#1] SMP
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: spin_dump+0x53/0xc0
Call Trace:
spin_bug+0x30/0x40
do_raw_spin_unlock+0x71/0xa0
_raw_spin_unlock+0xe/0x10
freeary+0x82/0x2a0
? _raw_spin_lock+0xe/0x10
semctl_down.clone.0+0xce/0x160
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
SyS_semctl+0x236/0x2c0
? syscall_trace_leave+0xde/0x130
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
RIP [<ffffffff810d6053>] spin_dump+0x53/0xc0
RSP <ffff88003750fd68>
---[ end trace 783ebb76612867a0 ]---
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 3 PID: 18053 Comm: test Tainted: G D 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: native_read_tsc+0x0/0x20
Call Trace:
? delay_tsc+0x40/0x70
__delay+0xf/0x20
do_raw_spin_lock+0x96/0x140
_raw_spin_lock+0xe/0x10
sem_lock_and_putref+0x11/0x70
SYSC_semtimedop+0x7bf/0x960
? handle_mm_fault+0xbf6/0x1880
? dequeue_task_fair+0x79/0x4a0
? __do_page_fault+0x19a/0x430
? kfree_debugcheck+0x16/0x40
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
? do_audit_syscall_entry+0x66/0x70
? syscall_trace_enter_phase1+0x139/0x160
SyS_semtimedop+0xe/0x10
SyS_semop+0x10/0x20
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>

#define NSEM 1
#define NSET 5

int sid[NSET];

void thread()
{
struct sembuf op;
int s;
uid_t pid = getuid();

s = rand() % NSET;
op.sem_num = pid % NSEM;
op.sem_op = 1;
op.sem_flg = SEM_UNDO;

semop(sid[s], &op, 1);
exit(EXIT_SUCCESS);
}

void create_set()
{
int i, j;
pid_t p;
union {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
} un;

/* Create and initialize semaphore set */
for (i = 0; i < NSET; i++) {
sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
if (sid[i] < 0) {
perror("semget");
exit(EXIT_FAILURE);
}
}
un.val = 0;
for (i = 0; i < NSET; i++) {
for (j = 0; j < NSEM; j++) {
if (semctl(sid[i], j, SETVAL, un) < 0)
perror("semctl");
}
}

/* Launch threads that operate on semaphore set */
for (i = 0; i < NSEM * NSET * NSET; i++) {
p = fork();
if (p < 0)
perror("fork");
if (p == 0)
thread();
}

/* Free semaphore set */
for (i = 0; i < NSET; i++) {
if (semctl(sid[i], NSEM, IPC_RMID))
perror("IPC_RMID");
}

/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}

int main(int argc, char **argv)
{
pid_t p;

srand(time(NULL));

while (1) {
p = fork();
if (p < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
if (p == 0) {
create_set();
goto end;
}

/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
end:
return 0;
}

[[email protected]: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <[email protected]>
Acked-by: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rafael Aquini <[email protected]>
CC: Aristeu Rozanski <[email protected]>
Cc: David Jeffery <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>

Signed-off-by: Linus Torvalds <[email protected]>
---
ipc/sem.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index e960e12b189c..b064468e876f 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2067,17 +2067,28 @@ void exit_sem(struct task_struct *tsk)
rcu_read_lock();
un = list_entry_rcu(ulp->list_proc.next,
struct sem_undo, list_proc);
- if (&un->list_proc == &ulp->list_proc)
- semid = -1;
- else
- semid = un->semid;
+ if (&un->list_proc == &ulp->list_proc) {
+ /*
+ * We must wait for freeary() before freeing this ulp,
+ * in case we raced with last sem_undo. There is a small
+ * possibility where we exit while freeary() didn't
+ * finish unlocking sem_undo_list.
+ */
+ spin_unlock_wait(&ulp->lock);
+ rcu_read_unlock();
+ break;
+ }
+ spin_lock(&ulp->lock);
+ semid = un->semid;
+ spin_unlock(&ulp->lock);

+ /* exit_sem raced with IPC_RMID, nothing to do */
if (semid == -1) {
rcu_read_unlock();
- break;
+ continue;
}

- sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
+ sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid);
/* exit_sem raced with IPC_RMID, nothing to do */
if (IS_ERR(sma)) {
rcu_read_unlock();
--
2.5.0

2015-08-24 09:13:46

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 75/82] mm/hwpoison: fix page refcount of unknown non LRU page

From: Wanpeng Li <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 4f32be677b124a49459e2603321c7a5605ceb9f8 upstream.

After trying to drain pages from pagevec/pageset, we try to get reference
count of the page again, however, the reference count of the page is not
reduced if the page is still not on LRU list.

Fix it by adding the put_page() to drop the page reference which is from
__get_any_page().

Signed-off-by: Wanpeng Li <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
mm/memory-failure.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5785b59620ef..cb08faa72b77 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1524,6 +1524,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
*/
ret = __get_any_page(page, pfn, 0);
if (!PageLRU(page)) {
+ /* Drop page reference which is from __get_any_page() */
+ put_page(page);
pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n",
pfn, page->flags);
return -EIO;
--
2.5.0

2015-08-24 09:12:36

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 76/82] xen-blkfront: don't add indirect pages to list when !feature_persistent

From: Bob Liu <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 7b0767502b5db11cb1f0daef2d01f6d71b1192dc upstream.

We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.

When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.

Acked-by: Roger Pau Monné <[email protected]>
Signed-off-by: Bob Liu <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/block/xen-blkfront.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3bb5efdcdc8a..7d0eb3f8d629 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1090,8 +1090,10 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
* Add the used indirect page back to the list of
* available pages for indirect grefs.
*/
- indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
- list_add(&indirect_page->lru, &info->indirect_pages);
+ if (!info->feature_persistent) {
+ indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
+ list_add(&indirect_page->lru, &info->indirect_pages);
+ }
s->indirect_grants[i]->gref = GRANT_INVALID_REF;
list_add_tail(&s->indirect_grants[i]->node, &info->grants);
}
--
2.5.0

2015-08-24 09:12:35

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 77/82] perf: Fix fasync handling on inherited events

From: Peter Zijlstra <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit fed66e2cdd4f127a43fd11b8d92a99bdd429528c upstream.

Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.

Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.

Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.

Reported-and-Tested-by: Vince Weaver <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho deMelo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/events/core.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 18de86cbcdac..cf9f61763ab1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4155,12 +4155,20 @@ static const struct file_operations perf_fops = {
* to user-space before waking everybody up.
*/

+static inline struct fasync_struct **perf_event_fasync(struct perf_event *event)
+{
+ /* only the parent has fasync state */
+ if (event->parent)
+ event = event->parent;
+ return &event->fasync;
+}
+
void perf_event_wakeup(struct perf_event *event)
{
ring_buffer_wakeup(event);

if (event->pending_kill) {
- kill_fasync(&event->fasync, SIGIO, event->pending_kill);
+ kill_fasync(perf_event_fasync(event), SIGIO, event->pending_kill);
event->pending_kill = 0;
}
}
@@ -5362,7 +5370,7 @@ static int __perf_event_overflow(struct perf_event *event,
else
perf_event_output(event, data, regs);

- if (event->fasync && event->pending_kill) {
+ if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
irq_work_queue(&event->pending);
}
--
2.5.0

2015-08-24 09:12:33

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 78/82] dm thin metadata: delete btrees when releasing metadata snapshot

From: Joe Thornber <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 7f518ad0a212e2a6fd68630e176af1de395070a7 upstream.

The device details and mapping trees were just being decremented
before. Now btree_del() is called to do a deep delete.

Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/md/dm-thin-metadata.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index b63095c73b5f..7e3da70ed646 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -1295,8 +1295,8 @@ static int __release_metadata_snap(struct dm_pool_metadata *pmd)
return r;

disk_super = dm_block_data(copy);
- dm_sm_dec_block(pmd->metadata_sm, le64_to_cpu(disk_super->data_mapping_root));
- dm_sm_dec_block(pmd->metadata_sm, le64_to_cpu(disk_super->device_details_root));
+ dm_btree_del(&pmd->info, le64_to_cpu(disk_super->data_mapping_root));
+ dm_btree_del(&pmd->details_info, le64_to_cpu(disk_super->device_details_root));
dm_sm_dec_block(pmd->metadata_sm, held_root);

return dm_tm_unlock(pmd->tm, copy);
--
2.5.0

2015-08-24 09:12:30

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 79/82] localmodconfig: Use Kbuild files too

From: Richard Weinberger <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit c0ddc8c745b7f89c50385fd7aa03c78dc543fa7a upstream.

In kbuild it is allowed to define objects in files named "Makefile"
and "Kbuild".
Currently localmodconfig reads objects only from "Makefile"s and misses
modules like nouveau.

Link: http://lkml.kernel.org/r/[email protected]

Reported-and-tested-by: Leonidas Spyropoulos <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/kconfig/streamline_config.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/streamline_config.pl b/scripts/kconfig/streamline_config.pl
index 4606cdfb859d..7dd7c391b4d8 100644
--- a/scripts/kconfig/streamline_config.pl
+++ b/scripts/kconfig/streamline_config.pl
@@ -137,7 +137,7 @@ my $ksource = ($ARGV[0] ? $ARGV[0] : '.');
my $kconfig = $ARGV[1];
my $lsmod_file = $ENV{'LSMOD'};

-my @makefiles = `find $ksource -name Makefile 2>/dev/null`;
+my @makefiles = `find $ksource -name Makefile -or -name Kbuild 2>/dev/null`;
chomp @makefiles;

my %depends;
--
2.5.0

2015-08-24 09:11:56

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 80/82] EDAC, ppc4xx: Access mci->csrows array elements properly

From: Michael Walle <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 5c16179b550b9fd8114637a56b153c9768ea06a5 upstream.

The commit

de3910eb79ac ("edac: change the mem allocation scheme to
make Documentation/kobject.txt happy")

changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.

Signed-off-by: Michael Walle <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/edac/ppc4xx_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index ef6b7e08f485..5c361f3c66aa 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -921,7 +921,7 @@ static int ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
*/

for (row = 0; row < mci->nr_csrows; row++) {
- struct csrow_info *csi = &mci->csrows[row];
+ struct csrow_info *csi = mci->csrows[row];

/*
* Get the configuration settings for this
--
2.5.0

2015-08-24 09:12:32

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 81/82] drm/radeon: add new OLAND pci id

From: Alex Deucher <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit e037239e5e7b61007763984aa35a8329596d8c88 upstream.

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/drm/drm_pciids.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/drm/drm_pciids.h b/include/drm/drm_pciids.h
index 7571f433f0e3..2e2804c241fa 100644
--- a/include/drm/drm_pciids.h
+++ b/include/drm/drm_pciids.h
@@ -172,6 +172,7 @@
{0x1002, 0x6610, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6611, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6613, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x6617, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6620, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6621, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6623, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_OLAND|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
--
2.5.0

2015-08-24 09:11:54

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 3.12 82/82] rbd: fix copyup completion race

From: Ilya Dryomov <[email protected]>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

commit 2761713d35e370fd640b5781109f753066b746c4 upstream.

For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback(). The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.

rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run. Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such prematurally
marked done request:

<obj_request-1/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
rbd_obj_request_complete()
rbd_img_obj_copyup_callback()
rbd_img_obj_callback()
<obj_request-2/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
for_each_obj_request(obj_request->img_request) {
rbd_img_obj_end_request(obj_request-1/2)
rbd_img_obj_end_request(obj_request-2/2) <--
}

Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xfferred is 0. We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on

rbd_assert(more ^ (which == img_request->obj_request_count));

with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet.

To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request). So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().

Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().

Cc: Alex Elder <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
[[email protected]: backport to < 3.18: context]
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/block/rbd.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 6aeaa28f94f0..63ff17fc23df 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -461,6 +461,7 @@ void rbd_warn(struct rbd_device *rbd_dev, const char *fmt, ...)
# define rbd_assert(expr) ((void) 0)
#endif /* !RBD_DEBUG */

+static void rbd_osd_copyup_callback(struct rbd_obj_request *obj_request);
static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request);
static void rbd_img_parent_read(struct rbd_obj_request *obj_request);
static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
@@ -1664,6 +1665,16 @@ static void rbd_osd_stat_callback(struct rbd_obj_request *obj_request)
obj_request_done_set(obj_request);
}

+static void rbd_osd_call_callback(struct rbd_obj_request *obj_request)
+{
+ dout("%s: obj %p\n", __func__, obj_request);
+
+ if (obj_request_img_data_test(obj_request))
+ rbd_osd_copyup_callback(obj_request);
+ else
+ obj_request_done_set(obj_request);
+}
+
static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
struct ceph_msg *msg)
{
@@ -1702,6 +1713,8 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
rbd_osd_stat_callback(obj_request);
break;
case CEPH_OSD_OP_CALL:
+ rbd_osd_call_callback(obj_request);
+ break;
case CEPH_OSD_OP_NOTIFY_ACK:
case CEPH_OSD_OP_WATCH:
rbd_osd_trivial_callback(obj_request);
@@ -2293,13 +2306,15 @@ out_unwind:
}

static void
-rbd_img_obj_copyup_callback(struct rbd_obj_request *obj_request)
+rbd_osd_copyup_callback(struct rbd_obj_request *obj_request)
{
struct rbd_img_request *img_request;
struct rbd_device *rbd_dev;
struct page **pages;
u32 page_count;

+ dout("%s: obj %p\n", __func__, obj_request);
+
rbd_assert(obj_request->type == OBJ_REQUEST_BIO);
rbd_assert(obj_request_img_data_test(obj_request));
img_request = obj_request->img_request;
@@ -2325,9 +2340,7 @@ rbd_img_obj_copyup_callback(struct rbd_obj_request *obj_request)
if (!obj_request->result)
obj_request->xferred = obj_request->length;

- /* Finish up with the normal image object callback */
-
- rbd_img_obj_callback(obj_request);
+ obj_request_done_set(obj_request);
}

static void
@@ -2424,7 +2437,6 @@ rbd_img_obj_parent_read_full_callback(struct rbd_img_request *img_request)

/* All set, send it off. */

- orig_request->callback = rbd_img_obj_copyup_callback;
osdc = &rbd_dev->rbd_client->client->osdc;
img_result = rbd_obj_request_submit(osdc, orig_request);
if (!img_result)
--
2.5.0

2015-08-24 16:09:26

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3.12 00/82] 3.12.47-stable review

On 08/24/2015 02:09 AM, Jiri Slaby wrote:
> This is the start of the stable review cycle for the 3.12.47 release.
> There are 82 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 26 11:08:59 CEST 2015.
> Anything received after that time might be too late.
>

Build results:
total: 124 pass: 124 fail: 0
Qemu test results:
total: 70 pass: 70 fail: 0

Details are available at http://server.roeck-us.net:8010/builders.

Guenter

2015-08-24 23:36:19

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 3.12 00/82] 3.12.47-stable review

On 08/24/2015 03:09 AM, Jiri Slaby wrote:
> This is the start of the stable review cycle for the 3.12.47 release.
> There are 82 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 26 11:08:59 CEST 2015.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> http://kernel.org/pub/linux/kernel/people/jirislaby/stable-review/patch-3.12.47-rc1.xz
> and the diffstat can be found below.
>
> thanks,
> js

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
[email protected] | (970) 217-8978

2015-08-25 11:36:06

by Luis Henriques

[permalink] [raw]
Subject: Re: [PATCH 3.12 49/82] xen/gntdevt: Fix race condition in gntdev_release()

[ Adding Greg has he seems to have this patch queued for 3.10 and 3.14 ]

On Mon, Aug 24, 2015 at 11:09:09AM +0200, Jiri Slaby wrote:
> From: Marek Marczykowski-G?recki <[email protected]>
>
> 3.12-stable review patch. If anyone has any objections, please let me know.
>
> ===============
>
> commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931 upstream.
>
> While gntdev_release() is called the MMU notifier is still registered
> and can traverse priv->maps list even if no pages are mapped (which is
> the case -- gntdev_release() is called after all). But
> gntdev_release() will clear that list, so make sure that only one of
> those things happens at the same time.
>
> Signed-off-by: Marek Marczykowski-G?recki <[email protected]>
> Signed-off-by: David Vrabel <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> drivers/xen/gntdev.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index e41c79c986ea..f2ca8d0af55f 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -529,12 +529,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
>
> pr_debug("priv %p\n", priv);
>
> + mutex_lock(&priv->lock);

Since 3.12 doesn't seem to include 1401c00e59ea ("xen/gntdev: convert
priv->lock to a mutex"), this shouldn't be applied as priv->lock is
actually a spinlock. So, you'll need to pick 1401c00e59ea or backport
this patch using the appropriate locking directives. Not sure what's
the best solution. Maybe Marek or David can help...?

Cheers,
--
Lu?s

> while (!list_empty(&priv->maps)) {
> map = list_entry(priv->maps.next, struct grant_map, next);
> list_del(&map->next);
> gntdev_put_map(NULL /* already removed */, map);
> }
> WARN_ON(!list_empty(&priv->freeable_maps));
> + mutex_unlock(&priv->lock);
>
> if (use_ptemod)
> mmu_notifier_unregister(&priv->mn, priv->mm);
> --
> 2.5.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Subject: Re: [PATCH 3.12 49/82] xen/gntdevt: Fix race condition in gntdev_release()

On Tue, Aug 25, 2015 at 12:35:59PM +0100, Luis Henriques wrote:
> [ Adding Greg has he seems to have this patch queued for 3.10 and 3.14 ]
>
> On Mon, Aug 24, 2015 at 11:09:09AM +0200, Jiri Slaby wrote:
> > From: Marek Marczykowski-Górecki <[email protected]>
> >
> > 3.12-stable review patch. If anyone has any objections, please let me know.
> >
> > ===============
> >
> > commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931 upstream.
> >
> > While gntdev_release() is called the MMU notifier is still registered
> > and can traverse priv->maps list even if no pages are mapped (which is
> > the case -- gntdev_release() is called after all). But
> > gntdev_release() will clear that list, so make sure that only one of
> > those things happens at the same time.
> >
> > Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
> > Signed-off-by: David Vrabel <[email protected]>
> > Signed-off-by: Jiri Slaby <[email protected]>
> > ---
> > drivers/xen/gntdev.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index e41c79c986ea..f2ca8d0af55f 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -529,12 +529,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
> >
> > pr_debug("priv %p\n", priv);
> >
> > + mutex_lock(&priv->lock);
>
> Since 3.12 doesn't seem to include 1401c00e59ea ("xen/gntdev: convert
> priv->lock to a mutex"), this shouldn't be applied as priv->lock is
> actually a spinlock. So, you'll need to pick 1401c00e59ea or backport
> this patch using the appropriate locking directives. Not sure what's
> the best solution. Maybe Marek or David can help...?

I've used spinlock approach for some time (on 3.18.x) and it works ok. This applies
also to 3.10 and 3.14 of course.

Patch here:
https://raw.githubusercontent.com/QubesOS/qubes-linux-kernel/stable-3.18/patches.xen/0001-xen-grant-fix-race-condition-in-gntdev_release.patch

and here:

From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
<[email protected]>
Date: Fri, 26 Jun 2015 02:16:49 +0200
Subject: [PATCH] xen/grant: fix race condition in gntdev_release
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Organization: Invisible Things Lab
Cc: Marek Marczykowski-Górecki <[email protected]>

While gntdev_release is called, MMU notifier is still registered and
will traverse priv->maps list even if no pages are mapped (which is the
case - gntdev_release is called after all). But gntdev_release will
clear that list, so make sure that only one of those things happens at
the same time.

Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
---
drivers/xen/gntdev.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 8927485..4bd23bb 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -568,12 +568,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)

pr_debug("priv %p\n", priv);

+ spin_lock(&priv->lock);
while (!list_empty(&priv->maps)) {
map = list_entry(priv->maps.next, struct grant_map, next);
list_del(&map->next);
gntdev_put_map(NULL /* already removed */, map);
}
WARN_ON(!list_empty(&priv->freeable_maps));
+ spin_unlock(&priv->lock);

if (use_ptemod)
mmu_notifier_unregister(&priv->mn, priv->mm);
--
1.9.3



--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


Attachments:
(No filename) (3.63 kB)
(No filename) (473.00 B)
Download all attachments

2015-08-25 13:18:22

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 3.12 49/82] xen/gntdevt: Fix race condition in gntdev_release()

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 08/25/2015, 01:52 PM, Marek Marczykowski-Górecki wrote:
>>> --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@
>>> -529,12 +529,14 @@ static int gntdev_release(struct inode
>>> *inode, struct file *flip)
>>>
>>> pr_debug("priv %p\n", priv);
>>>
>>> + mutex_lock(&priv->lock);
>>
>> Since 3.12 doesn't seem to include 1401c00e59ea ("xen/gntdev:
>> convert priv->lock to a mutex"), this shouldn't be applied as
>> priv->lock is actually a spinlock. So, you'll need to pick
>> 1401c00e59ea or backport this patch using the appropriate locking
>> directives. Not sure what's the best solution. Maybe Marek or
>> David can help...?
>
> I've used spinlock approach for some time (on 3.18.x) and it works
> ok. This applies also to 3.10 and 3.14 of course.
>
> Patch here:
> https://raw.githubusercontent.com/QubesOS/qubes-linux-kernel/stable-3.
18/patches.xen/0001-xen-grant-fix-race-condition-in-gntdev_release.patch
>
> and here:
>
> From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00
> 2001 From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
> <[email protected]> Date: Fri, 26 Jun 2015 02:16:49
> +0200 Subject: [PATCH] xen/grant: fix race condition in
> gntdev_release MIME-Version: 1.0 Content-Type: text/plain;
> charset=UTF-8 Content-Transfer-Encoding: 8bit Organization:
> Invisible Things Lab Cc: Marek Marczykowski-Górecki
> <[email protected]>
>
> While gntdev_release is called, MMU notifier is still registered
> and will traverse priv->maps list even if no pages are mapped
> (which is the case - gntdev_release is called after all). But
> gntdev_release will clear that list, so make sure that only one of
> those things happens at the same time.
>
> Signed-off-by: Marek Marczykowski-Górecki
> <[email protected]> --- drivers/xen/gntdev.c | 2 ++ 1
> file changed, 2 insertions(+)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index
> 8927485..4bd23bb 100644 --- a/drivers/xen/gntdev.c +++
> b/drivers/xen/gntdev.c @@ -568,12 +568,14 @@ static int
> gntdev_release(struct inode *inode, struct file *flip)
>
> pr_debug("priv %p\n", priv);
>
> + spin_lock(&priv->lock); while (!list_empty(&priv->maps)) { map =
> list_entry(priv->maps.next, struct grant_map, next);
> list_del(&map->next); gntdev_put_map(NULL /* already removed */,
> map); } WARN_ON(!list_empty(&priv->freeable_maps)); +
> spin_unlock(&priv->lock);

Hmm, but e.g.
gntdev_put_map
-> gntdev_free_map
-> free_xenballooned_pages
-> mutex_lock

means sleep inside atomic, right?

thanks,
- --
js
suse labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJV3GsVAAoJEL0lsQQGtHBJyfUP/Ro99Km/OWSpK3SRKzsVDo9z
OwYzvnHdHNR8uqcS/VnzWfZokine+qWFVNdqfWx3GPRL03wt1JUdjdZTA+cIi2Xn
P6PcxUXTz3aCz7vPV79nW0yLpaG+aUoCeNGFvVTfJQnz/4gNQ0DaVkD2wmy6GptX
ce1mW4AofkQl5lFaWOb5xehy/qGzaj9egtZKfVrGcrAj8oRSHCOCdgW6qMbztBkI
YRlZY2pBgGmkx3o6PktqouKx4zi9akEK1j9/axDzhjQxc2Put36P4XuqQIfgVqQr
H2EZZD5JF8HhgIWlOvwo7Ll0TEmpaQ8ouxBUhHtNTFjYtR+r8hTwl6PfZLH9qWzT
IkOF7gDMHkbm73dLG3r+pSOyLsX9f0koYFFLHyRNKPEuYjRYV65hzRgSj1feXuni
l5VG2vO6YFJcmjPh8q0pKciIWvNw+4n10XEwub1Qk/PVjGZDAaTKIWw2H4q/ZAqk
9+0zQzlzGC78pGQsdLmnjFchZ7ye9/F8zuDPhNEv4MgjoI2gq1zfGgWb+bZeq4JX
3/TLh33U/a9Zlbb1HCjs/bN3l+WlYCwOqVgSwUyZwem8gZrPlJ37aukCbolHecUj
xk07IaY1Wv4OHYzxdXeSd9XeUospM19WSYoVaTVBi7RBrKTHdBYHGeQxocYBgviG
Ckj9O9Pe+NfT4qC0Rj6I
=6tTn
-----END PGP SIGNATURE-----

Subject: Re: [PATCH 3.12 49/82] xen/gntdevt: Fix race condition in gntdev_release()

On Tue, Aug 25, 2015 at 03:18:17PM +0200, Jiri Slaby wrote:
> On 08/25/2015, 01:52 PM, Marek Marczykowski-Górecki wrote:
> >>> --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@
> >>> -529,12 +529,14 @@ static int gntdev_release(struct inode
> >>> *inode, struct file *flip)
> >>>
> >>> pr_debug("priv %p\n", priv);
> >>>
> >>> + mutex_lock(&priv->lock);
> >>
> >> Since 3.12 doesn't seem to include 1401c00e59ea ("xen/gntdev:
> >> convert priv->lock to a mutex"), this shouldn't be applied as
> >> priv->lock is actually a spinlock. So, you'll need to pick
> >> 1401c00e59ea or backport this patch using the appropriate locking
> >> directives. Not sure what's the best solution. Maybe Marek or
> >> David can help...?
> >
> > I've used spinlock approach for some time (on 3.18.x) and it works
> > ok. This applies also to 3.10 and 3.14 of course.
> >
> > Patch here:
> > https://raw.githubusercontent.com/QubesOS/qubes-linux-kernel/stable-3.
> 18/patches.xen/0001-xen-grant-fix-race-condition-in-gntdev_release.patch
> >
> > and here:
> >
> > From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00
> > 2001 From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
> > <[email protected]> Date: Fri, 26 Jun 2015 02:16:49
> > +0200 Subject: [PATCH] xen/grant: fix race condition in
> > gntdev_release MIME-Version: 1.0 Content-Type: text/plain;
> > charset=UTF-8 Content-Transfer-Encoding: 8bit Organization:
> > Invisible Things Lab Cc: Marek Marczykowski-Górecki
> > <[email protected]>
> >
> > While gntdev_release is called, MMU notifier is still registered
> > and will traverse priv->maps list even if no pages are mapped
> > (which is the case - gntdev_release is called after all). But
> > gntdev_release will clear that list, so make sure that only one of
> > those things happens at the same time.
> >
> > Signed-off-by: Marek Marczykowski-Górecki
> > <[email protected]> --- drivers/xen/gntdev.c | 2 ++ 1
> > file changed, 2 insertions(+)
> >
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index
> > 8927485..4bd23bb 100644 --- a/drivers/xen/gntdev.c +++
> > b/drivers/xen/gntdev.c @@ -568,12 +568,14 @@ static int
> > gntdev_release(struct inode *inode, struct file *flip)
> >
> > pr_debug("priv %p\n", priv);
> >
> > + spin_lock(&priv->lock); while (!list_empty(&priv->maps)) { map =
> > list_entry(priv->maps.next, struct grant_map, next);
> > list_del(&map->next); gntdev_put_map(NULL /* already removed */,
> > map); } WARN_ON(!list_empty(&priv->freeable_maps)); +
> > spin_unlock(&priv->lock);
>
> Hmm, but e.g.
> gntdev_put_map
> -> gntdev_free_map
> -> free_xenballooned_pages
> -> mutex_lock
>
> means sleep inside atomic, right?

Indeed, you're probably right. But I haven't hit that problem ever...

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


Attachments:
(No filename) (2.93 kB)
(No filename) (473.00 B)
Download all attachments

2015-08-27 07:59:58

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 3.12 49/82] xen/gntdevt: Fix race condition in gntdev_release()

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 08/25/2015, 04:08 PM, Marek Marczykowski-Górecki wrote:
>> Hmm, but e.g. gntdev_put_map -> gntdev_free_map ->
>> free_xenballooned_pages -> mutex_lock
>>
>> means sleep inside atomic, right?
>
> Indeed, you're probably right. But I haven't hit that problem
> ever...

Ok, so I dropped 30b03d05e07467b8c6ec683ea96b5bffcbcd3931 from 3.12.

thanks,
- --
js
suse labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJV3sN2AAoJEL0lsQQGtHBJkIAP/Regy1kPHkB8dlHejk9IF+pO
8AG2VfeeDq9XTX94mb02AApgZbD+/0yLegCKoidaEpgjUNxpxcocrYo0K7dJQnAM
ThYsuhaXVp1H13nOt7a3IpXtz1290qpYe+t6es9uA5Dp/HW7Stv4vI82lN1B9S1l
FphFXt1GiHEr6hSF6YkQgRo9b0//hmY/pYbakkB7gwxUmelvBpNoOdqukTpOSKZ9
3asF2cpxQ2kHY3EoFTM16PbUlyH91FF8j/F/AChYQfOaM7ikChIfMKz0bXIbR/FH
esmP5IDkEdA01M1irtUjjzdtBSAOBM/9Ii2YhxAE4rPk/MV0fi5Lg8G4iBDLF4Fu
+/80G2UsYmR17AD/LIZ/ohSaO3Z9tXTZrJjPrqPAetZcieXBSast5r5LldN4WiHG
DilbDNU6zVgk2OHG1cJwVR8xvXTohwXveY4vrEPLLZWyyBYEh1ABBf0/NBgQvbKY
JqooOFU3Z1mPNRBko+IbIM9VaZtx9WLgR4QUwsoZVS8ItHIqZGnHxkEhMn8arjl1
K++xL/v2YK2iwCyyqR83tTBsFEg3DFFI4M0MTAh+LyZkW5/weHlVDwlCV6qdkOvW
JMJgdivgrHl2K1CGYIjXW8aOtKYq1GSI398JQPfFOlsX37wNiBrB3xODq2Eph7nS
3k7uCgwbeyj3coICr8Tm
=OuQx
-----END PGP SIGNATURE-----

2015-08-27 08:10:15

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 3.12 00/82] 3.12.47-stable review

On 08/24/2015, 06:09 PM, Guenter Roeck wrote:
> On 08/24/2015 02:09 AM, Jiri Slaby wrote:
>> This is the start of the stable review cycle for the 3.12.47 release.
>> There are 82 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Wed Aug 26 11:08:59 CEST 2015.
>> Anything received after that time might be too late.
>>
>
> Build results:
> total: 124 pass: 124 fail: 0
> Qemu test results:
> total: 70 pass: 70 fail: 0
>
> Details are available at http://server.roeck-us.net:8010/builders.

On 08/25/2015, 01:36 AM, Shuah Khan wrote:
> Compiled and booted on my test system. No dmesg regressions.

Thank you both!

--
js
suse labs