2017-06-12 15:39:01

by Greg Kroah-Hartman

Subject: [PATCH 4.4 00/90] 4.4.72-stable review

This is the start of the stable review cycle for the 4.4.72 release.
There are 90 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jun 14 15:25:30 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.72-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.4.72-rc1

Mark Rutland <[email protected]>
arm64: ensure extension of smp_store_release value

Mark Rutland <[email protected]>
arm64: armv8_deprecated: ensure extension of addr

Kees Cook <[email protected]>
usercopy: Adjust tests to deal with SMAP/PAN

Amey Telawane <[email protected]>
tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

Mike Marciniszyn <[email protected]>
RDMA/qib,hfi1: Fix MR reference count leak on write with immediate

Kristina Martsenko <[email protected]>
arm64: entry: improve data abort handling of tagged pointers

Kristina Martsenko <[email protected]>
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers

Artem Savkov <[email protected]>
Make __xfs_xattr_put_listent properly report errors.

Trond Myklebust <[email protected]>
NFSv4: Don't perform cached access checks before we've OPENed the file

Trond Myklebust <[email protected]>
NFS: Ensure we revalidate attributes before using execute_ok()

Michal Hocko <[email protected]>
mm: consider memblock reservations for deferred memory initialization sizing

Eric Dumazet <[email protected]>
net: better skb->sender_cpu and skb->napi_id cohabitation

Takatoshi Akiyama <[email protected]>
serial: sh-sci: Fix panic when serial console and DMA are enabled

Peter Hurley <[email protected]>
tty: Drop krefs for interrupted tty lock

Julius Werner <[email protected]>
drivers: char: mem: Fix wraparound check to allow mappings up to the end

Takashi Iwai <[email protected]>
ASoC: Fix use-after-free at card unregistration

Takashi Iwai <[email protected]>
ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT

Takashi Iwai <[email protected]>
ALSA: timer: Fix race between read and ioctl

Ben Skeggs <[email protected]>
drm/nouveau/tmr: fully separate alarm execution/pending lists

Sinclair Yeh <[email protected]>
drm/vmwgfx: Make sure backup_handle is always valid

Vladis Dronov <[email protected]>
drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()

Dan Carpenter <[email protected]>
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()

Jin Yao <[email protected]>
perf/core: Drop kernel samples even though :u is specified

Michael Bringmann <[email protected]>
powerpc/hotplug-mem: Fix missing endian conversion of aa_index

Michael Ellerman <[email protected]>
powerpc/numa: Fix percpu allocations to be NUMA aware

Russell Currey <[email protected]>
powerpc/eeh: Avoid use after free in eeh_handle_special_event()

Johannes Thumshirn <[email protected]>
scsi: qla2xxx: don't disable a not previously enabled PCI device

Marc Zyngier <[email protected]>
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages

Jeff Mahoney <[email protected]>
btrfs: fix memory leak in update_space_info failure path

David Sterba <[email protected]>
btrfs: use correct types for page indices in btrfs_page_exists_in_range

Frederic Barrat <[email protected]>
cxl: Fix error path on bad ioctl

Al Viro <[email protected]>
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path

Al Viro <[email protected]>
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()

Al Viro <[email protected]>
ufs: set correct ->s_maxsize

Al Viro <[email protected]>
ufs: restore maintaining ->i_blocks

Al Viro <[email protected]>
fix ufs_isblockset()

Al Viro <[email protected]>
ufs: restore proper tail allocation

Fabian Frederick <[email protected]>
fs: add i_blocksize()

Tejun Heo <[email protected]>
cpuset: consider dying css as offline

Ulrik De Bie <[email protected]>
Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled

Eric Anholt <[email protected]>
drm/msm: Expose our reservation object when exporting a dmabuf.

Nicholas Bellinger <[email protected]>
target: Re-add check to reject control WRITEs with overflow data

David Arcari <[email protected]>
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails

Daniel Micay <[email protected]>
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms

Eric Biggers <[email protected]>
random: properly align get_random_int_hash

Daniel Cashman <[email protected]>
drivers: char: random: add get_random_long()

Matt Ranostay <[email protected]>
iio: proximity: as3935: fix AS3935_INT mask

Franziska Naepelt <[email protected]>
iio: light: ltr501 Fix interchanged als/ps register field

Oleg Drokin <[email protected]>
staging/lustre/lov: remove set_fs() call from lov_getstripe()

Michael Thalmeier <[email protected]>
usb: chipidea: debug: check before accessing ci_role

Jisheng Zhang <[email protected]>
usb: chipidea: udc: fix NULL pointer dereference if udc_start failed

Thinh Nguyen <[email protected]>
usb: gadget: f_mass_storage: Serialize wake and sleep execution

Jan Kara <[email protected]>
ext4: fix fdatasync(2) after extent manipulation operations

Konstantin Khlebnikov <[email protected]>
ext4: keep existing extra fields when inode expands

Jan Kara <[email protected]>
ext4: fix SEEK_HOLE

Dongli Zhang <[email protected]>
xen-netfront: cast grant table reference first to type int

Dongli Zhang <[email protected]>
xen-netfront: do not cast grant table reference to signed short

Julien Grall <[email protected]>
xen/privcmd: Support correctly 64KB page granularity when mapping memory

Alexander Sverdlin <[email protected]>
dmaengine: ep93xx: Always start from BASE0

Hiroyuki Yokoyama <[email protected]>
dmaengine: usb-dmac: Fix DMAOR AE bit definition

Wanpeng Li <[email protected]>
KVM: async_pf: avoid async pf injection when in guest mode

Marc Zyngier <[email protected]>
arm: KVM: Allow unaligned accesses at HYP

Wanpeng Li <[email protected]>
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation

Paolo Bonzini <[email protected]>
kvm: async_pf: fix rcu_irq_enter() with irqs enabled

Trond Myklebust <[email protected]>
nfsd: Fix up the "supattr_exclcreat" attributes

J. Bruce Fields <[email protected]>
nfsd4: fix null dereference on replay

Alex Deucher <[email protected]>
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)

Gilad Ben-Yossef <[email protected]>
crypto: gcm - wait for crypto op not signal safe

Eric Biggers <[email protected]>
KEYS: fix freeing uninitialized memory in key_update()

Eric Biggers <[email protected]>
KEYS: fix dereferencing NULL payload with nonzero length

Eric W. Biederman <[email protected]>
ptrace: Properly initialize ptracer_cred on fork

Johan Hovold <[email protected]>
serial: ifx6x60: fix use-after-free on module unload

Jane Chu <[email protected]>
arch/sparc: support NR_CPUS = 4096

Pavel Tatashin <[email protected]>
sparc64: delete old wrap code

Pavel Tatashin <[email protected]>
sparc64: new context wrap

Pavel Tatashin <[email protected]>
sparc64: add per-cpu mm of secondary contexts

Pavel Tatashin <[email protected]>
sparc64: redefine first version

Pavel Tatashin <[email protected]>
sparc64: combine activate_mm and switch_mm

Pavel Tatashin <[email protected]>
sparc64: reset mm cpumask after wrap

James Clarke <[email protected]>
sparc: Machine description indices can vary

Mike Kravetz <[email protected]>
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs

Nikolay Aleksandrov <[email protected]>
net: bridge: start hello timer only if device is up

Max Filippov <[email protected]>
net: ethoc: enable NAPI before poll may be scheduled

Eric Dumazet <[email protected]>
net: ping: do not abuse udp_poll()

David S. Miller <[email protected]>
ipv6: Fix leak in ipv6_gso_segment().

Mark Bloch <[email protected]>
vxlan: fix use-after-free on deletion

Yuchung Cheng <[email protected]>
tcp: disallow cwnd undo when switching congestion control

Ganesh Goudar <[email protected]>
cxgb4: avoid enabling napi twice to the same queue

Ben Hutchings <[email protected]>
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

Mintz, Yuval <[email protected]>
bnx2x: Fix Multi-Cos


-------------

Diffstat:

Makefile | 4 +-
arch/arm/kvm/init.S | 5 +-
arch/arm/kvm/mmu.c | 3 +
arch/arm64/include/asm/asm-uaccess.h | 13 ++++
arch/arm64/include/asm/barrier.h | 18 ++++-
arch/arm64/include/asm/uaccess.h | 8 ++
arch/arm64/kernel/armv8_deprecated.c | 3 +-
arch/arm64/kernel/entry.S | 6 +-
arch/arm64/kernel/hw_breakpoint.c | 3 +-
arch/powerpc/include/asm/topology.h | 14 ++++
arch/powerpc/kernel/eeh_driver.c | 19 ++++-
arch/powerpc/kernel/setup_64.c | 4 +-
arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +
arch/sparc/Kconfig | 4 +-
arch/sparc/include/asm/mmu_64.h | 2 +-
arch/sparc/include/asm/mmu_context_64.h | 32 +-------
arch/sparc/include/asm/pil.h | 1 -
arch/sparc/include/asm/vio.h | 1 +
arch/sparc/kernel/irq_64.c | 17 ++++-
arch/sparc/kernel/kernel.h | 1 -
arch/sparc/kernel/smp_64.c | 31 --------
arch/sparc/kernel/tsb.S | 11 ++-
arch/sparc/kernel/ttable_64.S | 2 +-
arch/sparc/kernel/vio.c | 68 ++++++++++++++++-
arch/sparc/mm/init_64.c | 86 +++++++++++++++-------
arch/sparc/mm/tsb.c | 7 +-
arch/sparc/mm/ultra.S | 5 --
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kvm/cpuid.c | 20 ++---
arch/x86/kvm/mmu.c | 7 +-
arch/x86/kvm/mmu.h | 1 +
arch/x86/kvm/x86.c | 3 +-
crypto/gcm.c | 6 +-
drivers/char/mem.c | 2 +-
drivers/char/random.c | 26 ++++++-
drivers/cpufreq/cpufreq.c | 1 +
drivers/dma/ep93xx_dma.c | 2 +
drivers/dma/sh/usb-dmac.c | 2 +-
drivers/gpu/drm/amd/amdgpu/ci_dpm.c | 6 ++
drivers/gpu/drm/msm/msm_drv.c | 1 +
drivers/gpu/drm/msm/msm_drv.h | 1 +
drivers/gpu/drm/msm/msm_gem_prime.c | 7 ++
.../gpu/drm/nouveau/include/nvkm/subdev/timer.h | 1 +
drivers/gpu/drm/nouveau/nvkm/subdev/timer/base.c | 7 +-
drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 2 +
drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 21 ++++--
drivers/iio/light/ltr501.c | 4 +-
drivers/iio/proximity/as3935.c | 4 +-
drivers/infiniband/hw/qib/qib_rc.c | 4 +-
drivers/input/mouse/elantech.c | 16 ++++
drivers/misc/cxl/file.c | 7 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 2 +-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 +
drivers/net/ethernet/ethoc.c | 3 +-
drivers/net/vxlan.c | 19 +++--
drivers/net/xen-netfront.c | 4 +-
drivers/scsi/qla2xxx/qla_os.c | 8 +-
drivers/staging/lustre/lustre/lov/lov_pack.c | 9 ---
drivers/target/target_core_transport.c | 23 ++++--
drivers/tty/serial/ifx6x60.c | 2 +-
drivers/tty/serial/sh-sci.c | 10 ++-
drivers/tty/tty_io.c | 3 +-
drivers/tty/tty_mutex.c | 7 +-
drivers/usb/chipidea/debug.c | 3 +-
drivers/usb/chipidea/udc.c | 8 +-
drivers/usb/gadget/function/f_mass_storage.c | 13 +++-
drivers/xen/privcmd.c | 4 +-
fs/btrfs/extent-tree.c | 1 +
fs/btrfs/file.c | 2 +-
fs/btrfs/inode.c | 4 +-
fs/buffer.c | 12 +--
fs/ceph/addr.c | 2 +-
fs/direct-io.c | 2 +-
fs/ext4/extents.c | 5 ++
fs/ext4/file.c | 50 ++++---------
fs/ext4/inode.c | 9 ++-
fs/ext4/move_extent.c | 2 +-
fs/jfs/super.c | 4 +-
fs/mpage.c | 2 +-
fs/nfs/dir.c | 21 +++++-
fs/nfsd/blocklayout.c | 4 +-
fs/nfsd/nfs4proc.c | 13 ++--
fs/nfsd/nfs4xdr.c | 13 +++-
fs/nilfs2/btnode.c | 2 +-
fs/nilfs2/inode.c | 4 +-
fs/nilfs2/mdt.c | 4 +-
fs/nilfs2/segment.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/ocfs2/file.c | 2 +-
fs/reiserfs/file.c | 2 +-
fs/reiserfs/inode.c | 2 +-
fs/stat.c | 3 +-
fs/udf/inode.c | 2 +-
fs/ufs/balloc.c | 26 ++++++-
fs/ufs/inode.c | 9 ++-
fs/ufs/super.c | 18 +++++
fs/ufs/util.h | 10 ++-
fs/xfs/xfs_aops.c | 10 +--
fs/xfs/xfs_file.c | 4 +-
fs/xfs/xfs_xattr.c | 1 +
include/linux/cgroup.h | 20 +++++
include/linux/fs.h | 5 ++
include/linux/memblock.h | 8 ++
include/linux/mmzone.h | 1 +
include/linux/ptrace.h | 7 +-
include/linux/random.h | 1 +
include/linux/skbuff.h | 3 -
include/net/ipv6.h | 1 +
kernel/cpuset.c | 4 +-
kernel/events/core.c | 21 ++++++
kernel/fork.c | 2 +-
kernel/ptrace.c | 20 +++--
kernel/trace/trace.c | 2 +-
lib/test_user_copy.c | 20 ++++-
mm/memblock.c | 24 ++++++
mm/page_alloc.c | 25 ++++++-
mm/truncate.c | 2 +-
net/bridge/br_stp_if.c | 3 +-
net/core/dev.c | 33 ++++-----
net/ipv4/af_inet.c | 2 +-
net/ipv4/tcp_cong.c | 1 +
net/ipv6/ip6_offload.c | 4 +-
net/ipv6/ping.c | 2 +-
net/ipv6/raw.c | 2 +-
net/ipv6/xfrm6_mode_ro.c | 2 +
net/ipv6/xfrm6_mode_transport.c | 2 +
security/keys/key.c | 5 +-
security/keys/keyctl.c | 4 +-
sound/core/timer.c | 7 +-
sound/soc/soc-core.c | 5 +-
130 files changed, 770 insertions(+), 362 deletions(-)



2017-06-12 15:38:10

by Greg Kroah-Hartman

Subject: [PATCH 4.4 14/90] sparc64: redefine first version

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>


[ Upstream commit c4415235b2be0cc791572e8e7f7466ab8f73a2bf ]

CTX_FIRST_VERSION defines the first context version, but it also defines the
first context. This patch redefines it to include only the first context
version.
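
As a sketch of what that means for the packed context value (the layout
is inferred from the masks in the hunk below; treat the exact widths as
illustrative):

  /*
   * sparc64_ctx_val (sketch):  [ version | context number ]
   *                              ^ bits at/above CTX_VERSION_SHIFT
   *
   * old: CTX_FIRST_VERSION == (1UL << CTX_VERSION_SHIFT) + 1
   *      i.e. version 1 *and* context number 1 baked into one constant
   * new: CTX_FIRST_VERSION == BIT(CTX_VERSION_SHIFT)
   *      i.e. version 1 only; the context number is allocated separately,
   *      which is why the hunks below now add the "+ 1" explicitly
   */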

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/mmu_64.h | 2 +-
arch/sparc/mm/init_64.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/sparc/include/asm/mmu_64.h
+++ b/arch/sparc/include/asm/mmu_64.h
@@ -52,7 +52,7 @@
#define CTX_NR_MASK TAG_CONTEXT_BITS
#define CTX_HW_MASK (CTX_NR_MASK | CTX_PGSZ_MASK)

-#define CTX_FIRST_VERSION ((_AC(1,UL) << CTX_VERSION_SHIFT) + _AC(1,UL))
+#define CTX_FIRST_VERSION BIT(CTX_VERSION_SHIFT)
#define CTX_VALID(__ctx) \
(!(((__ctx.sparc64_ctx_val) ^ tlb_context_cache) & CTX_VERSION_MASK))
#define CTX_HWBITS(__ctx) ((__ctx.sparc64_ctx_val) & CTX_HW_MASK)
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -656,7 +656,7 @@ EXPORT_SYMBOL(__flush_dcache_range);

/* get_new_mmu_context() uses "cache + 1". */
DEFINE_SPINLOCK(ctx_alloc_lock);
-unsigned long tlb_context_cache = CTX_FIRST_VERSION - 1;
+unsigned long tlb_context_cache = CTX_FIRST_VERSION;
#define MAX_CTX_NR (1UL << CTX_NR_BITS)
#define CTX_BMAP_SLOTS BITS_TO_LONGS(MAX_CTX_NR)
DECLARE_BITMAP(mmu_context_bmap, MAX_CTX_NR);
@@ -687,9 +687,9 @@ void get_new_mmu_context(struct mm_struc
if (new_ctx >= ctx) {
int i;
new_ctx = (tlb_context_cache & CTX_VERSION_MASK) +
- CTX_FIRST_VERSION;
+ CTX_FIRST_VERSION + 1;
if (new_ctx == 1)
- new_ctx = CTX_FIRST_VERSION;
+ new_ctx = CTX_FIRST_VERSION + 1;

/* Don't call memset, for 16 entries that's just
* plain silly...


2017-06-12 15:38:02

by Greg Kroah-Hartman

Subject: [PATCH 4.4 15/90] sparc64: add per-cpu mm of secondary contexts

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>


[ Upstream commit 7a5b4bbf49fe86ce77488a70c5dccfe2d50d7a2d ]

The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts set up.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/mmu_context_64.h | 5 +++--
arch/sparc/mm/init_64.c | 1 +
2 files changed, 4 insertions(+), 2 deletions(-)

--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -17,6 +17,7 @@ extern spinlock_t ctx_alloc_lock;
extern unsigned long tlb_context_cache;
extern unsigned long mmu_context_bmap[];

+DECLARE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm);
void get_new_mmu_context(struct mm_struct *mm);
#ifdef CONFIG_SMP
void smp_new_mmu_context_version(void);
@@ -74,8 +75,9 @@ void __flush_tlb_mm(unsigned long, unsig
static inline void switch_mm(struct mm_struct *old_mm, struct mm_struct *mm, struct task_struct *tsk)
{
unsigned long ctx_valid, flags;
- int cpu;
+ int cpu = smp_processor_id();

+ per_cpu(per_cpu_secondary_mm, cpu) = mm;
if (unlikely(mm == &init_mm))
return;

@@ -121,7 +123,6 @@ static inline void switch_mm(struct mm_s
* for the first time, we must flush that context out of the
* local TLB.
*/
- cpu = smp_processor_id();
if (!ctx_valid || !cpumask_test_cpu(cpu, mm_cpumask(mm))) {
cpumask_set_cpu(cpu, mm_cpumask(mm));
__flush_tlb_mm(CTX_HWBITS(mm->context),
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -660,6 +660,7 @@ unsigned long tlb_context_cache = CTX_FI
#define MAX_CTX_NR (1UL << CTX_NR_BITS)
#define CTX_BMAP_SLOTS BITS_TO_LONGS(MAX_CTX_NR)
DECLARE_BITMAP(mmu_context_bmap, MAX_CTX_NR);
+DEFINE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm) = {0};

/* Caller does TLB context flushing on local CPU if necessary.
* The caller also ensures that CTX_VALID(mm->context) is false.


2017-06-12 15:38:15

by Greg Kroah-Hartman

Subject: [PATCH 4.4 16/90] sparc64: new context wrap

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>


[ Upstream commit a0582f26ec9dfd5360ea2f35dd9a1b026f8adda0 ]

The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap. This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until everyone arrives.)

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (This could be protected by
PSTATE_IE, but that degrades performance, as it takes 50 cycles for each
access on M7 and older CPUs.) Also, we still need a global per-cpu array
of MMs to know where we need to load new contexts, otherwise we can change
context to a thread that is going away (if we received a mondo between
switch_mm() and switch_to() time). Finally, there are some issues with
window registers in rtrap() when context IDs are changed during CPU mondo
time.

The approach in this patch is the simplest and has almost no impact on
runtime. We use the array of mm's whose secondary contexts were last
loaded onto each CPU and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/mm/init_64.c | 81 ++++++++++++++++++++++++++++++++----------------
1 file changed, 54 insertions(+), 27 deletions(-)

--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -662,6 +662,53 @@ unsigned long tlb_context_cache = CTX_FI
DECLARE_BITMAP(mmu_context_bmap, MAX_CTX_NR);
DEFINE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm) = {0};

+static void mmu_context_wrap(void)
+{
+ unsigned long old_ver = tlb_context_cache & CTX_VERSION_MASK;
+ unsigned long new_ver, new_ctx, old_ctx;
+ struct mm_struct *mm;
+ int cpu;
+
+ bitmap_zero(mmu_context_bmap, 1 << CTX_NR_BITS);
+
+ /* Reserve kernel context */
+ set_bit(0, mmu_context_bmap);
+
+ new_ver = (tlb_context_cache & CTX_VERSION_MASK) + CTX_FIRST_VERSION;
+ if (unlikely(new_ver == 0))
+ new_ver = CTX_FIRST_VERSION;
+ tlb_context_cache = new_ver;
+
+ /*
+ * Make sure that any new mm that are added into per_cpu_secondary_mm,
+ * are going to go through get_new_mmu_context() path.
+ */
+ mb();
+
+ /*
+ * Updated versions to current on those CPUs that had valid secondary
+ * contexts
+ */
+ for_each_online_cpu(cpu) {
+ /*
+ * If a new mm is stored after we took this mm from the array,
+ * it will go into get_new_mmu_context() path, because we
+ * already bumped the version in tlb_context_cache.
+ */
+ mm = per_cpu(per_cpu_secondary_mm, cpu);
+
+ if (unlikely(!mm || mm == &init_mm))
+ continue;
+
+ old_ctx = mm->context.sparc64_ctx_val;
+ if (likely((old_ctx & CTX_VERSION_MASK) == old_ver)) {
+ new_ctx = (old_ctx & ~CTX_VERSION_MASK) | new_ver;
+ set_bit(new_ctx & CTX_NR_MASK, mmu_context_bmap);
+ mm->context.sparc64_ctx_val = new_ctx;
+ }
+ }
+}
+
/* Caller does TLB context flushing on local CPU if necessary.
* The caller also ensures that CTX_VALID(mm->context) is false.
*
@@ -676,50 +723,30 @@ void get_new_mmu_context(struct mm_struc
{
unsigned long ctx, new_ctx;
unsigned long orig_pgsz_bits;
- int new_version;

spin_lock(&ctx_alloc_lock);
+retry:
+ /* wrap might have happened, test again if our context became valid */
+ if (unlikely(CTX_VALID(mm->context)))
+ goto out;
orig_pgsz_bits = (mm->context.sparc64_ctx_val & CTX_PGSZ_MASK);
ctx = (tlb_context_cache + 1) & CTX_NR_MASK;
new_ctx = find_next_zero_bit(mmu_context_bmap, 1 << CTX_NR_BITS, ctx);
- new_version = 0;
if (new_ctx >= (1 << CTX_NR_BITS)) {
new_ctx = find_next_zero_bit(mmu_context_bmap, ctx, 1);
if (new_ctx >= ctx) {
- int i;
- new_ctx = (tlb_context_cache & CTX_VERSION_MASK) +
- CTX_FIRST_VERSION + 1;
- if (new_ctx == 1)
- new_ctx = CTX_FIRST_VERSION + 1;
-
- /* Don't call memset, for 16 entries that's just
- * plain silly...
- */
- mmu_context_bmap[0] = 3;
- mmu_context_bmap[1] = 0;
- mmu_context_bmap[2] = 0;
- mmu_context_bmap[3] = 0;
- for (i = 4; i < CTX_BMAP_SLOTS; i += 4) {
- mmu_context_bmap[i + 0] = 0;
- mmu_context_bmap[i + 1] = 0;
- mmu_context_bmap[i + 2] = 0;
- mmu_context_bmap[i + 3] = 0;
- }
- new_version = 1;
- goto out;
+ mmu_context_wrap();
+ goto retry;
}
}
if (mm->context.sparc64_ctx_val)
cpumask_clear(mm_cpumask(mm));
mmu_context_bmap[new_ctx>>6] |= (1UL << (new_ctx & 63));
new_ctx |= (tlb_context_cache & CTX_VERSION_MASK);
-out:
tlb_context_cache = new_ctx;
mm->context.sparc64_ctx_val = new_ctx | orig_pgsz_bits;
+out:
spin_unlock(&ctx_alloc_lock);
-
- if (unlikely(new_version))
- smp_new_mmu_context_version();
}

static int numa_enabled = 1;


2017-06-12 15:38:24

by Greg Kroah-Hartman

Subject: [PATCH 4.4 22/90] KEYS: fix freeing uninitialized memory in key_update()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit 63a0b0509e700717a59f049ec6e4e04e903c7fe2 upstream.

key_update() freed the key_preparsed_payload even if it was not
initialized first. This would cause a crash if userspace called
keyctl_update() on a key with type like "asymmetric" that has a
->preparse() method but not an ->update() method. Possibly it could
even be triggered for other key types by racing with keyctl_setperm() to
make the KEY_NEED_WRITE check fail (the permission was already checked,
so normally it wouldn't fail there).

Reproducer with key type "asymmetric", given a valid cert.der:

keyctl new_session
keyid=$(keyctl padd asymmetric desc @s < cert.der)
keyctl setperm $keyid 0x3f000000
keyctl update $keyid data

[ 150.686666] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 150.687601] IP: asymmetric_key_free_kids+0x12/0x30
[ 150.688139] PGD 38a3d067
[ 150.688141] PUD 3b3de067
[ 150.688447] PMD 0
[ 150.688745]
[ 150.689160] Oops: 0000 [#1] SMP
[ 150.689455] Modules linked in:
[ 150.689769] CPU: 1 PID: 2478 Comm: keyctl Not tainted 4.11.0-rc4-xfstests-00187-ga9f6b6b8cd2f #742
[ 150.690916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[ 150.692199] task: ffff88003b30c480 task.stack: ffffc90000350000
[ 150.692952] RIP: 0010:asymmetric_key_free_kids+0x12/0x30
[ 150.693556] RSP: 0018:ffffc90000353e58 EFLAGS: 00010202
[ 150.694142] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000004
[ 150.694845] RDX: ffffffff81ee3920 RSI: ffff88003d4b0700 RDI: 0000000000000001
[ 150.697569] RBP: ffffc90000353e60 R08: ffff88003d5d2140 R09: 0000000000000000
[ 150.702483] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 150.707393] R13: 0000000000000004 R14: ffff880038a4d2d8 R15: 000000000040411f
[ 150.709720] FS: 00007fcbcee35700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 150.711504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.712733] CR2: 0000000000000001 CR3: 0000000039eab000 CR4: 00000000003406e0
[ 150.714487] Call Trace:
[ 150.714975] asymmetric_key_free_preparse+0x2f/0x40
[ 150.715907] key_update+0xf7/0x140
[ 150.716560] ? key_default_cmp+0x20/0x20
[ 150.717319] keyctl_update_key+0xb0/0xe0
[ 150.718066] SyS_keyctl+0x109/0x130
[ 150.718663] entry_SYSCALL_64_fastpath+0x1f/0xc2
[ 150.719440] RIP: 0033:0x7fcbce75ff19
[ 150.719926] RSP: 002b:00007ffd5d167088 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
[ 150.720918] RAX: ffffffffffffffda RBX: 0000000000404d80 RCX: 00007fcbce75ff19
[ 150.721874] RDX: 00007ffd5d16785e RSI: 000000002866cd36 RDI: 0000000000000002
[ 150.722827] RBP: 0000000000000006 R08: 000000002866cd36 R09: 00007ffd5d16785e
[ 150.723781] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000404d80
[ 150.724650] R13: 00007ffd5d16784d R14: 00007ffd5d167238 R15: 000000000040411f
[ 150.725447] Code: 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 ff 74 23 55 48 89 e5 53 48 89 fb <48> 8b 3f e8 06 21 c5 ff 48 8b 7b 08 e8 fd 20 c5 ff 48 89 df e8
[ 150.727489] RIP: asymmetric_key_free_kids+0x12/0x30 RSP: ffffc90000353e58
[ 150.728117] CR2: 0000000000000001
[ 150.728430] ---[ end trace f7f8fe1da2d5ae8d ]---

Fixes: 4d8c0250b841 ("KEYS: Call ->free_preparse() even after ->preparse() returns an error")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: James Morris <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
security/keys/key.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -934,12 +934,11 @@ int key_update(key_ref_t key_ref, const
/* the key must be writable */
ret = key_permission(key_ref, KEY_NEED_WRITE);
if (ret < 0)
- goto error;
+ return ret;

/* attempt to update it if supported */
- ret = -EOPNOTSUPP;
if (!key->type->update)
- goto error;
+ return -EOPNOTSUPP;

memset(&prep, 0, sizeof(prep));
prep.data = payload;


2017-06-12 15:38:28

by Greg Kroah-Hartman

Subject: [PATCH 4.4 23/90] crypto: gcm - wait for crypto op not signal safe

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Gilad Ben-Yossef <[email protected]>

commit f3ad587070d6bd961ab942b3fd7a85d00dfc934b upstream.

crypto_gcm_setkey() was using wait_for_completion_interruptible() to
wait for completion of an async crypto op, but if a signal occurs it
may return before the DMA ops of the HW crypto provider finish, thus
corrupting the data buffer that is kfree'ed in this case.

Resolve this by using wait_for_completion() instead.
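
From memory of crypto_gcm_setkey() (the kfree() sits just outside the
hunk below, so treat this shape as an assumption rather than a quote),
the race looks like:

  data = kzalloc(sizeof(*data) + ..., GFP_KERNEL);
  ...
  err = crypto_ablkcipher_encrypt(&data->req);
  if (err == -EINPROGRESS || err == -EBUSY) {
          /* A signal pops us out of the interruptible wait with
           * -ERESTARTSYS while the HW engine may still be DMAing
           * into data's buffers... */
          err = wait_for_completion_interruptible(
                  &data->result.completion);
  }
  ...
  kfree(data);    /* ...turning this free into a use-after-free */

An uninterruptible wait_for_completion() cannot return until the async
op (and its DMA) has actually completed, so the free below it is safe.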

Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Gilad Ben-Yossef <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
crypto/gcm.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -152,10 +152,8 @@ static int crypto_gcm_setkey(struct cryp

err = crypto_ablkcipher_encrypt(&data->req);
if (err == -EINPROGRESS || err == -EBUSY) {
- err = wait_for_completion_interruptible(
- &data->result.completion);
- if (!err)
- err = data->result.err;
+ wait_for_completion(&data->result.completion);
+ err = data->result.err;
}

if (err)


2017-06-12 15:38:33

by Greg Kroah-Hartman

Subject: [PATCH 4.4 25/90] nfsd4: fix null dereference on replay

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <[email protected]>

commit 9a307403d374b993061f5992a6e260c944920d0b upstream.

if we receive a compound such that:

- the sessionid, slot, and sequence number in the SEQUENCE op
match a cached successful reply with N ops, and
- the Nth operation of the compound is a PUTFH, PUTPUBFH,
PUTROOTFH, or RESTOREFH,

then nfsd4_sequence will return 0 and set cstate->status to
nfserr_replay_cache. The current filehandle will not be set. This will
cause us to call check_nfsd_access with first argument NULL.

To nfsd4_compound it looks like we just successfully executed an
operation that set a filehandle, but the current filehandle is not set.

Fix this by moving the nfserr_replay_cache earlier. There was never any
reason to have it after the encode_op label, since the only case where
we hit that is when opdesc->op_func sets it.

Note that there are two ways we could hit this case:

- a client is resending a previously sent compound that ended
with one of the four PUTFH-like operations, or
- a client is sending a *new* compound that (incorrectly) shares
sessionid, slot, and sequence number with a previously sent
compound, and the length of the previously sent compound
happens to match the position of a PUTFH-like operation in the
new compound.

The second is obviously incorrect client behavior. The first is also
very strange--the only purpose of a PUTFH-like operation is to set the
current filehandle to be used by the following operation, so there's no
point in having it as the last in a compound.

So it's likely this requires a buggy or malicious client to reproduce.

Reported-by: Scott Mayhew <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/nfsd/nfs4proc.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1690,6 +1690,12 @@ nfsd4_proc_compound(struct svc_rqst *rqs
opdesc->op_get_currentstateid(cstate, &op->u);
op->status = opdesc->op_func(rqstp, cstate, &op->u);

+ /* Only from SEQUENCE */
+ if (cstate->status == nfserr_replay_cache) {
+ dprintk("%s NFS4.1 replay from cache\n", __func__);
+ status = op->status;
+ goto out;
+ }
if (!op->status) {
if (opdesc->op_set_currentstateid)
opdesc->op_set_currentstateid(cstate, &op->u);
@@ -1700,14 +1706,7 @@ nfsd4_proc_compound(struct svc_rqst *rqs
if (need_wrongsec_check(rqstp))
op->status = check_nfsd_access(current_fh->fh_export, rqstp);
}
-
encode_op:
- /* Only from SEQUENCE */
- if (cstate->status == nfserr_replay_cache) {
- dprintk("%s NFS4.1 replay from cache\n", __func__);
- status = op->status;
- goto out;
- }
if (op->status == nfserr_replay_me) {
op->replay = &cstate->replay_owner->so_replay;
nfsd4_encode_replay(&resp->xdr, op);


2017-06-12 15:38:37

by Greg Kroah-Hartman

Subject: [PATCH 4.4 29/90] arm: KVM: Allow unaligned accesses at HYP

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit 33b5c38852b29736f3b472dd095c9a18ec22746f upstream.

We currently have the HSCTLR.A bit set, trapping unaligned accesses
at HYP, but we're not really prepared to deal with it.

Since the rest of the kernel is pretty happy about that, let's follow
its example and set HSCTLR.A to zero. Modern CPUs don't really care.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kvm/init.S | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -110,7 +110,6 @@ __do_hyp_init:
@ - Write permission implies XN: disabled
@ - Instruction cache: enabled
@ - Data/Unified cache: enabled
- @ - Memory alignment checks: enabled
@ - MMU: enabled (this code must be run from an identity mapping)
mrc p15, 4, r0, c1, c0, 0 @ HSCR
ldr r2, =HSCTLR_MASK
@@ -118,8 +117,8 @@ __do_hyp_init:
mrc p15, 0, r1, c1, c0, 0 @ SCTLR
ldr r2, =(HSCTLR_EE | HSCTLR_FI | HSCTLR_I | HSCTLR_C)
and r1, r1, r2
- ARM( ldr r2, =(HSCTLR_M | HSCTLR_A) )
- THUMB( ldr r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE) )
+ ARM( ldr r2, =(HSCTLR_M) )
+ THUMB( ldr r2, =(HSCTLR_M | HSCTLR_TE) )
orr r1, r1, r2
orr r0, r0, r1
isb


2017-06-12 15:38:47

by Greg Kroah-Hartman

Subject: [PATCH 4.4 04/90] tcp: disallow cwnd undo when switching congestion control

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <[email protected]>


[ Upstream commit 44abafc4cc094214a99f860f778c48ecb23422fc ]

When the sender switches its congestion control during loss
recovery, if the recovery is spurious then it may incorrectly
revert cwnd and ssthresh to the older values set by a previous
congestion control. Consider a congestion control (like BBR)
that does not use ssthresh and keeps it infinite: the connection
may incorrectly revert cwnd to an infinite value when switching
from BBR to another congestion control.

This patch fixes it by disallowing such a cwnd undo operation
upon switching congestion control. Note that undo_marker is not
reset, so that packets that were incorrectly marked lost can still
be corrected. We only avoid undoing the cwnd in
tcp_undo_cwnd_reduction().
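
A minimal userspace model of the revert (hypothetical names and
simplified math, not the kernel code) shows why clearing the snapshot
at init is enough:

  #include <stdio.h>

  struct tcp_state {
          unsigned int snd_cwnd;
          unsigned int snd_ssthresh;
          unsigned int prior_ssthresh;    /* snapshot taken at loss */
  };

  /* Modeled on the undo path: it only reverts if a snapshot exists. */
  static void undo_cwnd_reduction(struct tcp_state *tp)
  {
          if (tp->prior_ssthresh) {
                  tp->snd_cwnd = 2 * tp->snd_ssthresh;
                  if (tp->prior_ssthresh > tp->snd_ssthresh)
                          tp->snd_ssthresh = tp->prior_ssthresh;
          }
  }

  int main(void)
  {
          /* A BBR-like CC left an "infinite" ssthresh snapshot behind. */
          struct tcp_state tp = { 10, 20, ~0u };

          undo_cwnd_reduction(&tp);
          printf("stale snapshot: ssthresh reverts to %u\n",
                 tp.snd_ssthresh);

          tp = (struct tcp_state){ 10, 20, 0 };  /* the fix: cleared at init */
          undo_cwnd_reduction(&tp);
          printf("cleared snapshot: ssthresh stays %u\n", tp.snd_ssthresh);
          return 0;
  }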

Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_cong.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -183,6 +183,7 @@ void tcp_init_congestion_control(struct
{
const struct inet_connection_sock *icsk = inet_csk(sk);

+ tcp_sk(sk)->prior_ssthresh = 0;
if (icsk->icsk_ca_ops->init)
icsk->icsk_ca_ops->init(sk);
if (tcp_ca_needs_ecn(sk))


2017-06-12 15:39:03

by Greg Kroah-Hartman

Subject: [PATCH 4.4 41/90] usb: chipidea: debug: check before accessing ci_role

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Thalmeier <[email protected]>

commit 0340ff83cd4475261e7474033a381bc125b45244 upstream.

ci_role BUGs when the role is >= CI_ROLE_END.
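
For context, the ci_role() accessor is, paraphrasing the chipidea
driver from memory (treat the exact body as an assumption), roughly:

  static inline struct ci_role_driver *ci_role(struct ci_hdrc *ci)
  {
          BUG_ON(ci->role >= CI_ROLE_END || !ci->roles[ci->role]);
          return ci->roles[ci->role];
  }

so calling it from the debugfs show function before a role has been
started trips the BUG_ON; the added check avoids that path entirely.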

Signed-off-by: Michael Thalmeier <[email protected]>
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/chipidea/debug.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/usb/chipidea/debug.c
+++ b/drivers/usb/chipidea/debug.c
@@ -295,7 +295,8 @@ static int ci_role_show(struct seq_file
{
struct ci_hdrc *ci = s->private;

- seq_printf(s, "%s\n", ci_role(ci)->name);
+ if (ci->role != CI_ROLE_END)
+ seq_printf(s, "%s\n", ci_role(ci)->name);

return 0;
}


2017-06-12 15:39:05

by Greg Kroah-Hartman

Subject: [PATCH 4.4 06/90] ipv6: Fix leak in ipv6_gso_segment().

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>


[ Upstream commit e3e86b5119f81e5e2499bea7ea1ebe8ac6aab789 ]

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_offload.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -121,8 +121,10 @@ static struct sk_buff *ipv6_gso_segment(

if (udpfrag) {
int err = ip6_find_1stfragopt(skb, &prevhdr);
- if (err < 0)
+ if (err < 0) {
+ kfree_skb_list(segs);
return ERR_PTR(err);
+ }
fptr = (struct frag_hdr *)((u8 *)ipv6h + err);
fptr->frag_off = htons(offset);
if (skb->next)


2017-06-12 15:39:12

by Greg Kroah-Hartman

Subject: [PATCH 4.4 44/90] iio: proximity: as3935: fix AS3935_INT mask

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Matt Ranostay <[email protected]>

commit 275292d3a3d62670b1b13484707b74e5239b4bb0 upstream.

The AS3935 interrupt mask has been incorrect, so valid lightning events
would never trigger a buffer event. Also, the noise interrupt should be
BIT(0).

Fixes: 24ddb0e4bba4 ("iio: Add AS3935 lightning sensor support")
Signed-off-by: Matt Ranostay <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iio/proximity/as3935.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iio/proximity/as3935.c
+++ b/drivers/iio/proximity/as3935.c
@@ -40,9 +40,9 @@
#define AS3935_AFE_PWR_BIT BIT(0)

#define AS3935_INT 0x03
-#define AS3935_INT_MASK 0x07
+#define AS3935_INT_MASK 0x0f
#define AS3935_EVENT_INT BIT(3)
-#define AS3935_NOISE_INT BIT(1)
+#define AS3935_NOISE_INT BIT(0)

#define AS3935_DATA 0x07
#define AS3935_DATA_MASK 0x3F


2017-06-12 15:39:18

by Greg Kroah-Hartman

Subject: [PATCH 4.4 45/90] drivers: char: random: add get_random_long()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Cashman <[email protected]>

commit ec9ee4acd97c0039a61c0ae4f12705767ae62153 upstream.

Commit d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.

The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64. Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().

Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow. This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which means that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.
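
A standalone sketch of the constant-width issue (a hypothetical
mmap_rnd-style mask helper, not the exact arch code):

  #include <stdio.h>

  static unsigned long rnd_mask(unsigned int bits)
  {
          /* The pre-fix code effectively computed ((1 << bits) - 1)
           * with an int constant: for bits > 31 that shift is
           * undefined in C, and on x86 the 5-bit hardware shift mask
           * makes bits == 32 behave like bits == 0, collapsing the
           * mask and disabling randomization.  An unsigned long
           * constant keeps the full 64-bit width: */
          return (1UL << bits) - 1;
  }

  int main(void)
  {
          printf("28 bits: %#lx\n", rnd_mask(28));   /* 0xfffffff */
          printf("33 bits: %#lx\n", rnd_mask(33));   /* 0x1ffffffff */
          return 0;
  }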

Finally, replace calls to get_random_int() with get_random_long() where
appropriate.

This patch (of 2):

Add get_random_long().

Signed-off-by: Daniel Cashman <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Nick Kralevich <[email protected]>
Cc: Jeff Vander Stoep <[email protected]>
Cc: Mark Salyzyn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/char/random.c | 22 ++++++++++++++++++++++
include/linux/random.h | 1 +
2 files changed, 23 insertions(+)

--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1825,6 +1825,28 @@ unsigned int get_random_int(void)
EXPORT_SYMBOL(get_random_int);

/*
+ * Same as get_random_int(), but returns unsigned long.
+ */
+unsigned long get_random_long(void)
+{
+ __u32 *hash;
+ unsigned long ret;
+
+ if (arch_get_random_long(&ret))
+ return ret;
+
+ hash = get_cpu_var(get_random_int_hash);
+
+ hash[0] += current->pid + jiffies + random_get_entropy();
+ md5_transform(hash, random_int_secret);
+ ret = *(unsigned long *)hash;
+ put_cpu_var(get_random_int_hash);
+
+ return ret;
+}
+EXPORT_SYMBOL(get_random_long);
+
+/*
* randomize_range() returns a start address such that
*
* [...... <range> .....]
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,6 +34,7 @@ extern const struct file_operations rand
#endif

unsigned int get_random_int(void);
+unsigned long get_random_long(void);
unsigned long randomize_range(unsigned long start, unsigned long end, unsigned long len);

u32 prandom_u32(void);


2017-06-12 15:39:30

by Greg Kroah-Hartman

Subject: [PATCH 4.4 50/90] drm/msm: Expose our reservation object when exporting a dmabuf.

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Anholt <[email protected]>

commit 43523eba79bda8f5b4c27f8ffe20ea078d20113a upstream.

Without this, polling on the dma-buf (and presumably other devices
synchronizing against our rendering) would return immediately, even
while the BO was busy.

Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/msm/msm_drv.c | 1 +
drivers/gpu/drm/msm/msm_drv.h | 1 +
drivers/gpu/drm/msm/msm_gem_prime.c | 7 +++++++
3 files changed, 9 insertions(+)

--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -986,6 +986,7 @@ static struct drm_driver msm_driver = {
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
.gem_prime_export = drm_gem_prime_export,
.gem_prime_import = drm_gem_prime_import,
+ .gem_prime_res_obj = msm_gem_prime_res_obj,
.gem_prime_pin = msm_gem_prime_pin,
.gem_prime_unpin = msm_gem_prime_unpin,
.gem_prime_get_sg_table = msm_gem_prime_get_sg_table,
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -212,6 +212,7 @@ struct sg_table *msm_gem_prime_get_sg_ta
void *msm_gem_prime_vmap(struct drm_gem_object *obj);
void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
int msm_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
+struct reservation_object *msm_gem_prime_res_obj(struct drm_gem_object *obj);
struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach, struct sg_table *sg);
int msm_gem_prime_pin(struct drm_gem_object *obj);
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -70,3 +70,10 @@ void msm_gem_prime_unpin(struct drm_gem_
if (!obj->import_attach)
msm_gem_put_pages(obj);
}
+
+struct reservation_object *msm_gem_prime_res_obj(struct drm_gem_object *obj)
+{
+ struct msm_gem_object *msm_obj = to_msm_bo(obj);
+
+ return msm_obj->resv;
+}


2017-06-12 15:39:37

by Greg Kroah-Hartman

Subject: [PATCH 4.4 54/90] ufs: restore proper tail allocation

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 8785d84d002c2ce0f68fbcd6c2c86be859802c7e upstream.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ufs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -284,7 +284,7 @@ ufs_inode_getfrag(struct inode *inode, u
goal += uspi->s_fpb;
}
tmp = ufs_new_fragments(inode, p, ufs_blknum(new_fragment),
- goal, uspi->s_fpb, err, locked_page);
+ goal, nfrags, err, locked_page);

if (!tmp) {
*err = -ENOSPC;


2017-06-12 15:39:46

by Greg Kroah-Hartman

Subject: [PATCH 4.4 56/90] ufs: restore maintaining ->i_blocks

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit eb315d2ae614493fd1ebb026c75a80573d84f7ad upstream.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/stat.c | 1 +
fs/ufs/balloc.c | 26 +++++++++++++++++++++++++-
2 files changed, 26 insertions(+), 1 deletion(-)

--- a/fs/stat.c
+++ b/fs/stat.c
@@ -454,6 +454,7 @@ void __inode_add_bytes(struct inode *ino
inode->i_bytes -= 512;
}
}
+EXPORT_SYMBOL(__inode_add_bytes);

void inode_add_bytes(struct inode *inode, loff_t bytes)
{
--- a/fs/ufs/balloc.c
+++ b/fs/ufs/balloc.c
@@ -81,7 +81,8 @@ void ufs_free_fragments(struct inode *in
ufs_error (sb, "ufs_free_fragments",
"bit already cleared for fragment %u", i);
}
-
+
+ inode_sub_bytes(inode, count << uspi->s_fshift);
fs32_add(sb, &ucg->cg_cs.cs_nffree, count);
uspi->cs_total.cs_nffree += count;
fs32_add(sb, &UFS_SB(sb)->fs_cs(cgno).cs_nffree, count);
@@ -183,6 +184,7 @@ do_more:
ufs_error(sb, "ufs_free_blocks", "freeing free fragment");
}
ubh_setblock(UCPI_UBH(ucpi), ucpi->c_freeoff, blkno);
+ inode_sub_bytes(inode, uspi->s_fpb << uspi->s_fshift);
if ((UFS_SB(sb)->s_flags & UFS_CG_MASK) == UFS_CG_44BSD)
ufs_clusteracct (sb, ucpi, blkno, 1);

@@ -494,6 +496,20 @@ u64 ufs_new_fragments(struct inode *inod
return 0;
}

+static bool try_add_frags(struct inode *inode, unsigned frags)
+{
+ unsigned size = frags * i_blocksize(inode);
+ spin_lock(&inode->i_lock);
+ __inode_add_bytes(inode, size);
+ if (unlikely((u32)inode->i_blocks != inode->i_blocks)) {
+ __inode_sub_bytes(inode, size);
+ spin_unlock(&inode->i_lock);
+ return false;
+ }
+ spin_unlock(&inode->i_lock);
+ return true;
+}
+
static u64 ufs_add_fragments(struct inode *inode, u64 fragment,
unsigned oldcount, unsigned newcount)
{
@@ -530,6 +546,9 @@ static u64 ufs_add_fragments(struct inod
for (i = oldcount; i < newcount; i++)
if (ubh_isclr (UCPI_UBH(ucpi), ucpi->c_freeoff, fragno + i))
return 0;
+
+ if (!try_add_frags(inode, count))
+ return 0;
/*
* Block can be extended
*/
@@ -647,6 +666,7 @@ cg_found:
ubh_setbit (UCPI_UBH(ucpi), ucpi->c_freeoff, goal + i);
i = uspi->s_fpb - count;

+ inode_sub_bytes(inode, i << uspi->s_fshift);
fs32_add(sb, &ucg->cg_cs.cs_nffree, i);
uspi->cs_total.cs_nffree += i;
fs32_add(sb, &UFS_SB(sb)->fs_cs(cgno).cs_nffree, i);
@@ -657,6 +677,8 @@ cg_found:
result = ufs_bitmap_search (sb, ucpi, goal, allocsize);
if (result == INVBLOCK)
return 0;
+ if (!try_add_frags(inode, count))
+ return 0;
for (i = 0; i < count; i++)
ubh_clrbit (UCPI_UBH(ucpi), ucpi->c_freeoff, result + i);

@@ -716,6 +738,8 @@ norot:
return INVBLOCK;
ucpi->c_rotor = result;
gotit:
+ if (!try_add_frags(inode, uspi->s_fpb))
+ return 0;
blkno = ufs_fragstoblks(result);
ubh_clrblock (UCPI_UBH(ucpi), ucpi->c_freeoff, blkno);
if ((UFS_SB(sb)->s_flags & UFS_CG_MASK) == UFS_CG_44BSD)


2017-06-12 15:39:49

by Greg Kroah-Hartman

Subject: [PATCH 4.4 60/90] cxl: Fix error path on bad ioctl

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Frederic Barrat <[email protected]>

commit cec422c11caeeccae709e9942058b6b644ce434c upstream.

Fix the error path if we can't copy the user structure on the
CXL_IOCTL_START_WORK ioctl. We shouldn't unlock the context status mutex,
as it was not locked (yet).

Fixes: 0712dc7e73e5 ("cxl: Fix issues when unmapping contexts")
Signed-off-by: Frederic Barrat <[email protected]>
Reviewed-by: Vaibhav Jain <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/misc/cxl/file.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -158,11 +158,8 @@ static long afu_ioctl_start_work(struct

/* Do this outside the status_mutex to avoid a circular dependency with
* the locking in cxl_mmap_fault() */
- if (copy_from_user(&work, uwork,
- sizeof(struct cxl_ioctl_start_work))) {
- rc = -EFAULT;
- goto out;
- }
+ if (copy_from_user(&work, uwork, sizeof(work)))
+ return -EFAULT;

mutex_lock(&ctx->status_mutex);
if (ctx->status != OPENED) {


2017-06-12 15:40:03

by Greg Kroah-Hartman

Subject: [PATCH 4.4 61/90] btrfs: use correct types for page indices in btrfs_page_exists_in_range

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Sterba <[email protected]>

commit cc2b702c52094b637a351d7491ac5200331d0445 upstream.

Variables start_idx and end_idx are supposed to hold a page index
derived from the file offsets. The int type is not the right one, though:
offsets larger than 1 << 44 will silently get the high bits trimmed off.
(1 << 44 is 16 TiB.)
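
To see the trim concretely, a tiny standalone check (assuming a page
shift of 12 and a typical LP64 ABI):

  #include <stdio.h>

  int main(void)
  {
          unsigned long long start = 1ULL << 44;  /* a 16 TiB offset */

          int bad_idx = start >> 12;              /* high bits lost: 0 */
          unsigned long good_idx = start >> 12;   /* intact: 1UL << 32 */

          printf("int: %d, unsigned long: %lu\n", bad_idx, good_idx);
          return 0;
  }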

What can go wrong, if start is below the boundary and end gets trimmed:
- if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
- the final check "if (page->index <= end_idx)" will unexpectedly fail

The function will return false, ie. "there's no page in the range",
although there is at least one.

btrfs_page_exists_in_range is used to prevent races in:

* in hole punching, where we make sure there are no pages in the
truncated range, otherwise we'll wait for them to finish and redo
truncation, but we're going to replace the pages with holes anyway so
the only problem is the intermediate state

* lock_extent_direct: we want to make sure there are no pages before we
lock and start DIO, to prevent stale data reads

For a practical occurrence of the bug, there are several constraints. The
file must be quite large, the affected range must cross the 16 TiB
boundary and the internal state of the file pages and pending operations
must match. Also, we must not have started any ordered data in the
range, otherwise we don't even reach the buggy function check.

DIO locking tries hard in several places to avoid deadlocks with
buffered IO and avoids waiting for ranges. The worst consequence seems
to be stale data read.

CC: Liu Bo <[email protected]>
Fixes: fc4adbff823f7 ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/btrfs/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7318,8 +7318,8 @@ bool btrfs_page_exists_in_range(struct i
int found = false;
void **pagep = NULL;
struct page *page = NULL;
- int start_idx;
- int end_idx;
+ unsigned long start_idx;
+ unsigned long end_idx;

start_idx = start >> PAGE_CACHE_SHIFT;



2017-06-12 15:40:12

by Greg Kroah-Hartman

Subject: [PATCH 4.4 68/90] perf/core: Drop kernel samples even though :u is specified

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jin Yao <[email protected]>

commit cc1582c231ea041fbc68861dfaf957eaf902b829 upstream.

When doing sampling, for example:

perf record -e cycles:u ...

On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to interrupt skid.

This might be a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.

The patch drops the kernel samples if exclude_kernel is specified.

For example, test on Haswell desktop:

perf record -e cycles:u <mgen>
perf report --stdio

Before patch applied:

99.77% mgen mgen [.] buf_read
0.20% mgen mgen [.] rand_buf_init
0.01% mgen [kernel.vmlinux] [k] apic_timer_interrupt
0.00% mgen mgen [.] last_free_elem
0.00% mgen libc-2.23.so [.] __random_r
0.00% mgen libc-2.23.so [.] _int_malloc
0.00% mgen mgen [.] rand_array_init
0.00% mgen [kernel.vmlinux] [k] page_fault
0.00% mgen libc-2.23.so [.] __random
0.00% mgen libc-2.23.so [.] __strcasestr
0.00% mgen ld-2.23.so [.] strcmp
0.00% mgen ld-2.23.so [.] _dl_start
0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
0.00% mgen ld-2.23.so [.] _start

We can see kernel symbols apic_timer_interrupt and page_fault.

After patch applied:

99.79% mgen mgen [.] buf_read
0.19% mgen mgen [.] rand_buf_init
0.00% mgen libc-2.23.so [.] __random_r
0.00% mgen mgen [.] rand_array_init
0.00% mgen mgen [.] last_free_elem
0.00% mgen libc-2.23.so [.] vfprintf
0.00% mgen libc-2.23.so [.] rand
0.00% mgen libc-2.23.so [.] __random
0.00% mgen libc-2.23.so [.] _int_malloc
0.00% mgen libc-2.23.so [.] _IO_doallocbuf
0.00% mgen ld-2.23.so [.] do_lookup_x
0.00% mgen ld-2.23.so [.] open_verify.constprop.7
0.00% mgen ld-2.23.so [.] _dl_important_hwcaps
0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
0.00% mgen ld-2.23.so [.] _start

There are only userspace symbols.

Signed-off-by: Jin Yao <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/events/core.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6410,6 +6410,21 @@ static void perf_log_itrace_start(struct
perf_output_end(&handle);
}

+static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
+{
+ /*
+ * Due to interrupt latency (AKA "skid"), we may enter the
+ * kernel before taking an overflow, even if the PMU is only
+ * counting user events.
+ * To avoid leaking information to userspace, we must always
+ * reject kernel samples when exclude_kernel is set.
+ */
+ if (event->attr.exclude_kernel && !user_mode(regs))
+ return false;
+
+ return true;
+}
+
/*
* Generic event overflow handling, sampling.
*/
@@ -6457,6 +6472,12 @@ static int __perf_event_overflow(struct
}

/*
+ * For security, drop the skid kernel samples if necessary.
+ */
+ if (!sample_is_allowed(event, regs))
+ return ret;
+
+ /*
* XXX event_limit might not quite work as expected on inherited
* events
*/


2017-06-12 15:40:08

by Greg Kroah-Hartman

Subject: [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Ellerman <[email protected]>

commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream.

In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, with all pcpu allocs being in group 0:

pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
set up. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch applied, the dmesg output shows two groups, 0 and 1:

pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

CPU 0: data_offset = 0x0ffe8b0000
CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

node 0: [mem 0x0000000000000000-0x0000000fffffffff]
node 1: [mem 0x0000001000000000-0x0000001fffffffff]

Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/include/asm/topology.h | 14 ++++++++++++++
arch/powerpc/kernel/setup_64.c | 4 ++--
2 files changed, 16 insertions(+), 2 deletions(-)

--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topolog
extern int sysfs_add_device_to_node(struct device *dev, int nid);
extern void sysfs_remove_device_from_node(struct device *dev, int nid);

+static inline int early_cpu_to_node(int cpu)
+{
+ int nid;
+
+ nid = numa_cpu_lookup_table[cpu];
+
+ /*
+ * Fall back to node 0 if nid is unset (it should be, except bugs).
+ * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
+ */
+ return (nid < 0) ? 0 : nid;
+}
#else

+static inline int early_cpu_to_node(int cpu) { return 0; }
+
static inline void dump_numa_cpu_topology(void) {}

static inline int sysfs_add_device_to_node(struct device *dev, int nid)
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -751,7 +751,7 @@ void __init setup_arch(char **cmdline_p)

static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
{
- return __alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), size, align,
+ return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
__pa(MAX_DMA_ADDRESS));
}

@@ -762,7 +762,7 @@ static void __init pcpu_fc_free(void *pt

static int pcpu_cpu_distance(unsigned int from, unsigned int to)
{
- if (cpu_to_node(from) == cpu_to_node(to))
+ if (early_cpu_to_node(from) == early_cpu_to_node(to))
return LOCAL_DISTANCE;
else
return REMOTE_DISTANCE;


2017-06-12 15:40:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 69/90] drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

commit f0c62e9878024300319ba2438adc7b06c6b9c448 upstream.

If vmalloc() fails then we need to do a bit of cleanup before returning.

Fixes: fb1d9738ca05 ("drm/vmwgfx: Add DRM driver for VMware Virtual GPU")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
@@ -368,6 +368,8 @@ static void *vmw_local_fifo_reserve(stru
return fifo_state->static_buffer;
else {
fifo_state->dynamic_buffer = vmalloc(bytes);
+ if (!fifo_state->dynamic_buffer)
+ goto out_err;
return fifo_state->dynamic_buffer;
}
}


2017-06-12 15:40:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 39/90] usb: gadget: f_mass_storage: Serialize wake and sleep execution

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thinh Nguyen <[email protected]>

commit dc9217b69dd6089dcfeb86ed4b3c671504326087 upstream.

f_mass_storage has a memory barrier issue with the sleep and wake
functions that can cause a deadlock. This results in intermittent hangs
during MSC file transfer; the host will reset the device after receiving
no response, in order to resume the transfer. The issue is seen when
dwc3 is processing 2 transfer-in-progress events at the same time,
invoking the completion handlers for CSW and CBW. Whether it occurs also
depends on system timing and latency.

To increase the chance of hitting this issue, you can force the dwc3
driver to wait and process those 2 events at once by adding a small
delay (~100us) in dwc3_check_event_buf() whenever the request is for
CSW, and then reading the event count again. Avoid debugging with
printk and ftrace, as the extra delays and memory barriers will mask
this issue.

Scenario which can lead to failure:
-----------------------------------
1) The main thread sleeps and waits for the next command in
get_next_command().
2) bulk_in_complete() wakes up main thread for CSW.
3) bulk_out_complete() tries to wake up the running main thread for CBW.
4) thread_wakeup_needed is not loaded with the correct value in
sleep_thread().
5) Main thread goes to sleep again.

The pattern is shown below. Note the 2 critical variables.
* common->thread_wakeup_needed
* bh->state

CPU 0 (sleep_thread)               CPU 1 (wakeup_thread)
==============================     ===============================

                                   bh->state = BH_STATE_FULL;
                                   smp_wmb();
thread_wakeup_needed = 0;          thread_wakeup_needed = 1;
smp_rmb();
if (bh->state != BH_STATE_FULL)
	sleep again ...

As pointed out by Alan Stern, this is an R-pattern issue. The issue can
be seen when there are two wakeups in quick succession:
thread_wakeup_needed can be overwritten in sleep_thread(), and the read
of bh->state may be reordered before the write to thread_wakeup_needed.

This patch applies a full memory barrier, smp_mb(), in both sleep_thread()
and wakeup_thread() to ensure the order in which thread_wakeup_needed
and bh->state are written and loaded.

However, a better solution in the future would be to use the wait_queue
mechanism, which takes care of the memory barriers between waker and
waiter.
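
For illustration only (not part of this patch), that wait_queue variant
could look roughly like the sketch below, assuming a wait_queue_head_t
(here called thread_wqh) were added to struct fsg_common;
wait_event_interruptible()/wake_up() then provide the required ordering
between thread_wakeup_needed and bh->state:

static int sleep_thread(struct fsg_common *common)
{
	int rc;

	/* wait_event_interruptible() orders the condition check
	 * against the waker for us */
	rc = wait_event_interruptible(common->thread_wqh,
				      common->thread_wakeup_needed);
	common->thread_wakeup_needed = 0;
	return rc;
}

static void wakeup_thread(struct fsg_common *common)
{
	common->thread_wakeup_needed = 1;
	wake_up(&common->thread_wqh);
}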

Acked-by: Alan Stern <[email protected]>
Signed-off-by: Thinh Nguyen <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/gadget/function/f_mass_storage.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

--- a/drivers/usb/gadget/function/f_mass_storage.c
+++ b/drivers/usb/gadget/function/f_mass_storage.c
@@ -399,7 +399,11 @@ static int fsg_set_halt(struct fsg_dev *
/* Caller must hold fsg->lock */
static void wakeup_thread(struct fsg_common *common)
{
- smp_wmb(); /* ensure the write of bh->state is complete */
+ /*
+ * Ensure the reading of thread_wakeup_needed
+ * and the writing of bh->state are completed
+ */
+ smp_mb();
/* Tell the main thread that something has happened */
common->thread_wakeup_needed = 1;
if (common->thread_task)
@@ -630,7 +634,12 @@ static int sleep_thread(struct fsg_commo
}
__set_current_state(TASK_RUNNING);
common->thread_wakeup_needed = 0;
- smp_rmb(); /* ensure the latest bh->state is visible */
+
+ /*
+ * Ensure the writing of thread_wakeup_needed
+ * and the reading of bh->state are completed
+ */
+ smp_mb();
return rc;
}



2017-06-12 15:40:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 71/90] drm/vmwgfx: Make sure backup_handle is always valid

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sinclair Yeh <[email protected]>

commit 07678eca2cf9c9a18584e546c2b2a0d0c9a3150c upstream.

When vmw_gb_surface_define_ioctl() is called with an existing buffer,
we end up returning an uninitialized variable in backup_handle.

The fix is to first initialize backup_handle to 0 just to be sure, and
second, when a user-provided buffer is found, we will use the
req->buffer_handle as the backup_handle.

Reported-by: Murray McAllister <[email protected]>
Signed-off-by: Sinclair Yeh <[email protected]>
Reviewed-by: Deepak Rawat <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -1288,7 +1288,7 @@ int vmw_gb_surface_define_ioctl(struct d
struct ttm_object_file *tfile = vmw_fpriv(file_priv)->tfile;
int ret;
uint32_t size;
- uint32_t backup_handle;
+ uint32_t backup_handle = 0;

if (req->multisample_count != 0)
return -EINVAL;
@@ -1331,12 +1331,16 @@ int vmw_gb_surface_define_ioctl(struct d
ret = vmw_user_dmabuf_lookup(tfile, req->buffer_handle,
&res->backup,
&user_srf->backup_base);
- if (ret == 0 && res->backup->base.num_pages * PAGE_SIZE <
- res->backup_size) {
- DRM_ERROR("Surface backup buffer is too small.\n");
- vmw_dmabuf_unreference(&res->backup);
- ret = -EINVAL;
- goto out_unlock;
+ if (ret == 0) {
+ if (res->backup->base.num_pages * PAGE_SIZE <
+ res->backup_size) {
+ DRM_ERROR("Surface backup buffer is too small.\n");
+ vmw_dmabuf_unreference(&res->backup);
+ ret = -EINVAL;
+ goto out_unlock;
+ } else {
+ backup_handle = req->buffer_handle;
+ }
}
} else if (req->drm_surface_flags & drm_vmw_surface_flag_create_buffer)
ret = vmw_user_dmabuf_alloc(dev_priv, tfile,


2017-06-12 15:40:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 82/90] NFSv4: Don't perform cached access checks before we've OPENed the file

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <[email protected]>

commit 762674f86d0328d5dc923c966e209e1ee59663f2 upstream.

Donald Buczek reports that an nfs4 client incorrectly denies
execute access based on an outdated file mode (missing 'x' bit).
After the mode on the server is 'fixed' (chmod +x), further execution
attempts continue to fail, because the nfs ACCESS call updates
the access parameter but not the mode parameter or the mode in
the inode.

The root cause is ultimately that the VFS is calling may_open()
before the NFS client has a chance to OPEN the file and hence revalidate
the access and attribute caches.

Al Viro suggests:
>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>> that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>
> 3) in do_last(), followed on success by vfs_open()
>
> That's all. All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.

Reported-by: Donald Buczek <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Link: http://lkml.kernel.org/r/[email protected]
Cc: Al Viro <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Paul Menzel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/nfs/dir.c | 3 +++
1 file changed, 3 insertions(+)

--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2452,6 +2452,9 @@ int nfs_permission(struct inode *inode,
case S_IFLNK:
goto out;
case S_IFREG:
+ if ((mask & MAY_OPEN) &&
+ nfs_server_capable(inode, NFS_CAP_ATOMIC_OPEN))
+ return 0;
break;
case S_IFDIR:
/*


2017-06-12 15:41:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 89/90] arm64: armv8_deprecated: ensure extension of addr

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mark Rutland <[email protected]>

commit 55de49f9aa17b0b2b144dd2af587177b9aadf429 upstream.

Our compat swp emulation holds the compat user address in an unsigned
int, which it passes to __user_swpX_asm(). When a 32-bit value is passed
in a register, the upper 32 bits of the register are unknown, and we
must extend the value to 64 bits before we can use it as a base address.

This patch casts the address to unsigned long to ensure it has been
suitably extended, avoiding the potential issue, and silencing a related
warning from clang.
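
The underlying rule can be seen in a stand-alone sketch (illustrative,
not the patched macro itself): with a 32-bit operand, the 64-bit view
of the register has undefined upper bits, so the cast must happen in C
before the asm runs.

static inline void store_zero(unsigned int addr)
{
	/* BAD: "%x0" names the 64-bit register, but bits 32-63 are
	 * unspecified for a 32-bit operand under the AAPCS64 */
	asm volatile("str xzr, [%x0]" : : "r" (addr) : "memory");

	/* OK: the cast forces zero-extension before register allocation */
	asm volatile("str xzr, [%0]" : : "r" ((unsigned long)addr) : "memory");
}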

Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm")
Cc: <[email protected]> # 3.19.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/kernel/armv8_deprecated.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -305,7 +305,8 @@ static void register_insn_emulation_sysc
ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN, \
CONFIG_ARM64_PAN) \
: "=&r" (res), "+r" (data), "=&r" (temp) \
- : "r" (addr), "i" (-EAGAIN), "i" (-EFAULT) \
+ : "r" ((unsigned long)addr), "i" (-EAGAIN), \
+ "i" (-EFAULT) \
: "memory")

#define __user_swp_asm(data, addr, res, temp) \


2017-06-12 15:41:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 75/90] ASoC: Fix use-after-free at card unregistration

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 4efda5f2130da033aeedc5b3205569893b910de2 upstream.

soc_cleanup_card_resources() calls snd_card_free() at the end of its
procedure. This turned out to lead to a use-after-free: the PCM
runtimes have already been removed via soc_remove_pcm_runtimes(),
yet they are dereferenced later in soc_pcm_free(), called via
snd_card_free().

The fix is simple: just move the snd_card_free() call to the beginning
of the whole procedure. This also gives another benefit: it
guarantees that all operations have been shut down before actually
releasing the resources, which was racy until now.

Reported-and-tested-by: Robert Jarzmik <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/soc-core.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -1775,6 +1775,9 @@ static int soc_cleanup_card_resources(st
for (i = 0; i < card->num_aux_devs; i++)
soc_remove_aux_dev(card, i);

+ /* free the ALSA card at first; this syncs with pending operations */
+ snd_card_free(card->snd_card);
+
/* remove and free each DAI */
soc_remove_dai_links(card);

@@ -1786,9 +1789,7 @@ static int soc_cleanup_card_resources(st

snd_soc_dapm_free(&card->dapm);

- snd_card_free(card->snd_card);
return 0;
-
}

/* removes a socdev */


2017-06-12 15:40:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 85/90] arm64: entry: improve data abort handling of tagged pointers

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kristina Martsenko <[email protected]>

commit 276e93279a630657fff4b086ba14c95955912dfa upstream.

This backport has a minor difference from the upstream commit: it adds
the asm-uaccess.h file, which is not present in 4.4, because 4.4 does
not have commit b4b8664d291a ("arm64: don't pull uaccess.h into *.S").

Original patch description:

When handling a data abort from EL0, we currently zero the top byte of
the faulting address, as we assume the address is a TTBR0 address, which
may contain a non-zero address tag. However, the address may be a TTBR1
address, in which case we should not zero the top byte. This patch fixes
that. The effect is that the full TTBR1 address is passed to the task's
signal handler (or printed out in the kernel log).

When handling a data abort from EL1, we leave the faulting address
intact, as we assume it's either a TTBR1 address or a TTBR0 address with
tag 0x00. This is true as far as I'm aware; we don't seem to access a
tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
forget about address tags, and code added in the future may not always
remember to remove tags from addresses before accessing them. So add tag
handling to the EL1 data abort handler as well. This also makes it
consistent with the EL0 data abort handler.
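
In C terms, the clear_address_tag macro introduced below behaves roughly
like this sketch (for illustration only; the real implementation is the
assembly in the patch):

static inline unsigned long clear_address_tag(unsigned long addr)
{
	/* Bit 55 distinguishes TTBR1 (kernel) from TTBR0 (user) addresses. */
	if (addr & (1UL << 55))
		return addr;			/* TTBR1: leave intact */
	return addr & ~(0xffUL << 56);		/* TTBR0: strip the tag */
}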

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Reviewed-by: Dave Martin <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/include/asm/asm-uaccess.h | 13 +++++++++++++
arch/arm64/kernel/entry.S | 6 ++++--
2 files changed, 17 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/arch/arm64/include/asm/asm-uaccess.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_ASM_UACCESS_H
+#define __ASM_ASM_UACCESS_H
+
+/*
+ * Remove the address tag from a virtual address, if present.
+ */
+ .macro clear_address_tag, dst, addr
+ tst \addr, #(1 << 55)
+ bic \dst, \addr, #(0xff << 56)
+ csel \dst, \dst, \addr, eq
+ .endm
+
+#endif
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -29,6 +29,7 @@
#include <asm/esr.h>
#include <asm/memory.h>
#include <asm/thread_info.h>
+#include <asm/asm-uaccess.h>
#include <asm/unistd.h>

/*
@@ -316,12 +317,13 @@ el1_da:
/*
* Data abort handling
*/
- mrs x0, far_el1
+ mrs x3, far_el1
enable_dbg
// re-enable interrupts if they were enabled in the aborted context
tbnz x23, #7, 1f // PSR_I_BIT
enable_irq
1:
+ clear_address_tag x0, x3
mov x2, sp // struct pt_regs
bl do_mem_abort

@@ -483,7 +485,7 @@ el0_da:
// enable interrupts before calling the main handler
enable_dbg_and_irq
ct_user_exit
- bic x0, x26, #(0xff << 56)
+ clear_address_tag x0, x26
mov x1, x25
mov x2, sp
bl do_mem_abort


2017-06-12 15:41:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 79/90] net: better skb->sender_cpu and skb->napi_id cohabitation

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

commit 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb upstream.

skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places so as not to
leave a prior skb->napi_id in place and fool netdev_pick_tx().

As suggested by Alexei, we could split the space so that
these errors cannot happen.

With 0 reserved as the common (not initialized) value, let's reserve
the [1 .. NR_CPUS] range for valid sender_cpu values, and
[NR_CPUS+1 .. ~0U] for valid napi_id values.

This will allow proper busy polling support over tunnels.
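
For illustration, a reader of the shared field can then tell the two
apart purely by range; a sketch (skb_has_valid_napi_id() is a
hypothetical helper, not part of this patch):

static inline bool skb_has_valid_napi_id(const struct sk_buff *skb)
{
	/* 0 = not initialized, [1 .. NR_CPUS] = a sender_cpu,
	 * anything above NR_CPUS = a real napi_id */
	return skb->napi_id > NR_CPUS;
}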

Signed-off-by: Eric Dumazet <[email protected]>
Suggested-by: Alexei Starovoitov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Cc: Paul Menzel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/skbuff.h | 3 ---
net/core/dev.c | 33 ++++++++++++++++-----------------
2 files changed, 16 insertions(+), 20 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1084,9 +1084,6 @@ static inline void skb_copy_hash(struct

static inline void skb_sender_cpu_clear(struct sk_buff *skb)
{
-#ifdef CONFIG_XPS
- skb->sender_cpu = 0;
-#endif
}

#ifdef NET_SKBUFF_DATA_USES_OFFSET
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -182,7 +182,7 @@ EXPORT_SYMBOL(dev_base_lock);
/* protects napi_hash addition/deletion and napi_gen_id */
static DEFINE_SPINLOCK(napi_hash_lock);

-static unsigned int napi_gen_id;
+static unsigned int napi_gen_id = NR_CPUS;
static DEFINE_HASHTABLE(napi_hash, 8);

static seqcount_t devnet_rename_seq;
@@ -3049,7 +3049,9 @@ struct netdev_queue *netdev_pick_tx(stru
int queue_index = 0;

#ifdef CONFIG_XPS
- if (skb->sender_cpu == 0)
+ u32 sender_cpu = skb->sender_cpu - 1;
+
+ if (sender_cpu >= (u32)NR_CPUS)
skb->sender_cpu = raw_smp_processor_id() + 1;
#endif

@@ -4726,25 +4728,22 @@ EXPORT_SYMBOL_GPL(napi_by_id);

void napi_hash_add(struct napi_struct *napi)
{
- if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
+ if (test_and_set_bit(NAPI_STATE_HASHED, &napi->state))
+ return;

- spin_lock(&napi_hash_lock);
+ spin_lock(&napi_hash_lock);

- /* 0 is not a valid id, we also skip an id that is taken
- * we expect both events to be extremely rare
- */
- napi->napi_id = 0;
- while (!napi->napi_id) {
- napi->napi_id = ++napi_gen_id;
- if (napi_by_id(napi->napi_id))
- napi->napi_id = 0;
- }
+ /* 0..NR_CPUS+1 range is reserved for sender_cpu use */
+ do {
+ if (unlikely(++napi_gen_id < NR_CPUS + 1))
+ napi_gen_id = NR_CPUS + 1;
+ } while (napi_by_id(napi_gen_id));
+ napi->napi_id = napi_gen_id;

- hlist_add_head_rcu(&napi->napi_hash_node,
- &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+ hlist_add_head_rcu(&napi->napi_hash_node,
+ &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);

- spin_unlock(&napi_hash_lock);
- }
+ spin_unlock(&napi_hash_lock);
}
EXPORT_SYMBOL_GPL(napi_hash_add);



2017-06-12 15:46:24

by Jann Horn

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 4.4 47/90] stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms

Ah, nevermind, I just saw that this was changed in
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.4/drivers-char-random-add-get_random_long.patch
.

On Mon, Jun 12, 2017 at 5:41 PM, Jann Horn <[email protected]> wrote:
> AFAICS get_random_long() doesn't exist in 4.4 (except in
> arch/x86/boot/compressed/aslr.c)? IIRC the same problem already
> occurred with another kernel version?
>
> On Mon, Jun 12, 2017 at 5:25 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Daniel Micay <[email protected]>
>>
>> commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.
>>
>> The stack canary is an 'unsigned long' and should be fully initialized to
>> random data rather than only 32 bits of random data.
>>
>> Signed-off-by: Daniel Micay <[email protected]>
>> Acked-by: Arjan van de Ven <[email protected]>
>> Acked-by: Rik van Riel <[email protected]>
>> Acked-by: Kees Cook <[email protected]>
>> Cc: Arjan van Ven <[email protected]>
>> Cc: Linus Torvalds <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Link: http://lkml.kernel.org/r/[email protected]
>> Signed-off-by: Ingo Molnar <[email protected]>
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>>
>> ---
>> kernel/fork.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -368,7 +368,7 @@ static struct task_struct *dup_task_stru
>> set_task_stack_end_magic(tsk);
>>
>> #ifdef CONFIG_CC_STACKPROTECTOR
>> - tsk->stack_canary = get_random_int();
>> + tsk->stack_canary = get_random_long();
>> #endif
>>
>> /*
>>
>>

2017-06-12 15:49:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 36/90] ext4: fix SEEK_HOLE

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 7d95eddf313c88b24f99d4ca9c2411a4b82fef33 upstream.

Currently, the SEEK_HOLE implementation in ext4 may both report that
there's a hole at some offset although that offset already has data, and
skip some holes during a search for the next hole. The first problem is
demonstrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
Whence Result
HOLE 0

Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
a hole although we have written data there. The second problem can be
demonstrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
-c "seek -h 0" file

wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
Whence Result
HOLE 139264

Where we can see that hole at offsets 56k..128k has been ignored by the
SEEK_HOLE call.

The underlying problem is in ext4_find_unwritten_pgoff(), which is
just buggy. In some cases it fails to update the returned offset when it
finds a hole (when no pages are found or when the first found page has a
higher index than expected); in some cases the conditions for detecting a
hole are simply missing (we fail to detect a situation where the indices
of the returned pages are not contiguous).

Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
indices, and handle all cases where we got fewer pages than expected in
one place and handle them properly there.

Fixes: c8c0df241cc2719b1262e627f999638411934f60
CC: Zheng Liu <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/file.c | 50 ++++++++++++++------------------------------------
1 file changed, 14 insertions(+), 36 deletions(-)

--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -463,47 +463,27 @@ static int ext4_find_unwritten_pgoff(str
num = min_t(pgoff_t, end - index, PAGEVEC_SIZE);
nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
(pgoff_t)num);
- if (nr_pages == 0) {
- if (whence == SEEK_DATA)
- break;
-
- BUG_ON(whence != SEEK_HOLE);
- /*
- * If this is the first time to go into the loop and
- * offset is not beyond the end offset, it will be a
- * hole at this offset
- */
- if (lastoff == startoff || lastoff < endoff)
- found = 1;
+ if (nr_pages == 0)
break;
- }
-
- /*
- * If this is the first time to go into the loop and
- * offset is smaller than the first page offset, it will be a
- * hole at this offset.
- */
- if (lastoff == startoff && whence == SEEK_HOLE &&
- lastoff < page_offset(pvec.pages[0])) {
- found = 1;
- break;
- }

for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
struct buffer_head *bh, *head;

/*
- * If the current offset is not beyond the end of given
- * range, it will be a hole.
+ * If current offset is smaller than the page offset,
+ * there is a hole at this offset.
*/
- if (lastoff < endoff && whence == SEEK_HOLE &&
- page->index > end) {
+ if (whence == SEEK_HOLE && lastoff < endoff &&
+ lastoff < page_offset(pvec.pages[i])) {
found = 1;
*offset = lastoff;
goto out;
}

+ if (page->index > end)
+ goto out;
+
lock_page(page);

if (unlikely(page->mapping != inode->i_mapping)) {
@@ -543,20 +523,18 @@ static int ext4_find_unwritten_pgoff(str
unlock_page(page);
}

- /*
- * The no. of pages is less than our desired, that would be a
- * hole in there.
- */
- if (nr_pages < num && whence == SEEK_HOLE) {
- found = 1;
- *offset = lastoff;
+ /* The no. of pages is less than our desired, we are done. */
+ if (nr_pages < num)
break;
- }

index = pvec.pages[i - 1]->index + 1;
pagevec_release(&pvec);
} while (index <= end);

+ if (whence == SEEK_HOLE && lastoff < endoff) {
+ found = 1;
+ *offset = lastoff;
+ }
out:
pagevec_release(&pvec);
return found;


2017-06-12 15:50:08

by Jann Horn

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 4.4 47/90] stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms

AFAICS get_random_long() doesn't exist in 4.4 (except in
arch/x86/boot/compressed/aslr.c)? IIRC the same problem already
occurred with another kernel version?

On Mon, Jun 12, 2017 at 5:25 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Daniel Micay <[email protected]>
>
> commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.
>
> The stack canary is an 'unsigned long' and should be fully initialized to
> random data rather than only 32 bits of random data.
>
> Signed-off-by: Daniel Micay <[email protected]>
> Acked-by: Arjan van de Ven <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Kees Cook <[email protected]>
> Cc: Arjan van Ven <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> kernel/fork.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -368,7 +368,7 @@ static struct task_struct *dup_task_stru
> set_task_stack_end_magic(tsk);
>
> #ifdef CONFIG_CC_STACKPROTECTOR
> - tsk->stack_canary = get_random_int();
> + tsk->stack_canary = get_random_long();
> #endif
>
> /*
>
>

2017-06-12 15:41:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 77/90] tty: Drop krefs for interrupted tty lock

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Peter Hurley <[email protected]>

commit e9036d0662360cd4c79578565ce422ed5872f301 upstream.

When the tty lock is interrupted on attempted re-open, 2 tty krefs
are still held. Drop extra kref before returning failure from
tty_lock_interruptible(), and drop lookup kref before returning
failure from tty_open().

Fixes: 0bfd464d3fdd ("tty: Wait interruptibly for tty lock on reopen")
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/tty_io.c | 3 +--
drivers/tty/tty_mutex.c | 7 ++++++-
2 files changed, 7 insertions(+), 3 deletions(-)

--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2070,13 +2070,12 @@ retry_open:
if (tty) {
mutex_unlock(&tty_mutex);
retval = tty_lock_interruptible(tty);
+ tty_kref_put(tty); /* drop kref from tty_driver_lookup_tty() */
if (retval) {
if (retval == -EINTR)
retval = -ERESTARTSYS;
goto err_unref;
}
- /* safe to drop the kref from tty_driver_lookup_tty() */
- tty_kref_put(tty);
retval = tty_reopen(tty);
if (retval < 0) {
tty_unlock(tty);
--- a/drivers/tty/tty_mutex.c
+++ b/drivers/tty/tty_mutex.c
@@ -24,10 +24,15 @@ EXPORT_SYMBOL(tty_lock);

int tty_lock_interruptible(struct tty_struct *tty)
{
+ int ret;
+
if (WARN(tty->magic != TTY_MAGIC, "L Bad %p\n", tty))
return -EIO;
tty_kref_get(tty);
- return mutex_lock_interruptible(&tty->legacy_mutex);
+ ret = mutex_lock_interruptible(&tty->legacy_mutex);
+ if (ret)
+ tty_kref_put(tty);
+ return ret;
}

void __lockfunc tty_unlock(struct tty_struct *tty)


2017-06-12 15:53:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 70/90] drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vladis Dronov <[email protected]>

commit ee9c4e681ec4f58e42a83cb0c22a0289ade1aacf upstream.

The 'req->mip_levels' parameter in vmw_gb_surface_define_ioctl() is
a user-controlled 'uint32_t' value which is used as a loop count limit.
This can lead to a kernel lockup and DoS. Add a check for 'req->mip_levels'.

References:
https://bugzilla.redhat.com/show_bug.cgi?id=1437431

Signed-off-by: Vladis Dronov <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -1293,6 +1293,9 @@ int vmw_gb_surface_define_ioctl(struct d
if (req->multisample_count != 0)
return -EINVAL;

+ if (req->mip_levels > DRM_VMW_MAX_MIP_LEVELS)
+ return -EINVAL;
+
if (unlikely(vmw_user_surface_size == 0))
vmw_user_surface_size = ttm_round_pot(sizeof(*user_srf)) +
128;


2017-06-12 15:41:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 78/90] serial: sh-sci: Fix panic when serial console and DMA are enabled

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takatoshi Akiyama <[email protected]>

commit 3c9101766b502a0163d1d437fada5801cf616be2 upstream.

This patch fixes a kernel panic that happens when DMA is enabled and
the enter key is pressed on the serial console while the kernel is booting.

* An interrupt may occur after sci_request_irq().
* The DMA transfer area is initialized by setup_timer() in sci_request_dma()
and used in the interrupt handler.

If an interrupt occurs between sci_request_irq() and setup_timer() in
sci_request_dma(), the DMA transfer area has not been initialized yet.
So, this patch changes the order of sci_request_irq() and
sci_request_dma().

Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
Signed-off-by: Takatoshi Akiyama <[email protected]>
[Shimoda changes the commit log]
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/serial/sh-sci.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -1800,11 +1800,13 @@ static int sci_startup(struct uart_port

dev_dbg(port->dev, "%s(%d)\n", __func__, port->line);

+ sci_request_dma(port);
+
ret = sci_request_irq(s);
- if (unlikely(ret < 0))
+ if (unlikely(ret < 0)) {
+ sci_free_dma(port);
return ret;
-
- sci_request_dma(port);
+ }

spin_lock_irqsave(&port->lock, flags);
sci_start_tx(port);
@@ -1834,8 +1836,8 @@ static void sci_shutdown(struct uart_por
}
#endif

- sci_free_dma(port);
sci_free_irq(s);
+ sci_free_dma(port);
}

static unsigned int sci_scbrr_calc(struct sci_port *s, unsigned int bps,


2017-06-12 15:41:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 90/90] arm64: ensure extension of smp_store_release value

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mark Rutland <[email protected]>

commit 994870bead4ab19087a79492400a5478e2906196 upstream.

When an inline assembly operand's type is narrower than the register it
is allocated to, the least significant bits of the register (up to the
operand type's width) are valid, and any other bits are permitted to
contain any arbitrary value. This aligns with the AAPCS64 parameter
passing rules.

Our __smp_store_release() implementation does not account for this, and
implicitly assumes that operands have been zero-extended to the width of
the type being stored to. Thus, we may store unknown values to memory
when the value type is narrower than the pointer type (e.g. when storing
a char to a long).
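
A sketch of the kind of caller that could hit this (illustrative only;
'published' is a made-up variable):

long published;

void publish_flag(char v)
{
	/*
	 * sizeof(published) selects the 64-bit "stlr" case, but before
	 * this fix only the low 8 bits of the value operand were
	 * defined, so the upper bits stored could be arbitrary.
	 */
	smp_store_release(&published, v);
}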

This patch fixes the issue by casting the value operand to the same
width as the pointer operand in all cases, which ensures that the value
is zero-extended as we expect. We use the same union trickery as
__smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
pointers are potentially cast to narrower width integers in unreachable
paths.

A whitespace issue at the top of __smp_store_release() is also
corrected.

No changes are necessary for __smp_load_acquire(). Load instructions
implicitly clear any upper bits of the register, and the compiler will
only consider the least significant bits of the register as valid
regardless.

Fixes: 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
Fixes: 878a84d5a8a1 ("arm64: add missing data types in smp_load_acquire/smp_store_release")
Cc: <[email protected]> # 3.14.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/include/asm/barrier.h | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -41,23 +41,33 @@

#define smp_store_release(p, v) \
do { \
+ union { typeof(*p) __val; char __c[1]; } __u = \
+ { .__val = (__force typeof(*p)) (v) }; \
compiletime_assert_atomic_type(*p); \
switch (sizeof(*p)) { \
case 1: \
asm volatile ("stlrb %w1, %0" \
- : "=Q" (*p) : "r" (v) : "memory"); \
+ : "=Q" (*p) \
+ : "r" (*(__u8 *)__u.__c) \
+ : "memory"); \
break; \
case 2: \
asm volatile ("stlrh %w1, %0" \
- : "=Q" (*p) : "r" (v) : "memory"); \
+ : "=Q" (*p) \
+ : "r" (*(__u16 *)__u.__c) \
+ : "memory"); \
break; \
case 4: \
asm volatile ("stlr %w1, %0" \
- : "=Q" (*p) : "r" (v) : "memory"); \
+ : "=Q" (*p) \
+ : "r" (*(__u32 *)__u.__c) \
+ : "memory"); \
break; \
case 8: \
asm volatile ("stlr %1, %0" \
- : "=Q" (*p) : "r" (v) : "memory"); \
+ : "=Q" (*p) \
+ : "r" (*(__u64 *)__u.__c) \
+ : "memory"); \
break; \
} \
} while (0)


2017-06-12 15:54:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 74/90] ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit ba3021b2c79b2fa9114f92790a99deb27a65b728 upstream.

snd_timer_user_tselect() reallocates the queue buffer dynamically, but
it forgot to reset its indices. Since the read may happen
concurrently with the ioctl, and snd_timer_user_tselect() allocates the
buffer via kmalloc(), this may lead to a leak of uninitialized
kernel-space data, as spotted via KMSAN:

BUG: KMSAN: use of unitialized memory in snd_timer_user_read+0x6c4/0xa10
CPU: 0 PID: 1037 Comm: probe Not tainted 4.11.0-rc5+ #2739
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x143/0x1b0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
kmsan_check_memory+0xc2/0x140 mm/kmsan/kmsan.c:1086
copy_to_user ./arch/x86/include/asm/uaccess.h:725
snd_timer_user_read+0x6c4/0xa10 sound/core/timer.c:2004
do_loop_readv_writev fs/read_write.c:716
__do_readv_writev+0x94c/0x1380 fs/read_write.c:864
do_readv_writev fs/read_write.c:894
vfs_readv fs/read_write.c:908
do_readv+0x52a/0x5d0 fs/read_write.c:934
SYSC_readv+0xb6/0xd0 fs/read_write.c:1021
SyS_readv+0x87/0xb0 fs/read_write.c:1018

This patch adds the missing reset of queue indices. Together with the
previous fix for the ioctl/read race, we cover the whole problem.

Reported-by: Alexander Potapenko <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/core/timer.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1621,6 +1621,7 @@ static int snd_timer_user_tselect(struct
if (err < 0)
goto __err;

+ tu->qhead = tu->qtail = tu->qused = 0;
kfree(tu->queue);
tu->queue = NULL;
kfree(tu->tqueue);


2017-06-12 15:56:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 76/90] drivers: char: mem: Fix wraparound check to allow mappings up to the end

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julius Werner <[email protected]>

commit 32829da54d9368103a2f03269a5120aa9ee4d5da upstream.

A recent fix to /dev/mem prevents mappings from wrapping around the end
of physical address space. However, the check was written in a way that
also prevents a mapping reaching just up to the end of physical address
space, which may be a valid use case (especially on 32-bit systems).
This patch fixes it by checking the last mapped address (instead of the
first address behind that) for overflow.
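
A worked example of the two checks, as a sketch with hypothetical
numbers and a 32-bit phys_addr_t:

static bool mapping_wraps(phys_addr_t offset, size_t size)
{
	/* the last byte of the mapping must not precede its first byte */
	return offset + (phys_addr_t)size - 1 < offset;
}

/*
 * offset = 0xfff00000, size = 0x00100000 (last 1 MiB of the space):
 *   offset + size     == 0x00000000  -> the old check wrongly rejects
 *   offset + size - 1 == 0xffffffff  -> the new check allows it
 * offset = 0xfff00000, size = 0x00200000 (a real wraparound):
 *   offset + size - 1 == 0x000fffff < offset -> still rejected
 */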

Fixes: b299cde245 ("drivers: char: mem: Check for address space wraparound with mmap()")
Reported-by: Nico Huber <[email protected]>
Signed-off-by: Julius Werner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/char/mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -346,7 +346,7 @@ static int mmap_mem(struct file *file, s
phys_addr_t offset = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;

/* It's illegal to wrap around the end of the physical address space. */
- if (offset + (phys_addr_t)size < offset)
+ if (offset + (phys_addr_t)size - 1 < offset)
return -EINVAL;

if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))


2017-06-12 15:59:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 73/90] ALSA: timer: Fix race between read and ioctl

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit d11662f4f798b50d8c8743f433842c3e40fe3378 upstream.

The read from the ALSA timer device, the function snd_timer_user_tread(),
may access uninitialized struct snd_timer_user fields when the
read is performed concurrently with an ioctl like
snd_timer_user_tselect(). We have already fixed the races
among ioctls via a mutex, but we seem to have forgotten the race
between read and ioctl.

This patch simply applies (more exactly, extends the already applied
range of) tu->ioctl_lock in snd_timer_user_tread() to close the
race window.

Reported-by: Alexander Potapenko <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/core/timer.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1958,6 +1958,7 @@ static ssize_t snd_timer_user_read(struc

tu = file->private_data;
unit = tu->tread ? sizeof(struct snd_timer_tread) : sizeof(struct snd_timer_read);
+ mutex_lock(&tu->ioctl_lock);
spin_lock_irq(&tu->qlock);
while ((long)count - result >= unit) {
while (!tu->qused) {
@@ -1973,7 +1974,9 @@ static ssize_t snd_timer_user_read(struc
add_wait_queue(&tu->qchange_sleep, &wait);

spin_unlock_irq(&tu->qlock);
+ mutex_unlock(&tu->ioctl_lock);
schedule();
+ mutex_lock(&tu->ioctl_lock);
spin_lock_irq(&tu->qlock);

remove_wait_queue(&tu->qchange_sleep, &wait);
@@ -1993,7 +1996,6 @@ static ssize_t snd_timer_user_read(struc
tu->qused--;
spin_unlock_irq(&tu->qlock);

- mutex_lock(&tu->ioctl_lock);
if (tu->tread) {
if (copy_to_user(buffer, &tu->tqueue[qhead],
sizeof(struct snd_timer_tread)))
@@ -2003,7 +2005,6 @@ static ssize_t snd_timer_user_read(struc
sizeof(struct snd_timer_read)))
err = -EFAULT;
}
- mutex_unlock(&tu->ioctl_lock);

spin_lock_irq(&tu->qlock);
if (err < 0)
@@ -2013,6 +2014,7 @@ static ssize_t snd_timer_user_read(struc
}
_error:
spin_unlock_irq(&tu->qlock);
+ mutex_unlock(&tu->ioctl_lock);
return result > 0 ? result : err;
}



2017-06-12 15:40:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 87/90] tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Amey Telawane <[email protected]>

commit e09e28671cda63e6308b31798b997639120e2a21 upstream.

strcpy() is inherently not safe, and strlcpy() should be used instead.
__trace_find_cmdline() uses strcpy() because the comms saved must have a
terminating nul character, but it doesn't hurt to add the extra protection
of using strlcpy() instead of strcpy().
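
For reference, the behavioural difference when the source is too long
(a sketch; copy_comm() is a made-up wrapper):

#include <linux/sched.h>	/* TASK_COMM_LEN (16) */
#include <linux/string.h>

static void copy_comm(char comm[TASK_COMM_LEN], const char *src)
{
	/*
	 * strcpy() would run past the end of comm for an over-long
	 * source; strlcpy() copies at most TASK_COMM_LEN - 1 bytes
	 * and always NUL-terminates, truncating instead.
	 */
	strlcpy(comm, src, TASK_COMM_LEN);
}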

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Amey Telawane <[email protected]>
[AmitP: Cherry-picked this commit from CodeAurora kernel/msm-3.10
https://source.codeaurora.org/quic/la/kernel/msm-3.10/commit/?id=2161ae9a70b12cf18ac8e5952a20161ffbccb477]
Signed-off-by: Amit Pundir <[email protected]>
[ Updated change log and removed the "- 1" from len parameter ]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1617,7 +1617,7 @@ static void __trace_find_cmdline(int pid

map = savedcmd->map_pid_to_cmdline[pid];
if (map != NO_CMDLINE_MAP)
- strcpy(comm, get_saved_cmdlines(map));
+ strlcpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
else
strcpy(comm, "<...>");
}


2017-06-12 15:59:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 72/90] drm/nouveau/tmr: fully separate alarm execution/pending lists

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Skeggs <[email protected]>

commit b4e382ca7586a63b6c1e5221ce0863ff867c2df6 upstream.

Reusing the list_head for both is a bad idea. Callback execution is done
with the lock dropped so that alarms can be rescheduled from the callback,
which means that with some unfortunate timing, lists can get corrupted.

The execution list should not require its own locking; the single function
that uses it can only be called from a single context.

Signed-off-by: Ben Skeggs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/nouveau/include/nvkm/subdev/timer.h | 1 +
drivers/gpu/drm/nouveau/nvkm/subdev/timer/base.c | 7 ++++---
2 files changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/timer.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/timer.h
@@ -4,6 +4,7 @@

struct nvkm_alarm {
struct list_head head;
+ struct list_head exec;
u64 timestamp;
void (*func)(struct nvkm_alarm *);
};
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/base.c
@@ -50,7 +50,8 @@ nvkm_timer_alarm_trigger(struct nvkm_tim
/* Move to completed list. We'll drop the lock before
* executing the callback so it can reschedule itself.
*/
- list_move_tail(&alarm->head, &exec);
+ list_del_init(&alarm->head);
+ list_add(&alarm->exec, &exec);
}

/* Shut down interrupt if no more pending alarms. */
@@ -59,8 +60,8 @@ nvkm_timer_alarm_trigger(struct nvkm_tim
spin_unlock_irqrestore(&tmr->lock, flags);

/* Execute completed callbacks. */
- list_for_each_entry_safe(alarm, atemp, &exec, head) {
- list_del_init(&alarm->head);
+ list_for_each_entry_safe(alarm, atemp, &exec, exec) {
+ list_del(&alarm->exec);
alarm->func(alarm);
}
}


2017-06-12 16:00:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 88/90] usercopy: Adjust tests to deal with SMAP/PAN

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

commit f5f893c57e37ca730808cb2eee3820abd05e7507 upstream.

Under SMAP/PAN/etc, we cannot write directly to userspace memory, so
this rearranges the test bytes to get written through copy_to_user().
Additionally drops the bad copy_from_user() test that would trigger a
memcpy() against userspace on failure.

[arnd: the test module was added in 3.14, and this backported patch
should apply cleanly on all versions from 3.14 to 4.10.
The original patch was in 4.11 on top of a context change.
I saw the bug triggered with kselftest on a 4.4.y stable kernel]

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
lib/test_user_copy.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

--- a/lib/test_user_copy.c
+++ b/lib/test_user_copy.c
@@ -58,7 +58,9 @@ static int __init test_user_copy_init(vo
usermem = (char __user *)user_addr;
bad_usermem = (char *)user_addr;

- /* Legitimate usage: none of these should fail. */
+ /*
+ * Legitimate usage: none of these copies should fail.
+ */
ret |= test(copy_from_user(kmem, usermem, PAGE_SIZE),
"legitimate copy_from_user failed");
ret |= test(copy_to_user(usermem, kmem, PAGE_SIZE),
@@ -68,19 +70,33 @@ static int __init test_user_copy_init(vo
ret |= test(put_user(value, (unsigned long __user *)usermem),
"legitimate put_user failed");

- /* Invalid usage: none of these should succeed. */
+ /*
+ * Invalid usage: none of these copies should succeed.
+ */
+
+ /* Reject kernel-to-kernel copies through copy_from_user(). */
ret |= test(!copy_from_user(kmem, (char __user *)(kmem + PAGE_SIZE),
PAGE_SIZE),
"illegal all-kernel copy_from_user passed");
+
+#if 0
+ /*
+ * When running with SMAP/PAN/etc, this will Oops the kernel
+ * due to the zeroing of userspace memory on failure. This needs
+ * to be tested in LKDTM instead, since this test module does not
+ * expect to explode.
+ */
ret |= test(!copy_from_user(bad_usermem, (char __user *)kmem,
PAGE_SIZE),
"illegal reversed copy_from_user passed");
+#endif
ret |= test(!copy_to_user((char __user *)kmem, kmem + PAGE_SIZE,
PAGE_SIZE),
"illegal all-kernel copy_to_user passed");
ret |= test(!copy_to_user((char __user *)kmem, bad_usermem,
PAGE_SIZE),
"illegal reversed copy_to_user passed");
+
ret |= test(!get_user(value, (unsigned long __user *)kmem),
"illegal get_user passed");
ret |= test(!put_user(value, (unsigned long __user *)kmem),


2017-06-12 15:40:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 86/90] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mike Marciniszyn <[email protected]>

commit 1feb40067cf04ae48d65f728d62ca255c9449178 upstream.

The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
reference when a buffer cannot be allocated for returning the immediate
data.

The issue is that the rkey validation has already occurred and the RNR
nak fails to release the reference that was fruitlessly taken. The
peer will then send the identical single-packet request when its RNR
timer pops.

The fix is to release the held reference prior to the RNR nak exit.
This is the only sequence that requires both rkey validation and the
buffer allocation on the same packet.

Cc: Stable <[email protected]> # 4.7+
Tested-by: Tadeusz Struk <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/infiniband/hw/qib/qib_rc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -2088,8 +2088,10 @@ send_last:
ret = qib_get_rwqe(qp, 1);
if (ret < 0)
goto nack_op_err;
- if (!ret)
+ if (!ret) {
+ qib_put_ss(&qp->r_sge);
goto rnr_nak;
+ }
wc.ex.imm_data = ohdr->u.rc.imm_data;
hdrsize += 4;
wc.wc_flags = IB_WC_WITH_IMM;


2017-06-12 16:00:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 31/90] dmaengine: usb-dmac: Fix DMAOR AE bit definition

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hiroyuki Yokoyama <[email protected]>

commit 9a445bbb1607d9f14556a532453dd86d1b7e381e upstream.

This patch fixes the register definition of AE (Address Error flag) bit.

Fixes: 0c1c8ff32fa2 ("dmaengine: usb-dmac: Add Renesas USB DMA Controller (USB-DMAC) driver")
Signed-off-by: Hiroyuki Yokoyama <[email protected]>
[Shimoda: add Fixes and Cc tags in the commit log]
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/dma/sh/usb-dmac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/dma/sh/usb-dmac.c
+++ b/drivers/dma/sh/usb-dmac.c
@@ -117,7 +117,7 @@ struct usb_dmac {
#define USB_DMASWR 0x0008
#define USB_DMASWR_SWR (1 << 0)
#define USB_DMAOR 0x0060
-#define USB_DMAOR_AE (1 << 2)
+#define USB_DMAOR_AE (1 << 1)
#define USB_DMAOR_DME (1 << 0)

#define USB_DMASAR 0x0000


2017-06-12 15:40:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 84/90] arm64: hw_breakpoint: fix watchpoint matching for tagged pointers

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kristina Martsenko <[email protected]>

commit 7dcd9dd8cebe9fa626af7e2358d03a37041a70fb upstream.

This backport has a few small differences from the upstream commit:
- The address tag is removed in watchpoint_handler() instead of
get_distance_from_watchpoint(), because 4.4 does not have commit
fdfeff0f9e3d ("arm64: hw_breakpoint: Handle inexact watchpoint
addresses").
- A macro is backported (untagged_addr), as it is not present in 4.4.

Original patch description:

When we take a watchpoint exception, the address that triggered the
watchpoint is found in FAR_EL1. We compare it to the address of each
configured watchpoint to see which one was hit.

The configured watchpoint addresses are untagged, while the address in
FAR_EL1 will have an address tag if the data access was done using a
tagged address. The tag needs to be removed to compare the address to
the watchpoints.

Currently we don't remove it, and as a result can report the wrong
watchpoint as being hit (specifically, always either the highest TTBR0
watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Acked-by: Mark Rutland <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm64/include/asm/uaccess.h | 8 ++++++++
arch/arm64/kernel/hw_breakpoint.c | 3 ++-
2 files changed, 10 insertions(+), 1 deletion(-)

--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -21,6 +21,7 @@
/*
* User space memory access functions
*/
+#include <linux/bitops.h>
#include <linux/string.h>
#include <linux/thread_info.h>

@@ -103,6 +104,13 @@ static inline void set_fs(mm_segment_t f
flag; \
})

+/*
+ * When dealing with data aborts, watchpoints, or instruction traps we may end
+ * up with a tagged userland pointer. Clear the tag to get a sane pointer to
+ * pass on to access_ok(), for instance.
+ */
+#define untagged_addr(addr) sign_extend64(addr, 55)
+
#define access_ok(type, addr, size) __range_ok(addr, size)
#define user_addr_max get_fs

--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -35,6 +35,7 @@
#include <asm/traps.h>
#include <asm/cputype.h>
#include <asm/system_misc.h>
+#include <asm/uaccess.h>

/* Breakpoint currently in use for each BRP. */
static DEFINE_PER_CPU(struct perf_event *, bp_on_reg[ARM_MAX_BRP]);
@@ -690,7 +691,7 @@ static int watchpoint_handler(unsigned l

/* Check if the watchpoint value matches. */
val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
- if (val != (addr & ~alignment_mask))
+ if (val != (untagged_addr(addr) & ~alignment_mask))
goto unlock;

/* Possible match, check the byte address select to confirm. */


2017-06-12 15:40:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 80/90] mm: consider memblock reservations for deferred memory initialization sizing

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Hocko <[email protected]>

commit 864b9a393dcb5aed09b8fd31b9bbda0fdda99374 upstream.

We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=7
CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
Call Trace:
dump_stack+0xb0/0xf0 (unreliable)
dump_header+0xb0/0x258
out_of_memory+0x5f0/0x640
__alloc_pages_nodemask+0xa8c/0xc80
kmem_getpages+0x84/0x1a0
fallback_alloc+0x2a4/0x320
kmem_cache_alloc_node+0xc0/0x2e0
copy_process.isra.25+0x260/0x1b30
_do_fork+0x94/0x470
kernel_thread+0x48/0x60
kthreadd+0x264/0x330
ret_from_kernel_thread+0x5c/0xa4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:5 slab_unreclaimable:73
mapped:0 shmem:0 pagetables:0 bounce:0
free:0 free_pcp:0 free_cma:0
Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
0 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
819200 pages RAM
0 pages HighMem/MovableOnly
817481 pages reserved
0 pages cma reserved
0 pages hwpoisoned

The reason is that the managed memory is too low (only 110MB) while the
rest of the 50GB is still waiting for deferred initialization to be
done. update_defer_init estimates the initial memory to initialize to at
least 2GB, but it doesn't consider any memory allocated in that range.
In this particular case we had

Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation. Move the max_initialise computation to
reset_deferred_meminit and implement a simple
memblock_reserved_memory_within helper which iterates all reserved
blocks and sums the size of those that overlap the given range. The
cumulative size is then added on top of the initial estimation. This is
still not ideal because reset_deferred_meminit doesn't consider holes,
so a reservation might lie above the initial estimation and be ignored,
but let's keep the logic simple until we really need to handle more
complicated cases.
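
As a worked example using the numbers above (sizes given in bytes for
simplicity; the code operates on PFNs):

	max_initialise   = max(2GB, 50GB >> 8)       = 2GB
	reserved_lowmem  = 4096MB  (the crashkernel region at 128MB starts
	                            below the first 2GB)
	static_init_size = min(2GB + 4096MB, 50GB)   = 6GB

so boot now initializes roughly 6GB up front instead of stopping after
2GB of mostly-reserved memory.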

Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Tested-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
include/linux/memblock.h | 8 ++++++++
include/linux/mmzone.h | 1 +
mm/memblock.c | 24 ++++++++++++++++++++++++
mm/page_alloc.c | 25 ++++++++++++++++++++++---
4 files changed, 55 insertions(+), 3 deletions(-)

--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -408,11 +408,19 @@ static inline void early_memtest(phys_ad
}
#endif

+extern unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
+ phys_addr_t end_addr);
#else
static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
{
return 0;
}
+
+static inline unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
+ phys_addr_t end_addr)
+{
+ return 0;
+}

#endif /* CONFIG_HAVE_MEMBLOCK */

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -688,6 +688,7 @@ typedef struct pglist_data {
* is the first PFN that needs to be initialised.
*/
unsigned long first_deferred_pfn;
+ unsigned long static_init_size;
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
} pg_data_t;

--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1634,6 +1634,30 @@ static void __init_memblock memblock_dum
}
}

+extern unsigned long __init_memblock
+memblock_reserved_memory_within(phys_addr_t start_addr, phys_addr_t end_addr)
+{
+ struct memblock_type *type = &memblock.reserved;
+ unsigned long size = 0;
+ int idx;
+
+ for (idx = 0; idx < type->cnt; idx++) {
+ struct memblock_region *rgn = &type->regions[idx];
+ phys_addr_t start, end;
+
+ if (rgn->base + rgn->size < start_addr)
+ continue;
+ if (rgn->base > end_addr)
+ continue;
+
+ start = rgn->base;
+ end = start + rgn->size;
+ size += end - start;
+ }
+
+ return size;
+}
+
void __init_memblock __memblock_dump_all(void)
{
pr_info("MEMBLOCK configuration:\n");
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -269,6 +269,26 @@ int page_group_by_mobility_disabled __re
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
static inline void reset_deferred_meminit(pg_data_t *pgdat)
{
+ unsigned long max_initialise;
+ unsigned long reserved_lowmem;
+
+ /*
+ * Initialise at least 2G of a node but also take into account that
+ * two large system hashes that can take up 1GB for 0.25TB/node.
+ */
+ max_initialise = max(2UL << (30 - PAGE_SHIFT),
+ (pgdat->node_spanned_pages >> 8));
+
+ /*
+ * Compensate the all the memblock reservations (e.g. crash kernel)
+ * from the initial estimation to make sure we will initialize enough
+ * memory to boot.
+ */
+ reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
+ pgdat->node_start_pfn + max_initialise);
+ max_initialise += reserved_lowmem;
+
+ pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
pgdat->first_deferred_pfn = ULONG_MAX;
}

@@ -302,10 +322,9 @@ static inline bool update_defer_init(pg_
/* Always populate low zones for address-contrained allocations */
if (zone_end < pgdat_end_pfn(pgdat))
return true;
-
/* Initialise at least 2G of the highest zone */
(*nr_initialised)++;
- if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+ if ((*nr_initialised > pgdat->static_init_size) &&
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
pgdat->first_deferred_pfn = pfn;
return false;
@@ -5343,7 +5362,6 @@ void __paginginit free_area_init_node(in
/* pg_data_t should be reset to zero when it's allocated */
WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);

- reset_deferred_meminit(pgdat);
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -5362,6 +5380,7 @@ void __paginginit free_area_init_node(in
(unsigned long)pgdat->node_mem_map);
#endif

+ reset_deferred_meminit(pgdat);
free_area_init_core(pgdat);
}



2017-06-12 15:40:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 81/90] NFS: Ensure we revalidate attributes before using execute_ok()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <[email protected]>

commit 5c5fc09a1157a11dbe84e6421c3e0b37d05238cb upstream.

Donald Buczek reports that NFS clients can also report incorrect
results for access() due to lack of revalidation of attributes
before calling execute_ok().
Looking closely, it seems chdir() is afflicted with the same problem.

The fix is to ensure we call nfs_revalidate_inode_rcu() or
nfs_revalidate_inode() as appropriate before deciding to trust
execute_ok().

Reported-by: Donald Buczek <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Paul Menzel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/nfs/dir.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2421,6 +2421,20 @@ int nfs_may_open(struct inode *inode, st
}
EXPORT_SYMBOL_GPL(nfs_may_open);

+static int nfs_execute_ok(struct inode *inode, int mask)
+{
+ struct nfs_server *server = NFS_SERVER(inode);
+ int ret;
+
+ if (mask & MAY_NOT_BLOCK)
+ ret = nfs_revalidate_inode_rcu(server, inode);
+ else
+ ret = nfs_revalidate_inode(server, inode);
+ if (ret == 0 && !execute_ok(inode))
+ ret = -EACCES;
+ return ret;
+}
+
int nfs_permission(struct inode *inode, int mask)
{
struct rpc_cred *cred;
@@ -2470,8 +2484,8 @@ force_lookup:
res = PTR_ERR(cred);
}
out:
- if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
- res = -EACCES;
+ if (!res && (mask & MAY_EXEC))
+ res = nfs_execute_ok(inode, mask);

dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
inode->i_sb->s_id, inode->i_ino, mask, res);


2017-06-12 16:02:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 83/90] Make __xfs_xattr_put_listent properly report errors.

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Artem Savkov <[email protected]>

commit 791cc43b36eb1f88166c8505900cad1b43c7fe1a upstream.

Commit 2a6fba6 "xfs: only return -errno or success from attr ->put_listent"
changes the returnvalue of __xfs_xattr_put_listen to 0 in case when there is
insufficient space in the buffer assuming that setting context->count to -1
would be enough, but all of the ->put_listent callers only check seen_enough.
This results in a failed assertion:
XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
in insufficient buffer size case.

This is only reproducible with at least 2 xattrs and only when the buffer
gets depleted before the last one.

Furthermore, if the buffer size is large enough to hold the last xattr's
name, but not the sum of the preceding xattr names, listxattr won't fail
with ERANGE but will succeed, returning the last xattr's name without its
first character. That first character ends up overwriting data stored at
(context->alist - 1).
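
For reference, the loop that consumes ->put_listent looks roughly like
this (a sketch, not the exact 4.4 code), which is why seen_enough must
be set whenever the buffer is exhausted:

	/* sketch of an attr-listing caller */
	while (more_attrs) {
		error = context->put_listent(context, ...);
		if (error)			/* only -errno stops the walk */
			break;
		if (context->seen_enough)	/* ... or this flag */
			break;
		/* context->count < 0 is never checked here */
	}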

Signed-off-by: Artem Savkov <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/xfs/xfs_xattr.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -180,6 +180,7 @@ xfs_xattr_put_listent(
arraytop = context->count + prefix_len + namelen + 1;
if (arraytop > context->firstu) {
context->count = -1; /* insufficient space */
+ context->seen_enough = 1;
return 0;
}
offset = (char *)context->alist + context->count;


2017-06-12 15:40:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 37/90] ext4: keep existing extra fields when inode expands

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <[email protected]>

commit 887a9730614727c4fff7cb756711b190593fc1df upstream.

ext4_expand_extra_isize() should clear only the space between the old
and new size.
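
For example (illustrative sizes): with i_extra_isize == 28 and
new_extra_isize == 32, only the 4 bytes at offset
EXT4_GOOD_OLD_INODE_SIZE + 28 should be zeroed; the old code zeroed all
32 bytes starting at EXT4_GOOD_OLD_INODE_SIZE, wiping the 28 bytes of
existing extra fields.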

Fixes: 6dd4ee7cab7e # v2.6.23
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/inode.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5162,8 +5162,9 @@ static int ext4_expand_extra_isize(struc
/* No extended attributes present */
if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR) ||
header->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC)) {
- memset((void *)raw_inode + EXT4_GOOD_OLD_INODE_SIZE, 0,
- new_extra_isize);
+ memset((void *)raw_inode + EXT4_GOOD_OLD_INODE_SIZE +
+ EXT4_I(inode)->i_extra_isize, 0,
+ new_extra_isize - EXT4_I(inode)->i_extra_isize);
EXT4_I(inode)->i_extra_isize = new_extra_isize;
return 0;
}


2017-06-12 16:02:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 40/90] usb: chipidea: udc: fix NULL pointer dereference if udc_start failed

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jisheng Zhang <[email protected]>

commit aa1f058d7d9244423b8c5a75b9484b1115df7f02 upstream.

Fix the NULL pointer dereference below. We set ci->roles[CI_ROLE_GADGET]
too early in ci_hdrc_gadget_init(); if udc_start() then fails for some
reason, the ci->roles[CI_ROLE_GADGET] check in ci_hdrc_gadget_destroy()
can't protect us.

We fix this issue by only setting ci->roles[CI_ROLE_GADGET] once
udc_start() succeeds.

[ 1.398550] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
...
[ 1.448600] PC is at dma_pool_free+0xb8/0xf0
[ 1.453012] LR is at dma_pool_free+0x28/0xf0
[ 2.113369] [<ffffff80081817d8>] dma_pool_free+0xb8/0xf0
[ 2.118857] [<ffffff800841209c>] destroy_eps+0x4c/0x68
[ 2.124165] [<ffffff8008413770>] ci_hdrc_gadget_destroy+0x28/0x50
[ 2.130461] [<ffffff800840fa30>] ci_hdrc_probe+0x588/0x7e8
[ 2.136129] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
[ 2.142066] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
[ 2.148270] [<ffffff800837f68c>] __device_attach_driver+0x9c/0xf8
[ 2.154563] [<ffffff800837d570>] bus_for_each_drv+0x58/0x98
[ 2.160317] [<ffffff800837f174>] __device_attach+0xc4/0x138
[ 2.166072] [<ffffff800837f738>] device_initial_probe+0x10/0x18
[ 2.172185] [<ffffff800837e58c>] bus_probe_device+0x94/0xa0
[ 2.177940] [<ffffff800837c560>] device_add+0x3f0/0x560
[ 2.183337] [<ffffff8008380d20>] platform_device_add+0x180/0x240
[ 2.189541] [<ffffff800840f0e8>] ci_hdrc_add_device+0x440/0x4f8
[ 2.195654] [<ffffff8008414194>] ci_hdrc_usb2_probe+0x13c/0x2d8
[ 2.201769] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
[ 2.207705] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
[ 2.213910] [<ffffff800837f5ec>] __driver_attach+0xac/0xb0
[ 2.219575] [<ffffff800837d4b0>] bus_for_each_dev+0x60/0xa0
[ 2.225329] [<ffffff800837ec80>] driver_attach+0x20/0x28
[ 2.230816] [<ffffff800837e880>] bus_add_driver+0x1d0/0x238
[ 2.236571] [<ffffff800837fdb0>] driver_register+0x60/0xf8
[ 2.242237] [<ffffff8008380ef4>] __platform_driver_register+0x44/0x50
[ 2.248891] [<ffffff80086fd440>] ci_hdrc_usb2_driver_init+0x18/0x20
[ 2.255365] [<ffffff8008082950>] do_one_initcall+0x38/0x128
[ 2.261121] [<ffffff80086e0d00>] kernel_init_freeable+0x1ac/0x250
[ 2.267414] [<ffffff800852f0b8>] kernel_init+0x10/0x100
[ 2.272810] [<ffffff8008082680>] ret_from_fork+0x10/0x50

Fixes: 3f124d233e97 ("usb: chipidea: add role init and destroy APIs")
Signed-off-by: Jisheng Zhang <[email protected]>
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/usb/chipidea/udc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/usb/chipidea/udc.c
+++ b/drivers/usb/chipidea/udc.c
@@ -1982,6 +1982,7 @@ static void udc_id_switch_for_host(struc
int ci_hdrc_gadget_init(struct ci_hdrc *ci)
{
struct ci_role_driver *rdrv;
+ int ret;

if (!hw_read(ci, CAP_DCCPARAMS, DCCPARAMS_DC))
return -ENXIO;
@@ -1994,7 +1995,10 @@ int ci_hdrc_gadget_init(struct ci_hdrc *
rdrv->stop = udc_id_switch_for_host;
rdrv->irq = udc_irq;
rdrv->name = "gadget";
- ci->roles[CI_ROLE_GADGET] = rdrv;

- return udc_start(ci);
+ ret = udc_start(ci);
+ if (!ret)
+ ci->roles[CI_ROLE_GADGET] = rdrv;
+
+ return ret;
}


2017-06-12 16:03:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 38/90] ext4: fix fdatasync(2) after extent manipulation operations

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 67a7d5f561f469ad2fa5154d2888258ab8e6df7c upstream.

Currently, extent manipulation operations such as hole punch, range
zeroing, or extent shifting do not record the fact that file data has
changed and thus fdatasync(2) has work to do. As a result, if we crash
e.g. after a hole punch and fdatasync, the user can still possibly see
the punched-out data after journal replay. Test generic/392 fails due
to these problems.

Fix the problem by properly marking that file data has changed in these
operations.

Fixes: a4bb6b64e39abc0e41ca077725f2a72c868e7622
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/extents.c | 5 +++++
fs/ext4/inode.c | 2 ++
2 files changed, 7 insertions(+)

--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4902,6 +4902,8 @@ static long ext4_zero_range(struct file

/* Zero out partial block at the edges of the range */
ret = ext4_zero_partial_blocks(handle, inode, offset, len);
+ if (ret >= 0)
+ ext4_update_inode_fsync_trans(handle, inode, 1);

if (file->f_flags & O_SYNC)
ext4_handle_sync(handle);
@@ -5597,6 +5599,7 @@ int ext4_collapse_range(struct inode *in
ext4_handle_sync(handle);
inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
ext4_mark_inode_dirty(handle, inode);
+ ext4_update_inode_fsync_trans(handle, inode, 1);

out_stop:
ext4_journal_stop(handle);
@@ -5770,6 +5773,8 @@ int ext4_insert_range(struct inode *inod
up_write(&EXT4_I(inode)->i_data_sem);
if (IS_SYNC(inode))
ext4_handle_sync(handle);
+ if (ret >= 0)
+ ext4_update_inode_fsync_trans(handle, inode, 1);

out_stop:
ext4_journal_stop(handle);
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3793,6 +3793,8 @@ int ext4_punch_hole(struct inode *inode,

inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
ext4_mark_inode_dirty(handle, inode);
+ if (ret >= 0)
+ ext4_update_inode_fsync_trans(handle, inode, 1);
out_stop:
ext4_journal_stop(handle);
out_dio:


2017-06-12 16:04:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 35/90] xen-netfront: cast grant table reference first to type int

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dongli Zhang <[email protected]>

commit 269ebce4531b8edc4224259a02143181a1c1d77c upstream.

IS_ERR_VALUE() in commit 87557efc27f6a50140fb20df06a917f368ce3c66
("xen-netfront: do not cast grant table reference to signed short") does
not return true for an error code unless we first cast ref to type int.
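
A userspace sketch of the zero- vs sign-extension difference on a 64-bit
build (IS_ERR_VALUE() here is simplified to the usual "top 4095 values
are errors" test):

	#include <errno.h>
	#include <stdint.h>
	#include <stdio.h>

	#define IS_ERR_VALUE(x) ((x) >= (unsigned long)-4095)

	int main(void)
	{
		uint32_t ref = (uint32_t)-ENOSPC;	/* failed grant claim */

		/* 0x00000000ffffffe4: zero-extended, not seen as an error */
		printf("%d\n", IS_ERR_VALUE((unsigned long)ref));
		/* 0xffffffffffffffe4: sign-extended, correctly seen as one */
		printf("%d\n", IS_ERR_VALUE((unsigned long)(int)ref));
		return 0;
	}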

Signed-off-by: Dongli Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Cc: Blake Cooper <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/xen-netfront.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -304,7 +304,7 @@ static void xennet_alloc_rx_buffers(stru
queue->rx_skbs[id] = skb;

ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
- WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
+ WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)(int)ref));
queue->grant_rx_ref[id] = ref;

page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
@@ -437,7 +437,7 @@ static void xennet_tx_setup_grant(unsign
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
- WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
+ WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)(int)ref));

gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
gfn, GNTMAP_readonly);


2017-06-12 15:40:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 62/90] btrfs: fix memory leak in update_space_info failure path

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeff Mahoney <[email protected]>

commit 896533a7da929136d0432713f02a3edffece2826 upstream.

If we fail to add the space_info kobject, we'll leak the memory
for the percpu counter.

Fixes: 6ab0a2029c (btrfs: publish allocation data in sysfs)
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/btrfs/extent-tree.c | 1 +
1 file changed, 1 insertion(+)

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3854,6 +3854,7 @@ static int update_space_info(struct btrf
info->space_info_kobj, "%s",
alloc_name(found->flags));
if (ret) {
+ percpu_counter_destroy(&found->total_bytes_pinned);
kfree(found);
return ret;
}


2017-06-12 16:04:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 65/90] powerpc/eeh: Avoid use after free in eeh_handle_special_event()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Russell Currey <[email protected]>

commit daeba2956f32f91f3493788ff6ee02fb1b2f02fa upstream.

eeh_handle_special_event() is called when an EEH event is detected but
can't be narrowed down to a specific PE. This function looks through
every PE to find one in an erroneous state, then calls the regular event
handler eeh_handle_normal_event() once it knows which PE has an error.

However, if eeh_handle_normal_event() finds that the PE cannot possibly
be recovered, it will free it, rendering the passed PE stale.
This leads to a use-after-free in eeh_handle_special_event() as it
attempts to clear the "recovering" state on the PE after
eeh_handle_normal_event() returns.

Thus, make sure the PE is valid when attempting to clear state in
eeh_handle_special_event().

Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event")
Reported-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Russell Currey <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
arch/powerpc/kernel/eeh_driver.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -655,7 +655,7 @@ static int eeh_reset_device(struct eeh_p
*/
#define MAX_WAIT_FOR_RECOVERY 300

-static void eeh_handle_normal_event(struct eeh_pe *pe)
+static bool eeh_handle_normal_event(struct eeh_pe *pe)
{
struct pci_bus *frozen_bus;
int rc = 0;
@@ -665,7 +665,7 @@ static void eeh_handle_normal_event(stru
if (!frozen_bus) {
pr_err("%s: Cannot find PCI bus for PHB#%d-PE#%x\n",
__func__, pe->phb->global_number, pe->addr);
- return;
+ return false;
}

eeh_pe_update_time_stamp(pe);
@@ -790,7 +790,7 @@ static void eeh_handle_normal_event(stru
pr_info("EEH: Notify device driver to resume\n");
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);

- return;
+ return false;

excess_failures:
/*
@@ -831,7 +831,11 @@ perm_error:
pci_lock_rescan_remove();
pcibios_remove_pci_devices(frozen_bus);
pci_unlock_rescan_remove();
+
+ /* The passed PE should no longer be used */
+ return true;
}
+ return false;
}

static void eeh_handle_special_event(void)
@@ -897,7 +901,14 @@ static void eeh_handle_special_event(voi
*/
if (rc == EEH_NEXT_ERR_FROZEN_PE ||
rc == EEH_NEXT_ERR_FENCED_PHB) {
- eeh_handle_normal_event(pe);
+ /*
+ * eeh_handle_normal_event() can make the PE stale if it
+ * determines that the PE cannot possibly be recovered.
+ * Don't modify the PE state if that's the case.
+ */
+ if (eeh_handle_normal_event(pe))
+ continue;
+
eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
} else {
pci_lock_rescan_remove();


2017-06-12 16:05:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 67/90] powerpc/hotplug-mem: Fix missing endian conversion of aa_index

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michael Bringmann <[email protected]>

commit dc421b200f91930c9c6a9586810ff8c232cf10fc upstream.

When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
the code inappropriately falling back to the default node.

Fixes: 5f97b2a0d176 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Signed-off-by: Michael Bringmann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/platforms/pseries/hotplug-memory.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -110,6 +110,7 @@ static struct property *dlpar_clone_drco
for (i = 0; i < num_lmbs; i++) {
lmbs[i].base_addr = be64_to_cpu(lmbs[i].base_addr);
lmbs[i].drc_index = be32_to_cpu(lmbs[i].drc_index);
+ lmbs[i].aa_index = be32_to_cpu(lmbs[i].aa_index);
lmbs[i].flags = be32_to_cpu(lmbs[i].flags);
}

@@ -553,6 +554,7 @@ static void dlpar_update_drconf_property
for (i = 0; i < num_lmbs; i++) {
lmbs[i].base_addr = cpu_to_be64(lmbs[i].base_addr);
lmbs[i].drc_index = cpu_to_be32(lmbs[i].drc_index);
+ lmbs[i].aa_index = cpu_to_be32(lmbs[i].aa_index);
lmbs[i].flags = cpu_to_be32(lmbs[i].flags);
}



2017-06-12 15:40:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 63/90] KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit d6dbdd3c8558cad3b6d74cc357b408622d122331 upstream.

Under memory pressure, we start ageing pages, which amounts to parsing
the page tables. Since we don't want to allocate any extra level,
we pass NULL for our private allocation cache, which means that
stage2_get_pud() is allowed to fail. This results in the following
splat:

[ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1520.417741] pgd = ffff810f52fef000
[ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
[ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 1520.435156] Modules linked in:
[ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G W 4.12.0-rc4-00027-g1885c397eaec #7205
[ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
[ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
[ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
[ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
[ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145
[ 1520.486325] sp : ffff800ce04e33d0
[ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
[ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
[ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
[ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
[ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
[ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
[ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
[ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
[ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
[ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
[ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
[ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
[ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
[ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
[ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
[ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
[...]
[ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110
[ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0
[ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8
[ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0
[ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
[ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0
[ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188
[ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250
[ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0
[ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180
[ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8
[ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600
[ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328
[ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340
[ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240
[...]

The trivial fix is to handle this NULL pud value early, rather than
dereferencing it blindly.

Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kvm/mmu.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -869,6 +869,9 @@ static pmd_t *stage2_get_pmd(struct kvm
pmd_t *pmd;

pud = stage2_get_pud(kvm, cache, addr);
+ if (!pud)
+ return NULL;
+
if (pud_none(*pud)) {
if (!cache)
return NULL;


2017-06-12 15:39:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 34/90] xen-netfront: do not cast grant table reference to signed short

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dongli Zhang <[email protected]>

commit 87557efc27f6a50140fb20df06a917f368ce3c66 upstream.

While a grant reference is of type uint32_t, xen-netfront erroneously
casts it to signed short in BUG_ON().

This can lead to a xen domU panic during boot-up or migration when the
domain has lots of paravirtual devices attached.
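
A userspace sketch of how the old check misfires once enough grants are
in use (illustrative value):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t ref = 40000;	/* a perfectly valid grant reference */

		/* prints -25536, so BUG_ON((signed short)ref < 0) would fire */
		printf("%d\n", (signed short)ref);
		return 0;
	}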

Signed-off-by: Dongli Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Cc: Blake Cooper <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/xen-netfront.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -304,7 +304,7 @@ static void xennet_alloc_rx_buffers(stru
queue->rx_skbs[id] = skb;

ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
- BUG_ON((signed short)ref < 0);
+ WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));
queue->grant_rx_ref[id] = ref;

page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
@@ -437,7 +437,7 @@ static void xennet_tx_setup_grant(unsign
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
- BUG_ON((signed short)ref < 0);
+ WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)ref));

gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
gfn, GNTMAP_readonly);


2017-06-12 16:06:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 59/90] ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 006351ac8ead0d4a67dd3845e3ceffe650a23212 upstream.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ufs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -403,7 +403,9 @@ static int ufs_getfrag_block(struct inod

if (!create) {
phys64 = ufs_frag_map(inode, offsets, depth);
- goto out;
+ if (phys64)
+ map_bh(bh_result, sb, phys64 + frag);
+ return 0;
}

/* This code entered only while writing ....? */


2017-06-12 16:07:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 64/90] scsi: qla2xxx: don't disable a not previously enabled PCI device

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johannes Thumshirn <[email protected]>

commit ddff7ed45edce4a4c92949d3c61cd25d229c4a14 upstream.

When pci_enable_device() or pci_enable_device_mem() fails in
qla2x00_probe_one() we bail out, but still call pci_disable_device().
This causes the dev_WARN_ON() in pci_disable_device() to trigger, as the
device wasn't previously enabled.

So instead of taking the 'probe_out' error path we can directly return
*iff* one of the pci_enable_device() calls fails.

Additionally rename the 'probe_out' goto label's name to the more
descriptive 'disable_device'.

Signed-off-by: Johannes Thumshirn <[email protected]>
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Giridhar Malavali <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/scsi/qla2xxx/qla_os.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -2311,10 +2311,10 @@ qla2x00_probe_one(struct pci_dev *pdev,

if (mem_only) {
if (pci_enable_device_mem(pdev))
- goto probe_out;
+ return ret;
} else {
if (pci_enable_device(pdev))
- goto probe_out;
+ return ret;
}

/* This may fail but that's ok */
@@ -2324,7 +2324,7 @@ qla2x00_probe_one(struct pci_dev *pdev,
if (!ha) {
ql_log_pci(ql_log_fatal, pdev, 0x0009,
"Unable to allocate memory for ha.\n");
- goto probe_out;
+ goto disable_device;
}
ql_dbg_pci(ql_dbg_init, pdev, 0x000a,
"Memory allocated for ha=%p.\n", ha);
@@ -2923,7 +2923,7 @@ iospace_config_failed:
kfree(ha);
ha = NULL;

-probe_out:
+disable_device:
pci_disable_device(pdev);
return ret;
}


2017-06-12 16:07:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 57/90] ufs: set correct ->s_maxsize

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 6b0d144fa758869bdd652c50aa41aaf601232550 upstream.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ufs/super.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -746,6 +746,23 @@ static void ufs_put_super(struct super_b
return;
}

+static u64 ufs_max_bytes(struct super_block *sb)
+{
+ struct ufs_sb_private_info *uspi = UFS_SB(sb)->s_uspi;
+ int bits = uspi->s_apbshift;
+ u64 res;
+
+ if (bits > 21)
+ res = ~0ULL;
+ else
+ res = UFS_NDADDR + (1LL << bits) + (1LL << (2*bits)) +
+ (1LL << (3*bits));
+
+ if (res >= (MAX_LFS_FILESIZE >> uspi->s_bshift))
+ return MAX_LFS_FILESIZE;
+ return res << uspi->s_bshift;
+}
+
static int ufs_fill_super(struct super_block *sb, void *data, int silent)
{
struct ufs_sb_info * sbi;
@@ -1212,6 +1229,7 @@ magic_found:
"fast symlink size (%u)\n", uspi->s_maxsymlinklen);
uspi->s_maxsymlinklen = maxsymlen;
}
+ sb->s_maxbytes = ufs_max_bytes(sb);
sb->s_max_links = UFS_LINK_MAX;

inode = ufs_iget(sb, UFS_ROOTINO);


2017-06-12 16:07:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 55/90] fix ufs_isblockset()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 414cf7186dbec29bd946c138d6b5c09da5955a08 upstream.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ufs/util.h | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

--- a/fs/ufs/util.h
+++ b/fs/ufs/util.h
@@ -473,15 +473,19 @@ static inline unsigned _ubh_find_last_ze
static inline int _ubh_isblockset_(struct ufs_sb_private_info * uspi,
struct ufs_buffer_head * ubh, unsigned begin, unsigned block)
{
+ u8 mask;
switch (uspi->s_fpb) {
case 8:
return (*ubh_get_addr (ubh, begin + block) == 0xff);
case 4:
- return (*ubh_get_addr (ubh, begin + (block >> 1)) == (0x0f << ((block & 0x01) << 2)));
+ mask = 0x0f << ((block & 0x01) << 2);
+ return (*ubh_get_addr (ubh, begin + (block >> 1)) & mask) == mask;
case 2:
- return (*ubh_get_addr (ubh, begin + (block >> 2)) == (0x03 << ((block & 0x03) << 1)));
+ mask = 0x03 << ((block & 0x03) << 1);
+ return (*ubh_get_addr (ubh, begin + (block >> 2)) & mask) == mask;
case 1:
- return (*ubh_get_addr (ubh, begin + (block >> 3)) == (0x01 << (block & 0x07)));
+ mask = 0x01 << (block & 0x07);
+ return (*ubh_get_addr (ubh, begin + (block >> 3)) & mask) == mask;
}
return 0;
}


2017-06-12 15:39:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 58/90] ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 940ef1a0ed939c2ca029fca715e25e7778ce1e34 upstream.

... and it really needs splitting into "new" and "extend" cases, but that's for
later

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ufs/inode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -235,7 +235,8 @@ ufs_extend_tail(struct inode *inode, u64

p = ufs_get_direct_data_ptr(uspi, ufsi, block);
tmp = ufs_new_fragments(inode, p, lastfrag, ufs_data_ptr_to_cpu(sb, p),
- new_size, err, locked_page);
+ new_size - (lastfrag & uspi->s_fpbmask), err,
+ locked_page);
return tmp != 0;
}



2017-06-12 16:08:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 53/90] fs: add i_blocksize()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Fabian Frederick <[email protected]>

commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream.

Replace all instances of 1 << inode->i_blkbits and
(1 << inode->i_blkbits) in the fs branch with the new i_blocksize()
helper.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting a more appropriate function
instead of a macro.

[[email protected]: truncate: use i_blocksize()]
Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Fabian Frederick <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/btrfs/file.c | 2 +-
fs/buffer.c | 12 ++++++------
fs/ceph/addr.c | 2 +-
fs/direct-io.c | 2 +-
fs/ext4/inode.c | 2 +-
fs/ext4/move_extent.c | 2 +-
fs/jfs/super.c | 4 ++--
fs/mpage.c | 2 +-
fs/nfsd/blocklayout.c | 4 ++--
fs/nilfs2/btnode.c | 2 +-
fs/nilfs2/inode.c | 4 ++--
fs/nilfs2/mdt.c | 4 ++--
fs/nilfs2/segment.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/ocfs2/file.c | 2 +-
fs/reiserfs/file.c | 2 +-
fs/reiserfs/inode.c | 2 +-
fs/stat.c | 2 +-
fs/udf/inode.c | 2 +-
fs/xfs/xfs_aops.c | 10 +++++-----
fs/xfs/xfs_file.c | 4 ++--
include/linux/fs.h | 5 +++++
mm/truncate.c | 2 +-
23 files changed, 41 insertions(+), 36 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2771,7 +2771,7 @@ static long btrfs_fallocate(struct file
if (!ret)
ret = btrfs_prealloc_file_range(inode, mode,
range->start,
- range->len, 1 << inode->i_blkbits,
+ range->len, i_blocksize(inode),
offset + len, &alloc_hint);
list_del(&range->list);
kfree(range);
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2298,7 +2298,7 @@ static int cont_expand_zero(struct file
loff_t pos, loff_t *bytes)
{
struct inode *inode = mapping->host;
- unsigned blocksize = 1 << inode->i_blkbits;
+ unsigned int blocksize = i_blocksize(inode);
struct page *page;
void *fsdata;
pgoff_t index, curidx;
@@ -2378,8 +2378,8 @@ int cont_write_begin(struct file *file,
get_block_t *get_block, loff_t *bytes)
{
struct inode *inode = mapping->host;
- unsigned blocksize = 1 << inode->i_blkbits;
- unsigned zerofrom;
+ unsigned int blocksize = i_blocksize(inode);
+ unsigned int zerofrom;
int err;

err = cont_expand_zero(file, mapping, pos, bytes);
@@ -2741,7 +2741,7 @@ int nobh_truncate_page(struct address_sp
struct buffer_head map_bh;
int err;

- blocksize = 1 << inode->i_blkbits;
+ blocksize = i_blocksize(inode);
length = offset & (blocksize - 1);

/* Block boundary? Nothing to do */
@@ -2819,7 +2819,7 @@ int block_truncate_page(struct address_s
struct buffer_head *bh;
int err;

- blocksize = 1 << inode->i_blkbits;
+ blocksize = i_blocksize(inode);
length = offset & (blocksize - 1);

/* Block boundary? Nothing to do */
@@ -2931,7 +2931,7 @@ sector_t generic_block_bmap(struct addre
struct inode *inode = mapping->host;
tmp.b_state = 0;
tmp.b_blocknr = 0;
- tmp.b_size = 1 << inode->i_blkbits;
+ tmp.b_size = i_blocksize(inode);
get_block(inode, block, &tmp, 0);
return tmp.b_blocknr;
}
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -697,7 +697,7 @@ static int ceph_writepages_start(struct
struct pagevec pvec;
int done = 0;
int rc = 0;
- unsigned wsize = 1 << inode->i_blkbits;
+ unsigned int wsize = i_blocksize(inode);
struct ceph_osd_request *req = NULL;
int do_sync = 0;
loff_t snap_size, i_size;
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -575,7 +575,7 @@ static int dio_set_defer_completion(stru
/*
* Call into the fs to map some more disk blocks. We record the current number
* of available blocks at sdio->blocks_available. These are in units of the
- * fs blocksize, (1 << inode->i_blkbits).
+ * fs blocksize, i_blocksize(inode).
*
* The fs is allowed to map lots of blocks at once. If it wants to do that,
* it uses the passed inode-relative block number as the file offset, as usual.
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2044,7 +2044,7 @@ static int mpage_process_page_bufs(struc
{
struct inode *inode = mpd->inode;
int err;
- ext4_lblk_t blocks = (i_size_read(inode) + (1 << inode->i_blkbits) - 1)
+ ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
>> inode->i_blkbits;

do {
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -187,7 +187,7 @@ mext_page_mkuptodate(struct page *page,
if (PageUptodate(page))
return 0;

- blocksize = 1 << inode->i_blkbits;
+ blocksize = i_blocksize(inode);
if (!page_has_buffers(page))
create_empty_buffers(page, blocksize, 0);

--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -758,7 +758,7 @@ static ssize_t jfs_quota_read(struct sup
sb->s_blocksize - offset : toread;

tmp_bh.b_state = 0;
- tmp_bh.b_size = 1 << inode->i_blkbits;
+ tmp_bh.b_size = i_blocksize(inode);
err = jfs_get_block(inode, blk, &tmp_bh, 0);
if (err)
return err;
@@ -798,7 +798,7 @@ static ssize_t jfs_quota_write(struct su
sb->s_blocksize - offset : towrite;

tmp_bh.b_state = 0;
- tmp_bh.b_size = 1 << inode->i_blkbits;
+ tmp_bh.b_size = i_blocksize(inode);
err = jfs_get_block(inode, blk, &tmp_bh, 1);
if (err)
goto out;
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -111,7 +111,7 @@ map_buffer_to_page(struct page *page, st
SetPageUptodate(page);
return;
}
- create_empty_buffers(page, 1 << inode->i_blkbits, 0);
+ create_empty_buffers(page, i_blocksize(inode), 0);
}
head = page_buffers(page);
page_bh = head;
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -50,7 +50,7 @@ nfsd4_block_proc_layoutget(struct inode
{
struct nfsd4_layout_seg *seg = &args->lg_seg;
struct super_block *sb = inode->i_sb;
- u32 block_size = (1 << inode->i_blkbits);
+ u32 block_size = i_blocksize(inode);
struct pnfs_block_extent *bex;
struct iomap iomap;
u32 device_generation = 0;
@@ -151,7 +151,7 @@ nfsd4_block_proc_layoutcommit(struct ino
int error;

nr_iomaps = nfsd4_block_decode_layoutupdate(lcp->lc_up_layout,
- lcp->lc_up_len, &iomaps, 1 << inode->i_blkbits);
+ lcp->lc_up_len, &iomaps, i_blocksize(inode));
if (nr_iomaps < 0)
return nfserrno(nr_iomaps);

--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -55,7 +55,7 @@ nilfs_btnode_create_block(struct address
brelse(bh);
BUG();
}
- memset(bh->b_data, 0, 1 << inode->i_blkbits);
+ memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
bh->b_blocknr = blocknr;
set_buffer_mapped(bh);
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -55,7 +55,7 @@ void nilfs_inode_add_blocks(struct inode
{
struct nilfs_root *root = NILFS_I(inode)->i_root;

- inode_add_bytes(inode, (1 << inode->i_blkbits) * n);
+ inode_add_bytes(inode, i_blocksize(inode) * n);
if (root)
atomic64_add(n, &root->blocks_count);
}
@@ -64,7 +64,7 @@ void nilfs_inode_sub_blocks(struct inode
{
struct nilfs_root *root = NILFS_I(inode)->i_root;

- inode_sub_bytes(inode, (1 << inode->i_blkbits) * n);
+ inode_sub_bytes(inode, i_blocksize(inode) * n);
if (root)
atomic64_sub(n, &root->blocks_count);
}
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -60,7 +60,7 @@ nilfs_mdt_insert_new_block(struct inode
set_buffer_mapped(bh);

kaddr = kmap_atomic(bh->b_page);
- memset(kaddr + bh_offset(bh), 0, 1 << inode->i_blkbits);
+ memset(kaddr + bh_offset(bh), 0, i_blocksize(inode));
if (init_block)
init_block(inode, bh, kaddr);
flush_dcache_page(bh->b_page);
@@ -503,7 +503,7 @@ void nilfs_mdt_set_entry_size(struct ino
struct nilfs_mdt_info *mi = NILFS_MDT(inode);

mi->mi_entry_size = entry_size;
- mi->mi_entries_per_block = (1 << inode->i_blkbits) / entry_size;
+ mi->mi_entries_per_block = i_blocksize(inode) / entry_size;
mi->mi_first_entry_offset = DIV_ROUND_UP(header_size, entry_size);
}

--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -719,7 +719,7 @@ static size_t nilfs_lookup_dirty_data_bu

lock_page(page);
if (!page_has_buffers(page))
- create_empty_buffers(page, 1 << inode->i_blkbits, 0);
+ create_empty_buffers(page, i_blocksize(inode), 0);
unlock_page(page);

bh = head = page_buffers(page);
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1103,7 +1103,7 @@ int ocfs2_map_page_blocks(struct page *p
int ret = 0;
struct buffer_head *head, *bh, *wait[2], **wait_bh = wait;
unsigned int block_end, block_start;
- unsigned int bsize = 1 << inode->i_blkbits;
+ unsigned int bsize = i_blocksize(inode);

if (!page_has_buffers(page))
create_empty_buffers(page, bsize, 0);
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -808,7 +808,7 @@ static int ocfs2_write_zero_page(struct
/* We know that zero_from is block aligned */
for (block_start = zero_from; block_start < zero_to;
block_start = block_end) {
- block_end = block_start + (1 << inode->i_blkbits);
+ block_end = block_start + i_blocksize(inode);

/*
* block_start is block-aligned. Bump it by one to force
--- a/fs/reiserfs/file.c
+++ b/fs/reiserfs/file.c
@@ -189,7 +189,7 @@ int reiserfs_commit_page(struct inode *i
int ret = 0;

th.t_trans_id = 0;
- blocksize = 1 << inode->i_blkbits;
+ blocksize = i_blocksize(inode);

if (logit) {
reiserfs_write_lock(s);
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -524,7 +524,7 @@ static int reiserfs_get_blocks_direct_io
* referenced in convert_tail_for_hole() that may be called from
* reiserfs_get_block()
*/
- bh_result->b_size = (1 << inode->i_blkbits);
+ bh_result->b_size = i_blocksize(inode);

ret = reiserfs_get_block(inode, iblock, bh_result,
create | GET_BLOCK_NO_DANGLE);
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -31,7 +31,7 @@ void generic_fillattr(struct inode *inod
stat->atime = inode->i_atime;
stat->mtime = inode->i_mtime;
stat->ctime = inode->i_ctime;
- stat->blksize = (1 << inode->i_blkbits);
+ stat->blksize = i_blocksize(inode);
stat->blocks = inode->i_blocks;
}

--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -1206,7 +1206,7 @@ int udf_setsize(struct inode *inode, lof
{
int err;
struct udf_inode_info *iinfo;
- int bsize = 1 << inode->i_blkbits;
+ int bsize = i_blocksize(inode);

if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
S_ISLNK(inode->i_mode)))
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -288,7 +288,7 @@ xfs_map_blocks(
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
- ssize_t count = 1 << inode->i_blkbits;
+ ssize_t count = i_blocksize(inode);
xfs_fileoff_t offset_fsb, end_fsb;
int error = 0;
int bmapi_flags = XFS_BMAPI_ENTIRE;
@@ -921,7 +921,7 @@ xfs_aops_discard_page(
break;
}
next_buffer:
- offset += 1 << inode->i_blkbits;
+ offset += i_blocksize(inode);

} while ((bh = bh->b_this_page) != head);

@@ -1363,7 +1363,7 @@ xfs_map_trim_size(
offset + mapping_size >= i_size_read(inode)) {
/* limit mapping to block that spans EOF */
mapping_size = roundup_64(i_size_read(inode) - offset,
- 1 << inode->i_blkbits);
+ i_blocksize(inode));
}
if (mapping_size > LONG_MAX)
mapping_size = LONG_MAX;
@@ -1395,7 +1395,7 @@ __xfs_get_blocks(
return -EIO;

offset = (xfs_off_t)iblock << inode->i_blkbits;
- ASSERT(bh_result->b_size >= (1 << inode->i_blkbits));
+ ASSERT(bh_result->b_size >= i_blocksize(inode));
size = bh_result->b_size;

if (!create && direct && offset >= i_size_read(inode))
@@ -1968,7 +1968,7 @@ xfs_vm_set_page_dirty(
if (offset < end_offset)
set_buffer_dirty(bh);
bh = bh->b_this_page;
- offset += 1 << inode->i_blkbits;
+ offset += i_blocksize(inode);
} while (bh != head);
}
/*
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -947,7 +947,7 @@ xfs_file_fallocate(
if (error)
goto out_unlock;
} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
- unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
+ unsigned int blksize_mask = i_blocksize(inode) - 1;

if (offset & blksize_mask || len & blksize_mask) {
error = -EINVAL;
@@ -969,7 +969,7 @@ xfs_file_fallocate(
if (error)
goto out_unlock;
} else if (mode & FALLOC_FL_INSERT_RANGE) {
- unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
+ unsigned int blksize_mask = i_blocksize(inode) - 1;

new_size = i_size_read(inode) + len;
if (offset & blksize_mask || len & blksize_mask) {
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -680,6 +680,11 @@ struct inode {
void *i_private; /* fs or device private pointer */
};

+static inline unsigned int i_blocksize(const struct inode *node)
+{
+ return (1 << node->i_blkbits);
+}
+
static inline int inode_unhashed(struct inode *inode)
{
return hlist_unhashed(&inode->i_hash);
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -732,7 +732,7 @@ EXPORT_SYMBOL(truncate_setsize);
*/
void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
{
- int bsize = 1 << inode->i_blkbits;
+ int bsize = i_blocksize(inode);
loff_t rounded_from;
struct page *page;
pgoff_t index;


2017-06-12 16:09:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 52/90] cpuset: consider dying css as offline

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit 41c25707d21716826e3c1f60967f5550610ec1c9 upstream.

In most cases, a cgroup controller doesn't care about the lifetimes of
cgroups. For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.

However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not. Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user-visible
behavior oddities where operations which should succeed after cgroup
removal fail for some time period. The effects of cgroup removals are
delayed when seen from userland.

This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function also returns false
while offline is pending. This gets rid of the userland-visible
delays.
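
Roughly, the problematic window looks like this (a simplified timeline,
not taken from the patch):

	rmdir(cpuset)      css ref killed; the cgroup is gone from userland
	  ... RCU delay    CS_ONLINE still set, is_cpuset_online() == true
	->css_offline()    CS_ONLINE finally cleared

css_is_dying() returns true as soon as the ref is killed, closing the
window.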

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Daniel Jordan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/cgroup.h | 20 ++++++++++++++++++++
kernel/cpuset.c | 4 ++--
2 files changed, 22 insertions(+), 2 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -340,6 +340,26 @@ static inline bool css_tryget_online(str
}

/**
+ * css_is_dying - test whether the specified css is dying
+ * @css: target css
+ *
+ * Test whether @css is in the process of offlining or already offline. In
+ * most cases, ->css_online() and ->css_offline() callbacks should be
+ * enough; however, the actual offline operations are RCU delayed and this
+ * test returns %true also when @css is scheduled to be offlined.
+ *
+ * This is useful, for example, when the use case requires synchronous
+ * behavior with respect to cgroup removal. cgroup removal schedules css
+ * offlining but the css can seem alive while the operation is being
+ * delayed. If the delay affects user visible semantics, this test can be
+ * used to resolve the situation.
+ */
+static inline bool css_is_dying(struct cgroup_subsys_state *css)
+{
+ return !(css->flags & CSS_NO_REF) && percpu_ref_is_dying(&css->refcnt);
+}
+
+/**
* css_put - put a css reference
* @css: target css
*
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -173,9 +173,9 @@ typedef enum {
} cpuset_flagbits_t;

/* convenient tests for these bits */
-static inline bool is_cpuset_online(const struct cpuset *cs)
+static inline bool is_cpuset_online(struct cpuset *cs)
{
- return test_bit(CS_ONLINE, &cs->flags);
+ return test_bit(CS_ONLINE, &cs->flags) && !css_is_dying(&cs->css);
}

static inline int is_cpu_exclusive(const struct cpuset *cs)


2017-06-12 15:39:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 49/90] target: Re-add check to reject control WRITEs with overflow data

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 4ff83daa0200affe1894bd33d17bac404e3d78d4 upstream.

During v4.3 when the overflow/underflow check was relaxed by
commit c72c525022:

commit c72c5250224d475614a00c1d7e54a67f77cd3410
Author: Roland Dreier <[email protected]>
Date: Wed Jul 22 15:08:18 2015 -0700

target: allow underflow/overflow for PR OUT etc. commands

to allow underflow/overflow for Windows compliance + FCP, a
consequence was to allow control CDBs to process overflow
data for iscsi-target with immediate data as well.

As per Roland's original change, continue to allow underflow
cases for control CDBs to make Windows compliance + FCP happy,
but until overflow for control CDBs is supported tree-wide,
explicitly reject all control WRITEs with overflow following
pre v4.3.y logic.

Reported-by: Bart Van Assche <[email protected]>
Cc: Roland Dreier <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/target/target_core_transport.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1154,15 +1154,28 @@ target_cmd_size_check(struct se_cmd *cmd
if (cmd->unknown_data_length) {
cmd->data_length = size;
} else if (size != cmd->data_length) {
- pr_warn("TARGET_CORE[%s]: Expected Transfer Length:"
+ pr_warn_ratelimited("TARGET_CORE[%s]: Expected Transfer Length:"
" %u does not match SCSI CDB Length: %u for SAM Opcode:"
" 0x%02x\n", cmd->se_tfo->get_fabric_name(),
cmd->data_length, size, cmd->t_task_cdb[0]);

- if (cmd->data_direction == DMA_TO_DEVICE &&
- cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) {
- pr_err("Rejecting underflow/overflow WRITE data\n");
- return TCM_INVALID_CDB_FIELD;
+ if (cmd->data_direction == DMA_TO_DEVICE) {
+ if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) {
+ pr_err_ratelimited("Rejecting underflow/overflow"
+ " for WRITE data CDB\n");
+ return TCM_INVALID_CDB_FIELD;
+ }
+ /*
+ * Some fabric drivers like iscsi-target still expect to
+ * always reject overflow writes. Reject this case until
+ * full fabric driver level support for overflow writes
+ * is introduced tree-wide.
+ */
+ if (size > cmd->data_length) {
+ pr_err_ratelimited("Rejecting overflow for"
+ " WRITE control CDB\n");
+ return TCM_INVALID_CDB_FIELD;
+ }
}
/*
* Reject READ_* or WRITE_* with overflow/underflow for


2017-06-12 16:09:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 33/90] xen/privcmd: Support correctly 64KB page granularity when mapping memory

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julien Grall <[email protected]>

commit 753c09b5652bb4fe53e2db648002ec64b32b8827 upstream.

Commit 5995a68 "xen/privcmd: Add support for Linux 64KB page granularity" did
not go far enough to support 64KB in mmap_batch_fn.

The variable 'nr' is the number of 4KB chunks to map. However, when Linux
is using 64KB page granularity the array of pages (vma->vm_private_data)
contains one page per 64KB. Fix it by incrementing st->index correctly.

Furthermore, st->va is not correctly incremented as PAGE_SIZE !=
XEN_PAGE_SIZE.
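
For example, with 64KB kernel pages (XEN_PAGE_SIZE == 4KB and
XEN_PFN_PER_PAGE == 16) and nr == 32 chunks in one batch:

	st->va    += XEN_PAGE_SIZE * nr;	/* 4KB * 32 = 128KB */
	st->index += nr / XEN_PFN_PER_PAGE;	/* 32 / 16  = 2 pages */

whereas the old code advanced va by 32 * 64KB and index by 32 pages.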

Fixes: 5995a68 ("xen/privcmd: Add support for Linux 64KB page granularity")
Reported-by: Feng Kan <[email protected]>
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/xen/privcmd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -335,8 +335,8 @@ static int mmap_batch_fn(void *data, int
st->global_error = 1;
}
}
- st->va += PAGE_SIZE * nr;
- st->index += nr;
+ st->va += XEN_PAGE_SIZE * nr;
+ st->index += nr / XEN_PFN_PER_PAGE;

return 0;
}


2017-06-12 16:10:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 51/90] Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ulrik De Bie <[email protected]>

commit 47eb0c8b4d9eb6368941c6a9bb443f00847a46d7 upstream.

The Lifebook E546 and E557 touchpads were also not functioning, and
worked after running:

echo "1" > /sys/devices/platform/i8042/serio2/crc_enabled

Add them to the list of machines that need this workaround.

Signed-off-by: Ulrik De Bie <[email protected]>
Reviewed-by: Arjan Opmeer <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/elantech.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1122,8 +1122,10 @@ static int elantech_get_resolution_v4(st
* Asus UX32VD 0x361f02 00, 15, 0e clickpad
* Avatar AVIU-145A2 0x361f00 ? clickpad
* Fujitsu LIFEBOOK E544 0x470f00 d0, 12, 09 2 hw buttons
+ * Fujitsu LIFEBOOK E546 0x470f00 50, 12, 09 2 hw buttons
* Fujitsu LIFEBOOK E547 0x470f00 50, 12, 09 2 hw buttons
* Fujitsu LIFEBOOK E554 0x570f01 40, 14, 0c 2 hw buttons
+ * Fujitsu LIFEBOOK E557 0x570f01 40, 14, 0c 2 hw buttons
* Fujitsu T725 0x470f01 05, 12, 09 2 hw buttons
* Fujitsu H730 0x570f00 c0, 14, 0c 3 hw buttons (**)
* Gigabyte U2442 0x450f01 58, 17, 0c 2 hw buttons
@@ -1529,6 +1531,13 @@ static const struct dmi_system_id elante
},
},
{
+ /* Fujitsu LIFEBOOK E546 does not work with crc_enabled == 0 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK E546"),
+ },
+ },
+ {
/* Fujitsu LIFEBOOK E547 does not work with crc_enabled == 0 */
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
@@ -1550,6 +1559,13 @@ static const struct dmi_system_id elante
},
},
{
+ /* Fujitsu LIFEBOOK E557 does not work with crc_enabled == 0 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK E557"),
+ },
+ },
+ {
/* Fujitsu LIFEBOOK U745 does not work with crc_enabled == 0 */
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),


2017-06-12 16:10:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 47/90] stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Micay <[email protected]>

commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.

The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Arjan van Ven <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -368,7 +368,7 @@ static struct task_struct *dup_task_stru
set_task_stack_end_magic(tsk);

#ifdef CONFIG_CC_STACKPROTECTOR
- tsk->stack_canary = get_random_int();
+ tsk->stack_canary = get_random_long();
#endif

/*
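
A userspace illustration of the difference on an LP64 system: filling an
unsigned long from a 32-bit source leaves the upper half zero, which is
what the one-line change above corrects. random() here is only a stand-in
for the kernel's get_random_int()/get_random_long(), not a real entropy
source:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long canary;

    /* Old behaviour: only the low 32 bits are random. */
    canary = (uint32_t)random();
    printf("32-bit fill: %016lx\n", canary);

    /* Fixed behaviour: the whole unsigned long is filled. */
    canary = ((unsigned long)(uint32_t)random() << 32) |
             (uint32_t)random();
    printf("64-bit fill: %016lx\n", canary);
    return 0;
}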


2017-06-12 16:10:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 48/90] cpufreq: cpufreq_register_driver() should return -ENODEV if init fails

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Arcari <[email protected]>

commit 6c77003677d5f1ce15f26d24360cb66c0bc07bb3 upstream.

For a driver that does not set the CPUFREQ_STICKY flag, if all of the
->init() calls fail, cpufreq_register_driver() should return an error.
This will prevent the driver from loading.

Fixes: ce1bcfe94db8 (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
Signed-off-by: David Arcari <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/cpufreq/cpufreq.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2451,6 +2451,7 @@ int cpufreq_register_driver(struct cpufr
if (!(cpufreq_driver->flags & CPUFREQ_STICKY) &&
list_empty(&cpufreq_policy_list)) {
/* if all ->init() calls failed, unregister */
+ ret = -ENODEV;
pr_debug("%s: No CPU initialized for driver %s\n", __func__,
driver_data->name);
goto err_if_unreg;
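
A minimal sketch of the error-path shape being fixed (the names and
structure are illustrative, not the driver core's): without the added
assignment, 'ret' could still hold 0 on the failure path and the caller
would see success.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int register_driver(bool sticky, bool any_policy)
{
    int ret = 0;

    /* ... per-CPU ->init() attempts happen here ... */
    if (!sticky && !any_policy) {
        ret = -ENODEV;   /* the added line: report the failure */
        goto err_unreg;
    }
    return 0;

err_unreg:
    /* unregister the partially set-up driver, then propagate */
    return ret;
}

int main(void)
{
    printf("%d\n", register_driver(false, false));  /* -19 == -ENODEV */
    return 0;
}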


2017-06-12 16:10:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 46/90] random: properly align get_random_int_hash

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit b1132deac01c2332d234fa821a70022796b79182 upstream.

get_random_long() reads from the get_random_int_hash array using an
unsigned long pointer. For this code to be guaranteed correct on all
architectures, the array must be aligned to an unsigned long boundary.

Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/char/random.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1798,13 +1798,15 @@ int random_int_secret_init(void)
return 0;
}

+static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
+ __aligned(sizeof(unsigned long));
+
/*
* Get a random word for internal kernel use only. Similar to urandom but
* with the goal of minimal entropy pool depletion. As a result, the random
* value is not cryptographically secure but for several uses the cost of
* depleting entropy is too high
*/
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash);
unsigned int get_random_int(void)
{
__u32 *hash;
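
A small sketch of the alignment guarantee the patch adds, assuming a
strict-alignment target: the attribute makes it safe to read the u32
array through an unsigned long pointer (the kernel also builds with
-fno-strict-aliasing, so alignment is the only issue there).

#include <stdint.h>
#include <stdio.h>

/* Four u32s, aligned as the patch aligns get_random_int_hash. */
static uint32_t hash[4] __attribute__((aligned(sizeof(unsigned long))));

int main(void)
{
    /* 0 by construction: the array sits on an unsigned long boundary. */
    printf("misalignment = %lu\n",
           (unsigned long)((uintptr_t)hash % sizeof(unsigned long)));
    return 0;
}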


2017-06-12 15:39:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 42/90] staging/lustre/lov: remove set_fs() call from lov_getstripe()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Oleg Drokin <[email protected]>

commit 0a33252e060e97ed3fbdcec9517672f1e91aaef3 upstream.

lov_getstripe() calls set_fs(KERNEL_DS) so that it can handle a struct
lov_user_md pointer from user- or kernel-space. This changes the
behavior of copy_from_user() on SPARC and may result in a misaligned
access exception which in turn oopses the kernel. In fact,
lov_getstripe() is never called with a kernel-space pointer for the
relevant argument, so changing the address limits is unnecessary;
remove the calls that save, set, and restore the address limits.

Signed-off-by: John L. Hammond <[email protected]>
Reviewed-on: http://review.whamcloud.com/6150
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3221
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Li Wei <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/staging/lustre/lustre/lov/lov_pack.c | 9 ---------
1 file changed, 9 deletions(-)

--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -399,18 +399,10 @@ int lov_getstripe(struct obd_export *exp
struct lov_mds_md *lmmk = NULL;
int rc, lmm_size;
int lum_size;
- mm_segment_t seg;

if (!lsm)
return -ENODATA;

- /*
- * "Switch to kernel segment" to allow copying from kernel space by
- * copy_{to,from}_user().
- */
- seg = get_fs();
- set_fs(KERNEL_DS);
-
/* we only need the header part from user space to get lmm_magic and
* lmm_stripe_count, (the header part is common to v1 and v3) */
lum_size = sizeof(struct lov_user_md_v1);
@@ -485,6 +477,5 @@ int lov_getstripe(struct obd_export *exp

obd_free_diskmd(exp, &lmmk);
out_set:
- set_fs(seg);
return rc;
}


2017-06-12 16:11:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 43/90] iio: light: ltr501 Fix interchanged als/ps register field

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Franziska Naepelt <[email protected]>

commit 7cc3bff4efe6164a0c8163331c8aa55454799f42 upstream.

The register mapping in the IIO driver for the Liteon LTR501 light and
proximity sensor's interrupt mode is interchanged (ALS/PS).
The INTERRUPT register (address 0x8F) is laid out as follows:
Bit 0 represents the PS measurement trigger.
Bit 1 represents the ALS measurement trigger.
These two bit fields are interchanged within the driver.
See datasheet page 24:
http://optoelectronics.liteon.com/upload/download/DS86-2012-0006/S_110_LTR-501ALS-01_PrelimDS_ver1%5B1%5D.pdf

Signed-off-by: Franziska Naepelt <[email protected]>
Fixes: 7ac702b3144b6 ("iio: ltr501: Add interrupt support")
Acked-by: Peter Meerwald-Stadler <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iio/light/ltr501.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iio/light/ltr501.c
+++ b/drivers/iio/light/ltr501.c
@@ -74,9 +74,9 @@ static const int int_time_mapping[] = {1
static const struct reg_field reg_field_it =
REG_FIELD(LTR501_ALS_MEAS_RATE, 3, 4);
static const struct reg_field reg_field_als_intr =
- REG_FIELD(LTR501_INTR, 0, 0);
-static const struct reg_field reg_field_ps_intr =
REG_FIELD(LTR501_INTR, 1, 1);
+static const struct reg_field reg_field_ps_intr =
+ REG_FIELD(LTR501_INTR, 0, 0);
static const struct reg_field reg_field_als_rate =
REG_FIELD(LTR501_ALS_MEAS_RATE, 0, 2);
static const struct reg_field reg_field_ps_rate =
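
A minimal sketch of the bit layout the fix restores, with the INTERRUPT
register value as an illustrative constant; field() mimics what a
REG_FIELD(reg, lsb, msb) lookup extracts:

#include <stdint.h>
#include <stdio.h>

static unsigned int field(uint8_t reg_val, unsigned int lsb,
                          unsigned int msb)
{
    return (reg_val >> lsb) & ((1u << (msb - lsb + 1)) - 1);
}

int main(void)
{
    uint8_t intr = 0x02;   /* ALS trigger set (bit 1), PS clear (bit 0) */

    printf("ps  = %u\n", field(intr, 0, 0));   /* 0 */
    printf("als = %u\n", field(intr, 1, 1));   /* 1 */
    return 0;
}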


2017-06-12 15:38:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 08/90] net: ethoc: enable NAPI before poll may be scheduled

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Max Filippov <[email protected]>


[ Upstream commit d220b942a4b6a0640aee78841608f4aa5e8e185e ]

ethoc_reset enables device interrupts, so ethoc_interrupt may schedule a
NAPI poll before NAPI is enabled in ethoc_open, which results in the
device being unable to send or receive anything until it's closed and
reopened. In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in ethoc_open to fix that.

Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Max Filippov <[email protected]>
Reviewed-by: Tobias Klauser <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/ethoc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -713,6 +713,8 @@ static int ethoc_open(struct net_device
if (ret)
return ret;

+ napi_enable(&priv->napi);
+
ethoc_init_ring(priv, dev->mem_start);
ethoc_reset(priv);

@@ -725,7 +727,6 @@ static int ethoc_open(struct net_device
}

phy_start(priv->phy);
- napi_enable(&priv->napi);

if (netif_msg_ifup(priv)) {
dev_info(&dev->dev, "I/O: %08lx Memory: %08lx-%08lx\n",
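
A toy model of the race the reordering closes (the flags stand in for
NAPI state and are not the kernel's): enabling interrupts before NAPI
means a poll scheduled from the reset path is silently lost.

#include <stdbool.h>
#include <stdio.h>

static bool napi_enabled;
static bool poll_lost;

static void schedule_napi(void)
{
    if (!napi_enabled)
        poll_lost = true;   /* the bug: scheduled before enable */
}

/* Enabling device interrupts can immediately schedule a poll. */
static void reset_hw(void)
{
    schedule_napi();
}

static void open_old(void)   { reset_hw(); napi_enabled = true; }
static void open_fixed(void) { napi_enabled = true; reset_hw(); }

int main(void)
{
    open_old();
    printf("old order, poll lost: %d\n", poll_lost);    /* 1 */

    napi_enabled = false;
    poll_lost = false;
    open_fixed();
    printf("fixed order, poll lost: %d\n", poll_lost);  /* 0 */
    return 0;
}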


2017-06-12 16:12:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 32/90] dmaengine: ep93xx: Always start from BASE0

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Sverdlin <[email protected]>

commit 0037ae47812b1f431cc602100d1d51f37d77b61e upstream.

The current buffer is being reset to zero on device_free_chan_resources()
but not on device_terminate_all(). It could happen that HW is restarted and
expects BASE0 to be used, but the driver is not synchronized and will start
from BASE1. One solution is to reset the buffer explicitly in
m2p_hw_setup().

Signed-off-by: Alexander Sverdlin <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/dma/ep93xx_dma.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/dma/ep93xx_dma.c
+++ b/drivers/dma/ep93xx_dma.c
@@ -325,6 +325,8 @@ static int m2p_hw_setup(struct ep93xx_dm
| M2P_CONTROL_ENABLE;
m2p_set_control(edmac, control);

+ edmac->buffer = 0;
+
return 0;
}



2017-06-12 15:38:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 09/90] net: bridge: start hello timer only if device is up

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <[email protected]>


[ Upstream commit aeb073241fe7a2b932e04e20c60e47718332877f ]

When the transition of NO_STP -> KERNEL_STP was fixed by always calling
mod_timer in br_stp_start, it introduced a new regression which causes
the timer to be armed even when the bridge is down, and since we stop
the timers in its ndo_stop() function, they never get disabled if the
device is destroyed before it's upped.

To reproduce:
$ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on;
ip l del br0; done;

CC: Xin Long <[email protected]>
CC: Ivan Vecera <[email protected]>
CC: Sebastian Ott <[email protected]>
Reported-by: Sebastian Ott <[email protected]>
Fixes: 6d18c732b95c ("bridge: start hello_timer when enabling KERNEL_STP in br_stp_start")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bridge/br_stp_if.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -166,7 +166,8 @@ static void br_stp_start(struct net_brid
br_debug(br, "using kernel STP\n");

/* To start timers on any ports left in blocking */
- mod_timer(&br->hello_timer, jiffies + br->hello_time);
+ if (br->dev->flags & IFF_UP)
+ mod_timer(&br->hello_timer, jiffies + br->hello_time);
br_port_state_selection(br);
}



2017-06-12 16:13:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 07/90] net: ping: do not abuse udp_poll()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 77d4b1d36926a9b8387c6b53eeba42bcaaffcea3 ]

Alexander reported various KASAN messages triggered in recent kernels.

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Cc: Solar Designer <[email protected]>
Cc: Vasiliy Kulikov <[email protected]>
Cc: Lorenzo Colitti <[email protected]>
Acked-By: Lorenzo Colitti <[email protected]>
Tested-By: Lorenzo Colitti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/ipv6.h | 1 +
net/ipv4/af_inet.c | 2 +-
net/ipv6/ping.c | 2 +-
net/ipv6/raw.c | 2 +-
4 files changed, 4 insertions(+), 3 deletions(-)

--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -958,6 +958,7 @@ int inet6_hash_connect(struct inet_timew
*/
extern const struct proto_ops inet6_stream_ops;
extern const struct proto_ops inet6_dgram_ops;
+extern const struct proto_ops inet6_sockraw_ops;

struct group_source_req;
struct group_filter;
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1014,7 +1014,7 @@ static struct inet_protosw inetsw_array[
.type = SOCK_DGRAM,
.protocol = IPPROTO_ICMP,
.prot = &ping_prot,
- .ops = &inet_dgram_ops,
+ .ops = &inet_sockraw_ops,
.flags = INET_PROTOSW_REUSE,
},

--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -50,7 +50,7 @@ static struct inet_protosw pingv6_protos
.type = SOCK_DGRAM,
.protocol = IPPROTO_ICMPV6,
.prot = &pingv6_prot,
- .ops = &inet6_dgram_ops,
+ .ops = &inet6_sockraw_ops,
.flags = INET_PROTOSW_REUSE,
};

--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1303,7 +1303,7 @@ void raw6_proc_exit(void)
#endif /* CONFIG_PROC_FS */

/* Same as inet6_dgram_ops, sans udp_poll. */
-static const struct proto_ops inet6_sockraw_ops = {
+const struct proto_ops inet6_sockraw_ops = {
.family = PF_INET6,
.owner = THIS_MODULE,
.release = inet6_release,


2017-06-12 15:38:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 05/90] vxlan: fix use-after-free on deletion

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mark Bloch <[email protected]>


[ Upstream commit a53cb29b0af346af44e4abf13d7e59f807fba690 ]

Adding a vxlan interface to a socket isn't symmetrical: while adding
is done in vxlan_open(), the deletion is done in vxlan_dellink().
This can cause a use-after-free error when we close the vxlan
interface before deleting it.

We add vxlan_vs_del_dev() to match vxlan_vs_add_dev() and call
it from vxlan_stop() to match the call from vxlan_open().

Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Acked-by: Jiri Benc <[email protected]>
Tested-by: Roi Dayan <[email protected]>
Signed-off-by: Mark Bloch <[email protected]>
Acked-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/vxlan.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -77,6 +77,8 @@ static const u8 all_zeros_mac[ETH_ALEN];

static int vxlan_sock_add(struct vxlan_dev *vxlan);

+static void vxlan_vs_del_dev(struct vxlan_dev *vxlan);
+
/* per-network namespace private data for this module */
struct vxlan_net {
struct list_head vxlan_list;
@@ -1052,6 +1054,8 @@ static void __vxlan_sock_release(struct

static void vxlan_sock_release(struct vxlan_dev *vxlan)
{
+ vxlan_vs_del_dev(vxlan);
+
__vxlan_sock_release(vxlan->vn4_sock);
#if IS_ENABLED(CONFIG_IPV6)
__vxlan_sock_release(vxlan->vn6_sock);
@@ -2255,6 +2259,15 @@ static void vxlan_cleanup(unsigned long
mod_timer(&vxlan->age_timer, next_timer);
}

+static void vxlan_vs_del_dev(struct vxlan_dev *vxlan)
+{
+ struct vxlan_net *vn = net_generic(vxlan->net, vxlan_net_id);
+
+ spin_lock(&vn->sock_lock);
+ hlist_del_init_rcu(&vxlan->hlist);
+ spin_unlock(&vn->sock_lock);
+}
+
static void vxlan_vs_add_dev(struct vxlan_sock *vs, struct vxlan_dev *vxlan)
{
struct vxlan_net *vn = net_generic(vxlan->net, vxlan_net_id);
@@ -3028,12 +3041,6 @@ static int vxlan_newlink(struct net *src
static void vxlan_dellink(struct net_device *dev, struct list_head *head)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
- struct vxlan_net *vn = net_generic(vxlan->net, vxlan_net_id);
-
- spin_lock(&vn->sock_lock);
- if (!hlist_unhashed(&vxlan->hlist))
- hlist_del_rcu(&vxlan->hlist);
- spin_unlock(&vn->sock_lock);

gro_cells_destroy(&vxlan->gro_cells);
list_del(&vxlan->next);


2017-06-12 15:38:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 03/90] cxgb4: avoid enabling napi twice to the same queue

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ganesh Goudar <[email protected]>


[ Upstream commit e7519f9926f1d0d11c776eb0475eb098c7760f68 ]

Take the uld mutex to avoid a race between cxgb_up() and
cxgb4_register_uld() enabling napi for the same uld
queue.

Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2714,10 +2714,14 @@ static int cxgb_up(struct adapter *adap)
if (err)
goto irq_err;
}
+
+ mutex_lock(&uld_mutex);
enable_rx(adap);
t4_sge_start(adap);
t4_intr_enable(adap);
adap->flags |= FULL_INIT_DONE;
+ mutex_unlock(&uld_mutex);
+
notify_ulds(adap, CXGB4_STATE_UP);
#if IS_ENABLED(CONFIG_IPV6)
update_clip(adap);


2017-06-12 16:14:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 26/90] nfsd: Fix up the "supattr_exclcreat" attributes

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <[email protected]>

commit b26b78cb726007533d81fdf90a62e915002ef5c8 upstream.

If an NFSv4 client asks us for the supattr_exclcreat, then we must
not return attributes that are unsupported by this minor version.

Signed-off-by: Trond Myklebust <[email protected]>
Fixes: 75976de6556f ("NFSD: Return word2 bitmask if setting security..,")
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/nfsd/nfs4xdr.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2753,9 +2753,16 @@ out_acl:
}
#endif /* CONFIG_NFSD_PNFS */
if (bmval2 & FATTR4_WORD2_SUPPATTR_EXCLCREAT) {
- status = nfsd4_encode_bitmap(xdr, NFSD_SUPPATTR_EXCLCREAT_WORD0,
- NFSD_SUPPATTR_EXCLCREAT_WORD1,
- NFSD_SUPPATTR_EXCLCREAT_WORD2);
+ u32 supp[3];
+
+ supp[0] = nfsd_suppattrs0(minorversion);
+ supp[1] = nfsd_suppattrs1(minorversion);
+ supp[2] = nfsd_suppattrs2(minorversion);
+ supp[0] &= NFSD_SUPPATTR_EXCLCREAT_WORD0;
+ supp[1] &= NFSD_SUPPATTR_EXCLCREAT_WORD1;
+ supp[2] &= NFSD_SUPPATTR_EXCLCREAT_WORD2;
+
+ status = nfsd4_encode_bitmap(xdr, supp[0], supp[1], supp[2]);
if (status)
goto out;
}
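
A one-line sketch of the masking the fix introduces; the word values are
made up, not the real NFSD attribute masks:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t supported_by_minor = 0x00ff00ffu;  /* illustrative */
    uint32_t exclcreat_word     = 0x0f0f0f0fu;  /* illustrative */

    /* Encode only what this minor version actually supports. */
    printf("encoded: 0x%08x\n", supported_by_minor & exclcreat_word);
    return 0;
}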


2017-06-12 15:38:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 18/90] arch/sparc: support NR_CPUS = 4096

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jane Chu <[email protected]>


[ Upstream commit c79a13734d104b5b147d7cb0870276ccdd660dae ]

Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.

To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that would break; such offsets can only reach within one page.

Orabug: 25505750

Signed-off-by: Jane Chu <[email protected]>

Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/Kconfig | 4 ++--
arch/sparc/kernel/irq_64.c | 17 +++++++++++++----
2 files changed, 15 insertions(+), 6 deletions(-)

--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -182,9 +182,9 @@ config NR_CPUS
int "Maximum number of CPUs"
depends on SMP
range 2 32 if SPARC32
- range 2 1024 if SPARC64
+ range 2 4096 if SPARC64
default 32 if SPARC32
- default 64 if SPARC64
+ default 4096 if SPARC64

source kernel/Kconfig.hz

--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -1034,17 +1034,26 @@ static void __init init_cpu_send_mondo_i
{
#ifdef CONFIG_SMP
unsigned long page;
+ void *mondo, *p;

- BUILD_BUG_ON((NR_CPUS * sizeof(u16)) > (PAGE_SIZE - 64));
+ BUILD_BUG_ON((NR_CPUS * sizeof(u16)) > PAGE_SIZE);
+
+ /* Make sure mondo block is 64byte aligned */
+ p = kzalloc(127, GFP_KERNEL);
+ if (!p) {
+ prom_printf("SUN4V: Error, cannot allocate mondo block.\n");
+ prom_halt();
+ }
+ mondo = (void *)(((unsigned long)p + 63) & ~0x3f);
+ tb->cpu_mondo_block_pa = __pa(mondo);

page = get_zeroed_page(GFP_KERNEL);
if (!page) {
- prom_printf("SUN4V: Error, cannot allocate cpu mondo page.\n");
+ prom_printf("SUN4V: Error, cannot allocate cpu list page.\n");
prom_halt();
}

- tb->cpu_mondo_block_pa = __pa(page);
- tb->cpu_list_pa = __pa(page + 64);
+ tb->cpu_list_pa = __pa(page);
#endif
}
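
A userspace sketch of the align-up trick used for the mondo block above:
allocating 127 bytes guarantees that a 64-byte-aligned, 64-byte block
fits somewhere inside the allocation.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(127);
    uintptr_t mondo;

    if (!p)
        return 1;
    /* Round up to the next 64-byte boundary, as the patch does. */
    mondo = ((uintptr_t)p + 63) & ~(uintptr_t)0x3f;

    printf("raw=%p aligned=0x%lx\n", p, (unsigned long)mondo);
    free(p);
    return 0;
}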



2017-06-12 15:38:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 02/90] ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <[email protected]>


[ Upstream commit 6e80ac5cc992ab6256c3dae87f7e57db15e1a58c ]

xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <[email protected]>
Acked-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/xfrm6_mode_ro.c | 2 ++
net/ipv6/xfrm6_mode_transport.c | 2 ++
2 files changed, 4 insertions(+)

--- a/net/ipv6/xfrm6_mode_ro.c
+++ b/net/ipv6/xfrm6_mode_ro.c
@@ -47,6 +47,8 @@ static int xfrm6_ro_output(struct xfrm_s
iph = ipv6_hdr(skb);

hdr_len = x->type->hdr_offset(x, skb, &prevhdr);
+ if (hdr_len < 0)
+ return hdr_len;
skb_set_mac_header(skb, (prevhdr - x->props.header_len) - skb->data);
skb_set_network_header(skb, -x->props.header_len);
skb->transport_header = skb->network_header + hdr_len;
--- a/net/ipv6/xfrm6_mode_transport.c
+++ b/net/ipv6/xfrm6_mode_transport.c
@@ -28,6 +28,8 @@ static int xfrm6_transport_output(struct
iph = ipv6_hdr(skb);

hdr_len = x->type->hdr_offset(x, skb, &prevhdr);
+ if (hdr_len < 0)
+ return hdr_len;
skb_set_mac_header(skb, (prevhdr - x->props.header_len) - skb->data);
skb_set_network_header(skb, -x->props.header_len);
skb->transport_header = skb->network_header + hdr_len;


2017-06-12 16:15:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 20/90] ptrace: Properly initialize ptracer_cred on fork

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric W. Biederman <[email protected]>

commit c70d9d809fdeecedb96972457ee45c49a232d97f upstream.

When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default. This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced, this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task.

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork. Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai <[email protected]>
Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/ptrace.h | 7 +++++--
kernel/ptrace.c | 20 +++++++++++++-------
2 files changed, 18 insertions(+), 9 deletions(-)

--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -50,7 +50,8 @@ extern int ptrace_request(struct task_st
unsigned long addr, unsigned long data);
extern void ptrace_notify(int exit_code);
extern void __ptrace_link(struct task_struct *child,
- struct task_struct *new_parent);
+ struct task_struct *new_parent,
+ const struct cred *ptracer_cred);
extern void __ptrace_unlink(struct task_struct *child);
extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
#define PTRACE_MODE_READ 0x01
@@ -202,7 +203,7 @@ static inline void ptrace_init_task(stru

if (unlikely(ptrace) && current->ptrace) {
child->ptrace = current->ptrace;
- __ptrace_link(child, current->parent);
+ __ptrace_link(child, current->parent, current->ptracer_cred);

if (child->ptrace & PT_SEIZED)
task_set_jobctl_pending(child, JOBCTL_TRAP_STOP);
@@ -211,6 +212,8 @@ static inline void ptrace_init_task(stru

set_tsk_thread_flag(child, TIF_SIGPENDING);
}
+ else
+ child->ptracer_cred = NULL;
}

/**
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -28,19 +28,25 @@
#include <linux/compat.h>


+void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
+ const struct cred *ptracer_cred)
+{
+ BUG_ON(!list_empty(&child->ptrace_entry));
+ list_add(&child->ptrace_entry, &new_parent->ptraced);
+ child->parent = new_parent;
+ child->ptracer_cred = get_cred(ptracer_cred);
+}
+
/*
* ptrace a task: make the debugger its new parent and
* move it to the ptrace list.
*
* Must be called with the tasklist lock write-held.
*/
-void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
+static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
{
- BUG_ON(!list_empty(&child->ptrace_entry));
- list_add(&child->ptrace_entry, &new_parent->ptraced);
- child->parent = new_parent;
rcu_read_lock();
- child->ptracer_cred = get_cred(__task_cred(new_parent));
+ __ptrace_link(child, new_parent, __task_cred(new_parent));
rcu_read_unlock();
}

@@ -353,7 +359,7 @@ static int ptrace_attach(struct task_str
flags |= PT_SEIZED;
task->ptrace = flags;

- __ptrace_link(task, current);
+ ptrace_link(task, current);

/* SEIZE doesn't trap tracee on attach */
if (!seize)
@@ -420,7 +426,7 @@ static int ptrace_traceme(void)
*/
if (!ret && !(current->real_parent->flags & PF_EXITING)) {
current->ptrace = PT_PTRACED;
- __ptrace_link(current, current->real_parent);
+ ptrace_link(current, current->real_parent);
}
}
write_unlock_irq(&tasklist_lock);
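
A toy model of the fork behaviour described above (the struct and cred
string are illustrative): copying the parent task wholesale carries the
stale ptracer_cred along unless the non-ptraced path clears it.

#include <stdio.h>

struct task {
    const char *ptracer_cred;
};

/* fork copies the whole task_struct by default. */
static struct task fork_task(const struct task *parent, int traced)
{
    struct task child = *parent;        /* memcpy-style copy */

    if (!traced)
        child.ptracer_cred = NULL;      /* the fix: reset on fork */
    return child;
}

int main(void)
{
    struct task parent = { "debugger-cred" };
    struct task child = fork_task(&parent, 0);

    printf("child cred: %s\n",
           child.ptracer_cred ? child.ptracer_cred : "(null)");
    return 0;
}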


2017-06-12 16:15:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 21/90] KEYS: fix dereferencing NULL payload with nonzero length

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit 5649645d725c73df4302428ee4e02c869248b4c5 upstream.

sys_add_key() and the KEYCTL_UPDATE operation of sys_keyctl() allowed a
NULL payload with nonzero length to be passed to the key type's
->preparse(), ->instantiate(), and/or ->update() methods. Various key
types including asymmetric, cifs.idmap, cifs.spnego, and pkcs7_test did
not handle this case, allowing an unprivileged user to trivially cause a
NULL pointer dereference (kernel oops) if one of these key types was
present. Fix it by doing the copy_from_user() when 'plen' is nonzero
rather than when '_payload' is non-NULL, causing the syscall to fail
with EFAULT as expected when an invalid buffer is specified.

Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: James Morris <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
security/keys/keyctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -97,7 +97,7 @@ SYSCALL_DEFINE5(add_key, const char __us
/* pull the payload in if one was supplied */
payload = NULL;

- if (_payload) {
+ if (plen) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
if (!payload) {
@@ -327,7 +327,7 @@ long keyctl_update_key(key_serial_t id,

/* pull the payload in if one was supplied */
payload = NULL;
- if (_payload) {
+ if (plen) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
if (!payload)


2017-06-12 16:16:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 19/90] serial: ifx6x60: fix use-after-free on module unload

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit 1e948479b3d63e3ac0ecca13cbf4921c7d17c168 upstream.

Make sure to deregister the SPI driver before releasing the tty driver
to avoid use-after-free in the SPI remove callback where the tty
devices are deregistered.

Fixes: 72d4724ea54c ("serial: ifx6x60: Add modem power off function in the platform reboot process")
Cc: Jun Chen <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/serial/ifx6x60.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/tty/serial/ifx6x60.c
+++ b/drivers/tty/serial/ifx6x60.c
@@ -1378,9 +1378,9 @@ static struct spi_driver ifx_spi_driver
static void __exit ifx_spi_exit(void)
{
/* unregister */
+ spi_unregister_driver(&ifx_spi_driver);
tty_unregister_driver(tty_drv);
put_tty_driver(tty_drv);
- spi_unregister_driver(&ifx_spi_driver);
unregister_reboot_notifier(&ifx_modem_reboot_notifier_block);
}



2017-06-12 16:17:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 17/90] sparc64: delete old wrap code

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>


[ Upstream commit 0197e41ce70511dc3b71f7fefa1a676e2b5cd60b ]

The old method, which used an xcall and softint to get a new context id,
is deleted; it is replaced by a method using per_cpu_secondary_mm
without an xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/mmu_context_64.h | 6 ------
arch/sparc/include/asm/pil.h | 1 -
arch/sparc/kernel/kernel.h | 1 -
arch/sparc/kernel/smp_64.c | 31 -------------------------------
arch/sparc/kernel/ttable_64.S | 2 +-
arch/sparc/mm/ultra.S | 5 -----
6 files changed, 1 insertion(+), 45 deletions(-)

--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -19,12 +19,6 @@ extern unsigned long mmu_context_bmap[];

DECLARE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm);
void get_new_mmu_context(struct mm_struct *mm);
-#ifdef CONFIG_SMP
-void smp_new_mmu_context_version(void);
-#else
-#define smp_new_mmu_context_version() do { } while (0)
-#endif
-
int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
void destroy_context(struct mm_struct *mm);

--- a/arch/sparc/include/asm/pil.h
+++ b/arch/sparc/include/asm/pil.h
@@ -20,7 +20,6 @@
#define PIL_SMP_CALL_FUNC 1
#define PIL_SMP_RECEIVE_SIGNAL 2
#define PIL_SMP_CAPTURE 3
-#define PIL_SMP_CTX_NEW_VERSION 4
#define PIL_DEVICE_IRQ 5
#define PIL_SMP_CALL_FUNC_SNGL 6
#define PIL_DEFERRED_PCR_WORK 7
--- a/arch/sparc/kernel/kernel.h
+++ b/arch/sparc/kernel/kernel.h
@@ -37,7 +37,6 @@ void handle_stdfmna(struct pt_regs *regs
/* smp_64.c */
void __irq_entry smp_call_function_client(int irq, struct pt_regs *regs);
void __irq_entry smp_call_function_single_client(int irq, struct pt_regs *regs);
-void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs);
void __irq_entry smp_penguin_jailcell(int irq, struct pt_regs *regs);
void __irq_entry smp_receive_signal_client(int irq, struct pt_regs *regs);

--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -959,37 +959,6 @@ void flush_dcache_page_all(struct mm_str
preempt_enable();
}

-void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
-{
- struct mm_struct *mm;
- unsigned long flags;
-
- clear_softint(1 << irq);
-
- /* See if we need to allocate a new TLB context because
- * the version of the one we are using is now out of date.
- */
- mm = current->active_mm;
- if (unlikely(!mm || (mm == &init_mm)))
- return;
-
- spin_lock_irqsave(&mm->context.lock, flags);
-
- if (unlikely(!CTX_VALID(mm->context)))
- get_new_mmu_context(mm);
-
- spin_unlock_irqrestore(&mm->context.lock, flags);
-
- load_secondary_context(mm);
- __flush_tlb_mm(CTX_HWBITS(mm->context),
- SECONDARY_CONTEXT);
-}
-
-void smp_new_mmu_context_version(void)
-{
- smp_cross_call(&xcall_new_mmu_context_version, 0, 0, 0);
-}
-
#ifdef CONFIG_KGDB
void kgdb_roundup_cpus(unsigned long flags)
{
--- a/arch/sparc/kernel/ttable_64.S
+++ b/arch/sparc/kernel/ttable_64.S
@@ -50,7 +50,7 @@ tl0_resv03e: BTRAP(0x3e) BTRAP(0x3f) BTR
tl0_irq1: TRAP_IRQ(smp_call_function_client, 1)
tl0_irq2: TRAP_IRQ(smp_receive_signal_client, 2)
tl0_irq3: TRAP_IRQ(smp_penguin_jailcell, 3)
-tl0_irq4: TRAP_IRQ(smp_new_mmu_context_version_client, 4)
+tl0_irq4: BTRAP(0x44)
#else
tl0_irq1: BTRAP(0x41)
tl0_irq2: BTRAP(0x42)
--- a/arch/sparc/mm/ultra.S
+++ b/arch/sparc/mm/ultra.S
@@ -971,11 +971,6 @@ xcall_capture:
wr %g0, (1 << PIL_SMP_CAPTURE), %set_softint
retry

- .globl xcall_new_mmu_context_version
-xcall_new_mmu_context_version:
- wr %g0, (1 << PIL_SMP_CTX_NEW_VERSION), %set_softint
- retry
-
#ifdef CONFIG_KGDB
.globl xcall_kgdb_capture
xcall_kgdb_capture:


2017-06-12 15:37:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 01/90] bnx2x: Fix Multi-Cos

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Mintz, Yuval" <[email protected]>


[ Upstream commit 3968d38917eb9bd0cd391265f6c9c538d9b33ffa ]

Apparently multi-cos hasn't been working for bnx2x for quite some time -
the driver implements ndo_select_queue() to allow queue-selection
for FCoE, but the regular L2 flow would cause it to modulo the
fallback's result by the number of queues.
The fallback would return a queue matching the needed tc
[via __skb_tx_hash()], but since the modulo is by the number of TSS
queues, where the number of TCs is not accounted for, transmission would
always be done by a queue configured into using TC0.

Fixes: ada7c19e6d27 ("bnx2x: use XPS if possible for bnx2x_select_queue instead of pure hash")
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1949,7 +1949,7 @@ u16 bnx2x_select_queue(struct net_device
}

/* select a non-FCoE queue */
- return fallback(dev, skb) % BNX2X_NUM_ETH_QUEUES(bp);
+ return fallback(dev, skb) % (BNX2X_NUM_ETH_QUEUES(bp) * bp->max_cos);
}

void bnx2x_set_num_queues(struct bnx2x *bp)
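
The arithmetic in one line, with illustrative queue counts: a hash that
targets a non-zero traffic class survives the new modulo but was folded
back into TC0 by the old one.

#include <stdio.h>

int main(void)
{
    unsigned int num_eth_queues = 4;  /* TSS queues per traffic class */
    unsigned int max_cos = 3;         /* traffic classes (illustrative) */
    unsigned int hash = 6;            /* fallback result aimed at TC1 */

    printf("old queue: %u\n", hash % num_eth_queues);              /* 2 */
    printf("new queue: %u\n", hash % (num_eth_queues * max_cos));  /* 6 */
    return 0;
}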


2017-06-12 15:37:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 11/90] sparc: Machine description indices can vary

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: James Clarke <[email protected]>


[ Upstream commit c982aa9c304bf0b9a7522fd118fed4afa5a0263c ]

VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.

Signed-off-by: James Clarke <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/vio.h | 1
arch/sparc/kernel/vio.c | 68 ++++++++++++++++++++++++++++++++++++++++---
2 files changed, 65 insertions(+), 4 deletions(-)

--- a/arch/sparc/include/asm/vio.h
+++ b/arch/sparc/include/asm/vio.h
@@ -327,6 +327,7 @@ struct vio_dev {
int compat_len;

u64 dev_no;
+ u64 id;

unsigned long channel_id;

--- a/arch/sparc/kernel/vio.c
+++ b/arch/sparc/kernel/vio.c
@@ -284,13 +284,16 @@ static struct vio_dev *vio_create_one(st
if (!id) {
dev_set_name(&vdev->dev, "%s", bus_id_name);
vdev->dev_no = ~(u64)0;
+ vdev->id = ~(u64)0;
} else if (!cfg_handle) {
dev_set_name(&vdev->dev, "%s-%llu", bus_id_name, *id);
vdev->dev_no = *id;
+ vdev->id = ~(u64)0;
} else {
dev_set_name(&vdev->dev, "%s-%llu-%llu", bus_id_name,
*cfg_handle, *id);
vdev->dev_no = *cfg_handle;
+ vdev->id = *id;
}

vdev->dev.parent = parent;
@@ -333,27 +336,84 @@ static void vio_add(struct mdesc_handle
(void) vio_create_one(hp, node, &root_vdev->dev);
}

+struct vio_md_node_query {
+ const char *type;
+ u64 dev_no;
+ u64 id;
+};
+
static int vio_md_node_match(struct device *dev, void *arg)
{
+ struct vio_md_node_query *query = (struct vio_md_node_query *) arg;
struct vio_dev *vdev = to_vio_dev(dev);

- if (vdev->mp == (u64) arg)
- return 1;
+ if (vdev->dev_no != query->dev_no)
+ return 0;
+ if (vdev->id != query->id)
+ return 0;
+ if (strcmp(vdev->type, query->type))
+ return 0;

- return 0;
+ return 1;
}

static void vio_remove(struct mdesc_handle *hp, u64 node)
{
+ const char *type;
+ const u64 *id, *cfg_handle;
+ u64 a;
+ struct vio_md_node_query query;
struct device *dev;

- dev = device_find_child(&root_vdev->dev, (void *) node,
+ type = mdesc_get_property(hp, node, "device-type", NULL);
+ if (!type) {
+ type = mdesc_get_property(hp, node, "name", NULL);
+ if (!type)
+ type = mdesc_node_name(hp, node);
+ }
+
+ query.type = type;
+
+ id = mdesc_get_property(hp, node, "id", NULL);
+ cfg_handle = NULL;
+ mdesc_for_each_arc(a, hp, node, MDESC_ARC_TYPE_BACK) {
+ u64 target;
+
+ target = mdesc_arc_target(hp, a);
+ cfg_handle = mdesc_get_property(hp, target,
+ "cfg-handle", NULL);
+ if (cfg_handle)
+ break;
+ }
+
+ if (!id) {
+ query.dev_no = ~(u64)0;
+ query.id = ~(u64)0;
+ } else if (!cfg_handle) {
+ query.dev_no = *id;
+ query.id = ~(u64)0;
+ } else {
+ query.dev_no = *cfg_handle;
+ query.id = *id;
+ }
+
+ dev = device_find_child(&root_vdev->dev, &query,
vio_md_node_match);
if (dev) {
printk(KERN_INFO "VIO: Removing device %s\n", dev_name(dev));

device_unregister(dev);
put_device(dev);
+ } else {
+ if (!id)
+ printk(KERN_ERR "VIO: Removed unknown %s node.\n",
+ type);
+ else if (!cfg_handle)
+ printk(KERN_ERR "VIO: Removed unknown %s node %llu.\n",
+ type, *id);
+ else
+ printk(KERN_ERR "VIO: Removed unknown %s node %llu-%llu.\n",
+ type, *cfg_handle, *id);
}
}



2017-06-12 16:18:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 13/90] sparc64: combine activate_mm and switch_mm

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>


[ Upstream commit 14d0334c6748ff2aedb3f2f7fdc51ee90a9b54e7 ]

The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/mmu_context_64.h | 21 +--------------------
1 file changed, 1 insertion(+), 20 deletions(-)

--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -131,26 +131,7 @@ static inline void switch_mm(struct mm_s
}

#define deactivate_mm(tsk,mm) do { } while (0)
-
-/* Activate a new MM instance for the current task. */
-static inline void activate_mm(struct mm_struct *active_mm, struct mm_struct *mm)
-{
- unsigned long flags;
- int cpu;
-
- spin_lock_irqsave(&mm->context.lock, flags);
- if (!CTX_VALID(mm->context))
- get_new_mmu_context(mm);
- cpu = smp_processor_id();
- if (!cpumask_test_cpu(cpu, mm_cpumask(mm)))
- cpumask_set_cpu(cpu, mm_cpumask(mm));
-
- load_secondary_context(mm);
- __flush_tlb_mm(CTX_HWBITS(mm->context), SECONDARY_CONTEXT);
- tsb_context_switch(mm);
- spin_unlock_irqrestore(&mm->context.lock, flags);
-}
-
+#define activate_mm(active_mm, mm) switch_mm(active_mm, mm, NULL)
#endif /* !(__ASSEMBLY__) */

#endif /* !(__SPARC64_MMU_CONTEXT_H) */


2017-06-12 16:19:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 10/90] sparc64: mm: fix copy_tsb to correctly copy huge page TSBs

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mike Kravetz <[email protected]>


[ Upstream commit 654f4807624a657f364417c2a7454f0df9961734 ]

When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB. copy_tsb has hard coded PAGE_SHIFT in these
calculations. However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used. As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB. When doing a hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB. We take a performance hit for the table walk miss and
recreation of these entries.

Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.

Suggested-by: Anthony Yznaga <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/kernel/tsb.S | 11 +++++++----
arch/sparc/mm/tsb.c | 7 +++++--
2 files changed, 12 insertions(+), 6 deletions(-)

--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -470,13 +470,16 @@ __tsb_context_switch:
.type copy_tsb,#function
copy_tsb: /* %o0=old_tsb_base, %o1=old_tsb_size
* %o2=new_tsb_base, %o3=new_tsb_size
+ * %o4=page_size_shift
*/
sethi %uhi(TSB_PASS_BITS), %g7
srlx %o3, 4, %o3
- add %o0, %o1, %g1 /* end of old tsb */
+ add %o0, %o1, %o1 /* end of old tsb */
sllx %g7, 32, %g7
sub %o3, 1, %o3 /* %o3 == new tsb hash mask */

+ mov %o4, %g1 /* page_size_shift */
+
661: prefetcha [%o0] ASI_N, #one_read
.section .tsb_phys_patch, "ax"
.word 661b
@@ -501,9 +504,9 @@ copy_tsb: /* %o0=old_tsb_base, %o1=old_
/* This can definitely be computed faster... */
srlx %o0, 4, %o5 /* Build index */
and %o5, 511, %o5 /* Mask index */
- sllx %o5, PAGE_SHIFT, %o5 /* Put into vaddr position */
+ sllx %o5, %g1, %o5 /* Put into vaddr position */
or %o4, %o5, %o4 /* Full VADDR. */
- srlx %o4, PAGE_SHIFT, %o4 /* Shift down to create index */
+ srlx %o4, %g1, %o4 /* Shift down to create index */
and %o4, %o3, %o4 /* Mask with new_tsb_nents-1 */
sllx %o4, 4, %o4 /* Shift back up into tsb ent offset */
TSB_STORE(%o2 + %o4, %g2) /* Store TAG */
@@ -511,7 +514,7 @@ copy_tsb: /* %o0=old_tsb_base, %o1=old_
TSB_STORE(%o2 + %o4, %g3) /* Store TTE */

80: add %o0, 16, %o0
- cmp %o0, %g1
+ cmp %o0, %o1
bne,pt %xcc, 90b
nop

--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -451,7 +451,8 @@ retry_tsb_alloc:
extern void copy_tsb(unsigned long old_tsb_base,
unsigned long old_tsb_size,
unsigned long new_tsb_base,
- unsigned long new_tsb_size);
+ unsigned long new_tsb_size,
+ unsigned long page_size_shift);
unsigned long old_tsb_base = (unsigned long) old_tsb;
unsigned long new_tsb_base = (unsigned long) new_tsb;

@@ -459,7 +460,9 @@ retry_tsb_alloc:
old_tsb_base = __pa(old_tsb_base);
new_tsb_base = __pa(new_tsb_base);
}
- copy_tsb(old_tsb_base, old_size, new_tsb_base, new_size);
+ copy_tsb(old_tsb_base, old_size, new_tsb_base, new_size,
+ tsb_index == MM_TSB_BASE ?
+ PAGE_SHIFT : REAL_HPAGE_SHIFT);
}

mm->context.tsb_block[tsb_index].tsb = new_tsb;
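
A sketch of the index computation being corrected, assuming the usual
sparc64 values PAGE_SHIFT == 13 and REAL_HPAGE_SHIFT == 22: the same
virtual address lands in different TSB slots depending on which shift is
used, which is exactly why huge-page entries went missing.

#include <stdio.h>

#define PAGE_SHIFT       13   /* 8K base pages (assumed) */
#define REAL_HPAGE_SHIFT 22   /* 4M real huge pages (assumed) */

static unsigned long tsb_index(unsigned long vaddr, unsigned int shift,
                               unsigned long nentries)
{
    return (vaddr >> shift) & (nentries - 1);
}

int main(void)
{
    unsigned long va = 0x400000;  /* one huge page into a mapping */

    printf("index with PAGE_SHIFT:       %lu\n",
           tsb_index(va, PAGE_SHIFT, 512));        /* 0 (wrong slot) */
    printf("index with REAL_HPAGE_SHIFT: %lu\n",
           tsb_index(va, REAL_HPAGE_SHIFT, 512));  /* 1 (correct)    */
    return 0;
}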


2017-06-12 16:19:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.4 12/90] sparc64: reset mm cpumask after wrap

4.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <[email protected]>


[ Upstream commit 588974857359861891f478a070b1dc7ae04a3880 ]

After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:

1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU

This patch implements the first solution.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/mm/init_64.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -708,6 +708,8 @@ void get_new_mmu_context(struct mm_struc
goto out;
}
}
+ if (mm->context.sparc64_ctx_val)
+ cpumask_clear(mm_cpumask(mm));
mmu_context_bmap[new_ctx>>6] |= (1UL << (new_ctx & 63));
new_ctx |= (tlb_context_cache & CTX_VERSION_MASK);
out:


2017-06-12 21:53:32

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/90] 4.4.72-stable review

On Mon, Jun 12, 2017 at 05:25:06PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.72 release.
> There are 90 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jun 14 15:25:30 UTC 2017.
> Anything received after that time might be too late.
>

Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 115 pass: 115 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

2017-06-13 00:45:16

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.4 00/90] 4.4.72-stable review

On 06/12/2017 09:25 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.72 release.
> There are 90 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Jun 14 15:25:30 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.72-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2017-07-28 13:53:39

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware

JFYI. We have encountered a regression after applying this patch on a
large ppc machine. While the patch is the right thing to do it doesn't
work well with the current vmalloc area size on ppc and large machines
where NUMA nodes are very far from each other. Just for reference,
the boot fails on such a machine with a bunch of warnings preceding it.
See http://lkml.kernel.org/r/[email protected]

It seems the right thing to do is to enlarge the vmalloc space on ppc
but this is not the case in the upstream kernel yet AFAIK. It is also
questionable whether this is stable material, but I will leave the
decision to you.

We have reverted this patch from our 4.4 based kernel.

On Mon 12-06-17 17:26:12, Greg KH wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Michael Ellerman <[email protected]>
>
> commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream.
>
> In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
> switched to the generic implementation of cpu_to_node(), which uses a percpu
> variable to hold the NUMA node for each CPU.
>
> Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
> of our percpu areas, leading to a chicken and egg problem. In practice what
> happens is when we are setting up the percpu areas, cpu_to_node() reports that
> all CPUs are on node 0, so we allocate all percpu areas on node 0.
>
> This is visible in the dmesg output, as all pcpu allocs being in group 0:
>
> pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
> pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
> pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
> pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
> pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
> pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
>
> To fix it we need an early_cpu_to_node() which can run prior to percpu being
> setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
> in. With the patch dmesg output shows two groups, 0 and 1:
>
> pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
> pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
> pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
> pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
> pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
> pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
>
> We can also check the data_offset in the paca of various CPUs, with the fix we
> see:
>
> CPU 0: data_offset = 0x0ffe8b0000
> CPU 24: data_offset = 0x1ffe5b0000
>
> And we can see from dmesg that CPU 24 has an allocation on node 1:
>
> node 0: [mem 0x0000000000000000-0x0000000fffffffff]
> node 1: [mem 0x0000001000000000-0x0000001fffffffff]
>
> Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
> Signed-off-by: Michael Ellerman <[email protected]>
> Reviewed-by: Nicholas Piggin <[email protected]>
> Signed-off-by: Michael Ellerman <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> arch/powerpc/include/asm/topology.h | 14 ++++++++++++++
> arch/powerpc/kernel/setup_64.c | 4 ++--
> 2 files changed, 16 insertions(+), 2 deletions(-)
>
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topolog
> extern int sysfs_add_device_to_node(struct device *dev, int nid);
> extern void sysfs_remove_device_from_node(struct device *dev, int nid);
>
> +static inline int early_cpu_to_node(int cpu)
> +{
> + int nid;
> +
> + nid = numa_cpu_lookup_table[cpu];
> +
> + /*
> + * Fall back to node 0 if nid is unset (it should be, except bugs).
> + * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
> + */
> + return (nid < 0) ? 0 : nid;
> +}
> #else
>
> +static inline int early_cpu_to_node(int cpu) { return 0; }
> +
> static inline void dump_numa_cpu_topology(void) {}
>
> static inline int sysfs_add_device_to_node(struct device *dev, int nid)
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -751,7 +751,7 @@ void __init setup_arch(char **cmdline_p)
>
> static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
> {
> - return __alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), size, align,
> + return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
> __pa(MAX_DMA_ADDRESS));
> }
>
> @@ -762,7 +762,7 @@ static void __init pcpu_fc_free(void *pt
>
> static int pcpu_cpu_distance(unsigned int from, unsigned int to)
> {
> - if (cpu_to_node(from) == cpu_to_node(to))
> + if (early_cpu_to_node(from) == early_cpu_to_node(to))
> return LOCAL_DISTANCE;
> else
> return REMOTE_DISTANCE;
>

--
Michal Hocko
SUSE Labs

2017-07-28 22:41:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware

On Fri, Jul 28, 2017 at 03:53:35PM +0200, Michal Hocko wrote:
> JFYI. We have encountered a regression after applying this patch on a
> large ppc machine. While the patch is the right thing to do it doesn't
> work well with the current vmalloc area size on ppc and large machines
> where NUMA nodes are very far from each other. Just for reference,
> the boot fails on such a machine with a bunch of warnings preceding it.
> See http://lkml.kernel.org/r/[email protected]
>
> It seems the right thing to do is to enlarge the vmalloc space on ppc
> but this is not the case in the upstream kernel yet AFAIK. It is also
> questionable whether this is stable material, but I will leave the
> decision to you.
>
> We have reverted this patch from our 4.4 based kernel.

But all is fine on newer kernels? That is odd.

I'll be glad to drop it, but should it be dropped from all stable trees?

thanks,

greg k-h

2017-07-31 06:41:20

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware

On Fri 28-07-17 15:41:47, Greg KH wrote:
> On Fri, Jul 28, 2017 at 03:53:35PM +0200, Michal Hocko wrote:
> > JFYI. We have encountered a regression after applying this patch on a
> > large ppc machine. While the patch is the right thing to do it doesn't
> > work well with the current vmalloc area size on ppc and large machines
> > where NUMA nodes are very far from each other. Just for reference,
> > the boot fails on such a machine with a bunch of warnings preceding it.
> > See http://lkml.kernel.org/r/[email protected]
> >
> > It seems the right thing to do is to enlarge the vmalloc space on ppc
> > but this is not the case in the upstream kernel yet AFAIK. It is also
> > questionable whether this is stable material, but I will leave the
> > decision to you.
> >
> > We have reverted this patch from our 4.4 based kernel.
>
> But all is fine on newer kernels? That is odd.

Newer kernels do not have an enlarged vmalloc space yet AFAIK, so they won't
work properly either. This bug is quite rare though because you need a
specific HW configuration to trigger the issue - namely NUMA nodes have
to be far away from each other in the physical memory space.

> I'll be glad to drop it, but should it be dropped from all stable trees?

Yes from all stable backports.
--
Michal Hocko
SUSE Labs

2017-08-03 19:30:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware

On Mon, Jul 31, 2017 at 08:41:16AM +0200, Michal Hocko wrote:
> On Fri 28-07-17 15:41:47, Greg KH wrote:
> > On Fri, Jul 28, 2017 at 03:53:35PM +0200, Michal Hocko wrote:
> > > JFYI. We have encountered a regression after applying this patch on a
> > > large ppc machine. While the patch is the right thing to do it doesn't
> > > work well with the current vmalloc area size on ppc and large machines
> > > where NUMA nodes are very far from each other. Just for reference,
> > > the boot fails on such a machine with a bunch of warnings preceding it.
> > > See http://lkml.kernel.org/r/[email protected]
> > >
> > > It seems the right thing to do is to enlarge the vmalloc space on ppc
> > > but this is not the case in the upstream kernel yet AFAIK. It is also
> > > questionable whether this is stable material, but I will leave the
> > > decision to you.
> > >
> > > We have reverted this patch from our 4.4 based kernel.
> >
> > But all is fine on newer kernels? That is odd.
>
> Newer kernels do not have an enlarged vmalloc space yet AFAIK, so they won't
> work properly either. This bug is quite rare though because you need a
> specific HW configuration to trigger the issue - namely NUMA nodes have
> to be far away from each other in the physical memory space.
>
> > I'll be glad to drop it, but should it be dropped from all stable trees?
>
> Yes from all stable backports.

Ok, now all reverted, thanks.

greg k-h