2014-11-06 22:37:16

by Kamal Mostafa

Subject: [3.13.y.z extended stable] Linux 3.13.11.11 stable review

This is the start of the review cycle for the Linux 3.13.11.11 stable kernel.

This version contains 162 new patches, summarized below. The new patches are
posted as replies to this message and also available in this git branch:

http://kernel.ubuntu.com/git?p=ubuntu/linux.git;h=linux-3.13.y-review;a=shortlog

git://kernel.ubuntu.com/ubuntu/linux.git linux-3.13.y-review

The review period for version 3.13.11.11 will be open for the next three days.
To report a problem, please reply to the relevant follow-up patch message.

For more information about the Linux 3.13.y.z extended stable kernel version,
see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable .

-Kamal

--
arch/arc/boot/dts/nsimosci.dts | 2 +-
arch/arc/include/asm/kgdb.h | 32 +-
arch/arm64/include/asm/compat.h | 4 +-
arch/mips/include/asm/ftrace.h | 4 +-
arch/mips/mm/tlbex.c | 6 +-
arch/sparc/Kconfig | 1 +
arch/sparc/include/asm/hypervisor.h | 11 +
arch/sparc/include/asm/irq_64.h | 9 +-
arch/sparc/include/asm/ldc.h | 5 +-
arch/sparc/include/asm/oplib_64.h | 3 +-
arch/sparc/include/asm/page_64.h | 33 +-
arch/sparc/include/asm/pgalloc_64.h | 28 +-
arch/sparc/include/asm/pgtable_64.h | 100 ++--
arch/sparc/include/asm/setup.h | 4 +
arch/sparc/include/asm/spitfire.h | 2 +
arch/sparc/include/asm/thread_info_64.h | 4 +-
arch/sparc/include/asm/tsb.h | 87 +++-
arch/sparc/include/asm/visasm.h | 8 +
arch/sparc/kernel/cpu.c | 12 +
arch/sparc/kernel/cpumap.c | 2 +
arch/sparc/kernel/ds.c | 4 +-
arch/sparc/kernel/dtlb_prot.S | 6 +-
arch/sparc/kernel/entry.h | 3 -
arch/sparc/kernel/head_64.S | 52 +-
arch/sparc/kernel/hvapi.c | 1 +
arch/sparc/kernel/hvcalls.S | 16 +
arch/sparc/kernel/hvtramp.S | 1 -
arch/sparc/kernel/ioport.c | 5 +-
arch/sparc/kernel/irq_64.c | 507 ++++++++++++-------
arch/sparc/kernel/ktlb.S | 125 +----
arch/sparc/kernel/ldc.c | 41 +-
arch/sparc/kernel/nmi.c | 1 -
arch/sparc/kernel/pcr.c | 47 +-
arch/sparc/kernel/perf_event.c | 10 +-
arch/sparc/kernel/process_64.c | 3 +
arch/sparc/kernel/setup_64.c | 36 +-
arch/sparc/kernel/smp_64.c | 8 +-
arch/sparc/kernel/sun4v_tlb_miss.S | 35 +-
arch/sparc/kernel/trampoline_64.S | 12 +-
arch/sparc/kernel/traps_64.c | 15 +-
arch/sparc/kernel/tsb.S | 6 +-
arch/sparc/kernel/viohs.c | 4 +-
arch/sparc/kernel/vmlinux.lds.S | 10 +-
arch/sparc/lib/NG4memcpy.S | 14 +-
arch/sparc/lib/memset.S | 18 +-
arch/sparc/mm/fault_64.c | 3 +
arch/sparc/mm/gup.c | 30 ++
arch/sparc/mm/init_64.c | 554 +++++++++++----------
arch/sparc/mm/init_64.h | 18 -
arch/sparc/power/hibernate_asm.S | 4 +-
arch/sparc/prom/bootstr_64.c | 5 +-
arch/sparc/prom/cif.S | 5 +-
arch/sparc/prom/init_64.c | 6 +-
arch/sparc/prom/p1275.c | 9 +-
arch/x86/ia32/ia32entry.S | 2 +-
arch/x86/include/asm/kvm_host.h | 16 +-
arch/x86/include/uapi/asm/vmx.h | 2 +
arch/x86/kernel/apic/apic.c | 4 +-
arch/x86/kernel/tsc.c | 5 +-
arch/x86/kvm/emulate.c | 250 +++++++---
arch/x86/kvm/i8254.c | 2 +
arch/x86/kvm/svm.c | 8 +-
arch/x86/kvm/vmx.c | 24 +-
arch/x86/kvm/x86.c | 38 +-
arch/x86/mm/pageattr.c | 2 +-
block/scsi_ioctl.c | 3 +-
drivers/acpi/ec.c | 28 +-
drivers/char/random.c | 10 +-
drivers/cpufreq/cpufreq.c | 23 +-
drivers/cpufreq/intel_pstate.c | 112 ++++-
drivers/edac/cpc925_edac.c | 2 +-
drivers/edac/e7xxx_edac.c | 2 +-
drivers/edac/i3200_edac.c | 4 +-
drivers/edac/i82860_edac.c | 2 +-
drivers/gpu/drm/cirrus/cirrus_drv.c | 2 +
drivers/gpu/drm/qxl/qxl_display.c | 16 +-
drivers/gpu/drm/radeon/cik_sdma.c | 21 +-
drivers/gpu/drm/radeon/dce6_afmt.c | 6 +-
drivers/gpu/drm/radeon/evergreen_hdmi.c | 6 +-
drivers/gpu/drm/radeon/kv_dpm.c | 19 +-
drivers/gpu/drm/radeon/r600_dma.c | 21 +-
drivers/gpu/drm/radeon/radeon.h | 2 +
drivers/gpu/drm/radeon/si_dpm.c | 2 +-
drivers/iio/common/st_sensors/st_sensors_buffer.c | 2 +-
drivers/infiniband/ulp/isert/ib_isert.c | 4 +-
drivers/input/serio/i8042-x86ia64io.h | 16 +
drivers/net/bonding/bond_main.c | 17 +-
drivers/net/ethernet/broadcom/tg3.c | 23 +-
drivers/net/ethernet/cadence/macb.c | 11 -
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 88 ++--
drivers/net/hyperv/netvsc_drv.c | 3 +-
drivers/net/macvlan.c | 1 +
drivers/net/macvtap.c | 18 +-
drivers/net/team/team.c | 4 +-
drivers/net/vxlan.c | 8 +-
drivers/net/wireless/ath/ar5523/ar5523.c | 3 +-
drivers/net/wireless/ath/ath10k/mac.c | 3 +-
drivers/net/wireless/ath/ath9k/main.c | 3 +-
drivers/net/wireless/ath/carl9170/main.c | 4 +-
.../net/wireless/brcm80211/brcmsmac/mac80211_if.c | 3 +-
drivers/net/wireless/cw1200/sta.c | 3 +-
drivers/net/wireless/cw1200/sta.h | 3 +-
drivers/net/wireless/iwlegacy/common.c | 3 +-
drivers/net/wireless/iwlegacy/common.h | 3 +-
drivers/net/wireless/iwlwifi/dvm/mac80211.c | 27 +-
drivers/net/wireless/iwlwifi/iwl-trans.h | 2 +
drivers/net/wireless/iwlwifi/mvm/fw-api-power.h | 35 +-
drivers/net/wireless/iwlwifi/mvm/fw-api.h | 1 +
drivers/net/wireless/iwlwifi/mvm/fw.c | 9 +
drivers/net/wireless/iwlwifi/mvm/ops.c | 1 +
drivers/net/wireless/iwlwifi/mvm/tx.c | 8 +-
drivers/net/wireless/iwlwifi/pcie/trans.c | 16 +-
drivers/net/wireless/mac80211_hwsim.c | 4 +-
drivers/net/wireless/p54/main.c | 3 +-
drivers/net/wireless/rt2x00/rt2800usb.c | 1 +
drivers/net/wireless/rt2x00/rt2x00.h | 3 +-
drivers/net/wireless/rt2x00/rt2x00mac.c | 3 +-
drivers/net/wireless/rtlwifi/core.c | 3 +-
drivers/net/wireless/ti/wlcore/main.c | 3 +-
drivers/pci/pci-sysfs.c | 8 +-
drivers/scsi/qla2xxx/tcm_qla2xxx.c | 11 +-
drivers/spi/spi-pl022.c | 2 +-
drivers/staging/iio/adc/mxs-lradc.c | 12 +-
drivers/staging/iio/impedance-analyzer/ad5933.c | 15 +-
drivers/target/target_core_device.c | 3 +-
drivers/target/target_core_pr.c | 6 +-
drivers/target/target_core_pr.h | 2 +-
drivers/target/target_core_tpg.c | 8 +
drivers/target/target_core_transport.c | 3 +-
drivers/usb/dwc3/ep0.c | 4 +-
drivers/usb/dwc3/gadget.c | 17 +-
drivers/usb/dwc3/gadget.h | 2 +-
drivers/usb/gadget/f_acm.c | 7 +-
drivers/usb/gadget/udc-core.c | 5 +
drivers/usb/musb/musb_cppi41.c | 3 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/ftdi_sio.c | 3 +
drivers/usb/serial/ftdi_sio_ids.h | 12 +-
drivers/usb/serial/option.c | 10 +
fs/buffer.c | 3 +
fs/ext3/super.c | 7 -
fs/ext4/balloc.c | 13 +-
fs/ext4/bitmap.c | 12 +-
fs/ext4/ext4.h | 17 +-
fs/ext4/extents.c | 6 +-
fs/ext4/ialloc.c | 7 +-
fs/ext4/inline.c | 3 +-
fs/ext4/inode.c | 46 +-
fs/ext4/ioctl.c | 13 +-
fs/ext4/mmp.c | 6 +-
fs/ext4/namei.c | 73 ++-
fs/ext4/resize.c | 5 +-
fs/ext4/super.c | 32 +-
fs/ext4/xattr.c | 38 +-
fs/jbd2/recovery.c | 1 +
fs/namei.c | 3 +-
fs/nfsd/nfs4proc.c | 3 +-
fs/quota/dquot.c | 2 +-
include/drm/drm_pciids.h | 1 -
include/linux/compiler-gcc.h | 3 +
include/linux/compiler-intel.h | 7 +
include/linux/compiler.h | 4 +
include/linux/khugepaged.h | 17 +-
include/linux/mm.h | 1 +
include/linux/oom.h | 3 +
include/linux/string.h | 5 +-
include/net/dst.h | 16 +-
include/net/inet_connection_sock.h | 1 +
include/net/mac80211.h | 4 +-
include/net/sctp/command.h | 2 +-
include/net/sock.h | 1 -
include/net/tcp.h | 5 +-
kernel/freezer.c | 3 +
kernel/futex.c | 22 +-
kernel/posix-timers.c | 1 +
kernel/power/hibernate.c | 8 +-
kernel/power/process.c | 40 +-
kernel/trace/trace_syscalls.c | 8 +-
lib/bitmap.c | 8 +-
lib/string.c | 16 +
mm/huge_memory.c | 15 +-
mm/mmap.c | 8 +-
mm/oom_kill.c | 17 +
mm/page_alloc.c | 8 +
mm/page_cgroup.c | 1 +
mm/truncate.c | 57 +++
net/bridge/br_private.h | 3 +
net/bridge/br_vlan.c | 13 +-
net/core/rtnetlink.c | 3 +-
net/core/skbuff.c | 3 +
net/ipv4/route.c | 6 +-
net/ipv4/tcp.c | 14 +-
net/ipv4/tcp_input.c | 8 +-
net/ipv4/tcp_ipv4.c | 5 +-
net/ipv4/tcp_output.c | 11 +-
net/ipv6/ip6_gre.c | 4 +-
net/ipv6/ip6_output.c | 4 +-
net/ipv6/sit.c | 6 +-
net/ipv6/tcp_ipv6.c | 3 +-
net/l2tp/l2tp_ppp.c | 3 +-
net/mac80211/driver-ops.h | 8 +-
net/mac80211/rate.c | 2 +-
net/mac80211/util.c | 2 +-
net/netlink/af_netlink.c | 2 +-
net/openvswitch/actions.c | 5 +
net/packet/af_packet.c | 17 +
net/packet/internal.h | 1 +
net/sctp/sm_statefuns.c | 19 +-
net/xfrm/xfrm_policy.c | 48 +-
security/integrity/evm/evm_main.c | 9 +-
security/integrity/ima/ima_appraise.c | 3 +
security/integrity/integrity.h | 1 +
sound/core/pcm_compat.c | 2 +
sound/core/pcm_native.c | 2 +-
sound/pci/hda/patch_hdmi.c | 15 +-
sound/pci/hda/patch_realtek.c | 3 +
sound/usb/quirks-table.h | 30 ++
virt/kvm/iommu.c | 8 +-
218 files changed, 2543 insertions(+), 1373 deletions(-)

Alex Deucher (5):
drm/radeon: fix speaker allocation setup
drm/radeon: use gart memory for DMA ring tests
drm/radeon/dpm: disable ulv support on SI
drm/radeon: dpm fixes for asrock systems
drm/radeon: remove invalid pci id

Allen Pais (3):
sparc64: correctly recognise M6 and M7 cpu type
sparc64: support M6 and M7 for building CPU distribution map
sparc64: cpu hardware caps support for sparc M6 and M7

Anatol Pomozov (1):
ALSA: pcm: use the same dma mmap codepath both for arm and arm64

Andreas Larsson (1):
sparc: Let memset return the address argument

Andrey Vagin (1):
tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)

Andy Honig (2):
KVM: x86: Prevent host from panicking on shared MSR writes.
KVM: x86: Improve thread safety in pit

Andy Lutomirski (2):
x86, apic: Handle a bad TSC more gracefully
x86_64, entry: Fix out of bounds read on sysenter

Anssi Hannula (1):
ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug

Anton Kolesov (1):
ARC: Update order of registers in KGDB to match GDB 7.5

Brian Silverman (1):
futex: Fix a race condition between REQUEUE_PI and task death

Cesar Eduardo Barros (1):
compiler: define OPTIMIZER_HIDE_VAR() macro

Cong Wang (1):
freezer: Do not freeze tasks killed by OOM killer

Cyril Brulebois (1):
wireless: rt2x00: add new rt2800usb device

Dan Williams (1):
USB: option: add Haier CE81B CDMA modem

Daniel Borkmann (2):
netlink: reset network header before passing to taps
random: add and use memzero_explicit() for clearing data

Daniel Hellstrom (1):
sparc32: dma_alloc_coherent must honour gfp flags

Daniele Palmas (1):
usb: option: add support for Telit LE910

Darrick J. Wong (4):
ext4: check EA value offset when loading
jbd2: free bh when descriptor block checksum fails
ext4: check s_chksum_driver when looking for bg csum presence
ext4: enable journal checksum when metadata checksum feature enabled

Dave Kleikamp (1):
sparc64: Increase size of boot string to 1024 bytes

David Daney (1):
MIPS: tlbex: Properly fix HUGE TLB Refill exception handler

David Rientjes (1):
mm, thp: fix collapsing of hugepages on madvise

David S. Miller (18):
sparc64: Do not disable interrupts in nmi_cpu_busy()
sparc64: Fix pcr_ops initialization and usage bugs.
sparc64: Fix corrupted thread fault code.
sparc64: Fix reversed start/end in flush_tlb_kernel_range()
sparc64: Fix lockdep warnings on reboot on Ultra-5
sparc64: Fix FPU register corruption with AES crypto offload.
sparc64: Do not define thread fpregs save area as zero-length array.
sparc64: Fix hibernation code refrence to PAGE_OFFSET.
sparc64: Switch to 4-level page tables.
sparc64: Define VA hole at run time, rather than at compile time.
sparc64: Adjust KTSB assembler to support larger physical addresses.
sparc64: Fix physical memory management regressions with large max_phys_bits.
sparc64: Use kernel page tables for vmemmap.
sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53.
sparc64: Adjust vmalloc region size based upon available virtual address bits.
sparc64: Kill unnecessary tables and increase MAX_BANKS.
sparc64: Fix register corruption in top-most kernel stack frame during boot.
sparc64: Implement __get_user_pages_fast().

Dexuan Cui (1):
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE

Dirk Brandewie (4):
cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers
intel_pstate: Don't lose sysfs settings during cpu offline
intel_pstate: Fix BYT frequency reporting
intel_pstate: Correct BYT VID values.

Dmitry Kasatkin (2):
ima: check xattr value length and type in the ima_inode_setxattr()
evm: check xattr value length and type in evm_inode_setxattr()

Dmitry Monakhov (3):
ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
ext4: Replace open coded mdata csum feature to helper function
ext4: move error report out of atomic context in ext4_init_block_bitmap()

Emmanuel Grumbach (4):
iwlwifi: configure the LTR
mac80211: add vif to flush call
iwlwifi: dvm: drop non VO frames when flushing
Revert "iwlwifi: mvm: treat EAPOLs like mgmt frames wrt rate"

Eric Dumazet (2):
packet: handle too big packets for PACKET_V3
gro: fix aggregation for skb using frag_list

Eric Rannaud (1):
fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

Eric Sandeen (1):
ext4: fix reservation overflow in ext4_da_write_begin

Fabio Estevam (2):
iio: mxs-lradc: Propagate the real error code on platform_get_irq() failure
iio: adc: mxs-lradc: Disable the clock on probe failure

Felipe Balbi (3):
usb: dwc3: gadget: fix set_halt() bug with pending transfers
usb: gadget: function: acm: make f_acm pass USB20CV Chapter9
usb: gadget: udc: core: fix kernel oops with soft-connect

Francesco Ruggeri (1):
net: allow macvlans to move to net namespace

Frans Klaver (1):
usb: serial: ftdi_sio: add Awinda Station and Dongle products

Gabriele Mazzotta (1):
cpufreq: intel_pstate: Reflect current no_turbo state correctly

Gerhard Stenzel (1):
vxlan: fix incorrect initializer in union vxlan_addr

Greg Kroah-Hartman (1):
PCI: Rename sysfs 'enabled' file back to 'enable'

Guillaume Nault (1):
l2tp: fix race while getting PMTU on PPP pseudo-wire

Hans de Goede (1):
Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544

Harsha Priya (1):
ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume

Imre Deak (1):
PM / Sleep: fix recovery during resuming from hibernation

J. Bruce Fields (1):
nfsd4: fix crash on unknown operation number

Jack Pham (1):
usb: dwc3: gadget: Properly initialize LINK TRB

Jan Kara (10):
ext4: don't check quota format when there are no quota files
vfs: fix data corruption when blocksize < pagesize for mmaped data
ext4: fix mmap data corruption when blocksize < pagesize
ext3: Don't check quota format when there are no quota files
quota: Properly return errors from dquot_writeback_dquots()
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
ext4: fix overflow when updating superblock backups after resize
ext4: fix oops when loading block bitmap failed
ext4: bail out from make_indexed_dir() on first error

Jason Baron (4):
i3200_edac: Report CE events properly
i82860_edac: Report CE events properly
cpc925_edac: Report UE events properly
e7xxx_edac: Report CE events properly

Jiri Benc (2):
rtnetlink: fix VF info size
openvswitch: fix panic with multiple vlan headers

Joe Lawrence (1):
team: avoid race condition in scheduling delayed work

Joern Engel (1):
qla_target: don't delete changed nacls

KY Srinivasan (1):
hyperv: Fix a bug in netvsc_start_xmit()

Karl Beldan (1):
mac80211: fix typo in starting baserate for rts_cts_rate_idx

Lars-Peter Clausen (2):
staging:iio:ad5933: Fix NULL pointer deref when enabling buffer
staging:iio:ad5933: Drop "raw" from channel names

Lv Zheng (2):
ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set
ACPI / EC: Fix regression due to conflicting firmware behavior between Samsung and Acer.

Marc-André Lureau (1):
qxl: don't create too large primary surface

Markos Chandras (1):
MIPS: ftrace: Fix a microMIPS build problem

Mathias Krause (1):
posix-timers: Fix stack info leak in timer_create()

Michael S. Tsirkin (1):
kvm: x86: don't kill guest on unknown exit reason

Michal Hocko (1):
OOM, PM: OOM killed task shouldn't escape PM suspend

Nadav Amit (5):
KVM: x86: Check non-canonical addresses upon WRMSR
KVM: x86: Fix wrong masking on relative jump/call
KVM: x86: Emulator fixes for eip canonical checks on near branches
KVM: x86: Handle errors when RIP is set during far jumps
KVM: x86: Fix far-jump to non-canonical check

Nathaniel Ting (1):
USB: serial: cp210x: add Silicon Labs 358x VID and PID

Neal Cardwell (2):
tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
tcp: fix ssthresh and undo for consecutive short FRTO episodes

Nicholas Bellinger (2):
target: Fix APTPL metadata handling for dynamic MappedLUNs
iser-target: Disable TX completion interrupt coalescing

Nicolas Dichtel (1):
ip6_gre: fix flowi6_proto value in xmit path

Nikolay Aleksandrov (1):
bonding: fix div by zero while enslaving and transmitting

Olaf Hering (1):
drm/cirrus: bind also to qemu-xen-traditional

Pali Rohár (1):
cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy

Paolo Bonzini (1):
KVM: x86: use new CS.RPL as CPL during task switch

Per Hurtig (1):
tcp: fixing TLP's FIN recovery

Perry Hung (1):
usb: serial: ftdi_sio: add "bricked" FTDI device PID

Petr Matousek (1):
kvm: vmx: handle invvpid vm exit gracefully

Quentin Casasnovas (1):
kvm: fix excessive pages un-pinning in kvm_iommu_map error path.

Quinn Tran (1):
target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE

Rabin Vincent (1):
tracing/syscalls: Ignore numbers outside NR_syscalls' range

Ray Jui (1):
spi: pl022: Fix incorrect dma_unmap_sg

Robin van der Gracht (1):
iio: st_sensors: Fix buffer copy

Shmulik Ladkani (1):
sit: Fix ipip6_tunnel_lookup device matching criteria

Soren Brinkmann (1):
Revert "net/macb: add pinctrl consumer support"

Sowmini Varadhan (1):
sparc64: Move request_irq() from ldc_bind() to ldc_alloc()

Stanislaw Gruszka (1):
myri10ge: check for DMA mapping errors

Steffen Klassert (2):
xfrm: Generate blackhole routes only from route lookup functions
xfrm: Generate queueing routes only from route lookup functions

Takashi Iwai (1):
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode

Theodore Ts'o (2):
ext4: don't orphan or truncate the boot loader inode
ext4: add ext4_iget_normal() which is to be used for dir tree lookups

Thomas Gleixner (1):
usb: musb: cppi41: restart hrtimer only if not yet done

Victor Kamensky (1):
arm64: compat: fix compat types affecting struct compat_elf_prpsinfo

Vineet Gupta (1):
ARC: [nsimosci] Allow "headless" models to boot

Vlad Catoi (1):
ALSA: usb-audio: Add support for Steinberg UR22 USB interface

Vlad Yasevich (6):
bridge: Check if vlan filtering is enabled only once.
bridge: Fix br_should_learn to check vlan_enabled
tg3: Work around HW/FW limitations with vlan encapsulated frames
tg3: Allow for recieve of full-size 8021AD frames
macvtap: Fix race between device delete and open.
sctp: handle association restarts when the socket is closed.

Wang Nan (1):
cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.

Yu Zhao (1):
mm: free compound page with correct order

bob picco (4):
sparc64: sun4v TLB error power off events
sparc64: find_node adjustment
sparc64: T5 PMU
sparc64: sparse irq


2014-11-06 22:37:33

by Kamal Mostafa

Subject: [PATCH 3.13 021/162] Revert "net/macb: add pinctrl consumer support"

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Soren Brinkmann <[email protected]>

[ Upstream commit 9026968abe7ad102f4ac5c6d96d733643f75399c ]

This reverts commit 8ef29f8aae524bd51298fb10ac6a5ce6c4c5a3d8.
The driver core already calls pinctrl_get() and claims the default
state. There is no need to replicate this in the driver.

Acked-by: Nicolas Ferre <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/ethernet/cadence/macb.c | 11 -----------
1 file changed, 11 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 9257869..b020d1c 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -29,7 +29,6 @@
#include <linux/of_device.h>
#include <linux/of_mdio.h>
#include <linux/of_net.h>
-#include <linux/pinctrl/consumer.h>

#include "macb.h"

@@ -1755,7 +1754,6 @@ static int __init macb_probe(struct platform_device *pdev)
struct phy_device *phydev;
u32 config;
int err = -ENXIO;
- struct pinctrl *pinctrl;
const char *mac;

regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -1764,15 +1762,6 @@ static int __init macb_probe(struct platform_device *pdev)
goto err_out;
}

- pinctrl = devm_pinctrl_get_select_default(&pdev->dev);
- if (IS_ERR(pinctrl)) {
- err = PTR_ERR(pinctrl);
- if (err == -EPROBE_DEFER)
- goto err_out;
-
- dev_warn(&pdev->dev, "No pinctrl provided\n");
- }
-
err = -ENOMEM;
dev = alloc_etherdev(sizeof(*bp));
if (!dev)
--
1.9.1

2014-11-06 22:37:45

by Kamal Mostafa

Subject: [PATCH 3.13 017/162] tg3: Allow for recieve of full-size 8021AD frames

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <[email protected]>

[ Upstream commit 7d3083ee36b51e425b6abd76778a2046906b0fd3 ]

When receiving a vlan-tagged frame that still contains
a vlan header, the length of the packet will be greater
than MTU+ETH_HLEN since it accounts for the extra vlan
header. TG3 checks this for the 802.1Q case, but not
for 802.1ad. As a result, full-sized 802.1ad frames
get dropped by the card.

Add a check for the 802.1ad protocol when receiving
full-sized frames.

Suggested-by: Prashant Sreedharan <[email protected]>
CC: Prashant Sreedharan <[email protected]>
CC: Michael Chan <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 072f401..1b84139 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6907,7 +6907,8 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
skb->protocol = eth_type_trans(skb, tp->dev);

if (len > (tp->dev->mtu + ETH_HLEN) &&
- skb->protocol != htons(ETH_P_8021Q)) {
+ skb->protocol != htons(ETH_P_8021Q) &&
+ skb->protocol != htons(ETH_P_8021AD)) {
dev_kfree_skb(skb);
goto drop_it_no_recycle;
}
--
1.9.1
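
Not part of the patch, but for readers wondering why a perfectly valid
tagged frame trips the oversize check at all, here is a hedged sketch of
the arithmetic. The helper name is made up; only ETH_HLEN and VLAN_HLEN
come from the standard kernel headers.

#include <linux/types.h>
#include <linux/if_ether.h>     /* ETH_HLEN == 14 */
#include <linux/if_vlan.h>      /* VLAN_HLEN == 4 */

/*
 * A frame that still carries its VLAN tag is VLAN_HLEN bytes longer
 * than an untagged maximum-size frame, so it exceeds mtu + ETH_HLEN
 * even though it is valid; the patch whitelists 802.1ad the same way
 * 802.1Q was already whitelisted.
 */
static bool oversized_only_by_vlan_tag(unsigned int len, unsigned int mtu)
{
        return len > mtu + ETH_HLEN &&
               len <= mtu + ETH_HLEN + VLAN_HLEN;
}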

2014-11-06 22:37:47

by Kamal Mostafa

Subject: [PATCH 3.13 004/162] tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Andrey Vagin <[email protected]>

[ Upstream commit 9d186cac7ffb1831e9f34cb4a3a8b22abb9dd9d4 ]

We don't know the right timestamp for repaired skb-s. Wrong RTT
estimations aren't good, because some congestion modules depend on
them heavily.

This patch adds the TCPCB_REPAIRED flag, which is included in
TCPCB_RETRANS.

Thanks to Eric for the advice on how to fix this issue.

This patch fixes the warning:
[ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
[ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
[ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
[ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
[ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
[ 879.569982] Call Trace:
[ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
[ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
[ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
[ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
[ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
[ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
[ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0
[ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
[ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
[ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
[ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
[ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
[ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
[ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]---

v2: move the setting of skb->when for repaired skb-s into
tcp_write_xmit, where it is set for other skb-s.

Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages")
Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
Cc: Eric Dumazet <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrey Vagin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/tcp.h | 4 +++-
net/ipv4/tcp.c | 14 +++++++-------
net/ipv4/tcp_output.c | 5 ++++-
3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 920fc2e..0a00e7c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -720,8 +720,10 @@ struct tcp_skb_cb {
#define TCPCB_SACKED_RETRANS 0x02 /* SKB retransmitted */
#define TCPCB_LOST 0x04 /* SKB is lost */
#define TCPCB_TAGBITS 0x07 /* All tag bits */
+#define TCPCB_REPAIRED 0x10 /* SKB repaired (no skb_mstamp) */
#define TCPCB_EVER_RETRANS 0x80 /* Ever retransmitted frame */
-#define TCPCB_RETRANS (TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS)
+#define TCPCB_RETRANS (TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS| \
+ TCPCB_REPAIRED)

__u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */
/* 1 byte hole */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e7a02d8..e06e1df 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1132,13 +1132,6 @@ new_segment:
goto wait_for_memory;

/*
- * All packets are restored as if they have
- * already been sent.
- */
- if (tp->repair)
- TCP_SKB_CB(skb)->when = tcp_time_stamp;
-
- /*
* Check whether we can use HW checksum.
*/
if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
@@ -1147,6 +1140,13 @@ new_segment:
skb_entail(sk, skb);
copy = size_goal;
max = size_goal;
+
+ /* All packets are restored as if they have
+ * already been sent. skb_mstamp isn't set to
+ * avoid wrong rtt estimation.
+ */
+ if (tp->repair)
+ TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
}

/* Try to append data to the end of skb. */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e3688a4..fa5b391 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1856,8 +1856,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
BUG_ON(!tso_segs);

- if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE)
+ if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
+ /* "when" is used as a start point for the retransmit timer */
+ TCP_SKB_CB(skb)->when = tcp_time_stamp;
goto repair; /* Skip network transmission */
+ }

cwnd_quota = tcp_cwnd_test(tp, skb);
if (!cwnd_quota) {
--
1.9.1
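
A compressed, hedged sketch of why folding TCPCB_REPAIRED into
TCPCB_RETRANS is enough. The flag values are copied from the hunk
above; rtt_sample_allowed() is made up and only approximates the real
check in tcp_input.c.

#include <linux/types.h>

#define TCPCB_SACKED_RETRANS    0x02    /* values as in the hunk above */
#define TCPCB_EVER_RETRANS      0x80
#define TCPCB_REPAIRED          0x10
#define TCPCB_RETRANS           (TCPCB_SACKED_RETRANS | TCPCB_EVER_RETRANS | \
                                 TCPCB_REPAIRED)

/*
 * Simplified, illustrative shape of the RTT sampler: anything marked
 * TCPCB_RETRANS is skipped, so a repaired skb (whose timestamp is
 * meaningless) can no longer poison the RTT estimate.
 */
static inline bool rtt_sample_allowed(u8 sacked_flags)
{
        return !(sacked_flags & TCPCB_RETRANS);
}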

2014-11-06 22:37:51

by Kamal Mostafa

Subject: [PATCH 3.13 013/162] bridge: Check if vlan filtering is enabled only once.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <[email protected]>

[ Upstream commit 20adfa1a81af00bf2027644507ad4fa9cd2849cf ]

The bridge code checks whether vlan filtering is enabled on both
ingress and egress. When the state flips, the bridge may already
be forwarding packets, and forwarding behavior becomes
non-deterministic: the bridge may drop packets on some interfaces,
but not on others.

This patch solves this by caching the filtered state of the
packet in skb_cb on ingress. The skb_cb is guaranteed not to be
overwritten between the time the packet enters the bridge
forwarding path and the time it leaves it. On egress, we can
then check the cached state to see whether we need to apply
filtering information.

Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/bridge/br_private.h | 3 +++
net/bridge/br_vlan.c | 15 +++++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 280f601..cd37763 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -309,6 +309,9 @@ struct br_input_skb_cb {
int igmp;
int mrouters_only;
#endif
+#ifdef CONFIG_BRIDGE_VLAN_FILTERING
+ bool vlan_filtered;
+#endif
};

#define BR_INPUT_SKB_CB(__skb) ((struct br_input_skb_cb *)(__skb)->cb)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 4f5341a..f71b4d8 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -141,7 +141,8 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
{
u16 vid;

- if (!br->vlan_enabled)
+ /* If this packet was not filtered at input, let it pass */
+ if (!BR_INPUT_SKB_CB(skb)->vlan_filtered)
goto out;

/* At this point, we know that the frame was filtered and contains
@@ -186,8 +187,10 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
/* If VLAN filtering is disabled on the bridge, all packets are
* permitted.
*/
- if (!br->vlan_enabled)
+ if (!br->vlan_enabled) {
+ BR_INPUT_SKB_CB(skb)->vlan_filtered = false;
return true;
+ }

/* If there are no vlan in the permitted list, all packets are
* rejected.
@@ -195,6 +198,8 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
if (!v)
goto drop;

+ BR_INPUT_SKB_CB(skb)->vlan_filtered = true;
+
err = br_vlan_get_tag(skb, vid);
if (!*vid) {
u16 pvid = br_get_pvid(v);
@@ -239,7 +244,8 @@ bool br_allowed_egress(struct net_bridge *br,
{
u16 vid;

- if (!br->vlan_enabled)
+ /* If this packet was not filtered at input, let it pass */
+ if (!BR_INPUT_SKB_CB(skb)->vlan_filtered)
return true;

if (!v)
@@ -258,7 +264,8 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
struct net_bridge *br = p->br;
struct net_port_vlans *v;

- if (!br->vlan_enabled)
+ /* If filtering was disabled at input, let it pass. */
+ if (!BR_INPUT_SKB_CB(skb)->vlan_filtered)
return true;

v = rcu_dereference(p->vlan_info);
--
1.9.1
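
A minimal sketch of the per-packet caching pattern the fix relies on.
All struct and function names here are made up; only skb->cb itself is
the real mechanism, used exactly as the bridge does above.

#include <linux/skbuff.h>

struct example_cb {
        bool vlan_filtered;     /* decision recorded once, at ingress */
};
#define EXAMPLE_CB(skb) ((struct example_cb *)(skb)->cb)

static void example_ingress(struct sk_buff *skb, bool filtering_enabled)
{
        /* skb->cb (48 bytes) is private to the current layer, so this
         * survives untouched while the bridge owns the packet. */
        EXAMPLE_CB(skb)->vlan_filtered = filtering_enabled;
}

static bool example_egress_allowed(struct sk_buff *skb)
{
        /* consult the cached decision, not the live toggle, so ingress
         * and egress always agree for this particular packet */
        if (!EXAMPLE_CB(skb)->vlan_filtered)
                return true;
        /* ... otherwise fall through to the real per-VLAN lookup ... */
        return false;
}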

2014-11-06 22:37:54

by Kamal Mostafa

Subject: [PATCH 3.13 012/162] bonding: fix div by zero while enslaving and transmitting

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <[email protected]>

[ Upstream commit 9a72c2da690d78e93cff24b9f616412508678dd5 ]

The problem is that the slave is first linked and slave_cnt is
incremented afterwards, leading to a div by zero in the modes that use
it as a modulus. What happens is that in bond_start_xmit()
bond_has_slaves() is used to decide whether to transmit further, and it
becomes true after the slave is linked in, but when slave_cnt is used
in the xmit path it is still 0; so fetch it once and transmit based on
that. Since it is used only in round-robin and XOR modes, the fix is
only for them.
Thanks to Eric Dumazet for pointing out the fault in my first try to
fix this.

Call trace (took it out of net-next kernel, but it's the same with net):
[46934.330038] divide error: 0000 [#1] SMP
[46934.330041] Modules linked in: bonding(O) 9p fscache snd_hda_codec_generic crct10dif_pclmul
[46934.330041] bond0: Enslaving eth1 as an active interface with an up link
[46934.330051] ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic pata_acpi floppy [last unloaded: bonding]
[46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G O 3.17.0-rc4+ #27
[46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti: ffff88005b728000
[46934.330059] RIP: 0010:[<ffffffffa0198c33>] [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330060] RSP: 0018:ffff88005b72b7f8 EFLAGS: 00010246
[46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX: 000000000000002a
[46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI: ffff88004b077940
[46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09: ffff88004a83e000
[46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12: ffff88004b3f0500
[46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15: ffff88004b077940
[46934.330063] FS: 00007fbd91a4c740(0000) GS:ffff88005f080000(0000) knlGS:0000000000000000
[46934.330064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4: 00000000000406e0
[46934.330069] Stack:
[46934.330071] ffffffff811e6169 00000000e772fa05 ffff88004b077000 ffff88004b3f0500
[46934.330072] ffffffff81d17d18 000000000000002a 0000000000000000 ffff88005b72b8a0
[46934.330073] ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4 ffff88005b302000
[46934.330073] Call Trace:
[46934.330077] [<ffffffff811e6169>] ? __kmalloc_node_track_caller+0x119/0x300
[46934.330084] [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
[46934.330086] [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
[46934.330088] [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
[46934.330089] [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
[46934.330090] [<ffffffff8168f022>] arp_xmit+0x22/0x60
[46934.330091] [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
[46934.330092] [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
[46934.330094] [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
[46934.330096] [<ffffffff8162875a>] neigh_probe+0x4a/0x70
[46934.330097] [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
[46934.330098] [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
[46934.330100] [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
[46934.330101] [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
[46934.330102] [<ffffffff81661f08>] ip_output+0x58/0x90
[46934.330103] [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
[46934.330104] [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
[46934.330105] [<ffffffff81662a66>] ip_send_skb+0x16/0x50
[46934.330106] [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
[46934.330107] [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
[46934.330110] [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
[46934.330111] [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
[46934.330113] [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
[46934.330114] [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
[46934.330115] bond0: Adding slave eth2
[46934.330116] [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
[46934.330118] [<ffffffff81603248>] ? move_addr_to_kernel.part.20+0x28/0x80
[46934.330121] [<ffffffff811b4477>] ? might_fault+0x47/0x50
[46934.330122] [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
[46934.330125] [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
[46934.330127] [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
[46934.330129] [<ffffffff81242b38>] ? fsnotify+0x238/0x310
[46934.330130] [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
[46934.330131] [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
[46934.330134] [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
[46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9 1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c 89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
[46934.330146] RIP [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330146] RSP <ffff88005b72b7f8>

CC: Eric Dumazet <[email protected]>
CC: Andy Gospodarek <[email protected]>
CC: Jay Vosburgh <[email protected]>
CC: Veaceslav Falico <[email protected]>
Fixes: 278b208375 ("bonding: initial RCU conversion")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/bonding/bond_main.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1091fa2..a656d4c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3630,8 +3630,14 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
else
bond_xmit_slave_id(bond, skb, 0);
} else {
- slave_id = bond_rr_gen_slave_id(bond);
- bond_xmit_slave_id(bond, skb, slave_id % bond->slave_cnt);
+ int slave_cnt = ACCESS_ONCE(bond->slave_cnt);
+
+ if (likely(slave_cnt)) {
+ slave_id = bond_rr_gen_slave_id(bond);
+ bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
+ } else {
+ dev_kfree_skb_any(skb);
+ }
}

return NETDEV_TX_OK;
@@ -3662,8 +3668,13 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
{
struct bonding *bond = netdev_priv(bond_dev);
+ int slave_cnt = ACCESS_ONCE(bond->slave_cnt);

- bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
+ if (likely(slave_cnt))
+ bond_xmit_slave_id(bond, skb,
+ bond_xmit_hash(bond, skb, bond->slave_cnt));
+ else
+ dev_kfree_skb_any(skb);

return NETDEV_TX_OK;
}
--
1.9.1
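
A minimal sketch of the snapshot pattern the fix uses. example_xmit is
made up; it assumes the bonding driver's own types and helpers (struct
bonding, bond_xmit_slave_id(), ACCESS_ONCE()) as seen in the hunks
above.

/*
 * Illustrative only: read the shared counter once, check the snapshot,
 * and do the modulo on that same snapshot. Re-reading bond->slave_cnt
 * between the check and the '%' could still observe 0 and reintroduce
 * the divide-by-zero.
 */
static netdev_tx_t example_xmit(struct bonding *bond, struct sk_buff *skb,
                                u32 hash)
{
        int slave_cnt = ACCESS_ONCE(bond->slave_cnt);

        if (likely(slave_cnt))
                bond_xmit_slave_id(bond, skb, hash % slave_cnt);
        else
                dev_kfree_skb_any(skb);         /* no usable slaves yet */

        return NETDEV_TX_OK;
}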

2014-11-06 22:38:19

by Kamal Mostafa

Subject: [PATCH 3.13 124/162] i82860_edac: Report CE events properly

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Baron <[email protected]>

commit ab0543de6ff0877474f57a5aafbb51a61e88676f upstream.

Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <[email protected]>
Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/edac/i82860_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index 3e3e431..b93b0d0 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -124,7 +124,7 @@ static int i82860_process_error_info(struct mem_ctl_info *mci,
dimm->location[0], dimm->location[1], -1,
"i82860 UE", "");
else
- edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1,
+ edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 1,
info->eap, 0, info->derrsyn,
dimm->location[0], dimm->location[1], -1,
"i82860 CE", "");
--
1.9.1

2014-11-06 22:38:28

by Kamal Mostafa

Subject: [PATCH 3.13 116/162] ext3: Don't check quota format when there are no quota files

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 7938db449bbc55bbeb164bec7af406212e7e98f1 upstream.

The check that a quota format is set even though there are no
journalled quota files is pointless, and it actually makes it
impossible to turn off journalled quotas (as there's no way to
unset the journalled quota format). Just remove the check.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext3/super.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 37fd31e..0498390 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1354,13 +1354,6 @@ set_qf_format:
"not specified.");
return 0;
}
- } else {
- if (sbi->s_jquota_fmt) {
- ext3_msg(sb, KERN_ERR, "error: journaled quota format "
- "specified with no journaling "
- "enabled.");
- return 0;
- }
}
#endif
return 1;
--
1.9.1

2014-11-06 22:38:34

by Kamal Mostafa

Subject: [PATCH 3.13 129/162] usb: musb: cppi41: restart hrtimer only if not yet done

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit d2e6d62c9cbbc2da4211f672dbeea08960e29a80 upstream.

commit c58d80f52 ("usb: musb: Ensure that cppi41 timer gets armed on
premature DMA TX irq") fixed an hrtimer scheduling bug. There is one
left which does not trigger that often.
The following scenario is still possible:

    lock(&x->lock);
    hrtimer_start(&x->t);
    unlock(&x->lock);

expires:
    hrtimer callback                          other context
    ----------------                          -------------
    t->function();
                                              lock(&x->lock);
        lock(&x->lock);                       if (!hrtimer_queued(&x->t))
                                                      hrtimer_start(&x->t);
                                              unlock(&x->lock);

        if (!list_empty(x->early_tx_list))
                ret = HRTIMER_RESTART;
                -> hrtimer_forward_now(...)
        } else
                ret = HRTIMER_NORESTART;

        unlock(&x->lock);

and the timer callback returns HRTIMER_RESTART for an already armed
timer. This is wrong and we run into the BUG_ON() in __run_hrtimer().
This can happen on SMP or PREEMPT-RT.
The patch fixes the problem by only restarting the timer if it is not
yet queued.

Reported-by: Torben Hohn <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
[bigeasy: collected information and created a patch + description based
on it]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/musb/musb_cppi41.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/musb/musb_cppi41.c b/drivers/usb/musb/musb_cppi41.c
index 4a5af5c..3faaeb7 100644
--- a/drivers/usb/musb/musb_cppi41.c
+++ b/drivers/usb/musb/musb_cppi41.c
@@ -190,7 +190,8 @@ static enum hrtimer_restart cppi41_recheck_tx_req(struct hrtimer *timer)
}
}

- if (!list_empty(&controller->early_tx_list)) {
+ if (!list_empty(&controller->early_tx_list) &&
+ !hrtimer_is_queued(&controller->early_tx)) {
ret = HRTIMER_RESTART;
hrtimer_forward_now(&controller->early_tx,
ktime_set(0, 150 * NSEC_PER_USEC));
--
1.9.1
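
A self-contained sketch of the rule the fix enforces. Only the hrtimer
API calls are real; the lock, list, and callback names are made up and
do not correspond to the musb code.

#include <linux/hrtimer.h>
#include <linux/list.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);
static LIST_HEAD(example_pending);

/*
 * A callback must never return HRTIMER_RESTART for a timer that some
 * other context has already re-armed, hence the hrtimer_is_queued()
 * check before asking for a restart.
 */
static enum hrtimer_restart example_timer_cb(struct hrtimer *t)
{
        enum hrtimer_restart ret = HRTIMER_NORESTART;
        unsigned long flags;

        spin_lock_irqsave(&example_lock, flags);
        if (!list_empty(&example_pending) && !hrtimer_is_queued(t)) {
                hrtimer_forward_now(t, ktime_set(0, 150 * NSEC_PER_USEC));
                ret = HRTIMER_RESTART;
        }
        spin_unlock_irqrestore(&example_lock, flags);

        return ret;
}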

2014-11-06 22:38:48

by Kamal Mostafa

Subject: [PATCH 3.13 121/162] USB: option: add Haier CE81B CDMA modem

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Williams <[email protected]>

commit 012eee1522318b5ccd64d277d50ac32f7e9974fe upstream.

Port layout:

0: QCDM/DIAG
1: NMEA
2: AT
3: AT/PPP

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/serial/option.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index ac6749c..edbd457 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -362,6 +362,7 @@ static void option_instat_callback(struct urb *urb);

/* Haier products */
#define HAIER_VENDOR_ID 0x201e
+#define HAIER_PRODUCT_CE81B 0x10f8
#define HAIER_PRODUCT_CE100 0x2009

/* Cinterion (formerly Siemens) products */
@@ -1620,6 +1621,7 @@ static const struct usb_device_id option_ids[] = {
{ USB_DEVICE(LONGCHEER_VENDOR_ID, ZOOM_PRODUCT_4597) },
{ USB_DEVICE(LONGCHEER_VENDOR_ID, IBALL_3_5G_CONNECT) },
{ USB_DEVICE(HAIER_VENDOR_ID, HAIER_PRODUCT_CE100) },
+ { USB_DEVICE_AND_INTERFACE_INFO(HAIER_VENDOR_ID, HAIER_PRODUCT_CE81B, 0xff, 0xff, 0xff) },
/* Pirelli */
{ USB_DEVICE_INTERFACE_CLASS(PIRELLI_VENDOR_ID, PIRELLI_PRODUCT_C100_1, 0xff) },
{ USB_DEVICE_INTERFACE_CLASS(PIRELLI_VENDOR_ID, PIRELLI_PRODUCT_C100_2, 0xff) },
--
1.9.1

2014-11-06 22:38:58

by Kamal Mostafa

Subject: [PATCH 3.13 132/162] iwlwifi: configure the LTR

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <[email protected]>

commit 9180ac50716a097a407c6d7e7e4589754a922260 upstream.

The LTR is the handshake between the device and the root
complex about the latency allowed when the bus exits power
save. This configuration was missing, which led to high
latency when the link powered up. The end user could
experience high network latency because of this.

Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-trans.h | 2 ++
drivers/net/wireless/iwlwifi/mvm/fw-api-power.h | 35 ++++++++++++++++++++++++-
drivers/net/wireless/iwlwifi/mvm/fw-api.h | 1 +
drivers/net/wireless/iwlwifi/mvm/fw.c | 9 +++++++
drivers/net/wireless/iwlwifi/mvm/ops.c | 1 +
drivers/net/wireless/iwlwifi/pcie/trans.c | 16 ++++++-----
6 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-trans.h b/drivers/net/wireless/iwlwifi/iwl-trans.h
index 143292b..2379419 100644
--- a/drivers/net/wireless/iwlwifi/iwl-trans.h
+++ b/drivers/net/wireless/iwlwifi/iwl-trans.h
@@ -484,6 +484,7 @@ enum iwl_trans_state {
* Set during transport allocation.
* @hw_id_str: a string with info about HW ID. Set during transport allocation.
* @pm_support: set to true in start_hw if link pm is supported
+ * @ltr_enabled: set to true if the LTR is enabled
* @dev_cmd_pool: pool for Tx cmd allocation - for internal use only.
* The user should use iwl_trans_{alloc,free}_tx_cmd.
* @dev_cmd_headroom: room needed for the transport's private use before the
@@ -508,6 +509,7 @@ struct iwl_trans {
u8 rx_mpdu_cmd, rx_mpdu_cmd_hdr_size;

bool pm_support;
+ bool ltr_enabled;

/* The following fields are internal only */
struct kmem_cache *dev_cmd_pool;
diff --git a/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h b/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
index 5cb93ae..71eee38 100644
--- a/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
+++ b/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
@@ -66,13 +66,46 @@

/* Power Management Commands, Responses, Notifications */

+/**
+ * enum iwl_ltr_config_flags - masks for LTR config command flags
+ * @LTR_CFG_FLAG_FEATURE_ENABLE: Feature operational status
+ * @LTR_CFG_FLAG_HW_DIS_ON_SHADOW_REG_ACCESS: allow LTR change on shadow
+ * memory access
+ * @LTR_CFG_FLAG_HW_EN_SHRT_WR_THROUGH: allow LTR msg send on ANY LTR
+ * reg change
+ * @LTR_CFG_FLAG_HW_DIS_ON_D0_2_D3: allow LTR msg send on transition from
+ * D0 to D3
+ * @LTR_CFG_FLAG_SW_SET_SHORT: fixed static short LTR register
+ * @LTR_CFG_FLAG_SW_SET_LONG: fixed static short LONG register
+ * @LTR_CFG_FLAG_DENIE_C10_ON_PD: allow going into C10 on PD
+ */
+enum iwl_ltr_config_flags {
+ LTR_CFG_FLAG_FEATURE_ENABLE = BIT(0),
+ LTR_CFG_FLAG_HW_DIS_ON_SHADOW_REG_ACCESS = BIT(1),
+ LTR_CFG_FLAG_HW_EN_SHRT_WR_THROUGH = BIT(2),
+ LTR_CFG_FLAG_HW_DIS_ON_D0_2_D3 = BIT(3),
+ LTR_CFG_FLAG_SW_SET_SHORT = BIT(4),
+ LTR_CFG_FLAG_SW_SET_LONG = BIT(5),
+ LTR_CFG_FLAG_DENIE_C10_ON_PD = BIT(6),
+};
+
+/**
+ * struct iwl_ltr_config_cmd - configures the LTR
+ * @flags: See %enum iwl_ltr_config_flags
+ */
+struct iwl_ltr_config_cmd {
+ __le32 flags;
+ __le32 static_long;
+ __le32 static_short;
+} __packed;
+
/* Radio LP RX Energy Threshold measured in dBm */
#define POWER_LPRX_RSSI_THRESHOLD 75
#define POWER_LPRX_RSSI_THRESHOLD_MAX 94
#define POWER_LPRX_RSSI_THRESHOLD_MIN 30

/**
- * enum iwl_scan_flags - masks for power table command flags
+ * enum iwl_power_flags - masks for power table command flags
* @POWER_FLAGS_POWER_SAVE_ENA_MSK: '1' Allow to save power by turning off
* receiver and transmitter. '0' - does not allow.
* @POWER_FLAGS_POWER_MANAGEMENT_ENA_MSK: '0' Driver disables power management,
diff --git a/drivers/net/wireless/iwlwifi/mvm/fw-api.h b/drivers/net/wireless/iwlwifi/mvm/fw-api.h
index bad5a55..c079783 100644
--- a/drivers/net/wireless/iwlwifi/mvm/fw-api.h
+++ b/drivers/net/wireless/iwlwifi/mvm/fw-api.h
@@ -141,6 +141,7 @@ enum {

/* Power - legacy power table command */
POWER_TABLE_CMD = 0x77,
+ LTR_CONFIG = 0xee,

/* Thermal Throttling*/
REPLY_THERMAL_MNG_BACKOFF = 0x7e,
diff --git a/drivers/net/wireless/iwlwifi/mvm/fw.c b/drivers/net/wireless/iwlwifi/mvm/fw.c
index 70e5297..d6de231 100644
--- a/drivers/net/wireless/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/iwlwifi/mvm/fw.c
@@ -431,6 +431,15 @@ int iwl_mvm_up(struct iwl_mvm *mvm)
goto error;
}

+ if (mvm->trans->ltr_enabled) {
+ struct iwl_ltr_config_cmd cmd = {
+ .flags = cpu_to_le32(LTR_CFG_FLAG_FEATURE_ENABLE),
+ };
+
+ WARN_ON(iwl_mvm_send_cmd_pdu(mvm, LTR_CONFIG, 0,
+ sizeof(cmd), &cmd));
+ }
+
ret = iwl_mvm_power_update_device_mode(mvm);
if (ret)
goto error;
diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c b/drivers/net/wireless/iwlwifi/mvm/ops.c
index bb58342..50620a7 100644
--- a/drivers/net/wireless/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
@@ -310,6 +310,7 @@ static const char *iwl_mvm_cmd_strings[REPLY_MAX] = {
CMD(REPLY_BEACON_FILTERING_CMD),
CMD(REPLY_THERMAL_MNG_BACKOFF),
CMD(MAC_PM_POWER_TABLE),
+ CMD(LTR_CONFIG),
CMD(BT_COEX_CI),
};
#undef CMD
diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
index f69aeb3..e7b2565 100644
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -121,6 +121,7 @@ static void iwl_pcie_apm_config(struct iwl_trans *trans)
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
u16 lctl;
+ u16 cap;

/*
* HW bug W/A for instability in PCIe bus L0S->L1 transition.
@@ -131,16 +132,17 @@ static void iwl_pcie_apm_config(struct iwl_trans *trans)
* power savings, even without L1.
*/
pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_LNKCTL, &lctl);
- if (lctl & PCI_EXP_LNKCTL_ASPM_L1) {
- /* L1-ASPM enabled; disable(!) L0S */
+ if (lctl & PCI_EXP_LNKCTL_ASPM_L1)
iwl_set_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
- dev_info(trans->dev, "L1 Enabled; Disabling L0S\n");
- } else {
- /* L1-ASPM disabled; enable(!) L0S */
+ else
iwl_clear_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
- dev_info(trans->dev, "L1 Disabled; Enabling L0S\n");
- }
trans->pm_support = !(lctl & PCI_EXP_LNKCTL_ASPM_L0S);
+
+ pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_DEVCTL2, &cap);
+ trans->ltr_enabled = cap & PCI_EXP_DEVCTL2_LTR_EN;
+ dev_info(trans->dev, "L1 %sabled - LTR %sabled\n",
+ (lctl & PCI_EXP_LNKCTL_ASPM_L1) ? "En" : "Dis",
+ trans->ltr_enabled ? "En" : "Dis");
}

/*
--
1.9.1
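
For reference, a hedged sketch of the detection step only. It assumes a
valid struct pci_dev *; example_ltr_enabled() is made up, while the PCI
helper and the LTR_EN bit are the standard kernel ones used in the
trans.c hunk above.

#include <linux/pci.h>

/*
 * Read PCIe Device Control 2 and report whether the platform left
 * Latency Tolerance Reporting enabled; only then is it worth telling
 * the firmware to send LTR messages, as the patch above does.
 */
static bool example_ltr_enabled(struct pci_dev *pdev)
{
        u16 devctl2 = 0;

        pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &devctl2);

        return devctl2 & PCI_EXP_DEVCTL2_LTR_EN;
}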

2014-11-06 22:38:42

by Kamal Mostafa

Subject: [PATCH 3.13 118/162] USB: serial: cp210x: add Silicon Labs 358x VID and PID

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nathaniel Ting <[email protected]>

commit 35cc83eab097e5720a9cc0ec12bdc3a726f58381 upstream.

Enable Silicon Labs Ember VID chips to enumerate with the cp210x usb serial
driver. EM358x devices operating with the Ember Z-Net 5.1.2 stack may now
connect to host PCs over a USB serial link.

Signed-off-by: Nathaniel Ting <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/serial/cp210x.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index 3c73940..ad85f16 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -155,6 +155,7 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(0x18EF, 0xE00F) }, /* ELV USB-I2C-Interface */
{ USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */
{ USB_DEVICE(0x1B1C, 0x1C00) }, /* Corsair USB Dongle */
+ { USB_DEVICE(0x1BA4, 0x0002) }, /* Silicon Labs 358x factory default */
{ USB_DEVICE(0x1BE3, 0x07A6) }, /* WAGO 750-923 USB Service Cable */
{ USB_DEVICE(0x1D6F, 0x0010) }, /* Seluxit ApS RF Dongle */
{ USB_DEVICE(0x1E29, 0x0102) }, /* Festo CPX-USB */
--
1.9.1

2014-11-06 22:39:00

by Kamal Mostafa

Subject: [PATCH 3.13 128/162] usb: serial: ftdi_sio: add "bricked" FTDI device PID

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Perry Hung <[email protected]>

commit 7f2719f0003da1ad13124ef00f48d7514c79e30d upstream.

A recent official Windows driver from FTDI detects counterfeit devices
and reprograms the USB PID stored in the internal EEPROM to 0,
effectively bricking the device.

Add support for this VID/PID pair to correctly bind the driver on these
devices.

See:
http://hackaday.com/2014/10/22/watch-that-windows-update-ftdi-drivers-are-killing-fake-chips/

Signed-off-by: Perry Hung <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 6c69d1f..9e0e29f 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -146,6 +146,7 @@ static struct ftdi_sio_quirk ftdi_8u2232c_quirk = {
* /sys/bus/usb-serial/drivers/ftdi_sio/new_id and send a patch or report.
*/
static struct usb_device_id id_table_combined [] = {
+ { USB_DEVICE(FTDI_VID, FTDI_BRICK_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_ZEITCONTROL_TAGTRACE_MIFARE_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_CTI_MINI_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_CTI_NANO_PID) },
diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
index b68084c..6786b70 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -30,6 +30,12 @@

/*** third-party PIDs (using FTDI_VID) ***/

+/*
+ * Certain versions of the official Windows FTDI driver reprogrammed
+ * counterfeit FTDI devices to PID 0. Support these devices anyway.
+ */
+#define FTDI_BRICK_PID 0x0000
+
#define FTDI_LUMEL_PD12_PID 0x6002

/*
--
1.9.1
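
A small sketch of why one table entry is enough. example_ids is not the
driver's real table; it only illustrates that PID 0x0000 matches like
any other PID once listed.

#include <linux/module.h>
#include <linux/usb.h>

#define FTDI_VID        0x0403
#define FTDI_BRICK_PID  0x0000  /* PID the Windows driver writes */

/*
 * PID 0x0000 is an ordinary value to match on, so a single extra table
 * entry is all it takes for the driver to bind the "bricked" devices
 * again.
 */
static const struct usb_device_id example_ids[] = {
        { USB_DEVICE(FTDI_VID, FTDI_BRICK_PID) },
        { }                             /* terminating entry */
};
MODULE_DEVICE_TABLE(usb, example_ids);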

2014-11-06 22:39:05

by Kamal Mostafa

Subject: [PATCH 3.13 120/162] usb: option: add support for Telit LE910

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniele Palmas <[email protected]>

commit 2d0eb862dd477c3c4f32b201254ca0b40e6f465c upstream.

Add the VID/PID for the Telit LE910 modem. The interface description is
almost the same as for the LE920, except that the qmi interface is
number 2 (instead of 5).

Signed-off-by: Daniele Palmas <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/serial/option.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 454c6d1..ac6749c 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -269,6 +269,7 @@ static void option_instat_callback(struct urb *urb);
#define TELIT_PRODUCT_DE910_DUAL 0x1010
#define TELIT_PRODUCT_UE910_V2 0x1012
#define TELIT_PRODUCT_LE920 0x1200
+#define TELIT_PRODUCT_LE910 0x1201

/* ZTE PRODUCTS */
#define ZTE_VENDOR_ID 0x19d2
@@ -588,6 +589,11 @@ static const struct option_blacklist_info zte_1255_blacklist = {
.reserved = BIT(3) | BIT(4),
};

+static const struct option_blacklist_info telit_le910_blacklist = {
+ .sendsetup = BIT(0),
+ .reserved = BIT(1) | BIT(2),
+};
+
static const struct option_blacklist_info telit_le920_blacklist = {
.sendsetup = BIT(0),
.reserved = BIT(1) | BIT(5),
@@ -1137,6 +1143,8 @@ static const struct usb_device_id option_ids[] = {
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_CC864_SINGLE) },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_DE910_DUAL) },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_UE910_V2) },
+ { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910),
+ .driver_info = (kernel_ulong_t)&telit_le910_blacklist },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE920),
.driver_info = (kernel_ulong_t)&telit_le920_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, ZTE_PRODUCT_MF622, 0xff, 0xff, 0xff) }, /* ZTE WCDMA products */
--
1.9.1
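
A hedged sketch of what the 'reserved' bitmap above means. This is
simplified; the real check lives in option_probe() in option.c, and the
helper here is made up.

#include <linux/bitops.h>
#include <linux/types.h>

/*
 * Each bit in the bitmaps above is a USB interface number. Interfaces
 * marked 'reserved' are refused by the serial driver at probe time so
 * another driver (e.g. the QMI/net driver for the LE910's interface 2)
 * can claim them.
 */
static bool example_interface_reserved(unsigned long reserved_mask, u8 ifnum)
{
        return reserved_mask & BIT(ifnum);
}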

2014-11-06 22:39:09

by Kamal Mostafa

Subject: [PATCH 3.13 135/162] Revert "iwlwifi: mvm: treat EAPOLs like mgmt frames wrt rate"

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <[email protected]>

commit 1ffde699aae127e7abdb98dbdedc2cc6a973a1a1 upstream.

This reverts commit aa11bbf3df026d6b1c6b528bef634fd9de7c2619.
This commit was causing connection issues and is not needed
if IWL_MVM_RS_RSSI_BASED_INIT_RATE is set to false by default.

Regardless of the issues mentioned above, the reverted commit added
the following WARNING:

WARNING: CPU: 0 PID: 3946 at drivers/net/wireless/iwlwifi/mvm/tx.c:190 iwl_mvm_set_tx_params+0x60a/0x6f0 [iwlmvm]()
Got an HT rate for a non data frame 0x8
CPU: 0 PID: 3946 Comm: wpa_supplicant Tainted: G O 3.17.0+ #6
Hardware name: LENOVO 20ANCTO1WW/20ANCTO1WW, BIOS GLET71WW (2.25 ) 07/02/2014
0000000000000009 ffffffff814fa911 ffff8804288db8f8 ffffffff81064f52
0000000000001808 ffff8804288db948 ffff88040add8660 ffff8804291b5600
0000000000000000 ffffffff81064fb7 ffffffffa07b73d0 0000000000000020
Call Trace:
[<ffffffff814fa911>] ? dump_stack+0x41/0x51
[<ffffffff81064f52>] ? warn_slowpath_common+0x72/0x90
[<ffffffff81064fb7>] ? warn_slowpath_fmt+0x47/0x50
[<ffffffffa07a39ea>] ? iwl_mvm_set_tx_params+0x60a/0x6f0 [iwlmvm]
[<ffffffffa07a3cf8>] ? iwl_mvm_tx_skb+0x48/0x3c0 [iwlmvm]
[<ffffffffa079cb9b>] ? iwl_mvm_mac_tx+0x7b/0x180 [iwlmvm]
[<ffffffffa0746ce9>] ? __ieee80211_tx+0x2b9/0x3c0 [mac80211]
[<ffffffffa07492f3>] ? ieee80211_tx+0xb3/0x100 [mac80211]
[<ffffffffa0749c49>] ? ieee80211_subif_start_xmit+0x459/0xca0 [mac80211]
[<ffffffff814116e7>] ? dev_hard_start_xmit+0x337/0x5f0
[<ffffffff81430d46>] ? sch_direct_xmit+0x96/0x1f0
[<ffffffff81411ba3>] ? __dev_queue_xmit+0x203/0x4f0
[<ffffffff8142f670>] ? ether_setup+0x70/0x70
[<ffffffff814e96a1>] ? packet_sendmsg+0xf81/0x1110
[<ffffffff8140625c>] ? skb_free_datagram+0xc/0x40
[<ffffffff813f7538>] ? sock_sendmsg+0x88/0xc0
[<ffffffff813f7274>] ? move_addr_to_kernel.part.20+0x14/0x60
[<ffffffff811c47c2>] ? __inode_wait_for_writeback+0x62/0xb0
[<ffffffff813f7a91>] ? SYSC_sendto+0xf1/0x180
[<ffffffff813f88f9>] ? __sys_recvmsg+0x39/0x70
[<ffffffff8150066d>] ? system_call_fastpath+0x1a/0x1f
---[ end trace cc19a150d311fc63 ]---

which was reported here: https://bugzilla.kernel.org/show_bug.cgi?id=85691

Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/wireless/iwlwifi/mvm/tx.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/mvm/tx.c b/drivers/net/wireless/iwlwifi/mvm/tx.c
index edf9f7b..c8f6974 100644
--- a/drivers/net/wireless/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/mvm/tx.c
@@ -173,14 +173,10 @@ static void iwl_mvm_set_tx_cmd_rate(struct iwl_mvm *mvm,

/*
* for data packets, rate info comes from the table inside the fw. This
- * table is controlled by LINK_QUALITY commands. Exclude ctrl port
- * frames like EAPOLs which should be treated as mgmt frames. This
- * avoids them being sent initially in high rates which increases the
- * chances for completion of the 4-Way handshake.
+ * table is controlled by LINK_QUALITY commands
*/

- if (ieee80211_is_data(fc) && sta &&
- !(info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO)) {
+ if (ieee80211_is_data(fc) && sta) {
tx_cmd->initial_rate_index = 0;
tx_cmd->tx_flags |= cpu_to_le32(TX_CMD_FLG_STA_RATE);
return;
--
1.9.1

2014-11-06 22:39:16

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 119/162] usb: serial: ftdi_sio: add Awinda Station and Dongle products

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Frans Klaver <[email protected]>

commit edd74ffab1f6909eee400c7de8ce621870aacac9 upstream.

Add new IDs for the Xsens Awinda Station and Awinda Dongle.

While at it, order the definitions by PID and add a logical separation
between devices using Xsens' VID and those using FTDI's VID.

Signed-off-by: Frans Klaver <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/serial/ftdi_sio.c | 2 ++
drivers/usb/serial/ftdi_sio_ids.h | 6 +++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index c903b0b..6c69d1f 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -675,6 +675,8 @@ static struct usb_device_id id_table_combined [] = {
{ USB_DEVICE(FTDI_VID, XSENS_CONVERTER_5_PID) },
{ USB_DEVICE(FTDI_VID, XSENS_CONVERTER_6_PID) },
{ USB_DEVICE(FTDI_VID, XSENS_CONVERTER_7_PID) },
+ { USB_DEVICE(XSENS_VID, XSENS_AWINDA_DONGLE_PID) },
+ { USB_DEVICE(XSENS_VID, XSENS_AWINDA_STATION_PID) },
{ USB_DEVICE(XSENS_VID, XSENS_CONVERTER_PID) },
{ USB_DEVICE(XSENS_VID, XSENS_MTW_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_OMNI1509) },
diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
index 5937b2d..b68084c 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -143,8 +143,12 @@
* Xsens Technologies BV products (http://www.xsens.com).
*/
#define XSENS_VID 0x2639
-#define XSENS_CONVERTER_PID 0xD00D /* Xsens USB-serial converter */
+#define XSENS_AWINDA_STATION_PID 0x0101
+#define XSENS_AWINDA_DONGLE_PID 0x0102
#define XSENS_MTW_PID 0x0200 /* Xsens MTw */
+#define XSENS_CONVERTER_PID 0xD00D /* Xsens USB-serial converter */
+
+/* Xsens devices using FTDI VID */
#define XSENS_CONVERTER_0_PID 0xD388 /* Xsens USB converter */
#define XSENS_CONVERTER_1_PID 0xD389 /* Xsens Wireless Receiver */
#define XSENS_CONVERTER_2_PID 0xD38A
--
1.9.1

2014-11-06 22:39:35

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 126/162] e7xxx_edac: Report CE events properly

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Baron <[email protected]>

commit 8030122a9ccf939186f8db96c318dbb99b5463f6 upstream.

Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <[email protected]>
Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/edac/e7xxx_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/e7xxx_edac.c b/drivers/edac/e7xxx_edac.c
index 1c4056a..2697dea 100644
--- a/drivers/edac/e7xxx_edac.c
+++ b/drivers/edac/e7xxx_edac.c
@@ -226,7 +226,7 @@ static void process_ce(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
static void process_ce_no_info(struct mem_ctl_info *mci)
{
edac_dbg(3, "\n");
- edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1, 0, 0, 0, -1, -1, -1,
+ edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 1, 0, 0, 0, -1, -1, -1,
"e7xxx CE log register overflow", "");
}

--
1.9.1

2014-11-06 22:39:40

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 154/162] ext4: enable journal checksum when metadata checksum feature enabled

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <[email protected]>

commit 98c1a7593fa355fda7f5a5940c8bf5326ca964ba upstream.

If metadata checksumming is turned on for the FS, we need to tell the
journal to use checksumming too.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/super.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 608db58..9fb3e6c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3486,6 +3486,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
#ifdef CONFIG_EXT4_FS_POSIX_ACL
set_opt(sb, POSIX_ACL);
#endif
+ /* don't forget to enable journal_csum when metadata_csum is enabled. */
+ if (ext4_has_metadata_csum(sb))
+ set_opt(sb, JOURNAL_CHECKSUM);
+
if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA)
set_opt(sb, JOURNAL_DATA);
else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED)
--
1.9.1

2014-11-06 22:39:48

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 143/162] evm: check xattr value length and type in evm_inode_setxattr()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Kasatkin <[email protected]>

commit 3b1deef6b1289a99505858a3b212c5b50adf0c2f upstream.

evm_inode_setxattr() can be called with no value. The function does not
check the length, so the following command can be used to produce a
kernel oops: setfattr -n security.evm FOO. This patch fixes it.
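
For illustration, a minimal userspace sketch of the reproducer above (my
own sketch, not part of the patch); it assumes the file "FOO" exists on a
filesystem with xattr support and that the caller has the needed
privileges:

/* Minimal reproducer sketch, equivalent to "setfattr -n security.evm FOO".
 * With a zero-length value the syscall path leaves the kernel's copy of
 * the value NULL, which is what evm_inode_setxattr() dereferenced before
 * this fix; with the fix the call is rejected with -EINVAL instead.
 */
#include <stdio.h>
#include <sys/xattr.h>

int main(void)
{
	/* value == NULL and size == 0: set the xattr name with no value */
	if (setxattr("FOO", "security.evm", NULL, 0, 0) != 0)
		perror("setxattr");	/* expect EINVAL with the fix applied */
	return 0;
}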

Changes in v3:
* there is no reason to return different error codes for EVM_XATTR_HMAC
and non-EVM_XATTR_HMAC, so remove the unnecessary test.

Changes in v2:
* test for validity of the xattr type

[ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1106.398192] IP: [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.399244] PGD 29048067 PUD 290d7067 PMD 0
[ 1106.399953] Oops: 0000 [#1] SMP
[ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936
[ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1106.400020] task: ffff8800291a0000 ti: ffff88002917c000 task.ti: ffff88002917c000
[ 1106.400020] RIP: 0010:[<ffffffff812af7b8>] [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP: 0018:ffff88002917fd50 EFLAGS: 00010246
[ 1106.400020] RAX: 0000000000000000 RBX: ffff88002917fdf8 RCX: 0000000000000000
[ 1106.400020] RDX: 0000000000000000 RSI: ffffffff818136d3 RDI: ffff88002917fdf8
[ 1106.400020] RBP: ffff88002917fd68 R08: 0000000000000000 R09: 00000000003ec1df
[ 1106.400020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800438a0a00
[ 1106.400020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1106.400020] FS: 00007f7dfa7d7740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
[ 1106.400020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1106.400020] CR2: 0000000000000000 CR3: 000000003763e000 CR4: 00000000000006f0
[ 1106.400020] Stack:
[ 1106.400020] ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98
[ 1106.400020] ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000
[ 1106.400020] 0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8
[ 1106.400020] Call Trace:
[ 1106.400020] [<ffffffff812a1030>] security_inode_setxattr+0x5d/0x6a
[ 1106.400020] [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f
[ 1106.400020] [<ffffffff8116d1e0>] setxattr+0x122/0x16c
[ 1106.400020] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020] [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143
[ 1106.400020] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020] [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f
[ 1106.400020] [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0
[ 1106.400020] [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b
[ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 <41> 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83
[ 1106.400020] RIP [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP <ffff88002917fd50>
[ 1106.400020] CR2: 0000000000000000
[ 1106.428061] ---[ end trace ae08331628ba3050 ]---

Reported-by: Jan Kara <[email protected]>
Signed-off-by: Dmitry Kasatkin <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
security/integrity/evm/evm_main.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index 3c5cbb9..09036f4 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -296,9 +296,12 @@ int evm_inode_setxattr(struct dentry *dentry, const char *xattr_name,
{
const struct evm_ima_xattr_data *xattr_data = xattr_value;

- if ((strcmp(xattr_name, XATTR_NAME_EVM) == 0)
- && (xattr_data->type == EVM_XATTR_HMAC))
- return -EPERM;
+ if (strcmp(xattr_name, XATTR_NAME_EVM) == 0) {
+ if (!xattr_value_len)
+ return -EINVAL;
+ if (xattr_data->type != EVM_IMA_XATTR_DIGSIG)
+ return -EPERM;
+ }
return evm_protect_xattr(dentry, xattr_name, xattr_value,
xattr_value_len);
}
--
1.9.1

2014-11-06 22:40:01

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 150/162] mm, thp: fix collapsing of hugepages on madvise

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: David Rientjes <[email protected]>

commit 6d50e60cd2edb5a57154db5a6f64eef5aa59b751 upstream.

If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.

This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE in a copy of the flags on the stack, and the result
isn't stored in vma->vm_flags until the final action of
madvise_behavior(). This causes khugepaged_enter_vma_merge() to be a
no-op in hugepage_madvise() when the vma previously had VM_NOHUGEPAGE
set.

Fix this by passing the correct vma flags to the khugepaged mm slot
handler. There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.

It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_madvise(), but I didn't want to introduce special-case
behavior into madvise_behavior(). I think it's best to just let it
always set vma->vm_flags itself.
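
As a rough userspace sketch of that ordering problem (illustrative only;
the struct and helper names below are stand-ins, not the kernel's):

/* Illustrative sketch of the ordering problem, not the kernel's code:
 * the caller updates a local copy of the flags and only writes them back
 * to the shared structure later, so a helper that re-reads the shared
 * copy in between sees the stale value.  Passing the new flags explicitly
 * (as this patch does for khugepaged_enter_vma_merge()) avoids that.
 */
#include <stdio.h>

#define VM_NOHUGEPAGE	0x1UL

struct vma { unsigned long vm_flags; };

/* old behavior: consults the not-yet-updated shared flags */
static int enter_stale(struct vma *v)
{
	return !(v->vm_flags & VM_NOHUGEPAGE);	/* 0 means "do nothing" */
}

/* fixed behavior: trusts the flags the caller is about to install */
static int enter_fixed(struct vma *v, unsigned long vm_flags)
{
	(void)v;
	return !(vm_flags & VM_NOHUGEPAGE);
}

int main(void)
{
	struct vma v = { .vm_flags = VM_NOHUGEPAGE };
	unsigned long new_flags = v.vm_flags & ~VM_NOHUGEPAGE; /* madvise clears it */

	printf("stale check registers the vma: %d\n", enter_stale(&v));		/* 0 */
	printf("fixed check registers the vma: %d\n", enter_fixed(&v, new_flags));	/* 1 */

	v.vm_flags = new_flags;	/* the write-back only happens here */
	return 0;
}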

Signed-off-by: David Rientjes <[email protected]>
Reported-by: Suleiman Souhlal <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[ kamal: backport to 3.13: no VM_BUG_ON_VMA ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/linux/khugepaged.h | 17 ++++++++++-------
mm/huge_memory.c | 11 ++++++-----
mm/mmap.c | 8 ++++----
3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 6b394f0..eeb3079 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -6,7 +6,8 @@
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern int __khugepaged_enter(struct mm_struct *mm);
extern void __khugepaged_exit(struct mm_struct *mm);
-extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma);
+extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ unsigned long vm_flags);

#define khugepaged_enabled() \
(transparent_hugepage_flags & \
@@ -35,13 +36,13 @@ static inline void khugepaged_exit(struct mm_struct *mm)
__khugepaged_exit(mm);
}

-static inline int khugepaged_enter(struct vm_area_struct *vma)
+static inline int khugepaged_enter(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
if ((khugepaged_always() ||
- (khugepaged_req_madv() &&
- vma->vm_flags & VM_HUGEPAGE)) &&
- !(vma->vm_flags & VM_NOHUGEPAGE))
+ (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
+ !(vm_flags & VM_NOHUGEPAGE))
if (__khugepaged_enter(vma->vm_mm))
return -ENOMEM;
return 0;
@@ -54,11 +55,13 @@ static inline int khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
static inline void khugepaged_exit(struct mm_struct *mm)
{
}
-static inline int khugepaged_enter(struct vm_area_struct *vma)
+static inline int khugepaged_enter(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
return 0;
}
-static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
+static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
return 0;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 310f27e..65aa131 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -788,7 +788,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
return VM_FAULT_FALLBACK;
if (unlikely(anon_vma_prepare(vma)))
return VM_FAULT_OOM;
- if (unlikely(khugepaged_enter(vma)))
+ if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
return VM_FAULT_OOM;
if (!(flags & FAULT_FLAG_WRITE) &&
transparent_hugepage_use_zero_page()) {
@@ -1993,7 +1993,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
* register it here without waiting a page fault that
* may not happen any time soon.
*/
- if (unlikely(khugepaged_enter_vma_merge(vma)))
+ if (unlikely(khugepaged_enter_vma_merge(vma, *vm_flags)))
return -ENOMEM;
break;
case MADV_NOHUGEPAGE:
@@ -2094,7 +2094,8 @@ int __khugepaged_enter(struct mm_struct *mm)
return 0;
}

-int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
+int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
unsigned long hstart, hend;
if (!vma->anon_vma)
@@ -2106,11 +2107,11 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
if (vma->vm_ops)
/* khugepaged not yet working on file or special mappings */
return 0;
- VM_BUG_ON(vma->vm_flags & VM_NO_THP);
+ VM_BUG_ON(vm_flags & VM_NO_THP);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
if (hstart < hend)
- return khugepaged_enter(vma);
+ return khugepaged_enter(vma, vm_flags);
return 0;
}

diff --git a/mm/mmap.c b/mm/mmap.c
index 546db74..32dbf6d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1046,7 +1046,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
end, prev->vm_pgoff, NULL);
if (err)
return NULL;
- khugepaged_enter_vma_merge(prev);
+ khugepaged_enter_vma_merge(prev, vm_flags);
return prev;
}

@@ -1065,7 +1065,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
next->vm_pgoff - pglen, NULL);
if (err)
return NULL;
- khugepaged_enter_vma_merge(area);
+ khugepaged_enter_vma_merge(area, vm_flags);
return area;
}

@@ -2155,7 +2155,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
vma_unlock_anon_vma(vma);
- khugepaged_enter_vma_merge(vma);
+ khugepaged_enter_vma_merge(vma, vma->vm_flags);
validate_mm(vma->vm_mm);
return error;
}
@@ -2224,7 +2224,7 @@ int expand_downwards(struct vm_area_struct *vma,
}
}
vma_unlock_anon_vma(vma);
- khugepaged_enter_vma_merge(vma);
+ khugepaged_enter_vma_merge(vma, vma->vm_flags);
validate_mm(vma->vm_mm);
return error;
}
--
1.9.1

2014-11-06 22:40:10

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 144/162] drm/radeon/dpm: disable ulv support on SI

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <[email protected]>

commit 6fa455935ab956248b165f150ec6ae9106210077 upstream.

Causes problems on some boards.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=82889

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/gpu/drm/radeon/si_dpm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index 0d50df3..b044e19 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -6224,7 +6224,7 @@ static void si_parse_pplib_clock_info(struct radeon_device *rdev,
if ((rps->class2 & ATOM_PPLIB_CLASSIFICATION2_ULV) &&
index == 0) {
/* XXX disable for A0 tahiti */
- si_pi->ulv.supported = true;
+ si_pi->ulv.supported = false;
si_pi->ulv.pl = *pl;
si_pi->ulv.one_pcie_lane_in_ulv = false;
si_pi->ulv.volt_change_delay = SISLANDS_ULVVOLTAGECHANGEDELAY_DFLT;
--
1.9.1

2014-11-06 22:40:17

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 145/162] drm/radeon: dpm fixes for asrock systems

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <[email protected]>

commit 72b3f9183ed57e4a2f0601a1c25ae2fd39855952 upstream.

- bapm seems to cause "CPU stuck" messages, so disable it.
- nb dpm seems to prevent GPU dpm from getting enabled, so
disable it.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=85107

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/gpu/drm/radeon/kv_dpm.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/kv_dpm.c b/drivers/gpu/drm/radeon/kv_dpm.c
index 1b0331c..8fcc491 100644
--- a/drivers/gpu/drm/radeon/kv_dpm.c
+++ b/drivers/gpu/drm/radeon/kv_dpm.c
@@ -2620,7 +2620,11 @@ int kv_dpm_init(struct radeon_device *rdev)
if (rdev->family == CHIP_KABINI)
pi->high_voltage_t = 4001;

- pi->enable_nb_dpm = true;
+ /* Enabling nb dpm on an asrock system prevents dpm from working */
+ if (rdev->pdev->subsystem_vendor == 0x1849)
+ pi->enable_nb_dpm = false;
+ else
+ pi->enable_nb_dpm = true;

pi->caps_power_containment = true;
pi->caps_cac = true;
@@ -2635,10 +2639,19 @@ int kv_dpm_init(struct radeon_device *rdev)
pi->caps_sclk_ds = true;
pi->enable_auto_thermal_throttling = true;
pi->disable_nb_ps3_in_battery = false;
- if (radeon_bapm == 0)
+ if (radeon_bapm == -1) {
+ /* There are stability issues reported on with
+ * bapm enabled on an asrock system.
+ */
+ if (rdev->pdev->subsystem_vendor == 0x1849)
+ pi->bapm_enable = false;
+ else
+ pi->bapm_enable = true;
+ } else if (radeon_bapm == 0) {
pi->bapm_enable = false;
- else
+ } else {
pi->bapm_enable = true;
+ }
pi->voltage_drop_t = 0;
pi->caps_sclk_throttle_low_notification = false;
pi->caps_fps = false; /* true? */
--
1.9.1

2014-11-06 22:40:14

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 148/162] cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Wang Nan <[email protected]>

commit 401507d67d5c2854f5a88b3f93f64fc6f267bca5 upstream.

Commit ff7ee93f4715 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
corresponding kmemleak_free() is missing, which makes kmemleak be
wrongly disabled after memory offlining. Log is pasted at the end of
this commit message.

This patch add kmemleak_free() into free_page_cgroup(). During page
offlining, this patch removes corresponding entries in kmemleak rbtree.
After that, the freed memory can be allocated again by other subsystems
without killing kmemleak.

bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak

Offlined Pages 32768
kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x46/0x58
create_object+0x266/0x2c0
kmemleak_alloc+0x26/0x50
kmem_cache_alloc+0xd3/0x160
__sigqueue_alloc+0x49/0xd0
__send_signal+0xcb/0x410
send_signal+0x45/0x90
__group_send_sig_info+0x13/0x20
do_notify_parent+0x1bb/0x260
do_exit+0x767/0xa40
do_group_exit+0x44/0xa0
SyS_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b

kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff880016900000 (size 524288):
kmemleak: comm "swapper/0", pid 0, jiffies 4294667296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
log_early+0x63/0x77
kmemleak_alloc+0x4b/0x50
init_section_page_cgroup+0x7f/0xf5
page_cgroup_init+0xc5/0xd0
start_kernel+0x333/0x408
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xf5/0xfc

Fixes: ff7ee93f4715 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
Signed-off-by: Wang Nan <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
mm/page_cgroup.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 6d757e3a..e007236 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -170,6 +170,7 @@ static void free_page_cgroup(void *addr)
sizeof(struct page_cgroup) * PAGES_PER_SECTION;

BUG_ON(PageReserved(page));
+ kmemleak_free(addr);
free_pages_exact(addr, table_size);
}
}
--
1.9.1

2014-11-06 22:40:35

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 141/162] ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit 317168d0c766defd14b3d0e9c2c4a9a258b803ee upstream.

In compat mode, we copy each field of the snd_pcm_status struct but
don't touch the reserved fields, which leaves uninitialized values
there. Meanwhile, the native ioctl zero-clears the whole structure, so
we should follow the same rule in compat mode, too.

Reported-by: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
sound/core/pcm_compat.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/sound/core/pcm_compat.c b/sound/core/pcm_compat.c
index af49721..c4ac3c1 100644
--- a/sound/core/pcm_compat.c
+++ b/sound/core/pcm_compat.c
@@ -206,6 +206,8 @@ static int snd_pcm_status_user_compat(struct snd_pcm_substream *substream,
if (err < 0)
return err;

+ if (clear_user(src, sizeof(*src)))
+ return -EFAULT;
if (put_user(status.state, &src->state) ||
compat_put_timespec(&status.trigger_tstamp, &src->trigger_tstamp) ||
compat_put_timespec(&status.tstamp, &src->tstamp) ||
--
1.9.1

2014-11-06 22:41:00

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 137/162] Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit 993b3a3f80a7842a48cd46c2b41e1b3ef6302468 upstream.

These models need i8042.notimeout, otherwise the touchpad will not work.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=69731
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1111138
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/input/serio/i8042-x86ia64io.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h
index 40ff494..ce715b1 100644
--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -615,6 +615,22 @@ static const struct dmi_system_id __initconst i8042_dmi_notimeout_table[] = {
},
},
{
+ /* Fujitsu A544 laptop */
+ /* https://bugzilla.redhat.com/show_bug.cgi?id=1111138 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK A544"),
+ },
+ },
+ {
+ /* Fujitsu AH544 laptop */
+ /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK AH544"),
+ },
+ },
+ {
/* Fujitsu U574 laptop */
/* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */
.matches = {
--
1.9.1

2014-11-06 22:41:11

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 138/162] posix-timers: Fix stack info leak in timer_create()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <[email protected]>

commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream.

If userland creates a timer without specifying sigevent info, we'll
create one ourselves, using a stack-local variable. In particular, we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where a pointer is bigger than an int. On such
systems we'll copy the uninitialized stack bytes from the timer_create()
call to userland when the timer actually fires and we deliver the
signal.

Initialize sigev_value with 0 to plug the stack info leak.

Found in the PaX patch, written by the PaX Team.
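
A rough userspace illustration of the partial initialization (my own
sketch, using the userspace union sigval from <signal.h> rather than the
kernel's internal structures):

/* On an LP64 system sizeof(void *) == 8 but sizeof(int) == 4, so writing
 * only sival_int initializes 4 of the union's 8 bytes; the remaining
 * bytes keep whatever was on the stack, which is what could leak to
 * userland when the signal was delivered.
 */
#include <signal.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	union sigval v;		/* { int sival_int; void *sival_ptr; } */

	v.sival_int = 42;	/* partial init: upper bytes left untouched */
	printf("initialized %zu of %zu bytes\n",
	       sizeof(v.sival_int), sizeof(v));

	memset(&v, 0, sizeof(v));	/* the fix: zero the whole union first */
	v.sival_int = 42;
	return 0;
}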

Fixes: 5a9fa7307285 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Brad Spengler <[email protected]>
Cc: PaX Team <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
kernel/posix-timers.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 424c2d4..77e6b83 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -634,6 +634,7 @@ SYSCALL_DEFINE3(timer_create, const clockid_t, which_clock,
goto out;
}
} else {
+ memset(&event.sigev_value, 0, sizeof(event.sigev_value));
event.sigev_notify = SIGEV_SIGNAL;
event.sigev_signo = SIGALRM;
event.sigev_value.sival_int = new_timer->it_id;
--
1.9.1

2014-11-06 22:41:17

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 147/162] x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dexuan Cui <[email protected]>

commit d1cd1210834649ce1ca6bafe5ac25d2f40331343 upstream.

pte_pfn() returns the PFN as a long (32 bits on a 32-bit PAE kernel), so
"long << PAGE_SHIFT" overflows for PFNs that map to physical addresses
above 4GB.

Due to this issue, some 32-bit PAE Linux distros, running as guests on
Hyper-V with 5GB of memory assigned, can't load the netvsc driver
successfully and hence the synthetic network device doesn't work (the
kernel parameter mem=3000M can be used to work around the issue).

Cast pte_pfn() to phys_addr_t before shifting.
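
A minimal sketch of the overflow (illustrative only; it uses fixed-width
userspace types rather than the kernel's):

/* Shows why the cast matters on a 32-bit PAE kernel, where unsigned long
 * is 32 bits but physical addresses can exceed 4GB.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
	uint32_t pfn = 0x140000;	/* page frame of a 5GB physical address */

	uint32_t wrong = pfn << PAGE_SHIFT;		/* wraps to 0x40000000 */
	uint64_t right = (uint64_t)pfn << PAGE_SHIFT;	/* 0x140000000 (5GB) */

	printf("wrong: 0x%08x  right: 0x%010llx\n",
	       (unsigned int)wrong, (unsigned long long)right);
	return 0;
}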

Fixes: "commit d76565344512: x86, mm: Create slow_virt_to_phys()"
Signed-off-by: Dexuan Cui <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/mm/pageattr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index bb32480..aabdf76 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -389,7 +389,7 @@ phys_addr_t slow_virt_to_phys(void *__virt_addr)
psize = page_level_size(level);
pmask = page_level_mask(level);
offset = virt_addr & ~pmask;
- phys_addr = pte_pfn(*pte) << PAGE_SHIFT;
+ phys_addr = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
return (phys_addr | offset);
}
EXPORT_SYMBOL_GPL(slow_virt_to_phys);
--
1.9.1

2014-11-06 22:41:27

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 050/162] sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 7c0fa0f24bb76ce3d67be7f737b799846a04570f ]

Make sure, at compile time, that the kernel can properly support
whatever MAX_PHYS_ADDRESS_BITS is defined to.

On M7 chips, use a max_phys_bits value of 49.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/page_64.h | 8 ++++----
arch/sparc/include/asm/pgtable_64.h | 4 ++++
arch/sparc/mm/init_64.c | 9 ++++++++-
3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index b70210d..275fc6c 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -122,11 +122,11 @@ extern unsigned long PAGE_OFFSET;

#endif /* !(__ASSEMBLY__) */

-/* The maximum number of physical memory address bits we support, this
- * is used to size various tables used to manage kernel TLB misses and
- * also the sparsemem code.
+/* The maximum number of physical memory address bits we support. The
+ * largest value we can support is whatever "KPGD_SHIFT + KPTE_BITS"
+ * evaluates to.
*/
-#define MAX_PHYS_ADDRESS_BITS 47
+#define MAX_PHYS_ADDRESS_BITS 53

#define ILOG2_4MB 22
#define ILOG2_256MB 28
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index be03c00..58547f7 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -67,6 +67,10 @@
#define PGDIR_MASK (~(PGDIR_SIZE-1))
#define PGDIR_BITS (PAGE_SHIFT - 3)

+#if (MAX_PHYS_ADDRESS_BITS > PGDIR_SHIFT + PGDIR_BITS)
+#error MAX_PHYS_ADDRESS_BITS exceeds what kernel page tables can support
+#endif
+
#if (PGDIR_SHIFT + PGDIR_BITS) != 53
#error Page table parameters do not cover virtual address space properly.
#endif
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 75d26ee..9565f4d 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1683,12 +1683,19 @@ static void __init setup_page_offset(void)
case SUN4V_CHIP_NIAGARA4:
case SUN4V_CHIP_NIAGARA5:
case SUN4V_CHIP_SPARC64X:
- default:
+ case SUN4V_CHIP_SPARC_M6:
/* T4 and later support 52-bit virtual addresses. */
sparc64_va_hole_top = 0xfff8000000000000UL;
sparc64_va_hole_bottom = 0x0008000000000000UL;
max_phys_bits = 47;
break;
+ case SUN4V_CHIP_SPARC_M7:
+ default:
+ /* M7 and later support 52-bit virtual addresses. */
+ sparc64_va_hole_top = 0xfff8000000000000UL;
+ sparc64_va_hole_bottom = 0x0008000000000000UL;
+ max_phys_bits = 49;
+ break;
}
}

--
1.9.1

2014-11-06 22:41:33

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 051/162] sparc64: Adjust vmalloc region size based upon available virtual address bits.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit bb4e6e85daa52a9f6210fa06a5ec6269598a202b ]

In order to accommodate embedded per-cpu allocation with large numbers
of cpus and numa nodes, we have to use as much virtual address space
as possible for the vmalloc region. Otherwise we can get things like:

PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000

So, once we select a value for PAGE_OFFSET, derive the size of the
vmalloc region based upon that.

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/page_64.h | 1 -
arch/sparc/include/asm/pgtable_64.h | 9 +++++----
arch/sparc/kernel/ktlb.S | 8 ++++----
arch/sparc/mm/init_64.c | 30 +++++++++++++++++++-----------
4 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 275fc6c..b18e602 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -117,7 +117,6 @@ extern unsigned long sparc64_va_hole_bottom;

#include <asm-generic/memory_model.h>

-#define PAGE_OFFSET_BY_BITS(X) (-(_AC(1,UL) << (X)))
extern unsigned long PAGE_OFFSET;

#endif /* !(__ASSEMBLY__) */
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 58547f7..ad1def4 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -40,10 +40,7 @@
#define LOW_OBP_ADDRESS _AC(0x00000000f0000000,UL)
#define HI_OBP_ADDRESS _AC(0x0000000100000000,UL)
#define VMALLOC_START _AC(0x0000000100000000,UL)
-#define VMALLOC_END _AC(0x0000010000000000,UL)
-#define VMEMMAP_BASE _AC(0x0000010000000000,UL)
-
-#define vmemmap ((struct page *)VMEMMAP_BASE)
+#define VMEMMAP_BASE VMALLOC_END

/* PMD_SHIFT determines the size of the area a second-level page
* table can map
@@ -81,6 +78,10 @@

#ifndef __ASSEMBLY__

+extern unsigned long VMALLOC_END;
+
+#define vmemmap ((struct page *)VMEMMAP_BASE)
+
#include <linux/sched.h>

bool kern_addr_valid(unsigned long addr);
diff --git a/arch/sparc/kernel/ktlb.S b/arch/sparc/kernel/ktlb.S
index 2627a7f..ef0d8e9 100644
--- a/arch/sparc/kernel/ktlb.S
+++ b/arch/sparc/kernel/ktlb.S
@@ -199,8 +199,8 @@ kvmap_dtlb_nonlinear:

#ifdef CONFIG_SPARSEMEM_VMEMMAP
/* Do not use the TSB for vmemmap. */
- mov (VMEMMAP_BASE >> 40), %g5
- sllx %g5, 40, %g5
+ sethi %hi(VMEMMAP_BASE), %g5
+ ldx [%g5 + %lo(VMEMMAP_BASE)], %g5
cmp %g4,%g5
bgeu,pn %xcc, kvmap_vmemmap
nop
@@ -212,8 +212,8 @@ kvmap_dtlb_tsbmiss:
sethi %hi(MODULES_VADDR), %g5
cmp %g4, %g5
blu,pn %xcc, kvmap_dtlb_longpath
- mov (VMALLOC_END >> 40), %g5
- sllx %g5, 40, %g5
+ sethi %hi(VMALLOC_END), %g5
+ ldx [%g5 + %lo(VMALLOC_END)], %g5
cmp %g4, %g5
bgeu,pn %xcc, kvmap_dtlb_longpath
nop
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 9565f4d..ca634b3 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1362,25 +1362,24 @@ static unsigned long max_phys_bits = 40;

bool kern_addr_valid(unsigned long addr)
{
- unsigned long above = ((long)addr) >> max_phys_bits;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;

- if (above != 0 && above != -1UL)
- return false;
-
- if (addr >= (unsigned long) KERNBASE &&
- addr < (unsigned long)&_end)
- return true;
-
- if (addr >= PAGE_OFFSET) {
+ if ((long)addr < 0L) {
unsigned long pa = __pa(addr);

+ if ((addr >> max_phys_bits) != 0UL)
+ return false;
+
return pfn_valid(pa >> PAGE_SHIFT);
}

+ if (addr >= (unsigned long) KERNBASE &&
+ addr < (unsigned long)&_end)
+ return true;
+
pgd = pgd_offset_k(addr);
if (pgd_none(*pgd))
return 0;
@@ -1649,6 +1648,9 @@ unsigned long __init find_ecache_flush_span(unsigned long size)
unsigned long PAGE_OFFSET;
EXPORT_SYMBOL(PAGE_OFFSET);

+unsigned long VMALLOC_END = 0x0000010000000000UL;
+EXPORT_SYMBOL(VMALLOC_END);
+
unsigned long sparc64_va_hole_top = 0xfffff80000000000UL;
unsigned long sparc64_va_hole_bottom = 0x0000080000000000UL;

@@ -1705,10 +1707,16 @@ static void __init setup_page_offset(void)
prom_halt();
}

- PAGE_OFFSET = PAGE_OFFSET_BY_BITS(max_phys_bits);
+ PAGE_OFFSET = sparc64_va_hole_top;
+ VMALLOC_END = ((sparc64_va_hole_bottom >> 1) +
+ (sparc64_va_hole_bottom >> 2));

- pr_info("PAGE_OFFSET is 0x%016lx (max_phys_bits == %lu)\n",
+ pr_info("MM: PAGE_OFFSET is 0x%016lx (max_phys_bits == %lu)\n",
PAGE_OFFSET, max_phys_bits);
+ pr_info("MM: VMALLOC [0x%016lx --> 0x%016lx]\n",
+ VMALLOC_START, VMALLOC_END);
+ pr_info("MM: VMEMMAP [0x%016lx --> 0x%016lx]\n",
+ VMEMMAP_BASE, VMEMMAP_BASE << 1);
}

static void __init tsb_phys_patch(void)
--
1.9.1

2014-11-06 22:41:30

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 043/162] sparc64: cpu hardware caps support for sparc M6 and M7

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Allen Pais <[email protected]>

[ Upstream commit 408316258521168614bfb4da0e070490d3e65a17 ]

Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/setup_64.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 3fdb455..1c7bfdf 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -500,12 +500,16 @@ static void __init init_sparc64_elf_hwcap(void)
sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= HWCAP_SPARC_BLKINIT;
if (sun4v_chip_type == SUN4V_CHIP_NIAGARA2 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= HWCAP_SPARC_N2;
}
@@ -533,6 +537,8 @@ static void __init init_sparc64_elf_hwcap(void)
sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= (AV_SPARC_VIS | AV_SPARC_VIS2 |
AV_SPARC_ASI_BLK_INIT |
@@ -540,6 +546,8 @@ static void __init init_sparc64_elf_hwcap(void)
if (sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
+ sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= (AV_SPARC_VIS3 | AV_SPARC_HPC |
AV_SPARC_FMAF);
--
1.9.1

2014-11-06 22:41:39

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 040/162] sparc64: Fix hibernation code reference to PAGE_OFFSET.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 9d0713edf72461438bc3526e4ea55fec47754cd9 ]

We changed PAGE_OFFSET to be a variable rather than a constant,
but this reference here in the hibernate assembler got missed.

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/power/hibernate_asm.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/power/hibernate_asm.S b/arch/sparc/power/hibernate_asm.S
index 7994216..d7d9017 100644
--- a/arch/sparc/power/hibernate_asm.S
+++ b/arch/sparc/power/hibernate_asm.S
@@ -54,8 +54,8 @@ ENTRY(swsusp_arch_resume)
nop

/* Write PAGE_OFFSET to %g7 */
- sethi %uhi(PAGE_OFFSET), %g7
- sllx %g7, 32, %g7
+ sethi %hi(PAGE_OFFSET), %g7
+ ldx [%g7 + %lo(PAGE_OFFSET)], %g7

setuw (PAGE_SIZE-8), %g3

--
1.9.1

2014-11-06 22:41:50

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 046/162] sparc64: Define VA hole at run time, rather than at compile time.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 4397bed080598001e88f612deb8b080bb1cc2322 ]

Now that we use 4-level page tables, we can provide up to 53 bits of
virtual address space to the user.

Adjust the VA hole based upon the capabilities of the cpu type probed.

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/page_64.h | 15 ++++-----------
arch/sparc/mm/init_64.c | 21 +++++++++++++++++++++
2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 3747c4f..6dc1948 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -102,21 +102,14 @@ typedef unsigned long pgprot_t;

typedef pte_t *pgtable_t;

-/* These two values define the virtual address space range in which we
- * must forbid 64-bit user processes from making mappings. It used to
- * represent precisely the virtual address space hole present in most
- * early sparc64 chips including UltraSPARC-I. But now it also is
- * further constrained by the limits of our page tables, which is
- * 43-bits of virtual address.
- */
-#define SPARC64_VA_HOLE_TOP _AC(0xfffffc0000000000,UL)
-#define SPARC64_VA_HOLE_BOTTOM _AC(0x0000040000000000,UL)
+extern unsigned long sparc64_va_hole_top;
+extern unsigned long sparc64_va_hole_bottom;

/* The next two defines specify the actual exclusion region we
* enforce, wherein we use a 4GB red zone on each side of the VA hole.
*/
-#define VA_EXCLUDE_START (SPARC64_VA_HOLE_BOTTOM - (1UL << 32UL))
-#define VA_EXCLUDE_END (SPARC64_VA_HOLE_TOP + (1UL << 32UL))
+#define VA_EXCLUDE_START (sparc64_va_hole_bottom - (1UL << 32UL))
+#define VA_EXCLUDE_END (sparc64_va_hole_top + (1UL << 32UL))

#define TASK_UNMAPPED_BASE (test_thread_flag(TIF_32BIT) ? \
_AC(0x0000000070000000,UL) : \
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 7b03310..44eb215 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1623,25 +1623,46 @@ static void __init page_offset_shift_patch(unsigned long phys_bits)
}
}

+unsigned long sparc64_va_hole_top = 0xfffff80000000000UL;
+unsigned long sparc64_va_hole_bottom = 0x0000080000000000UL;
+
static void __init setup_page_offset(void)
{
unsigned long max_phys_bits = 40;

if (tlb_type == cheetah || tlb_type == cheetah_plus) {
+ /* Cheetah/Panther support a full 64-bit virtual
+ * address, so we can use all that our page tables
+ * support.
+ */
+ sparc64_va_hole_top = 0xfff0000000000000UL;
+ sparc64_va_hole_bottom = 0x0010000000000000UL;
+
max_phys_bits = 42;
} else if (tlb_type == hypervisor) {
switch (sun4v_chip_type) {
case SUN4V_CHIP_NIAGARA1:
case SUN4V_CHIP_NIAGARA2:
+ /* T1 and T2 support 48-bit virtual addresses. */
+ sparc64_va_hole_top = 0xffff800000000000UL;
+ sparc64_va_hole_bottom = 0x0000800000000000UL;
+
max_phys_bits = 39;
break;
case SUN4V_CHIP_NIAGARA3:
+ /* T3 supports 48-bit virtual addresses. */
+ sparc64_va_hole_top = 0xffff800000000000UL;
+ sparc64_va_hole_bottom = 0x0000800000000000UL;
+
max_phys_bits = 43;
break;
case SUN4V_CHIP_NIAGARA4:
case SUN4V_CHIP_NIAGARA5:
case SUN4V_CHIP_SPARC64X:
default:
+ /* T4 and later support 52-bit virtual addresses. */
+ sparc64_va_hole_top = 0xfff8000000000000UL;
+ sparc64_va_hole_bottom = 0x0008000000000000UL;
max_phys_bits = 47;
break;
}
--
1.9.1

2014-11-06 22:41:54

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 041/162] sparc64: correctly recognise M6 and M7 cpu type

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Allen Pais <[email protected]>

[ Upstream commit cadbb58039f7cab1def9c931012ab04c953a6997 ]

The following patch adds support for correctly
recognising the M6 and M7 cpu types.

Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/spitfire.h | 2 ++
arch/sparc/kernel/cpu.c | 12 ++++++++++++
arch/sparc/kernel/head_64.S | 12 ++++++++++++
3 files changed, 26 insertions(+)

diff --git a/arch/sparc/include/asm/spitfire.h b/arch/sparc/include/asm/spitfire.h
index 6b67e50..69424d4 100644
--- a/arch/sparc/include/asm/spitfire.h
+++ b/arch/sparc/include/asm/spitfire.h
@@ -45,6 +45,8 @@
#define SUN4V_CHIP_NIAGARA3 0x03
#define SUN4V_CHIP_NIAGARA4 0x04
#define SUN4V_CHIP_NIAGARA5 0x05
+#define SUN4V_CHIP_SPARC_M6 0x06
+#define SUN4V_CHIP_SPARC_M7 0x07
#define SUN4V_CHIP_SPARC64X 0x8a
#define SUN4V_CHIP_UNKNOWN 0xff

diff --git a/arch/sparc/kernel/cpu.c b/arch/sparc/kernel/cpu.c
index 5c51258..52e10de 100644
--- a/arch/sparc/kernel/cpu.c
+++ b/arch/sparc/kernel/cpu.c
@@ -493,6 +493,18 @@ static void __init sun4v_cpu_probe(void)
sparc_pmu_type = "niagara5";
break;

+ case SUN4V_CHIP_SPARC_M6:
+ sparc_cpu_type = "SPARC-M6";
+ sparc_fpu_type = "SPARC-M6 integrated FPU";
+ sparc_pmu_type = "sparc-m6";
+ break;
+
+ case SUN4V_CHIP_SPARC_M7:
+ sparc_cpu_type = "SPARC-M7";
+ sparc_fpu_type = "SPARC-M7 integrated FPU";
+ sparc_pmu_type = "sparc-m7";
+ break;
+
case SUN4V_CHIP_SPARC64X:
sparc_cpu_type = "SPARC64-X";
sparc_fpu_type = "SPARC64-X integrated FPU";
diff --git a/arch/sparc/kernel/head_64.S b/arch/sparc/kernel/head_64.S
index 452f04f..4fdeb80 100644
--- a/arch/sparc/kernel/head_64.S
+++ b/arch/sparc/kernel/head_64.S
@@ -427,6 +427,12 @@ sun4v_chip_type:
cmp %g2, '5'
be,pt %xcc, 5f
mov SUN4V_CHIP_NIAGARA5, %g4
+ cmp %g2, '6'
+ be,pt %xcc, 5f
+ mov SUN4V_CHIP_SPARC_M6, %g4
+ cmp %g2, '7'
+ be,pt %xcc, 5f
+ mov SUN4V_CHIP_SPARC_M7, %g4
ba,pt %xcc, 49f
nop

@@ -585,6 +591,12 @@ niagara_tlb_fixup:
cmp %g1, SUN4V_CHIP_NIAGARA5
be,pt %xcc, niagara4_patch
nop
+ cmp %g1, SUN4V_CHIP_SPARC_M6
+ be,pt %xcc, niagara4_patch
+ nop
+ cmp %g1, SUN4V_CHIP_SPARC_M7
+ be,pt %xcc, niagara4_patch
+ nop

call generic_patch_copyops
nop
--
1.9.1

2014-11-06 22:42:00

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 039/162] sparc64: Do not define thread fpregs save area as zero-length array.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit e2653143d7d79a49f1a961aeae1d82612838b12c ]

This breaks the stack end corruption detection facility.

What that facility does is write a magic value to "end_of_stack()"
and check whether it gets overwritten.

"end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
the beginning of the FPU register save area.

So once the user uses the FPU, the magic value is overwritten and the
debug checks trigger.

Fix this by making the size explicit.

Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
are limited to 7 levels of FPU state saves. Each FPU register set is
256 bytes, so allocate 256 * 7 bytes for the fpregs area.

Reported-by: Meelis Roos <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/thread_info_64.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index f85dc85..cc6275c 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -63,7 +63,8 @@ struct thread_info {
struct pt_regs *kern_una_regs;
unsigned int kern_una_insn;

- unsigned long fpregs[0] __attribute__ ((aligned(64)));
+ unsigned long fpregs[(7 * 256) / sizeof(unsigned long)]
+ __attribute__ ((aligned(64)));
};

#endif /* !(__ASSEMBLY__) */
--
1.9.1

2014-11-06 22:42:11

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 111/162] iio: adc: mxs-lradc: Disable the clock on probe failure

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Fabio Estevam <[email protected]>

commit 75d7ed3b9e7cb79a3b0e1f417fb674d54b4fc668 upstream.

We should disable lradc->clk in the case of errors in the probe function.

Signed-off-by: Fabio Estevam <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/staging/iio/adc/mxs-lradc.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/iio/adc/mxs-lradc.c b/drivers/staging/iio/adc/mxs-lradc.c
index cef5dde..da4f0b1 100644
--- a/drivers/staging/iio/adc/mxs-lradc.c
+++ b/drivers/staging/iio/adc/mxs-lradc.c
@@ -1307,14 +1307,16 @@ static int mxs_lradc_probe(struct platform_device *pdev)
/* Grab all IRQ sources */
for (i = 0; i < of_cfg->irq_count; i++) {
lradc->irq[i] = platform_get_irq(pdev, i);
- if (lradc->irq[i] < 0)
- return lradc->irq[i];
+ if (lradc->irq[i] < 0) {
+ ret = lradc->irq[i];
+ goto err_clk;
+ }

ret = devm_request_irq(dev, lradc->irq[i],
mxs_lradc_handle_irq, 0,
of_cfg->irq_name[i], iio);
if (ret)
- return ret;
+ goto err_clk;
}

platform_set_drvdata(pdev, iio);
@@ -1334,7 +1336,7 @@ static int mxs_lradc_probe(struct platform_device *pdev)
&mxs_lradc_trigger_handler,
&mxs_lradc_buffer_ops);
if (ret)
- return ret;
+ goto err_clk;

ret = mxs_lradc_trigger_init(iio);
if (ret)
@@ -1369,6 +1371,8 @@ err_dev:
mxs_lradc_trigger_remove(iio);
err_trig:
iio_triggered_buffer_cleanup(iio);
+err_clk:
+ clk_disable_unprepare(lradc->clk);
return ret;
}

--
1.9.1

2014-11-06 22:42:06

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 100/162] KVM: x86: Emulator fixes for eip canonical checks on near branches

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 234f3ce485d54017f15cf5e0699cff4100121601 upstream.

Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be a canonical address, as real CPUs do. During sysret, both the target rsp
and rip should be canonical. If any of these values is noncanonical, a #GP
exception should occur. The exceptions to this rule are the syscall and
sysenter instructions, for which the assigned rip is checked during the
assignment to the relevant MSRs.

This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.

This fixes CVE-2014-3647.
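
A sketch of the canonical-address rule being enforced (assuming 48
implemented virtual-address bits; the helper name is mine, not the
emulator's):

/* Illustrative sketch, assuming 48 implemented virtual-address bits: an
 * address is canonical when bits 63:48 are a sign extension of bit 47,
 * i.e. sign-extending the low 48 bits reproduces the original value.
 * Relies on arithmetic right shift of signed values (true on the
 * platforms this code targets).
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_canonical_48(uint64_t addr)
{
	/* shift bit 47 up to bit 63, then arithmetic-shift back down */
	return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
}

int main(void)
{
	assert(is_canonical_48(0x00007fffffffffffULL));	/* top of user VA */
	assert(is_canonical_48(0xffff800000000000ULL));	/* bottom of kernel VA */
	assert(!is_canonical_48(0x0000800000000000ULL));	/* in the hole -> #GP */
	return 0;
}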

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/emulate.c | 78 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 54 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 44fc76b..38d3751 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -571,7 +571,8 @@ static int emulate_nm(struct x86_emulate_ctxt *ctxt)
return emulate_exception(ctxt, NM_VECTOR, 0, false);
}

-static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
+ int cs_l)
{
switch (ctxt->op_bytes) {
case 2:
@@ -581,16 +582,25 @@ static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
ctxt->_eip = (u32)dst;
break;
case 8:
+ if ((cs_l && is_noncanonical_address(dst)) ||
+ (!cs_l && (dst & ~(u32)-1)))
+ return emulate_gp(ctxt, 0);
ctxt->_eip = dst;
break;
default:
WARN(1, "unsupported eip assignment size\n");
}
+ return X86EMUL_CONTINUE;
+}
+
+static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+{
+ return assign_eip_far(ctxt, dst, ctxt->mode == X86EMUL_MODE_PROT64);
}

-static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
+static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
{
- assign_eip_near(ctxt, ctxt->_eip + rel);
+ return assign_eip_near(ctxt, ctxt->_eip + rel);
}

static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
@@ -1975,13 +1985,15 @@ static int em_grp45(struct x86_emulate_ctxt *ctxt)
case 2: /* call near abs */ {
long int old_eip;
old_eip = ctxt->_eip;
- ctxt->_eip = ctxt->src.val;
+ rc = assign_eip_near(ctxt, ctxt->src.val);
+ if (rc != X86EMUL_CONTINUE)
+ break;
ctxt->src.val = old_eip;
rc = em_push(ctxt);
break;
}
case 4: /* jmp abs */
- ctxt->_eip = ctxt->src.val;
+ rc = assign_eip_near(ctxt, ctxt->src.val);
break;
case 5: /* jmp far */
rc = em_jmp_far(ctxt);
@@ -2013,10 +2025,14 @@ static int em_cmpxchg8b(struct x86_emulate_ctxt *ctxt)

static int em_ret(struct x86_emulate_ctxt *ctxt)
{
- ctxt->dst.type = OP_REG;
- ctxt->dst.addr.reg = &ctxt->_eip;
- ctxt->dst.bytes = ctxt->op_bytes;
- return em_pop(ctxt);
+ int rc;
+ unsigned long eip;
+
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ return assign_eip_near(ctxt, eip);
}

static int em_ret_far(struct x86_emulate_ctxt *ctxt)
@@ -2294,7 +2310,7 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
{
const struct x86_emulate_ops *ops = ctxt->ops;
struct desc_struct cs, ss;
- u64 msr_data;
+ u64 msr_data, rcx, rdx;
int usermode;
u16 cs_sel = 0, ss_sel = 0;

@@ -2310,6 +2326,9 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
else
usermode = X86EMUL_MODE_PROT32;

+ rcx = reg_read(ctxt, VCPU_REGS_RCX);
+ rdx = reg_read(ctxt, VCPU_REGS_RDX);
+
cs.dpl = 3;
ss.dpl = 3;
ops->get_msr(ctxt, MSR_IA32_SYSENTER_CS, &msr_data);
@@ -2327,6 +2346,9 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
ss_sel = cs_sel + 8;
cs.d = 0;
cs.l = 1;
+ if (is_noncanonical_address(rcx) ||
+ is_noncanonical_address(rdx))
+ return emulate_gp(ctxt, 0);
break;
}
cs_sel |= SELECTOR_RPL_MASK;
@@ -2335,8 +2357,8 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
ops->set_segment(ctxt, cs_sel, &cs, 0, VCPU_SREG_CS);
ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);

- ctxt->_eip = reg_read(ctxt, VCPU_REGS_RDX);
- *reg_write(ctxt, VCPU_REGS_RSP) = reg_read(ctxt, VCPU_REGS_RCX);
+ ctxt->_eip = rdx;
+ *reg_write(ctxt, VCPU_REGS_RSP) = rcx;

return X86EMUL_CONTINUE;
}
@@ -2875,10 +2897,13 @@ static int em_aad(struct x86_emulate_ctxt *ctxt)

static int em_call(struct x86_emulate_ctxt *ctxt)
{
+ int rc;
long rel = ctxt->src.val;

ctxt->src.val = (unsigned long)ctxt->_eip;
- jmp_rel(ctxt, rel);
+ rc = jmp_rel(ctxt, rel);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
return em_push(ctxt);
}

@@ -2910,11 +2935,12 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
{
int rc;
+ unsigned long eip;

- ctxt->dst.type = OP_REG;
- ctxt->dst.addr.reg = &ctxt->_eip;
- ctxt->dst.bytes = ctxt->op_bytes;
- rc = emulate_pop(ctxt, &ctxt->dst.val, ctxt->op_bytes);
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+ rc = assign_eip_near(ctxt, eip);
if (rc != X86EMUL_CONTINUE)
return rc;
rsp_increment(ctxt, ctxt->src.val);
@@ -3244,20 +3270,24 @@ static int em_lmsw(struct x86_emulate_ctxt *ctxt)

static int em_loop(struct x86_emulate_ctxt *ctxt)
{
+ int rc = X86EMUL_CONTINUE;
+
register_address_increment(ctxt, reg_rmw(ctxt, VCPU_REGS_RCX), -1);
if ((address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) != 0) &&
(ctxt->b == 0xe2 || test_cc(ctxt->b ^ 0x5, ctxt->eflags)))
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);

- return X86EMUL_CONTINUE;
+ return rc;
}

static int em_jcxz(struct x86_emulate_ctxt *ctxt)
{
+ int rc = X86EMUL_CONTINUE;
+
if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0)
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);

- return X86EMUL_CONTINUE;
+ return rc;
}

static int em_in(struct x86_emulate_ctxt *ctxt)
@@ -4654,7 +4684,7 @@ special_insn:
break;
case 0x70 ... 0x7f: /* jcc (short) */
if (test_cc(ctxt->b, ctxt->eflags))
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);
break;
case 0x8d: /* lea r16/r32, m */
ctxt->dst.val = ctxt->src.addr.mem.ea;
@@ -4683,7 +4713,7 @@ special_insn:
break;
case 0xe9: /* jmp rel */
case 0xeb: /* jmp rel short */
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);
ctxt->dst.type = OP_NONE; /* Disable writeback. */
break;
case 0xf4: /* hlt */
@@ -4803,7 +4833,7 @@ twobyte_insn:
break;
case 0x80 ... 0x8f: /* jnz rel, etc*/
if (test_cc(ctxt->b, ctxt->eflags))
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);
break;
case 0x90 ... 0x9f: /* setcc r/m8 */
ctxt->dst.val = test_cc(ctxt->b, ctxt->eflags);
--
1.9.1

2014-11-06 22:42:14

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 157/162] wireless: rt2x00: add new rt2800usb device

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Cyril Brulebois <[email protected]>

commit 664d6a792785cc677c2091038ce10322c8d04ae1 upstream.

0x1b75 0xa200 AirLive WN-200USB wireless 11b/g/n dongle

References: https://bugs.debian.org/766802
Reported-by: Martin Mokrejs <[email protected]>
Signed-off-by: Cyril Brulebois <[email protected]>
Acked-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/wireless/rt2x00/rt2800usb.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index a81ceb6..24da6e8 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -1064,6 +1064,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
/* Ovislink */
{ USB_DEVICE(0x1b75, 0x3071) },
{ USB_DEVICE(0x1b75, 0x3072) },
+ { USB_DEVICE(0x1b75, 0xa200) },
/* Para */
{ USB_DEVICE(0x20b8, 0x8888) },
/* Pegatron */
--
1.9.1

2014-11-06 22:42:21

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 103/162] kvm: fix excessive pages un-pinning in kvm_iommu_map error path.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Quentin Casasnovas <[email protected]>

commit 3d32e4dbe71374a6780eaf51d719d76f9a9bf22f upstream.

The third parameter of kvm_unpin_pages() when called from
kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
and not the page size.

This error was facilitated by an inconsistent API: kvm_pin_pages() takes
a size, but kvm_unpin_pages() takes a number of pages, so fix the problem
by matching the two.
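
To make the magnitude of the mismatch concrete, here is a minimal userspace
sketch (not kernel code; a PAGE_SHIFT of 12, i.e. 4 KiB pages, is assumed)
showing how many pages a byte size would wrongly ask kvm_unpin_pages() to
release for a single 2 MiB mapping:

#include <stdio.h>

#define PAGE_SHIFT 12	/* assumed 4 KiB pages */

int main(void)
{
	unsigned long page_size = 2UL << 20;               /* one 2 MiB IOMMU mapping   */
	unsigned long npages    = page_size >> PAGE_SHIFT; /* 512 pages actually pinned */

	printf("pages pinned:                        %lu\n", npages);
	printf("pages un-pinned if the byte size is\n"
	       "passed as the page count:            %lu\n", page_size);
	return 0;
}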

This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter
of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
un-pinning for pages intended to be un-pinned (i.e. memory leak) but
unfortunately potentially aggravated the number of pages we un-pin that
should have stayed pinned. As far as I understand though, the same
practical mitigations apply.

This issue was found during review of Red Hat 6.6 patches to prepare
Ksplice rebootless updates.

Thanks to Vegard for his time on a late Friday evening to help me in
understanding this code.

Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
Signed-off-by: Quentin Casasnovas <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Jamie Iles <[email protected]>
Reviewed-by: Sasha Levin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
virt/kvm/iommu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 714b949..1f0dc1e 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -43,13 +43,13 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
gfn_t base_gfn, unsigned long npages);

static pfn_t kvm_pin_pages(struct kvm_memory_slot *slot, gfn_t gfn,
- unsigned long size)
+ unsigned long npages)
{
gfn_t end_gfn;
pfn_t pfn;

pfn = gfn_to_pfn_memslot(slot, gfn);
- end_gfn = gfn + (size >> PAGE_SHIFT);
+ end_gfn = gfn + npages;
gfn += 1;

if (is_error_noslot_pfn(pfn))
@@ -119,7 +119,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
* Pin all pages we are about to map in memory. This is
* important because we unmap and unpin in 4kb steps later.
*/
- pfn = kvm_pin_pages(slot, gfn, page_size);
+ pfn = kvm_pin_pages(slot, gfn, page_size >> PAGE_SHIFT);
if (is_error_noslot_pfn(pfn)) {
gfn += 1;
continue;
@@ -131,7 +131,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
if (r) {
printk(KERN_ERR "kvm_iommu_map_address:"
"iommu failed to map pfn=%llx\n", pfn);
- kvm_unpin_pages(kvm, pfn, page_size);
+ kvm_unpin_pages(kvm, pfn, page_size >> PAGE_SHIFT);
goto unmap_pages;
}

--
1.9.1

2014-11-06 22:42:29

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 113/162] mac80211: fix typo in starting baserate for rts_cts_rate_idx

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Karl Beldan <[email protected]>

commit c7abf25af0f41be4b50d44c5b185d52eea360cb8 upstream.

It affects non-(V)HT rates and can lead to selecting an rts_cts rate
that is not a basic rate or is far above the reference rate (at the
moment, rates[0] is used for the 1st attempt of the protected frame data).

E.g, assuming drivers register growing (bitrate) sorted tables of
ieee80211_rate-s, having :
- rates[0].idx == d'2 and basic_rates == b'10100
will select rts_cts idx b'10011 & ~d'(BIT(2)-1), i.e. 1, likewise
- rates[0].idx == d'2 and basic_rates == b'10001
will select rts_cts idx b'10000
The first is not a basic rate and the second is > rates[0].
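
A minimal userspace sketch of just the ffs() placement in the first example
above (assuming the libc ffs() from <strings.h>, which has the same 1-based
semantics as the kernel helper):

#include <stdio.h>
#include <strings.h>

int main(void)
{
	unsigned int basic_rates = 0x14;  /* b'10100: basic rates at idx 2 and 4 */

	int buggy = ffs(basic_rates - 1); /* ffs(b'10011) == 1 -> idx 1, not a basic rate */
	int fixed = ffs(basic_rates) - 1; /* 3 - 1 == 2       -> lowest basic rate index  */

	printf("buggy baserate: %d, fixed baserate: %d\n", buggy, fixed);
	return 0;
}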

Also, with regard to the severity of the addressed misbehavior, at the
moment we only have one rts_cts_rate_idx rather than one per rate table
entry, so this idx might still point to bitrates > rates[1..MAX_RATES].

Fixes: 5253ffb8c9e1 ("mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates")
Signed-off-by: Karl Beldan <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/mac80211/rate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index 22b223f..74350c3 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -462,7 +462,7 @@ static void rate_fixup_ratelist(struct ieee80211_vif *vif,
*/
if (!(rates[0].flags & IEEE80211_TX_RC_MCS)) {
u32 basic_rates = vif->bss_conf.basic_rates;
- s8 baserate = basic_rates ? ffs(basic_rates - 1) : 0;
+ s8 baserate = basic_rates ? ffs(basic_rates) - 1 : 0;

rate = &sband->bitrates[rates[0].idx];

--
1.9.1

2014-11-06 22:42:53

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 112/162] spi: pl022: Fix incorrect dma_unmap_sg

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Ray Jui <[email protected]>

commit 3ffa6158f002e096d28ede71be4e0ee8ab20baa2 upstream.

When mapped RX DMA entries are unmapped in an error path while DMA is
first being configured in the driver, the number of TX DMA entries was
passed in, which is incorrect.

Signed-off-by: Ray Jui <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/spi/spi-pl022.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/spi/spi-pl022.c b/drivers/spi/spi-pl022.c
index 2789b45..971855e 100644
--- a/drivers/spi/spi-pl022.c
+++ b/drivers/spi/spi-pl022.c
@@ -1075,7 +1075,7 @@ err_rxdesc:
pl022->sgt_tx.nents, DMA_TO_DEVICE);
err_tx_sgmap:
dma_unmap_sg(rxchan->device->dev, pl022->sgt_rx.sgl,
- pl022->sgt_tx.nents, DMA_FROM_DEVICE);
+ pl022->sgt_rx.nents, DMA_FROM_DEVICE);
err_rx_sgmap:
sg_free_table(&pl022->sgt_tx);
err_alloc_tx_sg:
--
1.9.1

2014-11-06 22:42:39

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 159/162] tracing/syscalls: Ignore numbers outside NR_syscalls' range

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Rabin Vincent <[email protected]>

commit 086ba77a6db00ed858ff07451bedee197df868c9 upstream.

ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls. If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

# trace-cmd record -e raw_syscalls:* true && trace-cmd report
...
true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264
true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
true-653 [000] 384.675988: sys_exit: NR 983045 = 0
...

# trace-cmd record -e syscalls:* true
[ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace
[ 17.289590] pgd = 9e71c000
[ 17.289696] [aaaaaace] *pgd=00000000
[ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 17.290169] Modules linked in:
[ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
[ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
[ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
[ 17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.
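
A minimal sketch of the combined bounds check (the NR_SYSCALLS value is a
placeholder, not the kernel's): any id outside [0, NR_syscalls), such as
the private 983045 (0xf0005) seen in the trace above, must be rejected
before it is used as an array index:

#include <stdio.h>

#define NR_SYSCALLS 400	/* placeholder; the real value is arch-dependent */

static int syscall_nr_ok(long nr)
{
	/* reject negative ids and, with this fix, ids above the table */
	return nr >= 0 && nr < NR_SYSCALLS;
}

int main(void)
{
	printf("nr 192    -> %s\n", syscall_nr_ok(192)    ? "traced" : "ignored");
	printf("nr 983045 -> %s\n", syscall_nr_ok(983045) ? "traced" : "ignored");
	return 0;
}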

Link: http://lkml.kernel.org/p/[email protected]

Fixes: cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
kernel/trace/trace_syscalls.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index ea90eb5..7708669 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -313,7 +313,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
int size;

syscall_nr = trace_get_syscall_nr(current, regs);
- if (syscall_nr < 0)
+ if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;

/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
@@ -361,7 +361,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
int syscall_nr;

syscall_nr = trace_get_syscall_nr(current, regs);
- if (syscall_nr < 0)
+ if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;

/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */
@@ -569,7 +569,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
int size;

syscall_nr = trace_get_syscall_nr(current, regs);
- if (syscall_nr < 0)
+ if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
if (!test_bit(syscall_nr, enabled_perf_enter_syscalls))
return;
@@ -643,7 +643,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
int size;

syscall_nr = trace_get_syscall_nr(current, regs);
- if (syscall_nr < 0)
+ if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
return;
--
1.9.1

2014-11-06 22:42:57

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 106/162] KVM: x86: Fix far-jump to non-canonical check

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 7e46dddd6f6cd5dbf3c7bd04a7e75d19475ac9f2 upstream.

Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete. Due to
incorrect evaluation, a far jump to a segment with the L bit cleared (i.e., a
32-bit segment) and an RIP with any of the high bits set (i.e., RIP[63:32] != 0)
may not trigger #GP. As we know, this imposes a security problem.

In addition, the condition for two warnings was incorrect.
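
A minimal userspace sketch (illustrative values only) of why the old
!cs_l test could never fire and what the corrected expression checks:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t u32;

int main(void)
{
	uint64_t dst = 0xdead00000000beefULL;	/* target RIP with RIP[63:32] != 0 */

	/* ~(u32)-1 evaluates to 0, so the old test can never be true */
	printf("old test fires: %llu\n", (unsigned long long)(dst & ~(u32)-1));
	/* the corrected test inspects the high 32 bits directly */
	printf("new test fires: %d\n", (dst >> 32) != 0);
	return 0;
}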

Fixes: d1442d85cc30ea75f7d399474ca738e0bc96f715
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
[ kamal: backport to 3.13-stable: omitted WARN_ON fixes (not in 3.13) ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/emulate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a440bea..4ae37e7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -581,12 +581,14 @@ static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
case 4:
ctxt->_eip = (u32)dst;
break;
+#ifdef CONFIG_X86_64
case 8:
if ((cs_l && is_noncanonical_address(dst)) ||
- (!cs_l && (dst & ~(u32)-1)))
+ (!cs_l && (dst >> 32) != 0))
return emulate_gp(ctxt, 0);
ctxt->_eip = dst;
break;
+#endif
default:
WARN(1, "unsupported eip assignment size\n");
}
--
1.9.1

2014-11-06 22:43:00

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 110/162] iio: mxs-lradc: Propagate the real error code on platform_get_irq() failure

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Fabio Estevam <[email protected]>

commit 9610e08e97370ca7780bea977b03355fa0261635 upstream.

No need to return a 'fake' return value on platform_get_irq() failure.

Just return the error code itself instead.

Signed-off-by: Fabio Estevam <[email protected]>
Acked-by: Marek Vasut <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
[ kamal: 3.13-stable prereq for
75d7ed3 iio: adc: mxs-lradc: Disable the clock on probe failure ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/staging/iio/adc/mxs-lradc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/iio/adc/mxs-lradc.c b/drivers/staging/iio/adc/mxs-lradc.c
index 558a76c..cef5dde 100644
--- a/drivers/staging/iio/adc/mxs-lradc.c
+++ b/drivers/staging/iio/adc/mxs-lradc.c
@@ -1308,7 +1308,7 @@ static int mxs_lradc_probe(struct platform_device *pdev)
for (i = 0; i < of_cfg->irq_count; i++) {
lradc->irq[i] = platform_get_irq(pdev, i);
if (lradc->irq[i] < 0)
- return -EINVAL;
+ return lradc->irq[i];

ret = devm_request_irq(dev, lradc->irq[i],
mxs_lradc_handle_irq, 0,
--
1.9.1

2014-11-06 22:43:08

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 102/162] kvm: x86: don't kill guest on unknown exit reason

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "Michael S. Tsirkin" <[email protected]>

commit 2bc19dc3754fc066c43799659f0d848631c44cfe upstream.

KVM_EXIT_UNKNOWN is a kvm bug; we don't really know whether it was
triggered by a privileged application. Let's not kill the guest: WARN
and inject #UD instead.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/svm.c | 6 +++---
arch/x86/kvm/vmx.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 121bd0d..1f5faa5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3482,9 +3482,9 @@ static int handle_exit(struct kvm_vcpu *vcpu)

if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
|| !svm_exit_handlers[exit_code]) {
- kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
- kvm_run->hw.hardware_exit_reason = exit_code;
- return 0;
+ WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_code);
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
}

return svm_exit_handlers[exit_code](svm);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4b08b..f5c384c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6854,10 +6854,10 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu);
else {
- vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
- vcpu->run->hw.hardware_exit_reason = exit_reason;
+ WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_reason);
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
}
- return 0;
}

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
--
1.9.1

2014-11-06 22:43:14

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 096/162] KVM: x86: Check non-canonical addresses upon WRMSR

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 854e8bb1aa06c578c2c9145fa6bfe3680ef63b23 upstream.

Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architectures, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.
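
A minimal sketch of the canonical-address helpers the patch adds (the same
sign-extension trick, shown here in userspace with purely illustrative
addresses; 48-bit virtual addresses are assumed):

#include <stdio.h>
#include <stdint.h>

static uint64_t get_canonical(uint64_t la)
{
	/* replicate bit 47 into bits 63:48, as the kernel helper does */
	return (uint64_t)((int64_t)(la << 16) >> 16);
}

static int is_noncanonical_address(uint64_t la)
{
	return get_canonical(la) != la;
}

int main(void)
{
	printf("%d\n", is_noncanonical_address(0xffff800000000000ULL)); /* 0: canonical          */
	printf("%d\n", is_noncanonical_address(0x0000900000000000ULL)); /* 1: would #GP on WRMSR */
	return 0;
}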

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 27 ++++++++++++++++++++++++++-
4 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1ee50a4..7b39dfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -981,6 +981,20 @@ static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
}

+static inline u64 get_canonical(u64 la)
+{
+ return ((int64_t)la << 16) >> 16;
+}
+
+static inline bool is_noncanonical_address(u64 la)
+{
+#ifdef CONFIG_X86_64
+ return get_canonical(la) != la;
+#else
+ return false;
+#endif
+}
+
#define TSS_IOPB_BASE_OFFSET 0x66
#define TSS_BASE_SIZE 0x68
#define TSS_IOPB_SIZE (65536 / 8)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 532add1..121bd0d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3200,7 +3200,7 @@ static int wrmsr_interception(struct vcpu_svm *svm)
msr.host_initiated = false;

svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
- if (svm_set_msr(&svm->vcpu, &msr)) {
+ if (kvm_set_msr(&svm->vcpu, &msr)) {
trace_kvm_msr_write_ex(ecx, data);
kvm_inject_gp(&svm->vcpu, 0);
} else {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c11b1ad..21cacdc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5135,7 +5135,7 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu)
msr.data = data;
msr.index = ecx;
msr.host_initiated = false;
- if (vmx_set_msr(vcpu, &msr) != 0) {
+ if (kvm_set_msr(vcpu, &msr) != 0) {
trace_kvm_msr_write_ex(ecx, data);
kvm_inject_gp(vcpu, 0);
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e33b85..f435e75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -919,7 +919,6 @@ void kvm_enable_efer_bits(u64 mask)
}
EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);

-
/*
* Writes msr value into into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -927,8 +926,34 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
*/
int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
+ switch (msr->index) {
+ case MSR_FS_BASE:
+ case MSR_GS_BASE:
+ case MSR_KERNEL_GS_BASE:
+ case MSR_CSTAR:
+ case MSR_LSTAR:
+ if (is_noncanonical_address(msr->data))
+ return 1;
+ break;
+ case MSR_IA32_SYSENTER_EIP:
+ case MSR_IA32_SYSENTER_ESP:
+ /*
+ * IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
+ * non-canonical address is written on Intel but not on
+ * AMD (which ignores the top 32-bits, because it does
+ * not implement 64-bit SYSENTER).
+ *
+ * 64-bit code should hence be able to write a non-canonical
+ * value on AMD. Making the address canonical ensures that
+ * vmentry does not fail on Intel after writing a non-canonical
+ * value, and that something deterministic happens if the guest
+ * invokes 64-bit SYSENTER.
+ */
+ msr->data = get_canonical(msr->data);
+ }
return kvm_x86_ops->set_msr(vcpu, msr);
}
+EXPORT_SYMBOL_GPL(kvm_set_msr);

/*
* Adapt set_msr() to msr_io()'s calling convention
--
1.9.1

2014-11-06 22:43:11

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 099/162] KVM: x86: Fix wrong masking on relative jump/call

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 05c83ec9b73c8124555b706f6af777b10adf0862 upstream.

Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.

This patch fixes KVM behavior.
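
A minimal sketch of operand-size masking for a near jump (simplified, not
the emulator's actual interface): the computed target is truncated to the
instruction's operand size, so e.g. a 16-bit jmp wraps within 64 KiB
instead of being masked by the address size:

#include <stdio.h>
#include <stdint.h>

static uint64_t assign_eip_near(uint64_t eip, int op_bytes, long rel)
{
	uint64_t dst = eip + rel;

	switch (op_bytes) {
	case 2:
		return (uint16_t)dst;	/* 16-bit operand: wrap within 64 KiB */
	case 4:
		return (uint32_t)dst;	/* 32-bit operand */
	default:
		return dst;		/* 8-byte operand, 64-bit mode */
	}
}

int main(void)
{
	/* jmp rel with a 16-bit operand size near the 64 KiB boundary */
	printf("0x%llx\n", (unsigned long long)assign_eip_near(0xfff0, 2, 0x20));
	return 0;
}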

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/emulate.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7bff3e2..44fc76b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -498,11 +498,6 @@ static void rsp_increment(struct x86_emulate_ctxt *ctxt, int inc)
masked_increment(reg_rmw(ctxt, VCPU_REGS_RSP), stack_mask(ctxt), inc);
}

-static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
-{
- register_address_increment(ctxt, &ctxt->_eip, rel);
-}
-
static u32 desc_limit_scaled(struct desc_struct *desc)
{
u32 limit = get_desc_limit(desc);
@@ -576,6 +571,28 @@ static int emulate_nm(struct x86_emulate_ctxt *ctxt)
return emulate_exception(ctxt, NM_VECTOR, 0, false);
}

+static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+{
+ switch (ctxt->op_bytes) {
+ case 2:
+ ctxt->_eip = (u16)dst;
+ break;
+ case 4:
+ ctxt->_eip = (u32)dst;
+ break;
+ case 8:
+ ctxt->_eip = dst;
+ break;
+ default:
+ WARN(1, "unsupported eip assignment size\n");
+ }
+}
+
+static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
+{
+ assign_eip_near(ctxt, ctxt->_eip + rel);
+}
+
static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
{
u16 selector;
--
1.9.1

2014-11-06 22:43:18

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 093/162] intel_pstate: Fix BYT frequency reporting

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dirk Brandewie <[email protected]>

commit b27580b05e6f5253228debc60b8ff4a786ff573a upstream.

BYT has a different conversion from P state to frequency than the core
processors. This causes the min/max and current frequency to be
misreported on some BYT SKUs. Tested on BYT N2820, Ivybridge and
Haswell processors.

Link: https://bugzilla.yoctoproject.org/show_bug.cgi?id=6663
Signed-off-by: Dirk Brandewie <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 43 ++++++++++++++++++++++++++++++++++++------
1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4469fed..2cb7028 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -65,6 +65,7 @@ struct pstate_data {
int current_pstate;
int min_pstate;
int max_pstate;
+ int scaling;
int turbo_pstate;
};

@@ -118,6 +119,7 @@ struct pstate_funcs {
int (*get_max)(void);
int (*get_min)(void);
int (*get_turbo)(void);
+ int (*get_scaling)(void);
void (*set)(struct cpudata*, int pstate);
void (*get_vid)(struct cpudata *);
};
@@ -436,6 +438,22 @@ static void byt_set_pstate(struct cpudata *cpudata, int pstate)
wrmsrl(MSR_IA32_PERF_CTL, val);
}

+#define BYT_BCLK_FREQS 5
+static int byt_freq_table[BYT_BCLK_FREQS] = { 833, 1000, 1333, 1167, 800};
+
+static int byt_get_scaling(void)
+{
+ u64 value;
+ int i;
+
+ rdmsrl(MSR_FSB_FREQ, value);
+ i = value & 0x3;
+
+ BUG_ON(i > BYT_BCLK_FREQS);
+
+ return byt_freq_table[i] * 100;
+}
+
static void byt_get_vid(struct cpudata *cpudata)
{
u64 value;
@@ -480,6 +498,11 @@ static int core_get_turbo_pstate(void)
return ret;
}

+static inline int core_get_scaling(void)
+{
+ return 100000;
+}
+
static void core_set_pstate(struct cpudata *cpudata, int pstate)
{
u64 val;
@@ -504,6 +527,7 @@ static struct cpu_defaults core_params = {
.get_max = core_get_max_pstate,
.get_min = core_get_min_pstate,
.get_turbo = core_get_turbo_pstate,
+ .get_scaling = core_get_scaling,
.set = core_set_pstate,
},
};
@@ -522,6 +546,7 @@ static struct cpu_defaults byt_params = {
.get_min = byt_get_min_pstate,
.get_turbo = byt_get_turbo_pstate,
.set = byt_set_pstate,
+ .get_scaling = byt_get_scaling,
.get_vid = byt_get_vid,
},
};
@@ -558,7 +583,7 @@ static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate)
if (pstate == cpu->pstate.current_pstate)
return;

- trace_cpu_frequency(pstate * 100000, cpu->cpu);
+ trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);

cpu->pstate.current_pstate = pstate;

@@ -587,6 +612,7 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
cpu->pstate.min_pstate = pstate_funcs.get_min();
cpu->pstate.max_pstate = pstate_funcs.get_max();
cpu->pstate.turbo_pstate = pstate_funcs.get_turbo();
+ cpu->pstate.scaling = pstate_funcs.get_scaling();

if (pstate_funcs.get_vid)
pstate_funcs.get_vid(cpu);
@@ -604,7 +630,10 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu,
u64 core_pct;
core_pct = div64_u64(int_tofp(sample->aperf * 100),
sample->mperf);
- sample->freq = fp_toint(cpu->pstate.max_pstate * core_pct * 1000);
+ sample->freq = fp_toint(
+ mul_fp(int_tofp(
+ cpu->pstate.max_pstate * cpu->pstate.scaling / 100),
+ core_pct));

sample->core_pct_busy = core_pct;
}
@@ -820,12 +849,13 @@ static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
else
policy->policy = CPUFREQ_POLICY_POWERSAVE;

- policy->min = cpu->pstate.min_pstate * 100000;
- policy->max = cpu->pstate.turbo_pstate * 100000;
+ policy->min = cpu->pstate.min_pstate * cpu->pstate.scaling;
+ policy->max = cpu->pstate.turbo_pstate * cpu->pstate.scaling;

/* cpuinfo and default policy values */
- policy->cpuinfo.min_freq = cpu->pstate.min_pstate * 100000;
- policy->cpuinfo.max_freq = cpu->pstate.turbo_pstate * 100000;
+ policy->cpuinfo.min_freq = cpu->pstate.min_pstate * cpu->pstate.scaling;
+ policy->cpuinfo.max_freq =
+ cpu->pstate.turbo_pstate * cpu->pstate.scaling;
policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
cpumask_set_cpu(policy->cpu, policy->cpus);

@@ -883,6 +913,7 @@ static void copy_cpu_funcs(struct pstate_funcs *funcs)
pstate_funcs.get_max = funcs->get_max;
pstate_funcs.get_min = funcs->get_min;
pstate_funcs.get_turbo = funcs->get_turbo;
+ pstate_funcs.get_scaling = funcs->get_scaling;
pstate_funcs.set = funcs->set;
pstate_funcs.get_vid = funcs->get_vid;
}
--
1.9.1

2014-11-06 22:43:22

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 094/162] intel_pstate: Correct BYT VID values.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dirk Brandewie <[email protected]>

commit d022a65ed2473fac4a600e3424503dc571160a3e upstream.

Using a VID value that is not high enough for the requested P state can
cause machine checks. Add a ceiling function to ensure calculated VIDs
with fractional values are set to the next highest integer VID value.

The algorithm for calculating the non-turbo VID from the BIOS writer's
guide is:
vid_ratio = (vid_max - vid_min) / (max_pstate - min_pstate)
vid = ceiling(vid_min + (req_pstate - min_pstate) * vid_ratio)
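
A small worked sketch of the ceiling step in the driver's fixed-point
format (FRAC_BITS of 8 is assumed here): a VID of 45.25 must round up to
46 rather than truncate to 45:

#include <stdio.h>

#define FRAC_BITS 8	/* assumed to match the driver's fixed-point format */
#define int_tofp(X) ((X) << FRAC_BITS)
#define fp_toint(X) ((X) >> FRAC_BITS)

static int ceiling_fp(int x)
{
	int ret = fp_toint(x);

	if (x & ((1 << FRAC_BITS) - 1))
		ret += 1;	/* any fractional part rounds the VID up */
	return ret;
}

int main(void)
{
	int vid_fp = int_tofp(45) + 64;	/* 45.25 in fixed point */

	printf("truncated: %d, ceiling: %d\n", fp_toint(vid_fp), ceiling_fp(vid_fp));
	return 0;
}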

Signed-off-by: Dirk Brandewie <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 2cb7028..4e8949d 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -54,6 +54,17 @@ static inline int32_t div_fp(int32_t x, int32_t y)
return div_s64((int64_t)x << FRAC_BITS, (int64_t)y);
}

+static inline int ceiling_fp(int32_t x)
+{
+ int mask, ret;
+
+ ret = fp_toint(x);
+ mask = (1 << FRAC_BITS) - 1;
+ if (x & mask)
+ ret += 1;
+ return ret;
+}
+
struct sample {
int32_t core_pct_busy;
u64 aperf;
@@ -428,7 +439,7 @@ static void byt_set_pstate(struct cpudata *cpudata, int pstate)
cpudata->vid.ratio);

vid_fp = clamp_t(int32_t, vid_fp, cpudata->vid.min, cpudata->vid.max);
- vid = fp_toint(vid_fp);
+ vid = ceiling_fp(vid_fp);

if (pstate > cpudata->pstate.max_pstate)
vid = cpudata->vid.turbo;
--
1.9.1

2014-11-06 22:43:30

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 084/162] freezer: Do not freeze tasks killed by OOM killer

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Cong Wang <[email protected]>

commit 51fae6da640edf9d266c94f36bc806c63c301991 upstream.

Since f660daac474c6f (oom: thaw threads if oom killed thread is frozen
before deferring) OOM killer relies on being able to thaw a frozen task
to handle OOM situation but a3201227f803 (freezer: make freezing() test
freeze conditions in effect instead of TIF_FREEZE) has reorganized the
code and stopped clearing freeze flag in __thaw_task. This means that
the target task only wakes up and goes into the fridge again because the
freezing condition hasn't changed for it. This reintroduces the bug
fixed by f660daac474c6f.

Fix the issue by checking for TIF_MEMDIE thread flag in
freezing_slow_path and exclude the task from freezing completely. If a
task was already frozen it would get woken by __thaw_task from OOM killer
and get out of freezer after rechecking freezing().

Changes since v1
- put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
as per Oleg
- return __thaw_task into oom_scan_process_thread because
oom_kill_process will not wake task in the fridge because it is
sleeping uninterruptible

[[email protected]: rewrote the changelog]
Fixes: a3201227f803 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
kernel/freezer.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index aa6a8aa..8f9279b 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -42,6 +42,9 @@ bool freezing_slow_path(struct task_struct *p)
if (p->flags & (PF_NOFREEZE | PF_SUSPEND_TASK))
return false;

+ if (test_thread_flag(TIF_MEMDIE))
+ return false;
+
if (pm_nosig_freezing || cgroup_freezing(p))
return true;

--
1.9.1

2014-11-06 22:43:42

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 088/162] drm/cirrus: bind also to qemu-xen-traditional

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Olaf Hering <[email protected]>

commit c0c3e735fa7bae29c6623511127fd021b2d6d849 upstream.

qemu as used by xend/xm toolstack uses a different subvendor id.
Bind the drm driver also to this emulated card.

Signed-off-by: Olaf Hering <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/gpu/drm/cirrus/cirrus_drv.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/cirrus/cirrus_drv.c b/drivers/gpu/drm/cirrus/cirrus_drv.c
index 953fc8a..8555548 100644
--- a/drivers/gpu/drm/cirrus/cirrus_drv.c
+++ b/drivers/gpu/drm/cirrus/cirrus_drv.c
@@ -31,6 +31,8 @@ static struct drm_driver driver;
static DEFINE_PCI_DEVICE_TABLE(pciidlist) = {
{ PCI_VENDOR_ID_CIRRUS, PCI_DEVICE_ID_CIRRUS_5446, 0x1af4, 0x1100, 0,
0, 0 },
+ { PCI_VENDOR_ID_CIRRUS, PCI_DEVICE_ID_CIRRUS_5446, PCI_VENDOR_ID_XEN,
+ 0x0001, 0, 0, 0 },
{0,}
};

--
1.9.1

2014-11-06 22:43:56

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 069/162] ext4: fix reservation overflow in ext4_da_write_begin

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <[email protected]>

commit 0ff8947fc5f700172b37cbca811a38eb9cb81e08 upstream.

Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary. However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
conv=notrunc
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
conv=notrunc

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28

Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a4c4f38..37a8c8b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2638,6 +2638,20 @@ static int ext4_nonda_switch(struct super_block *sb)
return 0;
}

+/* We always reserve for an inode update; the superblock could be there too */
+static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
+{
+ if (likely(EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_LARGE_FILE)))
+ return 1;
+
+ if (pos + len <= 0x7fffffffULL)
+ return 1;
+
+ /* We might need to update the superblock to set LARGE_FILE */
+ return 2;
+}
+
static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
@@ -2688,7 +2702,8 @@ retry_grab:
* of file which has an already mapped buffer.
*/
retry_journal:
- handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
+ ext4_da_write_credits(inode, pos, len));
if (IS_ERR(handle)) {
page_cache_release(page);
return PTR_ERR(handle);
--
1.9.1

2014-11-06 22:43:59

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 083/162] arm64: compat: fix compat types affecting struct compat_elf_prpsinfo

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Victor Kamensky <[email protected]>

commit 971a5b6fe634bb7b617d8c5f25b6a3ddbc600194 upstream.

The compat_elf_prpsinfo structure does not match the arch/arm struct
elf_prpsinfo definition. As a result, the NT_PRPSINFO note in a core file
created by the arm64 kernel for an aarch32 (compat) process has the wrong
size, so gdb cannot display the command that caused the process crash.

The fix is to change the size of __compat_uid_t and __compat_gid_t so
they match the size of the corresponding fields in the arch/arm case.

Signed-off-by: Victor Kamensky <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/arm64/include/asm/compat.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index fda2704..e72289a 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -37,8 +37,8 @@ typedef s32 compat_ssize_t;
typedef s32 compat_time_t;
typedef s32 compat_clock_t;
typedef s32 compat_pid_t;
-typedef u32 __compat_uid_t;
-typedef u32 __compat_gid_t;
+typedef u16 __compat_uid_t;
+typedef u16 __compat_gid_t;
typedef u16 __compat_uid16_t;
typedef u16 __compat_gid16_t;
typedef u32 __compat_uid32_t;
--
1.9.1

2014-11-06 22:44:06

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 081/162] ALSA: usb-audio: Add support for Steinberg UR22 USB interface

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Catoi <[email protected]>

commit f0b127fbfdc8756eba7437ab668f3169280bd358 upstream.

Adding support for Steinberg UR22 USB interface via quirks table patch

See Ubuntu bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317244
Also see threads:
http://linux-audio.4202.n7.nabble.com/Support-for-Steinberg-UR22-Yamaha-USB-chipset-0499-1509-tc82888.html#a82917
http://www.steinberg.net/forums/viewtopic.php?t=62290

Tested by at least 4 people judging by the threads.
Did not test the MIDI interface, but audio output and capture are both
functional. Built a 3.17 kernel with this driver on Ubuntu 14.04 and tested
with mpg123. The patch applied to the 3.13 Ubuntu kernel works well enough
for daily use.

Signed-off-by: Vlad Catoi <[email protected]>
Acked-by: Clemens Ladisch <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
sound/usb/quirks-table.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h
index 0e52836..8590a32 100644
--- a/sound/usb/quirks-table.h
+++ b/sound/usb/quirks-table.h
@@ -386,6 +386,36 @@ YAMAHA_DEVICE(0x105d, NULL),
}
},
{
+ USB_DEVICE(0x0499, 0x1509),
+ .driver_info = (unsigned long) & (const struct snd_usb_audio_quirk) {
+ /* .vendor_name = "Yamaha", */
+ /* .product_name = "Steinberg UR22", */
+ .ifnum = QUIRK_ANY_INTERFACE,
+ .type = QUIRK_COMPOSITE,
+ .data = (const struct snd_usb_audio_quirk[]) {
+ {
+ .ifnum = 1,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 2,
+ .type = QUIRK_AUDIO_STANDARD_INTERFACE
+ },
+ {
+ .ifnum = 3,
+ .type = QUIRK_MIDI_YAMAHA
+ },
+ {
+ .ifnum = 4,
+ .type = QUIRK_IGNORE_INTERFACE
+ },
+ {
+ .ifnum = -1
+ }
+ }
+ }
+},
+{
USB_DEVICE(0x0499, 0x150a),
.driver_info = (unsigned long) & (const struct snd_usb_audio_quirk) {
/* .vendor_name = "Yamaha", */
--
1.9.1

2014-11-06 22:44:16

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 029/162] sparc64: Fix pcr_ops initialization and usage bugs.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 8bccf5b313180faefce38e0d1140f76e0f327d28 ]

Christopher reports that perf_event_print_debug() can crash in uniprocessor
builds. The crash is due to pcr_ops being NULL.

This happens because pcr_arch_init() is only invoked by smp_cpus_done() which
only executes in SMP builds.

init_hw_perf_events() is closely intertwined with pcr_ops being setup properly,
therefore:

1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of
from smp_cpus_done().

2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init().

3) Move init_hw_perf_events to a later initcall so that it we will be
sure to invoke pcr_arch_init() after all cpus are brought up.

Finally, guard the one naked sequence of pcr_ops dereferences in
__global_pmu_self() with an appropriate NULL check.

Reported-by: Christopher Alexander Tobias Schulze <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/perf_event.c | 7 +++++--
arch/sparc/kernel/process_64.c | 3 +++
arch/sparc/kernel/smp_64.c | 1 -
3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index b5c38fa..857baca 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1671,9 +1671,12 @@ static bool __init supported_pmu(void)

int __init init_hw_perf_events(void)
{
+ int err;
+
pr_info("Performance events: ");

- if (!supported_pmu()) {
+ err = pcr_arch_init();
+ if (err || !supported_pmu()) {
pr_cont("No support for PMU type '%s'\n", sparc_pmu_type);
return 0;
}
@@ -1685,7 +1688,7 @@ int __init init_hw_perf_events(void)

return 0;
}
-early_initcall(init_hw_perf_events);
+pure_initcall(init_hw_perf_events);

void perf_callchain_kernel(struct perf_callchain_entry *entry,
struct pt_regs *regs)
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index d7b4967..c6f7113 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -306,6 +306,9 @@ static void __global_pmu_self(int this_cpu)
struct global_pmu_snapshot *pp;
int i, num;

+ if (!pcr_ops)
+ return;
+
pp = &global_cpu_snapshot[this_cpu].pmu;

num = 1;
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 8416d7f..8311f3d 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1395,7 +1395,6 @@ void __cpu_die(unsigned int cpu)

void __init smp_cpus_done(unsigned int max_cpus)
{
- pcr_arch_init();
}

void smp_send_reschedule(int cpu)
--
1.9.1

2014-11-06 22:44:26

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 031/162] sparc64: sun4v TLB error power off events

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: bob picco <[email protected]>

[ Upstream commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ]

We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.

For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when the trap level (%tl) is less than
or equal to one ("1"). Otherwise, for %tl > 1, we eventually proceed to
die_if_kernel().

The logic of this patch was partially inspired by David Miller's feedback.

Powering off large sparc64 machines is painful, and die_if_kernel provides
more context. A reset sequence isn't brief on large sparc64, but it is
still better than a power-off/power-on sequence.

Cc: [email protected]
Signed-off-by: Bob Picco <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/thread_info_64.h | 1 +
arch/sparc/kernel/sun4v_tlb_miss.S | 35 ++++++++++++++++++++-------------
arch/sparc/kernel/traps_64.c | 15 ++++++++------
arch/sparc/mm/fault_64.c | 3 +++
4 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index a5f01ac..f85dc85 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -102,6 +102,7 @@ struct thread_info {
#define FAULT_CODE_ITLB 0x04 /* Miss happened in I-TLB */
#define FAULT_CODE_WINFIXUP 0x08 /* Miss happened during spill/fill */
#define FAULT_CODE_BLKCOMMIT 0x10 /* Use blk-commit ASI in copy_page */
+#define FAULT_CODE_BAD_RA 0x20 /* Bad RA for sun4v */

#if PAGE_SHIFT == 13
#define THREAD_SIZE (2*PAGE_SIZE)
diff --git a/arch/sparc/kernel/sun4v_tlb_miss.S b/arch/sparc/kernel/sun4v_tlb_miss.S
index e0c09bf8..6179e19 100644
--- a/arch/sparc/kernel/sun4v_tlb_miss.S
+++ b/arch/sparc/kernel/sun4v_tlb_miss.S
@@ -195,6 +195,11 @@ sun4v_tsb_miss_common:
ldx [%g2 + TRAP_PER_CPU_PGD_PADDR], %g7

sun4v_itlb_error:
+ rdpr %tl, %g1
+ cmp %g1, 1
+ ble,pt %icc, sun4v_bad_ra
+ or %g0, FAULT_CODE_BAD_RA | FAULT_CODE_ITLB, %g1
+
sethi %hi(sun4v_err_itlb_vaddr), %g1
stx %g4, [%g1 + %lo(sun4v_err_itlb_vaddr)]
sethi %hi(sun4v_err_itlb_ctx), %g1
@@ -206,15 +211,10 @@ sun4v_itlb_error:
sethi %hi(sun4v_err_itlb_error), %g1
stx %o0, [%g1 + %lo(sun4v_err_itlb_error)]

+ sethi %hi(1f), %g7
rdpr %tl, %g4
- cmp %g4, 1
- ble,pt %icc, 1f
- sethi %hi(2f), %g7
ba,pt %xcc, etraptl1
- or %g7, %lo(2f), %g7
-
-1: ba,pt %xcc, etrap
-2: or %g7, %lo(2b), %g7
+1: or %g7, %lo(1f), %g7
mov %l4, %o1
call sun4v_itlb_error_report
add %sp, PTREGS_OFF, %o0
@@ -222,6 +222,11 @@ sun4v_itlb_error:
/* NOTREACHED */

sun4v_dtlb_error:
+ rdpr %tl, %g1
+ cmp %g1, 1
+ ble,pt %icc, sun4v_bad_ra
+ or %g0, FAULT_CODE_BAD_RA | FAULT_CODE_DTLB, %g1
+
sethi %hi(sun4v_err_dtlb_vaddr), %g1
stx %g4, [%g1 + %lo(sun4v_err_dtlb_vaddr)]
sethi %hi(sun4v_err_dtlb_ctx), %g1
@@ -233,21 +238,23 @@ sun4v_dtlb_error:
sethi %hi(sun4v_err_dtlb_error), %g1
stx %o0, [%g1 + %lo(sun4v_err_dtlb_error)]

+ sethi %hi(1f), %g7
rdpr %tl, %g4
- cmp %g4, 1
- ble,pt %icc, 1f
- sethi %hi(2f), %g7
ba,pt %xcc, etraptl1
- or %g7, %lo(2f), %g7
-
-1: ba,pt %xcc, etrap
-2: or %g7, %lo(2b), %g7
+1: or %g7, %lo(1f), %g7
mov %l4, %o1
call sun4v_dtlb_error_report
add %sp, PTREGS_OFF, %o0

/* NOTREACHED */

+sun4v_bad_ra:
+ or %g0, %g4, %g5
+ ba,pt %xcc, sparc64_realfault_common
+ or %g1, %g0, %g4
+
+ /* NOTREACHED */
+
/* Instruction Access Exception, tl0. */
sun4v_iacc:
ldxa [%g0] ASI_SCRATCHPAD, %g2
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index 4ced92f..25d0c7e 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -2102,6 +2102,11 @@ void sun4v_nonresum_overflow(struct pt_regs *regs)
atomic_inc(&sun4v_nonresum_oflow_cnt);
}

+static void sun4v_tlb_error(struct pt_regs *regs)
+{
+ die_if_kernel("TLB/TSB error", regs);
+}
+
unsigned long sun4v_err_itlb_vaddr;
unsigned long sun4v_err_itlb_ctx;
unsigned long sun4v_err_itlb_pte;
@@ -2109,8 +2114,7 @@ unsigned long sun4v_err_itlb_error;

void sun4v_itlb_error_report(struct pt_regs *regs, int tl)
{
- if (tl > 1)
- dump_tl1_traplog((struct tl1_traplog *)(regs + 1));
+ dump_tl1_traplog((struct tl1_traplog *)(regs + 1));

printk(KERN_EMERG "SUN4V-ITLB: Error at TPC[%lx], tl %d\n",
regs->tpc, tl);
@@ -2123,7 +2127,7 @@ void sun4v_itlb_error_report(struct pt_regs *regs, int tl)
sun4v_err_itlb_vaddr, sun4v_err_itlb_ctx,
sun4v_err_itlb_pte, sun4v_err_itlb_error);

- prom_halt();
+ sun4v_tlb_error(regs);
}

unsigned long sun4v_err_dtlb_vaddr;
@@ -2133,8 +2137,7 @@ unsigned long sun4v_err_dtlb_error;

void sun4v_dtlb_error_report(struct pt_regs *regs, int tl)
{
- if (tl > 1)
- dump_tl1_traplog((struct tl1_traplog *)(regs + 1));
+ dump_tl1_traplog((struct tl1_traplog *)(regs + 1));

printk(KERN_EMERG "SUN4V-DTLB: Error at TPC[%lx], tl %d\n",
regs->tpc, tl);
@@ -2147,7 +2150,7 @@ void sun4v_dtlb_error_report(struct pt_regs *regs, int tl)
sun4v_err_dtlb_vaddr, sun4v_err_dtlb_ctx,
sun4v_err_dtlb_pte, sun4v_err_dtlb_error);

- prom_halt();
+ sun4v_tlb_error(regs);
}

void hypervisor_tlbop_error(unsigned long err, unsigned long op)
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 4ced3fc..45a413e 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -348,6 +348,9 @@ retry:
down_read(&mm->mmap_sem);
}

+ if (fault_code & FAULT_CODE_BAD_RA)
+ goto do_sigbus;
+
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
--
1.9.1

2014-11-06 22:44:20

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 023/162] hyperv: Fix a bug in netvsc_start_xmit()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: KY Srinivasan <[email protected]>

[ Upstream commit dedb845ded56ded1c62f5398a94ffa8615d4592d ]

After the packet is successfully sent, we should not touch the skb
as it may have been freed. This patch is based on the work done by
Long Li <[email protected]>.
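
A minimal userspace sketch of the pattern the fix applies (the names here
are illustrative, not the driver's): cache any field you still need before
handing the buffer to a call that may free it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct buf {
	size_t len;
	char data[64];
};

/* stands in for a send path that consumes (frees) the buffer on success */
static int send_and_consume(struct buf *b)
{
	free(b);
	return 0;
}

int main(void)
{
	struct buf *b = calloc(1, sizeof(*b));
	size_t len;

	if (!b)
		return 1;

	strcpy(b->data, "payload");
	b->len = strlen(b->data);

	len = b->len;			/* cache it before the buffer may be freed */
	if (send_and_consume(b) == 0)
		printf("tx_bytes += %zu\n", len);	/* never dereference b here */
	return 0;
}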

In this version of the patch I have fixed issues pointed out by David.
David, please queue this up for stable.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Tested-by: Long Li <[email protected]>
Tested-by: Sitsofe Wheeler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/hyperv/netvsc_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 6ea06a8..f4aef42 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -147,6 +147,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
struct hv_netvsc_packet *packet;
int ret;
unsigned int i, num_pages, npg_data;
+ u32 skb_length = skb->len;

/* Add multipages for skb->data and additional 2 for RNDIS */
npg_data = (((unsigned long)skb->data + skb_headlen(skb) - 1)
@@ -217,7 +218,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
ret = rndis_filter_send(net_device_ctx->device_ctx,
packet);
if (ret == 0) {
- net->stats.tx_bytes += skb->len;
+ net->stats.tx_bytes += skb_length;
net->stats.tx_packets++;
} else {
kfree(packet);
--
1.9.1

2014-11-06 22:44:36

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 075/162] drm/radeon: fix speaker allocation setup

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <[email protected]>

commit 4910403836ded89803fab201d4b5caaa85de3a89 upstream.

If the sad_count is 0, set the hw to stereo and change
the error message to a warn. A lot of monitors don't
set the speaker allocation block.

Signed-off-by: Alex Deucher <[email protected]>
[ kamal: backport to 3.13-stable: no dce3_1_afmt.c in 3.13. ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/gpu/drm/radeon/dce6_afmt.c | 6 +++---
drivers/gpu/drm/radeon/evergreen_hdmi.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/dce6_afmt.c b/drivers/gpu/drm/radeon/dce6_afmt.c
index ff0001c..3483828 100644
--- a/drivers/gpu/drm/radeon/dce6_afmt.c
+++ b/drivers/gpu/drm/radeon/dce6_afmt.c
@@ -174,9 +174,9 @@ void dce6_afmt_write_speaker_allocation(struct drm_encoder *encoder)
}

sad_count = drm_edid_to_speaker_allocation(radeon_connector->edid, &sadb);
- if (sad_count <= 0) {
- DRM_ERROR("Couldn't read Speaker Allocation Data Block: %d\n", sad_count);
- return;
+ if (sad_count < 0) {
+ DRM_DEBUG("Couldn't read Speaker Allocation Data Block: %d\n", sad_count);
+ sad_count = 0;
}

/* program the speaker allocation */
diff --git a/drivers/gpu/drm/radeon/evergreen_hdmi.c b/drivers/gpu/drm/radeon/evergreen_hdmi.c
index 0c6d5ce..738c1ec 100644
--- a/drivers/gpu/drm/radeon/evergreen_hdmi.c
+++ b/drivers/gpu/drm/radeon/evergreen_hdmi.c
@@ -118,9 +118,9 @@ static void dce4_afmt_write_speaker_allocation(struct drm_encoder *encoder)
}

sad_count = drm_edid_to_speaker_allocation(radeon_connector->edid, &sadb);
- if (sad_count <= 0) {
- DRM_ERROR("Couldn't read Speaker Allocation Data Block: %d\n", sad_count);
- return;
+ if (sad_count < 0) {
+ DRM_DEBUG("Couldn't read Speaker Allocation Data Block: %d\n", sad_count);
+ sad_count = 0;
}

/* program the speaker allocation */
--
1.9.1

2014-11-06 22:44:44

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 024/162] ip6_gre: fix flowi6_proto value in xmit path

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicolas Dichtel <[email protected]>

[ Upstream commit 3be07244b7337760a3269d56b2f4a63e72218648 ]

In xmit path, we build a flowi6 which will be used for the output route lookup.
We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
protocol should be IPPROTO_GRE.

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reported-by: Matthieu Ternisien d'Ouville <[email protected]>
Signed-off-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/ipv6/ip6_gre.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 01ee297..bbb5ccd 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -790,7 +790,7 @@ static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev)
encap_limit = t->parms.encap_limit;

memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
- fl6.flowi6_proto = IPPROTO_IPIP;
+ fl6.flowi6_proto = IPPROTO_GRE;

dsfield = ipv4_get_dsfield(iph);

@@ -840,7 +840,7 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev)
encap_limit = t->parms.encap_limit;

memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
- fl6.flowi6_proto = IPPROTO_IPV6;
+ fl6.flowi6_proto = IPPROTO_GRE;

dsfield = ipv6_get_dsfield(ipv6h);
if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
--
1.9.1

2014-11-06 22:44:58

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 027/162] tcp: fixing TLP's FIN recovery

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Per Hurtig <[email protected]>

[ Upstream commit bef1909ee3ed1ca39231b260a8d3b4544ecd0c8f ]

Fix a problem observed when losing a FIN segment that does not
contain data. In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Nandita Dukkipati <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/ipv4/tcp_output.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f587e1c..395f909 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2048,9 +2048,7 @@ void tcp_send_loss_probe(struct sock *sk)
if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
goto rearm_timer;

- /* Probe with zero data doesn't trigger fast recovery. */
- if (skb->len > 0)
- err = __tcp_retransmit_skb(sk, skb);
+ err = __tcp_retransmit_skb(sk, skb);

/* Record snd_nxt for loss detection. */
if (likely(!err))
--
1.9.1

2014-11-06 22:44:47

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 068/162] ext4: add ext4_iget_normal() which is to be used for dir tree lookups

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <[email protected]>

commit f4bb2981024fc91b23b4d09a8817c415396dbabb upstream.

If there is a corrupted file system which has directory entries that
point at reserved, metadata inodes, prohibit them from being used by
treating them the same way we treat Boot Loader inodes --- that is,
mark them to be bad inodes. This prohibits them from being opened,
deleted, or modified via chmod, chown, utimes, etc.

In particular, this prevents a corrupted file system which has a
directory entry which points at the journal inode from being deleted
and its blocks released, after which point Much Hilarity Ensues.

Reported-by: Sami Liedes <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/inode.c | 7 +++++++
fs/ext4/namei.c | 4 ++--
fs/ext4/super.c | 2 +-
4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e531054..c9d0794 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2110,6 +2110,7 @@ int do_journal_get_write_access(handle_t *handle,
#define CONVERT_INLINE_DATA 2

extern struct inode *ext4_iget(struct super_block *, unsigned long);
+extern struct inode *ext4_iget_normal(struct super_block *, unsigned long);
extern int ext4_write_inode(struct inode *, struct writeback_control *);
extern int ext4_setattr(struct dentry *, struct iattr *);
extern int ext4_getattr(struct vfsmount *mnt, struct dentry *dentry,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 37d3b1a..a4c4f38 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4260,6 +4260,13 @@ bad_inode:
return ERR_PTR(ret);
}

+struct inode *ext4_iget_normal(struct super_block *sb, unsigned long ino)
+{
+ if (ino < EXT4_FIRST_INO(sb) && ino != EXT4_ROOT_INO)
+ return ERR_PTR(-EIO);
+ return ext4_iget(sb, ino);
+}
+
static int ext4_inode_blocks_set(handle_t *handle,
struct ext4_inode *raw_inode,
struct ext4_inode_info *ei)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index d4fa3ed..5bfbf94 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1430,7 +1430,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
dentry->d_name.name);
return ERR_PTR(-EIO);
}
- inode = ext4_iget(dir->i_sb, ino);
+ inode = ext4_iget_normal(dir->i_sb, ino);
if (inode == ERR_PTR(-ESTALE)) {
EXT4_ERROR_INODE(dir,
"deleted inode referenced: %u",
@@ -1461,7 +1461,7 @@ struct dentry *ext4_get_parent(struct dentry *child)
return ERR_PTR(-EIO);
}

- return d_obtain_alias(ext4_iget(child->d_inode->i_sb, ino));
+ return d_obtain_alias(ext4_iget_normal(child->d_inode->i_sb, ino));
}

/*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5ee03e5d..c888f23 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -996,7 +996,7 @@ static struct inode *ext4_nfs_get_inode(struct super_block *sb,
* Currently we don't know the generation for parent directory, so
* a generation of 0 means "accept any"
*/
- inode = ext4_iget(sb, ino);
+ inode = ext4_iget_normal(sb, ino);
if (IS_ERR(inode))
return ERR_CAST(inode);
if (generation && inode->i_generation != generation) {
--
1.9.1

2014-11-06 22:45:35

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 022/162] gro: fix aggregation for skb using frag_list

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit 73d3fe6d1c6d840763ceafa9afae0aaafa18c4b5 ]

In commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
I added a regression for linear skbs, which traditionally force GRO
to use the frag_list fallback.

Erez Shitrit found that at most two segments were aggregated and
the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.

This is because pinfo at this spot still points to the last skb in the
chain, instead of the first one, where we find the correct gso_size
information.

Signed-off-by: Eric Dumazet <[email protected]>
Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
Reported-by: Erez Shitrit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/core/skbuff.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 808a270..ddbb949 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3043,6 +3043,9 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
NAPI_GRO_CB(skb)->free = NAPI_GRO_FREE_STOLEN_HEAD;
goto done;
}
+ /* switch back to head shinfo */
+ pinfo = skb_shinfo(p);
+
if (pinfo->frag_list)
goto merge;
if (skb_gro_len(p) != pinfo->gso_size)
--
1.9.1

2014-11-06 22:45:51

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 062/162] ext4: fix mmap data corruption when blocksize < pagesize

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit d6320cbfc92910a3e5f10c42d98c231c98db4f60 upstream.

Use pagecache_isize_extended() when a hole is being created in a file so that
->page_mkwrite() will get called for the partial tail page if it is
mmapped (see the first patch in the series for details).

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2d782aa..fd6501e3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4656,8 +4656,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
ext4_orphan_del(NULL, inode);
goto err_out;
}
- } else
+ } else {
+ loff_t oldsize = inode->i_size;
+
i_size_write(inode, attr->ia_size);
+ pagecache_isize_extended(inode, oldsize, inode->i_size);
+ }

/*
* Blocks are going to be removed from the inode. Wait
--
1.9.1

2014-11-06 22:46:08

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 078/162] random: add and use memzero_explicit() for clearing data

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit d4c5efdb97773f59a2b711754ca0953f24516739 upstream.

zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
memset() calls which clear out sensitive data in extract_{buf,entropy,
entropy_user}() in the random driver are being optimized away by gcc.

Add a helper memzero_explicit() (similar to the explicit_bzero() variants)
that can be used in cases where a variable with sensitive data is
cleared out at the end. Other use cases might also be in crypto
code. [ I have put this into lib/string.c though, as it's always built-in
and doesn't need any dependencies then. ]
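
For illustration only (not part of this patch; the caller and helpers below
are made up), the intended usage pattern is simply to replace a final
memset() of sensitive data with the new helper:

#include <linux/types.h>
#include <linux/string.h>   /* memzero_explicit() after this series */

/* Hypothetical caller: derive_session_key() and use_session_key() are
 * illustrative only. */
static void example_handle_request(void *req)
{
    u8 key[32];

    derive_session_key(req, key, sizeof(key));
    use_session_key(req, key, sizeof(key));

    /* A plain memset() here may be elided as a dead store because 'key'
     * is never read again; memzero_explicit() guarantees the stores are
     * actually emitted. */
    memzero_explicit(key, sizeof(key));
}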

Fixes kernel bugzilla: 82041

Reported-by: [email protected]
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
[ kamal: backport to 3.13-stable: one more memzero_explicit in extract_buf() ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/char/random.c | 10 +++++-----
include/linux/string.h | 5 +++--
lib/string.c | 16 ++++++++++++++++
3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 429b75b..8a64dbe 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1063,8 +1063,8 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* pool while mixing, and hash one final time.
*/
sha_transform(hash.w, extract, workspace);
- memset(extract, 0, sizeof(extract));
- memset(workspace, 0, sizeof(workspace));
+ memzero_explicit(extract, sizeof(extract));
+ memzero_explicit(workspace, sizeof(workspace));

/*
* In case the hash function has some recognizable output
@@ -1076,7 +1076,7 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
hash.w[2] ^= rol32(hash.w[2], 16);

memcpy(out, &hash, EXTRACT_SIZE);
- memset(&hash, 0, sizeof(hash));
+ memzero_explicit(&hash, sizeof(hash));
}

static ssize_t extract_entropy(struct entropy_store *r, void *buf,
@@ -1124,7 +1124,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
}

/* Wipe data just returned from memory */
- memset(tmp, 0, sizeof(tmp));
+ memzero_explicit(tmp, sizeof(tmp));

return ret;
}
@@ -1162,7 +1162,7 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
}

/* Wipe data just returned from memory */
- memset(tmp, 0, sizeof(tmp));
+ memzero_explicit(tmp, sizeof(tmp));

return ret;
}
diff --git a/include/linux/string.h b/include/linux/string.h
index ac889c5..0ed878d 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -129,7 +129,7 @@ int bprintf(u32 *bin_buf, size_t size, const char *fmt, ...) __printf(3, 4);
#endif

extern ssize_t memory_read_from_buffer(void *to, size_t count, loff_t *ppos,
- const void *from, size_t available);
+ const void *from, size_t available);

/**
* strstarts - does @str start with @prefix?
@@ -141,7 +141,8 @@ static inline bool strstarts(const char *str, const char *prefix)
return strncmp(str, prefix, strlen(prefix)) == 0;
}

-extern size_t memweight(const void *ptr, size_t bytes);
+size_t memweight(const void *ptr, size_t bytes);
+void memzero_explicit(void *s, size_t count);

/**
* kbasename - return the last part of a pathname.
diff --git a/lib/string.c b/lib/string.c
index e5878de..43d0781 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -586,6 +586,22 @@ void *memset(void *s, int c, size_t count)
EXPORT_SYMBOL(memset);
#endif

+/**
+ * memzero_explicit - Fill a region of memory (e.g. sensitive
+ * keying data) with 0s.
+ * @s: Pointer to the start of the area.
+ * @count: The size of the area.
+ *
+ * memzero_explicit() doesn't need an arch-specific version as
+ * it just invokes the one of memset() implicitly.
+ */
+void memzero_explicit(void *s, size_t count)
+{
+ memset(s, 0, count);
+ OPTIMIZER_HIDE_VAR(s);
+}
+EXPORT_SYMBOL(memzero_explicit);
+
#ifndef __HAVE_ARCH_MEMCPY
/**
* memcpy - Copy one area of memory to another
--
1.9.1

2014-11-06 22:44:31

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 066/162] iser-target: Disable TX completion interrupt coalescing

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit 0d0f660d882c1c02748ced13966a2413aa5d6cc2 upstream.

This patch explicitly disables TX completion interrupt coalescing logic
in isert_put_response() and isert_put_datain() that was originally added
as an efficiency optimization in commit 95b60f07.

It has been reported that this change can trigger ABORT_TASK timeouts
under certain small block workloads, where disabling coalescing was
required for stability. According to Sagi, this doesn't impact
overall performance, so go ahead and disable it for now.

Reported-by: Moussa Ba <[email protected]>
Reported-by: Sagi Grimberg <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/infiniband/ulp/isert/ib_isert.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 44b883f..9b2a108 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1909,7 +1909,7 @@ isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
isert_cmd->tx_desc.num_sge = 2;
}

- isert_init_send_wr(isert_conn, isert_cmd, send_wr, true);
+ isert_init_send_wr(isert_conn, isert_cmd, send_wr, false);

pr_debug("Posting SCSI Response IB_WR_SEND >>>>>>>>>>>>>>>>>>>>>>\n");

@@ -2432,7 +2432,7 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
&isert_cmd->tx_desc.iscsi_header);
isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
isert_init_send_wr(isert_conn, isert_cmd,
- &isert_cmd->tx_desc.send_wr, true);
+ &isert_cmd->tx_desc.send_wr, false);

atomic_add(wr->send_wr_num + 1, &isert_conn->post_send_buf_count);

--
1.9.1

2014-11-06 22:44:29

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 064/162] qla_target: don't delete changed nacls

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Joern Engel <[email protected]>

commit f4c24db1b7ad0ce84409e15744d26c6f86a96840 upstream.

The code is currently riddled with "drop the hardware_lock to avoid a
deadlock" bugs that expose races. One of those races seems to expose a
valid warning in tcm_qla2xxx_clear_nacl_from_fcport_map. Add some
bandaid to it.

Signed-off-by: Joern Engel <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/scsi/qla2xxx/tcm_qla2xxx.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
index 7eb19be..b93f24a 100644
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -740,7 +740,16 @@ static void tcm_qla2xxx_clear_nacl_from_fcport_map(struct qla_tgt_sess *sess)
pr_debug("fc_rport domain: port_id 0x%06x\n", nacl->nport_id);

node = btree_remove32(&lport->lport_fcport_map, nacl->nport_id);
- WARN_ON(node && (node != se_nacl));
+ if (WARN_ON(node && (node != se_nacl))) {
+ /*
+ * The nacl no longer matches what we think it should be.
+ * Most likely a new dynamic acl has been added while
+ * someone dropped the hardware lock. It clearly is a
+ * bug elsewhere, but this bit can't make things worse.
+ */
+ btree_insert32(&lport->lport_fcport_map, nacl->nport_id,
+ node, GFP_ATOMIC);
+ }

pr_debug("Removed from fcport_map: %p for WWNN: 0x%016LX, port_id: 0x%06x\n",
se_nacl, nacl->nport_wwnn, nacl->nport_id);
--
1.9.1

2014-11-06 22:47:02

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 030/162] sparc32: dma_alloc_coherent must honour gfp flags

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Hellstrom <[email protected]>

[ Upstream commit d1105287aabe88dbb3af825140badaa05cf0442c ]

dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO),
but the sparc32 implementations sbus_alloc_coherent() and
pci32_alloc_coherent() don't take the gfp flags into
account.
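
(For context only, the generic zeroing wrapper looks roughly like the
sketch below -- paraphrased from <linux/dma-mapping.h> of this era, not
quoted -- which is why the architecture hooks must honour the gfp mask
they are handed:)

static inline void *dma_zalloc_coherent(struct device *dev, size_t size,
                                        dma_addr_t *dma_handle, gfp_t flag)
{
    /* __GFP_ZERO ends up in the 'gfp' argument that sbus_alloc_coherent()
     * and pci32_alloc_coherent() receive and must not ignore. */
    return dma_alloc_coherent(dev, size, dma_handle, flag | __GFP_ZERO);
}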

Tested on the SPARC32/LEON GRETH Ethernet driver, which fails
because dma_alloc_coherent(__GFP_ZERO) returns non-zeroed
pages.

Signed-off-by: Daniel Hellstrom <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/ioport.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index e7e215d..c2d81ad 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -278,7 +278,8 @@ static void *sbus_alloc_coherent(struct device *dev, size_t len,
}

order = get_order(len_total);
- if ((va = __get_free_pages(GFP_KERNEL|__GFP_COMP, order)) == 0)
+ va = __get_free_pages(gfp, order);
+ if (va == 0)
goto err_nopages;

if ((res = kzalloc(sizeof(struct resource), GFP_KERNEL)) == NULL)
@@ -443,7 +444,7 @@ static void *pci32_alloc_coherent(struct device *dev, size_t len,
}

order = get_order(len_total);
- va = (void *) __get_free_pages(GFP_KERNEL, order);
+ va = (void *) __get_free_pages(gfp, order);
if (va == NULL) {
printk("pci_alloc_consistent: no %ld pages\n", len_total>>PAGE_SHIFT);
goto err_nopages;
--
1.9.1

2014-11-06 22:47:29

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 076/162] drm/radeon: use gart memory for DMA ring tests

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <[email protected]>

commit adfed2b0587289013f8143c54913ddfd44ac1fd3 upstream.

Avoids HDP cache flush issues when using vram which can
cause ring test failures on certain boards.

Signed-off-by: Alex Deucher <[email protected]>
Cc: Alexander Fyodorov <[email protected]>
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/gpu/drm/radeon/cik_sdma.c | 21 ++++++++++++---------
drivers/gpu/drm/radeon/r600_dma.c | 21 ++++++++++++---------
drivers/gpu/drm/radeon/radeon.h | 2 ++
3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik_sdma.c b/drivers/gpu/drm/radeon/cik_sdma.c
index 9fd95c7..376502d 100644
--- a/drivers/gpu/drm/radeon/cik_sdma.c
+++ b/drivers/gpu/drm/radeon/cik_sdma.c
@@ -508,16 +508,19 @@ int cik_sdma_ring_test(struct radeon_device *rdev,
{
unsigned i;
int r;
- void __iomem *ptr = (void *)rdev->vram_scratch.ptr;
+ unsigned index;
u32 tmp;
+ u64 gpu_addr;

- if (!ptr) {
- DRM_ERROR("invalid vram scratch pointer\n");
- return -EINVAL;
- }
+ if (ring->idx == R600_RING_TYPE_DMA_INDEX)
+ index = R600_WB_DMA_RING_TEST_OFFSET;
+ else
+ index = CAYMAN_WB_DMA1_RING_TEST_OFFSET;
+
+ gpu_addr = rdev->wb.gpu_addr + index;

tmp = 0xCAFEDEAD;
- writel(tmp, ptr);
+ rdev->wb.wb[index/4] = cpu_to_le32(tmp);

r = radeon_ring_lock(rdev, ring, 5);
if (r) {
@@ -525,14 +528,14 @@ int cik_sdma_ring_test(struct radeon_device *rdev,
return r;
}
radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE, SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
- radeon_ring_write(ring, rdev->vram_scratch.gpu_addr & 0xfffffffc);
- radeon_ring_write(ring, upper_32_bits(rdev->vram_scratch.gpu_addr) & 0xffffffff);
+ radeon_ring_write(ring, lower_32_bits(gpu_addr));
+ radeon_ring_write(ring, upper_32_bits(gpu_addr));
radeon_ring_write(ring, 1); /* number of DWs to follow */
radeon_ring_write(ring, 0xDEADBEEF);
radeon_ring_unlock_commit(rdev, ring);

for (i = 0; i < rdev->usec_timeout; i++) {
- tmp = readl(ptr);
+ tmp = le32_to_cpu(rdev->wb.wb[index/4]);
if (tmp == 0xDEADBEEF)
break;
DRM_UDELAY(1);
diff --git a/drivers/gpu/drm/radeon/r600_dma.c b/drivers/gpu/drm/radeon/r600_dma.c
index 616d37a..cf5e181 100644
--- a/drivers/gpu/drm/radeon/r600_dma.c
+++ b/drivers/gpu/drm/radeon/r600_dma.c
@@ -227,16 +227,19 @@ int r600_dma_ring_test(struct radeon_device *rdev,
{
unsigned i;
int r;
- void __iomem *ptr = (void *)rdev->vram_scratch.ptr;
+ unsigned index;
u32 tmp;
+ u64 gpu_addr;

- if (!ptr) {
- DRM_ERROR("invalid vram scratch pointer\n");
- return -EINVAL;
- }
+ if (ring->idx == R600_RING_TYPE_DMA_INDEX)
+ index = R600_WB_DMA_RING_TEST_OFFSET;
+ else
+ index = CAYMAN_WB_DMA1_RING_TEST_OFFSET;
+
+ gpu_addr = rdev->wb.gpu_addr + index;

tmp = 0xCAFEDEAD;
- writel(tmp, ptr);
+ rdev->wb.wb[index/4] = cpu_to_le32(tmp);

r = radeon_ring_lock(rdev, ring, 4);
if (r) {
@@ -244,13 +247,13 @@ int r600_dma_ring_test(struct radeon_device *rdev,
return r;
}
radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_WRITE, 0, 0, 1));
- radeon_ring_write(ring, rdev->vram_scratch.gpu_addr & 0xfffffffc);
- radeon_ring_write(ring, upper_32_bits(rdev->vram_scratch.gpu_addr) & 0xff);
+ radeon_ring_write(ring, lower_32_bits(gpu_addr));
+ radeon_ring_write(ring, upper_32_bits(gpu_addr) & 0xff);
radeon_ring_write(ring, 0xDEADBEEF);
radeon_ring_unlock_commit(rdev, ring);

for (i = 0; i < rdev->usec_timeout; i++) {
- tmp = readl(ptr);
+ tmp = le32_to_cpu(rdev->wb.wb[index/4]);
if (tmp == 0xDEADBEEF)
break;
DRM_UDELAY(1);
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index fd2526c..ee3be48 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1081,6 +1081,8 @@ struct radeon_wb {
#define R600_WB_EVENT_OFFSET 3072
#define CIK_WB_CP1_WPTR_OFFSET 3328
#define CIK_WB_CP2_WPTR_OFFSET 3584
+#define R600_WB_DMA_RING_TEST_OFFSET 3588
+#define CAYMAN_WB_DMA1_RING_TEST_OFFSET 3592

/**
* struct radeon_pm - power management datas
--
1.9.1

2014-11-06 22:47:47

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 074/162] ext4: check s_chksum_driver when looking for bg csum presence

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <[email protected]>

commit 813d32f91333e4c33d5a19b67167c4bae42dae75 upstream.

Convert the ext4_has_group_desc_csum predicate to look for a checksum
driver instead of the metadata_csum flag and change the bg checksum
calculation function to look for GDT_CSUM before taking the crc16
path.

Without this patch, if we mount with ^uninit_bg,^metadata_csum and
later metadata_csum gets turned on by accident, the block group
checksum functions will incorrectly assume that checksumming is
enabled (metadata_csum) but that crc16 should be used
(!s_chksum_driver). This is totally wrong, so fix the predicate
and the checksum formula selection.

(Granted, if the metadata_csum feature bit gets enabled on a live FS
then something underhanded is going on, but we could at least avoid
writing garbage into the on-disk fields.)

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Dmitry Monakhov <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/ext4.h | 4 ++--
fs/ext4/super.c | 4 ++++
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5b250fe..bfb7fed 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2337,8 +2337,8 @@ extern int ext4_register_li_request(struct super_block *sb,
static inline int ext4_has_group_desc_csum(struct super_block *sb)
{
return EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM) ||
+ (EXT4_SB(sb)->s_chksum_driver != NULL);
}

static inline int ext4_has_metadata_csum(struct super_block *sb)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6686ce4..608db58 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2019,6 +2019,10 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
}

/* old crc16 code */
+ if (!(sbi->s_es->s_feature_ro_compat &
+ cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+ return 0;
+
offset = offsetof(struct ext4_group_desc, bg_checksum);

crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid));
--
1.9.1

2014-11-06 22:47:50

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 026/162] sctp: handle association restarts when the socket is closed.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <[email protected]>

[ Upstream commit bdf6fa52f01b941d4a80372d56de465bdbbd1d23 ]

Currently association restarts do not take into consideration the
state of the socket. When a restart happens, the current association
simply transitions into the established state. This creates a condition
where a remote system, through the restart procedure, may create a
local association that is in no way reachable by the user. The conditions
to trigger this are as follows:
1) Remote does not acknowledge some data, causing data to remain
outstanding.
2) Local application calls close() on the socket. Since data
is still outstanding, the association is placed in SHUTDOWN_PENDING
state. However, the socket is closed.
3) The remote tries to create a new association, triggering a restart
on the local system. The association moves from SHUTDOWN_PENDING
to ESTABLISHED. At this point, it is no longer reachable by
any socket on the local system.

This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk. This way, the restarted association immediately
enters the shutdown procedure and forces the termination of the
unreachable association.

Reported-by: David Laight <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/sctp/command.h | 2 +-
net/sctp/sm_statefuns.c | 19 ++++++++++++++++---
2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h
index 832f219..c3f0cd9 100644
--- a/include/net/sctp/command.h
+++ b/include/net/sctp/command.h
@@ -116,7 +116,7 @@ typedef enum {
* analysis of the state functions, but in reality just taken from
* thin air in the hopes othat we don't trigger a kernel panic.
*/
-#define SCTP_MAX_NUM_COMMANDS 14
+#define SCTP_MAX_NUM_COMMANDS 20

typedef union {
__s32 i32;
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index cd0d5a2..4b549a3 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1776,9 +1776,22 @@ static sctp_disposition_t sctp_sf_do_dupcook_a(struct net *net,
/* Update the content of current association. */
sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP, SCTP_ULPEVENT(ev));
- sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
- SCTP_STATE(SCTP_STATE_ESTABLISHED));
- sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl));
+ if (sctp_state(asoc, SHUTDOWN_PENDING) &&
+ (sctp_sstate(asoc->base.sk, CLOSING) ||
+ sock_flag(asoc->base.sk, SOCK_DEAD))) {
+ /* if were currently in SHUTDOWN_PENDING, but the socket
+ * has been closed by user, don't transition to ESTABLISHED.
+ * Instead trigger SHUTDOWN bundled with COOKIE_ACK.
+ */
+ sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl));
+ return sctp_sf_do_9_2_start_shutdown(net, ep, asoc,
+ SCTP_ST_CHUNK(0), NULL,
+ commands);
+ } else {
+ sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
+ SCTP_STATE(SCTP_STATE_ESTABLISHED));
+ sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl));
+ }
return SCTP_DISPOSITION_CONSUME;

nomem_ev:
--
1.9.1

2014-11-06 22:48:52

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 072/162] ARC: [nsimosci] Allow "headless" models to boot

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vineet Gupta <[email protected]>

commit 5c05483e2db91890faa9a7be0a831701a3f442d6 upstream.

There are certain test configurations of the virtual platform which don't
have any real console device (uart/pgu). So add tty0 as a fallback console
device to allow the system to boot and be accessible via telnet.

Otherwise, with ttyS0 as the only console but the 8250 driver disabled in
the kernel build, init chokes.

Reported-by: Anton Kolesov <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/arc/boot/dts/nsimosci.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arc/boot/dts/nsimosci.dts b/arch/arc/boot/dts/nsimosci.dts
index 4f31b2e..398064c 100644
--- a/arch/arc/boot/dts/nsimosci.dts
+++ b/arch/arc/boot/dts/nsimosci.dts
@@ -20,7 +20,7 @@
/* this is for console on PGU */
/* bootargs = "console=tty0 consoleblank=0"; */
/* this is for console on serial */
- bootargs = "earlycon=uart8250,mmio32,0xc0000000,115200n8 console=ttyS0,115200n8 consoleblank=0 debug";
+ bootargs = "earlycon=uart8250,mmio32,0xc0000000,115200n8 console=tty0 console=ttyS0,115200n8 consoleblank=0 debug";
};

aliases {
--
1.9.1

2014-11-06 22:48:49

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 028/162] sparc64: Do not disable interrupts in nmi_cpu_busy()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 58556104e9cd0107a7a8d2692cf04ef31669f6e4 ]

nmi_cpu_busy() is an SMP function call that just makes sure that all of the
cpus are spinning using cpu cycles while the NMI test runs.

It does not need to disable IRQs because we just care about NMIs executing,
which they will even with 'normal' IRQs disabled.

It is not legal to enable hard IRQs in an SMP cross call; in fact, this bug
triggers the BUG check in irq_work_run_list():

BUG_ON(!irqs_disabled());

Because now irq_work_run() is invoked from the tail of
generic_smp_call_function_single_interrupt().

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/nmi.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 6479256..fce8ab1 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -141,7 +141,6 @@ static inline unsigned int get_nmi_count(int cpu)

static __init void nmi_cpu_busy(void *data)
{
- local_irq_enable_in_hardirq();
while (endflag == 0)
mb();
}
--
1.9.1

2014-11-06 22:49:22

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 077/162] compiler: define OPTIMIZER_HIDE_VAR() macro

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Cesar Eduardo Barros <[email protected]>

[ 3.13-stable only: OPTIMIZER_HIDE_VAR() macro
extracted from fe8c8a1 "crypto: more robust crypto_memneq" ]

[...] dummy inline assembly
(based on RELOC_HIDE) to block the problematic kinds of optimization,
while still allowing other optimizations to be applied to the code.
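
As a rough illustration of why the macro exists (this example is not part
of the patch; it mirrors the memzero_explicit() helper added elsewhere in
this series):

/* Without the barrier, the compiler may discard the memset() below as a
 * dead store, since 'buf' is not read again before going out of scope. */
static void wipe_buffer(void *buf, size_t len)
{
    memset(buf, 0, len);
    OPTIMIZER_HIDE_VAR(buf);    /* pretend 'buf' may still be used */
}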

Signed-off-by: Cesar Eduardo Barros <[email protected]>
[ kamal: 3.13-stable prereq for
d4c5efd "random: add and use memzero_explicit() for clearing data" ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/linux/compiler-gcc.h | 3 +++
include/linux/compiler-intel.h | 7 +++++++
include/linux/compiler.h | 4 ++++
3 files changed, 14 insertions(+)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 24545cd..02ae99e 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,6 +37,9 @@
__asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
(typeof(ptr)) (__ptr + (off)); })

+/* Make the optimizer believe the variable can be manipulated arbitrarily. */
+#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
+
#ifdef __CHECKER__
#define __must_be_array(arr) 0
#else
diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h
index dc1bd3d..5529c52 100644
--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -15,6 +15,7 @@
*/
#undef barrier
#undef RELOC_HIDE
+#undef OPTIMIZER_HIDE_VAR

#define barrier() __memory_barrier()

@@ -23,6 +24,12 @@
__ptr = (unsigned long) (ptr); \
(typeof(ptr)) (__ptr + (off)); })

+/* This should act as an optimization barrier on var.
+ * Given that this compiler does not have inline assembly, a compiler barrier
+ * is the best we can do.
+ */
+#define OPTIMIZER_HIDE_VAR(var) barrier()
+
/* Intel ECC compiler doesn't support __builtin_types_compatible_p() */
#define __must_be_array(a) 0

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..a2329c5 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -170,6 +170,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
(typeof(ptr)) (__ptr + (off)); })
#endif

+#ifndef OPTIMIZER_HIDE_VAR
+#define OPTIMIZER_HIDE_VAR(var) barrier()
+#endif
+
/* Not-quite-unique ID. */
#ifndef __UNIQUE_ID
# define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
--
1.9.1

2014-11-06 22:49:44

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 061/162] vfs: fix data corruption when blocksize < pagesize for mmaped data

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 90a8020278c1598fafd071736a0846b38510309c upstream.

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';         ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';      ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.
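
A caller sketch (the ext4 patch elsewhere in this series does effectively
this in its ext4_setattr() hunk when a file is extended):

    /* hypothetical ->setattr() fragment extending a file */
    loff_t oldsize = inode->i_size;

    i_size_write(inode, attr->ia_size);
    /* make sure ->page_mkwrite() fires for the page straddling oldsize */
    pagecache_isize_extended(inode, oldsize, inode->i_size);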

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/buffer.c | 3 +++
include/linux/mm.h | 1 +
mm/truncate.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4252d82..948aae2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2077,6 +2077,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
struct page *page, void *fsdata)
{
struct inode *inode = mapping->host;
+ loff_t old_size = inode->i_size;
int i_size_changed = 0;

copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -2096,6 +2097,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
unlock_page(page);
page_cache_release(page);

+ if (old_size < pos)
+ pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
* makes the holding time of page lock longer. Second, it forces lock
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4506b84..6a47519 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1087,6 +1087,7 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,

extern void truncate_pagecache(struct inode *inode, loff_t new);
extern void truncate_setsize(struct inode *inode, loff_t newsize);
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
int truncate_inode_page(struct address_space *mapping, struct page *page);
int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 353b683..855c38c 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
#include <linux/buffer_head.h> /* grr. try_to_release_page,
do_invalidatepage */
#include <linux/cleancache.h>
+#include <linux/rmap.h>
#include "internal.h"


@@ -613,12 +614,68 @@ EXPORT_SYMBOL(truncate_pagecache);
*/
void truncate_setsize(struct inode *inode, loff_t newsize)
{
+ loff_t oldsize = inode->i_size;
+
i_size_write(inode, newsize);
+ if (newsize > oldsize)
+ pagecache_isize_extended(inode, oldsize, newsize);
truncate_pagecache(inode, newsize);
}
EXPORT_SYMBOL(truncate_setsize);

/**
+ * pagecache_isize_extended - update pagecache after extension of i_size
+ * @inode: inode for which i_size was extended
+ * @from: original inode size
+ * @to: new inode size
+ *
+ * Handle extension of inode size either caused by extending truncate or by
+ * write starting after current i_size. We mark the page straddling current
+ * i_size RO so that page_mkwrite() is called on the nearest write access to
+ * the page. This way filesystem can be sure that page_mkwrite() is called on
+ * the page before user writes to the page via mmap after the i_size has been
+ * changed.
+ *
+ * The function must be called after i_size is updated so that page fault
+ * coming after we unlock the page will already see the new i_size.
+ * The function must be called while we still hold i_mutex - this not only
+ * makes sure i_size is stable but also that userspace cannot observe new
+ * i_size value before we are prepared to store mmap writes at new inode size.
+ */
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
+{
+ int bsize = 1 << inode->i_blkbits;
+ loff_t rounded_from;
+ struct page *page;
+ pgoff_t index;
+
+ WARN_ON(!mutex_is_locked(&inode->i_mutex));
+ WARN_ON(to > inode->i_size);
+
+ if (from >= to || bsize == PAGE_CACHE_SIZE)
+ return;
+ /* Page straddling @from will not have any hole block created? */
+ rounded_from = round_up(from, bsize);
+ if (to <= rounded_from || !(rounded_from & (PAGE_CACHE_SIZE - 1)))
+ return;
+
+ index = from >> PAGE_CACHE_SHIFT;
+ page = find_lock_page(inode->i_mapping, index);
+ /* Page not cached? Nothing to do */
+ if (!page)
+ return;
+ /*
+ * See clear_page_dirty_for_io() for details why set_page_dirty()
+ * is needed.
+ */
+ if (page_mkclean(page))
+ set_page_dirty(page);
+ unlock_page(page);
+ page_cache_release(page);
+}
+EXPORT_SYMBOL(pagecache_isize_extended);
+
+/**
* truncate_pagecache_range - unmap and remove pagecache that is hole-punched
* @inode: inode
* @lstart: offset of beginning of hole
--
1.9.1

2014-11-06 22:44:13

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 025/162] team: avoid race condition in scheduling delayed work

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Joe Lawrence <[email protected]>

[ Upstream commit 47549650abd13d873fd2e5fc218db19e21031074 ]

When team_notify_peers and team_mcast_rejoin are called, they both reset
their respective .count_pending atomic variable. Then when the actual
worker function is executed, the variable is atomically decremented.
This pattern introduces a potential race condition where the
.count_pending rolls over and the worker function keeps rescheduling
until .count_pending decrements to zero again:

THREAD 1                        THREAD 2

========                        ========
team_notify_peers(teamX)
  atomic_set count_pending = 1
  schedule_delayed_work
                                team_notify_peers(teamX)
                                  atomic_set count_pending = 1
team_notify_peers_work
  atomic_dec_and_test
    count_pending = 0
  (return)
                                  schedule_delayed_work
                                team_notify_peers_work
                                  atomic_dec_and_test
                                    count_pending = -1
                                  schedule_delayed_work
                                  (repeat until count_pending = 0)

Instead of assigning a new value to .count_pending, use atomic_add to
tack on the additional desired worker function invocations.
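
For context, the worker side (paraphrased sketch, not a quote of the
driver) decrements the counter and reschedules itself until it reaches
zero, which is why a forced atomic_set() can push it past zero and make
the work keep rescheduling:

static void team_notify_peers_work(struct work_struct *work)
{
    struct team *team;

    team = container_of(work, struct team, notify_peers.dw.work);

    /* ... send the peer notification ... */

    /* Reschedule until count_pending drains to zero; if count_pending was
     * force-set to 1 while this ran, it can go negative here and the work
     * keeps rescheduling until it wraps back around to zero. */
    if (!atomic_dec_and_test(&team->notify_peers.count_pending))
        schedule_delayed_work(&team->notify_peers.dw, 0);
}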

Signed-off-by: Joe Lawrence <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Fixes: fc423ff00df3a19554414ee ("team: add peer notification")
Fixes: 492b200efdd20b8fcfdac87 ("team: add support for sending multicast rejoins")
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/team/team.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 99500d1..2898c2a 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -647,7 +647,7 @@ static void team_notify_peers(struct team *team)
{
if (!team->notify_peers.count || !netif_running(team->dev))
return;
- atomic_set(&team->notify_peers.count_pending, team->notify_peers.count);
+ atomic_add(team->notify_peers.count, &team->notify_peers.count_pending);
schedule_delayed_work(&team->notify_peers.dw, 0);
}

@@ -687,7 +687,7 @@ static void team_mcast_rejoin(struct team *team)
{
if (!team->mcast_rejoin.count || !netif_running(team->dev))
return;
- atomic_set(&team->mcast_rejoin.count_pending, team->mcast_rejoin.count);
+ atomic_add(team->mcast_rejoin.count, &team->mcast_rejoin.count_pending);
schedule_delayed_work(&team->mcast_rejoin.dw, 0);
}

--
1.9.1

2014-11-06 22:50:03

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 070/162] ext4: Replace open coded mdata csum feature to helper function

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Monakhov <[email protected]>

commit 9aa5d32ba269bec0e7eaba2697a986a7b0bc8528 upstream.

Besides the fact that this replacement improves code readability,
it also protects from errors caused by direct EXT4_SB(sb)->s_es manipulation,
which may result in an attempt to use uninitialized csum machinery.

IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
tune2fs -O metadata_csum $IMG
touch $MNT/test
umount $MNT

@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)

https://bugzilla.kernel.org/show_bug.cgi?id=82201

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/bitmap.c | 12 ++++--------
fs/ext4/ext4.h | 8 ++++++++
fs/ext4/extents.c | 6 ++----
fs/ext4/ialloc.c | 3 +--
fs/ext4/inline.c | 3 +--
fs/ext4/inode.c | 9 +++------
fs/ext4/ioctl.c | 3 +--
fs/ext4/mmp.c | 6 ++----
fs/ext4/namei.c | 39 +++++++++++++--------------------------
fs/ext4/resize.c | 3 +--
fs/ext4/super.c | 15 +++++----------
fs/ext4/xattr.c | 6 ++----
12 files changed, 43 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/bitmap.c b/fs/ext4/bitmap.c
index 3285aa5..b610779 100644
--- a/fs/ext4/bitmap.c
+++ b/fs/ext4/bitmap.c
@@ -24,8 +24,7 @@ int ext4_inode_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
__u32 provided, calculated;
struct ext4_sb_info *sbi = EXT4_SB(sb);

- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return 1;

provided = le16_to_cpu(gdp->bg_inode_bitmap_csum_lo);
@@ -46,8 +45,7 @@ void ext4_inode_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
__u32 csum;
struct ext4_sb_info *sbi = EXT4_SB(sb);

- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return;

csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)bh->b_data, sz);
@@ -65,8 +63,7 @@ int ext4_block_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
struct ext4_sb_info *sbi = EXT4_SB(sb);
int sz = EXT4_CLUSTERS_PER_GROUP(sb) / 8;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return 1;

provided = le16_to_cpu(gdp->bg_block_bitmap_csum_lo);
@@ -91,8 +88,7 @@ void ext4_block_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
__u32 csum;
struct ext4_sb_info *sbi = EXT4_SB(sb);

- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return;

csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)bh->b_data, sz);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c9d0794..0e9930f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2345,6 +2345,14 @@ static inline int ext4_has_group_desc_csum(struct super_block *sb)
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
}

+static inline int ext4_has_metadata_csum(struct super_block *sb)
+{
+ WARN_ON_ONCE(EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+ !EXT4_SB(sb)->s_chksum_driver);
+
+ return (EXT4_SB(sb)->s_chksum_driver != NULL);
+}
static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
{
return ((ext4_fsblk_t)le32_to_cpu(es->s_blocks_count_hi) << 32) |
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3a09bb7..da05fe2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -74,8 +74,7 @@ static int ext4_extent_block_csum_verify(struct inode *inode,
{
struct ext4_extent_tail *et;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return 1;

et = find_ext4_extent_tail(eh);
@@ -89,8 +88,7 @@ static void ext4_extent_block_csum_set(struct inode *inode,
{
struct ext4_extent_tail *et;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return;

et = find_ext4_extent_tail(eh);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 64bb32f1..158ddf6 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -988,8 +988,7 @@ got:
spin_unlock(&sbi->s_next_gen_lock);

/* Precompute checksum seed for inode metadata */
- if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+ if (ext4_has_metadata_csum(sb)) {
__u32 csum;
__le32 inum = cpu_to_le32(inode->i_ino);
__le32 gen = cpu_to_le32(inode->i_generation);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 7f03208..3af1fc4 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1124,8 +1124,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle,
memcpy((void *)de, buf + EXT4_INLINE_DOTDOT_SIZE,
inline_size - EXT4_INLINE_DOTDOT_SIZE);

- if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(inode->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

inode->i_size = inode->i_sb->s_blocksize;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 37a8c8b..f2a02dd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -83,8 +83,7 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw,

if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
cpu_to_le32(EXT4_OS_LINUX) ||
- !EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ !ext4_has_metadata_csum(inode->i_sb))
return 1;

provided = le16_to_cpu(raw->i_checksum_lo);
@@ -105,8 +104,7 @@ static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,

if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
cpu_to_le32(EXT4_OS_LINUX) ||
- !EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ !ext4_has_metadata_csum(inode->i_sb))
return;

csum = ext4_inode_csum(inode, raw, ei);
@@ -4086,8 +4084,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
ei->i_extra_isize = 0;

/* Precompute checksum seed for inode metadata */
- if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+ if (ext4_has_metadata_csum(sb)) {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
__u32 csum;
__le32 inum = cpu_to_le32(inode->i_ino);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 2c3541b..fdcadc9 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -347,8 +347,7 @@ flags_out:
if (!inode_owner_or_capable(inode))
return -EPERM;

- if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+ if (ext4_has_metadata_csum(inode->i_sb)) {
ext4_warning(sb, "Setting inode version is not "
"supported with metadata_csum enabled.");
return -ENOTTY;
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 04434ad..1268a1b 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -20,8 +20,7 @@ static __le32 ext4_mmp_csum(struct super_block *sb, struct mmp_struct *mmp)

int ext4_mmp_csum_verify(struct super_block *sb, struct mmp_struct *mmp)
{
- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return 1;

return mmp->mmp_checksum == ext4_mmp_csum(sb, mmp);
@@ -29,8 +28,7 @@ int ext4_mmp_csum_verify(struct super_block *sb, struct mmp_struct *mmp)

void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
{
- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return;

mmp->mmp_checksum = ext4_mmp_csum(sb, mmp);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5bfbf94..08de398 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -123,8 +123,7 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode,
"directory leaf block found instead of index block");
return ERR_PTR(-EIO);
}
- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) ||
+ if (!ext4_has_metadata_csum(inode->i_sb) ||
buffer_verified(bh))
return bh;

@@ -339,8 +338,7 @@ int ext4_dirent_csum_verify(struct inode *inode, struct ext4_dir_entry *dirent)
{
struct ext4_dir_entry_tail *t;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return 1;

t = get_dirent_tail(inode, dirent);
@@ -361,8 +359,7 @@ static void ext4_dirent_csum_set(struct inode *inode,
{
struct ext4_dir_entry_tail *t;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return;

t = get_dirent_tail(inode, dirent);
@@ -437,8 +434,7 @@ static int ext4_dx_csum_verify(struct inode *inode,
struct dx_tail *t;
int count_offset, limit, count;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return 1;

c = get_dx_countlimit(inode, dirent, &count_offset);
@@ -467,8 +463,7 @@ static void ext4_dx_csum_set(struct inode *inode, struct ext4_dir_entry *dirent)
struct dx_tail *t;
int count_offset, limit, count;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return;

c = get_dx_countlimit(inode, dirent, &count_offset);
@@ -556,8 +551,7 @@ static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize)
unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(1) -
EXT4_DIR_REC_LEN(2) - infosize;

- if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(dir->i_sb))
entry_space -= sizeof(struct dx_tail);
return entry_space / sizeof(struct dx_entry);
}
@@ -566,8 +560,7 @@ static inline unsigned dx_node_limit(struct inode *dir)
{
unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(0);

- if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(dir->i_sb))
entry_space -= sizeof(struct dx_tail);
return entry_space / sizeof(struct dx_entry);
}
@@ -1535,8 +1528,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
int csum_size = 0;
int err = 0, i;

- if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(dir->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

bh2 = ext4_append(handle, dir, &newblock);
@@ -1705,8 +1697,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
int csum_size = 0;
int err;

- if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(inode->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

if (!de) {
@@ -1773,8 +1764,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
struct fake_dirent *fde;
int csum_size = 0;

- if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(inode->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

blocksize = dir->i_sb->s_blocksize;
@@ -1890,8 +1880,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
ext4_lblk_t block, blocks;
int csum_size = 0;

- if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(inode->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

sb = dir->i_sb;
@@ -2153,8 +2142,7 @@ static int ext4_delete_entry(handle_t *handle,
return err;
}

- if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(dir->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

BUFFER_TRACE(bh, "get_write_access");
@@ -2373,8 +2361,7 @@ static int ext4_init_new_dir(handle_t *handle, struct inode *dir,
int csum_size = 0;
int err;

- if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(dir->i_sb))
csum_size = sizeof(struct ext4_dir_entry_tail);

if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) {
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index f3b84cd..14e0f8a 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1200,8 +1200,7 @@ static int ext4_set_bitmap_checksums(struct super_block *sb,
{
struct buffer_head *bh;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return 0;

bh = ext4_get_bitmap(sb, group_data->inode_bitmap);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c888f23..6686ce4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -140,8 +140,7 @@ static __le32 ext4_superblock_csum(struct super_block *sb,
int ext4_superblock_csum_verify(struct super_block *sb,
struct ext4_super_block *es)
{
- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return 1;

return es->s_checksum == ext4_superblock_csum(sb, es);
@@ -151,8 +150,7 @@ void ext4_superblock_csum_set(struct super_block *sb)
{
struct ext4_super_block *es = EXT4_SB(sb)->s_es;

- if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(sb))
return;

es->s_checksum = ext4_superblock_csum(sb, es);
@@ -2003,8 +2001,7 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
__u16 crc = 0;
__le32 le_group = cpu_to_le32(block_group);

- if ((sbi->s_es->s_feature_ro_compat &
- cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))) {
+ if (ext4_has_metadata_csum(sbi->s_sb)) {
/* Use new metadata_csum algorithm */
__le16 save_csum;
__u32 csum32;
@@ -3160,8 +3157,7 @@ static int set_journal_csum_feature_set(struct super_block *sb)
int compat, incompat;
struct ext4_sb_info *sbi = EXT4_SB(sb);

- if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+ if (ext4_has_metadata_csum(sb)) {
/* journal checksum v3 */
compat = 0;
incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3;
@@ -3468,8 +3464,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
}

/* Precompute checksum seed for all metadata */
- if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (ext4_has_metadata_csum(sb))
sbi->s_csum_seed = ext4_chksum(sbi, ~0, es->s_uuid,
sizeof(es->s_uuid));

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index a20816e..a5d2f1b 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -141,8 +141,7 @@ static int ext4_xattr_block_csum_verify(struct inode *inode,
sector_t block_nr,
struct ext4_xattr_header *hdr)
{
- if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+ if (ext4_has_metadata_csum(inode->i_sb) &&
(hdr->h_checksum != ext4_xattr_block_csum(inode, block_nr, hdr)))
return 0;
return 1;
@@ -152,8 +151,7 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
sector_t block_nr,
struct ext4_xattr_header *hdr)
{
- if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ if (!ext4_has_metadata_csum(inode->i_sb))
return;

hdr->h_checksum = ext4_xattr_block_csum(inode, block_nr, hdr);
--
1.9.1

2014-11-06 22:50:48

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 032/162] sparc64: Fix corrupted thread fault code.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 84bd6d8b9c0f06b3f188efb479c77e20f05e9a8a ]

Every path that ends up at do_sparc64_fault() must install a valid
FAULT_CODE_* bitmask in the per-thread fault code byte.

Two paths leading to the label winfix_trampoline (which expects the
FAULT_CODE_* mask in register %g4) were not doing so:

1) For pre-hypervisor TLB protection violation traps, if we took
the 'winfix_trampoline' path we wouldn't have %g4 initialized
with the FAULT_CODE_* value yet, resulting in the
TLB_TAG_ACCESS register address value being used instead.

2) In the TSB miss path, when we notice that we are going to use a
hugepage mapping, but we haven't allocated the hugepage TSB yet, we
still have to take the window fixup case into consideration and
in that particular path we leave %g4 not set up properly.

Errors of this sort were largely invisible previously, but after
commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ("sparc64: sun4v TLB
error power off events") we now have a fault_code mask bit
(FAULT_CODE_BAD_RA) that triggers due to this bug.

FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
(see #1 above) and thus we get seemingly random bus errors triggered
for user processes.

Fixes: 4ccb9272892c ("sparc64: sun4v TLB error power off events")
Reported-by: Meelis Roos <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/dtlb_prot.S | 6 +++---
arch/sparc/kernel/tsb.S | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/sparc/kernel/dtlb_prot.S b/arch/sparc/kernel/dtlb_prot.S
index b2c2c5b..d668ca14 100644
--- a/arch/sparc/kernel/dtlb_prot.S
+++ b/arch/sparc/kernel/dtlb_prot.S
@@ -24,11 +24,11 @@
mov TLB_TAG_ACCESS, %g4 ! For reload of vaddr

/* PROT ** ICACHE line 2: More real fault processing */
+ ldxa [%g4] ASI_DMMU, %g5 ! Put tagaccess in %g5
bgu,pn %xcc, winfix_trampoline ! Yes, perform winfixup
- ldxa [%g4] ASI_DMMU, %g5 ! Put tagaccess in %g5
- ba,pt %xcc, sparc64_realfault_common ! Nope, normal fault
mov FAULT_CODE_DTLB | FAULT_CODE_WRITE, %g4
- nop
+ ba,pt %xcc, sparc64_realfault_common ! Nope, normal fault
+ nop
nop
nop
nop
diff --git a/arch/sparc/kernel/tsb.S b/arch/sparc/kernel/tsb.S
index 14158d4..be98685 100644
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -162,10 +162,10 @@ tsb_miss_page_table_walk_sun4v_fastpath:
nop
.previous

- rdpr %tl, %g3
- cmp %g3, 1
+ rdpr %tl, %g7
+ cmp %g7, 1
bne,pn %xcc, winfix_trampoline
- nop
+ mov %g3, %g4
ba,pt %xcc, etrap
rd %pc, %g7
call hugetlb_setup
--
1.9.1

2014-11-06 22:44:09

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 060/162] target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Quinn Tran <[email protected]>

commit 082f58ac4a48d3f5cb4597232cb2ac6823a96f43 upstream.

During temporary resource starvation at the lower transport layer, the
command is placed on the queue-full retry path, which exposes this
problem. The TCM queue-full handling of SCF_TRANSPORT_TASK_SENSE
currently sends the same cmd twice to the lower layer. The first time
leads to the cmd's normal free path; the second time causes a NULL
pointer access.
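
As an illustration only, and not the TCM code itself: a user-space sketch
of the corrected control flow, where the sense/status branch returns
unconditionally so the command is never handed to the lower layer a second
time. All type and function names here are hypothetical stand-ins.

#include <stdio.h>

enum direction { DIR_NONE, DIR_TO_DEVICE, DIR_FROM_DEVICE };

struct cmd {
    int has_sense;              /* stand-in for SCF_TRANSPORT_TASK_SENSE */
    enum direction dir;
};

static void queue_status(struct cmd *c)
{
    (void)c;
    printf("queue status\n");   /* first (and only) submission */
}

static void queue_data(struct cmd *c)
{
    (void)c;
    printf("queue data\n");
}

/* Fixed flow: the sense path always returns, instead of returning only
 * on error and otherwise falling through to the data-direction switch. */
static void complete_qf(struct cmd *c)
{
    if (c->has_sense) {
        queue_status(c);
        return;
    }

    switch (c->dir) {
    case DIR_TO_DEVICE:
    case DIR_FROM_DEVICE:
        queue_data(c);
        break;
    default:
        queue_status(c);
        break;
    }
}

int main(void)
{
    struct cmd c = { .has_sense = 1, .dir = DIR_FROM_DEVICE };

    complete_qf(&c);            /* prints "queue status" exactly once */
    return 0;
}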

This regression bug was originally introduced in v3.1-rc code by the
following commit:

commit e057f53308a5f071556ee80586b99ee755bf07f5
Author: Christoph Hellwig <[email protected]>
Date: Mon Oct 17 13:56:41 2011 -0400

target: remove the transport_qf_callback se_cmd callback

Signed-off-by: Quinn Tran <[email protected]>
Signed-off-by: Saurav Kashyap <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/target/target_core_transport.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index ea545f4..3acb125 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1817,8 +1817,7 @@ static void transport_complete_qf(struct se_cmd *cmd)
if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) {
trace_target_cmd_complete(cmd);
ret = cmd->se_tfo->queue_status(cmd);
- if (ret)
- goto out;
+ goto out;
}

switch (cmd->data_direction) {
--
1.9.1

2014-11-06 22:51:24

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 073/162] ARC: Update order of registers in KGDB to match GDB 7.5

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Kolesov <[email protected]>

commit ebc0c74e76cec9c4dd860eb0ca1c0b39dc63c482 upstream.

The order of registers changed in GDB between 6.8 and 7.5. This patch
updates KGDB to work properly with GDB 7.5, though it makes KGDB
incompatible with 6.8.

Signed-off-by: Anton Kolesov <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/arc/include/asm/kgdb.h | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arc/include/asm/kgdb.h b/arch/arc/include/asm/kgdb.h
index b65fca7..fea9316 100644
--- a/arch/arc/include/asm/kgdb.h
+++ b/arch/arc/include/asm/kgdb.h
@@ -19,7 +19,7 @@
* register API yet */
#undef DBG_MAX_REG_NUM

-#define GDB_MAX_REGS 39
+#define GDB_MAX_REGS 87

#define BREAK_INSTR_SIZE 2
#define CACHE_FLUSH_IS_SAFE 1
@@ -33,23 +33,27 @@ static inline void arch_kgdb_breakpoint(void)

extern void kgdb_trap(struct pt_regs *regs);

-enum arc700_linux_regnums {
+/* This is the numbering of registers according to the GDB. See GDB's
+ * arc-tdep.h for details.
+ *
+ * Registers are ordered for GDB 7.5. It is incompatible with GDB 6.8. */
+enum arc_linux_regnums {
_R0 = 0,
_R1, _R2, _R3, _R4, _R5, _R6, _R7, _R8, _R9, _R10, _R11, _R12, _R13,
_R14, _R15, _R16, _R17, _R18, _R19, _R20, _R21, _R22, _R23, _R24,
_R25, _R26,
- _BTA = 27,
- _LP_START = 28,
- _LP_END = 29,
- _LP_COUNT = 30,
- _STATUS32 = 31,
- _BLINK = 32,
- _FP = 33,
- __SP = 34,
- _EFA = 35,
- _RET = 36,
- _ORIG_R8 = 37,
- _STOP_PC = 38
+ _FP = 27,
+ __SP = 28,
+ _R30 = 30,
+ _BLINK = 31,
+ _LP_COUNT = 60,
+ _STOP_PC = 64,
+ _RET = 64,
+ _LP_START = 65,
+ _LP_END = 66,
+ _STATUS32 = 67,
+ _ECR = 76,
+ _BTA = 82,
};

#else
--
1.9.1

2014-11-06 22:52:19

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 058/162] jbd2: free bh when descriptor block checksum fails

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <[email protected]>

commit 064d83892e9ba547f7d4eae22cbca066d95210ce upstream.

Free the buffer head if the journal descriptor block fails checksum
verification.

This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
verify error in do_one_pass".
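
The pattern the fix restores is the usual one of releasing whatever was
acquired before jumping to the failure label. A minimal user-space sketch
of that shape, with malloc/free standing in for reading and releasing the
buffer head and a made-up checksum helper:

#include <stdlib.h>
#include <string.h>

struct block {
    unsigned char data[512];
};

static int checksum_ok(const struct block *b)
{
    return b->data[0] == 0;     /* stand-in for the real checksum verify */
}

static int process_descriptor(void)
{
    struct block *b;
    int err;

    b = malloc(sizeof(*b));     /* analogue of reading the buffer head */
    if (!b)
        return -1;
    memset(b->data, 0xff, sizeof(b->data));

    if (!checksum_ok(b)) {
        err = -1;               /* -EIO in the kernel code */
        free(b);                /* the fix: drop the buffer before bailing out */
        goto failed;
    }

    /* ... normal processing of the descriptor block ... */
    free(b);
    return 0;

failed:
    return err;
}

int main(void)
{
    return process_descriptor() ? 1 : 0;
}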

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Eric Sandeen <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/jbd2/recovery.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 9b329b5..bcbef08 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -525,6 +525,7 @@ static int do_one_pass(journal_t *journal,
!jbd2_descr_block_csum_verify(journal,
bh->b_data)) {
err = -EIO;
+ brelse(bh);
goto failed;
}

--
1.9.1

2014-11-06 22:52:44

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 079/162] ALSA: pcm: use the same dma mmap codepath both for arm and arm64

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Anatol Pomozov <[email protected]>

commit a011e213f3700233ed2a676f1ef0a74a052d7162 upstream.

This avoids the following kernel crash when trying to play back audio on arm64

[ 107.497203] [<ffffffc00046b310>] snd_pcm_mmap_data_fault+0x90/0xd4
[ 107.503405] [<ffffffc0001541ac>] __do_fault+0xb0/0x498
[ 107.508565] [<ffffffc0001576a0>] handle_mm_fault+0x224/0x7b0
[ 107.514246] [<ffffffc000092640>] do_page_fault+0x11c/0x310
[ 107.519738] [<ffffffc000081100>] do_mem_abort+0x38/0x98

Tested: backported to 3.14 and tried playback on an arm64 machine

Signed-off-by: Anatol Pomozov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
sound/core/pcm_native.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 01a5e05..566b0f6 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3189,7 +3189,7 @@ static const struct vm_operations_struct snd_pcm_vm_ops_data_fault = {

#ifndef ARCH_HAS_DMA_MMAP_COHERENT
/* This should be defined / handled globally! */
-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
#define ARCH_HAS_DMA_MMAP_COHERENT
#endif
#endif
--
1.9.1

2014-11-06 22:53:21

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 067/162] ext4: don't orphan or truncate the boot loader inode

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <[email protected]>

commit e2bfb088fac03c0f621886a04cffc7faa2b49b1d upstream.

The boot loader inode (inode #5) should never be visible in the
directory hierarchy, but it's possible if the file system is corrupted
that there will be a directory entry that points at inode #5. In
order to avoid accidentally trashing it, when such a directory inode
is opened, the inode will be marked as a bad inode, so that it's not
possible to modify (or read) the inode from userspace.

Unfortunately, when we unlink this (invalid/illegal) directory entry,
we will put the bad inode on the orphan list, and then when we try to
unlink the directory, we don't actually remove the bad inode from the
orphan list before freeing the in-memory inode structure. This means the
in-memory orphan list is corrupted, leading to a kernel oops.

In addition, avoid truncating a bad inode in ext4_destroy_inode(),
since truncating the boot loader inode is not a smart thing to do.

Reported-by: Sami Liedes <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
[ kamal: backport to 3.13-stable (context) ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/inode.c | 7 +++----
fs/ext4/namei.c | 2 +-
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fd6501e3..37d3b1a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -221,16 +221,15 @@ void ext4_evict_inode(struct inode *inode)
goto no_delete;
}

- if (!is_bad_inode(inode))
- dquot_initialize(inode);
+ if (is_bad_inode(inode))
+ goto no_delete;
+ dquot_initialize(inode);

if (ext4_should_order_data(inode))
ext4_begin_ordered_truncate(inode, 0);
truncate_inode_pages(&inode->i_data, 0);

WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
- if (is_bad_inode(inode))
- goto no_delete;

/*
* Protect us against freezing - iput() caller didn't have to have any
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5a0408d..d4fa3ed 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2554,7 +2554,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
struct ext4_iloc iloc;
int err = 0, rc;

- if (!EXT4_SB(sb)->s_journal)
+ if (!EXT4_SB(sb)->s_journal || is_bad_inode(inode))
return 0;

mutex_lock(&EXT4_SB(sb)->s_orphan_lock);
--
1.9.1

2014-11-06 22:53:23

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 063/162] ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Monakhov <[email protected]>

commit 3e67cfad22230ebed85c56cbe413876f33fea82b upstream.

Otherwise this provokes a complaint like the following:
WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0()
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030
0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8
ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000
Call Trace:
[<ffffffff815c7dfc>] dump_stack+0x51/0x6d
[<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81202008>] ? ext4_ioctl+0x9e8/0xeb0
[<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
[<ffffffff8122867e>] ext4_journal_check_start+0x4e/0xa0
[<ffffffff81228c10>] __ext4_journal_start_sb+0x90/0x110
[<ffffffff81202008>] ext4_ioctl+0x9e8/0xeb0
[<ffffffff8107b0bd>] ? ptrace_stop+0x24d/0x2f0
[<ffffffff81088530>] ? alloc_pid+0x480/0x480
[<ffffffff8107b1f2>] ? ptrace_do_notify+0x92/0xb0
[<ffffffff81186545>] do_vfs_ioctl+0x4e5/0x550
[<ffffffff815cdbcb>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff81186603>] SyS_ioctl+0x53/0x80
[<ffffffff815ce2ce>] tracesys+0xd0/0xd5

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/ioctl.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 4a5fe55..2c3541b 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -548,9 +548,17 @@ group_add_out:
}

case EXT4_IOC_SWAP_BOOT:
+ {
+ int err;
if (!(filp->f_mode & FMODE_WRITE))
return -EBADF;
- return swap_inode_boot_loader(sb, inode);
+ err = mnt_want_write_file(filp);
+ if (err)
+ return err;
+ err = swap_inode_boot_loader(sb, inode);
+ mnt_drop_write_file(filp);
+ return err;
+ }

case EXT4_IOC_RESIZE_FS: {
ext4_fsblk_t n_blocks_count;
--
1.9.1

2014-11-06 22:43:51

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 090/162] cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dirk Brandewie <[email protected]>

commit c034b02e213d271b98c45c4a7b54af8f69aaac1e upstream.

Currently the core does not expose scaling_cur_freq for set_policy()
drivers; this breaks some userspace monitoring tools.
Change the core to expose this file for all drivers and, if the
set_policy() driver supports the get() callback, use it to retrieve the
current frequency.
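
For context, this is the kind of read the affected monitoring tools
perform. A minimal sketch; the sysfs path is the standard per-policy
cpufreq location, the value is reported in kHz, and error handling is kept
to a minimum:

#include <stdio.h>

int main(void)
{
    /* Per-policy file exposed by the cpufreq core for every driver
     * after this change, including set_policy() drivers. */
    const char *path =
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
    unsigned int khz;
    FILE *f = fopen(path, "r");

    if (!f) {
        perror(path);
        return 1;
    }
    if (fscanf(f, "%u", &khz) == 1)
        printf("cpu0 current frequency: %u kHz\n", khz);
    fclose(f);
    return 0;
}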

Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741
Signed-off-by: Dirk Brandewie <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/cpufreq/cpufreq.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c166b4a..a9cd300 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -405,7 +405,18 @@ show_one(cpuinfo_max_freq, cpuinfo.max_freq);
show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
show_one(scaling_min_freq, min);
show_one(scaling_max_freq, max);
-show_one(scaling_cur_freq, cur);
+
+static ssize_t show_scaling_cur_freq(
+ struct cpufreq_policy *policy, char *buf)
+{
+ ssize_t ret;
+
+ if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get)
+ ret = sprintf(buf, "%u\n", cpufreq_driver->get(policy->cpu));
+ else
+ ret = sprintf(buf, "%u\n", policy->cur);
+ return ret;
+}

static int cpufreq_set_policy(struct cpufreq_policy *policy,
struct cpufreq_policy *new_policy);
@@ -799,11 +810,11 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
if (ret)
goto err_out_kobj_put;
}
- if (has_target()) {
- ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
- if (ret)
- goto err_out_kobj_put;
- }
+
+ ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
+ if (ret)
+ goto err_out_kobj_put;
+
if (cpufreq_driver->bios_limit) {
ret = sysfs_create_file(&policy->kobj, &bios_limit.attr);
if (ret)
--
1.9.1

2014-11-06 22:54:11

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 095/162] MIPS: ftrace: Fix a microMIPS build problem

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Markos Chandras <[email protected]>

commit aedd153f5bb5b1f1d6d9142014f521ae2ec294cc upstream.

Code before the .fixup section needs to have the .insn directive.
This has no side effects on MIPS32/64 but it affects the way microMIPS
loads the address for the return label.

Fixes the following build problem:
mips-linux-gnu-ld: arch/mips/built-in.o: .fixup+0x4a0: Unsupported jump between
ISA modes; consider recompiling with interlinking enabled.
mips-linux-gnu-ld: final link failed: Bad value
Makefile:819: recipe for target 'vmlinux' failed

The fix is similar to 1658f914ff91c3bf ("MIPS: microMIPS:
Disable LL/SC and fix linker bug.")

Signed-off-by: Markos Chandras <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8117/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/mips/include/asm/ftrace.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h
index ce35c9a..370ae7c 100644
--- a/arch/mips/include/asm/ftrace.h
+++ b/arch/mips/include/asm/ftrace.h
@@ -24,7 +24,7 @@ do { \
asm volatile ( \
"1: " load " %[" STR(dst) "], 0(%[" STR(src) "])\n"\
" li %[" STR(error) "], 0\n" \
- "2:\n" \
+ "2: .insn\n" \
\
".section .fixup, \"ax\"\n" \
"3: li %[" STR(error) "], 1\n" \
@@ -46,7 +46,7 @@ do { \
asm volatile ( \
"1: " store " %[" STR(src) "], 0(%[" STR(dst) "])\n"\
" li %[" STR(error) "], 0\n" \
- "2:\n" \
+ "2: .insn\n" \
\
".section .fixup, \"ax\"\n" \
"3: li %[" STR(error) "], 1\n" \
--
1.9.1

2014-11-06 22:54:15

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 071/162] ext4: move error report out of atomic context in ext4_init_block_bitmap()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Monakhov <[email protected]>

commit aef4885ae14f1df75b58395c5314d71f613d26d9 upstream.

An error report will likely result in I/O, so it is a bad idea to issue
one from atomic context.

This patch should fix the following issue:

BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1
5 locks held by kworker/u128:1/137:
#0: ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
#1: ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
#2: (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0
#3: (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430
#4: (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630
CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
Workqueue: writeback bdi_writeback_workfn (flush-1:0)
0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288
ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38
ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000
Call Trace:
[<ffffffff815c7fdc>] dump_stack+0x51/0x6d
[<ffffffff8108fb30>] __might_sleep+0xf0/0x100
[<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0
[<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20
[<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230
[<ffffffff8120fa03>] save_error_info+0x23/0x30
[<ffffffff8120fd06>] __ext4_error+0xb6/0xd0
[<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190
[<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630
[<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0
[<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60
[<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80
[<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280
[<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190
[<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410
[<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380
[<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0
[<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180
[<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0
[<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0
[<ffffffff81221797>] ? ext4_find_extent+0x97/0x320
[<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050
[<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430
[<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430
[<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50
[<ffffffff811feb44>] ext4_writepages+0x564/0xaf0
[<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0
[<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
[<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
[<ffffffff811377e3>] do_writepages+0x23/0x40
[<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280
[<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490
[<ffffffff811a0664>] wb_writeback+0x174/0x2d0
[<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
[<ffffffff811a0863>] wb_do_writeback+0xa3/0x200
[<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230
[<ffffffff81085618>] ? process_one_work+0x228/0x4d0
[<ffffffff810856cd>] process_one_work+0x2dd/0x4d0
[<ffffffff81085618>] ? process_one_work+0x228/0x4d0
[<ffffffff81085c1d>] worker_thread+0x35d/0x460
[<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
[<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
[<ffffffff8108a885>] kthread+0xf5/0x100
[<ffffffff810990e5>] ? local_clock+0x25/0x30
[<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70
[<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0
[<ffffffff8108a790>] ? __init_kthread_work
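
The rule being enforced is that nothing which may sleep (and the error
report ends up doing buffer I/O) may run while a spinlock is held. A
user-space analogue of the reshuffle, with a pthread mutex standing in for
the group lock and purely illustrative names:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;

static int init_bitmap(void)
{
    /* stand-in for ext4_init_block_bitmap(): only record the failure */
    return -5;                          /* pretend the checksum was bad (-EIO) */
}

static void read_bitmap(void)
{
    int err;

    pthread_mutex_lock(&group_lock);
    err = init_bitmap();                /* no reporting while the lock is held */
    pthread_mutex_unlock(&group_lock);

    if (err)                            /* report only after dropping the lock */
        fprintf(stderr, "Checksum bad for group 0 (err %d)\n", err);
}

int main(void)
{
    read_bitmap();
    return 0;
}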

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
[ kamal: backport to 3.13-stable: also make ext4_init_block_bitmap static,
as per c197855ea ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/balloc.c | 13 +++++++++----
fs/ext4/ext4.h | 4 ----
2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 6ea7b14..c8c3cb2 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -176,7 +176,8 @@ static unsigned int num_clusters_in_group(struct super_block *sb,
}

/* Initializes an uninitialized block bitmap */
-void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
+static int ext4_init_block_bitmap(struct super_block *sb,
+ struct buffer_head *bh,
ext4_group_t block_group,
struct ext4_group_desc *gdp)
{
@@ -191,11 +192,10 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
/* If checksum is bad mark all blocks used to prevent allocation
* essentially implementing a per-group read-only flag. */
if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
- ext4_error(sb, "Checksum bad for group %u", block_group);
grp = ext4_get_group_info(sb, block_group);
set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, &grp->bb_state);
set_bit(EXT4_GROUP_INFO_IBITMAP_CORRUPT_BIT, &grp->bb_state);
- return;
+ return -EIO;
}
memset(bh->b_data, 0, sb->s_blocksize);

@@ -233,6 +233,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
sb->s_blocksize * 8, bh->b_data);
ext4_block_bitmap_csum_set(sb, block_group, gdp, bh);
ext4_group_desc_csum_set(sb, block_group, gdp);
+ return 0;
}

/* Return the number of free blocks in a block group. It is used when
@@ -419,11 +420,15 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group)
}
ext4_lock_group(sb, block_group);
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
- ext4_init_block_bitmap(sb, bh, block_group, desc);
+ int err;
+
+ err = ext4_init_block_bitmap(sb, bh, block_group, desc);
set_bitmap_uptodate(bh);
set_buffer_uptodate(bh);
ext4_unlock_group(sb, block_group);
unlock_buffer(bh);
+ if (err)
+ ext4_error(sb, "Checksum bad for grp %u", block_group);
return bh;
}
ext4_unlock_group(sb, block_group);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0e9930f..5b250fe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1979,10 +1979,6 @@ extern int ext4_wait_block_bitmap(struct super_block *sb,
struct buffer_head *bh);
extern struct buffer_head *ext4_read_block_bitmap(struct super_block *sb,
ext4_group_t block_group);
-extern void ext4_init_block_bitmap(struct super_block *sb,
- struct buffer_head *bh,
- ext4_group_t group,
- struct ext4_group_desc *desc);
extern unsigned ext4_free_clusters_after_init(struct super_block *sb,
ext4_group_t block_group,
struct ext4_group_desc *gdp);
--
1.9.1

2014-11-06 22:55:25

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 089/162] cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Pali Rohár <[email protected]>

commit 36b4bed5cd8f6e17019fa7d380e0836872c7b367 upstream.

Code which changes the policy to powersave also changes max_policy_pct based
on max_freq. Code which changes max_perf_pct has an upper limit based on the
value of max_policy_pct. When the policy changes from powersave back to
performance, max_policy_pct is not updated, which means max_perf_pct cannot
be raised to high values if max_freq was too low under the powersave policy.

Test case:

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
800000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100

$ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
800000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
20

$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
24

At this point the intel_pstate driver only allows max_perf_pct to be set as
high as max_policy_pct, which is 24 for the previous powersave max_freq of
800000.

This patch resets max_policy_pct to its default value when the policy is set
to performance, so the maximum value of max_perf_pct can be set again.
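
A small stand-alone sketch of the arithmetic involved, using the same
numbers as the test case above; the point is that max_perf_pct is clamped
against max_policy_pct, so switching back to performance has to reset
max_policy_pct as this patch does. Variable and function names are
illustrative only:

#include <stdio.h>

static int max_policy_pct = 100;
static int max_perf_pct = 100;

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Writing scaling_max_freq under powersave lowers max_policy_pct. */
static void set_powersave(int max_freq, int cpu_max_freq)
{
    max_policy_pct = clamp(max_freq * 100 / cpu_max_freq, 0, 100);
    max_perf_pct = clamp(max_perf_pct, 0, max_policy_pct);
}

/* The fix: returning to performance restores the default limit. */
static void set_performance(void)
{
    max_policy_pct = 100;
    max_perf_pct = 100;
}

int main(void)
{
    set_powersave(800000, 3300000);     /* max_policy_pct becomes 24 */
    printf("powersave:   policy=%d perf=%d\n", max_policy_pct, max_perf_pct);

    set_performance();
    printf("performance: policy=%d perf=%d\n", max_policy_pct, max_perf_pct);
    return 0;
}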

Signed-off-by: Pali Rohár <[email protected]>
Acked-by: Dirk Brandewie <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 2d26563..d21cd02 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -733,6 +733,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
if (policy->policy == CPUFREQ_POLICY_PERFORMANCE) {
limits.min_perf_pct = 100;
limits.min_perf = int_tofp(1);
+ limits.max_policy_pct = 100;
limits.max_perf_pct = 100;
limits.max_perf = int_tofp(1);
limits.no_turbo = limits.turbo_disabled;
--
1.9.1

2014-11-06 22:55:55

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 059/162] ext4: don't check quota format when there are no quota files

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 279bf6d390933d5353ab298fcc306c391a961469 upstream.

The check whether quota format is set even though there are no
quota files with journalled quota is pointless and it actually
makes it impossible to turn off journalled quotas (as there's
no way to unset journalled quota format). Just remove the check.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/super.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a46030d..5ee03e5d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1706,13 +1706,6 @@ static int parse_options(char *options, struct super_block *sb,
"not specified");
return 0;
}
- } else {
- if (sbi->s_jquota_fmt) {
- ext4_msg(sb, KERN_ERR, "journaled quota format "
- "specified with no journaling "
- "enabled");
- return 0;
- }
}
#endif
if (test_opt(sb, DIOREAD_NOLOCK)) {
--
1.9.1

2014-11-06 22:55:53

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 080/162] ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Harsha Priya <[email protected]>

commit b450b17c156e264bc44a198046d3ebaaef5a041d upstream.

This patch sets the headphone mode to its default before suspending,
which helps avoid the pop noise on headphones.

Signed-off-by: Harsha Priya <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
sound/pci/hda/patch_realtek.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index b2a5832..1a24b2c 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -2871,6 +2871,9 @@ static void alc283_shutup(struct hda_codec *codec)

alc_write_coef_idx(codec, 0x43, 0x9004);

+ /*depop hp during suspend*/
+ alc_write_coef_idx(codec, 0x06, 0x2100);
+
snd_hda_codec_write(codec, hp_pin, 0,
AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_MUTE);

--
1.9.1

2014-11-06 22:56:57

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 085/162] OOM, PM: OOM killed task shouldn't escape PM suspend

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Hocko <[email protected]>

commit 5695be142e203167e3cb515ef86a88424f3524eb upstream.

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But the OOM killer is allowed to kill an already frozen task in
order to handle an OOM situation. In order to protect against late wake-ups,
the OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Reduce the race window by checking all tasks after the OOM killer has been
disabled. Unfortunately this is still not completely race-free, because
oom_killer_disable cannot stop an already ongoing OOM kill, so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.

Introduce and check an oom_kills counter, which gets incremented early when
the allocator enters the __alloc_pages_may_oom path, and only check all the
tasks if the counter changes during the freezing attempt. The counter is
updated that early to reduce the race window, since the allocator has
already checked oom_killer_disabled, which is set by the PM-freezing code.
A false positive will push the PM-freezer into a slow path but that is not
a big deal.
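
Stripped of the freezer specifics, the technique is: snapshot a
monotonically increasing event counter before the long operation, and
re-verify the state afterwards only if the counter has moved. A user-space
sketch of that shape, with hypothetical names and the expensive re-check
faked:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int oom_kills;            /* bumped early by the "killer" path */

static int oom_kills_count(void)  { return atomic_load(&oom_kills); }
static void note_oom_kill(void)   { atomic_fetch_add(&oom_kills, 1); }

static bool all_tasks_frozen(void)
{
    /* stand-in for walking the task list under tasklist_lock */
    return true;
}

static int freeze_everything(void (*freeze_work)(void))
{
    int saved = oom_kills_count();

    freeze_work();                      /* stand-in for try_to_freeze_tasks() */

    /* Only pay for the slow re-check if a kill may have raced with us. */
    if (oom_kills_count() != saved && !all_tasks_frozen())
        return -1;                      /* -EBUSY in the kernel version */
    return 0;
}

int main(void)
{
    /* Simulate an OOM kill happening while the freeze is in progress:
     * the counter moves, the re-check runs, finds everything frozen,
     * and the freeze still succeeds. */
    printf("freeze_everything() = %d\n", freeze_everything(note_oom_kill));
    return 0;
}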

Changes since v1
- push the re-check loop out of freeze_processes into
check_frozen_processes and invert the condition to make the code more
readable as per Rafael

Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/linux/oom.h | 3 +++
kernel/power/process.c | 40 +++++++++++++++++++++++++++++++++++++++-
mm/oom_kill.c | 17 +++++++++++++++++
mm/page_alloc.c | 8 ++++++++
4 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 4cd6267..17f0949 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -50,6 +50,9 @@ static inline bool oom_task_origin(const struct task_struct *p)
extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
unsigned long totalpages);
+
+extern int oom_kills_count(void);
+extern void note_oom_kill(void);
extern void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
unsigned int points, unsigned long totalpages,
struct mem_cgroup *memcg, nodemask_t *nodemask,
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 14f9a8d..f1fe7ec 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -107,6 +107,28 @@ static int try_to_freeze_tasks(bool user_only)
return todo ? -EBUSY : 0;
}

+/*
+ * Returns true if all freezable tasks (except for current) are frozen already
+ */
+static bool check_frozen_processes(void)
+{
+ struct task_struct *g, *p;
+ bool ret = true;
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, p) {
+ if (p != current && !freezer_should_skip(p) &&
+ !frozen(p)) {
+ ret = false;
+ goto done;
+ }
+ }
+done:
+ read_unlock(&tasklist_lock);
+
+ return ret;
+}
+
/**
* freeze_processes - Signal user space processes to enter the refrigerator.
* The current thread will not be frozen. The same process that calls
@@ -117,6 +139,7 @@ static int try_to_freeze_tasks(bool user_only)
int freeze_processes(void)
{
int error;
+ int oom_kills_saved;

error = __usermodehelper_disable(UMH_FREEZING);
if (error)
@@ -130,12 +153,27 @@ int freeze_processes(void)

printk("Freezing user space processes ... ");
pm_freezing = true;
+ oom_kills_saved = oom_kills_count();
error = try_to_freeze_tasks(true);
if (!error) {
- printk("done.");
__usermodehelper_set_disable_depth(UMH_DISABLED);
oom_killer_disable();
+
+ /*
+ * There might have been an OOM kill while we were
+ * freezing tasks and the killed task might be still
+ * on the way out so we have to double check for race.
+ */
+ if (oom_kills_count() != oom_kills_saved &&
+ !check_frozen_processes()) {
+ __usermodehelper_set_disable_depth(UMH_ENABLED);
+ printk("OOM in progress.");
+ error = -EBUSY;
+ goto done;
+ }
+ printk("done.");
}
+done:
printk("\n");
BUG_ON(in_atomic());

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 616b2eb..9c01afd 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -394,6 +394,23 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
dump_tasks(memcg, nodemask);
}

+/*
+ * Number of OOM killer invocations (including memcg OOM killer).
+ * Primarily used by PM freezer to check for potential races with
+ * OOM killed frozen task.
+ */
+static atomic_t oom_kills = ATOMIC_INIT(0);
+
+int oom_kills_count(void)
+{
+ return atomic_read(&oom_kills);
+}
+
+void note_oom_kill(void)
+{
+ atomic_inc(&oom_kills);
+}
+
#define K(x) ((x) << (PAGE_SHIFT-10))
/*
* Must be called while holding a reference to p, which will be released upon
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 38dca81..83a20df 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2178,6 +2178,14 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
}

/*
+ * PM-freezer should be notified that there might be an OOM killer on
+ * its way to kill and wake somebody up. This is too early and we might
+ * end up not killing anything but false positives are acceptable.
+ * See freeze_processes.
+ */
+ note_oom_kill();
+
+ /*
* Go through the zonelist yet one more time, keep very high watermark
* here, this is only to catch a parallel oom killing, we must fail if
* we're still under heavy pressure.
--
1.9.1

2014-11-06 22:43:40

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 087/162] MIPS: tlbex: Properly fix HUGE TLB Refill exception handler

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: David Daney <[email protected]>

commit 9e0f162a36914937a937358fcb45e0609ef2bfc4 upstream.

In commit 8393c524a25609 (MIPS: tlbex: Fix a missing statement for
HUGETLB), the TLB Refill handler was fixed so that non-OCTEON targets
would work properly with huge pages. The change was incorrect in that
it broke the OCTEON case.

The problem is shown here:

xxx0: df7a0000 ld k0,0(k1)
.
.
.
xxxc0: df610000 ld at,0(k1)
xxxc4: 335a0ff0 andi k0,k0,0xff0
xxxc8: e825ffcd bbit1 at,0x5,0x0
xxxcc: 003ad82d daddu k1,at,k0
.
.
.

In the non-octeon case there is a destructive test for the huge PTE
bit, and then at 0, $k0 is reloaded (that is what the 8393c524a25609
patch added).

In the octeon case, we modify k1 in the branch delay slot, but we
never need k0 again, so the new load is not needed, but since k1 is
modified, if we do the load, we load from a garbage location and then
get a nested TLB Refill, which is seen in userspace as either SIGBUS
or SIGSEGV (depending on the garbage).

The real fix is to only do this reloading if it is needed, and never
where it is harmful.

Signed-off-by: David Daney <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Fuxin Zhang <[email protected]>
Cc: Zhangjin Wu <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8151/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/mips/mm/tlbex.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index ec90a27..4b95653 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1057,6 +1057,7 @@ static void build_update_entries(u32 **p, unsigned int tmp, unsigned int ptep)
struct mips_huge_tlb_info {
int huge_pte;
int restore_scratch;
+ bool need_reload_pte;
};

static struct mips_huge_tlb_info
@@ -1071,6 +1072,7 @@ build_fast_tlb_refill_handler (u32 **p, struct uasm_label **l,

rv.huge_pte = scratch;
rv.restore_scratch = 0;
+ rv.need_reload_pte = false;

if (check_for_high_segbits) {
UASM_i_MFC0(p, tmp, C0_BADVADDR);
@@ -1259,6 +1261,7 @@ static void build_r4000_tlb_refill_handler(void)
} else {
htlb_info.huge_pte = K0;
htlb_info.restore_scratch = 0;
+ htlb_info.need_reload_pte = true;
vmalloc_mode = refill_noscratch;
/*
* create the plain linear handler
@@ -1295,7 +1298,8 @@ static void build_r4000_tlb_refill_handler(void)
}
#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
uasm_l_tlb_huge_update(&l, p);
- UASM_i_LW(&p, K0, 0, K1);
+ if (htlb_info.need_reload_pte)
+ UASM_i_LW(&p, htlb_info.huge_pte, 0, K1);
build_huge_update_entries(&p, htlb_info.huge_pte, K1);
build_huge_tlb_write_entry(&p, &l, &r, K0, tlb_random,
htlb_info.restore_scratch);
--
1.9.1

2014-11-06 22:57:47

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 086/162] qxl: don't create too large primary surface

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc-André Lureau <[email protected]>

commit c572aaf46f71f63ae5914d4e194a955e0ba1b519 upstream.

Limit the primary surface to the qemu vgamem size, to avoid triggering
the qemu guest bug "requested primary larger than framebuffer" when
resizing the screen to something too large to fit.

Remove unneeded and misleading variables.

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1127552

Signed-off-by: Marc-André Lureau <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/gpu/drm/qxl/qxl_display.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index d70aafb..fb71c10 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -516,7 +516,6 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
struct qxl_framebuffer *qfb;
struct qxl_bo *bo, *old_bo = NULL;
struct qxl_crtc *qcrtc = to_qxl_crtc(crtc);
- uint32_t width, height, base_offset;
bool recreate_primary = false;
int ret;
int surf_id;
@@ -546,9 +545,10 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
if (qcrtc->index == 0)
recreate_primary = true;

- width = mode->hdisplay;
- height = mode->vdisplay;
- base_offset = 0;
+ if (bo->surf.stride * bo->surf.height > qdev->vram_size) {
+ DRM_ERROR("Mode doesn't fit in vram size (vgamem)");
+ return -EINVAL;
+ }

ret = qxl_bo_reserve(bo, false);
if (ret != 0)
@@ -562,10 +562,10 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
if (recreate_primary) {
qxl_io_destroy_primary(qdev);
qxl_io_log(qdev,
- "recreate primary: %dx%d (was %dx%d,%d,%d)\n",
- width, height, bo->surf.width,
- bo->surf.height, bo->surf.stride, bo->surf.format);
- qxl_io_create_primary(qdev, base_offset, bo);
+ "recreate primary: %dx%d,%d,%d\n",
+ bo->surf.width, bo->surf.height,
+ bo->surf.stride, bo->surf.format);
+ qxl_io_create_primary(qdev, 0, bo);
bo->is_primary = true;
surf_id = 0;
} else {
--
1.9.1

2014-11-06 22:58:03

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 082/162] ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Anssi Hannula <[email protected]>

commit 6acce400d9daf1353fbf497302670c90a3205e1d upstream.

The ELD ALSA control change event is sent by hdmi_present_sense() when
eld_changed is true.

Currently, it is only true when the ELD buffer contents have been
modified. However, the user-visible ELD controls also change to a
zero-length value and back when eld_valid is unset/set, and no event is
currently sent in such cases (such as when unplugging or replugging a
sink).

Fix the code to always set eld_changed if eld_valid value is changed,
and therefore to always send the change event when the user-visible
value changes.
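
Reduced to its essence, the change is about when to raise the "changed"
notification: it must fire when the validity flag flips, not only when the
payload bytes differ. A tiny sketch with hypothetical types; the real
driver also tracks when the stored ELD may be updated, which is omitted
here:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct eld {
    bool valid;
    size_t size;
    unsigned char buf[256];
};

/* Returns true if userspace should be told the control changed. */
static bool update_eld(struct eld *cur, const struct eld *fresh)
{
    bool changed = false;

    if (cur->valid != fresh->valid)     /* plug/unplug: value flips to/from empty */
        changed = true;

    if (fresh->valid &&
        (cur->size != fresh->size ||
         memcmp(cur->buf, fresh->buf, fresh->size) != 0))
        changed = true;

    *cur = *fresh;
    return changed;
}

int main(void)
{
    struct eld cur = { .valid = true, .size = 4, .buf = "abc" };
    struct eld unplugged = { .valid = false };

    printf("event on unplug: %d\n", update_eld(&cur, &unplugged));
    return 0;
}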

Signed-off-by: Anssi Hannula <[email protected]>
Cc: David Henningsson <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
sound/pci/hda/patch_hdmi.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index 9c67b07..c2c80ca 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -1575,19 +1575,22 @@ static bool hdmi_present_sense(struct hdmi_spec_per_pin *per_pin, int repoll)
}
}

- if (pin_eld->eld_valid && !eld->eld_valid) {
- update_eld = true;
+ if (pin_eld->eld_valid != eld->eld_valid)
eld_changed = true;
- }
+
+ if (pin_eld->eld_valid && !eld->eld_valid)
+ update_eld = true;
+
if (update_eld) {
bool old_eld_valid = pin_eld->eld_valid;
pin_eld->eld_valid = eld->eld_valid;
- eld_changed = pin_eld->eld_size != eld->eld_size ||
+ if (pin_eld->eld_size != eld->eld_size ||
memcmp(pin_eld->eld_buffer, eld->eld_buffer,
- eld->eld_size) != 0;
- if (eld_changed)
+ eld->eld_size) != 0) {
memcpy(pin_eld->eld_buffer, eld->eld_buffer,
eld->eld_size);
+ eld_changed = true;
+ }
pin_eld->eld_size = eld->eld_size;
pin_eld->info = eld->info;

--
1.9.1

2014-11-06 22:58:40

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 092/162] intel_pstate: Don't lose sysfs settings during cpu offline

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dirk Brandewie <[email protected]>

commit c034871712730a33e0267095f48b62eae958499c upstream.

The user may have custom settings; don't destroy them during suspend.
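
The shape of the fix: allocate the per-CPU data lazily and keep it across
offline/online, so fields the user has tuned survive a suspend cycle. A
user-space sketch with hypothetical names:

#include <stdio.h>
#include <stdlib.h>

struct cpudata {
    int user_setting;                   /* analogue of tuning set via sysfs */
};

static struct cpudata *all_cpu_data[4];

static int cpu_init(unsigned int cpu)
{
    if (!all_cpu_data[cpu])             /* allocate only once */
        all_cpu_data[cpu] = calloc(1, sizeof(struct cpudata));
    return all_cpu_data[cpu] ? 0 : -1;
}

static void cpu_exit(unsigned int cpu)
{
    /* Previously the data was freed here, losing user settings;
     * now going offline just stops using it. */
    (void)cpu;
}

int main(void)
{
    cpu_init(0);
    all_cpu_data[0]->user_setting = 42; /* user tunes something */

    cpu_exit(0);                        /* cpu goes offline (suspend) */
    cpu_init(0);                        /* cpu comes back online */

    printf("setting after online: %d\n", all_cpu_data[0]->user_setting);
    return 0;
}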

Link: https://bugzilla.kernel.org/show_bug.cgi?id=80651
Reported-by: Tobias Jakobi <[email protected]>
Signed-off-by: Dirk Brandewie <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 7d622cd..4469fed 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -711,7 +711,9 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
if (!id)
return -ENODEV;

- all_cpu_data[cpunum] = kzalloc(sizeof(struct cpudata), GFP_KERNEL);
+ if (!all_cpu_data[cpunum])
+ all_cpu_data[cpunum] = kzalloc(sizeof(struct cpudata),
+ GFP_KERNEL);
if (!all_cpu_data[cpunum])
return -ENOMEM;

@@ -799,8 +801,6 @@ static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
int cpu = policy->cpu;

del_timer(&all_cpu_data[cpu]->timer);
- kfree(all_cpu_data[cpu]);
- all_cpu_data[cpu] = NULL;
return 0;
}

--
1.9.1

2014-11-06 22:59:00

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 091/162] cpufreq: intel_pstate: Reflect current no_turbo state correctly

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Gabriele Mazzotta <[email protected]>

commit 4521e1a0ce173daa23dfef8312d09051e057ff8e upstream.

Some BIOSes modify the state of MSR_IA32_MISC_ENABLE_TURBO_DISABLE
based on the current power source for the system (AC vs. battery).
Reflect the correct current state, and whether the no_turbo sysfs file
can be modified, based on the current state of
MSR_IA32_MISC_ENABLE_TURBO_DISABLE.
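
Conceptually the change moves from sampling the turbo-disable state once at
driver init to re-reading the authoritative source each time a decision
depends on it. A sketch of that pattern with the hardware read faked and
all names illustrative:

#include <stdbool.h>
#include <stdio.h>

/* Pretend this reads MSR_IA32_MISC_ENABLE; the firmware may flip the
 * turbo-disable bit at any time (e.g. on AC/battery transitions). */
static bool read_turbo_disabled_from_hw(void)
{
    static int calls;
    return (calls++ % 2) == 1;          /* changes between reads */
}

struct limits {
    bool turbo_disabled;
    bool no_turbo;
};

static struct limits limits;

/* The fix: refresh the cached state before every decision that depends
 * on it, instead of trusting a value sampled at init time. */
static void update_turbo_state(void)
{
    limits.turbo_disabled = read_turbo_disabled_from_hw();
}

static int show_no_turbo(void)
{
    update_turbo_state();
    return limits.turbo_disabled ? 1 : limits.no_turbo;
}

int main(void)
{
    printf("no_turbo: %d\n", show_no_turbo());
    printf("no_turbo: %d\n", show_no_turbo());  /* may differ: BIOS changed it */
    return 0;
}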

Link: https://bugzilla.kernel.org/show_bug.cgi?id=83151
Signed-off-by: Gabriele Mazzotta <[email protected]>
Signed-off-by: Dirk Brandewie <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 49 +++++++++++++++++++++++++++++++-----------
1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d21cd02..7d622cd 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -143,6 +143,7 @@ struct perf_limits {

static struct perf_limits limits = {
.no_turbo = 0,
+ .turbo_disabled = 0,
.max_perf_pct = 100,
.max_perf = int_tofp(1),
.min_perf_pct = 0,
@@ -227,6 +228,18 @@ static inline void intel_pstate_reset_all_pid(void)
}
}

+static inline void update_turbo_state(void)
+{
+ u64 misc_en;
+ struct cpudata *cpu;
+
+ cpu = all_cpu_data[0];
+ rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
+ limits.turbo_disabled =
+ (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
+ cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
+}
+
/************************** debugfs begin ************************/
static int pid_param_set(void *data, u64 val)
{
@@ -283,6 +296,20 @@ static void intel_pstate_debug_expose_params(void)
return sprintf(buf, "%u\n", limits.object); \
}

+static ssize_t show_no_turbo(struct kobject *kobj,
+ struct attribute *attr, char *buf)
+{
+ ssize_t ret;
+
+ update_turbo_state();
+ if (limits.turbo_disabled)
+ ret = sprintf(buf, "%u\n", limits.turbo_disabled);
+ else
+ ret = sprintf(buf, "%u\n", limits.no_turbo);
+
+ return ret;
+}
+
static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
const char *buf, size_t count)
{
@@ -291,11 +318,14 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
ret = sscanf(buf, "%u", &input);
if (ret != 1)
return -EINVAL;
- limits.no_turbo = clamp_t(int, input, 0 , 1);
+
+ update_turbo_state();
if (limits.turbo_disabled) {
pr_warn("Turbo disabled by BIOS or unavailable on processor\n");
- limits.no_turbo = limits.turbo_disabled;
+ return -EPERM;
}
+ limits.no_turbo = clamp_t(int, input, 0, 1);
+
return count;
}

@@ -328,7 +358,6 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
return count;
}

-show_one(no_turbo, no_turbo);
show_one(max_perf_pct, max_perf_pct);
show_one(min_perf_pct, min_perf_pct);

@@ -503,7 +532,8 @@ static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
int max_perf = cpu->pstate.turbo_pstate;
int max_perf_adj;
int min_perf;
- if (limits.no_turbo)
+
+ if (limits.no_turbo || limits.turbo_disabled)
max_perf = cpu->pstate.max_pstate;

max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits.max_perf));
@@ -519,6 +549,8 @@ static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate)
{
int max_perf, min_perf;

+ update_turbo_state();
+
intel_pstate_get_min_max(cpu, &min_perf, &max_perf);

pstate = clamp_t(int, pstate, min_perf, max_perf);
@@ -736,7 +768,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
limits.max_policy_pct = 100;
limits.max_perf_pct = 100;
limits.max_perf = int_tofp(1);
- limits.no_turbo = limits.turbo_disabled;
+ limits.no_turbo = 0;
return 0;
}
limits.min_perf_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
@@ -776,7 +808,6 @@ static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
{
struct cpudata *cpu;
int rc;
- u64 misc_en;

rc = intel_pstate_init_cpu(policy->cpu);
if (rc)
@@ -784,12 +815,6 @@ static int intel_pstate_cpu_init(struct cpufreq_policy *policy)

cpu = all_cpu_data[policy->cpu];

- rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
- if (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
- cpu->pstate.max_pstate == cpu->pstate.turbo_pstate) {
- limits.turbo_disabled = 1;
- limits.no_turbo = 1;
- }
if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100)
policy->policy = CPUFREQ_POLICY_PERFORMANCE;
else
--
1.9.1

2014-11-06 22:59:32

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 097/162] KVM: x86: Prevent host from panicking on shared MSR writes.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Honig <[email protected]>

commit 8b3c3104c3f4f706e99365c3e0d2aa61b95f969f upstream.

The previous patch blocked invalid writes directly when the MSR
is written. As a precaution, prevent future similar mistakes by
gracefully handling GPs caused by writes to shared MSRs.

Signed-off-by: Andrew Honig <[email protected]>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/vmx.c | 7 +++++--
arch/x86/kvm/x86.c | 11 ++++++++---
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b39dfd..dba6f79 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1053,7 +1053,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu);

void kvm_define_shared_msr(unsigned index, u32 msr);
-void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
+int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);

bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 21cacdc..4dd0201 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2546,12 +2546,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
msr = find_msr_entry(vmx, msr_index);
if (msr) {
+ u64 old_msr_data = msr->data;
msr->data = data;
if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
preempt_disable();
- kvm_set_shared_msr(msr->index, msr->data,
- msr->mask);
+ ret = kvm_set_shared_msr(msr->index, msr->data,
+ msr->mask);
preempt_enable();
+ if (ret)
+ msr->data = old_msr_data;
}
break;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f435e75..450c5c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -225,20 +225,25 @@ static void kvm_shared_msr_cpu_online(void)
shared_msr_update(i, shared_msrs_global.msrs[i]);
}

-void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
+int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
{
unsigned int cpu = smp_processor_id();
struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
+ int err;

if (((value ^ smsr->values[slot].curr) & mask) == 0)
- return;
+ return 0;
smsr->values[slot].curr = value;
- wrmsrl(shared_msrs_global.msrs[slot], value);
+ err = wrmsrl_safe(shared_msrs_global.msrs[slot], value);
+ if (err)
+ return 1;
+
if (!smsr->registered) {
smsr->urn.on_user_return = kvm_on_user_return;
user_return_notifier_register(&smsr->urn);
smsr->registered = true;
}
+ return 0;
}
EXPORT_SYMBOL_GPL(kvm_set_shared_msr);

--
1.9.1

2014-11-06 22:42:48

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 109/162] iio: st_sensors: Fix buffer copy

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Robin van der Gracht <[email protected]>

commit 4250c90b30b8bf2a1a21122ba0484f8f351f152d upstream.

Use byte_for_channel as iterator to properly initialize the buffer.

Signed-off-by: Robin van der Gracht <[email protected]>
Acked-by: Denis Ciocca <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/iio/common/st_sensors/st_sensors_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iio/common/st_sensors/st_sensors_buffer.c b/drivers/iio/common/st_sensors/st_sensors_buffer.c
index 1665c8e..e18bc67 100644
--- a/drivers/iio/common/st_sensors/st_sensors_buffer.c
+++ b/drivers/iio/common/st_sensors/st_sensors_buffer.c
@@ -71,7 +71,7 @@ int st_sensors_get_buffer_element(struct iio_dev *indio_dev, u8 *buf)
goto st_sensors_free_memory;
}

- for (i = 0; i < n * num_data_channels; i++) {
+ for (i = 0; i < n * byte_for_channel; i++) {
if (i < n)
buf[i] = rx_array[i];
else
--
1.9.1

2014-11-06 23:00:26

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 105/162] KVM: x86: Handle errors when RIP is set during far jumps

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit d1442d85cc30ea75f7d399474ca738e0bc96f715 upstream.

Far jmp/call/ret may fault while loading a new RIP. Currently KVM does not
handle this case, which may result in a failed VM entry once the assignment
is done. The tricky part is that loading the new CS affects the VMCS/VMCB
state, so if we fail while loading the new RIP we are left in an
inconsistent state. Therefore, on 64-bit this patch saves the old CS
descriptor and restores it if loading the RIP failed.
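
Abstracted away from the VMCS/VMCB details, the technique is transactional:
save the piece of state whose update has side effects, attempt the second,
fallible update, and roll the first one back if it fails. A user-space
sketch with hypothetical names and a faked fault condition:

#include <stdio.h>

struct vcpu {
    unsigned short cs;                  /* loading this has side effects */
    unsigned long long rip;
};

static int assign_rip(struct vcpu *v, unsigned long long new_rip)
{
    if (new_rip > 0xffffffffULL)        /* pretend: this assignment faults */
        return -1;
    v->rip = new_rip;
    return 0;
}

static int far_jump(struct vcpu *v, unsigned short new_cs,
                    unsigned long long new_rip)
{
    unsigned short old_cs = v->cs;      /* save state we may need to undo */

    v->cs = new_cs;                     /* side-effecting load */
    if (assign_rip(v, new_rip)) {
        v->cs = old_cs;                 /* restore the old CS on failure */
        return -1;
    }
    return 0;
}

int main(void)
{
    struct vcpu v = { .cs = 0x10, .rip = 0x1000 };

    if (far_jump(&v, 0x20, 0x100000000ULL))
        printf("jump failed, cs restored to %#x\n", v.cs);
    return 0;
}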

This fixes CVE-2014-3647.

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[ kamal: backport to 3.13-stable: context; __long_segment_descriptor args ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/emulate.c | 117 ++++++++++++++++++++++++++++++++++++-------------
1 file changed, 87 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 300c164..a440bea 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1437,7 +1437,8 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,

/* Does not support long mode */
static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
- u16 selector, int seg, u8 cpl)
+ u16 selector, int seg, u8 cpl,
+ struct desc_struct *desc)
{
struct desc_struct seg_desc, old_desc;
u8 dpl, rpl;
@@ -1563,6 +1564,8 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
}
load:
ctxt->ops->set_segment(ctxt, selector, &seg_desc, 0, seg);
+ if (desc)
+ *desc = seg_desc;
return X86EMUL_CONTINUE;
exception:
emulate_exception(ctxt, err_vec, err_code, true);
@@ -1573,7 +1576,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, int seg)
{
u8 cpl = ctxt->ops->cpl(ctxt);
- return __load_segment_descriptor(ctxt, selector, seg, cpl);
+ return __load_segment_descriptor(ctxt, selector, seg, cpl, NULL);
}

static void write_register_operand(struct operand *op)
@@ -1970,17 +1973,31 @@ static int em_iret(struct x86_emulate_ctxt *ctxt)
static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
{
int rc;
- unsigned short sel;
+ unsigned short sel, old_sel;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ u8 cpl = ctxt->ops->cpl(ctxt);
+
+ /* Assignment of RIP may only fail in 64-bit mode */
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
+ VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);

- rc = load_segment_descriptor(ctxt, sel, VCPU_SREG_CS);
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+ &new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;

- ctxt->_eip = 0;
- memcpy(&ctxt->_eip, ctxt->src.valptr, ctxt->op_bytes);
- return X86EMUL_CONTINUE;
+ rc = assign_eip_far(ctxt, ctxt->src.val, new_desc.l);
+ if (rc != X86EMUL_CONTINUE) {
+ WARN_ON(!ctxt->mode != X86EMUL_MODE_PROT64);
+ /* assigning eip failed; restore the old cs */
+ ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
+ return rc;
+ }
+ return rc;
}

static int em_grp45(struct x86_emulate_ctxt *ctxt)
@@ -2044,21 +2061,34 @@ static int em_ret(struct x86_emulate_ctxt *ctxt)
static int em_ret_far(struct x86_emulate_ctxt *ctxt)
{
int rc;
- unsigned long cs;
+ unsigned long eip, cs;
+ u16 old_cs;
int cpl = ctxt->ops->cpl(ctxt);
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
+ VCPU_SREG_CS);

- rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
- if (ctxt->op_bytes == 4)
- ctxt->_eip = (u32)ctxt->_eip;
rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
/* Outer-privilege level return is not implemented */
if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
return X86EMUL_UNHANDLEABLE;
- rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
+ rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, 0,
+ &new_desc);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+ rc = assign_eip_far(ctxt, eip, new_desc.l);
+ if (rc != X86EMUL_CONTINUE) {
+ WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
+ ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
+ }
return rc;
}

@@ -2482,19 +2512,24 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
* Now load segment descriptors. If fault happens at this stage
* it is handled in a context of new task
*/
- ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2620,25 +2655,32 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR,
+ cpl, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2918,24 +2960,39 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
u16 sel, old_cs;
ulong old_eip;
int rc;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ int cpl = ctxt->ops->cpl(ctxt);

- old_cs = get_segment_selector(ctxt, VCPU_SREG_CS);
old_eip = ctxt->_eip;
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
- if (load_segment_descriptor(ctxt, sel, VCPU_SREG_CS))
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+ &new_desc);
+ if (rc != X86EMUL_CONTINUE)
return X86EMUL_CONTINUE;

- ctxt->_eip = 0;
- memcpy(&ctxt->_eip, ctxt->src.valptr, ctxt->op_bytes);
+ rc = assign_eip_far(ctxt, ctxt->src.val, new_desc.l);
+ if (rc != X86EMUL_CONTINUE)
+ goto fail;

ctxt->src.val = old_cs;
rc = em_push(ctxt);
if (rc != X86EMUL_CONTINUE)
- return rc;
+ goto fail;

ctxt->src.val = old_eip;
- return em_push(ctxt);
+ rc = em_push(ctxt);
+ /* If we failed, we tainted the memory, but the very least we should
+ restore cs */
+ if (rc != X86EMUL_CONTINUE)
+ goto fail;
+ return rc;
+fail:
+ ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
+ return rc;
+
}

static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
--
1.9.1

2014-11-06 23:00:29

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 065/162] target: Fix APTPL metadata handling for dynamic MappedLUNs

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit e24805637d2d270d7975502e9024d473de86afdb upstream.

This patch fixes a bug in the SPC-3 PR Activate Persistence across
Target Power Loss (APTPL) handling logic, where re-creation of state for
MappedLUNs from dynamically generated NodeACLs did not occur during
I_T Nexus establishment.

It adds the missing core_scsi3_check_aptpl_registration() call during
core_tpg_check_initiator_node_acl() -> core_tpg_add_node_to_devs() in
order to replay any pre-loaded APTPL metadata state associated with
the newly connected SCSI Initiator Port.

Cc: Mike Christie <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/target/target_core_device.c | 3 ++-
drivers/target/target_core_pr.c | 6 +++---
drivers/target/target_core_pr.h | 2 +-
drivers/target/target_core_tpg.c | 8 ++++++++
4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 04e9288..27bee70 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -1321,7 +1321,8 @@ int core_dev_add_initiator_node_lun_acl(
* Check to see if there are any existing persistent reservation APTPL
* pre-registrations that need to be enabled for this LUN ACL..
*/
- core_scsi3_check_aptpl_registration(lun->lun_se_dev, tpg, lun, lacl);
+ core_scsi3_check_aptpl_registration(lun->lun_se_dev, tpg, lun, nacl,
+ lacl->mapped_lun);
return 0;
}

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index 3013287..1205dbd 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -944,10 +944,10 @@ int core_scsi3_check_aptpl_registration(
struct se_device *dev,
struct se_portal_group *tpg,
struct se_lun *lun,
- struct se_lun_acl *lun_acl)
+ struct se_node_acl *nacl,
+ u32 mapped_lun)
{
- struct se_node_acl *nacl = lun_acl->se_lun_nacl;
- struct se_dev_entry *deve = nacl->device_list[lun_acl->mapped_lun];
+ struct se_dev_entry *deve = nacl->device_list[mapped_lun];

if (dev->dev_reservation_flags & DRF_SPC2_RESERVATIONS)
return 0;
diff --git a/drivers/target/target_core_pr.h b/drivers/target/target_core_pr.h
index ed75cdd..14a0a2e 100644
--- a/drivers/target/target_core_pr.h
+++ b/drivers/target/target_core_pr.h
@@ -55,7 +55,7 @@ extern int core_scsi3_alloc_aptpl_registration(
unsigned char *, u16, u32, int, int, u8);
extern int core_scsi3_check_aptpl_registration(struct se_device *,
struct se_portal_group *, struct se_lun *,
- struct se_lun_acl *);
+ struct se_node_acl *, u32);
extern void core_scsi3_free_pr_reg_from_nacl(struct se_device *,
struct se_node_acl *);
extern void core_scsi3_free_all_registrations(struct se_device *);
diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
index 2a573de..6c4596a 100644
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -40,6 +40,7 @@
#include <target/target_core_fabric.h>

#include "target_core_internal.h"
+#include "target_core_pr.h"

extern struct se_device *g_lun0_dev;

@@ -166,6 +167,13 @@ void core_tpg_add_node_to_devs(

core_enable_device_list_for_node(lun, NULL, lun->unpacked_lun,
lun_access, acl, tpg);
+ /*
+ * Check to see if there are any existing persistent reservation
+ * APTPL pre-registrations that need to be enabled for this dynamic
+ * LUN ACL now..
+ */
+ core_scsi3_check_aptpl_registration(dev, tpg, lun, acl,
+ lun->unpacked_lun);
spin_lock(&tpg->tpg_lun_lock);
}
spin_unlock(&tpg->tpg_lun_lock);
--
1.9.1

2014-11-06 22:42:45

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 057/162] ext4: check EA value offset when loading

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <[email protected]>

commit a0626e75954078cfacddb00a4545dde821170bc5 upstream.

When loading extended attributes, check each entry's value offset to
make sure it doesn't collide with the entries.

Without this check it is easy to crash the kernel by mounting a
malicious FS containing a file with an EA wherein e_value_offs = 0 and
e_value_size > 0 and then deleting the EA, which corrupts the name
list.

(See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
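
The invariant being enforced can be stated as a small standalone check: a
non-empty value must start past the end of the entry table and end inside
the block. The struct below is a simplification for illustration, not
ext4's real on-disk xattr entry layout.

/* Simplified model of the added bounds check; not ext4's real entry
 * format.  A value region that starts before the end of the entry
 * table (the e_value_offs = 0 case above) or runs past the block is
 * rejected.
 */
#include <stddef.h>
#include <stdio.h>

struct xent {
        size_t value_offs;                     /* offset of the value in the block */
        size_t value_size;
};

static int check_entries(const struct xent *e, size_t n,
                         size_t entries_end, size_t block_size)
{
        for (size_t i = 0; i < n; i++) {
                size_t start = e[i].value_offs;
                size_t end = start + e[i].value_size;

                if (e[i].value_size != 0 &&
                    (start < entries_end || end > block_size))
                        return -1;             /* overlaps the entries or overflows */
        }
        return 0;
}

int main(void)
{
        struct xent bad = { .value_offs = 0, .value_size = 16 };
        struct xent good = { .value_offs = 128, .value_size = 16 };

        printf("%d %d\n", check_entries(&bad, 1, 64, 4096),
               check_entries(&good, 1, 64, 4096));   /* -1 0 */
        return 0;
}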

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/xattr.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 298e9c8..a20816e 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -189,14 +189,28 @@ ext4_listxattr(struct dentry *dentry, char *buffer, size_t size)
}

static int
-ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end)
+ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end,
+ void *value_start)
{
- while (!IS_LAST_ENTRY(entry)) {
- struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(entry);
+ struct ext4_xattr_entry *e = entry;
+
+ while (!IS_LAST_ENTRY(e)) {
+ struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(e);
if ((void *)next >= end)
return -EIO;
- entry = next;
+ e = next;
}
+
+ while (!IS_LAST_ENTRY(entry)) {
+ if (entry->e_value_size != 0 &&
+ (value_start + le16_to_cpu(entry->e_value_offs) <
+ (void *)e + sizeof(__u32) ||
+ value_start + le16_to_cpu(entry->e_value_offs) +
+ le32_to_cpu(entry->e_value_size) > end))
+ return -EIO;
+ entry = EXT4_XATTR_NEXT(entry);
+ }
+
return 0;
}

@@ -213,7 +227,8 @@ ext4_xattr_check_block(struct inode *inode, struct buffer_head *bh)
return -EIO;
if (!ext4_xattr_block_csum_verify(inode, bh->b_blocknr, BHDR(bh)))
return -EIO;
- error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size);
+ error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size,
+ bh->b_data);
if (!error)
set_buffer_verified(bh);
return error;
@@ -329,7 +344,7 @@ ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name,
header = IHDR(inode, raw_inode);
entry = IFIRST(header);
end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
- error = ext4_xattr_check_names(entry, end);
+ error = ext4_xattr_check_names(entry, end, entry);
if (error)
goto cleanup;
error = ext4_xattr_find_entry(&entry, name_index, name,
@@ -457,7 +472,7 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size)
raw_inode = ext4_raw_inode(&iloc);
header = IHDR(inode, raw_inode);
end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
- error = ext4_xattr_check_names(IFIRST(header), end);
+ error = ext4_xattr_check_names(IFIRST(header), end, IFIRST(header));
if (error)
goto cleanup;
error = ext4_xattr_list_entries(dentry, IFIRST(header),
@@ -972,7 +987,8 @@ int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i,
is->s.here = is->s.first;
is->s.end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
- error = ext4_xattr_check_names(IFIRST(header), is->s.end);
+ error = ext4_xattr_check_names(IFIRST(header), is->s.end,
+ IFIRST(header));
if (error)
return error;
/* Find the named attribute. */
--
1.9.1

2014-11-06 23:01:34

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 098/162] KVM: x86: Improve thread safety in pit

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Honig <[email protected]>

commit 2febc839133280d5a5e8e1179c94ea674489dae2 upstream.

There's a race condition in the PIT emulation code in KVM. In
__kvm_migrate_pit_timer the pit_timer object is accessed without
synchronization. If the race condition occurs at the wrong time this
can crash the host kernel.

This fixes CVE-2014-3611.
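
The shape of the fix is the usual one: take the state lock around the
cancel/re-arm of the shared timer so a concurrent reconfiguration cannot
observe it half-updated. The sketch below models that pattern with
pthreads standing in for KVM's mutex and hrtimer; it is not the kvm_pit
code itself.

/* Generic illustration of the locking pattern; not the kvm_pit code. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct pit_state {
        pthread_mutex_t lock;
        bool armed;
        long long expires_ns;
};

static void migrate_timer(struct pit_state *s, long long now_ns)
{
        pthread_mutex_lock(&s->lock);
        if (s->armed)                           /* like hrtimer_cancel() finding it queued */
                s->expires_ns = now_ns + 1000000;  /* re-arm after the migration */
        pthread_mutex_unlock(&s->lock);
}

int main(void)
{
        struct pit_state s = { .lock = PTHREAD_MUTEX_INITIALIZER, .armed = true };

        migrate_timer(&s, 0);
        printf("expires at %lld ns\n", s.expires_ns);
        return 0;
}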

Signed-off-by: Andrew Honig <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/i8254.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 518d864..298781d 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -262,8 +262,10 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
return;

timer = &pit->pit_state.timer;
+ mutex_lock(&pit->pit_state.lock);
if (hrtimer_cancel(timer))
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+ mutex_unlock(&pit->pit_state.lock);
}

static void destroy_pit_timer(struct kvm_pit *pit)
--
1.9.1

2014-11-06 22:42:37

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 104/162] KVM: x86: use new CS.RPL as CPL during task switch

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <[email protected]>

commit 2356aaeb2f58f491679dc0c38bc3f6dbe54e7ded upstream.

During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
to all the other requirements) and will be the new CPL. So far this
worked by carefully setting the CS selector and flag before doing the
task switch; setting CS.selector will already change the CPL.

However, this will not work once we get the CPL from SS.DPL, because
then you will have to set the full segment descriptor cache to change
the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the
task switch, and the check that SS.DPL == CPL will fail.

Temporarily assume that the CPL comes from CS.RPL during task switch
to a protected-mode task. This is the same approach used in QEMU's
emulation code, which (until version 2.0) manually tracks the CPL.

Signed-off-by: Paolo Bonzini <[email protected]>
[ kamal: 3.13-stable prereq for
d1442d8 KVM: x86: Handle errors when RIP is set during far jumps ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kvm/emulate.c | 60 +++++++++++++++++++++++++++-----------------------
1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 38d3751..300c164 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1436,11 +1436,11 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
}

/* Does not support long mode */
-static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
- u16 selector, int seg)
+static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+ u16 selector, int seg, u8 cpl)
{
struct desc_struct seg_desc, old_desc;
- u8 dpl, rpl, cpl;
+ u8 dpl, rpl;
unsigned err_vec = GP_VECTOR;
u32 err_code = 0;
bool null_selector = !(selector & ~0x3); /* 0000-0003 are null */
@@ -1468,7 +1468,6 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
}

rpl = selector & 3;
- cpl = ctxt->ops->cpl(ctxt);

/* NULL selector is not valid for TR, CS and SS (except for long mode) */
if ((seg == VCPU_SREG_CS
@@ -1570,6 +1569,13 @@ exception:
return X86EMUL_PROPAGATE_FAULT;
}

+static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+ u16 selector, int seg)
+{
+ u8 cpl = ctxt->ops->cpl(ctxt);
+ return __load_segment_descriptor(ctxt, selector, seg, cpl);
+}
+
static void write_register_operand(struct operand *op)
{
/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
@@ -2447,6 +2453,7 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
struct tss_segment_16 *tss)
{
int ret;
+ u8 cpl;

ctxt->_eip = tss->ip;
ctxt->eflags = tss->flag | 2;
@@ -2469,23 +2476,25 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
set_segment_selector(ctxt, tss->ss, VCPU_SREG_SS);
set_segment_selector(ctxt, tss->ds, VCPU_SREG_DS);

+ cpl = tss->cs & 3;
+
/*
* Now load segment descriptors. If fault happens at this stage
* it is handled in a context of new task
*/
- ret = load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR);
+ ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2564,6 +2573,7 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
struct tss_segment_32 *tss)
{
int ret;
+ u8 cpl;

if (ctxt->ops->set_cr(ctxt, 3, tss->cr3))
return emulate_gp(ctxt, 0);
@@ -2582,7 +2592,8 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,

/*
* SDM says that segment selectors are loaded before segment
- * descriptors
+ * descriptors. This is important because CPL checks will
+ * use CS.RPL.
*/
set_segment_selector(ctxt, tss->ldt_selector, VCPU_SREG_LDTR);
set_segment_selector(ctxt, tss->es, VCPU_SREG_ES);
@@ -2596,43 +2607,38 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
* If we're switching between Protected Mode and VM86, we need to make
* sure to update the mode before loading the segment descriptors so
* that the selectors are interpreted correctly.
- *
- * Need to get rflags to the vcpu struct immediately because it
- * influences the CPL which is checked at least when loading the segment
- * descriptors and when pushing an error code to the new kernel stack.
- *
- * TODO Introduce a separate ctxt->ops->set_cpl callback
*/
- if (ctxt->eflags & X86_EFLAGS_VM)
+ if (ctxt->eflags & X86_EFLAGS_VM) {
ctxt->mode = X86EMUL_MODE_VM86;
- else
+ cpl = 3;
+ } else {
ctxt->mode = X86EMUL_MODE_PROT32;
-
- ctxt->ops->set_rflags(ctxt, ctxt->eflags);
+ cpl = tss->cs & 3;
+ }

/*
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR);
+ ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS);
+ ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS);
+ ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;

--
1.9.1

2014-11-06 23:01:50

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 156/162] PCI: Rename sysfs 'enabled' file back to 'enable'

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <[email protected]>

commit d8e7d53a2fc14e0830ab728cb84ee19933d3ac8d upstream.

Back in commit 5136b2da770d ("PCI: convert bus code to use dev_groups"),
I mistyped the 'enable' sysfs filename as 'enabled', which broke the
userspace API. This patch fixes that issue by renaming the file back.

Fixes: 5136b2da770d ("PCI: convert bus code to use dev_groups")
Reported-by: Jeff Epler <[email protected]>
Tested-by: Jeff Epler <[email protected]> # on v3.14-rt
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/pci/pci-sysfs.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 21ba076..6a02775 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -186,7 +186,7 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(modalias);

-static ssize_t enabled_store(struct device *dev,
+static ssize_t enable_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t count)
{
@@ -212,7 +212,7 @@ static ssize_t enabled_store(struct device *dev,
return result < 0 ? result : count;
}

-static ssize_t enabled_show(struct device *dev,
+static ssize_t enable_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev;
@@ -220,7 +220,7 @@ static ssize_t enabled_show(struct device *dev,
pdev = to_pci_dev (dev);
return sprintf (buf, "%u\n", atomic_read(&pdev->enable_cnt));
}
-static DEVICE_ATTR_RW(enabled);
+static DEVICE_ATTR_RW(enable);

#ifdef CONFIG_NUMA
static ssize_t
@@ -531,7 +531,7 @@ static struct attribute *pci_dev_attrs[] = {
#endif
&dev_attr_dma_mask_bits.attr,
&dev_attr_consistent_dma_mask_bits.attr,
- &dev_attr_enabled.attr,
+ &dev_attr_enable.attr,
&dev_attr_broken_parity_status.attr,
&dev_attr_msi_bus.attr,
#if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
--
1.9.1

2014-11-06 23:02:36

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 158/162] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Rannaud <[email protected]>

commit 69a91c237ab0ebe4e9fdeaf6d0090c85275594ec upstream.

The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:

Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.

The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.

O_TMPFILE, however, behaves differently:

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);

For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:

if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}

But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().

This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().

A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523

glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the
2-argument and the 3-argument forms of open; same idea for openat()).
However, glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.

On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.

But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.
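
With this fix in place, the usual O_TMPFILE pattern works with mode 0. The
small user-space example below is only illustrative (the paths are made up
and the optional linkat() step follows the open(2) man page; none of it is
part of this patch):

/* Create an unnamed temporary file with mode 0, write to it, and only
 * give it a name (and a real mode) if we decide to keep it.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
        char path[64];
        int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);  /* mode 0 works after this fix */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, "scratch\n", 8) != 8)
                perror("write");

        /* Optionally materialize the file with a name and permissions. */
        snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, path, AT_FDCWD, "/tmp/kept-file",
                   AT_SYMLINK_FOLLOW) < 0)
                perror("linkat");
        if (fchmod(fd, 0600) < 0)
                perror("fchmod");

        close(fd);
        return 0;
}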

Signed-off-by: Eric Rannaud <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index eaabb52..ddb6721 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3128,7 +3128,8 @@ static int do_tmpfile(int dfd, struct filename *pathname,
if (error)
goto out2;
audit_inode(pathname, nd->path.dentry, 0);
- error = may_open(&nd->path, op->acc_mode, op->open_flag);
+ /* Don't check for other permissions, the inode was just created */
+ error = may_open(&nd->path, MAY_OPEN, op->open_flag);
if (error)
goto out2;
file->f_path.mnt = nd->path.mnt;
--
1.9.1

2014-11-06 23:02:54

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 115/162] usb: gadget: function: acm: make f_acm pass USB20CV Chapter9

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit 52ec49a5e56a27c5b6f8217708783eff39f24c16 upstream.

During the Halt Endpoint Test, our interrupt endpoint
will be disabled, which clears ep->desc to NULL.
Unless we call config_ep_by_speed() again, we will
not be able to enable this endpoint, which will make
us fail that test.
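
The rule the fix applies is general: once an endpoint has been disabled
its descriptor is gone, so it has to be re-configured for the current
speed before it can be enabled again. The standalone model below
illustrates that ordering with simplified stand-in types, not the real
usb_ep/usb_function API.

/* Standalone model of the re-enable rule; simplified stand-in types,
 * not the real usb_ep/usb_function API.  A disabled endpoint loses its
 * descriptor and must be re-configured before it is enabled again.
 */
#include <stddef.h>
#include <stdio.h>

struct ep { const char *desc; };

static void ep_disable(struct ep *ep)           { ep->desc = NULL; }
static int config_ep_for_speed(struct ep *ep)   { ep->desc = "int-in@hs"; return 0; }

static int ep_enable(struct ep *ep)
{
        return ep->desc ? 0 : -1;               /* cannot enable without a descriptor */
}

static int set_alt(struct ep *ep)
{
        if (ep->desc)
                ep_disable(ep);                 /* alternate-setting reset path */
        if (!ep->desc && config_ep_for_speed(ep))
                return -1;                      /* re-pick the descriptor for this speed */
        return ep_enable(ep);
}

int main(void)
{
        struct ep notify = { .desc = "int-in@hs" };

        printf("%d\n", set_alt(&notify));       /* 0: re-configured before enable */
        return 0;
}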

Fixes: f9c56cd (usb: gadget: Clear usb_endpoint_descriptor
inside the struct usb_ep on disable)
Signed-off-by: Felipe Balbi <[email protected]>
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/gadget/f_acm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/gadget/f_acm.c b/drivers/usb/gadget/f_acm.c
index ab1065a..d3d7d1b 100644
--- a/drivers/usb/gadget/f_acm.c
+++ b/drivers/usb/gadget/f_acm.c
@@ -430,11 +430,12 @@ static int acm_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
if (acm->notify->driver_data) {
VDBG(cdev, "reset acm control interface %d\n", intf);
usb_ep_disable(acm->notify);
- } else {
- VDBG(cdev, "init acm ctrl interface %d\n", intf);
+ }
+
+ if (acm->notify->desc)
if (config_ep_by_speed(cdev->gadget, f, acm->notify))
return -EINVAL;
- }
+
usb_ep_enable(acm->notify);
acm->notify->driver_data = acm;

--
1.9.1

2014-11-06 23:03:13

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 101/162] kvm: vmx: handle invvpid vm exit gracefully

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Petr Matousek <[email protected]>

commit a642fc305053cc1c6e47e4f4df327895747ab485 upstream.

On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.

Fix this by installing an invvpid vm exit handler.

This is CVE-2014-3646.

Signed-off-by: Petr Matousek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/include/uapi/asm/vmx.h | 2 ++
arch/x86/kvm/vmx.c | 9 ++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 0e79420..990a2fe 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -67,6 +67,7 @@
#define EXIT_REASON_EPT_MISCONFIG 49
#define EXIT_REASON_INVEPT 50
#define EXIT_REASON_PREEMPTION_TIMER 52
+#define EXIT_REASON_INVVPID 53
#define EXIT_REASON_WBINVD 54
#define EXIT_REASON_XSETBV 55
#define EXIT_REASON_APIC_WRITE 56
@@ -114,6 +115,7 @@
{ EXIT_REASON_EOI_INDUCED, "EOI_INDUCED" }, \
{ EXIT_REASON_INVALID_STATE, "INVALID_STATE" }, \
{ EXIT_REASON_INVD, "INVD" }, \
+ { EXIT_REASON_INVVPID, "INVVPID" }, \
{ EXIT_REASON_INVPCID, "INVPCID" }

#endif /* _UAPIVMX_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4dd0201..6e4b08b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6410,6 +6410,12 @@ static int handle_invept(struct kvm_vcpu *vcpu)
return 1;
}

+static int handle_invvpid(struct kvm_vcpu *vcpu)
+{
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+}
+
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -6455,6 +6461,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_INVEPT] = handle_invept,
+ [EXIT_REASON_INVVPID] = handle_invvpid,
};

static const int kvm_vmx_max_exit_handlers =
@@ -6684,7 +6691,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
- case EXIT_REASON_INVEPT:
+ case EXIT_REASON_INVEPT: case EXIT_REASON_INVVPID:
/*
* VMX instructions trap unconditionally. This allows L1 to
* emulate them for its L2 guest, i.e., allows 3-level nesting!
--
1.9.1

2014-11-06 23:03:36

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 160/162] x86_64, entry: Fix out of bounds read on sysenter

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 653bc77af60911ead1f423e588f54fc2547c4957 upstream.

Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code
reads out of bounds, causing the NT fix to be unreliable. But, and
this is much, much worse, if your stack is somehow just below the
top of the direct map (or a hole), you read out of bounds and crash.

Excerpt from the crash:

[ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296

2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp)

That read is deterministically above the top of the stack. I
thought I even single-stepped through this code when I wrote it to
check the offset, but I clearly screwed it up.

Fixes: 8c7aa698baca ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/ia32/ia32entry.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 711de08..92a2e93 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -157,7 +157,7 @@ ENTRY(ia32_sysenter_target)
* ourselves. To save a few cycles, we can check whether
* NT was set instead of doing an unconditional popfq.
*/
- testl $X86_EFLAGS_NT,EFLAGS(%rsp) /* saved EFLAGS match cpu */
+ testl $X86_EFLAGS_NT,EFLAGS-ARGOFFSET(%rsp)
jnz sysenter_fix_flags
sysenter_flags_fixed:

--
1.9.1

2014-11-06 22:42:26

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 162/162] ACPI / EC: Fix regression due to conflicting firmware behavior between Samsung and Acer.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Lv Zheng <[email protected]>

commit 79149001105f18bd2285ada109f9229ea24a7571 upstream.

It is reported that Samsung laptops that need to poll events are broken by
the following commit:
Commit 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08
Subject: ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set

The behaviors of the two vendors' firmware conflict:
1. Acer: OSPM shouldn't issue QR_EC unless SCI_EVT is set; the firmware
automatically sets SCI_EVT as long as there is an event queued up.
2. Samsung: OSPM should issue QR_EC regardless of whether SCI_EVT is set;
the firmware returns 0 when there is no event queued up.

This patch is a quick fix that distinguishes the two behaviors, making
the Acer behavior effective only for Acer EC firmware, so that the
breakage on Samsung EC firmware is avoided.

Fixes: 3afcf2ece453 (ACPI / EC: Add support to disallow QR_EC to be issued ...)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161
Reported-and-tested-by: Ortwin Glück <[email protected]>
Signed-off-by: Lv Zheng <[email protected]>
[ rjw : Subject ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/acpi/ec.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index aa232a3..8c02f84 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -127,6 +127,7 @@ static int EC_FLAGS_MSI; /* Out-of-spec MSI controller */
static int EC_FLAGS_VALIDATE_ECDT; /* ASUStec ECDTs need to be validated */
static int EC_FLAGS_SKIP_DSDT_SCAN; /* Not all BIOS survive early DSDT scan */
static int EC_FLAGS_CLEAR_ON_RESUME; /* Needs acpi_ec_clear() on boot/resume */
+static int EC_FLAGS_QUERY_HANDSHAKE; /* Needs QR_EC issued when SCI_EVT set */

/* --------------------------------------------------------------------------
Transaction Management
@@ -204,13 +205,8 @@ static bool advance_transaction(struct acpi_ec *ec)
}
return wakeup;
} else {
- /*
- * There is firmware refusing to respond QR_EC when SCI_EVT
- * is not set, for which case, we complete the QR_EC
- * without issuing it to the firmware.
- * https://bugzilla.kernel.org/show_bug.cgi?id=86211
- */
- if (!(status & ACPI_EC_FLAG_SCI) &&
+ if (EC_FLAGS_QUERY_HANDSHAKE &&
+ !(status & ACPI_EC_FLAG_SCI) &&
(t->command == ACPI_EC_COMMAND_QUERY)) {
t->flags |= ACPI_EC_COMMAND_POLL;
t->rdata[t->ri++] = 0x00;
@@ -996,6 +992,18 @@ static int ec_enlarge_storm_threshold(const struct dmi_system_id *id)
}

/*
+ * Acer EC firmware refuses to respond QR_EC when SCI_EVT is not set, for
+ * which case, we complete the QR_EC without issuing it to the firmware.
+ * https://bugzilla.kernel.org/show_bug.cgi?id=86211
+ */
+static int ec_flag_query_handshake(const struct dmi_system_id *id)
+{
+ pr_debug("Detected the EC firmware requiring QR_EC issued when SCI_EVT set\n");
+ EC_FLAGS_QUERY_HANDSHAKE = 1;
+ return 0;
+}
+
+/*
* On some hardware it is necessary to clear events accumulated by the EC during
* sleep. These ECs stop reporting GPEs until they are manually polled, if too
* many events are accumulated. (e.g. Samsung Series 5/9 notebooks)
@@ -1065,6 +1073,9 @@ static struct dmi_system_id ec_dmi_table[] __initdata = {
{
ec_clear_on_resume, "Samsung hardware", {
DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD.")}, NULL},
+ {
+ ec_flag_query_handshake, "Acer hardware", {
+ DMI_MATCH(DMI_SYS_VENDOR, "Acer"), }, NULL},
{},
};

--
1.9.1

2014-11-06 23:03:54

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 108/162] staging:iio:ad5933: Drop "raw" from channel names

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Lars-Peter Clausen <[email protected]>

commit 6822ee34ad57b29a3b44df2c2829910f03c34fa4 upstream.

"raw" is the name of a channel property, but should not be part of the
channel name itself.

Signed-off-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/staging/iio/impedance-analyzer/ad5933.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/iio/impedance-analyzer/ad5933.c b/drivers/staging/iio/impedance-analyzer/ad5933.c
index 3854f99..97d4b3f 100644
--- a/drivers/staging/iio/impedance-analyzer/ad5933.c
+++ b/drivers/staging/iio/impedance-analyzer/ad5933.c
@@ -125,7 +125,7 @@ static const struct iio_chan_spec ad5933_channels[] = {
.type = IIO_VOLTAGE,
.indexed = 1,
.channel = 0,
- .extend_name = "real_raw",
+ .extend_name = "real",
.address = AD5933_REG_REAL_DATA,
.scan_index = 0,
.scan_type = {
@@ -137,7 +137,7 @@ static const struct iio_chan_spec ad5933_channels[] = {
.type = IIO_VOLTAGE,
.indexed = 1,
.channel = 0,
- .extend_name = "imag_raw",
+ .extend_name = "imag",
.address = AD5933_REG_IMAG_DATA,
.scan_index = 1,
.scan_type = {
--
1.9.1

2014-11-06 23:04:27

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 161/162] ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Lv Zheng <[email protected]>

commit 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08 upstream.

There is a platform that refuses to respond to QR_EC when SCI_EVT isn't
set (Acer Aspire V5-573G).

Currently, we rely on the behaviour that the EC firmware can respond with
something (for example, 0x00 to indicate "no outstanding events") to
QR_EC even when SCI_EVT is not set, but the reporter has complained
about AC/battery plugging/unplugging and video brightness change delays
on that platform.

This is because the work item that issued QR_EC has to wait until it
times out in this case, and the _Qxx method evaluation work item queued
after the QR_EC one is delayed.

It sounds reasonable to fix this issue by:
1. Implementing an SCI_EVT sanity check before issuing QR_EC in the EC
driver's main state machine.
2. Moving QR_EC issuing out of the work queue used by _Qxx evaluation
to a separate IRQ handling thread.

This patch fixes this issue using solution 1.

By disallowing QR_EC to be issued when SCI_EVT isn't set, we are able to
handle such a platform in the EC driver's main state machine. This patch
enhances the state machine in this way so that it survives such
malfunctioning EC firmware.

Note that this patch can also fix the CLEAR_ON_RESUME quirk, which also
relies on the assumption that the platforms are able to respond even when
SCI_EVT isn't set.

Fixes: c0d653412fc8 ("ACPI / EC: Fix race condition in ec_transaction_completed()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=82611
Reported-and-tested-by: Alexander Mezin <[email protected]>
Signed-off-by: Lv Zheng <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/acpi/ec.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 3b20294..aa232a3 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -191,6 +191,8 @@ static bool advance_transaction(struct acpi_ec *ec)
t->rdata[t->ri++] = acpi_ec_read_data(ec);
if (t->rlen == t->ri) {
t->flags |= ACPI_EC_COMMAND_COMPLETE;
+ if (t->command == ACPI_EC_COMMAND_QUERY)
+ pr_debug("hardware QR_EC completion\n");
wakeup = true;
}
} else
@@ -202,7 +204,20 @@ static bool advance_transaction(struct acpi_ec *ec)
}
return wakeup;
} else {
- if ((status & ACPI_EC_FLAG_IBF) == 0) {
+ /*
+ * There is firmware refusing to respond QR_EC when SCI_EVT
+ * is not set, for which case, we complete the QR_EC
+ * without issuing it to the firmware.
+ * https://bugzilla.kernel.org/show_bug.cgi?id=86211
+ */
+ if (!(status & ACPI_EC_FLAG_SCI) &&
+ (t->command == ACPI_EC_COMMAND_QUERY)) {
+ t->flags |= ACPI_EC_COMMAND_POLL;
+ t->rdata[t->ri++] = 0x00;
+ t->flags |= ACPI_EC_COMMAND_COMPLETE;
+ pr_debug("software QR_EC completion\n");
+ wakeup = true;
+ } else if ((status & ACPI_EC_FLAG_IBF) == 0) {
acpi_ec_write_cmd(ec, t->command);
t->flags |= ACPI_EC_COMMAND_POLL;
} else
--
1.9.1

2014-11-06 23:05:04

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 107/162] staging:iio:ad5933: Fix NULL pointer deref when enabling buffer

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Lars-Peter Clausen <[email protected]>

commit 824269c5868d2a7a26417e5ef3841a27d42c6139 upstream.

In older versions of the IIO framework it was possible to pass a
completely different set of channels to iio_buffer_register() from the one
that is assigned to the IIO device. Commit 959d2952d124 ("staging:iio: make
iio_sw_buffer_preenable much more general.") introduced a restriction that
requires that the set of channels that is passed to iio_buffer_register() is
a subset of the channels assigned to the IIO device as the IIO core will use
the list of channels that is assigned to the device to lookup a channel by
scan index in iio_compute_scan_bytes(). If it can not find the channel the
function will crash. This patch fixes the issue by making sure that the same
set of channels is assigned to the IIO device and passed to
iio_buffer_register().
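
The registration rule the fix follows can be shown in miniature: the
device and the buffer see the same channel array, and channels that must
never enter the scan (such as the temperature channel here) carry
scan_index = -1. The types and functions below are placeholders for
illustration, not the real IIO API.

/* Miniature model of the registration rule; placeholder types, not the
 * real IIO API.  The buffer is set up from the same array the device
 * uses, and non-buffered channels carry scan_index = -1.
 */
#include <stdio.h>

struct chan { const char *name; int scan_index; };

static const struct chan channels[] = {
        { "temp0",    -1 },                     /* sysfs-only, never in the scan */
        { "in0_real",  0 },
        { "in0_imag",  1 },
};

static int buffer_register(const struct chan *ch, int n)
{
        for (int i = 0; i < n; i++)
                if (ch[i].scan_index >= 0)
                        printf("buffered: %s (scan index %d)\n",
                               ch[i].name, ch[i].scan_index);
        return 0;
}

int main(void)
{
        /* the device and the buffer both see the same, complete array */
        return buffer_register(channels,
                               (int)(sizeof(channels) / sizeof(channels[0])));
}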

Fixes the following NULL pointer dereference kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000016
pgd = d53d0000
[00000016] *pgd=1534e831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1626 Comm: bash Not tainted 3.15.0-19969-g2a180eb-dirty #9545
task: d6c124c0 ti: d539a000 task.ti: d539a000
PC is at iio_compute_scan_bytes+0x34/0xa8
LR is at iio_compute_scan_bytes+0x34/0xa8
pc : [<c03052e4>] lr : [<c03052e4>] psr: 60070013
sp : d539beb8 ip : 00000001 fp : 00000000
r10: 00000002 r9 : 00000000 r8 : 00000001
r7 : 00000000 r6 : d6dc8800 r5 : d7571000 r4 : 00000002
r3 : d7571000 r2 : 00000044 r1 : 00000001 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 18c5387d Table: 153d004a DAC: 00000015
Process bash (pid: 1626, stack limit = 0xd539a240)
Stack: (0xd539beb8 to 0xd539c000)
bea0: c02fc0e4 d7571000
bec0: d76c1640 d6dc8800 d757117c 00000000 d757112c c0305b04 d76c1690 d76c1640
bee0: d7571188 00000002 00000000 d7571000 d539a000 00000000 000dd1c8 c0305d54
bf00: d7571010 0160b868 00000002 c69d3900 d7573278 d7573308 c69d3900 c01ece90
bf20: 00000002 c0103fac c0103f6c d539bf88 00000002 c69d3b00 c69d3b0c c0103468
bf40: 00000000 00000000 d7694a00 00000002 000af408 d539bf88 c000dd84 c00b2f94
bf60: d7694a00 000af408 00000002 d7694a00 d7694a00 00000002 000af408 c000dd84
bf80: 00000000 c00b32d0 00000000 00000000 00000002 b6f1aa78 00000002 000af408
bfa0: 00000004 c000dc00 b6f1aa78 00000002 00000001 000af408 00000002 00000000
bfc0: b6f1aa78 00000002 000af408 00000004 be806a4c 000a6094 00000000 000dd1c8
bfe0: 00000000 be8069cc b6e8ab77 b6ec125c 40070010 00000001 22940489 154a5007
[<c03052e4>] (iio_compute_scan_bytes) from [<c0305b04>] (__iio_update_buffers+0x248/0x438)
[<c0305b04>] (__iio_update_buffers) from [<c0305d54>] (iio_buffer_store_enable+0x60/0x7c)
[<c0305d54>] (iio_buffer_store_enable) from [<c01ece90>] (dev_attr_store+0x18/0x24)
[<c01ece90>] (dev_attr_store) from [<c0103fac>] (sysfs_kf_write+0x40/0x4c)
[<c0103fac>] (sysfs_kf_write) from [<c0103468>] (kernfs_fop_write+0x110/0x154)
[<c0103468>] (kernfs_fop_write) from [<c00b2f94>] (vfs_write+0xd0/0x160)
[<c00b2f94>] (vfs_write) from [<c00b32d0>] (SyS_write+0x40/0x78)
[<c00b32d0>] (SyS_write) from [<c000dc00>] (ret_fast_syscall+0x0/0x30)
Code: ea00000e e1a01008 e1a00005 ebfff6fc (e5d0a016)

Fixes: 959d2952d124 ("staging:iio: make iio_sw_buffer_preenable much more general.")
Signed-off-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/staging/iio/impedance-analyzer/ad5933.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/iio/impedance-analyzer/ad5933.c b/drivers/staging/iio/impedance-analyzer/ad5933.c
index 2b96665..3854f99 100644
--- a/drivers/staging/iio/impedance-analyzer/ad5933.c
+++ b/drivers/staging/iio/impedance-analyzer/ad5933.c
@@ -115,6 +115,7 @@ static const struct iio_chan_spec ad5933_channels[] = {
.channel = 0,
.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
.address = AD5933_REG_TEMP_DATA,
+ .scan_index = -1,
.scan_type = {
.sign = 's',
.realbits = 14,
@@ -125,8 +126,6 @@ static const struct iio_chan_spec ad5933_channels[] = {
.indexed = 1,
.channel = 0,
.extend_name = "real_raw",
- .info_mask_separate = BIT(IIO_CHAN_INFO_RAW) |
- BIT(IIO_CHAN_INFO_SCALE),
.address = AD5933_REG_REAL_DATA,
.scan_index = 0,
.scan_type = {
@@ -139,8 +138,6 @@ static const struct iio_chan_spec ad5933_channels[] = {
.indexed = 1,
.channel = 0,
.extend_name = "imag_raw",
- .info_mask_separate = BIT(IIO_CHAN_INFO_RAW) |
- BIT(IIO_CHAN_INFO_SCALE),
.address = AD5933_REG_IMAG_DATA,
.scan_index = 1,
.scan_type = {
@@ -748,14 +745,14 @@ static int ad5933_probe(struct i2c_client *client,
indio_dev->name = id->name;
indio_dev->modes = INDIO_DIRECT_MODE;
indio_dev->channels = ad5933_channels;
- indio_dev->num_channels = 1; /* only register temp0_input */
+ indio_dev->num_channels = ARRAY_SIZE(ad5933_channels);

ret = ad5933_register_ring_funcs_and_init(indio_dev);
if (ret)
goto error_disable_reg;

- /* skip temp0_input, register in0_(real|imag)_raw */
- ret = iio_buffer_register(indio_dev, &ad5933_channels[1], 2);
+ ret = iio_buffer_register(indio_dev, ad5933_channels,
+ ARRAY_SIZE(ad5933_channels));
if (ret)
goto error_unreg_ring;

--
1.9.1

2014-11-06 23:05:35

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 114/162] usb: dwc3: gadget: fix set_halt() bug with pending transfers

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit 7a60855972f0d3c014093046cb6f013a1ee5bb19 upstream.

According to our Gadget Framework API documentation,
->set_halt() *must* return -EAGAIN if we have pending
transfers (in either direction) or the FIFO isn't empty (on
TX endpoints).

Fix this bug so that the mass storage gadget can be used
without the stall=0 parameter.

This patch should be backported to all kernels since v3.2.
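
The required behaviour can be modelled on its own: a non-protocol halt
request is refused with -EAGAIN while requests are still queued or in
flight. The sketch below uses simplified stand-in types, not the dwc3
driver's real structures.

/* Standalone model of the required check; simplified stand-in types,
 * not the dwc3 driver's real structures.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct ep_model {
        bool tx;                                /* IN endpoint */
        bool busy;                              /* transfer in flight */
        int queued;                             /* requests queued or started */
};

static int ep_set_halt(struct ep_model *ep, int value, bool protocol)
{
        if (value && !protocol &&
            ((ep->tx && ep->busy) || ep->queued))
                return -EAGAIN;                 /* retry once the queue drains */
        return 0;                               /* (un)stall would be issued here */
}

int main(void)
{
        struct ep_model ep = { .tx = true, .busy = true, .queued = 1 };

        printf("%d\n", ep_set_halt(&ep, 1, false));   /* -EAGAIN (-11 on Linux) */
        ep.busy = false;
        ep.queued = 0;
        printf("%d\n", ep_set_halt(&ep, 1, false));   /* 0 */
        return 0;
}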

Suggested-by: Alan Stern <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
[ kamal: backport to 3.13-stable: omitted ep_set_wedge() change ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/dwc3/ep0.c | 4 ++--
drivers/usb/dwc3/gadget.c | 14 +++++++++++---
drivers/usb/dwc3/gadget.h | 2 +-
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 21a3520..0985ff7 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -251,7 +251,7 @@ static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)

/* stall is always issued on EP0 */
dep = dwc->eps[0];
- __dwc3_gadget_ep_set_halt(dep, 1);
+ __dwc3_gadget_ep_set_halt(dep, 1, false);
dep->flags = DWC3_EP_ENABLED;
dwc->delayed_status = false;

@@ -461,7 +461,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
return -EINVAL;
if (set == 0 && (dep->flags & DWC3_EP_WEDGE))
break;
- ret = __dwc3_gadget_ep_set_halt(dep, set);
+ ret = __dwc3_gadget_ep_set_halt(dep, set, true);
if (ret)
return -EINVAL;
break;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index c37da0c..27b9c8a 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -588,7 +588,7 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep)

/* make sure HW endpoint isn't stalled */
if (dep->flags & DWC3_EP_STALL)
- __dwc3_gadget_ep_set_halt(dep, 0);
+ __dwc3_gadget_ep_set_halt(dep, 0, false);

reg = dwc3_readl(dwc->regs, DWC3_DALEPENA);
reg &= ~DWC3_DALEPENA_EP(dep->number);
@@ -1186,7 +1186,7 @@ out0:
return ret;
}

-int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value)
+int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol)
{
struct dwc3_gadget_ep_cmd_params params;
struct dwc3 *dwc = dep->dwc;
@@ -1195,6 +1195,14 @@ int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value)
memset(&params, 0x00, sizeof(params));

if (value) {
+ if (!protocol && ((dep->direction && dep->flags & DWC3_EP_BUSY) ||
+ (!list_empty(&dep->req_queued) ||
+ !list_empty(&dep->request_list)))) {
+ dev_dbg(dwc->dev, "%s: pending request, cannot halt\n",
+ dep->name);
+ return -EAGAIN;
+ }
+
ret = dwc3_send_gadget_ep_cmd(dwc, dep->number,
DWC3_DEPCMD_SETSTALL, &params);
if (ret)
@@ -1234,7 +1242,7 @@ static int dwc3_gadget_ep_set_halt(struct usb_ep *ep, int value)
goto out;
}

- ret = __dwc3_gadget_ep_set_halt(dep, value);
+ ret = __dwc3_gadget_ep_set_halt(dep, value, false);
out:
spin_unlock_irqrestore(&dwc->lock, flags);

diff --git a/drivers/usb/dwc3/gadget.h b/drivers/usb/dwc3/gadget.h
index a0ee75b..ac625582 100644
--- a/drivers/usb/dwc3/gadget.h
+++ b/drivers/usb/dwc3/gadget.h
@@ -85,7 +85,7 @@ void dwc3_ep0_out_start(struct dwc3 *dwc);
int dwc3_gadget_ep0_set_halt(struct usb_ep *ep, int value);
int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
gfp_t gfp_flags);
-int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value);
+int __dwc3_gadget_ep_set_halt(struct dwc3_ep *dep, int value, int protocol);

/**
* dwc3_gadget_ep_get_transfer_index - Gets transfer index from HW
--
1.9.1

2014-11-06 23:07:30

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 042/162] sparc64: support M6 and M7 for building CPU distribution map

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Allen Pais <[email protected]>

[ Upstream commit 9bd3ee33f6b97de092610d8dcabc4cb98d99505c ]

Add the M6 and M7 chip types in cpumap.c to correctly build the CPU distribution map that spans all online CPUs.

Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/cpumap.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/sparc/kernel/cpumap.c b/arch/sparc/kernel/cpumap.c
index cb5d272..b031c9c 100644
--- a/arch/sparc/kernel/cpumap.c
+++ b/arch/sparc/kernel/cpumap.c
@@ -327,6 +327,8 @@ static int iterate_cpu(struct cpuinfo_tree *t, unsigned int root_index)
case SUN4V_CHIP_NIAGARA3:
case SUN4V_CHIP_NIAGARA4:
case SUN4V_CHIP_NIAGARA5:
+ case SUN4V_CHIP_SPARC_M6:
+ case SUN4V_CHIP_SPARC_M7:
case SUN4V_CHIP_SPARC64X:
rover_inc_table = niagara_iterate_method;
break;
--
1.9.1

2014-11-06 23:07:57

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 045/162] sparc64: Switch to 4-level page tables.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit ac55c768143aa34cc3789c4820cbb0809a76fd9c ]

This has become necessary with chips that support more than 43 bits
of physical addressing.
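
The arithmetic behind that limit can be spelled out from the constants
visible in the diff below: with 8K pages, each page-table level indexes
10 bits, so three levels on top of the 13-bit page offset cover 43 bits
and a fourth level extends that to 53. The small program below just
reproduces that calculation; the constants are taken from this patch,
not computed by it.

/* Reproduces the address-space arithmetic; the constants come from this
 * patch, they are not computed by it.
 */
#include <stdio.h>

int main(void)
{
        int page_shift = 13;                    /* 8K base pages on sparc64 */
        int bits_per_level = page_shift - 3;    /* 1024 eight-byte entries per page */

        int three_level = page_shift + 3 * bits_per_level;  /* PGD->PMD->PTE */
        int four_level = page_shift + 4 * bits_per_level;   /* PGD->PUD->PMD->PTE */

        printf("3 levels cover %d bits, 4 levels cover %d bits\n",
               three_level, four_level);        /* 43 and 53, matching the #if check */
        return 0;
}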

Based almost entirely upon a patch by Bob Picco.

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/page_64.h | 6 ++++++
arch/sparc/include/asm/pgalloc_64.h | 28 +++++++++++++++++++++++++++-
arch/sparc/include/asm/pgtable_64.h | 37 ++++++++++++++++++++++++++++++++-----
arch/sparc/include/asm/tsb.h | 10 ++++++++++
arch/sparc/kernel/smp_64.c | 7 +++++++
arch/sparc/mm/init_64.c | 31 +++++++++++++++++++++++++++----
6 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index aac53fc..3747c4f 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -57,18 +57,21 @@ extern void copy_user_page(void *to, void *from, unsigned long vaddr, struct pag
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long iopte; } iopte_t;
typedef struct { unsigned long pmd; } pmd_t;
+typedef struct { unsigned long pud; } pud_t;
typedef struct { unsigned long pgd; } pgd_t;
typedef struct { unsigned long pgprot; } pgprot_t;

#define pte_val(x) ((x).pte)
#define iopte_val(x) ((x).iopte)
#define pmd_val(x) ((x).pmd)
+#define pud_val(x) ((x).pud)
#define pgd_val(x) ((x).pgd)
#define pgprot_val(x) ((x).pgprot)

#define __pte(x) ((pte_t) { (x) } )
#define __iopte(x) ((iopte_t) { (x) } )
#define __pmd(x) ((pmd_t) { (x) } )
+#define __pud(x) ((pud_t) { (x) } )
#define __pgd(x) ((pgd_t) { (x) } )
#define __pgprot(x) ((pgprot_t) { (x) } )

@@ -77,18 +80,21 @@ typedef struct { unsigned long pgprot; } pgprot_t;
typedef unsigned long pte_t;
typedef unsigned long iopte_t;
typedef unsigned long pmd_t;
+typedef unsigned long pud_t;
typedef unsigned long pgd_t;
typedef unsigned long pgprot_t;

#define pte_val(x) (x)
#define iopte_val(x) (x)
#define pmd_val(x) (x)
+#define pud_val(x) (x)
#define pgd_val(x) (x)
#define pgprot_val(x) (x)

#define __pte(x) (x)
#define __iopte(x) (x)
#define __pmd(x) (x)
+#define __pud(x) (x)
#define __pgd(x) (x)
#define __pgprot(x) (x)

diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index bcfe063..2c8d41f 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -15,6 +15,13 @@

extern struct kmem_cache *pgtable_cache;

+static inline void __pgd_populate(pgd_t *pgd, pud_t *pud)
+{
+ pgd_set(pgd, pud);
+}
+
+#define pgd_populate(MM, PGD, PUD) __pgd_populate(PGD, PUD)
+
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
@@ -25,7 +32,23 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
kmem_cache_free(pgtable_cache, pgd);
}

-#define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD)
+static inline void __pud_populate(pud_t *pud, pmd_t *pmd)
+{
+ pud_set(pud, pmd);
+}
+
+#define pud_populate(MM, PUD, PMD) __pud_populate(PUD, PMD)
+
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+ return kmem_cache_alloc(pgtable_cache,
+ GFP_KERNEL|__GFP_REPEAT);
+}
+
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+ kmem_cache_free(pgtable_cache, pud);
+}

static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
@@ -91,4 +114,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pte_t *pte,
#define __pmd_free_tlb(tlb, pmd, addr) \
pgtable_free_tlb(tlb, pmd, false)

+#define __pud_free_tlb(tlb, pud, addr) \
+ pgtable_free_tlb(tlb, pud, false)
+
#endif /* _SPARC64_PGALLOC_H */
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 1a49ffd..5c91201 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -20,8 +20,6 @@
#include <asm/page.h>
#include <asm/processor.h>

-#include <asm-generic/pgtable-nopud.h>
-
/* The kernel image occupies 0x4000000 to 0x6000000 (4MB --> 96MB).
* The page copy blockops can use 0x6000000 to 0x8000000.
* The 8K TSB is mapped in the 0x8000000 to 0x8400000 range.
@@ -55,13 +53,21 @@
#define PMD_MASK (~(PMD_SIZE-1))
#define PMD_BITS (PAGE_SHIFT - 3)

-/* PGDIR_SHIFT determines what a third-level page table entry can map */
-#define PGDIR_SHIFT (PAGE_SHIFT + (PAGE_SHIFT-3) + PMD_BITS)
+/* PUD_SHIFT determines the size of the area a third-level page
+ * table can map
+ */
+#define PUD_SHIFT (PMD_SHIFT + PMD_BITS)
+#define PUD_SIZE (_AC(1,UL) << PUD_SHIFT)
+#define PUD_MASK (~(PUD_SIZE-1))
+#define PUD_BITS (PAGE_SHIFT - 3)
+
+/* PGDIR_SHIFT determines what a fourth-level page table entry can map */
+#define PGDIR_SHIFT (PUD_SHIFT + PUD_BITS)
#define PGDIR_SIZE (_AC(1,UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE-1))
#define PGDIR_BITS (PAGE_SHIFT - 3)

-#if (PGDIR_SHIFT + PGDIR_BITS) != 43
+#if (PGDIR_SHIFT + PGDIR_BITS) != 53
#error Page table parameters do not cover virtual address space properly.
#endif

@@ -93,6 +99,7 @@ static inline bool kern_addr_valid(unsigned long addr)
/* Entries per page directory level. */
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT-3))
#define PTRS_PER_PMD (1UL << PMD_BITS)
+#define PTRS_PER_PUD (1UL << PUD_BITS)
#define PTRS_PER_PGD (1UL << PGDIR_BITS)

/* Kernel has a separate 44bit address space. */
@@ -101,6 +108,9 @@ static inline bool kern_addr_valid(unsigned long addr)
#define pmd_ERROR(e) \
pr_err("%s:%d: bad pmd %p(%016lx) seen at (%pS)\n", \
__FILE__, __LINE__, &(e), pmd_val(e), __builtin_return_address(0))
+#define pud_ERROR(e) \
+ pr_err("%s:%d: bad pud %p(%016lx) seen at (%pS)\n", \
+ __FILE__, __LINE__, &(e), pud_val(e), __builtin_return_address(0))
#define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %p(%016lx) seen at (%pS)\n", \
__FILE__, __LINE__, &(e), pgd_val(e), __builtin_return_address(0))
@@ -779,6 +789,11 @@ static inline int pmd_present(pmd_t pmd)
#define pud_bad(pud) ((pud_val(pud) & ~PAGE_MASK) || \
!__kern_addr_valid(pud_val(pud)))

+#define pgd_none(pgd) (!pgd_val(pgd))
+
+#define pgd_bad(pgd) ((pgd_val(pgd) & ~PAGE_MASK) || \
+ !__kern_addr_valid(pgd_val(pgd)))
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
@@ -815,10 +830,17 @@ static inline unsigned long __pmd_page(pmd_t pmd)
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0UL)
#define pud_present(pud) (pud_val(pud) != 0U)
#define pud_clear(pudp) (pud_val(*(pudp)) = 0UL)
+#define pgd_page_vaddr(pgd) \
+ ((unsigned long) __va(pgd_val(pgd)))
+#define pgd_present(pgd) (pgd_val(pgd) != 0U)
+#define pgd_clear(pgdp) (pgd_val(*(pgd)) = 0UL)

/* Same in both SUN4V and SUN4U. */
#define pte_none(pte) (!pte_val(pte))

+#define pgd_set(pgdp, pudp) \
+ (pgd_val(*(pgdp)) = (__pa((unsigned long) (pudp))))
+
/* to find an entry in a page-table-directory. */
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
@@ -826,6 +848,11 @@ static inline unsigned long __pmd_page(pmd_t pmd)
/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(address) pgd_offset(&init_mm, address)

+/* Find an entry in the third-level page table.. */
+#define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+#define pud_offset(pgdp, address) \
+ ((pud_t *) pgd_page_vaddr(*(pgdp)) + pud_index(address))
+
/* Find an entry in the second-level page table.. */
#define pmd_offset(pudp, address) \
((pmd_t *) pud_page_vaddr(*(pudp)) + \
diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index 90916f9..2e268b6 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -145,6 +145,11 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
andn REG2, 0x7, REG2; \
ldx [REG1 + REG2], REG1; \
brz,pn REG1, FAIL_LABEL; \
+ sllx VADDR, 64 - (PUD_SHIFT + PUD_BITS), REG2; \
+ srlx REG2, 64 - PAGE_SHIFT, REG2; \
+ andn REG2, 0x7, REG2; \
+ ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
+ brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
@@ -198,6 +203,11 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
andn REG2, 0x7, REG2; \
ldxa [PHYS_PGD + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
+ sllx VADDR, 64 - (PUD_SHIFT + PUD_BITS), REG2; \
+ srlx REG2, 64 - PAGE_SHIFT, REG2; \
+ andn REG2, 0x7, REG2; \
+ ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
+ brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 8311f3d..50c3dd03 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1479,6 +1479,13 @@ static void __init pcpu_populate_pte(unsigned long addr)
pud_t *pud;
pmd_t *pmd;

+ if (pgd_none(*pgd)) {
+ pud_t *new;
+
+ new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+ pgd_populate(&init_mm, pgd, new);
+ }
+
pud = pud_offset(pgd, addr);
if (pud_none(*pud)) {
pmd_t *new;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 1a9cc81..7b03310 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1383,6 +1383,13 @@ static unsigned long __ref kernel_map_range(unsigned long pstart,
pmd_t *pmd;
pte_t *pte;

+ if (pgd_none(*pgd)) {
+ pud_t *new;
+
+ new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+ alloc_bytes += PAGE_SIZE;
+ pgd_populate(&init_mm, pgd, new);
+ }
pud = pud_offset(pgd, vstart);
if (pud_none(*pud)) {
pmd_t *new;
@@ -1849,7 +1856,12 @@ static void __init sun4v_linear_pte_xor_finalize(void)
/* paging_init() sets up the page tables */

static unsigned long last_valid_pfn;
-pgd_t swapper_pg_dir[PTRS_PER_PGD];
+
+/* These must be page aligned in order to not trigger the
+ * alignment tests of pgd_bad() and pud_bad().
+ */
+pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__ ((aligned (PAGE_SIZE)));
+static pud_t swapper_pud_dir[PTRS_PER_PUD] __attribute__ ((aligned (PAGE_SIZE)));

static void sun4u_pgprot_init(void);
static void sun4v_pgprot_init(void);
@@ -1858,6 +1870,8 @@ void __init paging_init(void)
{
unsigned long end_pfn, shift, phys_base;
unsigned long real_end, i;
+ pud_t *pud;
+ pmd_t *pmd;
int node;

setup_page_offset();
@@ -1954,9 +1968,18 @@ void __init paging_init(void)

memset(swapper_low_pmd_dir, 0, sizeof(swapper_low_pmd_dir));

- /* Now can init the kernel/bad page tables. */
- pud_set(pud_offset(&swapper_pg_dir[0], 0),
- swapper_low_pmd_dir + (shift / sizeof(pgd_t)));
+ /* The kernel page tables we publish into what the rest of the
+ * world sees must be adjusted so that they see the PAGE_OFFSET
+ * address of these in-kerenel data structures. However right
+ * here we must access them from the kernel image side, because
+ * the trap tables haven't been taken over and therefore we cannot
+ * take TLB misses in the PAGE_OFFSET linear mappings yet.
+ */
+ pud = swapper_pud_dir + (shift / sizeof(pud_t));
+ pgd_set(&swapper_pg_dir[0], pud);
+
+ pmd = swapper_low_pmd_dir + (shift / sizeof(pmd_t));
+ pud_set(&swapper_pud_dir[0], pmd);

inherit_prom_mappings();

--
1.9.1

2014-11-06 23:08:23

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 047/162] sparc64: Adjust KTSB assembler to support larger physical addresses.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 8c82dc0e883821c098c8b0b130ffebabf9aab5df ]

As currently coded the KTSB accesses in the kernel only support up to
47 bits of physical addressing.

Adjust the instruction and patching sequence in order to support
arbitrary 64-bit physical addresses.
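
As a rough sketch of the new scheme (mirroring patch_one_ktsb_phys() in the
hunk below, not a verbatim copy of it), the 64-bit physical address is split
into two 32-bit halves and each half is folded into a sethi/or immediate pair:

        static void ktsb_patch_sketch(unsigned int *ia, unsigned long pa)
        {
                unsigned long high_bits = (pa >> 32) & 0xffffffffUL;
                unsigned long low_bits = pa & 0xffffffffUL;

                /* sethi %uhi(pa) / sethi %hi(pa): top 22 bits of each half */
                ia[0] = (ia[0] & ~0x3fffff) | (high_bits >> 10);
                ia[1] = (ia[1] & ~0x3fffff) | (low_bits >> 10);
                /* or %ulo(pa) / or %lo(pa): low 10 bits of each half */
                ia[2] = (ia[2] & ~0x1fff) | (high_bits & 0x3ff);
                ia[3] = (ia[3] & ~0x1fff) | (low_bits & 0x3ff);
                /* the real routine also flushes the I-cache after each store */
        }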

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/tsb.h | 30 ++++++++++++------------------
arch/sparc/mm/init_64.c | 28 +++++++++++++++++++++++++---
2 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index 2e268b6..a2f5419 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -256,8 +256,6 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
(KERNEL_TSB_SIZE_BYTES / 16)
#define KERNEL_TSB4M_NENTRIES 4096

-#define KTSB_PHYS_SHIFT 15
-
/* Do a kernel TSB lookup at tl>0 on VADDR+TAG, branch to OK_LABEL
* on TSB hit. REG1, REG2, REG3, and REG4 are used as temporaries
* and the found TTE will be left in REG1. REG3 and REG4 must
@@ -266,17 +264,15 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
* VADDR and TAG will be preserved and not clobbered by this macro.
*/
#define KERN_TSB_LOOKUP_TL1(VADDR, TAG, REG1, REG2, REG3, REG4, OK_LABEL) \
-661: sethi %hi(swapper_tsb), REG1; \
- or REG1, %lo(swapper_tsb), REG1; \
+661: sethi %uhi(swapper_tsb), REG1; \
+ sethi %hi(swapper_tsb), REG2; \
+ or REG1, %ulo(swapper_tsb), REG1; \
+ or REG2, %lo(swapper_tsb), REG2; \
.section .swapper_tsb_phys_patch, "ax"; \
.word 661b; \
.previous; \
-661: nop; \
- .section .tsb_ldquad_phys_patch, "ax"; \
- .word 661b; \
- sllx REG1, KTSB_PHYS_SHIFT, REG1; \
- sllx REG1, KTSB_PHYS_SHIFT, REG1; \
- .previous; \
+ sllx REG1, 32, REG1; \
+ or REG1, REG2, REG1; \
srlx VADDR, PAGE_SHIFT, REG2; \
and REG2, (KERNEL_TSB_NENTRIES - 1), REG2; \
sllx REG2, 4, REG2; \
@@ -291,17 +287,15 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
* we can make use of that for the index computation.
*/
#define KERN_TSB4M_LOOKUP_TL1(TAG, REG1, REG2, REG3, REG4, OK_LABEL) \
-661: sethi %hi(swapper_4m_tsb), REG1; \
- or REG1, %lo(swapper_4m_tsb), REG1; \
+661: sethi %uhi(swapper_4m_tsb), REG1; \
+ sethi %hi(swapper_4m_tsb), REG2; \
+ or REG1, %ulo(swapper_4m_tsb), REG1; \
+ or REG2, %lo(swapper_4m_tsb), REG2; \
.section .swapper_4m_tsb_phys_patch, "ax"; \
.word 661b; \
.previous; \
-661: nop; \
- .section .tsb_ldquad_phys_patch, "ax"; \
- .word 661b; \
- sllx REG1, KTSB_PHYS_SHIFT, REG1; \
- sllx REG1, KTSB_PHYS_SHIFT, REG1; \
- .previous; \
+ sllx REG1, 32, REG1; \
+ or REG1, REG2, REG1; \
and TAG, (KERNEL_TSB4M_NENTRIES - 1), REG2; \
sllx REG2, 4, REG2; \
add REG1, REG2, REG2; \
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 44eb215..f743f05 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1726,19 +1726,41 @@ static void __init tsb_phys_patch(void)
static struct hv_tsb_descr ktsb_descr[NUM_KTSB_DESCR];
extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];

+/* The swapper TSBs are loaded with a base sequence of:
+ *
+ * sethi %uhi(SYMBOL), REG1
+ * sethi %hi(SYMBOL), REG2
+ * or REG1, %ulo(SYMBOL), REG1
+ * or REG2, %lo(SYMBOL), REG2
+ * sllx REG1, 32, REG1
+ * or REG1, REG2, REG1
+ *
+ * When we use physical addressing for the TSB accesses, we patch the
+ * first four instructions in the above sequence.
+ */
+
static void patch_one_ktsb_phys(unsigned int *start, unsigned int *end, unsigned long pa)
{
- pa >>= KTSB_PHYS_SHIFT;
+ unsigned long high_bits, low_bits;
+
+ high_bits = (pa >> 32) & 0xffffffff;
+ low_bits = (pa >> 0) & 0xffffffff;

while (start < end) {
unsigned int *ia = (unsigned int *)(unsigned long)*start;

- ia[0] = (ia[0] & ~0x3fffff) | (pa >> 10);
+ ia[0] = (ia[0] & ~0x3fffff) | (high_bits >> 10);
__asm__ __volatile__("flush %0" : : "r" (ia));

- ia[1] = (ia[1] & ~0x3ff) | (pa & 0x3ff);
+ ia[1] = (ia[1] & ~0x3fffff) | (low_bits >> 10);
__asm__ __volatile__("flush %0" : : "r" (ia + 1));

+ ia[2] = (ia[2] & ~0x1fff) | (high_bits & 0x3ff);
+ __asm__ __volatile__("flush %0" : : "r" (ia + 2));
+
+ ia[3] = (ia[3] & ~0x1fff) | (low_bits & 0x3ff);
+ __asm__ __volatile__("flush %0" : : "r" (ia + 3));
+
start++;
}
}
--
1.9.1

2014-11-06 23:08:19

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 035/162] sparc: Let memset return the address argument

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Andreas Larsson <[email protected]>

[ Upstream commit 74cad25c076a2f5253312c2fe82d1a4daecc1323 ]

This makes memset follow the standard (returning its address argument instead
of 0 on success). This is needed because certain versions of gcc optimize
around memset calls and assume that the address argument is preserved in %o0.
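
A minimal sketch of the kind of caller this matters for (the struct and
function are hypothetical; only the reliance on memset's return value is the
point). With optimization, gcc may turn the clear-and-return pattern into a
tail call to memset and hand back whatever memset leaves in %o0:

        #include <string.h>

        struct frame { char data[64]; };

        static struct frame *frame_clear(struct frame *f)
        {
                /* may be compiled as "return memset(f, 0, sizeof(*f));" */
                memset(f, 0, sizeof(*f));
                return f;
        }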

Signed-off-by: Andreas Larsson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/lib/memset.S | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/lib/memset.S b/arch/sparc/lib/memset.S
index 99c017b..f75e690 100644
--- a/arch/sparc/lib/memset.S
+++ b/arch/sparc/lib/memset.S
@@ -3,8 +3,9 @@
* Copyright (C) 1996,1997 Jakub Jelinek ([email protected])
* Copyright (C) 1996 David S. Miller ([email protected])
*
- * Returns 0, if ok, and number of bytes not yet set if exception
- * occurs and we were called as clear_user.
+ * Calls to memset returns initial %o0. Calls to bzero returns 0, if ok, and
+ * number of bytes not yet set if exception occurs and we were called as
+ * clear_user.
*/

#include <asm/ptrace.h>
@@ -65,6 +66,8 @@ __bzero_begin:
.globl __memset_start, __memset_end
__memset_start:
memset:
+ mov %o0, %g1
+ mov 1, %g4
and %o1, 0xff, %g3
sll %g3, 8, %g2
or %g3, %g2, %g3
@@ -89,6 +92,7 @@ memset:
sub %o0, %o2, %o0

__bzero:
+ clr %g4
mov %g0, %g3
1:
cmp %o1, 7
@@ -151,8 +155,8 @@ __bzero:
bne,a 8f
EX(stb %g3, [%o0], and %o1, 1)
8:
- retl
- clr %o0
+ b 0f
+ nop
7:
be 13b
orcc %o1, 0, %g0
@@ -164,6 +168,12 @@ __bzero:
bne 8b
EX(stb %g3, [%o0 - 1], add %o1, 1)
0:
+ andcc %g4, 1, %g0
+ be 5f
+ nop
+ retl
+ mov %g1, %o0
+5:
retl
clr %o0
__memset_end:
--
1.9.1

2014-11-06 22:41:45

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 034/162] sparc64: Move request_irq() from ldc_bind() to ldc_alloc()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Sowmini Varadhan <[email protected]>

[ Upstream commit c21c4ab0d6921f7160a43216fa6973b5924de561 ]

The request_irq() needs to be done from ldc_alloc()
to avoid the following might-sleep splat (caught by lockdep):

[00000000004a0738] __might_sleep+0xf8/0x120
[000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
[00000000004faf80] request_threaded_irq+0x80/0x160
[000000000044f71c] ldc_bind+0x7c/0x220
[0000000000452454] vio_port_up+0x54/0xe0
[00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
[00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
[0000000000451a88] vio_device_probe+0x48/0x60
[000000000074c56c] really_probe+0x6c/0x300
[000000000074c83c] driver_probe_device+0x3c/0xa0
[000000000074c92c] __driver_attach+0x8c/0xa0
[000000000074a6ec] bus_for_each_dev+0x6c/0xa0
[000000000074c1dc] driver_attach+0x1c/0x40
[000000000074b0fc] bus_add_driver+0xbc/0x280
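
For reference, a minimal usage sketch of the reordered API, modelled on the
ds.c hunk below: the interrupt name now goes to ldc_alloc(), and ldc_bind()
no longer takes one.

        lp = ldc_alloc(vdev->channel_id, &ds_cfg, dp, "DS");
        if (IS_ERR(lp))
                return PTR_ERR(lp);

        err = ldc_bind(lp);     /* IRQs were already requested in ldc_alloc() */
        if (err)
                goto out_free_ldc;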

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Dwight Engen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/ldc.h | 5 +++--
arch/sparc/kernel/ds.c | 4 ++--
arch/sparc/kernel/ldc.c | 41 +++++++++++++++++++++--------------------
arch/sparc/kernel/viohs.c | 4 ++--
4 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/sparc/include/asm/ldc.h b/arch/sparc/include/asm/ldc.h
index bdb524a..8732ed3 100644
--- a/arch/sparc/include/asm/ldc.h
+++ b/arch/sparc/include/asm/ldc.h
@@ -53,13 +53,14 @@ struct ldc_channel;
/* Allocate state for a channel. */
extern struct ldc_channel *ldc_alloc(unsigned long id,
const struct ldc_channel_config *cfgp,
- void *event_arg);
+ void *event_arg,
+ const char *name);

/* Shut down and free state for a channel. */
extern void ldc_free(struct ldc_channel *lp);

/* Register TX and RX queues of the link with the hypervisor. */
-extern int ldc_bind(struct ldc_channel *lp, const char *name);
+extern int ldc_bind(struct ldc_channel *lp);

/* For non-RAW protocols we need to complete a handshake before
* communication can proceed. ldc_connect() does that, if the
diff --git a/arch/sparc/kernel/ds.c b/arch/sparc/kernel/ds.c
index dff60ab..f87a55d 100644
--- a/arch/sparc/kernel/ds.c
+++ b/arch/sparc/kernel/ds.c
@@ -1200,14 +1200,14 @@ static int ds_probe(struct vio_dev *vdev, const struct vio_device_id *id)
ds_cfg.tx_irq = vdev->tx_irq;
ds_cfg.rx_irq = vdev->rx_irq;

- lp = ldc_alloc(vdev->channel_id, &ds_cfg, dp);
+ lp = ldc_alloc(vdev->channel_id, &ds_cfg, dp, "DS");
if (IS_ERR(lp)) {
err = PTR_ERR(lp);
goto out_free_ds_states;
}
dp->lp = lp;

- err = ldc_bind(lp, "DS");
+ err = ldc_bind(lp);
if (err)
goto out_free_ldc;

diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c
index 66dacd5..27bb554 100644
--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -1078,7 +1078,8 @@ static void ldc_iommu_release(struct ldc_channel *lp)

struct ldc_channel *ldc_alloc(unsigned long id,
const struct ldc_channel_config *cfgp,
- void *event_arg)
+ void *event_arg,
+ const char *name)
{
struct ldc_channel *lp;
const struct ldc_mode_ops *mops;
@@ -1093,6 +1094,8 @@ struct ldc_channel *ldc_alloc(unsigned long id,
err = -EINVAL;
if (!cfgp)
goto out_err;
+ if (!name)
+ goto out_err;

switch (cfgp->mode) {
case LDC_MODE_RAW:
@@ -1185,6 +1188,21 @@ struct ldc_channel *ldc_alloc(unsigned long id,

INIT_HLIST_HEAD(&lp->mh_list);

+ snprintf(lp->rx_irq_name, LDC_IRQ_NAME_MAX, "%s RX", name);
+ snprintf(lp->tx_irq_name, LDC_IRQ_NAME_MAX, "%s TX", name);
+
+ err = request_irq(lp->cfg.rx_irq, ldc_rx, 0,
+ lp->rx_irq_name, lp);
+ if (err)
+ goto out_free_txq;
+
+ err = request_irq(lp->cfg.tx_irq, ldc_tx, 0,
+ lp->tx_irq_name, lp);
+ if (err) {
+ free_irq(lp->cfg.rx_irq, lp);
+ goto out_free_txq;
+ }
+
return lp;

out_free_txq:
@@ -1237,31 +1255,14 @@ EXPORT_SYMBOL(ldc_free);
* state. This does not initiate a handshake, ldc_connect() does
* that.
*/
-int ldc_bind(struct ldc_channel *lp, const char *name)
+int ldc_bind(struct ldc_channel *lp)
{
unsigned long hv_err, flags;
int err = -EINVAL;

- if (!name ||
- (lp->state != LDC_STATE_INIT))
+ if (lp->state != LDC_STATE_INIT)
return -EINVAL;

- snprintf(lp->rx_irq_name, LDC_IRQ_NAME_MAX, "%s RX", name);
- snprintf(lp->tx_irq_name, LDC_IRQ_NAME_MAX, "%s TX", name);
-
- err = request_irq(lp->cfg.rx_irq, ldc_rx, 0,
- lp->rx_irq_name, lp);
- if (err)
- return err;
-
- err = request_irq(lp->cfg.tx_irq, ldc_tx, 0,
- lp->tx_irq_name, lp);
- if (err) {
- free_irq(lp->cfg.rx_irq, lp);
- return err;
- }
-
-
spin_lock_irqsave(&lp->lock, flags);

enable_irq(lp->cfg.rx_irq);
diff --git a/arch/sparc/kernel/viohs.c b/arch/sparc/kernel/viohs.c
index f8e7dd5..9c5fbd0 100644
--- a/arch/sparc/kernel/viohs.c
+++ b/arch/sparc/kernel/viohs.c
@@ -714,7 +714,7 @@ int vio_ldc_alloc(struct vio_driver_state *vio,
cfg.tx_irq = vio->vdev->tx_irq;
cfg.rx_irq = vio->vdev->rx_irq;

- lp = ldc_alloc(vio->vdev->channel_id, &cfg, event_arg);
+ lp = ldc_alloc(vio->vdev->channel_id, &cfg, event_arg, vio->name);
if (IS_ERR(lp))
return PTR_ERR(lp);

@@ -746,7 +746,7 @@ void vio_port_up(struct vio_driver_state *vio)

err = 0;
if (state == LDC_STATE_INIT) {
- err = ldc_bind(vio->lp, vio->name);
+ err = ldc_bind(vio->lp);
if (err)
printk(KERN_WARNING "%s: Port %lu bind failed, "
"err=%d\n",
--
1.9.1

2014-11-06 23:08:51

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 052/162] sparc64: sparse irq

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: bob picco <[email protected]>

[ Upstream commit ee6a9333fa58e11577c1b531b8e0f5ffc0fd6f50 ]

This patch attempts to do a few things. The highlights are: 1) enable
SPARSE_IRQ unconditionally, 2) kill off the !SPARSE_IRQ code, 3) allocate
ivector_table at boot time, and 4) default to the cookie-only VIRQ mechanism
for supported firmware. The first firmware with cookie-only support for
me appears on T5. You can optionally force the HV firmware out of
cookie-only mode and back to the sysino support.

The sysino is a deprecated HV mechanism according to the most recent
SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
cookie/sysino firmware versioning.

The history of this interface is:

1) Major version 1.0 only supported sysino based interrupt interfaces.

2) Major version 2.0 added cookie based VIRQs, however due to the fact
   that OSs were using the VIRQs without negotiating major version
   2.0 (Linux and Solaris are both guilty), the VIRQ calls were
   allowed even with major version 1.0.

To complicate things even further, the VIRQ interfaces were only
actually hooked up in the hypervisor for LDC interrupt sources.
VIRQ calls on other device types would result in HV_EINVAL errors.

So effectively, major version 2.0 is unusable.

3) Major version 3.0 was created to signal use of VIRQs and the fact
that the hypervisor has these calls hooked up for all interrupt
sources, not just those for LDC devices.

A new boot option, hvirq, is provided in case cookie-only HV support has
issues. It selects the major version of the HV_GRP_INTR API to negotiate;
the code attempts major=3 by default, and the option can be used to
override that.
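
A sketch of what the negotiation amounts to, modelled on the irq_init_hv()
hunk below (hvirq_major, sun4v_hvapi_register() and HV_GRP_INTR are taken
from that hunk); booting with "hvirq=1" forces the sysino-era major 1:

        static void __init irq_init_hv_sketch(void)
        {
                unsigned long hv_error, major, minor = 0;

                major = hvirq_major ? hvirq_major : 3;
                hv_error = sun4v_hvapi_register(HV_GRP_INTR, major, &minor);
                hv_irq_version = hv_error ? 1 : major;  /* fall back to 1.0 */
        }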

I've tested with SPARSE_IRQ on T5-8, M7-4, T4-X and Jalapeño.

Signed-off-by: Bob Picco <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/Kconfig | 1 +
arch/sparc/include/asm/irq_64.h | 9 +-
arch/sparc/kernel/irq_64.c | 507 ++++++++++++++++++++++++++--------------
3 files changed, 342 insertions(+), 175 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 94cd9c1..57cf13a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -66,6 +66,7 @@ config SPARC64
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_CONTEXT_TRACKING
select HAVE_DEBUG_KMEMLEAK
+ select SPARSE_IRQ
select RTC_DRV_CMOS
select RTC_DRV_BQ4802
select RTC_DRV_SUN4V
diff --git a/arch/sparc/include/asm/irq_64.h b/arch/sparc/include/asm/irq_64.h
index abf6afe..3deb07f 100644
--- a/arch/sparc/include/asm/irq_64.h
+++ b/arch/sparc/include/asm/irq_64.h
@@ -37,7 +37,7 @@
*
* ino_bucket->irq allocation is made during {sun4v_,}build_irq().
*/
-#define NR_IRQS 255
+#define NR_IRQS (2048)

extern void irq_install_pre_handler(int irq,
void (*func)(unsigned int, void *, void *),
@@ -57,11 +57,8 @@ extern unsigned int sun4u_build_msi(u32 portid, unsigned int *irq_p,
unsigned long iclr_base);
extern void sun4u_destroy_msi(unsigned int irq);

-extern unsigned char irq_alloc(unsigned int dev_handle,
- unsigned int dev_ino);
-#ifdef CONFIG_PCI_MSI
-extern void irq_free(unsigned int irq);
-#endif
+unsigned int irq_alloc(unsigned int dev_handle, unsigned int dev_ino);
+void irq_free(unsigned int irq);

extern void __init init_IRQ(void);
extern void fixup_irqs(void);
diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index 666193f..4033c23 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -47,8 +47,6 @@
#include "cpumap.h"
#include "kstack.h"

-#define NUM_IVECS (IMAP_INR + 1)
-
struct ino_bucket *ivector_table;
unsigned long ivector_table_pa;

@@ -107,55 +105,196 @@ static void bucket_set_irq(unsigned long bucket_pa, unsigned int irq)

#define irq_work_pa(__cpu) &(trap_block[(__cpu)].irq_worklist_pa)

-static struct {
- unsigned int dev_handle;
- unsigned int dev_ino;
- unsigned int in_use;
-} irq_table[NR_IRQS];
-static DEFINE_SPINLOCK(irq_alloc_lock);
+static unsigned long hvirq_major __initdata;
+static int __init early_hvirq_major(char *p)
+{
+ int rc = kstrtoul(p, 10, &hvirq_major);
+
+ return rc;
+}
+early_param("hvirq", early_hvirq_major);
+
+static int hv_irq_version;
+
+/* Major version 2.0 of HV_GRP_INTR added support for the VIRQ cookie
+ * based interfaces, but:
+ *
+ * 1) Several OSs, Solaris and Linux included, use them even when only
+ * negotiating version 1.0 (or failing to negotiate at all). So the
+ * hypervisor has a workaround that provides the VIRQ interfaces even
+ * when only verion 1.0 of the API is in use.
+ *
+ * 2) Second, and more importantly, with major version 2.0 these VIRQ
+ * interfaces only were actually hooked up for LDC interrupts, even
+ * though the Hypervisor specification clearly stated:
+ *
+ * The new interrupt API functions will be available to a guest
+ * when it negotiates version 2.0 in the interrupt API group 0x2. When
+ * a guest negotiates version 2.0, all interrupt sources will only
+ * support using the cookie interface, and any attempt to use the
+ * version 1.0 interrupt APIs numbered 0xa0 to 0xa6 will result in the
+ * ENOTSUPPORTED error being returned.
+ *
+ * with an emphasis on "all interrupt sources".
+ *
+ * To correct this, major version 3.0 was created which does actually
+ * support VIRQs for all interrupt sources (not just LDC devices). So
+ * if we want to move completely over the cookie based VIRQs we must
+ * negotiate major version 3.0 or later of HV_GRP_INTR.
+ */
+static bool sun4v_cookie_only_virqs(void)
+{
+ if (hv_irq_version >= 3)
+ return true;
+ return false;
+}

-unsigned char irq_alloc(unsigned int dev_handle, unsigned int dev_ino)
+static void __init irq_init_hv(void)
{
- unsigned long flags;
- unsigned char ent;
+ unsigned long hv_error, major, minor = 0;
+
+ if (tlb_type != hypervisor)
+ return;

- BUILD_BUG_ON(NR_IRQS >= 256);
+ if (hvirq_major)
+ major = hvirq_major;
+ else
+ major = 3;

- spin_lock_irqsave(&irq_alloc_lock, flags);
+ hv_error = sun4v_hvapi_register(HV_GRP_INTR, major, &minor);
+ if (!hv_error)
+ hv_irq_version = major;
+ else
+ hv_irq_version = 1;

- for (ent = 1; ent < NR_IRQS; ent++) {
- if (!irq_table[ent].in_use)
+ pr_info("SUN4V: Using IRQ API major %d, cookie only virqs %s\n",
+ hv_irq_version,
+ sun4v_cookie_only_virqs() ? "enabled" : "disabled");
+}
+
+/* This function is for the timer interrupt.*/
+int __init arch_probe_nr_irqs(void)
+{
+ return 1;
+}
+
+#define DEFAULT_NUM_IVECS (0xfffU)
+static unsigned int nr_ivec = DEFAULT_NUM_IVECS;
+#define NUM_IVECS (nr_ivec)
+
+static unsigned int __init size_nr_ivec(void)
+{
+ if (tlb_type == hypervisor) {
+ switch (sun4v_chip_type) {
+ /* Athena's devhandle|devino is large.*/
+ case SUN4V_CHIP_SPARC64X:
+ nr_ivec = 0xffff;
break;
+ }
}
- if (ent >= NR_IRQS) {
- printk(KERN_ERR "IRQ: Out of virtual IRQs.\n");
- ent = 0;
- } else {
- irq_table[ent].dev_handle = dev_handle;
- irq_table[ent].dev_ino = dev_ino;
- irq_table[ent].in_use = 1;
- }
+ return nr_ivec;
+}
+
+struct irq_handler_data {
+ union {
+ struct {
+ unsigned int dev_handle;
+ unsigned int dev_ino;
+ };
+ unsigned long sysino;
+ };
+ struct ino_bucket bucket;
+ unsigned long iclr;
+ unsigned long imap;
+};
+
+static inline unsigned int irq_data_to_handle(struct irq_data *data)
+{
+ struct irq_handler_data *ihd = data->handler_data;
+
+ return ihd->dev_handle;
+}
+
+static inline unsigned int irq_data_to_ino(struct irq_data *data)
+{
+ struct irq_handler_data *ihd = data->handler_data;

- spin_unlock_irqrestore(&irq_alloc_lock, flags);
+ return ihd->dev_ino;
+}
+
+static inline unsigned long irq_data_to_sysino(struct irq_data *data)
+{
+ struct irq_handler_data *ihd = data->handler_data;

- return ent;
+ return ihd->sysino;
}

-#ifdef CONFIG_PCI_MSI
void irq_free(unsigned int irq)
{
- unsigned long flags;
+ void *data = irq_get_handler_data(irq);

- if (irq >= NR_IRQS)
- return;
+ kfree(data);
+ irq_set_handler_data(irq, NULL);
+ irq_free_descs(irq, 1);
+}

- spin_lock_irqsave(&irq_alloc_lock, flags);
+unsigned int irq_alloc(unsigned int dev_handle, unsigned int dev_ino)
+{
+ int irq;

- irq_table[irq].in_use = 0;
+ irq = __irq_alloc_descs(-1, 1, 1, numa_node_id(), NULL);
+ if (irq <= 0)
+ goto out;

- spin_unlock_irqrestore(&irq_alloc_lock, flags);
+ return irq;
+out:
+ return 0;
+}
+
+static unsigned int cookie_exists(u32 devhandle, unsigned int devino)
+{
+ unsigned long hv_err, cookie;
+ struct ino_bucket *bucket;
+ unsigned int irq = 0U;
+
+ hv_err = sun4v_vintr_get_cookie(devhandle, devino, &cookie);
+ if (hv_err) {
+ pr_err("HV get cookie failed hv_err = %ld\n", hv_err);
+ goto out;
+ }
+
+ if (cookie & ((1UL << 63UL))) {
+ cookie = ~cookie;
+ bucket = (struct ino_bucket *) __va(cookie);
+ irq = bucket->__irq;
+ }
+out:
+ return irq;
+}
+
+static unsigned int sysino_exists(u32 devhandle, unsigned int devino)
+{
+ unsigned long sysino = sun4v_devino_to_sysino(devhandle, devino);
+ struct ino_bucket *bucket;
+ unsigned int irq;
+
+ bucket = &ivector_table[sysino];
+ irq = bucket_get_irq(__pa(bucket));
+
+ return irq;
+}
+
+void ack_bad_irq(unsigned int irq)
+{
+ pr_crit("BAD IRQ ack %d\n", irq);
+}
+
+void irq_install_pre_handler(int irq,
+ void (*func)(unsigned int, void *, void *),
+ void *arg1, void *arg2)
+{
+ pr_warn("IRQ pre handler NOT supported.\n");
}
-#endif

/*
* /proc/interrupts printing:
@@ -206,15 +345,6 @@ static unsigned int sun4u_compute_tid(unsigned long imap, unsigned long cpuid)
return tid;
}

-struct irq_handler_data {
- unsigned long iclr;
- unsigned long imap;
-
- void (*pre_handler)(unsigned int, void *, void *);
- void *arg1;
- void *arg2;
-};
-
#ifdef CONFIG_SMP
static int irq_choose_cpu(unsigned int irq, const struct cpumask *affinity)
{
@@ -316,8 +446,8 @@ static void sun4u_irq_eoi(struct irq_data *data)

static void sun4v_irq_enable(struct irq_data *data)
{
- unsigned int ino = irq_table[data->irq].dev_ino;
unsigned long cpuid = irq_choose_cpu(data->irq, data->affinity);
+ unsigned int ino = irq_data_to_sysino(data);
int err;

err = sun4v_intr_settarget(ino, cpuid);
@@ -337,8 +467,8 @@ static void sun4v_irq_enable(struct irq_data *data)
static int sun4v_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
{
- unsigned int ino = irq_table[data->irq].dev_ino;
unsigned long cpuid = irq_choose_cpu(data->irq, mask);
+ unsigned int ino = irq_data_to_sysino(data);
int err;

err = sun4v_intr_settarget(ino, cpuid);
@@ -351,7 +481,7 @@ static int sun4v_set_affinity(struct irq_data *data,

static void sun4v_irq_disable(struct irq_data *data)
{
- unsigned int ino = irq_table[data->irq].dev_ino;
+ unsigned int ino = irq_data_to_sysino(data);
int err;

err = sun4v_intr_setenabled(ino, HV_INTR_DISABLED);
@@ -362,7 +492,7 @@ static void sun4v_irq_disable(struct irq_data *data)

static void sun4v_irq_eoi(struct irq_data *data)
{
- unsigned int ino = irq_table[data->irq].dev_ino;
+ unsigned int ino = irq_data_to_sysino(data);
int err;

err = sun4v_intr_setstate(ino, HV_INTR_STATE_IDLE);
@@ -373,14 +503,13 @@ static void sun4v_irq_eoi(struct irq_data *data)

static void sun4v_virq_enable(struct irq_data *data)
{
- unsigned long cpuid, dev_handle, dev_ino;
+ unsigned long dev_handle = irq_data_to_handle(data);
+ unsigned long dev_ino = irq_data_to_ino(data);
+ unsigned long cpuid;
int err;

cpuid = irq_choose_cpu(data->irq, data->affinity);

- dev_handle = irq_table[data->irq].dev_handle;
- dev_ino = irq_table[data->irq].dev_ino;
-
err = sun4v_vintr_set_target(dev_handle, dev_ino, cpuid);
if (err != HV_EOK)
printk(KERN_ERR "sun4v_vintr_set_target(%lx,%lx,%lu): "
@@ -403,14 +532,13 @@ static void sun4v_virq_enable(struct irq_data *data)
static int sun4v_virt_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
{
- unsigned long cpuid, dev_handle, dev_ino;
+ unsigned long dev_handle = irq_data_to_handle(data);
+ unsigned long dev_ino = irq_data_to_ino(data);
+ unsigned long cpuid;
int err;

cpuid = irq_choose_cpu(data->irq, mask);

- dev_handle = irq_table[data->irq].dev_handle;
- dev_ino = irq_table[data->irq].dev_ino;
-
err = sun4v_vintr_set_target(dev_handle, dev_ino, cpuid);
if (err != HV_EOK)
printk(KERN_ERR "sun4v_vintr_set_target(%lx,%lx,%lu): "
@@ -422,11 +550,10 @@ static int sun4v_virt_set_affinity(struct irq_data *data,

static void sun4v_virq_disable(struct irq_data *data)
{
- unsigned long dev_handle, dev_ino;
+ unsigned long dev_handle = irq_data_to_handle(data);
+ unsigned long dev_ino = irq_data_to_ino(data);
int err;

- dev_handle = irq_table[data->irq].dev_handle;
- dev_ino = irq_table[data->irq].dev_ino;

err = sun4v_vintr_set_valid(dev_handle, dev_ino,
HV_INTR_DISABLED);
@@ -438,12 +565,10 @@ static void sun4v_virq_disable(struct irq_data *data)

static void sun4v_virq_eoi(struct irq_data *data)
{
- unsigned long dev_handle, dev_ino;
+ unsigned long dev_handle = irq_data_to_handle(data);
+ unsigned long dev_ino = irq_data_to_ino(data);
int err;

- dev_handle = irq_table[data->irq].dev_handle;
- dev_ino = irq_table[data->irq].dev_ino;
-
err = sun4v_vintr_set_state(dev_handle, dev_ino,
HV_INTR_STATE_IDLE);
if (err != HV_EOK)
@@ -479,31 +604,10 @@ static struct irq_chip sun4v_virq = {
.flags = IRQCHIP_EOI_IF_HANDLED,
};

-static void pre_flow_handler(struct irq_data *d)
-{
- struct irq_handler_data *handler_data = irq_data_get_irq_handler_data(d);
- unsigned int ino = irq_table[d->irq].dev_ino;
-
- handler_data->pre_handler(ino, handler_data->arg1, handler_data->arg2);
-}
-
-void irq_install_pre_handler(int irq,
- void (*func)(unsigned int, void *, void *),
- void *arg1, void *arg2)
-{
- struct irq_handler_data *handler_data = irq_get_handler_data(irq);
-
- handler_data->pre_handler = func;
- handler_data->arg1 = arg1;
- handler_data->arg2 = arg2;
-
- __irq_set_preflow_handler(irq, pre_flow_handler);
-}
-
unsigned int build_irq(int inofixup, unsigned long iclr, unsigned long imap)
{
- struct ino_bucket *bucket;
struct irq_handler_data *handler_data;
+ struct ino_bucket *bucket;
unsigned int irq;
int ino;

@@ -537,119 +641,166 @@ out:
return irq;
}

-static unsigned int sun4v_build_common(unsigned long sysino,
- struct irq_chip *chip)
+static unsigned int sun4v_build_common(u32 devhandle, unsigned int devino,
+ void (*handler_data_init)(struct irq_handler_data *data,
+ u32 devhandle, unsigned int devino),
+ struct irq_chip *chip)
{
- struct ino_bucket *bucket;
- struct irq_handler_data *handler_data;
+ struct irq_handler_data *data;
unsigned int irq;

- BUG_ON(tlb_type != hypervisor);
+ irq = irq_alloc(devhandle, devino);
+ if (!irq)
+ goto out;

- bucket = &ivector_table[sysino];
- irq = bucket_get_irq(__pa(bucket));
- if (!irq) {
- irq = irq_alloc(0, sysino);
- bucket_set_irq(__pa(bucket), irq);
- irq_set_chip_and_handler_name(irq, chip, handle_fasteoi_irq,
- "IVEC");
+ data = kzalloc(sizeof(struct irq_handler_data), GFP_ATOMIC);
+ if (unlikely(!data)) {
+ pr_err("IRQ handler data allocation failed.\n");
+ irq_free(irq);
+ irq = 0;
+ goto out;
}

- handler_data = irq_get_handler_data(irq);
- if (unlikely(handler_data))
- goto out;
+ irq_set_handler_data(irq, data);
+ handler_data_init(data, devhandle, devino);
+ irq_set_chip_and_handler_name(irq, chip, handle_fasteoi_irq, "IVEC");
+ data->imap = ~0UL;
+ data->iclr = ~0UL;
+out:
+ return irq;
+}

- handler_data = kzalloc(sizeof(struct irq_handler_data), GFP_ATOMIC);
- if (unlikely(!handler_data)) {
- prom_printf("IRQ: kzalloc(irq_handler_data) failed.\n");
- prom_halt();
- }
- irq_set_handler_data(irq, handler_data);
+static unsigned long cookie_assign(unsigned int irq, u32 devhandle,
+ unsigned int devino)
+{
+ struct irq_handler_data *ihd = irq_get_handler_data(irq);
+ unsigned long hv_error, cookie;

- /* Catch accidental accesses to these things. IMAP/ICLR handling
- * is done by hypervisor calls on sun4v platforms, not by direct
- * register accesses.
+ /* handler_irq needs to find the irq. cookie is seen signed in
+ * sun4v_dev_mondo and treated as a non ivector_table delivery.
*/
- handler_data->imap = ~0UL;
- handler_data->iclr = ~0UL;
+ ihd->bucket.__irq = irq;
+ cookie = ~__pa(&ihd->bucket);

-out:
- return irq;
+ hv_error = sun4v_vintr_set_cookie(devhandle, devino, cookie);
+ if (hv_error)
+ pr_err("HV vintr set cookie failed = %ld\n", hv_error);
+
+ return hv_error;
}

-unsigned int sun4v_build_irq(u32 devhandle, unsigned int devino)
+static void cookie_handler_data(struct irq_handler_data *data,
+ u32 devhandle, unsigned int devino)
{
- unsigned long sysino = sun4v_devino_to_sysino(devhandle, devino);
+ data->dev_handle = devhandle;
+ data->dev_ino = devino;
+}

- return sun4v_build_common(sysino, &sun4v_irq);
+static unsigned int cookie_build_irq(u32 devhandle, unsigned int devino,
+ struct irq_chip *chip)
+{
+ unsigned long hv_error;
+ unsigned int irq;
+
+ irq = sun4v_build_common(devhandle, devino, cookie_handler_data, chip);
+
+ hv_error = cookie_assign(irq, devhandle, devino);
+ if (hv_error) {
+ irq_free(irq);
+ irq = 0;
+ }
+
+ return irq;
}

-unsigned int sun4v_build_virq(u32 devhandle, unsigned int devino)
+static unsigned int sun4v_build_cookie(u32 devhandle, unsigned int devino)
{
- struct irq_handler_data *handler_data;
- unsigned long hv_err, cookie;
- struct ino_bucket *bucket;
unsigned int irq;

- bucket = kzalloc(sizeof(struct ino_bucket), GFP_ATOMIC);
- if (unlikely(!bucket))
- return 0;
+ irq = cookie_exists(devhandle, devino);
+ if (irq)
+ goto out;

- /* The only reference we store to the IRQ bucket is
- * by physical address which kmemleak can't see, tell
- * it that this object explicitly is not a leak and
- * should be scanned.
- */
- kmemleak_not_leak(bucket);
+ irq = cookie_build_irq(devhandle, devino, &sun4v_virq);

- __flush_dcache_range((unsigned long) bucket,
- ((unsigned long) bucket +
- sizeof(struct ino_bucket)));
+out:
+ return irq;
+}

- irq = irq_alloc(devhandle, devino);
+static void sysino_set_bucket(unsigned int irq)
+{
+ struct irq_handler_data *ihd = irq_get_handler_data(irq);
+ struct ino_bucket *bucket;
+ unsigned long sysino;
+
+ sysino = sun4v_devino_to_sysino(ihd->dev_handle, ihd->dev_ino);
+ BUG_ON(sysino >= nr_ivec);
+ bucket = &ivector_table[sysino];
bucket_set_irq(__pa(bucket), irq);
+}

- irq_set_chip_and_handler_name(irq, &sun4v_virq, handle_fasteoi_irq,
- "IVEC");
+static void sysino_handler_data(struct irq_handler_data *data,
+ u32 devhandle, unsigned int devino)
+{
+ unsigned long sysino;

- handler_data = kzalloc(sizeof(struct irq_handler_data), GFP_ATOMIC);
- if (unlikely(!handler_data))
- return 0;
+ sysino = sun4v_devino_to_sysino(devhandle, devino);
+ data->sysino = sysino;
+}

- /* In order to make the LDC channel startup sequence easier,
- * especially wrt. locking, we do not let request_irq() enable
- * the interrupt.
- */
- irq_set_status_flags(irq, IRQ_NOAUTOEN);
- irq_set_handler_data(irq, handler_data);
+static unsigned int sysino_build_irq(u32 devhandle, unsigned int devino,
+ struct irq_chip *chip)
+{
+ unsigned int irq;

- /* Catch accidental accesses to these things. IMAP/ICLR handling
- * is done by hypervisor calls on sun4v platforms, not by direct
- * register accesses.
- */
- handler_data->imap = ~0UL;
- handler_data->iclr = ~0UL;
+ irq = sun4v_build_common(devhandle, devino, sysino_handler_data, chip);
+ if (!irq)
+ goto out;

- cookie = ~__pa(bucket);
- hv_err = sun4v_vintr_set_cookie(devhandle, devino, cookie);
- if (hv_err) {
- prom_printf("IRQ: Fatal, cannot set cookie for [%x:%x] "
- "err=%lu\n", devhandle, devino, hv_err);
- prom_halt();
- }
+ sysino_set_bucket(irq);
+out:
+ return irq;
+}

+static int sun4v_build_sysino(u32 devhandle, unsigned int devino)
+{
+ int irq;
+
+ irq = sysino_exists(devhandle, devino);
+ if (irq)
+ goto out;
+
+ irq = sysino_build_irq(devhandle, devino, &sun4v_irq);
+out:
return irq;
}

-void ack_bad_irq(unsigned int irq)
+unsigned int sun4v_build_irq(u32 devhandle, unsigned int devino)
{
- unsigned int ino = irq_table[irq].dev_ino;
+ unsigned int irq;

- if (!ino)
- ino = 0xdeadbeef;
+ if (sun4v_cookie_only_virqs())
+ irq = sun4v_build_cookie(devhandle, devino);
+ else
+ irq = sun4v_build_sysino(devhandle, devino);

- printk(KERN_CRIT "Unexpected IRQ from ino[%x] irq[%u]\n",
- ino, irq);
+ return irq;
+}
+
+unsigned int sun4v_build_virq(u32 devhandle, unsigned int devino)
+{
+ int irq;
+
+ irq = cookie_build_irq(devhandle, devino, &sun4v_virq);
+ if (!irq)
+ goto out;
+
+ /* This is borrowed from the original function.
+ */
+ irq_set_status_flags(irq, IRQ_NOAUTOEN);
+
+out:
+ return irq;
}

void *hardirq_stack[NR_CPUS];
@@ -720,9 +871,12 @@ void fixup_irqs(void)

for (irq = 0; irq < NR_IRQS; irq++) {
struct irq_desc *desc = irq_to_desc(irq);
- struct irq_data *data = irq_desc_get_irq_data(desc);
+ struct irq_data *data;
unsigned long flags;

+ if (!desc)
+ continue;
+ data = irq_desc_get_irq_data(desc);
raw_spin_lock_irqsave(&desc->lock, flags);
if (desc->action && !irqd_is_per_cpu(data)) {
if (data->chip->irq_set_affinity)
@@ -922,16 +1076,22 @@ static struct irqaction timer_irq_action = {
.name = "timer",
};

-/* Only invoked on boot processor. */
-void __init init_IRQ(void)
+static void __init irq_ivector_init(void)
{
- unsigned long size;
+ unsigned long size, order;
+ unsigned int ivecs;

- map_prom_timers();
- kill_prom_timer();
+ /* If we are doing cookie only VIRQs then we do not need the ivector
+ * table to process interrupts.
+ */
+ if (sun4v_cookie_only_virqs())
+ return;

- size = sizeof(struct ino_bucket) * NUM_IVECS;
- ivector_table = kzalloc(size, GFP_KERNEL);
+ ivecs = size_nr_ivec();
+ size = sizeof(struct ino_bucket) * ivecs;
+ order = get_order(size);
+ ivector_table = (struct ino_bucket *)
+ __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
if (!ivector_table) {
prom_printf("Fatal error, cannot allocate ivector_table\n");
prom_halt();
@@ -940,6 +1100,15 @@ void __init init_IRQ(void)
((unsigned long) ivector_table) + size);

ivector_table_pa = __pa(ivector_table);
+}
+
+/* Only invoked on boot processor.*/
+void __init init_IRQ(void)
+{
+ irq_init_hv();
+ irq_ivector_init();
+ map_prom_timers();
+ kill_prom_timer();

if (tlb_type == hypervisor)
sun4v_init_mondo_queues();
--
1.9.1

2014-11-06 23:09:24

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 033/162] sparc64: find_node adjustment

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: bob picco <[email protected]>

[ Upstream commit 3dee9df54836d5f844f3d58281d3f3e6331b467f ]

We have seen an issue with guest boot into LDOM that causes early boot failures
because no matching rule exists for the node identity of the memory. I analyzed
this on my T4 and concluded there might not be a solution. I saw the issue in
mainline too when booting into the control/primary domain - with guests
configured. Note, this could be a firmware bug on some older machines.

I'll provide a full explanation of the issues below. Should we not find a
matching BEST latency group for a real address (RA) then we will assume node 0.
On the T4-2 here with the information provided I can't see an alternative.

Technically the LDOM shown below should match the MBLOCK to the
favorable latency group. However other factors must be considered too. Were
the memory controllers configured "fine" grained interleave or "coarse"
grain interleaved - T4. Also should a "group" MD node be considered a NUMA
node?

There has to be at least one Machine Description (MD) "group" and hence one
NUMA node. The group can have one or more latency groups (lg) - more than one
memory controller. The current code chooses the smallest latency as the most
favorable per group. The latency and lg information is in MLGROUP below.
MBLOCK is the base and size of the RAs for the machine as fetched from OBP
/memory "available" property. My machine has one MBLOCK but more would be
possible - with holes?

For a T4-2 the following information has been gathered:
with LDOM guest
MEMBLOCK configuration:
memory size = 0x27f870000
memory.cnt = 0x3
memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
reserved.cnt = 0x2
reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
MBLOCK[0]: base[20000000] size[280000000] offset[0]
(note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
(note: (RA + offset) & mask = val is the formula used to detect a match for the
memory controller. Should there be no match in find_node(), a return
value of -1 resulted for the node - BAD; see the sketch below)
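
As a sketch of that matching rule and of the fallback this patch introduces
(identifier names are paraphrased from the hunk below, not guaranteed to be
the exact ones in init_64.c):

        static int find_node_sketch(unsigned long addr)
        {
                int i;

                for (i = 0; i < num_node_masks; i++) {
                        struct node_mem_mask *p = &node_masks[i];

                        if ((addr & p->mask) == p->val)
                                return i;
                }
                /* no NUMA rule matched: warn once, let node 0 own the memory */
                WARN_ONCE(1, "find_node: A physical address doesn't match a NUMA node rule.");
                return 0;
        }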

There is one group. It has these forward links
MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
(note: "val" is the best lg's (smallest latency) "match")

no LDOM guest - bare metal
MEMBLOCK configuration:
memory size = 0xfdf2d0000
memory.cnt = 0x3
memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
reserved.cnt = 0x2
reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes
reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
MBLOCK[0]: base[20000000] size[fe0000000] offset[0]

there are two groups
group node[16d5]
MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
group node[171d]
MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
(note: for this two "group" bare metal machine, 1/2 memory is in group one's
lg and 1/2 memory is in group two's lg).

Cc: [email protected]
Signed-off-by: Bob Picco <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/mm/init_64.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index e275cde..a1d48b9 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -838,7 +838,10 @@ static int find_node(unsigned long addr)
if ((addr & p->mask) == p->val)
return i;
}
- return -1;
+ /* The following condition has been observed on LDOM guests.*/
+ WARN_ONCE(1, "find_node: A physical address doesn't match a NUMA node"
+ " rule. Some physical memory will be owned by node 0.");
+ return 0;
}

static u64 memblock_nid_range(u64 start, u64 end, int *nid)
--
1.9.1

2014-11-06 23:09:46

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 036/162] sparc64: Fix reversed start/end in flush_tlb_kernel_range()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 473ad7f4fb005d1bb727e4ef27d370d28703a062 ]

When we have to split up a flush request into multiple pieces
(in order to avoid the firmware range), we don't specify the
arguments in the right order for the second piece.

Fix the order, or else we get hangs as the code tries to
flush "a lot" of entries and we get lockups like this:

[ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
[ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
[ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
[ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
[ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000 Not tainted
[ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
[ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
[ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
[ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
[ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
[ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
[ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
[ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
[ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
[ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
[ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
[ 4423.234193] Call Trace:
[ 4423.239051] [0000000000530c24] free_vmap_area_noflush+0x64/0x80
[ 4423.251029] [0000000000531a7c] remove_vm_area+0x5c/0x80
[ 4423.261628] [0000000000531b80] __vunmap+0x20/0x120
[ 4423.271352] [000000000071cf18] n_tty_close+0x18/0x40
[ 4423.281423] [00000000007222b0] tty_ldisc_close+0x30/0x60
[ 4423.292183] [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
[ 4423.303120] [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
[ 4423.314232] [0000000000719aa0] __tty_hangup+0x280/0x3c0
[ 4423.324835] [0000000000724cb4] pty_close+0x134/0x1a0
[ 4423.334905] [000000000071aa24] tty_release+0x104/0x500
[ 4423.345316] [00000000005511d0] __fput+0x90/0x1e0
[ 4423.354701] [000000000047fa54] task_work_run+0x94/0xe0
[ 4423.365126] [0000000000404b44] __handle_signal+0xc/0x2c
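
Pulled together from the hunk below as a sketch, the corrected split around
the OBP firmware window looks like this; both helpers take (start, end) in
that order:

        if (start < LOW_OBP_ADDRESS) {
                flush_tsb_kernel_range(start, LOW_OBP_ADDRESS);
                do_flush_tlb_kernel_range(start, LOW_OBP_ADDRESS);
        }
        if (end > HI_OBP_ADDRESS) {
                flush_tsb_kernel_range(HI_OBP_ADDRESS, end);
                do_flush_tlb_kernel_range(HI_OBP_ADDRESS, end);
        }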

Fixes: 4ca9a23765da ("sparc64: Guard against flushing openfirmware mappings.")
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/mm/init_64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index a1d48b9..1a9cc81 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2719,8 +2719,8 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
do_flush_tlb_kernel_range(start, LOW_OBP_ADDRESS);
}
if (end > HI_OBP_ADDRESS) {
- flush_tsb_kernel_range(end, HI_OBP_ADDRESS);
- do_flush_tlb_kernel_range(end, HI_OBP_ADDRESS);
+ flush_tsb_kernel_range(HI_OBP_ADDRESS, end);
+ do_flush_tlb_kernel_range(HI_OBP_ADDRESS, end);
}
} else {
flush_tsb_kernel_range(start, end);
--
1.9.1

2014-11-06 23:10:08

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 053/162] sparc64: Kill unnecessary tables and increase MAX_BANKS.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit d195b71bad4347d2df51072a537f922546a904f1 ]

swapper_low_pmd_dir and swapper_pud_dir are actually completely
useless and unnecessary.

We just need swapper_pg_dir[]. Naturally the other page table chunks
will be allocated on an as-needed basis. Since the kernel actually
accesses these tables in the PAGE_OFFSET view, there is not even a TLB
locality advantage of placing them in the kernel image.

Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
naturally page aligned.

Increase MAX_BANKS to 1024 in order to handle heavily fragmented
virtual guests.

Even with this MAX_BANKS increase, the kernel is 20K+ smaller.

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/pgtable_64.h | 1 -
arch/sparc/kernel/vmlinux.lds.S | 5 +++--
arch/sparc/mm/init_64.c | 25 ++-----------------------
3 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index ad1def4..e8dfabf 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -927,7 +927,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
#endif

extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
-extern pmd_t swapper_low_pmd_dir[PTRS_PER_PMD];

extern void paging_init(void);
extern unsigned long find_ecache_flush_span(unsigned long size);
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index 0bacceb..0924305 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -35,8 +35,9 @@ jiffies = jiffies_64;

SECTIONS
{
- /* swapper_low_pmd_dir is sparc64 only */
- swapper_low_pmd_dir = 0x0000000000402000;
+#ifdef CONFIG_SPARC64
+ swapper_pg_dir = 0x0000000000402000;
+#endif
. = INITIAL_ADDRESS;
.text TEXTSTART :
{
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index ca634b3..39c52fc 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -85,7 +85,7 @@ extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];

static unsigned long cpu_pgsz_mask;

-#define MAX_BANKS 32
+#define MAX_BANKS 1024

static struct linux_prom64_registers pavail[MAX_BANKS];
static int pavail_ents;
@@ -1936,12 +1936,6 @@ static void __init sun4v_linear_pte_xor_finalize(void)

static unsigned long last_valid_pfn;

-/* These must be page aligned in order to not trigger the
- * alignment tests of pgd_bad() and pud_bad().
- */
-pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__ ((aligned (PAGE_SIZE)));
-static pud_t swapper_pud_dir[PTRS_PER_PUD] __attribute__ ((aligned (PAGE_SIZE)));
-
static void sun4u_pgprot_init(void);
static void sun4v_pgprot_init(void);

@@ -1949,8 +1943,6 @@ void __init paging_init(void)
{
unsigned long end_pfn, shift, phys_base;
unsigned long real_end, i;
- pud_t *pud;
- pmd_t *pmd;
int node;

setup_page_offset();
@@ -2045,20 +2037,7 @@ void __init paging_init(void)
*/
init_mm.pgd += ((shift) / (sizeof(pgd_t)));

- memset(swapper_low_pmd_dir, 0, sizeof(swapper_low_pmd_dir));
-
- /* The kernel page tables we publish into what the rest of the
- * world sees must be adjusted so that they see the PAGE_OFFSET
- * address of these in-kerenel data structures. However right
- * here we must access them from the kernel image side, because
- * the trap tables haven't been taken over and therefore we cannot
- * take TLB misses in the PAGE_OFFSET linear mappings yet.
- */
- pud = swapper_pud_dir + (shift / sizeof(pud_t));
- pgd_set(&swapper_pg_dir[0], pud);
-
- pmd = swapper_low_pmd_dir + (shift / sizeof(pmd_t));
- pud_set(&swapper_pud_dir[0], pmd);
+ memset(swapper_pg_dir, 0, sizeof(swapper_pg_dir));

inherit_prom_mappings();

--
1.9.1

2014-11-06 23:10:12

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 048/162] sparc64: Fix physical memory management regressions with large max_phys_bits.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 0dd5b7b09e13dae32869371e08e1048349fd040c ]

If max_phys_bits needs to be > 43 (e.g. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables can
only cover up to 43 bits.

Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.

Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.

The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.

The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.

The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.

These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image. Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
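
Those sizes follow directly from MAX_PHYS_ADDRESS_BITS being 47 (a
back-of-the-envelope check, not text from the original commit):

        sparc64_valid_addr_bitmap: 2^47 B / 4MB per bit    = 2^25 bits = 4MB
        kpte_linear_bitmap:        2^47 B / 256MB * 2 bits = 2^20 bits = 128K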

So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.

We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.

On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.
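
A sketch of how a kernel page table walk can honor the new huge-PUD/PMD
encodings (pud_large()/pud_pfn() come from the hunks below; the walk itself
is illustrative, not a copy of the new kern_addr_valid()):

        static bool kern_addr_valid_sketch(unsigned long addr)
        {
                pgd_t *pgd = pgd_offset_k(addr);
                pud_t *pud;
                pmd_t *pmd;

                if (pgd_none(*pgd))
                        return false;
                pud = pud_offset(pgd, addr);
                if (pud_none(*pud))
                        return false;
                if (pud_large(*pud))            /* 8GB mapping at PUD level */
                        return pfn_valid(pud_pfn(*pud));
                pmd = pmd_offset(pud, addr);
                if (pmd_none(*pmd))
                        return false;
                if (pmd_large(*pmd))            /* 8MB mapping at PMD level */
                        return pfn_valid(pmd_pfn(*pmd));
                return pte_present(*pte_offset_kernel(pmd, addr));
        }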

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/page_64.h | 3 -
arch/sparc/include/asm/pgtable_64.h | 55 +++--
arch/sparc/include/asm/tsb.h | 47 ++++-
arch/sparc/kernel/ktlb.S | 108 +---------
arch/sparc/kernel/vmlinux.lds.S | 5 -
arch/sparc/mm/init_64.c | 393 +++++++++++++++---------------------
arch/sparc/mm/init_64.h | 7 -
7 files changed, 244 insertions(+), 374 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 6dc1948..b70210d 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -128,9 +128,6 @@ extern unsigned long PAGE_OFFSET;
*/
#define MAX_PHYS_ADDRESS_BITS 47

-/* These two shift counts are used when indexing sparc64_valid_addr_bitmap
- * and kpte_linear_bitmap.
- */
#define ILOG2_4MB 22
#define ILOG2_256MB 28

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 5c91201..be03c00 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -79,22 +79,7 @@

#include <linux/sched.h>

-extern unsigned long sparc64_valid_addr_bitmap[];
-
-/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
-static inline bool __kern_addr_valid(unsigned long paddr)
-{
- if ((paddr >> MAX_PHYS_ADDRESS_BITS) != 0UL)
- return false;
- return test_bit(paddr >> ILOG2_4MB, sparc64_valid_addr_bitmap);
-}
-
-static inline bool kern_addr_valid(unsigned long addr)
-{
- unsigned long paddr = __pa(addr);
-
- return __kern_addr_valid(paddr);
-}
+bool kern_addr_valid(unsigned long addr);

/* Entries per page directory level. */
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT-3))
@@ -122,6 +107,7 @@ static inline bool kern_addr_valid(unsigned long addr)
#define _PAGE_R _AC(0x8000000000000000,UL) /* Keep ref bit uptodate*/
#define _PAGE_SPECIAL _AC(0x0200000000000000,UL) /* Special page */
#define _PAGE_PMD_HUGE _AC(0x0100000000000000,UL) /* Huge page */
+#define _PAGE_PUD_HUGE _PAGE_PMD_HUGE

/* Advertise support for _PAGE_SPECIAL */
#define __HAVE_ARCH_PTE_SPECIAL
@@ -668,26 +654,26 @@ static inline unsigned long pmd_large(pmd_t pmd)
return pte_val(pte) & _PAGE_PMD_HUGE;
}

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static inline unsigned long pmd_young(pmd_t pmd)
+static inline unsigned long pmd_pfn(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));

- return pte_young(pte);
+ return pte_pfn(pte);
}

-static inline unsigned long pmd_write(pmd_t pmd)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline unsigned long pmd_young(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));

- return pte_write(pte);
+ return pte_young(pte);
}

-static inline unsigned long pmd_pfn(pmd_t pmd)
+static inline unsigned long pmd_write(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));

- return pte_pfn(pte);
+ return pte_write(pte);
}

static inline unsigned long pmd_trans_huge(pmd_t pmd)
@@ -781,18 +767,15 @@ static inline int pmd_present(pmd_t pmd)
* the top bits outside of the range of any physical address size we
* support are clear as well. We also validate the physical itself.
*/
-#define pmd_bad(pmd) ((pmd_val(pmd) & ~PAGE_MASK) || \
- !__kern_addr_valid(pmd_val(pmd)))
+#define pmd_bad(pmd) (pmd_val(pmd) & ~PAGE_MASK)

#define pud_none(pud) (!pud_val(pud))

-#define pud_bad(pud) ((pud_val(pud) & ~PAGE_MASK) || \
- !__kern_addr_valid(pud_val(pud)))
+#define pud_bad(pud) (pud_val(pud) & ~PAGE_MASK)

#define pgd_none(pgd) (!pgd_val(pgd))

-#define pgd_bad(pgd) ((pgd_val(pgd) & ~PAGE_MASK) || \
- !__kern_addr_valid(pgd_val(pgd)))
+#define pgd_bad(pgd) (pgd_val(pgd) & ~PAGE_MASK)

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
@@ -835,6 +818,20 @@ static inline unsigned long __pmd_page(pmd_t pmd)
#define pgd_present(pgd) (pgd_val(pgd) != 0U)
#define pgd_clear(pgdp) (pgd_val(*(pgd)) = 0UL)

+static inline unsigned long pud_large(pud_t pud)
+{
+ pte_t pte = __pte(pud_val(pud));
+
+ return pte_val(pte) & _PAGE_PMD_HUGE;
+}
+
+static inline unsigned long pud_pfn(pud_t pud)
+{
+ pte_t pte = __pte(pud_val(pud));
+
+ return pte_pfn(pte);
+}
+
/* Same in both SUN4V and SUN4U. */
#define pte_none(pte) (!pte_val(pte))

diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index a2f5419..ecb49cf 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -133,9 +133,24 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
sub TSB, 0x8, TSB; \
TSB_STORE(TSB, TAG);

- /* Do a kernel page table walk. Leaves physical PTE pointer in
- * REG1. Jumps to FAIL_LABEL on early page table walk termination.
- * VADDR will not be clobbered, but REG2 will.
+ /* Do a kernel page table walk. Leaves valid PTE value in
+ * REG1. Jumps to FAIL_LABEL on early page table walk
+ * termination. VADDR will not be clobbered, but REG2 will.
+ *
+ * There are two masks we must apply to propagate bits from
+ * the virtual address into the PTE physical address field
+ * when dealing with huge pages. This is because the page
+ * table boundaries do not match the huge page size(s) the
+ * hardware supports.
+ *
+ * In these cases we propagate the bits that are below the
+ * page table level where we saw the huge page mapping, but
+ * are still within the relevant physical bits for the huge
+ * page size in question. So for PMD mappings (which fall on
+ * bit 23, for 8MB per PMD) we must propagate bit 22 for a
+ * 4MB huge page. For huge PUDs (which fall on bit 33, for
+ * 8GB per PUD), we have to accomodate 256MB and 2GB huge
+ * pages. So for those we propagate bits 32 to 28.
*/
#define KERN_PGTABLE_WALK(VADDR, REG1, REG2, FAIL_LABEL) \
sethi %hi(swapper_pg_dir), REG1; \
@@ -150,15 +165,35 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
- sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
+ sethi %uhi(_PAGE_PUD_HUGE), REG2; \
+ brz,pn REG1, FAIL_LABEL; \
+ sllx REG2, 32, REG2; \
+ andcc REG1, REG2, %g0; \
+ sethi %hi(0xf8000000), REG2; \
+ bne,pt %xcc, 697f; \
+ sllx REG2, 1, REG2; \
+ sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
+ sethi %uhi(_PAGE_PMD_HUGE), REG2; \
brz,pn REG1, FAIL_LABEL; \
- sllx VADDR, 64 - PMD_SHIFT, REG2; \
+ sllx REG2, 32, REG2; \
+ andcc REG1, REG2, %g0; \
+ be,pn %xcc, 698f; \
+ sethi %hi(0x400000), REG2; \
+697: brgez,pn REG1, FAIL_LABEL; \
+ andn REG1, REG2, REG1; \
+ and VADDR, REG2, REG2; \
+ ba,pt %xcc, 699f; \
+ or REG1, REG2, REG1; \
+698: sllx VADDR, 64 - PMD_SHIFT, REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
- add REG1, REG2, REG1;
+ ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
+ brgez,pn REG1, FAIL_LABEL; \
+ nop; \
+699:

/* PMD has been loaded into REG1, interpret the value, seeing
* if it is a HUGE PMD or a normal one. If it is not valid
diff --git a/arch/sparc/kernel/ktlb.S b/arch/sparc/kernel/ktlb.S
index 605d492..94a1e66 100644
--- a/arch/sparc/kernel/ktlb.S
+++ b/arch/sparc/kernel/ktlb.S
@@ -47,14 +47,6 @@ kvmap_itlb_vmalloc_addr:
KERN_PGTABLE_WALK(%g4, %g5, %g2, kvmap_itlb_longpath)

TSB_LOCK_TAG(%g1, %g2, %g7)
-
- /* Load and check PTE. */
- ldxa [%g5] ASI_PHYS_USE_EC, %g5
- mov 1, %g7
- sllx %g7, TSB_TAG_INVALID_BIT, %g7
- brgez,a,pn %g5, kvmap_itlb_longpath
- TSB_STORE(%g1, %g7)
-
TSB_WRITE(%g1, %g5, %g6)

/* fallthrough to TLB load */
@@ -118,6 +110,12 @@ kvmap_dtlb_obp:
ba,pt %xcc, kvmap_dtlb_load
nop

+kvmap_linear_early:
+ sethi %hi(kern_linear_pte_xor), %g7
+ ldx [%g7 + %lo(kern_linear_pte_xor)], %g2
+ ba,pt %xcc, kvmap_dtlb_tsb4m_load
+ xor %g2, %g4, %g5
+
.align 32
kvmap_dtlb_tsb4m_load:
TSB_LOCK_TAG(%g1, %g2, %g7)
@@ -146,105 +144,17 @@ kvmap_dtlb_4v:
/* Correct TAG_TARGET is already in %g6, check 4mb TSB. */
KERN_TSB4M_LOOKUP_TL1(%g6, %g5, %g1, %g2, %g3, kvmap_dtlb_load)
#endif
- /* TSB entry address left in %g1, lookup linear PTE.
- * Must preserve %g1 and %g6 (TAG).
- */
-kvmap_dtlb_tsb4m_miss:
- /* Clear the PAGE_OFFSET top virtual bits, shift
- * down to get PFN, and make sure PFN is in range.
- */
-661: sllx %g4, 0, %g5
- .section .page_offset_shift_patch, "ax"
- .word 661b
- .previous
-
- /* Check to see if we know about valid memory at the 4MB
- * chunk this physical address will reside within.
+ /* Linear mapping TSB lookup failed. Fallthrough to kernel
+ * page table based lookup.
*/
-661: srlx %g5, MAX_PHYS_ADDRESS_BITS, %g2
- .section .page_offset_shift_patch, "ax"
- .word 661b
- .previous
-
- brnz,pn %g2, kvmap_dtlb_longpath
- nop
-
- /* This unconditional branch and delay-slot nop gets patched
- * by the sethi sequence once the bitmap is properly setup.
- */
- .globl valid_addr_bitmap_insn
-valid_addr_bitmap_insn:
- ba,pt %xcc, 2f
- nop
- .subsection 2
- .globl valid_addr_bitmap_patch
-valid_addr_bitmap_patch:
- sethi %hi(sparc64_valid_addr_bitmap), %g7
- or %g7, %lo(sparc64_valid_addr_bitmap), %g7
- .previous
-
-661: srlx %g5, ILOG2_4MB, %g2
- .section .page_offset_shift_patch, "ax"
- .word 661b
- .previous
-
- srlx %g2, 6, %g5
- and %g2, 63, %g2
- sllx %g5, 3, %g5
- ldx [%g7 + %g5], %g5
- mov 1, %g7
- sllx %g7, %g2, %g7
- andcc %g5, %g7, %g0
- be,pn %xcc, kvmap_dtlb_longpath
-
-2: sethi %hi(kpte_linear_bitmap), %g2
-
- /* Get the 256MB physical address index. */
-661: sllx %g4, 0, %g5
- .section .page_offset_shift_patch, "ax"
- .word 661b
- .previous
-
- or %g2, %lo(kpte_linear_bitmap), %g2
-
-661: srlx %g5, ILOG2_256MB, %g5
- .section .page_offset_shift_patch, "ax"
- .word 661b
- .previous
-
- and %g5, (32 - 1), %g7
-
- /* Divide by 32 to get the offset into the bitmask. */
- srlx %g5, 5, %g5
- add %g7, %g7, %g7
- sllx %g5, 3, %g5
-
- /* kern_linear_pte_xor[(mask >> shift) & 3)] */
- ldx [%g2 + %g5], %g2
- srlx %g2, %g7, %g7
- sethi %hi(kern_linear_pte_xor), %g5
- and %g7, 3, %g7
- or %g5, %lo(kern_linear_pte_xor), %g5
- sllx %g7, 3, %g7
- ldx [%g5 + %g7], %g2
-
.globl kvmap_linear_patch
kvmap_linear_patch:
- ba,pt %xcc, kvmap_dtlb_tsb4m_load
- xor %g2, %g4, %g5
+ ba,a,pt %xcc, kvmap_linear_early

kvmap_dtlb_vmalloc_addr:
KERN_PGTABLE_WALK(%g4, %g5, %g2, kvmap_dtlb_longpath)

TSB_LOCK_TAG(%g1, %g2, %g7)
-
- /* Load and check PTE. */
- ldxa [%g5] ASI_PHYS_USE_EC, %g5
- mov 1, %g7
- sllx %g7, TSB_TAG_INVALID_BIT, %g7
- brgez,a,pn %g5, kvmap_dtlb_longpath
- TSB_STORE(%g1, %g7)
-
TSB_WRITE(%g1, %g5, %g6)

/* fallthrough to TLB load */
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index 932ff90..0bacceb 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -122,11 +122,6 @@ SECTIONS
*(.swapper_4m_tsb_phys_patch)
__swapper_4m_tsb_phys_patch_end = .;
}
- .page_offset_shift_patch : {
- __page_offset_shift_patch = .;
- *(.page_offset_shift_patch)
- __page_offset_shift_patch_end = .;
- }
.popc_3insn_patch : {
__popc_3insn_patch = .;
*(.popc_3insn_patch)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index f743f05..661a6383 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -73,7 +73,6 @@ unsigned long kern_linear_pte_xor[4] __read_mostly;
* 'cpu' properties, but we need to have this table setup before the
* MDESC is initialized.
*/
-unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];

#ifndef CONFIG_DEBUG_PAGEALLOC
/* A special kernel TSB for 4MB, 256MB, 2GB and 16GB linear mappings.
@@ -82,6 +81,7 @@ unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];
*/
extern struct tsb swapper_4m_tsb[KERNEL_TSB4M_NENTRIES];
#endif
+extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];

static unsigned long cpu_pgsz_mask;

@@ -163,10 +163,6 @@ static void __init read_obp_memory(const char *property,
cmp_p64, NULL);
}

-unsigned long sparc64_valid_addr_bitmap[VALID_ADDR_BITMAP_BYTES /
- sizeof(unsigned long)];
-EXPORT_SYMBOL(sparc64_valid_addr_bitmap);
-
/* Kernel physical address base and size in bytes. */
unsigned long kern_base __read_mostly;
unsigned long kern_size __read_mostly;
@@ -1362,9 +1358,145 @@ static unsigned long __init bootmem_init(unsigned long phys_base)
static struct linux_prom64_registers pall[MAX_BANKS] __initdata;
static int pall_ents __initdata;

-#ifdef CONFIG_DEBUG_PAGEALLOC
+static unsigned long max_phys_bits = 40;
+
+bool kern_addr_valid(unsigned long addr)
+{
+ unsigned long above = ((long)addr) >> max_phys_bits;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ if (above != 0 && above != -1UL)
+ return false;
+
+ if (addr >= (unsigned long) KERNBASE &&
+ addr < (unsigned long)&_end)
+ return true;
+
+ if (addr >= PAGE_OFFSET) {
+ unsigned long pa = __pa(addr);
+
+ return pfn_valid(pa >> PAGE_SHIFT);
+ }
+
+ pgd = pgd_offset_k(addr);
+ if (pgd_none(*pgd))
+ return 0;
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud))
+ return 0;
+
+ if (pud_large(*pud))
+ return pfn_valid(pud_pfn(*pud));
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ return 0;
+
+ if (pmd_large(*pmd))
+ return pfn_valid(pmd_pfn(*pmd));
+
+ pte = pte_offset_kernel(pmd, addr);
+ if (pte_none(*pte))
+ return 0;
+
+ return pfn_valid(pte_pfn(*pte));
+}
+EXPORT_SYMBOL(kern_addr_valid);
+
+static unsigned long __ref kernel_map_hugepud(unsigned long vstart,
+ unsigned long vend,
+ pud_t *pud)
+{
+ const unsigned long mask16gb = (1UL << 34) - 1UL;
+ u64 pte_val = vstart;
+
+ /* Each PUD is 8GB */
+ if ((vstart & mask16gb) ||
+ (vend - vstart <= mask16gb)) {
+ pte_val ^= kern_linear_pte_xor[2];
+ pud_val(*pud) = pte_val | _PAGE_PUD_HUGE;
+
+ return vstart + PUD_SIZE;
+ }
+
+ pte_val ^= kern_linear_pte_xor[3];
+ pte_val |= _PAGE_PUD_HUGE;
+
+ vend = vstart + mask16gb + 1UL;
+ while (vstart < vend) {
+ pud_val(*pud) = pte_val;
+
+ pte_val += PUD_SIZE;
+ vstart += PUD_SIZE;
+ pud++;
+ }
+ return vstart;
+}
+
+static bool kernel_can_map_hugepud(unsigned long vstart, unsigned long vend,
+ bool guard)
+{
+ if (guard && !(vstart & ~PUD_MASK) && (vend - vstart) >= PUD_SIZE)
+ return true;
+
+ return false;
+}
+
+static unsigned long __ref kernel_map_hugepmd(unsigned long vstart,
+ unsigned long vend,
+ pmd_t *pmd)
+{
+ const unsigned long mask256mb = (1UL << 28) - 1UL;
+ const unsigned long mask2gb = (1UL << 31) - 1UL;
+ u64 pte_val = vstart;
+
+ /* Each PMD is 8MB */
+ if ((vstart & mask256mb) ||
+ (vend - vstart <= mask256mb)) {
+ pte_val ^= kern_linear_pte_xor[0];
+ pmd_val(*pmd) = pte_val | _PAGE_PMD_HUGE;
+
+ return vstart + PMD_SIZE;
+ }
+
+ if ((vstart & mask2gb) ||
+ (vend - vstart <= mask2gb)) {
+ pte_val ^= kern_linear_pte_xor[1];
+ pte_val |= _PAGE_PMD_HUGE;
+ vend = vstart + mask256mb + 1UL;
+ } else {
+ pte_val ^= kern_linear_pte_xor[2];
+ pte_val |= _PAGE_PMD_HUGE;
+ vend = vstart + mask2gb + 1UL;
+ }
+
+ while (vstart < vend) {
+ pmd_val(*pmd) = pte_val;
+
+ pte_val += PMD_SIZE;
+ vstart += PMD_SIZE;
+ pmd++;
+ }
+
+ return vstart;
+}
+
+static bool kernel_can_map_hugepmd(unsigned long vstart, unsigned long vend,
+ bool guard)
+{
+ if (guard && !(vstart & ~PMD_MASK) && (vend - vstart) >= PMD_SIZE)
+ return true;
+
+ return false;
+}
+
static unsigned long __ref kernel_map_range(unsigned long pstart,
- unsigned long pend, pgprot_t prot)
+ unsigned long pend, pgprot_t prot,
+ bool use_huge)
{
unsigned long vstart = PAGE_OFFSET + pstart;
unsigned long vend = PAGE_OFFSET + pend;
@@ -1394,15 +1526,23 @@ static unsigned long __ref kernel_map_range(unsigned long pstart,
if (pud_none(*pud)) {
pmd_t *new;

+ if (kernel_can_map_hugepud(vstart, vend, use_huge)) {
+ vstart = kernel_map_hugepud(vstart, vend, pud);
+ continue;
+ }
new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
alloc_bytes += PAGE_SIZE;
pud_populate(&init_mm, pud, new);
}

pmd = pmd_offset(pud, vstart);
- if (!pmd_present(*pmd)) {
+ if (pmd_none(*pmd)) {
pte_t *new;

+ if (kernel_can_map_hugepmd(vstart, vend, use_huge)) {
+ vstart = kernel_map_hugepmd(vstart, vend, pmd);
+ continue;
+ }
new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
alloc_bytes += PAGE_SIZE;
pmd_populate_kernel(&init_mm, pmd, new);
@@ -1425,100 +1565,34 @@ static unsigned long __ref kernel_map_range(unsigned long pstart,
return alloc_bytes;
}

-extern unsigned int kvmap_linear_patch[1];
-#endif /* CONFIG_DEBUG_PAGEALLOC */
-
-static void __init kpte_set_val(unsigned long index, unsigned long val)
-{
- unsigned long *ptr = kpte_linear_bitmap;
-
- val <<= ((index % (BITS_PER_LONG / 2)) * 2);
- ptr += (index / (BITS_PER_LONG / 2));
-
- *ptr |= val;
-}
-
-static const unsigned long kpte_shift_min = 28; /* 256MB */
-static const unsigned long kpte_shift_max = 34; /* 16GB */
-static const unsigned long kpte_shift_incr = 3;
-
-static unsigned long kpte_mark_using_shift(unsigned long start, unsigned long end,
- unsigned long shift)
+static void __init flush_all_kernel_tsbs(void)
{
- unsigned long size = (1UL << shift);
- unsigned long mask = (size - 1UL);
- unsigned long remains = end - start;
- unsigned long val;
-
- if (remains < size || (start & mask))
- return start;
-
- /* VAL maps:
- *
- * shift 28 --> kern_linear_pte_xor index 1
- * shift 31 --> kern_linear_pte_xor index 2
- * shift 34 --> kern_linear_pte_xor index 3
- */
- val = ((shift - kpte_shift_min) / kpte_shift_incr) + 1;
-
- remains &= ~mask;
- if (shift != kpte_shift_max)
- remains = size;
-
- while (remains) {
- unsigned long index = start >> kpte_shift_min;
+ int i;

- kpte_set_val(index, val);
+ for (i = 0; i < KERNEL_TSB_NENTRIES; i++) {
+ struct tsb *ent = &swapper_tsb[i];

- start += 1UL << kpte_shift_min;
- remains -= 1UL << kpte_shift_min;
+ ent->tag = (1UL << TSB_TAG_INVALID_BIT);
}
+#ifndef CONFIG_DEBUG_PAGEALLOC
+ for (i = 0; i < KERNEL_TSB4M_NENTRIES; i++) {
+ struct tsb *ent = &swapper_4m_tsb[i];

- return start;
-}
-
-static void __init mark_kpte_bitmap(unsigned long start, unsigned long end)
-{
- unsigned long smallest_size, smallest_mask;
- unsigned long s;
-
- smallest_size = (1UL << kpte_shift_min);
- smallest_mask = (smallest_size - 1UL);
-
- while (start < end) {
- unsigned long orig_start = start;
-
- for (s = kpte_shift_max; s >= kpte_shift_min; s -= kpte_shift_incr) {
- start = kpte_mark_using_shift(start, end, s);
-
- if (start != orig_start)
- break;
- }
-
- if (start == orig_start)
- start = (start + smallest_size) & ~smallest_mask;
+ ent->tag = (1UL << TSB_TAG_INVALID_BIT);
}
+#endif
}

-static void __init init_kpte_bitmap(void)
-{
- unsigned long i;
-
- for (i = 0; i < pall_ents; i++) {
- unsigned long phys_start, phys_end;
-
- phys_start = pall[i].phys_addr;
- phys_end = phys_start + pall[i].reg_size;
-
- mark_kpte_bitmap(phys_start, phys_end);
- }
-}
+extern unsigned int kvmap_linear_patch[1];

static void __init kernel_physical_mapping_init(void)
{
-#ifdef CONFIG_DEBUG_PAGEALLOC
unsigned long i, mem_alloced = 0UL;
+ bool use_huge = true;

+#ifdef CONFIG_DEBUG_PAGEALLOC
+ use_huge = false;
+#endif
for (i = 0; i < pall_ents; i++) {
unsigned long phys_start, phys_end;

@@ -1526,7 +1600,7 @@ static void __init kernel_physical_mapping_init(void)
phys_end = phys_start + pall[i].reg_size;

mem_alloced += kernel_map_range(phys_start, phys_end,
- PAGE_KERNEL);
+ PAGE_KERNEL, use_huge);
}

printk("Allocated %ld bytes for kernel page tables.\n",
@@ -1535,8 +1609,9 @@ static void __init kernel_physical_mapping_init(void)
kvmap_linear_patch[0] = 0x01000000; /* nop */
flushi(&kvmap_linear_patch[0]);

+ flush_all_kernel_tsbs();
+
__flush_tlb_all();
-#endif
}

#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1546,7 +1621,7 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
unsigned long phys_end = phys_start + (numpages * PAGE_SIZE);

kernel_map_range(phys_start, phys_end,
- (enable ? PAGE_KERNEL : __pgprot(0)));
+ (enable ? PAGE_KERNEL : __pgprot(0)), false);

flush_tsb_kernel_range(PAGE_OFFSET + phys_start,
PAGE_OFFSET + phys_end);
@@ -1574,62 +1649,11 @@ unsigned long __init find_ecache_flush_span(unsigned long size)
unsigned long PAGE_OFFSET;
EXPORT_SYMBOL(PAGE_OFFSET);

-static void __init page_offset_shift_patch_one(unsigned int *insn, unsigned long phys_bits)
-{
- unsigned long final_shift;
- unsigned int val = *insn;
- unsigned int cnt;
-
- /* We are patching in ilog2(max_supported_phys_address), and
- * we are doing so in a manner similar to a relocation addend.
- * That is, we are adding the shift value to whatever value
- * is in the shift instruction count field already.
- */
- cnt = (val & 0x3f);
- val &= ~0x3f;
-
- /* If we are trying to shift >= 64 bits, clear the destination
- * register. This can happen when phys_bits ends up being equal
- * to MAX_PHYS_ADDRESS_BITS.
- */
- final_shift = (cnt + (64 - phys_bits));
- if (final_shift >= 64) {
- unsigned int rd = (val >> 25) & 0x1f;
-
- val = 0x80100000 | (rd << 25);
- } else {
- val |= final_shift;
- }
- *insn = val;
-
- __asm__ __volatile__("flush %0"
- : /* no outputs */
- : "r" (insn));
-}
-
-static void __init page_offset_shift_patch(unsigned long phys_bits)
-{
- extern unsigned int __page_offset_shift_patch;
- extern unsigned int __page_offset_shift_patch_end;
- unsigned int *p;
-
- p = &__page_offset_shift_patch;
- while (p < &__page_offset_shift_patch_end) {
- unsigned int *insn = (unsigned int *)(unsigned long)*p;
-
- page_offset_shift_patch_one(insn, phys_bits);
-
- p++;
- }
-}
-
unsigned long sparc64_va_hole_top = 0xfffff80000000000UL;
unsigned long sparc64_va_hole_bottom = 0x0000080000000000UL;

static void __init setup_page_offset(void)
{
- unsigned long max_phys_bits = 40;
-
if (tlb_type == cheetah || tlb_type == cheetah_plus) {
/* Cheetah/Panther support a full 64-bit virtual
* address, so we can use all that our page tables
@@ -1678,8 +1702,6 @@ static void __init setup_page_offset(void)

pr_info("PAGE_OFFSET is 0x%016lx (max_phys_bits == %lu)\n",
PAGE_OFFSET, max_phys_bits);
-
- page_offset_shift_patch(max_phys_bits);
}

static void __init tsb_phys_patch(void)
@@ -1724,7 +1746,6 @@ static void __init tsb_phys_patch(void)
#define NUM_KTSB_DESCR 1
#endif
static struct hv_tsb_descr ktsb_descr[NUM_KTSB_DESCR];
-extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];

/* The swapper TSBs are loaded with a base sequence of:
*
@@ -2023,11 +2044,9 @@ void __init paging_init(void)

pmd = swapper_low_pmd_dir + (shift / sizeof(pmd_t));
pud_set(&swapper_pud_dir[0], pmd);
-
+
inherit_prom_mappings();

- init_kpte_bitmap();
-
/* Ok, we can use our TLB miss and window trap handlers safely. */
setup_tba();

@@ -2134,70 +2153,6 @@ int page_in_phys_avail(unsigned long paddr)
return 0;
}

-static struct linux_prom64_registers pavail_rescan[MAX_BANKS] __initdata;
-static int pavail_rescan_ents __initdata;
-
-/* Certain OBP calls, such as fetching "available" properties, can
- * claim physical memory. So, along with initializing the valid
- * address bitmap, what we do here is refetch the physical available
- * memory list again, and make sure it provides at least as much
- * memory as 'pavail' does.
- */
-static void __init setup_valid_addr_bitmap_from_pavail(unsigned long *bitmap)
-{
- int i;
-
- read_obp_memory("available", &pavail_rescan[0], &pavail_rescan_ents);
-
- for (i = 0; i < pavail_ents; i++) {
- unsigned long old_start, old_end;
-
- old_start = pavail[i].phys_addr;
- old_end = old_start + pavail[i].reg_size;
- while (old_start < old_end) {
- int n;
-
- for (n = 0; n < pavail_rescan_ents; n++) {
- unsigned long new_start, new_end;
-
- new_start = pavail_rescan[n].phys_addr;
- new_end = new_start +
- pavail_rescan[n].reg_size;
-
- if (new_start <= old_start &&
- new_end >= (old_start + PAGE_SIZE)) {
- set_bit(old_start >> ILOG2_4MB, bitmap);
- goto do_next_page;
- }
- }
-
- prom_printf("mem_init: Lost memory in pavail\n");
- prom_printf("mem_init: OLD start[%lx] size[%lx]\n",
- pavail[i].phys_addr,
- pavail[i].reg_size);
- prom_printf("mem_init: NEW start[%lx] size[%lx]\n",
- pavail_rescan[i].phys_addr,
- pavail_rescan[i].reg_size);
- prom_printf("mem_init: Cannot continue, aborting.\n");
- prom_halt();
-
- do_next_page:
- old_start += PAGE_SIZE;
- }
- }
-}
-
-static void __init patch_tlb_miss_handler_bitmap(void)
-{
- extern unsigned int valid_addr_bitmap_insn[];
- extern unsigned int valid_addr_bitmap_patch[];
-
- valid_addr_bitmap_insn[1] = valid_addr_bitmap_patch[1];
- mb();
- valid_addr_bitmap_insn[0] = valid_addr_bitmap_patch[0];
- flushi(&valid_addr_bitmap_insn[0]);
-}
-
static void __init register_page_bootmem_info(void)
{
#ifdef CONFIG_NEED_MULTIPLE_NODES
@@ -2210,18 +2165,6 @@ static void __init register_page_bootmem_info(void)
}
void __init mem_init(void)
{
- unsigned long addr, last;
-
- addr = PAGE_OFFSET + kern_base;
- last = PAGE_ALIGN(kern_size) + addr;
- while (addr < last) {
- set_bit(__pa(addr) >> ILOG2_4MB, sparc64_valid_addr_bitmap);
- addr += PAGE_SIZE;
- }
-
- setup_valid_addr_bitmap_from_pavail(sparc64_valid_addr_bitmap);
- patch_tlb_miss_handler_bitmap();
-
high_memory = __va(last_valid_pfn << PAGE_SHIFT);

register_page_bootmem_info();
diff --git a/arch/sparc/mm/init_64.h b/arch/sparc/mm/init_64.h
index 5d3782de..3ccbf92 100644
--- a/arch/sparc/mm/init_64.h
+++ b/arch/sparc/mm/init_64.h
@@ -8,15 +8,8 @@
*/

#define MAX_PHYS_ADDRESS (1UL << MAX_PHYS_ADDRESS_BITS)
-#define KPTE_BITMAP_CHUNK_SZ (256UL * 1024UL * 1024UL)
-#define KPTE_BITMAP_BYTES \
- ((MAX_PHYS_ADDRESS / KPTE_BITMAP_CHUNK_SZ) / 4)
-#define VALID_ADDR_BITMAP_CHUNK_SZ (4UL * 1024UL * 1024UL)
-#define VALID_ADDR_BITMAP_BYTES \
- ((MAX_PHYS_ADDRESS / VALID_ADDR_BITMAP_CHUNK_SZ) / 8)

extern unsigned long kern_linear_pte_xor[4];
-extern unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];
extern unsigned int sparc64_highest_unlocked_tlb_ent;
extern unsigned long sparc64_kern_pri_context;
extern unsigned long sparc64_kern_pri_nuc_bits;
--
1.9.1

2014-11-06 23:11:03

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 056/162] sparc64: Implement __get_user_pages_fast().

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit 06090e8ed89ea2113a236befb41f71d51f100e60 ]

It is not sufficient to implement only get_user_pages_fast(); you
must also implement the atomic version, __get_user_pages_fast(),
otherwise you end up using the weak-symbol fallback implementation,
which simply returns zero.

This is dangerous, because it causes the futex code to loop forever
if transparent hugepages are supported (see get_futex_key()).
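
The weak-symbol mechanism that makes this failure silent is easy to
demonstrate with a small standalone C program (the function name below is
made up for illustration; the kernel's actual weak fallback lives in
mm/util.c):

#include <stdio.h>

/* Weak stub: used whenever no strong definition is linked in, just like
 * the generic __get_user_pages_fast() fallback that reports zero pages
 * pinned.
 */
int __attribute__((weak)) my_gup_fast(int nr_pages)
{
        return 0;
}

/* Providing a strong definition elsewhere -- which is what this sparc64
 * patch does for __get_user_pages_fast() -- overrides the weak stub at
 * link time:
 *
 *      int my_gup_fast(int nr_pages) { return nr_pages; }
 */

int main(void)
{
        /* With only the weak stub linked in, this always prints 0. */
        printf("pinned %d pages\n", my_gup_fast(4));
        return 0;
}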

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/mm/gup.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index 1aed043..ae6ce38 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -160,6 +160,36 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
return 1;
}

+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+ struct page **pages)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long addr, len, end;
+ unsigned long next, flags;
+ pgd_t *pgdp;
+ int nr = 0;
+
+ start &= PAGE_MASK;
+ addr = start;
+ len = (unsigned long) nr_pages << PAGE_SHIFT;
+ end = start + len;
+
+ local_irq_save(flags);
+ pgdp = pgd_offset(mm, addr);
+ do {
+ pgd_t pgd = *pgdp;
+
+ next = pgd_addr_end(addr, end);
+ if (pgd_none(pgd))
+ break;
+ if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+ break;
+ } while (pgdp++, addr = next, addr != end);
+ local_irq_restore(flags);
+
+ return nr;
+}
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages)
{
--
1.9.1

2014-11-06 23:11:08

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 038/162] sparc64: Fix FPU register corruption with AES crypto offload.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit f4da3628dc7c32a59d1fb7116bb042e6f436d611 ]

The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
key material is preloaded into the FPU registers, and then we loop
over and over doing the crypt operation, reusing those pre-cooked key
registers.

There are intervening blkcipher*() calls between the crypt operation
calls. And those might perform memcpy() and thus also try to use the
FPU.

The sparc64 kernel FPU usage mechanism is designed to allow such
recursive uses, but with a catch.

There has to be a trap between the two FPU using threads of control.

The mechanism works by, when the FPU is already in use by the kernel,
allocating a slot for FPU saving at trap time. Then if, within the
trap handler, we try to use the FPU registers, the pre-trap FPU
register state is saved into the slot. Then at trap return time we
notice this and restore the pre-trap FPU state.

Over the long term there are various more involved ways we can make
this work, but for a quick fix let's take advantage of the fact that
the situation where this happens is very limited.

All sparc64 chips that support the crypto instructions also use the
Niagara4 memcpy routine, and that routine only uses the FPU for
large copies where we can't get the source aligned properly to a
multiple of 8 bytes.

We look to see if the FPU is already in use in this context, and if so
we use the non-large copy path which only uses integer registers.

Furthermore, we also limit this special logic to when we are doing
kernel copy, rather than a user copy.
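
Concretely, the new VISEntryHalfFast macro in the diff below reads %fprs
and, if the FPRS_FEF enable bit is already set (meaning some other kernel
code, such as the AES loop, currently owns the FPU), branches to the
integer-register medium-copy path instead of entering the VIS/FPU code.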

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/visasm.h | 8 ++++++++
arch/sparc/lib/NG4memcpy.S | 14 +++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/include/asm/visasm.h b/arch/sparc/include/asm/visasm.h
index 39ca301..11fdf0e 100644
--- a/arch/sparc/include/asm/visasm.h
+++ b/arch/sparc/include/asm/visasm.h
@@ -39,6 +39,14 @@
297: wr %o5, FPRS_FEF, %fprs; \
298:

+#define VISEntryHalfFast(fail_label) \
+ rd %fprs, %o5; \
+ andcc %o5, FPRS_FEF, %g0; \
+ be,pt %icc, 297f; \
+ nop; \
+ ba,a,pt %xcc, fail_label; \
+297: wr %o5, FPRS_FEF, %fprs;
+
#define VISExitHalf \
wr %o5, 0, %fprs;

diff --git a/arch/sparc/lib/NG4memcpy.S b/arch/sparc/lib/NG4memcpy.S
index 9cf2ee0..140527a 100644
--- a/arch/sparc/lib/NG4memcpy.S
+++ b/arch/sparc/lib/NG4memcpy.S
@@ -41,6 +41,10 @@
#endif
#endif

+#if !defined(EX_LD) && !defined(EX_ST)
+#define NON_USER_COPY
+#endif
+
#ifndef EX_LD
#define EX_LD(x) x
#endif
@@ -197,9 +201,13 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
mov EX_RETVAL(%o3), %o0

.Llarge_src_unaligned:
+#ifdef NON_USER_COPY
+ VISEntryHalfFast(.Lmedium_vis_entry_fail)
+#else
+ VISEntryHalf
+#endif
andn %o2, 0x3f, %o4
sub %o2, %o4, %o2
- VISEntryHalf
alignaddr %o1, %g0, %g1
add %o1, %o4, %o1
EX_LD(LOAD(ldd, %g1 + 0x00, %f0))
@@ -240,6 +248,10 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
nop
ba,a,pt %icc, .Lmedium_unaligned

+#ifdef NON_USER_COPY
+.Lmedium_vis_entry_fail:
+ or %o0, %o1, %g2
+#endif
.Lmedium:
LOAD(prefetch, %o1 + 0x40, #n_reads_strong)
andcc %g2, 0x7, %g0
--
1.9.1

2014-11-06 22:41:24

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 054/162] sparc64: Increase size of boot string to 1024 bytes

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Kleikamp <[email protected]>

[ Upstream commit 1cef94c36bd4d79b5ae3a3df99ee0d76d6a4a6dc ]

This is the longest boot string that silo supports.

Signed-off-by: Dave Kleikamp <[email protected]>
Cc: Bob Picco <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/prom/bootstr_64.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/prom/bootstr_64.c b/arch/sparc/prom/bootstr_64.c
index ab9ccc6..7149e77 100644
--- a/arch/sparc/prom/bootstr_64.c
+++ b/arch/sparc/prom/bootstr_64.c
@@ -14,7 +14,10 @@
* the .bss section or it will break things.
*/

-#define BARG_LEN 256
+/* We limit BARG_LEN to 1024 because this is the size of the
+ * 'barg_out' command line buffer in the SILO bootloader.
+ */
+#define BARG_LEN 1024
struct {
int bootstr_len;
int bootstr_valid;
--
1.9.1

2014-11-06 23:11:57

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 037/162] sparc64: Fix lockdep warnings on reboot on Ultra-5

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit bdcf81b658ebc4c2640c3c2c55c8b31c601b6996 ]

Inconsistently, the raw_* IRQ routines do not interact with and update
the irqflags tracing and lockdep state, whereas the raw_* spinlock
interfaces do.

This causes problems in p1275_cmd_direct() because we disable hardirqs
by hand using raw_local_irq_restore() and then do a raw_spin_lock()
which triggers a lockdep trace because the CPU's hw IRQ state doesn't
match IRQ tracing's internal software copy of that state.

The CPU's irqs are disabled, yet current->hardirqs_enabled is true.

====================
reboot: Restarting system
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240()
DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
Modules linked in: openpromfs
CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G W 3.17.0-dirty #145
Call Trace:
[000000000045919c] warn_slowpath_common+0x5c/0xa0
[0000000000459210] warn_slowpath_fmt+0x30/0x40
[000000000048f41c] check_flags+0x7c/0x240
[0000000000493280] lock_acquire+0x20/0x1c0
[0000000000832b70] _raw_spin_lock+0x30/0x60
[000000000068f2fc] p1275_cmd_direct+0x1c/0x60
[000000000068ed28] prom_reboot+0x28/0x40
[000000000043610c] machine_restart+0x4c/0x80
[000000000047d2d4] kernel_restart+0x54/0x80
[000000000047d618] SyS_reboot+0x138/0x200
[00000000004060b4] linux_sparc_syscall32+0x34/0x60
---[ end trace 5c439fe81c05a100 ]---
possible reason: unannotated irqs-off.
irq event stamp: 2010267
hardirqs last enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580
hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580
softirqs last enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0
softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40
Resetting ...
====================

Use the local_* variants of the hw IRQ interfaces so that IRQ tracing
sees all of our changes.
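
For reference, with CONFIG_TRACE_IRQFLAGS enabled the traced wrapper in
include/linux/irqflags.h looks roughly like the simplified sketch below
(not quoted verbatim), while raw_local_irq_restore() touches only the
hardware state and never tells lockdep anything:

#define local_irq_restore(flags)                        \
        do {                                            \
                if (raw_irqs_disabled_flags(flags)) {   \
                        raw_local_irq_restore(flags);   \
                        trace_hardirqs_off();           \
                } else {                                \
                        trace_hardirqs_on();            \
                        raw_local_irq_restore(flags);   \
                }                                       \
        } while (0)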

Reported-by: Meelis Roos <[email protected]>
Tested-by: Meelis Roos <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/prom/p1275.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/prom/p1275.c b/arch/sparc/prom/p1275.c
index 04a4540..ea5b427 100644
--- a/arch/sparc/prom/p1275.c
+++ b/arch/sparc/prom/p1275.c
@@ -10,6 +10,7 @@
#include <linux/smp.h>
#include <linux/string.h>
#include <linux/spinlock.h>
+#include <linux/irqflags.h>

#include <asm/openprom.h>
#include <asm/oplib.h>
@@ -37,8 +38,8 @@ void p1275_cmd_direct(unsigned long *args)
{
unsigned long flags;

- raw_local_save_flags(flags);
- raw_local_irq_restore((unsigned long)PIL_NMI);
+ local_save_flags(flags);
+ local_irq_restore((unsigned long)PIL_NMI);
raw_spin_lock(&prom_entry_lock);

prom_world(1);
@@ -46,7 +47,7 @@ void p1275_cmd_direct(unsigned long *args)
prom_world(0);

raw_spin_unlock(&prom_entry_lock);
- raw_local_irq_restore(flags);
+ local_irq_restore(flags);
}

void prom_cif_init(void *cif_handler, void *cif_stack)
--
1.9.1

2014-11-06 22:41:21

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 049/162] sparc64: Use kernel page tables for vmemmap.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit c06240c7f5c39c83dfd7849c0770775562441b96 ]

For sparse memory configurations, the vmemmap array behaves terribly
and it takes up an inordinate amount of space in the BSS section of
the kernel image unconditionally.

Just build huge PMDs and look them up just like we do for TLB misses
in the vmalloc area.

Kernel BSS shrinks by about 2MB.
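
The 2MB figure is consistent with the size of the table being dropped:
vmemmap_table had one 8-byte entry per 4MB chunk of the vmemmap region,
and (assuming MAX_PHYSADDR_BITS of 47, 8K pages and a 64-byte struct page)
that region spans (2^47 / 2^13) * 64 bytes = 1TB, i.e. 2^18 entries, or
about 2MB of statically allocated BSS.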

Signed-off-by: David S. Miller <[email protected]>
Acked-by: Bob Picco <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/kernel/ktlb.S | 9 ++----
arch/sparc/mm/init_64.c | 72 +++++++++++++++++++++++-------------------------
arch/sparc/mm/init_64.h | 11 --------
3 files changed, 36 insertions(+), 56 deletions(-)

diff --git a/arch/sparc/kernel/ktlb.S b/arch/sparc/kernel/ktlb.S
index 94a1e66..2627a7f 100644
--- a/arch/sparc/kernel/ktlb.S
+++ b/arch/sparc/kernel/ktlb.S
@@ -186,13 +186,8 @@ kvmap_dtlb_load:

#ifdef CONFIG_SPARSEMEM_VMEMMAP
kvmap_vmemmap:
- sub %g4, %g5, %g5
- srlx %g5, ILOG2_4MB, %g5
- sethi %hi(vmemmap_table), %g1
- sllx %g5, 3, %g5
- or %g1, %lo(vmemmap_table), %g1
- ba,pt %xcc, kvmap_dtlb_load
- ldx [%g1 + %g5], %g5
+ KERN_PGTABLE_WALK(%g4, %g5, %g2, kvmap_dtlb_longpath)
+ ba,a,pt %xcc, kvmap_dtlb_load
#endif

kvmap_dtlb_nonlinear:
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 661a6383..75d26ee 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2254,18 +2254,9 @@ unsigned long _PAGE_CACHE __read_mostly;
EXPORT_SYMBOL(_PAGE_CACHE);

#ifdef CONFIG_SPARSEMEM_VMEMMAP
-unsigned long vmemmap_table[VMEMMAP_SIZE];
-
-static long __meminitdata addr_start, addr_end;
-static int __meminitdata node_start;
-
int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
int node)
{
- unsigned long phys_start = (vstart - VMEMMAP_BASE);
- unsigned long phys_end = (vend - VMEMMAP_BASE);
- unsigned long addr = phys_start & VMEMMAP_CHUNK_MASK;
- unsigned long end = VMEMMAP_ALIGN(phys_end);
unsigned long pte_base;

pte_base = (_PAGE_VALID | _PAGE_SZ4MB_4U |
@@ -2276,47 +2267,52 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
_PAGE_CP_4V | _PAGE_CV_4V |
_PAGE_P_4V | _PAGE_W_4V);

- for (; addr < end; addr += VMEMMAP_CHUNK) {
- unsigned long *vmem_pp =
- vmemmap_table + (addr >> VMEMMAP_CHUNK_SHIFT);
- void *block;
+ pte_base |= _PAGE_PMD_HUGE;

- if (!(*vmem_pp & _PAGE_VALID)) {
- block = vmemmap_alloc_block(1UL << ILOG2_4MB, node);
- if (!block)
+ vstart = vstart & PMD_MASK;
+ vend = ALIGN(vend, PMD_SIZE);
+ for (; vstart < vend; vstart += PMD_SIZE) {
+ pgd_t *pgd = pgd_offset_k(vstart);
+ unsigned long pte;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ if (pgd_none(*pgd)) {
+ pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+
+ if (!new)
return -ENOMEM;
+ pgd_populate(&init_mm, pgd, new);
+ }

- *vmem_pp = pte_base | __pa(block);
+ pud = pud_offset(pgd, vstart);
+ if (pud_none(*pud)) {
+ pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);

- /* check to see if we have contiguous blocks */
- if (addr_end != addr || node_start != node) {
- if (addr_start)
- printk(KERN_DEBUG " [%lx-%lx] on node %d\n",
- addr_start, addr_end-1, node_start);
- addr_start = addr;
- node_start = node;
- }
- addr_end = addr + VMEMMAP_CHUNK;
+ if (!new)
+ return -ENOMEM;
+ pud_populate(&init_mm, pud, new);
}
- }
- return 0;
-}

-void __meminit vmemmap_populate_print_last(void)
-{
- if (addr_start) {
- printk(KERN_DEBUG " [%lx-%lx] on node %d\n",
- addr_start, addr_end-1, node_start);
- addr_start = 0;
- addr_end = 0;
- node_start = 0;
+ pmd = pmd_offset(pud, vstart);
+
+ pte = pmd_val(*pmd);
+ if (!(pte & _PAGE_VALID)) {
+ void *block = vmemmap_alloc_block(PMD_SIZE, node);
+
+ if (!block)
+ return -ENOMEM;
+
+ pmd_val(*pmd) = pte_base | __pa(block);
+ }
}
+
+ return 0;
}

void vmemmap_free(unsigned long start, unsigned long end)
{
}
-
#endif /* CONFIG_SPARSEMEM_VMEMMAP */

static void prot_init_common(unsigned long page_none,
diff --git a/arch/sparc/mm/init_64.h b/arch/sparc/mm/init_64.h
index 3ccbf92..ac49119 100644
--- a/arch/sparc/mm/init_64.h
+++ b/arch/sparc/mm/init_64.h
@@ -31,15 +31,4 @@ extern unsigned long kern_locked_tte_data;

extern void prom_world(int enter);

-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-#define VMEMMAP_CHUNK_SHIFT 22
-#define VMEMMAP_CHUNK (1UL << VMEMMAP_CHUNK_SHIFT)
-#define VMEMMAP_CHUNK_MASK ~(VMEMMAP_CHUNK - 1UL)
-#define VMEMMAP_ALIGN(x) (((x)+VMEMMAP_CHUNK-1UL)&VMEMMAP_CHUNK_MASK)
-
-#define VMEMMAP_SIZE ((((1UL << MAX_PHYSADDR_BITS) >> PAGE_SHIFT) * \
- sizeof(struct page)) >> VMEMMAP_CHUNK_SHIFT)
-extern unsigned long vmemmap_table[VMEMMAP_SIZE];
-#endif
-
#endif /* _SPARC64_MM_INIT_H */
--
1.9.1

2014-11-06 23:12:45

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 044/162] sparc64: T5 PMU

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: bob picco <[email protected]>

[ Upstream commit 05aa1651e8b9ca078b1808a2fe7b50703353ec02 ]

The T5 (niagara5) has different PCR-related HV fast trap values and a new
HV API group. This patch utilizes these and shares code with niagara4 where
possible.

We use the same sparc_pmu, niagara4_pmu. Should there be a new effort to
obtain the MCU perf statistics, this would have to be changed.

Cc: [email protected]
Signed-off-by: Bob Picco <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/hypervisor.h | 11 +++++++++
arch/sparc/kernel/hvapi.c | 1 +
arch/sparc/kernel/hvcalls.S | 16 +++++++++++++
arch/sparc/kernel/pcr.c | 47 +++++++++++++++++++++++++++++++++----
arch/sparc/kernel/perf_event.c | 3 ++-
5 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index ca121f0..17be9d6 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2944,6 +2944,16 @@ extern unsigned long sun4v_vt_set_perfreg(unsigned long reg_num,
unsigned long reg_val);
#endif

+#define HV_FAST_T5_GET_PERFREG 0x1a8
+#define HV_FAST_T5_SET_PERFREG 0x1a9
+
+#ifndef __ASSEMBLY__
+unsigned long sun4v_t5_get_perfreg(unsigned long reg_num,
+ unsigned long *reg_val);
+unsigned long sun4v_t5_set_perfreg(unsigned long reg_num,
+ unsigned long reg_val);
+#endif
+
/* Function numbers for HV_CORE_TRAP. */
#define HV_CORE_SET_VER 0x00
#define HV_CORE_PUTCHAR 0x01
@@ -2975,6 +2985,7 @@ extern unsigned long sun4v_vt_set_perfreg(unsigned long reg_num,
#define HV_GRP_VF_CPU 0x0205
#define HV_GRP_KT_CPU 0x0209
#define HV_GRP_VT_CPU 0x020c
+#define HV_GRP_T5_CPU 0x0211
#define HV_GRP_DIAG 0x0300

#ifndef __ASSEMBLY__
diff --git a/arch/sparc/kernel/hvapi.c b/arch/sparc/kernel/hvapi.c
index c0a2de0..5c55145 100644
--- a/arch/sparc/kernel/hvapi.c
+++ b/arch/sparc/kernel/hvapi.c
@@ -46,6 +46,7 @@ static struct api_info api_table[] = {
{ .group = HV_GRP_VF_CPU, },
{ .group = HV_GRP_KT_CPU, },
{ .group = HV_GRP_VT_CPU, },
+ { .group = HV_GRP_T5_CPU, },
{ .group = HV_GRP_DIAG, .flags = FLAG_PRE_API },
};

diff --git a/arch/sparc/kernel/hvcalls.S b/arch/sparc/kernel/hvcalls.S
index f3ab509..caedf83 100644
--- a/arch/sparc/kernel/hvcalls.S
+++ b/arch/sparc/kernel/hvcalls.S
@@ -821,3 +821,19 @@ ENTRY(sun4v_vt_set_perfreg)
retl
nop
ENDPROC(sun4v_vt_set_perfreg)
+
+ENTRY(sun4v_t5_get_perfreg)
+ mov %o1, %o4
+ mov HV_FAST_T5_GET_PERFREG, %o5
+ ta HV_FAST_TRAP
+ stx %o1, [%o4]
+ retl
+ nop
+ENDPROC(sun4v_t5_get_perfreg)
+
+ENTRY(sun4v_t5_set_perfreg)
+ mov HV_FAST_T5_SET_PERFREG, %o5
+ ta HV_FAST_TRAP
+ retl
+ nop
+ENDPROC(sun4v_t5_set_perfreg)
diff --git a/arch/sparc/kernel/pcr.c b/arch/sparc/kernel/pcr.c
index 269af58..7e967c8 100644
--- a/arch/sparc/kernel/pcr.c
+++ b/arch/sparc/kernel/pcr.c
@@ -191,12 +191,41 @@ static const struct pcr_ops n4_pcr_ops = {
.pcr_nmi_disable = PCR_N4_PICNPT,
};

+static u64 n5_pcr_read(unsigned long reg_num)
+{
+ unsigned long val;
+
+ (void) sun4v_t5_get_perfreg(reg_num, &val);
+
+ return val;
+}
+
+static void n5_pcr_write(unsigned long reg_num, u64 val)
+{
+ (void) sun4v_t5_set_perfreg(reg_num, val);
+}
+
+static const struct pcr_ops n5_pcr_ops = {
+ .read_pcr = n5_pcr_read,
+ .write_pcr = n5_pcr_write,
+ .read_pic = n4_pic_read,
+ .write_pic = n4_pic_write,
+ .nmi_picl_value = n4_picl_value,
+ .pcr_nmi_enable = (PCR_N4_PICNPT | PCR_N4_STRACE |
+ PCR_N4_UTRACE | PCR_N4_TOE |
+ (26 << PCR_N4_SL_SHIFT)),
+ .pcr_nmi_disable = PCR_N4_PICNPT,
+};
+
+
static unsigned long perf_hsvc_group;
static unsigned long perf_hsvc_major;
static unsigned long perf_hsvc_minor;

static int __init register_perf_hsvc(void)
{
+ unsigned long hverror;
+
if (tlb_type == hypervisor) {
switch (sun4v_chip_type) {
case SUN4V_CHIP_NIAGARA1:
@@ -215,6 +244,10 @@ static int __init register_perf_hsvc(void)
perf_hsvc_group = HV_GRP_VT_CPU;
break;

+ case SUN4V_CHIP_NIAGARA5:
+ perf_hsvc_group = HV_GRP_T5_CPU;
+ break;
+
default:
return -ENODEV;
}
@@ -222,10 +255,12 @@ static int __init register_perf_hsvc(void)

perf_hsvc_major = 1;
perf_hsvc_minor = 0;
- if (sun4v_hvapi_register(perf_hsvc_group,
- perf_hsvc_major,
- &perf_hsvc_minor)) {
- printk("perfmon: Could not register hvapi.\n");
+ hverror = sun4v_hvapi_register(perf_hsvc_group,
+ perf_hsvc_major,
+ &perf_hsvc_minor);
+ if (hverror) {
+ pr_err("perfmon: Could not register hvapi(0x%lx).\n",
+ hverror);
return -ENODEV;
}
}
@@ -254,6 +289,10 @@ static int __init setup_sun4v_pcr_ops(void)
pcr_ops = &n4_pcr_ops;
break;

+ case SUN4V_CHIP_NIAGARA5:
+ pcr_ops = &n5_pcr_ops;
+ break;
+
default:
ret = -ENODEV;
break;
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 857baca..617b9fe 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1662,7 +1662,8 @@ static bool __init supported_pmu(void)
sparc_pmu = &niagara2_pmu;
return true;
}
- if (!strcmp(sparc_pmu_type, "niagara4")) {
+ if (!strcmp(sparc_pmu_type, "niagara4") ||
+ !strcmp(sparc_pmu_type, "niagara5")) {
sparc_pmu = &niagara4_pmu;
return true;
}
--
1.9.1

2014-11-06 23:13:19

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 055/162] sparc64: Fix register corruption in top-most kernel stack frame during boot.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Upstream commit ef3e035c3a9b81da8a778bc333d10637acf6c199 ]

Meelis Roos reported that kernels built with gcc-4.9 do not boot; we
eventually narrowed this down to only impacting machines using
UltraSPARC-III and derivative CPUs.

The crash happens right when the first user process is spawned:

[ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[ 54.451346]
[ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96
[ 54.666431] Call Trace:
[ 54.698453] [0000000000762f8c] panic+0xb0/0x224
[ 54.759071] [000000000045cf68] do_exit+0x948/0x960
[ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100
[ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10
[ 54.978662] Press Stop-A (L1-A) to return to the boot prom
[ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

Further investigation showed that compiling only per_cpu_patch() with
an older compiler fixes the boot.

Detailed analysis showed that the function is not being miscompiled by
gcc-4.9, but it is using a different register allocation ordering.

With the gcc-4.9 compiled function, something during the code patching
causes some of the %i* input registers to get corrupted. Perhaps
we have a TLB miss path into the firmware that is deep enough to
cause a register window spill and subsequent restore when we get
back from the TLB miss trap.

Let's plug this up by doing two things:

1) Stop using the firmware stack for client interface calls into
the firmware. Just use the kernel's stack.

2) As soon as we can, call into a new function "start_early_boot()"
to put a one-register-window buffer between the firmware's
deepest stack frame and the top-most initial kernel one.

Reported-by: Meelis Roos <[email protected]>
Tested-by: Meelis Roos <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/sparc/include/asm/oplib_64.h | 3 ++-
arch/sparc/include/asm/setup.h | 4 ++++
arch/sparc/kernel/entry.h | 3 ---
arch/sparc/kernel/head_64.S | 40 ++++-----------------------------------
arch/sparc/kernel/hvtramp.S | 1 -
arch/sparc/kernel/setup_64.c | 28 +++++++++++++++++++--------
arch/sparc/kernel/trampoline_64.S | 12 +++++++-----
arch/sparc/prom/cif.S | 5 ++---
arch/sparc/prom/init_64.c | 6 +++---
arch/sparc/prom/p1275.c | 2 --
10 files changed, 42 insertions(+), 62 deletions(-)

diff --git a/arch/sparc/include/asm/oplib_64.h b/arch/sparc/include/asm/oplib_64.h
index a12dbe3..e48fdf4 100644
--- a/arch/sparc/include/asm/oplib_64.h
+++ b/arch/sparc/include/asm/oplib_64.h
@@ -62,7 +62,8 @@ struct linux_mem_p1275 {
/* You must call prom_init() before using any of the library services,
* preferably as early as possible. Pass it the romvec pointer.
*/
-extern void prom_init(void *cif_handler, void *cif_stack);
+extern void prom_init(void *cif_handler);
+extern void prom_init_report(void);

/* Boot argument acquisition, returns the boot command line string. */
extern char *prom_getbootargs(void);
diff --git a/arch/sparc/include/asm/setup.h b/arch/sparc/include/asm/setup.h
index 5e35e05..acd6146 100644
--- a/arch/sparc/include/asm/setup.h
+++ b/arch/sparc/include/asm/setup.h
@@ -24,6 +24,10 @@ static inline int con_is_present(void)
}
#endif

+#ifdef CONFIG_SPARC64
+extern void __init start_early_boot(void);
+#endif
+
extern void sun_do_break(void);
extern int stop_a_enabled;
extern int scons_pwroff;
diff --git a/arch/sparc/kernel/entry.h b/arch/sparc/kernel/entry.h
index 140966f..c88ffb9 100644
--- a/arch/sparc/kernel/entry.h
+++ b/arch/sparc/kernel/entry.h
@@ -66,13 +66,10 @@ struct pause_patch_entry {
extern struct pause_patch_entry __pause_3insn_patch,
__pause_3insn_patch_end;

-extern void __init per_cpu_patch(void);
extern void sun4v_patch_1insn_range(struct sun4v_1insn_patch_entry *,
struct sun4v_1insn_patch_entry *);
extern void sun4v_patch_2insn_range(struct sun4v_2insn_patch_entry *,
struct sun4v_2insn_patch_entry *);
-extern void __init sun4v_patch(void);
-extern void __init boot_cpu_id_too_large(int cpu);
extern unsigned int dcache_parity_tl1_occurred;
extern unsigned int icache_parity_tl1_occurred;

diff --git a/arch/sparc/kernel/head_64.S b/arch/sparc/kernel/head_64.S
index 4fdeb80..3d61fca 100644
--- a/arch/sparc/kernel/head_64.S
+++ b/arch/sparc/kernel/head_64.S
@@ -672,14 +672,12 @@ tlb_fixup_done:
sethi %hi(init_thread_union), %g6
or %g6, %lo(init_thread_union), %g6
ldx [%g6 + TI_TASK], %g4
- mov %sp, %l6

wr %g0, ASI_P, %asi
mov 1, %g1
sllx %g1, THREAD_SHIFT, %g1
sub %g1, (STACKFRAME_SZ + STACK_BIAS), %g1
add %g6, %g1, %sp
- mov 0, %fp

/* Set per-cpu pointer initially to zero, this makes
* the boot-cpu use the in-kernel-image per-cpu areas
@@ -706,44 +704,14 @@ tlb_fixup_done:
nop
#endif

- mov %l6, %o1 ! OpenPROM stack
call prom_init
mov %l7, %o0 ! OpenPROM cif handler

- /* Initialize current_thread_info()->cpu as early as possible.
- * In order to do that accurately we have to patch up the get_cpuid()
- * assembler sequences. And that, in turn, requires that we know
- * if we are on a Starfire box or not. While we're here, patch up
- * the sun4v sequences as well.
+ /* To create a one-register-window buffer between the kernel's
+ * initial stack and the last stack frame we use from the firmware,
+ * do the rest of the boot from a C helper function.
*/
- call check_if_starfire
- nop
- call per_cpu_patch
- nop
- call sun4v_patch
- nop
-
-#ifdef CONFIG_SMP
- call hard_smp_processor_id
- nop
- cmp %o0, NR_CPUS
- blu,pt %xcc, 1f
- nop
- call boot_cpu_id_too_large
- nop
- /* Not reached... */
-
-1:
-#else
- mov 0, %o0
-#endif
- sth %o0, [%g6 + TI_CPU]
-
- call prom_init_report
- nop
-
- /* Off we go.... */
- call start_kernel
+ call start_early_boot
nop
/* Not reached... */

diff --git a/arch/sparc/kernel/hvtramp.S b/arch/sparc/kernel/hvtramp.S
index 4eb1a5a..4ad8138 100644
--- a/arch/sparc/kernel/hvtramp.S
+++ b/arch/sparc/kernel/hvtramp.S
@@ -110,7 +110,6 @@ hv_cpu_startup:
sllx %g5, THREAD_SHIFT, %g5
sub %g5, (STACKFRAME_SZ + STACK_BIAS), %g5
add %g6, %g5, %sp
- mov 0, %fp

call init_irqwork_curcpu
nop
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 1c7bfdf..61a5198 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/initrd.h>
#include <linux/module.h>
+#include <linux/start_kernel.h>

#include <asm/io.h>
#include <asm/processor.h>
@@ -174,7 +175,7 @@ char reboot_command[COMMAND_LINE_SIZE];

static struct pt_regs fake_swapper_regs = { { 0, }, 0, 0, 0, 0 };

-void __init per_cpu_patch(void)
+static void __init per_cpu_patch(void)
{
struct cpuid_patch_entry *p;
unsigned long ver;
@@ -266,7 +267,7 @@ void sun4v_patch_2insn_range(struct sun4v_2insn_patch_entry *start,
}
}

-void __init sun4v_patch(void)
+static void __init sun4v_patch(void)
{
extern void sun4v_hvapi_init(void);

@@ -335,14 +336,25 @@ static void __init pause_patch(void)
}
}

-#ifdef CONFIG_SMP
-void __init boot_cpu_id_too_large(int cpu)
+void __init start_early_boot(void)
{
- prom_printf("Serious problem, boot cpu id (%d) >= NR_CPUS (%d)\n",
- cpu, NR_CPUS);
- prom_halt();
+ int cpu;
+
+ check_if_starfire();
+ per_cpu_patch();
+ sun4v_patch();
+
+ cpu = hard_smp_processor_id();
+ if (cpu >= NR_CPUS) {
+ prom_printf("Serious problem, boot cpu id (%d) >= NR_CPUS (%d)\n",
+ cpu, NR_CPUS);
+ prom_halt();
+ }
+ current_thread_info()->cpu = cpu;
+
+ prom_init_report();
+ start_kernel();
}
-#endif

/* On Ultra, we support all of the v8 capabilities. */
unsigned long sparc64_elf_hwcap = (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR |
diff --git a/arch/sparc/kernel/trampoline_64.S b/arch/sparc/kernel/trampoline_64.S
index ad4bde3..092a39d 100644
--- a/arch/sparc/kernel/trampoline_64.S
+++ b/arch/sparc/kernel/trampoline_64.S
@@ -110,10 +110,13 @@ startup_continue:
brnz,pn %g1, 1b
nop

- sethi %hi(p1275buf), %g2
- or %g2, %lo(p1275buf), %g2
- ldx [%g2 + 0x10], %l2
- add %l2, -(192 + 128), %sp
+ /* Get onto temporary stack which will be in the locked
+ * kernel image.
+ */
+ sethi %hi(tramp_stack), %g1
+ or %g1, %lo(tramp_stack), %g1
+ add %g1, TRAMP_STACK_SIZE, %g1
+ sub %g1, STACKFRAME_SZ + STACK_BIAS + 256, %sp
flushw

/* Setup the loop variables:
@@ -395,7 +398,6 @@ after_lock_tlb:
sllx %g5, THREAD_SHIFT, %g5
sub %g5, (STACKFRAME_SZ + STACK_BIAS), %g5
add %g6, %g5, %sp
- mov 0, %fp

rdpr %pstate, %o1
or %o1, PSTATE_IE, %o1
diff --git a/arch/sparc/prom/cif.S b/arch/sparc/prom/cif.S
index 9c86b4b..8050f38 100644
--- a/arch/sparc/prom/cif.S
+++ b/arch/sparc/prom/cif.S
@@ -11,11 +11,10 @@
.text
.globl prom_cif_direct
prom_cif_direct:
+ save %sp, -192, %sp
sethi %hi(p1275buf), %o1
or %o1, %lo(p1275buf), %o1
- ldx [%o1 + 0x0010], %o2 ! prom_cif_stack
- save %o2, -192, %sp
- ldx [%i1 + 0x0008], %l2 ! prom_cif_handler
+ ldx [%o1 + 0x0008], %l2 ! prom_cif_handler
mov %g4, %l0
mov %g5, %l1
mov %g6, %l3
diff --git a/arch/sparc/prom/init_64.c b/arch/sparc/prom/init_64.c
index d95db75..110b0d7 100644
--- a/arch/sparc/prom/init_64.c
+++ b/arch/sparc/prom/init_64.c
@@ -26,13 +26,13 @@ phandle prom_chosen_node;
* It gets passed the pointer to the PROM vector.
*/

-extern void prom_cif_init(void *, void *);
+extern void prom_cif_init(void *);

-void __init prom_init(void *cif_handler, void *cif_stack)
+void __init prom_init(void *cif_handler)
{
phandle node;

- prom_cif_init(cif_handler, cif_stack);
+ prom_cif_init(cif_handler);

prom_chosen_node = prom_finddevice(prom_chosen_path);
if (!prom_chosen_node || (s32)prom_chosen_node == -1)
diff --git a/arch/sparc/prom/p1275.c b/arch/sparc/prom/p1275.c
index ea5b427..fda23e6 100644
--- a/arch/sparc/prom/p1275.c
+++ b/arch/sparc/prom/p1275.c
@@ -21,7 +21,6 @@
struct {
long prom_callback; /* 0x00 */
void (*prom_cif_handler)(long *); /* 0x08 */
- unsigned long prom_cif_stack; /* 0x10 */
} p1275buf;

extern void prom_world(int);
@@ -53,5 +52,4 @@ void p1275_cmd_direct(unsigned long *args)
void prom_cif_init(void *cif_handler, void *cif_stack)
{
p1275buf.prom_cif_handler = (void (*)(long *))cif_handler;
- p1275buf.prom_cif_stack = (unsigned long)cif_stack;
}
--
1.9.1

2014-11-06 23:14:00

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 146/162] drm/radeon: remove invalid pci id

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <[email protected]>

commit 8c3e434769b1707fd2d24de5a2eb25fedc634c4a upstream.

0x4c6e is a secondary device id so should not be used
by the driver.

Noticed-by: Mark Kettenis <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/drm/drm_pciids.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/drm/drm_pciids.h b/include/drm/drm_pciids.h
index bcec4c4..ca52de5 100644
--- a/include/drm/drm_pciids.h
+++ b/include/drm/drm_pciids.h
@@ -74,7 +74,6 @@
{0x1002, 0x4C64, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV250|RADEON_IS_MOBILITY}, \
{0x1002, 0x4C66, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV250|RADEON_IS_MOBILITY}, \
{0x1002, 0x4C67, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV250|RADEON_IS_MOBILITY}, \
- {0x1002, 0x4C6E, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280|RADEON_IS_MOBILITY}, \
{0x1002, 0x4E44, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R300}, \
{0x1002, 0x4E45, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R300}, \
{0x1002, 0x4E46, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R300}, \
--
1.9.1

2014-11-06 22:40:31

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 151/162] lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit ea5d05b34aca25c066e0699512d0ffbd8ee6ac3e upstream.

If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
a multiple of BITS_PER_LONG, they will try to shift a long value by
BITS_PER_LONG bits, which is undefined. Change the functions to avoid
the undefined shift.
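
The hazard and the fix are easy to reproduce outside the kernel; the
standalone sketch below mirrors the patched inner step of
__bitmap_shift_right() (BITS_PER_LONG and the helper name are local to
this example):

#include <stdio.h>

#define BITS_PER_LONG 64

/* When rem is 0 (a shift by a multiple of BITS_PER_LONG), the old code
 * evaluated "upper << (BITS_PER_LONG - rem)", i.e. a 64-bit shift, which
 * is undefined behaviour in C. Guarding the shift keeps the result well
 * defined on every architecture.
 */
static unsigned long shift_word_right(unsigned long upper,
                                      unsigned long lower, int rem)
{
        unsigned long dst = lower >> rem;

        if (rem)
                dst |= upper << (BITS_PER_LONG - rem);
        return dst;
}

int main(void)
{
        /* rem == 0: the word is copied as-is, no bits borrowed from upper */
        printf("%#lx\n", shift_word_right(0xffffUL, 0x1234UL, 0));
        return 0;
}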

Coverity id: 1192175
Coverity id: 1192174
Signed-off-by: Jan Kara <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
lib/bitmap.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 06f7e4f..e5c4ebe 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -131,7 +131,9 @@ void __bitmap_shift_right(unsigned long *dst,
lower = src[off + k];
if (left && off + k == lim - 1)
lower &= mask;
- dst[k] = upper << (BITS_PER_LONG - rem) | lower >> rem;
+ dst[k] = lower >> rem;
+ if (rem)
+ dst[k] |= upper << (BITS_PER_LONG - rem);
if (left && k == lim - 1)
dst[k] &= mask;
}
@@ -172,7 +174,9 @@ void __bitmap_shift_left(unsigned long *dst,
upper = src[k];
if (left && k == lim - 1)
upper &= (1UL << left) - 1;
- dst[k + off] = lower >> (BITS_PER_LONG - rem) | upper << rem;
+ dst[k + off] = upper << rem;
+ if (rem)
+ dst[k + off] |= lower >> (BITS_PER_LONG - rem);
if (left && k + off == lim - 1)
dst[k + off] &= (1UL << left) - 1;
}
--
1.9.1

2014-11-06 23:14:24

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 139/162] futex: Fix a race condition between REQUEUE_PI and task death

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Silverman <[email protected]>

commit 30a6b8031fe14031ab27c1fa3483cb9780e7f63c upstream.

free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
exit_pi_state_list takes the hb lock first, and most callers of
free_pi_state do too. requeue_pi doesn't, which means free_pi_state
can free the pi_state out from under exit_pi_state_list. For example:

task A                               | task B
exit_pi_state_list                   |
  pi_state =                         |
    curr->pi_state_list->next
                                     | futex_requeue(requeue_pi=1)
                                     |   // pi_state is the same as
                                     |   // the one in task A
                                     |   free_pi_state(pi_state)
                                     |     list_del_init(&pi_state->list)
                                     |     kfree(pi_state)
  list_del_init(&pi_state->list)     |

Move the free_pi_state calls in requeue_pi to before it drops the hb
locks which it's already holding.

[ tglx: Removed a pointless free_pi_state() call and the hb->lock held
debugging. The latter comes via a separate patch ]

Signed-off-by: Brian Silverman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
kernel/futex.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 2b1583e..7947e4c 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -492,8 +492,14 @@ static struct futex_pi_state * alloc_pi_state(void)
return pi_state;
}

+/*
+ * Must be called with the hb lock held.
+ */
static void free_pi_state(struct futex_pi_state *pi_state)
{
+ if (!pi_state)
+ return;
+
if (!atomic_dec_and_test(&pi_state->refcount))
return;

@@ -1407,15 +1413,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
}

retry:
- if (pi_state != NULL) {
- /*
- * We will have to lookup the pi_state again, so free this one
- * to keep the accounting correct.
- */
- free_pi_state(pi_state);
- pi_state = NULL;
- }
-
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
if (unlikely(ret != 0))
goto out;
@@ -1503,6 +1500,8 @@ retry_private:
case 0:
break;
case -EFAULT:
+ free_pi_state(pi_state);
+ pi_state = NULL;
double_unlock_hb(hb1, hb2);
put_futex_key(&key2);
put_futex_key(&key1);
@@ -1512,6 +1511,8 @@ retry_private:
goto out;
case -EAGAIN:
/* The owner was exiting, try again. */
+ free_pi_state(pi_state);
+ pi_state = NULL;
double_unlock_hb(hb1, hb2);
put_futex_key(&key2);
put_futex_key(&key1);
@@ -1588,6 +1589,7 @@ retry_private:
}

out_unlock:
+ free_pi_state(pi_state);
double_unlock_hb(hb1, hb2);

/*
@@ -1604,8 +1606,6 @@ out_put_keys:
out_put_key1:
put_futex_key(&key1);
out:
- if (pi_state != NULL)
- free_pi_state(pi_state);
return ret ? ret : task_count;
}

--
1.9.1

2014-11-06 22:40:29

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 149/162] mm: free compound page with correct order

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Yu Zhao <[email protected]>

commit 5ddacbe92b806cd5b4f8f154e8e46ac267fff55c upstream.

A compound page should be freed by put_page() or free_pages() with the
correct order. Not doing so will leak its tail pages.

The compound order can be obtained with compound_order(), or HPAGE_PMD_ORDER
could be used in our case. Some people would argue the latter is faster, but
I prefer the former, which is more general.

This bug was observed not just on our servers (the worst case we saw is
11G leaked on a 48G machine) but also on our workstations running an
Ubuntu-based distro.

$ cat /proc/vmstat | grep thp_zero_page_alloc
thp_zero_page_alloc 55
thp_zero_page_alloc_failed 0

This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked.

Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page")
Signed-off-by: Yu Zhao <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Bob Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 64a7f9c..310f27e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -193,7 +193,7 @@ retry:
preempt_disable();
if (cmpxchg(&huge_zero_page, NULL, zero_page)) {
preempt_enable();
- __free_page(zero_page);
+ __free_pages(zero_page, compound_order(zero_page));
goto retry;
}

@@ -225,7 +225,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
- __free_page(zero_page);
+ __free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}

--
1.9.1
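
For the workstation numbers quoted above, the formula works out to roughly:

  (thp_zero_page_alloc - 1) * (2M - 4K)
      = (55 - 1) * (2048 KiB - 4 KiB)
      = 54 * 2044 KiB
      ~ 108 MiB

i.e. about 108 MiB of leaked tail pages even on a lightly used workstation,
which is consistent with the much larger leaks seen on long-running servers.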

2014-11-06 23:14:55

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 155/162] ext4: bail out from make_indexed_dir() on first error

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 6050d47adcadbb53582434d919ed7f038d936712 upstream.

When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
fails, there's really something wrong with the fs and there's no point in
continuing further. Just return an error from make_indexed_dir() in that
case. Also initialize the frames array so that if we return early due to an
error, dx_release() doesn't try to dereference uninitialized memory (which
could also happen due to an error in do_split()).

Coverity-id: 741300
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/namei.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 08de398..c8fd7ce 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1829,31 +1829,39 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
hinfo.hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
ext4fs_dirhash(name, namelen, &hinfo);
+ memset(frames, 0, sizeof(frames));
frame = frames;
frame->entries = entries;
frame->at = entries;
frame->bh = bh;
bh = bh2;

- ext4_handle_dirty_dx_node(handle, dir, frame->bh);
- ext4_handle_dirty_dirent_node(handle, dir, bh);
+ retval = ext4_handle_dirty_dx_node(handle, dir, frame->bh);
+ if (retval)
+ goto out_frames;
+ retval = ext4_handle_dirty_dirent_node(handle, dir, bh);
+ if (retval)
+ goto out_frames;

de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
if (!de) {
- /*
- * Even if the block split failed, we have to properly write
- * out all the changes we did so far. Otherwise we can end up
- * with corrupted filesystem.
- */
- ext4_mark_inode_dirty(handle, dir);
- dx_release(frames);
- return retval;
+ retval = PTR_ERR(de);
+ goto out_frames;
}
dx_release(frames);

retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
brelse(bh);
return retval;
+out_frames:
+ /*
+ * Even if the block split failed, we have to properly write
+ * out all the changes we did so far. Otherwise we can end up
+ * with corrupted filesystem.
+ */
+ ext4_mark_inode_dirty(handle, dir);
+ dx_release(frames);
+ return retval;
}

/*
--
1.9.1

2014-11-06 23:15:28

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 152/162] ext4: fix overflow when updating superblock backups after resize

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 9378c6768e4fca48971e7b6a9075bc006eda981d upstream.

When there are no meta block groups, update_backups() will compute the
backup block in 32-bit arithmetic, thus possibly overflowing the block
number and corrupting the filesystem. OTOH, filesystems larger than 16 TB
without meta block groups should be rare. Fix the problem by doing the
computation in 64-bit arithmetic.

Coverity-id: 741252
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/resize.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 14e0f8a..2400ad1 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1071,7 +1071,7 @@ static void update_backups(struct super_block *sb, int blk_off, char *data,
break;

if (meta_bg == 0)
- backup_block = group * bpg + blk_off;
+ backup_block = ((ext4_fsblk_t)group) * bpg + blk_off;
else
backup_block = (ext4_group_first_block_no(sb, group) +
ext4_bg_has_super(sb, group));
--
1.9.1
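
To make the overflow concrete: with 4 KiB blocks ext4 has 32768 blocks per
group, so once the group number reaches 131072 (a filesystem just past
16 TiB) the 32-bit product wraps. A standalone sketch of the difference the
cast makes (illustrative values, not the ext4 code itself):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint32_t group = 200000;        /* group well past the 16 TiB mark */
          uint32_t bpg = 32768;           /* blocks per group with 4 KiB blocks */
          uint32_t blk_off = 1;

          uint32_t wrong = group * bpg + blk_off;           /* wraps at 2^32 */
          uint64_t right = (uint64_t)group * bpg + blk_off; /* what the fix does */

          printf("32-bit: %u\n64-bit: %llu\n", wrong, (unsigned long long)right);
          return 0;
  }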

2014-11-06 23:15:56

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 136/162] usb: dwc3: gadget: Properly initialize LINK TRB

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jack Pham <[email protected]>

commit 1200a82a59b6aa65758ccc92c3447b98c53cd7a2 upstream.

On ISOC endpoints the last trb_pool entry used as a
LINK TRB is not getting zeroed out correctly, because
memset is called incorrectly and in the wrong place.
If the pool allocated from DMA was not zero-initialized
to begin with, this results in the size and ctrl
values being random garbage. Call memset correctly, after
the assignment of the trb_link pointer.

Fixes: f6bafc6a1c ("usb: dwc3: convert TRBs into bitshifts")
Signed-off-by: Jack Pham <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/dwc3/gadget.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 27b9c8a..20e4d2e 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -532,12 +532,11 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep,
if (!usb_endpoint_xfer_isoc(desc))
return 0;

- memset(&trb_link, 0, sizeof(trb_link));
-
/* Link TRB for ISOC. The HWO bit is never reset */
trb_st_hw = &dep->trb_pool[0];

trb_link = &dep->trb_pool[DWC3_TRB_NUM - 1];
+ memset(trb_link, 0, sizeof(*trb_link));

trb_link->bpl = lower_32_bits(dwc3_trb_dma_offset(dep, trb_st_hw));
trb_link->bph = upper_32_bits(dwc3_trb_dma_offset(dep, trb_st_hw));
--
1.9.1
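
The subtlety here is the difference between zeroing the pointer variable and
zeroing the TRB it points to. A standalone sketch of the two forms (the
struct name and fields are illustrative, not the real dwc3 definitions):

  #include <string.h>

  struct trb {
          unsigned int bpl, bph, size, ctrl;
  };

  static void clear_link_trb(struct trb *pool, int num)
  {
          struct trb *trb_link;

          /* Wrong: zeroes the pointer variable itself (and did so before
           * trb_link was even assigned), leaving the TRB contents untouched:
           *      memset(&trb_link, 0, sizeof(trb_link));
           */
          trb_link = &pool[num - 1];

          /* Right: zero the pointed-to TRB, after the pointer is set up. */
          memset(trb_link, 0, sizeof(*trb_link));
  }

  int main(void)
  {
          struct trb pool[8];

          clear_link_trb(pool, 8);
          return 0;
  }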

2014-11-06 23:16:49

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 142/162] ima: check xattr value length and type in the ima_inode_setxattr()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Kasatkin <[email protected]>

commit a48fda9de94500a3152a56b723d0a64ae236547c upstream.

ima_inode_setxattr() can be called with no value. The function does not
check the length, so the following command can be used to produce a
kernel oops: setfattr -n security.ima FOO. This patch fixes it.

Changes in v3:
* for stable reverted "allow setting hash only in fix or log mode"
It will be a separate patch.

Changes in v2:
* testing validity of xattr type
* allow setting hash only in fix or log mode (Mimi)

[ 261.562522] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 261.564109] IP: [<ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a
[ 261.564109] PGD 3112f067 PUD 42965067 PMD 0
[ 261.564109] Oops: 0000 [#1] SMP
[ 261.564109] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 261.564109] CPU: 0 PID: 3299 Comm: setxattr Not tainted 3.16.0-kds+ #2924
[ 261.564109] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 261.564109] task: ffff8800428c2430 ti: ffff880042be0000 task.ti: ffff880042be0000
[ 261.564109] RIP: 0010:[<ffffffff812af272>] [<ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a
[ 261.564109] RSP: 0018:ffff880042be3d50 EFLAGS: 00010246
[ 261.564109] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000015
[ 261.564109] RDX: 0000001500000000 RSI: 0000000000000000 RDI: ffff8800375cc600
[ 261.564109] RBP: ffff880042be3d68 R08: 0000000000000000 R09: 00000000004d6256
[ 261.564109] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88002149ba00
[ 261.564109] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 261.564109] FS: 00007f6c1e219740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
[ 261.564109] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.564109] CR2: 0000000000000000 CR3: 000000003b35a000 CR4: 00000000000006f0
[ 261.564109] Stack:
[ 261.564109] ffff88002149ba00 ffff880042be3df8 0000000000000000 ffff880042be3d98
[ 261.564109] ffffffff812a101b ffff88002149ba00 ffff880042be3df8 0000000000000000
[ 261.564109] 0000000000000000 ffff880042be3de0 ffffffff8116d08a ffff880042be3dc8
[ 261.564109] Call Trace:
[ 261.564109] [<ffffffff812a101b>] security_inode_setxattr+0x48/0x6a
[ 261.564109] [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f
[ 261.564109] [<ffffffff8116d1e0>] setxattr+0x122/0x16c
[ 261.564109] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 261.564109] [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143
[ 261.564109] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 261.564109] [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f
[ 261.564109] [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0
[ 261.564109] [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b
[ 261.564109] Code: 48 89 f7 48 c7 c6 58 36 81 81 53 31 db e8 73 27 04 00 85 c0 75 28 bf 15 00 00 00 e8 8a a5 d9 ff 84 c0 75 05 83 cb ff eb 15 31 f6 <41> 80 7d 00 03 49 8b 7c 24 68 40 0f 94 c6 e8 e1 f9 ff ff 89 d8
[ 261.564109] RIP [<ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a
[ 261.564109] RSP <ffff880042be3d50>
[ 261.564109] CR2: 0000000000000000
[ 261.599998] ---[ end trace 39a89a3fc267e652 ]---

Reported-by: Jan Kara <[email protected]>
Signed-off-by: Dmitry Kasatkin <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
[ kamal: backport to 3.13-stable: xvalue def ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
security/integrity/ima/ima_appraise.c | 3 +++
security/integrity/integrity.h | 1 +
2 files changed, 4 insertions(+)

diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 734e946..c7190b7 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -359,11 +359,14 @@ static void ima_reset_appraise_flags(struct inode *inode)
int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name,
const void *xattr_value, size_t xattr_value_len)
{
+ const struct evm_ima_xattr_data *xvalue = xattr_value;
int result;

result = ima_protect_xattr(dentry, xattr_name, xattr_value,
xattr_value_len);
if (result == 1) {
+ if (!xattr_value_len || (xvalue->type >= IMA_XATTR_LAST))
+ return -EINVAL;
ima_reset_appraise_flags(dentry->d_inode);
result = 0;
}
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 33c0a70..8f2fbdc 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -56,6 +56,7 @@ enum evm_ima_xattr_type {
EVM_XATTR_HMAC,
EVM_IMA_XATTR_DIGSIG,
IMA_XATTR_DIGEST_NG,
+ IMA_XATTR_LAST
};

struct evm_ima_xattr_data {
--
1.9.1

2014-11-06 23:17:11

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 140/162] PM / Sleep: fix recovery during resuming from hibernation

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Imre Deak <[email protected]>

commit 94fb823fcb4892614f57e59601bb9d4920f24711 upstream.

If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't roll things back correctly by calling the thaw and complete
callbacks. This could leave some devices in a suspended state in case of
an error during resuming from hibernation.

Signed-off-by: Imre Deak <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
kernel/power/hibernate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 0121dab..7ef5244 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -491,8 +491,14 @@ int hibernation_restore(int platform_mode)
error = dpm_suspend_start(PMSG_QUIESCE);
if (!error) {
error = resume_target_kernel(platform_mode);
- dpm_resume_end(PMSG_RECOVER);
+ /*
+ * The above should either succeed and jump to the new kernel,
+ * or return with an error. Otherwise things are just
+ * undefined, so let's be paranoid.
+ */
+ BUG_ON(!error);
}
+ dpm_resume_end(PMSG_RECOVER);
pm_restore_gfp_mask();
ftrace_start();
resume_console();
--
1.9.1

2014-11-06 23:17:35

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 153/162] ext4: fix oops when loading block bitmap failed

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 599a9b77ab289d85c2d5c8607624efbe1f552b0f upstream.

When we fail to load the block bitmap in __ext4_new_inode() we will
dereference a NULL pointer in ext4_journal_get_write_access(). So check
for an error from ext4_read_block_bitmap().

Coverity-id: 989065
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/ext4/ialloc.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 158ddf6..a8d1a64 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -864,6 +864,10 @@ got:
struct buffer_head *block_bitmap_bh;

block_bitmap_bh = ext4_read_block_bitmap(sb, group);
+ if (!block_bitmap_bh) {
+ err = -EIO;
+ goto out;
+ }
BUFFER_TRACE(block_bitmap_bh, "get block bitmap access");
err = ext4_journal_get_write_access(handle, block_bitmap_bh);
if (err) {
--
1.9.1

2014-11-06 23:18:22

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 125/162] cpc925_edac: Report UE events properly

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Baron <[email protected]>

commit fa19ac4b92bc2b5024af3e868f41f81fa738567a upstream.

Fix UE event being reported as HW_EVENT_ERR_CORRECTED.

Signed-off-by: Jason Baron <[email protected]>
Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/edac/cpc925_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/cpc925_edac.c b/drivers/edac/cpc925_edac.c
index df6575f..682288c 100644
--- a/drivers/edac/cpc925_edac.c
+++ b/drivers/edac/cpc925_edac.c
@@ -562,7 +562,7 @@ static void cpc925_mc_check(struct mem_ctl_info *mci)

if (apiexcp & UECC_EXCP_DETECTED) {
cpc925_mc_printk(mci, KERN_INFO, "DRAM UECC Fault\n");
- edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 1,
+ edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1,
pfn, offset, 0,
csrow, -1, -1,
mci->ctl_name, "");
--
1.9.1

2014-11-06 23:18:45

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 131/162] nfsd4: fix crash on unknown operation number

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: "J. Bruce Fields" <[email protected]>

commit 51904b08072a8bf2b9ed74d1bd7a5300a614471d upstream.

Unknown operation numbers are caught in nfsd4_decode_compound() which
sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The
error causes the main loop in nfsd4_proc_compound() to skip most
processing. But nfsd4_proc_compound also peeks ahead at the next
operation in one case and doesn't take similar precautions there.

Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/nfsd/nfs4proc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 08c8e02..25024d5 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1233,7 +1233,8 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp)
*/
if (argp->opcnt == resp->opcnt)
return false;
-
+ if (next->opnum == OP_ILLEGAL)
+ return false;
nextd = OPDESC(next);
/*
* Rest of 2.6.3.1.1: certain operations will return WRONGSEC
--
1.9.1

2014-11-06 22:38:55

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 117/162] quota: Properly return errors from dquot_writeback_dquots()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 474d2605d119479e5aa050f738632e63589d4bb5 upstream.

Due to switched left and right sides of an assignment,
dquot_writeback_dquots() never returned an error. This could result in
errors during quota writeback not being reported to userspace properly.
Fix it.

Coverity-id: 1226884
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
fs/quota/dquot.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index ce87c90..89da957 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -637,7 +637,7 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
dqstats_inc(DQST_LOOKUPS);
err = sb->dq_op->write_dquot(dquot);
if (!ret && err)
- err = ret;
+ ret = err;
dqput(dquot);
spin_lock(&dq_list_lock);
}
--
1.9.1
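
The intent of that loop is the common "remember the first error, keep going"
pattern; with the operands swapped, err was overwritten with the still-zero
ret and the function always reported success. A minimal sketch of the
corrected pattern (the item/write_one names are made up for illustration):

  struct item {
          int id;
  };

  /* stand-in for sb->dq_op->write_dquot(): fail for negative ids */
  static int write_one(struct item *it)
  {
          return it->id < 0 ? -5 /* -EIO */ : 0;
  }

  static int writeback_all(struct item *items, int n)
  {
          int ret = 0, err, i;

          for (i = 0; i < n; i++) {
                  err = write_one(&items[i]);
                  /* buggy form was: err = ret;  (err clobbered, ret stays 0) */
                  if (!ret && err)
                          ret = err;      /* keep the first failure, continue */
          }
          return ret;
  }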

2014-11-06 23:19:38

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 130/162] usb: gadget: udc: core: fix kernel oops with soft-connect

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit bfa6b18c680450c17512c741ed1d818695747621 upstream.

Currently, there's no guarantee that udc->driver
will be valid when using the soft_connect sysfs
interface. In fact, we can very easily trigger
a NULL pointer dereference by trying to disconnect
when a gadget driver isn't loaded.

Fix this bug:

~# echo disconnect > soft_connect
[ 33.685743] Unable to handle kernel NULL pointer dereference at virtual address 00000014
[ 33.694221] pgd = ed0cc000
[ 33.697174] [00000014] *pgd=ae351831, *pte=00000000, *ppte=00000000
[ 33.703766] Internal error: Oops: 17 [#1] SMP ARM
[ 33.708697] Modules linked in: xhci_plat_hcd xhci_hcd snd_soc_davinci_mcasp snd_soc_tlv320aic3x snd_soc_edma snd_soc_omap snd_soc_evm snd_soc_core dwc3 snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd lis3lv02d_i2c matrix_keypad lis3lv02d dwc3_omap input_polldev soundcore
[ 33.734372] CPU: 0 PID: 1457 Comm: bash Not tainted 3.17.0-09740-ga93416e-dirty #345
[ 33.742457] task: ee71ce00 ti: ee68a000 task.ti: ee68a000
[ 33.748116] PC is at usb_udc_softconn_store+0xa4/0xec
[ 33.753416] LR is at mark_held_locks+0x78/0x90
[ 33.758057] pc : [<c04df128>] lr : [<c00896a4>] psr: 20000013
[ 33.758057] sp : ee68bec8 ip : c0c00008 fp : ee68bee4
[ 33.770050] r10: ee6b394c r9 : ee68bf80 r8 : ee6062c0
[ 33.775508] r7 : 00000000 r6 : ee6062c0 r5 : 0000000b r4 : ee739408
[ 33.782346] r3 : 00000000 r2 : 00000000 r1 : ee71d390 r0 : ee664170
[ 33.789168] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 33.796636] Control: 10c5387d Table: ad0cc059 DAC: 00000015
[ 33.802638] Process bash (pid: 1457, stack limit = 0xee68a248)
[ 33.808740] Stack: (0xee68bec8 to 0xee68c000)
[ 33.813299] bec0: 0000000b c0411284 ee6062c0 00000000 ee68bef4 ee68bee8
[ 33.821862] bee0: c04112ac c04df090 ee68bf14 ee68bef8 c01c2868 c0411290 0000000b ee6b3940
[ 33.830419] bf00: 00000000 00000000 ee68bf4c ee68bf18 c01c1a24 c01c2818 00000000 00000000
[ 33.838990] bf20: ee61b940 ee2f47c0 0000000b 000ce408 ee68bf80 c000f304 ee68a000 00000000
[ 33.847544] bf40: ee68bf7c ee68bf50 c0152dd8 c01c1960 ee68bf7c c0170af8 ee68bf7c ee2f47c0
[ 33.856099] bf60: ee2f47c0 000ce408 0000000b c000f304 ee68bfa4 ee68bf80 c0153330 c0152d34
[ 33.864653] bf80: 00000000 00000000 0000000b 000ce408 b6e7fb50 00000004 00000000 ee68bfa8
[ 33.873204] bfa0: c000f080 c01532e8 0000000b 000ce408 00000001 000ce408 0000000b 00000000
[ 33.881763] bfc0: 0000000b 000ce408 b6e7fb50 00000004 0000000b 00000000 000c5758 00000000
[ 33.890319] bfe0: 00000000 bec2c924 b6de422d b6e1d226 40000030 00000001 75716d2f 00657565
[ 33.898890] [<c04df128>] (usb_udc_softconn_store) from [<c04112ac>] (dev_attr_store+0x28/0x34)
[ 33.907920] [<c04112ac>] (dev_attr_store) from [<c01c2868>] (sysfs_kf_write+0x5c/0x60)
[ 33.916200] [<c01c2868>] (sysfs_kf_write) from [<c01c1a24>] (kernfs_fop_write+0xd0/0x194)
[ 33.924773] [<c01c1a24>] (kernfs_fop_write) from [<c0152dd8>] (vfs_write+0xb0/0x1bc)
[ 33.932874] [<c0152dd8>] (vfs_write) from [<c0153330>] (SyS_write+0x54/0xb0)
[ 33.940247] [<c0153330>] (SyS_write) from [<c000f080>] (ret_fast_syscall+0x0/0x48)
[ 33.948160] Code: e1a01007 e12fff33 e5140004 e5143008 (e5933014)
[ 33.954625] ---[ end trace f849bead94eab7ea ]---

Fixes: 2ccea03 (usb: gadget: introduce UDC Class)
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/gadget/udc-core.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/usb/gadget/udc-core.c b/drivers/usb/gadget/udc-core.c
index 27768a7..9ce0b13 100644
--- a/drivers/usb/gadget/udc-core.c
+++ b/drivers/usb/gadget/udc-core.c
@@ -456,6 +456,11 @@ static ssize_t usb_udc_softconn_store(struct device *dev,
{
struct usb_udc *udc = container_of(dev, struct usb_udc, dev);

+ if (!udc->driver) {
+ dev_err(dev, "soft-connect without a gadget driver\n");
+ return -EOPNOTSUPP;
+ }
+
if (sysfs_streq(buf, "connect")) {
usb_gadget_udc_start(udc->gadget, udc->driver);
usb_gadget_connect(udc->gadget);
--
1.9.1

2014-11-06 23:20:13

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 122/162] x86, apic: Handle a bad TSC more gracefully

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit b47dcbdc5161d3d5756f430191e2840d9b855492 upstream.

If the TSC is unusable or disabled, then this patch fixes:

- Confusion while trying to clear old APIC interrupts.
- Division by zero and incorrect programming of the TSC deadline
timer.

This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC. The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline

This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Bandan Das <[email protected]>
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
arch/x86/kernel/apic/apic.c | 4 ++--
arch/x86/kernel/tsc.c | 5 ++++-
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d278736..f9e7786 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1283,7 +1283,7 @@ void setup_local_APIC(void)
unsigned int value, queued;
int i, j, acked = 0;
unsigned long long tsc = 0, ntsc;
- long long max_loops = cpu_khz;
+ long long max_loops = cpu_khz ? cpu_khz : 1000000;

if (cpu_has_tsc)
rdtscll(tsc);
@@ -1380,7 +1380,7 @@ void setup_local_APIC(void)
break;
}
if (queued) {
- if (cpu_has_tsc) {
+ if (cpu_has_tsc && cpu_khz) {
rdtscll(ntsc);
max_loops = (cpu_khz << 10) - (ntsc - tsc);
} else
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 66af9af..dffebdd 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -974,14 +974,17 @@ void __init tsc_init(void)

x86_init.timers.tsc_pre_init();

- if (!cpu_has_tsc)
+ if (!cpu_has_tsc) {
+ setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
+ }

tsc_khz = x86_platform.calibrate_tsc();
cpu_khz = tsc_khz;

if (!tsc_khz) {
mark_tsc_unstable("could not calculate TSC khz");
+ setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
}

--
1.9.1

2014-11-06 23:21:01

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 134/162] iwlwifi: dvm: drop non VO frames when flushing

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <[email protected]>

commit a0855054e59b0c5b2b00237fdb5147f7bcc18efb upstream.

When mac80211 wants to ensure that a frame is sent, it calls
the flush() callback. Until now, iwldvm implemented this by
waiting until all the frames were sent (ACKed or timed out).
In case of a weak signal, this can take a significant amount
of time, delaying the next connection (in case of roaming).
Many users have reported that the flush would take too long,
leading to the following error messages being printed:

iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 2
iwlwifi 0000:03:00.0: Current SW read_ptr 161 write_ptr 201
iwl data: 00000000: 00 00 00 00 00 00 00 00 fe ff 01 00 00 00 00 00
[snip]
iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000
[snip]
iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [9,9]
[snip]

Instead of waiting for these packets, simply drop them. This
significantly improves the responsiveness of the network.
Note that all the queues are flushed except the VO one. It
is not typically used by applications, and it likely
contains management frames that are useful for connection
or roaming.

This bug is tracked here:
https://bugzilla.kernel.org/show_bug.cgi?id=56581

But it is duplicated in distributions' trackers.
A simple search in Ubuntu's database led to these bugs:

https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1270808
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1305406
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356236
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1360597
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1361809

Depends-on: 77be2c54c5bd ("mac80211: add vif to flush call")
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/wireless/iwlwifi/dvm/mac80211.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/dvm/mac80211.c b/drivers/net/wireless/iwlwifi/dvm/mac80211.c
index bf1107a..7f0846e 100644
--- a/drivers/net/wireless/iwlwifi/dvm/mac80211.c
+++ b/drivers/net/wireless/iwlwifi/dvm/mac80211.c
@@ -1103,6 +1103,7 @@ static void iwlagn_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
u32 queues, bool drop)
{
struct iwl_priv *priv = IWL_MAC80211_GET_DVM(hw);
+ u32 scd_queues;

mutex_lock(&priv->mutex);
IWL_DEBUG_MAC80211(priv, "enter\n");
@@ -1116,18 +1117,19 @@ static void iwlagn_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
goto done;
}

- /*
- * mac80211 will not push any more frames for transmit
- * until the flush is completed
- */
- if (drop) {
- IWL_DEBUG_MAC80211(priv, "send flush command\n");
- if (iwlagn_txfifo_flush(priv, 0)) {
- IWL_ERR(priv, "flush request fail\n");
- goto done;
- }
+ scd_queues = BIT(priv->cfg->base_params->num_of_queues) - 1;
+ scd_queues &= ~(BIT(IWL_IPAN_CMD_QUEUE_NUM) |
+ BIT(IWL_DEFAULT_CMD_QUEUE_NUM));
+
+ if (vif)
+ scd_queues &= ~BIT(vif->hw_queue[IEEE80211_AC_VO]);
+
+ IWL_DEBUG_TX_QUEUES(priv, "Flushing SCD queues: 0x%x\n", scd_queues);
+ if (iwlagn_txfifo_flush(priv, scd_queues)) {
+ IWL_ERR(priv, "flush request fail\n");
+ goto done;
}
- IWL_DEBUG_MAC80211(priv, "wait transmit/flush all frames\n");
+ IWL_DEBUG_TX_QUEUES(priv, "wait transmit/flush all frames\n");
iwl_trans_wait_tx_queue_empty(priv->trans);
done:
mutex_unlock(&priv->mutex);
--
1.9.1

2014-11-06 23:21:34

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 133/162] mac80211: add vif to flush call

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <[email protected]>

commit 77be2c54c5bd26279abc13807398771d80cda37a upstream.

This will allow the low level driver to make decisions based
on the vif, such as which queues to flush, etc.
Since the vif might be NULL, we can't add it to the tracing
functions.

Signed-off-by: Emmanuel Grumbach <[email protected]>
[fix staging rtl8821ae driver]
Signed-off-by: Johannes Berg <[email protected]>
[ kamal: 3.13-stable prereq backport (omitted rtl8821ae) for
a085505 iwlwifi: dvm: drop non VO frames when flushing ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/wireless/ath/ar5523/ar5523.c | 3 ++-
drivers/net/wireless/ath/ath10k/mac.c | 3 ++-
drivers/net/wireless/ath/ath9k/main.c | 3 ++-
drivers/net/wireless/ath/carl9170/main.c | 4 +++-
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c | 3 ++-
drivers/net/wireless/cw1200/sta.c | 3 ++-
drivers/net/wireless/cw1200/sta.h | 3 ++-
drivers/net/wireless/iwlegacy/common.c | 3 ++-
drivers/net/wireless/iwlegacy/common.h | 3 ++-
drivers/net/wireless/iwlwifi/dvm/mac80211.c | 3 ++-
drivers/net/wireless/mac80211_hwsim.c | 4 +++-
drivers/net/wireless/p54/main.c | 3 ++-
drivers/net/wireless/rt2x00/rt2x00.h | 3 ++-
drivers/net/wireless/rt2x00/rt2x00mac.c | 3 ++-
drivers/net/wireless/rtlwifi/core.c | 3 ++-
drivers/net/wireless/ti/wlcore/main.c | 3 ++-
include/net/mac80211.h | 4 +++-
net/mac80211/driver-ops.h | 8 +++++++-
net/mac80211/util.c | 2 +-
19 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c
index 9cc2b91..5d219a4 100644
--- a/drivers/net/wireless/ath/ar5523/ar5523.c
+++ b/drivers/net/wireless/ath/ar5523/ar5523.c
@@ -1091,7 +1091,8 @@ static int ar5523_set_rts_threshold(struct ieee80211_hw *hw, u32 value)
return ret;
}

-static void ar5523_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void ar5523_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct ar5523 *ar = hw->priv;

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 97ac8c8..6f8e6ad 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -2939,7 +2939,8 @@ static int ath10k_set_frag_threshold(struct ieee80211_hw *hw, u32 value)
return ret;
}

-static void ath10k_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct ath10k *ar = hw->priv;
bool skip;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 21aa09e..5db6849 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1818,7 +1818,8 @@ static void ath9k_set_coverage_class(struct ieee80211_hw *hw, u8 coverage_class)
mutex_unlock(&sc->mutex);
}

-static void ath9k_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void ath9k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct ath_softc *sc = hw->priv;
struct ath_hw *ah = sc->sc_ah;
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 349fa22..b795b18 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -1708,7 +1708,9 @@ found:
return 0;
}

-static void carl9170_op_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void carl9170_op_flush(struct ieee80211_hw *hw,
+ struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct ar9170 *ar = hw->priv;
unsigned int vid;
diff --git a/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c b/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
index edc5d10..778f593 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
@@ -899,7 +899,8 @@ static bool brcms_tx_flush_completed(struct brcms_info *wl)
return result;
}

-static void brcms_ops_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void brcms_ops_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct brcms_info *wl = hw->priv;
int ret;
diff --git a/drivers/net/wireless/cw1200/sta.c b/drivers/net/wireless/cw1200/sta.c
index 010b252..949dc4e 100644
--- a/drivers/net/wireless/cw1200/sta.c
+++ b/drivers/net/wireless/cw1200/sta.c
@@ -935,7 +935,8 @@ static int __cw1200_flush(struct cw1200_common *priv, bool drop)
return ret;
}

-void cw1200_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+void cw1200_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct cw1200_common *priv = hw->priv;

diff --git a/drivers/net/wireless/cw1200/sta.h b/drivers/net/wireless/cw1200/sta.h
index 35babb6..b7e386b 100644
--- a/drivers/net/wireless/cw1200/sta.h
+++ b/drivers/net/wireless/cw1200/sta.h
@@ -40,7 +40,8 @@ int cw1200_set_key(struct ieee80211_hw *dev, enum set_key_cmd cmd,

int cw1200_set_rts_threshold(struct ieee80211_hw *hw, u32 value);

-void cw1200_flush(struct ieee80211_hw *hw, u32 queues, bool drop);
+void cw1200_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop);

u64 cw1200_prepare_multicast(struct ieee80211_hw *hw,
struct netdev_hw_addr_list *mc_list);
diff --git a/drivers/net/wireless/iwlegacy/common.c b/drivers/net/wireless/iwlegacy/common.c
index b03e22e..d9d22e5 100644
--- a/drivers/net/wireless/iwlegacy/common.c
+++ b/drivers/net/wireless/iwlegacy/common.c
@@ -4702,7 +4702,8 @@ out:
}
EXPORT_SYMBOL(il_mac_change_interface);

-void il_mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+void il_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct il_priv *il = hw->priv;
unsigned long timeout = jiffies + msecs_to_jiffies(500);
diff --git a/drivers/net/wireless/iwlegacy/common.h b/drivers/net/wireless/iwlegacy/common.h
index ad123d6..108e037 100644
--- a/drivers/net/wireless/iwlegacy/common.h
+++ b/drivers/net/wireless/iwlegacy/common.h
@@ -1722,7 +1722,8 @@ void il_mac_remove_interface(struct ieee80211_hw *hw,
struct ieee80211_vif *vif);
int il_mac_change_interface(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
enum nl80211_iftype newtype, bool newp2p);
-void il_mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop);
+void il_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop);
int il_alloc_txq_mem(struct il_priv *il);
void il_free_txq_mem(struct il_priv *il);

diff --git a/drivers/net/wireless/iwlwifi/dvm/mac80211.c b/drivers/net/wireless/iwlwifi/dvm/mac80211.c
index d6e6405..bf1107a 100644
--- a/drivers/net/wireless/iwlwifi/dvm/mac80211.c
+++ b/drivers/net/wireless/iwlwifi/dvm/mac80211.c
@@ -1099,7 +1099,8 @@ static void iwlagn_configure_filter(struct ieee80211_hw *hw,
FIF_BCN_PRBRESP_PROMISC | FIF_CONTROL;
}

-static void iwlagn_mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void iwlagn_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct iwl_priv *priv = IWL_MAC80211_GET_DVM(hw);

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index a1b32ee..1a0c2bb 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1466,7 +1466,9 @@ static int mac80211_hwsim_ampdu_action(struct ieee80211_hw *hw,
return 0;
}

-static void mac80211_hwsim_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void mac80211_hwsim_flush(struct ieee80211_hw *hw,
+ struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
/* Not implemented, queues only on kernel side */
}
diff --git a/drivers/net/wireless/p54/main.c b/drivers/net/wireless/p54/main.c
index 067e6f2..c214332 100644
--- a/drivers/net/wireless/p54/main.c
+++ b/drivers/net/wireless/p54/main.c
@@ -670,7 +670,8 @@ static unsigned int p54_flush_count(struct p54_common *priv)
return total;
}

-static void p54_flush(struct ieee80211_hw *dev, u32 queues, bool drop)
+static void p54_flush(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct p54_common *priv = dev->priv;
unsigned int total, i;
diff --git a/drivers/net/wireless/rt2x00/rt2x00.h b/drivers/net/wireless/rt2x00/rt2x00.h
index c06ffb7..6a64211 100644
--- a/drivers/net/wireless/rt2x00/rt2x00.h
+++ b/drivers/net/wireless/rt2x00/rt2x00.h
@@ -1451,7 +1451,8 @@ int rt2x00mac_conf_tx(struct ieee80211_hw *hw,
struct ieee80211_vif *vif, u16 queue,
const struct ieee80211_tx_queue_params *params);
void rt2x00mac_rfkill_poll(struct ieee80211_hw *hw);
-void rt2x00mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop);
+void rt2x00mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop);
int rt2x00mac_set_antenna(struct ieee80211_hw *hw, u32 tx_ant, u32 rx_ant);
int rt2x00mac_get_antenna(struct ieee80211_hw *hw, u32 *tx_ant, u32 *rx_ant);
void rt2x00mac_get_ringparam(struct ieee80211_hw *hw,
diff --git a/drivers/net/wireless/rt2x00/rt2x00mac.c b/drivers/net/wireless/rt2x00/rt2x00mac.c
index bc21fae..04896fa 100644
--- a/drivers/net/wireless/rt2x00/rt2x00mac.c
+++ b/drivers/net/wireless/rt2x00/rt2x00mac.c
@@ -753,7 +753,8 @@ void rt2x00mac_rfkill_poll(struct ieee80211_hw *hw)
}
EXPORT_SYMBOL_GPL(rt2x00mac_rfkill_poll);

-void rt2x00mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+void rt2x00mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct rt2x00_dev *rt2x00dev = hw->priv;
struct data_queue *queue;
diff --git a/drivers/net/wireless/rtlwifi/core.c b/drivers/net/wireless/rtlwifi/core.c
index 2d337a0..2b16dfb 100644
--- a/drivers/net/wireless/rtlwifi/core.c
+++ b/drivers/net/wireless/rtlwifi/core.c
@@ -1309,7 +1309,8 @@ static void rtl_op_rfkill_poll(struct ieee80211_hw *hw)
* before switch channel or power save, or tx buffer packet
* maybe send after offchannel or rf sleep, this may cause
* dis-association by AP */
-static void rtl_op_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void rtl_op_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct rtl_priv *rtlpriv = rtl_priv(hw);

diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
index 0368b9c..8f63463 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -5147,7 +5147,8 @@ out:
mutex_unlock(&wl->mutex);
}

-static void wlcore_op_flush(struct ieee80211_hw *hw, u32 queues, bool drop)
+static void wlcore_op_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
{
struct wl1271 *wl = hw->priv;

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 7ceed99..3e95bff 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -2546,6 +2546,7 @@ enum ieee80211_roc_type {
* of queues to flush, which is useful if different virtual interfaces
* use different hardware queues; it may also indicate all queues.
* If the parameter @drop is set to %true, pending frames may be dropped.
+ * Note that vif can be NULL.
* The callback can sleep.
*
* @channel_switch: Drivers that need (or want) to offload the channel
@@ -2809,7 +2810,8 @@ struct ieee80211_ops {
struct netlink_callback *cb,
void *data, int len);
#endif
- void (*flush)(struct ieee80211_hw *hw, u32 queues, bool drop);
+ void (*flush)(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop);
void (*channel_switch)(struct ieee80211_hw *hw,
struct ieee80211_channel_switch *ch_switch);
int (*napi_poll)(struct ieee80211_hw *hw, int budget);
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h
index 5d03c47..8f12800 100644
--- a/net/mac80211/driver-ops.h
+++ b/net/mac80211/driver-ops.h
@@ -722,13 +722,19 @@ static inline void drv_rfkill_poll(struct ieee80211_local *local)
}

static inline void drv_flush(struct ieee80211_local *local,
+ struct ieee80211_sub_if_data *sdata,
u32 queues, bool drop)
{
+ struct ieee80211_vif *vif = sdata ? &sdata->vif : NULL;
+
might_sleep();

+ if (sdata)
+ check_sdata_in_driver(sdata);
+
trace_drv_flush(local, queues, drop);
if (local->ops->flush)
- local->ops->flush(&local->hw, queues, drop);
+ local->ops->flush(&local->hw, vif, queues, drop);
trace_drv_return_void(local);
}

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 9f9b9bd..32cbc9e 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -558,7 +558,7 @@ void ieee80211_flush_queues(struct ieee80211_local *local,
ieee80211_stop_queues_by_reason(&local->hw, IEEE80211_MAX_QUEUE_MAP,
IEEE80211_QUEUE_STOP_REASON_FLUSH);

- drv_flush(local, queues, false);
+ drv_flush(local, sdata, queues, false);

ieee80211_wake_queues_by_reason(&local->hw, IEEE80211_MAX_QUEUE_MAP,
IEEE80211_QUEUE_STOP_REASON_FLUSH);
--
1.9.1

2014-11-06 22:38:26

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 123/162] i3200_edac: Report CE events properly

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Baron <[email protected]>

commit 8a3f075d6c9b3612b4a5fb2af8db82b38b20caf0 upstream.

Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.

Signed-off-by: Jason Baron <[email protected]>
Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/edac/i3200_edac.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index be10a74..7d5b369 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -242,11 +242,11 @@ static void i3200_process_error_info(struct mem_ctl_info *mci,
-1, -1,
"i3000 UE", "");
} else if (log & I3200_ECCERRLOG_CE) {
- edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1,
+ edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 1,
0, 0, eccerrlog_syndrome(log),
eccerrlog_row(channel, log),
-1, -1,
- "i3000 UE", "");
+ "i3000 CE", "");
}
}
}
--
1.9.1

2014-11-06 23:22:29

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 127/162] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream.

When sg_scsi_ioctl() fails to prepare the request to submit in
blk_rq_map_kern(), we jump to a label where we just end up copying a
(luckily zeroed-out) kernel buffer to userspace instead of reporting the
error. Fix the problem by jumping to the right label.

CC: Jens Axboe <[email protected]>
CC: [email protected]
Coverity-id: 1226871
Signed-off-by: Jan Kara <[email protected]>

Fixed up the, now unused, out label.

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
block/scsi_ioctl.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 625e3e4..91cceb2 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -487,7 +487,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,

if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) {
err = DRIVER_ERROR << 24;
- goto out;
+ goto error;
}

memset(sense, 0, sizeof(sense));
@@ -497,7 +497,6 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,

blk_execute_rq(q, disk, rq, 0);

-out:
err = rq->errors & 0xff; /* only 8 bit SCSI status */
if (err) {
if (rq->sense_len && rq->sense) {
--
1.9.1

2014-11-06 22:37:42

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 016/162] tg3: Work around HW/FW limitations with vlan encapsulated frames

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <[email protected]>

[ Upstream commit 476c18850c6cbaa3f2bb661ae9710645081563b9 ]

TG3 appears to have an issue performing TSO and checksum offloading
correctly when the frame has been vlan encapsulated (non-accelerated).
In these cases, the TCP checksum is not correctly updated.

This patch attempts to work around this issue. After the patch,
802.1ad vlans start working correctly over tg3 devices.

CC: Prashant Sreedharan <[email protected]>
CC: Michael Chan <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index e71ee9f..072f401 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7899,8 +7899,6 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)

entry = tnapi->tx_prod;
base_flags = 0;
- if (skb->ip_summed == CHECKSUM_PARTIAL)
- base_flags |= TXD_FLAG_TCPUDP_CSUM;

mss = skb_shinfo(skb)->gso_size;
if (mss) {
@@ -7916,6 +7914,13 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)

hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb) - ETH_HLEN;

+ /* HW/FW can not correctly segment packets that have been
+ * vlan encapsulated.
+ */
+ if (skb->protocol == htons(ETH_P_8021Q) ||
+ skb->protocol == htons(ETH_P_8021AD))
+ return tg3_tso_bug(tp, skb);
+
if (!skb_is_gso_v6(skb)) {
iph->check = 0;
iph->tot_len = htons(mss + hdr_len);
@@ -7962,6 +7967,17 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
base_flags |= tsflags << 12;
}
}
+ } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ /* HW/FW can not correctly checksum packets that have been
+ * vlan encapsulated.
+ */
+ if (skb->protocol == htons(ETH_P_8021Q) ||
+ skb->protocol == htons(ETH_P_8021AD)) {
+ if (skb_checksum_help(skb))
+ goto drop;
+ } else {
+ base_flags |= TXD_FLAG_TCPUDP_CSUM;
+ }
}

if (tg3_flag(tp, USE_JUMBO_BDFLAG) &&
--
1.9.1

2014-11-06 23:23:40

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 003/162] myri10ge: check for DMA mapping errors

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Stanislaw Gruszka <[email protected]>

[ Upstream commit 10545937e866ccdbb7ab583031dbdcc6b14e4eb4 ]

On IOMMU systems DMA mapping can fail; we need to check for
that possibility.

Signed-off-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 88 ++++++++++++++++--------
1 file changed, 58 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 68026f7..4a474dd 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -872,6 +872,10 @@ static int myri10ge_dma_test(struct myri10ge_priv *mgp, int test_type)
return -ENOMEM;
dmatest_bus = pci_map_page(mgp->pdev, dmatest_page, 0, PAGE_SIZE,
DMA_BIDIRECTIONAL);
+ if (unlikely(pci_dma_mapping_error(mgp->pdev, dmatest_bus))) {
+ __free_page(dmatest_page);
+ return -ENOMEM;
+ }

/* Run a small DMA test.
* The magic multipliers to the length tell the firmware
@@ -1293,6 +1297,7 @@ myri10ge_alloc_rx_pages(struct myri10ge_priv *mgp, struct myri10ge_rx_buf *rx,
int bytes, int watchdog)
{
struct page *page;
+ dma_addr_t bus;
int idx;
#if MYRI10GE_ALLOC_SIZE > 4096
int end_offset;
@@ -1317,11 +1322,21 @@ myri10ge_alloc_rx_pages(struct myri10ge_priv *mgp, struct myri10ge_rx_buf *rx,
rx->watchdog_needed = 1;
return;
}
+
+ bus = pci_map_page(mgp->pdev, page, 0,
+ MYRI10GE_ALLOC_SIZE,
+ PCI_DMA_FROMDEVICE);
+ if (unlikely(pci_dma_mapping_error(mgp->pdev, bus))) {
+ __free_pages(page, MYRI10GE_ALLOC_ORDER);
+ if (rx->fill_cnt - rx->cnt < 16)
+ rx->watchdog_needed = 1;
+ return;
+ }
+
rx->page = page;
rx->page_offset = 0;
- rx->bus = pci_map_page(mgp->pdev, page, 0,
- MYRI10GE_ALLOC_SIZE,
- PCI_DMA_FROMDEVICE);
+ rx->bus = bus;
+
}
rx->info[idx].page = rx->page;
rx->info[idx].page_offset = rx->page_offset;
@@ -2765,6 +2780,35 @@ myri10ge_submit_req(struct myri10ge_tx_buf *tx, struct mcp_kreq_ether_send *src,
mb();
}

+static void myri10ge_unmap_tx_dma(struct myri10ge_priv *mgp,
+ struct myri10ge_tx_buf *tx, int idx)
+{
+ unsigned int len;
+ int last_idx;
+
+ /* Free any DMA resources we've alloced and clear out the skb slot */
+ last_idx = (idx + 1) & tx->mask;
+ idx = tx->req & tx->mask;
+ do {
+ len = dma_unmap_len(&tx->info[idx], len);
+ if (len) {
+ if (tx->info[idx].skb != NULL)
+ pci_unmap_single(mgp->pdev,
+ dma_unmap_addr(&tx->info[idx],
+ bus), len,
+ PCI_DMA_TODEVICE);
+ else
+ pci_unmap_page(mgp->pdev,
+ dma_unmap_addr(&tx->info[idx],
+ bus), len,
+ PCI_DMA_TODEVICE);
+ dma_unmap_len_set(&tx->info[idx], len, 0);
+ tx->info[idx].skb = NULL;
+ }
+ idx = (idx + 1) & tx->mask;
+ } while (idx != last_idx);
+}
+
/*
* Transmit a packet. We need to split the packet so that a single
* segment does not cross myri10ge->tx_boundary, so this makes segment
@@ -2788,7 +2832,7 @@ static netdev_tx_t myri10ge_xmit(struct sk_buff *skb,
u32 low;
__be32 high_swapped;
unsigned int len;
- int idx, last_idx, avail, frag_cnt, frag_idx, count, mss, max_segments;
+ int idx, avail, frag_cnt, frag_idx, count, mss, max_segments;
u16 pseudo_hdr_offset, cksum_offset, queue;
int cum_len, seglen, boundary, rdma_count;
u8 flags, odd_flag;
@@ -2885,9 +2929,12 @@ again:

/* map the skb for DMA */
len = skb_headlen(skb);
+ bus = pci_map_single(mgp->pdev, skb->data, len, PCI_DMA_TODEVICE);
+ if (unlikely(pci_dma_mapping_error(mgp->pdev, bus)))
+ goto drop;
+
idx = tx->req & tx->mask;
tx->info[idx].skb = skb;
- bus = pci_map_single(mgp->pdev, skb->data, len, PCI_DMA_TODEVICE);
dma_unmap_addr_set(&tx->info[idx], bus, bus);
dma_unmap_len_set(&tx->info[idx], len, len);

@@ -2986,12 +3033,16 @@ again:
break;

/* map next fragment for DMA */
- idx = (count + tx->req) & tx->mask;
frag = &skb_shinfo(skb)->frags[frag_idx];
frag_idx++;
len = skb_frag_size(frag);
bus = skb_frag_dma_map(&mgp->pdev->dev, frag, 0, len,
DMA_TO_DEVICE);
+ if (unlikely(pci_dma_mapping_error(mgp->pdev, bus))) {
+ myri10ge_unmap_tx_dma(mgp, tx, idx);
+ goto drop;
+ }
+ idx = (count + tx->req) & tx->mask;
dma_unmap_addr_set(&tx->info[idx], bus, bus);
dma_unmap_len_set(&tx->info[idx], len, len);
}
@@ -3022,31 +3073,8 @@ again:
return NETDEV_TX_OK;

abort_linearize:
- /* Free any DMA resources we've alloced and clear out the skb
- * slot so as to not trip up assertions, and to avoid a
- * double-free if linearizing fails */
+ myri10ge_unmap_tx_dma(mgp, tx, idx);

- last_idx = (idx + 1) & tx->mask;
- idx = tx->req & tx->mask;
- tx->info[idx].skb = NULL;
- do {
- len = dma_unmap_len(&tx->info[idx], len);
- if (len) {
- if (tx->info[idx].skb != NULL)
- pci_unmap_single(mgp->pdev,
- dma_unmap_addr(&tx->info[idx],
- bus), len,
- PCI_DMA_TODEVICE);
- else
- pci_unmap_page(mgp->pdev,
- dma_unmap_addr(&tx->info[idx],
- bus), len,
- PCI_DMA_TODEVICE);
- dma_unmap_len_set(&tx->info[idx], len, 0);
- tx->info[idx].skb = NULL;
- }
- idx = (idx + 1) & tx->mask;
- } while (idx != last_idx);
if (skb_is_gso(skb)) {
netdev_err(mgp->dev, "TSO but wanted to linearize?!?!?\n");
goto drop;
--
1.9.1

2014-11-06 23:23:42

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 014/162] bridge: Fix br_should_learn to check vlan_enabled

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <[email protected]>

[ Upstream commit c095f248e63ada504dd90c90baae673ae10ee3fe ]

As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will
not be initialized in br_should_learn() as that function
is called only from br_handle_local_finish(). That is
an input handler for link-local ethernet traffic, so it is perfectly
correct to check br->vlan_enabled here.

Reported-by: Toshiaki Makita <[email protected]>
Fixes: 20adfa1 bridge: Check if vlan filtering is enabled only once.
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/bridge/br_vlan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index f71b4d8..a22abd1 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -265,7 +265,7 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
struct net_port_vlans *v;

/* If filtering was disabled at input, let it pass. */
- if (!BR_INPUT_SKB_CB(skb)->vlan_filtered)
+ if (!br->vlan_enabled)
return true;

v = rcu_dereference(p->vlan_info);
--
1.9.1

2014-11-06 23:23:36

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 009/162] openvswitch: fix panic with multiple vlan headers

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Benc <[email protected]>

[ Upstream commit 2ba5af42a7b59ef01f9081234d8855140738defd ]

When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may still be non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.

This leads to two things:

1. Part of the resulting ethernet header is in the non-linear part of the
skb. When eth_type_trans is called later as the result of
OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
is in fact accessing random data when it reads past the TPID.

2. network_header points into the ethernet header instead of behind it.
mac_len is set to a wrong value (10), too.

Reported-by: Yulong Pei <[email protected]>
Signed-off-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/openvswitch/actions.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 65cfaa8..07c4ae3 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -42,6 +42,9 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,

static int make_writable(struct sk_buff *skb, int write_len)
{
+ if (!pskb_may_pull(skb, write_len))
+ return -ENOMEM;
+
if (!skb_cloned(skb) || skb_clone_writable(skb, write_len))
return 0;

@@ -70,6 +73,8 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)

vlan_set_encap_proto(skb, vhdr);
skb->mac_header += VLAN_HLEN;
+ if (skb_network_offset(skb) < ETH_HLEN)
+ skb_set_network_header(skb, ETH_HLEN);
skb_reset_mac_len(skb);

return 0;
--
1.9.1

2014-11-06 23:23:34

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 020/162] macvtap: Fix race between device delete and open.

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Vlad Yasevich <[email protected]>

[ Upstream commit 40b8fe45d1f094e3babe7b2dc2b71557ab71401d ]

In macvtap, device delete and open calls can race, and
this causes a list corruption of the vlan queue_list.

The race itself is triggered by the idr accessors
that locate the vlan device. The device is stored
into and removed from the idr under both an rtnl and
a mutex. However, when attempting to locate the device
in the idr, only the mutex is taken. As a result, one cpu
performing a delete may take the rtnl and wait for the mutex,
while another cpu doing an open() will take the idr
mutex first to fetch the device pointer and later take
the rtnl to add a queue for a device which may have
just been deleted.

With this patch, we now hold the rtnl for the duration
of the macvtap_open() call thus making sure that
open will not race with delete.
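
A condensed sketch of the ordering after the fix (function names taken
from the patch below):

	rtnl_lock();				/* serializes with device delete */
	dev = dev_get_by_macvtap_minor(iminor(inode));	/* idr lookup, takes a ref */
	if (dev) {
		err = macvtap_set_queue(dev, file, q);	/* no longer takes rtnl itself */
		dev_put(dev);
	}
	rtnl_unlock();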

CC: Michael S. Tsirkin <[email protected]>
CC: Jason Wang <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/macvtap.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 73ffad8..eed7544 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -108,17 +108,15 @@ out:
return err;
}

+/* Requires RTNL */
static int macvtap_set_queue(struct net_device *dev, struct file *file,
struct macvtap_queue *q)
{
struct macvlan_dev *vlan = netdev_priv(dev);
- int err = -EBUSY;

- rtnl_lock();
if (vlan->numqueues == MAX_MACVTAP_QUEUES)
- goto out;
+ return -EBUSY;

- err = 0;
rcu_assign_pointer(q->vlan, vlan);
rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
sock_hold(&q->sk);
@@ -132,9 +130,7 @@ static int macvtap_set_queue(struct net_device *dev, struct file *file,
vlan->numvtaps++;
vlan->numqueues++;

-out:
- rtnl_unlock();
- return err;
+ return 0;
}

static int macvtap_disable_queue(struct macvtap_queue *q)
@@ -450,11 +446,12 @@ static void macvtap_sock_destruct(struct sock *sk)
static int macvtap_open(struct inode *inode, struct file *file)
{
struct net *net = current->nsproxy->net_ns;
- struct net_device *dev = dev_get_by_macvtap_minor(iminor(inode));
+ struct net_device *dev;
struct macvtap_queue *q;
- int err;
+ int err = -ENODEV;

- err = -ENODEV;
+ rtnl_lock();
+ dev = dev_get_by_macvtap_minor(iminor(inode));
if (!dev)
goto out;

@@ -494,6 +491,7 @@ out:
if (dev)
dev_put(dev);

+ rtnl_unlock();
return err;
}

--
1.9.1

2014-11-06 23:24:40

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 002/162] rtnetlink: fix VF info size

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Benc <[email protected]>

[ Upstream commit 945a36761fd7877660f630bbdeb4ff9ff80d1935 ]

Commit 1d8faf48c74b8 ("net/core: Add VF link state control") added a new
attribute to the IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust the
size of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As a result, we
may trigger warnings in rtnl_getlink and similar functions when many VF
links are enabled, as the information does not fit into the allocated skb.
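
In short, every per-VF attribute emitted by rtnl_fill_ifinfo() needs a
matching nla_total_size() term in the size estimate. A minimal sketch of
the corrected per-VF accounting (helper name hypothetical):

static inline size_t rtnl_vf_info_attr_size(void)
{
	return nla_total_size(sizeof(struct ifla_vf_mac)) +
	       nla_total_size(sizeof(struct ifla_vf_vlan)) +
	       nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
	       nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
	       nla_total_size(sizeof(struct ifla_vf_link_state));
}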

Fixes: 1d8faf48c74b8 ("net/core: Add VF link state control")
Reported-by: Yulong Pei <[email protected]>
Signed-off-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/core/rtnetlink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a7b6520..8463178 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -739,7 +739,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
(nla_total_size(sizeof(struct ifla_vf_mac)) +
nla_total_size(sizeof(struct ifla_vf_vlan)) +
nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
- nla_total_size(sizeof(struct ifla_vf_spoofchk)));
+ nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
+ nla_total_size(sizeof(struct ifla_vf_link_state)));
return size;
} else
return 0;
--
1.9.1

2014-11-06 23:24:45

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 015/162] net: allow macvlans to move to net namespace

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Francesco Ruggeri <[email protected]>

[ Upstream commit 0d0162e7a33d3710b9604e7c68c0f31f5c457428 ]

I cannot move a macvlan interface created on top of a bonding interface
to a different namespace:

% ip netns add dummy0
% ip link add link bond0 mac0 type macvlan
% ip link set mac0 netns dummy0
RTNETLINK answers: Invalid argument
%

The problem seems to be that commit f9399814927a ("bonding: Don't allow
bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
on bonding interfaces, and commit 797f87f83b60 ("macvlan: fix netdev
feature propagation from lower device") causes macvlan interfaces
to inherit its features from the lower device.

NETIF_F_NETNS_LOCAL should not be inherited from the lower device
by a macvlan.
Patch tested on 3.16.

Signed-off-by: Francesco Ruggeri <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/macvlan.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a430b99..0831e2f 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -710,6 +710,7 @@ static netdev_features_t macvlan_fix_features(struct net_device *dev,
features,
mask);
features |= ALWAYS_ON_FEATURES;
+ features &= ~NETIF_F_NETNS_LOCAL;

return features;
}
--
1.9.1

2014-11-06 23:24:59

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 018/162] xfrm: Generate blackhole routes only from route lookup functions

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <[email protected]>

[ Upstream commit f92ee61982d6da15a9e49664ecd6405a15a2ee56 ]

Currently we generate a blackhole route whenever we have
matching policies but cannot resolve the states. Here we assume
that dst_output() is called to kill the blackholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unintentionally.

We fix this by generating blackhole routes only from the
route lookup functions, where we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1d ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/dst.h | 15 ++++++++++++++-
net/ipv4/route.c | 6 +++---
net/ipv6/ip6_output.c | 4 ++--
net/xfrm/xfrm_policy.c | 18 +++++++++++++++++-
4 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 44995c1..8211e32 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,7 +477,16 @@ static inline struct dst_entry *xfrm_lookup(struct net *net,
int flags)
{
return dst_orig;
-}
+}
+
+static inline struct dst_entry *xfrm_lookup_route(struct net *net,
+ struct dst_entry *dst_orig,
+ const struct flowi *fl,
+ struct sock *sk,
+ int flags)
+{
+ return dst_orig;
+}

static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
{
@@ -489,6 +498,10 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
const struct flowi *fl, struct sock *sk,
int flags);

+struct dst_entry *xfrm_lookup_route(struct net *net, struct dst_entry *dst_orig,
+ const struct flowi *fl, struct sock *sk,
+ int flags);
+
/* skb attached with this dst needs transformation if dst->xfrm is valid */
static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
{
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index caadba0..4107687 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2270,9 +2270,9 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
return rt;

if (flp4->flowi4_proto)
- rt = (struct rtable *) xfrm_lookup(net, &rt->dst,
- flowi4_to_flowi(flp4),
- sk, 0);
+ rt = (struct rtable *)xfrm_lookup_route(net, &rt->dst,
+ flowi4_to_flowi(flp4),
+ sk, 0);

return rt;
}
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6622e14..e34ca3d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -991,7 +991,7 @@ struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
if (can_sleep)
fl6->flowi6_flags |= FLOWI_FLAG_CAN_SLEEP;

- return xfrm_lookup(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
+ return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
}
EXPORT_SYMBOL_GPL(ip6_dst_lookup_flow);

@@ -1027,7 +1027,7 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
if (can_sleep)
fl6->flowi6_flags |= FLOWI_FLAG_CAN_SLEEP;

- return xfrm_lookup(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
+ return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
}
EXPORT_SYMBOL_GPL(ip6_sk_dst_lookup_flow);

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 9a91f74..eb918d3 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2150,7 +2150,7 @@ restart:
xfrm_pols_put(pols, drop_pols);
XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTNOSTATES);

- return make_blackhole(net, family, dst_orig);
+ return ERR_PTR(-EREMOTE);
}
if (fl->flowi_flags & FLOWI_FLAG_CAN_SLEEP) {
DECLARE_WAITQUEUE(wait, current);
@@ -2222,6 +2222,22 @@ dropdst:
}
EXPORT_SYMBOL(xfrm_lookup);

+/* Callers of xfrm_lookup_route() must ensure a call to dst_output().
+ * Otherwise we may send out blackholed packets.
+ */
+struct dst_entry *xfrm_lookup_route(struct net *net, struct dst_entry *dst_orig,
+ const struct flowi *fl,
+ struct sock *sk, int flags)
+{
+ struct dst_entry *dst = xfrm_lookup(net, dst_orig, fl, sk, flags);
+
+ if (IS_ERR(dst) && PTR_ERR(dst) == -EREMOTE)
+ return make_blackhole(net, dst_orig->ops->family, dst_orig);
+
+ return dst;
+}
+EXPORT_SYMBOL(xfrm_lookup_route);
+
static inline int
xfrm_secpath_reject(int idx, struct sk_buff *skb, const struct flowi *fl)
{
--
1.9.1

2014-11-06 23:25:00

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 007/162] tcp: fix ssthresh and undo for consecutive short FRTO episodes

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>

[ Upstream commit 0c9ab09223fe9922baeb22546c9a90d774a4bde6 ]

Fix TCP FRTO logic so that it always notices when snd_una advances,
indicating that any RTO after that point will be a new and distinct
loss episode.

Previously there was a very specific sequence that could cause FRTO to
fail to notice a new loss episode had started:

(1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
(2) receiver ACKs packet 1
(3) FRTO sends 2 more packets
(4) RTO timer fires again (should start a new loss episode)

The problem was in step (3) above, where tcp_process_loss() returned
early (in the spot marked "Step 2.b"), so that it never got to the
logic to clear icsk_retransmits. Thus icsk_retransmits stayed
non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
icsk_retransmits, decide that this RTO is not a new episode, and
decide not to cut ssthresh and remember the current cwnd and ssthresh
for undo.

There were two main consequences to the bug that we have
observed. First, ssthresh was not decreased in step (4). Second, when
there was a series of such FRTO (1-4) sequences that happened to be
followed by an FRTO undo, we would restore the cwnd and ssthresh from
before the entire series started (instead of the cwnd and ssthresh
from before the most recent RTO). This could result in cwnd and
ssthresh being restored to values much bigger than the proper values.
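
A condensed sketch of the reset point the patch moves to (taken from the
tcp_ack() hunk below):

	if (after(ack, prior_snd_una)) {
		flag |= FLAG_SND_UNA_ADVANCED;
		/* snd_una moved forward: a later RTO is a new and distinct
		 * loss episode, so it must cut ssthresh and save undo state
		 */
		icsk->icsk_retransmits = 0;
	}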

Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/ipv4/tcp_input.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index abd367b..2ab6b821 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2674,7 +2674,6 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack)
*/
static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)
{
- struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
bool recovered = !before(tp->snd_una, tp->high_seq);

@@ -2700,12 +2699,9 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)

if (recovered) {
/* F-RTO RFC5682 sec 3.1 step 2.a and 1st part of step 3.a */
- icsk->icsk_retransmits = 0;
tcp_try_undo_recovery(sk);
return;
}
- if (flag & FLAG_DATA_ACKED)
- icsk->icsk_retransmits = 0;
if (tcp_is_reno(tp)) {
/* A Reno DUPACK means new data in F-RTO step 2.b above are
* delivered. Lower inflight to clock out (re)tranmissions.
@@ -3394,8 +3390,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
tcp_rearm_rto(sk);

- if (after(ack, prior_snd_una))
+ if (after(ack, prior_snd_una)) {
flag |= FLAG_SND_UNA_ADVANCED;
+ icsk->icsk_retransmits = 0;
+ }

prior_fackets = tp->fackets_out;
prior_in_flight = tcp_packets_in_flight(tp);
--
1.9.1

2014-11-06 23:25:04

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 005/162] sit: Fix ipip6_tunnel_lookup device matching criteria

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Shmulik Ladkani <[email protected]>

[ Upstream commit bc8fc7b8f825ef17a0fb9e68c18ce94fa66ab337 ]

As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, the tunnel's underlying interface (t->parms.link)
is verified to match the incoming traffic's ingress device.

However the comparison was incorrectly based on skb->dev->iflink.

Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.

This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).
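
A worked illustration of the difference (interface indices hypothetical):

/* eth0     : ifindex = 2, iflink = 2  (physical device)
 * eth0.100 : ifindex = 5, iflink = 2  (iflink points at the lower device)
 *
 * For a tunnel bound to eth0.100, i.e. t->parms.link == 5:
 *   old check: dev->iflink  == t->parms.link  ->  2 == 5  -> lookup fails
 *   new check: dev->ifindex == t->parms.link  ->  5 == 5  -> tunnel found
 */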

Signed-off-by: Shmulik Ladkani <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/ipv6/sit.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 8f59675..9568c1f 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -101,19 +101,19 @@ static struct ip_tunnel *ipip6_tunnel_lookup(struct net *net,
for_each_ip_tunnel_rcu(t, sitn->tunnels_r_l[h0 ^ h1]) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
- (!dev || !t->parms.link || dev->iflink == t->parms.link) &&
+ (!dev || !t->parms.link || dev->ifindex == t->parms.link) &&
(t->dev->flags & IFF_UP))
return t;
}
for_each_ip_tunnel_rcu(t, sitn->tunnels_r[h0]) {
if (remote == t->parms.iph.daddr &&
- (!dev || !t->parms.link || dev->iflink == t->parms.link) &&
+ (!dev || !t->parms.link || dev->ifindex == t->parms.link) &&
(t->dev->flags & IFF_UP))
return t;
}
for_each_ip_tunnel_rcu(t, sitn->tunnels_l[h1]) {
if (local == t->parms.iph.saddr &&
- (!dev || !t->parms.link || dev->iflink == t->parms.link) &&
+ (!dev || !t->parms.link || dev->ifindex == t->parms.link) &&
(t->dev->flags & IFF_UP))
return t;
}
--
1.9.1

2014-11-06 23:24:56

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 010/162] vxlan: fix incorrect initializer in union vxlan_addr

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Gerhard Stenzel <[email protected]>

[ Upstream commit a45e92a599e77ee6a850eabdd0141633fde03915 ]

The first initializer in the following

union vxlan_addr ipa = {
.sin.sin_addr.s_addr = tip,
.sa.sa_family = AF_INET,
};

is optimised away by the compiler, due to the second initializer,
therefore initialising .sin.sin_addr.s_addr always to 0.
This results in netlink messages indicating an L3 miss never containing the
missed IP address. This was observed with GCC 4.8 and 4.9; I do not know about previous versions.
The problem affects user space programs relying on an IP address being
sent as part of a netlink message indicating an L3 miss.

Changing
.sa.sa_family = AF_INET,
to
.sin.sin_family = AF_INET,
fixes the problem.
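
A self-contained userspace sketch of the same initializer behaviour
(hypothetical test program; the zero result for 'a' is what was observed
with the GCC versions mentioned above):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

union test_addr {
	struct sockaddr    sa;
	struct sockaddr_in sin;
};

int main(void)
{
	union test_addr a = {
		.sin.sin_addr.s_addr = 0x01020304,
		.sa.sa_family        = AF_INET,	/* different union member: the
						 * .sin initializer is dropped */
	};
	union test_addr b = {
		.sin.sin_addr.s_addr = 0x01020304,
		.sin.sin_family      = AF_INET,	/* same member: both fields kept */
	};

	printf("a.s_addr=%#x b.s_addr=%#x\n",
	       (unsigned)a.sin.sin_addr.s_addr,
	       (unsigned)b.sin.sin_addr.s_addr);
	return 0;
}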

Signed-off-by: Gerhard Stenzel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/net/vxlan.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7e748a1..5543eb1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1224,7 +1224,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
} else if (vxlan->flags & VXLAN_F_L3MISS) {
union vxlan_addr ipa = {
.sin.sin_addr.s_addr = tip,
- .sa.sa_family = AF_INET,
+ .sin.sin_family = AF_INET,
};

vxlan_ip_miss(dev, &ipa);
@@ -1385,7 +1385,7 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
} else if (vxlan->flags & VXLAN_F_L3MISS) {
union vxlan_addr ipa = {
.sin6.sin6_addr = msg->target,
- .sa.sa_family = AF_INET6,
+ .sin6.sin6_family = AF_INET6,
};

vxlan_ip_miss(dev, &ipa);
@@ -1418,7 +1418,7 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
if (!n && (vxlan->flags & VXLAN_F_L3MISS)) {
union vxlan_addr ipa = {
.sin.sin_addr.s_addr = pip->daddr,
- .sa.sa_family = AF_INET,
+ .sin.sin_family = AF_INET,
};

vxlan_ip_miss(dev, &ipa);
@@ -1439,7 +1439,7 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
if (!n && (vxlan->flags & VXLAN_F_L3MISS)) {
union vxlan_addr ipa = {
.sin6.sin6_addr = pip6->daddr,
- .sa.sa_family = AF_INET6,
+ .sin6.sin6_family = AF_INET6,
};

vxlan_ip_miss(dev, &ipa);
--
1.9.1

2014-11-06 23:24:52

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 019/162] xfrm: Generate queueing routes only from route lookup functions

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <[email protected]>

[ Upstream commit b8c203b2d2fc961bafd53b41d5396bbcdec55998 ]

Currently we generate a queueing route if we have matching policies
but cannot resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unintentionally.

We fix this by generating queueing routes only from the
route lookup functions, where we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe18e71 ("xfrm: Add a state resolution packet queue")
Reported-by: Konstantinos Kolelis <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/dst.h | 1 +
net/xfrm/xfrm_policy.c | 32 ++++++++++++++++++++++++--------
2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 8211e32..4b368ae 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -467,6 +467,7 @@ void dst_init(void);
/* Flags for xfrm_lookup flags argument. */
enum {
XFRM_LOOKUP_ICMP = 1 << 0,
+ XFRM_LOOKUP_QUEUE = 1 << 1,
};

struct flowi;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index eb918d3..0ee05f0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -46,6 +46,11 @@ static DEFINE_SPINLOCK(xfrm_policy_sk_bundle_lock);
static struct dst_entry *xfrm_policy_sk_bundles;
static DEFINE_RWLOCK(xfrm_policy_lock);

+struct xfrm_flo {
+ struct dst_entry *dst_orig;
+ u8 flags;
+};
+
static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock);
static struct xfrm_policy_afinfo __rcu *xfrm_policy_afinfo[NPROTO]
__read_mostly;
@@ -1882,13 +1887,14 @@ static int xdst_queue_output(struct sk_buff *skb)
}

static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net,
- struct dst_entry *dst,
+ struct xfrm_flo *xflo,
const struct flowi *fl,
int num_xfrms,
u16 family)
{
int err;
struct net_device *dev;
+ struct dst_entry *dst;
struct dst_entry *dst1;
struct xfrm_dst *xdst;

@@ -1896,10 +1902,13 @@ static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net,
if (IS_ERR(xdst))
return xdst;

- if (net->xfrm.sysctl_larval_drop || num_xfrms <= 0 ||
+ if (!(xflo->flags & XFRM_LOOKUP_QUEUE) ||
+ net->xfrm.sysctl_larval_drop ||
+ num_xfrms <= 0 ||
(fl->flowi_flags & FLOWI_FLAG_CAN_SLEEP))
return xdst;

+ dst = xflo->dst_orig;
dst1 = &xdst->u.dst;
dst_hold(dst);
xdst->route = dst;
@@ -1941,7 +1950,7 @@ static struct flow_cache_object *
xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
struct flow_cache_object *oldflo, void *ctx)
{
- struct dst_entry *dst_orig = (struct dst_entry *)ctx;
+ struct xfrm_flo *xflo = (struct xfrm_flo *)ctx;
struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX];
struct xfrm_dst *xdst, *new_xdst;
int num_pols = 0, num_xfrms = 0, i, err, pol_dead;
@@ -1982,7 +1991,8 @@ xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
goto make_dummy_bundle;
}

- new_xdst = xfrm_resolve_and_create_bundle(pols, num_pols, fl, family, dst_orig);
+ new_xdst = xfrm_resolve_and_create_bundle(pols, num_pols, fl, family,
+ xflo->dst_orig);
if (IS_ERR(new_xdst)) {
err = PTR_ERR(new_xdst);
if (err != -EAGAIN)
@@ -2016,7 +2026,7 @@ make_dummy_bundle:
/* We found policies, but there's no bundles to instantiate:
* either because the policy blocks, has no transformations or
* we could not build template (no xfrm_states).*/
- xdst = xfrm_create_dummy_bundle(net, dst_orig, fl, num_xfrms, family);
+ xdst = xfrm_create_dummy_bundle(net, xflo, fl, num_xfrms, family);
if (IS_ERR(xdst)) {
xfrm_pols_put(pols, num_pols);
return ERR_CAST(xdst);
@@ -2116,13 +2126,18 @@ restart:
}

if (xdst == NULL) {
+ struct xfrm_flo xflo;
+
+ xflo.dst_orig = dst_orig;
+ xflo.flags = flags;
+
/* To accelerate a bit... */
if ((dst_orig->flags & DST_NOXFRM) ||
!net->xfrm.policy_count[XFRM_POLICY_OUT])
goto nopol;

flo = flow_cache_lookup(net, fl, family, dir,
- xfrm_bundle_lookup, dst_orig);
+ xfrm_bundle_lookup, &xflo);
if (flo == NULL)
goto nopol;
if (IS_ERR(flo)) {
@@ -2229,7 +2244,8 @@ struct dst_entry *xfrm_lookup_route(struct net *net, struct dst_entry *dst_orig,
const struct flowi *fl,
struct sock *sk, int flags)
{
- struct dst_entry *dst = xfrm_lookup(net, dst_orig, fl, sk, flags);
+ struct dst_entry *dst = xfrm_lookup(net, dst_orig, fl, sk,
+ flags | XFRM_LOOKUP_QUEUE);

if (IS_ERR(dst) && PTR_ERR(dst) == -EREMOTE)
return make_blackhole(net, dst_orig->ops->family, dst_orig);
@@ -2503,7 +2519,7 @@ int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)

skb_dst_force(skb);

- dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, 0);
+ dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
if (IS_ERR(dst)) {
res = 0;
dst = NULL;
--
1.9.1

2014-11-06 23:24:49

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 006/162] tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <[email protected]>

[ Upstream commit 4fab9071950c2021d846e18351e0f46a1cffd67b ]

Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().

Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.
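
A condensed sketch of the dispatch the patch switches to (ipv6_mapped
sockets carry IPv4 traffic and, per the diff below, point mtu_reduced at
tcp_v4_mtu_reduced):

	if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED)) {
		/* icsk_af_ops tracks the address family of the dst actually
		 * in use, unlike the per-protocol sk_prot
		 */
		inet_csk(sk)->icsk_af_ops->mtu_reduced(sk);
		__sock_put(sk);
	}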

Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Diagnosed-by: Willem de Bruijn <[email protected]>
Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/inet_connection_sock.h | 1 +
include/net/sock.h | 1 -
include/net/tcp.h | 1 +
net/ipv4/tcp_ipv4.c | 5 +++--
net/ipv4/tcp_output.c | 2 +-
net/ipv6/tcp_ipv6.c | 3 ++-
6 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c55aeed..cf92728 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -62,6 +62,7 @@ struct inet_connection_sock_af_ops {
void (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
int (*bind_conflict)(const struct sock *sk,
const struct inet_bind_bucket *tb, bool relax);
+ void (*mtu_reduced)(struct sock *sk);
};

/** inet_connection_sock - INET connection oriented sock
diff --git a/include/net/sock.h b/include/net/sock.h
index 749bad5..5db5b7f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -950,7 +950,6 @@ struct proto {
struct sk_buff *skb);

void (*release_cb)(struct sock *sk);
- void (*mtu_reduced)(struct sock *sk);

/* Keeping track of sk's, looking them up, and port selection methods. */
void (*hash)(struct sock *sk);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0a00e7c..a9b7191 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -450,6 +450,7 @@ const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
*/

void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb);
+void tcp_v4_mtu_reduced(struct sock *sk);
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
struct sock *tcp_create_openreq_child(struct sock *sk,
struct request_sock *req,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 77fe507..9b726c0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -268,7 +268,7 @@ EXPORT_SYMBOL(tcp_v4_connect);
* It can be called through tcp_release_cb() if socket was owned by user
* at the time tcp_v4_err() was called to handle ICMP message.
*/
-static void tcp_v4_mtu_reduced(struct sock *sk)
+void tcp_v4_mtu_reduced(struct sock *sk)
{
struct dst_entry *dst;
struct inet_sock *inet = inet_sk(sk);
@@ -299,6 +299,7 @@ static void tcp_v4_mtu_reduced(struct sock *sk)
tcp_simple_retransmit(sk);
} /* else let the usual retransmit timer handle it */
}
+EXPORT_SYMBOL(tcp_v4_mtu_reduced);

static void do_redirect(struct sk_buff *skb, struct sock *sk)
{
@@ -2107,6 +2108,7 @@ const struct inet_connection_sock_af_ops ipv4_specific = {
.compat_setsockopt = compat_ip_setsockopt,
.compat_getsockopt = compat_ip_getsockopt,
#endif
+ .mtu_reduced = tcp_v4_mtu_reduced,
};
EXPORT_SYMBOL(ipv4_specific);

@@ -2721,7 +2723,6 @@ struct proto tcp_prot = {
.sendpage = tcp_sendpage,
.backlog_rcv = tcp_v4_do_rcv,
.release_cb = tcp_release_cb,
- .mtu_reduced = tcp_v4_mtu_reduced,
.hash = inet_hash,
.unhash = inet_unhash,
.get_port = inet_csk_get_port,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fa5b391..f587e1c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -785,7 +785,7 @@ void tcp_release_cb(struct sock *sk)
__sock_put(sk);
}
if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED)) {
- sk->sk_prot->mtu_reduced(sk);
+ inet_csk(sk)->icsk_af_ops->mtu_reduced(sk);
__sock_put(sk);
}
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 282874b..d83045c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1640,6 +1640,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific = {
.compat_setsockopt = compat_ipv6_setsockopt,
.compat_getsockopt = compat_ipv6_getsockopt,
#endif
+ .mtu_reduced = tcp_v4_mtu_reduced,
};

#ifdef CONFIG_TCP_MD5SIG
@@ -1671,6 +1672,7 @@ static const struct inet_connection_sock_af_ops ipv6_mapped = {
.compat_setsockopt = compat_ipv6_setsockopt,
.compat_getsockopt = compat_ipv6_getsockopt,
#endif
+ .mtu_reduced = tcp_v6_mtu_reduced,
};

#ifdef CONFIG_TCP_MD5SIG
@@ -1907,7 +1909,6 @@ struct proto tcpv6_prot = {
.sendpage = tcp_sendpage,
.backlog_rcv = tcp_v6_do_rcv,
.release_cb = tcp_release_cb,
- .mtu_reduced = tcp_v6_mtu_reduced,
.hash = tcp_v6_hash,
.unhash = inet_unhash,
.get_port = inet_csk_get_port,
--
1.9.1

2014-11-06 23:24:36

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 001/162] netlink: reset network header before passing to taps

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

[ Upstream commit 4e48ed883c72e78c5a910f8831ffe90c9b18f0ec ]

netlink doesn't set any network header offset thus when the skb is
being passed to tap devices via dev_queue_xmit_nit(), it emits klog
false positives due to it being unset like:

...
[ 124.990397] protocol 0000 is buggy, dev nlmon0
[ 124.990411] protocol 0000 is buggy, dev nlmon0
...

So just reset the network header before passing to the device; for
packet sockets that just means nothing will change - mac and net
offset hold the same value just as before.

Reported-by: Marcel Holtmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dbba678..cb5b7e0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -204,7 +204,7 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb,
if (nskb) {
nskb->dev = dev;
nskb->protocol = htons((u16) sk->sk_protocol);
-
+ skb_reset_network_header(nskb);
ret = dev_queue_xmit(nskb);
if (unlikely(ret > 0))
ret = net_xmit_errno(ret);
--
1.9.1

2014-11-06 23:28:08

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 011/162] l2tp: fix race while getting PMTU on PPP pseudo-wire

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Guillaume Nault <[email protected]>

[ Upstream commit eed4d839b0cdf9d84b0a9bc63de90fd5e1e886fb ]

Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.

The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
could return NULL if tunnel->sock->sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:

[ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
[ 1937.664005] Oops: 0000 [#1] SMP
[ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
[ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1
[ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
[ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
[ 1937.664005] RIP: 0010:[<ffffffffa049db88>] [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282
[ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
[ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
[ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
[ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
[ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
[ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
[ 1937.664005] Stack:
[ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
[ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
[ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
[ 1937.664005] Call Trace:
[ 1937.664005] [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
[ 1937.664005] [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
[ 1937.664005] [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
[ 1937.664005] [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
[ 1937.664005] [<ffffffff81309196>] SYSC_connect+0x87/0xb1
[ 1937.664005] [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
[ 1937.664005] [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
[ 1937.664005] [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1937.664005] [<ffffffff8114c262>] ? spin_lock+0x9/0xb
[ 1937.664005] [<ffffffff813092b4>] SyS_connect+0x9/0xb
[ 1937.664005] [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
[ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
[ 1937.664005] RIP [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP <ffff8800c43c7de8>
[ 1937.664005] CR2: 0000000000000020
[ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
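
A minimal sketch of the reference-held pattern the fix switches to
(condensed from the patched pppol2tp_connect()):

	struct dst_entry *dst = sk_dst_get(tunnel->sock);  /* takes a reference */

	if (dst != NULL) {
		u32 pmtu = dst_mtu(dst);	/* safe: we hold the reference */

		if (pmtu != 0)
			session->mtu = session->mru =
				pmtu - PPPOL2TP_HEADER_OVERHEAD;
		dst_release(dst);
	}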

Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket")
Signed-off-by: Guillaume Nault <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/l2tp/l2tp_ppp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index cfa1406..e6c6d81 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -756,7 +756,8 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
/* If PMTU discovery was enabled, use the MTU that was discovered */
dst = sk_dst_get(tunnel->sock);
if (dst != NULL) {
- u32 pmtu = dst_mtu(__sk_dst_get(tunnel->sock));
+ u32 pmtu = dst_mtu(dst);
+
if (pmtu != 0)
session->mtu = session->mru = pmtu -
PPPOL2TP_HEADER_OVERHEAD;
--
1.9.1

2014-11-06 23:28:11

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 008/162] packet: handle too big packets for PACKET_V3

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit dc808110bb62b64a448696ecac3938902c92e1ab ]

af_packet can currently overwrite kernel memory by out-of-bounds
accesses, because it assumed a [new] block can always hold one frame.

This is not generally the case, even if most existing tools do it right.

This patch clamps too-long frames as the API permits, and issues a one-time
error on syslog.

[ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82

In this example, the packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len).
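
A condensed sketch of the clamp, with the numbers from the log line above
worked in as comments (the per-frame budget is derived from that report):

	/* e.g. macoff = 82, snaplen = 5042; with a per-frame budget of
	 * max_frame_len = kblk_size - BLK_PLUS_PRIV(tp_sizeof_priv) = 4048,
	 * the frame is clamped to 4048 - 82 = 3966 bytes instead of writing
	 * past the block.
	 */
	if (unlikely(macoff + snaplen > max_frame_len)) {
		snaplen = max_frame_len - macoff;
		if ((int)snaplen < 0) {		/* no room even for headers */
			snaplen = 0;
			macoff = max_frame_len;
		}
	}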

Signed-off-by: Eric Dumazet <[email protected]>
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/packet/af_packet.c | 17 +++++++++++++++++
net/packet/internal.h | 1 +
2 files changed, 18 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 88cfbc1..a846125 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -565,6 +565,7 @@ static void init_prb_bdqc(struct packet_sock *po,
p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;

+ p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
prb_init_ft_ops(p1, req_u);
prb_setup_retire_blk_timer(po, tx_ring);
prb_open_block(p1, pbd);
@@ -1814,6 +1815,18 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
if ((int)snaplen < 0)
snaplen = 0;
}
+ } else if (unlikely(macoff + snaplen >
+ GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len)) {
+ u32 nval;
+
+ nval = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len - macoff;
+ pr_err_once("tpacket_rcv: packet too big, clamped from %u to %u. macoff=%u\n",
+ snaplen, nval, macoff);
+ snaplen = nval;
+ if (unlikely((int)snaplen < 0)) {
+ snaplen = 0;
+ macoff = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len;
+ }
}
spin_lock(&sk->sk_receive_queue.lock);
h.raw = packet_current_rx_frame(po, skb,
@@ -3610,6 +3623,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
goto out;
if (unlikely(req->tp_block_size & (PAGE_SIZE - 1)))
goto out;
+ if (po->tp_version >= TPACKET_V3 &&
+ (int)(req->tp_block_size -
+ BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
+ goto out;
if (unlikely(req->tp_frame_size < po->tp_hdrlen +
po->tp_reserve))
goto out;
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 1035fa2..ca086c0 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -29,6 +29,7 @@ struct tpacket_kbdq_core {
char *pkblk_start;
char *pkblk_end;
int kblk_size;
+ unsigned int max_frame_len;
unsigned int knum_blocks;
uint64_t knxt_seq_num;
char *prev;
--
1.9.1

2014-11-07 17:30:18

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 115/165 corrected] usb: gadget: function: acm: make f_acm pass USB20CV Chapter9

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit 52ec49a5e56a27c5b6f8217708783eff39f24c16 upstream.

During Halt Endpoint Test, our interrupt endpoint
will be disabled, which will clear out ep->desc
to NULL. Unless we call config_ep_by_speed() again,
we will not be able to enable this endpoint, which
will make us fail that test.

Fixes: f9c56cd (usb: gadget: Clear usb_endpoint_descriptor
inside the struct usb_ep on disable)
Signed-off-by: Felipe Balbi <[email protected]>
[ kamal: backport to 3.13-stable: context ]
Signed-off-by: Kamal Mostafa <[email protected]>
---
drivers/usb/gadget/f_acm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/gadget/f_acm.c b/drivers/usb/gadget/f_acm.c
index ab1065a..3384486 100644
--- a/drivers/usb/gadget/f_acm.c
+++ b/drivers/usb/gadget/f_acm.c
@@ -430,11 +430,12 @@ static int acm_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
if (acm->notify->driver_data) {
VDBG(cdev, "reset acm control interface %d\n", intf);
usb_ep_disable(acm->notify);
- } else {
- VDBG(cdev, "init acm ctrl interface %d\n", intf);
+ }
+
+ if (!acm->notify->desc)
if (config_ep_by_speed(cdev->gadget, f, acm->notify))
return -EINVAL;
- }
+
usb_ep_enable(acm->notify);
acm->notify->driver_data = acm;

--
1.9.1

2014-11-07 17:30:22

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 165/165] net: sctp: fix remote memory pressure from excessive queueing

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
[...]
---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks here are good-formed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has the pdiscard=1 mark (without having had a flush
point), it can happen that we excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers, and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") changed the flush precedence, this prevents an
output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter()
on the input chunk (chunk = event_arg) even though local_cork
is set. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.
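
A condensed sketch of the two halves of the deferral (both hunks appear in
the diff below):

	/* sctp_inq_pop(): don't free the junk tail here, only mark it */
	chunk->pdiscard = 1;
	chunk->chunk_end = skb_tail_pointer(chunk->skb);

	/* sctp_chunk_length_valid(): the mark makes the state machine treat
	 * the chunk as invalid, which discards the packet and thereby gives
	 * the corked output queue its flush point
	 */
	if (unlikely(chunk->pdiscard))
		return 0;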

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Reference: CVE-2014-3688
Signed-off-by: Kamal Mostafa <[email protected]>
---
net/sctp/inqueue.c | 33 +++++++--------------------------
net/sctp/sm_statefuns.c | 3 +++
2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index 5856932..560cd41 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -141,18 +141,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
} else {
/* Nothing to do. Next chunk in the packet, please. */
ch = (sctp_chunkhdr_t *) chunk->chunk_end;
-
/* Force chunk->skb->data to chunk->chunk_end. */
- skb_pull(chunk->skb,
- chunk->chunk_end - chunk->skb->data);
-
- /* Verify that we have at least chunk headers
- * worth of buffer left.
- */
- if (skb_headlen(chunk->skb) < sizeof(sctp_chunkhdr_t)) {
- sctp_chunk_free(chunk);
- chunk = queue->in_progress = NULL;
- }
+ skb_pull(chunk->skb, chunk->chunk_end - chunk->skb->data);
+ /* We are guaranteed to pull a SCTP header. */
}
}

@@ -188,24 +179,14 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
chunk->subh.v = NULL; /* Subheader is no longer valid. */

- if (chunk->chunk_end < skb_tail_pointer(chunk->skb)) {
+ if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+ skb_tail_pointer(chunk->skb)) {
/* This is not a singleton */
chunk->singleton = 0;
} else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
- /* RFC 2960, Section 6.10 Bundling
- *
- * Partial chunks MUST NOT be placed in an SCTP packet.
- * If the receiver detects a partial chunk, it MUST drop
- * the chunk.
- *
- * Since the end of the chunk is past the end of our buffer
- * (which contains the whole packet, we can freely discard
- * the whole packet.
- */
- sctp_chunk_free(chunk);
- chunk = queue->in_progress = NULL;
-
- return NULL;
+ /* Discard inside state machine. */
+ chunk->pdiscard = 1;
+ chunk->chunk_end = skb_tail_pointer(chunk->skb);
} else {
/* We are at the end of the packet, so mark the chunk
* in case we need to send a SACK.
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index bf97a83..1c73c33 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -171,6 +171,9 @@ sctp_chunk_length_valid(struct sctp_chunk *chunk,
{
__u16 chunk_length = ntohs(chunk->chunk_hdr->length);

+ /* Previously already marked? */
+ if (unlikely(chunk->pdiscard))
+ return 0;
if (unlikely(chunk_length < required_length))
return 0;

--
1.9.1

2014-11-07 17:30:51

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 164/165] net: sctp: fix panic on duplicate ASCONF chunks

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving, e.g., a semi-good-formed connection scan in the
form of ...

-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled incoming
chunks, we try possible bundling on the outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Reference: CVE-2014-3687
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/sctp/sctp.h | 5 +++++
net/sctp/associola.c | 2 ++
2 files changed, 7 insertions(+)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index c5fe806..d5bc97e 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -453,6 +453,11 @@ static inline void sctp_assoc_pending_pmtu(struct sock *sk, struct sctp_associat
asoc->pmtu_pending = 0;
}

+static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
+{
+ return !list_empty(&chunk->list);
+}
+
/* Walk through a list of TLV parameters. Don't trust the
* individual parameter lengths and instead depend on
* the chunk length to indicate when to stop. Make sure
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 6a4c5a7..3194f7f 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1644,6 +1644,8 @@ struct sctp_chunk *sctp_assoc_lookup_asconf_ack(
* ack chunk whose serial number matches that of the request.
*/
list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
+ if (sctp_chunk_pending(ack))
+ continue;
if (ack->subh.addip_hdr->serial == serial) {
sctp_chunk_hold(ack);
return ack;
--
1.9.1

2014-11-07 17:30:50

by Kamal Mostafa

[permalink] [raw]
Subject: [PATCH 3.13 163/165] net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks

3.13.11.11 -stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
specially crafted ASCONF chunk, even back to pre-2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
end:0x440 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8144fb1c>] skb_put+0x5c/0x70
[<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
[<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
[<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
[<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
[<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
[<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81497078>] ip_local_deliver+0x98/0xa0
[<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
[<ffffffff81496ac5>] ip_rcv+0x275/0x350
[<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
[<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

1) Add IP address parameter (param length: 16)
2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

length = ntohs(asconf_param->param_hdr.length);
asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.
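
A minimal sketch of the padding-aware walk that sctp_walk_params()
performs internally (the broken walk above omitted the WORD_ROUND()):

	/* WORD_ROUND() pads the TLV length up to the next 4-byte boundary,
	 * matching how parameters are actually laid out on the wire.
	 */
	length = ntohs(asconf_param->param_hdr.length);
	asconf_param = (void *)asconf_param + WORD_ROUND(length);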

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Reference: CVE-2014-3673
Signed-off-by: Kamal Mostafa <[email protected]>
---
include/net/sctp/sm.h | 6 +--
net/sctp/sm_make_chunk.c | 99 +++++++++++++++++++++++++++---------------------
net/sctp/sm_statefuns.c | 18 +--------
3 files changed, 60 insertions(+), 63 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 4ef75af..c91b6f5 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -249,9 +249,9 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *,
int, __be16);
struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc,
union sctp_addr *addr);
-int sctp_verify_asconf(const struct sctp_association *asoc,
- struct sctp_paramhdr *param_hdr, void *chunk_end,
- struct sctp_paramhdr **errp);
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+ struct sctp_chunk *chunk, bool addr_param_needed,
+ struct sctp_paramhdr **errp);
struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
struct sctp_chunk *asconf);
int sctp_process_asconf_ack(struct sctp_association *asoc,
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 1d71674..e45212e 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3110,50 +3110,63 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
return SCTP_ERROR_NO_ERROR;
}

-/* Verify the ASCONF packet before we process it. */
-int sctp_verify_asconf(const struct sctp_association *asoc,
- struct sctp_paramhdr *param_hdr, void *chunk_end,
- struct sctp_paramhdr **errp) {
- sctp_addip_param_t *asconf_param;
+/* Verify the ASCONF packet before we process it. */
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+ struct sctp_chunk *chunk, bool addr_param_needed,
+ struct sctp_paramhdr **errp)
+{
+ sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) chunk->chunk_hdr;
union sctp_params param;
- int length, plen;
-
- param.v = (sctp_paramhdr_t *) param_hdr;
- while (param.v <= chunk_end - sizeof(sctp_paramhdr_t)) {
- length = ntohs(param.p->length);
- *errp = param.p;
+ bool addr_param_seen = false;

- if (param.v > chunk_end - length ||
- length < sizeof(sctp_paramhdr_t))
- return 0;
+ sctp_walk_params(param, addip, addip_hdr.params) {
+ size_t length = ntohs(param.p->length);

+ *errp = param.p;
switch (param.p->type) {
+ case SCTP_PARAM_ERR_CAUSE:
+ break;
+ case SCTP_PARAM_IPV4_ADDRESS:
+ if (length != sizeof(sctp_ipv4addr_param_t))
+ return false;
+ addr_param_seen = true;
+ break;
+ case SCTP_PARAM_IPV6_ADDRESS:
+ if (length != sizeof(sctp_ipv6addr_param_t))
+ return false;
+ addr_param_seen = true;
+ break;
case SCTP_PARAM_ADD_IP:
case SCTP_PARAM_DEL_IP:
case SCTP_PARAM_SET_PRIMARY:
- asconf_param = (sctp_addip_param_t *)param.v;
- plen = ntohs(asconf_param->param_hdr.length);
- if (plen < sizeof(sctp_addip_param_t) +
- sizeof(sctp_paramhdr_t))
- return 0;
+ /* In ASCONF chunks, these need to be first. */
+ if (addr_param_needed && !addr_param_seen)
+ return false;
+ length = ntohs(param.addip->param_hdr.length);
+ if (length < sizeof(sctp_addip_param_t) +
+ sizeof(sctp_paramhdr_t))
+ return false;
break;
case SCTP_PARAM_SUCCESS_REPORT:
case SCTP_PARAM_ADAPTATION_LAYER_IND:
if (length != sizeof(sctp_addip_param_t))
- return 0;
-
+ return false;
break;
default:
- break;
+ /* This is unkown to us, reject! */
+ return false;
}
-
- param.v += WORD_ROUND(length);
}

- if (param.v != chunk_end)
- return 0;
+ /* Remaining sanity checks. */
+ if (addr_param_needed && !addr_param_seen)
+ return false;
+ if (!addr_param_needed && addr_param_seen)
+ return false;
+ if (param.v != chunk->chunk_end)
+ return false;

- return 1;
+ return true;
}

/* Process an incoming ASCONF chunk with the next expected serial no. and
@@ -3162,16 +3175,17 @@ int sctp_verify_asconf(const struct sctp_association *asoc,
struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
struct sctp_chunk *asconf)
{
+ sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) asconf->chunk_hdr;
+ bool all_param_pass = true;
+ union sctp_params param;
sctp_addiphdr_t *hdr;
union sctp_addr_param *addr_param;
sctp_addip_param_t *asconf_param;
struct sctp_chunk *asconf_ack;
-
__be16 err_code;
int length = 0;
int chunk_len;
__u32 serial;
- int all_param_pass = 1;

chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(sctp_chunkhdr_t);
hdr = (sctp_addiphdr_t *)asconf->skb->data;
@@ -3199,9 +3213,14 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
goto done;

/* Process the TLVs contained within the ASCONF chunk. */
- while (chunk_len > 0) {
+ sctp_walk_params(param, addip, addip_hdr.params) {
+ /* Skip preceeding address parameters. */
+ if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
+ param.p->type == SCTP_PARAM_IPV6_ADDRESS)
+ continue;
+
err_code = sctp_process_asconf_param(asoc, asconf,
- asconf_param);
+ param.addip);
/* ADDIP 4.1 A7)
* If an error response is received for a TLV parameter,
* all TLVs with no response before the failed TLV are
@@ -3209,28 +3228,20 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
* the failed response are considered unsuccessful unless
* a specific success indication is present for the parameter.
*/
- if (SCTP_ERROR_NO_ERROR != err_code)
- all_param_pass = 0;
-
+ if (err_code != SCTP_ERROR_NO_ERROR)
+ all_param_pass = false;
if (!all_param_pass)
- sctp_add_asconf_response(asconf_ack,
- asconf_param->crr_id, err_code,
- asconf_param);
+ sctp_add_asconf_response(asconf_ack, param.addip->crr_id,
+ err_code, param.addip);

/* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add
* an IP address sends an 'Out of Resource' in its response, it
* MUST also fail any subsequent add or delete requests bundled
* in the ASCONF.
*/
- if (SCTP_ERROR_RSRC_LOW == err_code)
+ if (err_code == SCTP_ERROR_RSRC_LOW)
goto done;
-
- /* Move to the next ASCONF param. */
- length = ntohs(asconf_param->param_hdr.length);
- asconf_param = (void *)asconf_param + length;
- chunk_len -= length;
}
-
done:
asoc->peer.addip_serial++;

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 4b549a3..bf97a83 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3592,9 +3592,7 @@ sctp_disposition_t sctp_sf_do_asconf(struct net *net,
struct sctp_chunk *asconf_ack = NULL;
struct sctp_paramhdr *err_param = NULL;
sctp_addiphdr_t *hdr;
- union sctp_addr_param *addr_param;
__u32 serial;
- int length;

if (!sctp_vtag_verify(chunk, asoc)) {
sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_BAD_TAG,
@@ -3619,17 +3617,8 @@ sctp_disposition_t sctp_sf_do_asconf(struct net *net,
hdr = (sctp_addiphdr_t *)chunk->skb->data;
serial = ntohl(hdr->serial);

- addr_param = (union sctp_addr_param *)hdr->params;
- length = ntohs(addr_param->p.length);
- if (length < sizeof(sctp_paramhdr_t))
- return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
- (void *)addr_param, commands);
-
/* Verify the ASCONF chunk before processing it. */
- if (!sctp_verify_asconf(asoc,
- (sctp_paramhdr_t *)((void *)addr_param + length),
- (void *)chunk->chunk_end,
- &err_param))
+ if (!sctp_verify_asconf(asoc, chunk, true, &err_param))
return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
(void *)err_param, commands);

@@ -3747,10 +3736,7 @@ sctp_disposition_t sctp_sf_do_asconf_ack(struct net *net,
rcvd_serial = ntohl(addip_hdr->serial);

/* Verify the ASCONF-ACK chunk before processing it. */
- if (!sctp_verify_asconf(asoc,
- (sctp_paramhdr_t *)addip_hdr->params,
- (void *)asconf_ack->chunk_end,
- &err_param))
+ if (!sctp_verify_asconf(asoc, asconf_ack, false, &err_param))
return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
(void *)err_param, commands);

--
1.9.1

2014-12-29 23:59:00

by Vinson Lee

[permalink] [raw]
Subject: Re: [PATCH 3.13 106/162] KVM: x86: Fix far-jump to non-canonical check

On Thu, Nov 6, 2014 at 2:36 PM, Kamal Mostafa <[email protected]> wrote:
> 3.13.11.11 -stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Nadav Amit <[email protected]>
>
> commit 7e46dddd6f6cd5dbf3c7bd04a7e75d19475ac9f2 upstream.
>
> Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far
> jumps") introduced a bug that caused the fix to be incomplete. Due to
> incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
> segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
> not trigger #GP. As we know, this imposes a security problem.
>
> In addition, the condition for two warnings was incorrect.
>
> Fixes: d1442d85cc30ea75f7d399474ca738e0bc96f715
> Reported-by: Dan Carpenter <[email protected]>
> Signed-off-by: Nadav Amit <[email protected]>
> [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
> Signed-off-by: Paolo Bonzini <[email protected]>
> [ kamal: backport to 3.13-stable: omitted WARN_ON fixes (not in 3.13) ]
> Signed-off-by: Kamal Mostafa <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index a440bea..4ae37e7 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -581,12 +581,14 @@ static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
> case 4:
> ctxt->_eip = (u32)dst;
> break;
> +#ifdef CONFIG_X86_64
> case 8:
> if ((cs_l && is_noncanonical_address(dst)) ||
> - (!cs_l && (dst & ~(u32)-1)))
> + (!cs_l && (dst >> 32) != 0))
> return emulate_gp(ctxt, 0);
> ctxt->_eip = dst;
> break;
> +#endif
> default:
> WARN(1, "unsupported eip assignment size\n");
> }
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
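
[ For context on the mask above: ~(u32)-1 is computed at 32-bit width
  and yields 0, so the old test could never fire.  A minimal
  user-space sketch of just that arithmetic, with u32/u64 standing in
  for the kernel typedefs (illustrative, not the emulator code): ]

#include <stdio.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

int main(void)
{
        u64 dst = 0xdeadbeef00001234ULL;        /* RIP[63:32] != 0 */

        /* old term: mask is ~0xffffffffu == 0, so the check never fires */
        printf("old: %d\n", (dst & ~(u32)-1) != 0);     /* prints 0 */
        /* new term: correctly tests the upper 32 bits */
        printf("new: %d\n", (dst >> 32) != 0);          /* prints 1 */
        return 0;
}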


Hi.

Should the WARN_ON fixes have been included as well in 3.13.11.11?

WARN_ON hunks were added with the backport of "KVM: x86: Handle errors
when RIP is set during far jumps" in 3.13.11.11 commit
b8ba339d86fb627d54fea929492114d45f6835c2.

Cheers,
Vinson

2014-12-30 22:27:13

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 3.13 106/162] KVM: x86: Fix far-jump to non-canonical check

Vinson Lee <[email protected]> wrote:

> On Thu, Nov 6, 2014 at 2:36 PM, Kamal Mostafa <[email protected]> wrote:
>> 3.13.11.11 -stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Nadav Amit <[email protected]>
>>
>> commit 7e46dddd6f6cd5dbf3c7bd04a7e75d19475ac9f2 upstream.
>>
>> Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far
>> jumps") introduced a bug that caused the fix to be incomplete. Due to
>> incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
>> segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
>> not trigger #GP. As we know, this imposes a security problem.
>>
>> In addition, the condition for two warnings was incorrect.
>>
>> Fixes: d1442d85cc30ea75f7d399474ca738e0bc96f715
>> Reported-by: Dan Carpenter <[email protected]>
>> Signed-off-by: Nadav Amit <[email protected]>
>> [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> [ kamal: backport to 3.13-stable: omitted WARN_ON fixes (not in 3.13) ]
>> Signed-off-by: Kamal Mostafa <[email protected]>
>> ---
>> arch/x86/kvm/emulate.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index a440bea..4ae37e7 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -581,12 +581,14 @@ static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
>> case 4:
>> ctxt->_eip = (u32)dst;
>> break;
>> +#ifdef CONFIG_X86_64
>> case 8:
>> if ((cs_l && is_noncanonical_address(dst)) ||
>> - (!cs_l && (dst & ~(u32)-1)))
>> + (!cs_l && (dst >> 32) != 0))
>> return emulate_gp(ctxt, 0);
>> ctxt->_eip = dst;
>> break;
>> +#endif
>> default:
>> WARN(1, "unsupported eip assignment size\n");
>> }
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe stable" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Hi.
>
> Should the WARN_ON fixes have been included as well in 3.13.11.11?
>
> WARN_ON hunks were added with the backport of "KVM: x86: Handle errors
> when RIP is set during far jumps" in 3.13.11.11 commit
> b8ba339d86fb627d54fea929492114d45f6835c2.

If it was added, it should be fixed. The WARN_ON mistakenly has a
double negation.

Nadav