2014-11-01 22:56:42

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 000/102] 3.2.64-rc1 review

This is the start of the stable review cycle for the 3.2.64 release.
There are 102 patches in this series, which will be posted as responses
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Tue Nov 04 00:00:00 UTC 2014.
Anything received after that time might be too late.

A combined patch relative to 3.2.63 will be posted as an additional
response to this. A shortlog and diffstat can be found below.

Ben.

-------------

Al Viro (2):
be careful with nd->inode in path_init() and follow_dotdot_rcu()
[4023bfc9f351a7994fb6a7d515476c320f94a574]
don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()
[7bd88377d482e1eae3c5329b12e33cfd664fa6a9]

Alban Crequy (1):
cgroup: reject cgroup names with '\n'
[71b1fb5c4473a5b1e601d41b109bdfe001ec82e0]

Alex Deucher (1):
drm/radeon: add connector quirk for fujitsu board
[1952f24d0fa6292d65f886887af87ba8ac79b3ba]

Andreas Rohner (1):
nilfs2: fix data loss with mmap()
[56d7acc792c0d98f38f22058671ee715ff197023]

Andrew Hunter (1):
jiffies: Fix timeval conversion to jiffies
[d78c9300c51d6ceed9f6d078d4e9366f259de28c]

Andy Honig (1):
KVM: x86: Improve thread safety in pit
[2febc839133280d5a5e8e1179c94ea674489dae2]

Andy Lutomirski (1):
x86,kvm,vmx: Preserve CR4 across VM entry
[d974baa398f34393db76be45f7d4d04fbdbb4a0a]

Anton Altaparmakov (1):
Fix nasty 32-bit overflow bug in buffer i/o code.
[f2d5a94436cc7cc0221b9a81bba2276a25187dd3]

Aurelien Jarno (1):
MIPS: ZBOOT: add missing <linux/string.h> include
[29593fd5a8149462ed6fad0d522234facdaee6c8]

Ben Hutchings (1):
vfs: Fold follow_mount_rcu() into follow_dotdot_rcu()
[b37199e626b31e1175fb06764c5d1d687723aac2]

Bjørn Mork (2):
USB: sierra: add 1199:68AA device ID
[5b3da69285c143b7ea76b3b9f73099ff1093ab73]
USB: sierra: avoid CDC class functions on "68A3" devices
[049255f51644c1105775af228396d187402a5934]

Chenweilong (1):
ipv6: reallocate addrconf router for ipv6 address when lo device up
[33d99113b1102c2d2f8603b9ba72d89d915c13f5]

Christian Borntraeger (1):
KVM: s390: Fix user triggerable bug in dead code
[614a80e474b227cace52fd6e3c790554db8a396e]

Clemens Ladisch (1):
ALSA: pcm: fix fifo_size frame calculation
[a9960e6a293e6fc3ed414643bb4e4106272e4d0a]

Cong Wang (1):
perf: Fix a race condition in perf_remove_from_context()
[3577af70a2ce4853d58e57d832e687d739281479]

Daniel Borkmann (3):
net: sctp: fix panic on duplicate ASCONF chunks
[b69040d8e39f20d5215a03502a8e8b4c6ab78395]
net: sctp: fix remote memory pressure from excessive queueing
[26b87c7881006311828bb0ab271a551a62dcceb4]
net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
[9de7922bc709eee2f609cd01d98aaedc4cf5ea74]

Dave Chinner (1):
xfs: don't dirty buffers beyond EOF
[22e757a49cf010703fcb9c9b4ef793248c39b0c2]

David Dueck (1):
can: at91_can: add missing prepare and unprepare of the clock
[e77980e50bc2850599d4d9c0192b67a9ffd6daac]

David Jander (2):
can: flexcan: correctly initialize mailboxes
[fc05b884a31dbf259cc73cc856e634ec3acbebb6]
can: flexcan: implement workaround for errata ERR005829
[25e924450fcb23c11c07c95ea8964dd9f174652e]

Dmitry Torokhov (1):
Input: synaptics - add support for ForcePads
[5715fc764f7753d464dbe094b5ef9cffa6e479a4]

Eliad Peller (1):
regulatory: add NUL to alpha2
[a5fe8e7695dc3f547e955ad2b662e3e72969e506]

Emmanuel Grumbach (1):
Revert "iwlwifi: dvm: don't enable CTS to self"
[f47f46d7b09cf1d09e4b44b6cc4dd7d68a08028c]

Felipe Balbi (3):
usb: dwc3: core: fix order of PM runtime calls
[fed33afce0eda44a46ae24d93aec1b5198c0bac4]
usb: dwc3: core: use pm_runtime_put_sync() on remove
[16b972a592ea2c9a3c2a3c12238de650fd4043a9]
usb: host: xhci: fix compliance mode workaround
[96908589a8b2584b1185f834d365f5cc360e8226]

Hannes Frederic Sowa (1):
ipv6: reuse ip6_frag_id from ip6_ufo_append_data
[916e4cf46d0204806c062c8c6c4d1f633852c5b6]

Hans de Goede (3):
Input: elantech - fix detection of touchpad on ASUS s301l
[271329b3c798b2102120f5df829071c211ef00ed]
Input: i8042 - add Fujitsu U574 to no_timeout dmi table
[cc18a69c92d0972bc2fc5a047ee3be1e8398171b]
Input: i8042 - add nomux quirk for Avatar AVIU-145A6
[d2682118f4bb3ceb835f91c1a694407a31bb7378]

Honggang Li (1):
percpu: free percpu allocation info for uniprocessor system
[3189eddbcafcc4d827f7f19facbeddec4424eba8]

Ilya Dryomov (3):
libceph: add process_one_ticket() helper
[597cda357716a3cf8d994cb11927af917c8d71fa]
libceph: do not hard code max auth ticket len
[c27a3e4d667fdcad3db7b104f75659478e0c68d8]
libceph: rename ceph_msg::front_max to front_alloc_len
[3cea4c3071d4e55e9d7356efe9d0ebf92f0c2204]

James Ralston (2):
ahci: Add Device IDs for Intel 9 Series PCH
[1b071a0947dbce5c184c12262e02540fbc493457]
ata_piix: Add Device IDs for Intel 9 Series PCH
[6cad1376954e591c3c41500c4e586e183e7ffe6d]

Jan Kara (1):
ext2: Fix fs corruption in ext2_get_xip_mem()
[7ba3ec5749ddb61f79f7be17b5fd7720eebc52de]

Jeff Moyer (1):
aio: add missing smp_rmb() in read_events_ring
[2ff396be602f10b5eab8e73b24f20348fa2de159]

Jiri Kosina (1):
ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
[6726655dfdd2dc60c035c690d9f10cb69d7ea075]

Joe Lawrence (1):
usb: hub: take hub->hdev reference when processing from eventlist
[c605f3cdff53a743f6d875b76956b239deca1272]

Joern Engel (1):
iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure
[8ae757d09c45102b347a1bc2867f54ffc1ab8fda]

Johan Hovold (1):
USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
[ee444609dbae8afee420c3243ce4c5f442efb622]

Johannes Berg (1):
nl80211: clear skb cb before passing to netlink
[bd8c78e78d5011d8111bc2533ee73b13a3bd6c42]

John David Anglin (1):
parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
[d26a7730b5874a5fa6779c62f4ad7c5065a94723]

John Sung (1):
Input: serport - add compat handling for SPIOCSTYPE ioctl
[a80d8b02751060a178bb1f7a6b7a93645a7a308b]

Joseph Qi (1):
ocfs2/dlm: do not get resource spinlock if lockres is new
[5760a97c7143c208fa3a8f8cad0ed7dd672ebd28]

Josh Triplett (1):
init/Kconfig: Hide printk log config if CONFIG_PRINTK=n
[361e9dfbaae84b0b246ed18d1ab7c82a1a41b53e]

Julian Anastasov (1):
ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
[2627b7e15c5064ddd5e578e4efd948d48d531a3f]

Keith Busch (1):
block: Fix dev_t minor allocation lifetime
[2da78092dda13f1efd26edbbf99a567776913750]

Larry Finger (1):
rtlwifi: rtl8192cu: Add new ID
[c66517165610b911e4c6d268f28d8c640832dbd1]

Marc Kleine-Budde (2):
can: flexcan: mark TX mailbox as TX_INACTIVE
[c32fe4ad3e4861b2bfa1f44114c564935a123dda]
can: flexcan: put TX mailbox into TX_INACTIVE mode after tx-complete
[de5944883ebbedbf5adc8497659772f5da7b7d72]

Marcelo Ricardo Leitner (3):
ipv4: avoid parallel route cache gc executions
[not upstream; route cache has been removed]
ipv4: disable bh while doing route gc
[not upstream; route cache has been removed]
ipv4: move route garbage collector to work queue
[not upstream; route cache has been removed]

Mark (4):
USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter
[67d365a57a51fb9dece6a5ceb504aa381cae1e5b]
USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter
[b6a3ed677991558ce09046397a7c4d70530d15b3]
USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters
[c80b4495c61636edc58fe1ce300f09f24db28e10]
storage: Add single-LUN quirk for Jaz USB Adapter
[c66f1c62e85927357e7b3f4c701614dcb5c498a2]

Mark Brown (1):
regmap: Fix handling of volatile registers for format_write() chips
[5844a8b9d98ec11ce1d77610daacf3f0a0e14715]

Markos Chandras (1):
MIPS: mcount: Adjust stack pointer for static trace in MIPS32
[8a574cfa2652545eb95595d38ac2a0bb501af0ae]

Mathias Krause (1):
drm/i915: Remove bogus __init annotation from DMI callbacks
[bbe1c2740d3a25aa1dbe5d842d2ff09cddcdde0a]

Mathias Nyman (1):
xhci: Fix null pointer dereference if xhci initialization fails
[c207e7c50f31113c24a9f536fcab1e8a256985d7]

Mel Gorman (1):
mm: migrate: Close race between migration completion and mprotect
[d3cb8bf6081b8b7a2dabb1264fe968fd870fa595]

Mike Christie (1):
[SCSI] libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
[db9bfd64b14a3a8f1868d2164518fdeab1b26ad1]

Miklos Szeredi (1):
shmem: fix nlink for rename overwrite directory
[b928095b0a7cff7fb9fcf4c706348ceb8ab2c295]

Mikulas Patocka (1):
dm crypt: fix access beyond the end of allocated space
[d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed]

Murali Karicheri (1):
ahci: add pcid for Marvel 0x9182 controller
[c5edfff9db6f4d2c35c802acb4abe0df178becee]

Nadav Amit (4):
KVM: x86: Check non-canonical addresses upon WRMSR
[854e8bb1aa06c578c2c9145fa6bfe3680ef63b23]
KVM: x86: Emulator fixes for eip canonical checks on near branches
[234f3ce485d54017f15cf5e0699cff4100121601]
KVM: x86: Fix wrong masking on relative jump/call
[05c83ec9b73c8124555b706f6af777b10adf0862]
KVM: x86: Handle errors when RIP is set during far jumps
[d1442d85cc30ea75f7d399474ca738e0bc96f715]

Nadav Har'El (1):
nEPT: Nested INVEPT
[bfd0a56b90005f8c8a004baf407ad90045c2b11e]

Nicholas Bellinger (1):
iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid
[b53b0d99d6fbf7d44330395349a895521cfdbc96]

Paolo Bonzini (1):
KVM: x86: use new CS.RPL as CPL during task switch
[2356aaeb2f58f491679dc0c38bc3f6dbe54e7ded]

Peter Zijlstra (1):
perf: fix perf bug in fork()
[6c72e3501d0d62fc064d3680e5234f3463ec5a86]

Petr Matousek (1):
kvm: vmx: handle invvpid vm exit gracefully
[a642fc305053cc1c6e47e4f4df327895747ab485]

Richard Larocque (3):
alarmtimer: Do not signal SIGEV_NONE timers
[265b81d23a46c39df0a735a3af4238954b41a4c2]
alarmtimer: Lock k_itimer during timer callback
[474e941bed9262f5fa2394f9a4a67e24499e5926]
alarmtimer: Return relative times in timer_gettime
[e86fea764991e00a03ff1e56409ec9cacdbda4c9]

Robin Murphy (1):
ARM: 8165/1: alignment: don't break misaligned NEON load/store
[5ca918e5e3f9df4634077c06585c42bc6a8d699a]

Ross Lagerwall (1):
xen/manage: Always freeze/thaw processes when suspend/resuming
[61a734d305e16944b42730ef582a7171dc733321]

Sage Weil (1):
libceph: gracefully handle large reply messages from the mon
[73c3d4812b4c755efeca0140f606f83772a39ce4]

Sergio Gelato (1):
nfsd: Fix ACL null pointer deref
[4ac7249ea5a0ceef9f8269f63f33cc873c3fac61]

Steven Rostedt (1):
ring-buffer: Fix infinite spin in reading buffer
[24607f114fd14f2f37e3e0cb3d47bce96e81e848]

Takashi Iwai (1):
ALSA: hda - Fix COEF setups for ALC1150 codec
[acf08081adb5e8fe0519eb97bb49797ef52614d6]

Takuya Yoshikawa (1):
KVM: x86 emulator: Use opcode::execute for CALL
[d4ddafcdf2201326ec9717172767cfad0ede1472]

Taylor Braun-Jones (1):
USB: ftdi_sio: Add support for GE Healthcare Nemo Tracker device
[9c491c372d677b6420e0f8c6361fe422791662cc]

Tejun Heo (2):
percpu: fix pcpu_alloc_pages() failure path
[f0d279654dea22b7a6ad34b9334aee80cda62cde]
percpu: perform tlb flush after pcpu_map_pages() failure
[849f5169097e1ba35b90ac9df76b5bb6f9c0aabd]

Theodore Ts'o (1):
ext4: fix BUG_ON in mb_free_blocks()
[c99d1e6e83b06744c75d9f5e491ed495a7086b7b]

Thomas Gleixner (1):
futex: Unlock hb->lock in futex_wait_requeue_pi() error path
[13c42c2f43b19aab3195f2d357db00d1e885eaa8]

Thomas Hellstrom (1):
drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle
[f01ea0c3d9db536c64d47922716d8b3b8f21d850]

Thomas Pugliese (1):
uwb: init beacon cache entry before registering uwb device
[675f0ab2fe5a0f7325208e60b617a5f32b86d72c]

Trond Myklebust (1):
NFSv4: Fix another bug in the close/open_downgrade code
[cd9288ffaea4359d5cfe2b8d264911506aed26a4]

Wanpeng Li (1):
sched: Fix unreleased llc_shared_mask bit during CPU hotplug
[03bd4e1f7265548832a76e7919a81f3137c44fd1]

Wolfram Sang (1):
regmap: if format_write is used, declare all registers as "unreadable"
[4191f19792bf91267835eb090d970e9cd6277a65]

Yoichi Yuasa (1):
MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
[5596b0b245fb9d2cefb5023b11061050351c1398]

Makefile | 4 +-
arch/arm/mm/alignment.c | 3 +
arch/mips/boot/compressed/decompress.c | 1 +
arch/mips/kernel/mcount.S | 12 ++
arch/mips/mm/c-r4k.c | 2 +
arch/parisc/Makefile | 7 +-
arch/s390/kvm/kvm-s390.c | 10 -
arch/x86/include/asm/kvm_host.h | 14 ++
arch/x86/include/asm/vmx.h | 4 +
arch/x86/kernel/smpboot.c | 3 +
arch/x86/kvm/emulate.c | 251 +++++++++++++++++-------
arch/x86/kvm/i8254.c | 2 +
arch/x86/kvm/mmu.c | 2 +
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx.c | 108 ++++++++++-
arch/x86/kvm/x86.c | 27 ++-
block/genhd.c | 18 +-
drivers/acpi/processor_idle.c | 4 +-
drivers/ata/ahci.c | 10 +
drivers/ata/ata_piix.c | 8 +
drivers/base/regmap/regmap.c | 7 +-
drivers/gpu/drm/i915/intel_bios.c | 2 +-
drivers/gpu/drm/i915/intel_lvds.c | 2 +-
drivers/gpu/drm/radeon/radeon_atombios.c | 7 +
drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 3 +-
drivers/input/mouse/elantech.c | 7 +
drivers/input/mouse/synaptics.c | 68 +++++--
drivers/input/mouse/synaptics.h | 11 ++
drivers/input/serio/i8042-x86ia64io.h | 15 ++
drivers/input/serio/serport.c | 45 ++++-
drivers/md/dm-crypt.c | 20 +-
drivers/net/can/at91_can.c | 8 +-
drivers/net/can/flexcan.c | 41 +++-
drivers/net/wireless/iwlwifi/iwl-agn-rxon.c | 13 ++
drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
drivers/scsi/libiscsi.c | 10 +
drivers/target/iscsi/iscsi_target.c | 4 +-
drivers/target/iscsi/iscsi_target_parameters.c | 2 +-
drivers/usb/core/hub.c | 4 +-
drivers/usb/dwc3/core.c | 6 +-
drivers/usb/host/xhci-hub.c | 8 +-
drivers/usb/host/xhci-mem.c | 2 +-
drivers/usb/serial/ftdi_sio.c | 3 +
drivers/usb/serial/ftdi_sio_ids.h | 12 ++
drivers/usb/serial/sierra.c | 9 +-
drivers/usb/storage/unusual_devs.h | 38 ++++
drivers/uwb/lc-dev.c | 13 +-
drivers/xen/manage.c | 7 -
fs/aio.c | 7 +
fs/buffer.c | 6 +-
fs/ext2/inode.c | 2 +
fs/ext2/xip.c | 1 +
fs/ext4/mballoc.c | 5 +
fs/namei.c | 71 +++----
fs/nfs/nfs4proc.c | 30 +--
fs/nfsd/vfs.c | 3 +
fs/nilfs2/inode.c | 7 +-
fs/ocfs2/dlm/dlmmaster.c | 18 +-
fs/partitions/check.c | 2 +-
fs/xfs/xfs_aops.c | 61 ++++++
include/linux/alarmtimer.h | 1 +
include/linux/ceph/messenger.h | 2 +-
include/linux/jiffies.h | 12 --
include/net/regulatory.h | 2 +-
include/net/sctp/sctp.h | 5 +
include/net/sctp/sm.h | 6 +-
init/Kconfig | 1 +
kernel/cgroup.c | 5 +
kernel/events/core.c | 14 +-
kernel/fork.c | 5 +-
kernel/futex.c | 1 +
kernel/time.c | 54 +++---
kernel/time/alarmtimer.c | 40 ++--
mm/migrate.c | 5 +-
mm/percpu-vm.c | 22 ++-
mm/percpu.c | 2 +
mm/shmem.c | 4 +-
net/ceph/auth_x.c | 256 +++++++++++++------------
net/ceph/messenger.c | 6 +-
net/ceph/mon_client.c | 16 +-
net/ipv4/route.c | 58 ++++--
net/ipv6/addrconf.c | 14 +-
net/ipv6/udp.c | 2 +-
net/sctp/associola.c | 2 +
net/sctp/inqueue.c | 33 +---
net/sctp/sm_make_chunk.c | 99 +++++-----
net/sctp/sm_statefuns.c | 21 +-
net/wireless/nl80211.c | 6 +
sound/core/pcm_lib.c | 8 +-
sound/pci/hda/patch_realtek.c | 2 +
90 files changed, 1261 insertions(+), 516 deletions(-)

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'


2014-11-01 22:39:30

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 003/102] percpu: perform tlb flush after pcpu_map_pages() failure

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit 849f5169097e1ba35b90ac9df76b5bb6f9c0aabd upstream.

If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
Currently, it doesn't flush tlb after the partial unmapping. This may
be okay in most cases as the established mapping hasn't been used at
that point but it can go wrong and when it goes wrong it'd be
extremely difficult to track down.

Flush tlb after the partial unmapping.

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/percpu-vm.c | 1 +
1 file changed, 1 insertion(+)

--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -273,6 +273,7 @@ err:
__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
page_end - page_start);
}
+ pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
return err;
}

2014-11-01 22:39:42

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 002/102] percpu: fix pcpu_alloc_pages() failure path

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit f0d279654dea22b7a6ad34b9334aee80cda62cde upstream.

When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
free what has already been allocated. The invocation is across the
whole requested range and pcpu_free_pages() will try to free all
non-NULL pages; unfortunately, this is incorrect as
pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
clear the pages array and thus the array may have entries from the
previous invocations making the partial failure path free incorrect
pages.

Fix it by open-coding the partial freeing of the already allocated
pages.

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/percpu-vm.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -108,7 +108,7 @@ static int pcpu_alloc_pages(struct pcpu_
int page_start, int page_end)
{
const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_COLD;
- unsigned int cpu;
+ unsigned int cpu, tcpu;
int i;

for_each_possible_cpu(cpu) {
@@ -116,14 +116,23 @@ static int pcpu_alloc_pages(struct pcpu_
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];

*pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
- if (!*pagep) {
- pcpu_free_pages(chunk, pages, populated,
- page_start, page_end);
- return -ENOMEM;
- }
+ if (!*pagep)
+ goto err;
}
}
return 0;
+
+err:
+ while (--i >= page_start)
+ __free_page(pages[pcpu_page_idx(cpu, i)]);
+
+ for_each_possible_cpu(tcpu) {
+ if (tcpu == cpu)
+ break;
+ for (i = page_start; i < page_end; i++)
+ __free_page(pages[pcpu_page_idx(tcpu, i)]);
+ }
+ return -ENOMEM;
}

/**

2014-11-01 22:39:44

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 004/102] percpu: free percpu allocation info for uniprocessor system

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Honggang Li <[email protected]>

commit 3189eddbcafcc4d827f7f19facbeddec4424eba8 upstream.

Currently, only SMP system free the percpu allocation info.
Uniprocessor system should free it too. For example, one x86 UML
virtual machine with 256MB memory, UML kernel wastes one page memory.

Signed-off-by: Honggang Li <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/percpu.c | 2 ++
1 file changed, 2 insertions(+)

--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1895,6 +1895,8 @@ void __init setup_per_cpu_areas(void)

if (pcpu_setup_first_chunk(ai, fc) < 0)
panic("Failed to initialize percpu areas.");
+
+ pcpu_free_alloc_info(ai);
}

#endif /* CONFIG_SMP */

2014-11-01 22:39:39

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 015/102] USB: ftdi_sio: add support for NOVITUS Bono E thermal printer

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <[email protected]>

commit ee444609dbae8afee420c3243ce4c5f442efb622 upstream.

Add device id for NOVITUS Bono E thermal printer.

Reported-by: Emanuel Koczwara <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++
2 files changed, 7 insertions(+)

--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -752,6 +752,7 @@ static struct usb_device_id id_table_com
{ USB_DEVICE(FTDI_VID, FTDI_NDI_AURORA_SCU_PID),
.driver_info = (kernel_ulong_t)&ftdi_NDI_device_quirk },
{ USB_DEVICE(TELLDUS_VID, TELLDUS_TELLSTICK_PID) },
+ { USB_DEVICE(NOVITUS_VID, NOVITUS_BONO_E_PID) },
{ USB_DEVICE(RTSYSTEMS_VID, RTSYSTEMS_USB_S03_PID) },
{ USB_DEVICE(RTSYSTEMS_VID, RTSYSTEMS_USB_59_PID) },
{ USB_DEVICE(RTSYSTEMS_VID, RTSYSTEMS_USB_57A_PID) },
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -831,6 +831,12 @@
#define TELLDUS_TELLSTICK_PID 0x0C30 /* RF control dongle 433 MHz using FT232RL */

/*
+ * NOVITUS printers
+ */
+#define NOVITUS_VID 0x1a28
+#define NOVITUS_BONO_E_PID 0x6010
+
+/*
* RT Systems programming cables for various ham radios
*/
#define RTSYSTEMS_VID 0x2100 /* Vendor ID */

2014-11-01 22:39:37

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 012/102] ahci: Add Device IDs for Intel 9 Series PCH

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: James Ralston <[email protected]>

commit 1b071a0947dbce5c184c12262e02540fbc493457 upstream.

This patch adds the AHCI mode SATA Device IDs for the Intel 9 Series PCH.

Signed-off-by: James Ralston <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/ata/ahci.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -315,6 +315,14 @@ static const struct pci_device_id ahci_p
{ PCI_VDEVICE(INTEL, 0x9c85), board_ahci }, /* Wildcat Point-LP RAID */
{ PCI_VDEVICE(INTEL, 0x9c87), board_ahci }, /* Wildcat Point-LP RAID */
{ PCI_VDEVICE(INTEL, 0x9c8f), board_ahci }, /* Wildcat Point-LP RAID */
+ { PCI_VDEVICE(INTEL, 0x8c82), board_ahci }, /* 9 Series AHCI */
+ { PCI_VDEVICE(INTEL, 0x8c83), board_ahci }, /* 9 Series AHCI */
+ { PCI_VDEVICE(INTEL, 0x8c84), board_ahci }, /* 9 Series RAID */
+ { PCI_VDEVICE(INTEL, 0x8c85), board_ahci }, /* 9 Series RAID */
+ { PCI_VDEVICE(INTEL, 0x8c86), board_ahci }, /* 9 Series RAID */
+ { PCI_VDEVICE(INTEL, 0x8c87), board_ahci }, /* 9 Series RAID */
+ { PCI_VDEVICE(INTEL, 0x8c8e), board_ahci }, /* 9 Series RAID */
+ { PCI_VDEVICE(INTEL, 0x8c8f), board_ahci }, /* 9 Series RAID */

/* JMicron 360/1/3/5/6, match class to avoid IDE function */
{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,

2014-11-01 22:39:34

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 017/102] USB: sierra: add 1199:68AA device ID

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Bjørn Mork <[email protected]>

commit 5b3da69285c143b7ea76b3b9f73099ff1093ab73 upstream.

This VID:PID is used for some Direct IP devices behaving
identical to the already supported 0F3D:68AA devices.

Reported-by: Lars Melin <[email protected]>
Signed-off-by: Bjørn Mork <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/serial/sierra.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -300,6 +300,9 @@ static const struct usb_device_id id_tab
{ USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68A3, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
+ { USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68AA, 0xFF, 0xFF, 0xFF),
+ .driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
+ },
/* AT&T Direct IP LTE modems */
{ USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68AA, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist

2014-11-01 22:41:07

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 026/102] usb: dwc3: core: fix order of PM runtime calls

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit fed33afce0eda44a46ae24d93aec1b5198c0bac4 upstream.

Currently, we disable pm_runtime before all register
accesses are done, this is dangerous and might lead
to abort exceptions due to the driver trying to access
a register which is clocked by a clock which was long
gated.

Fix that by moving pm_runtime_put_sync() and pm_runtime_disable()
as the last thing we do before returning from our ->remove()
method.

Fixes: 72246da (usb: Introduce DesignWare USB3 DRD Driver)
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/dwc3/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -427,9 +427,6 @@ static int __devexit dwc3_remove(struct

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

- pm_runtime_put_sync(&pdev->dev);
- pm_runtime_disable(&pdev->dev);
-
dwc3_debugfs_exit(dwc);

if (features & DWC3_HAS_PERIPHERAL)
@@ -440,6 +437,9 @@ static int __devexit dwc3_remove(struct
iounmap(dwc->regs);
kfree(dwc->mem);

+ pm_runtime_put_sync(&pdev->dev);
+ pm_runtime_disable(&pdev->dev);
+
return 0;
}

2014-11-01 22:41:12

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 027/102] ahci: add pcid for Marvel 0x9182 controller

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Murali Karicheri <[email protected]>

commit c5edfff9db6f4d2c35c802acb4abe0df178becee upstream.

Keystone K2E EVM uses Marvel 0x9182 controller. This requires support
for the ID in the ahci driver.

Signed-off-by: Murali Karicheri <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: Santosh Shilimkar <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/ata/ahci.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -457,6 +457,8 @@ static const struct pci_device_id ahci_p
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x917a),
.driver_data = board_ahci_yes_fbs }, /* 88se9172 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x9172),
+ .driver_data = board_ahci_yes_fbs }, /* 88se9182 */
+ { PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x9182),
.driver_data = board_ahci_yes_fbs }, /* 88se9172 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x9192),
.driver_data = board_ahci_yes_fbs }, /* 88se9172 on some Gigabyte */

2014-11-01 22:41:20

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 020/102] ALSA: hda - Fix COEF setups for ALC1150 codec

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit acf08081adb5e8fe0519eb97bb49797ef52614d6 upstream.

ALC1150 codec seems to need the COEF- and PLL-setups just like its
compatible ALC882 codec. Some machines (e.g. SunMicro X10SAT) show
the problem like too low output volumes unless the COEF setup is
applied.

Reported-and-tested-by: Dana Goyette <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
sound/pci/hda/patch_realtek.c | 2 ++
1 file changed, 2 insertions(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -789,6 +789,7 @@ static void alc_auto_init_amp(struct hda
case 0x10ec0885:
case 0x10ec0887:
/*case 0x10ec0889:*/ /* this causes an SPDIF problem */
+ case 0x10ec0900:
alc889_coef_init(codec);
break;
case 0x10ec0888:
@@ -4343,6 +4344,7 @@ static int patch_alc882(struct hda_codec
switch (codec->vendor_id) {
case 0x10ec0882:
case 0x10ec0885:
+ case 0x10ec0900:
break;
default:
/* ALC883 and variants */

2014-11-01 22:41:26

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 033/102] perf: Fix a race condition in perf_remove_from_context()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Cong Wang <[email protected]>

commit 3577af70a2ce4853d58e57d832e687d739281479 upstream.

We saw a kernel soft lockup in perf_remove_from_context(),
it looks like the `perf` process, when exiting, could not go
out of the retry loop. Meanwhile, the target process was forking
a child. So either the target process should execute the smp
function call to deactive the event (if it was running) or it should
do a context switch which deactives the event.

It seems we optimize out a context switch in perf_event_context_sched_out(),
and what's more important, we still test an obsolete task pointer when
retrying, so no one actually would deactive that event in this situation.
Fix it directly by reloading the task pointer in perf_remove_from_context().

This should cure the above soft lockup.

Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/events/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1683,6 +1683,16 @@ retry:
*/
if (ctx->is_active) {
raw_spin_unlock_irq(&ctx->lock);
+ /*
+ * Reload the task pointer, it might have been changed by
+ * a concurrent perf_event_context_sched_out().
+ */
+ task = ctx->task;
+ /*
+ * Reload the task pointer, it might have been changed by
+ * a concurrent perf_event_context_sched_out().
+ */
+ task = ctx->task;
goto retry;
}

2014-11-01 22:41:23

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 024/102] ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <[email protected]>

commit 6726655dfdd2dc60c035c690d9f10cb69d7ea075 upstream.

There is a following AB-BA dependency between cpu_hotplug.lock and
cpuidle_lock:

1) cpu_hotplug.lock -> cpuidle_lock
enable_nonboot_cpus()
_cpu_up()
cpu_hotplug_begin()
LOCK(cpu_hotplug.lock)
cpu_notify()
...
acpi_processor_hotplug()
cpuidle_pause_and_lock()
LOCK(cpuidle_lock)

2) cpuidle_lock -> cpu_hotplug.lock
acpi_os_execute_deferred() workqueue
...
acpi_processor_cst_has_changed()
cpuidle_pause_and_lock()
LOCK(cpuidle_lock)
get_online_cpus()
LOCK(cpu_hotplug.lock)

Fix this by reversing the order acpi_processor_cst_has_changed() does
thigs -- let it first execute the protection against CPU hotplug by
calling get_online_cpus() and obtain the cpuidle lock only after that (and
perform the symmentric change when allowing CPUs hotplug again and
dropping cpuidle lock).

Spotted by lockdep.

Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/acpi/processor_idle.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1165,9 +1165,9 @@ int acpi_processor_cst_has_changed(struc
if (smp_processor_id() == 0 &&
cpuidle_get_driver() == &acpi_idle_driver) {

- cpuidle_pause_and_lock();
/* Protect against cpu-hotplug */
get_online_cpus();
+ cpuidle_pause_and_lock();

/* Disable all cpuidle devices */
for_each_online_cpu(cpu) {
@@ -1192,8 +1192,8 @@ int acpi_processor_cst_has_changed(struc
cpuidle_enable_device(&_pr->power.dev);
}
}
- put_online_cpus();
cpuidle_resume_and_unlock();
+ put_online_cpus();
}

return 0;

2014-11-01 22:41:30

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 069/102] sched: Fix unreleased llc_shared_mask bit during CPU hotplug

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Wanpeng Li <[email protected]>

commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream.

The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
PGD 5a9d5067 PUD 13067 PMD 0
Oops: 0000 [#3] SMP
[...]
Call Trace:
load_balance
? _raw_spin_unlock_irqrestore
idle_balance
__schedule
schedule
schedule_timeout
? lock_timer_base
schedule_timeout_uninterruptible
msleep
lock_device_hotplug_sysfs
online_store
dev_attr_store
sysfs_write_file
vfs_write
SyS_write
system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.

This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.

Yasuaki also reported that this can happen on real hardware:

https://lkml.org/lkml/2014/7/22/1018

His case is here:

==
Here is an example on my system.
My system has 4 sockets and each socket has 15 cores and HT is
enabled. In this case, each core of sockes is numbered as
follows:

| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

It means that last level cache of Socket#2 is shared with
CPU#30-44 and 90-104.

When hot-removing socket#2 and #3, each core of sockets is
numbered as follows:

| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
remains having 0x3fff80000001fffc0000000.

After that, when hot-adding socket#2 and #3, each core of
sockets is numbered as follows:

| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then llc_shared_mask of CPU#30 becomes
0x3fff8000fffffffc0000000. It means that last level cache of
Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
the wrong value.

Signed-off-by: Wanpeng Li <[email protected]>
Tested-by: Linn Crosetto <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Toshi Kani <[email protected]>
Reviewed-by: Yasuaki Ishimatsu <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kernel/smpboot.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1252,6 +1252,9 @@ static void remove_siblinginfo(int cpu)

for_each_cpu(sibling, cpu_sibling_mask(cpu))
cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
+ for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
+ cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
+ cpumask_clear(cpu_llc_shared_mask(cpu));
cpumask_clear(cpu_sibling_mask(cpu));
cpumask_clear(cpu_core_mask(cpu));
c->phys_proc_id = 0;

2014-11-01 22:41:34

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 045/102] futex: Unlock hb->lock in futex_wait_requeue_pi() error path

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <[email protected]>

commit 13c42c2f43b19aab3195f2d357db00d1e885eaa8 upstream.

futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:

if (match_futex(&q.key, &key2)) {
ret = -EINVAL;
goto out_put_keys;
}

which releases the keys but does not release hb->lock.

So we happily return to user space with hb->lock held and therefor
preemption disabled.

Unlock hb->lock before taking the exit route.

Reported-by: Dave "Trinity" Jones <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Darren Hart <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
[bwh: Backported to 3.2: queue_unlock() takes two parameters]
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/futex.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2460,6 +2460,7 @@ static int futex_wait_requeue_pi(u32 __u
* shared futexes. We need to compare the keys:
*/
if (match_futex(&q.key, &key2)) {
+ queue_unlock(&q, hb);
ret = -EINVAL;
goto out_put_keys;
}

2014-11-01 22:41:17

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 034/102] Input: synaptics - add support for ForcePads

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Torokhov <[email protected]>

commit 5715fc764f7753d464dbe094b5ef9cffa6e479a4 upstream.

ForcePads are found on HP EliteBook 1040 laptops. They lack any kind of
physical buttons, instead they generate primary button click when user
presses somewhat hard on the surface of the touchpad. Unfortunately they
also report primary button click whenever there are 2 or more contacts
on the pad, messing up all multi-finger gestures (2-finger scrolling,
multi-finger tapping, etc). To cope with this behavior we introduce a
delay (currently 50 msecs) in reporting primary press in case more
contacts appear.

Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/input/mouse/synaptics.c | 68 +++++++++++++++++++++++++++++++----------
drivers/input/mouse/synaptics.h | 11 +++++++
2 files changed, 63 insertions(+), 16 deletions(-)

--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -506,10 +506,61 @@ static int synaptics_parse_hw_state(cons
((buf[0] & 0x04) >> 1) |
((buf[3] & 0x04) >> 2));

+ if ((SYN_CAP_ADV_GESTURE(priv->ext_cap_0c) ||
+ SYN_CAP_IMAGE_SENSOR(priv->ext_cap_0c)) &&
+ hw->w == 2) {
+ synaptics_parse_agm(buf, priv, hw);
+ return 1;
+ }
+
+ hw->x = (((buf[3] & 0x10) << 8) |
+ ((buf[1] & 0x0f) << 8) |
+ buf[4]);
+ hw->y = (((buf[3] & 0x20) << 7) |
+ ((buf[1] & 0xf0) << 4) |
+ buf[5]);
+ hw->z = buf[2];
+
hw->left = (buf[0] & 0x01) ? 1 : 0;
hw->right = (buf[0] & 0x02) ? 1 : 0;

- if (SYN_CAP_CLICKPAD(priv->ext_cap_0c)) {
+ if (SYN_CAP_FORCEPAD(priv->ext_cap_0c)) {
+ /*
+ * ForcePads, like Clickpads, use middle button
+ * bits to report primary button clicks.
+ * Unfortunately they report primary button not
+ * only when user presses on the pad above certain
+ * threshold, but also when there are more than one
+ * finger on the touchpad, which interferes with
+ * out multi-finger gestures.
+ */
+ if (hw->z == 0) {
+ /* No contacts */
+ priv->press = priv->report_press = false;
+ } else if (hw->w >= 4 && ((buf[0] ^ buf[3]) & 0x01)) {
+ /*
+ * Single-finger touch with pressure above
+ * the threshold. If pressure stays long
+ * enough, we'll start reporting primary
+ * button. We rely on the device continuing
+ * sending data even if finger does not
+ * move.
+ */
+ if (!priv->press) {
+ priv->press_start = jiffies;
+ priv->press = true;
+ } else if (time_after(jiffies,
+ priv->press_start +
+ msecs_to_jiffies(50))) {
+ priv->report_press = true;
+ }
+ } else {
+ priv->press = false;
+ }
+
+ hw->left = priv->report_press;
+
+ } else if (SYN_CAP_CLICKPAD(priv->ext_cap_0c)) {
/*
* Clickpad's button is transmitted as middle button,
* however, since it is primary button, we will report
@@ -528,21 +579,6 @@ static int synaptics_parse_hw_state(cons
hw->down = ((buf[0] ^ buf[3]) & 0x02) ? 1 : 0;
}

- if ((SYN_CAP_ADV_GESTURE(priv->ext_cap_0c) ||
- SYN_CAP_IMAGE_SENSOR(priv->ext_cap_0c)) &&
- hw->w == 2) {
- synaptics_parse_agm(buf, priv, hw);
- return 1;
- }
-
- hw->x = (((buf[3] & 0x10) << 8) |
- ((buf[1] & 0x0f) << 8) |
- buf[4]);
- hw->y = (((buf[3] & 0x20) << 7) |
- ((buf[1] & 0xf0) << 4) |
- buf[5]);
- hw->z = buf[2];
-
if (SYN_CAP_MULTI_BUTTON_NO(priv->ext_cap) &&
((buf[0] ^ buf[3]) & 0x02)) {
switch (SYN_CAP_MULTI_BUTTON_NO(priv->ext_cap) & ~0x01) {
--- a/drivers/input/mouse/synaptics.h
+++ b/drivers/input/mouse/synaptics.h
@@ -77,6 +77,11 @@
* 2 0x08 image sensor image sensor tracks 5 fingers, but only
* reports 2.
* 2 0x20 report min query 0x0f gives min coord reported
+ * 2 0x80 forcepad forcepad is a variant of clickpad that
+ * does not have physical buttons but rather
+ * uses pressure above certain threshold to
+ * report primary clicks. Forcepads also have
+ * clickpad bit set.
*/
#define SYN_CAP_CLICKPAD(ex0c) ((ex0c) & 0x100000) /* 1-button ClickPad */
#define SYN_CAP_CLICKPAD2BTN(ex0c) ((ex0c) & 0x000100) /* 2-button ClickPad */
@@ -85,6 +90,7 @@
#define SYN_CAP_ADV_GESTURE(ex0c) ((ex0c) & 0x080000)
#define SYN_CAP_REDUCED_FILTERING(ex0c) ((ex0c) & 0x000400)
#define SYN_CAP_IMAGE_SENSOR(ex0c) ((ex0c) & 0x000800)
+#define SYN_CAP_FORCEPAD(ex0c) ((ex0c) & 0x008000)

/* synaptics modes query bits */
#define SYN_MODE_ABSOLUTE(m) ((m) & (1 << 7))
@@ -170,6 +176,11 @@ struct synaptics_data {
*/
struct synaptics_hw_state agm;
bool agm_pending; /* new AGM packet received */
+
+ /* ForcePad handling */
+ unsigned long press_start;
+ bool press;
+ bool report_press;
};

void synaptics_module_init(void);

2014-11-01 22:41:04

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 025/102] usb: dwc3: core: use pm_runtime_put_sync() on remove

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit 16b972a592ea2c9a3c2a3c12238de650fd4043a9 upstream.

We are going to disable runtime_pm and we're
removing the driver, we must disable the device
now.

Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/dwc3/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -427,7 +427,7 @@ static int __devexit dwc3_remove(struct

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

- pm_runtime_put(&pdev->dev);
+ pm_runtime_put_sync(&pdev->dev);
pm_runtime_disable(&pdev->dev);

dwc3_debugfs_exit(dwc);

2014-11-01 22:41:01

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 022/102] aio: add missing smp_rmb() in read_events_ring

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Jeff Moyer <[email protected]>

commit 2ff396be602f10b5eab8e73b24f20348fa2de159 upstream.

We ran into a case on ppc64 running mariadb where io_getevents would
return zeroed out I/O events. After adding instrumentation, it became
clear that there was some missing synchronization between reading the
tail pointer and the events themselves. This small patch fixes the
problem in testing.

Thanks to Zach for helping to look into this, and suggesting the fix.

Signed-off-by: Jeff Moyer <[email protected]>
Signed-off-by: Benjamin LaHaise <[email protected]>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/aio.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1102,6 +1102,13 @@ static int aio_read_evt(struct kioctx *i
head = ring->head % info->nr;
if (head != ring->tail) {
struct io_event *evp = aio_ring_event(info, head, KM_USER1);
+
+ /*
+ * Ensure that once we've read the current tail pointer, that
+ * we also see the events that were stored up to the tail.
+ */
+ smp_rmb();
+
*ent = *evp;
head = (head + 1) % info->nr;
smp_mb(); /* finish reading the event before updatng the head */

2014-11-01 22:40:58

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 028/102] drm/radeon: add connector quirk for fujitsu board

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <[email protected]>

commit 1952f24d0fa6292d65f886887af87ba8ac79b3ba upstream.

Vbios connector table lists non-existent VGA port.

Bug:
https://bugs.freedesktop.org/show_bug.cgi?id=83184

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/gpu/drm/radeon/radeon_atombios.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/drivers/gpu/drm/radeon/radeon_atombios.c
+++ b/drivers/gpu/drm/radeon/radeon_atombios.c
@@ -457,6 +457,13 @@ static bool radeon_atom_apply_quirks(str
}
}

+ /* Fujitsu D3003-S2 board lists DVI-I as DVI-I and VGA */
+ if ((dev->pdev->device == 0x9805) &&
+ (dev->pdev->subsystem_vendor == 0x1734) &&
+ (dev->pdev->subsystem_device == 0x11bd)) {
+ if (*connector_type == DRM_MODE_CONNECTOR_VGA)
+ return false;
+ }

return true;
}

2014-11-01 22:40:56

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 029/102] usb: host: xhci: fix compliance mode workaround

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Felipe Balbi <[email protected]>

commit 96908589a8b2584b1185f834d365f5cc360e8226 upstream.

Commit 71c731a (usb: host: xhci: Fix Compliance Mode
on SN65LVP3502CP Hardware) implemented a workaround
for a known issue with Texas Instruments' USB 3.0
redriver IC but it left a condition where any xHCI
host would be taken out of reset if port was placed
in compliance mode and there was no device connected
to the port.

That condition would trigger a fake connection to a
non-existent device so that usbcore would trigger a
warm reset of the port, thus taking the link out of
reset.

This has the side-effect of preventing any xHCI host
connected to a Linux machine from starting and running
the USB 3.0 Electrical Compliance Suite because the
port will mysteriously taken out of compliance mode
and, thus, xHCI won't step through the necessary
compliance patterns for link validation.

This patch fixes the issue by just adding a missing
check for XHCI_COMP_MODE_QUIRK inside
xhci_hub_report_usb3_link_state() when PORT_CAS isn't
set.

This patch should be backported to all kernels containing
commit 71c731a.

Fixes: 71c731a (usb: host: xhci: Fix Compliance Mode on SN65LVP3502CP Hardware)
Cc: Alexis R. Cortes <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Acked-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[bwh: Backported to 3.2:
- s/xhci_hub_report_usb3_link_state/xhci_hub_report_link_state/
- s/raw_port_status/temp/
- Adjust indentation]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/host/xhci-hub.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -440,7 +440,8 @@ void xhci_test_and_clear_bit(struct xhci
}

/* Updates Link Status for super Speed port */
-static void xhci_hub_report_link_state(u32 *status, u32 status_reg)
+static void xhci_hub_report_link_state(struct xhci_hcd *xhci,
+ u32 *status, u32 status_reg)
{
u32 pls = status_reg & PORT_PLS_MASK;

@@ -479,7 +480,8 @@ static void xhci_hub_report_link_state(u
* in which sometimes the port enters compliance mode
* caused by a delay on the host-device negotiation.
*/
- if (pls == USB_SS_PORT_LS_COMP_MOD)
+ if ((xhci->quirks & XHCI_COMP_MODE_QUIRK) &&
+ (pls == USB_SS_PORT_LS_COMP_MOD))
pls |= USB_PORT_STAT_CONNECTION;
}

@@ -655,7 +657,7 @@ int xhci_hub_control(struct usb_hcd *hcd
}
/* Update Port Link State for super speed ports*/
if (hcd->speed == HCD_USB3) {
- xhci_hub_report_link_state(&status, temp);
+ xhci_hub_report_link_state(xhci, &status, temp);
/*
* Verify if all USB3 Ports Have entered U0 already.
* Delete Compliance Mode Timer if so.

2014-11-01 22:40:53

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 040/102] usb: hub: take hub->hdev reference when processing from eventlist

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Joe Lawrence <[email protected]>

commit c605f3cdff53a743f6d875b76956b239deca1272 upstream.

During surprise device hotplug removal tests, it was observed that
hub_events may try to call usb_lock_device on a device that has already
been freed. Protect the usb_device by taking out a reference (under the
hub_event_lock) when hub_events pulls it off the list, returning the
reference after hub_events is finished using it.

Signed-off-by: Joe Lawrence <[email protected]>
Suggested-by: David Bulkow <[email protected]> for using kref
Suggested-by: Alan Stern <[email protected]> for placement
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/core/hub.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3651,9 +3651,10 @@ static void hub_events(void)

hub = list_entry(tmp, struct usb_hub, event_list);
kref_get(&hub->kref);
+ hdev = hub->hdev;
+ usb_get_dev(hdev);
spin_unlock_irq(&hub_event_lock);

- hdev = hub->hdev;
hub_dev = hub->intfdev;
intf = to_usb_interface(hub_dev);
dev_dbg(hub_dev, "state %d ports %d chg %04x evt %04x\n",
@@ -3888,6 +3889,7 @@ static void hub_events(void)
usb_autopm_put_interface(intf);
loop_disconnected:
usb_unlock_device(hdev);
+ usb_put_dev(hdev);
kref_put(&hub->kref, hub_release);

} /* end while (1) */

2014-11-01 22:40:50

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 031/102] USB: ftdi_sio: Add support for GE Healthcare Nemo Tracker device

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Taylor Braun-Jones <[email protected]>

commit 9c491c372d677b6420e0f8c6361fe422791662cc upstream.

Signed-off-by: Taylor Braun-Jones <[email protected]>
Cc: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/serial/ftdi_sio.c | 2 ++
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++
2 files changed, 8 insertions(+)

--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -962,6 +962,8 @@ static struct usb_device_id id_table_com
{ USB_DEVICE(BRAINBOXES_VID, BRAINBOXES_US_842_4_PID) },
/* ekey Devices */
{ USB_DEVICE(FTDI_VID, FTDI_EKEY_CONV_USB_PID) },
+ /* GE Healthcare devices */
+ { USB_DEVICE(GE_HEALTHCARE_VID, GE_HEALTHCARE_NEMO_TRACKER_PID) },
{ }, /* Optional parameter entry */
{ } /* Terminating entry */
};
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -1385,3 +1385,9 @@
* ekey biometric systems GmbH (http://ekey.net/)
*/
#define FTDI_EKEY_CONV_USB_PID 0xCB08 /* Converter USB */
+
+/*
+ * GE Healthcare devices
+ */
+#define GE_HEALTHCARE_VID 0x1901
+#define GE_HEALTHCARE_NEMO_TRACKER_PID 0x0015

2014-11-01 22:40:46

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 023/102] block: Fix dev_t minor allocation lifetime

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Keith Busch <[email protected]>

commit 2da78092dda13f1efd26edbbf99a567776913750 upstream.

Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.

Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.

Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
[bwh: Backported to 3.2:
- Adjust filename
- idr insertion API is different, and blk_alloc_devt() is preallocating
a node in a different way]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -28,10 +28,10 @@ struct kobject *block_depr;
/* for extended dynamic devt allocation, currently only one major is used */
#define NR_EXT_DEVT (1 << MINORBITS)

-/* For extended devt allocation. ext_devt_mutex prevents look up
+/* For extended devt allocation. ext_devt_lock prevents look up
* results from going away underneath its user.
*/
-static DEFINE_MUTEX(ext_devt_mutex);
+static DEFINE_SPINLOCK(ext_devt_lock);
static DEFINE_IDR(ext_devt_idr);

static struct device_type disk_type;
@@ -421,13 +421,13 @@ int blk_alloc_devt(struct hd_struct *par
do {
if (!idr_pre_get(&ext_devt_idr, GFP_KERNEL))
return -ENOMEM;
- mutex_lock(&ext_devt_mutex);
+ spin_lock(&ext_devt_lock);
rc = idr_get_new(&ext_devt_idr, part, &idx);
if (!rc && idx >= NR_EXT_DEVT) {
idr_remove(&ext_devt_idr, idx);
rc = -EBUSY;
}
- mutex_unlock(&ext_devt_mutex);
+ spin_unlock(&ext_devt_lock);
} while (rc == -EAGAIN);

if (rc)
@@ -454,9 +454,9 @@ void blk_free_devt(dev_t devt)
return;

if (MAJOR(devt) == BLOCK_EXT_MAJOR) {
- mutex_lock(&ext_devt_mutex);
+ spin_lock(&ext_devt_lock);
idr_remove(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
- mutex_unlock(&ext_devt_mutex);
+ spin_unlock(&ext_devt_lock);
}
}

@@ -663,7 +663,6 @@ void del_gendisk(struct gendisk *disk)
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
device_del(disk_to_dev(disk));
- blk_free_devt(disk_to_dev(disk)->devt);
}
EXPORT_SYMBOL(del_gendisk);

@@ -688,13 +687,13 @@ struct gendisk *get_gendisk(dev_t devt,
} else {
struct hd_struct *part;

- mutex_lock(&ext_devt_mutex);
+ spin_lock(&ext_devt_lock);
part = idr_find(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
if (part && get_disk(part_to_disk(part))) {
*partno = part->partno;
disk = part_to_disk(part);
}
- mutex_unlock(&ext_devt_mutex);
+ spin_unlock(&ext_devt_lock);
}

return disk;
@@ -1102,6 +1101,7 @@ static void disk_release(struct device *
{
struct gendisk *disk = dev_to_disk(dev);

+ blk_free_devt(dev->devt);
disk_release_events(disk);
kfree(disk->random);
disk_replace_part_tbl(disk, NULL);
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -361,6 +361,7 @@ static const struct attribute_group *par
static void part_release(struct device *dev)
{
struct hd_struct *p = dev_to_part(dev);
+ blk_free_devt(dev->devt);
free_part_stats(p);
free_part_info(p);
kfree(p);
@@ -403,7 +404,6 @@ void delete_partition(struct gendisk *di
rcu_assign_pointer(ptbl->last_lookup, NULL);
kobject_put(part->holder_dir);
device_del(part_to_dev(part));
- blk_free_devt(part_devt(part));

hd_struct_put(part);
}

2014-11-01 22:40:42

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 021/102] xen/manage: Always freeze/thaw processes when suspend/resuming

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ross Lagerwall <[email protected]>

commit 61a734d305e16944b42730ef582a7171dc733321 upstream.

Always freeze processes when suspending and thaw processes when resuming
to prevent a race noticeable with HVM guests.

This prevents a deadlock where the khubd kthread (which is designed to
be freezable) acquires a usb device lock and then tries to allocate
memory which requires the disk which hasn't been resumed yet.
Meanwhile, the xenwatch thread deadlocks waiting for the usb device
lock.

Freezing processes fixes this because the khubd thread is only thawed
after the xenwatch thread finishes resuming all the devices.

Signed-off-by: Ross Lagerwall <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/xen/manage.c | 7 -------
1 file changed, 7 deletions(-)

--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -108,16 +108,11 @@ static void do_suspend(void)

shutting_down = SHUTDOWN_SUSPEND;

-#ifdef CONFIG_PREEMPT
- /* If the kernel is preemptible, we need to freeze all the processes
- to prevent them from being in the middle of a pagetable update
- during suspend. */
err = freeze_processes();
if (err) {
printk(KERN_ERR "xen suspend: freeze failed %d\n", err);
goto out;
}
-#endif

err = dpm_suspend_start(PMSG_FREEZE);
if (err) {
@@ -172,10 +167,8 @@ out_resume:
clock_was_set();

out_thaw:
-#ifdef CONFIG_PREEMPT
thaw_processes();
out:
-#endif
shutting_down = SHUTDOWN_INVALID;
}
#endif /* CONFIG_HIBERNATE_CALLBACKS */

2014-11-01 22:44:48

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 032/102] uwb: init beacon cache entry before registering uwb device

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Pugliese <[email protected]>

commit 675f0ab2fe5a0f7325208e60b617a5f32b86d72c upstream.

Make sure the uwb_dev->bce entry is set before calling uwb_dev_add in
uwbd_dev_onair so that usermode will only see the device after it is
properly initialized. This fixes a kernel panic that can occur if
usermode tries to access the IEs sysfs attribute of a UWB device before
the driver has had a chance to set the beacon cache entry.

Signed-off-by: Thomas Pugliese <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/uwb/lc-dev.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

--- a/drivers/uwb/lc-dev.c
+++ b/drivers/uwb/lc-dev.c
@@ -441,16 +441,19 @@ void uwbd_dev_onair(struct uwb_rc *rc, s
uwb_dev->mac_addr = *bce->mac_addr;
uwb_dev->dev_addr = bce->dev_addr;
dev_set_name(&uwb_dev->dev, macbuf);
+
+ /* plug the beacon cache */
+ bce->uwb_dev = uwb_dev;
+ uwb_dev->bce = bce;
+ uwb_bce_get(bce); /* released in uwb_dev_sys_release() */
+
result = uwb_dev_add(uwb_dev, &rc->uwb_dev.dev, rc);
if (result < 0) {
dev_err(dev, "new device %s: cannot instantiate device\n",
macbuf);
goto error_dev_add;
}
- /* plug the beacon cache */
- bce->uwb_dev = uwb_dev;
- uwb_dev->bce = bce;
- uwb_bce_get(bce); /* released in uwb_dev_sys_release() */
+
dev_info(dev, "uwb device (mac %s dev %s) connected to %s %s\n",
macbuf, devbuf, rc->uwb_dev.dev.parent->bus->name,
dev_name(rc->uwb_dev.dev.parent));
@@ -458,6 +461,8 @@ void uwbd_dev_onair(struct uwb_rc *rc, s
return;

error_dev_add:
+ bce->uwb_dev = NULL;
+ uwb_bce_put(bce);
kfree(uwb_dev);
return;
}

2014-11-01 22:44:52

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 056/102] [SCSI] libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mike Christie <[email protected]>

commit db9bfd64b14a3a8f1868d2164518fdeab1b26ad1 upstream.

This patches fixes a potential buffer overrun in __iscsi_conn_send_pdu.
This function is used by iscsi drivers and userspace to send iscsi PDUs/
commands. For login commands, we have a set buffer size. For all other
commands we do not support data buffers.

This was reported by Dan Carpenter here:
http://www.spinics.net/lists/linux-scsi/msg66838.html

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/scsi/libiscsi.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -718,11 +718,21 @@ __iscsi_conn_send_pdu(struct iscsi_conn
return NULL;
}

+ if (data_size > ISCSI_DEF_MAX_RECV_SEG_LEN) {
+ iscsi_conn_printk(KERN_ERR, conn, "Invalid buffer len of %u for login task. Max len is %u\n", data_size, ISCSI_DEF_MAX_RECV_SEG_LEN);
+ return NULL;
+ }
+
task = conn->login_task;
} else {
if (session->state != ISCSI_STATE_LOGGED_IN)
return NULL;

+ if (data_size != 0) {
+ iscsi_conn_printk(KERN_ERR, conn, "Can not send data buffer of len %u for op 0x%x\n", data_size, opcode);
+ return NULL;
+ }
+
BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);

2014-11-01 22:45:25

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 038/102] libceph: do not hard code max auth ticket len

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <[email protected]>

commit c27a3e4d667fdcad3db7b104f75659478e0c68d8 upstream.

We hard code cephx auth ticket buffer size to 256 bytes. This isn't
enough for any moderate setups and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper). Since the
buffer is allocated dynamically anyway, allocated it a bit later, at
the point where we know how much is going to be needed.

Fixes: http://tracker.ceph.com/issues/8979

Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ceph/auth_x.c | 64 +++++++++++++++++++++++++------------------------------
1 file changed, 29 insertions(+), 35 deletions(-)

--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -13,8 +13,6 @@
#include "auth_x.h"
#include "auth_x_protocol.h"

-#define TEMP_TICKET_BUF_LEN 256
-
static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);

static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
@@ -64,7 +62,7 @@ static int ceph_x_encrypt(struct ceph_cr
}

static int ceph_x_decrypt(struct ceph_crypto_key *secret,
- void **p, void *end, void *obuf, size_t olen)
+ void **p, void *end, void **obuf, size_t olen)
{
struct ceph_x_encrypt_header head;
size_t head_len = sizeof(head);
@@ -75,8 +73,14 @@ static int ceph_x_decrypt(struct ceph_cr
return -EINVAL;

dout("ceph_x_decrypt len %d\n", len);
- ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
- *p, len);
+ if (*obuf == NULL) {
+ *obuf = kmalloc(len, GFP_NOFS);
+ if (!*obuf)
+ return -ENOMEM;
+ olen = len;
+ }
+
+ ret = ceph_decrypt2(secret, &head, &head_len, *obuf, &olen, *p, len);
if (ret)
return ret;
if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
@@ -131,18 +135,19 @@ static void remove_ticket_handler(struct

static int process_one_ticket(struct ceph_auth_client *ac,
struct ceph_crypto_key *secret,
- void **p, void *end,
- void *dbuf, void *ticket_buf)
+ void **p, void *end)
{
struct ceph_x_info *xi = ac->private;
int type;
u8 tkt_struct_v, blob_struct_v;
struct ceph_x_ticket_handler *th;
+ void *dbuf = NULL;
void *dp, *dend;
int dlen;
char is_enc;
struct timespec validity;
struct ceph_crypto_key old_key;
+ void *ticket_buf = NULL;
void *tp, *tpend;
struct ceph_timespec new_validity;
struct ceph_crypto_key new_session_key;
@@ -167,8 +172,7 @@ static int process_one_ticket(struct cep
}

/* blob for me */
- dlen = ceph_x_decrypt(secret, p, end, dbuf,
- TEMP_TICKET_BUF_LEN);
+ dlen = ceph_x_decrypt(secret, p, end, &dbuf, 0);
if (dlen <= 0) {
ret = dlen;
goto out;
@@ -195,20 +199,25 @@ static int process_one_ticket(struct cep

/* ticket blob for service */
ceph_decode_8_safe(p, end, is_enc, bad);
- tp = ticket_buf;
if (is_enc) {
/* encrypted */
dout(" encrypted ticket\n");
- dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
- TEMP_TICKET_BUF_LEN);
+ dlen = ceph_x_decrypt(&old_key, p, end, &ticket_buf, 0);
if (dlen < 0) {
ret = dlen;
goto out;
}
+ tp = ticket_buf;
dlen = ceph_decode_32(&tp);
} else {
/* unencrypted */
ceph_decode_32_safe(p, end, dlen, bad);
+ ticket_buf = kmalloc(dlen, GFP_NOFS);
+ if (!ticket_buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ tp = ticket_buf;
ceph_decode_need(p, end, dlen, bad);
ceph_decode_copy(p, ticket_buf, dlen);
}
@@ -237,6 +246,8 @@ static int process_one_ticket(struct cep
xi->have_keys |= th->service;

out:
+ kfree(ticket_buf);
+ kfree(dbuf);
return ret;

bad:
@@ -249,21 +260,10 @@ static int ceph_x_proc_ticket_reply(stru
void *buf, void *end)
{
void *p = buf;
- char *dbuf;
- char *ticket_buf;
u8 reply_struct_v;
u32 num;
int ret;

- dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
- if (!dbuf)
- return -ENOMEM;
-
- ret = -ENOMEM;
- ticket_buf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
- if (!ticket_buf)
- goto out_dbuf;
-
ceph_decode_8_safe(&p, end, reply_struct_v, bad);
if (reply_struct_v != 1)
return -EINVAL;
@@ -272,22 +272,15 @@ static int ceph_x_proc_ticket_reply(stru
dout("%d tickets\n", num);

while (num--) {
- ret = process_one_ticket(ac, secret, &p, end,
- dbuf, ticket_buf);
+ ret = process_one_ticket(ac, secret, &p, end);
if (ret)
- goto out;
+ return ret;
}

- ret = 0;
-out:
- kfree(ticket_buf);
-out_dbuf:
- kfree(dbuf);
- return ret;
+ return 0;

bad:
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}

static int ceph_x_build_authorizer(struct ceph_auth_client *ac,
@@ -583,13 +576,14 @@ static int ceph_x_verify_authorizer_repl
struct ceph_x_ticket_handler *th;
int ret = 0;
struct ceph_x_authorize_reply reply;
+ void *preply = &reply;
void *p = au->reply_buf;
void *end = p + sizeof(au->reply_buf);

th = get_ticket_handler(ac, au->service);
if (IS_ERR(th))
return PTR_ERR(th);
- ret = ceph_x_decrypt(&th->session_key, &p, end, &reply, sizeof(reply));
+ ret = ceph_x_decrypt(&th->session_key, &p, end, &preply, sizeof(reply));
if (ret < 0)
return ret;
if (ret != sizeof(reply))

2014-11-01 22:45:44

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 036/102] libceph: gracefully handle large reply messages from the mon

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Sage Weil <[email protected]>

commit 73c3d4812b4c755efeca0140f606f83772a39ce4 upstream.

We preallocate a few of the message types we get back from the mon. If we
get a larger message than we are expecting, fall back to trying to allocate
a new one instead of blindly using the one we have.

Signed-off-by: Sage Weil <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ceph/mon_client.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -987,7 +987,15 @@ static struct ceph_msg *mon_alloc_msg(st
if (!m) {
pr_info("alloc_msg unknown type %d\n", type);
*skip = 1;
+ } else if (front_len > m->front_alloc_len) {
+ pr_warning("mon_alloc_msg front %d > prealloc %d (%u#%llu)\n",
+ front_len, m->front_alloc_len,
+ (unsigned int)con->peer_name.type,
+ le64_to_cpu(con->peer_name.num));
+ ceph_msg_put(m);
+ m = ceph_msg_new(type, front_len, GFP_NOFS, false);
}
+
return m;
}

2014-11-01 22:45:47

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 005/102] cgroup: reject cgroup names with '\n'

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Alban Crequy <[email protected]>

commit 71b1fb5c4473a5b1e601d41b109bdfe001ec82e0 upstream.

/proc/<pid>/cgroup contains one cgroup path on each line. If cgroup names are
allowed to contain "\n", applications cannot parse /proc/<pid>/cgroup safely.

Signed-off-by: Alban Crequy <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- We have to get the name from the dentry pointer]
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/cgroup.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3871,6 +3871,11 @@ static int cgroup_mkdir(struct inode *di
{
struct cgroup *c_parent = dentry->d_parent->d_fsdata;

+ /* Do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable.
+ */
+ if (strchr(dentry->d_name.name, '\n'))
+ return -EINVAL;
+
/* the vfs holds inode->i_mutex already */
return cgroup_create(c_parent, dentry, mode | S_IFDIR);
}

2014-11-01 22:45:53

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 035/102] libceph: rename ceph_msg::front_max to front_alloc_len

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <[email protected]>

commit 3cea4c3071d4e55e9d7356efe9d0ebf92f0c2204 upstream.

Rename front_max field of struct ceph_msg to front_alloc_len to make
its purpose more clear.

Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
include/linux/ceph/messenger.h | 2 +-
net/ceph/messenger.c | 6 +++---
net/ceph/mon_client.c | 8 ++++----
3 files changed, 8 insertions(+), 8 deletions(-)

--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -92,7 +92,7 @@ struct ceph_msg {
bool front_is_vmalloc;
bool more_to_follow;
bool needs_out_seq;
- int front_max;
+ int front_alloc_len;
unsigned long ack_stamp; /* tx: when we were acked */

struct ceph_msgpool *pool;
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2423,7 +2423,7 @@ struct ceph_msg *ceph_msg_new(int type,
m->footer.middle_crc = 0;
m->footer.data_crc = 0;
m->footer.flags = 0;
- m->front_max = front_len;
+ m->front_alloc_len = front_len;
m->front_is_vmalloc = false;
m->more_to_follow = false;
m->ack_stamp = 0;
@@ -2594,8 +2594,8 @@ EXPORT_SYMBOL(ceph_msg_last_put);

void ceph_msg_dump(struct ceph_msg *msg)
{
- pr_debug("msg_dump %p (front_max %d nr_pages %d)\n", msg,
- msg->front_max, msg->nr_pages);
+ pr_debug("msg_dump %p (front_alloc_len %d nr_pages %d)\n", msg,
+ msg->front_alloc_len, msg->nr_pages);
print_hex_dump(KERN_DEBUG, "header: ",
DUMP_PREFIX_OFFSET, 16, 1,
&msg->hdr, sizeof(msg->hdr), true);
--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -150,7 +150,7 @@ static int __open_session(struct ceph_mo
/* initiatiate authentication handshake */
ret = ceph_auth_build_hello(monc->auth,
monc->m_auth->front.iov_base,
- monc->m_auth->front_max);
+ monc->m_auth->front_alloc_len);
__send_prepared_auth_request(monc, ret);
} else {
dout("open_session mon%d already open\n", monc->cur_mon);
@@ -194,7 +194,7 @@ static void __send_subscribe(struct ceph
int num;

p = msg->front.iov_base;
- end = p + msg->front_max;
+ end = p + msg->front_alloc_len;

num = 1 + !!monc->want_next_osdmap + !!monc->want_mdsmap;
ceph_encode_32(&p, num);
@@ -860,7 +860,7 @@ static void handle_auth_reply(struct cep
ret = ceph_handle_auth_reply(monc->auth, msg->front.iov_base,
msg->front.iov_len,
monc->m_auth->front.iov_base,
- monc->m_auth->front_max);
+ monc->m_auth->front_alloc_len);
if (ret < 0) {
monc->client->auth_err = ret;
wake_up_all(&monc->client->auth_wq);
@@ -887,7 +887,7 @@ static int __validate_auth(struct ceph_m
return 0;

ret = ceph_build_auth(monc->auth, monc->m_auth->front.iov_base,
- monc->m_auth->front_max);
+ monc->m_auth->front_alloc_len);
if (ret <= 0)
return ret; /* either an error, or no need to authenticate */
__send_prepared_auth_request(monc, ret);

2014-11-01 22:45:57

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 073/102] ocfs2/dlm: do not get resource spinlock if lockres is new

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Joseph Qi <[email protected]>

commit 5760a97c7143c208fa3a8f8cad0ed7dd672ebd28 upstream.

There is a deadlock case which reported by Guozhonghua:
https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.

It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
helpers"). Since lockres is new, it doesn't not require the
&res->spinlock. So remove it.

Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: joyce.xue <[email protected]>
Reported-by: Guozhonghua <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Mark Fasheh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/ocfs2/dlm/dlmmaster.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -653,12 +653,9 @@ void dlm_lockres_clear_refmap_bit(struct
clear_bit(bit, res->refmap);
}

-
-void dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
+static void __dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
struct dlm_lock_resource *res)
{
- assert_spin_locked(&res->spinlock);
-
res->inflight_locks++;

mlog(0, "%s: res %.*s, inflight++: now %u, %ps()\n", dlm->name,
@@ -666,6 +663,13 @@ void dlm_lockres_grab_inflight_ref(struc
__builtin_return_address(0));
}

+void dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
+ struct dlm_lock_resource *res)
+{
+ assert_spin_locked(&res->spinlock);
+ __dlm_lockres_grab_inflight_ref(dlm, res);
+}
+
void dlm_lockres_drop_inflight_ref(struct dlm_ctxt *dlm,
struct dlm_lock_resource *res)
{
@@ -855,10 +859,8 @@ lookup:
/* finally add the lockres to its hash bucket */
__dlm_insert_lockres(dlm, res);

- /* Grab inflight ref to pin the resource */
- spin_lock(&res->spinlock);
- dlm_lockres_grab_inflight_ref(dlm, res);
- spin_unlock(&res->spinlock);
+ /* since this lockres is new it doesn't not require the spinlock */
+ __dlm_lockres_grab_inflight_ref(dlm, res);

/* get an extra ref on the mle in case this is a BLOCK
* if so, the creator of the BLOCK may try to put the last

2014-11-01 22:45:51

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 076/102] perf: fix perf bug in fork()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit 6c72e3501d0d62fc064d3680e5234f3463ec5a86 upstream.

Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process. This is bad..

Suggested-by: Oleg Nesterov <[email protected]>
Reported-by: Oleg Nesterov <[email protected]>
Reported-by: Sylvain 'ythier' Hitier <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/events/core.c | 4 +++-
kernel/fork.c | 5 +++--
2 files changed, 6 insertions(+), 3 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7081,8 +7081,10 @@ int perf_event_init_task(struct task_str

for_each_task_context_nr(ctxn) {
ret = perf_event_init_context(child, ctxn);
- if (ret)
+ if (ret) {
+ perf_event_free_task(child);
return ret;
+ }
}

return 0;
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1221,7 +1221,7 @@ static struct task_struct *copy_process(
goto bad_fork_cleanup_policy;
retval = audit_alloc(p);
if (retval)
- goto bad_fork_cleanup_policy;
+ goto bad_fork_cleanup_perf;
/* copy all the process information */
retval = copy_semundo(clone_flags, p);
if (retval)
@@ -1406,8 +1406,9 @@ bad_fork_cleanup_semundo:
exit_sem(p);
bad_fork_cleanup_audit:
audit_free(p);
-bad_fork_cleanup_policy:
+bad_fork_cleanup_perf:
perf_event_free_task(p);
+bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
mpol_put(p->mempolicy);
bad_fork_cleanup_cgroup:

2014-11-01 22:45:46

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 030/102] Input: elantech - fix detection of touchpad on ASUS s301l

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit 271329b3c798b2102120f5df829071c211ef00ed upstream.

Adjust Elantech signature validation to account fo rnewer models of
touchpads.

Reported-and-tested-by: Màrius Monton <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/input/mouse/elantech.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1130,6 +1130,13 @@ static bool elantech_is_signature_valid(
if (param[1] == 0)
return true;

+ /*
+ * Some models have a revision higher then 20. Meaning param[2] may
+ * be 10 or 20, skip the rates check for these.
+ */
+ if (param[0] == 0x46 && (param[1] & 0xef) == 0x0f && param[2] < 40)
+ return true;
+
for (i = 0; i < ARRAY_SIZE(rates); i++)
if (param[2] == rates[i])
return false;

2014-11-01 22:45:38

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 077/102] init/Kconfig: Hide printk log config if CONFIG_PRINTK=n

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Josh Triplett <[email protected]>

commit 361e9dfbaae84b0b246ed18d1ab7c82a1a41b53e upstream.

The buffers sized by CONFIG_LOG_BUF_SHIFT and
CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't
ask about their size at all.

Signed-off-by: Josh Triplett <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
[bwh: Backported to 3.2: drop change to CONFIG_LOG_CPU_MAX_BUF_SHIFT]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -555,6 +555,7 @@ config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
range 12 21
default 17
+ depends on PRINTK
help
Select kernel log buffer size as a power of 2.
Examples:

2014-11-01 22:45:35

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 044/102] Input: i8042 - add nomux quirk for Avatar AVIU-145A6

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit d2682118f4bb3ceb835f91c1a694407a31bb7378 upstream.

The sys_vendor / product_name are somewhat generic unfortunately, so this
may lead to some false positives. But nomux usually does no harm, where as
not having it clearly is causing problems on the Avatar AVIU-145A6.

https://bugzilla.kernel.org/show_bug.cgi?id=77391

Reported-by: Hugo P <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/input/serio/i8042-x86ia64io.h | 7 +++++++
1 file changed, 7 insertions(+)

--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -458,6 +458,13 @@ static const struct dmi_system_id __init
DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv4 Notebook PC"),
},
},
+ {
+ /* Avatar AVIU-145A6 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Intel"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "IC4I"),
+ },
+ },
{ }
};

2014-11-01 22:45:31

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 043/102] Input: i8042 - add Fujitsu U574 to no_timeout dmi table

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit cc18a69c92d0972bc2fc5a047ee3be1e8398171b upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=69731

Reported-by: Jason Robinson <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/input/serio/i8042-x86ia64io.h | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -594,6 +594,14 @@ static const struct dmi_system_id __init
DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv4 Notebook PC"),
},
},
+ {
+ /* Fujitsu U574 laptop */
+ /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U574"),
+ },
+ },
{ }
};

2014-11-01 22:45:28

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 037/102] libceph: add process_one_ticket() helper

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <[email protected]>

commit 597cda357716a3cf8d994cb11927af917c8d71fa upstream.

Add a helper for processing individual cephx auth tickets. Needed for
the next commit, which deals with allocating ticket buffers. (Most of
the diff here is whitespace - view with git diff -b).

Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ceph/auth_x.c | 228 +++++++++++++++++++++++++++++-------------------------
1 file changed, 124 insertions(+), 104 deletions(-)

--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -129,17 +129,131 @@ static void remove_ticket_handler(struct
kfree(th);
}

+static int process_one_ticket(struct ceph_auth_client *ac,
+ struct ceph_crypto_key *secret,
+ void **p, void *end,
+ void *dbuf, void *ticket_buf)
+{
+ struct ceph_x_info *xi = ac->private;
+ int type;
+ u8 tkt_struct_v, blob_struct_v;
+ struct ceph_x_ticket_handler *th;
+ void *dp, *dend;
+ int dlen;
+ char is_enc;
+ struct timespec validity;
+ struct ceph_crypto_key old_key;
+ void *tp, *tpend;
+ struct ceph_timespec new_validity;
+ struct ceph_crypto_key new_session_key;
+ struct ceph_buffer *new_ticket_blob;
+ unsigned long new_expires, new_renew_after;
+ u64 new_secret_id;
+ int ret;
+
+ ceph_decode_need(p, end, sizeof(u32) + 1, bad);
+
+ type = ceph_decode_32(p);
+ dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));
+
+ tkt_struct_v = ceph_decode_8(p);
+ if (tkt_struct_v != 1)
+ goto bad;
+
+ th = get_ticket_handler(ac, type);
+ if (IS_ERR(th)) {
+ ret = PTR_ERR(th);
+ goto out;
+ }
+
+ /* blob for me */
+ dlen = ceph_x_decrypt(secret, p, end, dbuf,
+ TEMP_TICKET_BUF_LEN);
+ if (dlen <= 0) {
+ ret = dlen;
+ goto out;
+ }
+ dout(" decrypted %d bytes\n", dlen);
+ dp = dbuf;
+ dend = dp + dlen;
+
+ tkt_struct_v = ceph_decode_8(&dp);
+ if (tkt_struct_v != 1)
+ goto bad;
+
+ memcpy(&old_key, &th->session_key, sizeof(old_key));
+ ret = ceph_crypto_key_decode(&new_session_key, &dp, dend);
+ if (ret)
+ goto out;
+
+ ceph_decode_copy(&dp, &new_validity, sizeof(new_validity));
+ ceph_decode_timespec(&validity, &new_validity);
+ new_expires = get_seconds() + validity.tv_sec;
+ new_renew_after = new_expires - (validity.tv_sec / 4);
+ dout(" expires=%lu renew_after=%lu\n", new_expires,
+ new_renew_after);
+
+ /* ticket blob for service */
+ ceph_decode_8_safe(p, end, is_enc, bad);
+ tp = ticket_buf;
+ if (is_enc) {
+ /* encrypted */
+ dout(" encrypted ticket\n");
+ dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
+ TEMP_TICKET_BUF_LEN);
+ if (dlen < 0) {
+ ret = dlen;
+ goto out;
+ }
+ dlen = ceph_decode_32(&tp);
+ } else {
+ /* unencrypted */
+ ceph_decode_32_safe(p, end, dlen, bad);
+ ceph_decode_need(p, end, dlen, bad);
+ ceph_decode_copy(p, ticket_buf, dlen);
+ }
+ tpend = tp + dlen;
+ dout(" ticket blob is %d bytes\n", dlen);
+ ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
+ blob_struct_v = ceph_decode_8(&tp);
+ new_secret_id = ceph_decode_64(&tp);
+ ret = ceph_decode_buffer(&new_ticket_blob, &tp, tpend);
+ if (ret)
+ goto out;
+
+ /* all is well, update our ticket */
+ ceph_crypto_key_destroy(&th->session_key);
+ if (th->ticket_blob)
+ ceph_buffer_put(th->ticket_blob);
+ th->session_key = new_session_key;
+ th->ticket_blob = new_ticket_blob;
+ th->validity = new_validity;
+ th->secret_id = new_secret_id;
+ th->expires = new_expires;
+ th->renew_after = new_renew_after;
+ dout(" got ticket service %d (%s) secret_id %lld len %d\n",
+ type, ceph_entity_type_name(type), th->secret_id,
+ (int)th->ticket_blob->vec.iov_len);
+ xi->have_keys |= th->service;
+
+out:
+ return ret;
+
+bad:
+ ret = -EINVAL;
+ goto out;
+}
+
static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
struct ceph_crypto_key *secret,
void *buf, void *end)
{
- struct ceph_x_info *xi = ac->private;
- int num;
void *p = buf;
- int ret;
char *dbuf;
char *ticket_buf;
u8 reply_struct_v;
+ u32 num;
+ int ret;

dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
if (!dbuf)
@@ -150,112 +264,18 @@ static int ceph_x_proc_ticket_reply(stru
if (!ticket_buf)
goto out_dbuf;

- ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
- reply_struct_v = ceph_decode_8(&p);
+ ceph_decode_8_safe(&p, end, reply_struct_v, bad);
if (reply_struct_v != 1)
- goto bad;
- num = ceph_decode_32(&p);
- dout("%d tickets\n", num);
- while (num--) {
- int type;
- u8 tkt_struct_v, blob_struct_v;
- struct ceph_x_ticket_handler *th;
- void *dp, *dend;
- int dlen;
- char is_enc;
- struct timespec validity;
- struct ceph_crypto_key old_key;
- void *tp, *tpend;
- struct ceph_timespec new_validity;
- struct ceph_crypto_key new_session_key;
- struct ceph_buffer *new_ticket_blob;
- unsigned long new_expires, new_renew_after;
- u64 new_secret_id;
-
- ceph_decode_need(&p, end, sizeof(u32) + 1, bad);
-
- type = ceph_decode_32(&p);
- dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));
-
- tkt_struct_v = ceph_decode_8(&p);
- if (tkt_struct_v != 1)
- goto bad;
-
- th = get_ticket_handler(ac, type);
- if (IS_ERR(th)) {
- ret = PTR_ERR(th);
- goto out;
- }
-
- /* blob for me */
- dlen = ceph_x_decrypt(secret, &p, end, dbuf,
- TEMP_TICKET_BUF_LEN);
- if (dlen <= 0) {
- ret = dlen;
- goto out;
- }
- dout(" decrypted %d bytes\n", dlen);
- dend = dbuf + dlen;
- dp = dbuf;
-
- tkt_struct_v = ceph_decode_8(&dp);
- if (tkt_struct_v != 1)
- goto bad;
+ return -EINVAL;

- memcpy(&old_key, &th->session_key, sizeof(old_key));
- ret = ceph_crypto_key_decode(&new_session_key, &dp, dend);
- if (ret)
- goto out;
+ ceph_decode_32_safe(&p, end, num, bad);
+ dout("%d tickets\n", num);

- ceph_decode_copy(&dp, &new_validity, sizeof(new_validity));
- ceph_decode_timespec(&validity, &new_validity);
- new_expires = get_seconds() + validity.tv_sec;
- new_renew_after = new_expires - (validity.tv_sec / 4);
- dout(" expires=%lu renew_after=%lu\n", new_expires,
- new_renew_after);
-
- /* ticket blob for service */
- ceph_decode_8_safe(&p, end, is_enc, bad);
- tp = ticket_buf;
- if (is_enc) {
- /* encrypted */
- dout(" encrypted ticket\n");
- dlen = ceph_x_decrypt(&old_key, &p, end, ticket_buf,
- TEMP_TICKET_BUF_LEN);
- if (dlen < 0) {
- ret = dlen;
- goto out;
- }
- dlen = ceph_decode_32(&tp);
- } else {
- /* unencrypted */
- ceph_decode_32_safe(&p, end, dlen, bad);
- ceph_decode_need(&p, end, dlen, bad);
- ceph_decode_copy(&p, ticket_buf, dlen);
- }
- tpend = tp + dlen;
- dout(" ticket blob is %d bytes\n", dlen);
- ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
- blob_struct_v = ceph_decode_8(&tp);
- new_secret_id = ceph_decode_64(&tp);
- ret = ceph_decode_buffer(&new_ticket_blob, &tp, tpend);
+ while (num--) {
+ ret = process_one_ticket(ac, secret, &p, end,
+ dbuf, ticket_buf);
if (ret)
goto out;
-
- /* all is well, update our ticket */
- ceph_crypto_key_destroy(&th->session_key);
- if (th->ticket_blob)
- ceph_buffer_put(th->ticket_blob);
- th->session_key = new_session_key;
- th->ticket_blob = new_ticket_blob;
- th->validity = new_validity;
- th->secret_id = new_secret_id;
- th->expires = new_expires;
- th->renew_after = new_renew_after;
- dout(" got ticket service %d (%s) secret_id %lld len %d\n",
- type, ceph_entity_type_name(type), th->secret_id,
- (int)th->ticket_blob->vec.iov_len);
- xi->have_keys |= th->service;
}

ret = 0;

2014-11-01 22:48:32

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 039/102] Input: serport - add compat handling for SPIOCSTYPE ioctl

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: John Sung <[email protected]>

commit a80d8b02751060a178bb1f7a6b7a93645a7a308b upstream.

When running a 32-bit inputattach utility in a 64-bit system, there will be
error code "inputattach: can't set device type". This is caused by the
serport device driver not supporting compat_ioctl, so that SPIOCSTYPE ioctl
fails.

Signed-off-by: John Sung <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/input/serio/serport.c | 45 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 7 deletions(-)

--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -21,6 +21,7 @@
#include <linux/init.h>
#include <linux/serio.h>
#include <linux/tty.h>
+#include <linux/compat.h>

MODULE_AUTHOR("Vojtech Pavlik <[email protected]>");
MODULE_DESCRIPTION("Input device TTY line discipline");
@@ -196,28 +197,55 @@ static ssize_t serport_ldisc_read(struct
return 0;
}

+static void serport_set_type(struct tty_struct *tty, unsigned long type)
+{
+ struct serport *serport = tty->disc_data;
+
+ serport->id.proto = type & 0x000000ff;
+ serport->id.id = (type & 0x0000ff00) >> 8;
+ serport->id.extra = (type & 0x00ff0000) >> 16;
+}
+
/*
* serport_ldisc_ioctl() allows to set the port protocol, and device ID
*/

-static int serport_ldisc_ioctl(struct tty_struct * tty, struct file * file, unsigned int cmd, unsigned long arg)
+static int serport_ldisc_ioctl(struct tty_struct *tty, struct file *file,
+ unsigned int cmd, unsigned long arg)
{
- struct serport *serport = (struct serport*) tty->disc_data;
- unsigned long type;
-
if (cmd == SPIOCSTYPE) {
+ unsigned long type;
+
if (get_user(type, (unsigned long __user *) arg))
return -EFAULT;

- serport->id.proto = type & 0x000000ff;
- serport->id.id = (type & 0x0000ff00) >> 8;
- serport->id.extra = (type & 0x00ff0000) >> 16;
+ serport_set_type(tty, type);
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+#ifdef CONFIG_COMPAT
+#define COMPAT_SPIOCSTYPE _IOW('q', 0x01, compat_ulong_t)
+static long serport_ldisc_compat_ioctl(struct tty_struct *tty,
+ struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+ if (cmd == COMPAT_SPIOCSTYPE) {
+ void __user *uarg = compat_ptr(arg);
+ compat_ulong_t compat_type;
+
+ if (get_user(compat_type, (compat_ulong_t __user *)uarg))
+ return -EFAULT;

+ serport_set_type(tty, compat_type);
return 0;
}

return -EINVAL;
}
+#endif

static void serport_ldisc_write_wakeup(struct tty_struct * tty)
{
@@ -241,6 +269,9 @@ static struct tty_ldisc_ops serport_ldis
.close = serport_ldisc_close,
.read = serport_ldisc_read,
.ioctl = serport_ldisc_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = serport_ldisc_compat_ioctl,
+#endif
.receive_buf = serport_ldisc_receive,
.write_wakeup = serport_ldisc_write_wakeup
};

2014-11-01 22:48:37

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 041/102] storage: Add single-LUN quirk for Jaz USB Adapter

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mark <[email protected]>

commit c66f1c62e85927357e7b3f4c701614dcb5c498a2 upstream.

The Iomega Jaz USB Adapter is a SCSI-USB converter cable. The hardware
seems to be identical to e.g. the Microtech XpressSCSI, using a Shuttle/
SCM chip set. However its firmware restricts it to only work with Jaz
drives.

On connecting the cable a message like this appears four times in the log:
reset full speed USB device number 4 using uhci_hcd

That's non-fatal but the US_FL_SINGLE_LUN quirk fixes it.

Signed-off-by: Mark Knibbs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/storage/unusual_devs.h | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -733,6 +733,12 @@ UNUSUAL_DEV( 0x059b, 0x0001, 0x0100, 0x
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_SINGLE_LUN ),

+UNUSUAL_DEV( 0x059b, 0x0040, 0x0100, 0x0100,
+ "Iomega",
+ "Jaz USB Adapter",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_SINGLE_LUN ),
+
/* Reported by <[email protected]> */
UNUSUAL_DEV( 0x059f, 0x0643, 0x0000, 0x0000,
"LaCie",

2014-11-01 22:48:51

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 047/102] alarmtimer: Return relative times in timer_gettime

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Richard Larocque <[email protected]>

commit e86fea764991e00a03ff1e56409ec9cacdbda4c9 upstream.

Returns the time remaining for an alarm timer, rather than the time at
which it is scheduled to expire. If the timer has already expired or it
is not currently scheduled, the it_value's members are set to zero.

This new behavior matches that of the other posix-timers and the POSIX
specifications.

This is a change in user-visible behavior, and may break existing
applications. Hopefully, few users rely on the old incorrect behavior.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Sharvil Nanavati <[email protected]>
Signed-off-by: Richard Larocque <[email protected]>
[jstultz: minor style tweak]
Signed-off-by: John Stultz <[email protected]>
[bwh: Backported to 3.2: Add definition of alarm_expires_remaining() from
commit 6cffe00f7d4e ('alarmtimer: Add functions for timerfd support')]
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/time/alarmtimer.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -226,6 +226,12 @@ static enum hrtimer_restart alarmtimer_f

}

+ktime_t alarm_expires_remaining(const struct alarm *alarm)
+{
+ struct alarm_base *base = &alarm_bases[alarm->type];
+ return ktime_sub(alarm->node.expires, base->gettime());
+}
+
#ifdef CONFIG_RTC_CLASS
/**
* alarmtimer_suspend - Suspend time callback
@@ -519,18 +525,22 @@ static int alarm_timer_create(struct k_i
* @new_timer: k_itimer pointer
* @cur_setting: itimerspec data to fill
*
- * Copies the itimerspec data out from the k_itimer
+ * Copies out the current itimerspec data
*/
static void alarm_timer_get(struct k_itimer *timr,
struct itimerspec *cur_setting)
{
- memset(cur_setting, 0, sizeof(struct itimerspec));
+ ktime_t relative_expiry_time =
+ alarm_expires_remaining(&(timr->it.alarm.alarmtimer));
+
+ if (ktime_to_ns(relative_expiry_time) > 0) {
+ cur_setting->it_value = ktime_to_timespec(relative_expiry_time);
+ } else {
+ cur_setting->it_value.tv_sec = 0;
+ cur_setting->it_value.tv_nsec = 0;
+ }

- cur_setting->it_interval =
- ktime_to_timespec(timr->it.alarm.interval);
- cur_setting->it_value =
- ktime_to_timespec(timr->it.alarm.alarmtimer.node.expires);
- return;
+ cur_setting->it_interval = ktime_to_timespec(timr->it.alarm.interval);
}

/**
--- a/include/linux/alarmtimer.h
+++ b/include/linux/alarmtimer.h
@@ -48,6 +48,7 @@ int alarm_try_to_cancel(struct alarm *al
int alarm_cancel(struct alarm *alarm);

u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval);
+ktime_t alarm_expires_remaining(const struct alarm *alarm);

/*
* A alarmtimer is active, when it is enqueued into timerqueue or the

2014-11-01 22:48:55

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 055/102] NFSv4: Fix another bug in the close/open_downgrade code

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <[email protected]>

commit cd9288ffaea4359d5cfe2b8d264911506aed26a4 upstream.

James Drew reports another bug whereby the NFS client is now sending
an OPEN_DOWNGRADE in a situation where it should really have sent a
CLOSE: the client is opening the file for O_RDWR, but then trying to
do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.

Reported-by: James Drews <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Fixes: aee7af356e15 (NFSv4: Fix problems with close in the presence...)
Signed-off-by: Trond Myklebust <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/nfs/nfs4proc.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2015,23 +2015,23 @@ static void nfs4_close_prepare(struct rp
is_rdwr = test_bit(NFS_O_RDWR_STATE, &state->flags);
is_rdonly = test_bit(NFS_O_RDONLY_STATE, &state->flags);
is_wronly = test_bit(NFS_O_WRONLY_STATE, &state->flags);
- /* Calculate the current open share mode */
- calldata->arg.fmode = 0;
- if (is_rdonly || is_rdwr)
- calldata->arg.fmode |= FMODE_READ;
- if (is_wronly || is_rdwr)
- calldata->arg.fmode |= FMODE_WRITE;
/* Calculate the change in open mode */
+ calldata->arg.fmode = 0;
if (state->n_rdwr == 0) {
- if (state->n_rdonly == 0) {
- call_close |= is_rdonly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_READ;
- }
- if (state->n_wronly == 0) {
- call_close |= is_wronly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_WRITE;
- }
- }
+ if (state->n_rdonly == 0)
+ call_close |= is_rdonly;
+ else if (is_rdonly)
+ calldata->arg.fmode |= FMODE_READ;
+ if (state->n_wronly == 0)
+ call_close |= is_wronly;
+ else if (is_wronly)
+ calldata->arg.fmode |= FMODE_WRITE;
+ } else if (is_rdwr)
+ calldata->arg.fmode |= FMODE_READ|FMODE_WRITE;
+
+ if (calldata->arg.fmode == 0)
+ call_close |= is_rdwr;
+
spin_unlock(&state->owner->so_lock);

if (!call_close) {

2014-11-01 22:49:01

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 050/102] don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 7bd88377d482e1eae3c5329b12e33cfd664fa6a9 upstream.

return the value instead, and have path_init() do the assignment. Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination. This one should go where
it went.

To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers. And do the same to set_root(), to keep them
in sync.

Signed-off-by: Al Viro <[email protected]>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/namei.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -554,24 +554,22 @@ static int complete_walk(struct nameidat

static __always_inline void set_root(struct nameidata *nd)
{
- if (!nd->root.mnt)
- get_fs_root(current->fs, &nd->root);
+ get_fs_root(current->fs, &nd->root);
}

static int link_path_walk(const char *, struct nameidata *);

-static __always_inline void set_root_rcu(struct nameidata *nd)
+static __always_inline unsigned set_root_rcu(struct nameidata *nd)
{
- if (!nd->root.mnt) {
- struct fs_struct *fs = current->fs;
- unsigned seq;
+ struct fs_struct *fs = current->fs;
+ unsigned seq, res;

- do {
- seq = read_seqcount_begin(&fs->seq);
- nd->root = fs->root;
- nd->seq = __read_seqcount_begin(&nd->root.dentry->d_seq);
- } while (read_seqcount_retry(&fs->seq, seq));
- }
+ do {
+ seq = read_seqcount_begin(&fs->seq);
+ nd->root = fs->root;
+ res = __read_seqcount_begin(&nd->root.dentry->d_seq);
+ } while (read_seqcount_retry(&fs->seq, seq));
+ return res;
}

static __always_inline int __vfs_follow_link(struct nameidata *nd, const char *link)
@@ -582,7 +580,8 @@ static __always_inline int __vfs_follow_
goto fail;

if (*link == '/') {
- set_root(nd);
+ if (!nd->root.mnt)
+ set_root(nd);
path_put(&nd->path);
nd->path = nd->root;
path_get(&nd->root);
@@ -927,7 +926,8 @@ static void follow_mount_rcu(struct name

static int follow_dotdot_rcu(struct nameidata *nd)
{
- set_root_rcu(nd);
+ if (!nd->root.mnt)
+ set_root_rcu(nd);

while (1) {
if (nd->path.dentry == nd->root.dentry &&
@@ -1030,7 +1030,8 @@ static void follow_mount(struct path *pa

static void follow_dotdot(struct nameidata *nd)
{
- set_root(nd);
+ if (!nd->root.mnt)
+ set_root(nd);

while(1) {
struct dentry *old = nd->path.dentry;
@@ -1504,7 +1505,7 @@ static int path_init(int dfd, const char
if (flags & LOOKUP_RCU) {
br_read_lock(vfsmount_lock);
rcu_read_lock();
- set_root_rcu(nd);
+ nd->seq = set_root_rcu(nd);
} else {
set_root(nd);
path_get(&nd->root);

2014-11-01 22:49:05

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 053/102] iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <[email protected]>

commit b53b0d99d6fbf7d44330395349a895521cfdbc96 upstream.

This patch fixes a bug in iscsit_logout_post_handler_diffcid() where
a pointer used as storage for list_for_each_entry() was incorrectly
being used to determine if no matching entry had been found.

This patch changes iscsit_logout_post_handler_diffcid() to key off
bool conn_found to determine if the function needs to exit early.

Reported-by: Joern Engel <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/target/iscsi/iscsi_target.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4306,6 +4306,7 @@ static void iscsit_logout_post_handler_d
{
struct iscsi_conn *l_conn;
struct iscsi_session *sess = conn->sess;
+ bool conn_found = false;

if (!sess)
return;
@@ -4314,12 +4315,13 @@ static void iscsit_logout_post_handler_d
list_for_each_entry(l_conn, &sess->sess_conn_list, conn_list) {
if (l_conn->cid == cid) {
iscsit_inc_conn_usage_count(l_conn);
+ conn_found = true;
break;
}
}
spin_unlock_bh(&sess->conn_lock);

- if (!l_conn)
+ if (!conn_found)
return;

if (l_conn->sock)

2014-11-01 22:48:59

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 054/102] iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Joern Engel <[email protected]>

commit 8ae757d09c45102b347a1bc2867f54ffc1ab8fda upstream.

In iscsi_copy_param_list() a failed iscsi_param_list memory allocation
currently invokes iscsi_release_param_list() to cleanup, and will promptly
trigger a NULL pointer dereference.

Instead, go ahead and return for the first iscsi_copy_param_list()
failure case.

Found by coverity.

Signed-off-by: Joern Engel <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/target/iscsi/iscsi_target_parameters.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/target/iscsi/iscsi_target_parameters.c
+++ b/drivers/target/iscsi/iscsi_target_parameters.c
@@ -552,7 +552,7 @@ int iscsi_copy_param_list(
param_list = kzalloc(sizeof(struct iscsi_param_list), GFP_KERNEL);
if (!param_list) {
pr_err("Unable to allocate memory for struct iscsi_param_list.\n");
- goto err_out;
+ return -1;
}
INIT_LIST_HEAD(&param_list->param_list);
INIT_LIST_HEAD(&param_list->extra_response_list);

2014-11-01 22:48:48

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 046/102] jiffies: Fix timeval conversion to jiffies

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Andrew Hunter <[email protected]>

commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream.

timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:

setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);

would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.) Doing
this repeatedly would cause unbounded growth in val. So fix the math.

Here's what was wrong with the conversion: we essentially computed
(eliding seconds)

jiffies = usec * (NSEC_PER_USEC/TICK_NSEC)

by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:

jiffies = (usec * x) >> USEC_JIFFIE_SC

and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)

In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.

We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies. This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.

Tested: the following program:

int main() {
struct itimerval zero = {{0, 0}, {0, 0}};
/* Initially set to 10 ms. */
struct itimerval initial = zero;
initial.it_interval.tv_usec = 10000;
setitimer(ITIMER_PROF, &initial, NULL);
/* Save and restore several times. */
for (size_t i = 0; i < 10; ++i) {
struct itimerval prev;
setitimer(ITIMER_PROF, &zero, &prev);
/* on old kernels, this goes up by TICK_USEC every iteration */
printf("previous value: %ld %ld %ld %ld\n",
prev.it_interval.tv_sec, prev.it_interval.tv_usec,
prev.it_value.tv_sec, prev.it_value.tv_usec);
setitimer(ITIMER_PROF, &prev, NULL);
}
return 0;
}

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Reviewed-by: Paul Turner <[email protected]>
Reported-by: Aaron Jacobs <[email protected]>
Signed-off-by: Andrew Hunter <[email protected]>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz <[email protected]>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <[email protected]>
---
include/linux/jiffies.h | 12 -----------
kernel/time.c | 56 +++++++++++++++++++++++++++----------------------
2 files changed, 31 insertions(+), 37 deletions(-)

--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -259,23 +259,11 @@ extern unsigned long preset_lpj;
#define SEC_JIFFIE_SC (32 - SHIFT_HZ)
#endif
#define NSEC_JIFFIE_SC (SEC_JIFFIE_SC + 29)
-#define USEC_JIFFIE_SC (SEC_JIFFIE_SC + 19)
#define SEC_CONVERSION ((unsigned long)((((u64)NSEC_PER_SEC << SEC_JIFFIE_SC) +\
TICK_NSEC -1) / (u64)TICK_NSEC))

#define NSEC_CONVERSION ((unsigned long)((((u64)1 << NSEC_JIFFIE_SC) +\
TICK_NSEC -1) / (u64)TICK_NSEC))
-#define USEC_CONVERSION \
- ((unsigned long)((((u64)NSEC_PER_USEC << USEC_JIFFIE_SC) +\
- TICK_NSEC -1) / (u64)TICK_NSEC))
-/*
- * USEC_ROUND is used in the timeval to jiffie conversion. See there
- * for more details. It is the scaled resolution rounding value. Note
- * that it is a 64-bit value. Since, when it is applied, we are already
- * in jiffies (albit scaled), it is nothing but the bits we will shift
- * off.
- */
-#define USEC_ROUND (u64)(((u64)1 << USEC_JIFFIE_SC) - 1)
/*
* The maximum jiffie value is (MAX_INT >> 1). Here we translate that
* into seconds. The 64-bit case will overflow if we are not careful,
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -493,17 +493,20 @@ EXPORT_SYMBOL(usecs_to_jiffies);
* that a remainder subtract here would not do the right thing as the
* resolution values don't fall on second boundries. I.e. the line:
* nsec -= nsec % TICK_NSEC; is NOT a correct resolution rounding.
+ * Note that due to the small error in the multiplier here, this
+ * rounding is incorrect for sufficiently large values of tv_nsec, but
+ * well formed timespecs should have tv_nsec < NSEC_PER_SEC, so we're
+ * OK.
*
* Rather, we just shift the bits off the right.
*
* The >> (NSEC_JIFFIE_SC - SEC_JIFFIE_SC) converts the scaled nsec
* value to a scaled second value.
*/
-unsigned long
-timespec_to_jiffies(const struct timespec *value)
+static unsigned long
+__timespec_to_jiffies(unsigned long sec, long nsec)
{
- unsigned long sec = value->tv_sec;
- long nsec = value->tv_nsec + TICK_NSEC - 1;
+ nsec = nsec + TICK_NSEC - 1;

if (sec >= MAX_SEC_IN_JIFFIES){
sec = MAX_SEC_IN_JIFFIES;
@@ -514,6 +517,13 @@ timespec_to_jiffies(const struct timespe
(NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;

}
+
+unsigned long
+timespec_to_jiffies(const struct timespec *value)
+{
+ return __timespec_to_jiffies(value->tv_sec, value->tv_nsec);
+}
+
EXPORT_SYMBOL(timespec_to_jiffies);

void
@@ -530,31 +540,27 @@ jiffies_to_timespec(const unsigned long
}
EXPORT_SYMBOL(jiffies_to_timespec);

-/* Same for "timeval"
+/*
+ * We could use a similar algorithm to timespec_to_jiffies (with a
+ * different multiplier for usec instead of nsec). But this has a
+ * problem with rounding: we can't exactly add TICK_NSEC - 1 to the
+ * usec value, since it's not necessarily integral.
+ *
+ * We could instead round in the intermediate scaled representation
+ * (i.e. in units of 1/2^(large scale) jiffies) but that's also
+ * perilous: the scaling introduces a small positive error, which
+ * combined with a division-rounding-upward (i.e. adding 2^(scale) - 1
+ * units to the intermediate before shifting) leads to accidental
+ * overflow and overestimates.
*
- * Well, almost. The problem here is that the real system resolution is
- * in nanoseconds and the value being converted is in micro seconds.
- * Also for some machines (those that use HZ = 1024, in-particular),
- * there is a LARGE error in the tick size in microseconds.
-
- * The solution we use is to do the rounding AFTER we convert the
- * microsecond part. Thus the USEC_ROUND, the bits to be shifted off.
- * Instruction wise, this should cost only an additional add with carry
- * instruction above the way it was done above.
+ * At the cost of one additional multiplication by a constant, just
+ * use the timespec implementation.
*/
unsigned long
timeval_to_jiffies(const struct timeval *value)
{
- unsigned long sec = value->tv_sec;
- long usec = value->tv_usec;
-
- if (sec >= MAX_SEC_IN_JIFFIES){
- sec = MAX_SEC_IN_JIFFIES;
- usec = 0;
- }
- return (((u64)sec * SEC_CONVERSION) +
- (((u64)usec * USEC_CONVERSION + USEC_ROUND) >>
- (USEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
+ return __timespec_to_jiffies(value->tv_sec,
+ value->tv_usec * NSEC_PER_USEC);
}
EXPORT_SYMBOL(timeval_to_jiffies);

2014-11-01 22:48:44

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 072/102] nilfs2: fix data loss with mmap()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Andreas Rohner <[email protected]>

commit 56d7acc792c0d98f38f22058671ee715ff197023 upstream.

This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:

----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt

dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

/root/mmaptest/mmaptest /mnt/testfile 30 10 5

sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt

echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);

for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------

The output of the script looks something like this:

BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[[email protected]: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <[email protected]>
Tested-by: Andreas Rohner <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/nilfs2/inode.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -24,6 +24,7 @@
#include <linux/buffer_head.h>
#include <linux/gfp.h>
#include <linux/mpage.h>
+#include <linux/pagemap.h>
#include <linux/writeback.h>
#include <linux/uio.h>
#include "nilfs.h"
@@ -195,10 +196,10 @@ static int nilfs_writepage(struct page *

static int nilfs_set_page_dirty(struct page *page)
{
+ struct inode *inode = page->mapping->host;
int ret = __set_page_dirty_nobuffers(page);

if (page_has_buffers(page)) {
- struct inode *inode = page->mapping->host;
unsigned nr_dirty = 0;
struct buffer_head *bh, *head;

@@ -221,6 +222,10 @@ static int nilfs_set_page_dirty(struct p

if (nr_dirty)
nilfs_set_file_dirty(inode, nr_dirty);
+ } else if (ret) {
+ unsigned nr_dirty = 1 << (PAGE_CACHE_SHIFT - inode->i_blkbits);
+
+ nilfs_set_file_dirty(inode, nr_dirty);
}
return ret;
}

2014-11-01 22:48:41

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 042/102] xhci: Fix null pointer dereference if xhci initialization fails

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Nyman <[email protected]>

commit c207e7c50f31113c24a9f536fcab1e8a256985d7 upstream.

If xhci initialization fails before the roothub bandwidth
domains (xhci->rh_bw[i]) are allocated it will oops when
trying to access rh_bw members in xhci_mem_cleanup().

Reported-by: Manuel Reimer <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/host/xhci-mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1723,7 +1723,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x
}

num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
- for (i = 0; i < num_ports; i++) {
+ for (i = 0; i < num_ports && xhci->rh_bw; i++) {
struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
struct list_head *ep = &bwt->interval_bw[j].endpoints;

2014-11-01 22:51:27

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 068/102] parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: John David Anglin <[email protected]>

commit d26a7730b5874a5fa6779c62f4ad7c5065a94723 upstream.

In spite of what the GCC manual says, the -mfast-indirect-calls has
never been supported in the 64-bit parisc compiler. Indirect calls have
always been done using function descriptors irrespective of the
-mfast-indirect-calls option.

Recently, it was noticed that a function descriptor was always requested
when the -mfast-indirect-calls option was specified. This caused
problems when the option was used in application code and doesn't make
any sense because the whole point of the option is to avoid using a
function descriptor for indirect calls.

Fixing this broke 64-bit kernel builds.

I will fix GCC but for now we need the attached change. This results in
the same kernel code as before.

Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/parisc/Makefile | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -47,7 +47,12 @@ cflags-y := -pipe

# These flags should be implied by an hppa-linux configuration, but they
# are not in gcc 3.2.
-cflags-y += -mno-space-regs -mfast-indirect-calls
+cflags-y += -mno-space-regs
+
+# -mfast-indirect-calls is only relevant for 32-bit kernels.
+ifndef CONFIG_64BIT
+cflags-y += -mfast-indirect-calls
+endif

# Currently we save and restore fpregs on all kernel entry/interruption paths.
# If that gets optimized, we might need to disable the use of fpregs in the

2014-11-01 22:51:30

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 067/102] Fix nasty 32-bit overflow bug in buffer i/o code.

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Anton Altaparmakov <[email protected]>

commit f2d5a94436cc7cc0221b9a81bba2276a25187dd3 upstream.

On 32-bit architectures, the legacy buffer_head functions are not always
handling the sector number with the proper 64-bit types, and will thus
fail on 4TB+ disks.

Any code that uses __getblk() (and thus bread(), breadahead(),
sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop
in __getblk_slow() with an infinite stream of errors logged to dmesg
like this:

__find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
b_state=0x00000020, b_size=512
device sda1 blocksize: 512

Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
top 32-bits are missing (in this case the 0x1 at the top).

This is because grow_dev_page() is broken and has a 32-bit overflow due
to shifting the page index value (a pgoff_t - which is just 32 bits on
32-bit architectures) left-shifted as the block number. But the top
bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit
before the shift.

This patch fixes this issue by type casting "index" to sector_t before
doing the left shift.

Note this is not a theoretical bug but has been seen in the field on a
4TiB hard drive with logical sector size 512 bytes.

This patch has been verified to fix the infinite loop problem on 3.17-rc5
kernel using a 4TB disk image mounted using "-o loop". Without this patch
doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
100% reproducibly whilst with the patch it works fine as expected.

Signed-off-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/buffer.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1021,7 +1021,8 @@ grow_dev_page(struct block_device *bdev,
bh = page_buffers(page);
if (bh->b_size == size) {
end_block = init_page_buffers(page, bdev,
- index << sizebits, size);
+ (sector_t)index << sizebits,
+ size);
goto done;
}
if (!try_to_free_buffers(page))
@@ -1042,7 +1043,8 @@ grow_dev_page(struct block_device *bdev,
*/
spin_lock(&inode->i_mapping->private_lock);
link_dev_buffers(page, bh);
- end_block = init_page_buffers(page, bdev, index << sizebits, size);
+ end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits,
+ size);
spin_unlock(&inode->i_mapping->private_lock);
done:
ret = (block < end_block) ? 1 : -ENXIO;

2014-11-01 22:51:39

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 063/102] can: flexcan: implement workaround for errata ERR005829

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: David Jander <[email protected]>

commit 25e924450fcb23c11c07c95ea8964dd9f174652e upstream.

This patch implements the workaround mentioned in ERR005829:

ERR005829: FlexCAN: FlexCAN does not transmit a message that is enabled to
be transmitted in a specific moment during the arbitration process.

Workaround: The workaround consists of two extra steps after setting up a
message for transmission:

Step 8: Reserve the first valid mailbox as an inactive mailbox (CODE=0b1000).
If RX FIFO is disabled, this mailbox must be message buffer 0. Otherwise, the
first valid mailbox can be found using the "RX FIFO filters" table in the
FlexCAN chapter of the chip reference manual.

Step 9: Write twice INACTIVE code (0b1000) into the first valid mailbox.

Signed-off-by: David Jander <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/can/flexcan.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -120,7 +120,9 @@
(FLEXCAN_ESR_ERR_BUS | FLEXCAN_ESR_ERR_STATE)

/* FLEXCAN interrupt flag register (IFLAG) bits */
-#define FLEXCAN_TX_BUF_ID 8
+/* Errata ERR005829 step7: Reserve first valid MB */
+#define FLEXCAN_TX_BUF_RESERVED 8
+#define FLEXCAN_TX_BUF_ID 9
#define FLEXCAN_IFLAG_BUF(x) BIT(x)
#define FLEXCAN_IFLAG_RX_FIFO_OVERFLOW BIT(7)
#define FLEXCAN_IFLAG_RX_FIFO_WARN BIT(6)
@@ -313,6 +315,14 @@ static int flexcan_start_xmit(struct sk_
flexcan_write(can_id, &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_id);
flexcan_write(ctrl, &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);

+ /* Errata ERR005829 step8:
+ * Write twice INACTIVE(0x8) code to first MB.
+ */
+ flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
+ &regs->cantxfg[FLEXCAN_TX_BUF_RESERVED].can_ctrl);
+ flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
+ &regs->cantxfg[FLEXCAN_TX_BUF_RESERVED].can_ctrl);
+
kfree_skb(skb);

/* tx_packets is incremented in flexcan_irq */
@@ -751,6 +761,10 @@ static int flexcan_chip_start(struct net
&regs->cantxfg[i].can_ctrl);
}

+ /* Errata ERR005829: mark first TX mailbox as INACTIVE */
+ flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
+ &regs->cantxfg[FLEXCAN_TX_BUF_RESERVED].can_ctrl);
+
/* mark TX mailbox as INACTIVE */
flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
&regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);

2014-11-01 22:51:43

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 057/102] USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mark <[email protected]>

commit 67d365a57a51fb9dece6a5ceb504aa381cae1e5b upstream.

The Adaptec USBConnect 2000 is another SCSI-USB converter which uses
Shuttle Technology/SCM Microsystems chips. The US_FL_SCM_MULT_TARG quirk is
required to use SCSI devices with ID other than 0.

I don't have a USBConnect 2000, but based on the other entries for Shuttle/
SCM-based converters this patch is very likely correct. I used 0x0000 and
0x9999 for bcdDeviceMin and bcdDeviceMax because I'm not sure which
bcdDevice value the product uses.

Signed-off-by: Mark Knibbs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/storage/unusual_devs.h | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -93,6 +93,12 @@ UNUSUAL_DEV( 0x03f0, 0x4002, 0x0001, 0x
"PhotoSmart R707",
USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_FIX_CAPACITY),

+UNUSUAL_DEV( 0x03f3, 0x0001, 0x0000, 0x9999,
+ "Adaptec",
+ "USBConnect 2000",
+ USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_euscsi_init,
+ US_FL_SCM_MULT_TARG ),
+
/* Reported by Sebastian Kapfer <[email protected]>
* and Olaf Hering <[email protected]> (different bcd's, same vendor/product)
* for USB floppies that need the SINGLE_LUN enforcement.

2014-11-01 22:51:47

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 059/102] USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mark <[email protected]>

commit c80b4495c61636edc58fe1ce300f09f24db28e10 upstream.

This patch adds quirks for Entrega Technologies (later Xircom PortGear) USB-
SCSI converters. They use Shuttle Technology EUSB-01/EUSB-S1 chips. The
US_FL_SCM_MULT_TARG quirk is needed to allow multiple devices on the SCSI
chain to be accessed. Without it only the (single) device with SCSI ID 0
can be used.

The standalone converter sold by Entrega had model number U1-SC25. Xircom
acquired Entrega and re-branded the product line PortGear. The PortGear USB
to SCSI Converter (model PGSCSI) is internally identical to the Entrega
product, but later models may use a different USB ID. The Entrega-branded
units have USB ID 1645:0007, as does my Xircom PGSCSI, but the Windows and
Macintosh drivers also support 085A:0028.

Entrega also sold the "Mac USB Dock", which provides two USB ports, a Mac
(8-pin mini-DIN) serial port and a SCSI port. It appears to the computer as
a four-port hub, USB-serial, and USB-SCSI converters. The USB-SCSI part may
have initially used the same ID as the standalone U1-SC25 (1645:0007), but
later production used 085A:0026.

My Xircom PortGear PGSCSI has bcdDevice=0x0100. Units with bcdDevice=0x0133
probably also exist.

This patch adds quirks for 1645:0007, 085A:0026 and 085A:0028. The Windows
driver INF file also mentions 085A:0032 "PortStation SCSI Module", but I
couldn't find any mention of that actually existing in the wild; perhaps it
was cancelled before release?

Signed-off-by: Mark Knibbs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/storage/unusual_devs.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -1117,6 +1117,18 @@ UNUSUAL_DEV( 0x0851, 0x1543, 0x0200, 0x
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NOT_LOCKABLE),

+UNUSUAL_DEV( 0x085a, 0x0026, 0x0100, 0x0133,
+ "Xircom",
+ "PortGear USB-SCSI (Mac USB Dock)",
+ USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_euscsi_init,
+ US_FL_SCM_MULT_TARG ),
+
+UNUSUAL_DEV( 0x085a, 0x0028, 0x0100, 0x0133,
+ "Xircom",
+ "PortGear USB to SCSI Converter",
+ USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_euscsi_init,
+ US_FL_SCM_MULT_TARG ),
+
/* Submitted by Jan De Luyck <[email protected]> */
UNUSUAL_DEV( 0x08bd, 0x1100, 0x0000, 0x0000,
"CITIZEN",
@@ -1944,6 +1956,14 @@ UNUSUAL_DEV( 0x152d, 0x2329, 0x0100, 0x
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_IGNORE_RESIDUE | US_FL_SANE_SENSE ),

+/* Entrega Technologies U1-SC25 (later Xircom PortGear PGSCSI)
+ * and Mac USB Dock USB-SCSI */
+UNUSUAL_DEV( 0x1645, 0x0007, 0x0100, 0x0133,
+ "Entrega Technologies",
+ "USB to SCSI Converter",
+ USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_euscsi_init,
+ US_FL_SCM_MULT_TARG ),
+
/* Reported by Robert Schedel <[email protected]>
* Note: this is a 'super top' device like the above 14cd/6600 device */
UNUSUAL_DEV( 0x1652, 0x6600, 0x0201, 0x0201,

2014-11-01 22:51:37

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 048/102] alarmtimer: Do not signal SIGEV_NONE timers

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Richard Larocque <[email protected]>

commit 265b81d23a46c39df0a735a3af4238954b41a4c2 upstream.

Avoids sending a signal to alarm timers created with sigev_notify set to
SIGEV_NONE by checking for that special case in the timeout callback.

The regular posix timers avoid sending signals to SIGEV_NONE timers by
not scheduling any callbacks for them in the first place. Although it
would be possible to do something similar for alarm timers, it's simpler
to handle this as a special case in the timeout.

Prior to this patch, the alarm timer would ignore the sigev_notify value
and try to deliver signals to the process anyway. Even worse, the
sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
specified, so the signal number could be bogus. If sigev_signo was an
unitialized value (as it often would be if SIGEV_NONE is used), then
it's hard to predict which signal will be sent.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Sharvil Nanavati <[email protected]>
Signed-off-by: Richard Larocque <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/time/alarmtimer.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -450,8 +450,10 @@ static enum alarmtimer_restart alarm_han
{
struct k_itimer *ptr = container_of(alarm, struct k_itimer,
it.alarm.alarmtimer);
- if (posix_timer_event(ptr, 0) != 0)
- ptr->it_overrun++;
+ if ((ptr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
+ if (posix_timer_event(ptr, 0) != 0)
+ ptr->it_overrun++;
+ }

/* Re-add periodic timers */
if (ptr->it.alarm.interval.tv64) {

2014-11-01 22:51:35

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 058/102] USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mark <[email protected]>

commit b6a3ed677991558ce09046397a7c4d70530d15b3 upstream.

Hi,

The Ariston Technologies iConnect 025 and iConnect 050 (also known as e.g.
iSCSI-50) are SCSI-USB converters which use Shuttle Technology/SCM
Microsystems chips. Only the connectors differ; both have the same USB ID.
The US_FL_SCM_MULT_TARG quirk is required to use SCSI devices with ID other
than 0.

I don't have one of these, but based on the other entries for Shuttle/
SCM-based converters this patch is very likely correct. I used 0x0000 and
0x9999 for bcdDeviceMin and bcdDeviceMax because I'm not sure which
bcdDevice value the products use.

Signed-off-by: Mark Knibbs <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/storage/unusual_devs.h | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -1959,6 +1959,12 @@ UNUSUAL_DEV( 0x177f, 0x0400, 0x0000, 0x
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BULK_IGNORE_TAG | US_FL_MAX_SECTORS_64 ),

+UNUSUAL_DEV( 0x1822, 0x0001, 0x0000, 0x9999,
+ "Ariston Technologies",
+ "iConnect USB to SCSI adapter",
+ USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_euscsi_init,
+ US_FL_SCM_MULT_TARG ),
+
/* Reported by Hans de Goede <[email protected]>
* These Appotech controllers are found in Picture Frames, they provide a
* (buggy) emulation of a cdrom drive which contains the windows software

2014-11-01 22:53:25

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 051/102] vfs: Fold follow_mount_rcu() into follow_dotdot_rcu()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <[email protected]>

This is needed before commit 4023bfc9f351 ('be careful with nd->inode
in path_init() and follow_dotdot_rcu()'). A similar change was made
upstream as part of commit b37199e626b3 ('rcuwalk: recheck mount_lock
after mountpoint crossing attempts').

Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -911,19 +911,6 @@ static bool __follow_mount_rcu(struct na
return true;
}

-static void follow_mount_rcu(struct nameidata *nd)
-{
- while (d_mountpoint(nd->path.dentry)) {
- struct vfsmount *mounted;
- mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry, 1);
- if (!mounted)
- break;
- nd->path.mnt = mounted;
- nd->path.dentry = mounted->mnt_root;
- nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
- }
-}
-
static int follow_dotdot_rcu(struct nameidata *nd)
{
if (!nd->root.mnt)
@@ -950,7 +937,15 @@ static int follow_dotdot_rcu(struct name
break;
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
}
- follow_mount_rcu(nd);
+ while (d_mountpoint(nd->path.dentry)) {
+ struct vfsmount *mounted;
+ mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry, 1);
+ if (!mounted)
+ break;
+ nd->path.mnt = mounted;
+ nd->path.dentry = mounted->mnt_root;
+ nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+ }
nd->inode = nd->path.dentry->d_inode;
return 0;

2014-11-01 22:53:33

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 062/102] can: flexcan: correctly initialize mailboxes

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: David Jander <[email protected]>

commit fc05b884a31dbf259cc73cc856e634ec3acbebb6 upstream.

Apparently mailboxes may contain random data at startup, causing some of them
being prepared for message reception. This causes overruns being missed or even
confusing the IRQ check for trasmitted messages, increasing the transmit
counter instead of the error counter.

This patch initializes all mailboxes after the FIFO as RX_INACTIVE.

Signed-off-by: David Jander <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/can/flexcan.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -679,6 +679,7 @@ static int flexcan_chip_start(struct net
struct flexcan_regs __iomem *regs = priv->base;
int err;
u32 reg_mcr, reg_ctrl;
+ int i;

/* enable module */
flexcan_chip_enable(priv);
@@ -744,6 +745,12 @@ static int flexcan_chip_start(struct net
dev_dbg(dev->dev.parent, "%s: writing ctrl=0x%08x", __func__, reg_ctrl);
flexcan_write(reg_ctrl, &regs->ctrl);

+ /* clear and invalidate all mailboxes first */
+ for (i = FLEXCAN_TX_BUF_ID; i < ARRAY_SIZE(regs->cantxfg); i++) {
+ flexcan_write(FLEXCAN_MB_CODE_RX_INACTIVE,
+ &regs->cantxfg[i].can_ctrl);
+ }
+
/* mark TX mailbox as INACTIVE */
flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
&regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);

2014-11-01 22:53:36

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 061/102] can: flexcan: mark TX mailbox as TX_INACTIVE

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Marc Kleine-Budde <[email protected]>

commit c32fe4ad3e4861b2bfa1f44114c564935a123dda upstream.

This patch fixes the initialization of the TX mailbox. It is now correctly
initialized as TX_INACTIVE not RX_EMPTY.

Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/can/flexcan.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -131,6 +131,17 @@

/* FLEXCAN message buffers */
#define FLEXCAN_MB_CNT_CODE(x) (((x) & 0xf) << 24)
+#define FLEXCAN_MB_CODE_RX_INACTIVE (0x0 << 24)
+#define FLEXCAN_MB_CODE_RX_EMPTY (0x4 << 24)
+#define FLEXCAN_MB_CODE_RX_FULL (0x2 << 24)
+#define FLEXCAN_MB_CODE_RX_OVERRRUN (0x6 << 24)
+#define FLEXCAN_MB_CODE_RX_RANSWER (0xa << 24)
+
+#define FLEXCAN_MB_CODE_TX_INACTIVE (0x8 << 24)
+#define FLEXCAN_MB_CODE_TX_ABORT (0x9 << 24)
+#define FLEXCAN_MB_CODE_TX_DATA (0xc << 24)
+#define FLEXCAN_MB_CODE_TX_TANSWER (0xe << 24)
+
#define FLEXCAN_MB_CNT_SRR BIT(22)
#define FLEXCAN_MB_CNT_IDE BIT(21)
#define FLEXCAN_MB_CNT_RTR BIT(20)
@@ -733,8 +744,8 @@ static int flexcan_chip_start(struct net
dev_dbg(dev->dev.parent, "%s: writing ctrl=0x%08x", __func__, reg_ctrl);
flexcan_write(reg_ctrl, &regs->ctrl);

- /* Abort any pending TX, mark Mailbox as INACTIVE */
- flexcan_write(FLEXCAN_MB_CNT_CODE(0x4),
+ /* mark TX mailbox as INACTIVE */
+ flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
&regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);

/* acceptance mask/acceptance code (accept everything) */

2014-11-01 22:53:46

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 052/102] be careful with nd->inode in path_init() and follow_dotdot_rcu()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 4023bfc9f351a7994fb6a7d515476c320f94a574 upstream.

in the former we simply check if dentry is still valid after picking
its ->d_inode; in the latter we fetch ->d_inode in the same places
where we fetch dentry and its ->d_seq, under the same checks.

Signed-off-by: Al Viro <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/namei.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -913,6 +913,7 @@ static bool __follow_mount_rcu(struct na

static int follow_dotdot_rcu(struct nameidata *nd)
{
+ struct inode *inode = nd->inode;
if (!nd->root.mnt)
set_root_rcu(nd);

@@ -926,6 +927,7 @@ static int follow_dotdot_rcu(struct name
struct dentry *parent = old->d_parent;
unsigned seq;

+ inode = parent->d_inode;
seq = read_seqcount_begin(&parent->d_seq);
if (read_seqcount_retry(&old->d_seq, nd->seq))
goto failed;
@@ -935,6 +937,7 @@ static int follow_dotdot_rcu(struct name
}
if (!follow_up_rcu(&nd->path))
break;
+ inode = nd->path.dentry->d_inode;
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
}
while (d_mountpoint(nd->path.dentry)) {
@@ -944,9 +947,10 @@ static int follow_dotdot_rcu(struct name
break;
nd->path.mnt = mounted;
nd->path.dentry = mounted->mnt_root;
+ inode = nd->path.dentry->d_inode;
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
}
- nd->inode = nd->path.dentry->d_inode;
+ nd->inode = inode;
return 0;

failed:
@@ -1556,7 +1560,14 @@ static int path_init(int dfd, const char
}

nd->inode = nd->path.dentry->d_inode;
- return 0;
+ if (!(flags & LOOKUP_RCU))
+ return 0;
+ if (likely(!read_seqcount_retry(&nd->path.dentry->d_seq, nd->seq)))
+ return 0;
+ if (!(nd->flags & LOOKUP_ROOT))
+ nd->root.mnt = NULL;
+ rcu_read_unlock();
+ return -ECHILD;

fput_fail:
fput_light(file, fput_needed);

2014-11-01 22:53:53

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 097/102] x86,kvm,vmx: Preserve CR4 across VM entry

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit d974baa398f34393db76be45f7d4d04fbdbb4a0a upstream.

CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry. Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact. A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.

Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Cc: [email protected]
Cc: Petr Matousek <[email protected]>
Cc: Gleb Natapov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- Add struct vcpu_vmx *vmx parameter to vmx_set_constant_host_state(), done
upstream in commit a547c6db4d2f ("KVM: VMX: Enable acknowledge interupt
on vmexit")]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/vmx.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -390,6 +390,7 @@ struct vcpu_vmx {
u16 fs_sel, gs_sel, ldt_sel;
int gs_ldt_reload_needed;
int fs_reload_needed;
+ unsigned long vmcs_host_cr4; /* May not match real cr4 */
} host_state;
struct {
int vm86_active;
@@ -3631,16 +3632,21 @@ static void vmx_disable_intercept_for_ms
* Note that host-state that does change is set elsewhere. E.g., host-state
* that is set differently for each CPU is set in vmx_vcpu_load(), not here.
*/
-static void vmx_set_constant_host_state(void)
+static void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
{
u32 low32, high32;
unsigned long tmpl;
struct desc_ptr dt;
+ unsigned long cr4;

vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS); /* 22.2.3 */
- vmcs_writel(HOST_CR4, read_cr4()); /* 22.2.3, 22.2.5 */
vmcs_writel(HOST_CR3, read_cr3()); /* 22.2.3 FIXME: shadow tables */

+ /* Save the most likely value for this task's CR4 in the VMCS. */
+ cr4 = read_cr4();
+ vmcs_writel(HOST_CR4, cr4); /* 22.2.3, 22.2.5 */
+ vmx->host_state.vmcs_host_cr4 = cr4;
+
vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS); /* 22.2.4 */
vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS); /* 22.2.4 */
vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS); /* 22.2.4 */
@@ -3762,7 +3768,7 @@ static int vmx_vcpu_setup(struct vcpu_vm

vmcs_write16(HOST_FS_SELECTOR, 0); /* 22.2.4 */
vmcs_write16(HOST_GS_SELECTOR, 0); /* 22.2.4 */
- vmx_set_constant_host_state();
+ vmx_set_constant_host_state(vmx);
#ifdef CONFIG_X86_64
rdmsrl(MSR_FS_BASE, a);
vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
@@ -6172,6 +6178,7 @@ static void atomic_switch_perf_msrs(stru
static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ unsigned long cr4;

if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -6202,6 +6209,12 @@ static void __noclone vmx_vcpu_run(struc
if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);

+ cr4 = read_cr4();
+ if (unlikely(cr4 != vmx->host_state.vmcs_host_cr4)) {
+ vmcs_writel(HOST_CR4, cr4);
+ vmx->host_state.vmcs_host_cr4 = cr4;
+ }
+
/* When single-stepping over STI and MOV SS, we must clear the
* corresponding interruptibility bits in the guest state. Otherwise
* vmentry fails as it then expects bit 14 (BS) in pending debug
@@ -6666,7 +6679,7 @@ static void prepare_vmcs02(struct kvm_vc
* Other fields are different per CPU, and will be set later when
* vmx_vcpu_load() is called, and when vmx_save_host_state() is called.
*/
- vmx_set_constant_host_state();
+ vmx_set_constant_host_state(vmx);

/*
* HOST_RSP is normally set correctly in vmx_vcpu_run() just before

2014-11-01 22:53:49

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 060/102] nl80211: clear skb cb before passing to netlink

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Johannes Berg <[email protected]>

commit bd8c78e78d5011d8111bc2533ee73b13a3bd6c42 upstream.

In testmode and vendor command reply/event SKBs we use the
skb cb data to store nl80211 parameters between allocation
and sending. This causes the code for CONFIG_NETLINK_MMAP
to get confused, because it takes ownership of the skb cb
data when the SKB is handed off to netlink, and it doesn't
explicitly clear it.

Clear the skb cb explicitly when we're done and before it
gets passed to netlink to avoid this issue.

Reported-by: Assaf Azulay <[email protected]>
Reported-by: David Spinadel <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
net/wireless/nl80211.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4804,6 +4804,9 @@ int cfg80211_testmode_reply(struct sk_bu
void *hdr = ((void **)skb->cb)[1];
struct nlattr *data = ((void **)skb->cb)[2];

+ /* clear CB data for netlink core to own from now on */
+ memset(skb->cb, 0, sizeof(skb->cb));
+
if (WARN_ON(!rdev->testmode_info)) {
kfree_skb(skb);
return -EINVAL;
@@ -4830,6 +4833,9 @@ void cfg80211_testmode_event(struct sk_b
void *hdr = ((void **)skb->cb)[1];
struct nlattr *data = ((void **)skb->cb)[2];

+ /* clear CB data for netlink core to own from now on */
+ memset(skb->cb, 0, sizeof(skb->cb));
+
nla_nest_end(skb, data);
genlmsg_end(skb, hdr);
genlmsg_multicast_netns(wiphy_net(&rdev->wiphy), skb, 0,

2014-11-01 22:53:41

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 007/102] rtlwifi: rtl8192cu: Add new ID

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Larry Finger <[email protected]>

commit c66517165610b911e4c6d268f28d8c640832dbd1 upstream.

The Sitecom WLA-2102 adapter uses this driver.

Reported-by: Nico Baggus <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
Cc: Nico Baggus <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -316,6 +316,7 @@ static struct usb_device_id rtl8192c_usb
{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CC&C*/
{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
{RTL_USB_DEVICE(0x0df6, 0x005c, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
+ {RTL_USB_DEVICE(0x0df6, 0x0070, rtl92cu_hal_cfg)}, /*Sitecom - 150N */
{RTL_USB_DEVICE(0x0df6, 0x0077, rtl92cu_hal_cfg)}, /*Sitecom-WLA2100V2*/
{RTL_USB_DEVICE(0x0eb0, 0x9071, rtl92cu_hal_cfg)}, /*NO Brand - Etop*/
{RTL_USB_DEVICE(0x4856, 0x0091, rtl92cu_hal_cfg)}, /*NetweeN - Feixun*/

2014-11-01 22:53:38

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 011/102] drm/i915: Remove bogus __init annotation from DMI callbacks

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <[email protected]>

commit bbe1c2740d3a25aa1dbe5d842d2ff09cddcdde0a upstream.

The __init annotations for the DMI callback functions are wrong as this
code can be called even after the module has been initialized, e.g. like
this:

# echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove
# modprobe i915
# echo 1 > /sys/bus/pci/rescan

The first command will remove the PCI device from the kernel's device
list so the second command won't see it right away. But as it registers
a PCI driver it'll see it on the third command. If the system happens to
match one of the DMI table entries we'll try to call a function in long
released memory and generate an Oops, at best.

Fix this by removing the bogus annotation.

Modpost should have caught that one but it ignores section reference
mismatches from the .rodata section. :/

Fixes: 25e341cfc33d ("drm/i915: quirk away broken OpRegion VBT")
Fixes: 8ca4013d702d ("CHROMIUM: i915: Add DMI override to skip CRT...")
Fixes: 425d244c8670 ("drm/i915: ignore LVDS on intel graphics systems...")
Signed-off-by: Mathias Krause <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Duncan Laurie <[email protected]>
Cc: Jarod Wilson <[email protected]>
Cc: Rusty Russell <[email protected]> # Can modpost be fixed?
Signed-off-by: Jani Nikula <[email protected]>
[bwh: Backported to 3.2: drop inapplicable change in intel_crt.c]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/gpu/drm/i915/intel_bios.c | 2 +-
drivers/gpu/drm/i915/intel_crt.c | 2 +-
drivers/gpu/drm/i915/intel_lvds.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/gpu/drm/i915/intel_bios.c
+++ b/drivers/gpu/drm/i915/intel_bios.c
@@ -651,7 +651,7 @@ init_vbt_defaults(struct drm_i915_privat
DRM_DEBUG_KMS("Set default to SSC at %dMHz\n", dev_priv->lvds_ssc_freq);
}

-static int __init intel_no_opregion_vbt_callback(const struct dmi_system_id *id)
+static int intel_no_opregion_vbt_callback(const struct dmi_system_id *id)
{
DRM_DEBUG_KMS("Falling back to manually reading VBT from "
"VBIOS ROM for %s\n",
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -613,7 +613,7 @@ static const struct drm_encoder_funcs in
.destroy = intel_encoder_destroy,
};

-static int __init intel_no_lvds_dmi_callback(const struct dmi_system_id *id)
+static int intel_no_lvds_dmi_callback(const struct dmi_system_id *id)
{
DRM_DEBUG_KMS("Skipping LVDS initialization for %s\n", id->ident);
return 1;

2014-11-01 22:53:28

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 049/102] alarmtimer: Lock k_itimer during timer callback

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Richard Larocque <[email protected]>

commit 474e941bed9262f5fa2394f9a4a67e24499e5926 upstream.

Locks the k_itimer's it_lock member when handling the alarm timer's
expiry callback.

The regular posix timers defined in posix-timers.c have this lock held
during timout processing because their callbacks are routed through
posix_timer_fn(). The alarm timers follow a different path, so they
ought to grab the lock somewhere else.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Sharvil Nanavati <[email protected]>
Signed-off-by: Richard Larocque <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/time/alarmtimer.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -448,8 +448,12 @@ static enum alarmtimer_type clock2alarm(
static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
ktime_t now)
{
+ unsigned long flags;
struct k_itimer *ptr = container_of(alarm, struct k_itimer,
it.alarm.alarmtimer);
+ enum alarmtimer_restart result = ALARMTIMER_NORESTART;
+
+ spin_lock_irqsave(&ptr->it_lock, flags);
if ((ptr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
if (posix_timer_event(ptr, 0) != 0)
ptr->it_overrun++;
@@ -459,9 +463,11 @@ static enum alarmtimer_restart alarm_han
if (ptr->it.alarm.interval.tv64) {
ptr->it_overrun += alarm_forward(alarm, now,
ptr->it.alarm.interval);
- return ALARMTIMER_RESTART;
+ result = ALARMTIMER_RESTART;
}
- return ALARMTIMER_NORESTART;
+ spin_unlock_irqrestore(&ptr->it_lock, flags);
+
+ return result;
}

/**

2014-11-01 22:53:23

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 070/102] ARM: 8165/1: alignment: don't break misaligned NEON load/store

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Robin Murphy <[email protected]>

commit 5ca918e5e3f9df4634077c06585c42bc6a8d699a upstream.

The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn
instructions (where the optional alignment hint is given but incorrect)
as LDR/STR, leading to register corruption. Detect these and correctly
treat them as unhandled, so that userspace gets the fault it expects.

Reported-by: Simon Hosie <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/arm/mm/alignment.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -38,6 +38,7 @@
* This code is not portable to processors with late data abort handling.
*/
#define CODING_BITS(i) (i & 0x0e000000)
+#define COND_BITS(i) (i & 0xf0000000)

#define LDST_I_BIT(i) (i & (1 << 26)) /* Immediate constant */
#define LDST_P_BIT(i) (i & (1 << 24)) /* Preindex */
@@ -812,6 +813,8 @@ do_alignment(unsigned long addr, unsigne
break;

case 0x04000000: /* ldr or str immediate */
+ if (COND_BITS(instr) == 0xf0000000) /* NEON VLDn, VSTn */
+ goto bad;
offset.un = OFFSET_BITS(instr);
handler = do_alignment_ldrstr;
break;

2014-11-01 22:53:19

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 085/102] KVM: x86: Check non-canonical addresses upon WRMSR

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 854e8bb1aa06c578c2c9145fa6bfe3680ef63b23 upstream.

Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2:
- The various set_msr() functions all separate msr_index and data parameters]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 27 ++++++++++++++++++++++++++-
4 files changed, 42 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -821,6 +821,20 @@ static inline void kvm_inject_gp(struct
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
}

+static inline u64 get_canonical(u64 la)
+{
+ return ((int64_t)la << 16) >> 16;
+}
+
+static inline bool is_noncanonical_address(u64 la)
+{
+#ifdef CONFIG_X86_64
+ return get_canonical(la) != la;
+#else
+ return false;
+#endif
+}
+
#define TSS_IOPB_BASE_OFFSET 0x66
#define TSS_BASE_SIZE 0x68
#define TSS_IOPB_SIZE (65536 / 8)
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3109,7 +3109,7 @@ static int wrmsr_interception(struct vcp


svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
- if (svm_set_msr(&svm->vcpu, ecx, data)) {
+ if (kvm_set_msr(&svm->vcpu, ecx, data)) {
trace_kvm_msr_write_ex(ecx, data);
kvm_inject_gp(&svm->vcpu, 0);
} else {
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4544,7 +4544,7 @@ static int handle_wrmsr(struct kvm_vcpu
u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);

- if (vmx_set_msr(vcpu, ecx, data) != 0) {
+ if (kvm_set_msr(vcpu, ecx, data) != 0) {
trace_kvm_msr_write_ex(ecx, data);
kvm_inject_gp(vcpu, 0);
return 1;
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -893,7 +893,6 @@ void kvm_enable_efer_bits(u64 mask)
}
EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);

-
/*
* Writes msr value into into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -901,8 +900,34 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
*/
int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
{
+ switch (msr_index) {
+ case MSR_FS_BASE:
+ case MSR_GS_BASE:
+ case MSR_KERNEL_GS_BASE:
+ case MSR_CSTAR:
+ case MSR_LSTAR:
+ if (is_noncanonical_address(data))
+ return 1;
+ break;
+ case MSR_IA32_SYSENTER_EIP:
+ case MSR_IA32_SYSENTER_ESP:
+ /*
+ * IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
+ * non-canonical address is written on Intel but not on
+ * AMD (which ignores the top 32-bits, because it does
+ * not implement 64-bit SYSENTER).
+ *
+ * 64-bit code should hence be able to write a non-canonical
+ * value on AMD. Making the address canonical ensures that
+ * vmentry does not fail on Intel after writing a non-canonical
+ * value, and that something deterministic happens if the guest
+ * invokes 64-bit SYSENTER.
+ */
+ data = get_canonical(data);
+ }
return kvm_x86_ops->set_msr(vcpu, msr_index, data);
}
+EXPORT_SYMBOL_GPL(kvm_set_msr);

/*
* Adapt set_msr() to msr_io()'s calling convention

2014-11-01 22:53:15

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 093/102] KVM: x86: Handle errors when RIP is set during far jumps

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit d1442d85cc30ea75f7d399474ca738e0bc96f715 upstream.

Far jmp/call/ret may fault while loading a new RIP. Currently KVM does not
handle this case, and may result in failed vm-entry once the assignment is
done. The tricky part of doing so is that loading the new CS affects the
VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
unconsistent state. Therefore, this patch saves on 64-bit the old CS
descriptor and restores it if loading RIP failed.

This fixes CVE-2014-3647.

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- __load_segment_descriptor() does not take an in_task_switch parameter]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1234,7 +1234,8 @@ static int write_segment_descriptor(stru

/* Does not support long mode */
static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
- u16 selector, int seg, u8 cpl)
+ u16 selector, int seg, u8 cpl,
+ struct desc_struct *desc)
{
struct desc_struct seg_desc;
u8 dpl, rpl;
@@ -1342,6 +1343,8 @@ static int __load_segment_descriptor(str
}
load:
ctxt->ops->set_segment(ctxt, selector, &seg_desc, 0, seg);
+ if (desc)
+ *desc = seg_desc;
return X86EMUL_CONTINUE;
exception:
emulate_exception(ctxt, err_vec, err_code, true);
@@ -1352,7 +1355,7 @@ static int load_segment_descriptor(struc
u16 selector, int seg)
{
u8 cpl = ctxt->ops->cpl(ctxt);
- return __load_segment_descriptor(ctxt, selector, seg, cpl);
+ return __load_segment_descriptor(ctxt, selector, seg, cpl, NULL);
}

static void write_register_operand(struct operand *op)
@@ -1694,17 +1697,31 @@ static int em_iret(struct x86_emulate_ct
static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
{
int rc;
- unsigned short sel;
+ unsigned short sel, old_sel;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ u8 cpl = ctxt->ops->cpl(ctxt);
+
+ /* Assignment of RIP may only fail in 64-bit mode */
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
+ VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);

- rc = load_segment_descriptor(ctxt, sel, VCPU_SREG_CS);
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+ &new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;

- ctxt->_eip = 0;
- memcpy(&ctxt->_eip, ctxt->src.valptr, ctxt->op_bytes);
- return X86EMUL_CONTINUE;
+ rc = assign_eip_far(ctxt, ctxt->src.val, new_desc.l);
+ if (rc != X86EMUL_CONTINUE) {
+ WARN_ON(!ctxt->mode != X86EMUL_MODE_PROT64);
+ /* assigning eip failed; restore the old cs */
+ ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
+ return rc;
+ }
+ return rc;
}

static int em_grp1a(struct x86_emulate_ctxt *ctxt)
@@ -1856,21 +1873,34 @@ static int em_ret(struct x86_emulate_ctx
static int em_ret_far(struct x86_emulate_ctxt *ctxt)
{
int rc;
- unsigned long cs;
+ unsigned long eip, cs;
+ u16 old_cs;
int cpl = ctxt->ops->cpl(ctxt);
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
+ VCPU_SREG_CS);

- rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
- if (ctxt->op_bytes == 4)
- ctxt->_eip = (u32)ctxt->_eip;
rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
/* Outer-privilege level return is not implemented */
if (ctxt->mode >= X86EMUL_MODE_PROT16 && (cs & 3) > cpl)
return X86EMUL_UNHANDLEABLE;
- rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
+ rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, 0,
+ &new_desc);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+ rc = assign_eip_far(ctxt, eip, new_desc.l);
+ if (rc != X86EMUL_CONTINUE) {
+ WARN_ON(!ctxt->mode != X86EMUL_MODE_PROT64);
+ ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
+ }
return rc;
}

@@ -2248,19 +2278,24 @@ static int load_state_from_tss16(struct
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2373,25 +2408,32 @@ static int load_state_from_tss32(struct
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR,
+ cpl, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl,
+ NULL);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2605,24 +2647,39 @@ static int em_call_far(struct x86_emulat
u16 sel, old_cs;
ulong old_eip;
int rc;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ int cpl = ctxt->ops->cpl(ctxt);

- old_cs = get_segment_selector(ctxt, VCPU_SREG_CS);
old_eip = ctxt->_eip;
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
- if (load_segment_descriptor(ctxt, sel, VCPU_SREG_CS))
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
+ &new_desc);
+ if (rc != X86EMUL_CONTINUE)
return X86EMUL_CONTINUE;

- ctxt->_eip = 0;
- memcpy(&ctxt->_eip, ctxt->src.valptr, ctxt->op_bytes);
+ rc = assign_eip_far(ctxt, ctxt->src.val, new_desc.l);
+ if (rc != X86EMUL_CONTINUE)
+ goto fail;

ctxt->src.val = old_cs;
rc = em_push(ctxt);
if (rc != X86EMUL_CONTINUE)
- return rc;
+ goto fail;

ctxt->src.val = old_eip;
- return em_push(ctxt);
+ rc = em_push(ctxt);
+ /* If we failed, we tainted the memory, but the very least we should
+ restore cs */
+ if (rc != X86EMUL_CONTINUE)
+ goto fail;
+ return rc;
+fail:
+ ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
+ return rc;
+
}

static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)

2014-11-01 22:56:40

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 089/102] KVM: x86 emulator: Use opcode::execute for CALL

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Takuya Yoshikawa <[email protected]>

commit d4ddafcdf2201326ec9717172767cfad0ede1472 upstream.

CALL: E8

Signed-off-by: Takuya Yoshikawa <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/emulate.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2536,6 +2536,15 @@ static int em_das(struct x86_emulate_ctx
return X86EMUL_CONTINUE;
}

+static int em_call(struct x86_emulate_ctxt *ctxt)
+{
+ long rel = ctxt->src.val;
+
+ ctxt->src.val = (unsigned long)ctxt->_eip;
+ jmp_rel(ctxt, rel);
+ return em_push(ctxt);
+}
+
static int em_call_far(struct x86_emulate_ctxt *ctxt)
{
u16 sel, old_cs;
@@ -3271,7 +3280,7 @@ static struct opcode opcode_table[256] =
D2bvIP(SrcImmUByte | DstAcc, in, check_perm_in),
D2bvIP(SrcAcc | DstImmUByte, out, check_perm_out),
/* 0xE8 - 0xEF */
- D(SrcImm | Stack), D(SrcImm | ImplicitOps),
+ I(SrcImm | Stack, em_call), D(SrcImm | ImplicitOps),
I(SrcImmFAddr | No64, em_jmp_far), D(SrcImmByte | ImplicitOps),
D2bvIP(SrcDX | DstAcc, in, check_perm_in),
D2bvIP(SrcAcc | DstDX, out, check_perm_out),
@@ -3966,13 +3975,6 @@ special_insn:
case 0xe6: /* outb */
case 0xe7: /* out */
goto do_io_out;
- case 0xe8: /* call (near) */ {
- long int rel = ctxt->src.val;
- ctxt->src.val = (unsigned long) ctxt->_eip;
- jmp_rel(ctxt, rel);
- rc = em_push(ctxt);
- break;
- }
case 0xe9: /* jmp rel */
case 0xeb: /* jmp rel short */
jmp_rel(ctxt, ctxt->src.val);

2014-11-01 22:56:44

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 064/102] can: flexcan: put TX mailbox into TX_INACTIVE mode after tx-complete

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Marc Kleine-Budde <[email protected]>

commit de5944883ebbedbf5adc8497659772f5da7b7d72 upstream.

After sending a RTR frame the TX mailbox becomes a RX_EMPTY mailbox. To avoid
side effects when the RX-FIFO is full, this patch puts the TX mailbox into
TX_INACTIVE mode in the transmission complete interrupt handler. This, of
course, leaves a race window between the actual completion of the transmission
and the handling of tx-complete interrupt. However this is the best we can do
without busy polling the tx complete interrupt.

Signed-off-by: Marc Kleine-Budde <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/can/flexcan.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -632,6 +632,9 @@ static irqreturn_t flexcan_irq(int irq,
if (reg_iflag1 & (1 << FLEXCAN_TX_BUF_ID)) {
/* tx_bytes is incremented in flexcan_start_xmit */
stats->tx_packets++;
+ /* after sending a RTR frame mailbox is in RX mode */
+ flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
+ &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);
flexcan_write((1 << FLEXCAN_TX_BUF_ID), &regs->iflag1);
netif_wake_queue(dev);
}

2014-11-01 22:56:55

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 078/102] MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Yoichi Yuasa <[email protected]>

commit 5596b0b245fb9d2cefb5023b11061050351c1398 upstream.

[ 1.904000] BUG: scheduling while atomic: swapper/1/0x00000002
[ 1.908000] Modules linked in:
[ 1.916000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc2-lemote-los.git-5318619-dirty #1
[ 1.920000] Stack : 0000000031aac000 ffffffff810d0000 0000000000000052 ffffffff802730a4
0000000000000000 0000000000000001 ffffffff810cdf90 ffffffff810d0000
ffffffff8068b968 ffffffff806f5537 ffffffff810cdf90 980000009f0782e8
0000000000000001 ffffffff80720000 ffffffff806b0000 980000009f078000
980000009f290000 ffffffff805f312c 980000009f05b5d8 ffffffff80233518
980000009f05b5e8 ffffffff80274b7c 980000009f078000 ffffffff8068b968
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 980000009f05b520 0000000000000000 ffffffff805f2f6c
0000000000000000 ffffffff80700000 ffffffff80700000 ffffffff806fc758
ffffffff80700000 ffffffff8020be98 ffffffff806fceb0 ffffffff805f2f6c
...
[ 2.028000] Call Trace:
[ 2.032000] [<ffffffff8020be98>] show_stack+0x80/0x98
[ 2.036000] [<ffffffff805f2f6c>] __schedule_bug+0x44/0x6c
[ 2.040000] [<ffffffff805fac58>] __schedule+0x518/0x5b0
[ 2.044000] [<ffffffff805f8a58>] schedule_timeout+0x128/0x1f0
[ 2.048000] [<ffffffff80240314>] msleep+0x3c/0x60
[ 2.052000] [<ffffffff80495400>] do_probe+0x238/0x3a8
[ 2.056000] [<ffffffff804958b0>] ide_probe_port+0x340/0x7e8
[ 2.060000] [<ffffffff80496028>] ide_host_register+0x2d0/0x7a8
[ 2.064000] [<ffffffff8049c65c>] ide_pci_init_two+0x4e4/0x790
[ 2.068000] [<ffffffff8049f9b8>] amd74xx_probe+0x148/0x2c8
[ 2.072000] [<ffffffff803f571c>] pci_device_probe+0xc4/0x130
[ 2.076000] [<ffffffff80478f60>] driver_probe_device+0x98/0x270
[ 2.080000] [<ffffffff80479298>] __driver_attach+0xe0/0xe8
[ 2.084000] [<ffffffff80476ab0>] bus_for_each_dev+0x78/0xe0
[ 2.088000] [<ffffffff80478468>] bus_add_driver+0x230/0x310
[ 2.092000] [<ffffffff80479b44>] driver_register+0x84/0x158
[ 2.096000] [<ffffffff80200504>] do_one_initcall+0x104/0x160

Signed-off-by: Yoichi Yuasa <[email protected]>
Reported-by: Aaro Koskinen <[email protected]>
Tested-by: Aaro Koskinen <[email protected]>
Cc: [email protected]
Cc: Linux Kernel Mailing List <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/5941/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/mips/mm/c-r4k.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -606,6 +606,7 @@ static void r4k_dma_cache_wback_inv(unsi
r4k_blast_scache();
else
blast_scache_range(addr, addr + size);
+ preempt_enable();
__sync();
return;
}
@@ -647,6 +648,7 @@ static void r4k_dma_cache_inv(unsigned l
*/
blast_inv_scache_range(addr, addr + size);
}
+ preempt_enable();
__sync();
return;
}

2014-11-01 22:56:52

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 065/102] can: at91_can: add missing prepare and unprepare of the clock

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: David Dueck <[email protected]>

commit e77980e50bc2850599d4d9c0192b67a9ffd6daac upstream.

In order to make the driver work with the common clock framework, this patch
converts the clk_enable()/clk_disable() to
clk_prepare_enable()/clk_disable_unprepare(). While there, add the missing
error handling.

Signed-off-by: David Dueck <[email protected]>
Signed-off-by: Anthony Harivel <[email protected]>
Acked-by: Boris Brezillon <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/can/at91_can.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/net/can/at91_can.c
+++ b/drivers/net/can/at91_can.c
@@ -1115,7 +1115,9 @@ static int at91_open(struct net_device *
struct at91_priv *priv = netdev_priv(dev);
int err;

- clk_enable(priv->clk);
+ err = clk_prepare_enable(priv->clk);
+ if (err)
+ return err;

/* check or determine and set bittime */
err = open_candev(dev);
@@ -1139,7 +1141,7 @@ static int at91_open(struct net_device *
out_close:
close_candev(dev);
out:
- clk_disable(priv->clk);
+ clk_disable_unprepare(priv->clk);

return err;
}
@@ -1156,7 +1158,7 @@ static int at91_close(struct net_device
at91_chip_stop(dev, CAN_STATE_STOPPED);

free_irq(dev->irq, dev);
- clk_disable(priv->clk);
+ clk_disable_unprepare(priv->clk);

close_candev(dev);

2014-11-01 22:56:58

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 082/102] ipv6: reallocate addrconf router for ipv6 address when lo device up

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: chenweilong <[email protected]>

It fix the bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951

The patch can't be applied directly, as it' used the function introduced
by "commit 94e187c0" ip6_rt_put(), that patch can't be applied directly
either.

====================

From: Gao feng <[email protected]>

commit 33d99113b1102c2d2f8603b9ba72d89d915c13f5 upstream.

This commit don't have a stable tag, but it fix the bug
no reply after loopback down-up.It's very worthy to be
applied to stable 3.4 kernels.

The bug is 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951


CC: Sabrina Dubroca <[email protected]>
CC: Hannes Frederic Sowa <[email protected]>
Reported-by: Weilong Chen <[email protected]>
Signed-off-by: Weilong Chen <[email protected]>
Signed-off-by: Gao feng <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[weilong: s/ip6_rt_put/dst_release]
Signed-off-by: Chen Weilong <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ipv6/addrconf.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2443,8 +2443,18 @@ static void init_loopback(struct net_dev
if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
continue;

- if (sp_ifa->rt)
- continue;
+ if (sp_ifa->rt) {
+ /* This dst has been added to garbage list when
+ * lo device down, release this obsolete dst and
+ * reallocate a new router for ifa.
+ */
+ if (sp_ifa->rt->dst.obsolete > 0) {
+ dst_release(&sp_ifa->rt->dst);
+ sp_ifa->rt = NULL;
+ } else {
+ continue;
+ }
+ }

sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, 0);

2014-11-01 22:57:04

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 095/102] net: sctp: fix panic on duplicate ASCONF chunks

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving a e.g. semi-good formed connection scan in the
form of ...

-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
include/net/sctp/sctp.h | 5 +++++
net/sctp/associola.c | 2 ++
2 files changed, 7 insertions(+)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -523,6 +523,11 @@ static inline void sctp_assoc_pending_pm
asoc->pmtu_pending = 0;
}

+static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
+{
+ return !list_empty(&chunk->list);
+}
+
/* Walk through a list of TLV parameters. Don't trust the
* individual parameter lengths and instead depend on
* the chunk length to indicate when to stop. Make sure
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1638,6 +1638,8 @@ struct sctp_chunk *sctp_assoc_lookup_asc
* ack chunk whose serial number matches that of the request.
*/
list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
+ if (sctp_chunk_pending(ack))
+ continue;
if (ack->subh.addip_hdr->serial == serial) {
sctp_chunk_hold(ack);
return ack;

2014-11-01 22:57:01

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 096/102] net: sctp: fix remote memory pressure from excessive queueing

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
[...]
---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/sctp/inqueue.c | 33 +++++++--------------------------
net/sctp/sm_statefuns.c | 3 +++
2 files changed, 10 insertions(+), 26 deletions(-)

--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -152,18 +152,9 @@ struct sctp_chunk *sctp_inq_pop(struct s
} else {
/* Nothing to do. Next chunk in the packet, please. */
ch = (sctp_chunkhdr_t *) chunk->chunk_end;
-
/* Force chunk->skb->data to chunk->chunk_end. */
- skb_pull(chunk->skb,
- chunk->chunk_end - chunk->skb->data);
-
- /* Verify that we have at least chunk headers
- * worth of buffer left.
- */
- if (skb_headlen(chunk->skb) < sizeof(sctp_chunkhdr_t)) {
- sctp_chunk_free(chunk);
- chunk = queue->in_progress = NULL;
- }
+ skb_pull(chunk->skb, chunk->chunk_end - chunk->skb->data);
+ /* We are guaranteed to pull a SCTP header. */
}
}

@@ -199,24 +190,14 @@ struct sctp_chunk *sctp_inq_pop(struct s
skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
chunk->subh.v = NULL; /* Subheader is no longer valid. */

- if (chunk->chunk_end < skb_tail_pointer(chunk->skb)) {
+ if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+ skb_tail_pointer(chunk->skb)) {
/* This is not a singleton */
chunk->singleton = 0;
} else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
- /* RFC 2960, Section 6.10 Bundling
- *
- * Partial chunks MUST NOT be placed in an SCTP packet.
- * If the receiver detects a partial chunk, it MUST drop
- * the chunk.
- *
- * Since the end of the chunk is past the end of our buffer
- * (which contains the whole packet, we can freely discard
- * the whole packet.
- */
- sctp_chunk_free(chunk);
- chunk = queue->in_progress = NULL;
-
- return NULL;
+ /* Discard inside state machine. */
+ chunk->pdiscard = 1;
+ chunk->chunk_end = skb_tail_pointer(chunk->skb);
} else {
/* We are at the end of the packet, so mark the chunk
* in case we need to send a SACK.
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -163,6 +163,9 @@ sctp_chunk_length_valid(struct sctp_chun
{
__u16 chunk_length = ntohs(chunk->chunk_hdr->length);

+ /* Previously already marked? */
+ if (unlikely(chunk->pdiscard))
+ return 0;
if (unlikely(chunk_length < required_length))
return 0;

2014-11-01 22:56:49

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 074/102] shmem: fix nlink for rename overwrite directory

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Miklos Szeredi <[email protected]>

commit b928095b0a7cff7fb9fcf4c706348ceb8ab2c295 upstream.

If overwriting an empty directory with rename, then need to drop the extra
nlink.

Test prog:

#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <sys/stat.h>

int main(void)
{
const char *test_dir1 = "test-dir1";
const char *test_dir2 = "test-dir2";
int res;
int fd;
struct stat statbuf;

res = mkdir(test_dir1, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir1);

res = mkdir(test_dir2, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir2);

fd = open(test_dir2, O_RDONLY);
if (fd == -1)
err(1, "open(\"%s\")", test_dir2);

res = rename(test_dir1, test_dir2);
if (res == -1)
err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2);

res = fstat(fd, &statbuf);
if (res == -1)
err(1, "fstat(%i)", fd);

if (statbuf.st_nlink != 0) {
fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink);
return 1;
}

return 0;
}

Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/shmem.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1719,8 +1719,10 @@ static int shmem_rename(struct inode *ol

if (new_dentry->d_inode) {
(void) shmem_unlink(new_dir, new_dentry);
- if (they_are_dirs)
+ if (they_are_dirs) {
+ drop_nlink(new_dentry->d_inode);
drop_nlink(old_dir);
+ }
} else if (they_are_dirs) {
drop_nlink(old_dir);
inc_nlink(new_dir);

2014-11-01 22:56:38

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 090/102] KVM: x86: Fix wrong masking on relative jump/call

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 05c83ec9b73c8124555b706f6af777b10adf0862 upstream.

Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.

This patch fixes KVM behavior.

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/emulate.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -456,11 +456,6 @@ register_address_increment(struct x86_em
*reg = (*reg & ~ad_mask(ctxt)) | ((*reg + inc) & ad_mask(ctxt));
}

-static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
-{
- register_address_increment(ctxt, &ctxt->_eip, rel);
-}
-
static u32 desc_limit_scaled(struct desc_struct *desc)
{
u32 limit = get_desc_limit(desc);
@@ -534,6 +529,28 @@ static int emulate_nm(struct x86_emulate
return emulate_exception(ctxt, NM_VECTOR, 0, false);
}

+static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+{
+ switch (ctxt->op_bytes) {
+ case 2:
+ ctxt->_eip = (u16)dst;
+ break;
+ case 4:
+ ctxt->_eip = (u32)dst;
+ break;
+ case 8:
+ ctxt->_eip = dst;
+ break;
+ default:
+ WARN(1, "unsupported eip assignment size\n");
+ }
+}
+
+static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
+{
+ assign_eip_near(ctxt, ctxt->_eip + rel);
+}
+
static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
{
u16 selector;

2014-11-01 22:56:36

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 066/102] ALSA: pcm: fix fifo_size frame calculation

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Clemens Ladisch <[email protected]>

commit a9960e6a293e6fc3ed414643bb4e4106272e4d0a upstream.

The calculated frame size was wrong because snd_pcm_format_physical_width()
actually returns the number of bits, not bytes.

Use snd_pcm_format_size() instead, which not only returns bytes, but also
simplifies the calculation.

Fixes: 8bea869c5e56 ("ALSA: PCM midlevel: improve fifo_size handling")
Signed-off-by: Clemens Ladisch <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
sound/core/pcm_lib.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -1692,14 +1692,16 @@ static int snd_pcm_lib_ioctl_fifo_size(s
{
struct snd_pcm_hw_params *params = arg;
snd_pcm_format_t format;
- int channels, width;
+ int channels;
+ ssize_t frame_size;

params->fifo_size = substream->runtime->hw.fifo_size;
if (!(substream->runtime->hw.info & SNDRV_PCM_INFO_FIFO_IN_FRAMES)) {
format = params_format(params);
channels = params_channels(params);
- width = snd_pcm_format_physical_width(format);
- params->fifo_size /= width * channels;
+ frame_size = snd_pcm_format_size(format, channels);
+ if (frame_size > 0)
+ params->fifo_size /= (unsigned)frame_size;
}
return 0;
}

2014-11-01 22:56:33

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 071/102] MIPS: mcount: Adjust stack pointer for static trace in MIPS32

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Markos Chandras <[email protected]>

commit 8a574cfa2652545eb95595d38ac2a0bb501af0ae upstream.

Every mcount() call in the MIPS 32-bit kernel is done as follows:

[...]
move at, ra
jal _mcount
addiu sp, sp, -8
[...]

but upon returning from the mcount() function, the stack pointer
is not adjusted properly. This is explained in details in 58b69401c797
(MIPS: Function tracer: Fix broken function tracing).

Commit ad8c396936e3 ("MIPS: Unbreak function tracer for 64-bit kernel.)
fixed the stack manipulation for 64-bit but it didn't fix it completely
for MIPS32.

Signed-off-by: Markos Chandras <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7792/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/mips/kernel/mcount.S | 12 ++++++++++++
1 file changed, 12 insertions(+)

--- a/arch/mips/kernel/mcount.S
+++ b/arch/mips/kernel/mcount.S
@@ -119,7 +119,11 @@ NESTED(_mcount, PT_SIZE, ra)
nop
#endif
b ftrace_stub
+#ifdef CONFIG_32BIT
+ addiu sp, sp, 8
+#else
nop
+#endif

static_trace:
MCOUNT_SAVE_REGS
@@ -129,6 +133,9 @@ static_trace:
move a1, AT /* arg2: parent's return address */

MCOUNT_RESTORE_REGS
+#ifdef CONFIG_32BIT
+ addiu sp, sp, 8
+#endif
.globl ftrace_stub
ftrace_stub:
RETURN_BACK
@@ -177,6 +184,11 @@ NESTED(ftrace_graph_caller, PT_SIZE, ra)
jal prepare_ftrace_return
nop
MCOUNT_RESTORE_REGS
+#ifndef CONFIG_DYNAMIC_FTRACE
+#ifdef CONFIG_32BIT
+ addiu sp, sp, 8
+#endif
+#endif
RETURN_BACK
END(ftrace_graph_caller)

2014-11-01 22:56:29

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 091/102] KVM: x86: Emulator fixes for eip canonical checks on near branches

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Amit <[email protected]>

commit 234f3ce485d54017f15cf5e0699cff4100121601 upstream.

Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be canonical one, as real CPUs do. During sysret, both target rsp and rip
should be canonical. If any of these values is noncanonical, a #GP exception
should occur. The exception to this rule are syscall and sysenter instructions
in which the assigned rip is checked during the assignment to the relevant
MSRs.

This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.

This fixes CVE-2014-3647.

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- Use ctxt->regs[] instead of reg_read(), reg_write(), reg_rmw()]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/emulate.c | 78 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 54 insertions(+), 24 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -529,7 +529,8 @@ static int emulate_nm(struct x86_emulate
return emulate_exception(ctxt, NM_VECTOR, 0, false);
}

-static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
+ int cs_l)
{
switch (ctxt->op_bytes) {
case 2:
@@ -539,16 +540,25 @@ static inline void assign_eip_near(struc
ctxt->_eip = (u32)dst;
break;
case 8:
+ if ((cs_l && is_noncanonical_address(dst)) ||
+ (!cs_l && (dst & ~(u32)-1)))
+ return emulate_gp(ctxt, 0);
ctxt->_eip = dst;
break;
default:
WARN(1, "unsupported eip assignment size\n");
}
+ return X86EMUL_CONTINUE;
+}
+
+static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+{
+ return assign_eip_far(ctxt, dst, ctxt->mode == X86EMUL_MODE_PROT64);
}

-static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
+static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
{
- assign_eip_near(ctxt, ctxt->_eip + rel);
+ return assign_eip_near(ctxt, ctxt->_eip + rel);
}

static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
@@ -1787,13 +1797,15 @@ static int em_grp45(struct x86_emulate_c
case 2: /* call near abs */ {
long int old_eip;
old_eip = ctxt->_eip;
- ctxt->_eip = ctxt->src.val;
+ rc = assign_eip_near(ctxt, ctxt->src.val);
+ if (rc != X86EMUL_CONTINUE)
+ break;
ctxt->src.val = old_eip;
rc = em_push(ctxt);
break;
}
case 4: /* jmp abs */
- ctxt->_eip = ctxt->src.val;
+ rc = assign_eip_near(ctxt, ctxt->src.val);
break;
case 5: /* jmp far */
rc = em_jmp_far(ctxt);
@@ -1825,10 +1837,14 @@ static int em_grp9(struct x86_emulate_ct

static int em_ret(struct x86_emulate_ctxt *ctxt)
{
- ctxt->dst.type = OP_REG;
- ctxt->dst.addr.reg = &ctxt->_eip;
- ctxt->dst.bytes = ctxt->op_bytes;
- return em_pop(ctxt);
+ int rc;
+ unsigned long eip;
+
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ return assign_eip_near(ctxt, eip);
}

static int em_ret_far(struct x86_emulate_ctxt *ctxt)
@@ -2060,7 +2076,7 @@ static int em_sysexit(struct x86_emulate
{
struct x86_emulate_ops *ops = ctxt->ops;
struct desc_struct cs, ss;
- u64 msr_data;
+ u64 msr_data, rcx, rdx;
int usermode;
u16 cs_sel = 0, ss_sel = 0;

@@ -2076,6 +2092,9 @@ static int em_sysexit(struct x86_emulate
else
usermode = X86EMUL_MODE_PROT32;

+ rcx = ctxt->regs[VCPU_REGS_RCX];
+ rdx = ctxt->regs[VCPU_REGS_RDX];
+
cs.dpl = 3;
ss.dpl = 3;
ops->get_msr(ctxt, MSR_IA32_SYSENTER_CS, &msr_data);
@@ -2093,6 +2112,9 @@ static int em_sysexit(struct x86_emulate
ss_sel = cs_sel + 8;
cs.d = 0;
cs.l = 1;
+ if (is_noncanonical_address(rcx) ||
+ is_noncanonical_address(rdx))
+ return emulate_gp(ctxt, 0);
break;
}
cs_sel |= SELECTOR_RPL_MASK;
@@ -2101,8 +2123,8 @@ static int em_sysexit(struct x86_emulate
ops->set_segment(ctxt, cs_sel, &cs, 0, VCPU_SREG_CS);
ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);

- ctxt->_eip = ctxt->regs[VCPU_REGS_RDX];
- ctxt->regs[VCPU_REGS_RSP] = ctxt->regs[VCPU_REGS_RCX];
+ ctxt->_eip = rdx;
+ ctxt->regs[VCPU_REGS_RSP] = rcx;

return X86EMUL_CONTINUE;
}
@@ -2555,10 +2577,13 @@ static int em_das(struct x86_emulate_ctx

static int em_call(struct x86_emulate_ctxt *ctxt)
{
+ int rc;
long rel = ctxt->src.val;

ctxt->src.val = (unsigned long)ctxt->_eip;
- jmp_rel(ctxt, rel);
+ rc = jmp_rel(ctxt, rel);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
return em_push(ctxt);
}

@@ -2590,11 +2615,12 @@ static int em_call_far(struct x86_emulat
static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
{
int rc;
+ unsigned long eip;

- ctxt->dst.type = OP_REG;
- ctxt->dst.addr.reg = &ctxt->_eip;
- ctxt->dst.bytes = ctxt->op_bytes;
- rc = emulate_pop(ctxt, &ctxt->dst.val, ctxt->op_bytes);
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+ rc = assign_eip_near(ctxt, eip);
if (rc != X86EMUL_CONTINUE)
return rc;
register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RSP], ctxt->src.val);
@@ -2840,20 +2866,24 @@ static int em_lmsw(struct x86_emulate_ct

static int em_loop(struct x86_emulate_ctxt *ctxt)
{
+ int rc = X86EMUL_CONTINUE;
+
register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], -1);
if ((address_mask(ctxt, ctxt->regs[VCPU_REGS_RCX]) != 0) &&
(ctxt->b == 0xe2 || test_cc(ctxt->b ^ 0x5, ctxt->eflags)))
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);

- return X86EMUL_CONTINUE;
+ return rc;
}

static int em_jcxz(struct x86_emulate_ctxt *ctxt)
{
+ int rc = X86EMUL_CONTINUE;
+
if (address_mask(ctxt, ctxt->regs[VCPU_REGS_RCX]) == 0)
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);

- return X86EMUL_CONTINUE;
+ return rc;
}

static int em_cli(struct x86_emulate_ctxt *ctxt)
@@ -3946,7 +3976,7 @@ special_insn:
break;
case 0x70 ... 0x7f: /* jcc (short) */
if (test_cc(ctxt->b, ctxt->eflags))
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);
break;
case 0x8d: /* lea r16/r32, m */
ctxt->dst.val = ctxt->src.addr.mem.ea;
@@ -3994,7 +4024,7 @@ special_insn:
goto do_io_out;
case 0xe9: /* jmp rel */
case 0xeb: /* jmp rel short */
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);
ctxt->dst.type = OP_NONE; /* Disable writeback. */
break;
case 0xec: /* in al,dx */
@@ -4160,7 +4190,7 @@ twobyte_insn:
break;
case 0x80 ... 0x8f: /* jnz rel, etc*/
if (test_cc(ctxt->b, ctxt->eflags))
- jmp_rel(ctxt, ctxt->src.val);
+ rc = jmp_rel(ctxt, ctxt->src.val);
break;
case 0x90 ... 0x9f: /* setcc r/m8 */
ctxt->dst.val = test_cc(ctxt->b, ctxt->eflags);

2014-11-01 22:56:26

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 083/102] ext4: fix BUG_ON in mb_free_blocks()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <[email protected]>

commit c99d1e6e83b06744c75d9f5e491ed495a7086b7b upstream.

If we suffer a block allocation failure (for example due to a memory
allocation failure), it's possible that we will call
ext4_discard_allocated_blocks() before we've actually allocated any
blocks. In that case, fe_len and fe_start in ac->ac_f_ex will still
be zero, and this will result in mb_free_blocks(inode, e4b, 0, 0)
triggering the BUG_ON on mb_free_blocks():

BUG_ON(last >= (sb->s_blocksize << 3));

Fix this by bailing out of ext4_discard_allocated_blocks() if fs_len
is zero.

Also fix a missing ext4_mb_unload_buddy() call in
ext4_discard_allocated_blocks().

Google-Bug-Id: 16844242

Fixes: 86f0afd463215fc3e58020493482faa4ac3a4d69
Signed-off-by: Theodore Ts'o <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/ext4/mballoc.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1312,6 +1312,8 @@ static void mb_free_blocks(struct inode
void *buddy2;
struct super_block *sb = e4b->bd_sb;

+ if (WARN_ON(count == 0))
+ return;
BUG_ON(first + count > (sb->s_blocksize << 3));
assert_spin_locked(ext4_group_lock_ptr(sb, e4b->bd_group));
mb_check_buddy(e4b);
@@ -3132,6 +3134,8 @@ static void ext4_discard_allocated_block
int err;

if (pa == NULL) {
+ if (ac->ac_f_ex.fe_len == 0)
+ return;
err = ext4_mb_load_buddy(ac->ac_sb, ac->ac_f_ex.fe_group, &e4b);
if (err) {
/*
@@ -3146,6 +3150,7 @@ static void ext4_discard_allocated_block
mb_free_blocks(ac->ac_inode, &e4b, ac->ac_f_ex.fe_start,
ac->ac_f_ex.fe_len);
ext4_unlock_group(ac->ac_sb, ac->ac_f_ex.fe_group);
+ ext4_mb_unload_buddy(&e4b);
return;
}
if (pa->pa_type == MB_INODE_PA)

2014-11-01 23:00:59

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 001/102] regulatory: add NUL to alpha2

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Eliad Peller <[email protected]>

commit a5fe8e7695dc3f547e955ad2b662e3e72969e506 upstream.

alpha2 is defined as 2-chars array, but is used in multiple
places as string (e.g. with nla_put_string calls), which
might leak kernel data.

Solve it by simply adding an extra char for the NULL
terminator, making such operations safe.

Signed-off-by: Eliad Peller <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
include/net/regulatory.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/net/regulatory.h
+++ b/include/net/regulatory.h
@@ -92,7 +92,7 @@ struct ieee80211_reg_rule {

struct ieee80211_regdomain {
u32 n_reg_rules;
- char alpha2[2];
+ char alpha2[3];
struct ieee80211_reg_rule reg_rules[];
};

2014-11-01 23:01:11

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 079/102] ipv4: move route garbage collector to work queue

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Marcelo Ricardo Leitner <[email protected]>

Currently the route garbage collector gets called by dst_alloc() if it
have more entries than the threshold. But it's an expensive call, that
don't really need to be done by then.

Another issue with current way is that it allows running the garbage
collector with the same start parameters on multiple CPUs at once, which
is not optimal. A system may even soft lockup if the cache is big enough
as the garbage collectors will be fighting over the hash lock entries.

This patch thus moves the garbage collector to run asynchronously on a
work queue, much similar to how rt_expire_check runs.

There is one condition left that allows multiple executions, which is
handled by the next patch.

Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ipv4/route.c | 43 +++++++++++++++++++++++++++++--------------
1 file changed, 29 insertions(+), 14 deletions(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -151,6 +151,9 @@ static void ipv4_link_failure(struct s
static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
static int rt_garbage_collect(struct dst_ops *ops);

+static void __rt_garbage_collect(struct work_struct *w);
+static DECLARE_WORK(rt_gc_worker, __rt_garbage_collect);
+
static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
int how)
{
@@ -979,7 +982,7 @@ static void rt_emergency_hash_rebuild(st
and when load increases it reduces to limit cache size.
*/

-static int rt_garbage_collect(struct dst_ops *ops)
+static void __do_rt_garbage_collect(int elasticity, int min_interval)
{
static unsigned long expire = RT_GC_TIMEOUT;
static unsigned long last_gc;
@@ -998,7 +1001,7 @@ static int rt_garbage_collect(struct dst

RT_CACHE_STAT_INC(gc_total);

- if (now - last_gc < ip_rt_gc_min_interval &&
+ if (now - last_gc < min_interval &&
entries < ip_rt_max_size) {
RT_CACHE_STAT_INC(gc_ignored);
goto out;
@@ -1006,7 +1009,7 @@ static int rt_garbage_collect(struct dst

entries = dst_entries_get_slow(&ipv4_dst_ops);
/* Calculate number of entries, which we want to expire now. */
- goal = entries - (ip_rt_gc_elasticity << rt_hash_log);
+ goal = entries - (elasticity << rt_hash_log);
if (goal <= 0) {
if (equilibrium < ipv4_dst_ops.gc_thresh)
equilibrium = ipv4_dst_ops.gc_thresh;
@@ -1023,7 +1026,7 @@ static int rt_garbage_collect(struct dst
equilibrium = entries - goal;
}

- if (now - last_gc >= ip_rt_gc_min_interval)
+ if (now - last_gc >= min_interval)
last_gc = now;

if (goal <= 0) {
@@ -1088,15 +1091,33 @@ static int rt_garbage_collect(struct dst
if (net_ratelimit())
printk(KERN_WARNING "dst cache overflow\n");
RT_CACHE_STAT_INC(gc_dst_overflow);
- return 1;
+ return;

work_done:
- expire += ip_rt_gc_min_interval;
+ expire += min_interval;
if (expire > ip_rt_gc_timeout ||
dst_entries_get_fast(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh ||
dst_entries_get_slow(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh)
expire = ip_rt_gc_timeout;
-out: return 0;
+out: return;
+}
+
+static void __rt_garbage_collect(struct work_struct *w)
+{
+ __do_rt_garbage_collect(ip_rt_gc_elasticity, ip_rt_gc_min_interval);
+}
+
+static int rt_garbage_collect(struct dst_ops *ops)
+{
+ if (!work_pending(&rt_gc_worker))
+ schedule_work(&rt_gc_worker);
+
+ if (dst_entries_get_fast(&ipv4_dst_ops) >= ip_rt_max_size ||
+ dst_entries_get_slow(&ipv4_dst_ops) >= ip_rt_max_size) {
+ RT_CACHE_STAT_INC(gc_dst_overflow);
+ return 1;
+ }
+ return 0;
}

/*
@@ -1291,13 +1312,7 @@ restart:
it is most likely it holds some neighbour records.
*/
if (attempts-- > 0) {
- int saved_elasticity = ip_rt_gc_elasticity;
- int saved_int = ip_rt_gc_min_interval;
- ip_rt_gc_elasticity = 1;
- ip_rt_gc_min_interval = 0;
- rt_garbage_collect(&ipv4_dst_ops);
- ip_rt_gc_min_interval = saved_int;
- ip_rt_gc_elasticity = saved_elasticity;
+ __do_rt_garbage_collect(1, 0);
goto restart;
}

2014-11-01 23:01:18

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 102/102] ring-buffer: Fix infinite spin in reading buffer

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 24607f114fd14f2f37e3e0cb3d47bce96e81e848 upstream.

Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.

A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
its not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.

The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.

Commit 651e22f2701b changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is suppose to set the cache to the
current value and there's locks that keep a consuming reader from
having access to the data.

Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/trace/ring_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2847,7 +2847,7 @@ static void rb_iter_reset(struct ring_bu
iter->head = cpu_buffer->reader_page->read;

iter->cache_reader_page = iter->head_page;
- iter->cache_read = iter->head;
+ iter->cache_read = cpu_buffer->read;

if (iter->head)
iter->read_stamp = cpu_buffer->read_stamp;

2014-11-01 23:01:26

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 094/102] net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
end:0x440 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8144fb1c>] skb_put+0x5c/0x70
[<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
[<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
[<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
[<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
[<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
[<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81497078>] ip_local_deliver+0x98/0xa0
[<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
[<ffffffff81496ac5>] ip_rcv+0x275/0x350
[<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
[<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

1) Add IP address parameter (param length: 16)
2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

length = ntohs(asconf_param->param_hdr.length);
asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- sctp_sf_violation_paramlen() doesn't take a struct net * parameter]
Signed-off-by: Ben Hutchings <[email protected]>
---
include/net/sctp/sm.h | 6 +--
net/sctp/sm_make_chunk.c | 99 +++++++++++++++++++++++++++---------------------
net/sctp/sm_statefuns.c | 18 +--------
3 files changed, 60 insertions(+), 63 deletions(-)

--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -251,9 +251,9 @@ struct sctp_chunk *sctp_make_asconf_upda
int, __be16);
struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc,
union sctp_addr *addr);
-int sctp_verify_asconf(const struct sctp_association *asoc,
- struct sctp_paramhdr *param_hdr, void *chunk_end,
- struct sctp_paramhdr **errp);
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+ struct sctp_chunk *chunk, bool addr_param_needed,
+ struct sctp_paramhdr **errp);
struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
struct sctp_chunk *asconf);
int sctp_process_asconf_ack(struct sctp_association *asoc,
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3068,50 +3068,63 @@ static __be16 sctp_process_asconf_param(
return SCTP_ERROR_NO_ERROR;
}

-/* Verify the ASCONF packet before we process it. */
-int sctp_verify_asconf(const struct sctp_association *asoc,
- struct sctp_paramhdr *param_hdr, void *chunk_end,
- struct sctp_paramhdr **errp) {
- sctp_addip_param_t *asconf_param;
+/* Verify the ASCONF packet before we process it. */
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+ struct sctp_chunk *chunk, bool addr_param_needed,
+ struct sctp_paramhdr **errp)
+{
+ sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) chunk->chunk_hdr;
union sctp_params param;
- int length, plen;
-
- param.v = (sctp_paramhdr_t *) param_hdr;
- while (param.v <= chunk_end - sizeof(sctp_paramhdr_t)) {
- length = ntohs(param.p->length);
- *errp = param.p;
+ bool addr_param_seen = false;

- if (param.v > chunk_end - length ||
- length < sizeof(sctp_paramhdr_t))
- return 0;
+ sctp_walk_params(param, addip, addip_hdr.params) {
+ size_t length = ntohs(param.p->length);

+ *errp = param.p;
switch (param.p->type) {
+ case SCTP_PARAM_ERR_CAUSE:
+ break;
+ case SCTP_PARAM_IPV4_ADDRESS:
+ if (length != sizeof(sctp_ipv4addr_param_t))
+ return false;
+ addr_param_seen = true;
+ break;
+ case SCTP_PARAM_IPV6_ADDRESS:
+ if (length != sizeof(sctp_ipv6addr_param_t))
+ return false;
+ addr_param_seen = true;
+ break;
case SCTP_PARAM_ADD_IP:
case SCTP_PARAM_DEL_IP:
case SCTP_PARAM_SET_PRIMARY:
- asconf_param = (sctp_addip_param_t *)param.v;
- plen = ntohs(asconf_param->param_hdr.length);
- if (plen < sizeof(sctp_addip_param_t) +
- sizeof(sctp_paramhdr_t))
- return 0;
+ /* In ASCONF chunks, these need to be first. */
+ if (addr_param_needed && !addr_param_seen)
+ return false;
+ length = ntohs(param.addip->param_hdr.length);
+ if (length < sizeof(sctp_addip_param_t) +
+ sizeof(sctp_paramhdr_t))
+ return false;
break;
case SCTP_PARAM_SUCCESS_REPORT:
case SCTP_PARAM_ADAPTATION_LAYER_IND:
if (length != sizeof(sctp_addip_param_t))
- return 0;
-
+ return false;
break;
default:
- break;
+ /* This is unkown to us, reject! */
+ return false;
}
-
- param.v += WORD_ROUND(length);
}

- if (param.v != chunk_end)
- return 0;
+ /* Remaining sanity checks. */
+ if (addr_param_needed && !addr_param_seen)
+ return false;
+ if (!addr_param_needed && addr_param_seen)
+ return false;
+ if (param.v != chunk->chunk_end)
+ return false;

- return 1;
+ return true;
}

/* Process an incoming ASCONF chunk with the next expected serial no. and
@@ -3120,16 +3133,17 @@ int sctp_verify_asconf(const struct sctp
struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
struct sctp_chunk *asconf)
{
+ sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) asconf->chunk_hdr;
+ bool all_param_pass = true;
+ union sctp_params param;
sctp_addiphdr_t *hdr;
union sctp_addr_param *addr_param;
sctp_addip_param_t *asconf_param;
struct sctp_chunk *asconf_ack;
-
__be16 err_code;
int length = 0;
int chunk_len;
__u32 serial;
- int all_param_pass = 1;

chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(sctp_chunkhdr_t);
hdr = (sctp_addiphdr_t *)asconf->skb->data;
@@ -3157,9 +3171,14 @@ struct sctp_chunk *sctp_process_asconf(s
goto done;

/* Process the TLVs contained within the ASCONF chunk. */
- while (chunk_len > 0) {
+ sctp_walk_params(param, addip, addip_hdr.params) {
+ /* Skip preceeding address parameters. */
+ if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
+ param.p->type == SCTP_PARAM_IPV6_ADDRESS)
+ continue;
+
err_code = sctp_process_asconf_param(asoc, asconf,
- asconf_param);
+ param.addip);
/* ADDIP 4.1 A7)
* If an error response is received for a TLV parameter,
* all TLVs with no response before the failed TLV are
@@ -3167,28 +3186,20 @@ struct sctp_chunk *sctp_process_asconf(s
* the failed response are considered unsuccessful unless
* a specific success indication is present for the parameter.
*/
- if (SCTP_ERROR_NO_ERROR != err_code)
- all_param_pass = 0;
-
+ if (err_code != SCTP_ERROR_NO_ERROR)
+ all_param_pass = false;
if (!all_param_pass)
- sctp_add_asconf_response(asconf_ack,
- asconf_param->crr_id, err_code,
- asconf_param);
+ sctp_add_asconf_response(asconf_ack, param.addip->crr_id,
+ err_code, param.addip);

/* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add
* an IP address sends an 'Out of Resource' in its response, it
* MUST also fail any subsequent add or delete requests bundled
* in the ASCONF.
*/
- if (SCTP_ERROR_RSRC_LOW == err_code)
+ if (err_code == SCTP_ERROR_RSRC_LOW)
goto done;
-
- /* Move to the next ASCONF param. */
- length = ntohs(asconf_param->param_hdr.length);
- asconf_param = (void *)asconf_param + length;
- chunk_len -= length;
}
-
done:
asoc->peer.addip_serial++;

--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3516,9 +3516,7 @@ sctp_disposition_t sctp_sf_do_asconf(con
struct sctp_chunk *asconf_ack = NULL;
struct sctp_paramhdr *err_param = NULL;
sctp_addiphdr_t *hdr;
- union sctp_addr_param *addr_param;
__u32 serial;
- int length;

if (!sctp_vtag_verify(chunk, asoc)) {
sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_BAD_TAG,
@@ -3543,17 +3541,8 @@ sctp_disposition_t sctp_sf_do_asconf(con
hdr = (sctp_addiphdr_t *)chunk->skb->data;
serial = ntohl(hdr->serial);

- addr_param = (union sctp_addr_param *)hdr->params;
- length = ntohs(addr_param->p.length);
- if (length < sizeof(sctp_paramhdr_t))
- return sctp_sf_violation_paramlen(ep, asoc, type, arg,
- (void *)addr_param, commands);
-
/* Verify the ASCONF chunk before processing it. */
- if (!sctp_verify_asconf(asoc,
- (sctp_paramhdr_t *)((void *)addr_param + length),
- (void *)chunk->chunk_end,
- &err_param))
+ if (!sctp_verify_asconf(asoc, chunk, true, &err_param))
return sctp_sf_violation_paramlen(ep, asoc, type, arg,
(void *)err_param, commands);

@@ -3670,10 +3659,7 @@ sctp_disposition_t sctp_sf_do_asconf_ack
rcvd_serial = ntohl(addip_hdr->serial);

/* Verify the ASCONF-ACK chunk before processing it. */
- if (!sctp_verify_asconf(asoc,
- (sctp_paramhdr_t *)addip_hdr->params,
- (void *)asconf_ack->chunk_end,
- &err_param))
+ if (!sctp_verify_asconf(asoc, asconf_ack, false, &err_param))
return sctp_sf_violation_paramlen(ep, asoc, type, arg,
(void *)err_param, commands);

2014-11-01 23:01:22

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 014/102] Revert "iwlwifi: dvm: don't enable CTS to self"

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <[email protected]>

commit f47f46d7b09cf1d09e4b44b6cc4dd7d68a08028c upstream.

This reverts commit 43d826ca5979927131685cc2092c7ce862cb91cd.

This commit caused packet loss.

Signed-off-by: Emmanuel Grumbach <[email protected]>
[bwh: Backported to 3.2:
- Adjust filename
- Condition for RXON_FLG_SELF_CTS_EN in iwlagn_commit_rxon() was different]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/wireless/iwlwifi/iwl-agn-rxon.c | 13 -------------
1 file changed, 13 deletions(-)

--- a/drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
@@ -440,6 +440,14 @@ int iwlagn_commit_rxon(struct iwl_priv *
/* always get timestamp with Rx frame */
ctx->staging.flags |= RXON_FLG_TSF2HOST_MSK;

+ /*
+ * force CTS-to-self frames protection if RTS-CTS is not preferred
+ * one aggregation protection method
+ */
+ if (!(priv->cfg->ht_params &&
+ priv->cfg->ht_params->use_rts_for_aggregation))
+ ctx->staging.flags |= RXON_FLG_SELF_CTS_EN;
+
if ((ctx->vif && ctx->vif->bss_conf.use_short_slot) ||
!(ctx->staging.flags & RXON_FLG_BAND_24G_MSK))
ctx->staging.flags |= RXON_FLG_SHORT_SLOT_MSK;
@@ -872,6 +880,11 @@ void iwlagn_bss_info_changed(struct ieee
else
ctx->staging.flags &= ~RXON_FLG_TGG_PROTECT_MSK;

+ if (bss_conf->use_cts_prot)
+ ctx->staging.flags |= RXON_FLG_SELF_CTS_EN;
+ else
+ ctx->staging.flags &= ~RXON_FLG_SELF_CTS_EN;
+
memcpy(ctx->staging.bssid_addr, bss_conf->bssid, ETH_ALEN);

if (vif->type == NL80211_IFTYPE_AP ||

2014-11-01 23:01:15

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 101/102] ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Julian Anastasov <[email protected]>

commit 2627b7e15c5064ddd5e578e4efd948d48d531a3f upstream.

commit 8f4e0a18682d91 ("IPVS netns exit causes crash in conntrack")
added second ip_vs_conn_drop_conntrack call instead of just adding
the needed check. As result, the first call still can cause
crash on netns exit. Remove it.

Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Hans Schillstrom <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/netfilter/ipvs/ip_vs_conn.c | 1 -
1 file changed, 1 deletion(-)

--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -777,7 +777,6 @@ static void ip_vs_conn_expire(unsigned l
ip_vs_control_del(cp);

if (cp->flags & IP_VS_CONN_F_NFCT) {
- ip_vs_conn_drop_conntrack(cp);
/* Do not access conntracks during subsys cleanup
* because nf_conntrack_find_get can not be used after
* conntrack cleanup for the net.

2014-11-01 23:01:13

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 099/102] ext2: Fix fs corruption in ext2_get_xip_mem()

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 7ba3ec5749ddb61f79f7be17b5fd7720eebc52de upstream.

Commit 8e3dffc651cb "Ext2: mark inode dirty after the function
dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block()
called from ext2_get_xip_mem(). That function called ext2_get_block()
mistakenly asking it to map 0 blocks while 1 was intended. Before the
above mentioned commit things worked out fine by luck but after that commit
we started returning that we allocated 0 blocks while we in fact
allocated 1 block and thus allocation was looping until all blocks in
the filesystem were exhausted.

Fix the problem by properly asking for one block and also add assertion
in ext2_get_blocks() to catch similar problems.

Reported-and-tested-by: Andiry Xu <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/ext2/inode.c | 2 ++
fs/ext2/xip.c | 1 +
2 files changed, 3 insertions(+)

--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -619,6 +619,8 @@ static int ext2_get_blocks(struct inode
int count = 0;
ext2_fsblk_t first_block = 0;

+ BUG_ON(maxblocks == 0);
+
depth = ext2_block_to_path(inode,iblock,offsets,&blocks_to_boundary);

if (depth == 0)
--- a/fs/ext2/xip.c
+++ b/fs/ext2/xip.c
@@ -37,6 +37,7 @@ __ext2_get_block(struct inode *inode, pg
int rc;

memset(&tmp, 0, sizeof(struct buffer_head));
+ tmp.b_size = 1 << inode->i_blkbits;
rc = ext2_get_block(inode, pgoff, &tmp, create);
*result = tmp.b_blocknr;

2014-11-01 23:01:08

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 075/102] mm: migrate: Close race between migration completion and mprotect

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mel Gorman <[email protected]>

commit d3cb8bf6081b8b7a2dabb1264fe968fd870fa595 upstream.

A migration entry is marked as write if pte_write was true at the time the
entry was created. The VMA protections are not double checked when migration
entries are being removed as mprotect marks write-migration-entries as
read. It means that potentially we take a spurious fault to mark PTEs write
again but it's straight-forward. However, there is a race between write
migrations being marked read and migrations finishing. This potentially
allows a PTE to be write that should have been read. Close this race by
double checking the VMA permissions using maybe_mkwrite when migration
completes.

[[email protected]: use maybe_mkwrite]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
mm/migrate.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -141,8 +141,11 @@ static int remove_migration_pte(struct p

get_page(new);
pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
+
+ /* Recheck VMA as permissions can change since migration started */
if (is_write_migration_entry(entry))
- pte = pte_mkwrite(pte);
+ pte = maybe_mkwrite(pte, vma);
+
#ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(new))
pte = pte_mkhuge(pte);

2014-11-01 23:01:03

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 092/102] KVM: x86: use new CS.RPL as CPL during task switch

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <[email protected]>

commit 2356aaeb2f58f491679dc0c38bc3f6dbe54e7ded upstream.

During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
to all the other requirements) and will be the new CPL. So far this
worked by carefully setting the CS selector and flag before doing the
task switch; setting CS.selector will already change the CPL.

However, this will not work once we get the CPL from SS.DPL, because
then you will have to set the full segment descriptor cache to change
the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the
task switch, and the check that SS.DPL == CPL will fail.

Temporarily assume that the CPL comes from CS.RPL during task switch
to a protected-mode task. This is the same approach used in QEMU's
emulation code, which (until version 2.0) manually tracks the CPL.

Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- load_state_from_tss32() does not support VM86 mode]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/emulate.c | 60 +++++++++++++++++++++++++++-----------------------
1 file changed, 33 insertions(+), 27 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1233,11 +1233,11 @@ static int write_segment_descriptor(stru
}

/* Does not support long mode */
-static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
- u16 selector, int seg)
+static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+ u16 selector, int seg, u8 cpl)
{
struct desc_struct seg_desc;
- u8 dpl, rpl, cpl;
+ u8 dpl, rpl;
unsigned err_vec = GP_VECTOR;
u32 err_code = 0;
bool null_selector = !(selector & ~0x3); /* 0000-0003 are null */
@@ -1286,7 +1286,6 @@ static int load_segment_descriptor(struc

rpl = selector & 3;
dpl = seg_desc.dpl;
- cpl = ctxt->ops->cpl(ctxt);

switch (seg) {
case VCPU_SREG_SS:
@@ -1349,6 +1348,13 @@ exception:
return X86EMUL_PROPAGATE_FAULT;
}

+static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+ u16 selector, int seg)
+{
+ u8 cpl = ctxt->ops->cpl(ctxt);
+ return __load_segment_descriptor(ctxt, selector, seg, cpl);
+}
+
static void write_register_operand(struct operand *op)
{
/* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
@@ -2213,6 +2219,7 @@ static int load_state_from_tss16(struct
struct tss_segment_16 *tss)
{
int ret;
+ u8 cpl;

ctxt->_eip = tss->ip;
ctxt->eflags = tss->flag | 2;
@@ -2235,23 +2242,25 @@ static int load_state_from_tss16(struct
set_segment_selector(ctxt, tss->ss, VCPU_SREG_SS);
set_segment_selector(ctxt, tss->ds, VCPU_SREG_DS);

+ cpl = tss->cs & 3;
+
/*
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR);
+ ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2330,6 +2339,7 @@ static int load_state_from_tss32(struct
struct tss_segment_32 *tss)
{
int ret;
+ u8 cpl;

if (ctxt->ops->set_cr(ctxt, 3, tss->cr3))
return emulate_gp(ctxt, 0);
@@ -2346,7 +2356,8 @@ static int load_state_from_tss32(struct

/*
* SDM says that segment selectors are loaded before segment
- * descriptors
+ * descriptors. This is important because CPL checks will
+ * use CS.RPL.
*/
set_segment_selector(ctxt, tss->ldt_selector, VCPU_SREG_LDTR);
set_segment_selector(ctxt, tss->es, VCPU_SREG_ES);
@@ -2356,29 +2367,31 @@ static int load_state_from_tss32(struct
set_segment_selector(ctxt, tss->fs, VCPU_SREG_FS);
set_segment_selector(ctxt, tss->gs, VCPU_SREG_GS);

+ cpl = tss->cs & 3;
+
/*
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR);
+ ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS);
+ ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS);
+ ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl);
if (ret != X86EMUL_CONTINUE)
return ret;

2014-11-01 23:00:57

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 086/102] KVM: x86: Improve thread safety in pit

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Andy Honig <[email protected]>

commit 2febc839133280d5a5e8e1179c94ea674489dae2 upstream.

There's a race condition in the PIT emulation code in KVM. In
__kvm_migrate_pit_timer the pit_timer object is accessed without
synchronization. If the race condition occurs at the wrong time this
can crash the host kernel.

This fixes CVE-2014-3611.

Signed-off-by: Andrew Honig <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/i8254.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -264,8 +264,10 @@ void __kvm_migrate_pit_timer(struct kvm_
return;

timer = &pit->pit_state.pit_timer.timer;
+ mutex_lock(&pit->pit_state.lock);
if (hrtimer_cancel(timer))
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+ mutex_unlock(&pit->pit_state.lock);
}

static void destroy_pit_timer(struct kvm_pit *pit)

2014-11-01 23:00:53

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 081/102] ipv4: disable bh while doing route gc

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Marcelo Ricardo Leitner <[email protected]>

Further tests revealed that after moving the garbage collector to a work
queue and protecting it with a spinlock may leave the system prone to
soft lockups if bottom half gets very busy.

It was reproced with a set of firewall rules that REJECTed packets. If
the NIC bottom half handler ends up running on the same CPU that is
running the garbage collector on a very large cache, the garbage
collector will not be able to do its job due to the amount of work
needed for handling the REJECTs and also won't reschedule.

The fix is to disable bottom half during the garbage collecting, as it
already was in the first place (most calls to it came from softirqs).

Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ipv4/route.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1000,7 +1000,7 @@ static void __do_rt_garbage_collect(int
* do not make it too frequently.
*/

- spin_lock(&rt_gc_lock);
+ spin_lock_bh(&rt_gc_lock);

RT_CACHE_STAT_INC(gc_total);

@@ -1103,7 +1103,7 @@ work_done:
dst_entries_get_slow(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh)
expire = ip_rt_gc_timeout;
out:
- spin_unlock(&rt_gc_lock);
+ spin_unlock_bh(&rt_gc_lock);
}

static void __rt_garbage_collect(struct work_struct *w)

2014-11-01 23:03:40

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 080/102] ipv4: avoid parallel route cache gc executions

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Marcelo Ricardo Leitner <[email protected]>

When rt_intern_hash() has to deal with neighbour cache overflowing,
it triggers the route cache garbage collector in an attempt to free
some references on neighbour entries.

Such call cannot be done async but should also not run in parallel with
an already-running one, so that they don't collapse fighting over the
hash lock entries.

This patch thus blocks parallel executions with spinlocks:
- A call from worker and from rt_intern_hash() are not the same, and
cannot be merged, thus they will wait each other on rt_gc_lock.
- Calls to gc from rt_intern_hash() may happen in parallel but we must
wait for it to finish in order to try again. This dedup and
synchrinozation is then performed by the locking just before calling
__do_rt_garbage_collect().

Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ipv4/route.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -988,6 +988,7 @@ static void __do_rt_garbage_collect(int
static unsigned long last_gc;
static int rover;
static int equilibrium;
+ static DEFINE_SPINLOCK(rt_gc_lock);
struct rtable *rth;
struct rtable __rcu **rthp;
unsigned long now = jiffies;
@@ -999,6 +1000,8 @@ static void __do_rt_garbage_collect(int
* do not make it too frequently.
*/

+ spin_lock(&rt_gc_lock);
+
RT_CACHE_STAT_INC(gc_total);

if (now - last_gc < min_interval &&
@@ -1091,7 +1094,7 @@ static void __do_rt_garbage_collect(int
if (net_ratelimit())
printk(KERN_WARNING "dst cache overflow\n");
RT_CACHE_STAT_INC(gc_dst_overflow);
- return;
+ goto out;

work_done:
expire += min_interval;
@@ -1099,7 +1102,8 @@ work_done:
dst_entries_get_fast(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh ||
dst_entries_get_slow(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh)
expire = ip_rt_gc_timeout;
-out: return;
+out:
+ spin_unlock(&rt_gc_lock);
}

static void __rt_garbage_collect(struct work_struct *w)
@@ -1174,7 +1178,7 @@ static struct rtable *rt_intern_hash(uns
unsigned long now;
u32 min_score;
int chain_length;
- int attempts = !in_softirq();
+ int attempts = 1;

restart:
chain_length = 0;
@@ -1311,8 +1315,15 @@ restart:
can be released. Try to shrink route cache,
it is most likely it holds some neighbour records.
*/
- if (attempts-- > 0) {
- __do_rt_garbage_collect(1, 0);
+ if (!in_softirq() && attempts-- > 0) {
+ static DEFINE_SPINLOCK(lock);
+
+ if (spin_trylock(&lock)) {
+ __do_rt_garbage_collect(1, 0);
+ spin_unlock(&lock);
+ } else {
+ spin_unlock_wait(&lock);
+ }
goto restart;
}

2014-11-01 23:03:43

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 084/102] ipv6: reuse ip6_frag_id from ip6_ufo_append_data

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Hannes Frederic Sowa <[email protected]>

commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 upstream.

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ipv6/udp_offload.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1362,7 +1362,7 @@ static struct sk_buff *udp6_ufo_fragment
fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
fptr->nexthdr = nexthdr;
fptr->reserved = 0;
- ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+ fptr->identification = skb_shinfo(skb)->ip6_frag_id;

/* Fragment the skb. ipv6 header and the remaining fields of the
* fragment header are updated in ipv6_gso_segment()

2014-11-01 23:03:46

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 009/102] regmap: if format_write is used, declare all registers as "unreadable"

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Wolfram Sang <[email protected]>

commit 4191f19792bf91267835eb090d970e9cd6277a65 upstream.

Using .format_write means, we have a custom function to write to the
chip, but not to read back. Also, mark registers as "not precious" and
"not volatile" which is implicit because we cannot read them. Make those
functions use 'regmap_readable' to reuse the checks done there.

Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/base/regmap/regmap.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -36,6 +36,9 @@ bool regmap_readable(struct regmap *map,
if (map->max_register && reg > map->max_register)
return false;

+ if (map->format.format_write)
+ return false;
+
if (map->readable_reg)
return map->readable_reg(map->dev, reg);

@@ -44,7 +47,7 @@ bool regmap_readable(struct regmap *map,

bool regmap_volatile(struct regmap *map, unsigned int reg)
{
- if (map->max_register && reg > map->max_register)
+ if (!regmap_readable(map, reg))
return false;

if (map->volatile_reg)
@@ -55,7 +58,7 @@ bool regmap_volatile(struct regmap *map,

bool regmap_precious(struct regmap *map, unsigned int reg)
{
- if (map->max_register && reg > map->max_register)
+ if (!regmap_readable(map, reg))
return false;

if (map->precious_reg)

2014-11-01 23:03:54

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 006/102] KVM: s390: Fix user triggerable bug in dead code

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Christian Borntraeger <[email protected]>

commit 614a80e474b227cace52fd6e3c790554db8a396e upstream.

In the early days, we had some special handling for the
KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit
d7b0b5eb3000 (KVM: s390: Make psw available on all exits, not
just a subset).

Now this switch statement is just a sanity check for userspace
not messing with the kvm_run structure. Unfortunately, this
allows userspace to trigger a kernel BUG. Let's just remove
this switch statement.

Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 13 -------------
1 file changed, 13 deletions(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -516,16 +516,6 @@ rerun_vcpu:

BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL);

- switch (kvm_run->exit_reason) {
- case KVM_EXIT_S390_SIEIC:
- case KVM_EXIT_UNKNOWN:
- case KVM_EXIT_INTR:
- case KVM_EXIT_S390_RESET:
- break;
- default:
- BUG();
- }
-
vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;

2014-11-01 23:03:50

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 019/102] xfs: don't dirty buffers beyond EOF

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dave Chinner <[email protected]>

commit 22e757a49cf010703fcb9c9b4ef793248c39b0c2 upstream.

generic/263 is failing fsx at this point with a page spanning
EOF that cannot be invalidated. The operations are:

1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes)
1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes)
1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes)

where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO
write attempts to invalidate the cached page over this range, it
fails with -EBUSY and so any attempt to do page invalidation fails.

The real question is this: Why can't that page be invalidated after
it has been written to disk and cleaned?

Well, there's data on the first two buffers in the page (1k block
size, 4k page), but the third buffer on the page (i.e. beyond EOF)
is failing drop_buffers because it's bh->b_state == 0x3, which is
BH_Uptodate | BH_Dirty. IOWs, there's dirty buffers beyond EOF. Say
what?

OK, set_buffer_dirty() is called on all buffers from
__set_page_buffers_dirty(), regardless of whether the buffer is
beyond EOF or not, which means that when we get to ->writepage,
we have buffers marked dirty beyond EOF that we need to clean.
So, we need to implement our own .set_page_dirty method that
doesn't dirty buffers beyond EOF.

This is messy because the buffer code is not meant to be shared
and it has interesting locking issues on the buffer dirty bits.
So just copy and paste it and then modify it to suit what we need.

Note: the solutions the other filesystems and generic block code use
of marking the buffers clean in ->writepage does not work for XFS.
It still leaves dirty buffers beyond EOF and invalidations still
fail. Hence rather than play whack-a-mole, this patch simply
prevents those buffers from being dirtied in the first place.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/xfs/xfs_aops.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1448,11 +1448,72 @@ xfs_vm_readpages(
return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
}

+/*
+ * This is basically a copy of __set_page_dirty_buffers() with one
+ * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
+ * dirty, we'll never be able to clean them because we don't write buffers
+ * beyond EOF, and that means we can't invalidate pages that span EOF
+ * that have been marked dirty. Further, the dirty state can leak into
+ * the file interior if the file is extended, resulting in all sorts of
+ * bad things happening as the state does not match the underlying data.
+ *
+ * XXX: this really indicates that bufferheads in XFS need to die. Warts like
+ * this only exist because of bufferheads and how the generic code manages them.
+ */
+STATIC int
+xfs_vm_set_page_dirty(
+ struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct inode *inode = mapping->host;
+ loff_t end_offset;
+ loff_t offset;
+ int newly_dirty;
+
+ if (unlikely(!mapping))
+ return !TestSetPageDirty(page);
+
+ end_offset = i_size_read(inode);
+ offset = page_offset(page);
+
+ spin_lock(&mapping->private_lock);
+ if (page_has_buffers(page)) {
+ struct buffer_head *head = page_buffers(page);
+ struct buffer_head *bh = head;
+
+ do {
+ if (offset < end_offset)
+ set_buffer_dirty(bh);
+ bh = bh->b_this_page;
+ offset += 1 << inode->i_blkbits;
+ } while (bh != head);
+ }
+ newly_dirty = !TestSetPageDirty(page);
+ spin_unlock(&mapping->private_lock);
+
+ if (newly_dirty) {
+ /* sigh - __set_page_dirty() is static, so copy it here, too */
+ unsigned long flags;
+
+ spin_lock_irqsave(&mapping->tree_lock, flags);
+ if (page->mapping) { /* Race with truncate? */
+ WARN_ON_ONCE(!PageUptodate(page));
+ account_page_dirtied(page, mapping);
+ radix_tree_tag_set(&mapping->page_tree,
+ page_index(page), PAGECACHE_TAG_DIRTY);
+ }
+ spin_unlock_irqrestore(&mapping->tree_lock, flags);
+ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+ }
+ return newly_dirty;
+}
+
const struct address_space_operations xfs_address_space_operations = {
.readpage = xfs_vm_readpage,
.readpages = xfs_vm_readpages,
.writepage = xfs_vm_writepage,
.writepages = xfs_vm_writepages,
+ .set_page_dirty = xfs_vm_set_page_dirty,
.releasepage = xfs_vm_releasepage,
.invalidatepage = xfs_vm_invalidatepage,
.write_begin = xfs_vm_write_begin,

2014-11-01 23:03:58

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 087/102] nEPT: Nested INVEPT

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Nadav Har'El <[email protected]>

commit bfd0a56b90005f8c8a004baf407ad90045c2b11e upstream.

If we let L1 use EPT, we should probably also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. But if last level
of shadow page is unsync not all L1's changes to EPT12 are intercepted,
which means roots need to be synced when L1 calls INVEPT. Global INVEPT
should not be different since roots are synced by kvm_mmu_load() each
time EPTP02 changes.

Reviewed-by: Xiao Guangrong <[email protected]>
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Jun Nakajima <[email protected]>
Signed-off-by: Xinhao Xu <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2:
- Adjust context, filename
- Add definition of nested_ept_get_cr3(), added upstream by commit
155a97a3d7c7 ("nEPT: MMU context for nested EPT")]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -279,6 +279,7 @@ enum vmcs_field {
#define EXIT_REASON_APIC_ACCESS 44
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49
+#define EXIT_REASON_INVEPT 50
#define EXIT_REASON_WBINVD 54
#define EXIT_REASON_XSETBV 55

@@ -397,6 +398,7 @@ enum vmcs_field {
#define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
#define VMX_EPT_EXTENT_CONTEXT 1
#define VMX_EPT_EXTENT_GLOBAL 2
+#define VMX_EPT_EXTENT_SHIFT 24

#define VMX_EPT_EXECUTE_ONLY_BIT (1ull)
#define VMX_EPT_PAGE_WALK_4_BIT (1ull << 6)
@@ -404,6 +406,7 @@ enum vmcs_field {
#define VMX_EPTP_WB_BIT (1ull << 14)
#define VMX_EPT_2MB_PAGE_BIT (1ull << 16)
#define VMX_EPT_1GB_PAGE_BIT (1ull << 17)
+#define VMX_EPT_INVEPT_BIT (1ull << 20)
#define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24)
#define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
#define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2869,6 +2869,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu
mmu_sync_roots(vcpu);
spin_unlock(&vcpu->kvm->mmu_lock);
}
+EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);

static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
u32 access, struct x86_exception *exception)
@@ -3131,6 +3132,7 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *
++vcpu->stat.tlb_flush;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
+EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);

static void paging_new_cr3(struct kvm_vcpu *vcpu)
{
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -602,6 +602,7 @@ static void nested_release_page_clean(st
kvm_release_page_clean(page);
}

+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
static u64 construct_eptp(unsigned long root_hpa);
static void kvm_cpu_vmxon(u64 addr);
static void kvm_cpu_vmxoff(void);
@@ -1899,6 +1900,7 @@ static u32 nested_vmx_secondary_ctls_low
static u32 nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high;
static u32 nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high;
static u32 nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high;
+static u32 nested_vmx_ept_caps;
static __init void nested_vmx_setup_ctls_msrs(void)
{
/*
@@ -5550,6 +5552,74 @@ static int handle_vmptrst(struct kvm_vcp
return 1;
}

+/* Emulate the INVEPT instruction */
+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+ u32 vmx_instruction_info, types;
+ unsigned long type;
+ gva_t gva;
+ struct x86_exception e;
+ struct {
+ u64 eptp, gpa;
+ } operand;
+ u64 eptp_mask = ((1ull << 51) - 1) & PAGE_MASK;
+
+ if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+ !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ if (!nested_vmx_check_permission(vcpu))
+ return 1;
+
+ if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+ type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+ types = (nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
+
+ if (!(types & (1UL << type))) {
+ nested_vmx_failValid(vcpu,
+ VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+ return 1;
+ }
+
+ /* According to the Intel VMX instruction reference, the memory
+ * operand is read even if it isn't needed (e.g., for type==global)
+ */
+ if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+ vmx_instruction_info, &gva))
+ return 1;
+ if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+ sizeof(operand), &e)) {
+ kvm_inject_page_fault(vcpu, &e);
+ return 1;
+ }
+
+ switch (type) {
+ case VMX_EPT_EXTENT_CONTEXT:
+ if ((operand.eptp & eptp_mask) !=
+ (nested_ept_get_cr3(vcpu) & eptp_mask))
+ break;
+ case VMX_EPT_EXTENT_GLOBAL:
+ kvm_mmu_sync_roots(vcpu);
+ kvm_mmu_flush_tlb(vcpu);
+ nested_vmx_succeed(vcpu);
+ break;
+ default:
+ BUG_ON(1);
+ break;
+ }
+
+ skip_emulated_instruction(vcpu);
+ return 1;
+}
+
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -5591,6 +5661,7 @@ static int (*kvm_vmx_exit_handlers[])(st
[EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
[EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
+ [EXIT_REASON_INVEPT] = handle_invept,
};

static const int kvm_vmx_max_exit_handlers =
@@ -5775,6 +5846,7 @@ static bool nested_vmx_exit_handled(stru
case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+ case EXIT_REASON_INVEPT:
/*
* VMX instructions trap unconditionally. This allows L1 to
* emulate them for its L2 guest, i.e., allows 3-level nesting!
@@ -6436,6 +6508,12 @@ static void vmx_set_supported_cpuid(u32
entry->ecx |= bit(X86_FEATURE_VMX);
}

+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+ /* return the page table to be shadowed - in our case, EPT12 */
+ return get_vmcs12(vcpu)->ept_pointer;
+}
+
/*
* prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
* L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it

2014-11-01 23:04:11

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 016/102] USB: sierra: avoid CDC class functions on "68A3" devices

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Bjørn Mork <[email protected]>

commit 049255f51644c1105775af228396d187402a5934 upstream.

Sierra Wireless Direct IP devices using the 68A3 product ID
can be configured for modes including a CDC ECM class function.
The known example uses interface numbers 12 and 13 for the ECM
control and data interfaces respectively, consistent with CDC
MBIM function interface numbering on other Sierra devices.

It seems cleaner to restrict this driver to the ff/ff/ff
vendor specific interfaces rather than increasing the already
long interface number blacklist. This should be more future
proof if Sierra adds more class functions using interface
numbers not yet in the blacklist.

Signed-off-by: Bjørn Mork <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/usb/serial/sierra.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -296,14 +296,16 @@ static const struct usb_device_id id_tab
{ USB_DEVICE(0x1199, 0x68A2), /* Sierra Wireless MC77xx in QMI mode */
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
- { USB_DEVICE(0x1199, 0x68A3), /* Sierra Wireless Direct IP modems */
+ /* Sierra Wireless Direct IP modems */
+ { USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68A3, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
/* AT&T Direct IP LTE modems */
{ USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68AA, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
- { USB_DEVICE(0x0f3d, 0x68A3), /* Airprime/Sierra Wireless Direct IP modems */
+ /* Airprime/Sierra Wireless Direct IP modems */
+ { USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68A3, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},

2014-11-01 23:04:06

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 008/102] MIPS: ZBOOT: add missing <linux/string.h> include

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Aurelien Jarno <[email protected]>

commit 29593fd5a8149462ed6fad0d522234facdaee6c8 upstream.

Commit dc4d7b37 (MIPS: ZBOOT: gather string functions into string.c)
moved the string related functions into a separate file, which might
cause the following build error, depending on the configuration:

| CC arch/mips/boot/compressed/decompress.o
| In file included from linux/arch/mips/boot/compressed/../../../../lib/decompress_unxz.c:234:0,
| from linux/arch/mips/boot/compressed/decompress.c:67:
| linux/arch/mips/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'fill_temp':
| linux/arch/mips/boot/compressed/../../../../lib/xz/xz_dec_stream.c:162:2: error: implicit declaration of function 'memcpy' [-Werror=implicit-function-declaration]
| cc1: some warnings being treated as errors
| linux/scripts/Makefile.build:308: recipe for target 'arch/mips/boot/compressed/decompress.o' failed
| make[6]: *** [arch/mips/boot/compressed/decompress.o] Error 1
| linux/arch/mips/Makefile:308: recipe for target 'vmlinuz' failed

It does not fail with the standard configuration, as when
CONFIG_DYNAMIC_DEBUG is not enabled <linux/string.h> gets included in
include/linux/dynamic_debug.h. There might be other ways for it to
get indirectly included.

We can't add the include directly in xz_dec_stream.c as some
architectures might want to use a different version for the boot/
directory (see for example arch/x86/boot/string.h).

Signed-off-by: Aurelien Jarno <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7420/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/mips/boot/compressed/decompress.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/mips/boot/compressed/decompress.c
+++ b/arch/mips/boot/compressed/decompress.c
@@ -13,6 +13,7 @@

#include <linux/types.h>
#include <linux/kernel.h>
+#include <linux/string.h>

#include <asm/addrspace.h>

2014-11-01 23:04:03

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 100/102] nfsd: Fix ACL null pointer deref

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Sergio Gelato <[email protected]>

BugLink: http://bugs.launchpad.net/bugs/1348670

Fix regression introduced in pre-3.14 kernels by cherry-picking
aa07c713ecfc0522916f3cd57ac628ea6127c0ec
(NFSD: Call ->set_acl with a NULL ACL structure if no entries).

The affected code was removed in 3.14 by commit
4ac7249ea5a0ceef9f8269f63f33cc873c3fac61
(nfsd: use get_acl and ->set_acl).
The ->set_acl methods are already able to cope with a NULL argument.

Signed-off-by: Sergio Gelato <[email protected]>
[bwh: Rewrite the subject]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/nfsd/vfs.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 446dc01..fc208e4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -508,6 +508,9 @@
char *buf = NULL;
int error = 0;

+ if (!pacl)
+ return vfs_setxattr(dentry, key, NULL, 0, 0);
+
buflen = posix_acl_xattr_size(pacl->a_count);
buf = kmalloc(buflen, GFP_KERNEL);
error = -ENOMEM;

2014-11-01 23:06:06

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 088/102] kvm: vmx: handle invvpid vm exit gracefully

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Petr Matousek <[email protected]>

commit a642fc305053cc1c6e47e4f4df327895747ab485 upstream.

On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.

Fix this by installing an invvpid vm exit handler.

This is CVE-2014-3646.

Signed-off-by: Petr Matousek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2:
- Adjust filename
- Drop inapplicable change to exit reason string array]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/include/asm/vmx.h | 2 ++
arch/x86/kvm/vmx.c | 9 ++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -280,6 +280,7 @@ enum vmcs_field {
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49
#define EXIT_REASON_INVEPT 50
+#define EXIT_REASON_INVVPID 53
#define EXIT_REASON_WBINVD 54
#define EXIT_REASON_XSETBV 55

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5620,6 +5620,12 @@ static int handle_invept(struct kvm_vcpu
return 1;
}

+static int handle_invvpid(struct kvm_vcpu *vcpu)
+{
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+}
+
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -5662,6 +5668,7 @@ static int (*kvm_vmx_exit_handlers[])(st
[EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_INVEPT] = handle_invept,
+ [EXIT_REASON_INVVPID] = handle_invvpid,
};

static const int kvm_vmx_max_exit_handlers =
@@ -5846,7 +5853,7 @@ static bool nested_vmx_exit_handled(stru
case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
- case EXIT_REASON_INVEPT:
+ case EXIT_REASON_INVEPT: case EXIT_REASON_INVVPID:
/*
* VMX instructions trap unconditionally. This allows L1 to
* emulate them for its L2 guest, i.e., allows 3-level nesting!

2014-11-01 23:06:10

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 010/102] regmap: Fix handling of volatile registers for format_write() chips

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mark Brown <[email protected]>

commit 5844a8b9d98ec11ce1d77610daacf3f0a0e14715 upstream.

A previous over-zealous factorisation of code means that we only treat
registers as volatile if they are readable. For most devices this is fine
since normally most registers can be read and volatility implies
readability but for format_write() devices where there is no readback from
the hardware and we use volatility to mean simply uncacheability this means
that we end up treating all registers as cacheble.

A bigger refactoring of the code to clarify this is in order but as a fix
make a minimal change and only check readability when checking volatility
if there is no format_write() operation defined for the device.

Signed-off-by: Mark Brown <[email protected]>
Tested-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/base/regmap/regmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -47,7 +47,7 @@ bool regmap_readable(struct regmap *map,

bool regmap_volatile(struct regmap *map, unsigned int reg)
{
- if (!regmap_readable(map, reg))
+ if (!map->format.format_write && !regmap_readable(map, reg))
return false;

if (map->volatile_reg)

2014-11-01 23:06:15

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 018/102] drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Hellstrom <[email protected]>

commit f01ea0c3d9db536c64d47922716d8b3b8f21d850 upstream.

The code waiting for fifo idle was incorrect and could possibly spin
forever under certain circumstances.

Signed-off-by: Thomas Hellstrom <[email protected]>
Reported-by: Mark Sheldon <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
Reivewed-by: Mark Sheldon <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
@@ -163,8 +163,9 @@ void vmw_fifo_release(struct vmw_private

mutex_lock(&dev_priv->hw_mutex);

+ vmw_write(dev_priv, SVGA_REG_SYNC, SVGA_SYNC_GENERIC);
while (vmw_read(dev_priv, SVGA_REG_BUSY) != 0)
- vmw_write(dev_priv, SVGA_REG_SYNC, SVGA_SYNC_GENERIC);
+ ;

dev_priv->last_read_seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);

2014-11-01 23:06:04

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 098/102] dm crypt: fix access beyond the end of allocated space

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <[email protected]>

commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream.

The DM crypt target accesses memory beyond allocated space resulting in
a crash on 32 bit x86 systems.

This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm
crypt: use async crypto"). However, this bug was masked by the fact
that kmalloc rounds the size up to the next power of two. This bug
wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio
data"). By switching to using per-bio data there was no longer any
padding beyond the end of a dm-crypt allocated memory block.

To minimize allocation overhead dm-crypt puts several structures into one
block allocated with kmalloc. The block holds struct ablkcipher_request,
cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
struct dm_crypt_request and an initialization vector.

The variable dmreq_start is set to offset of struct dm_crypt_request
within this memory block. dm-crypt allocates the block with this size:
cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.

When accessing the initialization vector, dm-crypt uses the function
iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
+ 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).

dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
structure. However, when dm-crypt accesses the initialization vector, it
takes a pointer to the end of dm_crypt_request, aligns it, and then uses
it as the initialization vector. If the end of dm_crypt_request is not
aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
alignment causes the initialization vector to point beyond the allocated
space.

Fix this bug by calculating the variable iv_size_padding and adding it
to the allocated size.

Also correct the alignment of dm_crypt_request. struct dm_crypt_request
is specific to dm-crypt (it isn't used by the crypto subsystem at all),
so it is aligned on __alignof__(struct dm_crypt_request).

Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/md/dm-crypt.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1565,6 +1565,7 @@ static int crypt_ctr(struct dm_target *t
unsigned int key_size, opt_params;
unsigned long long tmpll;
int ret;
+ size_t iv_size_padding;
struct dm_arg_set as;
const char *opt_string;

@@ -1600,12 +1601,23 @@ static int crypt_ctr(struct dm_target *t

cc->dmreq_start = sizeof(struct ablkcipher_request);
cc->dmreq_start += crypto_ablkcipher_reqsize(any_tfm(cc));
- cc->dmreq_start = ALIGN(cc->dmreq_start, crypto_tfm_ctx_alignment());
- cc->dmreq_start += crypto_ablkcipher_alignmask(any_tfm(cc)) &
- ~(crypto_tfm_ctx_alignment() - 1);
+ cc->dmreq_start = ALIGN(cc->dmreq_start, __alignof__(struct dm_crypt_request));
+
+ if (crypto_ablkcipher_alignmask(any_tfm(cc)) < CRYPTO_MINALIGN) {
+ /* Allocate the padding exactly */
+ iv_size_padding = -(cc->dmreq_start + sizeof(struct dm_crypt_request))
+ & crypto_ablkcipher_alignmask(any_tfm(cc));
+ } else {
+ /*
+ * If the cipher requires greater alignment than kmalloc
+ * alignment, we don't know the exact position of the
+ * initialization vector. We must assume worst case.
+ */
+ iv_size_padding = crypto_ablkcipher_alignmask(any_tfm(cc));
+ }

cc->req_pool = mempool_create_kmalloc_pool(MIN_IOS, cc->dmreq_start +
- sizeof(struct dm_crypt_request) + cc->iv_size);
+ sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size);
if (!cc->req_pool) {
ti->error = "Cannot allocate crypt request mempool";
goto bad;

2014-11-01 23:06:01

by Ben Hutchings

[permalink] [raw]
Subject: [PATCH 3.2 013/102] ata_piix: Add Device IDs for Intel 9 Series PCH

3.2.64-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: James Ralston <[email protected]>

commit 6cad1376954e591c3c41500c4e586e183e7ffe6d upstream.

This patch adds the IDE mode SATA Device IDs for the Intel 9 Series PCH.

Signed-off-by: James Ralston <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/ata/ata_piix.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -362,6 +362,14 @@ static const struct pci_device_id piix_p
{ 0x8086, 0x0F21, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_byt },
/* SATA Controller IDE (Coleto Creek) */
{ 0x8086, 0x23a6, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+ /* SATA Controller IDE (9 Series) */
+ { 0x8086, 0x8c88, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
+ /* SATA Controller IDE (9 Series) */
+ { 0x8086, 0x8c89, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
+ /* SATA Controller IDE (9 Series) */
+ { 0x8086, 0x8c80, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
+ /* SATA Controller IDE (9 Series) */
+ { 0x8086, 0x8c81, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },

{ } /* terminate list */
};

2014-11-01 23:12:27

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 000/102] 3.2.64-rc1 review

This is the combined patch for 3.2.64-rc1 relative to 3.2.63.

Ben.

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'


Attachments:
linux-3.2.64-rc1.patch (126.65 kB)
signature.asc (811.00 B)
This is a digitally signed message part
Download all attachments

2014-11-01 23:18:27

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 3.2 023/102] block: Fix dev_t minor allocation lifetime

On 2014-11-01 16:28, Ben Hutchings wrote:
> 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Keith Busch <[email protected]>
>
> commit 2da78092dda13f1efd26edbbf99a567776913750 upstream.
>
> Releases the dev_t minor when all references are closed to prevent
> another device from acquiring the same major/minor.
>
> Since the partition's release may be invoked from call_rcu's soft-irq
> context, the ext_dev_idr's mutex had to be replaced with a spinlock so
> as not so sleep.
>
> Signed-off-by: Keith Busch <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> [bwh: Backported to 3.2:
> - Adjust filename
> - idr insertion API is different, and blk_alloc_devt() is preallocating
> a node in a different way]

As I've noted for pretty much every stable branch so far, you have to
backport commit 46f341ffcfb5 as well, if you backport this one.

--
Jens Axboe

2014-11-01 23:30:03

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3.2 000/102] 3.2.64-rc1 review

On 11/01/2014 03:28 PM, Ben Hutchings wrote:
> This is the start of the stable review cycle for the 3.2.64 release.
> There are 102 patches in this series, which will be posted as responses
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue Nov 04 00:00:00 UTC 2014.
> Anything received after that time might be too late.
>

Build results:
total: 111 pass: 108 fail: 3
Failed builds:
mips:allmodconfig
xtensa:defconfig
xtensa:allmodconfig

Qemu test results:
total: 20 pass: 20 fail: 0

Results are as expected.
Details are available at http://server.roeck-us.net:8010/builders.

Guenter

2014-11-01 23:43:40

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 000/102] 3.2.64-rc1 review

On Sat, 2014-11-01 at 16:29 -0700, Guenter Roeck wrote:
> On 11/01/2014 03:28 PM, Ben Hutchings wrote:
> > This is the start of the stable review cycle for the 3.2.64 release.
> > There are 102 patches in this series, which will be posted as responses
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Tue Nov 04 00:00:00 UTC 2014.
> > Anything received after that time might be too late.
> >
>
> Build results:
> total: 111 pass: 108 fail: 3
> Failed builds:
> mips:allmodconfig
> xtensa:defconfig
> xtensa:allmodconfig
>
> Qemu test results:
> total: 20 pass: 20 fail: 0
>
> Results are as expected.
> Details are available at http://server.roeck-us.net:8010/builders.

Thanks for checking.

Ben.

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-11-01 23:46:49

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 000/102] 3.2.64-rc1 review

On Sat, 2014-11-01 at 23:12 +0000, Ben Hutchings wrote:
> This is the combined patch for 3.2.64-rc1 relative to 3.2.63.

Sorry, that was missing the last two patches in the series. This one
includes all 102 patches that went out for review.

Ben.

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'


Attachments:
linux-3.2.64-rc1.patch (127.61 kB)
signature.asc (811.00 B)
This is a digitally signed message part
Download all attachments

2014-11-01 23:48:38

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 023/102] block: Fix dev_t minor allocation lifetime

On Sat, 2014-11-01 at 17:18 -0600, Jens Axboe wrote:
> On 2014-11-01 16:28, Ben Hutchings wrote:
> > 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Keith Busch <[email protected]>
> >
> > commit 2da78092dda13f1efd26edbbf99a567776913750 upstream.
> >
> > Releases the dev_t minor when all references are closed to prevent
> > another device from acquiring the same major/minor.
> >
> > Since the partition's release may be invoked from call_rcu's soft-irq
> > context, the ext_dev_idr's mutex had to be replaced with a spinlock so
> > as not so sleep.
> >
> > Signed-off-by: Keith Busch <[email protected]>
> > Signed-off-by: Jens Axboe <[email protected]>
> > [bwh: Backported to 3.2:
> > - Adjust filename
> > - idr insertion API is different, and blk_alloc_devt() is preallocating
> > a node in a different way]
>
> As I've noted for pretty much every stable branch so far, you have to
> backport commit 46f341ffcfb5 as well, if you backport this one.

I'm not caught up on reading the stable list, so I missed that. Thanks
for pointing it out again; I'll add it.

Ben.

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-11-02 08:44:36

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 3.2 091/102] KVM: x86: Emulator fixes for eip canonical checks on near branches

Sorry, but this patch is incomplete due to a bug.
The following patch - http://www.spinics.net/lists/kvm/msg109664.html - is needed as well (on top of the previous one).

Nadav

> On Nov 2, 2014, at 00:28, Ben Hutchings <[email protected]> wrote:
>
> 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Nadav Amit <[email protected]>
>
> commit 234f3ce485d54017f15cf5e0699cff4100121601 upstream.
>
> Before changing rip (during jmp, call, ret, etc.) the target should be asserted
> to be canonical one, as real CPUs do. During sysret, both target rsp and rip
> should be canonical. If any of these values is noncanonical, a #GP exception
> should occur. The exception to this rule are syscall and sysenter instructions
> in which the assigned rip is checked during the assignment to the relevant
> MSRs.
>
> This patch fixes the emulator to behave as real CPUs do for near branches.
> Far branches are handled by the next patch.
>
> This fixes CVE-2014-3647.
>
> Signed-off-by: Nadav Amit <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> [bwh: Backported to 3.2:
> - Adjust context
> - Use ctxt->regs[] instead of reg_read(), reg_write(), reg_rmw()]
> Signed-off-by: Ben Hutchings <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 78 ++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 54 insertions(+), 24 deletions(-)
>
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -529,7 +529,8 @@ static int emulate_nm(struct x86_emulate
> return emulate_exception(ctxt, NM_VECTOR, 0, false);
> }
>
> -static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
> +static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
> + int cs_l)
> {
> switch (ctxt->op_bytes) {
> case 2:
> @@ -539,16 +540,25 @@ static inline void assign_eip_near(struc
> ctxt->_eip = (u32)dst;
> break;
> case 8:
> + if ((cs_l && is_noncanonical_address(dst)) ||
> + (!cs_l && (dst & ~(u32)-1)))
> + return emulate_gp(ctxt, 0);
> ctxt->_eip = dst;
> break;
> default:
> WARN(1, "unsupported eip assignment size\n");
> }
> + return X86EMUL_CONTINUE;
> +}
> +
> +static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
> +{
> + return assign_eip_far(ctxt, dst, ctxt->mode == X86EMUL_MODE_PROT64);
> }
>
> -static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
> +static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
> {
> - assign_eip_near(ctxt, ctxt->_eip + rel);
> + return assign_eip_near(ctxt, ctxt->_eip + rel);
> }
>
> static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
> @@ -1787,13 +1797,15 @@ static int em_grp45(struct x86_emulate_c
> case 2: /* call near abs */ {
> long int old_eip;
> old_eip = ctxt->_eip;
> - ctxt->_eip = ctxt->src.val;
> + rc = assign_eip_near(ctxt, ctxt->src.val);
> + if (rc != X86EMUL_CONTINUE)
> + break;
> ctxt->src.val = old_eip;
> rc = em_push(ctxt);
> break;
> }
> case 4: /* jmp abs */
> - ctxt->_eip = ctxt->src.val;
> + rc = assign_eip_near(ctxt, ctxt->src.val);
> break;
> case 5: /* jmp far */
> rc = em_jmp_far(ctxt);
> @@ -1825,10 +1837,14 @@ static int em_grp9(struct x86_emulate_ct
>
> static int em_ret(struct x86_emulate_ctxt *ctxt)
> {
> - ctxt->dst.type = OP_REG;
> - ctxt->dst.addr.reg = &ctxt->_eip;
> - ctxt->dst.bytes = ctxt->op_bytes;
> - return em_pop(ctxt);
> + int rc;
> + unsigned long eip;
> +
> + rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
> + if (rc != X86EMUL_CONTINUE)
> + return rc;
> +
> + return assign_eip_near(ctxt, eip);
> }
>
> static int em_ret_far(struct x86_emulate_ctxt *ctxt)
> @@ -2060,7 +2076,7 @@ static int em_sysexit(struct x86_emulate
> {
> struct x86_emulate_ops *ops = ctxt->ops;
> struct desc_struct cs, ss;
> - u64 msr_data;
> + u64 msr_data, rcx, rdx;
> int usermode;
> u16 cs_sel = 0, ss_sel = 0;
>
> @@ -2076,6 +2092,9 @@ static int em_sysexit(struct x86_emulate
> else
> usermode = X86EMUL_MODE_PROT32;
>
> + rcx = ctxt->regs[VCPU_REGS_RCX];
> + rdx = ctxt->regs[VCPU_REGS_RDX];
> +
> cs.dpl = 3;
> ss.dpl = 3;
> ops->get_msr(ctxt, MSR_IA32_SYSENTER_CS, &msr_data);
> @@ -2093,6 +2112,9 @@ static int em_sysexit(struct x86_emulate
> ss_sel = cs_sel + 8;
> cs.d = 0;
> cs.l = 1;
> + if (is_noncanonical_address(rcx) ||
> + is_noncanonical_address(rdx))
> + return emulate_gp(ctxt, 0);
> break;
> }
> cs_sel |= SELECTOR_RPL_MASK;
> @@ -2101,8 +2123,8 @@ static int em_sysexit(struct x86_emulate
> ops->set_segment(ctxt, cs_sel, &cs, 0, VCPU_SREG_CS);
> ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
>
> - ctxt->_eip = ctxt->regs[VCPU_REGS_RDX];
> - ctxt->regs[VCPU_REGS_RSP] = ctxt->regs[VCPU_REGS_RCX];
> + ctxt->_eip = rdx;
> + ctxt->regs[VCPU_REGS_RSP] = rcx;
>
> return X86EMUL_CONTINUE;
> }
> @@ -2555,10 +2577,13 @@ static int em_das(struct x86_emulate_ctx
>
> static int em_call(struct x86_emulate_ctxt *ctxt)
> {
> + int rc;
> long rel = ctxt->src.val;
>
> ctxt->src.val = (unsigned long)ctxt->_eip;
> - jmp_rel(ctxt, rel);
> + rc = jmp_rel(ctxt, rel);
> + if (rc != X86EMUL_CONTINUE)
> + return rc;
> return em_push(ctxt);
> }
>
> @@ -2590,11 +2615,12 @@ static int em_call_far(struct x86_emulat
> static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
> {
> int rc;
> + unsigned long eip;
>
> - ctxt->dst.type = OP_REG;
> - ctxt->dst.addr.reg = &ctxt->_eip;
> - ctxt->dst.bytes = ctxt->op_bytes;
> - rc = emulate_pop(ctxt, &ctxt->dst.val, ctxt->op_bytes);
> + rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
> + if (rc != X86EMUL_CONTINUE)
> + return rc;
> + rc = assign_eip_near(ctxt, eip);
> if (rc != X86EMUL_CONTINUE)
> return rc;
> register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RSP], ctxt->src.val);
> @@ -2840,20 +2866,24 @@ static int em_lmsw(struct x86_emulate_ct
>
> static int em_loop(struct x86_emulate_ctxt *ctxt)
> {
> + int rc = X86EMUL_CONTINUE;
> +
> register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], -1);
> if ((address_mask(ctxt, ctxt->regs[VCPU_REGS_RCX]) != 0) &&
> (ctxt->b == 0xe2 || test_cc(ctxt->b ^ 0x5, ctxt->eflags)))
> - jmp_rel(ctxt, ctxt->src.val);
> + rc = jmp_rel(ctxt, ctxt->src.val);
>
> - return X86EMUL_CONTINUE;
> + return rc;
> }
>
> static int em_jcxz(struct x86_emulate_ctxt *ctxt)
> {
> + int rc = X86EMUL_CONTINUE;
> +
> if (address_mask(ctxt, ctxt->regs[VCPU_REGS_RCX]) == 0)
> - jmp_rel(ctxt, ctxt->src.val);
> + rc = jmp_rel(ctxt, ctxt->src.val);
>
> - return X86EMUL_CONTINUE;
> + return rc;
> }
>
> static int em_cli(struct x86_emulate_ctxt *ctxt)
> @@ -3946,7 +3976,7 @@ special_insn:
> break;
> case 0x70 ... 0x7f: /* jcc (short) */
> if (test_cc(ctxt->b, ctxt->eflags))
> - jmp_rel(ctxt, ctxt->src.val);
> + rc = jmp_rel(ctxt, ctxt->src.val);
> break;
> case 0x8d: /* lea r16/r32, m */
> ctxt->dst.val = ctxt->src.addr.mem.ea;
> @@ -3994,7 +4024,7 @@ special_insn:
> goto do_io_out;
> case 0xe9: /* jmp rel */
> case 0xeb: /* jmp rel short */
> - jmp_rel(ctxt, ctxt->src.val);
> + rc = jmp_rel(ctxt, ctxt->src.val);
> ctxt->dst.type = OP_NONE; /* Disable writeback. */
> break;
> case 0xec: /* in al,dx */
> @@ -4160,7 +4190,7 @@ twobyte_insn:
> break;
> case 0x80 ... 0x8f: /* jnz rel, etc*/
> if (test_cc(ctxt->b, ctxt->eflags))
> - jmp_rel(ctxt, ctxt->src.val);
> + rc = jmp_rel(ctxt, ctxt->src.val);
> break;
> case 0x90 ... 0x9f: /* setcc r/m8 */
> ctxt->dst.val = test_cc(ctxt->b, ctxt->eflags);
>

2014-11-02 09:03:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3.2 087/102] nEPT: Nested INVEPT

You can just use the same scheme as your patch 88/102:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 685b8448d6e2..bd8cc9055fe2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6740,6 +6740,12 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
return 1;
}

+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+}
+
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -6785,6 +6791,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
[EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
+ [EXIT_REASON_INVEPT] = handle_invept,
};

static const int kvm_vmx_max_exit_handlers =
@@ -7020,6 +7027,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+ case EXIT_REASON_INVEPT:
/*
* VMX instructions trap unconditionally. This allows L1 to
* emulate them for its L2 guest, i.e., allows 3-level nesting!


Paolo

On 01/11/2014 23:28, Ben Hutchings wrote:
> 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Nadav Har'El <[email protected]>
>
> commit bfd0a56b90005f8c8a004baf407ad90045c2b11e upstream.
>
> If we let L1 use EPT, we should probably also support the INVEPT instruction.
>
> In our current nested EPT implementation, when L1 changes its EPT table
> for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
> the course of this modification already calls INVEPT. But if last level
> of shadow page is unsync not all L1's changes to EPT12 are intercepted,
> which means roots need to be synced when L1 calls INVEPT. Global INVEPT
> should not be different since roots are synced by kvm_mmu_load() each
> time EPTP02 changes.
>
> Reviewed-by: Xiao Guangrong <[email protected]>
> Signed-off-by: Nadav Har'El <[email protected]>
> Signed-off-by: Jun Nakajima <[email protected]>
> Signed-off-by: Xinhao Xu <[email protected]>
> Signed-off-by: Yang Zhang <[email protected]>
> Signed-off-by: Gleb Natapov <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> [bwh: Backported to 3.2:
> - Adjust context, filename
> - Add definition of nested_ept_get_cr3(), added upstream by commit
> 155a97a3d7c7 ("nEPT: MMU context for nested EPT")]
> Signed-off-by: Ben Hutchings <[email protected]>
> ---
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -279,6 +279,7 @@ enum vmcs_field {
> #define EXIT_REASON_APIC_ACCESS 44
> #define EXIT_REASON_EPT_VIOLATION 48
> #define EXIT_REASON_EPT_MISCONFIG 49
> +#define EXIT_REASON_INVEPT 50
> #define EXIT_REASON_WBINVD 54
> #define EXIT_REASON_XSETBV 55
>
> @@ -397,6 +398,7 @@ enum vmcs_field {
> #define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
> #define VMX_EPT_EXTENT_CONTEXT 1
> #define VMX_EPT_EXTENT_GLOBAL 2
> +#define VMX_EPT_EXTENT_SHIFT 24
>
> #define VMX_EPT_EXECUTE_ONLY_BIT (1ull)
> #define VMX_EPT_PAGE_WALK_4_BIT (1ull << 6)
> @@ -404,6 +406,7 @@ enum vmcs_field {
> #define VMX_EPTP_WB_BIT (1ull << 14)
> #define VMX_EPT_2MB_PAGE_BIT (1ull << 16)
> #define VMX_EPT_1GB_PAGE_BIT (1ull << 17)
> +#define VMX_EPT_INVEPT_BIT (1ull << 20)
> #define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24)
> #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
> #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2869,6 +2869,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu
> mmu_sync_roots(vcpu);
> spin_unlock(&vcpu->kvm->mmu_lock);
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);
>
> static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
> u32 access, struct x86_exception *exception)
> @@ -3131,6 +3132,7 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *
> ++vcpu->stat.tlb_flush;
> kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
>
> static void paging_new_cr3(struct kvm_vcpu *vcpu)
> {
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -602,6 +602,7 @@ static void nested_release_page_clean(st
> kvm_release_page_clean(page);
> }
>
> +static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
> static u64 construct_eptp(unsigned long root_hpa);
> static void kvm_cpu_vmxon(u64 addr);
> static void kvm_cpu_vmxoff(void);
> @@ -1899,6 +1900,7 @@ static u32 nested_vmx_secondary_ctls_low
> static u32 nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high;
> static u32 nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high;
> static u32 nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high;
> +static u32 nested_vmx_ept_caps;
> static __init void nested_vmx_setup_ctls_msrs(void)
> {
> /*
> @@ -5550,6 +5552,74 @@ static int handle_vmptrst(struct kvm_vcp
> return 1;
> }
>
> +/* Emulate the INVEPT instruction */
> +static int handle_invept(struct kvm_vcpu *vcpu)
> +{
> + u32 vmx_instruction_info, types;
> + unsigned long type;
> + gva_t gva;
> + struct x86_exception e;
> + struct {
> + u64 eptp, gpa;
> + } operand;
> + u64 eptp_mask = ((1ull << 51) - 1) & PAGE_MASK;
> +
> + if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
> + !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
> + kvm_queue_exception(vcpu, UD_VECTOR);
> + return 1;
> + }
> +
> + if (!nested_vmx_check_permission(vcpu))
> + return 1;
> +
> + if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
> + kvm_queue_exception(vcpu, UD_VECTOR);
> + return 1;
> + }
> +
> + vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> + type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
> +
> + types = (nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
> +
> + if (!(types & (1UL << type))) {
> + nested_vmx_failValid(vcpu,
> + VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> + return 1;
> + }
> +
> + /* According to the Intel VMX instruction reference, the memory
> + * operand is read even if it isn't needed (e.g., for type==global)
> + */
> + if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
> + vmx_instruction_info, &gva))
> + return 1;
> + if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
> + sizeof(operand), &e)) {
> + kvm_inject_page_fault(vcpu, &e);
> + return 1;
> + }
> +
> + switch (type) {
> + case VMX_EPT_EXTENT_CONTEXT:
> + if ((operand.eptp & eptp_mask) !=
> + (nested_ept_get_cr3(vcpu) & eptp_mask))
> + break;
> + case VMX_EPT_EXTENT_GLOBAL:
> + kvm_mmu_sync_roots(vcpu);
> + kvm_mmu_flush_tlb(vcpu);
> + nested_vmx_succeed(vcpu);
> + break;
> + default:
> + BUG_ON(1);
> + break;
> + }
> +
> + skip_emulated_instruction(vcpu);
> + return 1;
> +}
> +
> /*
> * The exit handlers return 1 if the exit was handled fully and guest execution
> * may resume. Otherwise they set the kvm_run parameter to indicate what needs
> @@ -5591,6 +5661,7 @@ static int (*kvm_vmx_exit_handlers[])(st
> [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
> [EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
> [EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
> + [EXIT_REASON_INVEPT] = handle_invept,
> };
>
> static const int kvm_vmx_max_exit_handlers =
> @@ -5775,6 +5846,7 @@ static bool nested_vmx_exit_handled(stru
> case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
> case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
> case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
> + case EXIT_REASON_INVEPT:
> /*
> * VMX instructions trap unconditionally. This allows L1 to
> * emulate them for its L2 guest, i.e., allows 3-level nesting!
> @@ -6436,6 +6508,12 @@ static void vmx_set_supported_cpuid(u32
> entry->ecx |= bit(X86_FEATURE_VMX);
> }
>
> +static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
> +{
> + /* return the page table to be shadowed - in our case, EPT12 */
> + return get_vmcs12(vcpu)->ept_pointer;
> +}
> +
> /*
> * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
> * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
>

2014-11-03 01:24:35

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 3.2 023/102] block: Fix dev_t minor allocation lifetime

On 2014-11-01 17:48, Ben Hutchings wrote:
> On Sat, 2014-11-01 at 17:18 -0600, Jens Axboe wrote:
>> On 2014-11-01 16:28, Ben Hutchings wrote:
>>> 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Keith Busch <[email protected]>
>>>
>>> commit 2da78092dda13f1efd26edbbf99a567776913750 upstream.
>>>
>>> Releases the dev_t minor when all references are closed to prevent
>>> another device from acquiring the same major/minor.
>>>
>>> Since the partition's release may be invoked from call_rcu's soft-irq
>>> context, the ext_dev_idr's mutex had to be replaced with a spinlock so
>>> as not so sleep.
>>>
>>> Signed-off-by: Keith Busch <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> [bwh: Backported to 3.2:
>>> - Adjust filename
>>> - idr insertion API is different, and blk_alloc_devt() is preallocating
>>> a node in a different way]
>>
>> As I've noted for pretty much every stable branch so far, you have to
>> backport commit 46f341ffcfb5 as well, if you backport this one.
>
> I'm not caught up on reading the stable list, so I missed that. Thanks
> for pointing it out again; I'll add it.

Thanks, it keeps biting me in the ass that I didn't get a Fixes: added
to that patch...


--
Jens Axboe

2014-11-03 10:38:32

by Guillaume Nault

[permalink] [raw]
Subject: Re: [PATCH 3.2 000/102] 3.2.64-rc1 review

On Sat, Nov 01, 2014 at 10:28:02PM +0000, Ben Hutchings wrote:
> This is the start of the stable review cycle for the 3.2.64 release.
> There are 102 patches in this series, which will be posted as responses
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
You might want to add eed4d839 "l2tp: fix race while getting PMTU on
PPP pseudo-wire". It fixes f34c4a35 ("l2tp: take PMTU from tunnel UDP
socket") which reached linux-3.2.y under commit 109d10d1.

The 3.16.y, 3.14.y, 3.12.y and 3.10.y trees already applied the fix.
Other stable/longterm trees which backported f34c4a35, like
linux-3.4.y, should condider applying this fix too.

Regards,

Guillaume

2014-11-03 13:44:38

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 087/102] nEPT: Nested INVEPT

On Sun, 2014-11-02 at 10:03 +0100, Paolo Bonzini wrote:
> You can just use the same scheme as your patch 88/102:

Why is that? Why should I not use the upstream version?

Ben.

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 685b8448d6e2..bd8cc9055fe2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6740,6 +6740,12 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> +static int handle_invept(struct kvm_vcpu *vcpu)
> +{
> + kvm_queue_exception(vcpu, UD_VECTOR);
> + return 1;
> +}
> +
> /*
> * The exit handlers return 1 if the exit was handled fully and guest execution
> * may resume. Otherwise they set the kvm_run parameter to indicate what needs
> @@ -6785,6 +6791,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
> [EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
> [EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
> + [EXIT_REASON_INVEPT] = handle_invept,
> };
>
> static const int kvm_vmx_max_exit_handlers =
> @@ -7020,6 +7027,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
> case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
> case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
> + case EXIT_REASON_INVEPT:
> /*
> * VMX instructions trap unconditionally. This allows L1 to
> * emulate them for its L2 guest, i.e., allows 3-level nesting!
>
>
> Paolo
>
> On 01/11/2014 23:28, Ben Hutchings wrote:
> > 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Nadav Har'El <[email protected]>
> >
> > commit bfd0a56b90005f8c8a004baf407ad90045c2b11e upstream.
> >
> > If we let L1 use EPT, we should probably also support the INVEPT instruction.
> >
> > In our current nested EPT implementation, when L1 changes its EPT table
> > for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
> > the course of this modification already calls INVEPT. But if last level
> > of shadow page is unsync not all L1's changes to EPT12 are intercepted,
> > which means roots need to be synced when L1 calls INVEPT. Global INVEPT
> > should not be different since roots are synced by kvm_mmu_load() each
> > time EPTP02 changes.
> >
> > Reviewed-by: Xiao Guangrong <[email protected]>
> > Signed-off-by: Nadav Har'El <[email protected]>
> > Signed-off-by: Jun Nakajima <[email protected]>
> > Signed-off-by: Xinhao Xu <[email protected]>
> > Signed-off-by: Yang Zhang <[email protected]>
> > Signed-off-by: Gleb Natapov <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > [bwh: Backported to 3.2:
> > - Adjust context, filename
> > - Add definition of nested_ept_get_cr3(), added upstream by commit
> > 155a97a3d7c7 ("nEPT: MMU context for nested EPT")]
> > Signed-off-by: Ben Hutchings <[email protected]>
> > ---
> > --- a/arch/x86/include/asm/vmx.h
> > +++ b/arch/x86/include/asm/vmx.h
> > @@ -279,6 +279,7 @@ enum vmcs_field {
> > #define EXIT_REASON_APIC_ACCESS 44
> > #define EXIT_REASON_EPT_VIOLATION 48
> > #define EXIT_REASON_EPT_MISCONFIG 49
> > +#define EXIT_REASON_INVEPT 50
> > #define EXIT_REASON_WBINVD 54
> > #define EXIT_REASON_XSETBV 55
> >
> > @@ -397,6 +398,7 @@ enum vmcs_field {
> > #define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
> > #define VMX_EPT_EXTENT_CONTEXT 1
> > #define VMX_EPT_EXTENT_GLOBAL 2
> > +#define VMX_EPT_EXTENT_SHIFT 24
> >
> > #define VMX_EPT_EXECUTE_ONLY_BIT (1ull)
> > #define VMX_EPT_PAGE_WALK_4_BIT (1ull << 6)
> > @@ -404,6 +406,7 @@ enum vmcs_field {
> > #define VMX_EPTP_WB_BIT (1ull << 14)
> > #define VMX_EPT_2MB_PAGE_BIT (1ull << 16)
> > #define VMX_EPT_1GB_PAGE_BIT (1ull << 17)
> > +#define VMX_EPT_INVEPT_BIT (1ull << 20)
> > #define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24)
> > #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
> > #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2869,6 +2869,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu
> > mmu_sync_roots(vcpu);
> > spin_unlock(&vcpu->kvm->mmu_lock);
> > }
> > +EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);
> >
> > static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
> > u32 access, struct x86_exception *exception)
> > @@ -3131,6 +3132,7 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *
> > ++vcpu->stat.tlb_flush;
> > kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> > }
> > +EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
> >
> > static void paging_new_cr3(struct kvm_vcpu *vcpu)
> > {
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -602,6 +602,7 @@ static void nested_release_page_clean(st
> > kvm_release_page_clean(page);
> > }
> >
> > +static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
> > static u64 construct_eptp(unsigned long root_hpa);
> > static void kvm_cpu_vmxon(u64 addr);
> > static void kvm_cpu_vmxoff(void);
> > @@ -1899,6 +1900,7 @@ static u32 nested_vmx_secondary_ctls_low
> > static u32 nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high;
> > static u32 nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high;
> > static u32 nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high;
> > +static u32 nested_vmx_ept_caps;
> > static __init void nested_vmx_setup_ctls_msrs(void)
> > {
> > /*
> > @@ -5550,6 +5552,74 @@ static int handle_vmptrst(struct kvm_vcp
> > return 1;
> > }
> >
> > +/* Emulate the INVEPT instruction */
> > +static int handle_invept(struct kvm_vcpu *vcpu)
> > +{
> > + u32 vmx_instruction_info, types;
> > + unsigned long type;
> > + gva_t gva;
> > + struct x86_exception e;
> > + struct {
> > + u64 eptp, gpa;
> > + } operand;
> > + u64 eptp_mask = ((1ull << 51) - 1) & PAGE_MASK;
> > +
> > + if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
> > + !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
> > + kvm_queue_exception(vcpu, UD_VECTOR);
> > + return 1;
> > + }
> > +
> > + if (!nested_vmx_check_permission(vcpu))
> > + return 1;
> > +
> > + if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
> > + kvm_queue_exception(vcpu, UD_VECTOR);
> > + return 1;
> > + }
> > +
> > + vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> > + type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
> > +
> > + types = (nested_vmx_ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
> > +
> > + if (!(types & (1UL << type))) {
> > + nested_vmx_failValid(vcpu,
> > + VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> > + return 1;
> > + }
> > +
> > + /* According to the Intel VMX instruction reference, the memory
> > + * operand is read even if it isn't needed (e.g., for type==global)
> > + */
> > + if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
> > + vmx_instruction_info, &gva))
> > + return 1;
> > + if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
> > + sizeof(operand), &e)) {
> > + kvm_inject_page_fault(vcpu, &e);
> > + return 1;
> > + }
> > +
> > + switch (type) {
> > + case VMX_EPT_EXTENT_CONTEXT:
> > + if ((operand.eptp & eptp_mask) !=
> > + (nested_ept_get_cr3(vcpu) & eptp_mask))
> > + break;
> > + case VMX_EPT_EXTENT_GLOBAL:
> > + kvm_mmu_sync_roots(vcpu);
> > + kvm_mmu_flush_tlb(vcpu);
> > + nested_vmx_succeed(vcpu);
> > + break;
> > + default:
> > + BUG_ON(1);
> > + break;
> > + }
> > +
> > + skip_emulated_instruction(vcpu);
> > + return 1;
> > +}
> > +
> > /*
> > * The exit handlers return 1 if the exit was handled fully and guest execution
> > * may resume. Otherwise they set the kvm_run parameter to indicate what needs
> > @@ -5591,6 +5661,7 @@ static int (*kvm_vmx_exit_handlers[])(st
> > [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
> > [EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
> > [EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
> > + [EXIT_REASON_INVEPT] = handle_invept,
> > };
> >
> > static const int kvm_vmx_max_exit_handlers =
> > @@ -5775,6 +5846,7 @@ static bool nested_vmx_exit_handled(stru
> > case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
> > case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
> > case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
> > + case EXIT_REASON_INVEPT:
> > /*
> > * VMX instructions trap unconditionally. This allows L1 to
> > * emulate them for its L2 guest, i.e., allows 3-level nesting!
> > @@ -6436,6 +6508,12 @@ static void vmx_set_supported_cpuid(u32
> > entry->ecx |= bit(X86_FEATURE_VMX);
> > }
> >
> > +static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
> > +{
> > + /* return the page table to be shadowed - in our case, EPT12 */
> > + return get_vmcs12(vcpu)->ept_pointer;
> > +}
> > +
> > /*
> > * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
> > * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
> >

--
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-11-03 13:51:23

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 091/102] KVM: x86: Emulator fixes for eip canonical checks on near branches

On Sun, 2014-11-02 at 10:44 +0200, Nadav Amit wrote:
> Sorry, but this patch is incomplete due to a bug.
> The following patch - http://www.spinics.net/lists/kvm/msg109664.html - is needed as well (on top of the previous one).

Thanks, I've added that (commit 7e46dddd6f6c).

Ben.

> Nadav
>
> > On Nov 2, 2014, at 00:28, Ben Hutchings <[email protected]> wrote:
> >
> > 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Nadav Amit <[email protected]>
> >
> > commit 234f3ce485d54017f15cf5e0699cff4100121601 upstream.
> >
> > Before changing rip (during jmp, call, ret, etc.) the target should be asserted
> > to be canonical one, as real CPUs do. During sysret, both target rsp and rip
> > should be canonical. If any of these values is noncanonical, a #GP exception
> > should occur. The exception to this rule are syscall and sysenter instructions
> > in which the assigned rip is checked during the assignment to the relevant
> > MSRs.
> >
> > This patch fixes the emulator to behave as real CPUs do for near branches.
> > Far branches are handled by the next patch.
> >
> > This fixes CVE-2014-3647.
> >
> > Signed-off-by: Nadav Amit <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > [bwh: Backported to 3.2:
> > - Adjust context
> > - Use ctxt->regs[] instead of reg_read(), reg_write(), reg_rmw()]
> > Signed-off-by: Ben Hutchings <[email protected]>
> > ---
> > arch/x86/kvm/emulate.c | 78 ++++++++++++++++++++++++++++++++++----------------
> > 1 file changed, 54 insertions(+), 24 deletions(-)
> >
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -529,7 +529,8 @@ static int emulate_nm(struct x86_emulate
> > return emulate_exception(ctxt, NM_VECTOR, 0, false);
> > }
> >
> > -static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
> > +static inline int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
> > + int cs_l)
> > {
> > switch (ctxt->op_bytes) {
> > case 2:
> > @@ -539,16 +540,25 @@ static inline void assign_eip_near(struc
> > ctxt->_eip = (u32)dst;
> > break;
> > case 8:
> > + if ((cs_l && is_noncanonical_address(dst)) ||
> > + (!cs_l && (dst & ~(u32)-1)))
> > + return emulate_gp(ctxt, 0);
> > ctxt->_eip = dst;
> > break;
> > default:
> > WARN(1, "unsupported eip assignment size\n");
> > }
> > + return X86EMUL_CONTINUE;
> > +}
> > +
> > +static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
> > +{
> > + return assign_eip_far(ctxt, dst, ctxt->mode == X86EMUL_MODE_PROT64);
> > }
> >
> > -static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
> > +static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
> > {
> > - assign_eip_near(ctxt, ctxt->_eip + rel);
> > + return assign_eip_near(ctxt, ctxt->_eip + rel);
> > }
> >
> > static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
> > @@ -1787,13 +1797,15 @@ static int em_grp45(struct x86_emulate_c
> > case 2: /* call near abs */ {
> > long int old_eip;
> > old_eip = ctxt->_eip;
> > - ctxt->_eip = ctxt->src.val;
> > + rc = assign_eip_near(ctxt, ctxt->src.val);
> > + if (rc != X86EMUL_CONTINUE)
> > + break;
> > ctxt->src.val = old_eip;
> > rc = em_push(ctxt);
> > break;
> > }
> > case 4: /* jmp abs */
> > - ctxt->_eip = ctxt->src.val;
> > + rc = assign_eip_near(ctxt, ctxt->src.val);
> > break;
> > case 5: /* jmp far */
> > rc = em_jmp_far(ctxt);
> > @@ -1825,10 +1837,14 @@ static int em_grp9(struct x86_emulate_ct
> >
> > static int em_ret(struct x86_emulate_ctxt *ctxt)
> > {
> > - ctxt->dst.type = OP_REG;
> > - ctxt->dst.addr.reg = &ctxt->_eip;
> > - ctxt->dst.bytes = ctxt->op_bytes;
> > - return em_pop(ctxt);
> > + int rc;
> > + unsigned long eip;
> > +
> > + rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
> > + if (rc != X86EMUL_CONTINUE)
> > + return rc;
> > +
> > + return assign_eip_near(ctxt, eip);
> > }
> >
> > static int em_ret_far(struct x86_emulate_ctxt *ctxt)
> > @@ -2060,7 +2076,7 @@ static int em_sysexit(struct x86_emulate
> > {
> > struct x86_emulate_ops *ops = ctxt->ops;
> > struct desc_struct cs, ss;
> > - u64 msr_data;
> > + u64 msr_data, rcx, rdx;
> > int usermode;
> > u16 cs_sel = 0, ss_sel = 0;
> >
> > @@ -2076,6 +2092,9 @@ static int em_sysexit(struct x86_emulate
> > else
> > usermode = X86EMUL_MODE_PROT32;
> >
> > + rcx = ctxt->regs[VCPU_REGS_RCX];
> > + rdx = ctxt->regs[VCPU_REGS_RDX];
> > +
> > cs.dpl = 3;
> > ss.dpl = 3;
> > ops->get_msr(ctxt, MSR_IA32_SYSENTER_CS, &msr_data);
> > @@ -2093,6 +2112,9 @@ static int em_sysexit(struct x86_emulate
> > ss_sel = cs_sel + 8;
> > cs.d = 0;
> > cs.l = 1;
> > + if (is_noncanonical_address(rcx) ||
> > + is_noncanonical_address(rdx))
> > + return emulate_gp(ctxt, 0);
> > break;
> > }
> > cs_sel |= SELECTOR_RPL_MASK;
> > @@ -2101,8 +2123,8 @@ static int em_sysexit(struct x86_emulate
> > ops->set_segment(ctxt, cs_sel, &cs, 0, VCPU_SREG_CS);
> > ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
> >
> > - ctxt->_eip = ctxt->regs[VCPU_REGS_RDX];
> > - ctxt->regs[VCPU_REGS_RSP] = ctxt->regs[VCPU_REGS_RCX];
> > + ctxt->_eip = rdx;
> > + ctxt->regs[VCPU_REGS_RSP] = rcx;
> >
> > return X86EMUL_CONTINUE;
> > }
> > @@ -2555,10 +2577,13 @@ static int em_das(struct x86_emulate_ctx
> >
> > static int em_call(struct x86_emulate_ctxt *ctxt)
> > {
> > + int rc;
> > long rel = ctxt->src.val;
> >
> > ctxt->src.val = (unsigned long)ctxt->_eip;
> > - jmp_rel(ctxt, rel);
> > + rc = jmp_rel(ctxt, rel);
> > + if (rc != X86EMUL_CONTINUE)
> > + return rc;
> > return em_push(ctxt);
> > }
> >
> > @@ -2590,11 +2615,12 @@ static int em_call_far(struct x86_emulat
> > static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
> > {
> > int rc;
> > + unsigned long eip;
> >
> > - ctxt->dst.type = OP_REG;
> > - ctxt->dst.addr.reg = &ctxt->_eip;
> > - ctxt->dst.bytes = ctxt->op_bytes;
> > - rc = emulate_pop(ctxt, &ctxt->dst.val, ctxt->op_bytes);
> > + rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
> > + if (rc != X86EMUL_CONTINUE)
> > + return rc;
> > + rc = assign_eip_near(ctxt, eip);
> > if (rc != X86EMUL_CONTINUE)
> > return rc;
> > register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RSP], ctxt->src.val);
> > @@ -2840,20 +2866,24 @@ static int em_lmsw(struct x86_emulate_ct
> >
> > static int em_loop(struct x86_emulate_ctxt *ctxt)
> > {
> > + int rc = X86EMUL_CONTINUE;
> > +
> > register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], -1);
> > if ((address_mask(ctxt, ctxt->regs[VCPU_REGS_RCX]) != 0) &&
> > (ctxt->b == 0xe2 || test_cc(ctxt->b ^ 0x5, ctxt->eflags)))
> > - jmp_rel(ctxt, ctxt->src.val);
> > + rc = jmp_rel(ctxt, ctxt->src.val);
> >
> > - return X86EMUL_CONTINUE;
> > + return rc;
> > }
> >
> > static int em_jcxz(struct x86_emulate_ctxt *ctxt)
> > {
> > + int rc = X86EMUL_CONTINUE;
> > +
> > if (address_mask(ctxt, ctxt->regs[VCPU_REGS_RCX]) == 0)
> > - jmp_rel(ctxt, ctxt->src.val);
> > + rc = jmp_rel(ctxt, ctxt->src.val);
> >
> > - return X86EMUL_CONTINUE;
> > + return rc;
> > }
> >
> > static int em_cli(struct x86_emulate_ctxt *ctxt)
> > @@ -3946,7 +3976,7 @@ special_insn:
> > break;
> > case 0x70 ... 0x7f: /* jcc (short) */
> > if (test_cc(ctxt->b, ctxt->eflags))
> > - jmp_rel(ctxt, ctxt->src.val);
> > + rc = jmp_rel(ctxt, ctxt->src.val);
> > break;
> > case 0x8d: /* lea r16/r32, m */
> > ctxt->dst.val = ctxt->src.addr.mem.ea;
> > @@ -3994,7 +4024,7 @@ special_insn:
> > goto do_io_out;
> > case 0xe9: /* jmp rel */
> > case 0xeb: /* jmp rel short */
> > - jmp_rel(ctxt, ctxt->src.val);
> > + rc = jmp_rel(ctxt, ctxt->src.val);
> > ctxt->dst.type = OP_NONE; /* Disable writeback. */
> > break;
> > case 0xec: /* in al,dx */
> > @@ -4160,7 +4190,7 @@ twobyte_insn:
> > break;
> > case 0x80 ... 0x8f: /* jnz rel, etc*/
> > if (test_cc(ctxt->b, ctxt->eflags))
> > - jmp_rel(ctxt, ctxt->src.val);
> > + rc = jmp_rel(ctxt, ctxt->src.val);
> > break;
> > case 0x90 ... 0x9f: /* setcc r/m8 */
> > ctxt->dst.val = test_cc(ctxt->b, ctxt->eflags);
> >
>

--
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-11-03 13:55:11

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 000/102] 3.2.64-rc1 review

On Mon, 2014-11-03 at 11:32 +0100, Guillaume Nault wrote:
> On Sat, Nov 01, 2014 at 10:28:02PM +0000, Ben Hutchings wrote:
> > This is the start of the stable review cycle for the 3.2.64 release.
> > There are 102 patches in this series, which will be posted as responses
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> You might want to add eed4d839 "l2tp: fix race while getting PMTU on
> PPP pseudo-wire". It fixes f34c4a35 ("l2tp: take PMTU from tunnel UDP
> socket") which reached linux-3.2.y under commit 109d10d1.
>
> The 3.16.y, 3.14.y, 3.12.y and 3.10.y trees already applied the fix.
> Other stable/longterm trees which backported f34c4a35, like
> linux-3.4.y, should condider applying this fix too.

Good point, I've now added that to the queue.

Ben.

--
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-11-03 15:29:24

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3.2 087/102] nEPT: Nested INVEPT

On 03/11/2014 14:44, Ben Hutchings wrote:
>> You can just use the same scheme as your patch 88/102:
> Why is that? Why should I not use the upstream version?

Because it makes no sense to invalidate nested EPT page tables, if the
kernel cannot make nested EPT page tables in the first place.

I think that this "if" in your patch should always trigger, thus making
your large patch equivalent to my small patch:

+ if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+ !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }

... but without looking at the entire source of vmx.c in the relatively
old 3.2 kernel, I'd rather play it safe and avoid introducing bugs in case
the above turns out not to be true.

Paolo

2014-11-04 07:09:18

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH 3.2 019/102] xfs: don't dirty buffers beyond EOF

On Sat, Nov 01, 2014 at 10:28:03PM +0000, Ben Hutchings wrote:
> 3.2.64-rc1 review patch. If anyone has any objections, please let me know.

I'm not sure it's a good idea to pull stuff like this back into
really old stable kernels. This was part of a much larger series of
bug fixes that were fairly carefully tested. I very much doubt that
there is specific XFS test coverage on these older kernels that
would determine if this has introduced problems or not....
>
> ------------------
>
> From: Dave Chinner <[email protected]>
>
> commit 22e757a49cf010703fcb9c9b4ef793248c39b0c2 upstream.
>
> generic/263 is failing fsx at this point with a page spanning
> EOF that cannot be invalidated. The operations are:
>
> 1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes)
> 1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes)
> 1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes)

I've got no idea whether generic/263 even exposes this problem
on 3.2 kernels, or whether there's a bunch of the other upstream
changes that need to be added first to expose it... :/

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-11-05 19:58:29

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 019/102] xfs: don't dirty buffers beyond EOF

On Tue, 2014-11-04 at 18:09 +1100, Dave Chinner wrote:
> On Sat, Nov 01, 2014 at 10:28:03PM +0000, Ben Hutchings wrote:
> > 3.2.64-rc1 review patch. If anyone has any objections, please let me know.
>
> I'm not sure it's a good idea to pull stuff like this back into
> really old stable kernels.

OK, then I'll drop this.

> This was part of a much larger series of
> bug fixes that were fairly carefully tested. I very much doubt that
> there is specific XFS test coverage on these older kernels that
> would determine if this has introduced problems or not....
[...]

No, I'm not testing XFS at all.

Ben.

--
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-11-05 20:22:06

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 3.2 087/102] nEPT: Nested INVEPT

On Mon, 2014-11-03 at 16:29 +0100, Paolo Bonzini wrote:
> On 03/11/2014 14:44, Ben Hutchings wrote:
> >> You can just use the same scheme as your patch 88/102:
> > Why is that? Why should I not use the upstream version?
>
> Because it makes no sense to invalidate nested EPT page tables, if the
> kernel cannot make nested EPT page tables in the first place.

Indeed, but I didn't realise it wasn't.

> I think that this "if" in your patch should always trigger, thus making
> your large patch equivalent to my small patch:
>
> + if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
> + !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
> + kvm_queue_exception(vcpu, UD_VECTOR);
> + return 1;
> + }
>
> ... but without looking at the entire source of vmx.c in the relatively
> old 3.2 kernel, I'd rather play it safe and avoid introducing bugs in case
> the above turns out not to be true.

I see - only the SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES flag should be
set in nested_vmx_secondary_ctls_high. I'll use your simple version,
thanks.

Ben.

--
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part